dot-iu-cutter v0.4 — Extensible Information Unit Metadata / Labels GPT Mandate
Date: 2026-05-17
Authority: GPT acting on User instruction
User strategic requirement
As the system grows to hundreds of thousands and later millions of information units, the number of metadata fields, labels, classifications, and management dimensions will increase. Re-classification will happen more often at higher scale. The information unit schema must support adding labels and metadata dimensions easily without forcing frequent destructive schema redesigns.
The system must avoid hardcoding labels and must preserve SQL-first governance/queryability while allowing flexible metadata expansion.
Binding principle
information_unit_metadata_extensible: true
labels_must_not_be_hardcoded_in_runtime: true
new_label_addition_should_not_require_code_deploy_by_default: true
new_low_frequency_metadata_should_not_require_table_column_migration_by_default: true
high_value_high_query_metadata_can_be_promoted_to_SQL_columns_or_indexed_tables: true
SQL_remains_authority_for_identity_lifecycle_governance_and_indexed_labels: true
JSONB_or_NoSQL_allowed_for_flexible_payloads_but_not_hidden_authority: true
Required architectural posture
The design must support a multi-tier metadata model:
metadata_tiers:
  tier_1_core_columns:
    purpose: stable_identity_lifecycle_and_hot_query_fields
    examples:
    - id
    - canonical_address
    - authority
    - format_version
    - lifecycle_status
    - source_doc_ref
    - created_at
    - updated_at
  tier_2_label_dictionary_and_assignments:
    purpose: scalable_user_or_system_labels_without_column_growth
    shape:
    - label_dictionary
    - label_assignment
    - optional_label_namespace
    - optional_label_value_or_weight
    - assigned_by
    - assigned_at
    - confidence_or_source
  tier_3_typed_metadata_registry:
    purpose: add_new_metadata_keys_without_runtime_hardcode
    shape:
    - metadata_key_registry
    - metadata_value_table_or_JSONB_payload
    - data_type
    - cardinality
    - allowed_values_or_vocab_ref
    - index_policy
    - promotion_policy
  tier_4_JSONB_payload:
    purpose: sparse_evolving_metadata_not_yet_promoted
    constraints:
    - key_must_be_registered_if_used_by_runtime
    - JSONB_not_authority_for_core_identity
    - JSONB_fields_can_graduate_to_SQL_when_hot
  tier_5_vector_or_NoSQL_projection:
    purpose: semantic_search_and_acceleration_only
    constraints:
    - rebuildable_from_SQL_or_source_artifact
    - no_hidden_lifecycle_authority
    - collection_names_configured_not_hardcoded
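The tier 3 and tier 4 constraints above (registered keys only, no hardcoded key lists in the runtime) can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the table layout, the column names metadata_key and data_type, the function names, and the sample keys are all hypothetical, and sqlite3 stands in for PostgreSQL only so the sketch runs without a server.

```python
import json
import sqlite3

# Hypothetical registry table (assumption, not the project's schema).
SCHEMA = """
CREATE TABLE metadata_key_registry (
    metadata_key TEXT PRIMARY KEY,
    data_type    TEXT NOT NULL   -- 'text', 'integer', 'number' or 'boolean'
);
"""


def load_registry(conn):
    """Read registered keys from SQL so the runtime holds no hardcoded key list."""
    rows = conn.execute("SELECT metadata_key, data_type FROM metadata_key_registry")
    return {key: data_type for key, data_type in rows}


def matches(value, data_type):
    """Check a JSON value against the registered type (bool is not an integer here)."""
    if data_type == "boolean":
        return isinstance(value, bool)
    if isinstance(value, bool):
        return False
    if data_type == "integer":
        return isinstance(value, int)
    if data_type == "number":
        return isinstance(value, (int, float))
    if data_type == "text":
        return isinstance(value, str)
    return False


def split_payload(payload, registry):
    """Separate keys the runtime may act on from unregistered or mistyped keys."""
    usable, ignored = {}, {}
    for key, value in payload.items():
        if key in registry and matches(value, registry[key]):
            usable[key] = value
        else:
            ignored[key] = value  # stays in the payload, never treated as authority
    return usable, ignored


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO metadata_key_registry VALUES (?, ?)",
    [("review_priority", "integer"), ("topic_hint", "text")],
)
payload = json.loads(
    '{"review_priority": 3, "topic_hint": "billing", "scratch_note": "tmp"}'
)
usable, ignored = split_payload(payload, load_registry(conn))
print(usable)   # keys the runtime may read
print(ignored)  # unregistered keys, kept as payload only
```

Registering a new key is then a row insert rather than a code deploy, which is the property the binding principle asks for.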
Label design requirements
Future design must consider adding a label system rather than adding columns for every new dimension.
label_requirements:
- support_many_labels_per_information_unit
- support_label_namespaces
- support_hierarchical_labels
- support_label_versioning_or_vocabulary_version
- support_label_source_system_or_human_or_model
- support_confidence_score_for_AI_assigned_labels
- support_effective_from_effective_to_if_labels_change_over_time
- support_soft_delete_or_supersession_without_losing_audit
- support_query_by_label_at_scale
- support_bulk_reclassification_jobs_with_idempotency
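Several of these requirements (namespaces, hierarchy, source and confidence, supersession without losing audit, idempotent assignment) can be illustrated with a two-table sketch. Everything below is an assumption made for illustration: the table and column names, the partial unique index, the helper functions, and the sample labels are hypothetical, and sqlite3 is used in place of PostgreSQL so the example is runnable as written.

```python
import sqlite3

SCHEMA = """
CREATE TABLE label_dictionary (
    label_id  INTEGER PRIMARY KEY,
    namespace TEXT NOT NULL,
    name      TEXT NOT NULL,
    parent_id INTEGER REFERENCES label_dictionary(label_id),  -- hierarchy
    UNIQUE (namespace, name)
);
CREATE TABLE label_assignment (
    unit_id      TEXT NOT NULL,
    label_id     INTEGER NOT NULL REFERENCES label_dictionary(label_id),
    assigned_by  TEXT NOT NULL,   -- system, human, or model identifier
    confidence   REAL,            -- NULL when not model-assigned
    assigned_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    effective_to TEXT             -- NULL while the assignment is current
);
-- At most one current assignment per (unit, label); superseded rows remain.
CREATE UNIQUE INDEX label_assignment_current
    ON label_assignment (unit_id, label_id) WHERE effective_to IS NULL;
CREATE INDEX label_assignment_by_label ON label_assignment (label_id, unit_id);
"""


def label_id_for(conn, namespace, name):
    """Resolve a label through the dictionary; unknown labels are an error."""
    row = conn.execute(
        "SELECT label_id FROM label_dictionary WHERE namespace = ? AND name = ?",
        (namespace, name),
    ).fetchone()
    if row is None:
        raise KeyError(f"unregistered label {namespace}:{name}")
    return row[0]


def assign_label(conn, unit_id, namespace, name, assigned_by, confidence=None):
    """Idempotent assignment: returns True only if a new row was written."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO label_assignment "
        "(unit_id, label_id, assigned_by, confidence) VALUES (?, ?, ?, ?)",
        (unit_id, label_id_for(conn, namespace, name), assigned_by, confidence),
    )
    return cur.rowcount == 1


def supersede(conn, unit_id, namespace, name):
    """Close the current assignment instead of deleting it, keeping the audit row."""
    conn.execute(
        "UPDATE label_assignment SET effective_to = CURRENT_TIMESTAMP "
        "WHERE unit_id = ? AND label_id = ? AND effective_to IS NULL",
        (unit_id, label_id_for(conn, namespace, name)),
    )


def units_with_label(conn, namespace, name):
    """Query current assignments by label."""
    rows = conn.execute(
        "SELECT unit_id FROM label_assignment "
        "WHERE label_id = ? AND effective_to IS NULL ORDER BY unit_id",
        (label_id_for(conn, namespace, name),),
    )
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO label_dictionary (namespace, name) VALUES ('topic', 'billing')")
first = assign_label(conn, "iu-1", "topic", "billing", "model:classifier-a", 0.91)
repeat = assign_label(conn, "iu-1", "topic", "billing", "model:classifier-a", 0.91)
assign_label(conn, "iu-2", "topic", "billing", "human:reviewer")
current = units_with_label(conn, "topic", "billing")
supersede(conn, "iu-2", "topic", "billing")
after = units_with_label(conn, "topic", "billing")
history_rows = conn.execute("SELECT COUNT(*) FROM label_assignment").fetchone()[0]
```

In this sketch a new label is a dictionary row, a repeated assignment is a no-op, and a superseded assignment stays in the table for audit. Label versioning and effective_from are left out to keep the example short.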
Metadata promotion policy
The system must support moving a metadata item through lifecycle stages:
metadata_promotion_lifecycle:
- experimental_JSONB_or_payload_key
- registered_metadata_key
- indexed_JSONB_expression_or_assignment_table
- normalized_SQL_column_or_relation_if_hot_and_stable
- materialized_projection_or_search_index_if_needed
Promotion criteria:
promotion_criteria:
- query_frequency_high
- used_in_workflow_gating
- used_in_permissions_or_authority
- used_in_idempotency_or_dedup
- used_in_partitioning_or_routing
- needs_referential_integrity
- needs_aggregate_reporting_at_scale
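One way to make these criteria checkable is to record them as boolean signals per metadata key. The sketch below is only an illustration: the class and function names are hypothetical, and because the criteria list does not state how many criteria must hold, it assumes that any single met criterion is enough to flag a key for promotion review.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PromotionSignals:
    """One flag per promotion criterion; how each flag is measured is out of scope."""
    query_frequency_high: bool = False
    used_in_workflow_gating: bool = False
    used_in_permissions_or_authority: bool = False
    used_in_idempotency_or_dedup: bool = False
    used_in_partitioning_or_routing: bool = False
    needs_referential_integrity: bool = False
    needs_aggregate_reporting_at_scale: bool = False


def met_criteria(signals):
    """Return the names of the criteria that hold, in declaration order."""
    return [f.name for f in fields(signals) if getattr(signals, f.name)]


def should_review_for_promotion(signals):
    """Assumption: a single met criterion is enough to open a promotion review."""
    return bool(met_criteria(signals))


hot_key = PromotionSignals(query_frequency_high=True, needs_referential_integrity=True)
cold_key = PromotionSignals()
print(met_criteria(hot_key))
print(should_review_for_promotion(cold_key))
```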
Scale requirements
At 100k→1M+ information units, label/metadata design must avoid:
scale_antipatterns:
- one_new_column_per_new_label
- unindexed_JSONB_scans_for_hot_labels
- hardcoded_label_lists_in_python
- unbounded_full_table_sweeps_for_reclassification
- duplicate_labels_without_unique_constraints_or_idempotency
- uncontrolled_vector_metadata_as_authority
Required scale posture:
scale_posture:
- keyset_pagination_for_label_reclassification
- batch_size_configured
- unique_constraints_for_label_assignment_idempotency
- indexes_on_unit_id_label_id_namespace_effective_status
- optional_partitioning_later_for_large_append_only_assignment_history
- materialized_views_or_summary_tables_later_for_hot_dashboards
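The first three posture items (keyset pagination, a configured batch size, and a unique constraint for idempotency) combine naturally in a reclassification job. The following is a rough sketch under stated assumptions: the table names, the text-valued non-empty id column, and the function names are hypothetical, the classification rule is omitted so every unit receives the label, and sqlite3 substitutes for PostgreSQL to keep the example self-contained.

```python
import sqlite3

SCHEMA = """
CREATE TABLE information_unit (id TEXT PRIMARY KEY);
CREATE TABLE label_assignment (
    unit_id  TEXT NOT NULL,
    label_id INTEGER NOT NULL,
    UNIQUE (unit_id, label_id)   -- makes repeated runs idempotent
);
"""


def iter_unit_batches(conn, batch_size):
    """Keyset pagination: seek past the last seen id instead of using OFFSET."""
    last_id = ""  # assumes ids are non-empty text, so "" sorts before all of them
    while True:
        rows = conn.execute(
            "SELECT id FROM information_unit WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return
        batch = [r[0] for r in rows]
        yield batch
        last_id = batch[-1]


def reclassify(conn, label_id, batch_size):
    """Assign label_id to every unit in bounded batches; returns rows newly written."""
    inserted = 0
    for batch in iter_unit_batches(conn, batch_size):
        cur = conn.executemany(
            "INSERT OR IGNORE INTO label_assignment (unit_id, label_id) VALUES (?, ?)",
            [(unit_id, label_id) for unit_id in batch],
        )
        inserted += cur.rowcount
        conn.commit()  # one transaction per batch keeps each unit of work small
    return inserted


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO information_unit VALUES (?)",
    [(f"iu-{n:03d}",) for n in range(1, 6)],
)
batch_sizes = [len(b) for b in iter_unit_batches(conn, batch_size=2)]
first_run = reclassify(conn, label_id=7, batch_size=2)
second_run = reclassify(conn, label_id=7, batch_size=2)
print(batch_sizes, first_run, second_run)
```

Because each page starts from the last id seen, the cost per batch does not grow with how far the job has progressed, and a job that is interrupted and rerun writes no duplicate assignments.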
Immediate impact on current v0.4 work
The current LedgerWriter schema-binding code fix should not attempt to build the full label system, but it must not close off the path to one.
For the current cycle, Agent must:
current_cycle_requirements:
- avoid_hardcoded_label_or_metadata_key_lists_in_runtime
- avoid_new_schema_columns_for_temporary_labels
- keep_JSONB_payloads_as_payloads_but_do_not_treat_unregistered_keys_as_authority
- document_any_label_like_fields_encountered
- preserve_canonical_address_alias_as_separate_governance_table_not_general_label_store
- do_not_use_vector_or_NoSQL_as_source_of_truth
Future required phase
Before large-scale reclassification or rich metadata management, open a dedicated phase:
future_phase: information_unit_metadata_label_registry_design
outputs:
- label_dictionary_design
- label_assignment_design
- metadata_key_registry_design
- metadata_value_or_JSONB_registry_policy
- index_and_partition_plan
- migration_from_JSONB_hot_keys_plan
- reclassification_job_design
- SQL_NoSQL_projection_sync_design
Gate
PG_backed_dry_run_current: not_blocked_by_full_label_system
large_scale_labeling_or_reclassification: blocked_until_label_registry_design
metadata_runtime_hardcoding: forbidden_now