KB-1422

dot-iu-cutter v0.4: Extensible Information Unit Metadata / Labels GPT Mandate

Revision 1

Tags: dot-iu-cutter, v0.4, metadata, labels, scale, information-unit, sql-nosql-hybrid, extensibility, non-hardcode

Date: 2026-05-17
Authority: GPT acting on User instruction

User strategic requirement

As the system grows to hundreds of thousands and later millions of information units, the number of metadata fields, labels, classifications, and management dimensions will increase, and reclassification will become more frequent. The information unit schema must therefore allow labels and metadata dimensions to be added easily, without forcing frequent destructive schema redesigns.

The system must avoid hardcoding labels and must preserve SQL-first governance/queryability while allowing flexible metadata expansion.

Binding principle

information_unit_metadata_extensible: true
labels_must_not_be_hardcoded_in_runtime: true
new_label_addition_should_not_require_code_deploy_by_default: true
new_low_frequency_metadata_should_not_require_table_column_migration_by_default: true
high_value_high_query_metadata_can_be_promoted_to_SQL_columns_or_indexed_tables: true
SQL_remains_authority_for_identity_lifecycle_governance_and_indexed_labels: true
JSONB_or_NoSQL_allowed_for_flexible_payloads_but_not_hidden_authority: true

Required architectural posture

The design must support a multi-tier metadata model:

metadata_tiers:
  tier_1_core_columns:
    purpose: stable_identity_lifecycle_and_hot_query_fields
    examples:
      - id
      - canonical_address
      - authority
      - format_version
      - lifecycle_status
      - source_doc_ref
      - created_at
      - updated_at
  tier_2_label_dictionary_and_assignments:
    purpose: scalable_user_or_system_labels_without_column_growth
    shape:
      - label_dictionary
      - label_assignment
      - optional_label_namespace
      - optional_label_value_or_weight
      - assigned_by
      - assigned_at
      - confidence_or_source
  tier_3_typed_metadata_registry:
    purpose: add_new_metadata_keys_without_runtime_hardcode
    shape:
      - metadata_key_registry
      - metadata_value_table_or_JSONB_payload
      - data_type
      - cardinality
      - allowed_values_or_vocab_ref
      - index_policy
      - promotion_policy
  tier_4_JSONB_payload:
    purpose: sparse_evolving_metadata_not_yet_promoted
    constraints:
      - key_must_be_registered_if_used_by_runtime
      - JSONB_not_authority_for_core_identity
      - JSONB_fields_can_graduate_to_SQL_when_hot
  tier_5_vector_or_NoSQL_projection:
    purpose: semantic_search_and_acceleration_only
    constraints:
      - rebuildable_from_SQL_or_source_artifact
      - no_hidden_lifecycle_authority
      - collection_names_configured_not_hardcoded
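
As a rough illustration of how tiers 1 to 4 might map onto tables, here is a minimal sketch. It uses Python's stdlib sqlite3 as a stand-in for PostgreSQL so that it runs anywhere; in PostgreSQL the payload column would be JSONB. Table and column names not listed in the tiers above (for example `payload`, `name`, and the `build_db` and `add_label` helpers) are hypothetical, not part of the mandate.

```python
import sqlite3

# Hypothetical DDL; SQLite stands in for PostgreSQL so the sketch is runnable.
SCHEMA = """
CREATE TABLE information_unit (          -- tier 1: core columns
    id                INTEGER PRIMARY KEY,
    canonical_address TEXT NOT NULL UNIQUE,
    authority         TEXT NOT NULL,
    format_version    TEXT NOT NULL,
    lifecycle_status  TEXT NOT NULL,
    source_doc_ref    TEXT,
    created_at        TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at        TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    payload           TEXT NOT NULL DEFAULT '{}'  -- tier 4: JSONB in PostgreSQL
);
CREATE TABLE label_dictionary (          -- tier 2: labels are rows, not columns
    id        INTEGER PRIMARY KEY,
    namespace TEXT NOT NULL,
    name      TEXT NOT NULL,
    UNIQUE (namespace, name)
);
CREATE TABLE label_assignment (          -- tier 2: one row per unit/label pair
    unit_id     INTEGER NOT NULL REFERENCES information_unit(id),
    label_id    INTEGER NOT NULL REFERENCES label_dictionary(id),
    assigned_by TEXT NOT NULL,
    assigned_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    confidence  REAL,
    UNIQUE (unit_id, label_id)
);
CREATE TABLE metadata_key_registry (     -- tier 3: typed key registry
    key          TEXT PRIMARY KEY,
    data_type    TEXT NOT NULL,
    cardinality  TEXT NOT NULL,
    index_policy TEXT NOT NULL DEFAULT 'none'
);
"""


def build_db() -> sqlite3.Connection:
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_label(conn: sqlite3.Connection, namespace: str, name: str) -> int:
    """Adding a label is an INSERT, not a column migration or a code deploy."""
    cur = conn.execute(
        "INSERT INTO label_dictionary (namespace, name) VALUES (?, ?)",
        (namespace, name),
    )
    return cur.lastrowid
```

The point of the shape is that a new label changes data only: `information_unit` keeps the same columns no matter how many labels exist.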

Label design requirements

Future design must consider adding a label system rather than adding columns for every new dimension.

label_requirements:
  - support_many_labels_per_information_unit
  - support_label_namespaces
  - support_hierarchical_labels
  - support_label_versioning_or_vocabulary_version
  - support_label_source_system_or_human_or_model
  - support_confidence_score_for_AI_assigned_labels
  - support_effective_from_effective_to_if_labels_change_over_time
  - support_soft_delete_or_supersession_without_losing_audit
  - support_query_by_label_at_scale
  - support_bulk_reclassification_jobs_with_idempotency
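
Several of the requirements above (namespaces, hierarchy, label source, confidence, effective dating, supersession without losing audit) can coexist in one assignment table. The following is a minimal sketch under stated assumptions: SQLite stands in for PostgreSQL, hierarchy is modeled as a slash-separated path, and every table, column, and function name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE label_assignment (
    id             INTEGER PRIMARY KEY,
    unit_id        INTEGER NOT NULL,
    namespace      TEXT NOT NULL,
    label          TEXT NOT NULL,   -- hierarchical path, e.g. 'finance/tax'
    source         TEXT NOT NULL,   -- 'system', 'human', or 'model'
    confidence     REAL,            -- intended for model-assigned labels
    effective_from TEXT NOT NULL,
    effective_to   TEXT             -- NULL means currently effective
);
-- At most one currently effective row per (unit, namespace, label).
CREATE UNIQUE INDEX uq_active_assignment
    ON label_assignment (unit_id, namespace, label)
    WHERE effective_to IS NULL;
""")


def assign(conn, unit_id, namespace, label, source, at, confidence=None):
    """Idempotent: re-assigning an already active label is a no-op."""
    conn.execute(
        "INSERT OR IGNORE INTO label_assignment "
        "(unit_id, namespace, label, source, confidence, effective_from) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (unit_id, namespace, label, source, confidence, at),
    )


def supersede(conn, unit_id, namespace, old, new, source, at, confidence=None):
    """Close the old assignment instead of deleting it, then open the new one."""
    with conn:  # both statements commit together
        conn.execute(
            "UPDATE label_assignment SET effective_to = ? "
            "WHERE unit_id = ? AND namespace = ? AND label = ? "
            "AND effective_to IS NULL",
            (at, unit_id, namespace, old),
        )
        assign(conn, unit_id, namespace, new, source, at, confidence)


def active_labels(conn, unit_id):
    """Labels currently in effect for one unit, sorted by path."""
    rows = conn.execute(
        "SELECT label FROM label_assignment "
        "WHERE unit_id = ? AND effective_to IS NULL ORDER BY label",
        (unit_id,),
    )
    return [r[0] for r in rows]


def units_under(conn, namespace, prefix):
    """Units with an active label at or below `prefix` in the hierarchy."""
    rows = conn.execute(
        "SELECT DISTINCT unit_id FROM label_assignment "
        "WHERE namespace = ? AND effective_to IS NULL "
        "AND (label = ? OR label LIKE ? || '/%') ORDER BY unit_id",
        (namespace, prefix, prefix),
    )
    return [r[0] for r in rows]
```

Superseded rows stay in the table with `effective_to` set, so the audit trail survives reclassification. A path column is only one way to model hierarchy; a parent reference in the label dictionary would be an equally plausible choice.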

Metadata promotion policy

The system must support moving a metadata item through lifecycle stages:

metadata_promotion_lifecycle:
  - experimental_JSONB_or_payload_key
  - registered_metadata_key
  - indexed_JSONB_expression_or_assignment_table
  - normalized_SQL_column_or_relation_if_hot_and_stable
  - materialized_projection_or_search_index_if_needed

Promotion criteria:

promotion_criteria:
  - query_frequency_high
  - used_in_workflow_gating
  - used_in_permissions_or_authority
  - used_in_idempotency_or_dedup
  - used_in_partitioning_or_routing
  - needs_referential_integrity
  - needs_aggregate_reporting_at_scale
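
The lifecycle and criteria above could be evaluated mechanically against usage facts held in the metadata key registry. The sketch below is an assumption-heavy illustration: the mandate lists the criteria but does not say how they combine, so the "any criterion met advances one stage" rule, the `KeyUsage` class, and the `hot_query_threshold` parameter are all hypothetical.

```python
from dataclasses import dataclass

# Stage names mirror metadata_promotion_lifecycle, in order.
STAGES = [
    "experimental_JSONB_or_payload_key",
    "registered_metadata_key",
    "indexed_JSONB_expression_or_assignment_table",
    "normalized_SQL_column_or_relation_if_hot_and_stable",
    "materialized_projection_or_search_index_if_needed",
]

# Boolean criteria from promotion_criteria; query frequency is handled separately.
FLAG_CRITERIA = (
    "used_in_workflow_gating",
    "used_in_permissions_or_authority",
    "used_in_idempotency_or_dedup",
    "used_in_partitioning_or_routing",
    "needs_referential_integrity",
    "needs_aggregate_reporting_at_scale",
)


@dataclass
class KeyUsage:
    """Hypothetical usage facts about one metadata key."""
    key: str
    stage: str
    queries_per_day: int = 0
    used_in_workflow_gating: bool = False
    used_in_permissions_or_authority: bool = False
    used_in_idempotency_or_dedup: bool = False
    used_in_partitioning_or_routing: bool = False
    needs_referential_integrity: bool = False
    needs_aggregate_reporting_at_scale: bool = False


def met_criteria(usage: KeyUsage, hot_query_threshold: int) -> list:
    """Return the names of the promotion criteria this key currently meets."""
    met = []
    if usage.queries_per_day >= hot_query_threshold:
        met.append("query_frequency_high")
    met.extend(name for name in FLAG_CRITERIA if getattr(usage, name))
    return met


def recommend_stage(usage: KeyUsage, hot_query_threshold: int) -> str:
    """Assumed policy: advance one stage if any criterion is met, never skip."""
    if not met_criteria(usage, hot_query_threshold):
        return usage.stage
    i = STAGES.index(usage.stage)
    return STAGES[min(i + 1, len(STAGES) - 1)]
```

Returning the list of met criteria alongside the recommendation keeps the decision reviewable, which matters when a promotion implies a schema migration.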

Scale requirements

At 100k→1M+ information units, label/metadata design must avoid:

scale_antipatterns:
  - one_new_column_per_new_label
  - unindexed_JSONB_scans_for_hot_labels
  - hardcoded_label_lists_in_python
  - unbounded_full_table_sweeps_for_reclassification
  - duplicate_labels_without_unique_constraints_or_idempotency
  - uncontrolled_vector_metadata_as_authority

Required scale posture:

scale_posture:
  - keyset_pagination_for_label_reclassification
  - batch_size_configured
  - unique_constraints_for_label_assignment_idempotency
  - indexes_on_unit_id_label_id_namespace_effective_status
  - optional_partitioning_later_for_large_append_only_assignment_history
  - materialized_views_or_summary_tables_later_for_hot_dashboards
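
The first three posture items (keyset pagination, configured batch size, unique constraints for idempotency) fit together in a single reclassification loop. This is a minimal sketch, again using SQLite in place of PostgreSQL; the table shapes, the `classify` callback, and the `demo_db` helper are hypothetical.

```python
import sqlite3


def demo_db(n: int) -> sqlite3.Connection:
    """Build a small in-memory database with n information units."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE information_unit (id INTEGER PRIMARY KEY, payload TEXT NOT NULL);
    CREATE TABLE label_assignment (
        unit_id  INTEGER NOT NULL,
        label_id INTEGER NOT NULL,
        UNIQUE (unit_id, label_id)   -- makes re-runs idempotent
    );
    """)
    conn.executemany(
        "INSERT INTO information_unit (payload) VALUES (?)",
        [(f"doc {i}",) for i in range(n)],
    )
    conn.commit()
    return conn


def reclassify(conn, label_id, classify, batch_size):
    """Walk information_unit in id order, batch_size rows at a time.

    `classify(payload) -> bool` decides whether the label applies.
    Returns the number of assignments newly written.
    """
    last_id, written = 0, 0
    while True:
        # Keyset pagination: seek past the last seen id instead of using OFFSET.
        rows = conn.execute(
            "SELECT id, payload FROM information_unit "
            "WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return written
        with conn:  # one transaction per batch
            for unit_id, payload in rows:
                if classify(payload):
                    cur = conn.execute(
                        "INSERT OR IGNORE INTO label_assignment "
                        "(unit_id, label_id) VALUES (?, ?)",
                        (unit_id, label_id),
                    )
                    written += cur.rowcount  # 0 when the row already existed
        last_id = rows[-1][0]
```

Because each batch seeks on the primary key, the cost per batch stays roughly constant as the table grows, and an interrupted job can be re-run from the start without creating duplicates. In a real job `batch_size` would come from configuration and `last_id` would be persisted so a restart can resume.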

Immediate impact on current v0.4 work

The current LedgerWriter schema-binding code fix should not attempt to build the full label system, but it must not close off the path to one.

For the current cycle, the Agent must:

current_cycle_requirements:
  - avoid_hardcoded_label_or_metadata_key_lists_in_runtime
  - avoid_new_schema_columns_for_temporary_labels
  - keep_JSONB_payloads_as_payloads_but_do_not_treat_unregistered_keys_as_authority
  - document_any_label_like_fields_encountered
  - preserve_canonical_address_alias_as_separate_governance_table_not_general_label_store
  - do_not_use_vector_or_NoSQL_as_source_of_truth
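
For the current cycle, the rule about unregistered payload keys could be enforced with a small guard at the point where runtime code reads a payload. The sketch below assumes a `metadata_key_registry` table with a `key` column (SQLite standing in for PostgreSQL); `load_registered_keys` and `split_payload` are hypothetical helper names.

```python
import json
import sqlite3


def load_registered_keys(conn: sqlite3.Connection) -> set:
    """Read the allowed keys from SQL at runtime, not from a constant in code."""
    return {row[0] for row in conn.execute("SELECT key FROM metadata_key_registry")}


def split_payload(payload_json: str, registered_keys: set):
    """Separate keys the runtime may act on from keys it must only carry.

    Returns (usable, unregistered): `usable` holds registered keys and values,
    `unregistered` is a sorted list of key names to document, not to act on.
    """
    payload = json.loads(payload_json)
    usable = {k: v for k, v in payload.items() if k in registered_keys}
    unregistered = sorted(k for k in payload if k not in registered_keys)
    return usable, unregistered
```

The payload itself is left untouched, so unregistered keys are preserved as payload; they simply never reach decision logic. The returned list of unregistered names gives a natural place to record label-like fields encountered during the cycle.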

Future required phase

Before large-scale reclassification or rich metadata management, open a dedicated phase:

future_phase: information_unit_metadata_label_registry_design
outputs:
  - label_dictionary_design
  - label_assignment_design
  - metadata_key_registry_design
  - metadata_value_or_JSONB_registry_policy
  - index_and_partition_plan
  - migration_from_JSONB_hot_keys_plan
  - reclassification_job_design
  - SQL_NoSQL_projection_sync_design

Gate

PG_backed_dry_run_current: not_blocked_by_full_label_system
large_scale_labeling_or_reclassification: blocked_until_label_registry_design
metadata_runtime_hardcoding: forbidden_now