dot-iu-cutter v0.5 — Information-Unit Label/Metadata Registry Design (concept; no creation) (2026-05-17)
Date: 2026-05-17 · Status: DESIGN / CONCEPT ONLY — no registry created, no table/column added, no write, nothing executed. Parent: v0.5 pre-scale foundation.
Grounding: today there are no label columns anywhere. IU metadata = public.tac_logical_unit scalar columns (tier, section_type, section_code, authority, lifecycle_status, canonical_address, …) plus a sparse identity_profile jsonb (observed keys: body_sha256, canonical_address, source_span). cutter_governance carries governance metadata only.
1. Why a registry (problem)
Cutting many units (full document) invites ad-hoc per-run label/key strings → runtime label hardcoding, vocabulary drift, unindexable scans. GPT requires a label/metadata registry before any large-scale labeling/reclassification. This document designs the concept (for GPT review); it creates nothing.
2. Proposed registry concept (logical model — NOT created)
Four logical components (physical schema deferred to a separate authorized cycle):
- Label dictionary (the controlled vocabulary): label_key, namespace, display, value_type (enum: string|int|bool|uuid-ref|date), cardinality (single|multi), status (active|deprecated), governance_owner, version. This is the single source of allowed labels: no runtime label is valid unless it is registered here (replacing hardcoding).
- Label assignment (IU↔label edges): target_unit_id (→ tac_logical_unit.id), label_key (→ dictionary), value, assigned_by, assigned_at, with supersedes/append-only lineage (consistent with the append-only ledger ethos; no destructive relabel).
- Metadata key registry (for evolving descriptive keys, i.e. the JSONB world): meta_key, value_type, expected_cardinality, selectivity_class, index_policy (none|promote-scalar-btree|… ), is_authority (must be false; authority stays in SQL columns/governance), lifecycle.
- Promotion ledger: records when a metadata key is promoted from JSONB to an indexed scalar column (see extensible-metadata-strategy), together with the authorizing cycle reference.
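A minimal in-memory sketch of the append-only assignment with supersede lineage, assuming Python. The field names follow the Label assignment component; `assignment_id`, `LabelAssignment` and `AssignmentLog` are hypothetical, and this is not the physical schema (which stays deferred). A relabel appends a new row that points at the row it replaces, and the current value is whichever row no other row supersedes.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class LabelAssignment:
    """One immutable IU<->label edge."""
    assignment_id: str                # hypothetical surrogate key
    target_unit_id: str               # -> tac_logical_unit.id
    label_key: str                    # -> label dictionary
    value: str
    assigned_by: str
    assigned_at: datetime
    supersedes: Optional[str] = None  # assignment_id of the row this one replaces


class AssignmentLog:
    """Hypothetical in-memory stand-in for an append-only assignment table."""

    def __init__(self) -> None:
        self._rows: list[LabelAssignment] = []

    def append(self, target_unit_id: str, label_key: str, value: str,
               assigned_by: str, supersedes: Optional[str] = None) -> LabelAssignment:
        # A supersede must point at an existing row for the same unit and label.
        if supersedes is not None and not any(
            r.assignment_id == supersedes
            and r.target_unit_id == target_unit_id
            and r.label_key == label_key
            for r in self._rows
        ):
            raise ValueError("supersedes must reference an earlier row for the same unit and label")
        row = LabelAssignment(
            assignment_id=str(uuid.uuid4()),
            target_unit_id=target_unit_id,
            label_key=label_key,
            value=value,
            assigned_by=assigned_by,
            assigned_at=datetime.now(timezone.utc),
            supersedes=supersedes,
        )
        self._rows.append(row)  # rows are only ever appended, never updated or deleted
        return row

    def current(self, target_unit_id: str, label_key: str) -> list[LabelAssignment]:
        """Rows for this unit/label that no other row supersedes."""
        rows = [r for r in self._rows
                if r.target_unit_id == target_unit_id and r.label_key == label_key]
        superseded = {r.supersedes for r in rows if r.supersedes is not None}
        return [r for r in rows if r.assignment_id not in superseded]

    def history(self) -> tuple[LabelAssignment, ...]:
        """Full lineage, including superseded rows."""
        return tuple(self._rows)
```

Because nothing is overwritten, `history()` always returns the full audit trail, while `current()` derives the live value from the supersede links.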
3. Key type / cardinality / index policy
- Each registered key declares value_type, cardinality and selectivity_class; the registry's index_policy decides indexing centrally (no ad-hoc indexes). The default is index_policy = none (sparse JSONB); promotion to a scalar column with a BTREE index happens only on justification (hot path, equality/range predicates, selectivity), and GIN is never the default.
- Multi-valued labels become assignment rows (not arrays inside JSONB), so they stay indexable and append-only auditable.
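One way to read "the registry decides indexing centrally" is a single pure function over each key's declaration. A sketch, assuming Python: the selectivity classes, the `on_hot_path`/`predicate` flags and the rule that every justification must hold at once are assumptions for illustration, not settled policy. Only the two named index_policy values are modelled; the elided options are left out.

```python
from dataclasses import dataclass
from enum import Enum


class IndexPolicy(Enum):
    # Only the two policy values named in the design; there is no GIN member,
    # so GIN can never come out of this function as a default.
    NONE = "none"
    PROMOTE_SCALAR_BTREE = "promote-scalar-btree"


@dataclass(frozen=True)
class MetaKeyDeclaration:
    meta_key: str
    value_type: str              # string|int|bool|uuid-ref|date
    expected_cardinality: str    # single|multi
    selectivity_class: str       # hypothetical classes: high|medium|low
    on_hot_path: bool = False    # hypothetical justification flag
    predicate: str = "none"      # hypothetical: equality|range|none


def decide_index_policy(decl: MetaKeyDeclaration) -> IndexPolicy:
    """Central indexing decision: default NONE, promote only when fully justified."""
    justified = (
        decl.on_hot_path
        and decl.predicate in ("equality", "range")
        and decl.selectivity_class == "high"
        # multi-valued keys become assignment rows instead of a promoted column
        and decl.expected_cardinality == "single"
    )
    return IndexPolicy.PROMOTE_SCALAR_BTREE if justified else IndexPolicy.NONE
```

Keeping the decision in one function means no writer can create an index on its own; a promotion would still be recorded in the promotion ledger under an authorized cycle.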
4. No runtime hardcoded labels (enforcement concept)
- Writers may only emit labels/keys present in the dictionary/key registry; an unregistered key is rejected at the binding layer (this mirrors the existing schema_binding.SCHEMA_BINDING_VOCABULARY contract-test pattern: the vocabulary lives in one registered place, is asserted by tests, and never appears as a literal in the loop).
- The cutter's existing centralised schema_binding constants are the precedent: extend that discipline to IU labels/metadata via the registry rather than adding new literals.
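A minimal sketch of the binding-layer reject, assuming Python. `LABEL_DICTIONARY`, `bind_labels` and `UnregisteredLabelError` are hypothetical names; the two-entry vocabulary only reuses the tier/section_type names mentioned in OD-L3, and the test imitates the contract-test style described for schema_binding.SCHEMA_BINDING_VOCABULARY without claiming to reproduce it.

```python
from types import MappingProxyType

# Hypothetical v1 vocabulary, defined once and read-only. In the design it would be
# loaded from the label dictionary rather than spelled out at call sites.
LABEL_DICTIONARY = MappingProxyType({
    "tier": {"value_type": "string", "status": "active"},
    "section_type": {"value_type": "string", "status": "active"},
})


class UnregisteredLabelError(ValueError):
    """Raised at the binding layer for keys outside the active registered vocabulary."""


def bind_labels(labels: dict, dictionary=LABEL_DICTIONARY) -> dict:
    """Return a copy of labels if every key is registered and active, else raise."""
    unknown = sorted(k for k in labels if k not in dictionary)
    if unknown:
        raise UnregisteredLabelError(f"unregistered label keys: {unknown}")
    inactive = sorted(k for k in labels if dictionary[k]["status"] != "active")
    if inactive:
        raise UnregisteredLabelError(f"non-active label keys: {inactive}")
    return dict(labels)


def test_label_vocabulary_contract() -> None:
    # Contract-test style: the registered vocabulary is asserted in one place.
    assert set(LABEL_DICTIONARY) == {"tier", "section_type"}
    assert all(d["status"] in ("active", "deprecated") for d in LABEL_DICTIONARY.values())
```

Writers call `bind_labels` instead of passing string literals straight through, so a new label can only appear after it is registered (and the contract test updated).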
5. JSONB posture
identity_profile (and any future metadata JSONB) holds sparse, evolving, descriptive metadata only; it is NEVER an authority store. Authority remains in SQL scalar columns plus cutter_governance. JSONB is never filtered on a hot path at scale (the key is promoted instead). Its keys are registry-governed.
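The JSONB posture can be checked mechanically against the metadata key registry. A sketch, assuming Python: `MetaKey`, `META_KEYS`, `check_registry` and `check_identity_profile` are hypothetical, the three entries simply register the keys observed in identity_profile today, and the checks cover only registration, lifecycle and is_authority = false (value typing and hot-path rules are left out).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetaKey:
    meta_key: str
    is_authority: bool = False   # must be False for every JSONB key
    lifecycle: str = "active"    # hypothetical values: active|deprecated


# Hypothetical registry entries for the keys observed in identity_profile today.
META_KEYS = {
    m.meta_key: m
    for m in (MetaKey("body_sha256"), MetaKey("canonical_address"), MetaKey("source_span"))
}


def check_registry(registry: dict) -> None:
    """Reject any metadata key that claims authority; authority stays in SQL columns."""
    bad = sorted(k for k, m in registry.items() if m.is_authority)
    if bad:
        raise ValueError(f"JSONB metadata keys may not be authorities: {bad}")


def check_identity_profile(profile: dict, registry: dict = META_KEYS) -> list:
    """List problems in one identity_profile document; an empty list means it passes."""
    problems = []
    for key in sorted(profile):
        meta = registry.get(key)
        if meta is None:
            problems.append(f"unregistered key: {key}")
        elif meta.lifecycle != "active":
            problems.append(f"non-active key: {key}")
    return problems
```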
6. Boundaries explicitly honoured
No label columns are added by default. No registry table is created. Nothing is written. The registry is a concept for GPT review; its physical DDL (if approved) goes through a separate design→review→dry-run→command-review→sovereign cycle, additive-only and SQL-SSOT-preserving.
7. Open decisions for GPT
- OD-L1 Registry physical home: a new schema (e.g. label_registry) vs within cutter_governance vs a public adjunct.
- OD-L2 Label assignment: append-only with supersede (recommended) vs mutable.
- OD-L3 Scope of the v1 vocabulary: which IU labels are actually needed for the first full document (likely none beyond the existing tier/section_type; possibly defer the registry until a real labeling need appears).
- OD-L4 Enforcement point: binding-layer reject vs DB CHECK/FK to the dictionary.
- OD-L5 Is a registry even needed for the first dry-run-at-volume, which can run on existing columns with zero new labels? Possibly defer.
Boundaries / Git
Concept/design only — nothing created/written/committed. Git main · e93424b5ff7fa5e4b8406131977ce4339cd0856a · clean (0 lines). No fixed IP/DSN/password/container/vector-collection; no runtime label/key hardcoding; no new label columns by default; SQL = SSOT; no vector/NoSQL. Next = GPT review.