KB-1E5B

dot-iu-cutter v0.5 — Information-Unit Label/Metadata Registry Design (concept; no creation) (2026-05-17)

Revision 1

Tags: dot-iu-cutter, v0.5, pre-scale-foundation, label-metadata-registry, concept, design-only, dieu44

dot-iu-cutter v0.5 — Information-Unit Label/Metadata Registry Design (concept)

Date: 2026-05-17
Status: DESIGN / CONCEPT ONLY. No registry created, no table/column added, no write, nothing executed.
Parent: v0.5 pre-scale foundation.
Grounding: today there are no label columns anywhere. IU metadata = public.tac_logical_unit scalar columns (tier, section_type, section_code, authority, lifecycle_status, canonical_address, …) + sparse identity_profile jsonb (observed keys: body_sha256, canonical_address, source_span). cutter_governance carries governance metadata only.

1. Why a registry (problem)

Cutting many units (a full document) invites ad-hoc per-run label/key strings, which leads to runtime label hardcoding, vocabulary drift, and unindexable scans. GPT requires a label/metadata registry before any large-scale labeling/reclassification. This document designs the concept for GPT review; it creates nothing.

2. Proposed registry concept (logical model — NOT created)

Four logical components (physical schema deferred to a separate authorized cycle):

Label dictionary: the controlled vocabulary. Fields: label_key, namespace, display, value_type (enum: string|int|bool|uuid-ref|date), cardinality (single|multi), status (active|deprecated), governance_owner, version. This is the single source of allowed labels: no runtime label is valid unless registered here (replaces hardcoding).
Label assignment: IU↔label edges. Fields: target_unit_id (→ tac_logical_unit.id), label_key (→ dictionary), value, assigned_by, assigned_at, supersedes. Lineage is append-only, consistent with the append-only ledger ethos: a relabel adds a superseding row and never destructively rewrites an existing one.
Metadata key registry: for evolving descriptive keys (the JSONB world). Fields: meta_key, value_type, expected_cardinality, selectivity_class, index_policy (none|promote-scalar-btree|…), is_authority (must be false; authority stays in SQL columns/governance), lifecycle.
Promotion ledger: records when a metadata key is promoted from JSONB to an indexed scalar column (see extensible-metadata-strategy), with the authorizing cycle reference.
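
For review purposes, the dictionary and key-registry components above can be sketched as plain Python types. This is a logical illustration only, not DDL and not an existing module: the class names, the integer type of version, the string type of selectivity_class/index_policy/lifecycle, and the example label are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from enum import Enum


class ValueType(Enum):
    # Enum members taken from the value_type list in the design.
    STRING = "string"
    INT = "int"
    BOOL = "bool"
    UUID_REF = "uuid-ref"
    DATE = "date"


class Cardinality(Enum):
    SINGLE = "single"
    MULTI = "multi"


class Status(Enum):
    ACTIVE = "active"
    DEPRECATED = "deprecated"


@dataclass(frozen=True)
class LabelDefinition:
    """One logical entry of the label dictionary (nothing is persisted)."""
    label_key: str
    namespace: str
    display: str
    value_type: ValueType
    cardinality: Cardinality
    status: Status
    governance_owner: str
    version: int  # assumption: the design does not fix the type of version


@dataclass(frozen=True)
class MetaKeyDefinition:
    """One logical entry of the metadata key registry."""
    meta_key: str
    value_type: ValueType
    expected_cardinality: Cardinality
    selectivity_class: str  # assumption: free-form string in this sketch
    index_policy: str       # "none" or "promote-scalar-btree" in the design
    lifecycle: str
    is_authority: bool = False

    def __post_init__(self) -> None:
        # The design requires is_authority to be false for every metadata key.
        if self.is_authority:
            raise ValueError("metadata keys must never be an authority store")


# Hypothetical label, for illustration only; no such label exists today.
example = LabelDefinition(
    label_key="review_state",
    namespace="example",
    display="Review state",
    value_type=ValueType.STRING,
    cardinality=Cardinality.SINGLE,
    status=Status.ACTIVE,
    governance_owner="cutter_governance",
    version=1,
)
```

Frozen dataclasses are used here so that a definition cannot be mutated after registration, which matches the append-only intent; a physical schema, if approved, may express the same constraints differently.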

3. Key type / cardinality / index policy

Each registered key declares value_type + cardinality + selectivity_class; the registry's index_policy decides indexing centrally (no ad-hoc indexes). Default index_policy = none (sparse JSONB); promotion to scalar+BTREE only on justification (hot path, equality/range, selectivity); never GIN-by-default.
Multi-valued labels → assignment rows (not arrays-in-JSONB) so they are indexable and append-only auditable.
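
The two rules in this section can be illustrated with a small sketch. It assumes a hypothetical selectivity_class value of "high" and hypothetical predicate names "equality" and "range"; the design itself does not enumerate these values, and the function names are invented for the example.

```python
from typing import Iterable, List, Tuple


def decide_index_policy(hot_path: bool, predicate: str, selectivity_class: str) -> str:
    """Return the central index policy for a registered key.

    Default is "none" (key stays in sparse JSONB). Promotion to a scalar
    column with a BTREE index requires all three justifications at once.
    """
    justified = (
        hot_path
        and predicate in {"equality", "range"}  # assumed predicate names
        and selectivity_class == "high"         # assumed class name
    )
    return "promote-scalar-btree" if justified else "none"


def to_assignment_rows(
    target_unit_id: str, label_key: str, values: Iterable[str]
) -> List[Tuple[str, str, str]]:
    """Expand a multi-valued label into one assignment row per value,
    instead of storing the values as an array inside JSONB."""
    return [(target_unit_id, label_key, value) for value in values]
```

With this shape, a key that is not on a hot path stays at "none" regardless of its selectivity, and a label with three values yields three independent rows that can each be indexed and audited.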

4. No runtime hardcoded labels (enforcement concept)

Writers may only emit labels/keys present in the dictionary/key-registry; an unregistered key is rejected at the binding layer (mirrors the existing schema_binding.SCHEMA_BINDING_VOCABULARY contract-test pattern — vocabulary lives in one registered place, asserted by tests, never a literal in the loop).
The cutter's existing centralised schema_binding constants are the precedent: extend that discipline to IU labels/metadata via the registry rather than new literals.
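
A minimal sketch of the binding-layer reject, under stated assumptions: REGISTERED_LABEL_KEYS, UnregisteredLabelError, bind_label and assert_vocabulary_contract are hypothetical names, and the example keys are placeholders. It does not reproduce the actual schema_binding module, only the pattern of keeping the vocabulary in one place and asserting against it.

```python
from typing import Iterable, Tuple

# Hypothetical vocabulary. In the design this would come from the label
# dictionary, not from a literal; it is inlined here only to keep the
# sketch self-contained.
REGISTERED_LABEL_KEYS = frozenset({"example.review_state", "example.topic"})


class UnregisteredLabelError(ValueError):
    """Raised when a writer tries to emit a key missing from the registry."""


def bind_label(label_key: str, value: str) -> Tuple[str, str]:
    """Accept a label only if its key is registered; reject otherwise."""
    if label_key not in REGISTERED_LABEL_KEYS:
        raise UnregisteredLabelError(f"label key not registered: {label_key!r}")
    return (label_key, value)


def assert_vocabulary_contract(emitted_keys: Iterable[str]) -> None:
    """Contract-test helper: every key a writer can emit must be registered."""
    unknown = sorted(set(emitted_keys) - REGISTERED_LABEL_KEYS)
    assert not unknown, f"unregistered keys emitted by writers: {unknown}"
```

Whether the reject lives only at the binding layer or is also backed by a DB constraint is left open in OD-L4.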

5. JSONB posture

identity_profile (and any future metadata JSONB) holds sparse, evolving, descriptive metadata only; it is NEVER an authority store. Authority remains in SQL scalar columns + cutter_governance. JSONB is never filtered on a hot path at scale (promote the key instead). Every JSONB key is registry-governed.

6. Boundaries explicitly honoured

No label columns added by default. No registry table created. No write. The registry is a concept for GPT review; its physical DDL (if approved) is a separate design→review→dry-run→command-review→sovereign cycle, additive-only, SQL-SSOT-preserving.

7. Open decisions for GPT

OD-L1 Registry physical home: new schema (e.g. label_registry) vs within cutter_governance vs public adjunct.
OD-L2 Label assignment append-only-with-supersede (recommended) vs mutable.
OD-L3 Scope of v1 vocabulary (which IU labels are actually needed for the first full document — likely none beyond existing tier/section_type; possibly defer registry until a real labeling need).
OD-L4 Enforcement point (binding-layer reject vs DB CHECK/FK to dictionary).
OD-L5 Is a registry even needed for the first dry-run-at-volume (which can run on existing columns with zero new labels)? Possibly defer.
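
To make OD-L2 concrete for review, the recommended append-only-with-supersede option could behave roughly as follows for a single-cardinality label. This is an in-memory illustration only: the Assignment class, the integer assignment_id, and the helper names are assumptions, and nothing here implies a table layout.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Assignment:
    assignment_id: int           # hypothetical surrogate id
    target_unit_id: str          # would reference tac_logical_unit.id
    label_key: str               # would reference the label dictionary
    value: str
    assigned_by: str
    assigned_at: datetime
    supersedes: Optional[int]    # id of the row this one replaces, if any


def current_assignment(
    ledger: List[Assignment], target_unit_id: str, label_key: str
) -> Optional[Assignment]:
    """Return the live row: the matching row that no later row supersedes."""
    superseded = {a.supersedes for a in ledger if a.supersedes is not None}
    live = [
        a for a in ledger
        if a.target_unit_id == target_unit_id
        and a.label_key == label_key
        and a.assignment_id not in superseded
    ]
    return live[-1] if live else None


def relabel(
    ledger: List[Assignment], target_unit_id: str, label_key: str,
    value: str, assigned_by: str,
) -> Assignment:
    """Append a new row that supersedes the current one; never edit in place."""
    current = current_assignment(ledger, target_unit_id, label_key)
    row = Assignment(
        assignment_id=len(ledger) + 1,
        target_unit_id=target_unit_id,
        label_key=label_key,
        value=value,
        assigned_by=assigned_by,
        assigned_at=datetime.now(timezone.utc),
        supersedes=current.assignment_id if current else None,
    )
    ledger.append(row)
    return row
```

After two relabel calls on the same unit and key, the ledger holds both rows, the first one unchanged, and only the second is reported as current. The mutable alternative in OD-L2 would overwrite the first row and lose that history.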

Boundaries / Git

Concept/design only — nothing created/written/committed. Git main · e93424b5ff7fa5e4b8406131977ce4339cd0856a · clean (0 lines). No fixed IP/DSN/password/container/vector-collection; no runtime label/key hardcoding; no new label columns by default; SQL = SSOT; no vector/NoSQL. Next = GPT review.