KB-ED7E

dot-iu-cutter v0.5 — Extensible Information-Unit Metadata Strategy + Hot-Key Promotion (design only) (2026-05-17)

Tags: dot-iu-cutter, v0.5, pre-scale-foundation, metadata-strategy, hot-key-promotion, jsonb, design-only, dieu44


Date: 2026-05-17 · Status: DESIGN ONLY — no column add, no backfill, no index, no write. Parent: v0.5 pre-scale foundation.

1. Principle

Two metadata tiers, by access pattern:

  • Authority / hot tier = SQL scalar columns (tac_logical_unit/cutter_governance): typed, constrained, indexable, the SSOT. All governance/runtime-queried attributes live here.
  • Sparse / evolving tier = JSONB (identity_profile; observed keys: body_sha256, canonical_address, source_span): descriptive, low-frequency, schema-light, never authority, never scanned at scale.

JSONB gives extensibility without migrations for cold descriptive keys; the moment a key becomes hot (queried on a runtime/at-scale path) it is promoted to the scalar tier. This avoids both migration churn (cold keys) and JSONB-scan-at-scale (hot keys).

2. Hot-key promotion policy (JSONB key → SQL indexed scalar)

A registered metadata key (see registry design) is promoted if and only if all of the following hold:

  1. On a runtime or at-scale query path (would otherwise force WHERE jsonb->>'k' = … over a growing table).
  2. Needs equality/range/uniqueness or ordering (not merely displayed).
  3. Selectivity/cardinality justifies an index (not a near-constant flag with no filtering value).
  4. Stable semantics (key meaning settled — promoting a churning key is premature).
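
The four criteria above form a plain conjunction, which can be sketched as a small check. This is an illustration only: `KeyAssessment`, its field names, and both helper functions are hypothetical and are not part of the registry design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyAssessment:
    """Hypothetical review record for one registered metadata key."""
    key: str
    on_hot_path: bool       # criterion 1: runtime or at-scale query path
    needs_comparison: bool  # criterion 2: equality/range/uniqueness/ordering
    selective_enough: bool  # criterion 3: cardinality justifies an index
    semantics_stable: bool  # criterion 4: key meaning has settled


def failed_criteria(a: KeyAssessment) -> list[str]:
    """Return the names of the criteria this key does not meet, in order."""
    checks = {
        "on_hot_path": a.on_hot_path,
        "needs_comparison": a.needs_comparison,
        "selective_enough": a.selective_enough,
        "semantics_stable": a.semantics_stable,
    }
    return [name for name, ok in checks.items() if not ok]


def should_promote(a: KeyAssessment) -> bool:
    """A key is promoted only when every criterion holds."""
    return not failed_criteria(a)
```

Returning the failed criteria, rather than only a boolean, gives a reviewer the reason a key stays in JSONB.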

Promotion mechanics (each its own separately-authorized additive cycle — NOT done here):

  • add a nullable scalar column (additive; no table rewrite, no retroactive NOT NULL constraint),
  • backfill it from JSONB in a controlled, batched write cycle (idempotent, append-safe),
  • build the index on the new column with CREATE INDEX CONCURRENTLY (additive),
  • switch the read path to the column; JSONB copy may remain (denormalised, non-authority) or be dropped from new writes,
  • record in the registry promotion ledger.
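
As a rough sketch of the first three steps, the helper below only builds the SQL text for a future, separately authorized cycle; it executes nothing. It assumes PostgreSQL, that identity_profile is a JSONB column on the target table, and a primary-key column named `id`. The function name, the `classification` key in the usage note, the index naming scheme, and the type allowlist are all hypothetical. The read-path switch and the ledger entry are not SQL and are left out.

```python
import re

_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")
_ALLOWED_TYPES = {"text", "integer", "bigint", "boolean", "timestamptz"}


def promotion_plan(table, jsonb_col, key, new_col,
                   sql_type="text", pk="id", batch_size=1000):
    """Build (step, sql) pairs for promoting one JSONB key. Nothing is run."""
    # Names are interpolated into SQL text, so accept simple identifiers only.
    for name in (table, jsonb_col, key, new_col, pk):
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    if sql_type not in _ALLOWED_TYPES:
        raise ValueError(f"unsupported type: {sql_type!r}")
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")

    extract = f"{jsonb_col}->>'{key}'"
    return [
        # Step 1: nullable column, no default, no NOT NULL.
        ("add_column",
         f"ALTER TABLE {table} ADD COLUMN IF NOT EXISTS {new_col} {sql_type}"),
        # Step 2: one batch; the caller repeats it until 0 rows are updated.
        # Rows already filled are skipped, so re-running is idempotent.
        ("backfill_batch",
         f"UPDATE {table} SET {new_col} = ({extract})::{sql_type} "
         f"WHERE {pk} IN (SELECT {pk} FROM {table} "
         f"WHERE {new_col} IS NULL AND {extract} IS NOT NULL "
         f"LIMIT {batch_size})"),
        # Step 3: must run outside a transaction block in PostgreSQL.
        ("create_index",
         f"CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_{table}_{new_col} "
         f"ON {table} ({new_col})"),
    ]
```

For example, `promotion_plan("tac_logical_unit", "identity_profile", "classification", "classification")` yields the three statements in the order listed above. The backfill filters on `->>` returning a non-NULL value, so a key stored as JSON null never re-enters a batch.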

Never index JSONB with GIN as a substitute for promotion on hot paths: it adds maintenance cost, it is still slower than a typed scalar B-tree for equality/range lookups, and it would bless JSONB as a query authority surface, which the SSOT rule forbids.

3. Anti-scan rules at scale

  • No runtime/hot SQL may filter/join on a JSONB expression once volume is non-trivial. Hot filters must be scalar columns (promoted if needed).
  • The cutter runtime today does not filter on JSONB (verified: all find() predicates are scalar columns) — this strategy keeps it that way as documents scale.
  • Audit/ad-hoc analytics over JSONB is allowed off the hot path only (and not at production-blocking volume).
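
One way to keep the first rule checkable is a lint-style guard over the SQL text of hot-path queries. The sketch below is a regex heuristic, not a SQL parser, and is an assumption rather than anything the cutter currently ships: `filters_on_jsonb` and `JSONB_COLUMNS` are hypothetical names, and only the `->`, `->>`, `#>`, `#>>` and `@>` operators are detected. A JSONB expression that appears only in the SELECT list (merely displayed) is not flagged.

```python
import re

# JSONB columns to guard; identity_profile is the one named in this design.
JSONB_COLUMNS = ("identity_profile",)

# Predicates start at the first WHERE or ON keyword (heuristic).
_PREDICATE_START = re.compile(r"\b(?:where|on)\b", re.IGNORECASE)


def _jsonb_expr(columns):
    """Match an optionally qualified JSONB column followed by a JSONB operator."""
    names = "|".join(re.escape(c) for c in columns)
    return re.compile(
        rf"\b(?:\w+\.)?(?:{names})\s*(?:->>?|#>>?|@>)", re.IGNORECASE
    )


def filters_on_jsonb(sql, columns=JSONB_COLUMNS):
    """Return True if the predicate part of `sql` uses a JSONB expression."""
    start = _PREDICATE_START.search(sql)
    if start is None:
        return False  # no WHERE/ON clause, so nothing is filtered or joined
    return _jsonb_expr(columns).search(sql, start.end()) is not None
```

A check like this could run in a test suite over the runtime's query strings, so a JSONB predicate on a hot path fails review before it reaches a growing table.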

4. Relationship to the proposed indexes

The §index-only-ddl proposals cover exactly the scalar hot columns already present (status, source_doc_ref, manifest_id, change_set_id, …), so no promotion is needed for the first full document: the runtime's hot keys are already scalar. The promotion policy is the forward rule for any new metadata introduced later (labels, classifications): such keys are routed through the registry and never land as raw JSONB on a hot path.

5. Open decisions for GPT

  • OD-M1 Confirm "promote, don't GIN" as the standing rule.
  • OD-M2 On promotion: keep the JSONB copy (denormalised) or stop writing it (single-source) — recommend stop-writing post-promotion to preserve one source of truth.
  • OD-M3 Whether any metadata key is hot enough to need promotion before the first full-document dry-run (assessment: no — existing scalar columns suffice).

Boundaries / Git

Design only — no column/backfill/index/write/commit. Git main · e93424b5ff7fa5e4b8406131977ce4339cd0856a · clean (0 lines). No hardcoding; no new columns by default; JSONB = sparse non-authority; SQL = SSOT; no vector/NoSQL. Next = GPT review.
