dot-iu-cutter v0.5 — Extensible Information-Unit Metadata Strategy + Hot-Key Promotion (design only) (2026-05-17)
Date: 2026-05-17 · Status: DESIGN ONLY — no column add, no backfill, no index, no write. Parent: v0.5 pre-scale foundation.
1. Principle
Two metadata tiers, by access pattern:
- Authority / hot tier = SQL scalar columns (tac_logical_unit/cutter_governance): typed, constrained, indexable, the SSOT. All governance/runtime-queried attributes live here.
- Sparse / evolving tier = JSONB (identity_profile; observed keys: body_sha256, canonical_address, source_span): descriptive, low-frequency, schema-light, never authority, never scanned at scale.
JSONB gives extensibility without migrations for cold descriptive keys; the moment a key becomes hot (queried on a runtime/at-scale path) it is promoted to the scalar tier. This avoids both migration churn (cold keys) and JSONB-scan-at-scale (hot keys).
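A minimal Python sketch of the two-tier split on the write path. The helper split_attributes is hypothetical, and the scalar column set is illustrative (names borrowed from §4). The sketch follows the OD-M2 recommendation, which is still open: a key that has a scalar column is not also written into JSONB.

```python
from typing import Any

# Illustrative scalar (authority) columns; the real column list is an assumption.
SCALAR_COLUMNS = {"status", "source_doc_ref", "manifest_id", "change_set_id"}


def split_attributes(
    attrs: dict[str, Any], scalar_columns: set[str]
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Route each attribute to the scalar tier or the sparse JSONB tier.

    Keys with a scalar column go only there (no JSONB duplicate), so the column
    stays the single source of truth. Every other key is treated as a cold
    descriptive key and lands in the JSONB payload (e.g. identity_profile).
    """
    scalar: dict[str, Any] = {}
    sparse: dict[str, Any] = {}
    for key, value in attrs.items():
        if key in scalar_columns:
            scalar[key] = value
        else:
            sparse[key] = value
    return scalar, sparse
```

For example, {"status": "draft", "manifest_id": "m-1", "body_sha256": "ab12"} splits into the scalar fields status and manifest_id, with body_sha256 alone in the JSONB payload. Promoting a key then amounts to adding it to the scalar set once its column exists.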
2. Hot-key promotion policy (JSONB key → SQL indexed scalar)
A registered metadata key (see registry design) is promoted if and only if all of the following hold:
- On a runtime or at-scale query path (it would otherwise force WHERE jsonb->>'k' = … over a growing table).
- Needs equality/range/uniqueness or ordering (not merely displayed).
- Selectivity/cardinality justifies an index (not a near-constant flag with no filtering value).
- Stable semantics (key meaning settled — promoting a churning key is premature).
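The four conditions form a single conjunction. A Python sketch follows, in which KeyAssessment, its fields, and the MIN_DISTINCT_RATIO threshold are all hypothetical. distinct_ratio is only a crude stand-in for the selectivity judgment; a real assessment would consult planner statistics such as pg_stats.n_distinct.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyAssessment:
    """Hypothetical registry record for one metadata key (field names are illustrative)."""

    key: str
    on_hot_path: bool            # queried on a runtime or at-scale path
    needs_filter_or_order: bool  # equality/range/uniqueness/ordering, not display only
    distinct_ratio: float        # distinct values / rows: rough selectivity proxy
    semantics_stable: bool       # meaning settled, not still churning


# Assumed threshold: below it, the key behaves like a near-constant flag.
MIN_DISTINCT_RATIO = 0.01


def should_promote(a: KeyAssessment, min_distinct_ratio: float = MIN_DISTINCT_RATIO) -> bool:
    """Promote only if every condition of the policy holds."""
    return (
        a.on_hot_path
        and a.needs_filter_or_order
        and a.distinct_ratio >= min_distinct_ratio
        and a.semantics_stable
    )
```

Because a single failed condition blocks promotion, a key whose meaning is still churning stays in JSONB even when it is hot. This matches the "promoting a churning key is premature" rule.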
Promotion mechanics (each its own separately-authorized additive cycle — NOT done here):
- add nullable scalar column (additive; no rewrite, no retroactive NOT NULL),
- backfill from JSONB in a controlled batched write cycle (idempotent, append-safe),
- build an index on the new column with CREATE INDEX CONCURRENTLY (additive),
- switch the read path to the column; the JSONB copy may remain (denormalised, non-authority) or be dropped from new writes,
- record in the registry promotion ledger.
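A sketch of the first three promotion steps, emitted as PostgreSQL statement text per key. PromotionPlan is a hypothetical helper. The sketch assumes a primary key column named id and trusted, non-user-supplied identifiers. Each method is meant to run in its own authorized cycle; the read-path switch and the ledger entry are not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromotionPlan:
    """Hypothetical per-key promotion plan; all identifiers are placeholders."""

    table: str
    jsonb_column: str
    key: str
    column: str
    sql_type: str = "text"
    batch_size: int = 1000

    def add_column(self) -> str:
        # Step 1: additive and nullable; no table rewrite, no retroactive NOT NULL.
        return (
            f"ALTER TABLE {self.table} "
            f"ADD COLUMN IF NOT EXISTS {self.column} {self.sql_type};"
        )

    def backfill_batch(self) -> str:
        # Step 2: one batch, repeated until it updates 0 rows. It only selects rows
        # whose column is still NULL and whose JSONB value is non-null, so reruns
        # are idempotent and the loop terminates.
        value = f"({self.jsonb_column}->>'{self.key}')"
        return (
            f"UPDATE {self.table} SET {self.column} = {value}::{self.sql_type} "
            f"WHERE id IN (SELECT id FROM {self.table} "
            f"WHERE {self.column} IS NULL AND {value} IS NOT NULL "
            f"LIMIT {self.batch_size});"
        )

    def create_index(self) -> str:
        # Step 3: PostgreSQL refuses CONCURRENTLY inside a transaction block.
        return (
            f"CREATE INDEX CONCURRENTLY {self.table}_{self.column}_idx "
            f"ON {self.table} ({self.column});"
        )
```

If a concurrent index build fails, PostgreSQL leaves an invalid index behind, and it must be dropped before retrying. For that reason the sketch does not use IF NOT EXISTS on the index, which could silently keep the invalid one.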
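Never index JSONB with GIN as a substitute for promotion on hot paths. GIN adds index-maintenance cost on every write. Its jsonb operator classes serve containment/existence operators (@>, ?), not range comparisons or ordering. For equality lookups, a typed scalar B-tree is generally smaller and faster. GIN would also bless JSONB as a query authority surface, which the SSOT rule forbids.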
3. Anti-scan rules at scale
- No runtime/hot SQL may filter/join on a JSONB expression once volume is non-trivial. Hot filters must be scalar columns (promoted if needed).
- The cutter runtime today does not filter on JSONB (verified: all find() predicates are scalar columns); this strategy keeps it that way as document volume grows.
- Audit/ad-hoc analytics over JSONB is allowed only off the hot path (and not at production-blocking volume).
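The first anti-scan rule could be checked with a heuristic CI-style lint. The Python sketch below scans the WHERE and JOIN ... ON fragments of a query string for common jsonb operators. The function name and the regex approach are assumptions for illustration. It is not a SQL parser, so it can miss nested cases, and it may flag non-JSONB uses of @> (for example, array containment).

```python
import re

# Common PostgreSQL jsonb path/containment operators (heuristic list).
JSONB_OPERATOR = re.compile(r"->>|->|#>>|#>|@>")

# A WHERE or ON keyword, then everything up to the next clause boundary.
# The lookahead does not consume the boundary, so a following WHERE is still found.
FILTER_CLAUSE = re.compile(
    r"\b(WHERE|ON)\b(.*?)"
    r"(?=\bWHERE\b|\bJOIN\b|\bGROUP\s+BY\b|\bORDER\s+BY\b|\bLIMIT\b|;|$)",
    re.IGNORECASE | re.DOTALL,
)


def jsonb_filter_violations(sql: str) -> list[str]:
    """Return the WHERE/ON fragments of a hot query that use a jsonb operator."""
    violations: list[str] = []
    for match in FILTER_CLAUSE.finditer(sql):
        fragment = match.group(2)
        if JSONB_OPERATOR.search(fragment):
            violations.append(fragment.strip())
    return violations
```

A JSONB expression in the SELECT list (display only) passes. Only a JSONB filter or join predicate is reported, which mirrors the rule's scope.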
4. Relationship to the proposed indexes
The §index-only-ddl proposals cover exactly the scalar hot columns already present (status, source_doc_ref, manifest_id, change_set_id, …). No promotion is therefore needed for the first full document, because the runtime's hot keys are already scalar. The promotion policy is the forward rule for any new metadata introduced later (labels, classifications): such keys are routed through the registry and never exposed as raw JSONB on a hot path.
5. Open decisions for GPT
- OD-M1 Confirm "promote, don't GiN" as the standing rule.
- OD-M2 On promotion: keep the JSONB copy (denormalised) or stop writing it (single-source). Recommendation: stop writing it after promotion, to preserve one source of truth.
- OD-M3 Whether any metadata key is hot enough to need promotion before the first full-document dry-run (assessment: no — existing scalar columns suffice).
Boundaries / Git
Design only — no column/backfill/index/write/commit. Git main · e93424b5ff7fa5e4b8406131977ce4339cd0856a · clean (0 lines). No hardcoding; no new columns by default; JSONB = sparse non-authority; SQL = SSOT; no vector/NoSQL. Next = GPT review.