KB-3D22

dot-iu-cutter v0.5 — Scale, Index & Label/Metadata Risk Note (design only; index DDL NOT authorized) (2026-05-17)

Date: 2026-05-17 · Status: DESIGN / RISK ASSESSMENT ONLY — no index DDL, no label registry, no schema change, nothing executed. Parent: design-master.

1. Current index state (grounded, read-only)

cutter_governance (12 tables) has only primary-key btree indexes, plus two groups of extras. On cut_change_set: unique(idempotency_key) and unique(rollback_key). On canonical_address_alias: secondary indexes on alias_kind, alias_text, target_unit_id, and (valid_from, valid_until). No secondary index exists on the columns the runtime uses for its per-IU lookups.

2. Unindexed hot paths (per-IU; O(n) → O(n²) over a full document)

From the accepted phases.py @ e93424b5…:

| Phase | Lookup | Column(s) | Index? | Scale risk |
|---|---|---|---|---|
| MARK dedup | find(decision_backlog_entry, entry_id=…) | entry_id (PK) | yes (PK) | OK at scale |
| SWEEP | find(decision_backlog_entry, status=…) | status | no | seq scan every sweep |
| REVIEW/CUT lineage | find(manifest_envelope, source_doc_ref=…) | source_doc_ref | no | seq scan per IU |
| REVIEW/CUT lineage | find(review_decision, manifest_id=…) | manifest_id | no | seq scan per IU |
| CUT G-CUT-ONCE | find(cut_change_set, decision_backlog_entry_id=…) | decision_backlog_entry_id | no | seq scan per IU |
| VERIFY | find(verify_result, change_set_id=…) | change_set_id | no | seq scan per IU |
| (signatures) | dot_pair_signature.cross_reference_* | xref cols | no | scan if queried |

Single-IU trial: the cost is trivial because the tables are empty or tiny. Full document (hundreds of IUs, thousands of rows): the same lookups repeat once per IU against tables that keep growing, and each unindexed lookup scans the whole table, so the total cost is roughly O(n²) in the number of IUs. Conclusion: pre-scale index-only DDL is a HARD prerequisite before any full-document or bulk cut.
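The quadratic growth can be made concrete with a small cost model. This is a minimal sketch, not a measurement of the real runtime: it assumes each IU appends a fixed number of rows to one table and performs one lookup against it, and it uses log2 of the table size as a crude stand-in for the cost of an indexed lookup (a real btree has a much higher fan-out, so this overstates the indexed cost). The function names are hypothetical.

```python
import math


def rows_scanned_without_index(num_ius: int, rows_per_iu: int = 1) -> int:
    """Total rows read when every per-IU lookup scans the whole table.

    Assumption: the table starts empty and grows by rows_per_iu after each
    IU, so the k-th IU (0-based) scans k * rows_per_iu rows.
    """
    return sum(k * rows_per_iu for k in range(num_ius))


def rows_touched_with_btree(num_ius: int, rows_per_iu: int = 1) -> int:
    """Rough cost if the lookup column were indexed.

    Assumption: one lookup costs about ceil(log2(table_size + 2)) steps,
    a deliberately pessimistic binary-tree estimate.
    """
    total = 0
    for k in range(num_ius):
        table_size = k * rows_per_iu
        total += math.ceil(math.log2(table_size + 2))
    return total


if __name__ == "__main__":
    for n in (1, 100, 500, 1000):
        print(n, rows_scanned_without_index(n), rows_touched_with_btree(n))
```

In this model, doubling the document from 500 to 1000 IUs roughly quadruples the unindexed total (124,750 to 499,500 rows scanned), while the indexed estimate stays close to linear in the number of IUs.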

3. Proposed index set — PROPOSAL ONLY, NOT AUTHORIZED, NOT EXECUTED (OD-10)

A future index-only DDL cycle, separately designed and GPT-reviewed (with its own dry-run, command-review, and sovereign prompt), would add btree indexes on: decision_backlog_entry(status), manifest_envelope(source_doc_ref), review_decision(manifest_id), cut_change_set(decision_backlog_entry_id), verify_result(change_set_id), and (if queried) dot_pair_signature(cross_reference_change_set_id), dot_pair_signature(cross_reference_verify_result_id). This list is illustrative for planning only: no CREATE INDEX is proposed for execution in v0.5, and index DDL remains forbidden without separate authorization.
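To illustrate what "seq scan" versus an indexed lookup means for one of the lookups in §2, the sketch below runs a throwaway experiment. It is an assumption-heavy stand-in and not the index DDL discussed in this section. It uses Python's built-in sqlite3 with an in-memory database instead of the deployed cutter_governance store, a simplified three-column version of cut_change_set, made-up row values, and a hypothetical index name (sandbox_idx). It touches no real database.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return SQLite's plan description for a query (4th column of each row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


def demo_lookup_plans() -> tuple:
    """Compare the plan for one G-CUT-ONCE style lookup before and after
    adding an index, inside a disposable in-memory database only."""
    conn = sqlite3.connect(":memory:")
    # Simplified stand-in table; the real table's columns are not shown here.
    conn.execute(
        "CREATE TABLE cut_change_set ("
        "id INTEGER PRIMARY KEY, "
        "decision_backlog_entry_id TEXT, "
        "payload TEXT)"
    )
    conn.executemany(
        "INSERT INTO cut_change_set (decision_backlog_entry_id, payload) "
        "VALUES (?, ?)",
        [(f"dbe-{i}", "x") for i in range(1000)],
    )
    lookup = "SELECT id FROM cut_change_set WHERE decision_backlog_entry_id = ?"

    before = query_plan(conn, lookup, ("dbe-42",))
    # Sandbox-only index with a hypothetical name; never run against
    # the deployed store.
    conn.execute(
        "CREATE INDEX sandbox_idx ON cut_change_set (decision_backlog_entry_id)"
    )
    after = query_plan(conn, lookup, ("dbe-42",))
    conn.close()
    return before, after


if __name__ == "__main__":
    plan_before, plan_after = demo_lookup_plans()
    print("before:", plan_before)
    print("after: ", plan_after)
```

SQLite reports a full-table SCAN before the index exists and an index SEARCH afterwards. The exact wording differs between SQLite versions, and the deployed store's planner may behave differently, so any real proposal would still need its own dry-run on a representative copy.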

4. Label / metadata risk

No label/metadata registry exists or is authorized. Cutting many units risks ad-hoc runtime label/key hardcoding. Mandate carried forward: no large-scale labeling/reclassification before a separate label/metadata registry design → GPT review; SQL / deployed cutter_governance remains the only authority store; identity_profile jsonb carries no hidden authority; the DOT lane↔reference map stays the centralised schema-binding-tested constant. v0.5 records the need only (OD-7).

5. Scale blockers register (must clear before any full-document execution)

  1. Index-only DDL (this note §3): design → GPT review → dry-run → command-review → sovereign prompt → apply. BLOCKS bulk/full-document execution.
  2. Label/metadata registry design. BLOCKS large-scale labeling/reclassification.
  3. Canonical-address grammar for multi-level documents (Chương, i.e. chapter level): format decision, separately gated. BLOCKS constitution cut.
  4. Existing-corpus tier normalisation (DIEU-32/35): data-quality work, separately authorized.
  5. Authoritative source for the Constitution (Hiến pháp) and its ingestion, upstream of the cutter: undesigned and unauthorized.
  6. Dry-run-first at volume: must precede any production full-document or staged batch.
  7. Vector/NoSQL stays projection/search only: no integration in the write path.
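The register above can be read as a simple all-or-nothing gate. The following is a minimal sketch of that reading, not part of the cutter runtime: the Blocker type, the SCALE_BLOCKERS constant, and both function names are hypothetical, and the default of "not cleared" for every item is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Blocker:
    number: int
    name: str
    cleared: bool = False  # assumption: every item starts as not cleared


# Names follow the register in this note; the structure is illustrative only.
SCALE_BLOCKERS = (
    Blocker(1, "Index-only DDL"),
    Blocker(2, "Label/metadata registry design"),
    Blocker(3, "Canonical-address grammar for multi-level documents"),
    Blocker(4, "Existing-corpus tier normalisation (DIEU-32/35)"),
    Blocker(5, "Authoritative Hiến pháp source + ingestion"),
    Blocker(6, "Dry-run-first at volume"),
    Blocker(7, "Vector/NoSQL stays projection/search only"),
)


def open_blockers(register) -> list:
    """Return the register entries that have not been cleared yet."""
    return [b for b in register if not b.cleared]


def may_run_full_document(register) -> bool:
    """Full-document execution is allowed only when nothing is open."""
    return not open_blockers(register)


if __name__ == "__main__":
    for blocker in open_blockers(SCALE_BLOCKERS):
        print(f"OPEN #{blocker.number}: {blocker.name}")
    print("full-document execution allowed:", may_run_full_document(SCALE_BLOCKERS))
```

Treating the register as a single conjunction keeps the check conservative: one open item is enough to refuse a full-document run, which matches the section heading's "must clear before any full-document execution".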

Boundaries / Git

Design/risk only; no index DDL, no label registry, no schema change, nothing executed. Git main · e93424b5ff7fa5e4b8406131977ce4339cd0856a · clean (0 lines). No hardcoding; SQL SSOT; vector/NoSQL projection/search only. Open: OD-7, OD-10. Next = GPT review.
