KB-3D22

dot-iu-cutter v0.5 — Scale, Index & Label/Metadata Risk Note (design only; index DDL NOT authorized) (2026-05-17)

Revision 1

Tags: dot-iu-cutter, v0.5, full-document-trial, scale, index, label-metadata, risk-note, design-only, dieu44


Date: 2026-05-17 · Status: DESIGN / RISK ASSESSMENT ONLY — no index DDL, no label registry, no schema change, nothing executed. Parent: design-master.

1. Current index state (grounded, read-only)

cutter_governance (12 tables) has only primary-key btree indexes, plus two exceptions: cut_change_set has unique(idempotency_key) and unique(rollback_key), and canonical_address_alias has secondary indexes on alias_kind, alias_text, target_unit_id, and (valid_from, valid_until). No secondary index exists on any column the runtime uses for per-IU lookups.

2. Unindexed hot paths (per-IU; O(n) → O(n²) over a full document)

From the accepted phases.py @ e93424b5…:

Phase	Lookup	Column(s)	Index?	Scale risk
MARK dedup	`find(decision_backlog_entry, entry_id=…)`	`entry_id` (PK)	yes (PK)	OK at scale
SWEEP	`find(decision_backlog_entry, status=…)`	`status`	no	seq scan every sweep
REVIEW/CUT lineage	`find(manifest_envelope, source_doc_ref=…)`	`source_doc_ref`	no	seq scan per IU
REVIEW/CUT lineage	`find(review_decision, manifest_id=…)`	`manifest_id`	no	seq scan per IU
CUT G-CUT-ONCE	`find(cut_change_set, decision_backlog_entry_id=…)`	`decision_backlog_entry_id`	no	seq scan per IU
VERIFY	`find(verify_result, change_set_id=…)`	`change_set_id`	no	seq scan per IU
(signatures)	`dot_pair_signature.cross_reference_*`	xref cols	no	scan if queried

Single-IU trial: the cost is trivial because the tables are empty or tiny. Full document (hundreds of IUs, thousands of rows, repeated per-IU lookups against growing tables): each per-IU lookup scans the whole table at O(n) cost, and n such lookups add up to roughly O(n²) in total. Conclusion: index-only DDL applied before scale-up is a HARD prerequisite for any full-document or bulk cut.
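The quadratic growth can be illustrated with a small cost model. This is a sketch under stated assumptions, not a measurement of the runtime: a Python list stands in for an unindexed table, a dict stands in for an index, and each IU performs one lookup for a key that is not yet present (as in a first-time G-CUT-ONCE check) before its row is written. The names `scan_lookup` and `simulate` are hypothetical.

```python
def scan_lookup(rows, key):
    """Linear scan over an unindexed table; returns (match, rows_visited)."""
    visited = 0
    for row in rows:
        visited += 1
        if row == key:
            return row, visited
    return None, visited


def simulate(n_ius):
    """Count the work done by n_ius lookups against a table that grows by one row per IU."""
    table = []   # stand-in for an unindexed table (assumption)
    index = {}   # stand-in for an index; a real btree probe is O(log n), not O(1)
    scan_visits = 0
    index_probes = 0
    for iu in range(n_ius):
        # The key is absent, so the scan has to visit every existing row.
        _, visited = scan_lookup(table, iu)
        scan_visits += visited
        # The indexed path needs a single probe regardless of table size.
        index.get(iu)
        index_probes += 1
        # Write the row after the lookup, so the table grows with each IU.
        table.append(iu)
        index[iu] = iu
    return scan_visits, index_probes
```

Under this model, `simulate(500)` returns `(124750, 500)`: the scan path visits 0 + 1 + … + 499 = 500·499/2 rows, while the indexed path performs one probe per IU. With a real btree the indexed total would be on the order of n log n rather than n, which is still far below the quadratic scan cost.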

3. Proposed index set — PROPOSAL ONLY, NOT AUTHORIZED, NOT EXECUTED (OD-10)

A future, separately-designed-and-GPT-reviewed index-only DDL cycle (its own dry-run + command-review + sovereign prompt) would add btree indexes on: decision_backlog_entry(status), manifest_envelope(source_doc_ref), review_decision(manifest_id), cut_change_set(decision_backlog_entry_id), verify_result(change_set_id), and (if queried) dot_pair_signature(cross_reference_change_set_id), dot_pair_signature(cross_reference_verify_result_id). This list is illustrative for planning only — no CREATE INDEX is proposed for execution in v0.5; index DDL remains forbidden without separate authorization.

4. Label / metadata risk

No label/metadata registry exists or is authorized. Cutting many units risks ad-hoc runtime label/key hardcoding. Mandate carried forward: no large-scale labeling/reclassification before a separate label/metadata registry design → GPT review; SQL / deployed cutter_governance remains the only authority store; identity_profile jsonb carries no hidden authority; the DOT lane↔reference map stays the centralised schema-binding-tested constant. v0.5 records the need only (OD-7).

5. Scale blockers register (must clear before any full-document execution)

Index-only DDL (this note §3) — design → GPT review → dry-run → command-review → sovereign prompt → apply. BLOCKS bulk/full-document.
Label/metadata registry design — BLOCKS large-scale labeling/reclassification.
Canonical-address grammar for multi-level documents (Chương, the chapter level): format decision, separately gated. BLOCKS constitution cut.
Existing-corpus tier normalisation (DIEU-32/35) — data-quality, separately authorized.
Authoritative Hiến pháp (Constitution) source + ingestion (upstream of the cutter): undesigned/unauthorized.
Dry-run-first at volume — must precede any production full-document/staged batch.
Vector/NoSQL stays projection/search only — no integration in the write path.
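The register above can be read as a simple all-clear gate. The following is a minimal sketch, assuming every clearable entry must be cleared before full-document execution, as the section heading states. The vector/NoSQL line is treated here as a standing constraint rather than a clearable item, which is an interpretation. `Blocker`, `REGISTER`, and the function names are hypothetical and do not exist in the cutter.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Blocker:
    name: str
    cleared: bool = False  # everything starts open, matching the register's current state


# Hypothetical encoding of the clearable entries in the register.
REGISTER = (
    Blocker("index-only DDL"),
    Blocker("label/metadata registry design"),
    Blocker("canonical-address grammar"),
    Blocker("existing-corpus tier normalisation"),
    Blocker("authoritative source + ingestion"),
    Blocker("dry-run-first at volume"),
)


def open_blockers(register):
    """Names of the blockers that have not been cleared yet."""
    return [b.name for b in register if not b.cleared]


def may_run_full_document(register):
    """True only when no blocker remains open."""
    return not open_blockers(register)


if __name__ == "__main__":
    print(may_run_full_document(REGISTER))  # all entries are still open
    all_cleared = tuple(replace(b, cleared=True) for b in REGISTER)
    print(may_run_full_document(all_cleared))
```

Keeping the register as one immutable tuple means a blocker cannot be cleared as a side effect: clearing one produces a new register value that can itself be reviewed.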

Boundaries / Git

Design/risk only; no index DDL, no label registry, no schema change, nothing executed. Git main · e93424b5ff7fa5e4b8406131977ce4339cd0856a · clean (0 lines). No hardcoding; SQL SSOT; vector/NoSQL projection/search only. Open: OD-7, OD-10. Next = GPT review.