dot-iu-cutter v0.5 — Scale, Index & Label/Metadata Risk Note (design only; index DDL NOT authorized) (2026-05-17)
Date: 2026-05-17 · Status: DESIGN / RISK ASSESSMENT ONLY — no index DDL, no label registry, no schema change, nothing executed. Parent: design-master.
1. Current index state (grounded, read-only)
cutter_governance (12 tables) has only primary-key btree indexes, with two exceptions: cut_change_set has unique(idempotency_key) and unique(rollback_key), and canonical_address_alias has secondary indexes on alias_kind, alias_text, target_unit_id, and (valid_from, valid_until). No secondary index exists on the columns the runtime uses for its per-IU lookups.
2. Unindexed hot paths (O(n) per IU lookup → O(n²) over a full document)
From the accepted phases.py @ e93424b5…:
| Phase | Lookup | Column(s) | Index? | Scale risk |
|---|---|---|---|---|
| MARK dedup | find(decision_backlog_entry, entry_id=…) | entry_id (PK) | yes (PK) | OK at scale |
| SWEEP | find(decision_backlog_entry, status=…) | status | no | seq scan every sweep |
| REVIEW/CUT lineage | find(manifest_envelope, source_doc_ref=…) | source_doc_ref | no | seq scan per IU |
| REVIEW/CUT lineage | find(review_decision, manifest_id=…) | manifest_id | no | seq scan per IU |
| CUT G-CUT-ONCE | find(cut_change_set, decision_backlog_entry_id=…) | decision_backlog_entry_id | no | seq scan per IU |
| VERIFY | find(verify_result, change_set_id=…) | change_set_id | no | seq scan per IU |
| (signatures) | dot_pair_signature.cross_reference_* | xref cols | no | scan if queried |
Single-IU trial: trivial, because the tables are empty or tiny. Full document (hundreds of IUs, thousands of rows, repeated per-IU lookups against growing tables): each unindexed per-IU lookup scans the whole table, so with n IUs the total cost grows as ~O(n²). Conclusion: an index-only DDL cycle, completed before scaling up, is a HARD prerequisite for any full-document/bulk cut.
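The quadratic growth can be checked with a small back-of-envelope model. This is a minimal sketch, not a measurement of the deployed runtime: it assumes each IU adds one row to a table and then performs one lookup against it, and it approximates a btree lookup as roughly log2(rows) steps. The function names are hypothetical.

```python
import math


def seq_scan_rows_examined(n_ius: int, rows_per_iu: int = 1) -> int:
    """Rows read when every IU triggers one full-table scan (no index)."""
    total = 0
    table_rows = 0
    for _ in range(n_ius):
        table_rows += rows_per_iu  # the table grows as each IU is processed
        total += table_rows        # an unindexed lookup reads every row
    return total


def btree_steps_estimate(n_ius: int, rows_per_iu: int = 1) -> int:
    """Rough step count if the same lookups used a btree index.

    Assumption: one lookup costs about ceil(log2(rows + 1)) steps.
    """
    total = 0
    table_rows = 0
    for _ in range(n_ius):
        table_rows += rows_per_iu
        total += math.ceil(math.log2(table_rows + 1))
    return total


if __name__ == "__main__":
    for n in (1, 100, 500):
        print(n, seq_scan_rows_examined(n), btree_steps_estimate(n))
```

With one row per IU the scan total is n(n+1)/2, so 500 IUs already cost 125250 row reads for a single unindexed lookup path, and the table in section 2 lists five such paths.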
3. Proposed index set — PROPOSAL ONLY, NOT AUTHORIZED, NOT EXECUTED (OD-10)
A future, separately designed and GPT-reviewed index-only DDL cycle (with its own dry-run + command-review + sovereign prompt) would add btree indexes on:
- decision_backlog_entry(status)
- manifest_envelope(source_doc_ref)
- review_decision(manifest_id)
- cut_change_set(decision_backlog_entry_id)
- verify_result(change_set_id)
- dot_pair_signature(cross_reference_change_set_id) (if queried)
- dot_pair_signature(cross_reference_verify_result_id) (if queried)
This list is illustrative for planning only: no CREATE INDEX is proposed for execution in v0.5, and index DDL remains forbidden without separate authorization.
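How a future dry-run might confirm that an index changes the access path can be sketched against a throwaway in-memory SQLite database. This is illustration only and touches nothing in cutter_governance. SQLite stands in for the deployed store, the id and payload columns and the index name are assumptions, and the exact plan wording varies by SQLite version.

```python
import sqlite3


def plan_for(conn: sqlite3.Connection, sql: str, params=()) -> str:
    """Return the detail text of EXPLAIN QUERY PLAN for one statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the plan detail


# Throwaway stand-in table; only manifest_id comes from the note.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE review_decision ("
    "id INTEGER PRIMARY KEY, manifest_id TEXT, payload TEXT)"
)
conn.executemany(
    "INSERT INTO review_decision (manifest_id, payload) VALUES (?, ?)",
    [(f"m-{i}", "x") for i in range(1000)],
)

query = "SELECT * FROM review_decision WHERE manifest_id = ?"

# Without a secondary index the planner has to scan the whole table.
plan_before = plan_for(conn, query, ("m-42",))

# Hypothetical index name; created only in this in-memory sandbox.
conn.execute(
    "CREATE INDEX ix_review_decision_manifest_id "
    "ON review_decision (manifest_id)"
)
plan_after = plan_for(conn, query, ("m-42",))

print("before:", plan_before)
print("after: ", plan_after)
```

The same before/after comparison on the real store would use that engine's own EXPLAIN facility, and only inside the separately authorized cycle described above.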
4. Label / metadata risk
No label/metadata registry exists or is authorized. Cutting many units risks ad-hoc runtime label/key hardcoding. Mandate carried forward: no large-scale labeling/reclassification before a separate label/metadata registry design → GPT review; SQL / deployed cutter_governance remains the only authority store; identity_profile jsonb carries no hidden authority; the DOT lane↔reference map stays the centralised schema-binding-tested constant. v0.5 records the need only (OD-7).
5. Scale blockers register (must clear before any full-document execution)
- Index-only DDL (this note §3) — design → GPT review → dry-run → command-review → sovereign prompt → apply. BLOCKS bulk/full-document.
- Label/metadata registry design — BLOCKS large-scale labeling/reclassification.
- Canonical-address grammar for multi-level documents (Chương, i.e. chapter level) — format decision, separately gated. BLOCKS constitution cut.
- Existing-corpus tier normalisation (DIEU-32/35) — data-quality, separately authorized.
- Authoritative Hiến pháp (Constitution) source + ingestion (upstream of the cutter) — undesigned/unauthorized.
- Dry-run-first at volume — must precede any production full-document/staged batch.
- Vector/NoSQL stays projection/search only — no integration in the write path.
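The register above can be read as a simple all-must-clear gate. The sketch below is one hypothetical way to express that rule. The Blocker type, the function names, and the cleared flags are assumptions made for illustration; nothing here reflects an existing module or a recorded clearance.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Blocker:
    name: str
    cleared: bool = False  # every item starts open, as in the register


# Names paraphrase the seven register items; none is marked cleared.
SCALE_BLOCKERS = (
    Blocker("index-only DDL cycle"),
    Blocker("label/metadata registry design"),
    Blocker("canonical-address grammar for multi-level documents"),
    Blocker("existing-corpus tier normalisation"),
    Blocker("authoritative source + ingestion"),
    Blocker("dry-run-first at volume"),
    Blocker("vector/NoSQL confined to projection/search"),
)


def open_blockers(register) -> list:
    """Names of all blockers that have not been cleared."""
    return [b.name for b in register if not b.cleared]


def may_run_full_document(register) -> bool:
    """Full-document execution is allowed only when nothing is open."""
    return not open_blockers(register)


if __name__ == "__main__":
    print(may_run_full_document(SCALE_BLOCKERS))  # all seven are still open
    print(open_blockers(SCALE_BLOCKERS))
```

Keeping the register as a single immutable tuple mirrors the note's preference for centralised constants over scattered runtime checks.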
Boundaries / Git
Design/risk only; no index DDL, no label registry, no schema change, nothing executed. Git main · e93424b5ff7fa5e4b8406131977ce4339cd0856a · clean (0 lines). No hardcoding; SQL SSOT; vector/NoSQL projection/search only. Open: OD-7, OD-10. Next = GPT review.