dot-iu-cutter v0.5 — Scale-Index & Volume Execution Roadmap (DESIGN ONLY) (2026-05-17)
Date: 2026-05-17
Phase: v0_5_constitution_hardtest_and_information_unit_factory_master_plan
Nature: DESIGN ONLY. No index DDL executes. No volume run executes.
Parent: dot-iu-cutter-v0.5-constitution-hardtest-master-plan-2026-05-17.md
1. Where the index package sits in sequencing
The pre-scale index DDL is already authored and GPT-PASSed, with the D-2 partial/full ruling decided, and execution explicitly deferred to a separate sovereign cycle. It is the immediate prerequisite for any volume work.
index_package_status:
  authoring: COMPLETE + GPT-PASS
  execution: NOT AUTHORIZED (deferred)
  package: knowledge/dev/laws/dieu44-trien-khai/v0.5-pre-scale-index-ddl-authoring/
  approved_index_set (7 hot paths):
    full_btree:
      - idx_dbe_status_emitted_keyset  decision_backlog_entry(status,emitted_at,entry_id)
      - idx_me_source_doc_ref          manifest_envelope(source_doc_ref)
      - idx_rd_manifest_id             review_decision(manifest_id)
      - idx_vr_change_set_id           verify_result(change_set_id)
    partial_btree (WHERE col IS NOT NULL):
      - idx_ccs_dbe_id                 cut_change_set(decision_backlog_entry_id)
      - idx_dps_xref_cs                dot_pair_signature(cross_reference_change_set_id)
      - idx_dps_xref_vr                dot_pair_signature(cross_reference_verify_result_id)
Rationale recap: a single-IU trial is safe unindexed, but a full-document or bulk run is O(n²) on the SWEEP cursor, the lineage lookups, and the cut-once guard without these indexes.
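For orientation only, the approved set above can be written down as data and rendered into statement text. The following is a minimal sketch, not the authored DDL (that lives in the package directory and is the only authoritative source). The `IndexSpec` type and `render_ddl` helper are hypothetical names, the partial predicate is assumed to apply to the single indexed column, and nothing here connects to a database or executes anything.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class IndexSpec:
    """Hypothetical description of one approved index (illustration only)."""
    name: str
    table: str
    columns: Tuple[str, ...]
    partial: bool = False  # True -> WHERE <col> IS NOT NULL (assumed: the indexed column)


APPROVED = (
    IndexSpec("idx_dbe_status_emitted_keyset", "decision_backlog_entry",
              ("status", "emitted_at", "entry_id")),
    IndexSpec("idx_me_source_doc_ref", "manifest_envelope", ("source_doc_ref",)),
    IndexSpec("idx_rd_manifest_id", "review_decision", ("manifest_id",)),
    IndexSpec("idx_vr_change_set_id", "verify_result", ("change_set_id",)),
    IndexSpec("idx_ccs_dbe_id", "cut_change_set",
              ("decision_backlog_entry_id",), partial=True),
    IndexSpec("idx_dps_xref_cs", "dot_pair_signature",
              ("cross_reference_change_set_id",), partial=True),
    IndexSpec("idx_dps_xref_vr", "dot_pair_signature",
              ("cross_reference_verify_result_id",), partial=True),
)


def render_ddl(spec: IndexSpec, concurrently: bool = False) -> str:
    """Return statement text only; this function never touches a database."""
    keyword = "CREATE INDEX CONCURRENTLY" if concurrently else "CREATE INDEX"
    stmt = f"{keyword} {spec.name} ON {spec.table} ({', '.join(spec.columns)})"
    if spec.partial:
        # Assumption: the partial predicate is on the single indexed column.
        stmt += f" WHERE {spec.columns[0]} IS NOT NULL"
    return stmt + ";"


if __name__ == "__main__":
    for spec in APPROVED:
        print(render_ddl(spec))
```

Keeping the set as data makes the count (7) and the partial/full split checkable without reading statement strings.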
2. Roadmap (gated, none authorized now)
roadmap:
  Q1_index_dry_run:
    do: execute the 7 indexes ONLY in an isolated restored-schema DB
    verify: catalog-structural assertions (NOT pg_get_indexdef string equality)
    gate: GPT review of dry-run
  Q2_index_command_review_then_production:
    do: command-review package; CREATE INDEX CONCURRENTLY for production
    gate: sovereign GPT/User approval; post-run structural verification + backup/restore
  Q3_dry_run_at_volume:
    fixture: existing 3-doc corpus and/or synthetic doc (OD-V1)
    measure: EXPLAIN/timing with vs without indexes; invariant; resume; no dup cut
    gate: GPT review
  Q4_tier_normalization_if_needed:
    scope: DIEU_32 / DIEU_35 blank-tier — separate read-review-write cycle
  Q5_label_metadata_registry_design_cycle: schema design only
  Q6_source_registry_ingestion_design_cycle: source authority + parser profiles
  Q7_grammar_profile_validation: incomex-architecture-constitution-v4 (no cut)
  Q8_hien_phap_dry_run_at_volume: full Constitution in isolated env
  Q9_hien_phap_staged_production_small_batch: bounded + checkpoint/resume + sovereign
Cutting the Constitution (cắt hiến pháp) becomes available only after Q9, and only batch by batch.
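The Q1 and Q2 verification step asks for catalog-structural assertions rather than comparing definition strings. A minimal sketch of what that comparison could look like follows. The `ExpectedIndex` type, the `structural_mismatches` helper, and the shape of the `row` mapping are all assumptions for illustration; in practice such a row would presumably be assembled from catalog tables such as `pg_index`, `pg_class`, and `pg_attribute`, which this sketch does not query.

```python
from dataclasses import dataclass
from typing import Any, List, Mapping, Tuple


@dataclass(frozen=True)
class ExpectedIndex:
    """Hypothetical structural expectation for one index."""
    name: str
    table: str
    columns: Tuple[str, ...]
    is_partial: bool = False
    access_method: str = "btree"


def structural_mismatches(expected: ExpectedIndex,
                          row: Mapping[str, Any]) -> List[str]:
    """Compare field by field; an empty list means the structure matches.

    `row` is an assumed, already-fetched mapping with the keys
    table, columns, access_method, is_partial, is_valid.
    """
    problems: List[str] = []
    if row["table"] != expected.table:
        problems.append(f"table: {row['table']!r} != {expected.table!r}")
    if tuple(row["columns"]) != expected.columns:
        # Column order matters for a keyset index, so compare as ordered tuples.
        problems.append(f"columns: {tuple(row['columns'])!r} != {expected.columns!r}")
    if row["access_method"] != expected.access_method:
        problems.append(f"access method: {row['access_method']!r}")
    if bool(row["is_partial"]) != expected.is_partial:
        problems.append(f"partial flag: {row['is_partial']!r}")
    if not row["is_valid"]:
        # A failed concurrent build can leave an invalid index behind.
        problems.append("index is marked invalid")
    return problems
```

Comparing fields instead of strings keeps the check stable across whitespace, quoting, and schema-qualification differences in rendered definitions.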
3. Dry-run-at-volume plan (design)
dry_run_at_volume:
  environment: isolated DB restored from production schema backup (NO prod touch)
  fixtures (OD-V1):
    - F1: replay existing DIEU_28/32/35 corpus shape at multiplied volume
    - F2: synthetic N-IU document (parameterized N = 1e2,1e3,1e4)
    - F3: real Constitution (only at Q8, after grammar profile validated)
  assertions:
    - row-delta invariant: +15 per IU (per-IU manifest, OD-M1) OR documented revision
    - rerun delta-0: replaying a completed batch creates zero new rows (idempotency)
    - checkpoint/resume: kill mid-batch at IU k; resume; final state == uninterrupted
    - no duplicate cut: cut-once guard holds under concurrency + resume
    - DOT lane separation: cutter_exec / cutter_verify lanes never overlap at volume
    - performance: EXPLAIN ANALYZE on the 7 hot paths; verify index usage; record
      p50/p95 per stage at each N; flag any seq-scan on hot path as FAIL
  exit: GPT review; no production write implied by any of this
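The row-delta, rerun delta-0, and p50/p95 assertions above reduce to small arithmetic checks over per-table row counts and per-stage timings. A minimal sketch under stated assumptions: the function names are hypothetical, counts are assumed to be collected elsewhere as `{table_name: row_count}` mappings, and the constant 15 is taken from the invariant above and stays subject to OD-M1.

```python
from statistics import quantiles
from typing import Mapping, Sequence, Tuple

ROWS_PER_IU = 15  # from the row-delta invariant; may be revised under OD-M1


def row_delta(before: Mapping[str, int], after: Mapping[str, int]) -> int:
    """Total rows added across all tables between two snapshots."""
    return sum(after.values()) - sum(before.values())


def check_row_delta(before: Mapping[str, int], after: Mapping[str, int],
                    n_ius: int, rows_per_iu: int = ROWS_PER_IU) -> bool:
    """True when exactly rows_per_iu rows were added per IU."""
    return row_delta(before, after) == n_ius * rows_per_iu


def check_rerun_delta_zero(after_first: Mapping[str, int],
                           after_rerun: Mapping[str, int]) -> bool:
    """Compare per table, so offsetting changes cannot mask each other."""
    tables = set(after_first) | set(after_rerun)
    return all(after_first.get(t, 0) == after_rerun.get(t, 0) for t in tables)


def p50_p95(timings_ms: Sequence[float]) -> Tuple[float, float]:
    """Median and 95th percentile of one stage's timings (needs >= 2 samples)."""
    cuts = quantiles(timings_ms, n=100, method="inclusive")  # 99 cut points
    return cuts[49], cuts[94]
```

For example, a 2-IU run that grows the snapshot by 30 rows passes `check_row_delta`, and a replay that leaves every per-table count unchanged passes `check_rerun_delta_zero`.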
4. Checkpoint / resume model (design)
checkpoint:
  unit: per-IU (atomic per-phase txn already validated in v0.4 RERUN4)
  cursor: decision_backlog_entry(status, emitted_at, entry_id) keyset (indexed in Q2)
  resume: on restart, SWEEP continues from last committed entry; completed IUs
    re-evaluated as no-op (idempotent) -> delta-0
  batch_bound: max_iu_per_batch is a config parameter, NOT a hardcoded constant
  failure: a failing IU is parked (forward-compensation), batch continues per policy
5. Staged production rollout (design)
staged_rollout:
  precondition: Q1..Q3 PASS + Q8 PASS + sovereign approval
  batching:
    - explicit target range per batch (e.g. Article (Điều) 0..5), recorded
    - small first batch, widen only after each batch GPT-reviewed
  per_batch: backup -> bounded execute -> verify -> review -> next
  rollback_policy:
    - NO document-wide delete rollback
    - per-IU forward-compensation only (append corrective ledger rows)
    - a bad IU is compensated forward, never the whole document reverted
  abort: any invariant breach halts the batch; no auto-continue
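The execute and verify steps of the per-batch loop, together with the rollback and abort rules, can be sketched as control flow. This is an illustration under assumptions, not the cutter's implementation: `run_batch`, `BatchHalted`, the callback signatures, and the `(iu_id, kind)` ledger row shape are hypothetical, and the backup and review steps are human or GPT gates that sit outside this loop.

```python
from typing import Callable, Iterable, List, Tuple

LedgerRow = Tuple[str, str]  # (iu_id, kind), hypothetical row shape


class BatchHalted(Exception):
    """Raised when an invariant breach stops the batch; nothing auto-continues."""


def run_batch(iu_ids: Iterable[str],
              execute: Callable[[str], None],
              verify: Callable[[str], bool],
              invariant_holds: Callable[[], bool],
              ledger: List[LedgerRow]) -> List[str]:
    """Bounded execute -> verify over an explicit IU range.

    The ledger is only ever appended to: a bad IU gets a corrective row,
    and no earlier row is deleted or rewritten.
    """
    compensated: List[str] = []
    for iu in iu_ids:
        execute(iu)
        ledger.append((iu, "cut"))
        if not verify(iu):
            # Forward-compensation: append a corrective row, keep going.
            ledger.append((iu, "compensation"))
            compensated.append(iu)
        if not invariant_holds():
            # Abort rule: halt here; remaining IUs are not attempted.
            raise BatchHalted(f"invariant breach after {iu}")
    return compensated
```

A caller would pass the recorded target range as `iu_ids` and treat `BatchHalted` as a stop that requires review before any further batch.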
6. Volume estimate caveat
The handoff estimated 300–500 leaf IUs and 5000–7500 governance rows, assuming a clause/point grammar. The fixture is not clause/point shaped (canonicalization doc §6), so the volume estimate is unreliable until OD-G2 (leaf granularity) is ruled. Do not size batches off the handoff number.
7. Open decisions
open_decisions:
  OD-V1: dry-run-at-volume fixture choice (F1/F2/F3 mix)
  OD-V2: target N ladder for synthetic scaling (1e2 -> 1e6)
  OD-V3: per-stage performance pass/fail thresholds
  OD-I1: index execution route (continue Q1 now vs hold for master-plan ratification)
  OD-M1: per-IU vs document-level manifest (drives invariant)
8. Do not run yet
No index DDL execution, no dry-run, no volume run, no production batch, no checkpoint write, no schema migration, no code change. Design only. Forbidden list = master plan §10.
9. Git
git: { branch: main, HEAD: e93424b5ff7fa5e4b8406131977ce4339cd0856a,
status_short_iu_cutter: clean, code_changed: false, commit_made: false }