dot-iu-cutter v0.5 — Hiến pháp Trial Cut Routing Design (design only) (2026-05-17)
Date: 2026-05-17 · Status: DESIGN ONLY — no cut, no dry-run, no write. Parent: design-master.
1. Granularity model (proposed; OD-3/OD-4 OPEN)
Vietnamese legal hierarchy → corpus model (tier / section_type / canonical-address-v1 grammar D38-DIEU<NN>-<S>[-P<n>]):
| Legal level | tier | section_type (existing vocab) | address fragment |
|---|---|---|---|
| Document (Hiến pháp) | root | heading | <DOC>-ROOT |
| Chương (chapter) | section | heading | <DOC>-C<n> |
| Điều (article) | section/unit | article | <DOC>-C<n>-DIEU<m> |
| Khoản (clause) | unit | paragraph / principle | …-DIEU<m>-K<k> |
| Điểm (point) | unit | paragraph | …-K<k>-D<p> |
| Đoạn / information unit | unit | paragraph | leaf address |
The leaf information unit (Khoản/Điểm/đoạn) is the cut granularity proven by the single-IU trial. OD-3 (open): whether the leaf is the article, the clause, or the point. OD-4 (open): the constitution adds a Chương level that canonical-address-v1 lacks (the current grammar starts at DIEU), so it needs either a v1-compatible extension or a v2 grammar. A grammar/format change is itself a separately gated design; no schema or format change is proposed here.
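As an illustration of the proposed address fragments in the table above, the following minimal sketch composes a leaf address from its hierarchy levels. It is not part of the cutter: the `LeafRef` type, the `address` helper, and the `HP2013` document prefix are hypothetical, Điểm is assumed to be numeric to match the `D<p>` fragment, and the Chương segment depends on how OD-4 is resolved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LeafRef:
    """Hypothetical reference to one unit in the legal hierarchy."""
    doc: str                     # stands in for <DOC>; the real prefix is not fixed here
    chuong: int                  # Chương (chapter) number, pending OD-4
    dieu: int                    # Điều (article) number
    khoan: Optional[int] = None  # Khoản (clause) number, if any
    diem: Optional[int] = None   # Điểm (point) index, assumed numeric


def address(ref: LeafRef) -> str:
    """Build <DOC>-C<n>-DIEU<m>[-K<k>[-D<p>]] following the proposed table."""
    if ref.diem is not None and ref.khoan is None:
        raise ValueError("a Điểm must sit under a Khoản")
    parts = [ref.doc, f"C{ref.chuong}", f"DIEU{ref.dieu}"]
    if ref.khoan is not None:
        parts.append(f"K{ref.khoan}")
    if ref.diem is not None:
        parts.append(f"D{ref.diem}")
    return "-".join(parts)


if __name__ == "__main__":
    print(address(LeafRef("HP2013", chuong=1, dieu=2)))
    print(address(LeafRef("HP2013", chuong=1, dieu=2, khoan=3, diem=1)))
```

Whichever leaf level OD-3 selects, only the depth at which `address` stops changes; the segments above it stay the same.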
2. Routing through the validated pipeline (unchanged runtime)
Per IU: MARK → SWEEP → REVIEW(approve) → CUT → VERIFY. This is exactly the RERUN#4/production-validated path at commit e93424b5… and keeps all of its properties: principal-split (cutter_exec runs MARK/SWEEP/REVIEW/CUT; cutter_verify runs VERIFY), DOT-991→change_set / DOT-992→verify_result, append-only, one atomic transaction per phase, and +15 governance rows per IU. A full document is N independent per-IU pipelines driven over the document's leaf IUs in deterministic order (canonical address / sort_order). No runtime or code change is proposed. Canonicalization remains the stub (OD-2 of v0.4: real canonical-address/alias resolution is still deferred), so a full-document trial validates the governance spine at volume, not real alias resolution.
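The routing described above can be pictured with a small driver sketch. This is an illustration only and not the validated runtime: `run_document` and the `run_phase` callback are hypothetical names, and the callback stands in for whatever actually executes one phase in its own transaction. The sketch stops at the first failed phase and does not advance, in line with the no-auto-advance rule in section 6.

```python
from typing import Callable, List, Optional, Sequence, Tuple

PHASES = ("MARK", "SWEEP", "REVIEW", "CUT", "VERIFY")

# Principal split as stated in this design: only VERIFY runs as cutter_verify.
PRINCIPAL = {
    "MARK": "cutter_exec",
    "SWEEP": "cutter_exec",
    "REVIEW": "cutter_exec",
    "CUT": "cutter_exec",
    "VERIFY": "cutter_verify",
}

# Hypothetical callback: (address, phase, principal) -> True on success.
RunPhase = Callable[[str, str, str], bool]


def run_document(
    leaves: Sequence[Tuple[int, str]], run_phase: RunPhase
) -> Tuple[List[str], Optional[Tuple[str, str]]]:
    """Drive one independent pipeline per leaf IU, ordered by sort_order.

    `leaves` holds (sort_order, address) pairs. Returns the addresses that
    completed all five phases and, on failure, the (address, phase) that
    stopped the run.
    """
    completed: List[str] = []
    for _, addr in sorted(leaves):
        for phase in PHASES:
            if not run_phase(addr, phase, PRINCIPAL[phase]):
                return completed, (addr, phase)  # stop; earlier IUs stay committed
        completed.append(addr)
    return completed, None
```

Sorting on an explicit numeric `sort_order` rather than on the address string avoids lexicographic surprises such as `DIEU10` sorting before `DIEU2`.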
3. Trial mode (recommended; OD-1 OPEN)
- Dry-run first — full-document, isolated PG (restored prod schema, dry-run-only roles, sysid-guarded, ephemeral, exact-name, torn down — the proven dry-run harness pattern), to validate volume behaviour, the +15×N invariant, lineage/sweep/G-CUT-ONCE at scale, and timing.
- Then staged production trial by small batch (e.g. chapter-by-chapter or fixed N IUs/batch, size = OD-5), each batch its own command-review + sovereign prompt, checkpointed.
- Not recommended: a single big-bang production full-document cut (unindexed O(n²) hot paths; high blast radius). Hard gate: the pre-scale index-only DDL (design→review→apply) must be applied and the label/metadata posture decided before any production full-document or bulk cut.
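The two batching options named for the staged trial (fixed N IUs per batch, or chapter-by-chapter) can be sketched as follows. Both helpers are hypothetical and only split an already ordered list of leaf addresses; the batch size itself remains OD-5, and the command-review and sovereign prompt per batch are outside this sketch.

```python
from itertools import groupby
from typing import Iterator, List, Sequence, Tuple


def fixed_batches(ius: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` IUs, preserving order."""
    if size <= 0:
        raise ValueError("batch size must be positive")
    for start in range(0, len(ius), size):
        yield list(ius[start:start + size])


def chapter_batches(ius: Sequence[Tuple[int, str]]) -> Iterator[List[str]]:
    """Yield one batch per Chương from (chapter, address) pairs.

    Assumes the input is already in document order, so each chapter's
    IUs are contiguous.
    """
    for _, group in groupby(ius, key=lambda pair: pair[0]):
        yield [addr for _, addr in group]
```

Either way, each yielded batch would be the unit that gets its own checkpoint before the next batch is considered.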
4. Expected volume (estimate; OD-5)
Hiến pháp 2013 ≈ 120 Điều, ~11 Chương; leaf IUs ≈ 300–500 (exact count known only post-ingestion). Governance rows = 15 × IU count → ~1,800 at article granularity (15 × 120) to ~4,500–7,500 at clause/point granularity (15 × 300 to 15 × 500). Add roughly one tac_logical_unit row for each IU, section, and root. These volumes drive index necessity and batch sizing.
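The row estimate follows directly from the +15-per-IU invariant. A minimal sketch of the arithmetic, using only the counts quoted in this section (the function name is illustrative):

```python
ROWS_PER_IU = 15  # governance rows added per IU by the validated pipeline


def governance_rows(iu_count: int) -> int:
    """Expected governance rows for a run over `iu_count` IUs."""
    return ROWS_PER_IU * iu_count


if __name__ == "__main__":
    for label, n in [("article granularity", 120),
                     ("leaf IUs, low estimate", 300),
                     ("leaf IUs, high estimate", 500)]:
        print(f"{label}: {n} IUs -> {governance_rows(n)} rows")
    # article granularity: 120 IUs -> 1800 rows
    # leaf IUs, low estimate: 300 IUs -> 4500 rows
    # leaf IUs, high estimate: 500 IUs -> 7500 rows
```

The same function gives the expected row delta to assert after a dry-run once the exact post-ingestion IU count is known.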
5. Manifest strategy (OD-6)
(A) per-IU envelope (1 envelope + 1 unit_block per IU) — preserves the validated +15-per-IU invariant; recommended for the first full-document trial. (B) one document-level envelope + N unit_blocks (composite PK supports it) — fewer envelopes, but changes the per-IU invariant/atomicity; deferred optimisation. GPT decides.
6. Rollback / forward-compensation (multi-IU)
- Per-IU: one-atomic-txn-per-phase, append-only, forward-compensation/no-delete (validated).
- Document-level: no document-wide rollback, no delete. On a per-IU failure → STOP that IU, preserve all prior committed IUs, forward-compensate the failed IU via the reviewed path, honest report, no auto-advance.
- Resumability: deterministic entry_id(uuid5 of the idempotency key) → replaying a batch re-resolves already-MARKed IUs to their existing entry (no duplicate), so a stopped run is safely resumable from a checkpoint.
- Staged batches with explicit checkpoints; backup-restore = disaster backstop only. OD-11 confirm.
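The resumability property rests on the entry id being a pure function of the idempotency key. The sketch below shows the idea with Python's standard `uuid.uuid5`; the namespace value, the key format, and the in-memory `store` are assumptions for illustration and do not describe the cutter's actual key layout or tables.

```python
import uuid
from typing import Dict, Tuple

# Hypothetical namespace; the real one used by the cutter is not stated here.
ENTRY_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "dot-iu-cutter/entry")


def entry_id(idempotency_key: str) -> uuid.UUID:
    """Same key in, same UUID out, on every run and every machine."""
    return uuid.uuid5(ENTRY_NAMESPACE, idempotency_key)


def mark(store: Dict[uuid.UUID, str], idempotency_key: str) -> Tuple[uuid.UUID, bool]:
    """Return (entry_id, created). A replay resolves to the existing entry."""
    eid = entry_id(idempotency_key)
    if eid in store:
        return eid, False  # already MARKed: no duplicate is written
    store[eid] = idempotency_key
    return eid, True


if __name__ == "__main__":
    store: Dict[uuid.UUID, str] = {}
    first, created = mark(store, "HP2013-C1-DIEU2-K3")   # hypothetical key
    again, created_again = mark(store, "HP2013-C1-DIEU2-K3")
    print(first == again, created, created_again, len(store))
```

Because a replayed batch re-derives the same ids, restarting from a checkpoint that overlaps already committed IUs does not create duplicate entries under this scheme.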
Boundaries / Git
Design only; no cut/dry-run/write/code/commit. Git main · e93424b5ff7fa5e4b8406131977ce4339cd0856a · clean (0 lines). No hardcoded path/label/destination; SQL SSOT; vector/NoSQL projection/search only. Open: OD-1/3/4/5/6/11. Next = GPT review.