dot-iu-cutter v0.5 — Full-Document Trial Routing DESIGN MASTER (design only; no execution) (2026-05-17)
Date: 2026-05-17 · Status: DESIGN ONLY — nothing executed. No production write, no CUT/VERIFY, no deploy/restart, no schema migration, no index DDL, no label registry, no vector/NoSQL, no alias write, no code change, no commit.
Predecessor: first controlled production CUT/VERIFY trial CLOSED_PASS (single-IU production write proven; GPT-reviewed). Accepted code commit e93424b5ff7fa5e4b8406131977ce4339cd0856a (branch main, iu-cutter clean).
This package (6 docs) is for GPT review only. Self-advance to any execution is PROHIBITED.
0. Purpose
Plan an end-to-end cut of one full document (example candidate = Hiến pháp / the 2013 Constitution) and define how to merge/store it alongside the three previously cut documents. v0.5 is a routing/design phase: it produces the plan + open decisions; it does not select a final target, write code, or touch production.
1. The three previously cut documents + storage (grounded, read-only)
Live read-only inventory of public.tac_logical_unit (production, sysid 7611578671664259111): exactly three doc_code values exist — the three already-cut documents:
| doc_code | rows | tiers present | authority / lifecycle / address format |
|---|---|---|---|
| DIEU-28 | 27 | root, section, unit | draft / draft_only / canonical-address-v1 |
| DIEU-32 | 23 | (tier blank) | draft / draft_only / canonical-address-v1 |
| DIEU-35 | 36 | (tier blank) | draft / draft_only / canonical-address-v1 |
| total | 86 | 59 of 86 rows blank | uniformly draft, non-enacted |
- Storage (SSOT): public.tac_logical_unit (Postgres directus DB) holds the logical units; cutter_governance.* (12 tables) holds the cut/verify governance ledger + manifest. The first production trial added the cut/verify ledger for one DIEU-28 IU (D38-DIEU28-S3-P1); the rest of the corpus is logical units only.
- Canonical address grammar (canonical-address-v1): D38-DIEU<NN>-<SECTION>[-P<n>] (e.g. D38-DIEU28-ROOT, D38-DIEU28-S3, D38-DIEU28-S3-P1); tier ∈ {root, section, unit}; section_type ∈ {heading, paragraph, principle, technical_spec, governance_process, checklist, process, definition, article, …}.
- Data-shape inconsistency (flag): only DIEU-28 is fully tiered (root/section/unit); DIEU-32 and DIEU-35 have a blank tier (59 of 86 rows have an empty tier). Any merge or full-document design must treat tier normalisation of the existing corpus as an explicit OPEN item (OD-9), not silently assume a uniform shape.
- Detail: see …-three-existing-cut-documents-merge-and-storage-design-….
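To make the OD-9 question concrete, the read-only sketch below parses canonical-address-v1 and proposes a candidate tier for rows whose stored tier is blank. It is an illustration under stated assumptions, not the cutter's code: the regex, the <SECTION> character class (inferred from the ROOT/S3 examples), the tier rule (ROOT → root, bare section → section, -P<n> → unit), the helper names, and the assumption that DIEU-32/35 addresses follow the same D38-DIEU<NN> shape are all hypothetical. Nothing is written back; the output is only a candidate list for review.

```python
import re
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical reading of canonical-address-v1: D38-DIEU<NN>-<SECTION>[-P<n>].
# The <SECTION> character class is inferred from the examples (ROOT, S3).
ADDRESS_V1 = re.compile(r"^D38-DIEU(?P<nn>\d+)-(?P<section>[A-Z0-9]+)(?:-P(?P<p>\d+))?$")


def infer_tier(address: str) -> Optional[str]:
    """Return a candidate tier for a v1 address, or None if it does not parse.

    Assumed rule: ROOT -> root, bare section -> section, section + -P<n> -> unit.
    """
    m = ADDRESS_V1.match(address)
    if m is None:
        return None
    if m.group("section") == "ROOT":
        # ROOT carrying a -P<n> suffix is treated as malformed under this rule.
        return "root" if m.group("p") is None else None
    return "unit" if m.group("p") is not None else "section"


def blank_tier_report(
    rows: Iterable[Tuple[str, Optional[str]]],
) -> Iterator[Tuple[str, Optional[str]]]:
    """Yield (address, candidate_tier) for rows whose stored tier is NULL or blank.

    Read-only: candidates are for OD-9 review; nothing is written back.
    """
    for address, tier in rows:
        if not (tier or "").strip():
            yield address, infer_tier(address)
```

For example, feeding it the rows ("D38-DIEU28-S3", "section") and a hypothetical ("D38-DIEU32-S1", "") yields only the second row, with candidate tier "section".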
2. Authoritative source for Hiến pháp
Hiến pháp is not present in the corpus (no doc_code matches hiến pháp / constitution / HP; only DIEU-28/32/35 exist). The authoritative source is therefore external and must be defined before any cut: the officially promulgated 2013 Constitution of the Socialist Republic of Vietnam (Hiến pháp năm 2013), structured as Lời nói đầu (preamble) + 11 Chương (chapters) + 120 Điều (articles); articles contain Khoản (clauses) and some Điểm (points). The exact canonical source/edition and ingestion path is OD-2 (OPEN for GPT); no source is invented here. Detail: …-source-document-and-authority-selection-….
3. Target granularity (proposed mapping → tier / section_type / canonical address)
| Legal level | Proposed tier | Proposed section_type | Address fragment |
|---|---|---|---|
| Document (Hiến pháp) | root | heading | <DOC>-ROOT |
| Chương (chapter) | section | heading | <DOC>-C<n> |
| Điều (article) | section or unit | article | <DOC>-C<n>-DIEU<m> |
| Khoản (clause) | unit | paragraph/principle | …-DIEU<m>-K<k> |
| Điểm (point) | unit | paragraph | …-K<k>-P<p> |
| Đoạn (paragraph) / information unit | unit | paragraph | leaf address |
The information unit is the leaf (Khoản/Điểm/đoạn) — the cut granularity proven by the single-IU trial. Article-level vs clause/point-level leaf granularity is OD-3 (OPEN). The 2013-Constitution chapter level adds a tier (Chương) not present in canonical-address-v1; a v1-compatible extension or v2 grammar is OD-4 (OPEN). Detail: …-hien-phap-trial-cut-routing-design-….
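A minimal sketch of how the proposed fragments could compose into addresses, assuming the table's mapping is adopted as-is. Because OD-3 (article vs clause leaf) and OD-4 (v1 extension vs v2) are open, the article tier is a parameter rather than a fixed rule; LegalPosition and build_address are hypothetical names, and the document code is passed in (HIENPHAP-2013 is only the example from section 8), never hardcoded. The preamble (Lời nói đầu) is not covered because the table does not map it.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class LegalPosition:
    """Position of a node in the constitution hierarchy (None = level absent)."""
    chapter: Optional[int] = None  # Chương
    article: Optional[int] = None  # Điều
    clause: Optional[int] = None   # Khoản
    point: Optional[int] = None    # Điểm


def build_address(doc: str, pos: LegalPosition, article_is_leaf: bool = False) -> Tuple[str, str]:
    """Return (address, tier) using the proposed fragments; doc comes from config."""
    levels = [pos.chapter, pos.article, pos.clause, pos.point]
    # A level may only be set if every level above it is set (no gaps).
    depth = next((i for i, v in enumerate(levels) if v is None), len(levels))
    if any(v is not None for v in levels[depth:]):
        raise ValueError(f"gap in hierarchy: {pos}")
    if depth == 0:
        return f"{doc}-ROOT", "root"
    prefixes = ["C", "DIEU", "K", "P"]
    address = "-".join([doc] + [f"{p}{v}" for p, v in zip(prefixes, levels[:depth])])
    if depth == 1:
        tier = "section"  # Chương
    elif depth == 2:
        tier = "unit" if article_is_leaf else "section"  # Điều: depends on OD-3
    else:
        tier = "unit"  # Khoản or Điểm
    return address, tier
```

For instance, build_address("HIENPHAP-2013", LegalPosition(chapter=1, article=2, clause=1)) returns ("HIENPHAP-2013-C1-DIEU2-K1", "unit"). The table reuses the letter P for Điểm, while in canonical-address-v1 -P<n> marks a paragraph-level unit; that overlap is part of what OD-4 has to settle.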
4. Next-trial mode recommendation
Recommended (NOT self-authorized): dry-run first (full-document, isolated PG) → then staged production trial by small batch, gated behind pre-scale index-only DDL (design+review+apply) and a label/metadata posture decision. A single big-bang production full-document cut is not recommended (unindexed hot paths → O(n²); high blast radius for a 120-article document). If/when production: draft authority only, staged batches with checkpoints, never enacting. Final mode + whether Hiến pháp is the actual first full-document target = OD-1 (OPEN for GPT) — GPT explicitly said "Do NOT cut Hiến pháp yet"; this design treats it as the worked example, not a commitment.
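The gating order above can be expressed as a small precondition check. This is a sketch only: TrialMode, Gates, REQUIRED and unmet_gates are hypothetical names, and each flag stands for a separate reviewed decision (OD-1, OD-7, OD-10, OD-12), never something the runner sets for itself. A big-bang production cut has no entry, so it always reports as not recommended.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class TrialMode(Enum):
    DRY_RUN_ISOLATED = "dry_run_isolated"        # full document, isolated PG
    STAGED_PRODUCTION = "staged_production"      # small batches + checkpoints
    BIG_BANG_PRODUCTION = "big_bang_production"  # not recommended


@dataclass(frozen=True)
class Gates:
    """Hypothetical gate flags; each would be set only by a separate reviewed decision."""
    mode_authorized: bool = False        # OD-1: no self-authorization
    dry_run_passed: bool = False
    index_ddl_applied: bool = False      # OD-10
    label_posture_decided: bool = False  # OD-7
    authority_draft_only: bool = True    # OD-12: draft authority, never enacting


REQUIRED = {
    TrialMode.DRY_RUN_ISOLATED: ("mode_authorized",),
    TrialMode.STAGED_PRODUCTION: (
        "mode_authorized",
        "dry_run_passed",
        "index_ddl_applied",
        "label_posture_decided",
        "authority_draft_only",
    ),
}


def unmet_gates(mode: TrialMode, gates: Gates) -> List[str]:
    """Return unmet preconditions for a mode; an empty list means it may proceed."""
    if mode not in REQUIRED:
        return [f"mode not recommended: {mode.value}"]
    return [name for name in REQUIRED[mode] if not getattr(gates, name)]
```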
5. Expected IU count / row volume (estimate)
- Hiến pháp 2013: 120 Điều, 11 Chương, and a preamble; leaf IUs (Khoản/Điểm/đoạn) ≈ 300–500 (estimate; exact = post-ingestion count, OD-5).
- Governance rows = +15 per IU (validated invariant). Article granularity: 120 × 15 = 1,800 cutter_governance rows; clause/point granularity: ≈ 350–500 × 15 ≈ 5,000–7,500 rows; plus roughly one tac_logical_unit row per IU, per section, and for the root.
- Order-of-magnitude only; drives the index-DDL necessity and batch sizing. Detail in …-scale-index-and-label-metadata-risk-note-….
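The arithmetic above can be reproduced with a tiny estimator. It assumes the per-IU envelope strategy (A), where the +15 invariant holds, and takes the section count as an input because it depends on OD-3 (articles become sections only when clauses are the leaves). estimate_rows is a hypothetical helper, not part of the cutter.

```python
from typing import Dict

GOVERNANCE_ROWS_PER_IU = 15  # validated +15-per-IU invariant (per-IU envelope, strategy A)


def estimate_rows(leaf_ius: int, sections: int, roots: int = 1) -> Dict[str, int]:
    """Order-of-magnitude row estimate for one full-document cut."""
    return {
        "cutter_governance": leaf_ius * GOVERNANCE_ROWS_PER_IU,
        "tac_logical_unit": leaf_ius + sections + roots,
    }
```

With article leaves under 11 chapter sections, estimate_rows(120, sections=11) gives 1,800 governance rows and 132 logical-unit rows. With 350 clause leaves under 120 article sections plus 11 chapters (sections=131), it gives 5,250 governance rows (the "~5,000" lower bound above) and 482 logical-unit rows.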
6. Pre-scale index DDL — REQUIRED (assessed; design only)
CONFIRMED prerequisite. cutter_governance currently has only PK indexes plus a few unique/alias indexes. The runtime's per-IU hot paths are unindexed sequential scans whose total cost grows as O(n²) over a full document: SWEEP on decision_backlog_entry.status; lineage on manifest_envelope.source_doc_ref and review_decision.manifest_id; G-CUT-ONCE on cut_change_set.decision_backlog_entry_id; VERIFY on verify_result.change_set_id. Index-only DDL design, GPT review, and separate authorization must precede any full-document/bulk cut. No index DDL is proposed for execution here. Detail + concrete index list (proposal only): …-scale-index-and-label-metadata-risk-note-….
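The O(n²) claim follows from a simple cost model: if IU number i performs a sequential scan over a table that already holds i × k rows (k rows added per IU), the total rows read over n IUs is k·n(n−1)/2. The sketch below is that model only; k = 15 borrows the per-IU governance total as a single worst-case figure, whereas in practice each hot-path table grows by its own share. Function names are hypothetical.

```python
def seq_scan_rows_read(n_ius: int, rows_added_per_iu: int = 15) -> int:
    """Rows read in total if IU i sequentially scans a table holding i * rows_added_per_iu rows."""
    return sum(i * rows_added_per_iu for i in range(n_ius))


def seq_scan_closed_form(n_ius: int, rows_added_per_iu: int = 15) -> int:
    """Same total in closed form: k * n(n-1)/2, quadratic in the number of IUs."""
    return rows_added_per_iu * n_ius * (n_ius - 1) // 2
```

Under this model 120 IUs read 107,100 rows and 500 IUs read 1,871,250 rows: about 17.5× the work for about 4.2× the IUs, which is why indexing has to come before a bulk cut rather than after.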
7. Label / metadata needs
No label/metadata registry exists or is authorized. Cutting many units must not introduce runtime label/key hardcoding; large-scale labeling/reclassification is blocked on a separate label/metadata registry design → GPT review. v0.5 only records the need (OD-7). SQL / deployed cutter_governance remains SSOT.
8. Merge / storage destination
Destination = the same public.tac_logical_unit corpus (new doc_code, e.g. HIENPHAP-2013, config-driven not hardcoded) co-resident with DIEU-28/32/35, plus the cutter_governance manifest/ledger per IU. Canonical-address namespace for the constitution + co-residency/no-collision rules + the DIEU-32/35 tier-normalisation question = OD-4/OD-8/OD-9. Detail: …-three-existing-cut-documents-merge-and-storage-design-….
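One possible shape for the OD-8 no-collision rule is a read-only pre-check run before any write. This sketch assumes the existing doc_codes and addresses have already been fetched by a read-only query (not shown), and it takes the address prefix as its own parameter because doc_code and address namespace differ in the current corpus (DIEU-28 vs D38-DIEU28). namespace_problems is a hypothetical name.

```python
from typing import Iterable, List, Set


def namespace_problems(
    existing_doc_codes: Set[str],
    existing_addresses: Set[str],
    new_doc_code: str,
    new_address_prefix: str,
    new_addresses: Iterable[str],
) -> List[str]:
    """Read-only pre-check of co-residency rules; an empty list means no collision found."""
    problems: List[str] = []
    if new_doc_code in existing_doc_codes:
        problems.append(f"doc_code already present: {new_doc_code}")
    if any(a.startswith(new_address_prefix) for a in existing_addresses):
        problems.append(f"namespace prefix already in use: {new_address_prefix}")
    seen: Set[str] = set()
    for address in new_addresses:
        if not address.startswith(new_address_prefix):
            problems.append(f"address outside namespace: {address}")
        if address in existing_addresses or address in seen:
            problems.append(f"address collision: {address}")
        seen.add(address)
    return problems
```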
9. Manifest strategy (full document)
Two candidate strategies, OD-6 (OPEN for GPT):
- (A) Per-IU envelope (proven by the single-IU trial): N manifest_envelope rows, each with one manifest_unit_block; preserves the validated +15-per-IU invariant; simplest and most auditable; N×15 rows.
- (B) One document-level envelope with N manifest_unit_block rows (the composite PK (envelope_id, unit_local_id) already supports this): fewer envelopes, but it changes the per-IU +15 invariant and the cut/verify granularity/atomicity model.

Recommendation leans (A) for the first full-document trial (it keeps the validated invariant), with (B) as a deferred optimisation; GPT to decide.
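The structural difference between (A) and (B) is easiest to see side by side. In this in-memory sketch the dict key mirrors the composite PK (envelope_id, unit_local_id) and rejects duplicates the way the PK would; the envelope-id scheme and all names are hypothetical, and no governance rows beyond envelopes and unit blocks are modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass
class ManifestPlan:
    envelopes: List[str] = field(default_factory=list)
    # Keyed like the composite PK (envelope_id, unit_local_id) -> canonical address.
    unit_blocks: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def add_block(self, envelope_id: str, unit_local_id: str, address: str) -> None:
        key = (envelope_id, unit_local_id)
        if key in self.unit_blocks:  # emulate the PK rejecting a duplicate key
            raise ValueError(f"duplicate unit block key: {key}")
        self.unit_blocks[key] = address


def plan_per_iu(doc_code: str, addresses: Sequence[str]) -> ManifestPlan:
    """Strategy (A): one envelope per IU, each holding exactly one unit block."""
    plan = ManifestPlan()
    for i, address in enumerate(addresses):
        env = f"{doc_code}:env:{i}"  # hypothetical envelope id scheme
        plan.envelopes.append(env)
        plan.add_block(env, "0", address)
    return plan


def plan_document_level(doc_code: str, addresses: Sequence[str]) -> ManifestPlan:
    """Strategy (B): one envelope for the whole document holding N unit blocks."""
    plan = ManifestPlan()
    env = f"{doc_code}:env:doc"  # hypothetical envelope id scheme
    plan.envelopes.append(env)
    for i, address in enumerate(addresses):
        plan.add_block(env, str(i), address)
    return plan
```

Both plans carry the same N unit blocks; (A) pays with N envelopes but keeps each IU's cut/verify scope self-contained, while (B) concentrates N units under one envelope, which is exactly why it disturbs the per-IU invariant.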
10. Rollback / forward-compensation (multi-IU)
Per-IU pipeline stays one-atomic-transaction-per-phase, append-only, forward-compensation/no-delete (as validated). Document-level = N independent per-IU pipelines. Policy: on a per-IU failure → STOP that IU, preserve all prior committed IUs (append-only; no document-wide rollback, no delete), forward-compensate the failed IU via the reviewed path, report honestly; resumability via deterministic entry_id idempotency (replay-safe MARK). Staged batches with checkpoints. Backup-restore = disaster backstop only. Detail: …-hien-phap-trial-cut-routing-design-….
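The stop/preserve/resume policy can be sketched as a staged loop over independent per-IU pipelines. Assumptions, all hypothetical: entry ids come from a SHA-256 of (doc_code, address), which is only one way to get the deterministic idempotency described above; run_iu stands in for one full per-IU pipeline and returns success or failure; and checkpoints are just the index reached after each batch. The set of committed ids is only ever added to, mirroring append-only storage.

```python
import hashlib
from typing import Callable, Dict, List, Sequence, Set


def entry_id(doc_code: str, address: str) -> str:
    """Deterministic id: the same (doc_code, address) gives the same id on every replay."""
    return hashlib.sha256(f"{doc_code}|{address}".encode("utf-8")).hexdigest()[:32]


def run_staged(
    doc_code: str,
    addresses: Sequence[str],
    batch_size: int,
    committed: Set[str],
    run_iu: Callable[[str], bool],
) -> Dict[str, object]:
    """Run per-IU pipelines in batches and stop at the first failed IU.

    committed holds entry ids of already-committed IUs; it is never pruned, so a
    failure leaves every earlier IU in place and a rerun resumes after them.
    """
    checkpoints: List[int] = []
    for start in range(0, len(addresses), batch_size):
        for address in addresses[start:start + batch_size]:
            eid = entry_id(doc_code, address)
            if eid in committed:
                continue  # replay-safe: committed in an earlier run
            if not run_iu(address):
                # STOP: no document-wide rollback; the failed IU goes to forward-compensation.
                return {"status": "STOPPED", "failed": address, "checkpoints": checkpoints}
            committed.add(eid)
        checkpoints.append(min(start + batch_size, len(addresses)))
    return {"status": "DONE", "failed": None, "checkpoints": checkpoints}
```

A rerun after the failed IU has been forward-compensated skips every committed IU and only executes the remainder, so resuming never duplicates earlier work.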
11. No-hardcode strategy
- No fixed source path: the Hiến pháp source location is config/parameter-driven, never a literal in code or script.
- No hardcoded labels: no runtime label/metadata key literals; labels only via the future separately-designed registry.
- No hardcoded storage destination: DB/schema/doc_code/canonical-address grammar are config-driven or derived, not literals (the only permitted literals remain auditable safety constants: prod sysid, accepted-commit pin, exact role/lane names).
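One way to keep the source path, destination, doc_code and address grammar out of code is to load them from configuration and fail closed when any is missing. The key names, CutConfig and load_config below are hypothetical, and JSON is an arbitrary choice of format; labels are deliberately absent because they belong to the future registry, not to this config.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class CutConfig:
    source_uri: str       # Hiến pháp source location (OD-2), never a code literal
    db_schema: str        # storage destination
    doc_code: str         # e.g. the HIENPHAP-2013 example, supplied by config
    address_grammar: str  # e.g. "canonical-address-v1" or a future v2 (OD-4)


REQUIRED_KEYS = ("source_uri", "db_schema", "doc_code", "address_grammar")


def load_config(text: str) -> CutConfig:
    """Parse a JSON config and fail closed if any required key is missing or empty."""
    raw = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if not raw.get(key)]
    if missing:
        raise ValueError(f"missing config keys: {missing}")
    return CutConfig(**{key: raw[key] for key in REQUIRED_KEYS})
```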
12. SQL / NoSQL hybrid posture
SQL is SSOT (public.tac_logical_unit + cutter_governance). incomex-qdrant / any vector store = projection / search only, never an authority store, never in the cut/verify write path. No vector/NoSQL integration in v0.5. Unchanged from prior phases.
13. Open decisions for GPT (consolidated)
- OD-1 Next-trial mode (dry-run-first vs staged production batch vs single full-document) AND whether Hiến pháp is the actual first full-document target (GPT said not yet).
- OD-2 Authoritative Hiến pháp source/edition + ingestion path (none invented).
- OD-3 Leaf cut granularity: article vs clause vs point.
- OD-4 Canonical-address grammar for the constitution (Chương tier): canonical-address-v1 extension vs a v2.
- OD-5 Exact IU count (post-ingestion) + batch size for staged production.
- OD-6 Manifest strategy: per-IU envelope (A) vs document-level envelope (B).
- OD-7 Label/metadata registry: design now vs defer; scope before multi-unit labeling.
- OD-8 Storage doc_code/namespace for the constitution; co-residency & no-collision rules with DIEU-28/32/35.
- OD-9 Normalise the existing corpus tier inconsistency (DIEU-32/35 blank tier) before/independent of the full-document trial?
- OD-10 Pre-scale index-only DDL set + when (must precede bulk) — design/review as its own gated cycle.
- OD-11 Multi-IU partial-failure/resumability policy confirmation.
- OD-12 Authority posture for the trial (draft-only, never enacting) confirmation.
Boundaries / Git / Hardcode
Design only · no production writes · no CUT/VERIFY · no deploy/restart · no schema migration · no index DDL · no label registry creation · no vector/NoSQL integration · no alias writes · no code change · no git commit. Git: branch main · HEAD e93424b5ff7fa5e4b8406131977ce4339cd0856a · git status --short -- iu-cutter = clean (0 lines). No fixed IP/DSN/password/container/vector-collection; no runtime label/key hardcoding; no schema change. Scale blockers recorded (index DDL, label/metadata registry) — must clear before any full-document execution. Next = GPT review of this v0.5 design package; self-advance PROHIBITED.