KB-494E

dot-iu-cutter v0.5 — Full-Document Trial Routing DESIGN MASTER (design only; no execution) (2026-05-17)

Revision 1
Tags: dot-iu-cutter, v0.5, full-document-trial, design-master, design-only, dieu44, gpt-review

Date: 2026-05-17

Status: DESIGN ONLY; nothing executed. No production write, no CUT/VERIFY, no deploy/restart, no schema migration, no index DDL, no label registry, no vector/NoSQL, no alias write, no code change, no commit.

Predecessor: first controlled production CUT/VERIFY trial CLOSED_PASS (single-IU production write proven; GPT-reviewed).

Accepted code commit: e93424b5ff7fa5e4b8406131977ce4339cd0856a (branch main, iu-cutter clean).

This package (6 docs) is for GPT review only. Self-advance to any execution is PROHIBITED.

0. Purpose

Plan an end-to-end cut of one full document (example candidate = Hiến pháp / the 2013 Constitution) and define how to merge/store it alongside the three previously cut documents. v0.5 is a routing/design phase: it produces the plan + open decisions; it does not select a final target, write code, or touch production.

1. The three previously cut documents + storage (grounded, read-only)

Live read-only inventory of public.tac_logical_unit (production, sysid 7611578671664259111): exactly three doc_code values exist — the three already-cut documents:

| doc_code | rows | tiers present | authority / lifecycle / fmt |
| --- | --- | --- | --- |
| DIEU-28 | 27 | root, section, unit | draft / draft_only / canonical-address-v1 |
| DIEU-32 | 23 | (tier blank) | draft / draft_only / canonical-address-v1 |
| DIEU-35 | 36 | (tier blank) | draft / draft_only / canonical-address-v1 |
| total | 86 | | uniformly draft, non-enacted |

  • Storage (SSOT): public.tac_logical_unit (Postgres directus DB) holds the logical units; cutter_governance.* (12 tables) holds the cut/verify governance ledger + manifest. The first production trial added the cut/verify ledger for one DIEU-28 IU (D38-DIEU28-S3-P1); the rest of the corpus is logical units only.
  • Canonical address grammar (canonical-address-v1): D38-DIEU<NN>-<SECTION>[-P<n>] (e.g. D38-DIEU28-ROOT, D38-DIEU28-S3, D38-DIEU28-S3-P1); tier ∈ {root, section, unit}; section_type ∈ {heading, paragraph, principle, technical_spec, governance_process, checklist, process, definition, article, …}.
  • Data-shape inconsistency (flag): only DIEU-28 is fully tiered (root/section/unit); DIEU-32 and DIEU-35 have blank tier (59 of 86 rows have empty tier). Any merge/full-document design must treat tier-normalisation of the existing corpus as an explicit OPEN item (OD-9), not silently assume uniform shape.
  • Detail: see …-three-existing-cut-documents-merge-and-storage-design-….
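
The address grammar above can be sketched as a small parser. This is a minimal illustration, not the cutter's actual code: the regular expression is inferred only from the three example addresses in this section (the assumption is that `<SECTION>` is either `ROOT` or `S<n>`), and deriving the tier from the address shape is likewise an assumption. The names `ADDRESS_V1`, `ParsedAddress` and `parse_address_v1` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# ASSUMPTION: <SECTION> is ROOT or S<n>, inferred from the three example
# addresses in this document; the real grammar may permit other tokens.
ADDRESS_V1 = re.compile(
    r"D38-DIEU(?P<dieu>\d+)-(?P<section>ROOT|S\d+)(?:-P(?P<para>\d+))?"
)


class ParsedAddress(NamedTuple):
    dieu: int
    section: str
    paragraph: Optional[int]
    tier: str  # one of: root, section, unit


def parse_address_v1(address: str) -> ParsedAddress:
    """Split a canonical-address-v1 string and derive its tier from its shape."""
    match = ADDRESS_V1.fullmatch(address)
    if match is None:
        raise ValueError(f"not a canonical-address-v1 string: {address!r}")
    section, para = match["section"], match["para"]
    if section == "ROOT":
        if para is not None:
            raise ValueError(f"ROOT cannot carry a -P<n> suffix: {address!r}")
        tier = "root"
    elif para is None:
        tier = "section"
    else:
        tier = "unit"
    paragraph = None if para is None else int(para)
    return ParsedAddress(int(match["dieu"]), section, paragraph, tier)
```

If the DIEU-32 and DIEU-35 addresses follow the same shape, a derivation like this could be one input to the OD-9 tier-normalisation discussion; that has not been checked here.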

2. Authoritative source for Hiến pháp

Hiến pháp is not present in the corpus: no doc_code matches hiến pháp / constitution / HP, and only DIEU-28/32/35 exist. The authoritative source is therefore external and must be defined before any cut: the officially promulgated 2013 Constitution of the Socialist Republic of Vietnam (Hiến pháp năm 2013). Its structure is Lời nói đầu (preamble) + 11 Chương (chapters) + 120 Điều (articles); articles contain Khoản (clauses) and some Điểm (points). The exact canonical source/edition + ingestion path is OD-2 (OPEN for GPT); no source is invented here. Detail: …-source-document-and-authority-selection-….

3. Target granularity (proposed mapping → tier / section_type / canonical address)

| Legal level | Proposed tier | Proposed section_type | Address fragment |
| --- | --- | --- | --- |
| Document (Hiến pháp) | root | heading | `<DOC>-ROOT` |
| Chương (chapter) | section | heading | `<DOC>-C<n>` |
| Điều (article) | section or unit | article | `<DOC>-C<n>-DIEU<m>` |
| Khoản (clause) | unit | paragraph/principle | `…-DIEU<m>-K<k>` |
| Điểm (point) | unit | paragraph | `…-K<k>-P<p>` |
| Đoạn (paragraph) / information unit | unit | paragraph | leaf address |

The information unit is the leaf (Khoản/Điểm/đoạn) — the cut granularity proven by the single-IU trial. Article-level vs clause/point-level leaf granularity is OD-3 (OPEN). The 2013-Constitution chapter level adds a tier (Chương) not present in canonical-address-v1; a v1-compatible extension or v2 grammar is OD-4 (OPEN). Detail: …-hien-phap-trial-cut-routing-design-….
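
The proposed address fragments can be illustrated with a small builder. This is a sketch under stated assumptions: the fragments are joined with `-` exactly as in the table, the `<DOC>` prefix is passed in rather than fixed (the example value `HIENPHAP-2013` is only the placeholder doc_code used in this document), and the preamble, which sits outside any Chương, is not handled. The function name `build_address` is hypothetical, and the grammar itself remains OD-4 (OPEN).

```python
from typing import Optional


def build_address(
    doc: str,
    chapter: Optional[int] = None,
    article: Optional[int] = None,
    clause: Optional[int] = None,
    point: Optional[int] = None,
) -> str:
    """Compose <DOC>-C<n>-DIEU<m>-K<k>-P<p>, stopping at the deepest level given."""
    levels = (("C", chapter), ("DIEU", article), ("K", clause), ("P", point))
    parts = [doc]
    gap = False
    for prefix, number in levels:
        if number is None:
            gap = True
            continue
        if gap:
            # e.g. a clause without its article: the address would be ambiguous
            raise ValueError(f"{prefix}{number} given without its parent level")
        if number < 1:
            raise ValueError(f"{prefix} numbers start at 1, got {number}")
        parts.append(f"{prefix}{number}")
    if len(parts) == 1:
        parts.append("ROOT")  # no level given: the document root
    return "-".join(parts)
```

For example, `build_address("HIENPHAP-2013", 2, 14, 1)` yields `HIENPHAP-2013-C2-DIEU14-K1`. Rejecting a level whose parent is missing keeps every generated address traceable to exactly one position in the hierarchy.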

4. Next-trial mode recommendation

Recommended (NOT self-authorized): dry-run first (full-document, isolated PG) → then staged production trial by small batch, gated behind pre-scale index-only DDL (design+review+apply) and a label/metadata posture decision. A single big-bang production full-document cut is not recommended (unindexed hot paths → O(n²); high blast radius for a 120-article document). If/when production: draft authority only, staged batches with checkpoints, never enacting. Final mode + whether Hiến pháp is the actual first full-document target = OD-1 (OPEN for GPT) — GPT explicitly said "Do NOT cut Hiến pháp yet"; this design treats it as the worked example, not a commitment.

5. Expected IU count / row volume (estimate)

  • Hiến pháp 2013 = 120 Điều, 11 Chương, plus the preamble; leaf IUs (Khoản/Điểm/đoạn) ≈ 300–500 (estimate; exact = post-ingestion count, OD-5).
  • Governance rows = +15 per IU (validated invariant). Article-granularity = 120 × 15 = 1,800 cutter_governance rows; clause/point-granularity ≈ 300–500 × 15 ≈ 4,500–7,500 rows; plus roughly one tac_logical_unit row per IU + section + root.
  • Order-of-magnitude only — drives the index-DDL necessity and batch sizing. Detail in …-scale-index-and-label-metadata-risk-note-….
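
The governance-row arithmetic can be reproduced with a few lines. This sketch uses only the +15-per-IU invariant and the IU estimates stated in this section (120 articles; 300–500 leaf IUs); the names `GOVERNANCE_ROWS_PER_IU` and `governance_rows` are hypothetical, and the tac_logical_unit rows are deliberately left out because their count depends on OD-3.

```python
GOVERNANCE_ROWS_PER_IU = 15  # the "+15 per IU" invariant stated in this document


def governance_rows(iu_count: int) -> int:
    """Estimated cutter_governance rows added by cutting iu_count IUs."""
    if iu_count < 0:
        raise ValueError("iu_count must be non-negative")
    return iu_count * GOVERNANCE_ROWS_PER_IU


article_level = governance_rows(120)                      # 1800
leaf_level = (governance_rows(300), governance_rows(500))  # (4500, 7500)
```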

6. Pre-scale index DDL — REQUIRED (assessed; design only)

CONFIRMED prerequisite. cutter_governance currently has only PK indexes + a few unique/alias indexes. The runtime's per-IU hot paths are unindexed sequential scans, so each IU's lookups read tables that grow with every IU already cut, and total work over a full document degrades to O(n²). The affected paths:

  • SWEEP: decision_backlog_entry.status
  • lineage: manifest_envelope.source_doc_ref, review_decision.manifest_id
  • G-CUT-ONCE: cut_change_set.decision_backlog_entry_id
  • VERIFY: verify_result.change_set_id

Index-only DDL design + GPT review + separate authorization must precede any full-document/bulk cut. No index DDL is proposed for execution here. Detail + concrete index list (proposal only): …-scale-index-and-label-metadata-risk-note-….
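
The O(n²) claim can be made concrete with a toy cost model. This is an illustration, not a measurement: it assumes each IU performs one sequential scan of a table that already holds `rows_per_iu` rows for every previously cut IU, and it ignores caching, planner behaviour and the real per-table row split. Both function names are hypothetical.

```python
def rows_scanned_without_index(iu_count: int, rows_per_iu: int = 1) -> int:
    """Total rows read when every IU sequentially scans the rows left by earlier IUs."""
    total = 0
    table_rows = 0
    for _ in range(iu_count):
        total += table_rows        # one full scan of everything written so far
        table_rows += rows_per_iu  # this IU then appends its own rows
    return total


def rows_scanned_closed_form(iu_count: int, rows_per_iu: int = 1) -> int:
    """Same quantity as a formula: rows_per_iu * n * (n - 1) / 2."""
    return rows_per_iu * iu_count * (iu_count - 1) // 2
```

Under this model a single hot path reads 7,140 rows in total for 120 IUs and 124,750 rows for 500 IUs: roughly a fourfold increase in IU count costs about seventeen times the scan work, which is the quadratic growth the index-only DDL is meant to remove.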

7. Label / metadata needs

No label/metadata registry exists or is authorized. Cutting many units must not introduce runtime label/key hardcoding; large-scale labeling/reclassification is blocked on a separate label/metadata registry design → GPT review. v0.5 only records the need (OD-7). SQL / deployed cutter_governance remains SSOT.

8. Merge / storage destination

Destination = the same public.tac_logical_unit corpus (new doc_code, e.g. HIENPHAP-2013, config-driven not hardcoded) co-resident with DIEU-28/32/35, plus the cutter_governance manifest/ledger per IU. Canonical-address namespace for the constitution + co-residency/no-collision rules + the DIEU-32/35 tier-normalisation question = OD-4/OD-8/OD-9. Detail: …-three-existing-cut-documents-merge-and-storage-design-….

9. Manifest strategy (full document)

Two candidate strategies, OD-6 (OPEN for GPT):

  • (A) per-IU envelope (proven by the single-IU trial): N manifest_envelope rows, each 1 manifest_unit_block; preserves the validated +15-per-IU invariant; simplest, most auditable; N×15 rows.
  • (B) one document-level envelope with N manifest_unit_block rows (composite PK (envelope_id, unit_local_id) already supports this): fewer envelopes, but changes the per-IU +15 invariant and the cut/verify granularity/atomicity model.

Recommendation leans (A) for the first full-document trial (keeps the validated invariant), with (B) as a deferred optimisation; GPT to decide.
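
The structural difference between the two strategies can be sketched as the set of (envelope_id, unit_local_id) keys each would produce. This is a minimal illustration: the `env:` identifier format and all function names are hypothetical, and only the key shape (the composite PK named above) comes from this document.

```python
from typing import Iterable, List, Tuple

Key = Tuple[str, str]  # (envelope_id, unit_local_id), the composite PK shape


def plan_per_iu(addresses: Iterable[str]) -> List[Key]:
    """Strategy A: one envelope per IU, each holding exactly one unit block."""
    return [(f"env:{address}", address) for address in addresses]


def plan_per_document(doc_code: str, addresses: Iterable[str]) -> List[Key]:
    """Strategy B: a single document-level envelope holding N unit blocks."""
    return [(f"env:{doc_code}", address) for address in addresses]


def envelope_count(keys: Iterable[Key]) -> int:
    """Number of distinct manifest_envelope rows implied by a plan."""
    return len({envelope_id for envelope_id, _ in keys})
```

For N addresses, strategy A implies N envelopes and strategy B implies one, while both yield N distinct keys, so the composite key is satisfied either way. What changes under (B) is the per-IU row invariant and the atomicity model, which this sketch does not capture.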

10. Rollback / forward-compensation (multi-IU)

Per-IU pipeline stays one-atomic-transaction-per-phase, append-only, forward-compensation/no-delete (as validated). Document-level = N independent per-IU pipelines. Policy: on a per-IU failure → STOP that IU, preserve all prior committed IUs (append-only; no document-wide rollback, no delete), forward-compensate the failed IU via the reviewed path, report honestly; resumability via deterministic entry_id idempotency (replay-safe MARK). Staged batches with checkpoints. Backup-restore = disaster backstop only. Detail: …-hien-phap-trial-cut-routing-design-….
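
The stop-and-resume policy can be sketched as a batch loop over independent per-IU pipelines. This is a simplified model, not the cutter runtime: `cut_one` stands in for the whole per-IU pipeline, the in-memory `committed` set stands in for the append-only ledger, the SHA-256 derivation of `entry_id` is an assumed way to make it deterministic, and halting the whole batch at the first failure is one conservative reading of the policy (OD-11 is still open). All names are hypothetical.

```python
import hashlib
from typing import Callable, Iterable, List, Optional, Set, Tuple


def entry_id(doc_code: str, address: str) -> str:
    """Deterministic id: the same (doc_code, address) always maps to the same value."""
    return hashlib.sha256(f"{doc_code}|{address}".encode("utf-8")).hexdigest()


def run_batch(
    doc_code: str,
    addresses: Iterable[str],
    committed: Set[str],
    cut_one: Callable[[str], None],
) -> Tuple[List[str], Optional[Tuple[str, str]]]:
    """Cut IUs in order; skip committed ones; stop at the first failure.

    Returns (newly committed addresses, (failed address, reason) or None).
    Nothing already committed is ever removed.
    """
    done: List[str] = []
    for address in addresses:
        eid = entry_id(doc_code, address)
        if eid in committed:
            continue  # replay-safe: a re-run does not cut this IU again
        try:
            cut_one(address)
        except Exception as exc:
            return done, (address, str(exc))  # stop; earlier IUs stay committed
        committed.add(eid)
        done.append(address)
    return done, None
```

Re-running the same batch after the failed IU has been forward-compensated picks up at that IU and leaves earlier ones untouched, which is the resumability property described above.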

11. No-hardcode strategy

  • No fixed source path: the Hiến pháp source location is config/parameter-driven, never a literal in code or script.
  • No hardcoded labels: no runtime label/metadata key literals; labels only via the future separately-designed registry.
  • No hardcoded storage destination: DB/schema/doc_code/canonical-address grammar are config-driven or derived, not literals (the only permitted literals remain auditable safety constants: prod sysid, accepted-commit pin, exact role/lane names).
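
One way to keep these values out of code is to load them from configuration and fail fast when any is absent. This is a minimal sketch assuming environment variables as the configuration channel; the variable names (`IU_CUTTER_SOURCE_PATH` and so on), the `TrialConfig` fields and `load_config` are all hypothetical, and the permitted safety constants are not modelled.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# HYPOTHETICAL setting names; nothing in this document fixes them.
ENV_KEYS = {
    "source_path": "IU_CUTTER_SOURCE_PATH",
    "doc_code": "IU_CUTTER_DOC_CODE",
    "db_schema": "IU_CUTTER_DB_SCHEMA",
}


@dataclass(frozen=True)
class TrialConfig:
    source_path: str
    doc_code: str
    db_schema: str


def load_config(env: Mapping[str, str] = os.environ) -> TrialConfig:
    """Build the config from a mapping; raise if any required setting is missing."""
    missing = [key for key in ENV_KEYS.values() if not env.get(key)]
    if missing:
        raise KeyError(f"missing required settings: {', '.join(missing)}")
    return TrialConfig(**{field: env[key] for field, key in ENV_KEYS.items()})
```

Because there are no defaults, a forgotten setting stops the run instead of silently falling back to a literal, which matches the intent of the no-hardcode rules.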

12. SQL / NoSQL hybrid posture

SQL is SSOT (public.tac_logical_unit + cutter_governance). incomex-qdrant / any vector store = projection / search only, never an authority store, never in the cut/verify write path. No vector/NoSQL integration in v0.5. Unchanged from prior phases.

13. Open decisions for GPT (consolidated)

  • OD-1 Next-trial mode (dry-run-first vs staged production batch vs single full-document) AND whether Hiến pháp is the actual first full-document target (GPT said not yet).
  • OD-2 Authoritative Hiến pháp source/edition + ingestion path (none invented).
  • OD-3 Leaf cut granularity: article vs clause vs point.
  • OD-4 Canonical-address grammar for the constitution (Chương tier): canonical-address-v1 extension vs a v2.
  • OD-5 Exact IU count (post-ingestion) + batch size for staged production.
  • OD-6 Manifest strategy: per-IU envelope (A) vs document-level envelope (B).
  • OD-7 Label/metadata registry: design now vs defer; scope before multi-unit labeling.
  • OD-8 Storage doc_code/namespace for the constitution; co-residency & no-collision rules with DIEU-28/32/35.
  • OD-9 Normalise the existing corpus tier inconsistency (DIEU-32/35 blank tier) before/independent of the full-document trial?
  • OD-10 Pre-scale index-only DDL set + when (must precede bulk) — design/review as its own gated cycle.
  • OD-11 Multi-IU partial-failure/resumability policy confirmation.
  • OD-12 Authority posture for the trial (draft-only, never enacting) confirmation.

Boundaries / Git / Hardcode

Design only · no production writes · no CUT/VERIFY · no deploy/restart · no schema migration · no index DDL · no label registry creation · no vector/NoSQL integration · no alias writes · no code change · no git commit. Git: branch main · HEAD e93424b5ff7fa5e4b8406131977ce4339cd0856a · git status --short -- iu-cutter = clean (0 lines). No fixed IP/DSN/password/container/vector-collection; no runtime label/key hardcoding; no schema change. Scale blockers recorded (index DDL, label/metadata registry) — must clear before any full-document execution. Next = GPT review of this v0.5 design package; self-advance PROHIBITED.

knowledge/dev/laws/dieu44-trien-khai/v0.5-full-document-trial-design/dot-iu-cutter-v0.5-full-document-trial-design-master-2026-05-17.md