KB-7D8A

dot-iu-cutter v0.5 — Pre-Scale Foundation DESIGN Report (routes package to GPT; design only) (2026-05-17)

Revision 1
Date: 2026-05-17 · Status: DESIGN ONLY (nothing executed; routes the package to GPT). Phase trigger: GPT v0.5 full-document design verdict = PASS_WITH_BLOCKERS (full-document, Hiến pháp [Constitution], second-IU and bulk runs are NOT allowed now; pre-scale index DDL + label/metadata registry are required first; vector/NoSQL is projection/search only). Accepted commit e93424b5ff7fa5e4b8406131977ce4339cd0856a is unchanged.

1. Package (7 docs, …/v0.5-pre-scale-foundation-design/)

  1. pre-scale-index-hot-path-analysis
  2. index-only-ddl-design-for-runtime-paths
  3. information-unit-label-metadata-registry-design
  4. extensible-information-unit-metadata-strategy
  5. existing-corpus-tier-normalization-plan
  6. full-document-dry-run-at-volume-plan
  7. this report

Read-only grounding only (deployed cutter_governance/tac_logical_unit columns, accepted phases.py/db_adapter.py, indexes, jsonb key shape, row counts). No production write, CUT/VERIFY, DDL, registry creation, code change, commit, or vector/NoSQL integration was performed.

2. Required analysis — answered

  1. Hot paths identified (exact deployed columns): MARK idempotency = decision_backlog_entry.entry_id (PK, OK); SWEEP = decision_backlog_entry.status + (emitted_at,entry_id) keyset (unindexed); lineage = manifest_envelope.source_doc_ref, review_decision.manifest_id (unindexed); cut-once = cut_change_set.decision_backlog_entry_id (unindexed, nullable); verify = verify_result.change_set_id (unindexed); DOT signature = dot_pair_signature.cross_reference_change_set_id|_verify_result_id (unindexed, XOR); manifest/unit-block = composite PK/PK (OK). The single-IU run is fine only because the tables are empty; at full-document volume each unindexed lookup becomes a sequential scan over a table that grows with n, so total cost is ≈ O(n²), which confirms the blocker.
  2. Index-only DDL designed: additive, CREATE INDEX CONCURRENTLY, no rewrite/semantic/data migration; BTREE default; partial only where structurally justified (review live-tail, nullable cut-once, XOR signature); explicitly no GIN (promote, don't GIN). Proposal only; not authorized or executed.
  3. Label/metadata registry concept — label dictionary + label assignment (append-only) + metadata key registry + promotion ledger; key type/cardinality/index policy centralised; no runtime hardcoded labels (binding-layer enforcement, extends existing schema_binding vocabulary discipline); JSONB = sparse evolving non-authority. Nothing created.
  4. Hot-key promotion policy — JSONB→scalar+BTREE only when on a runtime/at-scale path, needs eq/range/uniqueness, selectivity justifies, semantics stable; mechanics = additive nullable column → batched backfill → CONCURRENTLY index → switch read path → registry ledger; never scan JSONB at scale; assessment: no key needs promotion before the first full document (runtime hot keys are already scalar).
  5. Tier normalisation (DIEU-32/35) — read-only assessment done (59/86 rows blank tier; tier is derivable from already-correct canonical_address+section_type; DIEU-28 = oracle); derivation rule designed; the actual UPDATE deferred to a separate write cycle (not authorized here).
  6. Dry-run-at-volume plan — isolated postgres:16, restored prod schema, dry-run-only roles, sysid-guarded, accepted code RO-mounted, ~300–500 IU, +15×N expectation, config-driven batching + deterministic-entry_id checkpoint/resume, EXPLAIN proves index-scan + ~linear timing; no bulk production until this PASSes.
  7. SQL/NoSQL hybrid confirmed — SQL (tac_logical_unit+cutter_governance) = SSOT; incomex-qdrant/vector = projection/search only, fully rebuildable from SQL/source, never authority, never in the cut/verify write path; no NoSQL integration.
  8. Open decisions — consolidated below.
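
The SWEEP keyset path from item 1 and the "EXPLAIN proves index-scan" check from item 6 can be sketched as follows. This is a minimal illustration under stated assumptions, not the accepted phases.py/db_adapter.py code: SQLite (Python stdlib) stands in for postgres:16 so the sketch runs anywhere, the index name ix_dbe_status_keyset and the helpers make_db, sweep and plan are hypothetical, and the column types are guesses. Only the table and column names come from this report.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; column types are guesses.
# The index name is hypothetical and is NOT part of the proposed DDL set.
SCHEMA = """
CREATE TABLE decision_backlog_entry (
    entry_id   TEXT PRIMARY KEY,
    status     TEXT NOT NULL,
    emitted_at TEXT NOT NULL
);
CREATE INDEX ix_dbe_status_keyset
    ON decision_backlog_entry (status, emitted_at, entry_id);
"""

# Keyset page: resume strictly after the last (emitted_at, entry_id) seen.
PAGE_SQL = """
SELECT emitted_at, entry_id
FROM decision_backlog_entry
WHERE status = ?
  AND (emitted_at, entry_id) > (?, ?)
ORDER BY emitted_at, entry_id
LIMIT ?
"""


def make_db(n_rows):
    """Build an in-memory table with synthetic rows (ties on emitted_at)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    rows = [
        (
            f"e{i:05d}",
            "pending" if i % 2 == 0 else "done",
            # Four rows share each timestamp, so entry_id must break ties.
            f"2026-05-17T00:{(i // 4) // 60:02d}:{(i // 4) % 60:02d}",
        )
        for i in range(n_rows)
    ]
    conn.executemany("INSERT INTO decision_backlog_entry VALUES (?, ?, ?)", rows)
    return conn


def sweep(conn, status, batch_size):
    """Yield (emitted_at, entry_id) for one status, one keyset page at a time."""
    cursor = ("", "")  # sorts before every real key
    while True:
        page = conn.execute(
            PAGE_SQL, (status, cursor[0], cursor[1], batch_size)
        ).fetchall()
        if not page:
            return
        yield from page
        cursor = page[-1]  # checkpoint: last key of the batch


def plan(conn, status):
    """Return the query-plan detail text for the keyset page query."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN " + PAGE_SQL, (status, "", "", 10)
    ).fetchall()
    return " | ".join(row[3] for row in rows)
```

Calling list(sweep(make_db(100), "pending", 7)) walks the pending rows in (emitted_at, entry_id) order without OFFSET, and plan(...) gives the text to inspect for the index name. Against the real isolated postgres:16 instance the equivalent check would be EXPLAIN on the accepted query, which this sketch does not reproduce.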

3. Open decisions for GPT (consolidated)

  • Index: OD-I1 approve/adjust index set · OD-I2 include history(entry_id) now/defer · OD-I3 index-before-dryrun vs measure-both · OD-I4 confirm CONCURRENTLY on prod.
  • Label/metadata: OD-L1 registry physical home · OD-L2 assignment append-only vs mutable · OD-L3 v1 vocabulary scope · OD-L4 enforcement point · OD-L5 registry needed before first dry-run? (assessment: no).
  • Metadata: OD-M1 "promote, not GIN" standing rule · OD-M2 keep/stop JSONB copy post-promotion · OD-M3 any pre-dry-run promotion (assessment: none).
  • Tier: OD-T1 approve derivation rule (DIEU-28 oracle) · OD-T2 sequence vs dry-run · OD-T3 separate write cycle (recommended).
  • Dry-run: OD-V1 synthetic vs real-sample · OD-V2 unindexed baseline too · OD-V3 N for first dry-run · OD-V4 batch/checkpoint policy.
  • Sequencing: OD-S1 order of the gated cycles — recommended: index-DDL design approve → (tier-normalisation track, independent) → dry-run-at-volume → staged production. Label/metadata registry possibly deferred (OD-L5/OD-M3: not needed for the first volume dry-run).

4. Scale blockers (status)

Index DDL = designed (proposal), not executed · label/metadata registry = concept, not created (possibly deferrable) · multi-level canonical-address grammar (Chương, i.e. chapter level) = still open (v0.5 full-doc package) · corpus tier normalisation = assessed, write deferred · Hiến pháp (Constitution) source/ingestion = still external/undesigned · dry-run-at-volume = planned, not run. None cleared for execution; all carried forward.

5. Git / hardcode / boundaries

  • Git: branch main · HEAD e93424b5ff7fa5e4b8406131977ce4339cd0856a · git status --short -- iu-cutter = clean (0 lines) — no code change, no commit (none expected/authorized).
  • No fixed IP/DSN/password/container/vector-collection; no runtime label/key hardcoding; no new label columns by default; no schema change; SQL = SSOT.
  • Boundaries honoured: design only · no production writes · no CUT/VERIFY · no second IU · no bulk cut · no deploy/restart · no schema migration · no index DDL execution · no label registry creation · no vector/NoSQL integration · no alias writes · no code change · no commit · no self-advance.

6. Next

GPT review of the v0.5 pre-scale foundation package + rulings on the open decisions / cycle sequencing. Each downstream step (index-DDL execution, tier-normalisation write, dry-run-at-volume, any label registry DDL, any staged production) remains gated by a separate GPT review + (where applicable) sovereign authorization. Self-advance PROHIBITED.
