dot-iu-cutter v0.5 — Constitution Hardtest & Information-Unit Factory Master Plan (DESIGN ONLY) (2026-05-17)
Date: 2026-05-17
Phase: v0_5_constitution_hardtest_and_information_unit_factory_master_plan
Nature: DESIGN ONLY — nothing in this package is authorized to execute.
Author: Agent (grounded read-only on KB + production metadata + source URL HEAD/GET only)
0. Purpose
Design the end-to-end production information-unit factory so that a future user
request "Cắt hiến pháp" ("cut the Constitution") can run safely and automatically,
with provenance, checksum/versioning, checkpoint/resume, dry-run-at-volume, and staged rollout.
The Constitution document is the hardtest fixture, not the goal. The goal is
a general, config-driven information-unit pipeline that scales to
hundreds-of-thousands → millions of information units (IU).
This document is the spine. The other 7 documents in
knowledge/dev/laws/dieu44-trien-khai/v0.5-constitution-hardtest-design/
expand each layer.
1. CRITICAL GROUNDING FINDING — source is not the national Constitution
A read-only GET of the configured source URL
https://vps.incomexsaigoncorp.vn/knowledge/dev/laws/constitution returns:
source_actual_identity:
  title: "Hiến pháp Kiến trúc Hệ thống Incomex v4.6.3 BAN HÀNH"
  kb_identifier: KB-7294 rev 44
  version: v4.6.3 (enacted)
  last_update: 2026-04-18 (S178 Fix 15)
  language: Vietnamese + English technical terms
  format: HTML rendered by Directus portal (markdown source w/ tables, nested lists, code citations)
  hierarchy_actual:
    - "15 Nguyên tắc Nền tảng (Principles 1..15)"
    - "KIẾN TRÚC HẠ TẦNG (sections A, B, C)"
    - "Mục lục Luật (Điều 0..44; each Điều has Tên / File / Ghi chú)"
  hierarchy_NOT_present:
    - "Chương (national-constitution chapters)"
    - "Khoản / Điểm canonical clause-point tree"
  status_markers: "mix of ✅ ENACTED and 📋 CONTROLLED DRAFT (e.g. Điều 44)"
Consequence: the handoff and earlier reviews assumed a
Chương / Điều / Khoản / Điểm / đoạn grammar (chapter / article / clause / point /
paragraph, the shape of the 2013 national Constitution). The configured fixture
does not have that shape. This is the single most important reason the
canonicalization layer must be grammar-detected and config-driven, never
hardcoded. It is escalated as open decision OD-G1 (see §9 and the
canonicalization design doc) and as risk R-SRC-1 (risk doc).
This finding does not block the master plan. It validates the plan's core principle (no hardcoded grammar, labels, or paths) and changes which grammar profile the Constitution hardtest must ship.
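To make "grammar-detected, never hardcoded" concrete, the following is a minimal sketch of profile detection by marker evidence. It assumes a registry of grammar profiles expressed as required and forbidden regex markers. The profile names, the marker patterns, and the `detect_profile` helper are hypothetical illustrations; the real registry shape is left to the canonicalization design doc.

```python
import re
import unicodedata
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GrammarProfile:
    name: str
    required: tuple        # regexes that must all match somewhere in the text
    forbidden: tuple = ()  # regexes that must not match anywhere


# Hypothetical registry rows; in the design these come from config, not code.
PROFILES = (
    GrammarProfile(
        name="national_constitution",
        required=(r"(?m)^\s*Chương\s+[IVXLC]+", r"(?m)^\s*Điều\s+\d+"),
    ),
    GrammarProfile(
        name="incomex_architecture_constitution",
        required=(r"Nguyên tắc", r"KIẾN TRÚC HẠ TẦNG", r"Điều\s+\d+"),
        forbidden=(r"(?m)^\s*Chương\s+[IVXLC]+",),
    ),
)


def _nfc(value: str) -> str:
    # Vietnamese text may arrive composed or decomposed; compare in NFC.
    return unicodedata.normalize("NFC", value)


def detect_profile(text: str, profiles=PROFILES) -> Optional[str]:
    """Return the single matching profile name, or None when zero or
    several profiles match (the caller routes None to human review)."""
    text = _nfc(text)
    hits = [
        p.name
        for p in profiles
        if all(re.search(_nfc(rx), text) for rx in p.required)
        and not any(re.search(_nfc(rx), text) for rx in p.forbidden)
    ]
    return hits[0] if len(hits) == 1 else None
```

Returning `None` instead of a best guess mirrors the "no silent guessing" rule: an undetected or ambiguous document goes to review and is not cut.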
2. What happens, end to end, when the user says Cắt hiến pháp
Target future behaviour (designed here, NOT enabled now):
flow_cat_hien_phap:
  S0_intent_resolution:
    - resolve the human phrase "Cắt hiến pháp" to a registered source_document_ref
      via the source-document registry (NO hardcoded URL/path in runtime)
    - reject if source_document_ref not registered/authorized
  S1_ingestion:
    - fetch source by registered URL (read-only)
    - identify format + parser_profile
    - compute content checksum
    - normalize encoding, strip non-content noise
    - emit immutable document_version_id (deterministic from checksum+url+retrieved_at)
    - persist source span anchors
  S2_canonicalization:
    - detect grammar profile for this document_version_id
    - extract hierarchy per detected profile (here: Nguyên tắc / Kiến trúc A·B·C / Điều)
    - derive stable canonical_address per node
    - derive stable IU id / entry id (deterministic)
    - link every IU to its source span (provenance)
    - route ambiguous segments to a review queue (no silent guessing)
  S3_plan:
    - produce per-IU MARK plan against decision_backlog_entry
    - compute expected row delta (+15 per IU under current per-IU manifest model)
  S4_dry_run_at_volume:
    - run the full plan in an isolated restored-schema DB
    - assert invariants, performance, checkpoint/resume, no duplicate cuts
  S5_staged_production:
    - human/GPT sovereign approval gate
    - execute in small bounded batches with checkpoint/resume
    - forward-compensation only; NO document-wide delete rollback
  S6_projection:
    - rebuild SQL→vector/NoSQL projections (rebuildable, non-authoritative)
  S7_closeout:
    - backup + restore verification + provenance audit + report
Each stage maps to one design document in this package (§4 map).
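The S1 identity step can be sketched as below: a minimal illustration, assuming SHA-256 for both the content checksum and the version id. The function names, the `docv-` prefix, and the field separator are hypothetical. S1 lists `retrieved_at` among the inputs, while acceptance criterion A2 requires that re-ingesting an identical source yields an identical document_version_id; this sketch therefore assumes `retrieved_at` is recorded as metadata and not hashed. That is an assumption of the example, not a ruling.

```python
import hashlib


def content_checksum(normalized_body: bytes) -> str:
    """SHA-256 over the already-normalized document body."""
    return hashlib.sha256(normalized_body).hexdigest()


def document_version_id(source_url: str, checksum: str) -> str:
    """Deterministic id: the same URL and checksum always give the same id.

    Assumption: retrieved_at is stored next to the id, not hashed into it,
    so that two fetches of unchanged content map to one version.
    """
    # \x1f (unit separator) keeps the two fields from running together.
    material = "\x1f".join((source_url, checksum)).encode("utf-8")
    return "docv-" + hashlib.sha256(material).hexdigest()[:32]
```

A changed body changes the checksum and therefore the version id, so an edited source never silently reuses an old version.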
3. Current production reality (grounded, read-only)
schema:
  governance_schema: cutter_governance (PostgreSQL, directus DB)
  base_tables: 12
  primary_keys: 12
  in_schema_FKs: 19
  observe_views: 12 (v_*_observe)
  roles: { read: cutter_ro, writers: [cutter_exec, cutter_verify] }
code:
  ssot: /opt/incomex/dot (VPS authoritative; local copies are non-git snapshots)
  branch: main
  HEAD: e93424b5ff7fa5e4b8406131977ce4339cd0856a # == accepted HEAD in handoff
  git_status_iu_cutter: clean (no pending changes)
validated_runtime:
  pg_backed_dry_run_RERUN4: PASS
  happy_path: MARK -> SWEEP -> REVIEW -> CUT -> VERIFY
  per_IU_row_delta_invariant: 15
first_production_trial:
  status: CLOSED_PASS
  target_IU: 04e0c674-2a71-53b7-8d30-9c1a78d6fd17
  canonical_address: D38-DIEU28-S3-P1
  tac_logical_unit_row: not mutated (append-only ledger only)
existing_corpus_in_system:
  DIEU_28: 27 rows
  DIEU_32: 23 rows # blank tier (data-quality issue)
  DIEU_35: 36 rows # blank tier (data-quality issue)
  hien_phap_in_system: false # must be ingested from external authoritative source
index_state:
  pre_scale_index_DDL: AUTHORED + GPT-PASS, EXECUTION DEFERRED (7 hot paths, D-2 partial ruling)
The core ledger is correct for single-IU append-only CUT/VERIFY. It is not yet complete as a production information-unit factory. Missing layers: source registry, ingestion profiles, canonical grammar, label/metadata registry, scale indexes (authored, not executed), volume harness, source versioning/provenance.
4. Layer map → design documents
documents:
  master_plan: this document (spine, sequencing, governance)
  ingestion: dot-iu-cutter-v0.5-source-document-ingestion-pipeline-design-2026-05-17.md
  canonicalization: dot-iu-cutter-v0.5-canonicalization-and-address-grammar-design-2026-05-17.md
  label_metadata: dot-iu-cutter-v0.5-information-unit-label-metadata-registry-master-design-2026-05-17.md
  scale_index_volume: dot-iu-cutter-v0.5-scale-index-and-volume-execution-roadmap-2026-05-17.md
  projection: dot-iu-cutter-v0.5-sql-nosql-projection-and-rebuild-strategy-2026-05-17.md
  risk_gate: dot-iu-cutter-v0.5-constitution-hardtest-risk-and-gate-plan-2026-05-17.md
  report: dot-iu-cutter-v0.5-constitution-hardtest-design-report-2026-05-17.md
5. Architecture principles (binding)
principles:
  P1_SQL_is_SSOT: cutter_governance + public.tac_logical_unit are sole authority
  P2_append_only_ledger: CUT/VERIFY never mutate source IU rows; forward-compensation only
  P3_no_hardcoding:
    - no runtime hardcoded source URL/path (registry-resolved)
    - no runtime hardcoded label values (label registry)
    - no runtime hardcoded metadata keys (metadata-key registry)
    - no runtime hardcoded grammar (grammar profile registry)
  P4_deterministic_identity: canonical_address, IU id, entry id, document_version_id
    all deterministic + reproducible from inputs
  P5_provenance_total: every IU traces to (document_version_id, source_span)
  P6_projection_only: vector/NoSQL is a rebuildable projection, never authority
  P7_jsonb_not_hidden_authority: JSONB allowed for sparse evolving metadata,
    but hot/queried keys must be promoted to indexed SQL/registry
  P8_no_big_bang: full-document/Hiến-pháp cut is always dry-run-at-volume first,
    then staged small-batch with checkpoint/resume
  P9_idempotent_resumable: re-running a batch must produce delta-0 (no duplicate cuts)
  P10_sovereign_gates: every state-changing transition needs explicit GPT/User approval
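P3 together with the S0 intent step can be illustrated with a registry lookup that fails closed. This is a minimal sketch under stated assumptions: the `SourceDocument` record, the alias table, the `incomex-constitution` ref, and the error type are all hypothetical, and an in-memory dict stands in for whatever the source registry becomes.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceDocument:
    ref: str
    url: str
    authorized: bool


class UnregisteredSourceError(LookupError):
    """The phrase does not resolve to any registered source document."""


def normalize_phrase(phrase: str) -> str:
    # Compare user phrases case-insensitively and in NFC form.
    return unicodedata.normalize("NFC", phrase).strip().casefold()


def resolve_intent(phrase: str, aliases: dict, registry: dict) -> SourceDocument:
    """Map a human phrase to a registered, authorized source document.

    aliases:  normalized phrase -> source_document_ref (hypothetical table)
    registry: source_document_ref -> SourceDocument   (hypothetical table)
    No URL appears in this function; it only ever comes from the registry.
    """
    ref = aliases.get(normalize_phrase(phrase))
    if ref is None or ref not in registry:
        raise UnregisteredSourceError(phrase)
    doc = registry[ref]
    if not doc.authorized:
        raise PermissionError(f"{ref} is registered but not authorized")
    return doc
```

Both failure modes raise instead of falling back to a default, which matches the S0 rule "reject if source_document_ref not registered/authorized".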
6. Master sequencing (supersedes nothing; refines foundation review sequence)
The foundation review accepted a 7-step sequence. This master plan keeps it and inserts the source-identity correction and explicit gates:
sequence:
  Q0_master_plan_review: GPT review of THIS package (design only) # current
  Q1_index_ddl_dry_run_then_command_review: execute 7 indexes in isolated env first
  Q2_index_production_execution_if_PASS: separate sovereign cycle
  Q3_dry_run_at_volume: existing 3-doc corpus and/or synthetic doc, restored schema
  Q4_tier_normalization_if_needed: DIEU_32 / DIEU_35 blank-tier read-review-write cycle
  Q5_label_metadata_registry_design_cycle: schema design (still no creation)
  Q6_source_registry_and_ingestion_design_cycle: source authority + profiles
  Q7_canonical_grammar_profile_for_incomex_constitution: detect+validate (no cut)
  Q8_hien_phap_dry_run_at_volume: full Constitution in isolated env
  Q9_hien_phap_staged_production_small_batch: bounded, checkpoint/resume, sovereign
Note: Cắt hiến pháp is enabled only after Q9, and only per-batch.
7. Hardtest acceptance criteria (what "the factory works" means)
acceptance:
  A1: user phrase resolves to registered source_document_ref with checksum+version
  A2: re-ingest of identical source yields identical document_version_id (determinism)
  A3: canonical_address + IU id stable across re-runs and across re-ingest of same version
  A4: every IU has non-null source_span provenance to document_version_id
  A5: dry-run-at-volume reproduces row-delta invariant (+15/IU or revised) with delta-0 on rerun
  A6: checkpoint/resume: interrupting mid-batch and resuming yields no duplicate cuts
  A7: no hardcoded URL/label/metadata-key/grammar found in runtime code path (design-asserted)
  A8: SQL SSOT unchanged by projection rebuild; projection fully reconstructable from SQL
  A9: staged rollout bounded; no document-wide delete; forward-compensation auditable
  A10: existing DIEU_28/32/35 corpus coexists without identity collision
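Criteria A5 and A6 can be exercised with a toy harness like the one below. It is a minimal in-memory sketch, not the real runtime: a dict stands in for the append-only ledger, another dict for the checkpoint, and `run_batches` with its parameters is a hypothetical helper. The only figure taken from this plan is the +15 rows per IU.

```python
ROWS_PER_IU = 15  # per-IU row-delta invariant stated in this plan


def run_batches(iu_ids, ledger, checkpoint, batch_size, max_batches=None):
    """Cut IUs in bounded batches and return the rows written by this call.

    ledger:      iu_id -> rows written (stand-in for the append-only ledger)
    checkpoint:  {"next_index": int}, advanced after each completed batch
    max_batches: stop early after this many batches (simulated interruption)
    """
    rows_written = 0
    batches_done = 0
    start = checkpoint.get("next_index", 0)
    for i in range(start, len(iu_ids), batch_size):
        if max_batches is not None and batches_done >= max_batches:
            break
        for iu in iu_ids[i:i + batch_size]:
            if iu in ledger:  # already cut: skip, never write a duplicate
                continue
            ledger[iu] = ROWS_PER_IU
            rows_written += ROWS_PER_IU
        checkpoint["next_index"] = min(i + batch_size, len(iu_ids))
        batches_done += 1
    return rows_written
```

The ledger membership check is what makes a rerun delta-0 even if the checkpoint is lost or stale; the checkpoint only saves work, it is not the safety mechanism.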
8. Merge / coexistence with existing corpus
coexistence:
  store: SAME SQL SSOT (public.tac_logical_unit + cutter_governance); no new store
  isolation: Constitution IUs get a distinct source_document_ref + document_version_id
  address_namespace: canonical_address MUST be globally unique across documents;
    proposal = document-scoped prefix derived from source_document_ref
    (decision OD-A1 in canonicalization doc)
  collision_guard: deterministic IU id derived from (document_version_id, canonical_address)
    so DIEU_28 vs Constitution cannot collide
  tier_quality: DIEU_32/DIEU_35 blank-tier normalization is a SEPARATE cycle (Q4),
    not bundled into Constitution cut
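The collision guard can be sketched with a name-based UUID. The trial IU id in §3 has version nibble 5, which is consistent with UUIDv5, but the production namespace and name format are not stated in this document; the namespace, the separator, the `ref:address` prefix form, and the sample refs below are all hypothetical.

```python
import uuid

# Hypothetical namespace; the production namespace is not stated in this plan.
IU_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/iu-cutter")


def scoped_address(source_document_ref: str, local_address: str) -> str:
    """Document-scoped canonical address (the OD-A1 proposal, still undecided)."""
    return f"{source_document_ref}:{local_address}"


def iu_id(document_version_id: str, canonical_address: str) -> uuid.UUID:
    """Deterministic IU id from (document_version_id, canonical_address)."""
    name = f"{document_version_id}\x1f{canonical_address}"
    return uuid.uuid5(IU_NAMESPACE, name)
```

Because the document version is part of the hashed name, two documents that both contain a node addressed as an Điều 28 still produce different ids, and re-deriving an id from the same pair always gives the same value.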
9. Open decisions for GPT / User (consolidated; details in sub-docs)
open_decisions:
  OD-G1: source is internal Incomex Architecture Constitution, NOT national 2013
    Constitution; confirm grammar profile target (Nguyên tắc/Kiến trúc/Điều)
    and that Chương/Khoản/Điểm tree is NOT expected for this fixture
  OD-G2: leaf-IU definition for this document (Điều-level? sub-bullet-level?
    status-marker-aware?); affects volume estimate and +15 invariant scaling
  OD-A1: canonical_address namespacing across documents (document-scoped prefix?)
  OD-S1: source authority: is KB-7294 rev44 authoritative enough to cut, or does
    it need an enacted-only snapshot gate (exclude 📋 CONTROLLED DRAFT Điều 44)?
  OD-V1: dry-run-at-volume fixture: existing 3-doc corpus vs synthetic vs Constitution
  OD-M1: manifest strategy: keep per-IU envelope (+15 invariant) vs document-level
  OD-L1..L5: label/metadata registry shape, cardinality, mutability, hot-key promotion
  OD-P1: projection store choice + rebuild trigger model (deferred, projection doc)
  OD-R1: forward-compensation record shape for multi-IU document corrections
  OD-I1: index execution route (Route A continue) vs hold until master plan ratified
GPT/User must rule on OD-G1, OD-S1, OD-M1 before any Constitution dry-run.
10. Do not run yet (binding for this and all sibling docs)
The following are forbidden until separately and explicitly authorized:
forbidden:
  - Cắt hiến pháp execution (any batch)
  - full_document_CUT_VERIFY
  - second_production_IU
  - bulk_cut
  - production_reclassification_batch
  - production_write of any kind
  - schema_migration
  - index_DDL_execution (the 7 authored indexes remain unexecuted)
  - label_registry_schema_creation (no real label/metadata-key registry tables)
  - tier_normalization_write (DIEU_32/35)
  - vector_NoSQL_integration / any vector or NoSQL write
  - alias_writes (canonical_address_alias)
  - deploy_or_restart of any service
  - code_change without an explicitly opened code phase
  - dry_run_at_volume execution (design only here)
Read-only allowed and used: KB read, production metadata read-only, single
read-only git status on VPS SSOT, single read-only GET of the source URL HEAD/body
for grammar grounding. No secrets were read.
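The rule above is deny-by-default: only the listed read-only actions are allowed, and everything else needs a separate, explicit authorization. A minimal sketch of that check follows. The read-only action labels and the `is_permitted` helper are hypothetical names for illustration; the forbidden labels used in the usage lines are taken from the list above.

```python
# Hypothetical labels for the read-only actions described in this section.
READ_ONLY_ALLOWED = frozenset({
    "kb_read",
    "production_metadata_read",
    "git_status_read",
    "source_url_get",
})


def is_permitted(action: str, explicitly_authorized=frozenset()) -> bool:
    """Deny by default: an action passes only if it is read-only or has
    been separately and explicitly authorized for this cycle."""
    return action in READ_ONLY_ALLOWED or action in explicitly_authorized


# Usage: nothing state-changing passes without an explicit grant.
print(is_permitted("kb_read"))                                   # True
print(is_permitted("schema_migration"))                          # False
print(is_permitted("index_DDL_execution", {"index_DDL_execution"}))  # True
```

An allowlist is used here instead of a blocklist so that an action nobody thought to forbid is still rejected.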
11. Git status (no code change expected, no commit made)
git:
  repo_ssot: /opt/incomex/dot (VPS; local ~/.iu-cutter-stage and ~/iu-cutter-build are non-git snapshots)
  branch: main
  HEAD: e93424b5ff7fa5e4b8406131977ce4339cd0856a
  status_short_iu_cutter: (empty; clean)
  code_changed: false
  commit_made: false
12. Routing
Next action = GPT review of this design package. Agent self-advance to any execution, index DDL, dry-run, registry creation, or Constitution cut is PROHIBITED. Recommended first downstream step after PASS: Q1 (index DDL dry-run then command review), unless GPT re-prioritizes Q6/Q7 source+grammar work given the OD-G1 finding.