KB-6FA1
dot-iu-cutter v0.5 — Constitution source_document/version Identity Plan (deterministic, schema-accurate, NOT executed)
Revision 1
Tags: dot-iu-cutter, v0.5, constitution-fixture, source-document-version-plan, deterministic, checksum, versioning, design-only, not-executed, dieu44, 2026-05-18
dot-iu-cutter v0.5 — Constitution source_document / source_version Identity Plan
Phase: v0_5_constitution_fixture_source_grammar_ratification · Nature: design_only__no_INSERT__no_execution · Date: 2026-05-18
insert_executed: 0 ; dml: none ; this is a DETERMINISTIC PLAN, not a draft to run
targets_LIVE_schema (read-only confirmed 2026-05-18), not the 2026-05-17 design proposal
decision_authority: GPT / User ONLY ; self_advance: PROHIBITED
1. LIVE target schema (read-only confirmed — plan is schema-accurate)
cutter_governance.source_document_registry:
cols: source_document_ref(PK,NN), address_docprefix(NN,UNIQUE),
source_url(null), source_family(NN, FK->source_family_registry),
authority_class(NN), display_vi(null), display_en(null),
lifecycle(NN), registered_by(NN), registered_at(NN)
rows_now: 0
cutter_governance.source_document_version_registry:
cols: document_version_id(PK,NN), source_document_ref(NN, FK->source_document_registry),
content_checksum(NN), retrieval_timestamp(NN), source_format(null),
authoritative_version(null), version_status(null), provenance(jsonb,null),
registered_by(NN), registered_at(NN)
uq: (source_document_ref, content_checksum)
rows_now: 0
DESIGN-VS-LIVE DELTA (MISMATCH-5, flagged): the 2026-05-17 ingestion design
proposed columns NOT present live — human_aliases, expected_format,
parser_profile_ref, grammar_profile_ref (on source_document), and
raw_checksum / supersedes_version_id (on version). The LIVE schema is leaner;
this plan adapts to LIVE and routes the missing fields into provenance jsonb.
Grammar binding runs via source_family -> grammar_profile (NOT a column here).
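The MISMATCH-5 routing for the version table can be illustrated with a minimal Python sketch. Assumptions: the column sets are transcribed by hand from section 1, only the design-proposal extras named above are listed (not the full 2026-05-17 design), and `route_version_fields` is a hypothetical helper, not part of dot-iu-cutter. It keeps LIVE columns as columns and folds every other key into the `provenance` jsonb.

```python
# Column sets transcribed from section 1 (LIVE schema, read-only confirmed).
LIVE_DOC_COLS = {
    "source_document_ref", "address_docprefix", "source_url", "source_family",
    "authority_class", "display_vi", "display_en", "lifecycle",
    "registered_by", "registered_at",
}
LIVE_VERSION_COLS = {
    "document_version_id", "source_document_ref", "content_checksum",
    "retrieval_timestamp", "source_format", "authoritative_version",
    "version_status", "provenance", "registered_by", "registered_at",
}

# Only the design-only columns named in MISMATCH-5 (not the whole design).
DESIGN_DOC_EXTRAS = {"human_aliases", "expected_format",
                     "parser_profile_ref", "grammar_profile_ref"}
DESIGN_VERSION_EXTRAS = {"raw_checksum", "supersedes_version_id"}


def route_version_fields(record: dict) -> dict:
    """Hypothetical: keep LIVE columns, move every other key into provenance."""
    row = {k: v for k, v in record.items()
           if k in LIVE_VERSION_COLS and k != "provenance"}
    provenance = dict(record.get("provenance") or {})
    for key, value in record.items():
        if key not in LIVE_VERSION_COLS:
            provenance[key] = value
    row["provenance"] = provenance
    return row
```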
2. Proposed source_document identity (NOT inserted)
source_document_ref: "incomex-constitution" # stable logical key (canon/ingestion design)
address_docprefix: "ICX-CONST" # BR-A1 docprefix; UNIQUE-enforced live
source_url: "https://vps.incomexsaigoncorp.vn/knowledge/dev/laws/constitution"
source_family: "internal_incomex_constitution" # FK -> live; valid
authority_class: "authoritative" # doc = v4.6.3 BAN HÀNH (enacted/official)
display_vi: "Hiến pháp Kiến trúc Hệ thống Incomex"
display_en: "Internal Incomex Architecture Constitution"
lifecycle: "active"
grammar_binding (DERIVED, not a column): source_family internal_incomex_constitution
-> grammar_profile_ref incomex-architecture-constitution-v4 (live)
authority_semantics_default (DERIVED from source_family): normative_authority
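Both DERIVED fields are lookups keyed on `source_family` rather than stored columns. A minimal sketch, assuming in-memory dicts stand in for the live source_family and grammar_profile rows; `SOURCE_FAMILY_GRAMMAR`, `SOURCE_FAMILY_AUTHORITY_SEMANTICS` and `derive_binding` are hypothetical names, and only the single mapping quoted in this plan is included.

```python
# Hypothetical stand-ins for live rows; only the mapping quoted above.
SOURCE_FAMILY_GRAMMAR = {
    "internal_incomex_constitution": "incomex-architecture-constitution-v4",
}
SOURCE_FAMILY_AUTHORITY_SEMANTICS = {
    "internal_incomex_constitution": "normative_authority",
}


def derive_binding(source_document: dict) -> dict:
    """Derive grammar/authority from source_family; nothing is stored on the doc."""
    family = source_document["source_family"]
    return {
        "grammar_profile_ref": SOURCE_FAMILY_GRAMMAR[family],
        "authority_semantics_default": SOURCE_FAMILY_AUTHORITY_SEMANTICS[family],
    }
```

An unknown family raises `KeyError`, which loosely mirrors the live FK refusing an unregistered `source_family`.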
3. docprefix recommendation (BR-A1)
scheme (locked, address_template at.icx.const.v4): "<DOCPREFIX>/<L1>-<L2>-...-<Lk>"
docprefix: "ICX-CONST"
rationale:
- matches canonicalization design examples (ICX-CONST/NT-12, ICX-CONST/KT-A,
ICX-CONST/DIEU-44)
- guarantees no collision with legacy D38-DIEU28/32/35 addresses (distinct prefix)
- address_docprefix is UNIQUE in source_document_registry (1 doc -> 1 prefix)
- docprefix flows into iu_id => Constitution IU cannot collide with DIEU_28 IU
- encodes_status=false: status markers NEVER appear in the address (metadata only)
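The BR-A1 scheme can be illustrated with a small address builder. This is a hypothetical sketch: `make_address` is not a dot-iu-cutter function, each `-`-separated piece of the examples (DIEU-44, NT-12, KT-A) is assumed to be one level, and the rule that a level may not contain `-` or `/` is an added assumption to keep addresses unambiguous. The status-marker check reflects encodes_status=false.

```python
# encodes_status=false: these status tokens must never appear in an address.
STATUS_MARKERS = ("✅", "📋", "📝", "⛔")


def make_address(docprefix: str, levels: list) -> str:
    """Compose "<DOCPREFIX>/<L1>-<L2>-...-<Lk>" (hypothetical helper)."""
    if not levels:
        raise ValueError("an address needs at least one level segment")
    for seg in levels:
        # Assumption: segments may not contain the separators themselves.
        if not seg or "-" in seg or "/" in seg:
            raise ValueError(f"invalid level segment: {seg!r}")
        if any(marker in seg for marker in STATUS_MARKERS):
            raise ValueError("status markers are metadata, not address content")
    return f"{docprefix}/" + "-".join(levels)
```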
4. Checksum / versioning plan (deterministic — NOT executed; QG4)
raw_checksum:
rule: sha256(raw fetched bytes, pre-normalization)
observed_this_session (grounding ONLY, not identity, not persisted):
1186671 bytes ; sha256 d19679599e0794e8051b872009c40ba766f46ed44702797b40d3b4120e041b26
storage: NO raw_checksum column live -> record under
source_document_version_registry.provenance->>'raw_checksum' (forensic/audit)
content_checksum (the identity basis):
rule: sha256( normalize( strip_chrome( raw ) ) )
normalize: UTF-8 NFC ; strip BOM ; CRLF->LF ; collapse internal whitespace ;
PRESERVE Vietnamese diacritics AND status tokens (✅ 📋 📝 ⛔)
strip_chrome: remove Nuxt SPA chrome/script/style/nav (MISMATCH-1: platform is
Nuxt, not Directus — noise-strip ruleset belongs to a parser_profile, which
is a SEPARATE ingestion-design item; exact ruleset = OD-SR2, NOT decided here)
storage: source_document_version_registry.content_checksum (NOT NULL)
property: stable across cosmetic Nuxt re-render (raw churns, content stable)
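The two checksum rules can be sketched in Python. Assumptions, stated loudly: `strip_chrome` is a placeholder that returns its input unchanged because the Nuxt ruleset is still OD-SR2; "collapse internal whitespace" is read as collapsing runs of spaces/tabs within a line to a single space while keeping line breaks; and the fetched bytes are assumed to be UTF-8. NFC composes Vietnamese diacritics into single code points and leaves the status emoji untouched.

```python
import hashlib
import re
import unicodedata

# Runs of horizontal whitespace; newlines are deliberately excluded (assumption).
_HSPACE = re.compile(r"[ \t\f\v]+")


def strip_chrome(text: str) -> str:
    # PLACEHOLDER: the Nuxt chrome/script/style/nav ruleset is OD-SR2 and not
    # decided in this plan, so this sketch returns the text unchanged.
    return text


def normalize(text: str) -> str:
    text = text.removeprefix("\ufeff")         # strip BOM
    text = text.replace("\r\n", "\n")          # CRLF -> LF
    text = unicodedata.normalize("NFC", text)  # compose diacritics
    return "\n".join(_HSPACE.sub(" ", line) for line in text.split("\n"))


def raw_checksum(raw: bytes) -> str:
    # sha256 over the fetched bytes, pre-normalization (provenance only).
    return hashlib.sha256(raw).hexdigest()


def content_checksum(raw: bytes) -> str:
    # sha256(normalize(strip_chrome(raw))): the identity basis.
    text = raw.decode("utf-8")
    return hashlib.sha256(normalize(strip_chrome(text)).encode("utf-8")).hexdigest()
```

Whether the "stable across re-render" property holds in practice depends on the ruleset `strip_chrome` eventually implements; the sketch only shows that BOM, line-ending and whitespace churn changes raw_checksum but not content_checksum.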
document_version_id (deterministic PK):
rule: document_version_id = "icxconst-" || left( sha256_hex(
content_checksum || '|' || source_document_ref ), 32 )
-> pure function of (content_checksum, source_document_ref) [ingestion §5 / canon §5]
idempotency: live UNIQUE(source_document_ref, content_checksum) guarantees the
same content+ref cannot create a 2nd version row; same inputs => same id
change-handling: different content => different content_checksum => NEW
document_version_id; supersession recorded in
provenance->>'supersedes_version_id' (no supersedes column live — MISMATCH-5)
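The document_version_id rule and its idempotency can be sketched as below. Assumptions: `content_checksum` is concatenated as lowercase hex text and the joined string is UTF-8 encoded before hashing; `VersionRegistry` is a hypothetical in-memory stand-in for the live table that treats a repeated (ref, checksum) as a no-op, whereas the live UNIQUE constraint would reject a duplicate INSERT outright.

```python
import hashlib


def document_version_id(content_checksum: str, source_document_ref: str) -> str:
    # "icxconst-" || left(sha256_hex(content_checksum || '|' || source_document_ref), 32)
    joined = f"{content_checksum}|{source_document_ref}".encode("utf-8")
    return "icxconst-" + hashlib.sha256(joined).hexdigest()[:32]


class VersionRegistry:
    """Hypothetical in-memory stand-in for source_document_version_registry."""

    def __init__(self):
        self.rows = {}    # (source_document_ref, content_checksum) -> row dict
        self.latest = {}  # source_document_ref -> most recently registered id

    def register(self, ref: str, checksum: str) -> str:
        key = (ref, checksum)
        if key in self.rows:
            # UNIQUE(source_document_ref, content_checksum): no 2nd row, same id.
            return self.rows[key]["document_version_id"]
        vid = document_version_id(checksum, ref)
        provenance = {}
        if ref in self.latest:
            # No supersedes column live (MISMATCH-5): the chain lives in provenance.
            provenance["supersedes_version_id"] = self.latest[ref]
        self.rows[key] = {"document_version_id": vid, "provenance": provenance}
        self.latest[ref] = vid
        return vid
```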
version_status / authoritative_version:
version_status (proposed): "ready" only AFTER ratification + grammar amendment;
until then a (future) version row would be "fetched"/"normalized" (NOT cut)
authoritative_version: observed label "v4.6.3 BAN HÀNH" recorded as a LABEL in
this column / provenance — NEVER used as identity (identity = content_checksum)
living_document_note (MISMATCH-3): CHANGELOG now extends to 2026-05-18 while the
label stays v4.6.3; this confirms identity MUST be content_checksum, and each
re-ingest of changed content yields a new document_version_id + supersedes chain.
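The version_status gate reduces to a small check. A sketch, assuming the status vocabulary is exactly "fetched", "normalized" and "ready" (the plan names these three but does not fix the full set) and that `assign_version_status` is a hypothetical helper.

```python
# Assumed vocabulary: only the three statuses named in this plan.
VERSION_STATUSES = ("fetched", "normalized", "ready")


def assign_version_status(requested: str, ratified: bool, grammar_amended: bool) -> str:
    """Return the requested status if the plan's gate allows it, else raise."""
    if requested not in VERSION_STATUSES:
        raise ValueError(f"unknown version_status: {requested!r}")
    # "ready" only AFTER ratification AND the grammar amendment.
    if requested == "ready" and not (ratified and grammar_amended):
        raise ValueError("'ready' requires ratification and the grammar amendment")
    return requested
```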
5. source_span / provenance plan (design-only)
source_span: NOT physical in WS-Q5 substrate (ingestion-design proposal only).
plan: anchors emitted over NORMALIZED content (char_start/end + byte_start/end)
BEFORE canonicalization; span_checksum = sha256(exact substring).
status: source_span physicalization is a DEFERRED downstream gate (Q5/Q6) —
NOT created or written here.
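A sketch of the proposed anchors, assuming char offsets are Python string indices (Unicode code points) over the NORMALIZED text and byte offsets are UTF-8. `make_span` is a hypothetical helper. Because Vietnamese letters such as "Đ" and "ề" take 2 and 3 bytes, char and byte offsets diverge, which is why the plan records both.

```python
import hashlib


def make_span(normalized: str, char_start: int, char_end: int) -> dict:
    """Anchor [char_start, char_end) of NORMALIZED text (hypothetical helper)."""
    if not 0 <= char_start <= char_end <= len(normalized):
        raise ValueError("span offsets out of range")
    substring = normalized[char_start:char_end]
    # Byte offsets are the UTF-8 length of everything before / up to the span.
    byte_start = len(normalized[:char_start].encode("utf-8"))
    byte_end = byte_start + len(substring.encode("utf-8"))
    return {
        "char_start": char_start,
        "char_end": char_end,
        "byte_start": byte_start,
        "byte_end": byte_end,
        # span_checksum = sha256(exact substring)
        "span_checksum": hashlib.sha256(substring.encode("utf-8")).hexdigest(),
    }
```

For example, in "Điều 44" the characters 5..7 ("44") sit at bytes 8..10.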
iu_provenance (master-plan A4/P5, for the FUTURE cut, design-only):
every IU must carry { document_version_id, source_span.span_id, node_path,
span_checksum_at_cut }; an IU with no resolvable span is INVALID at REVIEW.
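A REVIEW-time check for the iu_provenance rule might look like the sketch below. It is illustrative only: `span_id` is flattened from source_span.span_id, `spans` is a hypothetical map of span rows, and the version-match and checksum-drift checks are assumptions layered on the plan's stated rule that the span must resolve.

```python
REQUIRED_IU_FIELDS = ("document_version_id", "span_id", "node_path", "span_checksum_at_cut")


def review_iu(iu: dict, spans: dict) -> list:
    """Return REVIEW errors for one IU; an empty list means it passes."""
    errors = [f"missing {name}" for name in REQUIRED_IU_FIELDS if not iu.get(name)]
    span = spans.get(iu.get("span_id"))
    if span is None:
        # Plan rule: an IU with no resolvable span is INVALID at REVIEW.
        errors.append("span not resolvable")
    else:
        # Assumed extra checks: same version, and span text unchanged since cut.
        if span["document_version_id"] != iu.get("document_version_id"):
            errors.append("span belongs to a different document_version_id")
        if span["span_checksum"] != iu.get("span_checksum_at_cut"):
            errors.append("span_checksum differs from span_checksum_at_cut")
    return errors
```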
version_registry.provenance jsonb (proposed contents, NOT written):
{ raw_checksum, raw_bytes, http_status, content_type, server, retrieved_at,
source_format:"html/nuxt", normalize_profile_ref, supersedes_version_id,
observed_title, observed_version_label } — no secrets.
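One way to hold the jsonb to the proposed key set, and so keep secrets such as cookies or auth headers out of it, is a whitelist. A sketch, assuming `build_provenance` as a hypothetical helper and sorted keys for deterministic text; PostgreSQL jsonb does not preserve key order in any case.

```python
import json

# The proposed provenance keys listed above; anything else is rejected.
PROVENANCE_KEYS = frozenset({
    "raw_checksum", "raw_bytes", "http_status", "content_type", "server",
    "retrieved_at", "source_format", "normalize_profile_ref",
    "supersedes_version_id", "observed_title", "observed_version_label",
})


def build_provenance(fields: dict) -> str:
    """Serialize provenance, refusing any key outside the proposed set."""
    unknown = set(fields) - PROVENANCE_KEYS
    if unknown:
        raise ValueError(f"keys outside the proposed provenance set: {sorted(unknown)}")
    # ensure_ascii=False keeps labels such as "v4.6.3 BAN HÀNH" readable.
    return json.dumps(fields, sort_keys=True, ensure_ascii=False)
```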
6. Explicit no-INSERT statement
No row authored for source_document_registry or source_document_version_registry. No SQL drafted to run. This file is a deterministic identity rule specification to be turned into a seed-authoring package only in a separate, later, sovereign-gated phase after the grammar status-marker amendment and OD-S1/OD-G2 rulings. No INSERT, no DML, no schema change executed.
7. Statements
- Identity plan is deterministic and reproducible from inputs (QG4) and accurate to the LIVE schema (MISMATCH-5 adaptations flagged). Nothing executed (QG6).
- docprefix ICX-CONST under locked BR-A1 scheme; source_family/grammar/authority mappings traced to live rows; checksum/version rule = f(content_checksum, source_document_ref); raw observation recorded as grounding only.
- Self-advance PROHIBITED (doc 3 of 5); STOP after package → route GPT/User.
Companion: grounding-report, grammar-applicability-review, status-marker-and-scope-ruling-request, ratification-readiness-report.