KB-6FA1

dot-iu-cutter v0.5 — Constitution source_document/version Identity Plan (deterministic, schema-accurate, NOT executed)

Revision 1
Tags: dot-iu-cutter, v0.5, constitution-fixture, source-document-version-plan, deterministic, checksum, versioning, design-only, not-executed, dieu44, 2026-05-18

Phase: v0_5_constitution_fixture_source_grammar_ratification · Nature: design_only__no_INSERT__no_execution · Date: 2026-05-18

insert_executed: 0 ; dml: none ; this is a DETERMINISTIC PLAN, not a draft to run
targets_LIVE_schema (read-only confirmed 2026-05-18), not the 2026-05-17 design proposal
decision_authority: GPT / User ONLY ; self_advance: PROHIBITED

1. LIVE target schema (read-only confirmed — plan is schema-accurate)

cutter_governance.source_document_registry:
  cols: source_document_ref(PK,NN), address_docprefix(NN,UNIQUE),
        source_url(null), source_family(NN, FK->source_family_registry),
        authority_class(NN), display_vi(null), display_en(null),
        lifecycle(NN), registered_by(NN), registered_at(NN)
  rows_now: 0
cutter_governance.source_document_version_registry:
  cols: document_version_id(PK,NN), source_document_ref(NN, FK->source_document_registry),
        content_checksum(NN), retrieval_timestamp(NN), source_format(null),
        authoritative_version(null), version_status(null), provenance(jsonb,null),
        registered_by(NN), registered_at(NN)
  uq: (source_document_ref, content_checksum)
  rows_now: 0
DESIGN-VS-LIVE DELTA (MISMATCH-5, flagged): the 2026-05-17 ingestion design
  proposed columns NOT present live — human_aliases, expected_format,
  parser_profile_ref, grammar_profile_ref (on source_document), and
  raw_checksum / supersedes_version_id (on version). The LIVE schema is leaner;
  this plan adapts to LIVE and routes the missing fields into provenance jsonb.
  grammar binding is via source_family -> grammar_profile (NOT a column here).
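The version-side part of this adaptation can be sketched in Python. This is a minimal illustration and touches no database. It assumes only the routing this plan states explicitly (raw_checksum and supersedes_version_id move into provenance); the function name adapt_version_record and the dict-based record shape are hypothetical.

```python
# Columns of source_document_version_registry as listed in section 1.
LIVE_VERSION_COLUMNS = frozenset({
    "document_version_id", "source_document_ref", "content_checksum",
    "retrieval_timestamp", "source_format", "authoritative_version",
    "version_status", "provenance", "registered_by", "registered_at",
})

# Design-proposal version fields with no live column (MISMATCH-5).
ROUTED_TO_PROVENANCE = ("raw_checksum", "supersedes_version_id")


def adapt_version_record(design_record: dict) -> dict:
    """Return a copy shaped for the LIVE schema (hypothetical helper)."""
    record = dict(design_record)
    provenance = dict(record.pop("provenance", None) or {})
    for field in ROUTED_TO_PROVENANCE:
        if field in record:
            provenance[field] = record.pop(field)
    unknown = set(record) - LIVE_VERSION_COLUMNS
    if unknown:
        # Anything else has no live home and is rejected, not silently dropped.
        raise ValueError(f"fields with no live column: {sorted(unknown)}")
    record["provenance"] = provenance
    return record
```

Rejecting unknown fields rather than dropping them keeps a design-vs-live mismatch visible at authoring time.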

2. Proposed source_document identity (NOT inserted)

source_document_ref: "incomex-constitution"       # stable logical key (canon/ingestion design)
address_docprefix:   "ICX-CONST"                  # BR-A1 docprefix; UNIQUE-enforced live
source_url:          "https://vps.incomexsaigoncorp.vn/knowledge/dev/laws/constitution"
source_family:       "internal_incomex_constitution"   # FK -> live; valid
authority_class:     "authoritative"              # doc = v4.6.3 BAN HÀNH (enacted/official)
display_vi:          "Hiến pháp Kiến trúc Hệ thống Incomex"
display_en:          "Internal Incomex Architecture Constitution"
lifecycle:           "active"
grammar_binding (DERIVED, not a column): source_family internal_incomex_constitution
  -> grammar_profile_ref incomex-architecture-constitution-v4 (live)
authority_semantics_default (DERIVED from source_family): normative_authority
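As a sanity check, the proposed values can be compared against the NOT NULL columns listed in section 1. The sketch below is illustrative only and writes nothing. The helper name missing_not_null is hypothetical, and it assumes registered_by and registered_at, which this section does not propose, would be supplied at seed-authoring time.

```python
# NOT NULL columns of source_document_registry, per section 1.
SOURCE_DOCUMENT_NOT_NULL = (
    "source_document_ref", "address_docprefix", "source_family",
    "authority_class", "lifecycle", "registered_by", "registered_at",
)

# Values proposed in this section (NOT inserted anywhere).
PROPOSED_SOURCE_DOCUMENT = {
    "source_document_ref": "incomex-constitution",
    "address_docprefix": "ICX-CONST",
    "source_url": "https://vps.incomexsaigoncorp.vn/knowledge/dev/laws/constitution",
    "source_family": "internal_incomex_constitution",
    "authority_class": "authoritative",
    "display_vi": "Hiến pháp Kiến trúc Hệ thống Incomex",
    "display_en": "Internal Incomex Architecture Constitution",
    "lifecycle": "active",
}


def missing_not_null(row: dict) -> list[str]:
    """List NOT NULL columns that the row leaves unset (hypothetical helper)."""
    return [col for col in SOURCE_DOCUMENT_NOT_NULL if row.get(col) is None]


# Only the two registration columns remain to be filled later.
print(missing_not_null(PROPOSED_SOURCE_DOCUMENT))
```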

3. docprefix recommendation (BR-A1)

scheme (locked, address_template at.icx.const.v4): "<DOCPREFIX>/<L1>-<L2>-...-<Lk>"
docprefix: "ICX-CONST"
rationale:
  - matches canonicalization design examples (ICX-CONST/NT-12, ICX-CONST/KT-A,
    ICX-CONST/DIEU-44)
  - guarantees no collision with legacy D38-DIEU28/32/35 addresses (distinct prefix)
  - address_docprefix is UNIQUE in source_document_registry (1 doc -> 1 prefix)
  - docprefix flows into iu_id => Constitution IU cannot collide with DIEU_28 IU
  - encodes_status=false: status markers NEVER appear in the address (metadata only)
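The address scheme above can be sketched as a small builder. This is an illustration under assumptions: the function name build_address is hypothetical, each level is passed as a separate string, and the status tokens rejected are the four markers named in section 4.

```python
# Status markers named in this plan; they must never appear in an address.
STATUS_TOKENS = ("✅", "📋", "📝", "⛔")


def build_address(docprefix: str, levels: list[str]) -> str:
    """Compose "<DOCPREFIX>/<L1>-<L2>-...-<Lk>" (hypothetical helper)."""
    if not docprefix or not levels:
        raise ValueError("docprefix and at least one level are required")
    for level in levels:
        if any(token in level for token in STATUS_TOKENS):
            # encodes_status=false: status is metadata, not part of the address.
            raise ValueError(f"status marker found in level {level!r}")
    return f"{docprefix}/" + "-".join(levels)


print(build_address("ICX-CONST", ["DIEU", "44"]))  # ICX-CONST/DIEU-44
```

Because the docprefix is the first path segment, an address built this way cannot equal a legacy D38-DIEU28 style address.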

4. Checksum / versioning plan (deterministic — NOT executed; QG4)

raw_checksum:
  rule: sha256(raw fetched bytes, pre-normalization)
  observed_this_session (grounding ONLY, not identity, not persisted):
    1186671 bytes ; sha256 d19679599e0794e8051b872009c40ba766f46ed44702797b40d3b4120e041b26
  storage: NO raw_checksum column live -> record under
    source_document_version_registry.provenance->>'raw_checksum' (forensic/audit)
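A minimal sketch of the raw_checksum rule, assuming the raw fetched bytes are already in memory. The function name raw_fingerprint is hypothetical; the two keys match the proposed provenance contents in section 5.

```python
import hashlib


def raw_fingerprint(raw: bytes) -> dict:
    """sha256 over the raw fetched bytes, before any normalization."""
    return {
        "raw_checksum": hashlib.sha256(raw).hexdigest(),
        "raw_bytes": len(raw),
    }
```

The returned dict is meant for the provenance jsonb only. It is forensic data and plays no part in identity.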

content_checksum (the identity basis):
  rule: sha256( normalize( strip_chrome( raw ) ) )
  normalize: UTF-8 NFC ; strip BOM ; CRLF->LF ; collapse internal whitespace ;
    PRESERVE Vietnamese diacritics AND status tokens (✅ 📋 📝 ⛔)
  strip_chrome: remove Nuxt SPA chrome/script/style/nav (MISMATCH-1: platform is
    Nuxt, not Directus — noise-strip ruleset belongs to a parser_profile, which
    is a SEPARATE ingestion-design item; exact ruleset = OD-SR2, NOT decided here)
  storage: source_document_version_registry.content_checksum (NOT NULL)
  property: stable across cosmetic Nuxt re-render (raw churns, content stable)
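The content_checksum rule can be sketched as follows. Several points are assumptions, not decisions of this plan: strip_chrome is left as a caller-supplied function because its ruleset (OD-SR2) is undecided, "collapse internal whitespace" is read as collapsing runs of spaces and tabs to one space while keeping line breaks, and the function names are hypothetical.

```python
import hashlib
import re
import unicodedata
from typing import Callable


def normalize(text: str) -> str:
    """UTF-8 text -> normalized form (one interpretation of the rule above)."""
    text = text.lstrip("\ufeff")                # strip BOM
    text = unicodedata.normalize("NFC", text)   # keeps diacritics and status tokens
    text = text.replace("\r\n", "\n")           # CRLF -> LF
    return re.sub(r"[ \t]+", " ", text)         # ASSUMPTION: collapse spaces/tabs only


def content_checksum(raw: bytes,
                     strip_chrome: Callable[[str], str] = lambda s: s) -> str:
    """sha256(normalize(strip_chrome(raw))); strip_chrome defaults to a no-op."""
    text = raw.decode("utf-8")
    return hashlib.sha256(normalize(strip_chrome(text)).encode("utf-8")).hexdigest()


# Cosmetic differences (BOM, CRLF, NFD form, doubled space) give the same checksum.
plain = "Hiến pháp ✅\nĐiều 44".encode("utf-8")
noisy = ("\ufeff" + unicodedata.normalize("NFD", "Hiến  pháp ✅\r\nĐiều 44")).encode("utf-8")
print(content_checksum(plain) == content_checksum(noisy))
```

NFC is used rather than NFKC so that compatibility characters are not folded and the status tokens pass through unchanged.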

document_version_id (deterministic PK):
  rule: document_version_id = "icxconst-" || left( sha256_hex(
          content_checksum || '|' || source_document_ref ), 32 )
  -> pure function of (content_checksum, source_document_ref) [ingestion §5 / canon §5]
  idempotency: live UNIQUE(source_document_ref, content_checksum) guarantees the
    same content+ref cannot create a 2nd version row; same inputs => same id
  change-handling: different content => different content_checksum => NEW
    document_version_id; supersession recorded in
    provenance->>'supersedes_version_id' (no supersedes column live — MISMATCH-5)
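The document_version_id rule and its change handling can be sketched in Python. This assumes content_checksum is a lowercase hex string and that the concatenation is encoded as UTF-8 before hashing. The helper next_version and its dict layout are hypothetical and write nothing.

```python
import hashlib
from typing import Optional


def document_version_id(content_checksum: str, source_document_ref: str) -> str:
    """ "icxconst-" || left(sha256_hex(content_checksum || '|' || ref), 32) """
    material = f"{content_checksum}|{source_document_ref}".encode("utf-8")
    return "icxconst-" + hashlib.sha256(material).hexdigest()[:32]


def next_version(content_checksum: str, source_document_ref: str,
                 previous_id: Optional[str] = None) -> Optional[dict]:
    """Return a candidate version record, or None when the content is unchanged."""
    new_id = document_version_id(content_checksum, source_document_ref)
    if new_id == previous_id:
        return None  # same inputs => same id; no second row is needed
    provenance = {}
    if previous_id is not None:
        # No supersedes column exists live, so the link lives in provenance.
        provenance["supersedes_version_id"] = previous_id
    return {
        "document_version_id": new_id,
        "source_document_ref": source_document_ref,
        "content_checksum": content_checksum,
        "provenance": provenance,
    }
```

Calling next_version again with the same checksum and the previous id returns None, which mirrors the idempotency that the live UNIQUE constraint enforces.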

version_status / authoritative_version:
  version_status (proposed): "ready" only AFTER ratification + grammar amendment;
    until then a (future) version row would be "fetched"/"normalized" (NOT cut)
  authoritative_version: observed label "v4.6.3 BAN HÀNH" recorded as a LABEL in
    this column / provenance — NEVER used as identity (identity = content_checksum)

living_document_note (MISMATCH-3): CHANGELOG now extends to 2026-05-18 while the
  label stays v4.6.3. This confirms identity MUST be content_checksum: each
  re-ingest of changed content yields a new document_version_id + supersedes chain.

5. source_span / provenance plan (design-only)

source_span: NOT physical in WS-Q5 substrate (ingestion-design proposal only).
  plan: anchors emitted over NORMALIZED content (char_start/end + byte_start/end)
        BEFORE canonicalization; span_checksum = sha256(exact substring).
  status: source_span physicalization is a DEFERRED downstream gate (Q5/Q6) —
          NOT created or written here.
iu_provenance (master-plan A4/P5, for the FUTURE cut, design-only):
  every IU must carry { document_version_id, source_span.span_id, node_path,
  span_checksum_at_cut }; an IU with no resolvable span is INVALID at REVIEW.
version_registry.provenance jsonb (proposed contents, NOT written):
  { raw_checksum, raw_bytes, http_status, content_type, server, retrieved_at,
    source_format:"html/nuxt", normalize_profile_ref, supersedes_version_id,
    observed_title, observed_version_label } — no secrets.
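The span and IU provenance rules above can be sketched as follows. This is a design illustration only and creates nothing. It assumes half-open [start, end) offsets, character offsets counted in Unicode code points, and byte offsets counted in UTF-8. The names SpanAnchor, make_anchor, and iu_provenance_is_valid are hypothetical, as is modelling source_span.span_id as a flat span_id key.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class SpanAnchor:
    char_start: int
    char_end: int
    byte_start: int
    byte_end: int
    span_checksum: str


def make_anchor(normalized: str, char_start: int, char_end: int) -> SpanAnchor:
    """Anchor a span over NORMALIZED content; checksum covers the exact substring."""
    if not 0 <= char_start < char_end <= len(normalized):
        raise ValueError("span must be non-empty and inside the content")
    sub_bytes = normalized[char_start:char_end].encode("utf-8")
    byte_start = len(normalized[:char_start].encode("utf-8"))
    return SpanAnchor(char_start, char_end, byte_start,
                      byte_start + len(sub_bytes),
                      hashlib.sha256(sub_bytes).hexdigest())


REQUIRED_IU_KEYS = ("document_version_id", "span_id", "node_path",
                    "span_checksum_at_cut")


def iu_provenance_is_valid(iu: dict, known_span_ids: set) -> bool:
    """An IU needs all four provenance fields and a span that resolves."""
    if any(not iu.get(key) for key in REQUIRED_IU_KEYS):
        return False
    return iu["span_id"] in known_span_ids
```

Keeping both character and byte offsets lets a reviewer re-slice the normalized content either way and recompute span_checksum to confirm the anchor still matches.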

6. Explicit no-INSERT statement

No row has been authored for source_document_registry or source_document_version_registry, and no SQL has been drafted to run. This file is a deterministic identity-rule specification. It may be turned into a seed-authoring package only in a separate, later, sovereign-gated phase, after the grammar status-marker amendment and the OD-S1/OD-G2 rulings. No INSERT, no DML, and no schema change has been executed.

7. Statements

  • Identity plan is deterministic and reproducible from inputs (QG4) and accurate to the LIVE schema (MISMATCH-5 adaptations flagged). Nothing executed (QG6).
  • docprefix ICX-CONST under locked BR-A1 scheme; source_family/grammar/authority mappings traced to live rows; checksum/version rule = f(content_checksum, source_document_ref); raw observation recorded as grounding only.
  • Self-advance PROHIBITED — doc 3 of 5; STOP after package → route GPT/User.

Companion: grounding-report, grammar-applicability-review, status-marker-and-scope-ruling-request, ratification-readiness-report.
