KB-5DFB

dot-iu-cutter v0.5 — Snapshot-Bound Seed Strategy + Verification Plan (content_checksum := snapshot; provenance; live-drift must not invalidate)

Revision 1 · tags: dot-iu-cutter, v0.5, constitution-fixture, source-snapshot-capture, seed-strategy, verification-plan, provenance, design-only, no-execution, dieu44, 2026-05-18


Phase: v0_5_constitution_source_snapshot_capture_authoring · Date: 2026-05-18 · doc 4 of 5

nature: STRATEGY + PLAN ONLY — no seed, no DML, no checksum persisted/updated
bindings: Q2 keep document_version_id rule ; Q3 keep parser_profile ;
  Q4 minimal version_status now ; Q5 supersede via provenance jsonb
decision_authority: GPT / User ONLY ; self_advance: PROHIBITED

1. source_document (discovery / current)

unchanged (per GPT, the live URL is allowed as the discovery/current locator):
  source_document_ref='incomex-constitution' ; address_docprefix='ICX-CONST'
  source_url='https://vps.incomexsaigoncorp.vn/knowledge/dev/laws/constitution'
  source_family='internal_incomex_constitution' ; authority_class='authoritative'
  lifecycle='active' ; registered_by='constitution-source-seed'
atomicity unchanged: source_document + source_document_version in ONE
  BEGIN/COMMIT, child-after-parent, INSERT-only, no ON CONFLICT.
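A minimal sketch of the atomicity rule, using Python's standard sqlite3 module as a stand-in for the live Postgres registry. The two-table schema, its column subset and the placeholder values are hypothetical simplifications; only the behaviours named above are modelled: one BEGIN/COMMIT, parent before child, INSERT-only, no ON CONFLICT.

```python
import sqlite3

# HYPOTHETICAL, simplified stand-in for the two registry tables. Only the
# constraints that matter here are modelled: FK child->parent, NOT NULL, and
# UNIQUE(source_document_ref, content_checksum) in place of uq_sdvr_doc_checksum.
SCHEMA = """
CREATE TABLE source_document_registry (
    source_document_ref TEXT PRIMARY KEY
);
CREATE TABLE source_document_version_registry (
    document_version_id TEXT PRIMARY KEY,
    source_document_ref TEXT NOT NULL
        REFERENCES source_document_registry (source_document_ref),
    content_checksum    TEXT NOT NULL,
    UNIQUE (source_document_ref, content_checksum)
);
"""


def open_db() -> sqlite3.Connection:
    # isolation_level=None: no implicit transactions; BEGIN/COMMIT are explicit.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def seed(conn: sqlite3.Connection, doc_ref: str, version_id: str, checksum: str) -> None:
    """Parent row, then child row, in ONE transaction. Plain INSERTs, no ON
    CONFLICT, so any collision raises instead of being silently absorbed."""
    conn.execute("BEGIN")
    try:
        conn.execute(
            "INSERT INTO source_document_registry (source_document_ref) VALUES (?)",
            (doc_ref,),
        )
        conn.execute(
            "INSERT INTO source_document_version_registry "
            "(document_version_id, source_document_ref, content_checksum) VALUES (?, ?, ?)",
            (version_id, doc_ref, checksum),
        )
        conn.execute("COMMIT")
    except sqlite3.Error:
        conn.execute("ROLLBACK")  # all-or-nothing: no parent without its child
        raise
```

Because there is no ON CONFLICT, re-running the seed for the same source_document_ref raises on the parent INSERT, and the ROLLBACK leaves both tables exactly as they were; a failing child INSERT likewise takes its parent row with it.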

2. source_document_version — identity binds to the snapshot artifact

content_checksum := the snapshot artifact's normalized_content_checksum
  (== sha256 of the delimited content region ; == the source of the filename
   checksum-prefix ; NEVER a live re-fetch). uq_sdvr_doc_checksum makes a new
   checksum a NEW row; in-place update is structurally impossible AND policy-forbidden.
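A sketch of how the identity checksum and the checksum-addressed filename relate, assuming UTF-8 encoding of the region and two hypothetical delimiter strings (the real delimiters belong to the artifact spec). It reads only the captured artifact text, never a live page.

```python
import hashlib

# HYPOTHETICAL delimiters: the real content-region markers are defined in the
# artifact spec, not in this plan.
BEGIN = "<!-- CONTENT-BEGIN -->"
END = "<!-- CONTENT-END -->"


def content_region(artifact_text: str) -> str:
    """Return the text strictly between the delimiters (exactly one pair expected)."""
    if artifact_text.count(BEGIN) != 1 or artifact_text.count(END) != 1:
        raise ValueError("expected exactly one delimited content region")
    start = artifact_text.index(BEGIN) + len(BEGIN)
    end = artifact_text.index(END)
    if end < start:
        raise ValueError("END delimiter precedes BEGIN")
    return artifact_text[start:end]


def normalized_content_checksum(artifact_text: str) -> str:
    # Identity = sha256 over the UTF-8 bytes of the region of the captured
    # artifact; a live re-fetch is never an input here.
    return hashlib.sha256(content_region(artifact_text).encode("utf-8")).hexdigest()


def artifact_filename(checksum: str) -> str:
    # Checksum-addressed name: the first 16 hex chars of the full checksum.
    return f"constitution-normalized-{checksum[:16]}.md"
```

Text outside the delimiters (headers, capture notes) can change without changing identity; any change inside the region yields a different checksum and therefore, under uq_sdvr_doc_checksum, a new version row.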
source_format = 'html/nuxt' ; source_version_label/authoritative_version =
  'v4.6.3 BAN HÀNH' (semantic label, NOT identity)
version_status (Q4 MINIMAL NOW): use a live-schema-permitted value; if
  unconstrained, 'snapshot_captured' (preferred) else 'fetched'. The richer
  lifecycle (fetched->ratified->active ; old->superseded) is a FUTURE layer,
  reconfirmed against live schema at the gated capture/seed phase — NOT asserted.
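One reading of the Q4 rule as a helper, under the assumption that the gated phase first introspects the live schema and passes in the permitted values (None when the column is unconstrained). The function name and the refuse-and-route behaviour when neither value is allowed are assumptions, not part of the plan.

```python
from typing import AbstractSet, Optional

PREFERRED = "snapshot_captured"
FALLBACK = "fetched"


def choose_version_status(permitted: Optional[AbstractSet[str]]) -> str:
    """permitted=None models an unconstrained column (no CHECK/enum in the live schema)."""
    if permitted is None or PREFERRED in permitted:
        return PREFERRED
    if FALLBACK in permitted:
        return FALLBACK
    # ASSUMPTION: neither value is allowed -> stop and route to GPT/User
    # rather than picking some other value (no self-advance).
    raise LookupError("no plan-approved version_status is permitted by the live schema")
```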
provenance jsonb (no schema change; R-PP1/Q5 pattern) MUST include:
  snapshot_artifact_path:       knowledge/dev/laws/dieu44-trien-khai/snapshots/
                                constitution/constitution-normalized-<prefix>.md
  snapshot_artifact_checksum:   <== content_checksum>
  snapshot_capture_method:      'captured_artifact_nuxt_v1'
  parser_profile_ref:           'nuxt-incomex-portal-constitution-v1'   (Q3)
  normalized_content_length:    <int>
  marker_counts:                { enacted:19, controlled_draft:1, draft:1, obsolete:1 }
                                (capture-time truth, re-counted from artifact)
  captured_from_live_url:       <source_url>
  captured_at:                  <UTC of gated capture>
  changelog_included:           true
  raw_checksum:                 <forensic-only | null ; MUST NOT equal content_checksum>
  supersedes_document_version_id: null
    RATIONALE: Codex blocked PRE-DML (P3 rows=0); the prior f9d22d05… was NEVER
    persisted -> there is NO prior version row to supersede on this first
    successful seed. The key is present (= null) so future re-captures can set
    it to the prior document_version_id per version-policy CLS_2/CLS_4.
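A sketch of a first-seed provenance check, assuming the object is assembled as a plain Python dict before being bound to the jsonb column. Key names and fixed values come from the list above; the function name and the choice to fail with ValueError are hypothetical.

```python
import json
from typing import Any, Dict

REQUIRED_KEYS = frozenset({
    "snapshot_artifact_path", "snapshot_artifact_checksum", "snapshot_capture_method",
    "parser_profile_ref", "normalized_content_length", "marker_counts",
    "captured_from_live_url", "captured_at", "changelog_included",
    "raw_checksum", "supersedes_document_version_id",
})


def first_seed_provenance_json(prov: Dict[str, Any], content_checksum: str) -> str:
    """Validate a FIRST-seed provenance object; return it as JSON text for jsonb."""
    missing = REQUIRED_KEYS - set(prov)
    if missing:
        raise ValueError(f"missing provenance keys: {sorted(missing)}")
    if prov["snapshot_artifact_checksum"] != content_checksum:
        raise ValueError("snapshot_artifact_checksum must equal content_checksum")
    if prov["snapshot_capture_method"] != "captured_artifact_nuxt_v1":
        raise ValueError("unexpected snapshot_capture_method")
    if prov["parser_profile_ref"] != "nuxt-incomex-portal-constitution-v1":
        raise ValueError("parser_profile_ref must stay the Q3 profile")
    if prov["changelog_included"] is not True:
        raise ValueError("changelog_included must be true")
    if prov["raw_checksum"] is not None and prov["raw_checksum"] == content_checksum:
        raise ValueError("raw_checksum is forensic-only and must differ from content_checksum")
    # The key must be PRESENT (checked above) and null on the first successful seed.
    if prov["supersedes_document_version_id"] is not None:
        raise ValueError("first seed supersedes nothing: expected null")
    # ensure_ascii=False keeps non-ASCII text readable; sort_keys gives a stable payload.
    return json.dumps(prov, ensure_ascii=False, sort_keys=True)
```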

3. document_version_id — UNCHANGED (Q2)

rule (verbatim, Q2 binding):
  document_version_id = 'icxconst-' || left(encode(sha256(
     (content_checksum || '|' || source_document_ref)::bytea),'hex'),32)
properties: deterministic, timestamp-independent, recomputable from stored
  content_checksum + source_document_ref. content_checksum now = the snapshot
  checksum, so the id auto-tracks the pinned snapshot.
NOT computed/persisted in this phase: the concrete id is derived IN-TX at the
  gated seed execution from the verified snapshot checksum (no precompute here —
  avoids baking a value & respects no-checksum-persistence).
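For recompute-style checks, the Q2 rule can be mirrored outside the database. This sketch assumes a UTF8 database encoding, so that the text-to-bytea cast yields the UTF-8 bytes of the concatenation; that equivalence holds only when the input contains no backslash, which the cast treats as escape syntax. Consistent with the no-precompute rule, it is meant to be fed the stored values after the seed, not to bake an id in advance.

```python
import hashlib


def document_version_id(content_checksum: str, source_document_ref: str) -> str:
    """Python mirror of the Q2 rule:
    'icxconst-' || left(encode(sha256((content_checksum || '|' || source_document_ref)::bytea), 'hex'), 32)
    """
    payload = f"{content_checksum}|{source_document_ref}"
    if "\\" in payload:
        # text::bytea treats backslashes as escape syntax, so plain UTF-8 bytes
        # would no longer match what Postgres hashes; refuse instead of guessing.
        raise ValueError("backslash in input: ::bytea cast semantics differ")
    # hexdigest() is lowercase hex like encode(..., 'hex'); [:32] matches left(..., 32).
    return "icxconst-" + hashlib.sha256(payload.encode("utf-8")).hexdigest()[:32]
```

The result is 41 characters (9-character prefix plus 32 hex digits) and depends only on the two stored columns, which is what makes the id timestamp-independent and reproducible.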

4. Verification plan

4.1 Snapshot artifact verification (pre-seed)

SNAP-1  exactly ONE artifact at the checksum-addressed path                 =1
SNAP-2  filename checksum-prefix == first16(metadata.normalized_content_checksum)  exact
SNAP-3  re-extract delimited content region; sha256 == metadata checksum
        == (at seed) registered content_checksum                            exact
SNAP-4  recount length == metadata.normalized_content_length                exact
SNAP-5  recount ✅/📋/📝/⛔ == metadata.marker_counts (codepoint-exact)       exact
SNAP-6  no second artifact sharing the prefix with a different full checksum  none
SNAP-7  raw_checksum (if present) != content_checksum (forensic, not identity)  true

4.2 Seed verification (post-seed, future E2)

POST-1  source_document_registry rows=1 with the doc-1/§1 exact fields        exact
POST-2  source_document_version_registry rows=1                               =1
POST-3  version.content_checksum == provenance.snapshot_artifact_checksum
        == artifact metadata checksum                                         exact
POST-4  document_version_id == 'icxconst-'||left(encode(sha256((content_checksum
        ||'|'||source_document_ref)::bytea),'hex'),32)  (recompute == stored)  reproduce
POST-5  provenance has snapshot_artifact_path/_checksum/_capture_method,
        parser_profile_ref='nuxt-incomex-portal-constitution-v1',
        normalized_content_length, marker_counts, captured_from_live_url,
        changelog_included=true, supersedes_document_version_id (=null first)  exact
POST-6  FK version->source_document intact; UNIQUE(ref,checksum)=1            =1/0
POST-7  enacted_only scope + 4 status markers + 📋 Điều-44 deferral UNCHANGED  unchanged
POST-8  system_identifier pre==post                                          match
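POST-1 to POST-5 can be expressed as a pure function over rows already fetched from the two registries; how the rows are fetched is left out. Row shapes (dicts keyed by column name, provenance already decoded from jsonb) and the function name are assumptions. POST-6 to POST-8 depend on catalog queries and the live system_identifier, so they are not modelled here.

```python
import hashlib
from typing import Any, Dict, List

# Field values fixed by §1 of this plan.
EXPECTED_DOC = {
    "source_document_ref": "incomex-constitution",
    "address_docprefix": "ICX-CONST",
    "source_url": "https://vps.incomexsaigoncorp.vn/knowledge/dev/laws/constitution",
    "source_family": "internal_incomex_constitution",
    "authority_class": "authoritative",
    "lifecycle": "active",
    "registered_by": "constitution-source-seed",
}
PROV_KEYS = ("snapshot_artifact_path", "snapshot_artifact_checksum", "snapshot_capture_method",
             "normalized_content_length", "marker_counts", "captured_from_live_url")


def post_seed_failures(docs: List[Dict[str, Any]], versions: List[Dict[str, Any]],
                       artifact_checksum: str) -> List[str]:
    """POST-1..POST-5 over already-fetched rows; returns the failed check ids."""
    failed: List[str] = []
    if len(docs) != 1 or any(docs[0].get(k) != v for k, v in EXPECTED_DOC.items()):
        failed.append("POST-1")
    if len(versions) != 1:
        return failed + ["POST-2"]
    v = versions[0]
    prov = v["provenance"]
    if not (v["content_checksum"] == prov.get("snapshot_artifact_checksum") == artifact_checksum):
        failed.append("POST-3")
    # Recompute the Q2 id (assumes UTF-8 bytes == the ::bytea cast, no backslashes).
    payload = f"{v['content_checksum']}|{v['source_document_ref']}".encode("utf-8")
    if v["document_version_id"] != "icxconst-" + hashlib.sha256(payload).hexdigest()[:32]:
        failed.append("POST-4")
    if (any(k not in prov for k in PROV_KEYS)
            or prov.get("parser_profile_ref") != "nuxt-incomex-portal-constitution-v1"
            or prov.get("changelog_included") is not True
            or "supersedes_document_version_id" not in prov
            or prov["supersedes_document_version_id"] is not None):
        failed.append("POST-5")
    return failed
```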

4.3 Negative checks (any TRUE ⇒ FAIL ⇒ rollback per doc-3 §3)

NEG-1  content_checksum == old f9d22d05… OR == a raw fetch hash               FALSE
NEG-2  in-place UPDATE of an existing version row's content_checksum          FALSE
NEG-3  >1 source_document or >1 version row                                   FALSE
NEG-4  artifact overwritten / deleted / mutated                               FALSE
NEG-5  seed bound to live re-fetch instead of artifact                        FALSE
NEG-6  schema / GRANT / index / Directus mutation                             FALSE
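A sketch of the NEG gate as predicates over an observation record. The SeedObservation fields are hypothetical names for facts the executor would gather; because only the prefix f9d22d05 of the abandoned checksum is recorded in this plan, NEG-1 is approximated here with a prefix match.

```python
from dataclasses import dataclass
from typing import List, Optional

# Only this prefix of the never-persisted checksum is recorded in the plan.
OLD_CHECKSUM_PREFIX = "f9d22d05"


@dataclass(frozen=True)
class SeedObservation:
    """HYPOTHETICAL record of facts gathered around the gated seed."""
    content_checksum: str
    raw_fetch_hash: Optional[str]
    version_checksum_updated_in_place: bool
    source_document_rows: int
    version_rows: int
    artifact_unchanged: bool
    seed_bound_to_artifact: bool
    schema_grant_index_or_directus_changed: bool


def negative_failures(o: SeedObservation) -> List[str]:
    """Return the NEG ids that evaluated TRUE; any entry means FAIL."""
    checks = {
        "NEG-1": o.content_checksum.startswith(OLD_CHECKSUM_PREFIX)
        or (o.raw_fetch_hash is not None and o.content_checksum == o.raw_fetch_hash),
        "NEG-2": o.version_checksum_updated_in_place,
        "NEG-3": o.source_document_rows > 1 or o.version_rows > 1,
        "NEG-4": not o.artifact_unchanged,
        "NEG-5": not o.seed_bound_to_artifact,
        "NEG-6": o.schema_grant_index_or_directus_changed,
    }
    return [neg_id for neg_id, hit in checks.items() if hit]
```

Any non-empty result means FAIL and triggers the rollback in doc-3 §3.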

4.4 Live-drift independence (critical)

LD-1  AFTER seed, re-fetch live + recompute: live MAY differ from the pinned
      snapshot. This is EXPECTED and MUST NOT invalidate or roll back the
      seeded version. Live drift only raises a NEW source_document_version
      CANDIDATE per version-policy (CLS_1/2/4) — never a retro-invalidation
      of the pinned seed, never an in-place checksum update.
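LD-1 reduces to a comparison whose only outcomes are "no drift" or "raise a new candidate"; there is deliberately no branch that touches the pinned row. The CLS_1/2/4 classification belongs to the version policy and is not modelled, so whether a candidate actually supersedes the pinned version is left to that policy; all names below are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DriftResult:
    drifted: bool
    # Only meaningful when drifted: the value a NEW candidate row could carry in
    # provenance.supersedes_document_version_id. The pinned row is never edited.
    candidate_supersedes: Optional[str]


def assess_live_drift(pinned_version_id: str, pinned_checksum: str,
                      live_checksum: str) -> DriftResult:
    """Post-seed comparison of a live recompute against the pinned snapshot checksum."""
    if live_checksum == pinned_checksum:
        return DriftResult(drifted=False, candidate_supersedes=None)
    # Drift is expected: propose a candidate; never invalidate or update the seed.
    return DriftResult(drifted=True, candidate_supersedes=pinned_version_id)
```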

5. Statement

  • Seed identity bound to the snapshot artifact checksum; provenance carries full snapshot identity + supersedes(=null, justified); document_version_id rule kept verbatim (QG4); rehash gate required pre-seed (QG3); checksum-addressed/no-overwrite respected (QG2); live drift cannot invalidate a pinned seed. Nothing seeded/executed/mutated (QG5).
  • doc 4 of 5; STOP after 5 files → route GPT/User. Self-advance PROHIBITED.

Companions: operational-framing, artifact-spec, capture-procedure-draft, capture-authoring-report.

Path: knowledge/dev/laws/dieu44-trien-khai/v0.5-constitution-source-snapshot-capture-authoring/dot-iu-cutter-v0.5-constitution-source-snapshot-seed-strategy-and-verification-plan-2026-05-18.md