KB-5DFB
dot-iu-cutter v0.5 — Snapshot-Bound Seed Strategy + Verification Plan (content_checksum := snapshot; provenance; live-drift must not invalidate)
Revision 1
Tags: dot-iu-cutter, v0.5, constitution-fixture, source-snapshot-capture, seed-strategy, verification-plan, provenance, design-only, no-execution, dieu44, 2026-05-18
Phase: v0_5_constitution_source_snapshot_capture_authoring · Date: 2026-05-18 · doc 4 of 5
nature: STRATEGY + PLAN ONLY (no seed, no DML, no checksum persisted/updated)
bindings: Q2 keep document_version_id rule ; Q3 keep parser_profile ; Q4 minimal version_status now ; Q5 supersede via provenance jsonb
decision_authority: GPT / User ONLY ; self_advance: PROHIBITED
1. source_document (discovery / current)
unchanged (GPT: live URL allowed as discovery/current):
source_document_ref='incomex-constitution' ; address_docprefix='ICX-CONST'
source_url='https://vps.incomexsaigoncorp.vn/knowledge/dev/laws/constitution'
source_family='internal_incomex_constitution' ; authority_class='authoritative'
lifecycle='active' ; registered_by='constitution-source-seed'
atomicity unchanged: source_document + source_document_version in ONE
BEGIN/COMMIT, child-after-parent, INSERT-only, no ON CONFLICT.
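The atomicity rule above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 with an in-memory database purely as a stand-in for the live PostgreSQL schema. The column lists are trimmed, hypothetical reductions of the real tables, and the helper names make_db and seed_atomically are illustrative only. It is not the gated seed and touches nothing live.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # isolation_level=None: transactions are controlled explicitly below.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("PRAGMA foreign_keys = ON")
    # Hypothetical, reduced column sets; the real tables carry more fields.
    conn.execute(
        "CREATE TABLE source_document_registry ("
        " source_document_ref TEXT PRIMARY KEY)"
    )
    conn.execute(
        "CREATE TABLE source_document_version_registry ("
        " document_version_id TEXT PRIMARY KEY,"
        " source_document_ref TEXT NOT NULL"
        "   REFERENCES source_document_registry(source_document_ref),"
        " content_checksum TEXT NOT NULL,"
        " UNIQUE (source_document_ref, content_checksum))"
    )
    return conn


def seed_atomically(conn, ref: str, version_id: str, checksum: str) -> bool:
    """Insert parent then child in ONE transaction; INSERT-only, no upsert."""
    conn.execute("BEGIN")
    try:
        conn.execute(
            "INSERT INTO source_document_registry VALUES (?)", (ref,)
        )
        conn.execute(
            "INSERT INTO source_document_version_registry VALUES (?, ?, ?)",
            (version_id, ref, checksum),
        )
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        # Any conflict aborts the whole unit; neither row survives.
        conn.execute("ROLLBACK")
        return False
```

Because there is no ON CONFLICT clause, a second attempt against an already seeded document fails and rolls back as a whole instead of silently merging.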
2. source_document_version — identity binds to the snapshot artifact
content_checksum := the snapshot artifact's normalized_content_checksum
(== sha256 of the delimited content region ; == filename checksum-prefix
source ; NEVER a live re-fetch). uq_sdvr_doc_checksum makes a new checksum a
NEW row; in-place update is structurally impossible AND policy-forbidden.
source_format = 'html/nuxt' ; source_version_label/authoritative_version =
'v4.6.3 BAN HÀNH' (semantic label, NOT identity)
version_status (Q4 MINIMAL NOW): use a value the live schema permits. If the
column is unconstrained, prefer 'snapshot_captured'; else use 'fetched'. The
richer lifecycle (fetched->ratified->active ; old->superseded) is a FUTURE
layer, to be reconfirmed against the live schema at the gated capture/seed
phase. It is NOT asserted here.
provenance jsonb (no schema change; R-PP1/Q5 pattern) MUST include:
snapshot_artifact_path: knowledge/dev/laws/dieu44-trien-khai/snapshots/
constitution/constitution-normalized-<prefix>.md
snapshot_artifact_checksum: <== content_checksum>
snapshot_capture_method: 'captured_artifact_nuxt_v1'
parser_profile_ref: 'nuxt-incomex-portal-constitution-v1' (Q3)
normalized_content_length: <int>
marker_counts: { enacted:19, controlled_draft:1, draft:1, obsolete:1 }
(capture-time truth, re-counted from artifact)
captured_from_live_url: <source_url>
captured_at: <UTC of gated capture>
changelog_included: true
raw_checksum: <forensic-only | null ; MUST NOT equal content_checksum>
supersedes_document_version_id: null
RATIONALE: Codex blocked PRE-DML (P3 rows=0); the prior f9d22d05… was NEVER
persisted -> there is NO prior version row to supersede on this first
successful seed. The key is present (= null) so future re-captures can set
it to the prior document_version_id per version-policy CLS_2/CLS_4.
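As a rough illustration of the provenance requirements above, the sketch below checks a candidate provenance object before it would be written. The key names come from the list above. The function name provenance_problems and the sample values in the usage note are hypothetical.

```python
REQUIRED_KEYS = (
    "snapshot_artifact_path",
    "snapshot_artifact_checksum",
    "snapshot_capture_method",
    "parser_profile_ref",
    "normalized_content_length",
    "marker_counts",
    "captured_from_live_url",
    "captured_at",
    "changelog_included",
    "raw_checksum",
    "supersedes_document_version_id",
)


def provenance_problems(provenance: dict, content_checksum: str) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in provenance]
    # The artifact checksum recorded in provenance must be the identity checksum.
    if provenance.get("snapshot_artifact_checksum") != content_checksum:
        problems.append("snapshot_artifact_checksum != content_checksum")
    # raw_checksum is forensic only: null is fine, equality with identity is not.
    raw = provenance.get("raw_checksum")
    if raw is not None and raw == content_checksum:
        problems.append("raw_checksum must not equal content_checksum")
    if provenance.get("changelog_included") is not True:
        problems.append("changelog_included must be true")
    return problems
```

On the first successful seed, supersedes_document_version_id is present with a null value, so the presence check passes while no prior row is referenced.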
3. document_version_id — UNCHANGED (Q2)
rule (verbatim, Q2 binding):
document_version_id = 'icxconst-' || left(encode(sha256(
(content_checksum || '|' || source_document_ref)::bytea),'hex'),32)
properties: deterministic, timestamp-independent, recomputable from stored
content_checksum + source_document_ref. content_checksum now = the snapshot
checksum, so the id auto-tracks the pinned snapshot.
NOT computed/persisted in this phase: the concrete id is derived IN-TX at the
gated seed execution from the verified snapshot checksum (no precompute here —
avoids baking a value & respects no-checksum-persistence).
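For readers who want to recompute the id outside the database, the rule above can be mirrored in a few lines of Python. This is a sketch under two assumptions: the database encoding is UTF-8, and neither input contains a backslash (the PostgreSQL text-to-bytea cast treats backslashes as escapes). The function name is illustrative, and no concrete id is computed or recorded here.

```python
import hashlib


def document_version_id(content_checksum: str, source_document_ref: str) -> str:
    """Mirror of: 'icxconst-' || left(encode(sha256(
    (content_checksum || '|' || source_document_ref)::bytea),'hex'),32)"""
    payload = f"{content_checksum}|{source_document_ref}".encode("utf-8")
    # hexdigest() is lowercase hex, matching encode(..., 'hex').
    return "icxconst-" + hashlib.sha256(payload).hexdigest()[:32]
```

The function takes no clock or random input, which is what makes the id timestamp-independent and recomputable from the two stored fields alone.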
4. Verification plan
4.1 Snapshot artifact verification (pre-seed)
SNAP-1 exactly ONE artifact at the checksum-addressed path [expected: =1]
SNAP-2 filename checksum-prefix == first16(metadata.normalized_content_checksum) [expected: exact]
SNAP-3 re-extract delimited content region; sha256 == metadata checksum
== (at seed) registered content_checksum [expected: exact]
SNAP-4 recount length == metadata.normalized_content_length [expected: exact]
SNAP-5 recount ✅/📋/📝/⛔ == metadata.marker_counts (codepoint-exact) [expected: exact]
SNAP-6 no second artifact sharing the prefix with a different full checksum [expected: none]
SNAP-7 raw_checksum (if present) != content_checksum (forensic not identity) [expected: MUST]
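The single-artifact checks SNAP-2, SNAP-3, SNAP-4, SNAP-5 and SNAP-7 can be sketched as one pure function. Several details are assumptions, not taken from the artifact spec: the BEGIN_MARK/END_MARK delimiter strings are hypothetical placeholders, the length is counted in code points, the content is hashed as UTF-8, and the emoji-to-status mapping follows the order in which the two lists appear above.

```python
import hashlib

# Hypothetical delimiters; the real ones are defined by the artifact spec.
BEGIN_MARK = "<!-- BEGIN NORMALIZED CONTENT -->"
END_MARK = "<!-- END NORMALIZED CONTENT -->"

# Assumed mapping of status names to marker code points.
MARKERS = {
    "enacted": "\u2705",               # ✅
    "controlled_draft": "\U0001F4CB",  # 📋
    "draft": "\U0001F4DD",             # 📝
    "obsolete": "\u26D4",              # ⛔
}


def extract_region(artifact_text: str) -> str:
    start = artifact_text.index(BEGIN_MARK) + len(BEGIN_MARK)
    stop = artifact_text.index(END_MARK, start)
    return artifact_text[start:stop]


def verify_snapshot(filename: str, artifact_text: str, metadata: dict) -> list:
    """Return the ids of failed checks; an empty list means all passed."""
    region = extract_region(artifact_text)
    expected = metadata["normalized_content_checksum"]
    failures = []
    prefix = filename.removesuffix(".md").rsplit("-", 1)[-1]
    if prefix != expected[:16]:
        failures.append("SNAP-2")
    if hashlib.sha256(region.encode("utf-8")).hexdigest() != expected:
        failures.append("SNAP-3")
    if len(region) != metadata["normalized_content_length"]:
        failures.append("SNAP-4")
    counts = {name: region.count(ch) for name, ch in MARKERS.items()}
    if counts != metadata["marker_counts"]:
        failures.append("SNAP-5")
    raw = metadata.get("raw_checksum")
    if raw is not None and raw == expected:
        failures.append("SNAP-7")
    return failures
```

SNAP-1 and SNAP-6 need a directory listing rather than a single file, so they are left out of this sketch.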
4.2 Seed verification (post-seed, future E2)
POST-1 source_document_registry rows=1 with the doc-1/§1 exact fields [expected: exact]
POST-2 source_document_version_registry rows=1 [expected: =1]
POST-3 version.content_checksum == provenance.snapshot_artifact_checksum
== artifact metadata checksum [expected: exact]
POST-4 document_version_id == 'icxconst-' || left(encode(sha256(
(content_checksum || '|' || source_document_ref)::bytea),'hex'),32)
(the §3 rule; recompute == stored) [expected: reproduce]
POST-5 provenance has snapshot_artifact_path/_checksum/_capture_method,
parser_profile_ref='nuxt-incomex-portal-constitution-v1',
normalized_content_length, marker_counts, captured_from_live_url,
changelog_included=true, supersedes_document_version_id (=null first) [expected: exact]
POST-6 FK version->source_document intact; UNIQUE(ref,checksum)=1 [expected: =1/0]
POST-7 enacted_only scope + 4 status markers + 📋 Điều-44 (Article 44) deferral UNCHANGED [expected: unchanged]
POST-8 system_identifier pre==post [expected: match]
4.3 Negative checks (any TRUE ⇒ FAIL ⇒ rollback per doc-3 §3)
NEG-1 content_checksum == old f9d22d05… OR == a raw fetch hash [expected: FALSE]
NEG-2 in-place UPDATE of an existing version row's content_checksum [expected: FALSE]
NEG-3 >1 source_document or >1 version row [expected: FALSE]
NEG-4 artifact overwritten / deleted / mutated [expected: FALSE]
NEG-5 seed bound to live re-fetch instead of artifact [expected: FALSE]
NEG-6 schema / GRANT / index / Directus mutation [expected: FALSE]
4.4 Live-drift independence (critical)
LD-1 AFTER seed, re-fetch live + recompute: live MAY differ from the pinned
snapshot. This is EXPECTED and MUST NOT invalidate or roll back the
seeded version. Live drift only raises a NEW source_document_version
CANDIDATE per version-policy (CLS_1/2/4) — never a retro-invalidation
of the pinned seed, never an in-place checksum update.
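LD-1 can be read as a small decision rule. The sketch below shows one possible encoding, where a differing live checksum produces a candidate and the pinned seed stays valid in every branch. The names DriftOutcome and evaluate_live_drift are hypothetical, and the live checksum is assumed to come from the same normalization as the snapshot.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DriftOutcome:
    pinned_seed_valid: bool                # always True: drift never invalidates
    new_candidate_checksum: Optional[str]  # set only when live content differs


def evaluate_live_drift(pinned_checksum: str, live_checksum: str) -> DriftOutcome:
    """Compare a post-seed live recompute with the pinned snapshot checksum."""
    if live_checksum == pinned_checksum:
        return DriftOutcome(pinned_seed_valid=True, new_candidate_checksum=None)
    # Drift is expected: raise a NEW version candidate, leave the pinned row alone.
    return DriftOutcome(pinned_seed_valid=True, new_candidate_checksum=live_checksum)
```

The function has no branch that returns pinned_seed_valid=False and no code path that rewrites the pinned checksum, which is the point of LD-1.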
5. Statement
- Seed identity bound to the snapshot artifact checksum.
- provenance carries full snapshot identity + supersedes (=null, justified).
- document_version_id rule kept verbatim (QG4).
- Rehash gate required pre-seed (QG3).
- Checksum-addressed/no-overwrite respected (QG2).
- Live drift cannot invalidate a pinned seed.
- Nothing seeded/executed/mutated (QG5).
- doc 4 of 5; STOP after 5 files → route GPT/User. Self-advance PROHIBITED.
Companions: operational-framing, artifact-spec, capture-procedure-draft, capture-authoring-report.