dot-iu-cutter v0.5 — Constitution Nuxt Parser Reference Implementation: Operations-First Framing (why byte-exact, not prose; PASS/BLOCKED definition)
dot-iu-cutter v0.5 — Constitution Nuxt Parser Reference Implementation: Operations-First Framing
Phase:
v0_5_constitution_nuxt_parser_reference_implementation_authoring· Nature:code/script authoring + test design — no capture, no seed, no DML, no dry-run· Date: 2026-05-18 · doc 1 of 5snapshot_written: false ; source_document_insert: false ; source_document_version_insert: false dml: none ; production_db_mutation: none ; checksum_persisted_into_registry: none dry_run: none ; cut: none ; verify: none ; schema_change: none ; grant_revoke: none directus_mutation: none ; deploy_restart: none ; git_commit: none mutation: none (KB read-only grounding + 5 authoring uploads + local /tmp scratch deleted) decision_authority: GPT / User ONLY ; self_advance: PROHIBITED
Opened per GPT ruling E1_BLOCKED_BY_PARSER_IMPLEMENTATION_DIVERGENCE__REFERENCE_IMPLEMENTATION_NEXT, selected path R2_RATIFY_REFERENCE_IMPLEMENTATION_FIRST (reviews/dot-iu-cutter-v0.5-constitution-source-snapshot-capture-e1-blocked-gpt-ruling-2026-05-18.md).
1. The operational goal this serves
north_star: a user issues one small command — "Cắt Hiến pháp" — and the system
resolves the source, pins a deterministic version identity, cuts enacted_only,
verifies, and returns a concise PASS/BLOCKED report; humans handle exceptions only.
(handoffs/dot-iu-cutter-current-operating-objectives-and-principles-2026-05-18)
The Constitution source is a living Nuxt/Directus-rendered HTML page. "Cắt Hiến pháp" must bind to a reproducible source_document_version identity (content_checksum). The whole snapshot/version-policy chain (Option B captured-snapshot, write-once checksum-addressed artifact, rehash gates) exists to make that identity stable and race-free.
2. Why prose-only parser rules are insufficient (root operational defect)
defect_exposed_by_E1:
- parser_profile nuxt-incomex-portal-constitution-v1 existed ONLY as a prose
spec (N1..N9 + candidate_B span).
- Two faithful readers of that prose produced DIFFERENT byte-level normalized
content for the SAME source content:
Codex canonical executor : 17660443…/len 17522 (ratified canonical)
Claude prose reconstruction: 072983ac…/len 17657 (Δ +135, deterministic 3/3)
- markers identical 19/1/1/1 -> NOT a normative document change; purely an
executor-interpretation gap in one normalization step.
consequence: a one-command "Cắt Hiến pháp" cannot depend on WHICH agent/executor
runs it. A version identity that changes with the executor is not an identity.
Prose is a specification; only a byte-exact, versioned implementation (or a
formally pinned canonical executor) can MINT a production source-version hash.
A prose rule such as "collapse blank-line runs to a single \n" is read by one executor as "one newline, zero blank lines between blocks" and by another as "one blank line between blocks". Both are defensible English; they differ by ~135 characters across the document. Identity must not hinge on that ambiguity.
3. What this phase delivers (operations-first, not mechanism-first)
deliverable: a deterministic, portable REFERENCE IMPLEMENTATION of
parser_profile nuxt-incomex-portal-constitution-v1 such that ANY approved
executor, given the same source bytes, yields identical:
normalized_content / normalized_content_checksum / normalized_content_length
/ marker_counts / extraction_span_diagnostics / parser_version
purpose_in_workflow:
- removes the executor-divergence failure class permanently
- makes E1 snapshot capture reproducible (unblocks B5 path, still gated)
- lets "Cắt Hiến pháp" rehash the pinned artifact and compare deterministically
scope_exclusions (this phase): no snapshot write, no seed, no DML, no dry-run,
no CUT, no VERIFY, no schema/Directus/deploy change, no git commit, no
self-advance to E1 capture or E2 seed.
4. PASS / BLOCKED definition for this authoring phase
PASS (reference implementation authoring):
P1 a single deterministic implementation is authored with EVERY ambiguous
step pinned to one explicit decision (fetch/decode, span, detag, entity
decode, NFC, line endings, h-whitespace, v-whitespace, CHANGELOG boundary,
marker codepoint preservation, BEGIN/END sentinel region semantics)
P2 run read-only against the live source is deterministic across repeated
fetches (same normalized_content_checksum)
P3 EITHER it reproduces the ratified canonical checksum
17660443…/17522 (then -> recommend E1 re-run with this impl),
OR the exact divergence cause is identified and the precise fix stated,
OR live-source drift is proven and a new-fixture-ratification path is
recommended (never a silent canonical-checksum change)
P4 reference implementation source is captured in KB (SSOT) so it is portable
and ratifiable independent of any one environment
BLOCKED:
B1 cannot obtain source read-only, OR
B2 implementation is non-deterministic across fetches under one pinned ruleset
(FAIL_NONDETERMINISM), OR
B3 evidence is insufficient to either reproduce canonical or localize the
divergence (state "insufficient evidence", do NOT guess a checksum)
forbidden_outcome (any case): inventing/altering a canonical checksum,
freezing a non-canonical artifact, or seeding from unpinned content.
5. This phase's result (summary; detail in docs 2–5)
result: PASS (P1–P4 met). The reference implementation, run read-only against
the live source 3/3 deterministically, REPRODUCES THE RATIFIED CANONICAL
identity EXACTLY: 17660443e0f23e994e1807cf8e22920951a9e70c598956dbd0e752f4f5cae80c
/ length 17522 / markers ✅19 📋1 📝1 ⛔1.
divergence_cause: localized precisely to ONE step — N8 vertical-whitespace
("collapse blank-line runs to a single \n"). Canonical = drop ALL empty lines
(single \n between content). Claude E1's prose reconstruction kept one blank
line between blocks -> +135 chars / 072983ac…. Proven by controlled variant
sweep (doc 2 / doc 4): V3 variant == canonical, V4 variant == Claude E1 output.
recommendation: GPT ratify this reference implementation as the canonical
executor for nuxt-incomex-portal-constitution-v1, then re-run E1 capture
using it (decision is GPT/User; not taken here). See doc 5.
6. Statement
- Operations-first framing: byte-exact implementation is required because a prose-ambiguous normalization step (N8) makes version identity executor-dependent; PASS/BLOCKED defined (QG6 framing).
- No snapshot/seed/DML/dry-run/CUT/VERIFY/schema/Directus/deploy/commit; KB read-only + 5 uploads + deleted /tmp scratch only (QG1/QG5/QG7).
- doc 1 of 5; STOP after 5 files → route GPT/User. Self-advance PROHIBITED.
Companions: algorithm-analysis, implementation-draft, test-result, authoring-report.