KB-7771

dot-iu-cutter v0.5 — Snapshot MARK Byte-safe Fixture Log (Option B: refimpl.r1 from live source, region bytes never through model context; region sha 17660443 EXACT; no hand-edit)

Revision 1
Tags: dot-iu-cutter · v0.5 · byte-safe-fixture · refimpl-r1 · option-b · region-sha-17660443 · no-hand-edit · dieu44 · 2026-05-18


Phase: v0_5_snapshot_MARK_byte_safe_fixture_full_CI · Nature: fixture_provision__byte_safe__no_commit__no_dryrun · Date: 2026-05-18 · doc 1 of 4

fixture_generation_method: Option B — regenerated locally via refimpl.r1 from
  the LIVE source; region bytes flowed shell→file ONLY, never through model
  context; NO manual markdown copy; NO Unicode hand-edit.
result: region sha256 == 17660443… EXACT (len 17522, markers 19/1/1/1)

1. Why Option B (A and C unavailable)

Option A (base64 blob): NO blob exists in KB (consolidated-path Step 2 was never
  authorized/executed). Reading the KB pinned artifact via MCP is the corrupting
  channel (prior region 86d6aea7 ≠ 17660443). A not possible.
Option C (byte-exact local copy): filesystem search under /Users/nmhuyen found
  ONLY the prior NON-IDENTITY fixture (region 86d6aea7); no byte-exact copy
  anywhere. C not possible.
Option B (regenerate via refimpl.r1): CHOSEN. refimpl.r1 is the KB-ratified
  reference parser proven to reproduce the canonical 17660443…/17522/19·1·1·1
  deterministically 3/3 from the live source (KB nuxt-parser test-result doc).
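
A "deterministic 3/3" claim of this kind can be checked by hashing the output of repeated runs and confirming that only one distinct digest appears. The sketch below is illustrative only: `stand_in_parser` is a hypothetical placeholder, not refimpl.r1, and the input it returns is made up.

```python
from __future__ import annotations

import hashlib
from typing import Callable


def distinct_digests(produce: Callable[[], bytes], runs: int = 3) -> set[str]:
    """Call `produce` `runs` times and collect the distinct sha256 hex digests."""
    return {hashlib.sha256(produce()).hexdigest() for _ in range(runs)}


def stand_in_parser() -> bytes:
    # Hypothetical stand-in for a parser run over a fixed input (NOT refimpl.r1).
    return "HI\u1ebeN PH\u00c1P".encode("utf-8")


digests = distinct_digests(stand_in_parser, runs=3)
# A deterministic producer yields exactly one distinct digest across all runs.
assert len(digests) == 1
```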

2. Byte-safety design (the key control)

The corruption mechanism is same-width Unicode substitution when dense content passes through the model context as text. It is defeated by ensuring that the identity region bytes never enter the model context:

1 live page fetched by curl  → /tmp scratch raw.html      (bytes, not shown)
2 refimpl.r1 written to scratch (algorithm + normative step ORDER verbatim;
  the only non-ASCII codepoints in the script are runtime-identical ones: the
  MARKERS ✅/⛔ and the "HIẾN PHÁP" H1-anchor regex; Python-equivalent,
  output sha-gated)
3 python3 refimpl.py raw.html region.txt  → region.txt = normalized region
  (b_text, no trailing \n per D-TRAILNL); written by the script, shell→file
4 fixture assembled file→file: frontmatter+BEGIN  >  fixture ; cat region.txt
  >> fixture ; printf END >> fixture        (region bytes never tokenized)
5 only sha256 / length / integer marker counts were ever surfaced to context
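
Steps 4 and 5 can be pictured as a binary stream copy followed by a digest-only report. The following is a minimal sketch under stated assumptions, not the commands actually run: the header and footer bytes, the file names, and the helper names `assemble_fixture` and `surfaced_summary` are all hypothetical, and the region content is a made-up stand-in.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def assemble_fixture(header: bytes, region_path: Path, footer: bytes, out_path: Path) -> None:
    """Concatenate header + region file + footer without decoding the region."""
    with out_path.open("wb") as out, region_path.open("rb") as region:
        out.write(header)
        shutil.copyfileobj(region, out)  # byte-for-byte stream copy
        out.write(footer)


def surfaced_summary(path: Path) -> dict:
    """Return only a digest and a byte size, never the content itself."""
    data = path.read_bytes()
    return {"sha256": hashlib.sha256(data).hexdigest(), "bytes": len(data)}


with tempfile.TemporaryDirectory() as tmp:
    region = Path(tmp) / "region.txt"
    fixture = Path(tmp) / "fixture.md"
    # Made-up stand-in content; the real region is never typed by hand.
    region.write_bytes("HI\u1ebeN PH\u00c1P \u2705".encode("utf-8"))
    # Hypothetical frontmatter/BEGIN/END bytes, for illustration only.
    assemble_fixture(b"---\nhypothetical: frontmatter\n---\nBEGIN\n", region, b"\nEND\n", fixture)
    print(surfaced_summary(region))
    print(surfaced_summary(fixture)["bytes"])
```

Opening both files in binary mode is the point of the design: no decode/encode round trip happens on the region, so no text layer has a chance to substitute a codepoint.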

This is not "hand-editing Unicode to force a hash": the region is the deterministic output of the ratified parser over the live source, gated by an exact sha256. Worst case of any script transcription error = fail-closed (anchor miss / wrong markers / wrong sha) → STOP, never a wrong fixture.
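
To see why an exact sha256 gate matters here, consider a same-width substitution: one codepoint swapped for a look-alike leaves the codepoint length unchanged, so a length check alone would pass, while the digest changes. The example below is a toy illustration; the specific substitution (U+1EBE replaced by U+00CA) is an assumed example, not the corruption observed in the prior fixture.

```python
import hashlib


def sha(text: str) -> str:
    """sha256 hex digest of the UTF-8 encoding of `text`."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


original = "HI\u1ebeN PH\u00c1P"  # U+1EBE and U+00C1, both precomposed
# Assumed example of a same-width corruption: one codepoint for one codepoint.
corrupted = original.replace("\u1ebe", "\u00ca")

assert len(original) == len(corrupted)   # the length check cannot see it
assert sha(original) != sha(corrupted)   # the digest check does
print(len(original), sha(original)[:8], sha(corrupted)[:8])
```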

3. Execution evidence

live_fetch:        HTTP 200, redirects 0, raw_bytes 1,251,374,
                   raw_sha256 b7d04a43ec674b7a533d0d2aa982ed18b8408fd949f86f05c8882984b3f2aace
                   (raw is Nuxt-render-volatile / forensic-only by design)
parser_status:     OK
parser_reported:   checksum 17660443…cae80c · length 17522 ·
                   markers {enacted:19,controlled_draft:1,draft:1,obsolete:1} ·
                   A_minus_B 329 (span geometry consistent with KB 329 band)
independent_recompute (region.txt):
  region_sha256          = 17660443e0f23e994e1807cf8e22920951a9e70c598956dbd0e752f4f5cae80c  ✓ EXACT
  region_codepoint_len   = 17522                                                            ✓
  region_trailing_newline= False  (D-TRAILNL honored)
  region_is_NFC          = True
  region_markers         = {enacted:19, controlled_draft:1, draft:1, obsolete:1} ✓
content_drift_since_pin:  NONE — normalized region byte-identical to the pinned
                          canonical despite raw render noise (living-doc stable).
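
An independent recompute of this shape can be sketched as follows. This is a hedged illustration, not the script that produced the figures above: it assumes the digest is taken over the file's UTF-8 bytes, the function name `recompute` is hypothetical, and the marker names `check`/`stop` are placeholders because the log does not state which glyph maps to which status.

```python
from __future__ import annotations

import hashlib
import unicodedata


def recompute(region_bytes: bytes, marker_tokens: dict[str, str]) -> dict:
    """Recompute digest, codepoint length, newline/NFC flags and marker counts."""
    text = region_bytes.decode("utf-8")
    return {
        "sha256": hashlib.sha256(region_bytes).hexdigest(),
        "codepoint_len": len(text),
        "trailing_newline": text.endswith("\n"),
        "is_nfc": unicodedata.is_normalized("NFC", text),
        "markers": {name: text.count(tok) for name, tok in marker_tokens.items()},
    }


# Made-up sample region; placeholder marker names (mapping is an assumption).
sample = "\u2705 a\n\u2705 b\n\u26d4 c".encode("utf-8")
report = recompute(sample, {"check": "\u2705", "stop": "\u26d4"})
assert report["markers"] == {"check": 2, "stop": 1}
assert report["trailing_newline"] is False
print(report["codepoint_len"], report["is_nfc"])
```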

4. Fixture placement (+ flagged path-coupling discrepancy)

mandated_path (ruling):  tests/fixtures/constitution-normalized-17660443e0f23e99.md   → written
test_resolution_path:    the RATIFIED, UNMODIFIED test file (sha 454d9fc8) resolves
  ART = Path(__file__).resolve().parents[1] / "constitution-normalized-17660443e0f23e99.md"
  = REPO ROOT, NOT tests/fixtures/. (verified on disk this phase)
conflict: the ruling mandates tests/fixtures/ AND an unmodified test AND full 21/21;
  these three requirements are mutually inconsistent for the ratified test as
  written (it reads from repo-root).
resolution_taken: provisioned the byte-exact fixture at BOTH paths, byte-identical
  (file-level sha 5c76eedd… on both; copied file→file from the same region):
   - tests/fixtures/constitution-normalized-17660443e0f23e99.md  (canonical, per ruling)
   - constitution-normalized-17660443e0f23e99.md  (repo-root; the path the
     ratified test actually consumes — required to satisfy "full 21/21")
  test file NOT modified (454d9fc8 preserved); dryrun.py NOT modified.
  Both untracked, no commit, fully reversible.
ROUTED TO GPT (pre-flagged in readiness-audit doc 3 §2.3): rule the canonical
  coupling — (a) amend the test ART → tests/fixtures/ in a later gated phase
  (changes test hash → re-ratify), or (b) declare repo-root canonical and keep
  tests/fixtures/ as the stored copy. NOT self-decided here.
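
The path coupling can be reproduced with pure path arithmetic. The sketch assumes the test file sits directly inside a `tests/` directory under the repo root; the repo location `/repo` and the file name `test_snapshot.py` are hypothetical.

```python
from pathlib import PurePosixPath

NAME = "constitution-normalized-17660443e0f23e99.md"

# Hypothetical location of the ratified test file.
test_file = PurePosixPath("/repo/tests/test_snapshot.py")

# parents[0] is the containing directory, parents[1] is one level above it.
art = test_file.parents[1] / NAME
mandated = test_file.parents[0] / "fixtures" / NAME

assert art == PurePosixPath("/repo") / NAME                    # repo root
assert mandated == PurePosixPath("/repo/tests/fixtures") / NAME
assert art != mandated                                          # the two paths differ
print(art)
print(mandated)
```

Under that assumption, option (a) would amount to changing `parents[1]` to `parents[0] / "fixtures"` in the test, which is exactly the edit that alters the test hash.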

5. Gate verification (the REAL gate — dryrun.py)

D.extract_region + D.snapshot_gate(region, 17660443…, 17522,
  {enacted:19,controlled_draft:1,draft:1,obsolete:1}) — raises FailClosed on
  ANY mismatch — returned GATE_PASS for BOTH files:
  tests/fixtures/…  => 17660443e0f23e99 17522 {19,1,1,1}
  repo-root      …  => 17660443e0f23e99 17522 {19,1,1,1}
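
For readers unfamiliar with the fail-closed pattern, a gate of this kind can be sketched as below. This is not the code in dryrun.py: `gate_sketch`, its parameters, and the marker-token mapping are hypothetical, and only the behaviour described above (raise on ANY mismatch, otherwise report a pass) is modelled.

```python
import hashlib


class FailClosed(Exception):
    """Raised on any mismatch; the caller must stop rather than continue."""


def gate_sketch(region: str, want_sha: str, want_len: int,
                want_markers: dict, tokens: dict) -> str:
    """Compare digest, codepoint length and marker counts; raise on any mismatch."""
    got_sha = hashlib.sha256(region.encode("utf-8")).hexdigest()
    if got_sha != want_sha:
        raise FailClosed(f"sha mismatch: got {got_sha[:8]}")
    if len(region) != want_len:
        raise FailClosed(f"length mismatch: got {len(region)}")
    got_markers = {name: region.count(tok) for name, tok in tokens.items()}
    if got_markers != want_markers:
        raise FailClosed(f"marker mismatch: got {got_markers}")
    return "GATE_PASS"


# Made-up sample; placeholder marker name (the real mapping is not stated here).
sample = "\u2705 rule one\n\u2705 rule two"
pinned = hashlib.sha256(sample.encode("utf-8")).hexdigest()
tokens = {"check": "\u2705"}
assert gate_sketch(sample, pinned, len(sample), {"check": 2}, tokens) == "GATE_PASS"
```

There is deliberately no "warn and continue" branch: every comparison either passes or raises, so a wrong fixture cannot be accepted silently.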

Scratch (/tmp/iucut-bytesafe.*: raw.html, refimpl.py, region.txt, fixture.md) shredded after assembly+verification. No secrets recorded.

doc 1 of 4. Self-advance PROHIBITED.

knowledge/dev/laws/dieu44-trien-khai/v0.5-snapshot-mark-byte-safe-fixture-full-ci/dot-iu-cutter-v0.5-snapshot-mark-byte-safe-fixture-log-2026-05-18.md