KB-1683

dot-iu-cutter v0.5 — Test+Fixture Mismatch Analysis (test file ≡ KB doc-2 verbatim ⇒ ratify 454d9fc8; 31143968 not recoverable; fixture corrupted by transport ⇒ strategy A base64 blob)

Revision 1


Phase: …_implementation_readiness_audit · Nature: analysis_only · Date: 2026-05-18 · doc 3 of 5 · satisfies QG3 (test-hash decision) + QG4 (fixture strategy)


1. Test file mismatch (tests/test_dryrun_snapshot_mark.py)

current_sha256:  454d9fc84e940fdcf9da10bf29d12c5c420e21b1147ccc8da6a29a81f2843a4a
KB_declared:     31143968f322433cc5da62fa3ccf2a1fbe1905f461940c789a57cb0a116dc1b4
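The comparison above can be reproduced with a few lines of Python. This is a minimal sketch, not the audit's own tooling: `sha256_file` and `hash_status` are hypothetical helper names, and the only assumption is that the hash-of-record is the SHA-256 of the raw on-disk bytes.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Hash the raw on-disk bytes (no text decoding, so no newline translation)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def hash_status(path: Path, declared: str) -> tuple[bool, str]:
    """Return (matches_declared, actual_hex) for a file and a declared hex digest."""
    actual = sha256_file(path)
    return actual == declared.strip().lower(), actual
```

Reading in binary mode matters here: opening the file in text mode could translate line endings on some platforms and hide exactly the kind of byte drift this document is about.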

1.1 Is current 454d9fc8 semantically equivalent to KB doc-2?

YES. The prior phase performed a Read-level side-by-side comparison of the repo file (219 lines) against the verbatim Python block in KB doc-2 and found them line-for-line identical: same docstring, imports, ART/SHA/EM constants, _Args, TestGate(5), TestManifest(9), TestFailClosedSynthetic(4), TestNoDbImportIsolation(3); identical assertions, synthetic fixtures, and Vietnamese/em-dash/emoji literals. No logic was edited and no test was added or removed. File hygiene: pure LF, no trailing whitespace, no tabs, single final newline, UTF-8, is_NFC. Corroborated in this audit: the same KB doc-2 also yields cutter_agent/dryrun.py byte-exact (f1f42e83…), and 7/7 fixture-independent tests PASS against that byte-exact module. The test suite is therefore the authored suite, exercising correct semantics.
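The file-hygiene properties listed above are mechanically checkable. The following is a sketch of one way to do it; `hygiene_report` is a hypothetical name, and the exact definitions (for example, that "trailing whitespace" means spaces or tabs before a line feed) are assumptions rather than the audit's stated rules.

```python
import unicodedata


def hygiene_report(raw: bytes) -> dict[str, bool]:
    """Check the hygiene properties on raw file bytes."""
    # Decoding raises UnicodeDecodeError if the bytes are not valid UTF-8.
    text = raw.decode("utf-8")
    lines = text.split("\n")
    return {
        "pure_lf": b"\r" not in raw,
        "no_trailing_ws": all(line == line.rstrip(" \t") for line in lines),
        "no_tabs": "\t" not in text,
        "single_final_newline": text.endswith("\n") and not text.endswith("\n\n"),
        "is_nfc": unicodedata.is_normalized("NFC", text),
    }
```

A file passes when every value in the returned dict is `True`. Passing these checks says nothing about identity with the declared hash; it only rules out the common whitespace and normalization explanations.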

1.2 Is 31143968 recoverable?

NO. A bounded deterministic experiment (no character guessing) over the faithfully transcribed KB block, covering as-is, ±final newline, per-line rstrip, CRLF, and strip-all-trailing-newlines, produced only 454d9fc8, 3c268b36, 0c237699, and 6da558f7; none equals 31143968. 31143968 was computed on a pre-embed scratch copy whose exact blank-line layout was normalized when the code-authoring agent embedded the block into KB markdown. The same transport class left dryrun.py (ASCII/LF-dominant) unaffected but cost the test file its declared hash. 31143968 is therefore not byte-recoverable from the KB.
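A bounded experiment of this shape can be sketched as follows. The variant set mirrors the transformations named above, but the precise definition of each variant is an assumption, and `variant_hashes` / `find_match` are hypothetical names, not the tooling used in the audit.

```python
import hashlib


def variant_hashes(block: str) -> dict[str, str]:
    """SHA-256 of a fixed, enumerable set of whitespace variants of a text block."""
    lines = block.split("\n")
    variants = {
        "as_is": block,
        "plus_final_newline": block + "\n",
        "minus_final_newline": block[:-1] if block.endswith("\n") else block,
        "per_line_rstrip": "\n".join(line.rstrip() for line in lines),
        # Normalize to LF first so existing CRLF pairs are not doubled.
        "crlf": block.replace("\r\n", "\n").replace("\n", "\r\n"),
        "strip_all_trailing_newlines": block.rstrip("\n"),
    }
    return {
        name: hashlib.sha256(text.encode("utf-8")).hexdigest()
        for name, text in variants.items()
    }


def find_match(block: str, declared: str) -> list[str]:
    """Names of the variants whose digest equals the declared digest."""
    return [name for name, digest in variant_hashes(block).items() if digest == declared]
```

Several variants can collapse to the same bytes (for a block with one final newline, removing it and stripping all trailing newlines coincide), which is why a handful of transformations can yield fewer distinct digests than variant names. An empty result from `find_match` means only that none of the enumerated variants reproduces the declared hash.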

1.3 Recommendation — TEST HASH RULING

recommend: GPT/User RATIFY 454d9fc8…f2843a4a as the new canonical
           hash-of-record for tests/test_dryrun_snapshot_mark.py.
basis:     GPT's own "OPT_2 conditionally_allowed_after_side_by_side_review";
           side-by-side DONE (≡ verbatim); dryrun.py byte-exact; 7/7 pass;
           zero semantic divergence; 31143968 unrecoverable.
action_on_ratify: update KB hash-of-record 31143968 → 454d9fc8; thereafter
           ALSO publish the test file as a base64 blob (doc 2 / standard S2)
           so the ratified bytes are durably re-applyable without re-drift.

2. Fixture transport (pinned snapshot)

2.1 Why markdown/MCP transport changes the snapshot bytes

The pinned artifact was byte-correct when written (capture-phase CP-7 rehash PASSED: region sha == 17660443…). Corruption is in the read-back / re-author channel, not the stored identity:

mechanism: MCP delivers KB markdown as text → context → Write. The normalized
  region is dense unicode (U+2192 ×65, U+2014 ×42, U+2705 ×19, U+1F4CB/U+1F4DD/
  U+26D4, §, ·). At least 3× U+2013 EN-DASH appear where the pinned identity
  implies a different equal-width character. Because each substitution swaps
  one codepoint for one codepoint, region_length (17522) and marker census
  (19/1/1/1) stay invariant while sha256 changes (86d6aea7 ≠ 17660443).
not_the_cause: NOT Unicode normalization (NFC/NFD/NFKC/NFKD all fail to
  recover 17660443; region already is_NFC). NOT a source/markers/span drift.
  NOT a code defect (dryrun.extract_region is correct; proven on synthetic).
class: lossy same-width codepoint substitution by the text transport channel
  for whitespace-/codepoint-significant content (doc 2 root cause).
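The failure class described above is easy to reproduce on a toy string. In this sketch the "intended" character is assumed to be U+2014 EM DASH purely for illustration (the document does not name it), and the arrow and check-mark counts stand in for the real marker census, whose definition is not given here.

```python
import hashlib
import unicodedata

EM_DASH = "\u2014"  # assumed intended character, for illustration only
EN_DASH = "\u2013"  # the substitute reported in the corrupted region


def census(text: str) -> dict[str, int]:
    """Length and marker counts that a same-width substitution cannot change."""
    return {
        "codepoints": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        "arrows": text.count("\u2192"),
        "checks": text.count("\u2705"),
    }


def sha256_text(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


original = "A \u2192 B \u2014 done \u2705\n"
transported = original.replace(EM_DASH, EN_DASH)

same_census = census(original) == census(transported)
same_hash = sha256_text(original) == sha256_text(transported)
recovered_by_normalization = any(
    unicodedata.normalize(form, transported) == original
    for form in ("NFC", "NFD", "NFKC", "NFKD")
)
```

Here `same_census` is true while `same_hash` is false, and no Unicode normalization form maps the en dash back, which matches the pattern of invariant length and census with a changed sha256.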

2.2 Strategy options

A store fixture as base64 blob in KB, decode into repo (gated):
  + ASCII payload immune to ws-normalization & unicode substitution
  + deterministic; region rehash provable == 17660443 BEFORE any test trusts it
  + no network, no live-page dependency, no refimpl re-run
  + matches GPT R2/R3 and doc-2 transport standard exactly
  - one-time: a byte-faithful base64 of the pinned artifact must be produced by
    a byte-trusted path (the capture environment / a tool that base64-encodes
    the on-disk pinned file, not retyped text)
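Strategy A's one-time cost, producing the base64 from on-disk bytes rather than from retyped text, could look like the sketch below. The blob layout (a `sha256` field plus a wrapped `base64` field) and the names `encode_blob` / `decode_blob` are assumptions for illustration, not the doc-2 transport standard itself.

```python
import base64
import hashlib
from pathlib import Path


def encode_blob(path: Path, width: int = 76) -> dict[str, str]:
    """Base64-encode the file's raw bytes and record their SHA-256 as the address."""
    raw = path.read_bytes()  # bytes straight from disk, never via a text channel
    b64 = base64.b64encode(raw).decode("ascii")
    wrapped = "\n".join(b64[i:i + width] for i in range(0, len(b64), width))
    return {"sha256": hashlib.sha256(raw).hexdigest(), "base64": wrapped}


def decode_blob(blob: dict[str, str]) -> bytes:
    """Decode a blob and refuse to return bytes that do not match its address."""
    # Dropping all whitespace makes the payload tolerant of re-wrapping in transit.
    compact = "".join(blob["base64"].split())
    raw = base64.b64decode(compact, validate=True)
    if hashlib.sha256(raw).hexdigest() != blob["sha256"]:
        raise ValueError("decoded bytes do not match the recorded sha256")
    return raw
```

The payload consists only of ASCII letters, digits, `+`, `/`, `=`, and newlines, so whitespace normalization and same-width Unicode substitution have nothing to act on. The hash check catches any remaining damage.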

B store fixture compressed+base64 (gzip|b64):
  + smaller payload
  - same trust requirement as A, plus extra codec surface; marginal benefit
    (artifact ~21 KB); A is simpler and more auditable. Not preferred.
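One concrete instance of the "extra codec surface" in option B: a gzip stream carries a timestamp in its header, so the encoded payload is only reproducible if that field is pinned. The sketch below uses `mtime=0` for this (available in `gzip.compress` since Python 3.8); `pack` and `unpack` are hypothetical names.

```python
import base64
import gzip


def pack(raw: bytes) -> str:
    """gzip then base64, with the gzip header timestamp pinned to zero."""
    return base64.b64encode(gzip.compress(raw, mtime=0)).decode("ascii")


def unpack(payload: str) -> bytes:
    """Inverse of pack: base64-decode, then gunzip."""
    return gzip.decompress(base64.b64decode(payload, validate=True))
```

Even with the timestamp pinned, the compressed bytes may differ between zlib versions or compression levels, so under option B the identity check would still have to run on the decompressed bytes rather than on the payload.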

C regenerate fixture locally from refimpl.r1, verify canonical hash:
  + reconstructs from first principles
  - requires live Nuxt page fetch (network) + refimpl execution; page is a
    LIVING doc with prior observed KB-revision drift; "v4.6.3" stable but raw
    is render-volatile; reintroduces the exact divergence risk that 17660443
    was pinned to freeze. Heaviest, most failure modes. Fallback only.

D do not store full fixture — use a generated/synthetic fixture in tests:
  + zero transport of the dense artifact
  - the WHOLE POINT of TestGate/TestManifest is identity over the REAL pinned
    region (sha==17660443, the 15/3/42 + Đ44 cascade). Synthetic already
    covered by TestFailClosedSynthetic. D would delete real-snapshot coverage.
    Rejected for the identity suite (acceptable only as an explicitly-labelled
    interim, see doc 4 CI standard).

2.3 Recommendation — FIXTURE STRATEGY RULING

choose: A — base64 content-addressed blob in KB, gated decode into the repo
        test path, with mandatory post-decode region rehash == 17660443…
        (len 17522, markers 19/1/1/1) else STOP_AND_REPORT.
tradeoff_accepted: requires one byte-trusted base64 production of the pinned
        on-disk artifact (NOT retyped) — a one-time, auditable, deterministic
        cost that permanently removes the recurring transport blocker. C kept
        as documented fallback ONLY if a byte-trusted base64 cannot be produced.
fixture_repo_path (when later applied): the test resolves
  ART = Path(__file__).resolve().parents[1] / "constitution-normalized-17660443e0f23e99.md"
  ⇒ either the byte-identity fixture lands at REPO ROOT (not tests/fixtures/),
  or the ART path in the test file is changed, which alters the ratified bytes.
  Flag for GPT: decide root-path vs tests/fixtures/ + matching ART (coupled to
  §1.3).
current_nonidentity_fixture: quarantined at tests/fixtures/ (region 86d6aea7);
  NOT wired, root path absent ⇒ no false PASS. Leave as-is until ruling.
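The gated decode in the ruling above can be sketched as a function that writes nothing unless the region rehash passes. Several things here are assumptions: the region extractor is passed in as a callable because the signature of dryrun.extract_region is not shown in this document; the region digest is taken over the UTF-8 encoding of the extracted text; and the marker census check is omitted because the marker definitions are not given here. `StopAndReport` and `gated_decode` are hypothetical names.

```python
import base64
import hashlib
from pathlib import Path
from typing import Callable


class StopAndReport(RuntimeError):
    """Raised when the decoded fixture fails the identity gate."""


def gated_decode(
    b64_payload: str,
    target: Path,
    extract_region: Callable[[str], str],
    expected_sha256: str,
    expected_len: int,
) -> Path:
    """Decode the blob, verify the region identity, and only then write the file."""
    raw = base64.b64decode("".join(b64_payload.split()), validate=True)
    region = extract_region(raw.decode("utf-8"))
    digest = hashlib.sha256(region.encode("utf-8")).hexdigest()
    if digest != expected_sha256 or len(region) != expected_len:
        # Nothing has been written at this point, so no test can trust bad bytes.
        raise StopAndReport(
            f"region sha256={digest} len={len(region)}; "
            f"expected sha256={expected_sha256} len={expected_len}"
        )
    target.write_bytes(raw)
    return target
```

In use, `expected_sha256` would be the full pinned region digest and `expected_len` the pinned region length, with `target` set to whichever location the root-path versus tests/fixtures/ ruling selects. Because the write happens after the check, a failed gate leaves the target path absent, preserving the "no false PASS" property of the current quarantine.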

doc 3 of 5. Self-advance PROHIBITED.
