dot-iu-cutter v0.5 — Byte Artifact Transport Standard (classifies every byte-sensitive artifact; markdown-unsafe ⇒ base64/content-addressed blob; required hashes + storage paths)
Phase: …_implementation_readiness_audit_and_byte_artifact_transport_standard · Nature: analysis_only · Date: 2026-05-18 · doc 2 of 5 · satisfies QG2 (all byte-sensitive artifacts classified)
1. Root cause (why a standard is needed)
KB documents are stored and delivered as markdown text through the agent-data MCP. Empirical evidence from prior phases and from this audit:
code_artifact (dryrun.py): markdown-fenced transport IS byte-faithful ⇒ reproduced f1f42e83… EXACT from KB doc-2. (ASCII-dominant, LF, no dense unicode)
test_artifact (test file): markdown-fenced transport NOT byte-faithful for the declared hash ⇒ 454d9fc8 ≠ 31143968 (whitespace/blank-line layout lost at KB-authoring embed; no deterministic ws/nl/CRLF transform recovers 31143968).
fixture_artifact (snapshot): markdown/MCP transport NOT byte-faithful ⇒ region 86d6aea7 ≠ pinned 17660443 while length 17522 + markers 19/1/1/1 invariant. Mechanism: ≥3× same-width codepoint substitution (U+2013 EN-DASH class) in a dense-unicode region (U+2192 ×65, U+2014 ×42, U+2705 ×19, emoji, §). NFC/NFD/NFKC/NFKD do NOT recover identity; it is a transport substitution, not a normal-form artifact.
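The fixture finding can be illustrated on a small, made-up sample (not the real fixture region). The sketch below assumes an EM DASH to EN DASH swap as a stand-in for the observed substitution; the sample string and variable names are hypothetical. It checks that length is preserved, the sha256 changes, and none of the four Unicode normal forms restores the original.

```python
import hashlib
import unicodedata


def sha256_hex(text: str) -> str:
    """sha256 of the UTF-8 encoding of text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Illustrative sample only: EM DASH, RIGHTWARDS ARROW, CHECK MARK, SECTION SIGN.
original = "Rule 1 \u2014 input \u2192 output \u2705 \u00a7 2"

# Assumed stand-in for the transport damage: EM DASH (U+2014) -> EN DASH (U+2013).
damaged = original.replace("\u2014", "\u2013")

# Both dashes are one codepoint and three UTF-8 bytes, so length is invariant.
same_codepoint_count = len(original) == len(damaged)
same_byte_length = len(original.encode("utf-8")) == len(damaged.encode("utf-8"))

# The hash still changes, because the bytes differ.
hash_changed = sha256_hex(original) != sha256_hex(damaged)

# Neither dash has a canonical or compatibility decomposition,
# so no normal form maps the damaged text back to the original.
recovered_by = [
    form
    for form in ("NFC", "NFD", "NFKC", "NFKD")
    if unicodedata.normalize(form, damaged) == original
]

print(same_codepoint_count, same_byte_length, hash_changed, recovered_by)
```

A length or marker-count check alone would pass on the damaged text here, which is why the region hash is the only reliable identity check.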
Principle: any artifact whose acceptance depends on an exact sha256 of whitespace- or codepoint-significant content MUST NOT be transported as raw markdown text. It must travel as a base64 / content-addressed blob (ASCII-only payload, immune to whitespace normalization and unicode substitution), with the required hash published alongside and re-verified after decode.
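As a minimal sketch of this principle, the example below compares raw-text transport with base64 transport for a hypothetical payload. The payload and the simulated blank-line collapse are assumptions chosen for illustration, not the actual artifacts or the actual channel behaviour.

```python
import base64
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Hypothetical artifact: blank-line layout and one non-ASCII codepoint matter.
payload = "def f():\n\n\n    return 'a \u2014 b'\n".encode("utf-8")
declared = sha256_hex(payload)

# Raw-text transport: a channel that collapses blank lines breaks the hash.
raw_after_transport = payload.replace(b"\n\n\n", b"\n\n")
raw_hash_ok = sha256_hex(raw_after_transport) == declared

# Base64 transport: the blob is ASCII-only, so there is no layout or
# codepoint for the channel to rewrite; the hash is re-verified after decode.
blob = base64.b64encode(payload).decode("ascii")
blob_is_ascii = blob.isascii()
decoded = base64.b64decode(blob, validate=True)
b64_hash_ok = sha256_hex(decoded) == declared

print(raw_hash_ok, blob_is_ascii, b64_hash_ok)
```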
2. Byte-sensitive artifact classification
| # | Artifact | Byte-sensitive? | Markdown-safe? | Needs base64/content-addressed? | Required hash | Recommended KB storage path |
|---|---|---|---|---|---|---|
| A1 | cutter_agent/dryrun.py | Yes (source; sha-gated) | Observed yes (reproduced exact) | Recommended for durability; not strictly required (proven faithful once) | module sha256 f1f42e83…2efa1422 | KB doc-2 fenced block + (recommended) …/blobs/dryrun.py.b64 |
| A2 | tests/test_dryrun_snapshot_mark.py | Yes (sha-gated) | No (454d9fc8 ≠ 31143968) | YES | one of: ratified 454d9fc8…f2843a4a (recommended, doc 3) or recovered 31143968… | …/blobs/test_dryrun_snapshot_mark.py.b64 |
| A3 | pinned snapshot fixture constitution-normalized-17660443e0f23e99.md | Yes, critically (region sha == identity) | No (region 86d6aea7 ≠ 17660443) | YES (mandatory) | region sha256 17660443…cae80c (len 17522, markers 19/1/1/1) | …/blobs/constitution-normalized-17660443e0f23e99.md.b64 (decodes to whole-file; region rehash MUST equal 17660443) |
| A4 | manifest / expected-output fixtures (manifest.json, review_evaluation.json, coverage_proof.json, determinism_digest.md) | Would be Yes IF golden files existed | n/a | n/a (none exist) | n/a | NONE: tests assert structurally/in-memory (NT==15, KT==3, count∈[55,78], reconstruction_ok, deterministic digest); no golden output file is stored or transported. No A4 artifact to transport. |
| A5 | parser refimpl.r1 (nuxt-incomex-portal-constitution-v1.refimpl.r1) | Yes if executed to regenerate A3 | Python ⇒ likely faithful (A1-class) but NOT independently re-proven | Recommended if used for strategy C | reference_script_sha256 8f6220c9… (provenance only, NOT identity) | embedded in KB parser-refimpl doc; b64 mirror if used to regenerate |
| A6 | command-review package / GPT rulings / report docs | No (prose; semantic, not sha-gated) | Yes | No | n/a | normal KB markdown |
3. Standard (normative)
S1 transport_form:
- sha-gated code/test/fixture artifacts ⇒ store as base64 blob (RFC 4648, no line wrapping OR fixed 76-col, declared) in a fenced ```text block inside a dedicated KB blob doc; payload is pure ASCII.
S2 required_metadata (in the blob doc frontmatter/body):
- artifact_logical_path (repo-relative)
- decoded_sha256 (whole-file) AND, for the fixture, region_sha256 + region_length + marker_counts
- base64_sha256 (sha of the ASCII b64 text itself; detects transport damage to the blob doc before decode)
- encoder note (base64 std alphabet, wrap policy)
S3 apply_protocol (future gated phase, NOT now):
- fetch blob doc read-only; verify base64_sha256 of received text first
- decode to repo path; recompute decoded_sha256 (and region rehash for fixture)
- MUST equal the required hash byte-for-byte else STOP_AND_REPORT (no guessing)
S4 prohibition:
- never mutate characters to "force" a hash (whitespace/codepoint guessing); pre-existing controlling rule, reaffirmed.
S5 scope:
- applies to A1 (recommended), A2 (required), A3 (mandatory), A5 (if used); A4 not applicable (no golden files); A6 stays plain markdown.
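A minimal sketch of how the S1/S2 blob record might be produced is shown below. The function name `build_blob_record`, the `wrap_cols` parameter, and the `encoder_note` and `payload_b64` keys are hypothetical; only `artifact_logical_path`, `decoded_sha256` and `base64_sha256` are names taken from S2. The fixture-only fields (region_sha256, region_length, marker_counts) are left out because region extraction is not defined in this document.

```python
import base64
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_blob_record(artifact_logical_path: str, data: bytes, wrap_cols: int = 0) -> dict:
    """Encode data as RFC 4648 base64 and compute the S2 hashes.

    wrap_cols=0 means no line wrapping; wrap_cols=76 gives fixed 76-column lines.
    """
    b64 = base64.b64encode(data).decode("ascii")
    if wrap_cols:
        b64 = "\n".join(b64[i:i + wrap_cols] for i in range(0, len(b64), wrap_cols))
    return {
        "artifact_logical_path": artifact_logical_path,
        # Hash of the decoded (original) bytes: the artifact's required hash.
        "decoded_sha256": sha256_hex(data),
        # Hash of the ASCII blob text exactly as published, wrapping included.
        "base64_sha256": sha256_hex(b64.encode("ascii")),
        "encoder_note": f"base64 std alphabet (RFC 4648); wrap={wrap_cols or 'none'}",
        "payload_b64": b64,
    }


# Usage with placeholder bytes (not the real dryrun.py content).
sample = b"print('placeholder')\n" * 10
record = build_blob_record("cutter_agent/dryrun.py", sample, wrap_cols=76)
print(record["encoder_note"])
```

Because base64_sha256 covers the blob text as published, the wrap policy has to be fixed at encode time and declared; rewrapping later would change that hash even though the decoded bytes stay the same.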
4. Verification chain a future apply must satisfy
KB blob doc ──(read-only)──> verify base64_sha256(received ASCII) == declared
──(base64 -d)──> repo file
──(shasum)─────> decoded_sha256 == declared
fixture only:
──(D.extract_region + sha256)──> region_sha256 == 17660443… (len 17522,
markers 19/1/1/1) else STOP_AND_REPORT
base64 survives the markdown/MCP text channel unchanged: its payload uses only ASCII [A-Za-z0-9+/=], with no whitespace-significant layout and no unicode substitution surface, so this chain is the durable fix GPT flagged as R2/R3.
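The chain above can be sketched in memory as follows. This is an illustration under stated assumptions, not the gated apply step: it writes nothing to a repo path, `verify_blob` and `StopAndReport` are hypothetical names, and the region extractor is passed in by the caller because `D.extract_region` is not defined in this document.

```python
import base64
import hashlib
from typing import Callable, Optional


class StopAndReport(Exception):
    """Raised on any hash mismatch; the caller must report, not guess."""


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def verify_blob(
    received_b64: str,
    base64_sha256: str,
    decoded_sha256: str,
    extract_region: Optional[Callable[[bytes], bytes]] = None,
    region_sha256: Optional[str] = None,
) -> bytes:
    # Step 1: check the received blob text before decoding it.
    if sha256_hex(received_b64.encode("utf-8")) != base64_sha256:
        raise StopAndReport("base64_sha256 mismatch: blob doc damaged in transport")

    # Step 2: decode (newlines from a declared wrap policy are dropped first).
    decoded = base64.b64decode("".join(received_b64.split()), validate=True)
    if sha256_hex(decoded) != decoded_sha256:
        raise StopAndReport("decoded_sha256 mismatch")

    # Step 3 (fixture only): rehash the extracted region.
    if extract_region is not None:
        if sha256_hex(extract_region(decoded)) != region_sha256:
            raise StopAndReport("region_sha256 mismatch")

    return decoded


# Usage with placeholder data (not a real artifact).
data = b"header\nREGION-BODY\nfooter\n"
blob = base64.b64encode(data).decode("ascii")
result = verify_blob(blob, sha256_hex(blob.encode("ascii")), sha256_hex(data))
print(result == data)
```

Checking base64_sha256 first means a damaged blob doc is rejected before any bytes are decoded, and every failure path raises instead of attempting a repair, in line with S4.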
doc 2 of 5. Self-advance PROHIBITED.