KB-262B

dot-iu-cutter v0.5 — Byte Artifact Transport Standard (classifies every byte-sensitive artifact; markdown-unsafe ⇒ base64/content-addressed blob; required hashes + storage paths)

Revision 1
Tags: dot-iu-cutter · v0.5 · byte-artifact-transport-standard · base64 · content-addressed · implementation-readiness-audit · dieu44 · 2026-05-18


Phase: …_implementation_readiness_audit_and_byte_artifact_transport_standard · Nature: analysis_only · Date: 2026-05-18 · doc 2 of 5 · satisfies QG2 (all byte-sensitive artifacts classified)


1. Root cause (why a standard is needed)

KB documents are stored and delivered as markdown text through the agent-data MCP. Empirical evidence from prior phases and this audit:

code_artifact (dryrun.py): markdown-fenced transport IS byte-faithful
  ⇒ reproduced f1f42e83… exactly from KB doc-2 (ASCII-dominant, LF line endings,
  no dense unicode).
test_artifact (test file): markdown-fenced transport is NOT byte-faithful for the
  declared hash ⇒ 454d9fc8 ≠ 31143968 (whitespace/blank-line layout was lost when
  the file was embedded during KB authoring; no deterministic whitespace/newline/
  CRLF transform recovers 31143968).
fixture_artifact (snapshot): markdown/MCP transport is NOT byte-faithful ⇒ region
  86d6aea7 ≠ pinned 17660443, while length 17522 and markers 19/1/1/1 are invariant.
  Mechanism: at least 3 same-width codepoint substitutions (U+2013 EN DASH class)
  in a dense-unicode region (U+2192 ×65, U+2014 ×42, U+2705 ×19, emoji, §).
  NFC/NFD/NFKC/NFKD do NOT recover identity: it is a transport substitution, not
  a normal-form artifact.
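The fixture mechanism can be reproduced in miniature. The sketch below uses only the Python standard library and a hypothetical sample line (not the real fixture region), and assumes an EM DASH (U+2014) to EN DASH (U+2013) swap as the representative same-width substitution. Both dashes encode to 3 bytes in UTF-8, so length is preserved while sha256 changes, and neither has a decomposition mapping, so no normalization form undoes the swap.

```python
import hashlib
import unicodedata


def sha256_hex(text: str) -> str:
    """sha256 of the UTF-8 encoding of `text`."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical sample line, not taken from the real fixture.
original = "Rule 1 \u2014 applies \u2192 all sections \u2705\n"
# Simulated transport damage: EM DASH (U+2014) replaced by EN DASH (U+2013).
received = original.replace("\u2014", "\u2013")

# Same codepoint count and same UTF-8 byte length (both dashes are 3 bytes) ...
assert len(received) == len(original)
assert len(received.encode("utf-8")) == len(original.encode("utf-8"))
# ... yet a different sha256.
assert sha256_hex(received) != sha256_hex(original)

# Neither dash has a decomposition mapping, so no normalization form turns
# U+2013 back into U+2014: re-normalizing cannot restore the original hash.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    assert sha256_hex(unicodedata.normalize(form, received)) != sha256_hex(original)
```

Length and marker counts are therefore weak evidence of identity; only the hash detects this class of damage.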

Principle: any artifact whose acceptance depends on an exact sha256 of whitespace- or codepoint-significant content MUST NOT be transported as raw markdown text. It must travel as a base64 / content-addressed blob (ASCII-only payload, immune to whitespace normalization and unicode substitution), with the required hash published alongside and re-verified after decode.
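A minimal sketch of why the base64 form satisfies this principle, assuming the RFC 4648 standard alphabet with no line wrapping (Python's standard `base64` module). The sample bytes are hypothetical; they deliberately mix trailing spaces, a blank CRLF line and dense unicode, which are exactly the features the markdown channel damaged.

```python
import base64
import hashlib
import re

# Hypothetical whitespace- and codepoint-significant content.
artifact_bytes = "def f():  \r\n\r\n    return '\u2192 \u2014 \u2705'\n".encode("utf-8")

# RFC 4648 standard alphabet, no line wrapping.
payload = base64.b64encode(artifact_bytes).decode("ascii")

# The payload uses only [A-Za-z0-9+/=]: no whitespace and no non-ASCII
# codepoints, so whitespace normalization and unicode substitution have nothing to act on.
assert re.fullmatch(r"[A-Za-z0-9+/]*={0,2}", payload)

# Strict decoding restores the exact bytes, so the whole-file sha256 survives.
decoded = base64.b64decode(payload, validate=True)
assert decoded == artifact_bytes
assert hashlib.sha256(decoded).hexdigest() == hashlib.sha256(artifact_bytes).hexdigest()
```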

2. Byte-sensitive artifact classification

| # | Artifact | Byte-sensitive? | Markdown-safe? | Needs base64/content-addressed? | Required hash | Recommended KB storage path |
|---|---|---|---|---|---|---|
| A1 | cutter_agent/dryrun.py | Yes (source; sha-gated) | Observed yes (reproduced exactly) | Recommended for durability; not strictly required (proven faithful once) | module sha256 f1f42e83…2efa1422 | KB doc-2 fenced block + (recommended) …/blobs/dryrun.py.b64 |
| A2 | tests/test_dryrun_snapshot_mark.py | Yes (sha-gated) | No (454d9fc8 ≠ 31143968) | YES | one of: ratified 454d9fc8…f2843a4a (recommended, doc 3) or recovered 31143968… | …/blobs/test_dryrun_snapshot_mark.py.b64 |
| A3 | pinned snapshot fixture constitution-normalized-17660443e0f23e99.md | Yes, critically (region sha == identity) | No (region 86d6aea7 ≠ 17660443) | YES (mandatory) | region sha256 17660443…cae80c (len 17522, markers 19/1/1/1) | …/blobs/constitution-normalized-17660443e0f23e99.md.b64 (decodes to the whole file; region rehash MUST equal 17660443) |
| A4 | manifest / expected-output fixtures (manifest.json, review_evaluation.json, coverage_proof.json, determinism_digest.md) | Would be Yes IF golden files existed | n/a | n/a (none exist) | n/a | NONE: tests assert structurally/in memory (NT==15, KT==3, count∈[55,78], reconstruction_ok, deterministic digest); no golden output file is stored or transported, so there is no A4 artifact to transport. |
| A5 | parser refimpl.r1 (nuxt-incomex-portal-constitution-v1.refimpl.r1) | Yes, if executed to regenerate A3 | Python ⇒ likely faithful (A1-class) but NOT independently re-proven | Recommended if used for strategy C | reference_script_sha256 8f6220c9… (provenance only, NOT identity) | embedded in KB parser-refimpl doc; b64 mirror if used to regenerate |
| A6 | command-review package / GPT rulings / report docs | No (prose; semantic, not sha-gated) | Yes | No | n/a | normal KB markdown |
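The "Markdown-safe?" column records an empirical observation: does a fenced block in the received KB doc hash to the declared value? A minimal sketch of that check follows. `fenced_blocks` and `observed_markdown_safe` are hypothetical helpers, the fence parsing is deliberately simplified (backtick fences only, no nesting), and `declared_sha256` must be the full 64-hex digest rather than the truncated prefixes shown in the table.

```python
from __future__ import annotations

import hashlib
import re

FENCE = "`" * 3  # built at runtime so this example contains no literal fence

# Opening fence line (with optional info string), lazily captured body,
# closing fence on its own line. The body keeps its trailing newline.
_FENCED = re.compile(rf"^{FENCE}[^\n]*\n(.*?)^{FENCE}[ \t]*$", re.M | re.S)


def fenced_blocks(markdown: str) -> list[str]:
    """Return the bodies of backtick-fenced blocks in received markdown text."""
    return _FENCED.findall(markdown)


def observed_markdown_safe(markdown: str, declared_sha256: str) -> bool:
    """True if any fenced block hashes (UTF-8) to the declared full sha256."""
    return any(
        hashlib.sha256(body.encode("utf-8")).hexdigest() == declared_sha256
        for body in fenced_blocks(markdown)
    )
```

A single passing check (as for A1) only shows that one fetch was faithful; it does not guarantee later fetches, which is why the table still recommends a blob mirror for A1.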

3. Standard (normative)

S1 transport_form:
  - sha-gated code/test/fixture artifacts ⇒ store as a base64 blob (RFC 4648,
    either unwrapped OR wrapped at a fixed 76 columns, with the wrap policy
    declared) in a fenced ```text block inside a dedicated KB blob doc; the
    payload is pure ASCII.
S2 required_metadata (in the blob doc frontmatter/body):
  - artifact_logical_path (repo-relative)
  - decoded_sha256 (whole-file) AND, for the fixture, region_sha256 + region_length
    + marker_counts
  - base64_sha256 (sha of the ASCII b64 text itself — detects transport damage
    to the blob doc before decode)
  - encoder note (base64 std alphabet, wrap policy)
S3 apply_protocol (future gated phase, NOT now):
  - fetch blob doc read-only; verify base64_sha256 of received text first
  - decode to repo path; recompute decoded_sha256 (and region rehash for fixture)
  - MUST equal the required hash byte-for-byte else STOP_AND_REPORT (no guessing)
S4 prohibition:
  - never mutate characters to "force" a hash (whitespace/codepoint guessing) —
    pre-existing controlling rule, reaffirmed.
S5 scope:
  - applies to A1 (recommended), A2 (required), A3 (mandatory), A5 (if used);
    A4 not applicable (no golden files); A6 stays plain markdown.

4. Verification chain a future apply must satisfy

KB blob doc ──(read-only)──> verify base64_sha256(received ASCII) == declared
            ──(base64 -d)──> repo file
            ──(sha256sum)──> decoded_sha256 == declared
  fixture only:
            ──(D.extract_region + sha256)──> region_sha256 == 17660443… (len 17522,
                                              markers 19/1/1/1) else STOP_AND_REPORT

base64 is effectively immune to the markdown/MCP text channel: its alphabet is ASCII [A-Za-z0-9+/=] only, so there is no whitespace-significant layout and no unicode substitution surface. This chain is therefore the durable fix for the issues GPT flagged as R2/R3.
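The apply side (S3 plus the chain above) might look like the following sketch. `verify_blob`, `StopAndReport` and the `extract_region` callback are hypothetical stand-ins; the callback plays the role of `D.extract_region`. Declared hashes must be full 64-hex digests, and the fixture's length and marker-count checks are left out because their exact definitions live in `D`. The strict decode (`validate=True`) assumes the no-wrap policy; a 76-col payload would first need its line breaks handled according to its declared wrap policy. Writing the returned bytes to the repo path is left to the caller.

```python
import base64
import binascii
import hashlib
from typing import Callable, Optional


class StopAndReport(Exception):
    """Hypothetical signal for STOP_AND_REPORT: halt and report, never guess."""


def _sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def verify_blob(received_payload: str,
                declared_base64_sha256: str,
                declared_decoded_sha256: str,
                extract_region: Optional[Callable[[bytes], bytes]] = None,
                declared_region_sha256: Optional[str] = None) -> bytes:
    """Run the verification chain; return decoded bytes only if every hash matches."""
    # 1. Check the received ASCII text before decoding (detects blob-doc damage).
    try:
        payload_bytes = received_payload.encode("ascii")
    except UnicodeEncodeError as exc:  # any non-ASCII codepoint is transport damage
        raise StopAndReport("payload is not pure ASCII") from exc
    got = _sha256(payload_bytes)
    if got != declared_base64_sha256:
        raise StopAndReport(f"base64_sha256 mismatch: got {got}")

    # 2. Strict decode: any character outside the base64 alphabet is an error.
    try:
        data = base64.b64decode(payload_bytes, validate=True)
    except binascii.Error as exc:
        raise StopAndReport("payload is not valid base64") from exc

    # 3. Whole-file hash of the decoded artifact.
    got = _sha256(data)
    if got != declared_decoded_sha256:
        raise StopAndReport(f"decoded_sha256 mismatch: got {got}")

    # 4. Fixture only: rehash the extracted region against the pinned identity.
    if extract_region is not None:
        got = _sha256(extract_region(data))
        if got != declared_region_sha256:
            raise StopAndReport(f"region_sha256 mismatch: got {got}")

    return data
```

Every mismatch raises instead of returning a best-effort result, which keeps the sketch consistent with S4's ban on forcing a hash.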

doc 2 of 5. Self-advance PROHIBITED.

KB path: knowledge/dev/laws/dieu44-trien-khai/v0.5-implementation-readiness-audit-byte-artifact-transport/dot-iu-cutter-v0.5-byte-artifact-transport-standard-2026-05-18.md