KB-3501

Reserved-Token Rejection Policy (recheck-6 blocker A)

Revision 1

02 - Reserved-Token Rejection Policy (recheck-6 blocker A)

Load-bearing copy: doc 00 §Canonical hash encoding (FIX7-CANON-V1) → Field rejection policy. This doc is the rationale + worked proof. Decision: REJECT, never escape — none of the legitimate field values (document_ids, hex hashes, enum tokens, integers, booleans, sentinels) ever needs a TAB, LF, CR, NUL, backslash, or a structural sentinel, so rejecting them outright is safe and makes the TAB/LF-delimited records provably injective (no value can contain a separator). No escape syntax = nothing to interpret.
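Why rejection gives injectivity can be shown in a few lines. This is a minimal sketch, not the doc 00 encoder: it assumes fields are joined with TAB and each record is terminated by LF, and the function names (`naive_encode`, `encode_record`, `decode_record`) are hypothetical.

```python
# Assumed layout: fields joined by TAB, record terminated by LF.
FORBIDDEN_CHARS = ("\t", "\n", "\r", "\x00", "\\")


def naive_encode(fields):
    """Join without any rejection; shown only to exhibit the collision."""
    return "\t".join(fields) + "\n"


def encode_record(fields):
    """Join with TAB/LF after rejecting any value that carries a forbidden byte."""
    for value in fields:
        if any(ch in value for ch in FORBIDDEN_CHARS):
            raise ValueError("CANONICAL_FIELD_RESERVED_TOKEN_REJECTED")
    return "\t".join(fields) + "\n"


def decode_record(line):
    """Inverse of encode_record: drop the trailing LF, split on TAB."""
    return line[:-1].split("\t")


# Without rejection, two different two-field records share one encoding.
assert naive_encode(["a\tb", "c"]) == naive_encode(["a", "b\tc"])

# With rejection, every accepted record round-trips, so encoding is injective.
assert decode_record(encode_record(["x", "y"])) == ["x", "y"]
```

Because no accepted value can contain a separator, `decode_record` is a true inverse of `encode_record`, which is exactly the property an escape syntax would otherwise have to be trusted to preserve.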

Three layers (all fail-closed)

  1. Per-field whitelist grammar (anchored, full-string). A value that does not match → CANONICAL_FIELD_VALUE_GRAMMAR_REJECTED:

    field                        grammar
    document_id                  ^knowledge/dev/reports/architecture/[A-Za-z0-9._/-]+\.md$
    sha256_hex                   ^[0-9a-f]{64}$
    kb_revision                  ^[1-9][0-9]*$ or SELF_HOST_PIN_BY_EXCLUDE_REGION_HASH
    doc_status                   ^(ACTIVE_AUTHORITY|SUPERSEDED_NON_AUTHORITY)$
    boolean                      ^(true|false)$
    active_section_id_or_range   ^(WHOLE_DOCUMENT|WHOLE_DOCUMENT_MINUS_SUPERSEDED_FENCES|WHOLE_DOCUMENT_MINUS_EXCLUDE_AND_SUPERSEDED)$
    fence_range                  ^L[1-9][0-9]*-L[1-9][0-9]*$ (begin < end)
    superseded_id                ^<document_id>#S[1-9][0-9]*$
    marker_kind                  ^(DOC_STATUS|SUPERSEDED_BEGIN|SUPERSEDED_END|ENVELOPE_EXCLUDE_BEGIN|ENVELOPE_EXCLUDE_END|AUTHORITY_BOUNDARY)$
    marker_literal               ^<!--.*-->$ (the ONLY field permitted to carry marker structure; forbidden-byte layer still applies)
    fixed-constant fields        (digest_algorithm, full_document_hash_policy, canonical_encoding_version) must equal the pinned constant byte-for-byte
  2. Forbidden-byte rejection. Any value containing TAB 0x09, LF 0x0A, CR 0x0D, NUL 0x00, or backslash 0x5C → CANONICAL_FIELD_RESERVED_TOKEN_REJECTED.

  3. Forbidden reserved-token rejection. Any value other than marker_literal containing a structural sentinel → CANONICAL_FIELD_RESERVED_TOKEN_REJECTED. Forbidden list: <!-- ENVELOPE:EXCLUDE-BEGIN -->, <!-- ENVELOPE:EXCLUDE-END -->, <!-- SUPERSEDED_NON_AUTHORITY BEGIN, <!-- SUPERSEDED_NON_AUTHORITY END -->, and every domain tag (FIX7_*_V1). The bare tokens ACTIVE_AUTHORITY / SUPERSEDED_NON_AUTHORITY are permitted only as a doc_status value (layer 1).
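The three layers can be sketched as one fail-closed check. This is an illustration under assumptions, not the load-bearing implementation: the function name `validate_field`, the `"OK"` return value, the order in which statuses take precedence, and the reading of `FIX7_*_V1` as the regex `FIX7_[A-Za-z0-9_]*_V1` are all hypothetical. The handling of bare doc_status tokens in other fields is not modeled. `re.fullmatch` is used because Python's `$` also matches just before a trailing newline, which would defeat "anchored, full-string".

```python
import re

DOCUMENT_ID = r"knowledge/dev/reports/architecture/[A-Za-z0-9._/-]+\.md"

# Layer 1 grammars from the table; anchors are supplied by re.fullmatch.
GRAMMARS = {
    "document_id": DOCUMENT_ID,
    "sha256_hex": r"[0-9a-f]{64}",
    "kb_revision": r"[1-9][0-9]*|SELF_HOST_PIN_BY_EXCLUDE_REGION_HASH",
    "doc_status": r"ACTIVE_AUTHORITY|SUPERSEDED_NON_AUTHORITY",
    "boolean": r"true|false",
    "fence_range": r"L[1-9][0-9]*-L[1-9][0-9]*",
    "superseded_id": DOCUMENT_ID + r"#S[1-9][0-9]*",
    "marker_literal": r"<!--.*-->",
}

FORBIDDEN_CHARS = ("\t", "\n", "\r", "\x00", "\\")

RESERVED_LITERALS = (
    "<!-- ENVELOPE:EXCLUDE-BEGIN -->",
    "<!-- ENVELOPE:EXCLUDE-END -->",
    "<!-- SUPERSEDED_NON_AUTHORITY BEGIN",
    "<!-- SUPERSEDED_NON_AUTHORITY END -->",
)
# Assumption: the glob FIX7_*_V1 is read as this regex.
DOMAIN_TAG = re.compile(r"FIX7_[A-Za-z0-9_]*_V1")

GRAMMAR_REJECTED = "CANONICAL_FIELD_VALUE_GRAMMAR_REJECTED"
RESERVED_REJECTED = "CANONICAL_FIELD_RESERVED_TOKEN_REJECTED"


def validate_field(field, value):
    """Return a rejection status, or "OK" (hypothetical) if all layers pass."""
    # Layer 1: whitelist grammar; an unknown field name is also rejected.
    pattern = GRAMMARS.get(field)
    if pattern is None or re.fullmatch(pattern, value) is None:
        return GRAMMAR_REJECTED
    if field == "fence_range":
        begin, end = (int(part[1:]) for part in value.split("-"))
        if not begin < end:
            return GRAMMAR_REJECTED
    # Layer 2: forbidden bytes, applied to every field including marker_literal.
    if any(ch in value for ch in FORBIDDEN_CHARS):
        return RESERVED_REJECTED
    # Layer 3: structural sentinels, applied to every field except marker_literal.
    if field != "marker_literal":
        if any(tok in value for tok in RESERVED_LITERALS) or DOMAIN_TAG.search(value):
            return RESERVED_REJECTED
    return "OK"
```

Two cases show that layers 2 and 3 are not redundant with layer 1: `validate_field("marker_literal", "<!-- x\ty -->")` passes the grammar but is rejected for the TAB, and a document_id such as `knowledge/dev/reports/architecture/FIX7_X_V1.md` passes the grammar but is rejected for carrying a domain tag.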

No null / empty: null/absent → CANONICAL_FIELD_NULL_REJECTED (use NOT_APPLICABLE / SEAL_AT_CODEX_RECHECK_7 / NON_AUTHORITY_DIAGNOSTIC); empty string → CANONICAL_FIELD_EMPTY_REJECTED.
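A presence check of this kind would run before any grammar check. The sketch below assumes a record is a plain dict and that an absent key counts as null; `presence_status` and the field names in the usage lines are hypothetical, and which sentinel belongs to which field is not specified here.

```python
from typing import Optional

# Sentinels named in the policy; an author uses one of these instead of null.
SENTINELS = ("NOT_APPLICABLE", "SEAL_AT_CODEX_RECHECK_7", "NON_AUTHORITY_DIAGNOSTIC")


def presence_status(record: dict, field: str) -> Optional[str]:
    """Return a rejection status for null/absent or empty values, else None."""
    value = record.get(field)  # an absent key is treated the same as null
    if value is None:
        return "CANONICAL_FIELD_NULL_REJECTED"
    if value == "":
        return "CANONICAL_FIELD_EMPTY_REJECTED"
    return None


# Hypothetical record: one sentinel, one empty string, one missing field.
record = {"example_a": "NOT_APPLICABLE", "example_b": ""}
assert presence_status(record, "example_a") is None
assert presence_status(record, "example_b") == "CANONICAL_FIELD_EMPTY_REJECTED"
assert presence_status(record, "example_c") == "CANONICAL_FIELD_NULL_REJECTED"
```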

Any of the rejection statuses above → STOP authoring → T1 fix + fresh Codex recheck (G-CANONICAL-FIELD-REJECT).

The "one level up" hole I closed myself

Three manifest-bound fields were previously free prose (digest_algorithm, full_document_hash_policy, active_section_id_or_range). Free text is byte-exact but semantically loose — exactly the disguised-hardcode class. Fix: the first two are fixed constants (grammar = exact match), and active_section_id_or_range is a controlled vocabulary validated against the extractor's computed descriptor (doc 03). No free-text authority field remains.
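The two mechanisms can be sketched as follows. The pinned constant values are not stated in this doc, so the sketch takes them as parameters and the value used in the usage lines is a placeholder, not a real pin. The function names are hypothetical, and the computed descriptor is assumed to arrive from the doc 03 extractor as a plain string.

```python
ACTIVE_SECTION_VOCABULARY = frozenset({
    "WHOLE_DOCUMENT",
    "WHOLE_DOCUMENT_MINUS_SUPERSEDED_FENCES",
    "WHOLE_DOCUMENT_MINUS_EXCLUDE_AND_SUPERSEDED",
})


def constant_matches(value: str, pinned: str) -> bool:
    """Fixed-constant field: accept only a byte-for-byte match with the pin."""
    return value.encode("utf-8") == pinned.encode("utf-8")


def active_section_is_valid(declared: str, computed: str) -> bool:
    """Controlled vocabulary: the declared value must be a known token AND
    equal the descriptor the extractor computed independently."""
    return declared in ACTIVE_SECTION_VOCABULARY and declared == computed


# Placeholder pin, for illustration only.
PIN = "PINNED_VALUE_PLACEHOLDER"
assert constant_matches("PINNED_VALUE_PLACEHOLDER", PIN)
assert not constant_matches("pinned_value_placeholder", PIN)  # case differs

assert active_section_is_valid("WHOLE_DOCUMENT", "WHOLE_DOCUMENT")
# A valid token that disagrees with the extractor is still rejected.
assert not active_section_is_valid(
    "WHOLE_DOCUMENT", "WHOLE_DOCUMENT_MINUS_SUPERSEDED_FENCES"
)
```

The second check is what closes the hole: a value can no longer be correct merely by being well-formed, it also has to agree with what was computed.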

Computed evidence (doc 08, python == shasum)

TAB / LF / CR / NUL / backslash in a value → rejected; a reserved fence token inside a value → rejected; null → rejected; empty → rejected; and after enforcing the policy the membership digest still reproduces f2bda8effc7be19b54722828126b82d7d2d48bee5e5e5dc0c8f347ce210fe251. Each was executed, not asserted.
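The "python == shasum" cross-check can be sketched as below. The record layout (TAB-joined fields, LF-terminated, UTF-8) and the function name `membership_digest` are assumptions. The real membership records live in doc 08 and are not reproduced here, so this sketch does not regenerate the f2bda8ef… value; it only shows the shape of the computation.

```python
import hashlib


def canonical_bytes(records):
    """Serialize records under the assumed layout: TAB between fields, LF after each record."""
    return "".join("\t".join(fields) + "\n" for fields in records).encode("utf-8")


def membership_digest(records):
    """SHA-256 over the canonical bytes, as lowercase hex (matches the sha256_hex grammar)."""
    return hashlib.sha256(canonical_bytes(records)).hexdigest()


# Hypothetical records, for illustration only.
records = [["example_field", "true"], ["other_field", "42"]]
digest = membership_digest(records)
assert len(digest) == 64
assert digest == membership_digest(records)  # deterministic
```

To compare against shasum, write `canonical_bytes(records)` to a file in binary mode and run `shasum -a 256` on it; the hex string it prints should equal `membership_digest(records)`.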

knowledge/dev/reports/architecture/t1-fix7-blueprint-patch-after-codex-recheck-6-byte-exact-envelope-2026-06-09/02-reserved-token-rejection-policy.md