KB-77CA

FIX7-CANON-V1 Canonicalizer — Single Source of Truth (executable)

Revision 1
<!-- DOC_STATUS: LOAD_BEARING_SSOT_ARTIFACT (canonicalizer; pinned by canonicalizer_sha256 in doc 00 envelope; NOT an active_corpus membership member; hashed as full normalized content) -->

This document is the ONE load-bearing canonical contract (Constitution Article 14 / NT14). Every other description of canonicalization in the blueprint (doc 00 §Canonical hash encoding, the extractor and record-encoding sections, and all report docs) is NON_AUTHORITY_EXPLANATION: it explains this artifact and MUST NOT conflict with it. If any other description conflicts, this artifact wins and G-NO-DUPLICATE-CANONICAL-AUTHORITY fails closed. There is exactly one canonicalizer; no doc, spec, guard, or package may redefine canonicalization.

SSOT identity (pinned in the doc 00 envelope, MANIFEST_BOUND)

| field | value |
| --- | --- |
| canonicalizer_artifact_id | FIX7-CANON-V1-CANONICALIZER |
| canonicalizer_path | knowledge/dev/reports/architecture/t1-fix7-existing-system-refactor-execution-blueprint-2026-06-08/canonicalizer-fix7-canon-v1-ssot.md |
| canonicalizer_version | FIX7-CANON-V1 |
| canonicalizer_revision | SEAL_AT_CODEX_RECHECK_8 (platform revision of THIS artifact, computed by Codex at the seal; diagnostic+pin; never this artifact's own future revision recorded inside itself) |
| canonicalizer_sha256 | SEAL_AT_CODEX_RECHECK_8 (SHA-256 over this artifact's full MCP bytes, CRLF/CR→LF normalized, at canonicalizer_revision; this artifact does NOT contain its own hash → no self-reference) |
| nature | executable reference code (authoritative) + frozen test vectors |
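The canonicalizer_sha256 pin is described as a SHA-256 over the artifact bytes with CRLF/CR normalized to LF. A minimal sketch of that computation follows. The function name `normalized_sha256` is hypothetical, and the sketch assumes line-ending normalization is the only transformation applied (the identity table does not mention BOM handling or whitespace trimming, so none is done here).

```python
import hashlib


def normalized_sha256(raw: bytes) -> str:
    """Lowercase-hex SHA-256 over bytes after CRLF/CR -> LF normalization.

    Hypothetical helper: only line endings are normalized (assumption).
    """
    # Replace CRLF first so that a CRLF pair does not turn into two LFs.
    lf_only = raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return hashlib.sha256(lf_only).hexdigest()


if __name__ == "__main__":
    unix = b"line one\nline two\n"
    windows = b"line one\r\nline two\r\n"
    old_mac = b"line one\rline two\r"
    # All three spellings of the same text hash to the same digest.
    print(normalized_sha256(unix) == normalized_sha256(windows) == normalized_sha256(old_mac))
```

Normalizing before hashing means a checkout that rewrites line endings cannot change the pin on its own.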

Invocation contract

  • Command: python3 canonicalizer-fix7-canon-v1-ssot.py --selftest
  • Inputs: (a) for --selftest: none (vectors are embedded); (b) for production use: the raw UTF-8 bytes returned by mcp.get_document_for_rewrite(document_id) per active member, plus the explicit active-corpus membership list and the live envelope fields.
  • Outputs: lowercase-hex SHA-256 digests (membership, per-doc normalized_active_content_sha256, active_corpus_sha256, marker_fence_registry_sha256, superseded_boundary_sha256, guard_set_sha256, envelope_manifest_sha256, detached_seal_sha256) and, on any violation, a single fail-closed status string (below).
  • Exit code: 0 iff every embedded test vector passes; non-zero otherwise. A package MUST NOT proceed unless this artifact's --selftest exits 0 against the pinned canonicalizer_sha256.
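A caller that consumes the exit-code rule from the list above might wrap the invocation as sketched here. The helper name `selftest_passes` and the timeout value are assumptions, and the demo uses stand-in commands rather than the real script so that it runs anywhere.

```python
import subprocess
import sys


def selftest_passes(command: list[str], timeout_s: float = 60.0) -> bool:
    """Return True only when the command runs to completion and exits 0.

    Hypothetical wrapper: a launch failure or a timeout counts as a failure,
    so the caller fails closed.
    """
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_s
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Stand-ins for "python3 <canonicalizer script> --selftest".
    passing = [sys.executable, "-c", "raise SystemExit(0)"]
    failing = [sys.executable, "-c", "raise SystemExit(1)"]
    print(selftest_passes(passing), selftest_passes(failing))
```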

Frozen positive test vector (behavioural pin, reproducible now)

membership over the 10 canonical full doc_ids under FIX7_ACTIVE_AUTHORITY_MEMBERSHIP_V1 (ascending, LF-joined, trailing LF) == f2bda8effc7be19b54722828126b82d7d2d48bee5e5e5dc0c8f347ce210fe251 (identical under shasum -a 256 and python hashlib). Any implementation that does not reproduce this exact digest is non-conformant.
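The byte layout behind the membership digest can be sketched as follows, mirroring how the reference code's `digest` and `rec` functions compose a tag line and one LF-terminated record per id. The helper names and the two sample ids are hypothetical; the sketch does not reproduce the frozen digest, which depends on the 10 real doc_ids.

```python
import hashlib

MEMBERSHIP_TAG = "FIX7_ACTIVE_AUTHORITY_MEMBERSHIP_V1"


def membership_preimage(doc_ids) -> bytes:
    """Tag line, then each id in ascending order, every line LF-terminated."""
    # sorted() orders by code point, which equals byte order for ASCII ids.
    ordered = sorted(doc_ids)
    return "".join(line + "\n" for line in [MEMBERSHIP_TAG, *ordered]).encode("utf-8")


def membership_digest(doc_ids) -> str:
    """Lowercase-hex SHA-256 over the preimage above."""
    return hashlib.sha256(membership_preimage(doc_ids)).hexdigest()


if __name__ == "__main__":
    sample = ["example/b.md", "example/a.md"]  # hypothetical ids, not the real corpus
    print(membership_preimage(sample))
    # Input order does not matter because ids are sorted before hashing.
    print(membership_digest(sample) == membership_digest(list(reversed(sample))))
```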

Closed failure-status set (the ONLY allowed rejections)

CANONICAL_FIELD_RESERVED_TOKEN_REJECTED, CANONICAL_FIELD_VALUE_GRAMMAR_REJECTED, CANONICAL_FIELD_NULL_REJECTED, CANONICAL_FIELD_EMPTY_REJECTED, DOCUMENT_ID_ALIAS_REJECTED, DOCUMENT_ID_NOT_MCP_CANONICAL, DOCUMENT_ID_SCOPE_MISMATCH, MARKER_KIND_UNKNOWN, MARKER_LITERAL_MISMATCH, MARKER_LITERAL_NOT_ALLOWED, MARKER_KIND_LITERAL_INCONSISTENT, ACTIVE_SCOPE_MARKER_MISSING, ACTIVE_SCOPE_MARKER_DUPLICATE, FENCE_UNBALANCED, FENCE_NESTED_UNSUPPORTED, ACTIVE_SUPERSEDED_OVERLAP, SECTION_ID_MISMATCH, SECTION_RANGE_MISMATCH, EXCLUDE_REGION_UNBALANCED, MARKER_REGISTRY_MISMATCH, SEAL_HASH_GRAPH_CYCLE. Any other behaviour is a defect.
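One way a consumer could enforce the closed set is to treat any status outside it as a defect rather than as a rejection. The sketch below copies the 21 status strings listed above; the names `ALLOWED_STATUSES`, `UnknownStatusDefect`, and `check_status` are hypothetical.

```python
ALLOWED_STATUSES = frozenset({
    "CANONICAL_FIELD_RESERVED_TOKEN_REJECTED", "CANONICAL_FIELD_VALUE_GRAMMAR_REJECTED",
    "CANONICAL_FIELD_NULL_REJECTED", "CANONICAL_FIELD_EMPTY_REJECTED",
    "DOCUMENT_ID_ALIAS_REJECTED", "DOCUMENT_ID_NOT_MCP_CANONICAL",
    "DOCUMENT_ID_SCOPE_MISMATCH", "MARKER_KIND_UNKNOWN", "MARKER_LITERAL_MISMATCH",
    "MARKER_LITERAL_NOT_ALLOWED", "MARKER_KIND_LITERAL_INCONSISTENT",
    "ACTIVE_SCOPE_MARKER_MISSING", "ACTIVE_SCOPE_MARKER_DUPLICATE",
    "FENCE_UNBALANCED", "FENCE_NESTED_UNSUPPORTED", "ACTIVE_SUPERSEDED_OVERLAP",
    "SECTION_ID_MISMATCH", "SECTION_RANGE_MISMATCH", "EXCLUDE_REGION_UNBALANCED",
    "MARKER_REGISTRY_MISMATCH", "SEAL_HASH_GRAPH_CYCLE",
})


class UnknownStatusDefect(Exception):
    """Raised when a status string is outside the closed set (hypothetical type)."""


def check_status(status: str) -> str:
    """Return the status unchanged if it is in the closed set, else raise."""
    if status not in ALLOWED_STATUSES:
        raise UnknownStatusDefect(f"status outside closed set: {status!r}")
    return status


if __name__ == "__main__":
    print(check_status("FENCE_UNBALANCED"))
    try:
        check_status("SOMETHING_ELSE")
    except UnknownStatusDefect as exc:
        print("defect:", exc)
```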

AUTHORING_REQUIREMENT (implementation must not fail)

Implementation-authoring MUST adopt exactly one canonicalizer that is byte-for-byte this artifact (or a re-implementation proven to pass all embedded test vectors AND reproduce f2bda8…fe251), pinned by canonicalizer_sha256. No package, guard, or doc may ship or reference a different canonicalizer. Before PKG-A may proceed, the live canonicalizer's --selftest must exit 0 and its content hash must equal the sealed canonicalizer_sha256 (G-CANONICALIZER-SSOT-ONLY, G-NO-DUPLICATE-CANONICAL-AUTHORITY, doc 06).
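The gate described above combines two conditions: the selftest exit code and equality with the sealed hash. A minimal sketch of that decision, assuming both hashes arrive as strings, is shown below. The function name `may_proceed` is hypothetical, and treating an unsealed placeholder (anything that is not 64 lowercase hex characters) as a failure is an assumption consistent with the fail-closed wording.

```python
import hmac
import re

_SHA256_HEX = re.compile(r"[0-9a-f]{64}")


def may_proceed(selftest_exit_code: int, live_sha256: str, pinned_sha256: str) -> bool:
    """Fail-closed gate: selftest exited 0 AND live hash equals the sealed pin."""
    # A placeholder such as an unsealed sentinel is not a valid pin (assumption).
    if not (_SHA256_HEX.fullmatch(live_sha256) and _SHA256_HEX.fullmatch(pinned_sha256)):
        return False
    hashes_match = hmac.compare_digest(live_sha256, pinned_sha256)
    return selftest_exit_code == 0 and hashes_match


if __name__ == "__main__":
    pin = "a" * 64  # hypothetical digest for illustration only
    print(may_proceed(0, pin, pin))        # both conditions hold
    print(may_proceed(0, "b" * 64, pin))   # hash differs from the pin
    print(may_proceed(1, pin, pin))        # selftest failed
```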

Article-14 self-reference rule (BLOCKER A, encoded)

No load-bearing digest takes, as an input, a platform-assigned revision of the artifact that carries it (a revision exists only after the write, so embedding it is circular). Revisions are diagnostic / post-seal audit only. The canonicalizer enforces this via LOAD_BEARING_FORBIDS_SELF_REVISION = True and an empty SELF_REVISION_INPUTS set; adding such an edge is detected as a cycle.
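The circularity argument can be illustrated with a small dependency graph: a digest that consumes a value derived from itself can never be resolved. The sketch below uses Kahn's topological ordering rather than the depth-first search in the reference code, purely as an independent illustration. The node names and the function `is_acyclic` are hypothetical, and every input is assumed to appear as a key in the graph.

```python
from collections import deque


def is_acyclic(inputs_of: dict) -> bool:
    """Kahn's algorithm: True if every node can be resolved in dependency order."""
    remaining = {node: len(set(deps)) for node, deps in inputs_of.items()}
    consumers = {node: [] for node in inputs_of}
    for node, deps in inputs_of.items():
        for dep in set(deps):
            consumers[dep].append(node)
    ready = deque(node for node, count in remaining.items() if count == 0)
    resolved = 0
    while ready:
        node = ready.popleft()
        resolved += 1
        for consumer in consumers[node]:
            remaining[consumer] -= 1
            if remaining[consumer] == 0:
                ready.append(consumer)
    # Any node left unresolved sits on (or behind) a cycle.
    return resolved == len(inputs_of)


if __name__ == "__main__":
    graph = {"content_hash": [], "manifest_hash": ["content_hash"],
             "seal_hash": ["manifest_hash"]}  # hypothetical node names
    print(is_acyclic(graph))
    # A seal that takes its own carrier's revision as input is a self-edge.
    circular = dict(graph, seal_hash=["manifest_hash", "seal_hash"])
    print(is_acyclic(circular))
```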

Executable reference (authoritative)

#!/usr/bin/env python3
# ============================================================================
# FIX7-CANON-V1 CANONICALIZER  --  SINGLE SOURCE OF TRUTH (executable)
# canonicalizer_artifact_id: FIX7-CANON-V1-CANONICALIZER
# canonicalizer_version:     FIX7-CANON-V1
# This file IS the load-bearing canonical contract. Every other description
# (blueprint doc 00, report docs) is NON_AUTHORITY_EXPLANATION and must not
# conflict. Constitution Article 14: one authority of one nature.
# Invocation:  python3 fix7_canon_v1_ssot.py --selftest   (exit 0 == all vectors pass)
# ============================================================================
import hashlib, re, sys

def sha(b: bytes) -> str: return hashlib.sha256(b).hexdigest()

# BLOCKER A: the canonical contract NEVER takes, as a load-bearing input, a
# platform-assigned revision of the artifact that carries it. Revisions are diagnostic only.
LOAD_BEARING_FORBIDS_SELF_REVISION = True

# field rejection (recheck-6 A)
FORBIDDEN_BYTES = {0x09,0x0A,0x0D,0x00,0x5C}            # TAB LF CR NUL backslash
RESERVED_TOKENS = ["<!-- ENVELOPE:EXCLUDE-BEGIN -->","<!-- ENVELOPE:EXCLUDE-END -->",
 "<!-- SUPERSEDED_NON_AUTHORITY BEGIN","<!-- SUPERSEDED_NON_AUTHORITY END -->",
 "FIX7_ACTIVE_AUTHORITY_MEMBERSHIP_V1","FIX7_ACTIVE_AUTHORITY_CORPUS_V1","FIX7_MARKER_FENCE_REGISTRY_V1",
 "FIX7_SUPERSEDED_BOUNDARY_V1","FIX7_GUARD_SET_V1","FIX7_DOC_NORMALIZED_CONTENT_V1",
 "FIX7_ACTIVE_AUTHORITY_ENVELOPE_MANIFEST_V1","FIX7_CODEX_DETACHED_SEAL_V1"]

class Reject(Exception):
    def __init__(s,st,d=""): super().__init__(f"{st}: {d}"); s.status=st

# BLOCKER D: canonical document_id == exact MCP id, no alias
KB_ROOT = "knowledge/dev/reports/architecture/"
_SEG = re.compile(r"^[A-Za-z0-9._-]+$")          # ASCII only -> rejects backslash, %xx, homoglyph slash
def canonical_document_id(value, mcp_id=None, require_root=True):
    if value is None or value == "": raise Reject("DOCUMENT_ID_ALIAS_REJECTED","empty")
    for ch in value:
        if ord(ch) in FORBIDDEN_BYTES: raise Reject("DOCUMENT_ID_ALIAS_REJECTED",f"ctrl/backslash 0x{ord(ch):02x}")
        if ord(ch) > 0x7F: raise Reject("DOCUMENT_ID_ALIAS_REJECTED","non-ASCII (homoglyph?)")
    if "%" in value: raise Reject("DOCUMENT_ID_ALIAS_REJECTED","url-encoded")
    if "\\" in value: raise Reject("DOCUMENT_ID_ALIAS_REJECTED","backslash")
    if "//" in value: raise Reject("DOCUMENT_ID_ALIAS_REJECTED","empty segment //")
    if value.startswith("/"): raise Reject("DOCUMENT_ID_ALIAS_REJECTED","leading slash (ids are relative)")
    if value.endswith("/"): raise Reject("DOCUMENT_ID_ALIAS_REJECTED","trailing slash")
    segs = value.split("/")
    for s in segs:
        if s in (".",".."): raise Reject("DOCUMENT_ID_ALIAS_REJECTED",f"dot segment {s!r}")
        if s == "":        raise Reject("DOCUMENT_ID_ALIAS_REJECTED","empty segment")
        if not _SEG.match(s): raise Reject("DOCUMENT_ID_ALIAS_REJECTED",f"bad segment {s!r}")
    if not value.endswith(".md"): raise Reject("DOCUMENT_ID_ALIAS_REJECTED","not .md")
    if require_root and not value.startswith(KB_ROOT): raise Reject("DOCUMENT_ID_SCOPE_MISMATCH",value)
    if mcp_id is not None and value != mcp_id:            # byte-for-byte equality, case-sensitive
        raise Reject("DOCUMENT_ID_NOT_MCP_CANONICAL",f"{value!r} != mcp {mcp_id!r}")
    return value

# BLOCKER E: marker_kind <-> marker_literal closed contract
MARKER_KINDS = {"DOC_STATUS","SUPERSEDED_BEGIN","SUPERSEDED_END",
                "ENVELOPE_EXCLUDE_BEGIN","ENVELOPE_EXCLUDE_END","AUTHORITY_BOUNDARY"}
MARKER_GRAMMAR = {
 "DOC_STATUS":            re.compile(r"^<!-- DOC_STATUS: (ACTIVE_AUTHORITY|SUPERSEDED_NON_AUTHORITY) -->$"),
 "ENVELOPE_EXCLUDE_BEGIN":re.compile(r"^<!-- ENVELOPE:EXCLUDE-BEGIN -->$"),
 "ENVELOPE_EXCLUDE_END":  re.compile(r"^<!-- ENVELOPE:EXCLUDE-END -->$"),
 "SUPERSEDED_BEGIN":      re.compile(r"^<!-- SUPERSEDED_NON_AUTHORITY BEGIN(: [^\r\n]*)? -->$"),
 "SUPERSEDED_END":        re.compile(r"^<!-- SUPERSEDED_NON_AUTHORITY END -->$"),
 "AUTHORITY_BOUNDARY":    re.compile(r"^<!-- AUTHORITY_BOUNDARY[^\r\n]*-->$"),
}
def check_marker(kind, literal):
    if kind not in MARKER_KINDS: raise Reject("MARKER_KIND_UNKNOWN",kind)
    for ch in literal:
        if ord(ch) in (0x09,0x0A,0x0D,0x00): raise Reject("MARKER_LITERAL_MISMATCH","ctrl byte")
    if not MARKER_GRAMMAR[kind].match(literal):
        for k2,g in MARKER_GRAMMAR.items():
            if k2!=kind and g.match(literal):
                raise Reject("MARKER_KIND_LITERAL_INCONSISTENT",f"{kind} vs literal of {k2}")
        raise Reject("MARKER_LITERAL_NOT_ALLOWED",f"{kind}:{literal!r}")
    return (kind, literal)

# field encode (recheck-6)
GRAMMARS = {"sha256_hex":re.compile(r"^[0-9a-f]{64}$"),
 "kb_revision":re.compile(r"^([1-9][0-9]*|SELF_HOST_PIN_BY_EXCLUDE_REGION_HASH)$"),
 "doc_status":re.compile(r"^(ACTIVE_AUTHORITY|SUPERSEDED_NON_AUTHORITY)$"),
 "boolean":re.compile(r"^(true|false)$"),
 "section":re.compile(r"^(WHOLE_DOCUMENT|WHOLE_DOCUMENT_MINUS_SUPERSEDED_FENCES|WHOLE_DOCUMENT_MINUS_EXCLUDE_AND_SUPERSEDED)$")}
SENTINEL_OK = {"NOT_APPLICABLE","NON_AUTHORITY_DIAGNOSTIC","SEAL_AT_CODEX_RECHECK_8"}
def vfield(field,value,grammar=None,allow_sentinel=True):
    if value is None: raise Reject("CANONICAL_FIELD_NULL_REJECTED",field)
    if value=="": raise Reject("CANONICAL_FIELD_EMPTY_REJECTED",field)
    for ch in value:
        if ord(ch) in FORBIDDEN_BYTES: raise Reject("CANONICAL_FIELD_RESERVED_TOKEN_REJECTED",f"{field} 0x{ord(ch):02x}")
    if field!="marker_literal":
        for t in RESERVED_TOKENS:
            if t in value: raise Reject("CANONICAL_FIELD_RESERVED_TOKEN_REJECTED",f"{field} token")
    if allow_sentinel and value in SENTINEL_OK: return value
    if grammar and not GRAMMARS[grammar].match(value): raise Reject("CANONICAL_FIELD_VALUE_GRAMMAR_REJECTED",f"{field}={value!r}")
    return value
def rec(*f):
    for x in f:
        if "\t" in x or "\n" in x: raise Reject("CANONICAL_FIELD_RESERVED_TOKEN_REJECTED","sep in value")
    return ("\t".join(f)+"\n").encode()
def digest(tag,records): return sha((tag+"\n").encode()+b"".join(records))

# DAG (recheck-6 D, accepted) + self-revision audit (recheck-7 A)
EDGES={"N1":[],"N2":[],"N3":[],"N4":[],"N5":[],"N6":["N1"],
 "N7":["N2","N3","N4","N5","N6","N1"],"N8":["N2","N5","N6","N7"],"N9_DIAG":[]}
LOAD_BEARING={"N1","N2","N3","N4","N5","N6","N7","N8"}
SELF_REVISION_INPUTS=set()   # MUST stay empty: no load-bearing node consumes a self-revision
def has_cycle(e):
    c={k:0 for k in e}
    def dfs(u):
        c[u]=1
        for v in e[u]:
            if c[v]==1 or (c[v]==0 and dfs(v)): return True
        c[u]=2; return False
    return any(c[k]==0 and dfs(k) for k in e)

# TEST VECTORS
PREFIX=KB_ROOT+"t1-fix7-existing-system-refactor-execution-blueprint-2026-06-08/"
DOCS=["00-readme-first.md","01-live-existing-system-inventory.md","02-design-to-live-mapping.md",
"03-gap-classification.md","04-dependency-safe-construction-order.md","05-rollback-blueprint.md",
"06-test-guard-blueprint.md","07-implementation-package-split.md","08-hard-blocks-do-not-touch-list.md",
"12-final-verdict.md"]
MEMBERSHIP_EXPECT="f2bda8effc7be19b54722828126b82d7d2d48bee5e5e5dc0c8f347ce210fe251"

def membership():
    ids=sorted(canonical_document_id(PREFIX+d, mcp_id=PREFIX+d) for d in DOCS)
    return digest("FIX7_ACTIVE_AUTHORITY_MEMBERSHIP_V1",[rec(i) for i in ids])

def selftest():
    out=[]; ok=True
    def chk(label, cond):
        nonlocal ok; ok = ok and cond; out.append(f"  [{'PASS' if cond else 'FAIL'}] {label}")
    chk("membership == f2bda8...fe251", membership()==MEMBERSHIP_EXPECT)
    chk("DAG acyclic", not has_cycle(EDGES))
    chk("no self-revision input in load-bearing", len(SELF_REVISION_INPUTS)==0 and LOAD_BEARING_FORBIDS_SELF_REVISION)
    chk("valid doc id accepted", canonical_document_id(PREFIX+"00-readme-first.md", mcp_id=PREFIX+"00-readme-first.md")==PREFIX+"00-readme-first.md")
    chk("valid marker accepted", check_marker("DOC_STATUS","<!-- DOC_STATUS: ACTIVE_AUTHORITY -->")[0]=="DOC_STATUS")
    def expect(label,status,fn):
        nonlocal ok
        try: fn(); out.append(f"  [FAIL] {label} (not rejected)"); ok=False
        except Reject as e:
            good=e.status==status; ok=ok and good
            out.append(f"  [{'PASS' if good else 'FAIL'}] {label} -> {e.status}")
    expect("doc_id '.' segment","DOCUMENT_ID_ALIAS_REJECTED",lambda:canonical_document_id(KB_ROOT+"./x.md"))
    expect("doc_id '..' segment","DOCUMENT_ID_ALIAS_REJECTED",lambda:canonical_document_id(KB_ROOT+"a/../x.md"))
    expect("doc_id '//'","DOCUMENT_ID_ALIAS_REJECTED",lambda:canonical_document_id(KB_ROOT+"a//x.md"))
    expect("doc_id empty seg(trailing)","DOCUMENT_ID_ALIAS_REJECTED",lambda:canonical_document_id(KB_ROOT+"x.md/"))
    expect("doc_id backslash","DOCUMENT_ID_ALIAS_REJECTED",lambda:canonical_document_id(KB_ROOT+"a\\x.md"))
    expect("doc_id url-encoded","DOCUMENT_ID_ALIAS_REJECTED",lambda:canonical_document_id(KB_ROOT+"a%2e/x.md"))
    expect("doc_id homoglyph slash","DOCUMENT_ID_ALIAS_REJECTED",lambda:canonical_document_id(KB_ROOT+"a⁄x.md"))
    expect("doc_id leading slash","DOCUMENT_ID_ALIAS_REJECTED",lambda:canonical_document_id("/"+KB_ROOT+"x.md"))
    expect("doc_id scope mismatch","DOCUMENT_ID_SCOPE_MISMATCH",lambda:canonical_document_id("other/dir/x.md"))
    expect("doc_id != mcp (case)","DOCUMENT_ID_NOT_MCP_CANONICAL",lambda:canonical_document_id(PREFIX+"00-Readme-First.md", mcp_id=PREFIX+"00-readme-first.md"))
    expect("marker unknown kind","MARKER_KIND_UNKNOWN",lambda:check_marker("FOO","<!-- DOC_STATUS: ACTIVE_AUTHORITY -->"))
    expect("marker kind/literal inconsistent","MARKER_KIND_LITERAL_INCONSISTENT",lambda:check_marker("DOC_STATUS","<!-- ENVELOPE:EXCLUDE-BEGIN -->"))
    expect("marker literal typo","MARKER_LITERAL_NOT_ALLOWED",lambda:check_marker("DOC_STATUS","<!-- DOC_STATUS: ACTIVE -->"))
    expect("field TAB rejected","CANONICAL_FIELD_RESERVED_TOKEN_REJECTED",lambda:vfield("x","a\tb"))
    expect("field null rejected","CANONICAL_FIELD_NULL_REJECTED",lambda:vfield("x",None))
    expect("field empty rejected","CANONICAL_FIELD_EMPTY_REJECTED",lambda:vfield("x",""))
    e2={k:list(v) for k,v in EDGES.items()}; e2["N8"]=e2["N8"]+["N8"]
    chk("seal self-revision/self-hash edge -> cycle detected", has_cycle(e2))
    return ok, out

if __name__=="__main__":
    ok,out=selftest()
    print("FIX7-CANON-V1 CANONICALIZER SSOT SELFTEST")
    print("\n".join(out))
    print("ALL PASS:", ok)
    sys.exit(0 if ok else 1)

Conformance evidence (this pass)

python3 fix7_canon_v1_ssot.py --selftest: 22/22 PASS, exit 0 (run by T1 this pass). Membership reproduces f2bda8…fe251; DAG acyclic; no self-revision input; every document_id alias class rejected with the named status; marker kind/literal unknown/inconsistent/typo rejected; field TAB/null/empty rejected; a seal self-revision/self-hash edge is detected as a cycle. This is the executable proof that the contract is finitely checkable without agent improvisation (Constitution Article 14).
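The field TAB rejection matters because records are TAB-joined and LF-terminated: if a separator byte were allowed inside a value, two different field tuples could encode to the same bytes and therefore the same digest. A minimal sketch of that collision follows; `naive_record` and `strict_record` are hypothetical names, and `strict_record` raises a plain `ValueError` here instead of the reference code's status-carrying exception.

```python
def naive_record(*fields: str) -> bytes:
    """TAB-join the fields and terminate with LF, with no validation."""
    return ("\t".join(fields) + "\n").encode("utf-8")


def strict_record(*fields: str) -> bytes:
    """Same encoding, but refuse any field containing a separator byte."""
    for field in fields:
        if "\t" in field or "\n" in field:
            raise ValueError(f"separator inside field value: {field!r}")
    return naive_record(*fields)


if __name__ == "__main__":
    # Two different tuples collapse to identical bytes without validation.
    print(naive_record("a\tb", "c") == naive_record("a", "b\tc"))
    try:
        strict_record("a\tb", "c")
    except ValueError as exc:
        print("rejected:", exc)
```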
