FIX7-CANON-V1 Canonicalizer — Single Source of Truth (executable)
This document is the ONE load-bearing canonical contract (Constitution Article 14 / NT14). Every
other description of canonicalization in the blueprint (doc 00 §Canonical hash encoding, the extractor
and record-encoding sections, and all report docs) is NON_AUTHORITY_EXPLANATION: it explains this
artifact and MUST NOT conflict with it. If any other description conflicts, this artifact wins and
G-NO-DUPLICATE-CANONICAL-AUTHORITY fails closed. There is exactly one canonicalizer; no doc, spec,
guard, or package may redefine canonicalization.
SSOT identity (pinned in the doc 00 envelope, MANIFEST_BOUND)
| field | value |
|---|---|
| `canonicalizer_artifact_id` | `FIX7-CANON-V1-CANONICALIZER` |
| `canonicalizer_path` | `knowledge/dev/reports/architecture/t1-fix7-existing-system-refactor-execution-blueprint-2026-06-08/canonicalizer-fix7-canon-v1-ssot.md` |
| `canonicalizer_version` | `FIX7-CANON-V1` |
| `canonicalizer_revision` | `SEAL_AT_CODEX_RECHECK_8` (platform revision of THIS artifact, computed by Codex at the seal; diagnostic+pin; never this artifact's own future revision recorded inside itself) |
| `canonicalizer_sha256` | `SEAL_AT_CODEX_RECHECK_8` (SHA-256 over this artifact's full MCP bytes, CRLF/CR→LF normalized, at `canonicalizer_revision`; this artifact does NOT contain its own hash → no self-reference) |
| nature | executable reference code (authoritative) + frozen test vectors |
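The `canonicalizer_sha256` row describes a hash over line-ending-normalized bytes. A minimal sketch of that normalization follows; the function name `normalized_sha256` is hypothetical and is not part of the reference code, and the sketch assumes the artifact bytes have already been fetched.

```python
import hashlib


def normalized_sha256(raw: bytes) -> str:
    """Hash bytes after CRLF/CR -> LF normalization (hypothetical helper)."""
    # Replace CRLF first so that a CRLF pair never becomes two LFs,
    # then map any remaining lone CR to LF.
    lf_only = raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return hashlib.sha256(lf_only).hexdigest()
```

With this ordering, the same text saved with Windows, classic Mac, or Unix line endings yields one digest.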
Invocation contract
- Command: `python3 canonicalizer-fix7-canon-v1-ssot.py --selftest`
- Inputs: (a) for `--selftest`: none (vectors are embedded); (b) for production use: the raw UTF-8 bytes returned by `mcp.get_document_for_rewrite(document_id)` per active member, plus the explicit active-corpus membership list and the live envelope fields.
- Outputs: lowercase-hex SHA-256 digests (`membership`, per-doc `normalized_active_content_sha256`, `active_corpus_sha256`, `marker_fence_registry_sha256`, `superseded_boundary_sha256`, `guard_set_sha256`, `envelope_manifest_sha256`, `detached_seal_sha256`) and, on any violation, a single fail-closed status string (below).
- Exit code: `0` iff every embedded test vector passes; non-zero otherwise. A package MAY NOT proceed unless this artifact's `--selftest` exits 0 against the pinned `canonicalizer_sha256`.
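The exit-code rule above can be checked mechanically. The following is a sketch only, assuming the canonicalizer is available as a local script; the helper name `selftest_passes` is hypothetical and the command is supplied by the caller.

```python
import subprocess
import sys


def selftest_passes(command: list[str]) -> bool:
    """Run a selftest command and report whether it exited with status 0."""
    # capture_output keeps the PASS/FAIL listing out of the caller's stdout;
    # only the exit code decides the result.
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode == 0


# Example call (assumes the script sits in the current directory):
# selftest_passes([sys.executable, "canonicalizer-fix7-canon-v1-ssot.py", "--selftest"])
```

Any non-zero exit, including a missing script, is treated as a failed selftest, which matches the fail-closed intent of the contract.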
Frozen positive test vector (behavioural pin, reproducible now)
membership over the 10 canonical full doc_ids under FIX7_ACTIVE_AUTHORITY_MEMBERSHIP_V1
(ascending, LF-joined, trailing LF) ==
f2bda8effc7be19b54722828126b82d7d2d48bee5e5e5dc0c8f347ce210fe251
(identical under shasum -a 256 and python hashlib). Any implementation that does not reproduce this
exact digest is non-conformant.
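For readers re-implementing the vector, the byte layout can be sketched as below. It mirrors the `digest` and `rec` helpers in the executable reference (tag line first, then one LF-terminated record per id). The function name `membership_digest` and the sample ids are illustrative assumptions; the frozen digest above is only reproduced when the 10 real doc_ids are supplied.

```python
import hashlib

TAG = "FIX7_ACTIVE_AUTHORITY_MEMBERSHIP_V1"


def membership_digest(doc_ids: list[str]) -> str:
    """SHA-256 over: tag + LF, then each id (ascending) + LF."""
    payload = (TAG + "\n").encode("utf-8")
    for doc_id in sorted(doc_ids):  # ascending order, independent of input order
        payload += (doc_id + "\n").encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Hypothetical ids, for illustration only (not the frozen membership).
sample = ["b/two.md", "a/one.md"]
print(membership_digest(sample) == membership_digest(list(reversed(sample))))
```

Because the ids are sorted before hashing, the caller's ordering cannot change the digest.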
Closed failure-status set (the ONLY allowed rejections)
CANONICAL_FIELD_RESERVED_TOKEN_REJECTED, CANONICAL_FIELD_VALUE_GRAMMAR_REJECTED,
CANONICAL_FIELD_NULL_REJECTED, CANONICAL_FIELD_EMPTY_REJECTED,
DOCUMENT_ID_ALIAS_REJECTED, DOCUMENT_ID_NOT_MCP_CANONICAL, DOCUMENT_ID_SCOPE_MISMATCH,
MARKER_KIND_UNKNOWN, MARKER_LITERAL_MISMATCH, MARKER_LITERAL_NOT_ALLOWED,
MARKER_KIND_LITERAL_INCONSISTENT, ACTIVE_SCOPE_MARKER_MISSING, ACTIVE_SCOPE_MARKER_DUPLICATE,
FENCE_UNBALANCED, FENCE_NESTED_UNSUPPORTED, ACTIVE_SUPERSEDED_OVERLAP, SECTION_ID_MISMATCH,
SECTION_RANGE_MISMATCH, EXCLUDE_REGION_UNBALANCED, MARKER_REGISTRY_MISMATCH, SEAL_HASH_GRAPH_CYCLE.
Any other behaviour is a defect.
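One way a caller could enforce the closed set is sketched here. The status names are copied from the list above; the names `ALLOWED_STATUSES` and `require_closed_status` are hypothetical and do not appear in the reference code.

```python
ALLOWED_STATUSES = frozenset({
    "CANONICAL_FIELD_RESERVED_TOKEN_REJECTED", "CANONICAL_FIELD_VALUE_GRAMMAR_REJECTED",
    "CANONICAL_FIELD_NULL_REJECTED", "CANONICAL_FIELD_EMPTY_REJECTED",
    "DOCUMENT_ID_ALIAS_REJECTED", "DOCUMENT_ID_NOT_MCP_CANONICAL", "DOCUMENT_ID_SCOPE_MISMATCH",
    "MARKER_KIND_UNKNOWN", "MARKER_LITERAL_MISMATCH", "MARKER_LITERAL_NOT_ALLOWED",
    "MARKER_KIND_LITERAL_INCONSISTENT", "ACTIVE_SCOPE_MARKER_MISSING", "ACTIVE_SCOPE_MARKER_DUPLICATE",
    "FENCE_UNBALANCED", "FENCE_NESTED_UNSUPPORTED", "ACTIVE_SUPERSEDED_OVERLAP", "SECTION_ID_MISMATCH",
    "SECTION_RANGE_MISMATCH", "EXCLUDE_REGION_UNBALANCED", "MARKER_REGISTRY_MISMATCH",
    "SEAL_HASH_GRAPH_CYCLE",
})


def require_closed_status(status: str) -> str:
    """Return the status unchanged, or raise if it is outside the closed set."""
    if status not in ALLOWED_STATUSES:
        raise ValueError(f"status outside the closed set: {status!r}")
    return status
```

A wrapper like this turns an unexpected status string into an immediate error instead of letting it pass as a new, undocumented rejection.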
AUTHORING_REQUIREMENT (implementation must not fail)
Implementation-authoring MUST adopt exactly one canonicalizer that is byte-for-byte this artifact (or
a re-implementation proven to pass all embedded test vectors AND reproduce f2bda8…fe251), pinned by
canonicalizer_sha256. No package, guard, or doc may ship or reference a different canonicalizer. Before
PKG-A may proceed, the live canonicalizer's --selftest must exit 0 and its content hash must equal the
sealed canonicalizer_sha256 (G-CANONICALIZER-SSOT-ONLY, G-NO-DUPLICATE-CANONICAL-AUTHORITY, doc 06).
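The PKG-A precondition combines two facts: a zero selftest exit code and hash equality with the sealed value. A minimal sketch, assuming both have already been obtained; the function name `pkg_a_may_proceed` is hypothetical, and treating a non-hex sealed value (such as an unresolved seal placeholder) as a refusal is an assumption of this sketch rather than a rule stated in this document.

```python
import re

# Same shape as the reference's sha256_hex grammar: 64 lowercase hex digits.
SHA256_HEX = re.compile(r"[0-9a-f]{64}")


def pkg_a_may_proceed(selftest_exit_code: int, live_sha256: str, sealed_sha256: str) -> bool:
    """Fail closed unless the selftest passed and the live hash equals the sealed hash."""
    if SHA256_HEX.fullmatch(sealed_sha256) is None:
        return False  # assumption: an unsealed or malformed pin never authorizes a package
    if selftest_exit_code != 0:
        return False
    return live_sha256 == sealed_sha256
```

The function returns a plain boolean so that a calling guard can decide how to report the refusal.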
Article-14 self-reference rule (BLOCKER A, encoded)
No load-bearing digest takes, as an input, a platform-assigned revision of the artifact that carries
it (a revision exists only after the write, so embedding it is circular). Revisions are diagnostic /
post-seal audit only. The canonicalizer enforces this via LOAD_BEARING_FORBIDS_SELF_REVISION = True
and an empty SELF_REVISION_INPUTS set; adding such an edge is detected as a cycle.
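To illustrate why a self-referencing edge is caught, the sketch below checks the same hash graph with the standard-library `graphlib` module. This is a substitute technique chosen for brevity, not the reference's own depth-first search; the `EDGES` mapping is copied from the executable reference, and `is_acyclic` is a hypothetical helper name.

```python
from graphlib import CycleError, TopologicalSorter

# Node -> list of nodes it takes as inputs (copied from the reference code).
EDGES = {"N1": [], "N2": [], "N3": [], "N4": [], "N5": [], "N6": ["N1"],
         "N7": ["N2", "N3", "N4", "N5", "N6", "N1"], "N8": ["N2", "N5", "N6", "N7"],
         "N9_DIAG": []}


def is_acyclic(edges: dict[str, list[str]]) -> bool:
    """True if the dependency graph admits a topological order."""
    try:
        TopologicalSorter(edges).prepare()  # raises CycleError on any cycle
    except CycleError:
        return False
    return True


# A node that consumes its own revision or hash is a self-edge, hence a cycle.
with_self_edge = dict(EDGES, N8=EDGES["N8"] + ["N8"])
print(is_acyclic(EDGES), is_acyclic(with_self_edge))
```

The second graph differs from the first only by N8 listing itself as an input, which is the shape a self-revision dependency would take.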
Executable reference (authoritative)
#!/usr/bin/env python3
# ============================================================================
# FIX7-CANON-V1 CANONICALIZER -- SINGLE SOURCE OF TRUTH (executable)
# canonicalizer_artifact_id: FIX7-CANON-V1-CANONICALIZER
# canonicalizer_version: FIX7-CANON-V1
# This file IS the load-bearing canonical contract. Every other description
# (blueprint doc 00, report docs) is NON_AUTHORITY_EXPLANATION and must not
# conflict. Constitution Article 14: one authority of one nature.
# Invocation: python3 fix7_canon_v1_ssot.py --selftest (exit 0 == all vectors pass)
# ============================================================================
import hashlib, re, sys
def sha(b: bytes) -> str: return hashlib.sha256(b).hexdigest()
# BLOCKER A: the canonical contract NEVER takes, as a load-bearing input, a
# platform-assigned revision of the artifact that carries it. Revisions are diagnostic only.
LOAD_BEARING_FORBIDS_SELF_REVISION = True
# field rejection (recheck-6 A)
FORBIDDEN_BYTES = {0x09,0x0A,0x0D,0x00,0x5C} # TAB LF CR NUL backslash
RESERVED_TOKENS = ["<!-- ENVELOPE:EXCLUDE-BEGIN -->","<!-- ENVELOPE:EXCLUDE-END -->",
"<!-- SUPERSEDED_NON_AUTHORITY BEGIN","<!-- SUPERSEDED_NON_AUTHORITY END -->",
"FIX7_ACTIVE_AUTHORITY_MEMBERSHIP_V1","FIX7_ACTIVE_AUTHORITY_CORPUS_V1","FIX7_MARKER_FENCE_REGISTRY_V1",
"FIX7_SUPERSEDED_BOUNDARY_V1","FIX7_GUARD_SET_V1","FIX7_DOC_NORMALIZED_CONTENT_V1",
"FIX7_ACTIVE_AUTHORITY_ENVELOPE_MANIFEST_V1","FIX7_CODEX_DETACHED_SEAL_V1"]
class Reject(Exception):
    def __init__(s,st,d=""): super().__init__(f"{st}: {d}"); s.status=st
# BLOCKER D: canonical document_id == exact MCP id, no alias
KB_ROOT = "knowledge/dev/reports/architecture/"
_SEG = re.compile(r"^[A-Za-z0-9._-]+$") # ASCII only -> rejects backslash, %xx, homoglyph slash
def canonical_document_id(value, mcp_id=None, require_root=True):
    if value is None or value == "": raise Reject("DOCUMENT_ID_ALIAS_REJECTED","empty")
    for ch in value:
        if ord(ch) in FORBIDDEN_BYTES: raise Reject("DOCUMENT_ID_ALIAS_REJECTED",f"ctrl/backslash 0x{ord(ch):02x}")
        if ord(ch) > 0x7F: raise Reject("DOCUMENT_ID_ALIAS_REJECTED","non-ASCII (homoglyph?)")
    if "%" in value: raise Reject("DOCUMENT_ID_ALIAS_REJECTED","url-encoded")
    if "\\" in value: raise Reject("DOCUMENT_ID_ALIAS_REJECTED","backslash")
    if "//" in value: raise Reject("DOCUMENT_ID_ALIAS_REJECTED","empty segment //")
    if value.startswith("/"): raise Reject("DOCUMENT_ID_ALIAS_REJECTED","leading slash (ids are relative)")
    if value.endswith("/"): raise Reject("DOCUMENT_ID_ALIAS_REJECTED","trailing slash")
    segs = value.split("/")
    for s in segs:
        if s in (".",".."): raise Reject("DOCUMENT_ID_ALIAS_REJECTED",f"dot segment {s!r}")
        if s == "": raise Reject("DOCUMENT_ID_ALIAS_REJECTED","empty segment")
        if not _SEG.match(s): raise Reject("DOCUMENT_ID_ALIAS_REJECTED",f"bad segment {s!r}")
    if not value.endswith(".md"): raise Reject("DOCUMENT_ID_ALIAS_REJECTED","not .md")
    if require_root and not value.startswith(KB_ROOT): raise Reject("DOCUMENT_ID_SCOPE_MISMATCH",value)
    if mcp_id is not None and value != mcp_id: # byte-for-byte equality, case-sensitive
        raise Reject("DOCUMENT_ID_NOT_MCP_CANONICAL",f"{value!r} != mcp {mcp_id!r}")
    return value
# BLOCKER E: marker_kind <-> marker_literal closed contract
MARKER_KINDS = {"DOC_STATUS","SUPERSEDED_BEGIN","SUPERSEDED_END",
"ENVELOPE_EXCLUDE_BEGIN","ENVELOPE_EXCLUDE_END","AUTHORITY_BOUNDARY"}
MARKER_GRAMMAR = {
"DOC_STATUS": re.compile(r"^<!-- DOC_STATUS: (ACTIVE_AUTHORITY|SUPERSEDED_NON_AUTHORITY) -->$"),
"ENVELOPE_EXCLUDE_BEGIN":re.compile(r"^<!-- ENVELOPE:EXCLUDE-BEGIN -->$"),
"ENVELOPE_EXCLUDE_END": re.compile(r"^<!-- ENVELOPE:EXCLUDE-END -->$"),
"SUPERSEDED_BEGIN": re.compile(r"^<!-- SUPERSEDED_NON_AUTHORITY BEGIN(: [^\r\n]*)? -->$"),
"SUPERSEDED_END": re.compile(r"^<!-- SUPERSEDED_NON_AUTHORITY END -->$"),
"AUTHORITY_BOUNDARY": re.compile(r"^<!-- AUTHORITY_BOUNDARY[^\r\n]*-->$"),
}
def check_marker(kind, literal):
    if kind not in MARKER_KINDS: raise Reject("MARKER_KIND_UNKNOWN",kind)
    for ch in literal:
        if ord(ch) in (0x09,0x0A,0x0D,0x00): raise Reject("MARKER_LITERAL_MISMATCH","ctrl byte")
    if not MARKER_GRAMMAR[kind].match(literal):
        for k2,g in MARKER_GRAMMAR.items():
            if k2!=kind and g.match(literal):
                raise Reject("MARKER_KIND_LITERAL_INCONSISTENT",f"{kind} vs literal of {k2}")
        raise Reject("MARKER_LITERAL_NOT_ALLOWED",f"{kind}:{literal!r}")
    return (kind, literal)
# field encode (recheck-6)
GRAMMARS = {"sha256_hex":re.compile(r"^[0-9a-f]{64}$"),
"kb_revision":re.compile(r"^([1-9][0-9]*|SELF_HOST_PIN_BY_EXCLUDE_REGION_HASH)$"),
"doc_status":re.compile(r"^(ACTIVE_AUTHORITY|SUPERSEDED_NON_AUTHORITY)$"),
"boolean":re.compile(r"^(true|false)$"),
"section":re.compile(r"^(WHOLE_DOCUMENT|WHOLE_DOCUMENT_MINUS_SUPERSEDED_FENCES|WHOLE_DOCUMENT_MINUS_EXCLUDE_AND_SUPERSEDED)$")}
SENTINEL_OK = {"NOT_APPLICABLE","NON_AUTHORITY_DIAGNOSTIC","SEAL_AT_CODEX_RECHECK_8"}
def vfield(field,value,grammar=None,allow_sentinel=True):
    if value is None: raise Reject("CANONICAL_FIELD_NULL_REJECTED",field)
    if value=="": raise Reject("CANONICAL_FIELD_EMPTY_REJECTED",field)
    for ch in value:
        if ord(ch) in FORBIDDEN_BYTES: raise Reject("CANONICAL_FIELD_RESERVED_TOKEN_REJECTED",f"{field} 0x{ord(ch):02x}")
    if field!="marker_literal":
        for t in RESERVED_TOKENS:
            if t in value: raise Reject("CANONICAL_FIELD_RESERVED_TOKEN_REJECTED",f"{field} token")
    if allow_sentinel and value in SENTINEL_OK: return value
    if grammar and not GRAMMARS[grammar].match(value): raise Reject("CANONICAL_FIELD_VALUE_GRAMMAR_REJECTED",f"{field}={value!r}")
    return value
def rec(*f):
    for x in f:
        if "\t" in x or "\n" in x: raise Reject("CANONICAL_FIELD_RESERVED_TOKEN_REJECTED","sep in value")
    return ("\t".join(f)+"\n").encode()
def digest(tag,records): return sha((tag+"\n").encode()+b"".join(records))
# DAG (recheck-6 D, accepted) + self-revision audit (recheck-7 A)
EDGES={"N1":[],"N2":[],"N3":[],"N4":[],"N5":[],"N6":["N1"],
"N7":["N2","N3","N4","N5","N6","N1"],"N8":["N2","N5","N6","N7"],"N9_DIAG":[]}
LOAD_BEARING={"N1","N2","N3","N4","N5","N6","N7","N8"}
SELF_REVISION_INPUTS=set() # MUST stay empty: no load-bearing node consumes a self-revision
def has_cycle(e):
    c={k:0 for k in e}
    def dfs(u):
        c[u]=1
        for v in e[u]:
            if c[v]==1 or (c[v]==0 and dfs(v)): return True
        c[u]=2; return False
    return any(c[k]==0 and dfs(k) for k in e)
# TEST VECTORS
PREFIX=KB_ROOT+"t1-fix7-existing-system-refactor-execution-blueprint-2026-06-08/"
DOCS=["00-readme-first.md","01-live-existing-system-inventory.md","02-design-to-live-mapping.md",
"03-gap-classification.md","04-dependency-safe-construction-order.md","05-rollback-blueprint.md",
"06-test-guard-blueprint.md","07-implementation-package-split.md","08-hard-blocks-do-not-touch-list.md",
"12-final-verdict.md"]
MEMBERSHIP_EXPECT="f2bda8effc7be19b54722828126b82d7d2d48bee5e5e5dc0c8f347ce210fe251"
def membership():
    ids=sorted(canonical_document_id(PREFIX+d, mcp_id=PREFIX+d) for d in DOCS)
    return digest("FIX7_ACTIVE_AUTHORITY_MEMBERSHIP_V1",[rec(i) for i in ids])
def selftest():
    out=[]; ok=True
    def chk(label, cond):
        nonlocal ok; ok = ok and cond; out.append(f" [{'PASS' if cond else 'FAIL'}] {label}")
    chk("membership == f2bda8...fe251", membership()==MEMBERSHIP_EXPECT)
    chk("DAG acyclic", not has_cycle(EDGES))
    chk("no self-revision input in load-bearing", len(SELF_REVISION_INPUTS)==0 and LOAD_BEARING_FORBIDS_SELF_REVISION)
    chk("valid doc id accepted", canonical_document_id(PREFIX+"00-readme-first.md", mcp_id=PREFIX+"00-readme-first.md")==PREFIX+"00-readme-first.md")
    chk("valid marker accepted", check_marker("DOC_STATUS","<!-- DOC_STATUS: ACTIVE_AUTHORITY -->")[0]=="DOC_STATUS")
    def expect(label,status,fn):
        nonlocal ok
        try: fn(); out.append(f" [FAIL] {label} (not rejected)"); ok=False
        except Reject as e:
            good=e.status==status; ok=ok and good
            out.append(f" [{'PASS' if good else 'FAIL'}] {label} -> {e.status}")
    expect("doc_id '.' segment","DOCUMENT_ID_ALIAS_REJECTED",lambda:canonical_document_id(KB_ROOT+"./x.md"))
    expect("doc_id '..' segment","DOCUMENT_ID_ALIAS_REJECTED",lambda:canonical_document_id(KB_ROOT+"a/../x.md"))
    expect("doc_id '//'","DOCUMENT_ID_ALIAS_REJECTED",lambda:canonical_document_id(KB_ROOT+"a//x.md"))
    expect("doc_id empty seg(trailing)","DOCUMENT_ID_ALIAS_REJECTED",lambda:canonical_document_id(KB_ROOT+"x.md/"))
    expect("doc_id backslash","DOCUMENT_ID_ALIAS_REJECTED",lambda:canonical_document_id(KB_ROOT+"a\\x.md"))
    expect("doc_id url-encoded","DOCUMENT_ID_ALIAS_REJECTED",lambda:canonical_document_id(KB_ROOT+"a%2e/x.md"))
    expect("doc_id homoglyph slash","DOCUMENT_ID_ALIAS_REJECTED",lambda:canonical_document_id(KB_ROOT+"a⁄x.md"))
    expect("doc_id leading slash","DOCUMENT_ID_ALIAS_REJECTED",lambda:canonical_document_id("/"+KB_ROOT+"x.md"))
    expect("doc_id scope mismatch","DOCUMENT_ID_SCOPE_MISMATCH",lambda:canonical_document_id("other/dir/x.md"))
    expect("doc_id != mcp (case)","DOCUMENT_ID_NOT_MCP_CANONICAL",lambda:canonical_document_id(PREFIX+"00-Readme-First.md", mcp_id=PREFIX+"00-readme-first.md"))
    expect("marker unknown kind","MARKER_KIND_UNKNOWN",lambda:check_marker("FOO","<!-- DOC_STATUS: ACTIVE_AUTHORITY -->"))
    expect("marker kind/literal inconsistent","MARKER_KIND_LITERAL_INCONSISTENT",lambda:check_marker("DOC_STATUS","<!-- ENVELOPE:EXCLUDE-BEGIN -->"))
    expect("marker literal typo","MARKER_LITERAL_NOT_ALLOWED",lambda:check_marker("DOC_STATUS","<!-- DOC_STATUS: ACTIVE -->"))
    expect("field TAB rejected","CANONICAL_FIELD_RESERVED_TOKEN_REJECTED",lambda:vfield("x","a\tb"))
    expect("field null rejected","CANONICAL_FIELD_NULL_REJECTED",lambda:vfield("x",None))
    expect("field empty rejected","CANONICAL_FIELD_EMPTY_REJECTED",lambda:vfield("x",""))
    e2={k:list(v) for k,v in EDGES.items()}; e2["N8"]=e2["N8"]+["N8"]
    chk("seal self-revision/self-hash edge -> cycle detected", has_cycle(e2))
    return ok, out
if __name__=="__main__":
    ok,out=selftest()
    print("FIX7-CANON-V1 CANONICALIZER SSOT SELFTEST")
    print("\n".join(out))
    print("ALL PASS:", ok)
    sys.exit(0 if ok else 1)
Conformance evidence (this pass)
python3 fix7_canon_v1_ssot.py --selftest → 22/22 PASS, exit 0 (run by T1 this pass): membership
reproduces f2bda8…fe251; DAG acyclic; no self-revision input; every document_id alias class rejected
with the named status; marker kind/literal unknown/inconsistent/typo rejected; field TAB/null/empty
rejected; a seal self-revision/self-hash edge is detected as a cycle. This is the executable proof that
the contract is finitely checkable without agent improvisation (Constitution Article 14).
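A reviewer who wants to confirm the 22/22 tally from a captured run could count the `[PASS]` and `[FAIL]` lines that the selftest prints. The sketch below assumes the output format produced by the reference's `selftest()` (one bracketed verdict per line); the helper name `summarize_selftest` is hypothetical.

```python
import re

# Matches lines such as " [PASS] DAG acyclic" or " [FAIL] doc_id '//' (not rejected)".
VERDICT = re.compile(r"^\s*\[(PASS|FAIL)\]\s")


def summarize_selftest(output: str) -> tuple[int, int]:
    """Return (passed, failed) counts from captured selftest output."""
    passed = failed = 0
    for line in output.splitlines():
        match = VERDICT.match(line)
        if match is None:
            continue  # banner and "ALL PASS:" lines carry no verdict
        if match.group(1) == "PASS":
            passed += 1
        else:
            failed += 1
    return passed, failed
```

Under this assumption a conformant run would summarize to 22 passed and 0 failed; the exit code remains the authoritative signal.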