05 - document_id Canonical MCP Grammar (recheck-7 blocker D)
Load-bearing impl: canonical_document_id() in the SSOT artifact. Load-bearing copy of the rule: doc 00
§Field rejection policy → document_id. This doc is the rationale.
The defect
The recheck-6 grammar ^knowledge/dev/reports/architecture/[A-Za-z0-9._/-]+\.md$ allowed path aliases
(., .., // all match), so two different strings could denote the same logical document → duplicate
logical records / ambiguous identity.
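The aliasing is easy to reproduce. The following is a minimal Python sketch that uses the recheck-6 pattern exactly as quoted above; the file names (a.md, sub/) are hypothetical and exist only to illustrate the alias classes.

```python
import re

# recheck-6 grammar, copied from the text above.
RECHECK6 = re.compile(r"^knowledge/dev/reports/architecture/[A-Za-z0-9._/-]+\.md$")

ROOT = "knowledge/dev/reports/architecture/"

# Hypothetical spellings that a path resolver would treat as the same file.
aliases = [
    ROOT + "a.md",         # plain spelling
    ROOT + "./a.md",       # "." segment
    ROOT + "sub/../a.md",  # ".." segment
    ROOT + "/a.md",        # produces "//" right after the root
]

# Every spelling matches, so the grammar alone cannot tell them apart.
all_match = all(RECHECK6.match(s) is not None for s in aliases)
print(all_match)
```

Four distinct strings pass the same grammar, which is the duplicate-identity problem described above.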
The fix — exact MCP canonical identity, REJECT aliases
document_id must equal the MCP-returned document_id byte-for-byte (case-sensitive; no
identity-changing normalization). Algorithm (canonical_document_id()):
- non-empty; no control byte (TAB/LF/CR/NUL) or backslash; ASCII only (rejects homoglyph slashes such as U+2044 / U+2215 and any non-ASCII);
- no % (rejects URL-encoded aliases like %2e); no \ (backslash separator); no // (empty segment);
- no leading slash (ids are relative) and no trailing slash;
- split on /; every segment matches ^[A-Za-z0-9._-]+$ and is not ., .., or empty;
- ends .md; starts with the knowledge/dev/reports/architecture/ root (when scoped);
- equals the MCP-returned id (byte-for-byte); otherwise not canonical.
Fail-closed statuses
- DOCUMENT_ID_ALIAS_REJECTED: . segment, .. segment, empty segment, //, leading or trailing slash, backslash, %-encoding, homoglyph, non-ASCII, control byte.
- DOCUMENT_ID_SCOPE_MISMATCH: wrong root.
- DOCUMENT_ID_NOT_MCP_CANONICAL: not byte-equal to the MCP id (includes case variation).
(G-DOCUMENT-ID-CANONICAL-MCP, doc 06.)
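The checks and statuses above can be sketched in Python as follows. This is an illustration under stated assumptions, not the canonical_document_id() implementation in the SSOT artifact: the helper name check_document_id, its signature, the None-on-success convention, and the example.md id are all hypothetical. The text does not name a status for a missing .md suffix, so the sketch reports it as a scope mismatch; that mapping is an assumption too.

```python
import re

# Root and segment alphabet as stated in the algorithm above.
ROOT = "knowledge/dev/reports/architecture/"
SEGMENT = re.compile(r"[A-Za-z0-9._-]+")

ALIAS = "DOCUMENT_ID_ALIAS_REJECTED"
SCOPE = "DOCUMENT_ID_SCOPE_MISMATCH"
NOT_CANONICAL = "DOCUMENT_ID_NOT_MCP_CANONICAL"


def check_document_id(candidate, mcp_id, scoped=True):
    """Hypothetical helper: return None if canonical, else a status string."""
    # Non-empty and ASCII only (rejects homoglyph slashes and other non-ASCII).
    if not candidate or not candidate.isascii():
        return ALIAS
    # No control bytes; this covers TAB/LF/CR/NUL and the other C0 codes.
    if any(ord(c) < 0x20 or ord(c) == 0x7F for c in candidate):
        return ALIAS
    # No backslash separator and no percent-encoding.
    if "\\" in candidate or "%" in candidate:
        return ALIAS
    # Ids are relative: no leading slash, no trailing slash, no empty segment.
    if candidate.startswith("/") or candidate.endswith("/") or "//" in candidate:
        return ALIAS
    # Every segment uses the allowed alphabet and is neither "." nor "..".
    # fullmatch is used so a trailing newline can never slip past "$".
    for segment in candidate.split("/"):
        if segment in (".", "..") or not SEGMENT.fullmatch(segment):
            return ALIAS
    # Assumption: a missing .md suffix is reported as a scope mismatch.
    if not candidate.endswith(".md"):
        return SCOPE
    if scoped and not candidate.startswith(ROOT):
        return SCOPE
    # Final gate: byte-for-byte equality with the MCP-returned id.
    if candidate != mcp_id:
        return NOT_CANONICAL
    return None


mcp = ROOT + "example.md"  # hypothetical MCP-returned id
print(check_document_id(mcp, mcp))                      # None
print(check_document_id(ROOT + "./example.md", mcp))    # DOCUMENT_ID_ALIAS_REJECTED
print(check_document_id("knowledge/dev/x.md", mcp))     # DOCUMENT_ID_SCOPE_MISMATCH
print(check_document_id(ROOT + "Example.md", mcp))      # DOCUMENT_ID_NOT_MCP_CANONICAL
```

The function never rewrites its input. It either accepts the exact string or returns a status, which keeps the behaviour fail-closed.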
Computed evidence (doc 07)
All ten alias classes are rejected with the named status, computed in --selftest:
. segment, .. segment, //, trailing slash, backslash, %-encoded, homoglyph slash, leading slash,
scope mismatch, and case-variation-vs-MCP. The 10 real member doc_ids are all canonical, so membership
stays f2bda8…fe251.
Why REJECT, not normalize
Normalizing a/../b → b would silently accept an alias and could change identity. Article 14 forbids any
normalization that changes identity; the canonicalizer rejects instead, so there is exactly one
accepted spelling per document.
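A short Python sketch of what normalization would do. Here posixpath.normpath stands in for a generic path normalizer (the text does not say any particular normalizer was considered), and the b.md file name is hypothetical.

```python
import posixpath

ROOT = "knowledge/dev/reports/architecture/"

# Four distinct input strings, all hypothetical.
spellings = [
    ROOT + "b.md",
    ROOT + "a/../b.md",
    ROOT + "./b.md",
    ROOT + "/b.md",
]

# A normalizer maps all of them to one string, so three aliases would be
# silently accepted as if they were the canonical id.
normalized = {posixpath.normpath(s) for s in spellings}
print(len(set(spellings)), len(normalized))  # 4 1
```

Rejecting instead of normalizing means the three alias spellings surface as errors rather than being folded into the canonical one.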