KB-4628

document_id Canonical MCP Grammar (recheck-7 blocker D)

Revision 1

05 - document_id Canonical MCP Grammar (recheck-7 blocker D)

Load-bearing impl: canonical_document_id() in the SSOT artifact. Load-bearing copy of the rule: doc 00 §Field rejection policy → document_id. This doc is the rationale.

The defect

The recheck-6 grammar ^knowledge/dev/reports/architecture/[A-Za-z0-9._/-]+\.md$ allowed path aliases: ., .., and // all match the character class. Two different strings could therefore denote the same logical document, which produces duplicate logical records and ambiguous identity.
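The defect can be reproduced in a few lines. This is a minimal sketch: the regular expression is the recheck-6 grammar quoted above, while the file name report.md and the directory x are hypothetical and chosen only for illustration.

```python
import re

# The recheck-6 grammar quoted above.
RECHECK6 = re.compile(r"^knowledge/dev/reports/architecture/[A-Za-z0-9._/-]+\.md$")

ROOT = "knowledge/dev/reports/architecture/"

# Hypothetical spellings that all point at the same logical document.
spellings = [
    ROOT + "report.md",
    ROOT + "./report.md",       # "." segment
    ROOT + "x/../report.md",    # ".." segment
    ROOT + "/report.md",        # ROOT already ends in "/", so this contains "//"
]

# Every spelling is accepted, so one document has four valid ids.
accepted = [s for s in spellings if RECHECK6.match(s)]
assert accepted == spellings
```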

The fix — exact MCP canonical identity, REJECT aliases

document_id must equal the MCP-returned document_id byte-for-byte (case-sensitive; no identity-changing normalization). Algorithm (canonical_document_id()):

  1. non-empty; no control byte (TAB/LF/CR/NUL) or backslash; ASCII only (rejects homoglyph slashes such as U+2044 / U+2215 and any non-ASCII);
  2. no % (rejects URL-encoded aliases like %2e); no \ (backslash separator); no // (empty segment);
  3. no leading slash (ids are relative) and no trailing slash;
  4. split on /; every segment matches ^[A-Za-z0-9._-]+$ and is not ., .., or empty;
  5. ends .md; starts with the knowledge/dev/reports/architecture/ root (when scoped);
  6. equals the MCP-returned id (byte-for-byte) — else not canonical.
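The six steps above can be sketched as a single predicate. This is an illustration under stated assumptions, not the SSOT implementation of canonical_document_id(): the function name is_canonical_document_id, its parameters, and the example id are hypothetical, and the real artifact may order or report the checks differently.

```python
import re

ROOT = "knowledge/dev/reports/architecture/"   # scope root from step 5
SEGMENT = re.compile(r"[A-Za-z0-9._-]+")       # per-segment grammar from step 4


def is_canonical_document_id(candidate: str, mcp_id: str, scoped: bool = True) -> bool:
    """Return True only if `candidate` passes steps 1-6 (illustrative sketch)."""
    # Step 1: non-empty, ASCII only, no control byte (TAB/LF/CR/NUL), no backslash.
    if not candidate or not candidate.isascii():
        return False
    if any(ch in candidate for ch in "\t\n\r\x00\\"):
        return False
    # Step 2: no percent-encoding and no empty segment.
    if "%" in candidate or "//" in candidate:
        return False
    # Step 3: ids are relative and carry no trailing slash.
    if candidate.startswith("/") or candidate.endswith("/"):
        return False
    # Step 4: every segment matches the grammar and is not ".", "..", or empty.
    # fullmatch is used so a trailing newline can never slip past "$".
    for segment in candidate.split("/"):
        if segment in ("", ".", "..") or not SEGMENT.fullmatch(segment):
            return False
    # Step 5: .md suffix and, when scoped, the architecture root.
    if not candidate.endswith(".md"):
        return False
    if scoped and not candidate.startswith(ROOT):
        return False
    # Step 6: exact equality with the MCP-returned id (case-sensitive).
    return candidate == mcp_id


mcp = ROOT + "example.md"  # hypothetical MCP-returned id
assert is_canonical_document_id(mcp, mcp)
assert not is_canonical_document_id(ROOT + "./example.md", mcp)   # alias
assert not is_canonical_document_id(ROOT + "Example.md", mcp)     # case variation
```

Because the candidate is already known to be ASCII by step 6, comparing the Python strings is equivalent to comparing their bytes.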

Fail-closed statuses

  • DOCUMENT_ID_ALIAS_REJECTED: ., .., empty, //, leading or trailing slash, backslash, %-encoding, homoglyph, non-ASCII, or control byte.
  • DOCUMENT_ID_SCOPE_MISMATCH: wrong root.
  • DOCUMENT_ID_NOT_MCP_CANONICAL: not byte-equal to the MCP id (includes case variation).

(G-DOCUMENT-ID-CANONICAL-MCP, doc 06.)

Computed evidence (doc 07)

All ten alias classes are rejected with the named status, computed in --selftest: . segment, .. segment, //, trailing slash, backslash, %-encoded, homoglyph slash, leading slash, scope mismatch, and case-variation-vs-MCP. The 10 real member doc_ids are all canonical, so membership stays f2bda8…fe251.
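A self-test of this shape might look like the sketch below. It is not the actual --selftest code: the classify function, the "OK" label, the example id, and every sample string are hypothetical. Only the three status names and the ten alias classes come from this document. The source does not say which status a missing .md suffix receives, so this sketch omits that check and lets such ids fall through to the byte-equality test.

```python
import re

ROOT = "knowledge/dev/reports/architecture/"
SEGMENT = re.compile(r"[A-Za-z0-9._-]+")
MCP_ID = ROOT + "example.md"  # hypothetical MCP-returned id

ALIAS = "DOCUMENT_ID_ALIAS_REJECTED"
SCOPE = "DOCUMENT_ID_SCOPE_MISMATCH"
NOT_MCP = "DOCUMENT_ID_NOT_MCP_CANONICAL"


def classify(candidate: str, mcp_id: str = MCP_ID) -> str:
    """Return "OK" (assumed label) or one of the three fail-closed statuses."""
    segments = candidate.split("/")
    if (
        not candidate
        or not candidate.isascii()
        # TAB, LF, CR, NUL, backslash, percent sign
        or any(ch in candidate for ch in "\t\n\r\x00\\%")
        # An empty segment covers "//", a leading slash, and a trailing slash.
        or any(s in ("", ".", "..") or not SEGMENT.fullmatch(s) for s in segments)
    ):
        return ALIAS
    if not candidate.startswith(ROOT):
        return SCOPE
    if candidate != mcp_id:
        return NOT_MCP
    return "OK"


# One hypothetical sample per alias class, with the status it should produce.
CASES = {
    ". segment": (ROOT + "./example.md", ALIAS),
    ".. segment": (ROOT + "x/../example.md", ALIAS),
    "//": (ROOT + "/example.md", ALIAS),
    "trailing slash": (MCP_ID + "/", ALIAS),
    "backslash": (ROOT.replace("/", "\\") + "example.md", ALIAS),
    "%-encoded": (ROOT + "%2e/example.md", ALIAS),
    "homoglyph slash": (ROOT.rstrip("/") + "\u2215example.md", ALIAS),
    "leading slash": ("/" + MCP_ID, ALIAS),
    "scope mismatch": ("knowledge/dev/other/example.md", SCOPE),
    "case variation": (ROOT + "Example.md", NOT_MCP),
}

for name, (candidate, expected) in CASES.items():
    assert classify(candidate) == expected, name
assert classify(MCP_ID) == "OK"
```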

Why REJECT, not normalize

Normalizing a/../b to b would silently accept an alias and could change identity. Article 14 forbids any normalization that changes identity, so the canonicalizer rejects instead, leaving exactly one accepted spelling per document.
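To make the risk concrete, the sketch below uses Python's posixpath.normpath as a stand-in for a generic path normalizer. That choice is an assumption for illustration only; nothing here suggests the project ever used it, and the sample ids are hypothetical.

```python
import posixpath

ROOT = "knowledge/dev/reports/architecture/"

# Four distinct byte strings (hypothetical ids).
spellings = [
    ROOT + "b.md",
    ROOT + "a/../b.md",
    ROOT + "./b.md",
    ROOT + "/b.md",  # contains "//"
]
assert len(set(spellings)) == 4

# A normalizer maps all four onto one id, so three aliases are silently accepted.
normalized = {posixpath.normpath(s) for s in spellings}
assert normalized == {ROOT + "b.md"}
```

Under the reject policy only the first spelling survives, and the caller is told explicitly that the other three are not valid ids.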

knowledge/dev/reports/architecture/t1-fix7-blueprint-patch-after-codex-recheck-7-constitution14-ssot-2026-06-09/05-document-id-canonical-mcp-grammar.md