Acceptance Test Matrix rev4 — Implementation Package DOT v0.1 (every negative test tied to enforcement layer / block point / proof-of-block evidence; offline packet; non-gating non-global denial; PG read-only deferred to export; design only, 2026-06-09)
Acceptance Test Matrix (rev4) — Implementation Package DOT v0.1
Nature: the fail-closed acceptance tests for the future offline, packet-derived, non-gating read/report-only MVP, repaired after the Codex rev3 re-seal so that every negative test is tied to a concrete enforcement layer (Codex blocker 6). Uses the rev4 verdict model (no READ_LEVEL_ACCEPTABLE, no exit 0, non-gating, non-global denial). Counts appear only inside fixtures named as fixtures; no literal count is a production invariant. KB-first / PG-first / native-driven / local-last without faking it.
Date: 2026-06-09 · Supersedes: designs/acceptance-test-matrix-implementation-package-dot-v0-1-rev3-2026-06-09.md (rev3). Retained for trace.
Status: ACCEPTANCE_MATRIX_v0_1_REV4_READY_FOR_CODEX. Design only; no test is executed here; no command runs.
Production mutation: NO. writes_performed: KB design docs only.
Governing authority: rev4 Gap-only Scope Spec (§2 offline, §4.0 non-gating non-global denial, §4 verdicts, §11 exit, §12 sandbox capability) + MVP plan rev4 gates G1–G12.
1. Verdict legend (rev4)
Final dossier verdicts: READ_LEVEL_FAIL / BLOCKED / UNVERIFIED (READ_LEVEL_ACCEPTABLE removed). Per-claim: NO_READ_LEVEL_DEFECT_FOUND (NON_AUTHORITATIVE) / INSUFFICIENT_EVIDENCE_FOR_CLAIM / NOT_EVIDENCED_IN_ALLOWED_SURFACES / EVIDENCE_CONFLICTING / BLOCKED_BY_NO_CALL_CONTRACT / BLOCKED_BY_UNVERIFIED_SOURCE / BLOCKED_BY_UNSAFE_ACCESS. Flags: FLAG_PROSE_ONLY_PASS / FLAG_HARDCODED_DENOMINATOR / FLAG_AUTHORITY_VIOLATION / FLAG_LOCAL_FIRST_AUTHORITY / FLAG_GLOBAL_DENIAL_WORDING. article14_status ∈ {NOT_APPLICABLE_NO_EXECUTABLE_CLAIMS, NOT_PROVEN_EXECUTION_UNVERIFIED}. Every output carries decision_effect=NONE, may_gate=false; every negative verdict carries scope_of_denial. No green verdict / no exit 0 anywhere.
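The output constraints in the legend above can be sketched as a small schema check. This is a minimal illustration, assuming a Python implementation; the names `Verdict`, `ReportOutput`, and `validate_output` are hypothetical and do not come from the rev4 spec. Only the verdict values and the `decision_effect`, `may_gate`, and `scope_of_denial` fields are taken from the legend. Treating READ_LEVEL_FAIL and BLOCKED (but not UNVERIFIED) as the verdicts that require a scope is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Verdict(Enum):
    # Final dossier verdicts from the rev4 legend; no positive member exists.
    READ_LEVEL_FAIL = "READ_LEVEL_FAIL"
    BLOCKED = "BLOCKED"
    UNVERIFIED = "UNVERIFIED"


# Assumption: these two verdicts are the ones that must carry scope_of_denial.
NEGATIVE = {Verdict.READ_LEVEL_FAIL, Verdict.BLOCKED}


@dataclass(frozen=True)
class ReportOutput:
    verdict: Verdict
    decision_effect: str = "NONE"
    may_gate: bool = False
    scope_of_denial: Optional[str] = None


def validate_output(out: ReportOutput) -> list[str]:
    """Return contract violations; an empty list means the output may be emitted."""
    problems = []
    if out.decision_effect != "NONE":
        problems.append("decision_effect must be NONE")
    if out.may_gate:
        problems.append("may_gate must be false")
    if out.verdict in NEGATIVE and not out.scope_of_denial:
        problems.append("negative verdict requires scope_of_denial")
    return problems
```

Because the enum has no positive member, a green verdict cannot be constructed at all, which is the property the legend asks for; the validator only covers the remaining field-level rules.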
Enforcement layers (referenced by the negative tests):
- L1 host-sandbox — deny-by-default OS/container sandbox (no network namespace; RO input mount; WO output mount; no secret mounts; scrubbed env; seccomp execve/socket/connect/ptrace deny). The primary structural boundary.
- L2 static-build-guard — import/capability denylist + per-module allowed_actions ⊆ {READ_PACKET_ITEM, WRITE_LOCAL_REPORT} (build-time rejection). Secondary defense-in-depth.
- L3 runtime-self-check (P1) — capability-envelope + sandbox-invariant attestation; fail-closed before any read.
- L4 verdict/output-guard — non-gating, scope_of_denial, non-global-wording lint, no-exit-0.
- L5 export-step (DEFERRED, B7) — named-query-catalog review (side-effect-free), context_pack_readonly gateway. Not part of the MVP; tested at the export contract.
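As an illustration of the L2 layer, the following is a minimal sketch of a build-time check, assuming modules are Python source and declare their actions as a set of strings. `check_module`, `ALLOWED_ACTIONS`, and `DENIED_IMPORTS` are hypothetical names, and the denylist entries are examples only; the governing list belongs to the spec, not to this sketch.

```python
import ast

ALLOWED_ACTIONS = frozenset({"READ_PACKET_ITEM", "WRITE_LOCAL_REPORT"})
# Assumption: an illustrative denylist, not the governing one.
DENIED_IMPORTS = frozenset({"subprocess", "pty", "importlib", "socket", "asyncpg"})


def check_module(source: str, declared_actions: set[str]) -> list[str]:
    """Return build-time rejection reasons for one module (empty list = accepted)."""
    reasons = []
    extra = set(declared_actions) - ALLOWED_ACTIONS
    if extra:
        reasons.append(f"prohibited actions: {sorted(extra)}")
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        for name in names:
            # Compare the top-level package so "a.b" is caught by a deny on "a".
            if name.split(".")[0] in DENIED_IMPORTS:
                reasons.append(f"denied import: {name}")
    return reasons
```

A static scan like this only sees literal import statements. It would not catch a call to `__import__` or an attribute such as `os.system`, which is consistent with L2 being described as secondary to the L1 sandbox.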
Fixture discipline: every fixture is tagged FIXTURE with as_of example values; a fixture value is never a production invariant. Tests compare surface role / match key / population / provenance / separation / verdict / governed-surface / proof-of-block evidence, never literal counts.
2. Core adequacy matrix (verdict + article14 + criterion) — preserved from rev3, verdict names mapped to rev4
| # | Test | Input condition (FIXTURE) | Expected verdict + article14 | Pass/fail criterion |
|---|---|---|---|---|
| 1 | No green verdict exists | static scan of outputs/enums | n/a | PASS iff READ_LEVEL_ACCEPTABLE/exit 0 never appear; strongest result is UNVERIFIED |
| 2 | Any execution claim caps the dossier | FIXTURE: 1 execution claim + otherwise clean | UNVERIFIED + NOT_PROVEN | PASS iff no state above UNVERIFIED reachable |
| 3 | Removed tokens absent | static scan | n/a | PASS iff READ_REPORT_PASS/positive EVIDENCE_PRESENT/EVIDENCE_SUFFICIENT_FOR_READ_LEVEL/READ_LEVEL_ACCEPTABLE never appear |
| 4 | Executable claim without governed existence evidence | FIXTURE: "canonicalizer exists/runs" + no resolvable governed item | READ_LEVEL_FAIL/UNVERIFIED + NOT_PROVEN | PASS iff INSUFFICIENT_EVIDENCE_FOR_CLAIM; existence sub-verdict NOT_EVIDENCED_IN_ALLOWED_SURFACES; never "exists/ran" |
| 5 | Selftest PASS, no run ledger/exit/log | FIXTURE: "selftest 22/22 PASS" + no LOG/EXIT/RUN_LEDGER | READ_LEVEL_FAIL + NOT_PROVEN | PASS iff INSUFFICIENT_EVIDENCE_FOR_CLAIM; tool does NOT run selftest |
| 6 | Hash claim, no pinned hash evidence | FIXTURE: "reproduces hash <ex>" + no HASH_EVIDENCE | READ_LEVEL_FAIL + NOT_PROVEN | PASS iff INSUFFICIENT_EVIDENCE_FOR_CLAIM; tool does NOT recompute |
| 7 | Exit-code claim, no exit-code evidence | FIXTURE: "exit 0" + no EXIT_CODE_EVIDENCE | READ_LEVEL_FAIL + NOT_PROVEN | PASS iff INSUFFICIENT_EVIDENCE_FOR_CLAIM; never assume exit 0 |
| 8 | Command string, no call contract | FIXTURE: "command X ran safely" + no Call Contract | UNVERIFIED/READ_LEVEL_FAIL + NOT_PROVEN | PASS iff BLOCKED_BY_NO_CALL_CONTRACT + routed to Call Contract; tool makes no call |
| 9 | Collapsed denominator | FIXTURE: report cites one canonical DOT number | BLOCKED | PASS iff blocked; "all relevant denominators distinct + provenanced; none collapsed"; no numeric minimum |
| 10 | TAC/IU chosen instead of dual-report | FIXTURE: dossier joins/chooses a corpus | BLOCKED | PASS iff distinct surfaces + independent provenance + joined==false; no literal corpus count |
| 11 | Reconciliation: diagnostic must not override canonical | FIXTURE: name-keyed diagnostic shown as overriding canonical | READ_LEVEL_FAIL | PASS iff canonical.match_key != diagnostic.match_key, both shown, diagnostic non-overriding; no literal 41/4 |
| 12 | Stale/unverified source as denominator | FIXTURE: actual_count external-sync / local checkout used as denominator | UNVERIFIED | PASS iff BLOCKED_BY_UNVERIFIED_SOURCE + held out + marked stale; never a denominator |
| 13 | Prose-only PASS | FIXTURE: prose asserts success, no evidence artifact | READ_LEVEL_FAIL | PASS iff FLAG_PROSE_ONLY_PASS; tool never re-asserts |
| 14 | Dead-link/coverage over-claim | FIXTURE: "all references resolved" | UNVERIFIED | PASS iff coverage==ADVISORY_UNVERIFIED; no resolver-completeness claim |
| 15 | Contract status over-claimed | FIXTURE: treats a READY_FOR_GPT_REVIEW contract as binding | READ_LEVEL_FAIL | PASS iff FLAG_AUTHORITY_VIOLATION (status assessed at recorded value only) |
| 16 | Evidence artifacts contradict | FIXTURE: two records, exit 0 vs exit 2 | READ_LEVEL_FAIL + NOT_PROVEN | PASS iff EVIDENCE_CONFLICTING; reports the conflict set, never picks |
| 17 | Claim extractor misses high-risk prose | FIXTURE: executable claim hidden in an unparsed region | UNVERIFIED | PASS iff region in UNPARSED_REGION[] (high) → completeness UNVERIFIED → manual review |
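
Tests #1 and #3 describe a static scan of outputs and enums for removed tokens. A minimal sketch of such a scan follows, assuming the scanned material is available as text. The function name `scan_for_removed_tokens` is hypothetical, and treating every occurrence of EVIDENCE_PRESENT as a hit (rather than only positive uses) is a deliberately strict assumption.

```python
import re

# Token names are taken from tests #1 and #3 of the matrix.
REMOVED_TOKENS = (
    "READ_REPORT_PASS",
    "EVIDENCE_PRESENT",
    "EVIDENCE_SUFFICIENT_FOR_READ_LEVEL",
    "READ_LEVEL_ACCEPTABLE",
)
EXIT_ZERO = re.compile(r"\bexit\s+0\b", re.IGNORECASE)


def scan_for_removed_tokens(text: str) -> list[str]:
    """Return every removed token (or 'exit 0') found in the given text."""
    hits = [t for t in REMOVED_TOKENS if re.search(rf"\b{t}\b", text)]
    if EXIT_ZERO.search(text):
        hits.append("exit 0")
    return hits
```

The word boundaries keep longer identifiers such as NO_EVIDENCE_PRESENT from matching, since the underscore counts as a word character in Python regular expressions.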
3. Denial-authority / non-gating tests (rev4 — Codex blocker 1)
| # | Test | Enforcement layer | Attempted bypass | Block point | Expected verdict | Proof-of-block evidence | MVP/deferred |
|---|---|---|---|---|---|---|---|
| 18 | Inspector taxonomy treated as authority | L4 | downstream cites the classifier as governing truth | output/authority check | FLAG_AUTHORITY_VIOLATION ⇒ FAIL | report shows authority_status=PROVISIONAL_NON_AUTHORITY + flag raised | MVP |
| 19 | Positive verdict attempted without governed taxonomy source | L2/L4 | build/output emits READ_LEVEL_ACCEPTABLE/exit 0 | build + output gate | build/output rejected | build-time rejection; no positive enum present | MVP |
| 20 | Inspector output wired as a downstream gate/block | L4 | a consumer treats FAIL/BLOCKED/exit-code as an allow/deny decision | non-gating gate (G8) | FLAG_AUTHORITY_VIOLATION ⇒ FAIL | every output carries decision_effect=NONE, may_gate=false; gate-use needs a sealed consumer contract | MVP |
| 21 | Negative verdict missing scope_of_denial | L4 | emit READ_LEVEL_FAIL/BLOCKED_* without a scope | verdict guard (F24) | CONTRACT_VIOLATION_IN_DESIGN | verdict-schema check fails; emission refused | MVP |
| 22 | Global-denial wording | L4 | output says "the artifact does not exist" / "the claim is false" | output lint (F21) | FLAG_GLOBAL_DENIAL_WORDING ⇒ FAIL | lint flags global-negative phrasing; correct form is NOT_EVIDENCED_IN_ALLOWED_SURFACES | MVP |
| 23 | Non-global disclaimer present | L4 | run a report and inspect header | output gate | n/a | PASS iff the non-global denial disclaimer appears verbatim in every report | MVP |
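
The output lint in test #22 can be pictured as a pattern check over report lines. The sketch below is a minimal illustration, assuming plain-text reports; `lint_global_denial` is a hypothetical name, and the two patterns are derived only from the example phrasings in test #22, so a real lint would need a governed pattern list.

```python
import re

# Assumption: illustrative patterns based on the two phrasings quoted in test #22.
GLOBAL_DENIAL_PATTERNS = [
    re.compile(r"\bdoes not exist\b", re.IGNORECASE),
    re.compile(r"\bclaim is false\b", re.IGNORECASE),
]


def lint_global_denial(report_text: str) -> list[str]:
    """Return the report lines that use global-negative phrasing."""
    flagged = []
    for line in report_text.splitlines():
        if any(p.search(line) for p in GLOBAL_DENIAL_PATTERNS):
            flagged.append(line.strip())
    return flagged
```

Any flagged line would raise FLAG_GLOBAL_DENIAL_WORDING; the scoped wording NOT_EVIDENCED_IN_ALLOWED_SURFACES passes because it makes no statement about surfaces the tool was not allowed to read.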
4. Capability / bypass-path tests (rev4 — Codex blocker 6: each tied to an enforcement layer + proof-of-block)
| # | Test (attempted bypass) | Enforcement layer | Block point | Expected verdict | Proof-of-block evidence | MVP/deferred |
|---|---|---|---|---|---|---|
| 24 | Module declares a prohibited action (EXECUTE_COMMAND/INVOKE_DOT) | L2 | static build guard (G4) | CONTRACT_VIOLATION_IN_DESIGN (build) | build log: module rejected; allowed_actions ⊄ {READ_PACKET_ITEM, WRITE_LOCAL_REPORT} | MVP |
| 25 | Shell/subprocess attempt (os.system/subprocess/exec*/pty) | L1 (primary) + L2 | seccomp execve deny; import denylist | exit 3 / build reject | seccomp EPERM on execve; build-time import rejection | MVP |
| 26 | Dynamic import / plugin load (importlib/__import__) | L2 | import denylist (G4) | build reject / exit 3 | build-time rejection; no dynamic-import capability | MVP |
| 27 | General network egress (any endpoint) | L1 (primary) | no network namespace; seccomp socket/connect deny | exit 3 | socket/connect EPERM; no route exists. Process-level egress denial — NOT a gateway DB allowlist (Codex blocker 2). | MVP |
| 28 | Credential / environment-secret access | L1 (primary) | scrubbed env + no credential mount | BLOCKED_BY_UNSAFE_ACCESS/exit 3 | env keyset snapshot: no secret vars; FS namespace: no credential files | MVP |
| 29 | Arbitrary local file read (home/etc/project tree) | L1 (primary) | only the RO input mount is visible | BLOCKED_BY_UNSAFE_ACCESS/exit 3 | FS namespace listing = {RO input mount, WO output mount}; open() outside ⇒ ENOENT/EACCES | MVP |
| 30 | Direct DB driver opened (psql/asyncpg/JDBC) | L2 + L1 | import denylist (G4); no network to connect | CONTRACT_VIOLATION_IN_DESIGN (build) | build-time rejection (no DB driver import); no network namespace | MVP |
| 31 | Raw SQL submitted to the MVP | L1/L2 | MVP has no SQL code path / no DB client | n/a (no path) | build contains no SQL string / no DB client; raw SQL is unreachable | MVP |
| 32 | SELECT side-effect function | L5 (DEFERRED) | export-step query catalog forbids function calls | export-contract review | rejected (no read-only function allowlist entry; empty today) | DEFERRED (export-step contract, B7) |
| 33 | Report write outside the output dir | L1 (primary) | only the WO output mount is writable | BLOCKED_BY_UNSAFE_ACCESS/exit 3 | mount table: output mount is the only writable path; write elsewhere ⇒ EROFS/EACCES | MVP |
| 34 | KB write attempted by the tool | L1 + L2 | no KB write SDK; no network | BLOCKED_BY_UNSAFE_ACCESS/exit 3 | build: no KB write SDK import; no network namespace | MVP |
| 35 | Live PG query attempted by the tool | L1 + L2 | no DB client; no network | BLOCKED_BY_UNSAFE_ACCESS/exit 3 | build: no DB client; no network namespace | MVP |
| 36 | Exit 0 attempted | L4 | G8 exit gate | build failure | no enum/path maps any verdict to exit 0 | MVP |
| 37 | Sandbox invariants unverifiable at startup (network reachable / extra mount / env secret present) | L3 | P1 self-check (F23) | CONTRACT_VIOLATION_IN_DESIGN/BLOCKED (exit 3) before any read | capability-envelope attestation records the failed invariant | MVP |
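
Test #37 expects the L3 self-check to record which sandbox invariant failed and to block before any read. A minimal sketch of that decision follows, assuming the sandbox state has already been observed and is passed in as data, so the sketch itself touches no network, mount, or environment. `SandboxObservation`, `attest`, the mount labels, and the secret-name markers are all hypothetical.

```python
from dataclasses import dataclass

# Assumption: hypothetical labels for the two mounts named under L1.
EXPECTED_MOUNTS = frozenset({"input_ro", "output_wo"})
# Assumption: crude name-based markers; a real check would use a governed list.
SECRET_MARKERS = ("SECRET", "TOKEN", "PASSWORD", "KEY")


@dataclass(frozen=True)
class SandboxObservation:
    network_reachable: bool
    mounts: frozenset
    env_keys: frozenset


def attest(obs: SandboxObservation) -> dict:
    """Build an attestation record; reads are allowed only if nothing failed."""
    failed = []
    if obs.network_reachable:
        failed.append("network_reachable")
    if obs.mounts != EXPECTED_MOUNTS:
        failed.append("unexpected_mounts")
    if any(m in k.upper() for k in obs.env_keys for m in SECRET_MARKERS):
        failed.append("env_secret_present")
    return {"confirmed": not failed, "failed_invariants": failed, "may_read": not failed}
```

The record deliberately carries no success exit code; a confirmed attestation only permits the read phase to begin, and the final verdict is still decided elsewhere.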
5. Local-first / provenance tests (rev4 — KB-first/local-last)
| # | Test | Enforcement layer | Expected | Proof-of-block evidence | MVP/deferred |
|---|---|---|---|---|---|
| 38 | Local source used as authority where a KB/PG source exists | L4 (G10) | FLAG_LOCAL_FIRST_AUTHORITY ⇒ FAIL | packet item lacks governed provenance; flag raised; CONFLICT marked, prefers KB/PG | MVP |
| 39 | Consumed packet item with no governed source_metadata | L4 (G10) | NOT_EVIDENCED_IN_ALLOWED_SURFACES/FAIL | provenance check: item has no {governed_surface,…}; held out | MVP |
| 40 | Review-ready/draft source treated as binding | L4 (G10) | FLAG_AUTHORITY_VIOLATION | item's authority-status field shows review-ready; flagged (location-is-not-authority) | MVP |
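
The provenance checks in tests #39 and #40 can be sketched as a per-item function. This is a rough illustration only: `source_metadata` and `governed_surface` appear in the matrix, but the `authority_status` key, the `used_as_binding` flag, and the `"BINDING"` value are hypothetical stand-ins for whatever the packet schema defines.

```python
def check_packet_item(item: dict) -> list[str]:
    """Return findings for one consumed packet item (empty list = no finding)."""
    meta = item.get("source_metadata") or {}
    if not meta.get("governed_surface"):
        # Test #39: no governed provenance, so the item is held out.
        return ["NOT_EVIDENCED_IN_ALLOWED_SURFACES"]
    findings = []
    # Test #40: status is assessed at its recorded value only.
    # Assumption: "BINDING" is a placeholder for the governed binding status.
    if item.get("used_as_binding", False) and meta.get("authority_status") != "BINDING":
        findings.append("FLAG_AUTHORITY_VIOLATION")
    return findings
```

For example, an item recorded as READY_FOR_GPT_REVIEW but consumed as binding yields FLAG_AUTHORITY_VIOLATION, while an item with no metadata at all never reaches the authority check.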
6. FIX7 discoverability tests (rev4 — Codex Gate 5 PASS, preserved)
| # | Test | Input condition (FIXTURE) | Expected | Pass/fail criterion |
|---|---|---|---|---|
| 41 | FIX7 Recheck-8 real dossier (Fixture A) | FIXTURE A: .py SSOT declared; only a wrong-kind .md resolves; selftest/exit/hash asserted as prose | READ_LEVEL_FAIL + NOT_PROVEN | PASS iff execution claims INSUFFICIENT_EVIDENCE_FOR_CLAIM, existence sub-verdict NOT_EVIDENCED_IN_ALLOWED_SURFACES (NOT "does not exist"), C4/C5 fire; no command run; no positive verdict |
| 42 | Pure discoverability (Fixture A′) | FIXTURE A′: "executable X exists"; no governed-provenance item resolves X; no prose-only PASS, no contradiction | UNVERIFIED + NOT_PROVEN | PASS iff NOT_EVIDENCED_IN_ALLOWED_SURFACES, NOT READ_LEVEL_FAIL, report states "not adequately evidenced via allowed surfaces," not global absence |
| 43 | FIX7 resolvable-but-insufficient (Fixture C) | FIXTURE C: cited evidence resolves but is prose-only / wrong-kind / contradictory / unbound | READ_LEVEL_FAIL + NOT_PROVEN | PASS iff C5/C6/C7 fire; must NOT be NO_READ_LEVEL_DEFECT_FOUND / acceptable / PASS |
| 44 | FIX7 stripped (Fixture B) | FIXTURE B: success asserted, all references removed | READ_LEVEL_FAIL + NOT_PROVEN | PASS iff C1/C2/C4/C8 = INSUFFICIENT_EVIDENCE_FOR_CLAIM + FLAG_PROSE_ONLY_PASS |
| 45 | Global-denial-wording trap (Fixture D, rev4) | FIXTURE D: missing .py phrased as "the canonicalizer does not exist" | FLAG_GLOBAL_DENIAL_WORDING ⇒ FAIL | PASS iff the global negative is flagged and rewritten to NOT_EVIDENCED_IN_ALLOWED_SURFACES |
7. Cross-cutting acceptance invariants (all tests)
- I1: no test path invokes a command, FS DOT, IU command, detector, shell, subprocess, dynamic import, live query, or KB write.
- I2: no test path mutates PG/Directus/registry/filesystem/system_issues; the MVP performs no live read and holds no DB driver/credential/network.
- I3: every emitted count carries a full denominator_source_record; no bare counts; no literal count is a comparator.
- I4: no positive/green verdict and no exit 0 exist; removed tokens never appear.
- I5: denominators stay separate; TAC/IU never joined.
- I6: "any doubt ⇒ FAIL/BLOCK/UNVERIFIED"; no silent acceptance; FLAG/FAIL/BLOCKED/UNVERIFIED map to exit 1/2/3, never 0.
- I7: every fixture is tagged FIXTURE with as_of; a fixture value is never a production invariant.
- I8: writes_performed[] enumerates every write (local output paths only; no hidden mutation; no KB write).
- I9: every consumed packet item + load-bearing claim cites a governed KB/PG/native surface with an authority status; local-first authority is flagged.
- I10: the inspector's taxonomy is PROVISIONAL_NON_AUTHORITY, decision_effect=NONE, may_gate=false; it never certifies truth, never gates, and no output claims global absence.
- I11 (rev4): capability_envelope_attestation (sandbox invariants confirmed) + export_provenance are recorded for every run; a run with unconfirmed sandbox invariants is BLOCKED before any read.
- I12 (rev4): every negative verdict carries scope_of_denial; the non-global denial disclaimer is present in every report.
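
Invariants I4 and I6 together say that no verdict may ever map to exit 0. A minimal sketch of a check for that property follows. The specific code assigned to each verdict here is a placeholder assumption (the governing assignment belongs to spec §11 and is not restated in this matrix), and `exit_code_for` and `check_exit_invariant` are hypothetical names.

```python
# Assumption: placeholder assignment; only "1/2/3, never 0" comes from I6.
EXIT_CODES = {"READ_LEVEL_FAIL": 1, "UNVERIFIED": 2, "BLOCKED": 3}


def exit_code_for(verdict: str) -> int:
    """Map a verdict to an exit code, failing closed on anything unknown."""
    # An unknown verdict is treated as doubt, so it never defaults to 0.
    return EXIT_CODES.get(verdict, 3)


def check_exit_invariant(mapping: dict) -> bool:
    """True only if every mapped code is one of 1, 2, or 3."""
    return all(code in (1, 2, 3) for code in mapping.values())
```

A build-time test could call `check_exit_invariant` on the real mapping and fail the build if it returns False, which matches the "build failure" outcome expected when exit 0 is attempted.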
8. Deferred tests (NOT in v0.1 — gated on named future contracts)
- D1 — actual command run + exit-code capture (Call Contract).
- D2 — claim bound to a real execution result / re-run determinism / global-absence proof (Call / Proof-of-run Contract).
- D3 — generic package_manifest schema validation (lineage + Codex schema review).
- D4 — --selftest N/N + module_sha256 (post-reseal build).
- D5 — audit_dead_links() → system_issues (write contract).
- D6 — Directus write-path verification (DOT-control proof contract).
- D7 — OPA/Conftest/Squawk/CI/Git-hook gating (CI/policy-gate contract).
- D8 — positive/green verdict + exit 0 (sealed governed taxonomy authority).
- D9 (rev4) — the live governed export step + its named-query-catalog/driver/network-policy contract (B7); side-effect-function rejection (#32) is tested here, not in the MVP.
- D10 (rev4) — a path-scoped server-enforced KB report writer (B7); until then the MVP writes only the local output mount.
- D11 (rev4) — any downstream consumer/authority contract that would let the output gate/block/authorize anything (B7).
9. Acceptance verdict
ACCEPTANCE_MATRIX_v0_1_REV4_READY_FOR_CODEX — 45 in-scope tests with deterministic fail-closed criteria, each capability/bypass test tied to a named enforcement layer (L1–L5), block point, and proof-of-block evidence (Codex blocker 6), covering: no-green (#1/#3/#19/#36); non-gating/non-global denial (#18/#20–#23/#45); structural sandbox bypass paths (#24–#37); local-last/authority-status (#38–#40); FIX7 discoverability incl. Fixture A′/D (#41–#45); plus preserved Article-14/hardcode/fake-green coverage (#2/#4–#17). 11 deferred tests behind named future contracts (incl. the export-step side-effect-function test #32 → D9). No positive verdict, no exit 0, no literal count invariant; every output non-gating and non-global. Routed with the rev4 packet to Codex re-review.
Cross-references
- Gap-only Spec rev4 / FIX7 pilot rev4 / MVP plan rev4 / fix ledger rev4 (see those docs).
- Codex re-seal: reviews/codex-reseal-gap-only-spec-rev3-2026-06-09.md.
- Superseded rev3: designs/acceptance-test-matrix-implementation-package-dot-v0-1-rev3-2026-06-09.md.