KB-4722

Acceptance Test Matrix rev4 — Implementation Package DOT v0.1 (every negative test tied to enforcement layer / block point / proof-of-block evidence; offline packet; non-gating non-global denial; PG read-only deferred to export; design only, 2026-06-09)


Acceptance Test Matrix (rev4) — Implementation Package DOT v0.1

Nature: the fail-closed acceptance tests for the future offline, packet-derived, non-gating read/report-only MVP, repaired after the Codex rev3 re-seal so that every negative test is tied to a concrete enforcement layer (Codex blocker 6). Uses the rev4 verdict model (no READ_LEVEL_ACCEPTABLE, no exit 0, non-gating, non-global denial). Counts appear only inside fixtures named as fixtures; no literal count is a production invariant. KB-first / PG-first / native-driven / local-last without faking it.

Date: 2026-06-09
Supersedes: designs/acceptance-test-matrix-implementation-package-dot-v0-1-rev3-2026-06-09.md (rev3), retained for trace.
Status: ACCEPTANCE_MATRIX_v0_1_REV4_READY_FOR_CODEX. Design only; no test is executed here; no command runs.
Production mutation: NO.
writes_performed: KB design docs only.
Governing authority: rev4 Gap-only Scope Spec (§2 offline, §4.0 non-gating non-global denial, §4 verdicts, §11 exit, §12 sandbox capability) + MVP plan rev4 gates G1–G12.

1. Verdict legend (rev4)

  • Final dossier verdicts: READ_LEVEL_FAIL / BLOCKED / UNVERIFIED (READ_LEVEL_ACCEPTABLE removed).
  • Per-claim: NO_READ_LEVEL_DEFECT_FOUND (NON_AUTHORITATIVE) / INSUFFICIENT_EVIDENCE_FOR_CLAIM / NOT_EVIDENCED_IN_ALLOWED_SURFACES / EVIDENCE_CONFLICTING / BLOCKED_BY_NO_CALL_CONTRACT / BLOCKED_BY_UNVERIFIED_SOURCE / BLOCKED_BY_UNSAFE_ACCESS.
  • Flags: FLAG_PROSE_ONLY_PASS / FLAG_HARDCODED_DENOMINATOR / FLAG_AUTHORITY_VIOLATION / FLAG_LOCAL_FIRST_AUTHORITY / FLAG_GLOBAL_DENIAL_WORDING.
  • article14_status ∈ {NOT_APPLICABLE_NO_EXECUTABLE_CLAIMS, NOT_PROVEN_EXECUTION_UNVERIFIED}.
  • Every output carries decision_effect=NONE, may_gate=false; every negative verdict carries scope_of_denial.
  • No green verdict / no exit 0 anywhere.
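
The output rules in this legend can be sketched as a small record type. This is a minimal illustration, assuming a Python implementation: the names `Verdict`, `Output`, and `make_output` are hypothetical, and treating READ_LEVEL_FAIL and BLOCKED as the verdicts that require a scope is an assumption. Only the verdict names, `decision_effect=NONE`, `may_gate=false`, and `scope_of_denial` come from the legend.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Verdict(Enum):
    # Final dossier verdicts from the legend; there is deliberately no positive member.
    READ_LEVEL_FAIL = "READ_LEVEL_FAIL"
    BLOCKED = "BLOCKED"
    UNVERIFIED = "UNVERIFIED"


# Assumption: these are the negative verdicts that must carry a scope.
NEGATIVE = frozenset({Verdict.READ_LEVEL_FAIL, Verdict.BLOCKED})


@dataclass(frozen=True)
class Output:
    verdict: Verdict
    scope_of_denial: Optional[str] = None
    # init=False: callers cannot override the non-gating fields.
    decision_effect: str = field(default="NONE", init=False)
    may_gate: bool = field(default=False, init=False)


def make_output(verdict: Verdict, scope_of_denial: Optional[str] = None) -> Output:
    """Build an output record; refuse a negative verdict that has no scope."""
    if verdict in NEGATIVE and not scope_of_denial:
        raise ValueError("negative verdict emitted without scope_of_denial")
    return Output(verdict=verdict, scope_of_denial=scope_of_denial)
```

Because the two non-gating fields are excluded from the constructor, an attempt such as `Output(Verdict.BLOCKED, "x", may_gate=True)` fails with a `TypeError` rather than producing a gating record.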

Enforcement layers (referenced by the negative tests):

  • L1 host-sandbox — deny-by-default OS/container sandbox (no network namespace; RO input mount; WO output mount; no secret mounts; scrubbed env; seccomp execve/socket/connect/ptrace deny). The primary structural boundary.
  • L2 static-build-guard — import/capability denylist + per-module allowed_actions ⊆ {READ_PACKET_ITEM, WRITE_LOCAL_REPORT} (build-time rejection). Secondary defense-in-depth.
  • L3 runtime-self-check (P1) — capability-envelope + sandbox-invariant attestation; fail-closed before any read.
  • L4 verdict/output-guard — non-gating, scope_of_denial, non-global-wording lint, no-exit-0.
  • L5 export-step (DEFERRED, B7) — named-query-catalog review (side-effect-free), context_pack_readonly gateway. Not part of the MVP; tested at the export contract.
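
As an illustration of the L2 static-build-guard, the following sketch scans a module's source with Python's standard `ast` module. The function name `check_module` and the concrete denylists are hypothetical; the real lists belong to the build-guard contract. Only the allowed-action set is taken from the text above. A static scan of this kind is defense-in-depth and does not replace the L1 sandbox.

```python
import ast

ALLOWED_ACTIONS = {"READ_PACKET_ITEM", "WRITE_LOCAL_REPORT"}
# Hypothetical denylists for illustration only.
DENIED_IMPORTS = {"os", "subprocess", "pty", "socket", "importlib", "asyncpg"}
DENIED_CALLS = {"__import__", "exec", "eval"}


def check_module(source: str, declared_actions: set) -> list:
    """Return a list of violations; an empty list means the module may be built."""
    violations = []
    extra = set(declared_actions) - ALLOWED_ACTIONS
    if extra:
        violations.append(f"prohibited actions: {sorted(extra)}")
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [(node.module or "").split(".")[0]]
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            # Direct calls to denied builtins such as __import__("x").
            if node.func.id in DENIED_CALLS:
                violations.append(f"denied call: {node.func.id}")
            continue
        else:
            continue
        violations += [f"denied import: {n}" for n in names if n in DENIED_IMPORTS]
    return violations
```

A build step could reject any module for which `check_module` returns a non-empty list and record that list in the build log as proof-of-block evidence.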

Fixture discipline: every fixture is tagged FIXTURE with as_of example values; a fixture value is never a production invariant. Tests compare surface role / match key / population / provenance / separation / verdict / governed-surface / proof-of-block evidence, never literal counts.
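
A minimal sketch of this fixture discipline, assuming fixtures are plain Python records. The class `Fixture`, the helper `mismatches`, and the key spellings are hypothetical; the list of comparable properties mirrors the sentence above, and any key outside it (such as a literal count) is refused when the fixture is built.

```python
from dataclasses import dataclass

# Properties that tests may compare; literal counts are deliberately absent.
COMPARABLE_KEYS = frozenset({
    "surface_role", "match_key", "population", "provenance",
    "separation", "verdict", "governed_surface", "proof_of_block",
})


@dataclass(frozen=True)
class Fixture:
    name: str
    as_of: str       # example date on which the fixture values were captured
    expected: dict   # expected properties, keyed by COMPARABLE_KEYS
    tag: str = "FIXTURE"

    def __post_init__(self):
        illegal = set(self.expected) - COMPARABLE_KEYS
        if self.tag != "FIXTURE" or not self.as_of or illegal:
            raise ValueError(f"invalid fixture {self.name!r}: {sorted(illegal)}")


def mismatches(fixture: Fixture, actual: dict) -> list:
    """Return the compared keys whose actual value differs from the fixture."""
    return sorted(k for k, v in fixture.expected.items() if actual.get(k) != v)
```

Extra keys in `actual`, including counts, are ignored by `mismatches`, so a changed count alone can never fail or pass a test.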

2. Core adequacy matrix (verdict + article14 + criterion) — preserved from rev3, verdict names mapped to rev4

| # | Test | Input condition (FIXTURE) | Expected verdict + article14 | Pass/fail criterion |
|---|---|---|---|---|
| 1 | No green verdict exists | static scan of outputs/enums | n/a | PASS iff READ_LEVEL_ACCEPTABLE/exit 0 never appear; strongest result is UNVERIFIED |
| 2 | Any execution claim caps the dossier | FIXTURE: 1 execution claim + otherwise clean | UNVERIFIED + NOT_PROVEN | PASS iff no state above UNVERIFIED reachable |
| 3 | Removed tokens absent | static scan | n/a | PASS iff READ_REPORT_PASS/positive EVIDENCE_PRESENT/EVIDENCE_SUFFICIENT_FOR_READ_LEVEL/READ_LEVEL_ACCEPTABLE never appear |
| 4 | Executable claim without governed existence evidence | FIXTURE: "canonicalizer exists/runs" + no resolvable governed item | READ_LEVEL_FAIL/UNVERIFIED + NOT_PROVEN | PASS iff INSUFFICIENT_EVIDENCE_FOR_CLAIM; existence sub-verdict NOT_EVIDENCED_IN_ALLOWED_SURFACES; never "exists/ran" |
| 5 | Selftest PASS, no run ledger/exit/log | FIXTURE: "selftest 22/22 PASS" + no LOG/EXIT/RUN_LEDGER | READ_LEVEL_FAIL + NOT_PROVEN | PASS iff INSUFFICIENT_EVIDENCE_FOR_CLAIM; tool does NOT run selftest |
| 6 | Hash claim, no pinned hash evidence | FIXTURE: "reproduces hash <ex>" + no HASH_EVIDENCE | READ_LEVEL_FAIL + NOT_PROVEN | PASS iff INSUFFICIENT_EVIDENCE_FOR_CLAIM; tool does NOT recompute |
| 7 | Exit-code claim, no exit-code evidence | FIXTURE: "exit 0" + no EXIT_CODE_EVIDENCE | READ_LEVEL_FAIL + NOT_PROVEN | PASS iff INSUFFICIENT_EVIDENCE_FOR_CLAIM; never assume exit 0 |
| 8 | Command string, no call contract | FIXTURE: "command X ran safely" + no Call Contract | UNVERIFIED/READ_LEVEL_FAIL + NOT_PROVEN | PASS iff BLOCKED_BY_NO_CALL_CONTRACT + routed to Call Contract; tool makes no call |
| 9 | Collapsed denominator | FIXTURE: report cites one canonical DOT number | BLOCKED | PASS iff blocked; "all relevant denominators distinct + provenanced; none collapsed"; no numeric minimum |
| 10 | TAC/IU chosen instead of dual-report | FIXTURE: dossier joins/chooses a corpus | BLOCKED | PASS iff distinct surfaces + independent provenance + joined==false; no literal corpus count |
| 11 | Reconciliation: diagnostic must not override canonical | FIXTURE: name-keyed diagnostic shown as overriding canonical | READ_LEVEL_FAIL | PASS iff canonical.match_key != diagnostic.match_key, both shown, diagnostic non-overriding; no literal 41/4 |
| 12 | Stale/unverified source as denominator | FIXTURE: actual_count external-sync / local checkout used as denominator | UNVERIFIED | PASS iff BLOCKED_BY_UNVERIFIED_SOURCE + held out + marked stale; never a denominator |
| 13 | Prose-only PASS | FIXTURE: prose asserts success, no evidence artifact | READ_LEVEL_FAIL | PASS iff FLAG_PROSE_ONLY_PASS; tool never re-asserts |
| 14 | Dead-link/coverage over-claim | FIXTURE: "all references resolved" | UNVERIFIED | PASS iff coverage==ADVISORY_UNVERIFIED; no resolver-completeness claim |
| 15 | Contract status over-claimed | FIXTURE: treats a READY_FOR_GPT_REVIEW contract as binding | READ_LEVEL_FAIL | PASS iff FLAG_AUTHORITY_VIOLATION (status assessed at recorded value only) |
| 16 | Evidence artifacts contradict | FIXTURE: two records, exit 0 vs exit 2 | READ_LEVEL_FAIL + NOT_PROVEN | PASS iff EVIDENCE_CONFLICTING; reports the conflict set, never picks |
| 17 | Claim extractor misses high-risk prose | FIXTURE: executable claim hidden in an unparsed region | UNVERIFIED | PASS iff region in UNPARSED_REGION[] (high) → completeness UNVERIFIED → manual review |
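
Tests #1 and #3 are static scans. The sketch below shows one way such a scan could look, assuming the outputs and enum sources are available as strings. The function name `scan_outputs` is hypothetical, the token list is copied from test #3, and the separate check that no verdict maps to exit 0 is not covered here.

```python
import re

# Tokens that tests #1 and #3 say must never appear in outputs or enums.
FORBIDDEN = [
    "READ_LEVEL_ACCEPTABLE",
    "READ_REPORT_PASS",
    "EVIDENCE_PRESENT",
    "EVIDENCE_SUFFICIENT_FOR_READ_LEVEL",
]
# \b treats "_" as a word character, so only whole tokens match.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, FORBIDDEN)) + r")\b")


def scan_outputs(texts: dict) -> list:
    """Return (name, line_number, token) for every forbidden token found."""
    hits = []
    for name, text in texts.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in _PATTERN.finditer(line):
                hits.append((name, lineno, match.group(1)))
    return hits
```

The test passes only when the returned list is empty; each hit names the file and line so the removal can be traced.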

3. Denial-authority / non-gating tests (rev4 — Codex blocker 1)

| # | Test | Enforcement layer | Attempted bypass | Block point | Expected verdict | Proof-of-block evidence | MVP/deferred |
|---|---|---|---|---|---|---|---|
| 18 | Inspector taxonomy treated as authority | L4 | downstream cites the classifier as governing truth | output/authority check | FLAG_AUTHORITY_VIOLATION ⇒ FAIL | report shows authority_status=PROVISIONAL_NON_AUTHORITY + flag raised | MVP |
| 19 | Positive verdict attempted without governed taxonomy source | L2/L4 | build/output emits READ_LEVEL_ACCEPTABLE/exit 0 | build + output gate | build/output rejected | build-time rejection; no positive enum present | MVP |
| 20 | Inspector output wired as a downstream gate/block | L4 | a consumer treats FAIL/BLOCKED/exit-code as an allow/deny decision | non-gating gate (G8) | FLAG_AUTHORITY_VIOLATION ⇒ FAIL | every output carries decision_effect=NONE, may_gate=false; gate-use needs a sealed consumer contract | MVP |
| 21 | Negative verdict missing scope_of_denial | L4 | emit READ_LEVEL_FAIL/BLOCKED_* without a scope | verdict guard (F24) | CONTRACT_VIOLATION_IN_DESIGN | verdict-schema check fails; emission refused | MVP |
| 22 | Global-denial wording | L4 | output says "the artifact does not exist" / "the claim is false" | output lint (F21) | FLAG_GLOBAL_DENIAL_WORDING ⇒ FAIL | lint flags global-negative phrasing; correct form is NOT_EVIDENCED_IN_ALLOWED_SURFACES | MVP |
| 23 | Non-global disclaimer present | L4 | run a report and inspect header | output gate | n/a | PASS iff the non-global denial disclaimer appears verbatim in every report | MVP |
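
The output lint in test #22 could be approximated with a phrase scan. This is a sketch under assumptions: the function names and the pattern list are hypothetical, and the two patterns are derived only from the quoted examples in test #22. A real lint would need a reviewed phrase list.

```python
import re

# Hypothetical patterns built from the two examples quoted in test #22.
GLOBAL_DENIAL_PATTERNS = [
    re.compile(r"\bdoes not exist\b", re.IGNORECASE),
    re.compile(r"\bclaim is false\b", re.IGNORECASE),
]


def lint_global_denial(report_text: str) -> list:
    """Return the report lines that use global-negative phrasing."""
    flagged = []
    for line in report_text.splitlines():
        if any(p.search(line) for p in GLOBAL_DENIAL_PATTERNS):
            flagged.append(line.strip())
    return flagged


def flags_for(report_text: str) -> list:
    """Raise FLAG_GLOBAL_DENIAL_WORDING when any line is flagged."""
    return ["FLAG_GLOBAL_DENIAL_WORDING"] if lint_global_denial(report_text) else []
```

A line such as "X: NOT_EVIDENCED_IN_ALLOWED_SURFACES" passes the lint, while "the canonicalizer does not exist" is flagged.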

4. Capability / bypass-path tests (rev4 — Codex blocker 6: each tied to an enforcement layer + proof-of-block)

| # | Test (attempted bypass) | Enforcement layer | Block point | Expected verdict | Proof-of-block evidence | MVP/deferred |
|---|---|---|---|---|---|---|
| 24 | Module declares a prohibited action (EXECUTE_COMMAND/INVOKE_DOT) | L2 | static build guard (G4) | CONTRACT_VIOLATION_IN_DESIGN (build) | build log: module rejected; allowed_actions ⊄ {READ_PACKET_ITEM,WRITE_LOCAL_REPORT} | MVP |
| 25 | Shell/subprocess attempt (os.system/subprocess/exec*/pty) | L1 (primary) + L2 | seccomp execve deny; import denylist | exit 3 / build reject | seccomp EPERM on execve; build-time import rejection | MVP |
| 26 | Dynamic import / plugin load (importlib/__import__) | L2 | import denylist (G4) | build reject / exit 3 | build-time rejection; no dynamic-import capability | MVP |
| 27 | General network egress (any endpoint) | L1 (primary) | no network namespace; seccomp socket/connect deny | exit 3 | socket/connect EPERM; no route exists. Process-level egress denial, NOT a gateway DB allowlist (Codex blocker 2). | MVP |
| 28 | Credential / environment-secret access | L1 (primary) | scrubbed env + no credential mount | BLOCKED_BY_UNSAFE_ACCESS/exit 3 | env keyset snapshot: no secret vars; FS namespace: no credential files | MVP |
| 29 | Arbitrary local file read (home/etc/project tree) | L1 (primary) | only the RO input mount is visible | BLOCKED_BY_UNSAFE_ACCESS/exit 3 | FS namespace listing = {RO input mount, WO output mount}; open() outside ⇒ ENOENT/EACCES | MVP |
| 30 | Direct DB driver opened (psql/asyncpg/JDBC) | L2 + L1 | import denylist (G4); no network to connect | CONTRACT_VIOLATION_IN_DESIGN (build) | build-time rejection (no DB driver import); no network namespace | MVP |
| 31 | Raw SQL submitted to the MVP | L1/L2 | MVP has no SQL code path / no DB client | n/a (no path) | build contains no SQL string / no DB client; raw SQL is unreachable | MVP |
| 32 | SELECT side-effect function | L5 (DEFERRED) | export-step query catalog forbids function calls | export-contract review | rejected (no read-only function allowlist entry; empty today) | DEFERRED (export-step contract, B7) |
| 33 | Report write outside the output dir | L1 (primary) | only the WO output mount is writable | BLOCKED_BY_UNSAFE_ACCESS/exit 3 | mount table: output mount is the only writable path; write elsewhere ⇒ EROFS/EACCES | MVP |
| 34 | KB write attempted by the tool | L1 + L2 | no KB write SDK; no network | BLOCKED_BY_UNSAFE_ACCESS/exit 3 | build: no KB write SDK import; no network namespace | MVP |
| 35 | Live PG query attempted by the tool | L1 + L2 | no DB client; no network | BLOCKED_BY_UNSAFE_ACCESS/exit 3 | build: no DB client; no network namespace | MVP |
| 36 | Exit 0 attempted | L4 | G8 exit gate | build failure | no enum/path maps any verdict to exit 0 | MVP |
| 37 | Sandbox invariants unverifiable at startup (network reachable / extra mount / env secret present) | L3 | P1 self-check (F23) | CONTRACT_VIOLATION_IN_DESIGN/BLOCKED (exit 3) before any read | capability-envelope attestation records the failed invariant | |
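
Test #37 relies on a fail-closed startup check. The sketch below illustrates the idea on pre-collected snapshots rather than on live OS probes, so how the environment keys, mount list, and interface list are gathered is out of scope. All names, the secret-name heuristic, the mount paths, and the choice to tolerate a loopback interface are assumptions; only the three invariants and exit 3 come from the matrix.

```python
import re
from dataclasses import dataclass

# Hypothetical heuristic for secret-looking variable names.
SECRET_NAME = re.compile(r"SECRET|TOKEN|PASSWORD|CREDENTIAL|API_KEY", re.IGNORECASE)
# Hypothetical mount points for the RO input and WO output mounts.
EXPECTED_MOUNTS = frozenset({"/input", "/output"})


@dataclass(frozen=True)
class Attestation:
    confirmed: bool
    failed_invariants: tuple = ()


def attest(env_keys, mounts, interfaces) -> Attestation:
    """Check the three sandbox invariants named in test #37."""
    failed = []
    secrets = sorted(k for k in env_keys if SECRET_NAME.search(k))
    if secrets:
        failed.append("env secret present: " + ", ".join(secrets))
    extra = sorted(set(mounts) - EXPECTED_MOUNTS)
    if extra:
        failed.append("extra mount: " + ", ".join(extra))
    # Assumption: a loopback-only namespace counts as "no network".
    live = sorted(i for i in interfaces if i != "lo")
    if live:
        failed.append("network reachable: " + ", ".join(live))
    return Attestation(confirmed=not failed, failed_invariants=tuple(failed))


def require_sandbox(att: Attestation) -> None:
    """Fail closed with exit 3 before any packet read."""
    if not att.confirmed:
        raise SystemExit(3)
```

The `failed_invariants` tuple is what a capability-envelope attestation would record as proof-of-block evidence when the run is refused.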

5. Local-first / provenance tests (rev4 — KB-first/local-last)

| # | Test | Enforcement layer | Expected | Proof-of-block evidence | MVP/deferred |
|---|---|---|---|---|---|
| 38 | Local source used as authority where a KB/PG source exists | L4 (G10) | FLAG_LOCAL_FIRST_AUTHORITY ⇒ FAIL | packet item lacks governed provenance; flag raised; CONFLICT marked, prefers KB/PG | MVP |
| 39 | Consumed packet item with no governed source_metadata | L4 (G10) | NOT_EVIDENCED_IN_ALLOWED_SURFACES/FAIL | provenance check: item has no {governed_surface,…}; held out | MVP |
| 40 | Review-ready/draft source treated as binding | L4 (G10) | FLAG_AUTHORITY_VIOLATION | item's authority-status field shows review-ready; flagged (location-is-not-authority) | MVP |
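
The three provenance tests can be sketched as one check over a packet item. The item layout here is an assumption: `source_metadata` and `governed_surface` appear in test #39 and READY_FOR_GPT_REVIEW in test #15, while `origin`, `governed_alternative_exists`, `treated_as_binding`, `authority_status`, and the `DRAFT` value are hypothetical field names chosen for illustration.

```python
# READY_FOR_GPT_REVIEW is named in test #15; DRAFT is a hypothetical stand-in.
NON_BINDING_STATUSES = {"READY_FOR_GPT_REVIEW", "DRAFT"}


def check_item(item: dict) -> list:
    """Return the findings that tests #38 to #40 expect for one packet item."""
    findings = []
    meta = item.get("source_metadata") or {}
    # Test #39: no governed provenance, so the item is held out.
    if not meta.get("governed_surface"):
        findings.append("NOT_EVIDENCED_IN_ALLOWED_SURFACES")
    # Test #38: a local source used although a KB/PG source exists.
    if item.get("origin") == "local" and item.get("governed_alternative_exists"):
        findings.append("FLAG_LOCAL_FIRST_AUTHORITY")
    # Test #40: a review-ready or draft source treated as binding.
    if item.get("treated_as_binding") and meta.get("authority_status") in NON_BINDING_STATUSES:
        findings.append("FLAG_AUTHORITY_VIOLATION")
    return findings
```

Findings accumulate rather than short-circuit, so a local item without governed provenance reports both the missing evidence and the local-first flag.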

6. FIX7 discoverability tests (rev4 — Codex Gate 5 PASS, preserved)

| # | Test | Input condition (FIXTURE) | Expected | Pass/fail criterion |
|---|---|---|---|---|
| 41 | FIX7 Recheck-8 real dossier (Fixture A) | FIXTURE A: .py SSOT declared; only a wrong-kind .md resolves; selftest/exit/hash asserted as prose | READ_LEVEL_FAIL + NOT_PROVEN | PASS iff execution claims INSUFFICIENT_EVIDENCE_FOR_CLAIM, existence sub-verdict NOT_EVIDENCED_IN_ALLOWED_SURFACES (NOT "does not exist"), C4/C5 fire; no command run; no positive verdict |
| 42 | Pure discoverability (Fixture A′) | FIXTURE A′: "executable X exists"; no governed-provenance item resolves X; no prose-only PASS, no contradiction | UNVERIFIED + NOT_PROVEN | PASS iff NOT_EVIDENCED_IN_ALLOWED_SURFACES, NOT READ_LEVEL_FAIL, report states "not adequately evidenced via allowed surfaces," not global absence |
| 43 | FIX7 resolvable-but-insufficient (Fixture C) | FIXTURE C: cited evidence resolves but is prose-only / wrong-kind / contradictory / unbound | READ_LEVEL_FAIL + NOT_PROVEN | PASS iff C5/C6/C7 fire; must NOT be NO_READ_LEVEL_DEFECT_FOUND / acceptable / PASS |
| 44 | FIX7 stripped (Fixture B) | FIXTURE B: success asserted, all references removed | READ_LEVEL_FAIL + NOT_PROVEN | PASS iff C1/C2/C4/C8 = INSUFFICIENT_EVIDENCE_FOR_CLAIM + FLAG_PROSE_ONLY_PASS |
| 45 | Global-denial-wording trap (Fixture D, rev4) | FIXTURE D: missing .py phrased as "the canonicalizer does not exist" | FLAG_GLOBAL_DENIAL_WORDING ⇒ FAIL | PASS iff the global negative is flagged and rewritten to NOT_EVIDENCED_IN_ALLOWED_SURFACES |
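
The distinction between Fixture A′ (nothing resolves, so the dossier stays UNVERIFIED) and Fixture C (something resolves but is insufficient, so the dossier is READ_LEVEL_FAIL) can be sketched as follows. The function name, the `governed` and `defect` item fields, and the returned dictionary layout are hypothetical; the defect kinds and verdict names are taken from tests #42 and #43.

```python
# Defect kinds listed for Fixture C in test #43.
INSUFFICIENT_KINDS = {"prose-only", "wrong-kind", "contradictory", "unbound"}


def assess_existence_claim(resolved_items: list) -> dict:
    """Classify an existence claim from the items that resolved for it."""
    governed = [i for i in resolved_items if i.get("governed")]
    if not governed:
        # Fixture A': not evidenced in allowed surfaces, never "does not exist".
        return {
            "claim": "NOT_EVIDENCED_IN_ALLOWED_SURFACES",
            "dossier": "UNVERIFIED",
            "wording": "not adequately evidenced via allowed surfaces",
        }
    defects = sorted({i["defect"] for i in governed if i.get("defect") in INSUFFICIENT_KINDS})
    if defects:
        # Fixture C: evidence resolves but cannot support the claim.
        return {"dossier": "READ_LEVEL_FAIL", "defects": defects}
    # Even clean evidence is capped at UNVERIFIED; no positive verdict exists.
    return {"claim": "NO_READ_LEVEL_DEFECT_FOUND (NON_AUTHORITATIVE)", "dossier": "UNVERIFIED"}
```

No branch returns a state above UNVERIFIED, which matches the cap that test #2 places on any dossier containing an execution claim.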

7. Cross-cutting acceptance invariants (all tests)

  • I1: no test path invokes a command, FS DOT, IU command, detector, shell, subprocess, dynamic import, live query, or KB write.
  • I2: no test path mutates PG/Directus/registry/filesystem/system_issues; the MVP performs no live read and holds no DB driver/credential/network.
  • I3: every emitted count carries a full denominator_source_record; no bare counts; no literal count is a comparator.
  • I4: no positive/green verdict and no exit 0 exist; removed tokens never appear.
  • I5: denominators stay separate; TAC/IU never joined.
  • I6: "any doubt ⇒ FAIL/BLOCK/UNVERIFIED"; no silent acceptance; FLAG/FAIL/BLOCKED/UNVERIFIED map to exit 1/2/3, never 0.
  • I7: every fixture is tagged FIXTURE with as_of; a fixture value is never a production invariant.
  • I8: writes_performed[] enumerates every write (local output paths only; no hidden mutation; no KB write).
  • I9: every consumed packet item + load-bearing claim cites a governed KB/PG/native surface with an authority status; local-first authority is flagged.
  • I10: the inspector's taxonomy is PROVISIONAL_NON_AUTHORITY, decision_effect=NONE, may_gate=false; it never certifies truth, never gates, and no output claims global absence.
  • I11 (rev4): capability_envelope_attestation (sandbox invariants confirmed) + export_provenance are recorded for every run; a run with unconfirmed sandbox invariants is BLOCKED before any read.
  • I12 (rev4): every negative verdict carries scope_of_denial; the non-global denial disclaimer is present in every report.
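
Invariants I4 and I6 together mean that no path may produce exit 0. A minimal sketch of a fail-closed mapping follows. The assignment of 1 and 2 to specific verdicts is an assumption made for illustration: the matrix only pairs BLOCKED-type outcomes with exit 3 and states that the codes are 1/2/3. The function name `exit_code_for` is hypothetical.

```python
# Assumption: only BLOCKED -> 3 is stated in the matrix; 1 and 2 are illustrative.
EXIT_CODES = {"READ_LEVEL_FAIL": 1, "UNVERIFIED": 2, "BLOCKED": 3}

# Mirrors test #36: a mapping to exit 0 is a build failure, not a runtime case.
if 0 in EXIT_CODES.values():
    raise RuntimeError("build failure: a verdict maps to exit 0")


def exit_code_for(verdict: str) -> int:
    """Map a verdict to its exit code; unknown verdicts fail closed as 3."""
    return EXIT_CODES.get(verdict, 3)
```

Because unknown input falls through to 3, a typo or a newly added verdict name cannot silently yield a success code.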

8. Deferred tests (NOT in v0.1 — gated on named future contracts)

  • D1 — actual command run + exit-code capture (Call Contract).
  • D2 — claim bound to a real execution result / re-run determinism / global-absence proof (Call / Proof-of-run Contract).
  • D3 — generic package_manifest schema validation (lineage + Codex schema review).
  • D4 — --selftest N/N + module_sha256 (post-reseal build).
  • D5 — audit_dead_links() → system_issues (write contract).
  • D6 — Directus write-path verification (DOT-control proof contract).
  • D7 — OPA/Conftest/Squawk/CI/Git-hook gating (CI/policy-gate contract).
  • D8 — positive/green verdict + exit 0 (sealed governed taxonomy authority).
  • D9 (rev4) — the live governed export step + its named-query-catalog/driver/network-policy contract (B7); side-effect-function rejection (#32) is tested here, not in the MVP.
  • D10 (rev4) — a path-scoped server-enforced KB report writer (B7); until then the MVP writes only the local output mount.
  • D11 (rev4) — any downstream consumer/authority contract that would let the output gate/block/authorize anything (B7).

9. Acceptance verdict

ACCEPTANCE_MATRIX_v0_1_REV4_READY_FOR_CODEX: 45 in-scope tests with deterministic fail-closed criteria, each capability/bypass test tied to a named enforcement layer (L1–L5), block point, and proof-of-block evidence (Codex blocker 6), covering: no-green (#1/#3/#19/#36); non-gating/non-global denial (#18/#20–#23/#45); structural sandbox bypass paths (#24–#37); local-last/authority-status (#38–#40); FIX7 discoverability incl. Fixture A′/D (#41–#45); plus preserved Article-14/hardcode/fake-green coverage (#2/#4–#17). 11 deferred tests behind named future contracts (incl. the export-step side-effect-function test #32 → D9). No positive verdict, no exit 0, no literal count invariant; every output non-gating and non-global. Routed with the rev4 packet to Codex re-review.

Cross-references

  • Gap-only Spec rev4 / FIX7 pilot rev4 / MVP plan rev4 / fix ledger rev4 (see those docs).
  • Codex re-seal: reviews/codex-reseal-gap-only-spec-rev3-2026-06-09.md.
  • Superseded rev3: designs/acceptance-test-matrix-implementation-package-dot-v0-1-rev3-2026-06-09.md.
knowledge/dev/laws/tool-kiem-thu/designs/acceptance-test-matrix-implementation-package-dot-v0-1-rev4-2026-06-09.md