KB-180A
T1 FIX7 Adversarial Review - 06 Capability Measurement (SUPERTRACK F)
Revision 1
06 — Capability Measurement / Verifier Review (SUPERTRACK F)
Source: 05-capability-measurement-verifier-spec.md.
| # | Requirement | Spec answer | Verdict |
|---|---|---|---|
| F.1 | capability codes specified | KEYSET_PAGINATION_REAL, CRASH_SAFE_RESUME_REAL, REPRESENTATIVE_PERFORMANCE_REAL | PASS |
| F.2 | measurement schema | controlled VERIFIER records normalized typed measurements/artifacts | PASS (structure; column DDL → B) |
| F.3 | verifier identity | controlled VERIFIER principal class; lifecycle cannot assert PASS | PASS |
| F.4 | verifier run lifecycle | run records measurements; lifecycle/principal cannot self-assert PASS | PASS |
| F.5 | behavioral test definition | KEYSET ≥3 monotonic pages / exact set / no dup/missing / no OFFSET; RESUME injected termination after ≥2 checkpoints + new session + exact resumed set | PASS (concrete, behavioral) |
| F.6 | operational evidence | typed measurements + artifacts, content-hashed | PASS |
| F.7 | expected output | KEYSET exact set; PERF ≤600000 ms, ≤1073741824 bytes, zero timeout/deadlock/error | PASS (exact thresholds) |
| F.8 | failure output | fake-verified / function-exists / free-text / stale / missing fails | PASS |
| F.9 | freshness window | KEYSET 7d, RESUME 7d, PERF 24h | PASS |
| F.10 | Directus self-attest blocked | only controlled VERIFIER writes; lifecycle cannot assert PASS | PASS (design) — LIVE-gated until cutover |
| F.11 | fake verified=true fails | existence/free-text/fake-verified all fail | PASS |
| F.12 | keyset/resume/perf evidence concrete | exact workload QT001_REPRESENTATIVE_1M_V1 (1,000,000 rows / 100,000 collision candidates) | PASS |
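The thresholds in F.7 and the freshness windows in F.9 can be read as a pure function of typed measurements. The sketch below is illustrative only: the `PerfMeasurement` shape, the field name `bytes_measured`, the verdict strings, and the inclusive boundary handling are assumptions, not taken from the spec; only the capability codes and numeric limits come from the table.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Freshness windows from row F.9.
FRESHNESS = {
    "KEYSET_PAGINATION_REAL": timedelta(days=7),
    "CRASH_SAFE_RESUME_REAL": timedelta(days=7),
    "REPRESENTATIVE_PERFORMANCE_REAL": timedelta(hours=24),
}

# Numeric limits from row F.7.
PERF_MAX_ELAPSED_MS = 600_000
PERF_MAX_BYTES = 1_073_741_824


@dataclass(frozen=True)
class PerfMeasurement:
    """Hypothetical record shape; the real column DDL is still pending (B)."""
    elapsed_ms: int
    bytes_measured: int
    timeouts: int
    deadlocks: int
    errors: int
    measured_at: datetime


def is_fresh(code: str, measured_at: datetime, now: datetime) -> bool:
    """True if the measurement is not from the future and is inside its window.

    Treating the window edge as still fresh is an assumption.
    """
    age = now - measured_at
    return timedelta(0) <= age <= FRESHNESS[code]


def perf_verdict(m: PerfMeasurement, now: datetime) -> str:
    """Derive a verdict from measurements; verdict labels are hypothetical."""
    if not is_fresh("REPRESENTATIVE_PERFORMANCE_REAL", m.measured_at, now):
        return "FAIL_STALE"
    if m.timeouts or m.deadlocks or m.errors:
        return "FAIL_ERRORS"
    if m.elapsed_ms > PERF_MAX_ELAPSED_MS or m.bytes_measured > PERF_MAX_BYTES:
        return "FAIL_THRESHOLD"
    return "PASS"
```

Because the verdict is computed from the numbers, a writer has no boolean to set; a stale or over-limit measurement fails without any special case.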
Adversarial probes
- Can capability be self-attested today? In the live system, yes: Directus has INSERT on qt001_capability_operational_evidence (proven in the prior turn; row count 0). The design closes this via owner-only storage at cutover. Until then capability is correctly 0/3 → scale NOT_SAFE.
- Is "verified" a row or a measurement? A measurement (pages observed, resumed set, elapsed ms, bytes), not a boolean a writer can set. Strong; directly fixes the FIX4 "to_regproc existence + literal-true neg test" finding.
- Is the workload representative of 100M scale? It is a fixed 1M representative corpus with 100k collision candidates and a 10-min/1-GiB bound: a deliberate, bounded proxy, not a full-set scan. Reasonable; the design does not claim 100M is itself tested, and SCALE_SAFE is a named gate (consistent with all prior memory that real scale evidence is a 2.6B prerequisite).
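To illustrate the "measurement, not a boolean" point, the KEYSET behavioral definition from F.5 can be checked directly against the pages a verifier run observed. This is a minimal sketch under stated assumptions: the function name, the reason codes, the `used_offset` flag (however the verifier detects OFFSET), and the reading of "monotonic" as strictly increasing keys across all pages are hypothetical and not confirmed by the spec.

```python
from typing import Iterable, List, Sequence


def check_keyset_run(
    pages: Sequence[Sequence[int]],
    expected_keys: Iterable[int],
    used_offset: bool,
) -> List[str]:
    """Return failure reasons for an observed keyset run; empty list means pass.

    pages: keys of each fetched page, in fetch order.
    expected_keys: the exact key set the run must return.
    used_offset: assumed to be reported by the verifier (for example from
        query inspection); how it is detected is outside this sketch.
    """
    reasons: List[str] = []
    if used_offset:
        reasons.append("OFFSET_USED")
    if len(pages) < 3:
        reasons.append("FEWER_THAN_3_PAGES")

    flat = [key for page in pages for key in page]

    # Strictly increasing across page boundaries (assumed meaning of "monotonic").
    if any(b <= a for a, b in zip(flat, flat[1:])):
        reasons.append("NOT_MONOTONIC")

    seen = set(flat)
    if len(seen) != len(flat):
        reasons.append("DUPLICATE_KEYS")
    if seen != set(expected_keys):
        reasons.append("SET_MISMATCH")  # a key is missing or unexpected
    return reasons
```

For example, `check_keyset_run([[1, 2], [3, 4], [5]], range(1, 6), False)` returns an empty list, while a run with only two pages fails regardless of what any stored flag claims.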
Verdict: CAPABILITY_SPEC_COMPLETE
The capability subsystem is the most concretely specified: exact codes, workload, behavioral definitions, numeric thresholds, freshness windows, a controlled verifier, and explicit fake-proof rejection. The design contains no fake-capability path. Only the measurement/verifier-run column DDL is pending publication (B).
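The fake-proof rejection and the 0/3 → NOT_SAFE outcome can be sketched as a small aggregation over evidence records. Everything about the record shape here is an assumption (dict keys `code`, `principal_class`, `measurements`, and the caller-supplied `evaluate` function are hypothetical), and the label `CAPABILITY_COMPLETE` is invented for illustration, since the review does not state that 3/3 alone satisfies the SCALE_SAFE gate.

```python
from typing import Callable, Dict, Iterable, Tuple

REQUIRED_CODES = (
    "KEYSET_PAGINATION_REAL",
    "CRASH_SAFE_RESUME_REAL",
    "REPRESENTATIVE_PERFORMANCE_REAL",
)


def capability_gate(
    records: Iterable[Dict],
    evaluate: Callable[[Dict], bool],
) -> Tuple[int, str]:
    """Count capabilities backed by real verifier measurements.

    A record counts only if it was written by the VERIFIER principal class,
    carries a non-empty measurements mapping, and those measurements pass
    `evaluate`. A bare verified=True flag or free text never counts.
    """
    passed = set()
    for record in records:
        if record.get("principal_class") != "VERIFIER":
            continue  # lifecycle or other principals cannot assert PASS
        measurements = record.get("measurements")
        if not isinstance(measurements, dict) or not measurements:
            continue  # fake-verified, free-text, or missing evidence
        if record.get("code") in REQUIRED_CODES and evaluate(measurements):
            passed.add(record["code"])
    count = len(passed)
    label = "CAPABILITY_COMPLETE" if count == len(REQUIRED_CODES) else "NOT_SAFE"
    return count, label
```

With no evidence rows, as in the live state described above, the sketch yields `(0, "NOT_SAFE")`.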