KB-180A

T1 FIX7 Adversarial Review - 06 Capability Measurement (SUPERTRACK F)

Revision 1

QT001FIX7T1capabilityverifiersupertrack-f

06 — Capability Measurement / Verifier Review (SUPERTRACK F)

Source: 05-capability-measurement-verifier-spec.md.

#	Requirement	Spec answer	Verdict
F.1	capability codes specified	KEYSET_PAGINATION_REAL, CRASH_SAFE_RESUME_REAL, REPRESENTATIVE_PERFORMANCE_REAL	PASS
F.2	measurement schema	controlled VERIFIER records normalized typed measurements/artifacts	PASS (structure; column DDL → B)
F.3	verifier identity	controlled VERIFIER principal class; lifecycle cannot assert PASS	PASS
F.4	verifier run lifecycle	run records measurements; lifecycle/principal cannot self-assert PASS	PASS
F.5	behavioral test definition	KEYSET ≥3 monotonic pages / exact set / no dup/missing / no OFFSET; RESUME injected termination after ≥2 checkpoints + new session + exact resumed set	PASS (concrete, behavioral)
F.6	operational evidence	typed measurements + artifacts, content-hashed	PASS
F.7	expected output	KEYSET exact set; PERF ≤600000 ms, ≤1073741824 bytes, zero timeout/deadlock/error	PASS (exact thresholds)
F.8	failure output	fake-verified / function-exists / free-text / stale / missing fails	PASS
F.9	freshness window	KEYSET 7d, RESUME 7d, PERF 24h	PASS
F.10	Directus self-attest blocked	only controlled VERIFIER writes; lifecycle cannot assert PASS	PASS (design) — LIVE-gated until cutover
F.11	fake verified=true fails	existence/free-text/fake-verified all fail	PASS
F.12	keyset/resume/perf evidence concrete	exact workload QT001_REPRESENTATIVE_1M_V1 (1,000,000 rows / 100,000 collision candidates)	PASS
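The KEYSET behavioral definition in F.5 can be illustrated with a minimal sketch. It assumes the verifier has captured each page as a list of sort keys, plus (optionally) the SQL text of each page query. The function name `check_keyset_pages` and its parameters are hypothetical and do not come from the spec.

```python
import re


def check_keyset_pages(pages, expected_keys, queries=()):
    """Return a list of failure reasons; an empty list means the run passes.

    pages         -- list of pages, each a list of sort keys in returned order
    expected_keys -- the exact set of keys the workload should yield
    queries       -- optional SQL text of each page query (hypothetical input)
    """
    failures = []
    if len(pages) < 3:
        failures.append("fewer than 3 pages observed")

    # Flatten pages in the order they were observed.
    seen = [key for page in pages for key in page]

    # Keys must strictly increase within and across pages.
    if any(b <= a for a, b in zip(seen, seen[1:])):
        failures.append("keys not strictly increasing across pages")
    if len(seen) != len(set(seen)):
        failures.append("duplicate keys returned")
    if set(seen) != set(expected_keys):
        failures.append("returned set differs from expected set")

    # Text-level check only; a real verifier might inspect the query plan.
    if any(re.search(r"\bOFFSET\b", q, re.IGNORECASE) for q in queries):
        failures.append("OFFSET used in a page query")
    return failures
```

A run that returns an empty list satisfies every condition at once; any single failure reason is enough to withhold PASS.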

Adversarial probes

Can capability be self-attested today? In the live system, yes: Directus has INSERT on qt001_capability_operational_evidence (proven in a prior turn; the table currently holds 0 rows). The design closes this via owner-only storage at cutover. Until then, capability correctly reads 0/3, so scale is NOT_SAFE.
Is "verified" a row or a measurement? A measurement (pages observed, resumed set, elapsed ms, bytes), not a boolean that a writer can set. This is strong and directly fixes the FIX4 finding ("to_regproc existence + literal-true neg test").
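The difference between a settable flag and a derived verdict can be sketched as follows. The record layout, the field names (`principal_class`, `measurements`, `pages_observed`, `duplicate_keys`, `missing_keys`), and the function `derive_keyset_verdict` are all assumptions made for illustration; the actual column DDL is not published in this review.

```python
# Hypothetical typed measurements a KEYSET run would have to carry.
REQUIRED_INT_FIELDS = ("pages_observed", "duplicate_keys", "missing_keys")


def derive_keyset_verdict(record):
    """Derive PASS/FAIL from measurements; never trust a 'verified' flag."""
    # Only the controlled VERIFIER principal class may supply evidence.
    if record.get("principal_class") != "VERIFIER":
        return "FAIL"

    measurements = record.get("measurements")
    if not isinstance(measurements, dict):
        return "FAIL"  # missing or free-text evidence

    for field in REQUIRED_INT_FIELDS:
        value = measurements.get(field)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            return "FAIL"

    passed = (
        measurements["pages_observed"] >= 3
        and measurements["duplicate_keys"] == 0
        and measurements["missing_keys"] == 0
    )
    return "PASS" if passed else "FAIL"
```

In this sketch a record containing only `"verified": True` fails because the required measurements are absent, which is the behavior F.11 asks for.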
Is the workload representative of 100M scale? It is a fixed 1M-row representative corpus with 100k collision candidates and a 10-minute / 1-GiB bound: a deliberate, bounded proxy, not a full-set scan. This is reasonable: the design does not claim that 100M itself is tested, and SCALE_SAFE is a named gate (consistent with all prior memory that real scale evidence is a 2.6B prerequisite).
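The numeric bounds and freshness windows from F.7 and F.9 can be combined into a small gate sketch. The threshold values and capability codes come from the table; the function names, the `bytes_used` parameter (the review does not say which byte count is measured), and the rule that SCALE_SAFE requires all three capabilities to be fresh are assumptions.

```python
from datetime import datetime, timedelta, timezone

MAX_ELAPSED_MS = 600_000       # 10 minutes
MAX_BYTES = 1_073_741_824      # 1 GiB (2**30)

FRESHNESS = {
    "KEYSET_PAGINATION_REAL": timedelta(days=7),
    "CRASH_SAFE_RESUME_REAL": timedelta(days=7),
    "REPRESENTATIVE_PERFORMANCE_REAL": timedelta(hours=24),
}


def perf_within_bounds(elapsed_ms, bytes_used, timeouts, deadlocks, errors):
    """True only if both bounds hold and no timeout, deadlock, or error occurred."""
    return (
        elapsed_ms <= MAX_ELAPSED_MS
        and bytes_used <= MAX_BYTES
        and timeouts == 0
        and deadlocks == 0
        and errors == 0
    )


def is_fresh(code, measured_at, now):
    """A measurement is fresh if it is not in the future and inside its window."""
    return timedelta(0) <= now - measured_at <= FRESHNESS[code]


def scale_gate(passes, now):
    """passes maps capability code -> timestamp of its latest passing measurement.

    Assumed rule: SCALE_SAFE only when every capability has a fresh pass.
    """
    fresh = sum(
        1 for code in FRESHNESS
        if code in passes and is_fresh(code, passes[code], now)
    )
    return "SCALE_SAFE" if fresh == len(FRESHNESS) else "NOT_SAFE"
```

With no evidence rows at all, `scale_gate({}, datetime.now(timezone.utc))` yields `"NOT_SAFE"`, matching the 0/3 state described in the first probe.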

Verdict: `CAPABILITY_SPEC_COMPLETE`

The capability subsystem is the most concretely specified: exact codes, workload, behavioral definitions, numeric thresholds, freshness windows, a controlled verifier, and explicit rejection of fake proof. The design contains no fake-capability path. Only the measurement/verifier-run column DDL is still pending publication (B).
