KB-180A

T1 FIX7 Adversarial Review - 06 Capability Measurement (SUPERTRACK F)

Revision 1
Tags: QT001, FIX7, T1, capability, verifier, supertrack-f

06 — Capability Measurement / Verifier Review (SUPERTRACK F)

Source: 05-capability-measurement-verifier-spec.md.

| # | Requirement | Spec answer | Verdict |
|---|---|---|---|
| F.1 | capability codes specified | KEYSET_PAGINATION_REAL, CRASH_SAFE_RESUME_REAL, REPRESENTATIVE_PERFORMANCE_REAL | PASS |
| F.2 | measurement schema | controlled VERIFIER records normalized typed measurements/artifacts | PASS (structure; column DDL → B) |
| F.3 | verifier identity | controlled VERIFIER principal class; lifecycle cannot assert PASS | PASS |
| F.4 | verifier run lifecycle | run records measurements; lifecycle/principal cannot self-assert PASS | PASS |
| F.5 | behavioral test definition | KEYSET: ≥3 monotonic pages / exact set / no dup/missing / no OFFSET. RESUME: injected termination after ≥2 checkpoints + new session + exact resumed set | PASS (concrete, behavioral) |
| F.6 | operational evidence | typed measurements + artifacts, content-hashed | PASS |
| F.7 | expected output | KEYSET: exact set. PERF: ≤600000 ms, ≤1073741824 bytes, zero timeout/deadlock/error | PASS (exact thresholds) |
| F.8 | failure output | fake-verified / function-exists / free-text / stale / missing fails | PASS |
| F.9 | freshness window | KEYSET 7d, RESUME 7d, PERF 24h | PASS |
| F.10 | Directus self-attest blocked | only controlled VERIFIER writes; lifecycle cannot assert PASS | PASS (design); LIVE-gated until cutover |
| F.11 | fake verified=true fails | existence/free-text/fake-verified all fail | PASS |
| F.12 | keyset/resume/perf evidence concrete | exact workload QT001_REPRESENTATIVE_1M_V1 (1,000,000 rows / 100,000 collision candidates) | PASS |
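
The KEYSET half of F.5 can be illustrated with a minimal sketch. It assumes a hypothetical table `items` with an integer primary key `id` in an in-memory SQLite database; the names `keyset_pages` and `check_keyset_run`, the ten-row data set, and the page size are illustrative and do not come from the spec. The point is that the result is derived from the pages actually observed (count, ordering, exact set), not from a flag supplied by the writer.

```python
import sqlite3

def keyset_pages(conn, page_size):
    """Yield pages of ids using keyset pagination (WHERE id > last, no OFFSET)."""
    last_id = None
    while True:
        if last_id is None:
            rows = conn.execute(
                "SELECT id FROM items ORDER BY id LIMIT ?", (page_size,)
            ).fetchall()
        else:
            rows = conn.execute(
                "SELECT id FROM items WHERE id > ? ORDER BY id LIMIT ?",
                (last_id, page_size),
            ).fetchall()
        if not rows:
            return
        page = [r[0] for r in rows]
        last_id = page[-1]  # the next page starts strictly after this key
        yield page

def check_keyset_run(pages, expected_ids, min_pages=3):
    """Return a list of failure reasons; an empty list means the run passes."""
    failures = []
    if len(pages) < min_pages:
        failures.append("fewer than %d pages observed" % min_pages)
    flat = [i for page in pages for i in page]
    if any(b <= a for a, b in zip(flat, flat[1:])):
        failures.append("keys not strictly increasing")
    if len(flat) != len(set(flat)):
        failures.append("duplicate rows")
    if set(flat) != set(expected_ids):
        failures.append("returned set differs from expected set")
    return failures

# Hypothetical toy data: ids 1..10, paged four at a time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items (id) VALUES (?)", [(i,) for i in range(1, 11)])
pages = list(keyset_pages(conn, page_size=4))
result = check_keyset_run(pages, expected_ids=range(1, 11))
```

This sketch does not enforce the "no OFFSET" condition; that would need inspection of the SQL the verifier actually executed, which is outside what the page contents alone can show.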

Adversarial probes

  • Can capability be self-attested today? In the live system, yes: Directus has INSERT on qt001_capability_operational_evidence (proven in a prior turn; row count 0). The design closes this via owner-only storage at cutover. Until then capability is correctly 0/3, so scale is NOT_SAFE.
  • Is "verified" a row or a measurement? A measurement (pages observed, resumed set, elapsed ms, bytes), not a boolean a writer can set. This is strong: it directly fixes the FIX4 "to_regproc existence + literal-true neg test" finding.
  • Is the workload representative of 100M scale? It is a fixed 1M representative corpus with 100k collision candidates and a 10-min/1-GiB bound: a deliberate, bounded proxy, not a full-set scan. This is reasonable; the design does not claim that 100M is itself tested, and SCALE_SAFE is a named gate (consistent with all prior memory that real scale evidence is a 2.6B prerequisite).
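
To make the "measurement, not boolean" point concrete, here is a hedged sketch of how a PERF verdict could be computed from typed values. The thresholds (600000 ms, 1073741824 bytes, zero timeout/deadlock/error) and the freshness windows (7d, 7d, 24h) are taken from F.7 and F.9. Everything else is an assumption: the `PerfMeasurement` field names, the mapping of the short names KEYSET/RESUME/PERF onto the three capability codes, and the sample values, which are made up for illustration.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone

MAX_ELAPSED_MS = 600_000        # F.7: 600000 ms = 10 minutes
MAX_BYTES = 1_073_741_824       # F.7: 1073741824 bytes = 2**30 = 1 GiB
FRESHNESS = {                   # F.9 windows; code-name mapping is assumed
    "KEYSET_PAGINATION_REAL": timedelta(days=7),
    "CRASH_SAFE_RESUME_REAL": timedelta(days=7),
    "REPRESENTATIVE_PERFORMANCE_REAL": timedelta(hours=24),
}

@dataclass(frozen=True)
class PerfMeasurement:          # hypothetical shape, not the spec's DDL
    elapsed_ms: int
    bytes_used: int
    timeouts: int
    deadlocks: int
    errors: int
    measured_at: datetime

def evaluate_perf(m, now):
    """Derive (verdict, reasons) from the measured values and their age."""
    reasons = []
    if m.elapsed_ms > MAX_ELAPSED_MS:
        reasons.append("elapsed time over threshold")
    if m.bytes_used > MAX_BYTES:
        reasons.append("bytes over threshold")
    if m.timeouts or m.deadlocks or m.errors:
        reasons.append("timeout/deadlock/error observed")
    if now - m.measured_at > FRESHNESS["REPRESENTATIVE_PERFORMANCE_REAL"]:
        reasons.append("measurement is stale")
    return ("PASS" if not reasons else "FAIL", reasons)

# Made-up sample values, for illustration only.
now = datetime(2026, 6, 7, 12, 0, tzinfo=timezone.utc)
fresh = PerfMeasurement(540_000, 900_000_000, 0, 0, 0, now - timedelta(hours=2))
stale = replace(fresh, measured_at=now - timedelta(hours=25))
fresh_verdict = evaluate_perf(fresh, now)
stale_verdict = evaluate_perf(stale, now)
```

Because the verdict is recomputed from the numbers and the timestamp, there is no stored field a writer could set to "verified" without also supplying measurements that satisfy every bound.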

Verdict: CAPABILITY_SPEC_COMPLETE

The capability subsystem is the most concretely specified: exact codes, workload, behavioral definitions, numeric thresholds, freshness windows, a controlled verifier, and explicit fake-proof rejection. The design contains no fake-capability path. Only the measurement/verifier-run column DDL is pending publication (B).
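
As an illustration of what fake-proof rejection might look like, the sketch below rejects evidence that is missing, not written by the VERIFIER class, or lacking typed measurements, and it checks that the artifact matches its content hash. The review only says evidence is "content-hashed"; the use of SHA-256, the dictionary layout, and every field name here (`writer_class`, `measurements`, `artifact`, `artifact_sha256`) are assumptions made for the example. Freshness checks are left out to keep it short.

```python
import hashlib

def content_hash(artifact_bytes):
    """Hex digest of the artifact; SHA-256 is an assumed choice of hash."""
    return hashlib.sha256(artifact_bytes).hexdigest()

def reject_reason(evidence):
    """Return why the evidence fails, or None if it passes this sketch's checks."""
    if evidence is None:
        return "missing"
    if evidence.get("writer_class") != "VERIFIER":
        return "not written by controlled VERIFIER"
    measurements = evidence.get("measurements")
    if not isinstance(measurements, dict) or not measurements:
        return "no typed measurements (free-text or bare verified flag)"
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not all(
        isinstance(v, int) and not isinstance(v, bool)
        for v in measurements.values()
    ):
        return "measurement values are not typed numbers"
    artifact = evidence.get("artifact")
    if not isinstance(artifact, bytes):
        return "artifact missing"
    if content_hash(artifact) != evidence.get("artifact_sha256"):
        return "artifact hash mismatch"
    return None

# Hypothetical records, for illustration only.
artifact = b"page-log: 3 pages observed"
good = {
    "writer_class": "VERIFIER",
    "measurements": {"pages_observed": 3, "rows_returned": 10},
    "artifact": artifact,
    "artifact_sha256": content_hash(artifact),
}
fake = {"writer_class": "VERIFIER", "verified": True}
tampered = dict(good, artifact=b"page-log: edited")
```

Here `fake` fails because a bare `verified` flag carries no measurements, and `tampered` fails because the stored hash no longer matches the artifact bytes.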

knowledge/dev/reports/architecture/t1-fix7-implementation-spec-full-adversarial-review-2026-06-07/06-capability-measurement-review.md