T2 UI-Current Audit — 04 Regression-Test Risk
04 · Regression-Test Risk
The next macro must not be able to "go green" while binding a stale or false-green surface. These are the tests that must exist and the conditions that must fail loudly.
What must be RUN before UI deploy (all read-only / query_pg RO + grep)
| # | Test | Expected | Live status today |
|---|---|---|---|
| T1 | _current row count | = 87 | ✓ (resolves to reliability, 87) |
| T2 | _current reliability fields non-null | reliability_label, source_scope, confidence_score, lane_code, count_semantics present & populated | ✓ columns present |
| T3 | Invariant has 0 unexplained FAIL OR every FAIL is rendered non-green | currently 2 FAIL_COUNT_SUBSTRATE_MISMATCH | ✗ 2 live FAILs (PROC:new_candidates 6≠50, PROC:residual_reconcile 8≠23) |
| T4 | Computed proof = invariant (no literal verdict) | verdict_is_computed=true, fail_demonstrated_in_data=true | ✓ (…_proof_matrix_computed is per-axis computed; rule_can_structurally_fail present) |
| T5 | UI routes bind _current only | grep server/api/**,pages/**,composables/** → 0 refs to _v1/_v2/_reliability/…_contract (unversioned-but-non-current) | UNVERIFIED_SOURCE_ACCESS (and existing packages bind versioned names → would FAIL today) |
| T6 | Deploy guard | a single guard view returns PASS | ✗ guard view does not exist |
What SHOULD FAIL if stale v1 is used — the critical negative test
Point the invariant (or the bound contract) at v_rp_universal_node_ui_contract (v1): the prior T2 design predicts 12 FAIL_* (10 AX-PXT NEEDS_GROUPING-but-SHOW_SUBSTRATE + 2 AX-PROCESS dead-end). The regression harness must assert: binding → v1 ⇒ ≥12 FAIL; binding → _current/reliability ⇒ exactly the 2 known count-substrate FAILs (or 0 after T1 substrate-fix). If a build binds v1 and the suite still passes, the suite is broken.
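A minimal sketch of that assertion, assuming the harness can already collect the invariant_status values produced under a given binding. The function names, the binding labels "v1" and "current", and the substrate_fixed flag are hypothetical; only the thresholds (≥12 for v1, exactly 2 or 0 for _current) come from the text above.

```python
def count_fails(statuses):
    """Count invariant_status values that start with FAIL_."""
    return sum(1 for s in statuses if s.startswith("FAIL_"))


def check_binding(binding, statuses, substrate_fixed=False):
    """Return True when the FAIL count matches what the audit predicts.

    binding: "v1" or "current" (hypothetical labels for the bound surface).
    statuses: invariant_status values observed with that binding.
    substrate_fixed: True once the count-substrate fix has landed.
    """
    fails = count_fails(statuses)
    if binding == "v1":
        # Stale surface: the prior design predicts 12 FAIL_* rows.
        return fails >= 12
    if binding == "current":
        # Only the known count-substrate FAILs are tolerated (none after the fix).
        expected = 0 if substrate_fixed else 2
        only_known = all(
            s == "FAIL_COUNT_SUBSTRATE_MISMATCH"
            for s in statuses
            if s.startswith("FAIL_")
        )
        return fails == expected and only_known
    raise ValueError(f"unknown binding: {binding!r}")
```

A suite that returns True for a v1 binding with fewer than 12 FAILs is exactly the broken-suite case described above, so the harness should treat a False result here as a hard stop rather than a warning.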
What SHOULD FAIL if invariant has a real FAIL
The deploy-guard view (to be built) must FAIL on any invariant_status LIKE 'FAIL_%' that is not on an explicit allow-list. Today 2 FAILs are live and there is no gate — so a deploy could ship with 2 real count-substrate mismatches and nothing would stop it. Either T1 closes them (substrate-fix-v2, the absent checkpoint) or they go on an explicit "known-FAIL, rendered-red" allow-list with a tracking ticket.
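The allow-list rule can be sketched as follows. This is an illustration of the gating logic only, not the guard view itself: deploy_guard, KNOWN_FAILS, and the (node_id, invariant_status) row shape are assumptions, while the two node ids and the status name are the live FAILs listed in the table above.

```python
# Hypothetical allow-list: known FAILs that are rendered red and tracked by a ticket.
KNOWN_FAILS = frozenset({
    ("PROC:new_candidates", "FAIL_COUNT_SUBSTRATE_MISMATCH"),
    ("PROC:residual_reconcile", "FAIL_COUNT_SUBSTRATE_MISMATCH"),
})


def deploy_guard(rows, allow_list=frozenset()):
    """Evaluate invariant rows against an explicit allow-list.

    rows: iterable of (node_id, invariant_status) pairs.
    Returns ("PASS" or "FAIL", list of blocking rows).
    """
    blockers = [
        (node_id, status)
        for node_id, status in rows
        if status.startswith("FAIL_") and (node_id, status) not in allow_list
    ]
    return ("FAIL" if blockers else "PASS"), blockers
```

The allow-list defaults to empty, so the two live FAILs block a deploy unless someone passes KNOWN_FAILS explicitly. Keying the list on both node id and status means a new failure mode on an already-listed node still blocks.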
What SHOULD FAIL if reliability fields are missing
Guard must assert 0 rows where reliability_label IS NULL OR source_scope IS NULL OR drill_action IS NULL OR next_route IS NULL. A count must never render without source_scope + reliability_label (no bare numbers).
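A rough sketch of both checks, assuming each _current row is available as a dict keyed by column name with SQL NULL mapped to None. The helper names are hypothetical; the four column names are the ones listed above.

```python
REQUIRED_FIELDS = ("reliability_label", "source_scope", "drill_action", "next_route")


def rows_missing_fields(rows):
    """Return the rows where any required field is missing or None."""
    return [
        row for row in rows
        if any(row.get(field) is None for field in REQUIRED_FIELDS)
    ]


def can_render_count(row):
    """A count may render only alongside source_scope and reliability_label."""
    return (
        row.get("source_scope") is not None
        and row.get("reliability_label") is not None
    )
```

The guard condition is then that rows_missing_fields returns an empty list, and the renderer refuses to show a number for any row where can_render_count is False.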
False-green trap specific to this surface — must be a test
The 2 FAIL nodes carry reliability_label='CANDIDATE' in _current (doc 01 R2). A naive "badge = reliability_label" UI shows them green. Test: for every node where invariant.invariant_status LIKE 'FAIL_%', the rendered status chip MUST be non-green. This test can only pass if the renderer binds invariant_status, not just reliability_label.
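To illustrate the trap, here is a minimal sketch contrasting a label-only chip with one that also binds invariant_status. The colour names and the CANDIDATE-to-green mapping are assumptions standing in for whatever the real renderer does; only the rule "FAIL_% must be non-green" comes from the text above.

```python
def naive_chip_color(reliability_label):
    """Label-only rendering (hypothetical mapping): this is the false-green trap."""
    return "green" if reliability_label == "CANDIDATE" else "grey"


def chip_color(reliability_label, invariant_status):
    """Rendering that lets a failing invariant override the reliability label."""
    if invariant_status is not None and invariant_status.startswith("FAIL_"):
        return "red"
    return naive_chip_color(reliability_label)


def assert_no_false_green(nodes):
    """nodes: iterable of (reliability_label, invariant_status) pairs."""
    for label, status in nodes:
        if status is not None and status.startswith("FAIL_"):
            assert chip_color(label, status) != "green", (label, status)
```

Running assert_no_false_green against a renderer that ignores invariant_status would raise on both live FAIL nodes, which is the behaviour the regression test needs.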
Regression verdict
MANAGEABLE IF GUARDED. The tests are well-defined and mostly checkable read-only today, but two enablers are missing: (1) the deploy-guard view that wires them, and (2) source access to run the grep gate (T5). Until both exist, "UI productionization passed" cannot be trusted.