KB-5875

T2 UI-Current Audit — 04 Regression-Test Risk

Revision 1

terminal2 audit regression tests invariant no-false-green 2026-06-05

04 · Regression-Test Risk

The next macro must not be able to "go green" while binding a stale or false-green surface. These are the tests that must exist and the conditions that must fail loudly.

What must be RUN before UI deploy (all read-only / `query_pg` RO + grep)

| # | Test | Expected | Live status today |
|---|------|----------|-------------------|
| T1 | `_current` row count | `= 87` | ✓ (resolves to reliability, 87) |
| T2 | `_current` reliability fields non-null | `reliability_label`, `source_scope`, `confidence_score`, `lane_code`, `count_semantics` present and populated | ✓ columns present |
| T3 | Invariant has 0 unexplained FAILs, OR every FAIL is rendered non-green | 0 unexplained FAILs (currently 2 `FAIL_COUNT_SUBSTRATE_MISMATCH`) | ✗ 2 live FAILs (PROC:new_candidates 6≠50, PROC:residual_reconcile 8≠23) |
| T4 | Computed proof = invariant (no literal verdict) | `verdict_is_computed=true`, `fail_demonstrated_in_data=true` | ✓ (`…_proof_matrix_computed` is per-axis computed; `rule_can_structurally_fail` present) |
| T5 | UI routes bind `_current` only | grep `server/api/`, `pages/`, `composables/**` → 0 refs to `_v1` / `_v2` / `_reliability` / `…_contract` (unversioned-but-non-current) | UNVERIFIED_SOURCE_ACCESS (and existing packages bind versioned names → would FAIL today) |
| T6 | Deploy guard | a single guard view returns PASS | ✗ guard view does not exist |
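
The T5 grep gate could be sketched as follows once source access exists. This is a minimal illustration, not the project's tooling: the function names `find_stale_refs` and `scan_tree` are hypothetical, and the suffix list is taken literally from the T5 row, on the assumption that any identifier ending in one of those suffixes is a non-current binding.

```python
import re
from pathlib import Path

# Suffixes the T5 row names as non-current bindings (assumption: matching on
# the identifier's suffix is enough to catch them). "_current" is allowed.
STALE_SUFFIXES = ("_v1", "_v2", "_reliability", "_contract")
STALE_REF = re.compile(
    r"\b\w+(?:" + "|".join(re.escape(s) for s in STALE_SUFFIXES) + r")\b"
)


def find_stale_refs(text: str) -> list[str]:
    """Return every identifier in `text` that ends in a non-current suffix."""
    return STALE_REF.findall(text)


def scan_tree(root: str, subdirs=("server/api", "pages", "composables")) -> dict:
    """Map file path -> stale identifiers for files under the T5 directories."""
    hits = {}
    for sub in subdirs:
        base = Path(root) / sub
        if not base.is_dir():
            continue  # a missing directory is skipped, not treated as a pass
        for path in base.rglob("*"):
            if path.is_file():
                found = find_stale_refs(path.read_text(errors="ignore"))
                if found:
                    hits[str(path)] = found
    return hits
```

The gate passes only when `scan_tree(repo_root)` returns an empty mapping. A missing directory yields no hits, so a real gate should also confirm the three directories exist. Otherwise an empty result is itself a false green.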

What SHOULD FAIL if stale v1 is used — the critical negative test

Point the invariant (or the bound contract) at `v_rp_universal_node_ui_contract` (v1). The prior T2 design predicts 12 `FAIL_*` results there (10 AX-PXT NEEDS_GROUPING-but-SHOW_SUBSTRATE + 2 AX-PROCESS dead-end). The regression harness must assert both directions: binding to v1 yields ≥12 FAILs, and binding to `_current`/reliability yields exactly the 2 known count-substrate FAILs (or 0 after the T1 substrate-fix). If a build binds v1 and the suite still passes, the suite is broken.
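
The two-directional assertion can be written as a small pure function. This is a sketch under stated assumptions: the binding labels `"v1"` and `"current"` and the verdict strings are hypothetical, and the two node IDs are the live FAILs listed in the T3 row.

```python
# Node IDs of the two live count-substrate FAILs (from the T3 row).
KNOWN_COUNT_SUBSTRATE_FAILS = {"PROC:new_candidates", "PROC:residual_reconcile"}
V1_MIN_FAILS = 12  # the prior T2 design predicts 12 FAIL_* on v1


def binding_verdict(binding: str, fail_nodes: set) -> str:
    """Classify the invariant's FAIL set for a given binding.

    `binding` is a hypothetical label: "v1" for the stale contract,
    "current" for `_current`/reliability.
    """
    if binding == "v1":
        # Too few FAILs on the stale surface means the harness cannot see staleness.
        return "OK" if len(fail_nodes) >= V1_MIN_FAILS else "SUITE_BROKEN"
    if binding == "current":
        # Exactly the 2 known FAILs, or 0 once the substrate fix has landed.
        if not fail_nodes or fail_nodes == KNOWN_COUNT_SUBSTRATE_FAILS:
            return "OK"
        return "UNEXPECTED_FAILS"
    raise ValueError(f"unknown binding: {binding!r}")
```

A v1 run that reports only the 2 count-substrate FAILs returns `SUITE_BROKEN`, which is the "suite still passes on v1" case described above.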

What SHOULD FAIL if invariant has a real FAIL

The deploy-guard view (to be built) must FAIL on any row with `invariant_status LIKE 'FAIL_%'` that is not on an explicit allow-list. Today 2 FAILs are live and there is no gate, so a deploy could ship with 2 real count-substrate mismatches and nothing would stop it. Either T1 closes them (substrate-fix-v2, the absent checkpoint), or they go on an explicit "known-FAIL, rendered-red" allow-list with a tracking ticket.
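
The allow-list rule can be prototyped against an in-memory SQLite database before the real guard view exists. Everything here is illustrative: the table names `invariant` and `allow_list`, the `node_id` column, and the `PASS`/`FAIL` return values are assumptions, and SQLite only stands in for the read-only Postgres source. The pattern escapes the underscore because a bare `_` in `LIKE` matches any single character.

```python
import sqlite3


def deploy_guard(rows, allow_list):
    """Return ("PASS" | "FAIL", offending node_ids).

    rows: iterable of (node_id, invariant_status) tuples.
    allow_list: node_ids explicitly accepted as known-FAIL, rendered-red.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE invariant (node_id TEXT, invariant_status TEXT)")
    con.executemany("INSERT INTO invariant VALUES (?, ?)", list(rows))
    con.execute("CREATE TABLE allow_list (node_id TEXT)")
    con.executemany("INSERT INTO allow_list VALUES (?)", [(n,) for n in allow_list])
    offenders = [
        r[0]
        for r in con.execute(
            """
            SELECT node_id
            FROM invariant
            WHERE invariant_status LIKE 'FAIL!_%' ESCAPE '!'  -- literal underscore
              AND node_id NOT IN (SELECT node_id FROM allow_list)
            ORDER BY node_id
            """
        )
    ]
    con.close()
    return ("PASS" if not offenders else "FAIL", offenders)
```

With today's data and an empty allow-list, this guard would return `FAIL` for both `PROC:new_candidates` and `PROC:residual_reconcile`. It passes only once both are either fixed or explicitly listed.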

What SHOULD FAIL if reliability fields are missing

The guard must assert 0 rows where `reliability_label IS NULL OR source_scope IS NULL OR drill_action IS NULL OR next_route IS NULL`. A count must never render without `source_scope` + `reliability_label` (no bare numbers).
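
A minimal sketch of the null-field assertion, again using in-memory SQLite as a stand-in for the read-only source. The table name `ui_current`, the `node_id` key, and both function names are hypothetical. The four required columns are the ones named above.

```python
import sqlite3

REQUIRED = ("reliability_label", "source_scope", "drill_action", "next_route")


def null_field_violations(rows):
    """Return node_ids of rows where any required field is NULL.

    rows: list of dicts with "node_id" plus the REQUIRED keys (None = NULL).
    """
    con = sqlite3.connect(":memory:")
    cols = ", ".join(f"{c} TEXT" for c in REQUIRED)
    con.execute(f"CREATE TABLE ui_current (node_id TEXT, {cols})")
    placeholders = ", ".join("?" for _ in REQUIRED)
    con.executemany(
        f"INSERT INTO ui_current VALUES (?, {placeholders})",
        [(r["node_id"], *(r.get(c) for c in REQUIRED)) for r in rows],
    )
    predicate = " OR ".join(f"{c} IS NULL" for c in REQUIRED)
    bad = [
        r[0]
        for r in con.execute(
            f"SELECT node_id FROM ui_current WHERE {predicate} ORDER BY node_id"
        )
    ]
    con.close()
    return bad


def can_render_count(row) -> bool:
    """Renderer-side rule: no bare numbers without scope and reliability."""
    return row.get("source_scope") is not None and row.get("reliability_label") is not None
```

The guard passes only when `null_field_violations` returns an empty list. `can_render_count` applies the same rule on the client, so a row that slips past the guard still cannot render a bare number.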

False-green trap specific to this surface — must be a test

The 2 FAIL nodes carry `reliability_label='CANDIDATE'` in `_current` (doc 01 R2). A naive "badge = reliability_label" UI shows them green. Test: for every node where `invariant.invariant_status LIKE 'FAIL_%'`, the rendered status chip MUST be non-green. This test can only pass if the renderer binds `invariant_status`, not just `reliability_label`.
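
The trap and its test can be illustrated with two toy renderers. The colour names, the rule that `CANDIDATE` renders green in the naive case, and all function names are assumptions made for this sketch. The real chip logic lives in UI source that this audit could not access.

```python
def naive_chip(node) -> str:
    """Badge driven by reliability_label only (the false-green trap)."""
    # Assumption for illustration: the naive UI maps CANDIDATE to green.
    return "green" if node["reliability_label"] == "CANDIDATE" else "grey"


def guarded_chip(node) -> str:
    """Badge that binds invariant_status first, then falls back to the label."""
    status = node.get("invariant_status") or ""
    if status.startswith("FAIL_"):
        return "red"
    return naive_chip(node)


def false_green_nodes(nodes, render) -> list:
    """Return node_ids that have a FAIL_* invariant but render green."""
    return [
        n["node_id"]
        for n in nodes
        if (n.get("invariant_status") or "").startswith("FAIL_")
        and render(n) == "green"
    ]
```

The regression test asserts that `false_green_nodes(nodes, renderer)` is empty. With the two live FAIL nodes as input, the naive renderer returns both IDs and the guarded renderer returns none.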

Regression verdict

MANAGEABLE IF GUARDED. The tests are well-defined and mostly checkable read-only today, but two enablers are missing: (1) the deploy-guard view that wires them, and (2) source access to run the grep gate (T5). Until both exist, "UI productionization passed" cannot be trusted.