02 — Two-Fail Root Cause
02 — Root Cause of the 2 Count/Substrate Fails
The v1 invariant fires FAIL_COUNT_SUBSTRATE_MISMATCH when grouping_status='GROUPED' AND grouping_surface.child_total IS DISTINCT FROM count_value. In other words, v1 always reconciled count_value against child_total (the sum of grouped leaf objects), regardless of what the node's count actually measures. That basis is correct only when the node's count counts leaves.
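The v1 rule can be sketched in a few lines of Python to make the fixed reconciliation basis explicit. This is an illustration only, not the production check: the function name `v1_invariant` and the "PASS" return label are assumptions, and the inputs are the figures quoted in this section.

```python
from typing import Optional


def v1_invariant(grouping_status: str,
                 child_total: Optional[int],
                 count_value: Optional[int]) -> str:
    """Hypothetical re-statement of the v1 rule described above.

    Python's != on Optional[int] behaves like SQL IS DISTINCT FROM here:
    None != None is False, and None != 6 is True.
    """
    if grouping_status == "GROUPED" and child_total != count_value:
        return "FAIL_COUNT_SUBSTRATE_MISMATCH"
    return "PASS"  # assumed label for the non-failing case


# Figures quoted in this section (child_total, count_value).
trig_result = v1_invariant("GROUPED", 408, 408)           # TRIG:db_dml_trigger
new_candidates_result = v1_invariant("GROUPED", 50, 6)    # PROC:new_candidates
residual_result = v1_invariant("GROUPED", 23, 8)          # PROC:residual_reconcile
```

The rule never consults group_count, so any node whose count is expressed in groups rather than leaves fails by construction.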
Critical observation: the TRIG nodes also have count_value ≠ child_count yet PASS — e.g. TRIG:db_dml_trigger count=408, child_count(groups)=177, child_total(leaves)=408 → 408==408 PASS. Their count counts leaves. The 2 fails differ in grain/scope:
Fail 1 — PROC:new_candidates (grain mismatch)
- count_value=6 from the actionability ledger = 6 PROCESS_CANDIDATE candidates (= number of candidate buckets).
- v_pxt_grouped_children groups wf_process_candidate by candidate_code and sets child_object_count = GREATEST(member_count,1), so child_total = 50 (sum of members), group_count = 6.
- v1 compared count(6) vs child_total(50) → FAIL. Bug = reconciliation basis: this node counts groups (6), not leaves (50). It should reconcile against group_count(6). The 50 members are a deeper drill level (candidate → members).
- Diagnosis: count semantics / reconciliation-basis bug. Count is already live-correct (matches wf_process_candidate PROCESS_CANDIDATE=6). No data change needed.
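A minimal sketch of the grain distinction for this node, under stated assumptions: the per-candidate member counts below are invented so that they produce the 6 groups and child_total of 50 reported above (the real distribution is not given here), and the names `grouped_totals`, `reconcile`, and the grain labels "GROUPS"/"LEAVES" are hypothetical.

```python
from typing import Dict, Tuple

# HYPOTHETICAL member counts per candidate_code, chosen only to match the
# reported totals (6 groups, child_total 50). Not real data.
member_counts: Dict[str, int] = {
    "cand_a": 20, "cand_b": 12, "cand_c": 9,
    "cand_d": 5, "cand_e": 3, "cand_f": 0,
}


def grouped_totals(counts: Dict[str, int]) -> Tuple[int, int]:
    """Return (group_count, child_total) the way the grouping is described:
    one group per candidate_code, child_object_count = GREATEST(member_count, 1)."""
    child_object_counts = [max(m, 1) for m in counts.values()]
    return len(child_object_counts), sum(child_object_counts)


def reconcile(count_value: int, group_count: int, child_total: int, grain: str) -> bool:
    """Compare count_value against the basis that matches the node's grain."""
    basis = group_count if grain == "GROUPS" else child_total
    return count_value == basis


group_count, child_total = grouped_totals(member_counts)
leaf_basis_ok = reconcile(6, group_count, child_total, "LEAVES")   # the v1 basis
group_basis_ok = reconcile(6, group_count, child_total, "GROUPS")  # grain-correct basis
```

With the basis switched to groups, the unchanged count of 6 reconciles, which is consistent with the diagnosis that no data change is needed.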
Fail 2 — PROC:residual_reconcile (stale literal + scope mismatch)
- count_value=8 (stale static literal); live = 2 (v_workflow_residual_evidence_hardening_v4 where residual_state_v4='AWAITING_OWNER_RECONCILE').
- v_pxt_grouped_children groups all of v4 by residual_state_v4 → 7 states / 23 rows (group_count=7, child_total=23). v4 states: RESOLVED_ALREADY_MANAGED 9, RESOLVED_NOT_PROCESS_ONE_SHOT 4, RESOLVED_COMPONENT 4, RESOLVED_NOT_PROCESS 2, AWAITING_OWNER_RECONCILE 2, RESOLVED_COMPONENT_HEALTH_MON 1, RESOLVED_COMPONENT_MAINTENANCE 1.
- v1 compared count(8) vs child_total(23) → FAIL. Two bugs: (a) count is a stale literal (8) vs live actionable subset (2); (b) grouping spans all states (23) while the count means only the actionable state (AWAITING_OWNER_RECONCILE=2).
- Diagnosis: stale literal + grouping-scope mismatch. Fix needs both: count→live 2, grouping scoped to AWAITING_OWNER_RECONCILE.
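The arithmetic behind both bugs can be checked directly from the state breakdown listed above. The sketch below uses only those per-state row counts; the variable names are illustrative, and the dictionary stands in for a query against the v4 view rather than reproducing it.

```python
# Row counts per residual_state_v4, copied from the breakdown above.
v4_state_rows = {
    "RESOLVED_ALREADY_MANAGED": 9,
    "RESOLVED_NOT_PROCESS_ONE_SHOT": 4,
    "RESOLVED_COMPONENT": 4,
    "RESOLVED_NOT_PROCESS": 2,
    "AWAITING_OWNER_RECONCILE": 2,
    "RESOLVED_COMPONENT_HEALTH_MON": 1,
    "RESOLVED_COMPONENT_MAINTENANCE": 1,
}
ACTIONABLE_STATE = "AWAITING_OWNER_RECONCILE"
STALE_LITERAL = 8  # the static count_value that v1 compared

# Unscoped grouping: every state becomes a group.
unscoped_group_count = len(v4_state_rows)
unscoped_child_total = sum(v4_state_rows.values())

# Scoped grouping: keep only the actionable state.
scoped_rows = {s: n for s, n in v4_state_rows.items() if s == ACTIONABLE_STATE}
live_count = sum(scoped_rows.values())

# Fixing only one of the two bugs still leaves a mismatch.
fix_count_only = (live_count == unscoped_child_total)      # live 2 vs unscoped 23
fix_scope_only = (STALE_LITERAL == live_count)             # stale 8 vs scoped 2
fix_both = (live_count == sum(scoped_rows.values()))       # live 2 vs scoped 2
```

Only the combination reconciles, which is why the diagnosis calls for both the live count and the scoped grouping.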
Bonus finding (owner_gated_runtime)
PROC:owner_gated_runtime was flagged STATIC_NO_LIVE_SOURCE (11). But wf_orphan_remediation_queue has exactly 11 docker_service_no_candidate rows — a real live source. So it was upgradeable to live, not static (see 05).
Coverage proof: only the 10 AX-PXT nodes are GROUPED (3 PROC + 7 TRIG); all 77 other nodes are OK/EMPTY. So a grain-correct reconciliation over the 10 GROUPED nodes fully covers the fail surface.