KB-62DE

02 — Two-Fail Root Cause

Revision 1
rp · root-cause · count-substrate · 2026-06-05

02 — Root Cause of the 2 Count/Substrate Fails

The v1 invariant fires FAIL_COUNT_SUBSTRATE_MISMATCH when grouping_status='GROUPED' AND grouping_surface.child_total IS DISTINCT FROM count_value. In other words, v1 always reconciled count_value against child_total (the sum of grouped leaf objects). That basis is correct only when the node's count_value counts leaves.
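
The v1 rule can be modeled in a few lines of Python. This is an illustrative sketch, not the production SQL: the `Node` dataclass, `is_distinct_from`, and `v1_check` are hypothetical names introduced here, and `is_distinct_from` only approximates SQL's NULL-safe `IS DISTINCT FROM`. The sample values are the TRIG:db_dml_trigger and PROC:new_candidates figures reported in this note.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Hypothetical stand-in for one row of the invariant's input.
    node_id: str
    grouping_status: str
    count_value: Optional[int]
    group_count: Optional[int]   # number of groups (child_count)
    child_total: Optional[int]   # sum of grouped leaf objects


def is_distinct_from(a, b):
    # NULL-safe inequality: two NULLs are not distinct, one NULL is.
    if a is None or b is None:
        return a is not b
    return a != b


def v1_check(node):
    # v1 always reconciles count_value against child_total (leaves).
    if node.grouping_status == "GROUPED" and is_distinct_from(
        node.child_total, node.count_value
    ):
        return "FAIL_COUNT_SUBSTRATE_MISMATCH"
    return "PASS"


trig = Node("TRIG:db_dml_trigger", "GROUPED", 408, 177, 408)
proc = Node("PROC:new_candidates", "GROUPED", 6, 6, 50)

print(v1_check(trig))  # count counts leaves, so the leaf basis matches
print(v1_check(proc))  # count counts groups, so the leaf basis mismatches
```

The sketch makes the limitation visible: the check has no notion of which grain a node's count refers to, so any node that counts groups fails even when its count is correct.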

Critical observation: the TRIG nodes also have count_value ≠ child_count, yet they PASS. For example, TRIG:db_dml_trigger has count=408, child_count (groups)=177 and child_total (leaves)=408, so 408 == 408 and the check passes. Their count counts leaves. The 2 fails differ in grain/scope:

Fail 1 — PROC:new_candidates (grain mismatch)

  • count_value=6 comes from the actionability ledger: 6 PROCESS_CANDIDATE candidates, i.e. the number of candidate buckets.
  • v_pxt_grouped_children groups wf_process_candidate by candidate_code and sets child_object_count = GREATEST(member_count,1), so child_total = 50 (sum of members) and group_count = 6.
  • v1 compared count (6) against child_total (50) → FAIL. The bug is the reconciliation basis: this node counts groups (6), not leaves (50), so it should reconcile against group_count (6). The 50 members are a deeper drill level (candidate → members).
  • Diagnosis: count semantics / reconciliation-basis bug. The count is already live-correct (matches wf_process_candidate PROCESS_CANDIDATE=6). No data change needed.
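
A grain-aware reconciliation for this case might look like the sketch below. It is an assumption about the shape of a fix, not the actual v2 logic: the `reconcile` function and the `basis` labels "groups" and "leaves" are hypothetical names chosen for illustration. The numbers are the ones quoted above.

```python
def reconcile(count_value, group_count, child_total, basis):
    # Pick the comparison target from the node's declared grain
    # instead of always using child_total as v1 did.
    if basis == "groups":
        expected = group_count
    elif basis == "leaves":
        expected = child_total
    else:
        raise ValueError(f"unknown basis: {basis!r}")
    return "PASS" if count_value == expected else "FAIL_COUNT_SUBSTRATE_MISMATCH"


# PROC:new_candidates counts candidate buckets (groups).
new_candidates_v1 = reconcile(6, group_count=6, child_total=50, basis="leaves")
new_candidates_fixed = reconcile(6, group_count=6, child_total=50, basis="groups")

# TRIG:db_dml_trigger counts leaves, so its basis stays unchanged.
trig_result = reconcile(408, group_count=177, child_total=408, basis="leaves")

print(new_candidates_v1, new_candidates_fixed, trig_result)
```

Only the basis changes between the failing and passing calls for PROC:new_candidates; the count and the grouped data are identical, which matches the diagnosis that no data change is needed.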

Fail 2 — PROC:residual_reconcile (stale literal + scope mismatch)

  • count_value=8 is a stale static literal; the live value is 2 (v_workflow_residual_evidence_hardening_v4 where residual_state_v4='AWAITING_OWNER_RECONCILE').
  • v_pxt_grouped_children groups all of v4 by residual_state_v4 → 7 states / 23 rows (group_count=7, child_total=23). v4 states: RESOLVED_ALREADY_MANAGED 9, RESOLVED_NOT_PROCESS_ONE_SHOT 4, RESOLVED_COMPONENT 4, RESOLVED_NOT_PROCESS 2, AWAITING_OWNER_RECONCILE 2, RESOLVED_COMPONENT_HEALTH_MON 1, RESOLVED_COMPONENT_MAINTENANCE 1.
  • v1 compared count(8) vs child_total(23) → FAIL. Two bugs: (a) count is a stale literal (8) vs live actionable subset (2); (b) grouping spans all states (23) while the count means only the actionable state (AWAITING_OWNER_RECONCILE=2).
  • Diagnosis: stale literal + grouping-scope mismatch. Fix needs both: count→live 2, grouping scoped to AWAITING_OWNER_RECONCILE.
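
The arithmetic behind this fail can be checked with a short Python sketch. The per-state row counts are copied from the list above; the `grouping_totals` helper and the constant names are hypothetical, and the sketch models the grouping as a plain dictionary rather than querying the real view.

```python
# Rows per residual_state_v4, as listed in this note.
V4_ROWS_BY_STATE = {
    "RESOLVED_ALREADY_MANAGED": 9,
    "RESOLVED_NOT_PROCESS_ONE_SHOT": 4,
    "RESOLVED_COMPONENT": 4,
    "RESOLVED_NOT_PROCESS": 2,
    "AWAITING_OWNER_RECONCILE": 2,
    "RESOLVED_COMPONENT_HEALTH_MON": 1,
    "RESOLVED_COMPONENT_MAINTENANCE": 1,
}
STALE_LITERAL = 8
ACTIONABLE_STATE = "AWAITING_OWNER_RECONCILE"


def grouping_totals(rows_by_state, only_state=None):
    """Return (group_count, child_total), optionally scoped to one state."""
    scoped = {
        state: rows
        for state, rows in rows_by_state.items()
        if only_state is None or state == only_state
    }
    return len(scoped), sum(scoped.values())


unscoped = grouping_totals(V4_ROWS_BY_STATE)                 # all states
scoped = grouping_totals(V4_ROWS_BY_STATE, ACTIONABLE_STATE)  # actionable only
live_count = V4_ROWS_BY_STATE[ACTIONABLE_STATE]

print(unscoped, scoped, live_count)
```

With the grouping left unscoped, neither the stale literal nor the live count can match child_total. Once the grouping is scoped to the actionable state and the count is read live, the two sides agree in this model, which is why the fix needs both changes.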

Bonus finding (owner_gated_runtime)

PROC:owner_gated_runtime was flagged STATIC_NO_LIVE_SOURCE (count 11). But wf_orphan_remediation_queue has exactly 11 docker_service_no_candidate rows, which is a real live source. The node could therefore be upgraded from static to live (see 05).

Coverage proof: only the 10 AX-PXT nodes are GROUPED (3 PROC + 7 TRIG); all 77 other nodes are OK/EMPTY. So a grain-correct reconciliation over the 10 GROUPED nodes fully covers the fail surface.

knowledge/dev/reports/architecture/rp-count-substrate-fix-registryization-generator-fullpop-v2-2026-06-05/02-two-fail-root-cause.md