KB-6FC9

Fix R2-2 — Plan P2 Failure Not Swallowed

Revision 1
Tags: c1, staging, codex-r2-fix, plan, p2, exit-matrix, 2026-06-23

04 — FIX R2-2: PLAN CLEANUP NEVER SWALLOWS P2 FAILURE

File: plan/c1-staging-fast-dry-run.plan.sh (sha256 f1f5475c3a39d2aecfad6a0e263ee3b7925043851db7a2488385b18b9e4cb033).

Required behaviour matrix

primary PASS + cleanup PASS => exit 0
primary FAIL + cleanup PASS => exit primary rc (nonzero)
primary PASS + cleanup FAIL => exit cleanup rc (86 = P2 drop failed, 87 = staging-DB count not 0)
primary FAIL + cleanup FAIL => exit primary rc (nonzero); report both rc values
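
The matrix can be read as a pure decision function over the two status codes. A minimal bash sketch, where plan_exit_code is a hypothetical helper name (the plan itself inlines this logic in cleanup()) and the primary rc of 3 is an arbitrary example value:

```bash
#!/usr/bin/env bash
# plan_exit_code PRIMARY_RC CLEANUP_RC
# Hypothetical helper: prints the exit status the plan should end with.
# When both fail, the primary rc wins and both values are reported on stderr.
plan_exit_code() {
  local rc=$1 cleanup_rc=$2
  if [ "$rc" -ne 0 ] && [ "$cleanup_rc" -ne 0 ]; then
    echo "both failed: primary rc=$rc, cleanup rc=$cleanup_rc" >&2
    echo "$rc"
  elif [ "$rc" -ne 0 ]; then
    echo "$rc"
  elif [ "$cleanup_rc" -ne 0 ]; then
    echo "$cleanup_rc"
  else
    echo 0
  fi
}

# Walk the four rows of the matrix (3 = arbitrary primary failure, 86 = P2 drop failed).
for pair in "0 0" "3 0" "0 86" "3 86"; do
  read -r p c <<<"$pair"
  printf 'primary=%s cleanup=%s -> exit %s\n' "$p" "$c" "$(plan_exit_code "$p" "$c" 2>/dev/null)"
done
```

The loop prints exit 0, 3, 86 and 3 for the four rows, in order. Letting the primary rc win in the both-failed row keeps the root cause visible in the exit status, while stderr still carries the cleanup rc.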

Code: cleanup() preserves both exit codes, applies the matrix, and gates success on a staging-DB count of 0

cleanup(){
  local rc=$?            # primary-sequence exit status (0 iff every primitive passed); must be the first command
  local cleanup_rc=0
  if [ -n "${SBX:-}" ]; then
    echo "--- cleanup (primary, EXIT trap): dropping sandbox ${SBX} ---" >&2
    if "$BIN/dot-staging-sandbox-drop" --sandbox-id "$SBX"; then
      local cnt
      # Readback counts every c1_staging_% DB; empty or garbled output fails closed below.
      # '|| cnt=""' keeps a failing query from aborting the trap if errexit + pipefail are on.
      cnt="$(docker exec "$PG_CONTAINER" psql -U "$PG_USER" -tAc "select count(*) from pg_database where datname like 'c1_staging_%'" </dev/null 2>/dev/null | tr -d '[:space:]')" || cnt=""
      if [ "$cnt" != "0" ]; then
        cleanup_rc=87
        echo "CLEANUP_FAIL: P2 reported success but staging-DB count='$cnt' (expected 0) -- residual sandbox; inspect manually" >&2
      fi
    else
      cleanup_rc=86
      echo "CLEANUP_FAIL: P2 drop FAILED for ${SBX} -- live sandbox remains; inspect manually" >&2
    fi
  fi
  # Exit matrix: primary failure wins; otherwise a cleanup failure; otherwise fall through.
  if [ "$rc" -ne 0 ] && [ "$cleanup_rc" -ne 0 ]; then
    echo "PLAN_RESULT: FAIL -- primary rc=$rc AND cleanup rc=$cleanup_rc (both failed)" >&2; exit "$rc"
  elif [ "$rc" -ne 0 ]; then
    echo "PLAN_RESULT: FAIL -- primary rc=$rc (cleanup ok)" >&2; exit "$rc"
  elif [ "$cleanup_rc" -ne 0 ]; then
    echo "PLAN_RESULT: FAIL -- primary ok but cleanup rc=$cleanup_rc" >&2; exit "$cleanup_rc"
  fi
  # Reached only when rc==0 and cleanup_rc==0: the sole overall-success line.
  if [ "${PRIMITIVES_OK:-0}" -eq 1 ]; then
    echo "DRY_RUN_OK: primitives passed, sandbox ${SBX:-<none>} dropped, staging-DB count=0"
  fi
  exit 0
}
trap cleanup EXIT
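
To see the trap wiring end to end without Docker or a real sandbox, the same EXIT-trap structure can be exercised in subshells with stubbed primitives. A sketch under stated assumptions: stub_drop and stub_count are hypothetical stand-ins for dot-staging-sandbox-drop and the docker/psql readback, the sandbox id is made up, and the PLAN_RESULT/CLEANUP_FAIL messages are omitted.

```bash
#!/usr/bin/env bash
# Hypothetical stubs standing in for P2 and the staging-DB readback.
stub_drop()  { return "${DROP_RC:-0}"; }
stub_count() { printf '%s' "${COUNT_RAW-}"; }   # raw psql-style output, may contain whitespace

# Mirrors the structure of cleanup(): capture rc first, then drop, then read back.
cleanup() {
  local rc=$?
  local cleanup_rc=0
  if [ -n "${SBX:-}" ]; then
    if stub_drop; then
      local cnt
      cnt="$(stub_count | tr -d '[:space:]')"
      [ "$cnt" = "0" ] || cleanup_rc=87          # empty or unexpected output fails closed
    else
      cleanup_rc=86
    fi
  fi
  if [ "$rc" -ne 0 ]; then exit "$rc"; fi        # primary failure wins
  exit "$cleanup_rc"
}

# run_scenario PRIMARY_RC DROP_RC COUNT_RAW -> returns the subshell's final exit status
run_scenario() {
  (
    DROP_RC=$2
    COUNT_RAW=$3
    SBX="c1_staging_20260101_0000"               # hypothetical sandbox id
    trap cleanup EXIT
    exit "$1"                                    # simulate the primary sequence's status
  )
}

# report LABEL PRIMARY_RC DROP_RC COUNT_RAW -> prints the scenario's exit status
report() {
  local status=0
  run_scenario "$2" "$3" "$4" || status=$?
  printf '%-20s -> exit %s\n' "$1" "$status"
}

report "all ok"             0 0 $' 0\n'
report "drop failed"        0 1 "0"
report "count not zero"     0 0 "2"
report "count query failed" 0 0 ""
report "both failed"        5 1 "0"
```

The five scenarios end with exit 0, 86, 87, 87 and 5 respectively. Running each scenario in its own subshell lets the trap call exit without terminating the harness.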

No success marker precedes cleanup

The in-line PRIMITIVES_STAGE_OK marker is a stage marker only; the sole overall-success line (DRY_RUN_OK) is emitted by the EXIT trap, and only after P2 succeeds and the staging-DB count reads back as 0:

PRIMITIVES_OK=1
echo "PRIMITIVES_STAGE_OK: evidence in $EVID_DIR (cleanup + final staging-DB=0 readback pending in EXIT trap)"
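
The ordering guarantee can be checked by capturing a run's stdout: the stage marker comes first, and DRY_RUN_OK appears only as the last line and only when cleanup also passes. A hedged sketch in which simulate_plan and its CLEANUP_OK switch are hypothetical, replacing the real P2 drop and readback:

```bash
#!/usr/bin/env bash
# simulate_plan CLEANUP_OK
# Hypothetical simulation: CLEANUP_OK=1 stands in for "P2 succeeded and count=0";
# any other value stands in for a P2 failure (cleanup rc 86).
simulate_plan() {
  (
    CLEANUP_OK=$1
    on_exit() {
      local rc=$?
      local cleanup_rc=0
      [ "$CLEANUP_OK" = "1" ] || cleanup_rc=86
      if [ "$rc" -ne 0 ]; then exit "$rc"; fi
      if [ "$cleanup_rc" -ne 0 ]; then exit "$cleanup_rc"; fi
      # The only overall-success line, printed after cleanup has passed.
      if [ "${PRIMITIVES_OK:-0}" -eq 1 ]; then
        echo "DRY_RUN_OK: primitives passed, sandbox dropped, staging-DB count=0"
      fi
      exit 0
    }
    trap on_exit EXIT
    # Primary sequence: a stage marker only, never an overall-success claim.
    PRIMITIVES_OK=1
    echo "PRIMITIVES_STAGE_OK: cleanup + final staging-DB=0 readback pending in EXIT trap"
  )
}

simulate_plan 1                     # stage marker, then DRY_RUN_OK as the last line
simulate_plan 0 || echo "exit $?"   # stage marker only, then "exit 86"
```

A log consumer that greps for DRY_RUN_OK therefore never sees a success claim from a run whose cleanup failed.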

Forbidden patterns eliminated

  • No trap cleanup || true.
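  • No cleanup … || echo WARN followed by exit "$rc" with rc=0, which would report success while a live sandbox remains.
  • exit 0 is reachable only when rc==0 AND cleanup_rc==0, and cleanup_rc==0 requires either no sandbox (SBX empty) or (P2 success AND c1_staging_% count==0). Therefore exit 0 ⟹ no live sandbox. Because the readback counts every c1_staging_% database, residue from an earlier run also blocks success. A failed or unverifiable count query yields a non-"0" string (typically empty) → cleanup_rc=87 (fail-closed).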
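
The difference between the forbidden pattern and the matrix shows up as soon as cleanup fails. A sketch with a hypothetical failing_drop standing in for a failed P2: swallowing_exit reproduces the "|| echo WARN" shape ruled out above, matrix_exit applies the matrix, and count_ok (also hypothetical) shows the fail-closed readback comparison.

```bash
#!/usr/bin/env bash
failing_drop() { return 1; }   # hypothetical stand-in for a failed P2 drop

# Forbidden shape: the drop failure is downgraded to a warning and rc (0) is returned.
swallowing_exit() {
  local rc=0                   # primary sequence passed
  failing_drop || echo "WARN: drop failed" >&2
  return "$rc"
}

# Matrix shape: with a passing primary, the cleanup failure becomes the status.
matrix_exit() {
  local rc=0 cleanup_rc=0
  failing_drop || cleanup_rc=86
  if [ "$rc" -ne 0 ]; then return "$rc"; fi
  return "$cleanup_rc"
}

# Fail-closed readback: only an exact "0" (after stripping whitespace) passes,
# so empty output from a failed query counts as a residual sandbox.
count_ok() { [ "$(printf '%s' "$1" | tr -d '[:space:]')" = "0" ]; }

status=0; swallowing_exit 2>/dev/null || status=$?
echo "swallowing -> exit $status"     # 0 although the sandbox is still live
status=0; matrix_exit || status=$?
echo "matrix     -> exit $status"     # 86
count_ok $' 0\n' && echo "count ' 0' -> ok"
count_ok ""      || echo "count ''   -> fail-closed"
```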

R2-1(b) — preselected cleanup target so the plan can always clean

CAND="c1_staging_$(date -u +%Y%m%d_%H%M)"
[[ "$CAND" =~ ^c1_staging_[0-9]{8}_[0-9]{4}$ ]] || { echo "FATAL: bad candidate sandbox id '$CAND'" >&2; exit 73; }
SBX="$CAND"   # cleanup target known BEFORE P1 runs
P1_OUT="$("$BIN/dot-staging-sandbox-create" ... --sandbox-id "$CAND")"
# ... parse + verify PARSED==CAND==PARSED_DB; FATAL on mismatch (exit 70/71/72) ...

Even if P1 succeeds and only the JSON parse fails, SBX already holds the real DB name (it was set before P1 ran), so the EXIT trap still drops it.
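
The preselection step can be exercised without a database by passing the timestamp in as an argument and stubbing P1. A sketch assuming hypothetical helpers make_candidate and fake_p1 (the latter prints non-JSON output so the parse fails). The "starts with {" test is a crude stand-in for real JSON parsing, and exit 70 is an arbitrary pick from the plan's 70/71/72 range. The trap only reports which id it would hand to P2.

```bash
#!/usr/bin/env bash
# make_candidate STAMP -> prints c1_staging_<STAMP>, or returns 73 if it breaks the plan's pattern
make_candidate() {
  local cand="c1_staging_$1"
  [[ "$cand" =~ ^c1_staging_[0-9]{8}_[0-9]{4}$ ]] || return 73
  printf '%s\n' "$cand"
}

fake_p1() { echo "created $1 (not JSON)"; }   # hypothetical P1: DB created, output unparseable

# drop_target_after_parse_failure STAMP -> prints the id the EXIT trap would drop
drop_target_after_parse_failure() {
  (
    SBX=""
    trap 'echo "${SBX:-<none>}"' EXIT          # stand-in for cleanup(): report the drop target
    CAND="$(make_candidate "$1")" || exit 73
    SBX="$CAND"                                # cleanup target known BEFORE P1 runs
    P1_OUT="$(fake_p1 "$CAND")"
    case "$P1_OUT" in "{"*) ;; *) exit 70 ;; esac   # crude JSON check: the parse "fails"
  )
}

make_candidate "$(date -u +%Y%m%d_%H%M)"
drop_target_after_parse_failure 20260623_1412 || echo "exit $?"
```

The last call prints c1_staging_20260623_1412 and then "exit 70": the parse failed, yet the trap still knows the real DB name. A malformed stamp exits 73 with SBX still empty, so there is nothing to drop.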

Dry-run gate

C1_STAGING_DRY_RUN_CONFIRM must equal CODEX_R3_PASS (bumped from CODEX_R2_PASS, which would have implied an R2 pass that never happened); otherwise exit 64. No evidence path is created without it.
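
A minimal sketch of the gate ordering, assuming a hypothetical gate_then_mkdir wrapper and a caller-supplied evidence directory (the real plan's evidence path is not shown in this report). The confirm check runs first, so a missing or stale token exits 64 before anything is written. The subshell keeps exit 64 from terminating the caller.

```bash
#!/usr/bin/env bash
# gate_then_mkdir EVID_DIR -> exits 64 unless C1_STAGING_DRY_RUN_CONFIRM=CODEX_R3_PASS
gate_then_mkdir() {
  (
    if [ "${C1_STAGING_DRY_RUN_CONFIRM:-}" != "CODEX_R3_PASS" ]; then
      echo "FATAL: set C1_STAGING_DRY_RUN_CONFIRM=CODEX_R3_PASS to run" >&2
      exit 64
    fi
    mkdir -p "$1"                              # evidence path created only after the gate
  )
}

tmp="$(mktemp -d)"
C1_STAGING_DRY_RUN_CONFIRM=CODEX_R2_PASS gate_then_mkdir "$tmp/evid" 2>/dev/null || echo "stale token -> exit $?"
C1_STAGING_DRY_RUN_CONFIRM=CODEX_R3_PASS gate_then_mkdir "$tmp/evid" && echo "gate passed, evidence dir exists"
rm -rf "$tmp"
```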

Validation

bash -n: OK; shellcheck: warning-clean; CODEX_R3_PASS found 2 times, stale CODEX_R2_PASS 0 times. The plan was NOT executed (the dry run remains gated).
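
The static checks above can be reproduced with standard tools. A sketch assuming a hypothetical validate_plan wrapper. It counts occurrences with grep -o, which is an assumption about what the 2/0 figures measure, and it skips shellcheck when the tool is not installed. bash -n only parses the file, so nothing in the plan is executed.

```bash
#!/usr/bin/env bash
# validate_plan FILE -> prints "r3=<n> r2=<n>"; fails on syntax errors, shellcheck warnings, or a stale token
validate_plan() {
  local plan=$1 r3 r2
  bash -n "$plan" || return 1                        # parse only, nothing executed
  if command -v shellcheck >/dev/null 2>&1; then
    shellcheck -S warning "$plan" || return 1        # warning severity and above
  fi
  r3="$(grep -o 'CODEX_R3_PASS' "$plan" | wc -l | tr -d '[:space:]')"
  r2="$(grep -o 'CODEX_R2_PASS' "$plan" | wc -l | tr -d '[:space:]')"
  echo "r3=$r3 r2=$r2"
  [ "$r2" -eq 0 ]                                    # stale token must be absent
}

# Usage against a throwaway sample file (not the real plan).
sample="$(mktemp)"
cat >"$sample" <<'EOF'
#!/usr/bin/env bash
if [ "${C1_STAGING_DRY_RUN_CONFIRM:-}" != "CODEX_R3_PASS" ]; then
  echo "FATAL: expected CODEX_R3_PASS" >&2
  exit 64
fi
EOF
validate_plan "$sample"
rm -f "$sample"
```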

Source: knowledge/dev/laws-new/reports/c1-staging-codex-r2-fixes-ready-for-r3/04-fix-plan-p2-failure-not-swallowed.md