Fix R2-1 — P1 Partial-Failure Self-Cleaning
03 — FIX R2-1: P1 PARTIAL-FAILURE SELF-CLEANING
File: bin/dot-staging-sandbox-create (sha256 3694a0b6d35cc761637826537bfb04375b12a2db4b98b13954beeec90e33d23e).
Requirement
Track the candidate sandbox before any create; if P1 fails after creating the DB but before emitting SANDBOX_JSON, attempt cleanup; if cleanup fails, exit nonzero and report it; never emit created=true unless all postconditions pass.
Code branch — candidate derived + validated BEFORE any DB op
TS="$(date -u +%Y%m%d_%H%M)"; SBX="${SBX:-c1_staging_${TS}}"
OPERATOR="$(whoami)@$(hostname)"
stg_assert_sandbox_name "$SBX" # R2-1: candidate id derived + validated BEFORE any DB op
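The body of stg_assert_sandbox_name is not shown in this report. As a rough illustration of what "validated before any DB op" can mean, the sketch below uses a hypothetical stand-in, sketch_assert_sandbox_name, and assumes that valid ids follow the default c1_staging_YYYYMMDD_HHMM shape. The real helper may accept other forms and use different exit codes.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for stg_assert_sandbox_name (the real helper is not shown).
# Assumption: a valid id matches the default pattern c1_staging_YYYYMMDD_HHMM.
sketch_assert_sandbox_name() {
  local name="$1"
  # Anchored match: anything outside the expected shape is rejected,
  # including ids that carry quotes, spaces, or SQL fragments.
  if [[ "$name" =~ ^c1_staging_[0-9]{8}_[0-9]{4}$ ]]; then
    return 0
  fi
  echo "invalid sandbox id: '${name}'" >&2
  return 1   # illustrative status only; the real guard's code is not assumed
}
```

Because the check runs before the trap's drop target is ever used, a rejected name can never reach either CREATE DATABASE or the compensating drop.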
Code branch — EXIT trap armed before create; compensating drop on partial failure
P1_CREATED_DB=0; P1_DONE=0
p1_on_exit(){
  local rc=$?
  if [ "$rc" -ne 0 ] && [ "$P1_CREATED_DB" -eq 1 ] && [ "$P1_DONE" -ne 1 ]; then
    echo "P1_PARTIAL_FAILURE: rc=$rc after DB create, before completion -> compensating drop of ${SBX}" >&2
    if stg_drop_db "$SBX"; then
      echo "P1_PARTIAL_CLEANUP_OK: candidate ${SBX} dropped; no sandbox leaked" >&2
    else
      echo "P1_PARTIAL_CLEANUP_FAILED: candidate ${SBX} may still exist -- run 'dot-staging-sandbox-drop --sandbox-id ${SBX}'" >&2
      stg_cleanup_remote_tmps || true
      exit 70 # cleanup failure forces nonzero
    fi
  fi
  stg_cleanup_remote_tmps || true
  exit "$rc"
}
trap p1_on_exit EXIT
Code branch — CREATED_DB armed the instant the DB exists
stg_run_sql_file postgres "$SQLDIR/p1a-create-db.sql" -v sbx="$SBX"
P1_CREATED_DB=1 # R2-1: DB now exists -> EXIT trap will compensating-drop on any later failure
Code branch — created=true emitted ONLY after all postconditions, then disarm
... orphan check / ledger / RETIRE hint / SANDBOX_READY ...
printf 'SANDBOX_JSON: {"sandbox_id":"%s","sandbox_db":"%s","created":true}\n' "$SBX" "$SBX"
P1_DONE=1 # full success -> disarm compensating drop (created=true now authoritative)
Why this is complete
- The trap is installed before the reuse-first read and before phase-a create, so any nonzero exit after that point fires it.
- P1_CREATED_DB=1 runs immediately after a successful CREATE DATABASE (p1a is a single CREATE DATABASE under ON_ERROR_STOP=1; if it fails no DB exists and the flag stays 0 → no spurious drop).
- Every post-create step (phase-b meta, orphan check, ledger, echoes) runs under set -e; a failure exits the script and the trap drops the candidate.
- SANDBOX_JSON / created=true is the last emitted artifact; P1_DONE=1 immediately follows, so the trap never drops on full success and never lets a partial run advertise success.
- Defense in depth: the phase-b failure branch also does an explicit stg_drop_db "$SBX" || true (idempotent DROP DATABASE IF EXISTS); the trap re-verifies.
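The four exit paths argued above can be exercised without a database. The sketch below is a stand-alone model, not the real script: p1_paths_sketch, the stubbed stg_drop_db, and the id c1_staging_demo are all assumptions for illustration, and explicit exit calls stand in for the failures that set -e would raise in the real script.

```shell
#!/usr/bin/env bash
# Model of the P1 flag/trap logic with stubs; no database is touched.
# $1 = where to fail: none | create | post
# $2 = outcome of the stubbed compensating drop: ok | fail (default ok)
p1_paths_sketch() {
  (
    fail_at="$1"; drop_result="${2:-ok}"
    SBX="c1_staging_demo"              # hypothetical candidate id
    P1_CREATED_DB=0; P1_DONE=0
    # Stub: records the drop and succeeds or fails as requested.
    stg_drop_db() { echo "DROP $1"; [ "$drop_result" = ok ]; }
    p1_on_exit() {
      local rc=$?
      if [ "$rc" -ne 0 ] && [ "$P1_CREATED_DB" -eq 1 ] && [ "$P1_DONE" -ne 1 ]; then
        stg_drop_db "$SBX" || exit 70  # cleanup failure forces nonzero
      fi
      exit "$rc"
    }
    trap p1_on_exit EXIT               # armed before any create
    [ "$fail_at" != create ] || exit 1 # create fails: flag stays 0, no drop
    echo "CREATE $SBX"; P1_CREATED_DB=1
    [ "$fail_at" != post ] || exit 1   # post-create failure: trap drops
    echo "created=true"; P1_DONE=1     # success: trap disarmed
  )
}
```

Calling p1_paths_sketch with none, create, post, and post fail walks the success path, the failed-create path (no drop), the partial-failure path (drop, original status preserved), and the failed-cleanup path (status 70). Only the first ever prints created=true.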
Plan-side complement (report 05 option b)
The dry-run plan preselects + validates the sandbox id and passes it via --sandbox-id, setting SBX BEFORE invoking P1, so the plan's own EXIT-trap cleanup target is known even if P1 fully succeeds but the JSON parse fails (see file 04).
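To illustrate why preselecting the id matters, here is a minimal sketch of a plan-side wrapper. Everything in it is assumed for illustration: plan_run_sketch, the PLAN_OK flag, the example id, the exit code 65, and the PLAN_CLEANUP message are hypothetical, and the command passed in merely plays the role of P1. The actual plan logic is described in file 04.

```shell
#!/usr/bin/env bash
# Hypothetical plan-side wrapper. $1 names a command that plays the role of P1.
plan_run_sketch() {
  (
    p1_cmd="$1"
    SBX="c1_staging_20240101_0000"   # preselected by the plan (example value)
    PLAN_OK=0
    plan_on_exit() {
      local rc=$?
      # The cleanup target is known here even if P1's output was unusable.
      [ "$PLAN_OK" -eq 1 ] || echo "PLAN_CLEANUP target=${SBX}"
      exit "$rc"
    }
    trap plan_on_exit EXIT
    out="$("$p1_cmd" --sandbox-id "$SBX")" || exit 1
    # Extract the JSON payload from the SANDBOX_JSON line, if any.
    json="$(printf '%s\n' "$out" | sed -n 's/^SANDBOX_JSON: //p')"
    case "$json" in
      *"\"sandbox_id\":\"${SBX}\""*'"created":true'*) ;;  # looks as expected
      *) exit 65 ;;                  # parse failure (illustrative code)
    esac
    PLAN_OK=1
    echo "PLAN_OK ${SBX}"
  )
}
```

If the id were only learned by parsing P1's output, the parse-failure branch would have no name to clean up; fixing SBX up front removes that gap.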
Validation
bash -n OK; shellcheck warning-clean; guard self-tests confirm the pre-create guards still fail closed (exit 3/4) without ever reaching CREATE DATABASE. No live execution of the create path (no staging DB created).
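The guard self-tests themselves are not reproduced in this report. One way a fail-closed check of this kind could look is sketched below; it assumes the create path would reach the database through a psql binary resolved via PATH, which may not hold for the real script. The helper name assert_fails_closed and the shim approach are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical self-test helper: run a command with a recording psql shim first
# on PATH, then check both the exit status and that the shim was never called.
# $1 = expected exit status, remaining args = command under test.
assert_fails_closed() {
  local want="$1"; shift
  local shim_dir rc=0 reached=0
  shim_dir="$(mktemp -d)"
  # The shim records every invocation and always fails.
  printf '#!/bin/sh\necho called >> "%s/calls"\nexit 1\n' "$shim_dir" > "$shim_dir/psql"
  chmod +x "$shim_dir/psql"
  PATH="$shim_dir:$PATH" "$@" >/dev/null 2>&1 || rc=$?
  if [ -e "$shim_dir/calls" ]; then reached=1; fi
  rm -rf "$shim_dir"
  # Pass only if the status matches and no SQL client call was attempted.
  if [ "$rc" -eq "$want" ] && [ "$reached" -eq 0 ]; then
    return 0
  fi
  return 1
}
```

For example, assert_fails_closed 3 sh -c 'exit 3' passes, while a command that invokes psql before exiting 3 does not, because the shim records the call.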