KB-5B56

08 — Defect found + fix (R3-SELF-1)

Revision 1

c1-staging, claude-r3-self-gate, defect, r3-self-1, fix

Did the hostile self-review find a defect? YES — one (MEDIUM).

R3-SELF-1 — "the dry-run plan can drop a sandbox it did not create"

Attack class: cleanup-target safety (the exact class Codex flagged in R2-1).

Pre-fix behavior. The plan derived a minute-resolution candidate id, CAND="c1_staging_$(date -u +%Y%m%d_%H%M)", and set SBX="$CAND" (the EXIT-trap drop target) before invoking P1. If two gated dry-runs were launched in the same UTC minute, both computed the same CAND. Run #2's P1 failed with REUSE_BLOCK (exit 4, since --force is disabled), but run #2's cleanup() then called dot-staging-sandbox-drop --sandbox-id CAND. Because run #1's sandbox has an active sbx_meta.sandbox_registry row, the governed drop guard permitted the drop, destroying run #1's live sandbox. The drop guard (active-registry requirement) offers no protection here: it requires only that the target be active, and the victim is legitimately active.
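The race described above can be reproduced in miniature. The following is a simulation sketch only, not the plan's actual code: a temp file stands in for sbx_meta.sandbox_registry, create_sandbox and drop_sandbox are hypothetical stand-ins for P1 and dot-staging-sandbox-drop, the candidate id is an illustrative value, and an explicit `|| exit $?` stands in for set -e.

```shell
#!/usr/bin/env bash
# Simulation only. All function names here are hypothetical stand-ins.
REGISTRY="$(mktemp)"                       # one line per active sandbox id

create_sandbox() {                         # stand-in for P1: REUSE_BLOCK (4) if id exists
  if grep -qxF "$1" "$REGISTRY"; then return 4; fi
  echo "$1" >> "$REGISTRY"
}

drop_sandbox() {                           # stand-in for the governed drop: allowed only if active
  grep -qxF "$1" "$REGISTRY" || return 1
  grep -vxF "$1" "$REGISTRY" > "$REGISTRY.tmp" || true
  mv "$REGISTRY.tmp" "$REGISTRY"
}

plan_before_fix() {                        # pre-fix ordering; subshell keeps the trap local
  (
    CAND="$1"
    SBX="$CAND"                            # defect: drop target armed BEFORE P1 runs
    cleanup() { if [ -n "$SBX" ]; then drop_sandbox "$SBX" || true; fi; }
    trap cleanup EXIT
    create_sandbox "$CAND" || exit $?      # explicit check stands in for set -e
  )
}

CAND="c1_staging_20250101_0000"            # illustrative id both runs derive in the same minute
create_sandbox "$CAND"                     # run #1: P1 succeeds, run still in progress
rc=0
plan_before_fix "$CAND" || rc=$?           # run #2: P1 fails, cleanup fires anyway
left="$(grep -cxF "$CAND" "$REGISTRY" || true)"
echo "run2_exit=$rc run1_sandbox_rows=$left"
```

In this simulation run #2 exits with 4 yet leaves zero registry rows for the candidate: the failing run has removed the sandbox that run #1 created.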

Severity. MEDIUM. The dry-run is single-operator, env-gated (CODEX_R3_PASS), and not yet run, so the race is narrow — but the consequence (silently destroying a concurrent run's sandbox) violates the saga's core invariant "never destroy what you did not create," and it is squarely in the class Codex rejects. Under the owner directive ("fix anything Codex would likely reject"), it was fixed, not documented-and-accepted.

Fix (plan only).

(a) Refuse a pre-existing candidate up-front (exit 74) so the plan never adopts a concurrent or leftover sandbox. The check is a read-only select count(*) … datname='$CAND' whose result must be exactly '0'; any other value, including an empty result '' (count unreadable), also aborts (fail-closed).
(b) Arm SBX (the drop target) only after P1 returns success. set -e guarantees the arming line runs only if the P1_OUT="$(… create …)" command substitution returned 0 — and P1 returns 0 only after P1_DONE=1 (it created the DB and emitted created=true). A P1 REUSE_BLOCK/validation failure aborts the plan before arming, so cleanup() sees SBX="" and drops nothing. P1's own EXIT trap remains the primary cleanup for any partial create.

This preserves the genuine R2-1 benefit (the plan still cleans up when P1 created the DB but the plan later fails to parse the JSON — SBX is armed right after P1 success, before the parse).
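The ordering of (a) and (b) can be sketched as follows. This is a simulation under stated assumptions, not the plan itself: the registry is a temp file, exists_count, create_sandbox, drop_sandbox and the racer hook are hypothetical stand-ins, the JSON shape is assumed, and explicit `|| exit` checks stand in for set -e so the sketch behaves the same in any calling context.

```shell
#!/usr/bin/env bash
# Simulation only. All function names and the JSON shape are assumptions.
REG="$(mktemp)"

exists_count() { grep -cxF "$1" "$REG" || true; }      # stand-in for the read-only count(*)
create_sandbox() {                                      # stand-in for P1
  if grep -qxF "$1" "$REG"; then return 4; fi           # REUSE_BLOCK
  echo "$1" >> "$REG"
  printf '{"created":true}\n'                           # assumed output shape
}
drop_sandbox() {                                        # stand-in for the governed drop
  grep -qxF "$1" "$REG" || return 1
  grep -vxF "$1" "$REG" > "$REG.tmp" || true
  mv "$REG.tmp" "$REG"
}

plan_after_fix() {      # $1 = candidate id, $2 = optional hook run between (a) and P1
  (
    CAND="$1"
    SBX=""                                              # nothing armed yet
    cleanup() { if [ -n "$SBX" ]; then drop_sandbox "$SBX" || true; fi; }
    trap cleanup EXIT
    [ "$(exists_count "$CAND")" = "0" ] || exit 74      # (a) fail-closed pre-check
    "${2:-:}" "$CAND"                                   # demo hook: simulate a racing run
    P1_OUT="$(create_sandbox "$CAND")" || exit $?       # abort before arming on P1 failure
    SBX="$CAND"                                         # (b) armed only after P1 success
    case "$P1_OUT" in *'"created":true'*) ;; *) exit 1 ;; esac   # parse happens after arming
  )
}

CAND="c1_staging_20250101_0000"                         # illustrative id

create_sandbox "$CAND" >/dev/null                       # run #1 is live
rc_a=0; plan_after_fix "$CAND" || rc_a=$?               # run #2 is stopped by (a)
rows_a="$(exists_count "$CAND")"

: > "$REG"                                              # reset the simulated registry
racer() { create_sandbox "$1" >/dev/null; }             # another run wins after the pre-check
rc_b=0; plan_after_fix "$CAND" racer || rc_b=$?         # P1 fails, SBX is never armed
rows_b="$(exists_count "$CAND")"

echo "a: exit=$rc_a rows=$rows_a  b: exit=$rc_b rows=$rows_b"
```

In both scenarios the other run's sandbox row survives: case (a) stops at exit 74, and case (b) shows that even if a competitor slips in after the pre-check, the P1 failure leaves SBX empty and cleanup() drops nothing.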

Changed files (exactly 4, all under /opt/incomex/staging/c1/)

plan/c1-staging-fast-dry-run.plan.sh  9d35608548a91db8717d8221a97f9d0d9b907e4a782cb0e62225304451750153
README.md                             b4eb6198f3ca884728fcc7d6a2964d1b69fbc0f26bc9ca5bba396d2f0041c74e
ROLLBACK.md                           fa47c4de14c6aeb9eefb166c34f78c19f8869d3f53d91d3928f8c2d75529daa7
ledger/dot_manage.jsonl               a3225aa0d03ab81099b6e58570ec0a30481ac8cf46636de322f8059431bf30a8

Unchanged (byte-identical to R2): all 6 primitive bins, _common.sh, all 6 SQL payloads, registry/primitives.jsonl, admission/. No registered primitive touched → registry sha256 cross-check still 6/6. No scope drift.
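One way to re-check the four hashes listed above is a manifest fed to sha256sum -c. This is a minimal sketch assuming GNU coreutils; the manifest function and the ROOT variable are illustrative names, and the check is skipped when the directory is absent on the current host.

```shell
#!/usr/bin/env bash
# Sketch: verify the four changed files against the hashes recorded in this note.
ROOT="${ROOT:-/opt/incomex/staging/c1}"

manifest() {            # "<sha256>  <relative path>", the format sha256sum -c expects
  cat <<'EOF'
9d35608548a91db8717d8221a97f9d0d9b907e4a782cb0e62225304451750153  plan/c1-staging-fast-dry-run.plan.sh
b4eb6198f3ca884728fcc7d6a2964d1b69fbc0f26bc9ca5bba396d2f0041c74e  README.md
fa47c4de14c6aeb9eefb166c34f78c19f8869d3f53d91d3928f8c2d75529daa7  ROLLBACK.md
a3225aa0d03ab81099b6e58570ec0a30481ac8cf46636de322f8059431bf30a8  ledger/dot_manage.jsonl
EOF
}

if [ -d "$ROOT" ]; then
  # sha256sum -c reads the manifest from stdin and reports OK/FAILED per file.
  ( cd "$ROOT" && manifest | sha256sum -c - )
else
  echo "skip: $ROOT not present on this host"
fi
```

A non-zero exit from sha256sum -c would mean at least one file no longer matches the recorded hash.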

Re-run of the ENTIRE self-gate after the fix (per §7)

bash -n                 8/8 bins + plan OK
shellcheck -S warning   CLEAN (bin/* + plan)
injection grep          NO_NONCOMMENT_INJECTION_HITS
guard self-tests        14/14 expected exit codes; staging_DBs after = 0
registry sha256         ALL_REGISTRY_SHA256_MATCH (6/6)
ledger                  10 entries, all valid JSON
official runtime        DATA before == after (db_list_hash unchanged dfc368f6…)
staging_DBs             0
dry-run executed        NO ; staging DB created NO
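The first row of the gate (bash -n) amounts to a parse-only pass over each script. The actual gate script is not shown in this note, so the following is only a sketch of that kind of check; syntax_gate is a hypothetical helper name and the two temp files are demo inputs.

```shell
#!/usr/bin/env bash
# Sketch: parse-only syntax gate. bash -n reads a script without executing it.
syntax_gate() {
  local rc=0 f
  for f in "$@"; do
    if bash -n "$f" 2>/dev/null; then
      echo "OK   $f"
    else
      echo "FAIL $f"
      rc=1                                # remember the failure, keep checking the rest
    fi
  done
  return "$rc"
}

tmp="$(mktemp -d)"
printf 'echo hi\n'       > "$tmp/good.sh"   # parses cleanly
printf 'if true; then\n' > "$tmp/bad.sh"    # missing fi: syntax error
gate_rc=0
syntax_gate "$tmp/good.sh" "$tmp/bad.sh" || gate_rc=$?
echo "gate_rc=$gate_rc"
```

Because bash -n never executes the script, a pass of this kind is consistent with the last row of the table: no dry-run is executed and no staging DB is created.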

Other observations (NOT blockers)

P6 orphan sweep covers tables/functions/triggers, not sequences/views; the only sequences present are identity-owned (auto-managed) and no views are created — no false-PASS risk. Noted, not a defect.
SIGKILL between CREATE DATABASE and trap-arming can leak an orphan that no trap can catch; it is a detectable c1_staging_% DB that the next create's reuse-block surfaces. Recoverable, not a false-PASS. Noted, not a defect.