13 — Deploy Final Readiness Guard + OOM Crash Incident (Phase 14)
CRITICAL OPERATIONAL INCIDENT — 4 production Postgres OOM crashes (auto-recovered, no data loss)
The first version of the combined deploy readiness guard was a single view that combined four heavy sources in ONE SQL statement: the contract guard, the smoke probe (which references the deep contract view-stack ~15 times), the anti-false-green regression, and the autoscale dashboard (which pulls the validation summary, an md5 of the generator function, and the manifest chain). PostgreSQL expands every view reference into its full defining query before planning, so in one statement the PLANNER had to work through the deep contract view-stack many times over. On this VPS that exhausted memory, and the Linux OOM killer sent SIGKILL (signal 9) to the backend. The postmaster treats any backend killed by a signal as a crash: it terminates every other backend and runs crash recovery.
The crash happened FOUR times during remediation, including once under plain EXPLAIN (without ANALYZE), which only plans the query and never executes it. That confirms the cost is on the planner side (plan expansion), not in execution. Each crash dropped all production connections (directus, nuxt) for a few seconds; Postgres replayed WAL and recovered cleanly each time. NO data was lost: birth_registry, axis_registry, governance ownership, trigger_guard, and all committed DDL survived every recovery (verified after each one).
Root cause
On this VPS, PostgreSQL cannot plan a single statement that references the deep RP contract view-stack alongside another deep-contract view. Each such view is fine ALONE (every one was verified to query successfully on its own); the failure comes from the combinatorial plan expansion when several are combined in one statement.
Fix — crash-safe plpgsql combiner
The deploy guard is now fn_rp_ui_deploy_final_readiness_guard (plpgsql, STABLE, read-only), wrapped by the view v_rp_ui_deploy_final_readiness_guard. The function runs each gate as a SEPARATE statement (guard; smoke; anti-false-green; generated_v2 cardinality; autoscale verdict), so each statement is planned on its own at execution time, with a plan no larger than that one gate; it then aggregates the resulting scalars. Because a plpgsql function is never inlined into the calling query, the function scan in the view's FROM clause is opaque to the planner, and the caller never builds one giant plan. Verified: the function-based guard returns cleanly on repeated runs with no crash.
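The function's shape (one statement per gate, then a scalar aggregation) can be sketched in Python. This is a minimal illustration under stated assumptions, not the actual fn_rp_ui_deploy_final_readiness_guard: the table names are hypothetical placeholders, only three gates are modeled, and the in-memory sqlite3 database merely stands in for Postgres (it does not reproduce the planner blow-up). The same loop is also what querying each gate separately and combining in the application looks like from the client side.

```python
import sqlite3

# Hypothetical gate queries. Each stands in for a SELECT against ONE deep
# contract view; the table names are placeholders, not real objects.
GATE_QUERIES = {
    "guard_status": "SELECT status FROM contract_guard LIMIT 1",
    "smoke_passed": "SELECT COUNT(*) FROM smoke_probe WHERE ok = 1",
    "smoke_total": "SELECT COUNT(*) FROM smoke_probe",
    "afg_passed": "SELECT COUNT(*) FROM anti_false_green WHERE ok = 1",
    "afg_total": "SELECT COUNT(*) FROM anti_false_green",
}


def run_gates_separately(conn, queries):
    """Execute each gate as its own statement and keep one scalar per gate.

    No statement references more than one gate source, so each is planned
    independently; the combining happens afterwards, in plain Python.
    """
    results = {}
    for name, sql in queries.items():
        row = conn.execute(sql).fetchone()
        results[name] = row[0] if row is not None else None
    return results


def all_gates_pass(r):
    """Aggregate the scalars: every gate must pass in full."""
    return (
        r["guard_status"] == "PASS"
        and r["smoke_total"] > 0
        and r["smoke_passed"] == r["smoke_total"]
        and r["afg_total"] > 0
        and r["afg_passed"] == r["afg_total"]
    )


if __name__ == "__main__":
    # In-memory demo data shaped like the reported result (15/15 and 6/6).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE contract_guard (status TEXT)")
    conn.execute("CREATE TABLE smoke_probe (ok INTEGER)")
    conn.execute("CREATE TABLE anti_false_green (ok INTEGER)")
    conn.execute("INSERT INTO contract_guard VALUES ('PASS')")
    conn.executemany("INSERT INTO smoke_probe VALUES (?)", [(1,)] * 15)
    conn.executemany("INSERT INTO anti_false_green VALUES (?)", [(1,)] * 6)

    results = run_gates_separately(conn, GATE_QUERIES)
    print(results, all_gates_pass(results))
```

The generated_v2 cardinality and autoscale checks would slot in the same way, as further entries in GATE_QUERIES.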
The naive single-statement guard (section 7 of 01_apply.sql) was neutralized in place and marked SUPERSEDED / DO NOT RUN so no operator can recreate the landmine. The crash-safe version lives in 04_deploy_guard_fn.sql.
Guard result
- verdict UI_DEPLOY_BLOCKED_BY_GIT
- db_side_ready true (guard PASS, smoke 15/15, anti-false-green 6/6, generated_v2 cardinality 87/87)
- generator_verdict GENERATOR_PARITY_PASS_OPERATOR_REPLACE_REQUIRED
- Verdict ladder: UI_DEPLOY_READY / UI_DEPLOY_READY_WITH_GENERATOR_PENDING / UI_DEPLOY_BLOCKED_BY_GENERATOR / UI_DEPLOY_BLOCKED_BY_GIT (selected) / FAIL. UI_DEPLOY_BLOCKED_BY_GIT was selected because every DB-side gate passes and the only remaining blocker is the operator git push. The generator is NOT a UI blocker, because the UI binds _current, never generated.
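The ladder can be read as an ordered series of checks where the first failing check decides the verdict. The sketch below is a hypothetical Python rendering, not the guard's actual logic: the field names, the git_pushed and generator_replaced flags, and especially the rung order (DB gates, then generator parity, then git, then the pending generator replacement) are assumptions, chosen so that the reported state resolves to UI_DEPLOY_BLOCKED_BY_GIT. UI_DEPLOY_BLOCKED_BY_GENERATOR is assumed to mean a parity failure, since a parity pass that only awaits operator replacement is not a UI blocker.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GateCount:
    """A gate reported as passed/total, e.g. smoke 15/15."""
    passed: int
    total: int

    @property
    def ok(self) -> bool:
        return self.total > 0 and self.passed == self.total


@dataclass(frozen=True)
class ReadinessInputs:
    # Hypothetical field names mirroring the values reported above.
    guard_status: str             # contract guard, e.g. "PASS"
    smoke: GateCount              # smoke probe checks
    anti_false_green: GateCount   # anti-false-green regression checks
    generated_v2: GateCount       # generated_v2 cardinality, actual vs expected
    generator_parity_pass: bool   # GENERATOR_PARITY_PASS in generator_verdict
    generator_replaced: bool      # hypothetical: operator replacement done
    git_pushed: bool              # hypothetical: operator git push done


def db_side_ready(i: ReadinessInputs) -> bool:
    return (
        i.guard_status == "PASS"
        and i.smoke.ok
        and i.anti_false_green.ok
        and i.generated_v2.ok
    )


def select_verdict(i: ReadinessInputs) -> str:
    # ASSUMED rung order; the first failing check decides the verdict.
    if not db_side_ready(i):
        return "FAIL"
    if not i.generator_parity_pass:
        return "UI_DEPLOY_BLOCKED_BY_GENERATOR"
    if not i.git_pushed:
        return "UI_DEPLOY_BLOCKED_BY_GIT"
    if not i.generator_replaced:
        return "UI_DEPLOY_READY_WITH_GENERATOR_PENDING"
    return "UI_DEPLOY_READY"


# The reported state: all DB-side gates pass, parity passes, the generator
# replacement is still pending and the git push is outstanding.
current = ReadinessInputs(
    guard_status="PASS",
    smoke=GateCount(15, 15),
    anti_false_green=GateCount(6, 6),
    generated_v2=GateCount(87, 87),
    generator_parity_pass=True,
    generator_replaced=False,
    git_pushed=False,
)
print(select_verdict(current))  # UI_DEPLOY_BLOCKED_BY_GIT

# Under the assumed order, pushing alone moves the verdict one rung up.
after_push = replace(current, git_pushed=True)
print(select_verdict(after_push))  # UI_DEPLOY_READY_WITH_GENERATOR_PENDING
```

Keeping the rungs as an explicit ordered sequence makes the "why this verdict" question answerable by reading which check returned first.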
Lesson / gotcha for future sessions
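Never combine multiple deep RP contract-derived views in ONE SQL statement on this VPS: planning alone runs out of memory (OOM kill with signal 9, then crash recovery), even under plain EXPLAIN. Combine them through a plpgsql function that runs each as a separate statement, or query each separately and combine the results in the application. Do not substitute a LANGUAGE sql function: a simple SQL function can be inlined into the caller's plan, which would bring the combined expansion back.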
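The rule can also be checked before a statement ever reaches the server. A minimal sketch, assuming a hand-maintained list of deep contract-derived view names (the names below are hypothetical placeholders): a textual pre-flight check that refuses any single statement naming more than one of them. It only sees direct references, so the list must cover every view built on top of the deep stack, not just the stack itself; it also matches names inside comments or string literals, which errs on the side of refusing.

```python
import re

# Hypothetical placeholder names: replace with every view built on the deep
# RP contract view-stack (the real names are not listed here).
DEEP_CONTRACT_VIEWS = frozenset({
    "v_hypothetical_contract_guard",
    "v_hypothetical_smoke_probe",
    "v_hypothetical_anti_false_green",
    "v_hypothetical_autoscale_dashboard",
})

# Unquoted SQL identifiers; schema-qualified names split into parts,
# so "public.v_x" still yields "v_x".
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def deep_views_referenced(sql: str) -> set[str]:
    """Return the deep contract views named anywhere in one SQL statement.

    Purely textual: names inside comments or string literals also count.
    """
    tokens = {tok.lower() for tok in _IDENTIFIER.findall(sql)}
    return tokens & DEEP_CONTRACT_VIEWS


def check_statement(sql: str) -> None:
    """Raise ValueError if one statement names more than one deep view."""
    hits = deep_views_referenced(sql)
    if len(hits) > 1:
        raise ValueError(
            f"statement references {len(hits)} deep contract views "
            f"({', '.join(sorted(hits))}); run each as a separate statement"
        )


check_statement("SELECT * FROM v_hypothetical_smoke_probe")  # one view: allowed
try:
    check_statement(
        "SELECT * FROM v_hypothetical_smoke_probe s "
        "CROSS JOIN public.v_hypothetical_contract_guard g"
    )
except ValueError as exc:
    print(exc)
```

A check like this complements the plpgsql combiner rather than replacing it: it stops an operator from reintroducing a combined statement, while the function remains the supported way to get one aggregated result.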