KB-27CA

Revision 1

rp, deploy-guard, oom-incident, crash, 2026-06-05

13 — Deploy Final Readiness Guard + OOM Crash Incident (Phase 14)

CRITICAL OPERATIONAL INCIDENT — 4 production Postgres OOM crashes (auto-recovered, no data loss)

The first version of the combined deploy readiness guard was a single view that combined four heavy sources in ONE SQL statement: the contract guard; the smoke probe (which references the deep contract view-stack ~15 times); the anti-false-green regression; and the autoscale dashboard (which pulls the validation summary, an md5 of the generator function, and the manifest chain). To plan that single statement, the PLANNER must expand the deep contract view-stack many times over. On this VPS that exhausted memory, and the Linux OOM killer sent signal 9 (SIGKILL) to the backend. The postmaster treats a signal-9 backend death as a crash: it terminates the remaining backends and runs crash recovery.

The crash was reproduced FOUR times during remediation, including once under EXPLAIN alone (without ANALYZE), which only plans and never executes. That confirms the cost is planner-side (plan expansion), not execution-side. Each crash dropped all production connections (directus, nuxt) for a few seconds; Postgres replayed WAL and recovered cleanly each time. NO data was lost: birth_registry, axis_registry, governance ownership, trigger_guard, and all committed DDL survived every recovery (verified after each one).

Root cause

This VPS cannot plan a single statement that references the deep RP contract view-stack alongside another deep-contract view. Each such view is fine ALONE (each one was verified to query successfully on its own); the failure is the combinatorial plan expansion when several are combined in one statement.

Fix — crash-safe plpgsql combiner

The deploy guard is now fn_rp_ui_deploy_final_readiness_guard (plpgsql, STABLE, read-only), wrapped by the view v_rp_ui_deploy_final_readiness_guard. The function runs each gate as a SEPARATE statement (guard; smoke; anti-false-green; generated_v2 cardinality; autoscale verdict). Each statement is planned independently, with bounded cost, at execution time, and the function then aggregates the resulting scalars. To the calling query, a plpgsql function scan in FROM is opaque (plpgsql bodies are never inlined), so the caller never builds one giant plan. Verified: the function-based guard returns cleanly, repeatedly, with no crash.
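A minimal sketch of the separate-statement pattern, not the contents of 04_deploy_guard_fn.sql. Every object name here (the demo_ prefix, the three stand-in gate views, the gate_pass column) is hypothetical. Trivial stand-in views replace the real deep contract views so the block can be run in a scratch database without touching production objects.

```sql
-- Hypothetical stand-ins for three of the gates; the real gate views are far deeper.
CREATE VIEW demo_gate_guard            AS SELECT true AS gate_pass;
CREATE VIEW demo_gate_smoke            AS SELECT true AS gate_pass;
CREATE VIEW demo_gate_anti_false_green AS SELECT true AS gate_pass;

CREATE FUNCTION demo_fn_readiness_guard()
RETURNS TABLE (guard_ok boolean, smoke_ok boolean, afg_ok boolean, db_side_ready boolean)
LANGUAGE plpgsql
STABLE
AS $$
DECLARE
    v_guard boolean;
    v_smoke boolean;
    v_afg   boolean;
BEGIN
    -- Each SELECT below is a separate statement with its own plan,
    -- so no single plan has to expand more than one gate view.
    SELECT g.gate_pass INTO v_guard FROM demo_gate_guard g;
    SELECT s.gate_pass INTO v_smoke FROM demo_gate_smoke s;
    SELECT a.gate_pass INTO v_afg   FROM demo_gate_anti_false_green a;

    -- Aggregate the scalars; a NULL (missing row) counts as not ready.
    RETURN QUERY
    SELECT v_guard, v_smoke, v_afg,
           COALESCE(v_guard AND v_smoke AND v_afg, false);
END;
$$;

-- Thin wrapper view: callers see one row and plan only a function scan.
CREATE VIEW demo_v_readiness_guard AS
SELECT * FROM demo_fn_readiness_guard();

SELECT * FROM demo_v_readiness_guard;
```

The choice of plpgsql matters in this sketch: Postgres can inline simple LANGUAGE sql functions into the calling query, which would reintroduce the single large plan, whereas plpgsql function bodies are never inlined.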

The naive single-statement guard (section 7 of 01_apply.sql) was neutralized in place and marked SUPERSEDED / DO NOT RUN so no operator can recreate the landmine. The crash-safe version lives in 04_deploy_guard_fn.sql.

Guard result

verdict: UI_DEPLOY_BLOCKED_BY_GIT
db_side_ready: true (guard PASS, smoke 15/15, anti-false-green 6/6, generated_v2 cardinality 87/87)
generator_verdict: GENERATOR_PARITY_PASS_OPERATOR_REPLACE_REQUIRED

Verdict ladder: UI_DEPLOY_READY / UI_DEPLOY_READY_WITH_GENERATOR_PENDING / UI_DEPLOY_BLOCKED_BY_GENERATOR / UI_DEPLOY_BLOCKED_BY_GIT (selected) / FAIL. BLOCKED_BY_GIT was selected because every DB-side gate passes and the only remaining blocker is the operator git push. The generator is NOT a UI blocker: the UI binds _current, never generated.
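One way such a ladder could be expressed is a CASE over a few boolean inputs. This is a sketch under assumptions, not the guard's actual logic: the input names git_pushed and generator_pending are hypothetical, and the ordering of the rungs is a guess. The source does not say when UI_DEPLOY_BLOCKED_BY_GENERATOR applies, so that rung is left out rather than invented.

```sql
-- Hypothetical inputs; only the BLOCKED_BY_GIT condition is described in the source
-- (all DB-side gates pass, operator git push still outstanding).
SELECT CASE
           WHEN NOT db_side_ready THEN 'FAIL'
           WHEN NOT git_pushed    THEN 'UI_DEPLOY_BLOCKED_BY_GIT'
           WHEN generator_pending THEN 'UI_DEPLOY_READY_WITH_GENERATOR_PENDING'
           ELSE 'UI_DEPLOY_READY'
       END AS verdict
FROM (VALUES (true, false, true)) AS t(db_side_ready, git_pushed, generator_pending);
```

With the sample row above (DB side ready, git push outstanding), the expression yields UI_DEPLOY_BLOCKED_BY_GIT, matching the reported result.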

Lesson / gotcha for future sessions

Never combine multiple deep RP contract-derived views in ONE SQL statement on this VPS: the planner exhausts memory and the backend is OOM-killed (signal 9, followed by crash recovery), even under EXPLAIN. Combine them via a plpgsql function that runs each as a separate statement, or query each separately and combine the results in the application.
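Following that rule, the guard itself would be read in a statement of its own. The column names below are taken from the Guard result listing and are assumed, not confirmed, to be the view's actual columns.

```sql
-- Read the guard in its own statement; do not join it to other
-- contract-derived views in the same statement.
SELECT verdict, db_side_ready, generator_verdict
FROM v_rp_ui_deploy_final_readiness_guard;
```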