97 — Phase 1 Rollback & Disaster Recovery Runbook (no build, 2026-06-01)
Mission §8 (Branch E). Tier: rollback/DR procedures for Phase 1 build steps. Mutation footprint: ZERO (runbook only — the SQL below is staged, not executed by this mission). All rollback SQL must itself run with the three timeouts set and be reviewed before use. Golden rule: prefer in-transaction ROLLBACK (no residue) over post-commit reversal; prefer retire/DELETE-by-origin over destructive deletes on shared reuse-tables; prefer DROP only on greenfield tables this Phase created.
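The "three timeouts" are not named in this section. A minimal sketch of a rollback-script preamble, assuming they map to PostgreSQL's statement_timeout, lock_timeout, and idle_in_transaction_session_timeout; both that mapping and the values shown are illustrative assumptions, not the mission's mandated settings:

```sql
BEGIN;

-- ASSUMPTION: these three settings are the "three timeouts"; values are placeholders.
-- SET LOCAL scopes each setting to this transaction only.
SET LOCAL statement_timeout = '30s';                     -- abort any single statement that runs too long
SET LOCAL lock_timeout = '5s';                           -- fail fast instead of queueing behind a lock
SET LOCAL idle_in_transaction_session_timeout = '60s';   -- end the session if it sits idle mid-transaction

-- Confirm the values actually in effect before running any reversal statement.
SELECT current_setting('statement_timeout')                   AS statement_timeout,
       current_setting('lock_timeout')                        AS lock_timeout,
       current_setting('idle_in_transaction_session_timeout') AS idle_in_txn_timeout;

-- ... reviewed rollback statements would go here ...

ROLLBACK;  -- rehearsal default; the settings revert when the transaction ends
```

Using SET LOCAL rather than SET keeps the timeouts from leaking into later work on the same connection.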
97.1 Reversal-mode decision tree
Where is the step?
├─ Still inside the open transaction (not committed) ──────────► §97.2 IN-TRANSACTION ROLLBACK (cleanest)
├─ Committed, isolated, no downstream wrote against it ────────► §97.3 POST-COMMIT STEP REVERSAL
├─ Committed, a later step/FK child depends on it ─────────────► §97.4 ORDERED MULTI-STEP REVERSAL (child first)
└─ Unknown/uncertain state, guard tripped, or counts wrong ────► §97.10 DISASTER STOP + §97.8 restore-from-dump
97.2 Normal rollback within the transaction (preferred)
For any step still open (the rehearsal mode, or a COMMIT that has not yet been issued):
ROLLBACK;
Then prove entry==exit (doc 91 template): the step's target ABSENT again (greenfield), or apr_action_types back to 6 / birth_apr_rows back to 0 (SB-1), trigger restored to its pre-step def, event_outbox governance = 0, idle_in_transaction = 0, no session left open. Proven safe for every Phase-1 step in rehearsal (docs 58–62, 75–79, 83). No DR needed beyond this if you never committed.
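The entry==exit property for a greenfield target rests on PostgreSQL's transactional DDL. A small self-contained sketch of that behaviour, using a hypothetical probe table name (zz_rollback_rehearsal_probe is not part of any Phase-1 step):

```sql
-- Entry state: the probe table should not exist.
SELECT to_regclass('public.zz_rollback_rehearsal_probe') IS NULL AS absent_at_entry;

BEGIN;
-- HYPOTHETICAL stand-in for a greenfield CREATE issued by a build step.
CREATE TABLE public.zz_rollback_rehearsal_probe (id integer PRIMARY KEY);

-- Inside the open transaction the table is visible to this session.
SELECT to_regclass('public.zz_rollback_rehearsal_probe') IS NOT NULL AS present_in_txn;
ROLLBACK;

-- Exit state: the CREATE was undone together with the transaction.
SELECT to_regclass('public.zz_rollback_rehearsal_probe') IS NULL AS absent_at_exit;
```

The same to_regclass check, pointed at the real target of a step, is what "target ABSENT again" amounts to.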
97.3 Rollback after a partial / committed step
Reverse exactly one step, by component. Run each in its own reviewed transaction with the timeouts set; re-verify (doc 96) after.
SB-12 (STEP 1)
BEGIN;
DELETE FROM evolution_snapshots WHERE <origin key for the governance ruleset fingerprint row>;
DROP TABLE IF EXISTS governance_ruleset;
COMMIT; -- only if reversal is itself authorized; else ROLLBACK after proving the script
SB-13 (STEP 2)
BEGIN;
DELETE FROM queue_heartbeat WHERE executor_name LIKE 'gov\_%';
DROP TABLE IF EXISTS gov_worker_cursor;
COMMIT;
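The backslash in 'gov\_%' matters: an unescaped underscore in LIKE matches any single character, so 'gov_%' would also catch names that merely start with "gov". A read-only sketch for previewing the scope before the DELETE, assuming standard_conforming_strings is on (the PostgreSQL default) so the backslash reaches LIKE as its escape character; the sample names in the second query are made up:

```sql
-- Preview exactly which heartbeat rows the SB-13 DELETE would remove.
SELECT executor_name
FROM queue_heartbeat
WHERE executor_name LIKE 'gov\_%'
ORDER BY executor_name;

-- Illustration with made-up names: escaped vs. unescaped underscore.
SELECT 'gov_worker' LIKE 'gov\_%' AS escaped_matches_gov_prefix,   -- true
       'governor'   LIKE 'gov\_%' AS escaped_matches_governor,     -- false
       'governor'   LIKE 'gov_%'  AS unescaped_matches_governor;   -- true
```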
SB-10 (STEP 3)
BEGIN;
DROP TABLE IF EXISTS candidate_scan_run; -- FK child / sibling first
DROP TABLE IF EXISTS governance_candidate_object; -- optional object table, if it was materialized
DROP TABLE IF EXISTS governance_candidate_state; -- references governance_ruleset (SB-12) — leave SB-12 intact unless also reverting
COMMIT;
SB-11 (STEP 4)
BEGIN;
DELETE FROM event_type_registry WHERE event_domain='governance'; -- rows were active=false ⇒ nothing emitted to unwind
COMMIT;
SB-2 (STEP 5)
BEGIN;
DROP VIEW IF EXISTS v_object_owner_gap;
DROP VIEW IF EXISTS v_object_effective_owner;
DROP TABLE IF EXISTS governance_object_ownership; -- child of governance_responsibility_scope (FK scope)
DROP TABLE IF EXISTS governance_responsibility_scope; -- parent last
COMMIT;
SB-1 (STEP 6) — special: retire, do NOT delete; keep the trigger fix
BEGIN;
-- action-types are FK-referenced by approval_requests.proposed_action_code (RESTRICT) ⇒ NEVER DELETE
UPDATE apr_action_types
SET status='retired', retired_at=now()
WHERE action_code IN ('assign_governance_owner','grant_governance_exception','delegate_authority','assign_axis_owner')
AND status='active';
COMMIT;
-- The F-83-1 birth-trigger fix (fn_birth_registry_auto('action_code')) is the CORRECT wiring and is KEPT.
-- Do NOT revert the trigger to the broken no-arg def (that would re-break all future apr_action_types inserts). See doc 98 §98.5.
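Before retiring, it can help to see whether any approval request already references the four governance action codes, since that is what makes DELETE unsafe. A read-only sketch using only the table and column named above:

```sql
-- Count existing approval_requests per governance action code.
-- Any non-zero count is a row the RESTRICT foreign key would protect.
SELECT proposed_action_code,
       count(*) AS referencing_requests
FROM approval_requests
WHERE proposed_action_code IN (
        'assign_governance_owner',
        'grant_governance_exception',
        'delegate_authority',
        'assign_axis_owner')
GROUP BY proposed_action_code
ORDER BY proposed_action_code;
```

An empty result does not change the rule: retirement remains the prescribed reversal either way.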
97.4 DROP order (greenfield) and DELETE order (governance-keyed rows)
DROP order — always child/dependent first, parent last:
- Views (v_object_owner_gap, v_object_effective_owner).
- FK children / sibling run-tables (candidate_scan_run, optional governance_candidate_object).
- FK-referencing tables (governance_candidate_state → ruleset; governance_object_ownership → scope).
- Parents (governance_responsibility_scope, governance_ruleset, gov_worker_cursor).
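The child-first order can be cross-checked against the catalog rather than taken on trust. A read-only sketch that lists every foreign key pointing at the two parent tables; it assumes the tables live in the public schema:

```sql
-- Which tables hold a foreign key into the Phase-1 parent tables?
-- Every child_table listed must be dropped before its parent_table.
SELECT c.conrelid::regclass  AS child_table,
       c.confrelid::regclass AS parent_table,
       c.conname             AS constraint_name
FROM pg_constraint AS c
WHERE c.contype = 'f'   -- foreign-key constraints only
  AND c.confrelid IN (
        to_regclass('public.governance_ruleset'),
        to_regclass('public.governance_responsibility_scope'))
ORDER BY parent_table, child_table;
```

to_regclass returns NULL for a table that does not exist, so the query is safe to run even after part of the reversal has already happened.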
DELETE order on shared reuse-tables — by governance-scoped key only, never blanket:
- event_type_registry: WHERE event_domain='governance'.
- queue_heartbeat: WHERE executor_name LIKE 'gov\_%'.
- evolution_snapshots: by the specific origin/fingerprint key of the governance ruleset row.
- dot_tools / dot_domains / dot_coverage_required (only relevant post-Phase-1 T6/T7): WHERE _dot_origin / domain LIKE 'governance.%'; not Phase 1.
- apr_action_types: retire, never DELETE (FK RESTRICT).
97.5 Trigger restoration
- In-flight: ROLLBACK restores trg_birth_apr_action_types automatically (transactional DDL; proven doc 83 §83.3).
- Post-commit: the F-83-1 fix is the intended permanent state; do not restore the broken no-arg def. If, and only if, an emergency requires reverting the trigger DDL itself (e.g. it was applied wrong), capture the exact original def first (SELECT pg_get_triggerdef(oid) …) and re-apply that captured text; understand that the broken def will block all apr_action_types inserts (doc 98 §98.5). Expect the [TRIGGER-GUARD] WARNING on any DROP/CREATE TRIGGER.
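A read-only sketch of the capture step, assuming the trigger name used elsewhere in this runbook; the extra columns are included only to make the saved record easier to audit:

```sql
-- Capture the current trigger definition verbatim before touching any trigger DDL.
-- Save the output outside the database (ticket, finding, or reviewed file).
SELECT t.tgname                 AS trigger_name,
       t.tgrelid::regclass      AS on_table,
       t.tgenabled              AS enabled_state,   -- 'O' = enabled (origin), 'D' = disabled
       pg_get_triggerdef(t.oid) AS trigger_def
FROM pg_trigger AS t
WHERE t.tgname = 'trg_birth_apr_action_types'
  AND NOT t.tgisinternal;
```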
97.6 Event-type rollback
Governance event types are registered active=false and nothing is ever emitted in Phase 1 (event_outbox governance = 0). Rollback is therefore a clean DELETE FROM event_type_registry WHERE event_domain='governance' with no outbox unwind. If — in violation of Phase 1 — an event had been emitted, that is an incident: do not delete the type while event_outbox holds governance rows; quarantine, investigate emit source, then reconcile.
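The "do not delete while event_outbox holds governance rows" rule can be enforced inside the reversal script itself. A hedged sketch: the guard block and its error message are illustrative, and the script ends in ROLLBACK as a rehearsal default:

```sql
BEGIN;

-- Guard: abort the whole transaction if any governance event was ever emitted.
DO $$
DECLARE
    emitted bigint;
BEGIN
    SELECT count(*) INTO emitted
    FROM event_outbox
    WHERE event_domain = 'governance';

    IF emitted > 0 THEN
        RAISE EXCEPTION
            'event_outbox holds % governance row(s): treat as incident, do not delete event types',
            emitted;
    END IF;
END
$$;

-- Reached only when the outbox is clean.
DELETE FROM event_type_registry WHERE event_domain = 'governance';

ROLLBACK;  -- switch to COMMIT only when the reversal itself is authorized
```

Because the exception aborts the transaction, the DELETE cannot run by accident once the guard has tripped.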
97.7 APR action-type rollback
As §97.3 SB-1: retire (status='retired', retired_at=now()), never delete; the FK from approval_requests.proposed_action_code is RESTRICT. A retired action-type stays in the table (vocabulary history) but is non-active; any future APR referencing it is blocked at the lifecycle/quorum layer. Keep the F-83-1 trigger fix.
97.8 Backup-restore condition (last resort)
Restore from the pre-COMMIT pg_dump only when targeted reversal (§97.3–§97.7) cannot cleanly return a reuse-table to baseline — e.g. a shared reuse-table (event_type_registry, apr_action_types, evolution_snapshots, queue_heartbeat) was mutated beyond the governance-scoped rows, or a count cannot be reconciled. Restore is table-scoped (pg_restore/psql the dumped table), never a full-DB rollback that would lose unrelated production traffic (birth_registry/event_outbox grow organically). Restore is itself a governed mutation — it needs its own authorization and a re-verify after.
97.9 Verification queries after any rollback
-- the reverted step's target ABSENT (greenfield) or vocabulary back to baseline
SELECT to_regclass('public.<reverted_table>'); -- expect NULL after DROP
SELECT count(*) FROM apr_action_types; -- 6 (or 10 with 4 retired if SB-1 retire-not-delete)
SELECT count(*) FROM apr_action_types WHERE status='active'; -- 6 after SB-1 retire
SELECT count(*) FROM event_type_registry WHERE event_domain='governance'; -- 0 after SB-11 delete
SELECT count(*) FROM event_outbox WHERE event_domain='governance'; -- 0 (MUST)
SELECT count(*) FROM governance_relations; -- 8 (unchanged throughout)
SELECT count(*) FROM os_proposal_approvals; -- unchanged (rollback never writes this)
SELECT count(*) FROM pg_stat_activity WHERE datname='directus' AND state='idle in transaction'; -- 0
SELECT pg_get_triggerdef(oid) FROM pg_trigger WHERE tgname='trg_birth_apr_action_types' AND NOT tgisinternal; -- fixed def kept post-SB-1
Post-rollback counts MUST equal the pre-build baseline for the reverted scope (doc 96 §96.1), allowing organic growth only on birth_registry/event_outbox totals.
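For a quick pass/fail view, the fixed expectations above can be folded into one read-only query. This is a convenience sketch, not a replacement for the per-query checks; the expected values are copied from the comments above and the column aliases are made up:

```sql
-- Each column is true when the corresponding baseline expectation holds.
SELECT
  (SELECT count(*) FROM apr_action_types WHERE status = 'active') = 6
      AS active_action_types_ok,
  (SELECT count(*) FROM event_type_registry WHERE event_domain = 'governance') = 0
      AS governance_event_types_gone,
  (SELECT count(*) FROM event_outbox WHERE event_domain = 'governance') = 0
      AS outbox_clean,
  (SELECT count(*) FROM governance_relations) = 8
      AS governance_relations_unchanged,
  (SELECT count(*) FROM pg_stat_activity
    WHERE datname = 'directus' AND state = 'idle in transaction') = 0
      AS no_idle_in_transaction;
```

Some columns apply only to specific reverted steps (for example, governance_event_types_gone only after an SB-11 reversal), so a false value needs to be read against the scope that was actually rolled back.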
97.10 Disaster stop condition & who may restart
Disaster = any of: a COMMIT happened without a matching os_proposal_approvals row; an event was emitted (event_outbox governance > 0); a handler_ref was flipped to a real handler; governance_relations was ALTERed; a session is stuck idle-in-transaction and cannot be cleanly closed; counts cannot be reconciled to baseline; the TRIGGER-GUARD escalated to ERROR mid-build.
On disaster: halt all build activity; do not issue further DDL/DML; write a finding citing the exact divergence; quarantine the session; if a shared reuse-table is corrupted, restore that table from pg_dump (§97.8) under fresh authorization.
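Identifying a stuck session is a read-only operation and does not violate the "no further DDL/DML" rule. A sketch of a listing that could be attached to the finding, using standard pg_stat_activity columns; the 120-character truncation is an arbitrary choice:

```sql
-- List sessions sitting idle inside an open transaction, oldest first.
SELECT pid,
       usename,
       application_name,
       xact_start,                              -- when the open transaction began
       now() - state_change AS idle_for,        -- how long it has been idle
       left(query, 120)     AS last_statement   -- last statement the session ran
FROM pg_stat_activity
WHERE datname = 'directus'
  AND state = 'idle in transaction'
ORDER BY xact_start;
```

The listing only observes; whether and how a session is closed stays a decision for whoever holds the authorization described below.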
Who/what may restart: only after (a) the divergence is documented and reconciled, (b) the sovereign re-confirms (or revokes) the os_proposal_approvals authorization for the affected step, and (c) the relevant council record is re-checked. No agent or GPT may self-authorize a restart. Restart resumes at STEP 0 (preflight, doc 96) for the affected step, never mid-step.
Branch E verdict: the runbook covers in-transaction rollback (preferred), per-component post-commit reversal, DROP order (child→parent) and DELETE-by-governance-key order, trigger restoration (keep the F-83-1 fix), event-type and APR-action-type rollback (retire-not-delete), table-scoped backup restore, post-rollback verification, and an explicit disaster stop with a sovereign-gated restart. All SQL staged, none executed; zero mutation by this mission.