KB-3B71

97 — Phase 1 Rollback & Disaster Recovery Runbook (no build, 2026-06-01)

Revision 1

Mission §8 (Branch E). Tier: rollback/DR procedures for Phase 1 build steps. Mutation footprint: ZERO (runbook only — the SQL below is staged, not executed by this mission). All rollback SQL must itself run with the three timeouts set and be reviewed before use. Golden rule: prefer in-transaction ROLLBACK (no residue) over post-commit reversal; prefer retire/DELETE-by-origin over destructive deletes on shared reuse-tables; prefer DROP only on greenfield tables this Phase created.
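The "three timeouts" are not named in this runbook. A minimal sketch of setting them per transaction, assuming they are PostgreSQL's statement_timeout, lock_timeout and idle_in_transaction_session_timeout; the values shown are illustrative placeholders, not mandated limits:

```sql
BEGIN;
  -- Assumption: these are the three timeouts the runbook refers to.
  -- Values are placeholders; use the limits agreed for the reviewed script.
  SET LOCAL statement_timeout = '30s';
  SET LOCAL lock_timeout = '5s';
  SET LOCAL idle_in_transaction_session_timeout = '60s';

  -- Confirm what is in force before any reversal statement runs.
  SELECT current_setting('statement_timeout')                   AS statement_timeout,
         current_setting('lock_timeout')                        AS lock_timeout,
         current_setting('idle_in_transaction_session_timeout') AS idle_in_tx_timeout;

  -- ... reviewed rollback SQL goes here ...
ROLLBACK;  -- rehearsal default; COMMIT only under authorization
```

SET LOCAL scopes each setting to the current transaction, so nothing persists in the session once the transaction ends.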


97.1 Reversal-mode decision tree

Where is the step?
├─ Still inside the open transaction (not committed) ──────────► §97.2 IN-TRANSACTION ROLLBACK (cleanest)
├─ Committed, isolated, no downstream wrote against it ────────► §97.3 POST-COMMIT STEP REVERSAL
├─ Committed, a later step/FK child depends on it ─────────────► §97.4 ORDERED MULTI-STEP REVERSAL (child first)
└─ Unknown/uncertain state, guard tripped, or counts wrong ────► §97.10 DISASTER STOP + §97.8 restore-from-dump
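One way to help choose between the second and third branches is to list the foreign keys that reference the table a step created. The read-only sketch below uses governance_ruleset as the example parent. It only reveals declared FK dependencies, so rows written by later steps without an FK still need a separate check:

```sql
-- Read-only: list foreign keys that point at the table a step created.
-- to_regclass returns NULL when the table is absent, so the query
-- returns no rows instead of raising an error.
SELECT c.conname             AS fk_name,
       c.conrelid::regclass  AS child_table,
       c.confrelid::regclass AS parent_table
  FROM pg_constraint c
 WHERE c.contype = 'f'
   AND c.confrelid = to_regclass('public.governance_ruleset')
 ORDER BY c.conrelid::regclass::text;
```

No rows is consistent with an isolated step (§97.3); any row means the child table must be reversed first (§97.4).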

97.2 Normal rollback within the transaction (preferred)

For any step whose transaction is still open (rehearsal mode, or a build step whose COMMIT has not yet been issued):

ROLLBACK;

Then prove entry==exit (doc 91 template): the step's target ABSENT again (greenfield), or apr_action_types back to 6 / birth_apr_rows back to 0 (SB-1), trigger restored to its pre-step def, event_outbox governance = 0, idle_in_transaction = 0, no session left open. Proven safe for every Phase-1 step in rehearsal (docs 58–62, 75–79, 83). No DR needed beyond this if you never committed.
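The entry==exit property relies on PostgreSQL DDL being transactional. A minimal sketch of that behavior, meant for a scratch database; rehearsal_probe is a hypothetical table name that is not part of Phase 1:

```sql
BEGIN;
  -- Hypothetical greenfield table, created only to demonstrate the pattern.
  CREATE TABLE public.rehearsal_probe (id int PRIMARY KEY);

  -- Inside the open transaction the table is visible to this session.
  SELECT to_regclass('public.rehearsal_probe') IS NOT NULL AS present_inside_tx;
ROLLBACK;

-- After ROLLBACK the table is ABSENT again: no residue, nothing to reverse.
SELECT to_regclass('public.rehearsal_probe') IS NULL AS absent_after_rollback;
```

Both checks are expected to return true. The second is the same to_regclass test that §97.9 applies to the real step targets.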

97.3 Rollback after a partial / committed step

Reverse exactly one step, by component. Run each in its own reviewed transaction with the timeouts set; re-verify (doc 96) after.

SB-12 (STEP 1)

BEGIN;
  DELETE FROM evolution_snapshots WHERE <origin key for the governance ruleset fingerprint row>;
  DROP TABLE IF EXISTS governance_ruleset;
COMMIT;   -- only if reversal is itself authorized; else ROLLBACK after proving the script

SB-13 (STEP 2)

BEGIN;
  DELETE FROM queue_heartbeat WHERE executor_name LIKE 'gov\_%';
  DROP TABLE IF EXISTS gov_worker_cursor;
COMMIT;

SB-10 (STEP 3)

BEGIN;
  DROP TABLE IF EXISTS candidate_scan_run;            -- FK child / sibling first
  DROP TABLE IF EXISTS governance_candidate_object;   -- optional object table, if it was materialized
  DROP TABLE IF EXISTS governance_candidate_state;    -- references governance_ruleset (SB-12) — leave SB-12 intact unless also reverting
COMMIT;

SB-11 (STEP 4)

BEGIN;
  DELETE FROM event_type_registry WHERE event_domain='governance';   -- rows were active=false ⇒ nothing emitted to unwind
COMMIT;

SB-2 (STEP 5)

BEGIN;
  DROP VIEW IF EXISTS v_object_owner_gap;
  DROP VIEW IF EXISTS v_object_effective_owner;
  DROP TABLE IF EXISTS governance_object_ownership;     -- child of governance_responsibility_scope (FK scope)
  DROP TABLE IF EXISTS governance_responsibility_scope; -- parent last
COMMIT;

SB-1 (STEP 6) — special: retire, do NOT delete; keep the trigger fix

BEGIN;
  -- action-types are FK-referenced by approval_requests.proposed_action_code (RESTRICT) ⇒ NEVER DELETE
  UPDATE apr_action_types
     SET status='retired', retired_at=now()
   WHERE action_code IN ('assign_governance_owner','grant_governance_exception','delegate_authority','assign_axis_owner')
     AND status='active';
COMMIT;
-- The F-83-1 birth-trigger fix (fn_birth_registry_auto('action_code')) is the CORRECT wiring and is KEPT.
-- Do NOT revert the trigger to the broken no-arg def (that would re-break all future apr_action_types inserts). See doc 98 §98.5.

97.4 DROP order (greenfield) and DELETE order (governance-keyed rows)

DROP order — always child/dependent first, parent last:

  1. Views (v_object_owner_gap, v_object_effective_owner).
  2. FK children / sibling run-tables (candidate_scan_run, optional governance_candidate_object).
  3. FK-referencing tables (governance_candidate_state → ruleset; governance_object_ownership → scope).
  4. Parents (governance_responsibility_scope, governance_ruleset, gov_worker_cursor).
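Before dropping a table it can help to confirm which views still depend on it, since those must go first. A read-only sketch against the system catalogs, using governance_object_ownership as the example target; the catalog join is a common pattern and should be checked against the PostgreSQL version in use:

```sql
-- Read-only: views whose definition references the given table.
SELECT DISTINCT v.oid::regclass AS dependent_view
  FROM pg_depend d
  JOIN pg_rewrite r ON r.oid = d.objid        -- a view is stored as a rewrite rule
  JOIN pg_class   v ON v.oid = r.ev_class     -- the relation that owns that rule
 WHERE d.classid    = 'pg_rewrite'::regclass
   AND d.refclassid = 'pg_class'::regclass
   AND d.refobjid   = to_regclass('public.governance_object_ownership')
   AND v.relkind    = 'v';
```

Any view returned here belongs in position 1 of the DROP order above.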

DELETE order on shared reuse-tables — by governance-scoped key only, never blanket:

  • event_type_registry: WHERE event_domain='governance'.
  • queue_heartbeat: WHERE executor_name LIKE 'gov\_%'.
  • evolution_snapshots: by the specific origin/fingerprint key of the governance ruleset row.
  • dot_tools / dot_domains / dot_coverage_required (only relevant post-Phase-1 T6/T7): WHERE _dot_origin / domain LIKE 'governance.%'; not part of Phase 1.
  • apr_action_types: retire, never DELETE (FK RESTRICT).
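A scoped predicate can be previewed before any DELETE by comparing the rows it matches with the table total. A read-only sketch, assuming standard_conforming_strings is on (the PostgreSQL default), so that `\_` in the LIKE pattern matches a literal underscore rather than acting as a single-character wildcard:

```sql
-- Read-only preview: how many rows each governance-scoped predicate would hit.
SELECT 'queue_heartbeat' AS table_name,
       count(*) FILTER (WHERE executor_name LIKE 'gov\_%') AS governance_rows,
       count(*)                                             AS total_rows
  FROM queue_heartbeat
UNION ALL
SELECT 'event_type_registry',
       count(*) FILTER (WHERE event_domain = 'governance'),
       count(*)
  FROM event_type_registry;
```

If governance_rows differs from the number of rows the build step inserted, the predicate is matching something unexpected and the DELETE should not proceed.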

97.5 Trigger restoration

  • In-flight: ROLLBACK restores trg_birth_apr_action_types automatically (transactional DDL — proven doc 83 §83.3).
  • Post-commit: the F-83-1 fix is the intended permanent state, so do not restore the broken no-arg def. If, and only if, an emergency requires reverting the trigger DDL itself (e.g. it was applied incorrectly), first capture the exact original def (SELECT pg_get_triggerdef(oid) …) and re-apply that captured text, understanding that the broken def will block all apr_action_types inserts (doc 98 §98.5). Expect the [TRIGGER-GUARD] WARNING on any DROP/CREATE TRIGGER.
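A sketch of one way to capture the current state before touching any trigger DDL. It records the trigger definition together with the body of the function it calls, and it assumes the trigger name is unique across tables in this database:

```sql
-- Read-only: capture the trigger definition and its function source.
SELECT t.tgname,
       t.tgrelid::regclass            AS on_table,
       pg_get_triggerdef(t.oid, true) AS trigger_def,   -- pretty-printed CREATE TRIGGER
       pg_get_functiondef(t.tgfoid)   AS function_def   -- CREATE OR REPLACE FUNCTION text
  FROM pg_trigger t
 WHERE t.tgname = 'trg_birth_apr_action_types'
   AND NOT t.tgisinternal;
```

Saving this output alongside the finding keeps the exact text that would need to be re-applied.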

97.6 Event-type rollback

Governance event types are registered active=false and nothing is ever emitted in Phase 1 (event_outbox governance = 0). Rollback is therefore a clean DELETE FROM event_type_registry WHERE event_domain='governance' with no outbox unwind. If an event has nevertheless been emitted, in violation of Phase 1, treat it as an incident: do not delete the type while event_outbox holds governance rows; quarantine, investigate the emit source, then reconcile.
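The "do not delete while event_outbox holds governance rows" rule can be enforced inside the reversal transaction itself. A hedged sketch using an anonymous PL/pgSQL block; the guard is an illustration and not part of the staged SB-11 script:

```sql
BEGIN;
DO $$
DECLARE
  outbox_rows bigint;
BEGIN
  SELECT count(*) INTO outbox_rows
    FROM event_outbox
   WHERE event_domain = 'governance';

  -- Any emitted governance event is an incident: abort instead of deleting.
  IF outbox_rows > 0 THEN
    RAISE EXCEPTION 'event_outbox holds % governance row(s): incident, do not delete event types',
                    outbox_rows;
  END IF;

  DELETE FROM event_type_registry WHERE event_domain = 'governance';
END
$$;
ROLLBACK;  -- rehearsal default; COMMIT only if the reversal is authorized
```

If the guard raises, the transaction is aborted and the closing ROLLBACK leaves event_type_registry untouched.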

97.7 APR action-type rollback

As in §97.3 SB-1: retire (status='retired', retired_at=now()), never delete, because the FK from approval_requests.proposed_action_code is RESTRICT. A retired action-type stays in the table (vocabulary history) but is no longer active; any future APR referencing it is blocked at the lifecycle/quorum layer. Keep the F-83-1 trigger fix.
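To see why retire is used instead of delete, the references from approval_requests can be counted per action code. A read-only sketch that uses only the columns named in this runbook:

```sql
-- Read-only: status and number of approval requests per governance action code.
SELECT a.action_code,
       a.status,
       count(r.proposed_action_code) AS referencing_requests
  FROM apr_action_types a
  LEFT JOIN approval_requests r
         ON r.proposed_action_code = a.action_code
 WHERE a.action_code IN ('assign_governance_owner',
                         'grant_governance_exception',
                         'delegate_authority',
                         'assign_axis_owner')
 GROUP BY a.action_code, a.status
 ORDER BY a.action_code;
```

Any code with referencing_requests above zero would make a DELETE fail under the RESTRICT FK. Retiring the row works regardless of that count.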

97.8 Backup-restore condition (last resort)

Restore from the pre-COMMIT pg_dump only when targeted reversal (§97.3–§97.7) cannot cleanly return a reuse-table to baseline — e.g. a shared reuse-table (event_type_registry, apr_action_types, evolution_snapshots, queue_heartbeat) was mutated beyond the governance-scoped rows, or a count cannot be reconciled. Restore is table-scoped (pg_restore/psql the dumped table), never a full-DB rollback that would lose unrelated production traffic (birth_registry/event_outbox grow organically). Restore is itself a governed mutation — it needs its own authorization and a re-verify after.

97.9 Verification queries after any rollback

-- the reverted step's target ABSENT (greenfield) or vocabulary back to baseline
SELECT to_regclass('public.<reverted_table>');                              -- expect NULL after DROP
SELECT count(*) FROM apr_action_types;                                      -- 6 (or 10 with 4 retired if SB-1 retire-not-delete)
SELECT count(*) FROM apr_action_types WHERE status='active';               -- 6 after SB-1 retire
SELECT count(*) FROM event_type_registry WHERE event_domain='governance';  -- 0 after SB-11 delete
SELECT count(*) FROM event_outbox       WHERE event_domain='governance';   -- 0 (MUST)
SELECT count(*) FROM governance_relations;                                  -- 8 (unchanged throughout)
SELECT count(*) FROM os_proposal_approvals;                                 -- unchanged (rollback never writes this)
SELECT count(*) FROM pg_stat_activity WHERE datname='directus' AND state='idle in transaction'; -- 0
SELECT pg_get_triggerdef(oid) FROM pg_trigger WHERE tgname='trg_birth_apr_action_types' AND NOT tgisinternal; -- fixed def kept post-SB-1

Post-rollback counts MUST equal the pre-build baseline for the reverted scope (doc 96 §96.1), allowing organic growth only on birth_registry/event_outbox totals.
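The individual checks above can be folded into one pass/fail result set. A sketch assuming a full post-commit reversal, with the expected values taken from the comments in the queries above (6 active action types, 8 governance relations, zero elsewhere):

```sql
-- Read-only: one row per check; failing checks sort first.
SELECT check_name, actual, expected, (actual = expected) AS ok
  FROM (
        SELECT 'apr_action_types active' AS check_name,
               (SELECT count(*) FROM apr_action_types WHERE status = 'active') AS actual,
               6::bigint AS expected
        UNION ALL
        SELECT 'event_type_registry governance',
               (SELECT count(*) FROM event_type_registry WHERE event_domain = 'governance'), 0
        UNION ALL
        SELECT 'event_outbox governance',
               (SELECT count(*) FROM event_outbox WHERE event_domain = 'governance'), 0
        UNION ALL
        SELECT 'governance_relations',
               (SELECT count(*) FROM governance_relations), 8
        UNION ALL
        SELECT 'idle in transaction sessions',
               (SELECT count(*) FROM pg_stat_activity
                 WHERE datname = 'directus' AND state = 'idle in transaction'), 0
       ) AS checks
 ORDER BY ok, check_name;
```

Any row with ok = false is a divergence from baseline and should be handled as a count that cannot yet be reconciled.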

97.10 Disaster stop condition & who may restart

Disaster = any of:

  • a COMMIT happened without a matching os_proposal_approvals row;
  • an event was emitted (event_outbox governance > 0);
  • a handler_ref was flipped to a real handler;
  • governance_relations was ALTERed;
  • a session is stuck idle-in-transaction and cannot be cleanly closed;
  • counts cannot be reconciled to baseline;
  • the TRIGGER-GUARD escalated to ERROR mid-build.

On disaster: halt all build activity; do not issue further DDL/DML; write a finding citing the exact divergence; quarantine the session; if a shared reuse-table is corrupted, restore that table from pg_dump (§97.8) under fresh authorization.
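To support the quarantine step, the sessions currently stuck in an open transaction can be identified with a read-only query. A sketch against pg_stat_activity, using the same database name as the verification queries:

```sql
-- Read-only: sessions idle inside an open transaction, longest-idle first.
SELECT pid,
       usename,
       application_name,
       xact_start,
       state_change,
       now() - state_change AS idle_for,
       left(query, 120)     AS last_query
  FROM pg_stat_activity
 WHERE datname = 'directus'
   AND state = 'idle in transaction'
 ORDER BY state_change;
```

The query only reports. Closing or terminating a listed session is a separate action and falls under the authorization rules of this section.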

Who/what may restart: only after (a) the divergence is documented and reconciled, (b) the sovereign re-confirms (or revokes) the os_proposal_approvals authorization for the affected step, and (c) the relevant council record is re-checked. No agent or GPT may self-authorize a restart. Restart resumes at STEP 0 (preflight, doc 96) for the affected step, never mid-step.

Branch E verdict: the runbook covers in-transaction rollback (preferred), per-component post-commit reversal, DROP order (child→parent) and DELETE-by-governance-key order, trigger restoration (keep the F-83-1 fix), event-type and APR-action-type rollback (retire-not-delete), table-scoped backup restore, post-rollback verification, and an explicit disaster stop with a sovereign-gated restart. All SQL staged, none executed; zero mutation by this mission.
