KB-7A73
WATCH_RUNBOOK — Isolated OOM-safe poll queries
Revision 1
WATCH_RUNBOOK.md
Poll cadence: hourly is sufficient, since the scanner runs daily and authority decisions are made by a human. All queries are isolated scalars, so they are OOM-safe. DB=directus.
One-shot dashboard (preferred)
SELECT lane_no, lane, value, status FROM v_rp_production_wait_watch_dashboard ORDER BY lane_no;
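A minimal sketch for scanning the dashboard output by eye-free means, assuming it is fetched with `psql -At -F '|'` so each row arrives as `lane_no|lane|value|status`. `flag_lanes` is a hypothetical helper, and the set of healthy status labels is an assumption you pass in; the runbook does not enumerate the status values the view emits.

```shell
# Hypothetical helper: print dashboard rows whose status is NOT in the
# caller-supplied list of healthy labels (space-separated, e.g. "SAFE FRESH").
# Input: psql unaligned rows of the form lane_no|lane|value|status.
flag_lanes() {
  local healthy=" $1 "
  local lane_no lane value status
  while IFS='|' read -r lane_no lane value status; do
    [ -z "$lane_no" ] && continue              # skip blank lines
    case "$healthy" in
      *" $status "*) ;;                        # listed as healthy: stay quiet
      *) printf '%s %s %s %s\n' "$lane_no" "$lane" "$value" "$status" ;;
    esac
  done
}
```

Example: `psql -d directus -At -F '|' -c "SELECT lane_no, lane, value, status FROM v_rp_production_wait_watch_dashboard ORDER BY lane_no;" | flag_lanes "SAFE FRESH"` prints nothing when every lane reports one of the listed labels.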
Individual lanes (if the dashboard is ever unavailable)
-- OOM (function-backed; safe)
SELECT verdict, live_crash_landmines FROM v_rp_guard_safety_status;
-- trigger guard
SELECT count(*) FROM trigger_guard_alerts WHERE resolved_at IS NULL; -- expect <=129
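The trigger-guard expectation can be checked mechanically. This is a sketch that only encodes the runbook's baseline of 129 unresolved alerts; `check_trigger_guard` is a hypothetical name, and the count is assumed to come from the query above via `psql -At`.

```shell
# Hypothetical helper: compare the unresolved trigger_guard_alerts count
# with the runbook baseline (expect <= 129). Baseline can be overridden.
check_trigger_guard() {
  local unresolved="$1" baseline="${2:-129}"
  if [ "$unresolved" -le "$baseline" ]; then
    echo "OK"
  else
    echo "INVESTIGATE trigger drift ($unresolved > $baseline)"
  fi
}
```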
-- birth drift
SELECT count(*) FROM birth_registry; -- growth comes from the live worker, not from this watch
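Birth drift is a comparison between two polls. A minimal sketch, assuming you keep the previous hourly count yourself (the runbook does not say where); `birth_drift` is hypothetical, and because the runbook gives no meaning for a decrease, the sketch only reports it.

```shell
# Hypothetical helper: report the change in birth_registry row count between
# the previous poll and the current one. Growth is attributed to the live worker.
birth_drift() {
  local previous="$1" current="$2"
  local delta=$(( current - previous ))
  if [ "$delta" -gt 0 ]; then
    echo "GROWTH +$delta (live worker)"
  elif [ "$delta" -eq 0 ]; then
    echo "UNCHANGED"
  else
    echo "SHRINK $delta"          # not covered by the runbook; reported only
  fi
}
```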
-- scanner freshness (read wf_adapter_run_log; NEVER use wf_scanner_run_log or the registry)
SELECT max(started_at) FROM wf_adapter_run_log; -- FRESH if <26h
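The 26-hour freshness rule is easiest to apply to an age in seconds. A sketch, assuming `started_at` is a timestamp column so the age can be computed in SQL as shown in the comment; `scanner_freshness` is hypothetical. The runbook says FRESH below 26h and STALE above it; this sketch treats exactly 26h as STALE, and an empty result (no runs logged) as UNKNOWN.

```shell
# Hypothetical helper: classify scanner freshness from the age of the last
# wf_adapter_run_log run, in seconds. Obtain the age with, e.g.:
#   psql -d directus -At -c "SELECT extract(epoch FROM now() - max(started_at))::bigint FROM wf_adapter_run_log;"
scanner_freshness() {
  local age_seconds="$1"
  local max_age=$(( 26 * 3600 ))   # runbook threshold: FRESH if < 26h
  if [ -z "$age_seconds" ]; then
    echo "UNKNOWN"                 # max(started_at) was NULL: no runs logged
  elif [ "$age_seconds" -lt "$max_age" ]; then
    echo "FRESH"
  else
    echo "STALE"
  fi
}
```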
-- action queue
SELECT count(*) FROM wf_candidate_action_log; -- expect PREVIEW and BLOCKED entries only
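The count alone does not show whether anything other than PREVIEW or BLOCKED entered the queue. A sketch, reading that "PREVIEW+BLOCKED only" refers to a per-row status value; the runbook does not name the column, so feed this one status per line from whatever column holds it. `unexpected_actions` is hypothetical.

```shell
# Hypothetical helper: read one action status per line on stdin and print any
# value other than PREVIEW or BLOCKED (the only entries this runbook expects).
unexpected_actions() {
  local status
  while IFS= read -r status; do
    case "$status" in
      PREVIEW|BLOCKED|"") ;;      # expected (or blank): ignore
      *) echo "$status" ;;
    esac
  done
}
```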
-- authority arrival
SELECT owners_active, president_approvals_on_procown FROM v_rp_authority_execution_preflight;
-- realrun / emit flags
SELECT key, value FROM dot_config WHERE key IN
('process_dot_runtime.real_run_enabled','process_dot_runtime.execute_enabled','piece_event_runtime.emit_enabled','iu_core.operator_runtime_enabled');
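To spot a flag flip quickly, the `key|value` rows from the query above can be filtered for enabled flags. This sketch assumes `dot_config.value` stores booleans as the plain text `true`/`false`; the runbook does not state the storage format or which state is expected during the wait. `enabled_flags` is hypothetical.

```shell
# Hypothetical helper: print the keys whose value is the text "true".
# Input: psql -At -F '|' rows of the form key|value.
enabled_flags() {
  local key value
  while IFS='|' read -r key value; do
    if [ "$value" = "true" ]; then
      echo "$key"
    fi
  done
}
```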
-- router decision
SELECT recommended_next_macro, reason FROM v_rp_next_macro_router_supertrack;
OOM incident check (run if anything looks off)
ssh contabo "docker logs postgres --tail 200 2>&1 | grep -c 'signal 9: Killed'" # expect 0
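`grep -c` prints `0` but exits with status 1 when nothing matches, so a scripted check running under `set -e` would abort on the healthy case. A sketch that counts locally and normalizes the exit status; `oom_check` is hypothetical, and the host and container names are taken from the command above.

```shell
# Hypothetical helper: count 'signal 9: Killed' lines in log text on stdin.
# "|| true" keeps grep's no-match exit status (1) from failing the caller.
oom_check() {
  local n
  n="$(grep -c 'signal 9: Killed' || true)"
  if [ "$n" -eq 0 ]; then
    echo "OK"
  else
    echo "OOM KILLS: $n"
  fi
}
```

Example: `ssh contabo "docker logs postgres --tail 200 2>&1" | oom_check`.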
Alert rules
- OOM verdict != SAFE or live_crash_landmines > 0 → STOP and run OOM_INCIDENT_RESPONSE.
- trigger_guard_unresolved rises above 129 → investigate trigger drift.
- AUTHORITY_ARRIVAL flips to AUTHORITY_ARRIVED → read the router lane and execute the macro it names (recommended_next_macro).
- SCANNER STALE (>26h) → check wf-universal-scanner.timer on the host.
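The four rules can be applied in one pass once the lane values are fetched. A sketch under stated assumptions: `evaluate_alerts` is hypothetical, it takes the values as arguments (verdict, landmines, unresolved count, authority status, scanner state), and it reads "STOP" as short-circuiting every other rule, which is an interpretation rather than something the runbook spells out.

```shell
# Hypothetical helper: apply the alert rules to already-fetched lane values
# and print the action for each rule that fires, or OK if none does.
evaluate_alerts() {
  local verdict="$1" landmines="$2" unresolved="$3" authority="$4" scanner="$5"
  local fired=0
  if [ "$verdict" != "SAFE" ] || [ "$landmines" -gt 0 ]; then
    echo "STOP: run OOM_INCIDENT_RESPONSE"
    return 0                      # assumed: STOP overrides all other rules
  fi
  if [ "$unresolved" -gt 129 ]; then
    echo "INVESTIGATE: trigger drift"; fired=1
  fi
  if [ "$authority" = "AUTHORITY_ARRIVED" ]; then
    echo "ACT: read router, execute recommended_next_macro"; fired=1
  fi
  if [ "$scanner" = "STALE" ]; then
    echo "CHECK: wf-universal-scanner.timer on host"; fired=1
  fi
  [ "$fired" -eq 0 ] && echo "OK"
  return 0
}
```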
OOM-safety rules (do not violate)
- No deep composite smoke/RP mega-queries.
- No EXPLAIN on smoke-combo deep views.
- No new composite guard views over deep stacks.
- Isolated scalars and function-backed guards only.
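One way to honor "isolated scalars only" from a script is to send each lane query as its own `psql` invocation rather than stitching them into one statement. A sketch, assuming connection settings (host, user, password) come from the environment; `run_isolated` is hypothetical and keeps going after a failed query so one bad lane does not hide the others.

```shell
# Hypothetical runner: execute each query argument in a separate psql call
# against DB=directus. Returns 1 if any query failed, 0 otherwise.
run_isolated() {
  local q status=0
  for q in "$@"; do
    if ! psql -d directus -At -c "$q"; then
      echo "FAILED: $q" >&2
      status=1
    fi
  done
  return "$status"
}
```

Example: `run_isolated "SELECT verdict, live_crash_landmines FROM v_rp_guard_safety_status;" "SELECT max(started_at) FROM wf_adapter_run_log;"`.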