PG Backup Phase C Hardening Report — 2026-05-19
Date: 2026-05-19
Host: vmi3080463
Scope: permission guard lifecycle, registry lock coverage audit, stale monitor cleanup, restore drill.
0. Governance
- Skill read: .claude/skills/incomex-rules.md.
- KB read via search_knowledge (direct, main process): knowledge/dev/ssot/operating-rules.md (OR v7.58, 2026-05-01) and knowledge/dev/laws/constitution.md (Constitution v4.6.3).
- P9-G6/S174 backup incident context: PIPESTATUS, active DOWN push, grant drift guard, restore drill.
- Three statements:
- Permanent: added independent daily permission drift monitor; stale workflow monitor no longer creates false DOWN; restore drill validates recoverability.
- Cannot silently fail: the guard has its own Kuma push monitor and pushes a DOWN heartbeat on failure, independent of the Local and GDrive backup monitors.
- Automated: guard runs daily via cron and sends its own heartbeat.
1. Permission Guard Hardening
Before Phase C, check-pg-dump-permissions.sh ran only inside pg-backup.sh; the root crontab had no standalone guard entry.
Current guard coverage:
- Dump role: directus.
- The non-system schema scan excludes only information_schema and schemas matching pg_%.
- No custom application allowlist or excluded schema is configured.
- Objects checked: schema USAGE; SELECT on tables, partitioned tables, views, materialized views, and foreign tables; USAGE and SELECT on sequences.
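A minimal sketch of a catalog query covering the checks listed above. It is not the contents of check-pg-dump-permissions.sh; it assumes psql can connect with enough rights to read the catalogs, and pg_guard_sql / pg_guard_check are hypothetical names. Sequence privileges are tested one at a time because PostgreSQL's has_*_privilege functions return true if any privilege in a comma-separated list is held, so a single 'USAGE, SELECT' test would not require both.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: SQL listing objects the dump role cannot read.
# An empty result means every checked privilege is present.
pg_guard_sql() {
  cat <<'SQL'
WITH s AS (
  -- Same filter as the guard: skip information_schema and pg_% schemas.
  SELECT oid, nspname FROM pg_namespace
  WHERE nspname <> 'information_schema' AND nspname NOT LIKE 'pg_%'
)
SELECT 'schema', s.nspname::text FROM s
WHERE NOT has_schema_privilege(:'role', s.oid, 'USAGE')
UNION ALL
-- Tables, partitioned tables, views, materialized views, foreign tables.
SELECT 'relation', format('%I.%I', s.nspname, c.relname)
FROM pg_class c JOIN s ON s.oid = c.relnamespace
WHERE c.relkind IN ('r', 'p', 'v', 'm', 'f')
  AND NOT has_table_privilege(:'role', c.oid, 'SELECT')
UNION ALL
-- Sequences need both USAGE and SELECT, so each is checked separately.
SELECT 'sequence', format('%I.%I', s.nspname, c.relname)
FROM pg_class c JOIN s ON s.oid = c.relnamespace
WHERE c.relkind = 'S'
  AND NOT (has_sequence_privilege(:'role', c.oid, 'USAGE')
           AND has_sequence_privilege(:'role', c.oid, 'SELECT'));
SQL
}

# Run the query for a role (default directus); return 1 if grants are missing.
pg_guard_check() {
  local role="${1:-directus}" missing
  missing="$(pg_guard_sql | psql -X -At -v ON_ERROR_STOP=1 -v role="$role")" || return 2
  if [ -n "$missing" ]; then
    printf '%s\n' "$missing"
    return 1
  fi
  echo "PASS: dump role $role has required privileges"
}
```

The role is passed as a psql variable (:'role'), so psql quotes it as a literal rather than splicing it into the SQL text.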
Changes:
- Added /opt/incomex/scripts/pg-dump-permission-guard-monitor.sh.
- Added /opt/incomex/scripts/ensure-pg-dump-permission-guard-kuma-monitor.sh.
- Created Uptime Kuma push monitor PG Dump Permission Guard: id 17, token pg-dump-permission-guard, interval 90000 s (25 h), active 1.
- Added root cron: 17 4 * * * /opt/incomex/scripts/pg-dump-permission-guard-monitor.sh.
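A sketch of how a standalone guard wrapper can report to the push monitor, using Uptime Kuma's push endpoint (/api/push/<token> with status, msg and ping query parameters). KUMA_URL (Kuma's default port 3001 on localhost), CHECK_SCRIPT, and the 200-character message cap are assumptions, not values taken from pg-dump-permission-guard-monitor.sh. Pushing status=down on failure gives an explicit DOWN; if cron never runs the guard at all, Kuma still marks the monitor DOWN once no push arrives within the 90000-second interval.

```shell
#!/usr/bin/env bash
# Hypothetical guard wrapper; the defaults below are assumptions.
KUMA_URL="${KUMA_URL:-http://127.0.0.1:3001}"
PUSH_TOKEN="${PUSH_TOKEN:-pg-dump-permission-guard}"

# Map the check's exit code to a Kuma push status.
push_status() {
  if [ "$1" -eq 0 ]; then echo up; else echo down; fi
}

# Send one heartbeat; -G turns the url-encoded fields into a query string.
kuma_push() {
  curl -fsS -m 10 -G \
    --data-urlencode "status=$1" \
    --data-urlencode "msg=$2" \
    --data-urlencode "ping=" \
    "$KUMA_URL/api/push/$PUSH_TOKEN" >/dev/null
}

# Run the permission check, push its result, and keep its exit code.
run_guard() {
  local out rc
  out="$("${CHECK_SCRIPT:-check-pg-dump-permissions.sh}" 2>&1)"
  rc=$?
  # Use the last output line as the heartbeat message, capped at 200 chars.
  kuma_push "$(push_status "$rc")" "$(printf '%s\n' "$out" | tail -n 1 | cut -c1-200)" \
    || echo "kuma push failed" >&2
  return "$rc"
}
```

Returning the check's own exit code keeps cron mail and local logs meaningful even when the push itself fails.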
Manual verification:
PASS: dump role directus has required schema/table/view/sequence privileges
2026-05-19T05:05:49+02:00 FINAL_STATUS=success ERROR_CLASS=none MSG=PASS: dump role directus has required schema/table/view/sequence privileges
17 PG Dump Permission Guard 1 2026-05-19 03:05:49.881 1 permission guard PASS 1
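The Kuma heartbeat time is in UTC (03:05:49 UTC = 05:05:49+02:00), so the row above is the heartbeat from this manual run. No GRANT was applied in Phase C because the guard already passed.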
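The guard log line is a timestamp, KEY=VALUE fields, and a trailing free-text MSG. A small parser is enough for follow-up tooling; this hypothetical log_field helper assumes MSG is always last (as in the line above) and that the other values contain no spaces.

```shell
#!/usr/bin/env bash
# Hypothetical parser for guard log lines of the form
#   <timestamp> FINAL_STATUS=... ERROR_CLASS=... MSG=<free text>
log_field() {
  local key="$1" line="$2" head word
  local -a words
  if [ "$key" = MSG ]; then
    # MSG is the last field and may contain spaces: take everything after it.
    case "$line" in
      *" MSG="*) printf '%s\n' "${line#* MSG=}"; return 0 ;;
    esac
    return 1
  fi
  # Scan only the part before MSG so message text cannot shadow a key.
  head="${line%% MSG=*}"
  read -r -a words <<< "$head"
  for word in "${words[@]}"; do
    case "$word" in
      "$key="*) printf '%s\n' "${word#*=}"; return 0 ;;
    esac
  done
  return 1
}
```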
2. Registry Lock Coverage Audit
Batch functions expected from Phase B:
fn_refresh_orphan_col has_lock=t uses_key=t
fn_refresh_orphan_dot has_lock=t uses_key=t
fn_refresh_orphan_species has_lock=t uses_key=t
fn_refresh_species_per_level has_lock=t uses_key=t
refresh_meta_catalog_from_pivot has_lock=t uses_key=t
refresh_registry_views has_lock=t uses_key=t truncates_registry_counts=t
refresh_pivot_results has_lock=f uses_key=f
refresh_matrix_results has_lock=f uses_key=f
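One way to reproduce the has_lock and uses_key columns is a text scan of each function body in pg_proc.prosrc. The report does not say how the flags were computed, so this is an assumed definition: has_lock means the body calls one of the pg_advisory_*lock* functions, and uses_key means the literal key incomex.registry_refresh.v1 appears in it. lock_flags and audit_functions are hypothetical names, and a plain text scan will also count a commented-out call.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive has_lock/uses_key flags from a function body.
lock_flags() {
  local body="$1" has_lock=f uses_key=f
  # Session or transaction advisory lock calls, including try_ and _shared forms.
  if printf '%s' "$body" | grep -Eq 'pg_(try_)?advisory_(xact_)?lock'; then
    has_lock=t
  fi
  # The shared registry key must appear literally.
  if printf '%s' "$body" | grep -Fq 'incomex.registry_refresh.v1'; then
    uses_key=t
  fi
  printf 'has_lock=%s uses_key=%s\n' "$has_lock" "$uses_key"
}

# Print flags for each named function, reading bodies from pg_proc via psql.
audit_functions() {
  local fn body
  for fn in "$@"; do
    body="$(printf '%s\n' "SELECT string_agg(prosrc, chr(10)) FROM pg_proc WHERE proname = :'fn';" \
      | psql -X -At -v ON_ERROR_STOP=1 -v fn="$fn")" || return 2
    printf '%s %s\n' "$fn" "$(lock_flags "$body")"
  done
}
```

For example, audit_functions refresh_pivot_results refresh_matrix_results prints one line per function in the same shape as the table above.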
Coverage gaps found by the read-only audit (not changed in Phase C):
- refresh_pivot_results() and refresh_matrix_results() are scheduled, write-heavy jobs and do not take the incomex.registry_refresh.v1 lock.
- Trigger/function paths that touch meta_catalog or v_registry_counts without the registry lock include fn_auto_cleanup_on_meta_delete, fn_auto_sync_v_registry_counts, fn_ensure_registry_counts, count refresh functions, refresh_all_meta_counts, update_record_count, trg_pivot_def_refresh, and trg_fn_refresh_orphan_*.
- DOT/code paths outside these DB functions can also mutate registry counts or meta_catalog, including /opt/incomex/dot/bin/dot-schema-dot-origin-ensure, /opt/incomex/dot/bin/dot-orphan-scan, /opt/incomex/dot/bin/dot-registry-count-refresh, and /opt/incomex/dot/bin/dot-schema-trigger-registry-ensure.
Conclusion: Phase B eliminated the immediate cron collision/deadlock with the main backup, but advisory-lock coverage does not yet extend to every registry/meta write path. Closing these gaps should be a separate, approved SQL/DOT hardening phase.
3. Monitoring Cleanup
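Evidence that PG Backup Workflow was stale (the cron entry is commented out, the script and backup directory carry a .retired.20260408 suffix, and the last retained dump is 20 bytes, the size gzip produces for empty input, so it appears to contain no data):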
# S174-FIX-03 archived: 0 2 * * * /opt/workflow/postgres/backup.sh >> /opt/workflow/postgres/backup.log 2>&1
/opt/workflow/postgres/backup.sh.retired.20260408
/opt/workflow/postgres/backups.retired.20260408/workflow_20260401T000001Z.sql.gz 20 bytes
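The archived cron line above can be re-checked mechanically: a commented-out entry is never scheduled, so only non-comment lines matter. A minimal sketch, assuming the crontab text is passed in (for example from crontab -l); active_cron_entries is a hypothetical helper that matches the path as a plain substring, which suits a spot check but is not a full crontab parser.

```shell
#!/usr/bin/env bash
# Print active (non-comment, non-blank) crontab lines that mention a path.
# Returns non-zero when there is no such line.
active_cron_entries() {
  local crontab_text="$1" path="$2"
  printf '%s\n' "$crontab_text" \
    | grep -Ev '^[[:space:]]*(#|$)' \
    | grep -F -- "$path"
}
```

Usage: active_cron_entries "$(crontab -l)" /opt/workflow/postgres/backup.sh || echo "no active workflow backup entry".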
Before:
13 PG Backup Workflow push 1 pg-backup-workflow 90000
2026-05-19 01:21:53.452 0 No heartbeat in the time window
2026-05-18 00:21:53.440 0 No heartbeat in the time window
Action: paused monitor #13 via Kuma socket API. Did not restart PostgreSQL. Did not touch nginx or unrelated services.
After:
13 PG Backup Workflow push 0 pg-backup-workflow 90000
Current backup monitor separation:
12 PG Backup Local active=1 latest success OK size=64M
14 PG Backup GDrive active=1 latest success OK archive=131M
17 PG Dump Permission Guard active=1 latest success permission guard PASS
4. Restore Drill
Local backup tested:
LOCAL_BACKUP=/opt/incomex/backups/pg/directus_2026-05-19_0240.sql.gz
LOCAL_SIZE_BYTES=66690408
LOCAL_PAYLOAD_BYTES=720906367
LOCAL_HEADER=-- -- PostgreSQL database dump --
RESTORE_DB=restore_drill_20260519_050848
RESTORE_TABLES=302
RESTORE_SCHEMAS=cutter_governance,public,sandbox_tac
RESTORE_ERROR_LINES=none
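The local drill values above can be reproduced with a few shell helpers. This is a sketch under assumptions, not the script that produced them: it assumes GNU coreutils and local psql/createdb/dropdb access, counts "error lines" as restore-log lines containing ERROR:, and counts tables via pg_tables. All function names are hypothetical; the database and temp-directory names follow the restore_drill_% and /tmp/pg-restore-drill-* patterns used in the cleanup check.

```shell
#!/usr/bin/env bash
# Hypothetical restore-drill helpers (GNU coreutils assumed).

# First three lines of the decompressed dump, joined with spaces.
dump_header() {
  gzip -dc -- "$1" | head -n 3 | paste -sd ' ' -
}

# Uncompressed payload size in bytes.
payload_bytes() {
  gzip -dc -- "$1" | wc -c | tr -d ' '
}

# Number of lines psql reported as errors in a restore log (0 if none).
error_lines() {
  grep -c 'ERROR:' -- "$1" || true
}

# Restore into a throwaway database, report, then drop it and its temp dir.
restore_drill() {
  local dump="$1" db work rc=0
  local q="SELECT count(*) FROM pg_tables WHERE schemaname NOT IN ('pg_catalog', 'information_schema')"
  db="restore_drill_$(date +%Y%m%d_%H%M%S)"
  work="$(mktemp -d /tmp/pg-restore-drill-XXXXXX)" || return 2
  createdb "$db" || { rm -rf -- "$work"; return 2; }
  gzip -dc -- "$dump" | psql -X -q -d "$db" >"$work/restore.log" 2>&1 || rc=1
  echo "RESTORE_DB=$db"
  echo "RESTORE_TABLES=$(psql -X -At -d "$db" -c "$q")"
  echo "RESTORE_ERROR_LINES=$(error_lines "$work/restore.log")"
  dropdb "$db" || rc=1
  rm -rf -- "$work"
  return "$rc"
}
```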
GDrive archive tested:
GDRIVE_ARCHIVE=vps-backup-20260519_044156.tar.gz
GDRIVE_ARCHIVE_SIZE_BYTES=136703863
GDRIVE_PG_MEMBER=vps-backup-20260519_044156/postgresql-directus.sql.gz
GDRIVE_PG_SIZE_BYTES=66690402
GDRIVE_PAYLOAD_BYTES=720906367
GDRIVE_HEADER=-- -- PostgreSQL database dump --
Cleanup evidence: no restore_drill_% rows remained in pg_database, and no /tmp/pg-restore-drill-* directories remained.
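The GDrive check adds one step: reading the PostgreSQL member out of the tar.gz without unpacking the whole archive. Compressed sizes are not a useful equality check here (66690408 vs 66690402 bytes above, while both payloads are 720906367 bytes), so this sketch compares a SHA-256 of the decompressed payloads, which is stronger than matching byte counts. It assumes GNU tar (-O writes a member to stdout) and coreutils sha256sum; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing a local dump with one inside a .tar.gz.
# Each hash function runs in a subshell so pipefail does not leak out.

# SHA-256 of the decompressed payload of a local .sql.gz file.
payload_sha256() (
  set -o pipefail
  gzip -dc -- "$1" | sha256sum | cut -d ' ' -f 1
)

# SHA-256 of the decompressed payload of a .sql.gz member inside an archive.
member_payload_sha256() (
  set -o pipefail
  tar -xzOf "$1" "$2" | gzip -dc | sha256sum | cut -d ' ' -f 1
)

# Succeed only if both payloads are identical; fail if either cannot be read.
same_payload() {
  local a b
  a="$(payload_sha256 "$1")" || return 2
  b="$(member_payload_sha256 "$2" "$3")" || return 2
  [ "$a" = "$b" ]
}
```

Usage: same_payload /opt/incomex/backups/pg/directus_2026-05-19_0240.sql.gz vps-backup-20260519_044156.tar.gz vps-backup-20260519_044156/postgresql-directus.sql.gz (the archive path is whatever local copy of the GDrive file is at hand).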
5. Backups Created
/root/crontab.pre-phase-c-pgbackup-hardening-1779159891.txt
/opt/incomex/backups/kuma.db.pre-phase-c-pgbackup-hardening-1779159891
/opt/incomex/backups/kuma.db.pre-phase-c-disable-pg-backup-workflow-1779159993
/opt/incomex/scripts/ensure-pg-dump-permission-guard-kuma-monitor.sh.pre-phase-c-1779159934
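The numeric suffixes on these files look like Unix epoch seconds. Decoding them is a quick way to confirm when each snapshot was taken; this is a sketch assuming GNU date, and backup_suffix_utc is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Decode the epoch-seconds suffix of a backup file name to UTC (GNU date).
backup_suffix_utc() {
  local ts="${1##*-}"   # text after the last hyphen
  ts="${ts%%.*}"        # drop an extension such as .txt
  case "$ts" in
    ''|*[!0-9]*) return 1 ;;   # no numeric suffix
  esac
  date -u -d "@$ts" +%Y-%m-%dT%H:%M:%SZ
}
```

For example, the crontab snapshot suffix 1779159891 decodes to 2026-05-19T03:04:51Z (05:04:51+02:00), about a minute before the guard's manual PASS at 05:05:49+02:00.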
6. Rollback
Restore crontab:
crontab /root/crontab.pre-phase-c-pgbackup-hardening-1779159891.txt
Full Kuma rollback to pre-Phase-C monitor state:
docker stop uptime-kuma
cp /opt/incomex/backups/kuma.db.pre-phase-c-pgbackup-hardening-1779159891 /opt/incomex/uptime-kuma/kuma.db
docker start uptime-kuma
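Rollback only the stale monitor disable (this restores the snapshot taken just before monitor #13 was paused, so any later Kuma changes, including newer heartbeats, are also reverted):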
docker stop uptime-kuma
cp /opt/incomex/backups/kuma.db.pre-phase-c-disable-pg-backup-workflow-1779159993 /opt/incomex/uptime-kuma/kuma.db
docker start uptime-kuma
For a full rollback, also remove the Phase C guard scripts:
rm -f /opt/incomex/scripts/pg-dump-permission-guard-monitor.sh
rm -f /opt/incomex/scripts/ensure-pg-dump-permission-guard-kuma-monitor.sh
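The Kuma rollback commands above overwrite kuma.db in place. A slightly more defensive sketch: it refuses to run if the snapshot is missing or empty, keeps a copy of the current database so the rollback can itself be undone, and moves aside any leftover kuma.db-wal / kuma.db-shm files, on the assumption that Kuma's SQLite database runs in WAL mode and newer WAL frames should not meet an older snapshot. restore_kuma_db and the DOCKER override are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: restore a kuma.db snapshot with basic safety checks.
# DOCKER may be overridden (for example DOCKER=echo) to skip the real
# container stop/start; the file operations still run.
restore_kuma_db() {
  local snapshot="$1" target="${2:-/opt/incomex/uptime-kuma/kuma.db}"
  local docker="${DOCKER:-docker}" stamp side
  if [ ! -s "$snapshot" ]; then
    echo "missing or empty snapshot: $snapshot" >&2
    return 1
  fi
  stamp="$(date +%s)"
  "$docker" stop uptime-kuma || return 1
  # From here on, a failure leaves the container stopped for manual inspection.
  # Keep the current database so this rollback can itself be undone.
  if [ -e "$target" ]; then
    cp -a -- "$target" "$target.pre-rollback-$stamp" || return 1
  fi
  # Assumption: SQLite WAL mode; move newer WAL/SHM files out of the way.
  for side in "$target-wal" "$target-shm"; do
    if [ -e "$side" ]; then
      mv -- "$side" "$side.pre-rollback-$stamp" || return 1
    fi
  done
  cp -- "$snapshot" "$target" || return 1
  "$docker" start uptime-kuma
}
```

Usage: restore_kuma_db /opt/incomex/backups/kuma.db.pre-phase-c-pgbackup-hardening-1779159891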
7. Remaining Risks
- Advisory lock coverage is incomplete for all registry/meta write paths. A separate task should wrap all registry-count/meta-catalog writers and relevant DOT jobs with a shared serialization mechanism.
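- TRUNCATE public.v_registry_counts remains in refresh_registry_views(). TRUNCATE takes an ACCESS EXCLUSIVE lock, which conflicts with the ACCESS SHARE lock pg_dump holds on each table it dumps, so a TRUNCATE issued during a dump waits for the dump and queues other access to that table behind it. Replacing it with an upsert/delete-diff or a staging-table swap should be a separate approved redesign.
- A scheduled periodic restore drill is recommended, rather than relying only on manual validation at incident time.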
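Scheduling the drill could reuse the same cron approach as the guard. A sketch of an idempotent install step: the drill script path /opt/incomex/scripts/pg-restore-drill.sh and the weekly schedule are hypothetical, and ensure_cron_line is a hypothetical helper that appends a line only if an identical one is not already present.

```shell
#!/usr/bin/env bash
# Print the crontab text with LINE appended, unless an identical line exists.
ensure_cron_line() {
  local crontab_text="$1" line="$2"
  if [ -z "$crontab_text" ]; then
    printf '%s\n' "$line"
  elif printf '%s\n' "$crontab_text" | grep -qxF -- "$line"; then
    printf '%s\n' "$crontab_text"
  else
    printf '%s\n%s\n' "$crontab_text" "$line"
  fi
}

# Usage (installs nothing if crontab -l fails, so an unreadable crontab
# is never replaced by a one-line one):
#   line='40 5 * * 0 /opt/incomex/scripts/pg-restore-drill.sh'
#   current="$(crontab -l)" && ensure_cron_line "$current" "$line" | crontab -
```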