P9 G6 Backup Integrity Recovery — 2026-04-27
title: P9 G6 Backup Integrity Recovery - 2026-04-27
date: 2026-04-27
executor: Claude Code (medium effort, VPS context via SSH contabo)
type: investigation-recovery-report
status: HARD_STOP_5_TRIGGERED - fresh backup integrity FAIL
TL;DR
Both anomalies share a single root cause: the PostgreSQL role directus lacks the USAGE privilege on schema sandbox_tac (created by S178 Fix 20 M3A). pg_dump fails during its lock-table phase, the resulting gzip output is empty (20 bytes), and both pg-backup.sh and backup-to-gdrive.sh break in the same way.
- Task 3 ran a fresh backup: the governed script pg-backup.sh exited 1 with the same error, and the new file directus_2026-04-27_1353.sql.gz is also 20 bytes → Hard Stop #5 TRIGGERED.
- NO retry, NO script patch, NO GRANT. Waiting for the orchestrator and user to authorize the DDL fix.
§0. Execution Context
hostname: vmi3080463 (VPS Contabo)
whoami: root
pwd: /root
kernel: Linux 6.8.0-90-generic Ubuntu
VPS context confirmed (via SSH contabo, key-based, BatchMode=yes).
§1. Task 1: Anomaly #2 Root Cause
1.1 Backup directory state
ls -lht /opt/incomex/backups/pg/ (head):
-rw------- 1 root root 20 Apr 27 02:00 directus_2026-04-27_0000.sql.gz ← BROKEN
-rw-r--r-- 1 root root 11K Apr 27 02:00 backup.log
-rw------- 1 root root 42M Apr 26 02:00 directus_2026-04-26_0000.sql.gz
-rw------- 1 root root 42M Apr 25 02:00 directus_2026-04-25_0000.sql.gz
... (>=40M baseline for all earlier dates)
1.2 gzip / file metadata
- gzip -t directus_2026-04-27_0000.sql.gz → exit 0 (technically valid empty stream)
- file ... → gzip compressed data, max compression, from Unix, original size modulo 2^32 0 (original size = 0 bytes → gzip header + EOF only)
- zcat ... | head → empty (no PG header, no SQL)
→ The file is a gzip header plus an empty payload (~20 bytes), not corruption: pg_dump wrote 0 bytes into the pipe because it failed before producing any SQL output.
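The 20-byte figure can be reproduced locally. The sketch below compresses zero bytes the same way the backup pipeline does (gzip -9 reading from an empty pipe) and inspects the result. It assumes a standard gzip on the PATH and works on a temporary file, not on a real backup.

```shell
#!/usr/bin/env bash
# Simulate pg_dump writing nothing into the pipe, then inspect the archive.
tmp="$(mktemp)"
: | gzip -9 > "$tmp"

size="$(wc -c < "$tmp" | tr -d ' ')"                     # on-disk size of the archive
gzip -t "$tmp"; gzip_rc=$?                                # integrity test on the empty stream
payload_bytes="$(gzip -dc "$tmp" | wc -c | tr -d ' ')"    # decompressed size

echo "size=${size} gzip_t_rc=${gzip_rc} payload_bytes=${payload_bytes}"
rm -f "$tmp"
```

With GNU gzip the archive is 20 bytes (10-byte header, 2-byte empty deflate block, 8-byte trailer), and gzip -t still exits 0, which is why an integrity test alone cannot detect this failure mode.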
1.3 backup.log evidence (masked)
[2026-04-25T00:00:02Z] OK size=42M (43311450 bytes)
[2026-04-26T00:00:01Z] OK size=42M (43763755 bytes)
[2026-04-27T00:00:01Z] START pg-backup -> .../directus_2026-04-27_0000.sql.gz
pg_dump: error: query failed: ERROR: permission denied for schema sandbox_tac
pg_dump: detail: Query was: LOCK TABLE public.agent_views, ...,
sandbox_tac.section_type_vocab, sandbox_tac.publication_type_vocab,
sandbox_tac.logical_unit, sandbox_tac.unit_version,
sandbox_tac.publication, sandbox_tac.publication_member,
sandbox_tac.change_set, sandbox_tac.change_set_member
IN ACCESS SHARE MODE
1.4 Root cause confirmation
Read-only diagnostic (no mutation):
docker exec postgres psql -U directus -d directus -tAc \
"SELECT has_schema_privilege(current_user, 'sandbox_tac', 'USAGE');"
→ f
SELECT current_user → directus
→ Role directus does NOT have USAGE on schema sandbox_tac. When pg_dump enumerates tables (database-wide) and issues a LOCK TABLE that includes the 8 sandbox_tac.* tables, it fails at that point.
History (memory): the sandbox_tac schema was created in S178 Fix 20 M3B (2026-04-19) for 5 PG-only governance tables plus several auxiliary tables. Permissions were never opened to the dump role.
Anomaly #2 = a permission gap inherited from S178 Fix 20. It is NOT a bug in pg-backup.sh.
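The single-schema check above generalizes to an audit of every schema. The following is a minimal sketch, not part of the investigated scripts: missing_usage_sql is a hypothetical helper that prints a read-only query listing each non-system schema on which a given role lacks USAGE. It relies only on the pg_namespace catalog and the built-in has_schema_privilege function.

```shell
#!/usr/bin/env bash
# Print SQL listing every non-system schema on which a role lacks USAGE.
# The role name is interpolated as-is, so pass only trusted values.
missing_usage_sql() {
  local role="$1"
  cat <<SQL
SELECT n.nspname
FROM pg_namespace AS n
WHERE n.nspname !~ '^pg_'
  AND n.nspname <> 'information_schema'
  AND NOT has_schema_privilege('${role}', n.oid, 'USAGE')
ORDER BY n.nspname;
SQL
}

# Hypothetical usage against the container named in this report:
#   missing_usage_sql directus | docker exec -i postgres psql -U directus -d directus -tA
```

Run as the dump role, an empty result would mean no schema blocks pg_dump at the privilege level; in the state described here it would be expected to list sandbox_tac.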
§2. Task 2: Backup Script Inspection
2.1 Metadata
path: /opt/incomex/scripts/pg-backup.sh
size: 2157 bytes, mtime 2026-04-08 11:23
owner: root:root, mode 755 (NOT world-writable ✓)
2.2 Selective grep (business logic, masked)
4: # S174-FIX-01: Replaces retired mysql-backup.sh
5: # Runs via cron, keeps 7 days of backups
6: # Heartbeat → Uptime Kuma push monitor (pg-backup-local)
11: CONTAINER="postgres"
12: DB_USER="directus"
13: DB_NAME="directus"
14: BACKUP_DIR="/opt/incomex/backups/pg"
16: KUMA_PUSH_URL="http://localhost:3001/api/push/***" ← TOKEN MASKED
22: echo "[$(date -u ...)] START pg-backup -> ${BACKUP_FILE}"
31: docker exec "$CONTAINER" \
32: pg_dump -U "$DB_USER" -d "$DB_NAME" --no-owner --no-acl \
33: | gzip -9 > "$BACKUP_FILE"
38: echo "[...] ERROR: backup file too small (${FILE_SIZE} bytes)" >&2
43: # Verify gzip integrity
45: echo "[...] ERROR: gzip integrity check failed" >&2
52: # Cleanup old backups
58: curl -fsS "${KUMA_PUSH_URL}?status=up&msg=...&ping=${FILE_SIZE}" >&2
Observations:
- Pipeline pg_dump | gzip > FILE: the script does not check ${PIPESTATUS[0]} for pg_dump's exit code; gzip on empty input still exits 0, so the script only catches the failure through the size check (line 38).
- The heartbeat (line 58) only runs when every check PASSES, so the 2026-04-27 00:00 cron run sent no heartbeat (Kuma will alert "down", but that is a separate detection layer, out of scope).
- The script as written is CORRECT FOR ITS BUSINESS PURPOSE: the fix belongs in DB permissions, not in the script.
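The first observation can be demonstrated without a database. In this sketch the command false stands in for a failing pg_dump (an assumption for illustration only): it writes nothing and exits 1, yet the pipeline as a whole reports gzip's exit status.

```shell
#!/usr/bin/env bash
# `false` stands in for a failing pg_dump: no output, exit status 1.
out="$(mktemp)"

false | gzip -9 > "$out"
statuses=("${PIPESTATUS[@]}")   # copy immediately; the next command overwrites it

producer_rc="${statuses[0]}"    # exit status of the stand-in for pg_dump
gzip_rc="${statuses[1]}"        # also the pipeline's $? when pipefail is off

echo "producer_rc=${producer_rc} gzip_rc=${gzip_rc}"
rm -f "$out"
```

The producer's status is 1 while gzip's is 0, so any check based on the pipeline's own exit status passes. PIPESTATUS is a bash feature, which matches how the report invokes the script (bash /opt/incomex/scripts/pg-backup.sh).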
§3. Task 3: Fresh Backup Run (Governed)
3.1 Preconditions (P1–P6)
| # | Check | Result | Status |
|---|---|---|---|
| P1 | Cron registration | 0 2 * * * /opt/incomex/scripts/pg-backup.sh ... | PASS |
| P2 | NOT world-writable | 755 root:root | PASS |
| P3 | Disk space | 55 GB free / 96 GB | PASS |
| P4 | No concurrent job | pgrep empty | PASS |
| P5 | Postgres container | Running=true Status=running | PASS |
| P6 | PG governed path | Lines 11+31: CONTAINER="postgres", docker exec "$CONTAINER" pg_dump ... (Docker-local via variable, not literal) | PASS |
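Two of these preconditions (P2 and P3) are simple enough to script. The helpers below are a hypothetical sketch, not the commands actually used during the run: not_world_writable tests the "other: write" permission bit with find, and free_kb reads the available space from POSIX-format df output.

```shell
#!/usr/bin/env bash
# P2: succeed only if the path exists and is NOT world-writable.
not_world_writable() {
  local path="$1"
  [ -e "$path" ] || return 2
  # find prints the path only when the "other: write" bit (0002) is set
  [ -z "$(find "$path" -prune -perm -0002)" ]
}

# P3: print the free space (in KiB) of the filesystem holding the path.
free_kb() {
  df -Pk "$1" | awk 'NR==2 {print $4}'
}

# Hypothetical usage with the paths from this report:
#   not_world_writable /opt/incomex/scripts/pg-backup.sh && echo "P2 PASS"
#   free_kb /opt/incomex/backups/pg
```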
3.2 Pre-run diagnostic (read-only, no bypass)
docker exec postgres psql -U directus -d directus -tAc \
"SELECT has_schema_privilege(current_user, 'sandbox_tac', 'USAGE');"
→ f
The diagnostic indicates the run will almost certainly fail. Per the dispatch I still ran the governed script (preconditions PASS) without any bypass; on failure → Hard Stop #5.
3.3 Run result
$ bash /opt/incomex/scripts/pg-backup.sh >> /opt/incomex/backups/pg/backup.log 2>&1
exit_code=1 duration=0s
backup.log tail (masked):
[2026-04-27T13:53:48Z] START pg-backup -> .../directus_2026-04-27_1353.sql.gz
pg_dump: error: query failed: ERROR: permission denied for schema sandbox_tac
pg_dump: detail: Query was: LOCK TABLE ... sandbox_tac.section_type_vocab ...
IN ACCESS SHARE MODE
Side effects observed:
- New file directus_2026-04-27_1353.sql.gz (20 bytes): the script uses an _HHMM suffix, so it did NOT overwrite _0000
- New log entry (same error)
- Heartbeat did not fire (script exited non-zero before line 58)
- DB data/schema: no mutation
§4. Task 4: Integrity Verification
| # | Check | New file directus_2026-04-27_1353.sql.gz | Result |
|---|---|---|---|
| V4-1 | File exists | -rw------- 20 bytes 2026-04-27 15:53 (local time, CEST) | PRESENT |
| V4-2 | gzip -t | exit 0 (empty stream technically valid) | PASS (misleading) |
| V4-3 | PG dump header | (empty zcat output) | FAIL |
| V4-4 | CREATE/COPY count | 0 | FAIL |
| V4-5 | Size > 1 MB / baseline order | 20 bytes vs baseline ~42 MB | FAIL |
| V4-6 | Fresh mtime | 2026-04-27 15:53:48 CEST (~2 min ago) | PASS |
| V4-7 | Baseline comparison | 22-26: 40–42 MB; 27_0000 + 27_1353: 20 B | FAIL |
Verdict: integrity FAIL (4/7 fail; gzip-only check misleading).
→ Hard Stop #5 TRIGGERED. STOP: no retry, no self-repair.
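Checks V4-2 through V4-5 can be combined into one function so that a gzip-only pass can never be mistaken for a good backup. This is a sketch under stated assumptions: verify_dump is a hypothetical name, the dump is plain-format SQL (whose banner contains "PostgreSQL database dump"), and the minimum size is supplied by the caller.

```shell
#!/usr/bin/env bash
# Verify a gzip-compressed plain-format pg_dump file.
# Usage: verify_dump <file> <min-bytes>; returns 0 only if every check passes.
verify_dump() {
  local file="$1" min_bytes="$2" size objects
  [ -f "$file" ] || { echo "FAIL: missing file"; return 1; }
  gzip -t "$file" 2>/dev/null || { echo "FAIL: gzip -t"; return 1; }

  size="$(wc -c < "$file" | tr -d ' ')"
  [ "$size" -ge "$min_bytes" ] || { echo "FAIL: size ${size} < ${min_bytes}"; return 1; }

  # Plain-format dumps start with a comment banner naming the tool.
  gzip -dc "$file" | head -n 5 | grep -q 'PostgreSQL database dump' \
    || { echo "FAIL: no pg_dump header"; return 1; }

  # Count schema/data statements; an empty dump has none.
  objects="$(gzip -dc "$file" | grep -cE '^(CREATE|COPY) ')"
  [ "$objects" -gt 0 ] || { echo "FAIL: no CREATE/COPY statements"; return 1; }

  echo "PASS: size=${size} objects=${objects}"
}

# Hypothetical usage with the 1 MB floor from check V4-5:
#   verify_dump /opt/incomex/backups/pg/directus_2026-04-27_1353.sql.gz 1048576
```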
§5. Task 5: Anomaly #1 (tar lag)
5.1 Directory state
/opt/incomex/backups/vps-backup-20260426_200001/
└── postgresql-directus.sql.gz (20 bytes, mtime 2026-04-26 20:00)
stat: Modify/Change/Birth are all 2026-04-26 20:00:02 → the directory stalled at step 1, with no further files.
5.2 Process check
ps -eo pid,etime,cmd | grep -iE "tar|backup-to-gdrive"
→ (no matching process)
5.3 backup-gdrive.log evidence (masked)
2026-04-25 20:01:53 INFO : vps-backup-20260425_200002.tar.gz: Copied (new)
2026-04-25 20:01:57 BACKUP DONE: ... Archive: 105M PG: 43M | Qdrant: 104M
HEARTBEAT sent to Kuma (pg-backup-gdrive)
==========================================
BACKUP START: 2026-04-26 20:00:01 CEST
==========================================
[1/5] PostgreSQL dump...
pg_dump: error: query failed: ERROR: permission denied for schema sandbox_tac
pg_dump: detail: Query was: LOCK TABLE ... sandbox_tac.* IN ACCESS SHARE MODE
5.4 Classification & root cause
- Classification: (b) Backup job failed before finalization: the process has exited, the staging dir was never tarred, and no Qdrant/archive files exist after step 1.
- Root cause IDENTICAL to Anomaly #2: the directus role lacks USAGE on the sandbox_tac schema. backup-to-gdrive.sh step [1/5] PostgreSQL dump failed → the script aborted before steps [2/5] Qdrant, [3/5] archive, [4/5] tar, [5/5] rclone copy.
- The 2026-04-27 20:00 CEST (18:00 UTC) cron run of backup-to-gdrive.sh has not run yet (current time 13:53 UTC; pending). When it runs, it will fail the same way.
Did NOT clean up directory vps-backup-20260426_200001/ (per Hard Exclusion #10).
§6. Recovery Path Recommendation
This is a proposal for the orchestrator + GPT R-next; it is NOT self-executed.
6.1 Fix root cause (DDL: REQUIRES separate AUTHORIZATION)
Option A (preferred, minimal blast radius):
GRANT USAGE ON SCHEMA sandbox_tac TO directus;
GRANT SELECT ON ALL TABLES IN SCHEMA sandbox_tac TO directus;
ALTER DEFAULT PRIVILEGES IN SCHEMA sandbox_tac
GRANT SELECT ON TABLES TO directus;
Option B (excludes sandbox_tac from the dump; a workaround, not recommended):
- Edit pg-backup.sh to add --exclude-schema=sandbox_tac → loses the backup of the 8 governance tables.
→ Recommend Option A because: (i) --no-owner --no-acl is already in the script, so the dump does not carry permission state out; (ii) the backup must cover the whole DB; (iii) sandbox_tac holds important governance data.
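Once Option A has been authorized and applied, its effect could be confirmed with a read-only query before re-running any backup. The helper below is a hypothetical sketch (missing_select_sql is not an existing script): it prints SQL that lists tables in a schema on which a role still lacks SELECT, using the built-in has_table_privilege function.

```shell
#!/usr/bin/env bash
# Print SQL listing tables in a schema on which a role still lacks SELECT.
# Both arguments are interpolated as-is, so pass only trusted values.
missing_select_sql() {
  local role="$1" schema="$2"
  cat <<SQL
SELECT c.relname
FROM pg_class AS c
JOIN pg_namespace AS n ON n.oid = c.relnamespace
WHERE n.nspname = '${schema}'
  AND c.relkind IN ('r', 'p')
  AND NOT has_table_privilege('${role}', c.oid, 'SELECT')
ORDER BY c.relname;
SQL
}

# Hypothetical usage after the GRANT has been applied:
#   missing_select_sql directus sandbox_tac | docker exec -i postgres psql -U directus -d directus -tA
```

An empty result, together with has_schema_privilege returning t, would suggest the LOCK TABLE step should no longer be rejected. One caveat on the third statement of Option A: ALTER DEFAULT PRIVILEGES without a FOR ROLE clause only affects objects later created by the role that runs the statement, so it should be run as (or FOR) whichever role creates tables in sandbox_tac.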
6.2 Sequence after the fix
- User authorizes Option A (DDL).
- Apply the GRANT in a separate session (out of scope for the current dispatch).
- Re-run pg-backup.sh → expect a ~42 MB file, integrity PASS.
- Re-run backup-to-gdrive.sh → expect tar + rclone upload to succeed.
- Only then can PF-07 v0.5 (lag window 30h) PASS.
- Only then can a G6 retry be authorized.
6.3 Anti-regression (out-of-scope dispatch)
- Patch pg-backup.sh to check ${PIPESTATUS[0]} of pg_dump (catch the failure even though gzip succeeds on empty input).
- Add a Kuma "down" alert path when pg_dump fails (currently the heartbeat just goes silent → Kuma triggers only after X minutes).
- Standardize in S178: every newly created schema must GRANT USAGE/SELECT to the dump role.
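The first two items could look roughly like the sketch below. It is not a patch for pg-backup.sh: guarded_dump and notify_kuma_down are hypothetical names, KUMA_PUSH_URL is assumed to be set as in the existing script, and the push monitor is assumed to accept status=down in the same way it accepts the status=up the script already sends.

```shell
#!/usr/bin/env bash
# Report a failed run to the push monitor (assumes KUMA_PUSH_URL is set).
notify_kuma_down() {
  local msg="$1"
  curl -fsS -G "${KUMA_PUSH_URL}" \
    --data-urlencode "status=down" \
    --data-urlencode "msg=${msg}" >/dev/null
}

# Run a dump command, compress its output, and fail if the dump itself fails.
# Usage: guarded_dump <output-file> <command> [args...]
guarded_dump() {
  local out="$1" dump_rc
  shift
  "$@" | gzip -9 > "$out"
  dump_rc="${PIPESTATUS[0]}"          # status of the dump, not of gzip
  if [ "$dump_rc" -ne 0 ]; then
    notify_kuma_down "pg_dump exit ${dump_rc}" || true
    return "$dump_rc"
  fi
  return 0
}

# Hypothetical usage mirroring lines 31-33 of the script:
#   guarded_dump "$BACKUP_FILE" docker exec "$CONTAINER" \
#     pg_dump -U "$DB_USER" -d "$DB_NAME" --no-owner --no-acl
```

Passing the dump command as arguments keeps the wrapper testable with any stand-in command, and the explicit "down" push turns a silent missed heartbeat into an immediate alert.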
§7. Compliance Confirm
- ✅ VPS context (vmi3080463), NOT Mac local
- ✅ Tasks 1, 2, 4, 5 read-only; Task 3's only side effects came from the governed script (one 20-byte file + one log entry, 0 heartbeat)
- ✅ no DDL · no DML · no SCHEMA mutation · no DB data mutation
- ✅ no script edit · no cron edit · no systemd edit · no rclone destination change
- ✅ no cat/head/tail/less of the whole script (only grep -nE)
- ✅ no rclone config show / cat rclone.conf
- ✅ no backup payload download
- ✅ no backup deletion (incl. broken 20-byte files, kept as evidence)
- ✅ no anomaly #1 cleanup
- ✅ no sudo interactive
- ✅ no G6 retry / no PF-07 patching wrapper
- ✅ no git commit / git push
Secret hygiene scan (pre-upload)
Patterns scanned in the report:
- password= / PASSWORD= → 0 hits
- token= / TOKEN= / Authorization: Bearer → 0 hits
- Kuma URL containing a token → masked *** (1 occurrence in §2.2 line 16)
- rclone client_secret / refresh_token → 0 hits (rclone.conf was never read)
- postgres://user:pass@ → 0 hits (only the plain value DB_USER="directus", not a credential string)
- API keys (sk-, ghp_, gh_pat_, JWT eyJ) → 0 hits
- DB password literal → 0 hits (the script contains none; the dump runs through docker exec using local socket authentication inside the container, so no password is passed)
→ Hygiene scan PASS, 0 leaks.
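A scan of this kind can be approximated with a single extended regular expression. The function below is a hypothetical sketch, not the scanner that produced the result above; the exact patterns and minimum lengths are assumptions chosen so that masked values such as *** do not count as hits.

```shell
#!/usr/bin/env bash
# Pre-upload scan: print matching lines (with numbers) and return 1 on any hit.
scan_secrets() {
  local file="$1"
  local pattern='(password|PASSWORD|token|TOKEN)=[^*[:space:]]'
  pattern+='|Authorization: Bearer|client_secret|refresh_token'
  pattern+='|postgres://[^:/@[:space:]]+:[^@[:space:]]+@'
  pattern+='|sk-[A-Za-z0-9]{16,}|ghp_[A-Za-z0-9]|gh_pat_|eyJ[A-Za-z0-9_-]{10,}'
  if grep -nE -- "$pattern" "$file"; then
    return 1
  fi
  return 0
}

# Hypothetical usage on a report file before upload:
#   scan_secrets report.md && echo "hygiene PASS"
```

A document that names the patterns themselves, as this section does, may trip such a scan, so hits still need a manual look.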
§8. Hand-off
STOP HERE. Investigation + governed verification done. Both anomalies documented + root cause proven (single shared cause).
Awaiting orchestrator + GPT review:
- Authorize Option A GRANT (separate dispatch, DDL gate)
- After the fix → re-run pg-backup.sh + backup-to-gdrive.sh outside cron to shorten the restore window
- After backup integrity PASS → PF-07 v0.5 wrapper + G6 retry chain
- Cleanup /opt/incomex/backups/pg/{directus_2026-04-27_0000,directus_2026-04-27_1353}.sql.gz + dir vps-backup-20260426_200001/ once a fresh successful backup exists (separate dispatch)
Report · 2026-04-27 · Claude Code VPS executor · medium effort · Hard Stop #5 (integrity FAIL) triggered per procedure · GPT R14+R15+R16 chain