VPS Health Investigation Report
Date: 2026-04-08
Mode: READ-ONLY
Target: Incomex VPS 38.242.240.89
Observation window: current state at 2026-04-08T13:23:13+02:00 on VPS, plus logs from 2026-04-02 to 2026-04-08
Step 0. Mandatory Checkpoint
Files and docs read
- .claude/skills/incomex-rules.md
- search_knowledge("operating rules SSOT")
- search_knowledge("vps operating rules architecture health reports")
- search_knowledge("hien phap v4.0 constitution")
- KB document knowledge/dev/ssot/vps/vps-operating-rules.md v1.0, rev 2
- KB document knowledge/dev/ssot/vps/vps-architecture.md v2.0, rev 4
- KB document knowledge/dev/laws/constitution.md v4.4.0, rev 10
The 3 Manifesto questions
- Permanent? This mission fixes nothing. The permanent output is an evidence-backed diagnosis that clearly separates active, recurring, and noise issues, plus a prioritized fix order for the next session.
- Can it go wrong? No. All collection stays read-only: no restart, no rebuild, no redeploy, no config edits, no chmod/chown, no delete/truncate/rotate.
- 100% automated? Every conclusion is drawn from at least 4 independent sources: system/journal, docker/container, nginx/http/ssl, cron/backup/app logs.
Step 1. Scope and Investigation Method
Objective
Investigate VPS health over the last 3-7 days to answer 4 questions:
- Which errors are active at check time
- Which errors have been recurring over the last 3-7 days
- Which errors are only warnings/noise
- What must be fixed first to keep the VPS stable
4 mandatory evidence sources
- System / journal
  systemctl --failed
  journalctl -u logrotate --since "7 days ago"
  journalctl --since "7 days ago" -p 0..4 --no-pager
- Docker / container
  docker ps
  docker inspect ... health/status
  docker logs incomex-nginx|incomex-directus|incomex-agent-data
- Nginx / HTTP / SSL
  curl -sI https://vps.incomexsaigoncorp.vn/
  curl -sS https://vps.incomexsaigoncorp.vn/api/health
  openssl x509 -in /etc/letsencrypt/live/vps.incomexsaigoncorp.vn/cert.pem -noout -dates -issuer -subject
- Cron / backup / app logs
  crontab -l
  tail /opt/workflow/postgres/backup.log
  tail /opt/incomex/logs/integrity/cron.log
  tail /opt/incomex/logs/integrity/watchdog.log
  tail /opt/incomex/logs/backup-gdrive.log
  tail /var/log/mcp-health.log
Step 2. Current Snapshot
Host
Command:
ssh root@38.242.240.89 'date -Is; uptime; hostnamectl --static; df -h /; free -m; systemctl --failed --no-legend || true'
Output excerpt:
2026-04-08T13:23:13+02:00
13:23:13 up 55 days, 2:06, 5 users, load average: 0.52, 0.61, 0.59
vmi3080463
/dev/sda1 96G 37G 60G 38% /
Mem: 11960 total, 2931 used, 6243 free
cloud-init.service failed
logrotate.service failed
systemd-networkd-wait-online.service failed
Docker
Command:
ssh root@38.242.240.89 'docker ps --format "table {{.Names}}\t{{.Image}}\t{{.Status}}"; echo; docker inspect incomex-nuxt incomex-directus incomex-agent-data incomex-nginx postgres --format "{{.Name}} health={{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}} status={{.State.Status}} started={{.State.StartedAt}}"'
Output excerpt:
uptime-kuma Up 2 hours (healthy)
incomex-agent-data Up 45 hours (healthy)
postgres Up 2 weeks (healthy)
incomex-nginx Up 3 weeks
incomex-nuxt Up 5 days (healthy)
incomex-directus Up 10 days (healthy)
incomex-qdrant Up 3 weeks (healthy)
/incomex-nuxt health=healthy status=running
/incomex-directus health=healthy status=running
/incomex-agent-data health=healthy status=running
/incomex-nginx health=none status=running
/postgres health=healthy status=running
Public HTTP + SSL
Command:
curl -sI https://vps.incomexsaigoncorp.vn/
curl -sS https://vps.incomexsaigoncorp.vn/api/health
ssh root@38.242.240.89 'openssl x509 -in /etc/letsencrypt/live/vps.incomexsaigoncorp.vn/cert.pem -noout -dates -issuer -subject'
Output excerpt:
HTTP/1.1 200 OK
Server: nginx/1.29.5
Strict-Transport-Security: max-age=31536000; includeSubDomains
Content-Security-Policy: default-src 'self'; ...
{"status":"healthy","version":"0.1.0","langroid_available":true,"services":{"qdrant":{"status":"ok","latency_ms":10.4},"postgres":{"status":"ok","latency_ms":0.7},"openai":{"status":"ok","latency_ms":0.0}},"service_count":3,"data_integrity":{"document_count":914,"vector_point_count":1502,"ratio":1.64,"sync_status":"ok"}}
notBefore=Feb 18 08:02:06 2026 GMT
notAfter=May 19 08:02:05 2026 GMT
issuer=C = US, O = Let's Encrypt, CN = E7
subject=CN = vps.incomexsaigoncorp.vn
Cron / backup snapshot
Command:
ssh root@38.242.240.89 'crontab -l'
Output excerpt:
0 2 * * * /opt/workflow/postgres/backup.sh >> /opt/workflow/postgres/backup.log 2>&1
0 */6 * * * /opt/incomex/deploys/web-test/scripts/integrity/cron-integrity.sh >> /opt/incomex/logs/integrity/cron.log 2>&1
0 * * * * /opt/incomex/deploys/web-test/scripts/integrity/watchdog-monitor.sh >> /opt/incomex/logs/integrity/watchdog.log 2>&1
0 20 * * * /opt/incomex/scripts/backup-to-gdrive.sh >> /opt/incomex/logs/backup-gdrive.log 2>&1
0 2 * * * /opt/incomex/scripts/pg-backup.sh >> /opt/incomex/backups/pg/backup.log 2>&1
One-sentence summary
The host and public surface are currently healthy, but the underlying operations layer is not yet stable: 2 issues are active at check time, 1 legacy daily backup failure keeps recurring, and a cluster of warnings/noise is degrading observability.
Step 3. Quick Conclusions
Verdict
- Host resources OK: CPU/load, RAM, and disk all have headroom.
- Service surface OK: 7/7 containers are Up, api/health is healthy, the homepage returns 200, and SSL is valid until 2026-05-19.
- Operations layer not OK: 3 issues need priority:
  - logrotate.service has failed daily from 2026-04-02 to 2026-04-08, and at check time there is a flood of rsyslog omfile suspended messages
  - The Dieu 31 (Article 31) cron still fails every 6h because AGENT_DATA_URL is missing; the watchdog is stale at 150h
  - The workflow backup cron still calls the retired container workflow-postgres every night at 02:00
- Positive: the GDrive backup at 2026-04-08 12:22:41 CEST succeeded; the public app and agent-data health are both green.
Most important logs in the last 7 days
- journalctl -u logrotate --since "7 days ago"
- /opt/incomex/logs/integrity/cron.log
- /opt/incomex/logs/integrity/watchdog.log
- /opt/workflow/postgres/backup.log
- /opt/incomex/logs/backup-gdrive.log
Step 4. Issue Classification
4.1 Active Now
A1. System logging degradation: logrotate fail + rsyslog omfile suspend flood
- Component: logrotate.service, rsyslogd, /var/log/mcp-health.log, /var/log/config-integrity.log
- Symptom: logrotate.service is in the failed state; at check time the journal continuously repeats action-0-builtin:omfile suspended
- Evidence:
Apr 08 00:00:02 logrotate: error: incomex:8 duplicate log entry for /var/log/config-integrity.log
Apr 08 00:00:02 logrotate: error: mcp-health:1 duplicate log entry for /var/log/mcp-health.log
Apr 08 00:00:03 systemd[1]: Failed to start logrotate.service - Rotate log files.
Apr 08 13:23:19 rsyslogd: action 'action-0-builtin:omfile' suspended ...
Apr 08 13:23:19 rsyslogd: action 'action-0-builtin:omfile' ... next retry is Wed Apr 8 13:23:49 2026
/var/log/mcp-health.log size=40320
/var/log/config-integrity.log size=0
- Frequency: logrotate fails every day at 00:00; the rsyslog suspension is occurring at check time
- Impact: rotation/logging is unstable; this can easily mask other errors and reduces observability
- Confidence: High
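The duplicate log entry errors above mean that the same log path is declared in more than one logrotate stanza. A minimal read-only sketch for locating such paths is shown below. The function name find_duplicate_log_paths is hypothetical, and the parser is deliberately simple: it assumes each stanza lists its paths before an opening brace, does not expand glob patterns, and does not handle braces inside postrotate scripts.

```shell
#!/bin/sh
# find_duplicate_log_paths FILE...
# Prints every log path that is declared in more than one stanza,
# followed by the files that declare it. Reads the files only.
find_duplicate_log_paths() {
  awk '
    FNR == 1 { depth = 0 }              # new file: reset brace depth
    /^[ \t]*#/ { next }                 # skip comment lines
    {
      for (i = 1; i <= NF; i++) {
        tok = $i
        opens = 0
        if (tok ~ /[{]$/) { opens = 1; sub(/[{]$/, "", tok) }
        if (tok == "}") { if (depth > 0) depth--; continue }
        # Outside any stanza body, a token starting with "/" is a log path.
        if (depth == 0 && substr(tok, 1, 1) == "/") {
          count[tok]++
          where[tok] = where[tok] " " FILENAME
        }
        if (opens) depth++
      }
    }
    END {
      for (p in count) if (count[p] > 1) print p ":" where[p]
    }
  ' "$@" | sort
}

# Possible read-only usage on the VPS:
#   find_duplicate_log_paths /etc/logrotate.d/*
```

On the VPS this could be complemented by logrotate -d /etc/logrotate.conf, which runs logrotate in debug mode without changing the logs or the state file.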
A2. Dieu 31 (Article 31) is critical at check time
- Component: cron-integrity.sh, watchdog-monitor.sh, sync-check.js
- Symptom: the Dieu 31 cron fails every 6h; the watchdog alerts continuously because the runner has not run since 2026-04-02
- Evidence:
=== Dieu 31 Cron — cron-20260408-000001 ===
Error: Missing required env: AGENT_DATA_URL
=== Dieu 31 Cron — cron-20260408-060002 ===
Error: Missing required env: AGENT_DATA_URL
=== Dieu 31 Cron — cron-20260408-120001 ===
Error: Missing required env: AGENT_DATA_URL
WATCHDOG ALERT: Runner Dieu 31 khong chay tu 2026-04-02T04:01:15.687Z (150h ago, max 93600s)
- Frequency: every 6h for the cron, every 1h for the watchdog
- Impact: the high-level integrity scanner is lost; this failure directly hits the system's ability to detect its own problems
- Confidence: High
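The Missing required env error suggests the runner starts without AGENT_DATA_URL in its environment. The root cause was not traced in this session; one common cause is that cron starts jobs with a minimal environment. A minimal sketch of a fail-fast guard that a wrapper script could run before the scanner follows. The function name require_env is hypothetical, and the message format simply mirrors the line seen in cron.log.

```shell
#!/bin/sh
# require_env NAME...
# Returns 1 and names every listed variable that is missing from the
# exported environment or is empty. Changes nothing.
require_env() {
  _missing=0
  for _name in "$@"; do
    if [ -z "$(printenv "$_name")" ]; then
      echo "Missing required env: $_name" >&2
      _missing=1
    fi
  done
  return "$_missing"
}

# Possible usage at the top of a cron wrapper (hypothetical):
#   require_env AGENT_DATA_URL || exit 1
```

Checking the exported environment, rather than shell variables, matches what a child process such as a Node script would actually see.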
4.2 Recurring 3-7d
R1. Legacy workflow backup cron still fails every night
- Component: /opt/workflow/postgres/backup.sh, root crontab
- Symptom: the old backup chain still calls the retired container workflow-postgres
- Evidence:
0 2 * * * /opt/workflow/postgres/backup.sh >> /opt/workflow/postgres/backup.log 2>&1
[2026-04-02T00:00:01Z] START backup -> /opt/workflow/postgres/backups/workflow_20260402T000001Z.sql.gz
Error response from daemon: No such container: workflow-postgres
...
[2026-04-08T00:00:02Z] START backup -> /opt/workflow/postgres/backups/workflow_20260408T000002Z.sql.gz
Error response from daemon: No such container: workflow-postgres
- Frequency: daily at 02:00
- Impact: the backup status picture is muddied; it is easy to wrongly conclude that production backup still has a broken chain running in parallel
- Confidence: High
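To confirm the recurrence day by day rather than from a log tail, the failed runs can be summarized from the backup log. The sketch below assumes the two-line pattern shown in the evidence (a START line followed by the Docker error); the function name summarize_backup_failures is hypothetical.

```shell
#!/bin/sh
# summarize_backup_failures LOGFILE
# Prints one line per failed run: <date> FAILED <missing container>.
summarize_backup_failures() {
  awk '
    # "[2026-04-02T00:00:01Z]" -> "2026-04-02"
    /START backup/      { day = substr($1, 2, 10) }
    # The last field of the Docker error is the container name.
    /No such container/ { print day, "FAILED", $NF }
  ' "$1"
}

# Possible read-only usage on the VPS:
#   summarize_backup_failures /opt/workflow/postgres/backup.log
```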
R2. Public surface is still green, but the homepage probe generates repeated nginx warnings
- Component: incomex-nginx, upstream Nuxt homepage
- Symptom: nginx repeats the warning upstream response is buffered to a temporary file when a monitor or scanner hits /
- Evidence:
[warn] an upstream response is buffered to a temporary file /var/cache/nginx/proxy_temp/... while reading upstream, request: "GET / HTTP/1.1"
- Frequency: repeated many times in the log tail for 2026-04-08
- Impact: not an outage at present; leans toward a tuning/performance warning
- Confidence: Medium-high
4.3 Warning / Noise
W1. cloud-init.service failed, but it is only a stale boot-time trace
- Component: cloud-init.service
- Symptom: the service appears in systemctl --failed
- Evidence:
Active: failed since Thu 2026-02-12 10:16:52 CET; 1 month 24 days ago
- Frequency: no sign of recurrence in the last 7 days
- Impact: no runtime impact observed at present
- Confidence: High
W2. systemd-networkd-wait-online.service failed, but it is only a stale boot-time trace
- Component: systemd-networkd-wait-online.service
- Symptom: the service appears in systemctl --failed
- Evidence:
Active: failed since Thu 2026-02-12 10:19:52 CET; 1 month 24 days ago
Timeout occurred while waiting for network connectivity.
- Frequency: no recurrence in the last 7 days
- Impact: no runtime impact observed at present
- Confidence: High
W3. UFW blocks and internet scans are normal noise
- Component: kernel / UFW
- Symptom: the journal contains many [UFW BLOCK] lines
- Evidence:
Apr 08 13:23:06 kernel: [UFW BLOCK] ... DPT=29842 ...
Apr 08 13:23:19 kernel: [UFW BLOCK] ... DPT=3001 ...
- Frequency: continuous in the journal
- Impact: currently a sign that the firewall is blocking scans, not an outage
- Confidence: High
W4. mcp-health is green; not an issue
- Component: /var/log/mcp-health.log
- Symptom: the log file is contested by logrotate, but the health test itself is passing
- Evidence:
=== MCP Connectivity Test ===
1. MCP handshake (initialize)... PASS
2. MCP tools/list... PASS (11 tools)
3. Agent Data /health... PASS
4. Directus health... PASS
5. OPS Proxy (tasks endpoint)... PASS
- Frequency: every 5 minutes
- Impact: this is side-channel evidence that the MCP path is still alive
- Confidence: High
4.4 Unknowns / Uncertain
U1. The new PG backup shows signs of having been fixed, but there is no overnight evidence yet
- Component: /opt/incomex/scripts/pg-backup.sh, /opt/incomex/backups/pg
- Symptom: 4 backup files of 35M were created on 2026-04-08, but /opt/incomex/backups/pg/backup.log is currently empty
- Evidence:
directus_2026-04-08_0922.sql.gz 35M
directus_2026-04-08_0926.sql.gz 35M
directus_2026-04-08_1019.sql.gz 35M
directus_2026-04-08_1021.sql.gz 35M
- Why uncertain: there is no evidence yet for the next nightly cron run; these files may come from manual runs or a daytime hotfix
- Impact: the new backup problem cannot be fully closed yet
- Confidence: Medium
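Closing U1 needs proof that a dump was produced by the nightly cron run and not by hand. A minimal sketch of that check is shown below. It assumes the directus_<date>_<HHMM>.sql.gz naming seen in the evidence and assumes the timestamp in the file name follows the server's local time, which this session did not verify; the function name list_backups_for is hypothetical.

```shell
#!/bin/sh
# list_backups_for DIR DAY HOUR
# Prints the non-empty dumps in DIR named directus_<DAY>_<HOUR>MM.sql.gz.
# Prints nothing when no such dump exists. Reads the directory only.
list_backups_for() {
  for _f in "$1"/directus_"$2"_"$3"[0-9][0-9].sql.gz; do
    # An unmatched glob stays literal, so -s also filters that case out.
    [ -s "$_f" ] && basename "$_f"
  done
  return 0
}

# Possible read-only usage after the next nightly run:
#   list_backups_for /opt/incomex/backups/pg 2026-04-09 02
```

An empty result for the 02 hour, combined with an empty backup.log, would indicate that the nightly cron did not produce a dump.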
U2. No new evidence found to conclude that the Git VPS->GH error is still active
- Component: /var/log/git-push-gh.log
- Symptom: in this session the log file returned no new error lines; only the success of backup-gdrive.log could be confirmed
- Evidence:
===== git push =====
(no recent output in current check)
- Why uncertain: a previous report had seen FATAL: no token, but this check found no new evidence in the file that was read
- Impact: not placed in the active/recurring groups of this report
- Confidence: Medium-low
U3. GOOGLE_APPLICATION_CREDENTIALS on agent-data appears to be valid, but this is an inference
- Component: incomex-agent-data
- Evidence:
GOOGLE_APPLICATION_CREDENTIALS=/app/credentials/google-credentials.json
local|7
- Why uncertain: the evidence leans strongly toward GSM-only, but this session did not trace the full runtime code
- Impact: not treated as a current health issue
- Confidence: Medium
Step 5. Prioritized Recommendations
Fix today
- Restore logrotate and stop the rsyslog flood
  - File/area: /etc/logrotate.d/incomex, /etc/logrotate.d/mcp-health, /etc/logrotate.d/config-integrity
  - Action: remove the duplicate log entries so that logrotate.service returns to green
  - Risk if not fixed: log rotation and log observability keep degrading
  - Rollback: restore the logrotate files as they were before the change
- Fix the Dieu 31 runner env + watchdog
  - File/area: cron-integrity.sh, env for AGENT_DATA_URL, watchdog expectations
  - Action: provide the env again and confirm that the runner runs in the next cycle
  - Risk if not fixed: the high-level integrity scanner stays lost
  - Rollback: revert the env/script to the previous commit if the runner fix causes side effects
- Pull the legacy workflow backup out of the active runtime
  - File/area: root crontab, /opt/workflow/postgres/backup.sh
  - Action: if the new PG backup is the SSOT, remove the legacy cron chain or archive it out of the runtime path
  - Risk if not fixed: a failing backup keeps running every night and muddies the recovery picture
  - Rollback: keep an archived copy of the old script outside the active crontab
Fix within 1-3 days
- Confirm the new PG backup with a real overnight run
  - Evidence is needed for the 2026-04-09 02:00 run, together with a non-empty backup log
- Evaluate the homepage performance warning
  - Check why probes of / generate repeated proxy_temp warnings
  - Ranked below the 3 issues above only because the surface currently still returns 200
- Clean up the boot-only failed units if a clean system dashboard is wanted
  - cloud-init.service
  - systemd-networkd-wait-online.service
  - Do this only after confirming that they really are stale noise
Continue monitoring
- Monitor mcp-health.log after the logrotate fix to make sure rsyslog no longer suspends
- Monitor watchdog.log after the Dieu 31 fix so that the stale hours reset to normal
- Monitor backup-gdrive.log, because it is currently the most clearly green backup channel in this session
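For the watchdog follow-up, the alert compares the runner's age against a limit of 93600s, which is 26h, while the observed age was 150h. The arithmetic can be sketched as below. The function name runner_staleness is hypothetical, and the real logic inside watchdog-monitor.sh was not read in this session.

```shell
#!/bin/sh
# runner_staleness LAST_EPOCH NOW_EPOCH MAX_AGE_SECONDS
# Prints the age in whole hours and returns 1 when the age exceeds the limit.
runner_staleness() {
  _age=$(( $2 - $1 ))
  _hours=$(( _age / 3600 ))
  if [ "$_age" -gt "$3" ]; then
    echo "STALE ${_hours}h (max ${3}s)"
    return 1
  fi
  echo "OK ${_hours}h (max ${3}s)"
}

# Possible usage, assuming GNU date for the ISO timestamp conversion:
#   last=$(date -u -d "2026-04-02T04:01:15Z" +%s)
#   runner_staleness "$last" "$(date -u +%s)" 93600
```

After a successful fix, the first watchdog run following the next 6h cron cycle should report an age well under the 26h limit.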
Step 6. Appendix
Main commands run
sed -n '1,260p' .claude/skills/incomex-rules.md
search_knowledge("operating rules SSOT")
search_knowledge("vps operating rules architecture health reports")
search_knowledge("hien phap v4.0 constitution")
ssh root@38.242.240.89 'date -Is; uptime; hostnamectl --static; df -h /; free -m; systemctl --failed --no-legend || true'
ssh root@38.242.240.89 'journalctl --since "7 days ago" -p 0..4 --no-pager | tail -n 200'
ssh root@38.242.240.89 'journalctl -u logrotate --since "7 days ago" --no-pager | tail -n 120'
ssh root@38.242.240.89 'systemctl status cloud-init.service --no-pager | sed -n "1,30p"'
ssh root@38.242.240.89 'systemctl status systemd-networkd-wait-online.service --no-pager | sed -n "1,30p"'
ssh root@38.242.240.89 'docker ps --format ...'
ssh root@38.242.240.89 'docker inspect incomex-nuxt incomex-directus incomex-agent-data incomex-nginx postgres --format ...'
ssh root@38.242.240.89 'docker logs --tail 80 incomex-nginx'
ssh root@38.242.240.89 'docker logs --tail 80 incomex-directus'
ssh root@38.242.240.89 'docker logs --tail 80 incomex-agent-data'
curl -sI https://vps.incomexsaigoncorp.vn/
curl -sS https://vps.incomexsaigoncorp.vn/api/health
ssh root@38.242.240.89 'openssl x509 -in /etc/letsencrypt/live/vps.incomexsaigoncorp.vn/cert.pem -noout -dates -issuer -subject'
ssh root@38.242.240.89 'crontab -l'
ssh root@38.242.240.89 'tail -n 60 /opt/workflow/postgres/backup.log'
ssh root@38.242.240.89 'tail -n 60 /opt/incomex/logs/integrity/cron.log'
ssh root@38.242.240.89 'tail -n 60 /opt/incomex/logs/integrity/watchdog.log'
ssh root@38.242.240.89 'tail -n 40 /opt/incomex/logs/backup-gdrive.log'
ssh root@38.242.240.89 'tail -n 20 /var/log/mcp-health.log'
ssh root@38.242.240.89 'ls -lah /opt/incomex/backups/pg /opt/workflow/postgres /opt/incomex/logs'
ssh root@38.242.240.89 'docker exec postgres sh -lc "... SELECT storage, count(*) FROM public.directus_files GROUP BY 1 ..."'
ssh root@38.242.240.89 'docker inspect incomex-agent-data --format "{{range .Config.Env}}{{println .}}{{end}}" | grep -E "^GOOGLE_APPLICATION_CREDENTIALS" || true'
Logs read
- journalctl -u logrotate
- journalctl --since "7 days ago" -p 0..4
- /opt/workflow/postgres/backup.log
- /opt/incomex/logs/integrity/cron.log
- /opt/incomex/logs/integrity/watchdog.log
- /opt/incomex/logs/backup-gdrive.log
- /var/log/mcp-health.log
- docker logs incomex-nginx
- docker logs incomex-directus
- docker logs incomex-agent-data
Containers / services checked
- uptime-kuma
- incomex-agent-data
- postgres
- incomex-nginx
- incomex-nuxt
- incomex-directus
- incomex-qdrant
- logrotate.service
- cloud-init.service
- systemd-networkd-wait-online.service
Final conclusion
The VPS can still serve production, but it has not reached a stable operating state. The two issues active at check time are:
- logrotate failure + rsyslog suspend flood
- Dieu 31 watchdog critical because the runner fails continuously
The most important recurring error in the last 7 days is the legacy workflow backup cron, which still calls workflow-postgres every night. On the other hand, the GDrive backup channel and the health endpoints are green, so the system has not fallen into an outage, but observability and integrity are weak.