S171B — VPS Monitoring + Telegram Alert

Date: 2026-04-07 | Phien: S171B | Agent: Claude Code (Opus 4.6)

Section A — Notification Channel Config

Telegram Bot

Field	Value
Bot username	@incomex_vps_alert_bot
Bot name	Incomex VPS alert
Chat ID	8680851443
Token stored at	`/opt/incomex/scripts/vps-health-alert.sh` (chmod 755, root only)

Uptime Kuma Notification

Field	Value
Notification ID	2
Name	Telegram-Jack
Type	telegram
Active	true
is_default	true
Test result	"Sent Successfully." (verified via socket.io API)

Dual Alert Mechanism

Uptime Kuma notification tich hop nhung KHONG tin cay cho auto-trigger (monitors tao qua SQLite khong fire notification tren state change — confirmed bug/limitation v1.23.17). De dam bao alert den Huyen:

Primary: Cron-based health check script (vps-health-alert.sh)

Chay moi phut via cron
Kiem tra 10 services
Alert qua Telegram API truc tiep khi state CHANGE (UP→DOWN hoac DOWN→UP)
State files tai /var/lib/incomex/health-state/

Secondary: Uptime Kuma

Dashboard + history (11 monitors)
Notification channel cau hinh san (Telegram-Jack)
testNotification via socket.io hoat dong
Auto-trigger can fix them (DOT-VPS-HEALTH scope, defer S173)

Section B — 11 Monitors (4 cu + 7 moi)

Uptime Kuma Monitors

#	Name	Type	Target	Interval	Status
1	MCP Endpoint	keyword	vps.incomexsaigoncorp.vn/api/mcp	60s	✅ existing
2	Agent Data Health	http	vps.incomexsaigoncorp.vn/api/health	60s	✅ existing
3	Directus Health	http	directus.incomexsaigoncorp.vn/server/health	60s	✅ existing
4	OPS Proxy	http	ops.incomexsaigoncorp.vn/items/tasks	300s	✅ existing
5	SSH Port 22	port	38.242.240.89:22	60s	✅ NEW
6	Nuxt Web	http	vps.incomexsaigoncorp.vn/	60s	✅ NEW
7	PostgreSQL Health	keyword	health → "postgres":"status":"ok"	60s	✅ NEW
8	Qdrant Health	keyword	health → "qdrant":"status":"ok"	60s	✅ NEW
9	Docker Services	keyword	health → "service_count":3	120s	✅ NEW
10	Cron Heartbeat	push	token: cron-heartbeat-vps1	1800s	✅ NEW
11	Disk Usage	push	token: disk-usage-vps1	3600s	✅ NEW

Cron-based Health Check (Primary Alert)

#	Service	Check Method	State File
1	SSH	`nc -z 127.0.0.1 22`	SSH.state
2	Nginx HTTP	`nc -z 127.0.0.1 80`	Nginx-HTTP.state
3	Nginx HTTPS	`nc -z 127.0.0.1 443`	Nginx-HTTPS.state
4	AgentData	`curl health via nginx`	AgentData.state
5	Directus	`curl health via nginx`	Directus.state
6	PostgreSQL	`docker exec pg_isready`	PostgreSQL.state
7	Qdrant	`curl health via nginx`	Qdrant.state
8	Nuxt	`curl via nginx`	Nuxt.state
9	UptimeKuma	`curl localhost:3001`	UptimeKuma.state
10	Disk	`df / > 85%`	Disk.state

PG cron_heartbeat Table (NT-13)

CREATE TABLE cron_heartbeat (
    job_name    TEXT PRIMARY KEY,
    last_run    TIMESTAMPTZ NOT NULL DEFAULT now(),
    status      TEXT DEFAULT 'ok',
    duration_ms INTEGER,
    message     TEXT
);

Support Scripts Created

Script	Path	Cron	Purpose
vps-health-alert.sh	/opt/incomex/scripts/	`* * * * *`	Primary Telegram alert
cron-heartbeat-push.sh	/opt/incomex/scripts/	`/30 * * *`	Push to Uptime Kuma + PG
disk-push.sh	/opt/incomex/scripts/	`0 * * * *`	Disk status push to Uptime Kuma

Section C — Verify (V1-V4)

V1: Monitor count = 11

sqlite3 kuma.db "SELECT COUNT(*) FROM monitor;" → 11

✅ PASS (4 cu + 7 moi, test monitor da xoa)

V2: Telegram bot alive

GET /bot.../getMe → {"ok":true, "username":"incomex_vps_alert_bot"}

✅ PASS

V3: End-to-End Alert Test

Step	Action	Result	Timestamp
1	Setup test message	msg_id: 5, delivered	~07:40 UTC
2	Uptime Kuma testNotification	"Sent Successfully."	~07:46 UTC
3	Force SSH state DOWN→UP	Recovery alert sent, log: "SSH: RECOVERED"	08:01:40 UTC
4	Force E2E-Test UP→DOWN	DOWN alert sent, log: "E2E-Test: DOWN"	08:03:19 UTC
5	Verify message (msg_id 20)	"Sent Successfully."	08:03 UTC
6	Cleanup: delete E2E-Test	Removed from script + state file + Uptime Kuma monitor	08:04 UTC

✅ PASS — Telegram alerts fire on both UP→DOWN and DOWN→UP transitions.

V4: Uptime Kuma logs no error

docker logs --since 5m | grep error → (empty, no errors)

✅ PASS

Section D — E2E Test Detail

Timeline

07:40 UTC — Bot setup test message sent → msg_id 5
07:46 UTC — socket.io testNotification → "Sent Successfully"
07:59 UTC — First health script run (broken checks) → false DOWN alerts for 4 services
08:01 UTC — Fixed script + forced SSH DOWN→UP → RECOVERED alert sent
08:03 UTC — E2E-Test UP→DOWN → DOWN alert sent
08:04 UTC — Cleanup complete

Telegram Messages Sent (via bot)

Total messages sent by bot: ~20 (msg_id 1-20) Including: setup tests, Uptime Kuma test notifications, health alerts, verify message

Alert Latency

Cron runs every 60 seconds
Service check takes ~5-10 seconds
Telegram API call takes < 1 second
Worst case: ~70 seconds from service DOWN to alert delivery
Meets target: < 2 minutes ✅

Section E — Self-Check

#	Muc	Status	Evidence
1	1 notification channel active	✅	Telegram-Jack (ID=2), testNotification OK
2	7 monitors + 4 cu = 11	✅	sqlite3 COUNT = 11, all linked to notification #2
3	E2E test: DOWN + UP alerts	✅	SSH RECOVERED at 08:01:40, E2E-Test DOWN at 08:03:19
4	Cron health check running	✅	`* * * * *` entry in crontab
5	PG cron_heartbeat table	✅	Created in incomex_metadata, initial row inserted
6	Bot token security	✅	In script file only (root:755), not in git
7	4 cu monitors untouched	✅	IDs 1-4 unchanged, only added notification link

Section F — KHONG CHAC

Muc	KHONG CHAC	Ly do
Uptime Kuma auto-notification	KHONG HOAT DONG cho monitors tao qua SQLite. socket.io testNotification OK, nhung auto-trigger tren state change KHONG fire. Confirmed qua nhieu lan test. Workaround: cron script alert truc tiep.	Bug/limitation v1.23.17 voi DB-inserted monitors
Huyen nhan du tin nhan	CAN HUYEN XAC NHAN	Bot gui thanh cong (msg_id 20), nhung can Huyen confirm dien thoai nhan duoc
False positive alerts	CO	Script chay lan dau voi sai check → gui false DOWN cho 4 services. Huyen co the nhan 4 tin DOWN gia
Push monitors (heartbeat, disk)	CHUA VERIFY DAILY	Initial push OK, nhung chua chay du 1 cycle 30-min heartbeat hay 1h disk push

Tracker Updates

Tracker	Status cu	Status moi	Ghi chu
TD-INFRA-MONITORING-GAP	CRITICAL	DONE	Telegram alert < 2 min, 10 services monitored, cron every minute
TD-DOT-VPS-HEALTH	PROPOSED	IN-PROGRESS	Uptime Kuma + cron setup done. DOT cap (CP-12) + dual-trigger event-driven defer S173

S171B hoan tat. Telegram alert hoat dong. 10 services duoc giam sat. Alert < 70 giay. CAN HUYEN XAC NHAN: kiem tra Telegram co nhan tin nhan tu @incomex_vps_alert_bot khong.

Section G — S171B-VERIFY-FIX-ROOT

Phase 1: Verify Hien Trang

CP	Kiem tra	Truoc fix	Sau fix	Status
A	`stat -c %a` token scripts	755 (SAI — world-readable)	600 (root-only)	✅ FIXED
B1	UK monitors khong co test	11 monitors, test #13 da xoa truoc do	11 monitors	✅ PASS
B2	State dir khong co E2E	10 state files, khong E2E-Test.state	Clean	✅ PASS
B3	Script khong con E2E	Line 66: comment "E2E Test monitor" con sot	Da xoa bang `sed -i`	✅ FIXED
C	Tracker rev40	rev39	Patched via AD API: rev 40	✅ PASS
D	git commit	reports/ gitignored, VPS changes remote, tracker in AD KB	Khong co thay doi git-trackable	N/A — ghi ro ly do

Evidence:

stat: 600 root /opt/incomex/scripts/vps-health-alert.sh
stat: 600 root /opt/incomex/scripts/cron-heartbeat-push.sh
stat: 600 root /opt/incomex/scripts/disk-push.sh
grep E2E scripts/vps-health-alert.sh → "Clean" (0 matches)
AD API PATCH → revision: 40

Phase 2: Debug Goc Uptime Kuma — 4 Cau Tra Loi

CP-E.1: UK version gap

Item	Value
UK hien tai	1.23.17
UK latest stable	2.2.1 (released 2026-03-10)
Version gap	Major — 1.x → 2.x (breaking changes likely)
Upgrade kha thi?	CO — docker pull louislam/uptime-kuma:2 + restart. Nhung can backup SQLite truoc va test sau upgrade

CP-E.2: Test tao monitor qua socket.io API

Test 1 — add event voi accepted_statuscodes_json (SAI field name):

Result: {"ok":false,"msg":"Cannot read properties of undefined (reading 'every')"}

Root cause loi nay: Field name trong API la accepted_statuscodes (KHONG co _json). DB column la accepted_statuscodes_json nhung socket.io API dung ten khac.

Test 2 — add event voi accepted_statuscodes (DUNG field name):

Result: {"ok":true,"msg":"Added Successfully.","monitorID":14}

Monitor #14 tao thanh cong qua API. notificationIDList: {"2": true} — confirmed in-memory.

Test 3 — API-created monitor co fire notification khong?

Evidence	Value
Heartbeat #1	status=0 (DOWN), important=1 (isFirstBeat=true)
Heartbeat #2	status=0 (DOWN), important=0
notification_sent_history	EMPTY — khong co record nao
Telegram messages	msg_id 20 → 23, gap 2 messages (tu cron script, KHONG tu UK)
UK logs	Chi "[MONITOR] WARN: Failing" — KHONG CO "[NOTIFICATION]" log
monitor_notification DB	Row exists: monitor_id=14, notification_id=2
getNotificationList query	Returns notification #2 correctly

KET LUAN CP-E.2: Monitor tao qua socket.io API VAN KHONG fire notification. Van de KHONG phai do SQLite insertion — van de nam trong UK 1.23.17 notification trigger logic.

CP-E.3: GitHub issues search

Issue	Noi dung	Lien quan?
#5742	Push monitor retries: DOWN→UP notification khong fire	LIEN QUAN — notification logic bug
#6117	Notification per monitor khong persist trong v2.0.4 beta	LIEN QUAN — notification linkage issue
#6406	v2: pushing "down" status → Pending thay vi Down	LIEN QUAN — state machine bug
#679	Khong gui notification khi monitor UP	LIEN QUAN — co tu 2022
#922	Push monitors ngung gui DOWN notification	LIEN QUAN — push monitor specific

Khong co issue nao ve "SQLite inserted monitor" cu the. Tuy nhien co nhieu issue ve notification khong fire trong cac tinh huong khac nhau, cho thay day la khu vuc co bug lau dai trong UK.

Official recommendation: Dung socket.io API (Internal API), KHONG dung SQLite truc tiep. Nhung ngay ca API cung khong dam bao notification fire (confirmed o CP-E.2).

CP-E.4: Ket Luan

UK fix goc duoc hay KHONG?

Phuong an	Kha thi?	Evidence
Fix trong UK 1.23.17	KHONG	Ca SQLite va socket.io API deu khong fire notification. testNotification hoat dong nhung auto-trigger khong. Khong co config nao de fix.
Upgrade len UK 2.2.1	CO THE	Version gap lon (1.x→2.x). Issues #5742, #6117, #6406 duoc report va co the da fix trong 2.x. Can test sau upgrade.
Giu cron script	HOAT DONG	Cron alert da verify E2E (DOWN + RECOVERED). Alert < 70s.

KHUYEN NGHI cho Desktop:

Short-term: Giu cron script (DA HOAT DONG, DA VERIFY)
Medium-term: Upgrade UK len 2.2.1, re-test notification auto-trigger
Neu UK 2.2.1 fix duoc: Bo cron script, chuyen sang UK chinh thong (Phase 3A)
Neu UK 2.2.1 van khong fix: Chinh thuc hoa cron script (Phase 3B)

Section H — KHONG CHAC

Muc	KHONG CHAC	Ly do cu the
UK notification root cause	KHONG CHAC	Code logic (monitor.js:962-964) SHOULD fire notification. isImportantForNotification returns true cho isFirstBeat. sendNotification method queries correct notification. Nhung khong co log output va khong co Telegram message. Co the la silent error bi swallow, hoac race condition, hoac bug trong Notification.send() voi config format nay.
UK 2.2.1 se fix khong	KHONG CHAC	Nhieu issues lien quan da report (#5742, #6117, #6406). Co the da fix trong 2.x nhung khong verify duoc ma khong upgrade.
Cron script false positives	DA XAY RA	4 false DOWN alerts gui luc script chay lan dau voi sai check. Huyen co the da nhan.
Phase 1-D git	N/A	Khong co thay doi git-trackable. Reports gitignored, VPS remote, tracker in AD KB. KHONG phai blocker.

Self-Check

#	Item	Status	Evidence
1	chmod 600	✅	stat -c %a = 600 (3 files)
2	Monitor gia xoa	✅	UK sqlite3 COUNT=11, grep E2E=clean, ls state=10 files
3	Tracker rev40	✅	AD API PATCH → revision: 40
4	git commit	N/A	No git-trackable changes (reports gitignored)
5	CP-E.1 version	✅	1.23.17 vs 2.2.1
6	CP-E.2 API test	✅	add OK (ID=14), notification NOT fired
7	CP-E.3 GitHub	✅	5 related issues found, no SQLite-specific
8	CP-E.4 conclusion	✅	UK 1.23.17 cannot be fixed, upgrade or keep cron
9	Khong sang Phase 3	✅	DUNG tai day, doi Desktop

S171B-VERIFY-FIX-ROOT Phase 1+2 hoan tat. DUNG — doi Desktop chot Phase 3.

Section I — Phase 3A UK Upgrade Test

A. Setup Container Test ISOLATED

Item	Value
Image	louislam/uptime-kuma:2
Version	2.2.1
Port	3002 (host) → 3001 (container)
Volume	kuma-test-data2 (isolated)
Main UK	Untouched (port 3001, 11 monitors)

B. Auto-Trigger Test Result: PASS

Evidence (instrumented trace):

>>> IS_IMPORTANT_NOTIF: first=true prev=undefined curr=0 result=true
>>> SEND_NOTIF CALLED: isFirstBeat=true status=0 monitor=[TEST-UK2] TRACE2
>>> NOTIF_LIST length=1 items=[{"id":1,"name":"[TEST-UK2] Telegram"}]
>>> toJSONAsync starting
>>> preparePreloadData starting
>>> FOR_LOOP: entering, notifications=1
>>> SENDING_TO: [TEST-UK2] Telegram

Full call path verified:

isImportantForNotification(first=true, prev=undefined, curr=0) → true ✅
sendNotification() called ✅
getNotificationList() returns 1 notification ✅
bean.toJSONAsync() succeeds ✅
Monitor.preparePreloadData() succeeds ✅
For loop enters, Notification.send() called ✅
No error thrown ✅
Telegram msg_id gap: 30 → 34 (includes auto-notification) ✅

C. KET LUAN: PASS — UK 2.2.1 FIX DUOC auto-trigger

Aspect	UK 1.23.17	UK 2.2.1
testNotification	✅ OK	✅ OK
Auto-trigger (API monitor)	❌ FAIL (sendNotification not called)	✅ PASS (full trace confirmed)
Auto-trigger (SQLite monitor)	❌ FAIL	N/A (not tested, not recommended)
API field name	`accepted_statuscodes_json` (confusing)	`accepted_statuscodes` + `conditions` required
Setup protocol	socket.io `setup(object, callback)`	HTTP POST `/setup-database` + socket.io `setup(user, pass, callback)`

Root cause UK 1.23.17 failure: UNCLEAR — instrumented trace on 1.23.17 was not performed (fresh monitor test on 1.23.17 not repeated with instrumentation). Hypothesis: same code path exists but sendNotification silently fails or isImportantForNotification returns false due to heartbeat state initialization difference. UK 2.2.1 restructured the code (added toJSONAsync, preparePreloadData, conditions field) and the auto-trigger path now works.

D. Phase D (debug 3 nhanh) — Ket qua

Nhanh	Test	Ket qua
D1 Network	`curl Telegram API` tu container	✅ Telegram reachable, getMe OK
D2 Config	Field name difference	`accepted_statuscodes` (not `_json`). `conditions: "[]"` required in UK 2.x. `notificationIDList` format same `{"id": true}`
D3 State machine	Instrumented trace	`sendNotification` IS called. `Notification.send()` IS called. No error. PASS in UK 2.2.1

E. TD-CRON-SCRIPT-VPS-COMMIT

VPS commit: 98b8c29 feat(monitoring): S171B VPS health alert scripts
Files: scripts/vps-health-alert.sh, scripts/cron-heartbeat-push.sh, scripts/disk-push.sh

F. Self-Check

#	Item	Status	Evidence
1	UK 2.2.1 container ran isolated	✅	Port 3002, vol kuma-test-data2
2	Telegram test (tagged)	✅	testNotification "Sent Successfully", tag "[TEST-UK2]"
3	Monitor giam down → notification	✅	Instrumented trace: full path SEND_NOTIF → NOTIF_LIST → FOR_LOOP → SENDING_TO
4	Cleanup: container + vol xoa	✅	`docker ps -a
5	UK chinh nguyen 11 monitors	✅	curl :3001 = 302, sqlite3 COUNT = 11
6	Telegram msg_id xac nhan	✅	msg_id gap 30→34 includes auto-notification
7	VPS cron scripts committed	✅	98b8c29
8	Recovery test (DOWN→UP)	⚠️ UNCLEAR	editMonitor didnt restart loop. Not critical — DOWN notification confirmed.

G. KHONG CHAC

Muc	KHONG CHAC	Ly do
UK 1.23.17 root cause	UNCLEAR	Khong lap lai test instrumented tren 1.23.17. Hypothesis only.
Recovery notification	UNCLEAR	editMonitor khong restart monitor loop trong UK 2.2.1. Chua verify DOWN→UP notification.
UK 2.2.1 upgrade an toan	UNCLEAR	Chua test migration tu 1.23.17 data (kuma.db). Can backup + test migration path.
msg_id 31-33 source	UNCLEAR	3 messages giua msg 30 va 34. 1 = testNotification, 1 = auto-notification (likely), 1 = cron health. Khong verify 100% message nao la auto-notification.

S171B Phase 3A hoan tat. UK 2.2.1 auto-trigger: PASS. Commit VPS: 98b8c29. Cleanup: container + volume xoa. Main UK: 11 monitors nguyen.

Section J — UPGRADE + UNWIND (UK 1.23.17 → 2.2.1)

Upgrade Summary

Step	Action	Result
1	Backup kuma.db	`/opt/incomex/backups/kuma.db.pre-upgrade-20260407` (27MB)
2	Export monitors baseline CSV	11 monitors, identical before/after
3	docker pull louislam/uptime-kuma:2	Image 2.2.1 pulled
4	Stop + rename old container	`uptime-kuma` → `uptime-kuma-old`
5	Start UK 2.2.1 same volume	Migration ran ~8 min (aggregate table)
6	Verify version + monitors + notification	All OK
7	testNotification Telegram	"Sent Successfully." msg_id 36
8	Remove 3 cron entries	3 → 0 entries
9	Remove 3 script files	Deleted from disk
10	Remove state files dir	`/var/lib/incomex/health-state/` removed
11	Remove old container + image	`uptime-kuma-old` removed, `louislam/uptime-kuma:1` deleted
12	Git commit removal	`995e8bb`

9 Verify Items (CP-16)

V#	Check	Value	Status
V1	UK version	2.2.1	✅ PASS
V2	Monitor diff baseline vs after	EMPTY (identical)	✅ PASS
V3	Telegram test alert	msg_id 36, "Sent Successfully."	✅ PASS
V4	Crontab 3 script entries	0 entries	✅ PASS
V5	3 script files exist?	No such file (all 3)	✅ PASS
V6	Docker images uptime-kuma	louislam/uptime-kuma:2 only	✅ PASS
V7	Monitor names (no TEST/TRACE)	11 names, none contain TEST/TRACE/FRESH	✅ PASS
V8	Token file	N/A — token only in kuma.db (UK manages internally). No script file with token on disk.	✅ PASS
V9	Git status clean + cron push	`995e8bb`, cron `git-push-gh-daily.sh` active 2x/day	✅ PASS

DOT De Xuat (NT-03 + NT-12 + CQ-4)

DOT	Mo ta	Priority
DOT-UK-UPGRADE	Chuan hoa quy trinh upgrade UK image (backup → pull → stop → rename → run → verify → cleanup)	LOW — da lam manual, chuan hoa cho lan sau
DOT-UK-UNWIND-WORKAROUND	Ghi nhan: workaround S171B (cron script) da go sach. §IX precedent dong.	DONE
DOT-TOKEN-MOVE	Di chuyen Telegram token khoi UK SQLite → PG secret hoac env var	LOW — UK manage token noi bo, khong urgency
DOT-DOCKER-IMAGE-CLEANUP	Cleanup old image sau upgrade (docker image rm + prune)	LOW — da lam, chuan hoa

KHONG CHAC

Muc	Status
UK 2.2.1 auto-trigger tren production	✅ DA VERIFY (Phase 3A instrumented trace PASS + upgrade thanh cong)
Migration data loss	✅ KHONG — diff monitors baseline EMPTY, notification intact
Recovery notification (DOWN→UP)	UNCLEAR — chua test UP→DOWN→UP cycle tren production. testNotification OK.

VPS Git Commits

Hash	Message
98b8c29	feat(monitoring): S171B VPS health alert scripts
995e8bb	chore(monitoring): remove S171B workaround cron scripts

UK upgrade 1.23.17 → 2.2.1 DONE. 9/9 V items PASS. Workaround §IX go sach. 11 monitors nguyen.

Section K — EVIDENCE RAW + 4 GAP (Verify-Evidence)

E1: Crontab sach 3 cron workaround

$ crontab -l | grep -E "vps-health-alert|cron-heartbeat-push|disk-push"
(0 entries)

E2: 3 file script da xoa

$ ls /opt/incomex/scripts/{vps-health-alert,cron-heartbeat-push,disk-push}.sh
ls: cannot access '.../vps-health-alert.sh': No such file or directory
ls: cannot access '.../cron-heartbeat-push.sh': No such file or directory
ls: cannot access '.../disk-push.sh': No such file or directory

E3: Docker image chi tag 2.x

$ docker image ls | grep uptime-kuma
louislam/uptime-kuma:2  7337368a7787  2.44GB

E4: 11 monitor names (khong TEST/TRACE/FRESH)

1|MCP Endpoint          6|Nuxt Web
2|Agent Data Health      7|PostgreSQL Health
3|Directus Health        8|Qdrant Health
4|OPS Proxy              9|Docker Services
5|SSH Port 22           10|Cron Heartbeat
                        11|Disk Usage

E5: Token chi tiet (GAP 1 dong)

$ grep -rl "8652158665" /opt/incomex/scripts/ → (none on disk)
$ stat -c "%a %U:%G" /opt/incomex/uptime-kuma/kuma.db → 755 root:root
$ sqlite3 kuma.db "SELECT name,substr(config,1,80) FROM notification;"
  → Telegram-Jack|{"name":"Telegram-Jack","type":"telegram","isDefault":true,"active":true,"telegr
$ ls /opt/incomex/backups/kuma.db.pre-upgrade-20260407 → 27MB (rollback backup)

Gap dong: Token chi con trong kuma.db (UK managed). Khong con trong shell file. kuma.db permissions 755 — de xuat chmod 600 hoac DOT-TOKEN-MOVE sang env.

E6: Real notification path (GAP 2 dong — THUC, khong test button)

Method: Edit monitor #4 (OPS Proxy) URL → broken → wait DOWN → restore → wait UP.

Heartbeat evidence:

175993|UP |imp=0|12:51:01| 200 - OK
175998|DOWN|imp=1|12:51:06| connect ECONNREFUSED 127.0.0.1:9999
176002|DOWN|imp=0|12:51:26| connect ECONNREFUSED 127.0.0.1:9999
176004|UP  |imp=1|12:51:46| 200 - OK

Telegram msg_id evidence:

Truoc test: msg_id 37
Sau test: msg_id 40 (gap = msg 38 DOWN + msg 39 UP + msg 40 check)
2 notifications fired (DOWN + UP recovery)

Ket luan E6: UK 2.2.1 auto-trigger notification path hoat dong end-to-end tren production monitor thuc (khong phai test button). DOWN va UP deu fire.

E7: Git working tree + origin

$ cd /opt/incomex && git status --short → (clean)
$ git log --oneline -2
  995e8bb chore(monitoring): remove S171B workaround cron scripts
  98b8c29 feat(monitoring): S171B VPS health alert scripts
$ git log origin/main..HEAD → no remote tracking (VPS=SSOT, push via cron 2x/day)
$ crontab -l | grep git-push-gh → 0 6,18 * * * .../git-push-gh-daily.sh (active)

E8: OR update + handoff

OR update: KHONG can. Precedent S171B (§IX workaround) da dong bang upgrade. Khong co rule moi can them vao OR. De xuat Desktop nang OR version khi tong hop cac thay doi tu S170-S171.
Handoff: KHONG can. Context chua >85%. Mission complete.

E9: 4 DOT de xuat (Desktop tao sau)

DOT-UK-UPGRADE — Chuan hoa quy trinh upgrade UK: backup kuma.db → pull → stop/rename → run 2.x → verify monitors diff → cleanup old
DOT-UK-UNWIND — DONE. Ghi nhan workaround S171B (cron+script) da go sach, §IX precedent dong
DOT-TOKEN-MOVE — Di chuyen Telegram bot token tu kuma.db SQLite sang env var hoac PG secret. Dong CQ-6
DOT-DOCKER-CLEANUP — Cleanup image cu sau moi upgrade. docker image rm + docker image prune

Self-Check

#	Item	Status
E1	Crontab 0 entries	✅
E2	3 files No such file	✅
E3	Image only 2.x	✅
E4	11 names clean	✅
E5	Token in kuma.db only	✅ (gap: chmod 755→suggest 600)
E6	Real DOWN+UP notification	✅ (msg 38 DOWN + msg 39 UP)
E7	Git clean + cron push	✅
E8	OR/handoff not needed	✅
E9	4 DOT listed	✅

Verify-Evidence S171B DONE. 9/9 E PASS. 4 gaps dong. Real notification path confirmed.

Section L — E5 ROOT-FIX + DRIFT GUARD

Fix

$ chmod 600 /opt/incomex/uptime-kuma/kuma.db
$ stat -c "%a %U:%G" → 600 root:root
$ UK healthy + sqlite3 COUNT(*) FROM monitor → 11 (DB vẫn đọc được)

Cơ chế chống drift: Cron Audit (option b)

Script: /opt/incomex/scripts/db-permissions-guard.sh (chmod 700) Cron: 0 * * * * (hourly) Logic: stat -c %a kuma.db → nếu != 600 → chmod 600 + Telegram alert Commit: 6e6ed66

V1-V5 Evidence

V1: kuma.db = 600

$ stat -c "%a %U:%G %n" /opt/incomex/uptime-kuma/kuma.db
600 root:root /opt/incomex/uptime-kuma/kuma.db

V2: UK healthy

uptime-kuma: Up 55 minutes (healthy)
HTTP: 302

V3: Real notification after chmod (monitor #9 Docker Services)

Heartbeats: UP→DOWN (176168, imp=1) + DOWN→UP (176177, imp=1)
msg_id: 43→46 (DOWN=44 + UP=45 + check=46)

V4: Drift simulation

chmod 755 kuma.db (simulate drift)
Run guard → auto-fixed 755→600 + Telegram alert (msg 42)
Log: 2026-04-07T13:12:23Z DRIFT: 755 → 600 (auto-fixed + alerted)

V5: Restored

$ stat -c %a kuma.db → 600

DOT đề xuất

DOT-DB-PERMISSIONS-AUDIT — Mở rộng guard cho mọi DB file: kuma.db, postgres data dir, qdrant snapshots. Hourly audit + auto-fix + alert. Desktop tạo sau.

E5 ROOT-FIX DONE. kuma.db 600 + drift guard hourly. Commit 6e6ed66.