KB-2B10
S178-Fix30 Wiring Fix: heartbeat + severity + auto-close
Revision 1
Tags: S178-Fix30, Đ22, heartbeat, severity, auto-close, production-verify
Date: 2026-04-23
Scope: Đ22 v1.2 (§1.1 auto-close, §4.3 executor), HP NT12, HP NT13
Host: contabo
DB: directus / Postgres container postgres
1. Executor file path + is the heartbeat wired in?
Executor paths found:
/opt/incomex/dot/bin/dot-hc-executor-verify
/opt/incomex/dot/bin/dot-hc-executor
/opt/incomex/dot/bin/dot-context-pack-verify.sh
/opt/incomex/dot/bin/dot-dot-health
Changed executor: /opt/incomex/dot/bin/dot-hc-executor
Implementation summary:
- Added write_db_last_run().
- It writes dot_config.hc_executor_last_run = date -u -Iseconds.
- Called after acquire lock / precheck and before verify_all.
- write_heartbeat() also refreshes the DB heartbeat at the end of the cycle.
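The report does not show the body of write_db_last_run(). A minimal sketch of how such a heartbeat write could look follows, assuming dot_config has key and value columns with a unique constraint on key, and that a SQL runner such as run_pg_rw (the helper name visible in the verify script) is also available to the executor. build_last_run_sql is a hypothetical helper introduced only for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: the real write_db_last_run() body is not shown in this report.

# Build the upsert statement for a given ISO-8601 timestamp.
# Assumption: dot_config(key, value) with a unique constraint on key.
build_last_run_sql() {
  local ts="$1"
  printf "INSERT INTO dot_config (key, value) VALUES ('hc_executor_last_run', '%s') ON CONFLICT (key) DO UPDATE SET value = EXCLUDED.value;" "$ts"
}

# Write the heartbeat through a SQL runner passed as $1.
# Assumption: run_pg_rw (default) takes the statement as its first argument.
write_db_last_run() {
  local runner="${1:-run_pg_rw}"
  local ts
  ts="$(date -u -Iseconds)"   # GNU date, e.g. 2026-04-23T03:42:39+00:00
  "$runner" "$(build_last_run_sql "$ts")"
}
```

Passing the runner as an argument keeps the sketch testable without a database: calling write_db_last_run echo simply prints the statement that would be executed.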
Diff stat:
dot/bin/dot-context-pack-build.sh | 2 +-
dot/bin/dot-context-pack-verify.sh | 2 +-
dot/bin/dot-dot-health | 2 +-
dot/bin/dot-hc-executor | 19 ++++++++++++++++---
dot/bin/dot-hc-executor-verify | 29 ++++++++++++-----------------
5 files changed, 31 insertions(+), 23 deletions(-)
Production heartbeat verify after running executor:
key | value | age
----------------------+---------------------------+-----------------
hc_executor_last_run | 2026-04-23T03:42:39+00:00 | 00:01:45.837306
(1 row)
2. Does the verify script read the correct heartbeat key?
Changed verify script: /opt/incomex/dot/bin/dot-hc-executor-verify
Grep evidence:
/opt/incomex/dot/bin/dot-hc-executor-verify:10:# (a) LAST_RUN: dot_config.hc_executor_last_run tồn tại + timestamp trong ngưỡng MAX_AGE_HOURS
/opt/incomex/dot/bin/dot-hc-executor-verify:11:# (b) COVERAGE: heartbeat.executed == count(system_health_checks is_active=true)
/opt/incomex/dot/bin/dot-hc-executor-verify:37:HEARTBEAT="/var/lock/dot-hc-executor.heartbeat"
/opt/incomex/dot/bin/dot-hc-executor-verify:98: (a) dot_config heartbeat freshness (<= MAX_AGE_HOURS)
/opt/incomex/dot/bin/dot-hc-executor-verify:129:probe_heartbeat() {
/opt/incomex/dot/bin/dot-hc-executor-verify:130: local last_run epoch_last epoch_now age_s age_h
/opt/incomex/dot/bin/dot-hc-executor-verify:131: last_run="$(run_pg_rw "SELECT value FROM dot_config WHERE key='hc_executor_last_run' LIMIT 1" 2>/dev/null | tr -d '[:space:]')"
/opt/incomex/dot/bin/dot-hc-executor-verify:132: if [[ -z "$last_run" ]]; then
/opt/incomex/dot/bin/dot-hc-executor-verify:133: echo "dot_config heartbeat missing: hc_executor_last_run"
/opt/incomex/dot/bin/dot-hc-executor-verify:136: epoch_last="$(date -d "$last_run" +%s 2>/dev/null || date -j -f '%Y-%m-%dT%H:%M:%S%z' "$last_run" +%s 2>/dev/null || echo 0)"
/opt/incomex/dot/bin/dot-hc-executor-verify:138: echo "could not parse hc_executor_last_run='${last_run}'"
/opt/incomex/dot/bin/dot-hc-executor-verify:145: echo "last run too old: age=${age_h}h > max=${MAX_AGE_HOURS}h (hc_executor_last_run=${last_run})"
/opt/incomex/dot/bin/dot-hc-executor-verify:148: echo "last run fresh: age=${age_h}h <= max=${MAX_AGE_HOURS}h (hc_executor_last_run=${last_run})"
/opt/incomex/dot/bin/dot-hc-executor-verify:154: echo "heartbeat missing — cannot compute coverage"
/opt/incomex/dot/bin/dot-hc-executor-verify:208: log_info "=== probe (a) heartbeat freshness ==="
/opt/incomex/dot/bin/dot-hc-executor-verify:210: out="$(probe_heartbeat)"; rc=$?
/opt/incomex/dot/bin/dot-hc-executor-verify:213: else log_err "probe(a) FAIL — ${out}"; log_issue_fail "heartbeat" "$out"; fail=$((fail+1)); fi
Verify script result:
[INFO] dot-hc-executor-verify v1.0.0 max_age_hours=1
[OK] env loaded
[INFO] === probe (a) heartbeat freshness ===
[OK] probe(a) PASS — last run fresh: age=0h <= max=1h (hc_executor_last_run=2026-04-23T03:42:39+00:00)
[INFO] === probe (b) coverage ===
[OK] probe(b) PASS — coverage ok: executed=29 == active_in_db=29
[INFO] === probe (c) lockfile ===
[OK] probe(c) PASS — no lockfile — executor idle
[OK] dot-hc-executor-verify all 3 probes PASS
VERIFY_RC=0
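Probe (a) boils down to parsing the stored timestamp, computing its age, and comparing it with MAX_AGE_HOURS. The sketch below illustrates that logic under stated assumptions: heartbeat_is_fresh is a hypothetical name (not the script's probe_heartbeat), it relies on GNU date, and it compares in seconds, whereas the real script's rounding may differ because lines 139-144 are not part of the grep output above.

```bash
#!/usr/bin/env bash
# Illustrative sketch; not the actual probe_heartbeat implementation.

# heartbeat_is_fresh LAST_RUN MAX_AGE_HOURS [NOW_EPOCH]
# Returns 0 if fresh, 1 if too old, 2 if missing or unparseable.
# Requires GNU date for `date -d`.
heartbeat_is_fresh() {
  local last_run="$1" max_age_hours="$2" now_epoch="${3:-$(date +%s)}"
  local epoch_last age_s

  # Reject an empty value first: GNU `date -d ""` would still succeed.
  [ -n "$last_run" ] || return 2
  epoch_last="$(date -d "$last_run" +%s 2>/dev/null)" || return 2

  # Compare in seconds so that fractions of an hour are not truncated away.
  age_s=$(( now_epoch - epoch_last ))
  [ "$age_s" -le $(( max_age_hours * 3600 )) ]
}
```

For example, heartbeat_is_fresh "2026-04-23T03:42:39+00:00" 1 returns 0 while the stored timestamp is at most one hour old. The optional third argument pins the current time, which makes the function easy to test.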
3. Severity before/after normalization + constraint created successfully
Before:
warning 766
WARNING 759
critical 285
warn 224
CRITICAL 27
info 2
normal 2
INFO 1
Migration/transaction applied:
UPDATE system_issues SET severity = 'warning' WHERE severity IN ('WARNING', 'warn');
UPDATE system_issues SET severity = 'critical' WHERE severity = 'CRITICAL';
UPDATE system_issues SET severity = 'info' WHERE severity IN ('INFO', 'normal');
CREATE OR REPLACE FUNCTION public.fn_log_issue(...) -- normalizes warn/WARNING/normal before insert
ALTER TABLE system_issues DROP CONSTRAINT IF EXISTS chk_severity_values;
ALTER TABLE system_issues ADD CONSTRAINT chk_severity_values CHECK (severity IN ('critical', 'warning', 'info'));
After production verification:
severity | count
----------+-------
critical | 312
info | 5
warning | 1757
(3 rows)
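As a cross-check, the "Before" counts can be folded into the three canonical buckets using the same mapping as the UPDATE statements. The critical and info totals match the "After" table exactly (312 and 5), while warning sums to 1749 against 1757 observed. The report does not explain the 8 extra rows; one plausible reading, which is an assumption only, is that new warning issues were logged between the two snapshots, for example by the manual executor run. expected_counts is a hypothetical helper written for this check.

```bash
#!/usr/bin/env bash
# Fold raw severity counts into the three canonical buckets.
# Input: one "label count" pair per line on stdin.
expected_counts() {
  local critical=0 warning=0 info=0 label n
  while read -r label n; do
    case "$label" in
      critical|CRITICAL)    critical=$((critical + n)) ;;
      warning|WARNING|warn) warning=$((warning + n)) ;;
      info|INFO|normal)     info=$((info + n)) ;;
    esac
  done
  echo "critical=$critical info=$info warning=$warning"
}

# "Before" distribution copied from this report.
expected_counts <<'EOF'
warning 766
WARNING 759
critical 285
warn 224
CRITICAL 27
info 2
normal 2
INFO 1
EOF
# prints: critical=312 info=5 warning=1749
```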
Constraint evidence:
conname | pg_get_constraintdef
---------------------+--------------------------------------------------------------------------------------------------------------------------------------------
chk_severity_values | CHECK (((severity)::text = ANY ((ARRAY['critical'::character varying, 'warning'::character varying, 'info'::character varying])::text[])))
(1 row)
Code-side severity fixes:
- /opt/incomex/dot/bin/dot-hc-executor: default warn -> warning; skipped-check issue uses warning.
- /opt/incomex/dot/bin/dot-dot-health: default warn -> warning.
- /opt/incomex/dot/bin/dot-context-pack-verify.sh: default warn -> warning.
- /opt/incomex/dot/bin/dot-context-pack-build.sh: stub default warn -> warning.
- fn_log_issue now maps warn/WARNING/normal to canonical lowercase values before insert.
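The mapping that the migration and fn_log_issue apply can also be mirrored on the script side, so that a caller never sends a legacy label in the first place. The following is a sketch under assumptions, not code taken from the changed scripts: normalize_severity is a hypothetical helper, and it lowercases any input, which is slightly broader than the exact legacy values the migration rewrote.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a raw severity label to one of the three values
# accepted by chk_severity_values. Returns 1 for anything unrecognised.
normalize_severity() {
  local raw
  raw="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$raw" in
    critical)     echo critical ;;
    warning|warn) echo warning ;;
    info|normal)  echo info ;;
    *)            return 1 ;;
  esac
}
```

A caller could use it as sev="$(normalize_severity "$raw_sev")" || sev=warning, which would keep warning as the fallback described in the list above.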
Migration file added:
/opt/incomex/migrations/s178_fix30_wiring_fix.sql
4. The 6 H11a issues: is the executor alive or dead, and have they been auto-closed?
Cron evidence:
# A+6 — Đ22 §1 HC Executor every 3h (schedule seeded in dot_tools.cron_schedule)
0 */3 * * * . /opt/incomex/scripts/cron-env.sh && /opt/incomex/dot/bin/dot-hc-executor >> /var/log/incomex/hc-executor-$(date +\%Y-\%m-\%d).log 2>&1
30 */3 * * * . /opt/incomex/scripts/cron-env.sh && /opt/incomex/dot/bin/dot-hc-executor-verify >> /var/log/incomex/hc-executor-verify-$(date +\%Y-\%m-\%d).log 2>&1
Recent log files:
-rw-r--r-- 1 root root 930 Apr 23 03:30 /var/log/incomex/hc-executor-verify-2026-04-23.log
-rw-r--r-- 1 root root 24786 Apr 23 03:00 /var/log/incomex/hc-executor-2026-04-23.log
-rw-r--r-- 1 root root 2790 Apr 22 21:30 /var/log/incomex/hc-executor-verify-2026-04-22.log
-rw-r--r-- 1 root root 73620 Apr 22 21:00 /var/log/incomex/hc-executor-2026-04-22.log
Manual executor run evidence:
[INFO] check[H11a] Description Basic Missing kind=sql expect=eq 0 sev=critical scope=entities
[OK] H11a PASS — h11a_total=0 eq 0 @directus -> PASS (guards 1-5 OK)
[OK] H11a auto-close: 6 entity + 0 system issue(s) resolved
Final H11a issue state:
source | status | count
--------+----------+-------
H11a | resolved | 6
(1 row)
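To confirm that no rows remain open for a given check, a count of non-resolved rows is enough. The sketch below only builds such a query; it assumes the state table above comes from system_issues with source and status columns, and that resolved is the only closed status. build_open_issue_count_sql is a hypothetical helper, and the input check is there so that a check ID can be interpolated into SQL safely.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build a query counting issues for one check source
# that are not yet resolved. Assumes system_issues(source, status).
build_open_issue_count_sql() {
  local source="$1"
  # Accept only IDs made of letters, digits, underscore, and hyphen.
  case "$source" in
    ''|*[!A-Za-z0-9_-]*) return 1 ;;
  esac
  printf "SELECT count(*) FROM system_issues WHERE source = '%s' AND status <> 'resolved';" "$source"
}
```

Under those assumptions, running the statement from build_open_issue_count_sql H11a should return 0 after the auto-close shown above.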
Note: the executor process is cron-driven and was idle after the manual run; verify probe (c) confirmed that no lockfile was present, i.e. the executor was idle.
5. New findings
- dot-hc-executor completed with EXECUTOR_RC=1 because unrelated health checks remain failing; this is expected after this wiring fix and does not block the heartbeat/severity/H11a evidence.
- Current unrelated failures observed during the run:
  - H11b: h11b_total=1401 NOT eq 0 (warning, AI enrichment backlog)
  - HC-REG: unregistered_count=28 NOT eq 0 (critical)
  - HC-SCHEMA: missing_description_count=13 NOT eq 0 (critical)
  - DOT-H2, DOT-H3, DOT-H8: warning failures
- system_health_checks.severity_on_fail may still use legacy warn labels, but the executor and fn_log_issue now normalize issue writes to warning. No uppercase or otherwise invalid severity can enter system_issues because of chk_severity_values.
Conclusion
- Heartbeat wire: fixed. Executor writes dot_config.hc_executor_last_run; verify reads the same key and passes.
- Severity wire: fixed. Existing rows normalized to 3 lowercase values; fn_log_issue normalizes incoming legacy values; CHECK constraint is active.
- Auto-close wire: fixed for H11a. Running the executor auto-closed the 6 open H11a rows; the final state has no open H11a rows.