KB-2B10

S178-Fix30 Wiring Fix: heartbeat + severity + auto-close

Revision 1
Tags: S178-Fix30, Đ22, heartbeat, severity, auto-close, production-verify

Date: 2026-04-23
Scope: Đ22 v1.2 (§1.1 auto-close, §4.3 executor), HP NT12, HP NT13
Host: contabo
DB: directus / Postgres container postgres

1. Executor file path + is the heartbeat wired in?

Executor paths found:

/opt/incomex/dot/bin/dot-hc-executor-verify
/opt/incomex/dot/bin/dot-hc-executor
/opt/incomex/dot/bin/dot-context-pack-verify.sh
/opt/incomex/dot/bin/dot-dot-health

Changed executor: /opt/incomex/dot/bin/dot-hc-executor

Implementation summary:

  • Added write_db_last_run().
  • It sets dot_config.hc_executor_last_run to the output of date -u -Iseconds (UTC, ISO 8601).
  • It is called after the lock is acquired and the precheck passes, and before verify_all.
  • write_heartbeat() also refreshes the DB heartbeat at the end of each cycle.
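
The heartbeat write described above might look roughly like the sketch below. The function name write_db_last_run and the key hc_executor_last_run come from this report. The upsert statement, the assumption that dot_config.key is unique, and the stubbed run_pg_rw (its name is borrowed from the verify script; here it only prints the SQL instead of calling Postgres) are illustrative assumptions, not the actual executor code.

```shell
#!/usr/bin/env bash
# Stand-in for the real DB helper: prints the SQL instead of sending it to Postgres.
run_pg_rw() { printf '%s\n' "$1"; }

# Assumed shape of the heartbeat write: upsert the current UTC time under one key.
# ON CONFLICT (key) presumes dot_config.key is unique (an assumption).
write_db_last_run() {
  local now
  now="$(date -u -Iseconds)"   # GNU date, e.g. 2026-04-23T03:42:39+00:00
  run_pg_rw "INSERT INTO dot_config (key, value) VALUES ('hc_executor_last_run', '${now}') ON CONFLICT (key) DO UPDATE SET value = EXCLUDED.value"
}

write_db_last_run
```

Writing the key before verify_all means a cycle that later aborts still leaves a fresh timestamp, which is why the report also refreshes it at the end of the cycle.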

Diff stat:

 dot/bin/dot-context-pack-build.sh  |  2 +-
 dot/bin/dot-context-pack-verify.sh |  2 +-
 dot/bin/dot-dot-health             |  2 +-
 dot/bin/dot-hc-executor            | 19 ++++++++++++++++---
 dot/bin/dot-hc-executor-verify     | 29 ++++++++++++-----------------
 5 files changed, 31 insertions(+), 23 deletions(-)

Production heartbeat verification after running the executor:

         key          |           value           |       age       
----------------------+---------------------------+-----------------
 hc_executor_last_run | 2026-04-23T03:42:39+00:00 | 00:01:45.837306
(1 row)

2. Does the verify script read the correct heartbeat key?

Changed verify script: /opt/incomex/dot/bin/dot-hc-executor-verify

Grep evidence:

/opt/incomex/dot/bin/dot-hc-executor-verify:10:#   (a) LAST_RUN: dot_config.hc_executor_last_run exists + timestamp within the MAX_AGE_HOURS threshold
/opt/incomex/dot/bin/dot-hc-executor-verify:11:#   (b) COVERAGE: heartbeat.executed == count(system_health_checks is_active=true)
/opt/incomex/dot/bin/dot-hc-executor-verify:37:HEARTBEAT="/var/lock/dot-hc-executor.heartbeat"
/opt/incomex/dot/bin/dot-hc-executor-verify:98:  (a) dot_config heartbeat freshness (<= MAX_AGE_HOURS)
/opt/incomex/dot/bin/dot-hc-executor-verify:129:probe_heartbeat() {
/opt/incomex/dot/bin/dot-hc-executor-verify:130:  local last_run epoch_last epoch_now age_s age_h
/opt/incomex/dot/bin/dot-hc-executor-verify:131:  last_run="$(run_pg_rw "SELECT value FROM dot_config WHERE key='hc_executor_last_run' LIMIT 1" 2>/dev/null | tr -d '[:space:]')"
/opt/incomex/dot/bin/dot-hc-executor-verify:132:  if [[ -z "$last_run" ]]; then
/opt/incomex/dot/bin/dot-hc-executor-verify:133:    echo "dot_config heartbeat missing: hc_executor_last_run"
/opt/incomex/dot/bin/dot-hc-executor-verify:136:  epoch_last="$(date -d "$last_run" +%s 2>/dev/null || date -j -f '%Y-%m-%dT%H:%M:%S%z' "$last_run" +%s 2>/dev/null || echo 0)"
/opt/incomex/dot/bin/dot-hc-executor-verify:138:    echo "could not parse hc_executor_last_run='${last_run}'"
/opt/incomex/dot/bin/dot-hc-executor-verify:145:    echo "last run too old: age=${age_h}h > max=${MAX_AGE_HOURS}h (hc_executor_last_run=${last_run})"
/opt/incomex/dot/bin/dot-hc-executor-verify:148:  echo "last run fresh: age=${age_h}h <= max=${MAX_AGE_HOURS}h (hc_executor_last_run=${last_run})"
/opt/incomex/dot/bin/dot-hc-executor-verify:154:    echo "heartbeat missing — cannot compute coverage"
/opt/incomex/dot/bin/dot-hc-executor-verify:208:  log_info "=== probe (a) heartbeat freshness ==="
/opt/incomex/dot/bin/dot-hc-executor-verify:210:  out="$(probe_heartbeat)"; rc=$?
/opt/incomex/dot/bin/dot-hc-executor-verify:213:  else log_err "probe(a) FAIL — ${out}"; log_issue_fail "heartbeat" "$out"; fail=$((fail+1)); fi
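
The freshness test that probe_heartbeat performs can be sketched as a small standalone function. This is a simplified illustration, not the script itself: heartbeat_is_fresh and its injectable now_epoch argument are hypothetical, it assumes GNU date (the real script also falls back to BSD date -j), and it compares in seconds rather than the hour-rounded age the script prints.

```shell
#!/usr/bin/env bash
# Returns 0 if last_run is at most max_age_hours old, 1 if older,
# 2 if the value is empty or cannot be parsed.
# now_epoch is injectable so the check can be exercised with fixed timestamps.
heartbeat_is_fresh() {
  local last_run="$1" max_age_hours="$2" now_epoch="${3:-$(date +%s)}"
  local epoch_last age_s
  [[ -n "$last_run" ]] || return 2
  epoch_last="$(date -d "$last_run" +%s 2>/dev/null)" || return 2
  age_s=$(( now_epoch - epoch_last ))
  (( age_s <= max_age_hours * 3600 ))
}

# Timestamps from this report: heartbeat written at 03:42:39, queried about 1m45s later.
now="$(date -d '2026-04-23T03:44:24+00:00' +%s)"
if heartbeat_is_fresh '2026-04-23T03:42:39+00:00' 1 "$now"; then
  echo "fresh"
else
  echo "stale or unparseable"
fi
```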

Verify script result:

[INFO]  dot-hc-executor-verify v1.0.0 max_age_hours=1
[OK]    env loaded
[INFO]  === probe (a) heartbeat freshness ===
[OK]    probe(a) PASS — last run fresh: age=0h <= max=1h (hc_executor_last_run=2026-04-23T03:42:39+00:00)
[INFO]  === probe (b) coverage ===
[OK]    probe(b) PASS — coverage ok: executed=29 == active_in_db=29
[INFO]  === probe (c) lockfile ===
[OK]    probe(c) PASS — no lockfile — executor idle
[OK]    dot-hc-executor-verify all 3 probes PASS
VERIFY_RC=0

3. Severity before/after normalization + constraint created successfully

Before:

warning 766
WARNING 759
critical 285
warn 224
CRITICAL 27
info 2
normal 2
INFO 1

Migration/transaction applied:

UPDATE system_issues SET severity = 'warning' WHERE severity IN ('WARNING', 'warn');
UPDATE system_issues SET severity = 'critical' WHERE severity = 'CRITICAL';
UPDATE system_issues SET severity = 'info' WHERE severity IN ('INFO', 'normal');
CREATE OR REPLACE FUNCTION public.fn_log_issue(...) -- normalizes warn/WARNING/normal before insert
ALTER TABLE system_issues DROP CONSTRAINT IF EXISTS chk_severity_values;
ALTER TABLE system_issues ADD CONSTRAINT chk_severity_values CHECK (severity IN ('critical', 'warning', 'info'));

After production verification:

 severity | count 
----------+-------
 critical |   312
 info     |     5
 warning  |  1757
(3 rows)
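
As a cross-check, the Before counts can be folded through the same mapping the UPDATE statements apply. The helper name fold_severity_counts is hypothetical; the input is the Before listing from this report. The result gives critical 312 and info 5, matching the After table, while warning sums to 1749, which is 8 fewer than the 1757 reported. That gap would be consistent with new warning rows being written between the two snapshots, for example by the executor run, but the report does not say so and this is only an assumption.

```shell
#!/usr/bin/env bash
# Fold legacy severity labels into the three canonical values and sum their counts.
# Input: lines of "<label> <count>"; output: "<canonical> <total>", sorted by label.
fold_severity_counts() {
  awk '
    {
      label = tolower($1)
      if (label == "warn")   label = "warning"
      if (label == "normal") label = "info"
      total[label] += $2
    }
    END { for (l in total) print l, total[l] }
  ' | sort
}

# Before counts as listed in this report.
before_counts='warning 766
WARNING 759
critical 285
warn 224
CRITICAL 27
info 2
normal 2
INFO 1'

printf '%s\n' "$before_counts" | fold_severity_counts
```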

Constraint evidence:

       conname       |                                                            pg_get_constraintdef                                                            
---------------------+--------------------------------------------------------------------------------------------------------------------------------------------
 chk_severity_values | CHECK (((severity)::text = ANY ((ARRAY['critical'::character varying, 'warning'::character varying, 'info'::character varying])::text[])))
(1 row)

Code-side severity fixes:

  • /opt/incomex/dot/bin/dot-hc-executor: default warn -> warning; skipped check issue uses warning.
  • /opt/incomex/dot/bin/dot-dot-health: default warn -> warning.
  • /opt/incomex/dot/bin/dot-context-pack-verify.sh: default warn -> warning.
  • /opt/incomex/dot/bin/dot-context-pack-build.sh: stub default warn -> warning.
  • fn_log_issue now maps warn/WARNING/normal to canonical lowercase values before insert.
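
The label mapping used by both the migration and fn_log_issue can be expressed as a small shell function. This is a sketch of the mapping only: normalize_severity is a hypothetical name, and rejecting unknown labels with a non-zero exit is an assumption chosen to mirror what chk_severity_values does on the database side.

```shell
#!/usr/bin/env bash
# Map a legacy or mixed-case severity label to one of: critical, warning, info.
# Returns 1 (and prints nothing) for labels outside the known set.
normalize_severity() {
  local raw
  raw="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$raw" in
    critical)      echo critical ;;
    warning|warn)  echo warning ;;
    info|normal)   echo info ;;
    *)             return 1 ;;
  esac
}

# Every label seen in the Before listing maps to a canonical value.
for label in warning WARNING critical warn CRITICAL info normal INFO; do
  printf '%s -> %s\n' "$label" "$(normalize_severity "$label")"
done
```

Normalizing in the writer as well as constraining the column means a legacy caller gets its value corrected instead of having the insert rejected.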

Migration file added:

/opt/incomex/migrations/s178_fix30_wiring_fix.sql

4. The 6 H11a issues: is the executor alive or dead, and have they been auto-closed?

Cron evidence:

# A+6 — Đ22 §1 HC Executor every 3h (schedule seeded in dot_tools.cron_schedule)
0 */3 * * * . /opt/incomex/scripts/cron-env.sh && /opt/incomex/dot/bin/dot-hc-executor >> /var/log/incomex/hc-executor-$(date +\%Y-\%m-\%d).log 2>&1
30 */3 * * * . /opt/incomex/scripts/cron-env.sh && /opt/incomex/dot/bin/dot-hc-executor-verify >> /var/log/incomex/hc-executor-verify-$(date +\%Y-\%m-\%d).log 2>&1

Recent log files:

-rw-r--r-- 1 root root   930 Apr 23 03:30 /var/log/incomex/hc-executor-verify-2026-04-23.log
-rw-r--r-- 1 root root 24786 Apr 23 03:00 /var/log/incomex/hc-executor-2026-04-23.log
-rw-r--r-- 1 root root  2790 Apr 22 21:30 /var/log/incomex/hc-executor-verify-2026-04-22.log
-rw-r--r-- 1 root root 73620 Apr 22 21:00 /var/log/incomex/hc-executor-2026-04-22.log

Manual executor run evidence:

[INFO]  check[H11a] Description Basic Missing kind=sql expect=eq 0 sev=critical scope=entities
[OK]      H11a PASS — h11a_total=0 eq 0 @directus -> PASS (guards 1-5 OK)
[OK]      H11a auto-close: 6 entity + 0 system issue(s) resolved

Final H11a issue state:

 source |  status  | count 
--------+----------+-------
 H11a   | resolved |     6
(1 row)

Note: the executor is cron-driven and was idle after the manual run; verify probe (c) found no lockfile, which confirms the executor was idle.
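
The auto-close step can be pictured as a single UPDATE issued when a check passes. The sketch below only builds that statement as text. The column names source and status and the value resolved appear in the query output above; the function name build_autoclose_sql, the table name passed in, and the "anything not yet resolved" filter are assumptions, and the real executor may distinguish entity issues from system issues differently.

```shell
#!/usr/bin/env bash
# Build (but do not run) an UPDATE that resolves all not-yet-resolved issues for one check.
# Arguments: <table> <check_code>. Both are restricted to a safe character set
# so they can be interpolated into SQL text.
build_autoclose_sql() {
  local table="$1" check_code="$2"
  [[ "$table" =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]] || return 1
  [[ "$check_code" =~ ^[A-Za-z0-9_-]+$ ]] || return 1
  printf "UPDATE %s SET status = 'resolved' WHERE source = '%s' AND status <> 'resolved';\n" \
    "$table" "$check_code"
}

# Table name here is illustrative only.
build_autoclose_sql system_issues H11a
```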

5. New findings

  • dot-hc-executor completed with EXECUTOR_RC=1 because unrelated health checks are still failing; this is expected after this wiring fix and does not invalidate the heartbeat, severity, or H11a evidence above.
  • Current unrelated failures observed during run:
    • H11b: h11b_total=1401 NOT eq 0 (warning, AI enrichment backlog)
    • HC-REG: unregistered_count=28 NOT eq 0 (critical)
    • HC-SCHEMA: missing_description_count=13 NOT eq 0 (critical)
    • DOT-H2, DOT-H3, DOT-H8 warning failures
  • system_health_checks.severity_on_fail may still use legacy warn labels, but the executor and fn_log_issue now normalize issue writes to warning. chk_severity_values rejects any value other than critical, warning, or info, so no uppercase or otherwise invalid severity can enter system_issues.

Conclusion

  • Heartbeat wire: fixed. Executor writes dot_config.hc_executor_last_run; verify reads the same key and passes.
  • Severity wire: fixed. Existing rows normalized to 3 lowercase values; fn_log_issue normalizes incoming legacy values; CHECK constraint is active.
  • Auto-close wire: fixed for H11a. Running the executor auto-closed the 6 open H11a rows; in the final state no H11a rows remain open.