KB-1E09 rev 2

5000x-live — Ops healthcheck automation package

3 min read Revision 2
iu-core5000x-liveopshealthcheckcronhygiene-repaired-by-6000x

5000x-live — Ops healthcheck automation package

Hygiene note (6000x): title/tags renormalised from "5500x" to "5000x-live" to match the path. Content preserved verbatim except this banner.

Verdict: DONE_WITH_EXTERNAL_BLOCKER — staging + cron install deferred to dedicated 5700x macro.

Healthcheck capabilities (live)

python3 -m cutter_agent.iu_core.healthcheck returns JSON with overall_ok boolean + 7 surface entries.

Exit codes: 0 = all healthy, 2 = ≥1 surface FAIL, 3 = bootstrap error.

Surfaces covered: three_axis_cache, directus_collection, qdrant_collection, auto_refresh_trigger, vector_boundary, write_gates, operator_runtime.

Surfaces NOT covered (gap):

  • nuxt — no /admin/iu-three-axis probe (live page not yet deployed; would need to be additive to existing healthcheck.py)
  • agent_data — no curl to AgentData MCP endpoint
  • directus_api — only PG-level rows + permission, no live https://directus.incomexsaigoncorp.vn/items/... REST probe
# IU Core 7-surface healthcheck — every 30 min, JSON to log, exit 2 on FAIL
*/30 * * * * cd /opt/incomex/dot/iu-cutter-v0.6-5500x && /usr/bin/python3 -m cutter_agent.iu_core.healthcheck >> /var/log/incomex/iu-core-healthcheck.log 2>&1

Exact blocker

The healthcheck default executor wires ssh contabo docker exec postgres psql …. It is designed to run from the host, not from VPS.

Two install paths:

  1. Host-cron (MacBook) — UNSTABLE — laptop sleeps/closes → cron silently doesn't fire. Not recommended for ops.
  2. VPS-side cron — requires VPS staging of 5000x iu-cutter (current VPS at /opt/incomex/dot/iu-cutter-v0.6 is at v0.4-era commit e93424b, BEFORE 5000x). Steps:
    • Push commit 20af56e to VPS via existing iu-cutter sync flow.
    • Override _default_executor in healthcheck to use local docker exec (no SSH).
    • Validate exit codes + JSON output structure unchanged.
    • Install cron line as above.
    • Pair with uptime-kuma exec-monitor for paged alerts on exit 2/3.

Reversibility

  • Cron removal: crontab -e and delete the one line.
  • VPS staging: keep pre- snapshot of /opt/incomex/dot/iu-cutter-v0.6 before push (matches pattern of existing snapshots).
  • Healthcheck script: read-only PG; never writes; never logs secrets.

Coverage gap closure plan (5700x scope)

  • Add 8th surface nuxt_smokecurl -sSf https://ai.incomexsaigoncorp.vn/admin/iu-three-axis -H 'Authorization: Bearer …' (token from env, not logged). Depends on 5600x Nuxt deploy.
  • Add 9th surface agent_data_mcpcurl -sSf https://vps.incomexsaigoncorp.vn/api/mcp -H 'X-API-Key: …' (initialize-only).
  • Add 10th surface directus_rest — items endpoint w/ admin token.

All 3 new surfaces are simple HTTP probes that don't need a DB executor.

Back to Knowledge Hub knowledge/dev/laws/dieu44-trien-khai/v0.6-iu-core-5000x-live-ui-ops-real-corpus-pilot-open-goal/03-ops-automation-package.md