KB-2818

Phase 3B — Heartbeat Caller Hardening

9 min read Revision 1
dieu45phase-3bheartbeatfn-queue-heartbeat-ping-externalsilent-gapiu-outbound-default2026-05-26

Phase 3B — Heartbeat Caller Hardening

Problem

Phase 2 enacted queue_heartbeat table + fn_queue_heartbeat_tick + governance gates (queue.heartbeat.enabled=true) and registered iu_outbound_default as a passive marker carrying the pre-Điều 45 silent gap. But there was no callable safe path for operators or external workers to actually emit a heartbeat — and worse, calling fn_queue_heartbeat_tick('iu_outbound_default', …) directly would have falsely healed the silent worker marker, hiding the §15.5 violation ([[feedback-dieu45-silent-gap-violation-post-enactment]]).

Phase 3B closes this by introducing a single safe caller surface and protecting the legacy marker against false-heal.

Chosen approach — Option A (external/operator ping)

Mission's Option A: bounded external/operator heartbeat ping function. Implemented as SECURITY DEFINER wrapper over fn_queue_heartbeat_tick with pre-checks and a hard refusal for iu_outbound_default.

Option B (hook into fn_iu_route_worker_run) was not taken — it would have required modifying an existing worker function and starting/triggering it. Phase 3B mandate is "no broad worker started"; Option B is deferred to a later pack that wires a real heartbeat caller into the route worker.

Apply

Created in single committed TX:

CREATE OR REPLACE FUNCTION fn_queue_heartbeat_ping_external(
  p_executor_name text,
  p_executor_kind text,
  p_status text DEFAULT 'ok',
  p_metadata jsonb DEFAULT '{}'::jsonb
) RETURNS jsonb LANGUAGE plpgsql SECURITY DEFINER
SET search_path TO 'pg_catalog','public' AS $fn$
DECLARE
  v_enabled boolean;
  v_actor text := COALESCE(current_setting('app.heartbeat_actor', true), session_user::text);
  v_meta jsonb;
  v_result jsonb;
BEGIN
  -- Input validation
  IF p_executor_name IS NULL OR btrim(p_executor_name)='' THEN RETURN jsonb_build_object('refused',true,'reason','executor_name_required'); END IF;
  IF p_executor_kind IS NULL OR btrim(p_executor_kind)='' THEN RETURN jsonb_build_object('refused',true,'reason','executor_kind_required'); END IF;
  IF p_executor_kind NOT IN ('DOT','Agent','Hermes','Codex','PG_worker','external_worker','future_Kestra_adapter') THEN
    RETURN jsonb_build_object('refused',true,'reason','executor_kind_not_in_vocab','value',p_executor_kind,
      'allowed',ARRAY['DOT','Agent','Hermes','Codex','PG_worker','external_worker','future_Kestra_adapter']);
  END IF;
  IF COALESCE(p_status,'ok') NOT IN ('ok','warn','error') THEN
    RETURN jsonb_build_object('refused',true,'reason','status_not_in_vocab','value',p_status);
  END IF;

  -- HARDENING: refuse to mark the legacy silent passive marker as healthy
  IF btrim(p_executor_name) = 'iu_outbound_default' THEN
    RETURN jsonb_build_object('refused',true,'reason','protected_legacy_silent_passive',
      'message','iu_outbound_default is a passive marker for the pre-Dieu45 route worker silent gap; external pings cannot overwrite it. Update fn_iu_route_worker_run callsite to register a separate active executor.');
  END IF;

  -- Strip caller-supplied identity keys; pin them ourselves for audit integrity
  v_meta := COALESCE(p_metadata,'{}'::jsonb) - 'ping_actor' - 'ping_origin' - 'ping_at' - 'ping_function';
  v_meta := v_meta || jsonb_build_object(
    'ping_origin','external_operator',
    'ping_actor', v_actor,
    'ping_at', now(),
    'ping_function','fn_queue_heartbeat_ping_external'
  );

  SELECT (value='true') INTO v_enabled FROM dot_config WHERE key='queue.heartbeat.enabled';
  IF NOT COALESCE(v_enabled,false) THEN
    RETURN jsonb_build_object('refused',true,'reason','queue.heartbeat.enabled=false');
  END IF;

  v_result := fn_queue_heartbeat_tick(
    p_executor_name := btrim(p_executor_name),
    p_executor_kind := p_executor_kind,
    p_status := COALESCE(p_status,'ok'),
    p_current_job_id := NULL,
    p_lease_owner := NULL,
    p_metadata := v_meta
  );
  RETURN v_result || jsonb_build_object('wrapper','fn_queue_heartbeat_ping_external');
END;
$fn$;

Design decisions

Decision Rationale
Does NOT pass p_current_job_id or p_lease_owner Wrapper is for liveness only — does not claim work; matches mission "must not claim actual work".
Pre-validates executor_kind against 7-vocab in code Defense in depth — queue_heartbeat_kind_check CHECK enforces the same, but wrapper returns a structured refusal jsonb instead of a raw constraint exception, easier for callers.
Strips caller-supplied ping_* metadata keys Audit integrity — caller cannot fake ping_actor.
Inherits payload denylist via downstream fn_queue_heartbeat_tick → table CHECK queue_heartbeat_metadata_safe_check (body/vector/secret/token/…) still rejects unsafe metadata as a hard constraint error — wrapper does not pre-strip these, so denylist violations remain visible as errors rather than silent strips.
Refuses iu_outbound_default by exact name match Hardens the §15.5 silent-gap marker — preserves the warn/0-ticks/2026-05-22 11:31:41 state so fn_queue_stale_check keeps reporting the violation.

Refusal proofs

# Input Result
1 ('iu_outbound_default','PG_worker','ok','{}') {"reason":"protected_legacy_silent_passive","message":"...","refused":true}
2 ('test_ext_caller_phase3b','BogusKind','ok','{}') {"value":"BogusKind","reason":"executor_kind_not_in_vocab","allowed":[...],"refused":true}
3 ('test_ext_caller_phase3b','external_worker','bogus_status','{}') {"value":"bogus_status","reason":"status_not_in_vocab","refused":true}
4 ('','external_worker','ok','{}') {"reason":"executor_name_required","refused":true}
5 ('test_ext_caller_phase3b','external_worker','ok','{"vector":[1,2,3]}') ERROR queue_heartbeat_metadata_safe_check (denylist enforcement via table CHECK)
6 Gate flipped to queue.heartbeat.enabled=false, then called {"reason":"queue.heartbeat.enabled=false","refused":true}; gate restored to true

Positive tick proof (BEGIN/ROLLBACK)

fn_queue_heartbeat_ping_external('phase3b_synthetic_external','external_worker','ok','{"probe":"phase3b_safe_caller_proof"}')

returns:

{"refused": false, "wrapper": "fn_queue_heartbeat_ping_external", "ticks_total": 1,
 "last_tick_at": "2026-05-26T15:44:43.316473+00:00",
 "executor_kind": "external_worker", "executor_name": "phase3b_synthetic_external",
 "last_tick_status": "ok"}

Row state inside TX:

executor_name executor_kind last_tick_status ticks_total metadata
phase3b_synthetic_external external_worker ok 1 {"probe":"phase3b_safe_caller_proof","ping_at":"...","ping_actor":"workflow_admin","ping_origin":"external_operator","ping_function":"fn_queue_heartbeat_ping_external"}

Post-ROLLBACK queue_heartbeat count: 2 (unchanged).

Silent gap STILL VISIBLE — no false-heal

After all wrapper calls (refusals + positive synthetic tick + rollback), iu_outbound_default row state:

executor_name last_tick_at ticks_total last_tick_status metadata.marker
iu_outbound_default 2026-05-22 11:31:41.928657+00 0 warn legacy_silent_passive

fn_queue_stale_check():

{"stale_count": 2, "threshold_seconds": 300,
 "stale_executors": [
   {"executor_name":"iu_outbound_default","executor_kind":"PG_worker",
    "age_seconds":360812,"last_tick_at":"2026-05-22T11:31:41.928657+00:00",
    "ticks_total":0,"last_tick_status":"warn","is_stale":true},
   {"executor_name":"dieu45_phase3_pilot","executor_kind":"external_worker",
    "age_seconds":1655,"last_tick_at":"2026-05-26T15:17:39.389501+00:00",
    "ticks_total":2,"last_tick_status":"ok","is_stale":true}]}

The §15.5 silent gap is operationally surfaced and protected, but is not yet operationally closed (no real heartbeat caller is wired for a production worker). Closing it requires a future pack that:

  1. wires fn_iu_route_worker_run (or its operator) to call fn_queue_heartbeat_ping_external on a new executor_name (e.g. iu_outbound_default_active_v2),
  2. leaves the legacy iu_outbound_default marker as-is so the historical gap remains visible,
  3. transitions the silent-gap surface from "warn forever" to "warn until next real tick".

Lesson cross-link [[feedback-hc-executor-last-run-is-proven-heartbeat-pattern]] — the operator-ping pattern matches hc_executor_last_run proven semantics.

Back to Knowledge Hub knowledge/dev/laws/dieu44-trien-khai/v0.6-dieu45-phase-3b-queue-cutter-hardening/03-heartbeat-caller-hardening.md