09 — Next-Phase Recommendation (Phase 2 minimal scope)
Phase 1 PASS leaves the substrate live but inert. Phase 2 should activate the heartbeat layer to close the §15.5 silent gap, and add lease/DLQ governance primitives. Nothing in Phase 2 needs to touch event_outbox, install pg_cron, or widen any CHECK vocab; those remain Phase 3+ tasks.
Recommended Phase 2 scope: DIEU45_PHASE_2_HEARTBEAT_ACTIVATION_AND_LEASE_GOVERNANCE
P2-A. Close live §15.5 silent gap (priority 1)
- Action: write a one-line fn_queue_heartbeat_tick('iu_outbound_default','PG_worker','ok',NULL,NULL,'{}') call into fn_iu_route_worker_run (top of body, BEFORE the early return on route_worker_enabled=false if we want a no-op tick even when disabled; alternative: tick from the external orchestrator only).
- Flip: UPDATE dot_config SET value='true' WHERE key='queue.heartbeat.enabled'.
- Confirm: SELECT executor_name, status_hint FROM v_queue_health → iu_outbound_default | fresh.
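The flip and confirm steps could be rehearsed inside a rolled-back transaction before anything is committed. This is a sketch only: it assumes the six-argument tick call above resolves as written, that the tick honours queue.heartbeat.enabled, and that v_queue_health reflects rows written earlier in the same transaction.

```sql
-- Dry run: nothing below persists because the transaction is rolled back.
BEGIN;

-- Open the heartbeat gate for this transaction only.
UPDATE dot_config
SET    value = 'true'
WHERE  key = 'queue.heartbeat.enabled';

-- Emit one tick by hand, using the same call that P2-A proposes to embed.
SELECT fn_queue_heartbeat_tick('iu_outbound_default', 'PG_worker', 'ok', NULL, NULL, '{}');

-- The executor should now report status_hint = 'fresh'.
SELECT executor_name, status_hint
FROM   v_queue_health
WHERE  executor_name = 'iu_outbound_default';

ROLLBACK;
```

If the final SELECT does not show fresh, the gap is more likely in the tick function or the view than in the gate, since the gate is open within this transaction.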
P2-B. fn_job_lease_reaper
fn_job_lease_reaper(p_batch_limit int DEFAULT 100, p_actor text DEFAULT 'lease_reaper')
RETURNS jsonb
Finds rows with state='leased' AND lease_until < now() (uses job_queue_lease_until_idx), then sets state='queued', bumps attempts, applies backoff, and clears lease_owner/lease_until. Idempotent: a second call finds no expired leases and changes nothing. Callable from an external scheduler or as a job_kind.
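The reaper described above might look roughly like the following. It is a minimal sketch, not the Phase 2 design: the primary key column id, the backoff column run_after, and the backoff formula (30 s doubling per attempt, capped at one hour) are all assumptions that do not appear in this report. Only state, lease_until, lease_owner, and attempts come from the text.

```sql
CREATE OR REPLACE FUNCTION fn_job_lease_reaper(
    p_batch_limit int  DEFAULT 100,
    p_actor       text DEFAULT 'lease_reaper'
) RETURNS jsonb
LANGUAGE plpgsql
AS $$
DECLARE
    v_reaped int;
BEGIN
    WITH expired AS (
        -- Pick expired leases; SKIP LOCKED avoids blocking on rows
        -- that another reaper or worker currently holds.
        SELECT id                      -- assumed primary key name
        FROM   job_queue
        WHERE  state = 'leased'
          AND  lease_until < now()
        ORDER  BY lease_until
        LIMIT  p_batch_limit
        FOR UPDATE SKIP LOCKED
    ),
    requeued AS (
        UPDATE job_queue q
        SET    state       = 'queued',
               attempts    = q.attempts + 1,
               -- run_after and this formula are hypothetical placeholders
               -- for whatever backoff mechanism the schema provides.
               run_after   = now() + make_interval(
                                 secs => least(30 * power(2, q.attempts), 3600)),
               lease_owner = NULL,
               lease_until = NULL
        FROM   expired e
        WHERE  q.id = e.id
        RETURNING q.id
    )
    SELECT count(*) INTO v_reaped FROM requeued;

    RETURN jsonb_build_object('ok', true, 'actor', p_actor, 'reaped', v_reaped);
END;
$$;
```

Because requeued rows leave state='leased', an immediate second call matches nothing and returns reaped = 0, which is what makes the function safe to call from a scheduler at any cadence.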
P2-C. fn_job_dead_letter_replay
fn_job_dead_letter_replay(p_dead_letter_id uuid, p_authorization_source text)
RETURNS jsonb
Refuses unless triage_status='manual_replay'. Re-enqueues a fresh job_queue row with attempts=0 and a NEW idempotency_key (the partial-unique index would block the original key since the DLQ source row is terminal, but Phase 2 should not assume that; minting a fresh key is safer). Then updates the DLQ row with triage_status='closed', triaged_at, triaged_by, triage_note='replayed'.
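The replay flow above could be sketched as below. Several names are hypothetical and not taken from this report: the DLQ table name job_dead_letter, its id, job_kind, and payload columns, the matching job_queue columns, the key format, and the choice to record p_authorization_source in triaged_by. gen_random_uuid() is built in from PostgreSQL 13; older versions would need pgcrypto.

```sql
CREATE OR REPLACE FUNCTION fn_job_dead_letter_replay(
    p_dead_letter_id       uuid,
    p_authorization_source text
) RETURNS jsonb
LANGUAGE plpgsql
AS $$
DECLARE
    v_dlq     job_dead_letter%ROWTYPE;   -- hypothetical table name
    v_new_key text;
BEGIN
    -- Lock the DLQ row so two concurrent replays cannot both pass the check.
    SELECT * INTO v_dlq
    FROM   job_dead_letter
    WHERE  id = p_dead_letter_id
    FOR UPDATE;

    IF NOT FOUND THEN
        RETURN jsonb_build_object('ok', false, 'reason', 'not_found');
    END IF;

    -- Refuse anything an operator has not explicitly marked for replay.
    IF v_dlq.triage_status IS DISTINCT FROM 'manual_replay' THEN
        RETURN jsonb_build_object('ok', false, 'reason', 'not_marked_for_replay');
    END IF;

    -- Mint a fresh key rather than reusing the original one.
    v_new_key := 'replay:' || p_dead_letter_id::text || ':' || gen_random_uuid()::text;

    INSERT INTO job_queue (job_kind, payload, state, attempts, idempotency_key)
    VALUES (v_dlq.job_kind, v_dlq.payload, 'queued', 0, v_new_key);

    UPDATE job_dead_letter
    SET    triage_status = 'closed',
           triaged_at    = now(),
           triaged_by    = p_authorization_source,   -- assumed mapping
           triage_note   = 'replayed'
    WHERE  id = p_dead_letter_id;

    RETURN jsonb_build_object('ok', true, 'idempotency_key', v_new_key);
END;
$$;
```

Closing the DLQ row in the same call means a repeated invocation hits the refusal branch, so a row cannot be replayed twice without a new triage decision.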
P2-D. Refusal-event emission
fn_queue_heartbeat_tick emits an internal log row (NOT event_outbox — Phase 3 task) on fresh→stale transition; Phase 2 could write to a new queue_event_log table OR just rely on fn_queue_stale_check being polled. Recommend a small internal log table + view to avoid touching event_outbox until DP5 widens its vocab.
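One possible shape for the internal log table and view recommended above is shown here. Apart from the table name queue_event_log, everything is illustrative: the column set, the view name v_queue_event_recent, and the 24-hour window are assumptions.

```sql
-- Internal transition log, kept separate from event_outbox on purpose.
CREATE TABLE IF NOT EXISTS queue_event_log (
    id            bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    executor_name text        NOT NULL,
    from_status   text        NOT NULL,   -- e.g. 'fresh'
    to_status     text        NOT NULL,   -- e.g. 'stale'
    detail        jsonb       NOT NULL DEFAULT '{}'::jsonb,
    created_at    timestamptz NOT NULL DEFAULT now()
);

-- Hypothetical convenience view: transitions from the last 24 hours.
CREATE OR REPLACE VIEW v_queue_event_recent AS
SELECT executor_name, from_status, to_status, detail, created_at
FROM   queue_event_log
WHERE  created_at > now() - interval '24 hours'
ORDER  BY created_at DESC;
```

Leaving from_status and to_status as free text, with no CHECK constraint, keeps this table outside the CHECK-vocab question that Phase 2 is meant to avoid.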
P2-E. Operator runbook
A KB doc explaining: when to flip each gate, expected v_queue_health outputs, how to triage DLQ rows, how to replay manually.
Decisions Council must ratify before Phase 2 starts
- DC-1: Add target_executor_kind column to job_queue? (Recommend: NO, defer to DP6.)
- DC-5: Heartbeat caller for iu_outbound_default, in-fn vs. external orchestrator. (Recommend: in-fn; minimal diff, leverages existing tick cadence.)
- DC-3: Lease reaper as fn or as job_kind? (Recommend: fn first; promote to job_kind in Phase 3.)
- WK-A: Worker host process choice (Hermes Python, Codex, dedicated container)? (Defer to Phase 3; Phase 2 can run reaper from existing Hermes cron path.)
Phases 3–7 unchanged from DP1–DP7 roadmap
- Phase 3: NOTIFY bridge + system/queue_worker_silent event widening
- Phase 4: consumer_registry + DP6 routing + DOT catalog backfill
- Phase 5: MARK/CUT pilot job_kinds (mark_file, cut_from_manifest as jobs)
- Phase 6: MOT runtime + job_workflow (if deferred from Phase 1)
- Phase 7: DP7 partitioning + retention sweep
Acceptance criteria for Phase 2
- iu_outbound_default heartbeat is fresh (age < 300 s).
- fn_job_lease_reaper proven in BEGIN/ROLLBACK against a manually-aged leased row.
- fn_job_dead_letter_replay proven against a triage-marked row.
- All 4 master gates remain individually controllable; toggling one does not affect the others.
- No event_outbox mutation, no CHECK widening, no pg_cron install.
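The BEGIN/ROLLBACK proof for the lease reaper could follow a pattern like this one. It is a sketch under assumptions: job_queue is taken to have a primary key named id, at least one row is assumed to be in state='leased' at test time, and the actor label is arbitrary.

```sql
BEGIN;

-- Manually age one leased row so that its lease has already expired.
WITH victim AS (
    SELECT id
    FROM   job_queue
    WHERE  state = 'leased'
    ORDER  BY lease_until
    LIMIT  1
)
UPDATE job_queue q
SET    lease_until = now() - interval '10 minutes'
FROM   victim v
WHERE  q.id = v.id
RETURNING q.id, q.attempts;          -- note the id and attempts before reaping

-- Run the reaper; the returned jsonb should report at least one reaped row.
SELECT fn_job_lease_reaper(10, 'phase2_acceptance');

-- Inspect rows the reaper has reset; the aged row should be among them,
-- with attempts incremented and both lease columns NULL.
SELECT id, state, attempts, lease_owner, lease_until
FROM   job_queue
WHERE  state = 'queued'
  AND  lease_owner IS NULL
ORDER  BY id;

-- Discard everything, including any genuinely expired leases reaped above.
ROLLBACK;
```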
What this report is NOT
This is not a binding Phase 2 design pack. It is a recommendation tailored to the live state at 2026-05-26 11:22 UTC immediately after Phase 1 COMMIT. The actual Phase 2 macro should re-survey live state (especially the silent gap age) before authoring.