Phase 3 — Queue health + stuck-job audit (PASS)
Phase 3 — Queue health + stuck-job audit
1. v_queue_health
surface | queue_substrate_phase1
substrate_enabled | false -- closed at exit
heartbeat_enabled | true
worker_enabled | false
stale_threshold_seconds| 300
executors_fresh | 1 -- dieu45_phase3_pilot
executors_warning | 0
executors_stale | 1 -- legacy iu_outbound_default (carry-forward)
queued_count | 0
leased_count | 0
in_progress_count | 0
retry_waiting_count | 0
failed_count | 0
dead_letter_count | 0
cancelled_count | 0
succeeded_count | 6 -- this pilot
cleaned_count | 0
stale_lease_count | 0
dlq_pending_count | 0
dlq_total_count | 0
2. v_job_queue_backlog
job_kind | state | row_count
------------------------+-----------+----------
cut.cleanup_checkpoint | succeeded | 1
cut.copy_to_staging | succeeded | 1
cut.cut | succeeded | 1
cut.mark | succeeded | 1
cut.verify_cut | succeeded | 1
cut.verify_mark | succeeded | 1
oldest_created_at = 2026-05-26 15:13:49.405831+00,
newest_created_at = 2026-05-26 15:13:49.430201+00 (all 6 enqueued within 25 ms of each
other), avg_attempts = 0.00, max_attempts_seen = 0.
3. v_job_dead_letter_summary
(0 rows)
4. No stuck jobs
SELECT count(*) FROM job_queue
WHERE state IN ('queued','leased','in_progress','retry_waiting','failed');
-- 0
SELECT count(*) FROM job_queue WHERE state='succeeded' AND finished_at IS NULL;
-- 0
SELECT count(*) FROM job_queue
WHERE lease_owner IS NOT NULL AND lease_until < now();
-- 0 -- no stale leases (the pilot ran inside lease window 300 s)
5. No unsafe dead_letter
SELECT count(*) FROM job_dead_letter;
-- 0
SELECT count(*) FROM job_dead_letter WHERE triage_status NOT IN ('pending','acknowledged');
-- 0
6. Staging lifecycle correctness
staging_record_id : 258c715c-… (this pilot)
lifecycle_status : consumed
approved_at : 2026-05-26 15:13:49.501679+00
consumed_at : 2026-05-26 15:17:38.784037+00
consumed_by_run_id: a64340fe-… (matches cut_run_id)
vector_excluded : t (NVSZ guarantee)
Full state sequence: pending → pending_review → approved → consumed (no skip).
7. Cleanup checkpoint
fn_iu_op_cleanup_dry_run(older_than_days=15) returned:
{"alias":"fn_iu_op_cleanup_dry_run","apply":false,"actions":[],"eligible_count":0,"older_than_days":15}
apply=false, actions=[]. No staging row was deleted/moved — the dry-run is the policy.
Older candidates (>15 days): 0 (the older c3f3f073 staging is dated 2026-05-26 09:13 —
inside the 15-day window and lifecycle_status='consumed' so already cleanable when policy
flips to apply-mode in a future phase).
8. Carry-forward stale executor
The single executors_stale=1 value is legacy (iu_outbound_default, last tick
2026-05-22 11:31:41+00). The §15.5 silent-gap surface is visible but not closed — that
remains a Phase 3+ task. See 09-carry-forward.md item 2.
9. Pilot durations
| step | duration (ms) |
|---|---|
| Enqueue 6 jobs | ~27 ms |
cut.copy_to_staging claim→ack |
8 ms |
cut.mark (incl. fn_iu_op_mark_file) |
38 ms |
cut.verify_mark |
13 ms |
cut.cut initial attempt (failed) |
356 ms |
Patch + retry cut.cut |
395 ms |
cut.verify_cut |
51 ms |
cut.cleanup_checkpoint |
84 ms |
| Final heartbeat + gate close | 36 ms |
Total wall time of pilot SQL ~3 min 50 s (most of it idle between the failed cut and the patch).