Phase 2 — Lease Governance (stale lease reaper)
Two-function shape
| Function | Mode | Mutates | Gated by |
|---|---|---|---|
| `fn_job_reap_stale_leases_dry_run(p_limit)` | STABLE, SECURITY INVOKER | No | None (always available) |
| `fn_job_reap_stale_leases_apply(p_actor, p_limit)` | VOLATILE, SECURITY DEFINER | Yes | `queue.job_substrate.enabled` AND `queue.lease.reaper_enabled` AND NOT `queue.lease.reaper_dry_run_only` |

Stale lease definition
A row in `job_queue` is stale-leased when all of the following hold:
- `state = 'leased'`
- `lease_until IS NOT NULL`
- `lease_until < now()`
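The Phase 1 partial index `job_queue_lease_until_idx ON (lease_until) WHERE state='leased'` supports this scan efficiently. Its predicate matches `state = 'leased'`, so `lease_until < now()` becomes a range scan over leased rows only.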
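As a reading aid, the predicate can be mirrored in Python. This is a minimal sketch, not the SQL implementation. `JobRow` and `is_stale_leased` are hypothetical names. The comparison is strict, so a lease that expires exactly at `now` is not yet stale.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class JobRow:
    # Hypothetical mirror of the job_queue columns the predicate reads.
    job_id: str
    state: str
    lease_until: Optional[datetime]


def is_stale_leased(row: JobRow, now: datetime) -> bool:
    """state = 'leased' AND lease_until IS NOT NULL AND lease_until < now()"""
    return (
        row.state == "leased"
        and row.lease_until is not None
        and row.lease_until < now
    )


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    rows = [
        JobRow("a", "leased", now - timedelta(seconds=30)),      # expired lease
        JobRow("b", "leased", now + timedelta(seconds=30)),      # lease still valid
        JobRow("c", "leased", None),                             # no deadline recorded
        JobRow("d", "retry_waiting", now - timedelta(hours=1)),  # not leased at all
    ]
    print([r.job_id for r in rows if is_stale_leased(r, now)])  # ['a']
```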
fn_job_reap_stale_leases_dry_run — output shape
```
{
"evaluated_at": "<now>",
"stale_lease_count": <int>,
"limit": <int>,
"mutation": false,
"jobs": [
{
"job_id": "<uuid>",
"job_kind": "<text>",
"lease_owner": "<text>",
"lease_until": "<timestamptz>",
"attempts": <int>,
"max_attempts": <int>,
"overdue_seconds": <bigint>,
"would_action": "reset_to_retry_waiting" | "move_to_dead_letter"
}
]
}
```
`would_action` predicts the apply-side decision: `move_to_dead_letter` if `attempts + 1 >= max_attempts`, otherwise `reset_to_retry_waiting`.
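The sketch below shows how one `jobs` entry could be derived from a stale row, in Python for illustration only. `would_action` and `dry_run_entry` are hypothetical helpers, and the dict keys follow the output shape above. Truncating `overdue_seconds` to whole seconds is an assumption; the SQL may round instead.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict


def would_action(attempts: int, max_attempts: int) -> str:
    # Same threshold as the apply path: the reap counts as one more attempt.
    if attempts + 1 >= max_attempts:
        return "move_to_dead_letter"
    return "reset_to_retry_waiting"


def dry_run_entry(job: Dict[str, Any], now: datetime) -> Dict[str, Any]:
    """Build one element of the dry-run "jobs" array; nothing is mutated."""
    overdue = now - job["lease_until"]
    return {
        "job_id": job["job_id"],
        "job_kind": job["job_kind"],
        "lease_owner": job["lease_owner"],
        "lease_until": job["lease_until"].isoformat(),
        "attempts": job["attempts"],
        "max_attempts": job["max_attempts"],
        # Assumption: whole seconds, truncated toward zero.
        "overdue_seconds": int(overdue.total_seconds()),
        "would_action": would_action(job["attempts"], job["max_attempts"]),
    }


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    job = {
        "job_id": "00000000-0000-0000-0000-000000000001",  # hypothetical
        "job_kind": "example_kind",                        # hypothetical
        "lease_owner": "worker-1",                         # hypothetical
        "lease_until": now - timedelta(seconds=90),
        "attempts": 4,
        "max_attempts": 5,
    }
    entry = dry_run_entry(job, now)
    print(entry["overdue_seconds"], entry["would_action"])  # 90 move_to_dead_letter
```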
fn_job_reap_stale_leases_apply — apply logic
For each stale-leased job (claimed with FOR UPDATE SKIP LOCKED):
- If `attempts + 1 >= max_attempts` → move to the DLQ via `fn_job_move_to_dead_letter` with `final_error='lease_expired_reaped_at_max_attempts'`. Attempts are bumped before the move so the DLQ row records the final attempt count.
- Else → reset to `retry_waiting`:
  - `attempts := attempts + 1`
  - `last_error := 'lease_expired_reaped'`
  - `scheduled_at := now() + queue.retry.backoff_base_sec * 2^(attempts-1)`, where `attempts` is the incremented value and the multiplier `2^(attempts-1)` is capped at `2^10`
  - `lease_owner := NULL`, `lease_until := NULL`
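The decision and backoff arithmetic above can be sketched in Python. This is a minimal model under stated assumptions, not the PL/pgSQL function:
- `LeasedJob`, `reap_one` and `backoff_seconds` are hypothetical names.
- The dead-letter branch only flips an in-memory `state`, where the real code calls `fn_job_move_to_dead_letter`.
- `base_sec` stands in for `queue.retry.backoff_base_sec`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional

# Cap on the 2^(attempts - 1) multiplier, per the rule above.
BACKOFF_MULTIPLIER_CAP = 2 ** 10


@dataclass
class LeasedJob:
    # Hypothetical in-memory stand-in for a stale-leased job_queue row.
    job_id: str
    attempts: int
    max_attempts: int
    lease_owner: Optional[str]
    lease_until: Optional[datetime]
    state: str = "leased"
    last_error: Optional[str] = None
    scheduled_at: Optional[datetime] = None


def backoff_seconds(attempts: int, base_sec: int) -> int:
    # `attempts` is the already-incremented count, so the first reap uses 2^0.
    return base_sec * min(2 ** (attempts - 1), BACKOFF_MULTIPLIER_CAP)


def reap_one(job: LeasedJob, now: datetime, base_sec: int) -> Dict[str, Any]:
    job.attempts += 1  # bumped on both paths, so the DLQ row records the final count
    if job.attempts >= job.max_attempts:  # same as old attempts + 1 >= max_attempts
        # Placeholder for fn_job_move_to_dead_letter(...,
        #   final_error='lease_expired_reaped_at_max_attempts').
        job.state = "dead_letter"
        return {"action": "moved_to_dead_letter", "job_id": job.job_id,
                "attempts": job.attempts}
    delay = backoff_seconds(job.attempts, base_sec)
    job.state = "retry_waiting"
    job.last_error = "lease_expired_reaped"
    job.scheduled_at = now + timedelta(seconds=delay)
    job.lease_owner = None
    job.lease_until = None
    return {"action": "reset_to_retry_waiting", "job_id": job.job_id,
            "attempts": job.attempts, "backoff_sec": delay}


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    expired = now - timedelta(minutes=5)
    retryable = LeasedJob("retryable", 0, 5, "w1", expired)
    dlq_bound = LeasedJob("dlq_bound", 4, 5, "w2", expired)
    for job in (retryable, dlq_bound):
        print(reap_one(job, now, base_sec=10))
```

Take `base_sec=10` as an assumed value; it reproduces the `backoff_sec=10` in the apply proof. With that base, the multiplier saturates at attempt 11, so the longest delay is 10 × 1024 = 10,240 seconds.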
Triple-gate design rationale
| Gate | Purpose | Default | Flip authority |
|---|---|---|---|
| `queue.job_substrate.enabled` | Substrate master gate (Phase 1) | `false` | Phase 3 enactment |
| `queue.lease.reaper_enabled` | Reaper master gate | `false` | Operator authorization |
| `queue.lease.reaper_dry_run_only` | Safety gate; refuses mutation even if `reaper_enabled=true` | `true` | Operator authorization per reap window |

The third gate is the dry-run safety: even after the reaper is enabled, durable mutation requires explicitly flipping `reaper_dry_run_only=false`. This gives operators an "armed but safe" window for observation, during which the dry-run function can be run against live data while apply still refuses.
Refusal proofs (bounded TX)
| Test | Result |
|---|---|
| apply with `reaper_dry_run_only=true`, `reaper_enabled=true`, `substrate=true` | `{"reason":"queue.lease.reaper_dry_run_only=true","refused":true}` |
| apply with `reaper_enabled=false`, `reaper_dry_run_only=false` | `{"reason":"queue.lease.reaper_enabled=false","refused":true}` |
| apply with `substrate=false` | `{"reason":"queue.job_substrate.enabled=false","refused":true}` (implied by gate order; tested in Phase 1) |
| apply with empty actor | `{"reason":"actor_required","refused":true}` |

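
The refusal order can be expressed as a small pure function. This is a sketch under two assumptions. First, settings arrive as a dict keyed by the setting names above, with the gate table's defaults used for missing keys. Second, the actor check runs before the gates; the source does not say where that check sits. `apply_refusal` and `GATE_DEFAULTS` are hypothetical names.

```python
from typing import Mapping, Optional

# Defaults from the triple-gate table, used when a setting is absent.
GATE_DEFAULTS = {
    "queue.job_substrate.enabled": False,
    "queue.lease.reaper_enabled": False,
    "queue.lease.reaper_dry_run_only": True,
}


def apply_refusal(settings: Mapping[str, bool], actor: Optional[str]) -> Optional[dict]:
    """Return the refusal payload, or None when apply is allowed to mutate."""
    def get(name: str) -> bool:
        return settings.get(name, GATE_DEFAULTS[name])

    # Assumption: the actor check precedes the gates.
    if not actor:
        return {"reason": "actor_required", "refused": True}
    # Gate order: substrate master gate, reaper master gate, dry-run safety gate.
    if not get("queue.job_substrate.enabled"):
        return {"reason": "queue.job_substrate.enabled=false", "refused": True}
    if not get("queue.lease.reaper_enabled"):
        return {"reason": "queue.lease.reaper_enabled=false", "refused": True}
    if get("queue.lease.reaper_dry_run_only"):
        return {"reason": "queue.lease.reaper_dry_run_only=true", "refused": True}
    return None


if __name__ == "__main__":
    armed = {"queue.job_substrate.enabled": True, "queue.lease.reaper_enabled": True}
    # Armed but safe: dry_run_only still defaults to true, so apply refuses.
    print(apply_refusal(armed, "ops_oncall"))
```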
Apply path proof (bounded TX; gates temporarily set to permit mutation: substrate=true, reaper_enabled=true, reaper_dry_run_only=false)
Two stale-leased jobs were prepared:
- `phase2_proof_retryable` (attempts=0, max=5) → `reset_to_retry_waiting`, attempts → 1, backoff_sec=10
- `phase2_proof_dlq_bound` (attempts=4, max=5) → `moved_to_dead_letter`, attempts → 5
Apply output:
```
{
"actor": "phase2_proof_reaper",
"refused": false,
"reset_count": 1,
"dead_letter_count": 1,
"actions": [
{"action":"reset_to_retry_waiting","job_id":"5f2a426d…","attempts":1,"backoff_sec":10,"scheduled_at":"…"},
{"action":"moved_to_dead_letter","job_id":"4f1eae41…","attempts":5,
"dead_letter":{"dead_letter_id":"9a4f319f…","state":"dead_letter","refused":false,"attempts":5,…}}
]
}
```
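The cursor loop claims rows with `FOR UPDATE SKIP LOCKED`, so concurrent reaper invocations are safe. A row locked by one invocation is skipped by the others instead of being waited on, so each stale lease is reaped by at most one invocation.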
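The skip-rather-than-wait behavior can be imitated with per-row locks in Python threads. This is an analogy, not PostgreSQL:
- `threading.Lock.acquire(blocking=False)` plays the role of `FOR UPDATE SKIP LOCKED`.
- Releasing the lock plays the role of commit.
- `Row`, `reap` and `run` are hypothetical names.

```python
import threading
from typing import Dict, List


class Row:
    """Hypothetical stale-leased row with its own row-level lock."""

    def __init__(self, job_id: str) -> None:
        self.job_id = job_id
        self.state = "leased"
        self.lock = threading.Lock()  # stands in for the FOR UPDATE row lock


def reap(rows: List[Row], name: str, log: Dict[str, List[str]], log_lock: threading.Lock) -> None:
    for row in rows:
        # SKIP LOCKED analogue: never wait; skip rows another reaper holds.
        if not row.lock.acquire(blocking=False):
            continue
        try:
            # Re-check the stale predicate under the lock, so a row that
            # another reaper already handled is left alone.
            if row.state == "leased":
                row.state = "retry_waiting"
                with log_lock:
                    log.setdefault(row.job_id, []).append(name)
        finally:
            row.lock.release()  # analogue of commit releasing the row lock


def run(n_rows: int = 200, n_reapers: int = 4) -> Dict[str, List[str]]:
    rows = [Row(f"job-{i}") for i in range(n_rows)]
    log: Dict[str, List[str]] = {}
    log_lock = threading.Lock()
    threads = [
        threading.Thread(target=reap, args=(rows, f"reaper-{k}", log, log_lock))
        for k in range(n_reapers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    log = run()
    print(len(log), all(len(names) == 1 for names in log.values()))  # 200 True
```

Each job ends up reaped by exactly one reaper regardless of thread interleaving. Whoever holds a row's lock re-checks the predicate before mutating, and a reaper that finds the lock taken simply moves on.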
Lease duration source
`queue.lease.duration_sec` (default 300 s) is used by `fn_job_claim` to set `lease_until`. The reaper does NOT re-read this setting; it only checks `lease_until < now()`. Changing the lease duration therefore affects new leases only, not in-flight ones.
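The following is a sketch of why in-flight leases keep their original deadline. The duration is read once, when the lease is granted, and the staleness check only compares the stored `lease_until`. `claim`, `is_stale` and the `settings` dict are hypothetical stand-ins for `fn_job_claim`, the reaper predicate and the settings store.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical settings store; 300 is the documented default.
settings = {"queue.lease.duration_sec": 300}


@dataclass
class Lease:
    job_id: str
    lease_until: datetime


def claim(job_id: str, now: datetime) -> Lease:
    # Stand-in for fn_job_claim: the duration is read once, at claim time.
    return Lease(job_id, now + timedelta(seconds=settings["queue.lease.duration_sec"]))


def is_stale(lease: Lease, now: datetime) -> bool:
    # Stand-in for the reaper: it never reads duration_sec, only the stored deadline.
    return lease.lease_until < now


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    old = claim("old", t0)                            # deadline t0 + 300 s
    settings["queue.lease.duration_sec"] = 60         # operator shortens leases
    new = claim("new", t0 + timedelta(seconds=120))   # deadline t0 + 180 s
    check_at = t0 + timedelta(seconds=200)
    print(is_stale(old, check_at), is_stale(new, check_at))  # False True
```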
Future work (out of Phase 2)
- Lease reaper as a `job_kind` (DP3 §11.3.2 design): the reaper itself enqueues into `job_queue` rather than being called externally. Out of scope until Phase 3+, when the worker substrate is enabled.
- Per-job-kind lease duration override (`job_queue.metadata.lease_override_sec`?). Not designed yet.