08 — Open Questions + Carry-Forward to Phase 2
Decisions deferred from Phase 1 (require Council ratification before Phase 2)
DC-1. Executor whitelist CHECK on job_queue.executor?
Phase 1 stores executor identity only in lease_owner (free text, no CHECK). job_queue itself has no executor column — that intentionally avoids tying every row to an executor at enqueue time.
queue_heartbeat.executor_kind already enforces the §11.5 7-name set: DOT, Agent, Hermes, Codex, PG_worker, external_worker, future_Kestra_adapter.
Question: should we add a target_executor_kind text NULL CHECK (… §11.5 set …) column to job_queue in Phase 2, or keep enqueue executor-agnostic and let consumer_registry (Phase 4) bind job_kind → executor?
Default recommendation: keep agnostic; let DP6 consumer_registry route. Adding the column now would force a schema-level coupling that DP6 explicitly aims to invert.
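To make the agnostic option concrete, here is a minimal Python sketch: enqueue carries only a job_kind, and a registry resolves it to an executor later. The seven names are the §11.5 set listed above. The registry contents and the resolve_executor helper are hypothetical, since consumer_registry is not designed in this document.

```python
from typing import Dict, Optional

# The §11.5 executor set, as enforced by queue_heartbeat.executor_kind.
EXECUTOR_KINDS = frozenset({
    "DOT", "Agent", "Hermes", "Codex",
    "PG_worker", "external_worker", "future_Kestra_adapter",
})

# Illustrative bindings only (assumption); the real mapping is a Phase 4 decision.
CONSUMER_REGISTRY: Dict[str, str] = {
    "email_send": "Hermes",
    "vector_sync_drain": "PG_worker",
}


def resolve_executor(
    job_kind: str, registry: Optional[Dict[str, str]] = None
) -> Optional[str]:
    """Return the executor bound to job_kind, or None if nothing is bound yet."""
    if registry is None:
        registry = CONSUMER_REGISTRY
    executor = registry.get(job_kind)
    if executor is not None and executor not in EXECUTOR_KINDS:
        # Mirrors what a schema-level CHECK would reject, e.g. MOT.
        raise ValueError(f"{executor!r} is not in the §11.5 executor set")
    return executor
```

With this shape, job_queue rows never name an executor. An unbound job_kind simply resolves to None until a registry entry exists.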
DC-2. NOTIFY channel naming
Phase 1 ships with queue.notify.enabled=false. Question: which channel name should be used when the gate is flipped on? DP4-Q3 proposes queue_wake_<domain>. No code emits NOTIFY yet, so either choice carries zero technical debt.
DC-3. Lease reaper as DOT job vs. as fn
Phase 1 ships neither fn_job_lease_reaper nor a DOT entry for it. Design says it can be a self-registered job_kind. Question: Phase 2 path — fn that we invoke from external scheduler vs. job_kind that the queue runs itself?
Default recommendation: ship fn_job_lease_reaper first (idempotent, callable from cron OR from the queue worker itself); register as job_kind only when consumer_registry lands.
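The reaper's intended behaviour (per the LR-A carry-forward item: stale lease_until < now() rows go back to queued, attempts is incremented, backoff is applied) can be sketched in memory to show why a fn-first design is safely callable from both cron and the worker. This is a Python model, not the PL/pgSQL function. The "leased" status name, the run_after field, and the 30-second base delay are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class Job:
    id: int
    status: str                            # "queued" or "leased" ("leased" is an assumed name)
    attempts: int = 0
    lease_owner: Optional[str] = None
    lease_until: Optional[datetime] = None
    run_after: Optional[datetime] = None   # assumed field used to schedule the retry


def reap_expired_leases(
    jobs: List[Job], now: datetime, base_delay: timedelta = timedelta(seconds=30)
) -> int:
    """Return stale leased jobs to 'queued' and report how many were reaped."""
    reaped = 0
    for job in jobs:
        if job.status != "leased" or job.lease_until is None:
            continue
        if job.lease_until >= now:
            continue  # lease is still valid, leave the job alone
        job.attempts += 1
        # Exponential backoff, multiplier capped at 2**10 (see DC-10).
        multiplier = min(2 ** (job.attempts - 1), 2 ** 10)
        job.run_after = now + base_delay * multiplier
        job.status = "queued"
        job.lease_owner = None
        job.lease_until = None
        reaped += 1
    return reaped
```

Because a reaped row is no longer leased, a second call with the same clock touches nothing. That is the idempotency property that makes the choice of caller a scheduling decision and not a correctness one.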
DC-4. DOT catalog entries for queue ops
Deferred (see 03-migration-phase-1-substrate.md §DOT catalog). Phase 4 (consumer_registry) is the natural home. Phase 2 may add a dot_queue_healthcheck entry as the only safe early DOT entry — it's read-only.
DC-5. Activating heartbeat for iu_outbound_default
See 06-heartbeat-and-silent-gap-status.md. Recommended in Phase 2 to close the live §15.5 violation. Two equivalent code paths (in-fn vs. external orchestrator). Council picks.
DC-6. Retention sweep (DP7)
job_queue retention thresholds (30d hot, 90d archive, 365d DLQ) not enforced in Phase 1. Phase 7 task.
DC-7. event_outbox → job_queue bridge
Phase 1 explicitly forbids automatic routing. Phase 6+ task; need consumer_registry first.
DC-8. Idempotency-key shape
Phase 1 uses text (free-form, callers pass UUID strings or business keys). DP2 design suggested uuid. Question: standardise on UUID-only in Phase 2 via CHECK (idempotency_key ~ '^[0-9a-f-]{36}$') or keep flexible?
Default recommendation: keep flexible — many likely callers (MARK/CUT, email_send, vector_sync_drain) already have natural business keys (file paths, message IDs) that are not UUIDs.
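A quick way to see what the proposed CHECK would and would not accept is to run the same pattern in Python. The sketch below uses re.fullmatch as the equivalent of the ^...$ anchors. The example business key is made up, and is_canonical_uuid is a hypothetical stricter test added only for comparison.

```python
import re
import uuid

# Same character class and length as the proposed CHECK '^[0-9a-f-]{36}$'.
PROPOSED_PATTERN = re.compile(r"[0-9a-f-]{36}")


def passes_proposed_check(key: str) -> bool:
    """True if key would satisfy the proposed regex (case-sensitive)."""
    return PROPOSED_PATTERN.fullmatch(key) is not None


def is_canonical_uuid(key: str) -> bool:
    """Stricter comparison: key parses as a UUID and is already in
    lowercase hyphenated form."""
    try:
        return str(uuid.UUID(key)) == key
    except ValueError:
        return False


if __name__ == "__main__":
    samples = [
        "123e4567-e89b-12d3-a456-426614174000",   # lowercase UUID
        "123E4567-E89B-12D3-A456-426614174000",   # uppercase UUID
        "-" * 36,                                 # 36 hyphens, not a UUID
        "inbox/2026/report.pdf",                  # made-up business key
    ]
    for s in samples:
        print(s, passes_proposed_check(s), is_canonical_uuid(s))
```

Two things follow. The pattern rejects business keys and uppercase UUIDs, which supports the keep-flexible default. It also accepts strings that are not UUIDs at all (36 hyphens), so if UUID-only is ever chosen, a uuid column type would likely be a tighter guard than this regex.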
DC-9. executor_kind for queue_heartbeat — should MOT ever appear?
§11.5 explicitly lists MOT as is_executor=false. The CHECK on queue_heartbeat.executor_kind does NOT permit MOT, matching the law. Confirmed correct.
DC-10. Backoff strategy
Phase 1 hardcodes exponential * 2^(attempts-1) capped at 2^10. Question: should this become a dot_config.queue.retry.backoff_strategy ∈ {exponential, linear, fixed} selector?
Default recommendation: defer until a second strategy is actually wanted. YAGNI.
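For reference, the hardcoded behaviour can be written as a small Python function. This is a sketch under one reading of the description above: the delay is a base interval times 2^(attempts-1), with the multiplier capped at 2^10. The base value of 30 seconds is an assumption, as the source does not state it.

```python
BASE_DELAY_SECONDS = 30    # assumed; not stated in the Phase 1 description
MAX_MULTIPLIER = 2 ** 10   # the cap named in the Phase 1 description


def retry_delay_seconds(attempts: int, base: float = BASE_DELAY_SECONDS) -> float:
    """Delay before the next try, given the number of attempts made so far."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    return base * min(2 ** (attempts - 1), MAX_MULTIPLIER)
```

Under these assumptions the delay doubles from 30 s on the first retry and flattens at 30 × 1024 s from the eleventh attempt onward. A linear or fixed strategy would only replace the multiplier expression, which is why deferring the selector costs little.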
Phase-2 carry-forward (recommended sequence)
- HB-A: Wire fn_queue_heartbeat_tick('iu_outbound_default','PG_worker',…) into the existing iu_outbound_default cadence (via Hermes or in-fn). Flip queue.heartbeat.enabled=true. Verify v_queue_health.executors_fresh=1.
- HB-B: Add fn_queue_stale_check event emission (system/queue_worker_silent) on fresh→stale transition. Requires CHECK widening of event_type_registry if system/queue_worker_silent is new; design Phase 3 explicitly per the roadmap.
- LR-A: Author fn_job_lease_reaper, which moves stale lease_until < now() rows back to queued, increments attempts, and applies backoff. Plus integration test (BEGIN/ROLLBACK).
- DLR-A: Author fn_job_dead_letter_replay(dead_letter_id, authorization_source), gated by triage_status='manual_replay'. Plus operator runbook.
- WK-A: Phase 2 minimal worker (Python or PL/pgSQL?). Most likely Python under Hermes; iterates fn_job_claim → run → fn_job_ack / fn_job_fail_or_retry. Council should pick host process now.
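If the Council picks Python for WK-A, the claim → run → ack / fail_or_retry iteration could look roughly like the sketch below. The signatures of fn_job_claim, fn_job_ack and fn_job_fail_or_retry are not given in this document, so they are represented by injected callables. The job dict shape (id, job_kind) and the handlers mapping are likewise assumptions.

```python
from typing import Any, Callable, Dict, Optional


def run_worker(
    claim: Callable[[], Optional[dict]],        # stand-in for fn_job_claim
    handlers: Dict[str, Callable[[dict], Any]],  # job_kind -> handler (hypothetical)
    ack: Callable[[int], None],                 # stand-in for fn_job_ack
    fail_or_retry: Callable[[int, str], None],  # stand-in for fn_job_fail_or_retry
    max_jobs: int = 100,
) -> int:
    """Claim and process jobs until the queue is empty or max_jobs is reached."""
    processed = 0
    while processed < max_jobs:
        job = claim()
        if job is None:
            break  # nothing claimable right now
        try:
            # A missing handler raises KeyError and is treated like any failure.
            handlers[job["job_kind"]](job)
        except Exception as exc:
            fail_or_retry(job["id"], repr(exc))
        else:
            ack(job["id"])
        processed += 1
    return processed
```

Keeping the three queue calls behind callables means the same loop can be exercised in a unit test with fakes and later bound to real database calls, whichever host process is chosen.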
New lessons / memory writes (proposed)
- Confirm with user: write feedback_dieu45_phase1_substrate_live_pattern documenting "for any future PG-native queue substrate work: ship gates default OFF, prove via BEGIN/ROLLBACK first, post-rollback diff to confirm inertness, separate hot/DLQ tables, denylist mirrors event_outbox safe_payload, partial-unique idempotency, SKIP LOCKED claim; all proven 2026-05-26 in 050 phase 1."
- Refresh [[feedback-dieu45-silent-gap-violation-post-enactment]] noting substrate is now in place; activation is the Phase 2 task.
What is now safe to begin in parallel (no Phase 2 blockers)
- Operator runbook: when to flip each queue.*.enabled gate, what to monitor in v_queue_health afterwards.
- Architecture index update (HK1 from §22 post-enactment housekeeping): add the new queue_substrate_phase1 surface.
- DP1-DP7 design pack annotation: tag DP2, DP3, DP4 docs with "implemented in 050" marker.