06 — Existing vs Needed — Gap Matrix
Date: 2026-05-26 | Scope: Map each "system-wide PG-native queue principle" to live substrate; flag gaps.
§1. Principles vs reality
Principles from the mission:
queue_principles:
- system_wide_not_cut_only
- PG_first
- PG_native
- PG_driven
- trigger_in_and_trigger_out_capable
- supports_AGENT_DOT_worker_Hermes
- supports_retry_dead_letter
- supports_status_visibility
- supports_idempotency
- supports_scheduling
- supports_batch
- supports_cleanup
- no_vector_for_transient_payload
- compatible_with_governance/birth/DOT
Matrix:
| Principle | Live evidence | Gap |
|---|---|---|
| system_wide_not_cut_only | 9-domain × 7-stream × 2-lane vocab covers IU, piece, staging, system, birth_registry, governance, tac, kg, dot, health | Only 4 domains have active event types (iu, piece, staging, system); the remaining domains are vocab-only, with no producers wired yet |
| PG_first | All substrate is PG tables + plpgsql + triggers | ✅ |
| PG_native | No external queue (Redis/RabbitMQ/Kafka) involved | ✅ |
| PG_driven | Worker fn lives in PG, called externally; no in-DB scheduler | ⚠️ no pg_cron → calls come from outside |
| trigger_in_and_trigger_out_capable | iu_sql_event_route (trigger-IN bridge, dry-run) + fn_iu_auto_instantiate_from_event (trigger-OUT consumer) | ⚠️ both exist in prototype form; not generalized |
| supports_AGENT_DOT_worker_Hermes | DOT has dot_iu_command_run + dot_iu_command_catalog; Agent/Hermes use external orchestration | ❌ no unified job substrate for Agent/Hermes |
| supports_retry_dead_letter | iu_route_dead_letter (full schema), event_pending.error_count + last_error, iu_route_worker_cursor.dead_lettered | ⚠️ no max_attempts policy in dot_config; no retry backoff schedule |
| supports_status_visibility | iu_route_worker_cursor, v_iu_event_backlog, v_iu_composer_event_backlog, v_iu_route_dead_letter_open, v_iu_auto_instantiate_event_summary, v_dot_iu_command_run_health | ✅ good (could be unified into v_queue_health) |
| supports_idempotency | event_outbox unique constraints; staging idempotency_key; iu_auto_instantiate_event_log idempotency_key; dot_iu_command_run.params_digest | ✅ |
| supports_scheduling | None (no pg_cron, no schedule table) | ❌ scheduling boundary is "whoever calls fn_iu_route_worker_run" |
| supports_batch | Worker drains in a single call (fn_iu_route_worker_run); event_pending designed for debounce | ⚠️ no formal batch policy; debounce config absent |
| supports_cleanup | fn_iu_staging_cleanup (3-pass), fn_iu_core_retention_cleanup, retention_enabled flag | ⚠️ retention gated off; cleanup is operator-invoked |
| no_vector_for_transient_payload | CHECK on event_outbox forbids vector, embedding; staging.vector_excluded=true; iu_vector_sync_point has no staging path | ✅ 4-layer guarantee live |
| compatible_with_governance/birth/DOT | event types for `staging.*`, `piece.*`, `iu.structure_*`; birth_registry auto-birth proven; DOT runs ledgered | ✅ |
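The retry gap above (dead-letter storage exists, but there is no max_attempts policy or backoff schedule) is easiest to see as the decision a policy would have to make after each failure. Below is a minimal Python sketch. It assumes hypothetical parameters named after the proposed Δ6 dot_config keys in §3 (none exist today). It also assumes a capped doubling backoff, which is one common choice rather than anything the substrate specifies:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class RetryPolicy:
    # Hypothetical: mirrors the proposed Δ6 keys event.retry.max_attempts and
    # event.retry.backoff_seconds. Not live configuration.
    max_attempts: int = 5
    backoff_seconds: int = 30         # base delay before the first retry
    max_backoff_seconds: int = 3600   # assumed cap on any single delay


@dataclass(frozen=True)
class RetryDecision:
    dead_letter: bool
    retry_at: Optional[datetime]


def decide(error_count: int, failed_at: datetime, policy: RetryPolicy) -> RetryDecision:
    """Decide what happens after a failed attempt.

    error_count counts failures so far, including this one (analogous to
    event_pending.error_count after it has been incremented).
    """
    if error_count < 1:
        raise ValueError("error_count must be >= 1 after a failure")
    if error_count >= policy.max_attempts:
        # Attempts exhausted: hand off to dead letter (cf. iu_route_dead_letter).
        return RetryDecision(dead_letter=True, retry_at=None)
    # Doubling backoff: base * 2^(n-1), capped at max_backoff_seconds.
    delay = min(policy.backoff_seconds * 2 ** (error_count - 1),
                policy.max_backoff_seconds)
    return RetryDecision(dead_letter=False,
                         retry_at=failed_at + timedelta(seconds=delay))
```

Keeping the rule a pure function of the failure count, the failure time and the policy means it could later be expressed in plpgsql against event_pending.error_count without changing its behavior.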
§2. Substrate × use-case matrix
Columns are the use cases from the mission. Rows are the substrate tables. ✅ = directly used today. ⚠ = capable but unwired. ❌ = no fit. n/a = not applicable.
| Substrate ↓ / Use case → | tasks | messages | trigger-IN | trigger-OUT | worker jobs | Agent/Hermes | DOT jobs | review/approval | vector sync | staging cleanup | MOT workflows | IU two-way |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| event_outbox | ⚠ (stream=task) | ⚠ (stream=comment) | ⚠ via SQL route | ✅ | ⚠ | ⚠ | ⚠ | ⚠ (stream=review) | ❌ (separate channel) | ⚠ | ❌ | ⚠ |
| event_pending | ⚠ | ⚠ | ⚠ | ⚠ | ❌ | ❌ | ❌ | ⚠ | ❌ | ⚠ | ❌ | ⚠ |
| event_read | ⚠ | ⚠ | n/a | n/a | n/a | n/a | n/a | ⚠ | n/a | n/a | n/a | n/a |
| event_subscription | ⚠ | ⚠ | n/a | ⚠ | ⚠ | ⚠ | ⚠ | ⚠ | n/a | ⚠ | ❌ | ⚠ |
| event_type_registry | ✅ (vocab) | ✅ | ✅ | ✅ | ⚠ | ⚠ | ⚠ | ✅ | n/a | ✅ | ❌ | ⚠ |
| iu_route_worker_cursor | ⚠ | ⚠ | n/a | ✅ | ❌ | ❌ | ⚠ | ⚠ | ❌ | ⚠ | ❌ | ⚠ |
| iu_route_dead_letter | ⚠ | ⚠ | n/a | ✅ | ❌ | ❌ | ❌ | ⚠ | ❌ | ⚠ | ❌ | ⚠ |
| iu_sql_event_route | n/a | n/a | ✅ (dry-run) | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | ⚠ |
| dot_iu_runtime_lease | ✅ | ✅ | n/a | ✅ | ⚠ | ⚠ | ✅ | ✅ | ⚠ | ✅ | ⚠ | ✅ |
| dot_iu_command_run | ❌ | ❌ | n/a | ⚠ | ❌ | ❌ | ✅ | ⚠ | ❌ | ⚠ | ❌ | ⚠ |
| dot_iu_command_catalog | ❌ | ❌ | n/a | n/a | ❌ | ❌ | ✅ | ❌ | n/a | ❌ | ❌ | ❌ |
| iu_staging_record/payload | ❌ | ❌ | n/a | n/a | ❌ | ❌ | n/a | n/a | n/a | ✅ | ❌ | n/a |
| iu_vector_sync_point + kb_vector_sync NOTIFY | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | ✅ | n/a | n/a | n/a |
| iu_auto_instantiate_event_log | n/a | n/a | n/a | ✅ | n/a | n/a | n/a | n/a | n/a | n/a | ⚠ | ⚠ |
| iu_lifecycle_log | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | ✅ |
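Reading: rows of mostly ✅ and ⚠ mean "substrate fits"; rows of mostly ❌ mean "substrate gap". Reading down a column shows whether a use case has any substrate at all.
The biggest gaps are the worker jobs, Agent/Hermes and MOT workflows columns, none of which has a ✅ in any row: there is no PG-native substrate for arbitrary long-running jobs today.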
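The column-wise reading can be checked mechanically. The sketch below transcribes the §2 matrix as written, with cell annotations such as "(stream=task)" dropped. It then lists the use cases that no substrate row serves directly today:

```python
# Transcription of the §2 matrix; annotations dropped, symbols kept.
COLUMNS = ["tasks", "messages", "trigger-IN", "trigger-OUT", "worker jobs",
           "Agent/Hermes", "DOT jobs", "review/approval", "vector sync",
           "staging cleanup", "MOT workflows", "IU two-way"]

Y, W, N, X = "✅", "⚠", "❌", "n/a"

MATRIX = {
    "event_outbox":                  [W, W, W, Y, W, W, W, W, N, W, N, W],
    "event_pending":                 [W, W, W, W, N, N, N, W, N, W, N, W],
    "event_read":                    [W, W, X, X, X, X, X, W, X, X, X, X],
    "event_subscription":            [W, W, X, W, W, W, W, W, X, W, N, W],
    "event_type_registry":           [Y, Y, Y, Y, W, W, W, Y, X, Y, N, W],
    "iu_route_worker_cursor":        [W, W, X, Y, N, N, W, W, N, W, N, W],
    "iu_route_dead_letter":          [W, W, X, Y, N, N, N, W, N, W, N, W],
    "iu_sql_event_route":            [X, X, Y, X, X, X, X, X, X, X, X, W],
    "dot_iu_runtime_lease":          [Y, Y, X, Y, W, W, Y, Y, W, Y, W, Y],
    "dot_iu_command_run":            [N, N, X, W, N, N, Y, W, N, W, N, W],
    "dot_iu_command_catalog":        [N, N, X, X, N, N, Y, N, X, N, N, N],
    "iu_staging_record/payload":     [N, N, X, X, N, N, X, X, X, Y, N, X],
    "iu_vector_sync_point":          [X, X, X, X, X, X, X, X, Y, X, X, X],
    "iu_auto_instantiate_event_log": [X, X, X, Y, X, X, X, X, X, X, W, W],
    "iu_lifecycle_log":              [X, X, X, X, X, X, X, X, X, X, X, Y],
}


def unserved_use_cases(matrix: dict[str, list[str]]) -> list[str]:
    """Return the use-case columns in which no substrate row is marked ✅."""
    for row, cells in matrix.items():
        if len(cells) != len(COLUMNS):  # guard against transcription slips
            raise ValueError(f"{row}: expected {len(COLUMNS)} cells")
    return [col for i, col in enumerate(COLUMNS)
            if all(cells[i] != Y for cells in matrix.values())]


print(unserved_use_cases(MATRIX))
# → ['worker jobs', 'Agent/Hermes', 'MOT workflows']
```

DOT jobs does not appear in the result: dot_iu_runtime_lease, dot_iu_command_run and dot_iu_command_catalog all serve it directly.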
§3. Minimum additive deltas to close the gaps (NOT a design — just a list)
These would each need to be designed properly in the next pack:
| Δ | What | Why |
|---|---|---|
| Δ1 | event_type_registry rows for non-iu domains (`governance.*`, `tac.*`, `kg.*`, `dot.*`, `health.*`, `birth_registry.*`) | Activate dormant vocab |
| Δ2 | Additional worker cursors (iu_route_worker_cursor rows) for domains beyond iu | Drain `staging.*`, `system.*`, `dot.*` events |
| Δ3 | A worker invocation contract: either pg_cron rows OR a documented external-caller cadence (Hermes / Codex / Directus flow) | Currently implicit |
| Δ4 | Generic SQL-event routes in iu_sql_event_route (today only 1 row, dry-run; widen target_event_domain CHECK) | Generalize trigger-IN |
| Δ5 | Job substrate decision: extend event_outbox (new domain `job`) OR new job_outbox table | Cover Agent/Hermes/long-running |
| Δ6 | Retry policy table or dot_config keys (`event.retry.max_attempts`, `event.retry.backoff_seconds`) | Codify retry semantics |
| Δ7 | Scheduler table or dot_config keys (`schedule.<job_name>.cron`) | Even without pg_cron, formalize "what should run when" |
| Δ8 | Outbox-to-NOTIFY bridge (optional): emit pg_notify('event_universal', json) on event_outbox INSERT, consumed by external worker | Bring event substrate up to vector-sync parity |
| Δ9 | A v_queue_health aggregated view across cursors, dead-letter, retention candidates | One-pane observability |
| Δ10 | Ratify inclusion criteria (P3D4C0X §M.3) as enforced — registry check + producer rubric | Prevent activity-log scope creep |
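Δ3 and Δ7 together amount to recording "what should run when" as data that any caller can evaluate. Below is a minimal Python sketch of that evaluation. It assumes a hypothetical schedule keyed by worker functions named in this document. It substitutes fixed intervals for the cron expressions Δ7 proposes, and the cadences are illustrative, not measured:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical schedule: job name -> run interval. Δ7 proposes cron syntax
# (schedule.<job_name>.cron); fixed intervals stand in for it here.
SCHEDULE = {
    "fn_iu_route_worker_run": timedelta(minutes=1),  # illustrative cadence
    "fn_iu_staging_cleanup": timedelta(hours=24),    # illustrative cadence
}


def due_jobs(schedule: dict[str, timedelta],
             last_run: dict[str, datetime],
             now: datetime) -> list[str]:
    """Return job names (sorted) whose interval has elapsed since their last run.

    last_run maps job name -> time of its last successful run; a job that
    has never run is always due.
    """
    due = []
    for job, interval in sorted(schedule.items()):
        previous = last_run.get(job)
        if previous is None or now - previous >= interval:
            due.append(job)
    return due
```

An external caller (Hermes, Codex or a Directus flow, per Δ3) could run this check on every tick and invoke only the due functions, so the cadence no longer depends on whoever happens to call fn_iu_route_worker_run.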
None of Δ1–Δ10 require PG 18.
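Δ8 would push event_outbox inserts over NOTIFY, as kb_vector_sync already does for vector sync. Below is a minimal Python sketch of the payload rule such a bridge needs; in PG the same logic would live in a trigger function calling pg_notify. The event field names are hypothetical. The one hard constraint it encodes is PostgreSQL's: a NOTIFY payload must be shorter than 8000 bytes in the default configuration.

```python
import json

NOTIFY_PAYLOAD_LIMIT = 8000   # PostgreSQL default: payload must be < 8000 bytes
CHANNEL = "event_universal"   # channel name proposed in Δ8


def notify_payload(event: dict) -> str:
    """Build the JSON text a bridge would pass to pg_notify(CHANNEL, ...).

    Field names (event_id, domain, event_type, payload) are hypothetical.
    Small events are sent whole; oversized ones degrade to a reference that
    the consumer resolves by re-reading the event_outbox row.
    """
    full = json.dumps(event, separators=(",", ":"), sort_keys=True)
    if len(full.encode("utf-8")) < NOTIFY_PAYLOAD_LIMIT:
        return full
    ref = {k: event[k] for k in ("event_id", "domain", "event_type")}
    ref["truncated"] = True
    return json.dumps(ref, separators=(",", ":"), sort_keys=True)
```

Sending a reference instead of failing keeps the bridge lossless: the outbox row stays the source of truth, and NOTIFY only wakes the external worker.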
§4. Risk register on the substrate as-is
| Risk | Severity | Mitigation deferred to design |
|---|---|---|
| Worker runs once and is never re-invoked (no scheduler) | Medium | Δ3 |
| event_pending could accumulate silently if worker breaks | Medium | Δ9 alerts + dead-letter |
| Subscription resolution view assumes default-broadcast fallback (per design doc); not verified live | Low | Verify live behavior |
| iu_sql_event_route CHECK forces dry_run when enabled=false — must be explicitly relaxed to enable | Low (good safety) | None |
| event_outbox has no partitioning; will reach >1M rows within 8 days at current system/issue_opened rate | Medium | Partition-by-month after >5M rows |
| iu_notification_event (legacy) is empty — compat layer status unclear | Low | Confirm fully deprecated then drop |
| Operator-driven cutting flow may bottleneck on human attention | Low | Optional puller in design |
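The partitioning row implies a timeline worth making explicit. The arithmetic below assumes the current rate holds and treats ">1M rows within 8 days" as a lower bound:

```python
# Figures from the partitioning risk row; a constant ingest rate is assumed.
ROWS_OBSERVED = 1_000_000        # ">1M rows"
DAYS_OBSERVED = 8                # "within 8 days"
PARTITION_TRIGGER = 5_000_000    # "Partition-by-month after >5M rows"

rate_per_day = ROWS_OBSERVED / DAYS_OBSERVED         # lower bound on daily inserts
days_to_trigger = PARTITION_TRIGGER / rate_per_day   # upper bound on time to 5M
rows_per_month = rate_per_day * 30                   # rows in one 30-day partition

print(rate_per_day, days_to_trigger, rows_per_month)
# → 125000.0 40.0 3750000.0
```

So the 5M-row trigger would arrive within roughly 40 days, and each monthly partition would hold at least about 3.75M rows.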
§5. Summary
The substrate is mostly there for events and mostly missing for jobs. Reaching the system-wide-queue ambition takes additive work for events (Δ1–Δ4, Δ6–Δ10) and a design choice for jobs (Δ5).
No demolition required. No PG upgrade required. No new extension required (unless pg_cron is in scope — separate decision).