14 — Implementation Roadmap (Phase 1–7)
DESIGN-ONLY. No phase implemented here. Each phase is a future pack with its own Hard Gate + Council ratification. Phase order is chosen for safety, not delivery speed.
§1. Goal
Sequence the work introduced by this design pack into seven phases, each:
- Independently reversible (rollback story).
- Independently observable (`v_queue_health` extension).
- Independently gateable (`dot_config` flags).
- Each phase opens a small additive surface; later phases consume earlier substrate.
Phase order matches §18.4 sub-design pack naming where possible (DP1–DP7) and additionally introduces three integration phases.
§2. Phases at a glance
| Phase | Theme | Substrate change | Risk | DP source |
|---|---|---|---|---|
| Phase 1 | Minimal job substrate | job_queue + job_dead_letter + gated fns (no-op when disabled) | Low (additive) | DP2 |
| Phase 2 | Retry / lease / DLQ wired | dot_config queue.retry.*, lease fn integration | Low (config-driven) | DP3 |
| Phase 3 | Worker / NOTIFY / heartbeat | queue_heartbeat + stale-check + optional NOTIFY bridge | Medium (closes §15.5 silent gap) | DP4, DP1 |
| Phase 4 | Trigger-in / Trigger-out generalisation | consumer_registry + widened iu_sql_event_route CHECK | Medium (CHECK change needs ratification) | DP5 |
| Phase 5 | MARK/CUT queue pilot | First puller-enabled collection; staging.* event consumers | Medium (first live consumer flow) | doc 10 |
| Phase 6 | MOT pilot | job_workflow + fn_mot_graph_emit + first MOT template | High (MOT spec dependency) | doc 11 |
| Phase 7 | Customer / email / message pilot | First customer_message_* job kinds + channel adapter | High (PII boundary, external SMTP/IMAP) | doc 12 |
§3. Phase 1 — Minimal job substrate
§3.1 Scope
- Create `job_queue`, `job_dead_letter`, `job_workflow` tables (`job_workflow` may slip to Phase 6).
- Create all gated fns (`fn_job_*`) with disable-flag short-circuit.
- Create new dot_config keys: `queue.job_substrate.enabled=false`, `queue.retry.*` defaults, `queue.lease_reaper.enabled=false`.
- Create views: `v_job_queue_backlog`, `v_job_queue_in_progress`, `v_job_queue_dead_letter_open`, `v_dead_letter_all`.
§3.2 Out-of-scope at Phase 1
- No `consumer_registry` (Phase 4).
- No `queue_heartbeat` (Phase 3).
- No `job_subscription` (Phase 3 or later).
- No `event_outbox` change.
- No first job actually enqueued.
§3.3 Verification gate
- Functions exist; calling enqueue with `queue.job_substrate.enabled=false` returns `{refused: true, reason: 'job_substrate_disabled'}`.
- `v_job_queue_backlog` returns 0 rows.
- D9 KB report shows new tables + fns count.
- Regression: existing event_outbox queries unchanged.
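
The disabled-flag gate above can be modelled in a few lines. This is a minimal sketch, not the ratified function: `fn_job_enqueue` is a hypothetical name (the design only specifies the `fn_job_*` family), and the in-memory `dot_config` dict and `job_queue` list are stand-ins for the real tables.

```python
from typing import Any

# Hypothetical in-memory stand-ins for the dot_config table and job_queue table.
dot_config: dict[str, Any] = {"queue.job_substrate.enabled": False}
job_queue: list[dict[str, Any]] = []


def fn_job_enqueue(job_kind: str, payload: dict[str, Any]) -> dict[str, Any]:
    """Refuse, with no side effects, while the substrate flag is off."""
    if not dot_config.get("queue.job_substrate.enabled", False):
        return {"refused": True, "reason": "job_substrate_disabled"}
    job = {"job_kind": job_kind, "payload": payload, "attempts": 0}
    job_queue.append(job)
    return {"refused": False, "job": job}


def v_job_queue_backlog() -> list[dict[str, Any]]:
    """Model of the backlog view: every row still waiting in job_queue."""
    return list(job_queue)
```

With the flag at its Phase 1 default, the call is refused and the backlog stays empty, which is exactly what the gate checks.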
§3.4 Rollback
- Drop new tables (no rows yet).
- Remove new fns.
- Remove new dot_config keys.
§4. Phase 2 — Retry / lease / DLQ wired
§4.1 Scope
- Implement DP3 retry/backoff inside `fn_job_fail_transient` (no behavioural change yet; flag-gated).
- Implement `fn_job_lease_reaper` as a callable fn.
- Implement `fn_job_dead_letter_replay`.
- Council ratifies concrete `max_attempts` numbers per `dot_config.queue.retry.max_attempts.<job_kind>`.
§4.2 Verification gate
- Test a synthetic transient failure: lease expires → reaper rescues → row re-claimable.
- Test permanent failure path: row in `job_dead_letter`.
- Test replay: row back in `job_queue` with `attempts=0`.
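
The three gate paths can be exercised against a small in-memory model. The sketch below is illustrative only: the `Job` fields, the 30-second lease, and the rule that a row is claimable only once its lease has been cleared are assumptions, and the helper names are hypothetical stand-ins for `fn_job_lease_reaper` and `fn_job_dead_letter_replay`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    job_id: int
    job_kind: str
    attempts: int = 0
    lease_expires_at: Optional[float] = None  # None means "not leased"


job_queue: dict[int, Job] = {}
job_dead_letter: dict[int, Job] = {}


def claim(job_id: int, now: float, lease_seconds: float = 30.0) -> bool:
    """Lease a row; assumed to succeed only when no lease is recorded."""
    job = job_queue[job_id]
    if job.lease_expires_at is not None:
        return False
    job.attempts += 1
    job.lease_expires_at = now + lease_seconds
    return True


def lease_reaper(now: float) -> int:
    """Clear expired leases so rows become re-claimable; returns rows rescued."""
    rescued = 0
    for job in job_queue.values():
        if job.lease_expires_at is not None and job.lease_expires_at <= now:
            job.lease_expires_at = None
            rescued += 1
    return rescued


def fail_permanent(job_id: int) -> None:
    """Move a row from job_queue to job_dead_letter."""
    job_dead_letter[job_id] = job_queue.pop(job_id)


def dead_letter_replay(job_id: int) -> None:
    """Return a dead-lettered row to job_queue with attempts reset to 0."""
    job = job_dead_letter.pop(job_id)
    job.attempts = 0
    job.lease_expires_at = None
    job_queue[job_id] = job
```

Time is passed in explicitly as `now` so that a test can simulate lease expiry without sleeping.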
§5. Phase 3 — Heartbeat + NOTIFY bridge + DP1 cadence
§5.1 Scope
- Create `queue_heartbeat` table + register/write/stale-check fns (DP4).
- Create `v_queue_health` aggregating cursor + heartbeat + DLQ + backlog.
- Register existing `iu_outbound_default` worker as first `queue_heartbeat` row (its existing `last_run_at` migrates to a new `queue_heartbeat.last_beat_at` on every tick).
- Register `system/queue_worker_silent` in `event_type_registry` (Council vocab gate).
- Optionally: enable NOTIFY bridge (DP1 Layer 2) per `queue.notify.bridge_enabled` (default false).
- Document external cadence: one named orchestrator owns ticks (Hermes, host cron, dedicated container).
§5.2 Critical milestone
Close the 4-day silent gap. By the time Phase 3 completes, iu_outbound_default should have a heartbeat fresher than stale_threshold (default 300s).
§5.3 Verification gate
- `v_queue_health` row for `iu_outbound_default` shows `status_hint='fresh'`.
- A simulated 10× cadence skip emits `system/queue_worker_silent`.
- D31 watchdog / D43 red_zones receive the event.
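
A stale-check along these lines would satisfy the first two gate items. It is a sketch under assumptions: the `'stale'` status value, the `silent_for_s` field, and the 60-second cadence used in the test are illustrative and not taken from DP4; only the 300-second threshold, the `'fresh'` hint, and the event type come from this roadmap.

```python
from dataclasses import dataclass

STALE_THRESHOLD_S = 300.0  # default stale_threshold from §5.2


@dataclass
class Heartbeat:
    worker: str
    last_beat_at: float  # epoch seconds of the most recent beat


def status_hint(hb: Heartbeat, now: float,
                stale_threshold: float = STALE_THRESHOLD_S) -> str:
    """'fresh' while the last beat is within the threshold, else 'stale' (assumed name)."""
    return "fresh" if now - hb.last_beat_at <= stale_threshold else "stale"


def stale_check(heartbeats: list[Heartbeat], now: float) -> list[dict]:
    """Return one system/queue_worker_silent event per worker that is not fresh."""
    events = []
    for hb in heartbeats:
        if status_hint(hb, now) != "fresh":
            events.append({
                "event_type": "system/queue_worker_silent",
                "worker": hb.worker,
                "silent_for_s": now - hb.last_beat_at,  # hypothetical field
            })
    return events
```

With an assumed 60-second cadence, ten skipped ticks put the last beat 600 seconds in the past, which exceeds the 300-second default and produces the silent-worker event.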
§6. Phase 4 — Trigger-in / Trigger-out generalisation
§6.1 Scope
- Widen `iu_sql_event_route.target_event_domain` CHECK to §6.1 9-domain set (Council vocab gate).
- Create `consumer_registry` + `fn_consumer_dispatch` (DP5).
- Migrate `fn_iu_auto_instantiate_from_event` into a `consumer_registry` row (callable from dispatch).
- Create `job_subscription` table + executor binding (DP6).
§6.2 Verification gate
- One existing event_type (e.g. `iu.template.instance_auto_composed`) is consumed via the new dispatch path, producing identical effect to legacy.
- DLQ remains empty.
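
The equivalence check in this gate can be pictured as a registry lookup that calls the same handler the legacy path calls. The sketch below assumes a registry keyed by event_type; `legacy_handler` is a hypothetical stand-in for whichever existing function handles the chosen event, and the dead-letter list is an in-memory substitute for the real DLQ.

```python
from typing import Callable

Event = dict
Handler = Callable[[Event], dict]


def legacy_handler(event: Event) -> dict:
    """Hypothetical stand-in for the existing, directly-invoked consumer."""
    return {"instantiated_from": event["event_id"]}


# Assumed registry shape: event_type -> ordered list of handlers.
consumer_registry: dict[str, list[Handler]] = {
    "iu.template.instance_auto_composed": [legacy_handler],
}
dead_letter: list[dict] = []


def fn_consumer_dispatch(event: Event) -> list[dict]:
    """Run every registered handler; a failing handler is dead-lettered, not raised."""
    results = []
    for handler in consumer_registry.get(event["event_type"], []):
        try:
            results.append(handler(event))
        except Exception as exc:
            dead_letter.append({"event": event, "error": repr(exc)})
    return results
```

Because dispatch calls the same function object, comparing its result with a direct call is one way to assert "identical effect to legacy" while also confirming the dead-letter list stayed empty.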
§7. Phase 5 — MARK/CUT queue pilot
§7.1 Scope
- Pick one collection (probably a low-volume governance one) as the puller-enabled pilot.
- Add `consumer_registry` rows for `staging.record_approved → cut`, `staging.record_consumed → verify_cut`, `staging.record_cleaned → (terminal)`.
- Worker (Agent/Codex/DOT) claims the resulting `job_queue` rows and calls `fn_iu_op_*` aliases unchanged.
- Operator continues to drive MARK and APPROVE; CUT and CLEANUP automate.
§7.2 Verification gate
- One end-to-end MARK → CUT → CLEANUP cycle in the pilot collection runs without operator touching CUT.
- All status transitions visible in `v_queue_health` + `v_iu_staging_record`.
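
The pilot's routing can be traced end to end with a small table-driven walk. The three consumer rows come from §7.1; the `EMITS_ON_SUCCESS` map (which event each job kind produces when it finishes) is an assumption made only so the chain can be followed, and the `job:` prefix is purely a display convention.

```python
from typing import Optional

# consumer_registry rows from §7.1: event_type -> job kind (None = terminal).
CONSUMER_ROWS: dict[str, Optional[str]] = {
    "staging.record_approved": "cut",
    "staging.record_consumed": "verify_cut",
    "staging.record_cleaned": None,
}

# Assumption: the event each job kind emits on success.
EMITS_ON_SUCCESS: dict[str, str] = {
    "cut": "staging.record_consumed",
    "verify_cut": "staging.record_cleaned",
}


def run_cycle(first_event: str) -> list[str]:
    """Follow events and the jobs they enqueue until a terminal row is reached."""
    trail: list[str] = []
    event: Optional[str] = first_event
    while event is not None:
        trail.append(event)
        job_kind = CONSUMER_ROWS.get(event)
        if job_kind is None:
            break  # terminal or unregistered event: nothing more to enqueue
        trail.append(f"job:{job_kind}")
        event = EMITS_ON_SUCCESS.get(job_kind)
    return trail
```

Starting from `staging.record_approved`, the trail passes through both job kinds and ends at `staging.record_cleaned` with no operator step in between.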
§8. Phase 6 — MOT pilot
§8.1 Scope
- Implement `job_workflow` (if deferred from Phase 1).
- Implement `fn_mot_graph_emit`.
- Ship one MOT template, likely the cutting pipeline expressed as a MOT graph (so MARK/CUT becomes a 6-step MOT workflow).
- Add `fn_job_workflow_refresh` as a tick job.
§8.2 Verification gate
- One MOT-generated workflow visible in `v_job_workflow_health` advancing through `active → succeeded`.
- No `MOT` value appears in any `executor` field (enforcement intact).
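
Both gate items reduce to simple predicates over the workflow's job rows. The sketch below is a rough model: the per-job `status` values, the all-jobs-succeeded rule, and the case-insensitive executor comparison are assumptions, and failure states are deliberately not modelled.

```python
def workflow_status(job_statuses: list[str]) -> str:
    """Assumed refresh rule: 'succeeded' once every member job succeeded, else 'active'."""
    if job_statuses and all(s == "succeeded" for s in job_statuses):
        return "succeeded"
    return "active"


def executor_violations(jobs: list[dict]) -> list[dict]:
    """Rows whose executor field holds 'MOT'; the gate expects this list to be empty."""
    return [j for j in jobs if str(j.get("executor", "")).upper() == "MOT"]
```

A tick job in the spirit of `fn_job_workflow_refresh` would recompute the status on each run, so a 6-step workflow reads `active` until its last step finishes.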
§9. Phase 7 — Customer / email / message pilot
§9.1 Scope
- Open the customer message store sub-design pack (separate macro).
- Implement first channel adapter (likely IMAP→PG read + SMTP send via `external_worker`).
- Register `customer.*` event_type values in registry.
- Register `customer_message_*` job_kinds in `consumer_registry`.
- Enforce approve-required flag (default on).
§9.2 Verification gate
- One round-trip customer interaction (inbound → classify → draft → approve → send) processed through queue.
- PII boundary preserved (no body in `safe_payload`).
- DLQ empty for `customer_message_send`.
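
The PII boundary and the approve-required flag can each be expressed as a small guard. This is a sketch under assumptions: the body field names in `BODY_FIELDS` and the `approved` key on a draft are hypothetical, since the message store schema belongs to a separate sub-design pack.

```python
APPROVE_REQUIRED = True  # approve-required flag, default on per §9.1

# Hypothetical names for fields that carry message content.
BODY_FIELDS = {"body", "body_text", "body_html"}


def to_safe_payload(message: dict) -> dict:
    """Copy a message into a safe_payload with every body field removed."""
    return {k: v for k, v in message.items() if k not in BODY_FIELDS}


def may_send(draft: dict, approve_required: bool = APPROVE_REQUIRED) -> bool:
    """A draft may be sent once approved, or when the flag is explicitly off."""
    return bool(draft.get("approved")) or not approve_required
```

Stripping by an explicit deny-list keeps the example short; an allow-list of known-safe fields would be the stricter choice if the schema permits it.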
§10. Cross-phase dependencies
Phase 1 (job_queue)
│
├──→ Phase 2 (retry/lease) ──→ Phase 5 (cut pilot)
│ ▲
│ │
└──→ Phase 3 (heartbeat) ──────────┴─→ Phase 4 (dispatch) ──→ Phase 6 (MOT) ──→ Phase 7 (customer)
│ ▲
└─→ closes §15.5 silent gap (depends on §13.2)
Critical path order: Phase 1 → Phase 3 (heartbeat fix is high-value because §15.5 is already violated). Phases 2, 4 may parallelise after Phase 1.
§11. Per-phase governance
Each phase requires:
- A separate open-goal-prompt macro (per Open Goal Prompt Guide v1.2).
- Council Round 1 review (GPT) of the design.
- Optional Council Round 2 (Gemini).
- User approval.
- Migration ratification + dry-run + apply.
- Post-apply verification gate.
- Memory update + report upload.
No phase combines DDL ratification with DML or with dot_config mutation in a single migration (Điều 35/44 discipline).
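
The single-purpose rule lends itself to a pre-ratification lint. The following is a rough sketch, not a real SQL parser: it classifies statements by their leading keyword, treats any DML that mentions `dot_config` as a config mutation (assuming `dot_config` is an ordinary table), and splits on `;`, which would misfire on function bodies or string literals containing semicolons.

```python
import re

DDL_KEYWORDS = {"CREATE", "ALTER", "DROP"}
DML_KEYWORDS = {"INSERT", "UPDATE", "DELETE"}


def classify(statement: str) -> str:
    """Label one statement as 'ddl', 'dml', 'config', or 'other'."""
    words = statement.strip().split()
    keyword = words[0].upper() if words else ""
    if keyword in DDL_KEYWORDS:
        return "ddl"
    if keyword in DML_KEYWORDS:
        if re.search(r"\bdot_config\b", statement, re.IGNORECASE):
            return "config"
        return "dml"
    return "other"


def migration_kinds(sql: str) -> set[str]:
    """Distinct change kinds in a migration (naive split on ';')."""
    kinds = {classify(s) for s in sql.split(";") if s.strip()}
    return kinds - {"other"}


def is_single_purpose(sql: str) -> bool:
    """True when the migration holds at most one of DDL, DML, config mutation."""
    return len(migration_kinds(sql)) <= 1
```

A check like this could run during the dry-run step so that a mixed migration is rejected before it reaches ratification.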
§12. Out-of-scope at every phase
- pg_cron installation (`§5.4`; requires its own amendment).
- PG 18 upgrade (separate readiness macro).
- Any change to Điều 45 v1.0 substance (requires amendment process per §18.1).
- Any change to existing `event_outbox`, `iu_core.iu_staging_*` schema.
- Qdrant / vector tier mutation.
§13. Open questions
| # | Question | Routed to |
|---|---|---|
| RM-Q1 | Phase order: should Phase 3 (heartbeat) come before Phase 1 (job_queue) to fix silent gap faster? | Council |
| RM-Q2 | Combine Phase 1+2 into one migration (smaller blast radius) or separate (cleaner gates)? | Council |
| RM-Q3 | Phase 5 pilot collection — which one? | Council + operator |
| RM-Q4 | Phase 6 MOT template — should it be the cut pipeline, or a new low-stakes workflow? | Council |
| RM-Q5 | Approve the 7-phase cadence (vs collapsing into fewer)? | Council |
Implementation roadmap. No phase executed. Authored 2026-05-26.