KB-296E

08 — Questions to Answer Before Design

Revision 1
Tags: survey, open-questions, pre-design, council-review, queue-law

Date: 2026-05-26 | Scope: Open questions that block or shape the design pack. None of these should be answered inside this survey; they need a decision from the user and the GPT Council.


§1. Scope questions

Q1. What does "system-wide queue" mean operationally?

  • (a) A unified event store with worker drain (current substrate, with widening).
  • (b) A generic job queue with claim/start/done semantics for any worker.
  • (c) Both — events and jobs are distinct substrates with shared invariants.
  • (d) Something else.

Q2. Is the cutting flow's operator-driven model acceptable long-term, or must it become puller-driven?

  • (a) Stay operator-driven; queue serves only as audit + retry surface.
  • (b) Add an automated puller for approved manifests → CUT.
  • (c) Hybrid: operator triggers MARK and APPROVE; puller handles VERIFY and CLEANUP.

Q3. Should non-IU domains (governance, tac, kg, dot, health, birth_registry) ship producers in the design pack, or stay vocab-only?

Q4. Do MOT-generated workflows need to live in the same queue, or are they out of scope until MOT spec stabilises?


§2. Substrate questions

Q5. Re-confirm event_outbox unique-index policy. The design (23-P3D4C0X §C) proposes UNIQUE (event_domain, event_type, event_subject_table, event_subject_ref) WHERE event_subject_ref IS NOT NULL. This survey did not introspect actual unique indexes on the live table. Council/design must verify the live constraint matches the spec.
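
One way the Q5 check could be automated, sketched under assumptions: the index definition is read as text (for example from the indexdef column of the pg_indexes view) and has the shape PostgreSQL usually prints for a btree index. The function name matches_spec, the index name ux_event_outbox_subject, and the sample string are made up for illustration; this survey did not read the live definition.

```python
import re

# Column list and predicate as proposed in 23-P3D4C0X §C.
EXPECTED_COLUMNS = ("event_domain", "event_type",
                    "event_subject_table", "event_subject_ref")
EXPECTED_PREDICATE = "event_subject_ref IS NOT NULL"

# Assumed text shape of an index definition; adjust if the live output differs.
INDEXDEF = re.compile(
    r"CREATE (UNIQUE )?INDEX \S+ ON \S+ USING \w+ \((.*?)\)"
    r"(?: WHERE \((.*)\))?$"
)

def matches_spec(indexdef):
    """Return True only for a UNIQUE partial index with the spec's columns and predicate."""
    m = INDEXDEF.match(indexdef.strip())
    if not m or not m.group(1):
        return False  # unparseable, or not UNIQUE
    columns = tuple(c.strip() for c in m.group(2).split(","))
    return columns == EXPECTED_COLUMNS and m.group(3) == EXPECTED_PREDICATE

# Hypothetical sample, NOT taken from the live table.
sample = ("CREATE UNIQUE INDEX ux_event_outbox_subject ON public.event_outbox "
          "USING btree (event_domain, event_type, event_subject_table, "
          "event_subject_ref) WHERE (event_subject_ref IS NOT NULL)")
```

A False result only says the text does not match the spec as written here; it does not say which side is wrong.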

Q6. Should event_pending and event_outbox carry the trace columns proposed in design but missing live? source_document_ref, import_batch_ref, causation_id, event_subject_type, source_function, processed_at are in doc 23-P3D4C0X §C but NOT in live event_outbox. Add now? Add later? Decide intentionally.

Q7. Status of iu_notification_event (legacy). Empty live. Is the compat layer (v_event_unified per doc 23-P3D4C0X §I.3) ever going to be needed, or has IU notification been fully ported into event_outbox? If ported, when do we drop the legacy tables?

Q8. Status of iu_sql_event_route activation. The table holds 1 row, dry-run/disabled. Is this the future trigger-IN bridge for the whole system, or should it be replaced by something more generic? The CHECK on target_event_domain currently allows only {iu, iu_sql}; should it be widened to {iu, piece, staging, system, birth_registry, governance, tac, kg, dot, health, iu_sql} or similar?
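
To make the size of that widening concrete, here is a small sketch that diffs the two sets quoted in Q8 and renders a candidate CHECK clause. The helper check_clause is hypothetical, and the wider set is the "or similar" list from the question, not a ratified vocabulary.

```python
# Domains the CHECK on target_event_domain accepts today, per Q8.
current = {"iu", "iu_sql"}

# Wider set floated in Q8; "or similar", so treat it as a draft.
proposed = {"iu", "piece", "staging", "system", "birth_registry",
            "governance", "tac", "kg", "dot", "health", "iu_sql"}

def check_clause(column, domains):
    """Render a CHECK clause for the given column and allowed values."""
    values = ", ".join(f"'{d}'" for d in sorted(domains))
    return f"CHECK ({column} IN ({values}))"

# Domains that would have to be added to the constraint.
missing = sorted(proposed - current)
```

Under the draft list, nine domains would be new to the constraint, which is why Q8 asks whether a more generic bridge is the better target.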

Q9. Worker scope decision. Today there is one worker (iu_outbound_default on domain iu). Should the design use per-domain workers, one fan-out worker with per-cursor scope, or workers staffed by volume?


§3. Scheduling questions

Q10. pg_cron — install or not?

  • (a) Install pg_cron 1.6+ on PG 16.13 and put queue ticks in the DB.
  • (b) Stay external — Hermes / Codex / Directus / a dedicated worker container holds the cadence.
  • (c) Defer until queue volume justifies it.

This is the decision that shapes §7 of any Điều 45 text.

Q11. If external, who is the canonical worker invoker? Today the worker has been called manually and last ran 2026-05-22 — that's a 4-day silent gap. Is that gap acceptable? If not, what mechanism guarantees a tick every N seconds?
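
Whatever mechanism is chosen, the gap itself is cheap to detect. A minimal sketch follows, assuming the last run time is available as a timestamp; the survey records dates only, so the midnight-UTC values below are an assumption, and the function names and the 5-minute threshold are placeholders rather than proposals.

```python
from datetime import datetime, timedelta, timezone

def silent_gap(last_run_at, now):
    """Time elapsed since the worker last ran."""
    return now - last_run_at

def is_stale(last_run_at, now, max_gap):
    """True when the worker has been silent for longer than max_gap."""
    return silent_gap(last_run_at, now) > max_gap

# Dates from Q11; the time of day is assumed.
last_run = datetime(2026, 5, 22, tzinfo=timezone.utc)
today = datetime(2026, 5, 26, tzinfo=timezone.utc)

gap = silent_gap(last_run, today)                          # timedelta of 4 days
stale = is_stale(last_run, today, timedelta(minutes=5))    # placeholder threshold
```

A check like this only reports the gap; it does not answer who should own the tick.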

Q12. LISTEN/NOTIFY bridge — yes/no? Add a trigger AFTER INSERT ON event_outbox that emits pg_notify('event_universal', json)? This brings event substrate to vector-sync parity (low-latency push for daemons) without requiring pg_cron. Tradeoff: any DB-connected listener needs to handle reconnect / replay correctly.
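
The reconnect / replay tradeoff can be sketched without a database. PostgreSQL delivers a notification only to sessions listening at that moment, so a listener that was disconnected has to re-read the table. The class below is an illustration under assumptions: ReplayingListener is a hypothetical name, events are assumed to carry an increasing integer id, and ids are assumed to become visible in order, which a real design would need to confirm.

```python
class ReplayingListener:
    """Tracks a cursor so events missed while disconnected can be recovered."""

    def __init__(self):
        self.cursor = 0      # highest event id already handled
        self.handled = []    # ids in the order they were handled

    def on_notify(self, event_id):
        # Skip duplicates and anything a replay has already covered.
        if event_id <= self.cursor:
            return
        self.handled.append(event_id)
        self.cursor = event_id

    def on_reconnect(self, outbox_ids):
        # Notifications sent while disconnected are lost, so re-read the
        # outbox and feed every id through the same dedup path.
        for event_id in sorted(outbox_ids):
            self.on_notify(event_id)

listener = ReplayingListener()
listener.on_notify(1)
listener.on_notify(2)
# Connection drops here; events 3 and 4 are inserted while nobody listens.
listener.on_reconnect([1, 2, 3, 4])
listener.on_notify(4)  # late duplicate, ignored
```

The cursor makes replay idempotent, which is the property any DB-connected listener would need regardless of how the bridge is built.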


§4. Retry / dead-letter questions

Q13. Max attempts policy. iu_route_dead_letter records attempts ≥ 1, but no maximum is defined. Should Council pick a default plus per-domain overrides?
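
For discussion, a default-plus-overrides policy could look like the sketch below. Every number and name here is a placeholder: DEFAULT_MAX_ATTEMPTS, DOMAIN_OVERRIDES, and both functions are hypothetical, and the override for health is only an example of the shape.

```python
DEFAULT_MAX_ATTEMPTS = 5             # placeholder value, not a recommendation
DOMAIN_OVERRIDES = {"health": 10}    # hypothetical per-domain override

def max_attempts_for(domain):
    """Per-domain limit, falling back to the default."""
    return DOMAIN_OVERRIDES.get(domain, DEFAULT_MAX_ATTEMPTS)

def should_dead_letter(domain, attempts):
    """True once an event has used up its attempts for this domain."""
    return attempts >= max_attempts_for(domain)
```

With these placeholder values, an iu event would dead-letter on its fifth failed attempt while a health event would still be retried.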

Q14. Backoff schedule. No backoff today (worker just retries on next tick). Should design specify exponential / linear / configurable?
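
To compare the options side by side, here is a minimal sketch of linear and capped exponential delays. The function names, step, base, and cap are all assumptions chosen for illustration; a "configurable" policy would amount to exposing these parameters per domain.

```python
def linear_delay(attempt, step_s=30.0):
    """Delay grows by a fixed step per attempt (attempt starts at 1)."""
    return step_s * attempt

def exponential_delay(attempt, base_s=5.0, cap_s=3600.0):
    """Delay doubles per attempt, capped so it never exceeds cap_s."""
    return min(cap_s, base_s * 2 ** (attempt - 1))

# First five exponential delays in seconds: [5.0, 10.0, 20.0, 40.0, 80.0]
schedule = [exponential_delay(n) for n in range(1, 6)]
```

The cap matters for the exponential variant: without it, the twentieth attempt under these values would wait about 30 days instead of one hour.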

Q15. Resolution authority. resolution ∈ {replayed, discarded, superseded}. Who is authorised to set which value? fn_iu_route_dead_letter_replay exists but is not clearly gated by role; it needs a policy.


§5. Inclusion criteria questions

Q16. Ratify 23-P3D4C0X §M.3 (3-question gate) as enforced? The gate is well-drafted but not enforced. Should event_type_registry insert require evidence of gate-pass? Should producers be blocked if their event_type is not in registry?
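
If the answer to the last question is yes, blocking could be as simple as a lookup before the write. The sketch below models the registry as an in-memory set and the outbox as a list; emit and UnregisteredEventType are hypothetical names, and the sample pairs are borrowed from the subscriptions listed in Q26, not confirmed registry content. It does not model the 3-question gate itself.

```python
class UnregisteredEventType(Exception):
    """Raised when a producer emits an event type the registry does not know."""

def emit(registry, outbox, event_domain, event_type, payload):
    """Append an event only if its (domain, type) pair is registered."""
    # registry stands in for event_type_registry; outbox for event_outbox.
    if (event_domain, event_type) not in registry:
        raise UnregisteredEventType(f"{event_domain}/{event_type}")
    outbox.append({"event_domain": event_domain,
                   "event_type": event_type,
                   "payload": payload})

# Sample pairs only; assumed for illustration.
registry = {("system", "issue_opened"), ("system", "issue_resolved")}
outbox = []
emit(registry, outbox, "system", "issue_opened", {"ref": "demo"})
```

Whether such a check lives in producers, in a function, or in a constraint is itself a design-pack decision.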

Q17. event_outbox vs activity_log boundary. The "not an activity log" hard rule from 23-P3D4C0X §0 is critical. Council should ratify the inclusion / exclusion lists from §M.1 / §M.2.


§6. Cross-law questions

Q18. Who owns each piece?

| Piece | Candidate owner law | Decided? |
|---|---|---|
| Envelope contract | Điều 45 (new) | no |
| Routing / subscription | Điều 37 (governance org) | no |
| Directus exposure | Điều 35 + 43 | no |
| Display rule | Điều 28 | no |
| Worker / lease / pg-cron | Điều 45 | no |
| Inclusion criteria | Điều 45 | no |

Q19. Cross-references to ratify. Mission expects D22, D28, D30, D31, D35, D36, D37, D43, D44 cross-refs. Are any others needed (e.g. D39 KG)?


§7. PG 18 questions (for the separate readiness macro)

Q20. Is pg_cron 1.6+ available on our Debian pgdg-18 source? This survey did not verify; verification needed before the PG 18 upgrade macro can commit a date.

Q21. Does our Directus 11.x support PG 18?

Q22. Wire-protocol 3.2 client matrix. libpq still defaults to 3.0; verify Hermes, Agent Data daemon, Directus client versions.

Q23. MD5 → SCRAM auth migration scope. Inventory pg_hba.conf and current role hash schemes.

Q24. Page-checksum policy on upgrade. Adopt checksums (CPU/IO cost) or use --no-data-checksums for like-for-like?


§8. Other open issues surfaced during the survey

Q25. The 4-day worker silence (last_run_at 2026-05-22, today 2026-05-26). Is this expected, or is the worker not being invoked at the intended cadence? route_worker_enabled='true', yet there has been no recent tick. Either:

  • Hermes/Codex stopped invoking it, or
  • the workload genuinely has nothing to drain.

Verify which before design.

Q26. event_subscription only has 3 rows. Two for role:health_owner (system/issue_opened, system/issue_resolved), one for agency:sysop (stream=alert). No IU subscriptions, no DOT subscriptions, no agency routings. Is this minimal-config-by-design (default broadcast fallback) or under-configured?

Q27. iu_route_worker_cursor.dead_lettered = 0 despite 67 attempts_written. Is this healthy, or is the worker not surfacing failures? Sample a scenario where an attempt fails but does not reach the DLQ to confirm the worker's error handling is correct.

Q28. The vocab event_domain='piece' (sub-domain) overlaps semantically with event_domain='iu'. Piece is a kind of IU. Either keep the sub-domain split (and document why) or merge them. Design decision.


§9. Refused-to-decide list (this survey will NOT answer these)

  • Final Điều 45 article structure.
  • pg_cron yes/no.
  • job_outbox yes/no.
  • PG 18 upgrade date.
  • Worker invocation cadence.
  • Any vocab change to event_type_registry.

All of those belong in the next pack, not this one.
