00 — System-Wide PG-Native Queue Law + Readiness Survey — Summary
Date: 2026-05-26
Status: SURVEY ONLY (no design, no DDL, no production mutation, no PG upgrade).
Scope: Answer 4 questions before any queue design is written.
Mode: Read-only KB + read-only PG + public web research.
§0. Top-line verdicts
| Question | Verdict (this survey) |
|---|---|
| Q1. Does queue/outbox/event infrastructure already exist live? | YES — substantial substrate already in production. event_outbox (131,746 rows), event_pending, event_read (131,407 rows), event_subscription, event_type_registry (31 types), iu_route_dead_letter, iu_route_worker_cursor, iu_sql_event_route, dot_iu_runtime_lease, fn_iu_route_worker_run (active, last run 2026-05-22), fn_iu_emit_event, fn_iu_auto_instantiate_from_event, fn_iu_route_dead_letter_replay. Also iu_notification_event/read legacy tables (empty). Plus iu_core.iu_staging_record/payload (mark→cut staging lifecycle). |
| Q2. Should queue governance become a new law, or be added to an existing law? | Tentative: NEW law (Điều 45 candidate) — but ONLY after design pack. Current substrate is implementation; no law currently covers system-wide queue invariants. §C below. |
| Q3. What should a PG-native system-wide queue law cover? | 8 minimum invariants identified (idempotency, durability, retry/dead-letter, lifecycle vocab, payload classification, no-vector-in-transient, lease/lock, observability). §D below. |
| Q4. PG 16.13 vs PG 18 for this work? | STAY ON 16.13 for queue design. Separate later macro for PG 18 upgrade readiness. PG 18 features are nice-to-have (uuidv7, AIO, OLD/NEW RETURNING) but the queue design does not depend on them; major-version upgrade introduces orthogonal risk (page checksum default, MD5→SCRAM, FTS reindex, pg_cron PG18-compat unconfirmed). §05 report. |
§A. Substrate-already-live finding (the single most important fact)
The system is not greenfield. The universal event/outbox pipeline from 23-P3D4C0X design has been built in stages:
- 131,746 rows in event_outbox since 2026-05-08.
- Worker iu_outbound_default is enabled (dot_config.iu_core.route_worker_enabled='true') and has processed 67 attempts / 0 dead-letter as of 2026-05-22.
- event_type_registry has 31 registered types across 4 active domains (iu, piece, staging, system); supported vocab covers 9 domains.
- Dead-letter table is present (0 rows, clean state).
- SQL→event router (iu_sql_event_route) exists in dry-run/disabled state; the generic capability is not yet flipped.
- Staging lifecycle (iu_core.iu_staging_record) has 7-state vocab (pending → pending_review → approved → consumed → rejected → expired → cleaned) and operates as a per-domain pending tier with cleanup retention.
- Runtime lease primitive (dot_iu_runtime_lease + acquire/release fns) exists for cross-process serialization without pg_cron.
Implication: any "system-wide PG-native queue" design must either extend this substrate (most likely) or deprecate it (high blast radius, almost certainly rejected). The default working assumption from this survey is extend.
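Of the primitives listed above, the named lease is the least self-explanatory, so a minimal sketch of its semantics may help. It models only what this survey states (a named lease with token, expiry, and renew, plus acquire/release). The class and method names, the in-memory dict, and the injected clock are all hypothetical; this is not the dot_iu_runtime_lease schema or its functions.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Lease:
    token: str
    expires_at: float


class LeaseTable:
    """In-memory stand-in for a named-lease table (hypothetical, not the real schema)."""

    def __init__(self, clock=time.monotonic):
        self._leases: Dict[str, Lease] = {}
        self._clock = clock

    def acquire(self, name: str, ttl_s: float) -> Optional[str]:
        """Return a fresh token, or None while an unexpired lease is held."""
        now = self._clock()
        held = self._leases.get(name)
        if held is not None and held.expires_at > now:
            return None
        token = secrets.token_hex(16)
        self._leases[name] = Lease(token, now + ttl_s)
        return token

    def renew(self, name: str, token: str, ttl_s: float) -> bool:
        """Extend the lease only for the current, unexpired token holder."""
        now = self._clock()
        held = self._leases.get(name)
        if held is None or held.token != token or held.expires_at <= now:
            return False
        held.expires_at = now + ttl_s
        return True

    def release(self, name: str, token: str) -> bool:
        """Drop the lease only if the caller presents the matching token."""
        held = self._leases.get(name)
        if held is None or held.token != token:
            return False
        del self._leases[name]
        return True
```

The token check is what makes expiry safe: a worker whose lease lapsed and was taken over by another process can no longer renew or release it, so a stale holder cannot disturb the new one.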
§B. The cutting flow already runs on top of this substrate (mostly)
The current Information Unit MARK → VERIFY-MARK → APPROVE → CUT → VERIFY-CUT → CLEANUP workflow is implemented with:
- Staging tier (iu_core.iu_staging_record + iu_staging_payload) carrying the 7-state lifecycle, idempotency_key, content_hash, expires_at, consumed_by_run_id, cleaned_at, vector_excluded.
- Operator alias (fn_iu_op_mark_file / verify_mark / cut / verify_cut / cleanup_dry_run) wrapping the underlying creators with workflow_admin trust + composer/runtime gates.
- Run ledger (dot_iu_command_run) capturing per-call run_id, run_mode, run_status, params_digest, gate_snapshot, evidence.
- Event emission into the universal event_outbox (5 staging.* event types active in registry; staging.record_created is stream=birth, others stream=update, all delivery_lane=delayed).
What is missing for a generic queue lens:
- The flow is operator-driven, not scheduler-driven. There is no automatic "pull next MARK to verify" or "pull next approved record to CUT" — the operator (human or agent) explicitly calls each step.
- There is no in-DB scheduler (no pg_cron) to drain delayed-lane events or clean up expired staging rows; that is done by external invocation today.
The cutting flow can therefore be represented as states on top of the existing substrate, but does not currently behave like a queue — it behaves like an explicit pipeline with idempotency guarantees and event observability.
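The difference between "states on a substrate" and "a queue" can be sketched in a few lines. The seven state names come from the staging vocab reported above; the transition map is an assumption (the survey lists the states but not the allowed edges), and transition and pull_next are hypothetical helpers, not existing functions.

```python
from enum import Enum


class StagingState(str, Enum):
    PENDING = "pending"
    PENDING_REVIEW = "pending_review"
    APPROVED = "approved"
    CONSUMED = "consumed"
    REJECTED = "rejected"
    EXPIRED = "expired"
    CLEANED = "cleaned"


S = StagingState

# ASSUMPTION: these edges are illustrative only; the real allowed transitions
# are whatever the staging functions enforce.
ALLOWED = {
    S.PENDING: {S.PENDING_REVIEW, S.EXPIRED},
    S.PENDING_REVIEW: {S.APPROVED, S.REJECTED, S.EXPIRED},
    S.APPROVED: {S.CONSUMED, S.EXPIRED},
    S.CONSUMED: {S.CLEANED},
    S.REJECTED: {S.CLEANED},
    S.EXPIRED: {S.CLEANED},
    S.CLEANED: set(),
}


def transition(current, target):
    """Explicit pipeline step: the caller names the record and the target state."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target


def pull_next(records, wanted):
    """Queue-style pull: return the oldest record in the wanted state, or None."""
    candidates = [r for r in records if r["state"] == wanted]
    return min(candidates, key=lambda r: r["created_at"], default=None)
```

Today only the first shape exists: an operator or agent chooses the record and calls the step. A queue lens would add the second shape, where a worker asks for "the next approved record" without knowing which one it will get.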
§C. Law placement — recommendation gist (full analysis in 04-law-placement-analysis.md)
| Option | Verdict (this survey) |
|---|---|
| A — extend Điều 35 (DOT Governance) | ❌ Queue scope is broader than DOT executors. |
| B — extend Điều 36 (Collection Protocol) | ❌ Queue scope is broader than collection birth. |
| C — extend Điều 31 (System Integrity) | ⚠️ Possible — queue is integrity-adjacent — but queue is operational substrate, not contract validation. |
| D — extend Điều 44 (Implementation) | ⚠️ Acceptable interim — current event substrate already sits inside Điều 44 implementation docs. |
| E — new Điều 45 "Luật Hàng Đợi & Điều Phối Tác Vụ PG-native" | ✅ Recommended target — queue/scheduling/routing/retry/dead-letter/lease invariants cross-cut every domain (IU, Birth Registry, Governance, TAC, KG, System, DOT, Health) and need a single owner; this matches the candidate Điều 45 already noted in 23-P3D4C0X §K. |
Important caveat: the existing P3D4C0X design note explicitly recommended deferring the new-law decision until after Phase 2 PoC (per §K). Phase 2 substrate is now built. The deferral condition is met — but the decision still belongs to the user + GPT council, not to this survey.
§D. What the law should cover (minimum invariants)
Drawn from the existing substrate + the gaps observed:
- Idempotency contract: every enqueue / event emission MUST carry an idempotency_key or equivalent unique constraint. Already enforced for events; not yet uniform for run-level jobs.
- Durable + transient split: durable events (event_outbox) vs transient staging (event_pending, iu_staging_record). Hard payload-classification CHECK (no body / no vector / no secret / no token) is already live and should be law-level.
- Lifecycle vocab discipline: staging has 7 states; events have 9 domains × 7 streams × 2 lanes; runs have run_mode {plan, apply, verify} × run_status {planned, applied, verified, refused, failed}. Law should fix these vocabs as governed.
- Retry / dead-letter contract: iu_route_dead_letter exists with attempts ≥ 1, failure_code, resolution ∈ {replayed, discarded, superseded}. Law should fix retry semantics + max attempts + resolution responsibilities.
- Lease / advisory lock primitive: dot_iu_runtime_lease already implements named-lease with token + expiry + renew. Law should define when a job MUST acquire a lease before mutating.
- No-vector-in-transient hard rule: staging payloads MUST NOT enter Qdrant; this is already a 4-layer guarantee. Law should ratify it.
- Observability contract: every worker MUST update a cursor (iu_route_worker_cursor schema is the template); every run MUST write dot_iu_command_run; every dead-letter MUST keep event_snapshot.
- Scheduler boundary: until pg_cron is installed, jobs are externally triggered (Hermes / Codex / Directus / operator). Law should explicitly name this constraint and require a re-evaluation if/when pg_cron lands.
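The idempotency and retry / dead-letter invariants above can be pictured together with a small sketch. It uses Python's built-in sqlite3 purely as a runnable stand-in for PostgreSQL. The tables demo_outbox and demo_dead_letter, their columns, and MAX_ATTEMPTS = 3 are assumptions made for illustration; only the attempts ≥ 1 check, failure_code, event_snapshot, and the three resolution values are taken from the survey.

```python
import sqlite3

MAX_ATTEMPTS = 3  # ASSUMPTION: the survey leaves the max-attempts value for the law to fix

SCHEMA = """
CREATE TABLE demo_outbox (
    id INTEGER PRIMARY KEY,
    idempotency_key TEXT NOT NULL UNIQUE,
    event_type TEXT NOT NULL,
    attempts INTEGER NOT NULL DEFAULT 0,
    status TEXT NOT NULL DEFAULT 'pending'
);
CREATE TABLE demo_dead_letter (
    id INTEGER PRIMARY KEY,
    outbox_id INTEGER NOT NULL,
    attempts INTEGER NOT NULL CHECK (attempts >= 1),
    failure_code TEXT NOT NULL,
    event_snapshot TEXT NOT NULL,
    resolution TEXT CHECK (resolution IN ('replayed', 'discarded', 'superseded'))
);
"""


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def enqueue(conn, idempotency_key, event_type):
    """Insert once per key; return True only when a new row was written."""
    with conn:
        cur = conn.execute(
            "INSERT INTO demo_outbox (idempotency_key, event_type) VALUES (?, ?) "
            "ON CONFLICT (idempotency_key) DO NOTHING",
            (idempotency_key, event_type),
        )
    return cur.rowcount == 1


def attempt_delivery(conn, outbox_id, send):
    """Run one delivery attempt; dead-letter the event at MAX_ATTEMPTS failures."""
    row = conn.execute(
        "SELECT event_type, attempts FROM demo_outbox "
        "WHERE id = ? AND status = 'pending'",
        (outbox_id,),
    ).fetchone()
    if row is None:
        return "skipped"  # already delivered or dead-lettered
    event_type, attempts = row[0], row[1] + 1
    with conn:  # commit the bookkeeping for this attempt atomically
        try:
            send(event_type)
        except Exception as exc:
            dead = attempts >= MAX_ATTEMPTS
            conn.execute(
                "UPDATE demo_outbox SET attempts = ?, status = ? WHERE id = ?",
                (attempts, "dead" if dead else "pending", outbox_id),
            )
            if dead:
                conn.execute(
                    "INSERT INTO demo_dead_letter "
                    "(outbox_id, attempts, failure_code, event_snapshot) "
                    "VALUES (?, ?, ?, ?)",
                    (outbox_id, attempts, type(exc).__name__, event_type),
                )
            return "dead" if dead else "retry"
        conn.execute(
            "UPDATE demo_outbox SET attempts = ?, status = 'delivered' WHERE id = ?",
            (attempts, outbox_id),
        )
        return "delivered"
```

Calling enqueue twice with the same key writes one row and returns False the second time, which is the behavior the idempotency contract asks for. A dead-letter row starts with resolution unset, leaving the replayed / discarded / superseded decision to whoever the law makes responsible for it.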
§E. PG 16.13 vs PG 18 — short answer
Recommendation: stay on PG 16.13 for the queue design. PG 18 brings useful features (uuidv7, AIO, OLD/NEW in RETURNING, virtual generated columns, temporal constraints) but none are blocking for the queue invariants above. A PG 18 upgrade is its own readiness macro — see 05-pg16-vs-pg18-assessment.md for the breakdown of benefits, risks, and the safe next step.
The cost of NOT upgrading: minor performance ceiling on scans/vacuums, no native uuidv7 (still have gen_random_uuid for v4).
The cost of upgrading NOW: orthogonal risk (page-checksum default, MD5 deprecation, FTS reindex, protocol 3.2, pg_cron PG18 compatibility unconfirmed) bundled into a design phase. Bad time.
§F. Strict non-goals reaffirmed
This survey did NOT:
- create queue tables
- create triggers, workers, or pg_cron jobs
- modify any law text
- update START-HERE
- touch MARK / CUT aliases
- upgrade PostgreSQL
- mutate production
- author the final queue design
The following objects are untouched: event_outbox, event_read, event_subscription, event_type_registry, event_pending, iu_route_*, iu_staging_*, iu_lifecycle_log, iu_vector_sync_point, dot_iu_runtime_lease, dot_iu_command_run, dot_iu_command_catalog, iu_auto_instantiate_event_log, and all fn_iu_* and fn_dot_iu_* functions. No row was inserted, updated, or deleted.
§G. Next required pack (proposal only)
A subsequent design pack should:
- Re-state the invariants in §D as a draft Điều 45 article structure.
- Decide pg_cron acquisition (yes/no/when) and its dependency on the PG 18 question.
- Decide whether generic non-event jobs (Hermes/Agent jobs, MOT workflows, vector sync drainers) get a new job_outbox substrate or piggyback on event_outbox with a job-style domain.
- Address the 8 open questions in 08-questions-to-answer-before-design.md.
This survey makes no commitment about what that pack should conclude.
§H. Verification block
phase_status=PASS
mode=survey_only
no_pg_mutation=true
no_directus_mutation=true
no_qdrant_mutation=true
no_law_text_change=true
no_start_here_change=true
no_alias_change=true
no_pg_upgrade=true
no_queue_design_authored=true
kb_docs_surveyed=7+
live_pg_objects_inventoried=20_tables_77_functions
pg_version=16.13
pg_extensions=btree_gist,pgcrypto,plpgsql,postgres_fdw
pg_cron_installed=false
pg_listen_notify_in_use=true_for_qdrant_sync_only
event_outbox_rows=131746
event_read_rows=131407
event_type_registry_rows=31
event_subscription_rows=3
iu_route_worker_cursor_rows=1
iu_route_dead_letter_rows=0
recommended_law_placement=new_dieu45_after_design_pack
recommended_pg_upgrade_now=false
next_required_pack=DIEU45_QUEUE_LAW_DESIGN_DRAFT