IU 4-Mothers Master Design Rev2 — WS5 Event 5-Layer + Realtime Gateway + DLQ (DRAFT 2026-05-27)
Master Design Rev2 — Event 5-Layer + Realtime Gateway + DLQ (WS5)
Path:
knowledge/dev/design/v0.6-iu-4mothers-event-foundation-rev2/03-event-5layer-realtime-dlq-design.mdStatus: DRAFT Rev2 (document-only). Companion to00-master-design-rev2.md. Date: 2026-05-27 Authority: Rev2 brief §6 (Event 5-layer + reconcile checklist §6.6), §7.4 (Governance UI), §8 (IU Event Contract), §10 (Usage Evidence), §13 (Constitution matrix), §14 (PG Maximization). Boundary: Điều 45 v1.0 owns queue/event-core/work_state_machine/executor boundary/heartbeat caller. This document only assembles around Điều 45 — never redefines. Nuxt = render shell (Điều 28 / S178). Approval = Điều 32. IU = Điều 38/39.
§1. Scope and boundaries
This document designs:
- Layer 1 — Event Producers (Rev2 §6.1) — taxonomy, register-before-emit, capture-by-config.
- Layer 2 — Broker / Event Bus + Job Queue split (Rev2 §6.2).
- Layer 3 — Consumers / Workers / Executors (Rev2 §6.3) — executor class registry, ACK/NACK, idempotency, heartbeat, W3C trace_id.
- Layer 4 — Realtime Gateway (Rev2 §6.4) — backend gateway, Nuxt SSE shell-only, permission/relevance filter, governance summary.
- Layer 5 — DLQ / Recovery / Governance (Rev2 §6.5) — idempotency registry, retry policy registry, dlq_replay_request ledger, schema compat, audit timeline, governance UI binding.
- IU Event Contract (Rev2 §8) — refs-only envelope, W3C trace shape, semver schema.
- Usage Evidence Registry (Rev2 §10) — 8-signal derivation feeding KG / governance.
This document does not design:
- The internal mechanics of
fn_iu_cut_*,fn_iu_op_*aliases, MARK/CUT contract — those are Điều 38/39 surface, owned elsewhere. - The Điều 45 queue/event-core schema itself — referenced read-only.
- The state machine transitions — see
02-step-state-machine-and-workflow-ui-design.md. - OSS tool selection for broker / NATS / Centrifugo / OTel — see
05-oss-candidate-strategy-rev2.md.
§2. Top-level invariants (carry to every layer)
- Producers only produce; workers only execute. Queue does not auto-run scripts (Rev2 §6.2 + §6.3).
- Event bus ≠ job queue. Event bus is pub/sub fan-out; job queue is durable lease + retry + DLQ. Tables for each are distinct.
- Register-before-emit. Every event type must exist in
event_type_registrywith a valid JSON schema before any producer is allowed to emit it. - Refs-only payload. Event payloads carry signal/ref, not heavy body (Rev2 §6.2 + §8 + Điều 45 §13.5). Worker fetches body from PG/Directus/DOT.
- W3C trace shape NOW. Every event/job carries
trace_id+correlation_id+parent_span_idin W3C shape from day 1 (Rev2 §6.3 + §15 L1; memoryfeedback-oss-tool-adoption-state-vocab-fit-and-config-first-test). - Heartbeat caller pattern. Every worker class MUST have a heartbeat caller; new workers MUST emit heartbeat from day 1 (Rev2 §6.3 + §7.4.6; Điều 45 §15.5; memory
feedback-protect-legacy-silent-passive-heartbeat-from-false-heal). - Idempotency mandatory. Every executor call carries an idempotency key; replay is safe by design.
- Nuxt boundary. Nuxt never connects to core queue / event_outbox / NOTIFY directly. Realtime through backend gateway only (Rev2 §6.4).
- No cross-IU vector pollution. Adapters that mirror events to external systems (Benthos / NATS / OTel) must NOT collapse multiple IUs into a single point or fan body across IU boundaries (Rev2 §13 vector law).
- Reversibility. Every new event type, executor class, retry policy must have rollback (deprecate event type, retire executor, freeze retry policy).
§3. Layer 1 — Event Producers
§3.1 Register-before-emit substrate
Existing event_type_registry [VL row 10]. Producer write path:
PRODUCE(event_type, payload_json, trace_id, correlation_id, parent_span_id?)
1. Lookup event_type in event_type_registry; refuse if absent.
2. Validate payload_json against registered JSON schema (semver-resolved).
3. Insert into event_outbox (canonical ledger) in TX with the originating row mutation.
4. NOTIFY 'event_outbox_tick' (Layer 2 trigger; see §4.1).
Open decision: schema compatibility mode (OD8). Design default — event_type_registry.compat_mode ∈ {forward, backward, full, breaking_with_policy}. Default = forward (consumers tolerate new optional fields). Breaking change requires a schema_change_policy row capturing migration plan; producer refuses to emit a new major version until policy row is present.
§3.2 Producer taxonomy (Rev2 §6.1)
| Producer class | Source | Mechanism | Examples | Live? |
|---|---|---|---|---|
pg_dml_trigger |
PG row mutation | trigger function → event_outbox insert |
task INSERT, workflow_step UPDATE | yes (rev1 §1) |
iu_axis_lifecycle |
IU axis refresh / compose / structure ops | DOT alias → outbox | iu.axis_refreshed, iu.composed, iu.split_proposed |
yes |
dot_command_lifecycle |
dot_iu_command_run transitions |
trigger → outbox | dot.command_started/completed/failed |
yes |
cut_pipeline |
cut_request_transition |
trigger → outbox | cut.mark_verified, cut.completed, cut.failed |
yes |
workflow_runtime |
workflow_run / step_run / task_run state changes |
step transition validator → outbox | step.ready, step.completed, etc. (see 02-… §4.2) |
partial |
proposal_lifecycle |
workflow_change_requests / generic proposal table |
trigger → outbox | proposal.submitted, proposal.approved, proposal.rejected |
partial |
human_decision |
UI RPC | backend RPC handler → outbox | decision.recorded, pic.assigned |
partial |
agent_output |
AI agent worker | worker writes via DOT → outbox | agent.suggestion, agent.warning |
partial |
external_api_adapter |
external webhook / poll | approved adapter → outbox | external.callback_received |
future |
Capture-by-config principle (Rev2 §6.1): adding an event type requires only an event_type_registry INSERT + JSON schema; no application code redeploy. Producer dispatch table (paper-only) maps event_type → producer class → emit path.
§3.3 IU Event family (Rev2 §8 R8.1)
IU lifecycle emits these registered event types:
| event_type | Trigger | Payload fields (refs only) | Compat mode |
|---|---|---|---|
iu.born |
New information_unit row + axes resolved |
iu_unit_id, iu_version_id, governance_state |
forward |
iu.edited |
New iu_version (minor or major) |
iu_unit_id, iu_version_id, prior_version_id, change_kind |
forward |
iu.split_proposed |
KG / human proposes split | iu_unit_id, proposal_id |
forward |
iu.split_approved |
Điều 32 approves split | iu_unit_id, child_iu_unit_ids, approval_id |
forward |
iu.merged |
Merge approved | iu_unit_id (kept), merged_from_iu_unit_ids, approval_id |
forward |
iu.deprecated |
Governance state → deprecated |
iu_unit_id, iu_version_id, reason_code |
forward |
iu.linked |
KG edge created | iu_unit_id, related_iu_unit_id, edge_type |
forward |
iu.rendered |
Render layer materializes IU body to a surface | iu_unit_id, iu_version_id, render_surface, render_context_id |
forward |
iu.validated |
Axis validate pass | iu_unit_id, iu_version_id, axis_kinds[], verdict |
forward |
iu.used_in_workflow |
Workflow run binds IU | iu_unit_id, iu_version_id, workflow_run_id, step_run_id |
forward |
Envelope (Rev2 §8 R8.2, every IU event):
{
"event_type": "iu.used_in_workflow",
"event_schema_version": "1.0.0",
"iu_unit_id": "…",
"iu_version_id": "…",
"workflow_run_id": "…",
"task_run_id": null,
"trace_id": "<32 hex>",
"correlation_id": "<uuid>",
"parent_span_id": "<16 hex>",
"trace_flags": "01",
"traceparent": "00-<trace_id>-<parent_span_id>-<trace_flags>",
"produced_at": "2026-05-27T…Z",
"producer_class": "workflow_runtime",
"idempotency_key": "<producer-scoped>",
"extra_refs": { … }
}
Payload policy (MP-D8 — refined):
- Forbidden keys (deny-list).
body_text,instruction_text,payload_blob,iu_bodyMUST NEVER appear in an event payload. Schema validator refuses any payload carrying these keys, regardless of event_type (Rev2 §6.2 + §8 R8.3). - Allowlist per event_type schema. Beyond the universal envelope (event_type / event_schema_version / trace_id / parent_span_id / trace_flags / correlation_id / produced_at / producer_class / idempotency_key / extra_refs), every additional key must be declared in the registered
event_type_registry.schema_jsonbfor that event_type. Unknown keys are rejected at producer write time. - Max payload size (configurable).
dot_config event_payload.max_size_bytes.<event_type>declares per-event-type max payload bytes; defaultdot_config event_payload.max_size_bytes.default(proposed 4 KB). Producer write refused if serialized payload exceeds limit; producer must emit refs only and let worker fetch body from PG. - Sentinel + acceptance: §11 acceptance row A1.b — schema validator enforces deny-list + allowlist + size cap; rejected emits land in
event_validation_audit. No event payload anywhere in the system carries IU body bytes.
§3.4 No cross-IU vector pollution (Rev2 row 21 + vector law)
When an event type is mirrored to a vector store (future), the mapping rule is "1 IU → ≥1 point, never cross-IU merge". Adapter contract enforces:
- Each event → mirror function fetches IU body by ref from PG; never embeds multiple IUs in one chunk.
- Mirror sink table (e.g.
iu_vector_sync_point[VL]) keys oniu_unit_id+iu_version_id. - Gate
iu_vector_sync_enabled=falserespected; design assumes off until governance flips.
§3.5 Scheduled producers (pg_cron legacy, row 24)
pg_cron keeps its role for scheduled triggers but does not play queue role. Cron job emits an event into event_outbox (registered cron.tick.<job_name>); worker handles, not cron itself.
§4. Layer 2 — Broker / Event Bus + Job Queue (separate)
§4.1 Substrate split
| Concern | Substrate | Pattern |
|---|---|---|
| Event bus (pub/sub fan-out) | event_outbox + event_subscription + event_pending + event_read [VL rows 9, 11] |
Producer → outbox; broker tick fans to subscriptions; subscribers see in event_pending / mark event_read |
| Job queue (durable lease + retry + DLQ) | job_queue + job_dead_letter + queue_heartbeat [VL row 12] |
Worker leases; ACK/NACK; retry per policy; poison → DLQ |
| Realtime topic stream | realtime_gateway_topic_registry (new, paper) + transient process state |
Gateway maps event_outbox tail → topics (see Layer 4) |
These are three substrates. Event bus and job queue never share rows. A producer emits to event_outbox (always); a router (a "subscription" with kind='job_dispatch') materializes selected events into job_queue jobs.
§4.2 Topic / routing key / priority
event_subscription
subscription_id uuid PK
subscriber_class text -- 'worker.<class>' | 'realtime_gateway' | 'audit' | 'projection.<view>'
event_type_pattern text -- exact OR glob ('iu.*', 'step.*')
routing_filter_jsonb jsonb -- e.g. {"producer_class":"workflow_runtime"} OR predicate ref
delivery_kind text -- 'fanout' | 'job_dispatch' | 'realtime_topic'
priority int -- 0=low..9=high
active bool
job_dispatch deliveries create job_queue rows with:
job_queue
job_id uuid PK
job_class text -- maps to executor_class_registry
payload_refs_jsonb jsonb -- refs only (event_outbox_id, IU refs, run refs)
priority int
lease_until timestamptz nullable
attempt_count int
last_attempt_at timestamptz nullable
idempotency_key text
trace_id / correlation_id / parent_span_id
status text -- 'queued' | 'leased' | 'completed' | 'failed' | 'dead_letter'
Properties:
- Queue carries refs only.
payload_refs_jsonbhas no body text, no IU instructions, no policy text. Sentinel: row byte size cap (e.g. 4 KB) enforced by trigger. - Queue does not auto-execute. Lease + worker ACK pattern is the only execution path. Cron / queue does not run scripts directly (Rev2 §6.2 + §6.3 binding).
- Priority is config-driven. Producer / subscription can set priority via
event_subscription.priority; queue dispatch honors it.
§4.3 Fan-out semantics
One event_outbox row can satisfy multiple subscriptions. The broker tick:
- Reads new outbox rows (LSN-style cursor per subscription).
- For each subscription matching by
event_type_pattern+routing_filter_jsonb:- If
delivery_kind='fanout': insert intoevent_pendingfor that subscriber. - If
delivery_kind='job_dispatch': insert intojob_queuewithjob_classresolved. - If
delivery_kind='realtime_topic': signal Layer 4 gateway (in-process notification).
- If
- Advances per-subscription cursor on success.
Failure isolation: a single subscription failure does not block other subscriptions; failed subscription dispatch logged to subscription_dispatch_audit (paper).
§4.4 Idempotency on dispatch
The broker dispatch is itself idempotent: re-running the broker tick over the same outbox cursor produces no duplicate event_pending / job_queue rows. Mechanism: (subscription_id, event_outbox_id) UNIQUE constraint on dispatch ledger; insert-on-conflict-do-nothing pattern.
§4.5 Substrate adoption gates (Gate A / B from feedback-oss-tool-adoption-state-vocab-fit-and-config-first-test)
PG-native (event_outbox + job_queue) is the default substrate now. External broker adoption (NATS / Redis Streams) gated by:
- Gate A — state-vocab fit (≥9-state work_state_machine). NATS / Redis transport, do not impose state vocab → PASS as transport, NOT as state authority.
- Gate B — config-first fit (workflow/event in PG/registry, not in tool's code/yaml). NATS / Redis as transport → PASS; pg-boss / Graphile as substrate owning state → FAIL (would re-implement state machine outside PG).
- See
05-oss-candidate-strategy-rev2.mdfor verdicts.
§5. Layer 3 — Consumers / Workers / Executors
§5.1 Executor class registry (Rev2 §6.3 + §11 + G3)
Schema proposal (paper-only; OD3 ownership = Điều XX referent, Điều 45 substrate cross-ref):
executor_class_registry
executor_class_ref text PK -- 'dot.iu' | 'sql.fn' | 'agent.ai.<name>' | 'human.review' | 'external.<adapter>' | 'notification.<channel>' | 'render.<surface>'
executor_kind text -- 'dot' | 'sql' | 'ai' | 'human' | 'external' | 'notification' | 'render'
config_jsonb jsonb -- e.g. DOT command ref, AI model id, external adapter id
mutating bool -- carries Điều 35 v5.2 mutating flag
default_retry_policy_id text FK -> retry_policy_registry
default_idempotency_ns text -- idempotency namespace
heartbeat_caller_fn text -- function/job name responsible for emitting heartbeat for this class
active bool
created_at timestamptz
Mapping:
| executor_kind | Underlying primitive | Mutating |
|---|---|---|
dot |
DOT command via dot_iu_command_catalog [VL row 14] (col mutating) |
varies per command |
sql |
SQL function in PG | varies |
ai |
AI agent process leasing job | typically false; if writing → must go via DOT |
human |
Human PIC in UI; task picked by PIC and submitted via RPC | yes (audit row) |
external |
External API adapter | varies |
notification |
iu_notification_event [VL row 22] / external notify |
typically false |
render |
Render-layer materialization (e.g. cache rebuild, MOUT view refresh) | false |
Key invariant (Rev2 §11.5): MOT is NOT an executor. MOT envelopes a task and calls an executor class registered in executor_class_registry. Sentinel: every task_run.executor_class_ref resolves to an executor_class_registry row; MOT runtime code has zero embedded executor logic.
§5.2 ACK/NACK + retry contract
Worker lease/release pattern:
LEASE(worker_id, job_class, lease_seconds)
→ returns job_id, payload_refs_jsonb, trace context
ACK(job_id, worker_id, outcome_jsonb)
→ job_queue.status='completed'; emit `job.completed`
NACK(job_id, worker_id, failure_jsonb, classification)
classification ∈ 'transient' | 'permanent' | 'poison'
transient: schedule next attempt per retry policy
permanent: job → 'failed' + emit step.failed (permanent)
poison: job → DLQ (job_dead_letter) + emit `job.dead_lettered`
Worker must NACK with classification — no silent failure.
§5.3 Retry policy registry (Rev2 §14 partial)
retry_policy_registry
retry_policy_id text PK
policy_name text
policy_kind text -- 'exponential_backoff' | 'fixed_delay' | 'fibonacci' | 'no_retry'
base_delay_ms int
max_attempts int
max_delay_ms int -- cap for exponential
jitter_kind text -- 'none' | 'full' | 'equal' | 'decorrelated'
transient_class_filter text[] -- which failure_class codes retry under this policy
config_jsonb jsonb
active bool
Defaults (proposal):
default.transient.5x.30s.exp— exponential, base 30s, max 5 attempts, max delay 30 min, full jitter.default.notification.3x.10s.fixed— fixed 10s, 3 attempts.default.render.10x.5s.exp— render is idempotent and cheap; 10 attempts.default.human.no_retry— human steps don't auto-retry; escalation handles.
Workers resolve policy at lease time; policy can be overridden per job_class via executor_class_registry.default_retry_policy_id.
§5.4 Idempotency registry (Rev2 §14 GAP G5; OD11)
idempotency_registry
idempotency_namespace text -- per-executor namespace, e.g. 'dot.iu_post_cut_axis_materialize'
idempotency_key text -- producer-scoped key
first_seen_at timestamptz
last_outcome_jsonb jsonb
observation_count int
PRIMARY KEY (idempotency_namespace, idempotency_key)
Properties:
- Executor invocation always carries
(namespace, key). Re-invocation returns prior outcome instead of re-running mutation. - Namespace per executor class avoids key collision across classes.
- Retention: 90 days default; per-namespace override allowed.
OD11 default kept: per-executor namespace + key. Schema above is design proposal.
§5.5 Heartbeat caller pattern (Rev2 §6.3 + §7.4.6 + memory)
Binding: every worker class has a heartbeat caller mapped to queue_heartbeat. Pattern:
HEARTBEAT(worker_class, worker_instance_id, tick_payload?)
INSERT into queue_heartbeat ON CONFLICT (worker_class, worker_instance_id)
DO UPDATE SET last_tick_at = now(), status = 'ok', last_payload = excluded.last_payload;
Properties:
- From day 1. New worker classes MUST emit heartbeat in the same TX as their lease/release, not "later".
- SECURITY DEFINER wrapper (memory pattern from mig 053). Worker process calls a defined wrapper function; raw INSERT denied to non-privileged workers.
- False-heal protection. Auto-resume of silent worker blocked unless
(a)heartbeat caller successfully ticks AND(b)worker class flag is notfrozen. Detection of false-heal: asilentrow that flips tookwithout an intervening tick → integrity warning (memoryfeedback-protect-legacy-silent-passive-heartbeat-from-false-heal). - Threshold per class.
dot_config heartbeat.threshold.<worker_class>defines silent threshold. Default 60s; long-running batch can override.
§5.6 W3C trace_id (Rev2 §6.3 + §15 L1; adopt NOW)
Adopt W3C traceparent format from day 1, even before OTel integration. Shape:
trace_id char(32) -- 32 hex; W3C traceparent trace-id field
parent_span_id char(16) -- 16 hex; W3C traceparent parent-id field
trace_flags char(2) -- 2 hex; W3C traceparent trace-flags field (e.g. '01' = sampled)
correlation_id uuid -- business-level correlation across distinct trace_ids
traceparent = '00-' || trace_id || '-' || parent_span_id || '-' || trace_flags
MP-D1 (refined wording): trace_id is only the 32 hex segment; parent_span_id is only the 16 hex segment; trace_flags is only the 2 hex segment. The full traceparent string (00-<trace_id>-<parent_span_id>-<trace_flags>) is a derived concatenation; never assign the full traceparent string to the trace_id field.
Every event_outbox row + job_queue row + heartbeat row carries trace_id (and parent_span_id where applicable). Producers without prior trace context generate a new trace_id; workers inherit and append spans. Later OTel collector (Rev2 §15 OpenTelemetry L4) ingests these shapes without migration.
Sentinel: event_outbox.trace_id IS NULL count = 0 after Phase 1 producers upgraded.
§5.7 Worker class enumerations (paper-only, illustrative)
| Worker class | Executor kind | Typical events consumed | Heartbeat threshold |
|---|---|---|---|
worker.dot.iu_axis |
dot | iu.axis_refresh_requested |
30s |
worker.dot.cut_pipeline |
dot | cut.mark_verified (CUT scheduling) |
60s |
worker.sql.aggregate |
sql | aggregate_recompute_requested |
60s |
worker.ai.review_suggest |
ai | task.requires_review_suggestion |
120s |
worker.notification.iu |
notification | iu.notification_event_pending |
60s |
worker.render.mout_view |
render | mout.view_invalidated |
60s |
worker.external.<adapter> |
external | varies | per-adapter |
worker.realtime.gateway |
(gateway process; not a job worker but a heartbeat-emitting daemon) | — | 15s |
All worker classes register in executor_class_registry; new class adoption is design-time + governance review (Điều 0-G).
§6. Layer 5 — DLQ / Recovery / Governance ledger
§6.1 DLQ table (existing)
job_dead_letter [VL row 12] keeps poison messages with full payload + failure context + classification.
§6.2 DLQ replay request ledger (G4 / OD10)
dlq_replay_request
dlq_replay_request_id uuid PK
job_dead_letter_id uuid FK
proposed_by uuid
proposed_at timestamptz
proposal_state text -- 'submitted' | 'approved' | 'rejected' | 'in_progress' | 'completed' | 'failed'
approval_id uuid FK -- Điều 32 approval reference (required to leave 'submitted')
replay_strategy text -- 'as_is' | 'with_payload_patch' | 'requeue_to_class.<X>'
payload_patch_jsonb jsonb nullable
new_idempotency_key text
outcome_jsonb jsonb nullable
trace_id text
Properties:
- Replay never bypasses Điều 32 approval. Sentinel:
dlq_replay_requestrows withproposal_state != 'submitted'MUST have non-nullapproval_id. - Replay uses fresh idempotency key in the same namespace; original key still recorded for audit.
- Replay outcome events:
dlq_replay.scheduled,dlq_replay.completed,dlq_replay.failed.
§6.3 Schema registry compatibility mode (Rev2 §6.5 + OD8)
event_type_registry.compat_mode enum:
forward— consumers tolerate new optional fields (DEFAULT).backward— producers tolerate removed/renamed fields (rare; used for legacy event types being phased out).full— both directions (used for stable contracts during migration).breaking_with_policy— major version bump; producer refuses to emit untilschema_change_policyrow provides migration plan + cutover date + dual-write window.
Validation flow on producer write:
1. Resolve current schema version per (event_type, producer-claimed schema version).
2. Validate payload against that schema.
3. If compat_mode='breaking_with_policy' and policy row missing → REFUSE.
4. Audit version used into event_outbox.event_schema_version.
§6.4 Event lag monitoring (Rev2 §7.4.7)
fn_event_lag_compute(window, group_by) returns:
producer_class consumer_class p50_ms p95_ms p99_ms breaches
workflow_runtime worker.dot.iu_axis 120 480 1900 0
…
Lag = consumer_acked_at - event_outbox.produced_at. Threshold per pair in dot_config event_lag.threshold.<producer>.<consumer>.{p50,p95,p99}. Breach → governance UI red row + event_lag.breach event.
§6.5 Audit timeline (Rev2 §6.5)
vw_audit_event_timeline(p_trace_id) returns ordered list of events + jobs + state transitions correlated by trace_id. Used by governance UI drill-down. Source: event_outbox ∪ job_queue ∪ step_run.transitions ∪ dot_iu_command_run ∪ iu_lifecycle_log. Single query; no Nuxt-side correlation.
§6.6 Governance UI binding (Rev2 §7.4)
Layer 5 surfaces all governance categories. Binding to 02-step-state-machine-and-workflow-ui-design.md §7 governance UI:
| Governance view | Source |
|---|---|
| DLQ count | vw_governance_dlq_count (this design §6.1) |
| Silent workers | queue_heartbeat joined with dot_config heartbeat.threshold.* |
| Schema violations | event_validation_audit (paper; rejected emits go here) |
| Event lag breaches | vw_governance_event_lag |
| Audit timeline | vw_audit_event_timeline(trace_id) |
§7. Layer 4 — Realtime Gateway (governed boundary)
§7.1 Boundary rule (Rev2 §6.4)
Nuxt MUST NOT connect directly to event_outbox, job_queue, PG LISTEN/NOTIFY channels. The backend realtime gateway is the only Nuxt-facing real-time substrate. Nuxt may host an SSE shell in a server route, but that route is a proxy shell calling the backend gateway abstraction — it does not read outbox tail nor listen on PG.
Sentinel: grep Nuxt source for LISTEN, event_outbox, job_queue, pg.connect, NOTIFY → zero matches.
§7.2 Gateway responsibilities
| Concern | Behavior |
|---|---|
| Topic resolution | Maps subscriptions (Layer 2 delivery_kind='realtime_topic') to gateway topics by realtime_gateway_topic_registry (paper) |
| Permission filter | For every subscriber, applies role-based filter BEFORE forwarding (Điều 37 v3.3 + Điều 33 v2.1) |
| Relevance filter | Filters noise: governance summary only (no raw event flood); collapses bursts |
| Backpressure | Per-client buffer + slow-consumer disconnect; client reconnect resumes from last event_seq |
| Delivery | SSE (default) or WebSocket (if Nuxt shell connects via WS-capable transport) |
| Heartbeat | Gateway sends :ping every 15s; client missing 2 → reconnect |
| Trace propagation | Forwards trace_id so client log + backend trace correlate |
§7.3 Realtime topic registry (paper)
realtime_gateway_topic_registry
topic_id text PK
topic_name text
scope text -- 'workflow_run' | 'governance' | 'iu' | 'audit'
source_subscription_id uuid FK -> event_subscription
permission_filter_ref text -- predicate function name
relevance_filter_ref text
summary_kind text -- 'delta' | 'summary' | 'redflag' | 'progress'
active bool
Topics enumerated (illustrative):
topic.workflow_run.<workflow_run_id>— step state changes for one workflow_run; permission filter by workflow visibility.topic.governance.problems— aggregate problem counts; permission filter by role.topic.governance.silent_workers— silent worker alerts.topic.iu.<iu_unit_id>— IU governance changes (governance state, version).topic.audit.<trace_id>— single-trace timeline (drill-down).
§7.4 SSE shell pattern (Nuxt boundary, Rev2 §6.4 sửa lại)
[Browser]
⇅ SSE (text/event-stream)
[Nuxt server route /sse/topic/:topicId]
⇅ HTTP/WS (internal)
[Backend realtime gateway service]
⇅ broker tick / event_subscription dispatch
[PG event_outbox + event_pending + event_subscription]
Nuxt server route is shell only:
- Accepts SSE connect, extracts session/auth.
- Calls backend gateway abstraction (
POST /gateway/subscribe) withtopic_id+ auth → receives stream handle. - Pipes backend stream events to browser SSE channel.
- Honors backend gateway's disconnect / backpressure signals.
- Does NOT subscribe to PG; does NOT read outbox.
Sentinel: Nuxt SSE route source contains zero PG connection imports.
§7.5 OD4 realtime gateway impl path (carried to 06-… §S4)
Default: start with Nuxt SSE shell + backend gateway as Node service (lightweight; matches state-vocab + config-first gates). Preserve Centrifugo adapter slot (Rev2 §15 L4/L5) for >1k concurrent realtime clients. No final pick now.
§8. Schema compatibility mode (Rev2 §14 + OD8)
Already introduced in §6.3. Design default: forward for new event types; breaking_with_policy requires schema_change_policy row. Reviewer can verify by checking event_type_registry has the compat_mode column populated for every event type.
Additional governance:
- Breaking schema policy must include: producer migration window, consumer migration window, dual-write strategy, rollback path. Phase 1 task.
- Deprecation: event type →
deprecated; producer warned; consumer must migrate within window.
§9. Usage Evidence Registry (Rev2 §10 — derivation, not new substrate)
iu_usage_evidence (paper-only) materializes 8 signal classes from existing tables:
| Signal | Source view | Derivation |
|---|---|---|
| co-used-in-workflow | iu.used_in_workflow event joined by workflow_run_id |
Pairs of IU appearing in same workflow_run |
| co-triggered | events joined by correlation_id |
Pairs in same trace |
| co-edited | iu_version joined by change_request_id |
Pairs edited in same change |
| co-retrieved | KG retrieval log (paper) joined by session_id |
Pairs retrieved together in context-pack |
| failure correlation | step.failed events joined by correlation_id window |
Pairs failing in same run |
| repeated escalation | step.escalated events grouped by step_def_id |
Step kinds with high escalation |
| repeated human correction | proposal.kg_edge proposals approved |
Pattern of human overriding KG |
| event lag / DLQ correlation | event_lag.breach + job.dead_lettered joined per (producer, consumer) |
Pairs with chronic latency / DLQ |
Each signal computed by a STABLE function and persisted into iu_usage_evidence periodically (paper-only retention 365 days). Feeds KG feedback proposals (Rev2 §9) and governance UI insights.
Sentinel: KG never reads from PG directly; reads from iu_usage_evidence views.
§10. Reconcile checklist with Bắt sự kiện của PG(3).docx (Rev2 §6.6)
| Lesson | Embedded in this design | Sentinel |
|---|---|---|
| Producers only produce, không execute | §3 producer taxonomy + §5 executor class | "Producer code does not invoke executor; producer writes only to event_outbox" |
| Event bus ≠ job queue | §4.1 substrate split | "Distinct tables; no row shared between bus and queue" |
| Workers consume + execute; queue does not auto-run | §5.2 ACK/NACK | "Cron does not run scripts; cron emits events, worker leases" |
| Nuxt does NOT connect core queue | §7.1 boundary rule | "grep Nuxt source: zero direct PG / outbox / NOTIFY connections" |
| Realtime gateway mandatory | §7 | "All real-time goes through gateway topic registry" |
| DLQ / retry / replay / poison isolation | §6 | "Every job has classification on failure; DLQ replay needs Điều 32" |
| ACK/NACK / timeout / idempotency / replay | §5.2 + §5.4 | "Worker lease enforces ACK or NACK with classification" |
| Schema registry + distributed tracing primitives | §3.1 register-before-emit + §5.6 W3C trace_id | "Producer refused without registry hit; every row carries trace_id" |
| Governance UI: problems / summary / drill-down | §6.6 → 02-… §7 |
"Default route is problem-first; drill-down via trace_id" |
§11. Cross-references and acceptance
Cross-refs:
- State transitions emit events into
event_outboxper §3 — see02-step-state-machine-…§4. - IU brick / bundle binding emits via §3.3 IU event family — see
04-iu-centered-4mothers-binding-design.md. - OSS substrate gate verdicts — see
05-oss-candidate-strategy-rev2.md. - Schemas (executor_class_registry, idempotency_registry, dlq_replay_request, retry_policy_registry, realtime_gateway_topic_registry, iu_usage_evidence, event_validation_audit) all paper-only; gated by
06-open-decisions-and-readiness.md§S20 implementation sequencing.
Acceptance:
A1. Producers / register-before-emit / capture-by-config covered §3.
A1.b. Payload policy (MP-D8): schema validator enforces forbidden-key deny-list + per-event_type allowlist + configurable event_payload.max_size_bytes.* cap; rejected emits land in event_validation_audit.
A2. Event bus vs job queue cleanly separated §4.
A3. Executor class registry shape + MOT-is-not-executor sentinel §5.1.
A4. ACK/NACK + retry + idempotency + heartbeat + W3C trace_id covered §5.2..§5.6.
A5. Realtime gateway boundary + Nuxt SSE shell + permission/relevance filter + topic registry covered §7.
A6. DLQ + replay (Điều 32 approval) + schema compat + event lag + audit timeline covered §6.
A7. Usage evidence schema derived from existing tables, no new substrate §9.
A8. Reconcile checklist for Bắt sự kiện của PG(3).docx 9 lessons all bound §10.
A9. No PG mutation; no DOT command run; no migration; substrate schemas paper-only.
End WS5 design.