KB-2FAE rev 2

IU 4-Mothers Master Design Rev2 — WS5 Event 5-Layer + Realtime Gateway + DLQ (DRAFT 2026-05-27)

34 min read Revision 2
designmaster-design-rev2event-5-layerrealtime-gatewaydlqidempotencyheartbeatw3c-trace-idexecutor-class-registryschema-compatusage-evidencews5iu-centered4-mothersdraftdocument-only2026-05-27

Master Design Rev2 — Event 5-Layer + Realtime Gateway + DLQ (WS5)

Path: knowledge/dev/design/v0.6-iu-4mothers-event-foundation-rev2/03-event-5layer-realtime-dlq-design.md Status: DRAFT Rev2 (document-only). Companion to 00-master-design-rev2.md. Date: 2026-05-27 Authority: Rev2 brief §6 (Event 5-layer + reconcile checklist §6.6), §7.4 (Governance UI), §8 (IU Event Contract), §10 (Usage Evidence), §13 (Constitution matrix), §14 (PG Maximization). Boundary: Điều 45 v1.0 owns queue/event-core/work_state_machine/executor boundary/heartbeat caller. This document only assembles around Điều 45 — never redefines. Nuxt = render shell (Điều 28 / S178). Approval = Điều 32. IU = Điều 38/39.


§1. Scope and boundaries

This document designs:

  • Layer 1 — Event Producers (Rev2 §6.1) — taxonomy, register-before-emit, capture-by-config.
  • Layer 2 — Broker / Event Bus + Job Queue split (Rev2 §6.2).
  • Layer 3 — Consumers / Workers / Executors (Rev2 §6.3) — executor class registry, ACK/NACK, idempotency, heartbeat, W3C trace_id.
  • Layer 4 — Realtime Gateway (Rev2 §6.4) — backend gateway, Nuxt SSE shell-only, permission/relevance filter, governance summary.
  • Layer 5 — DLQ / Recovery / Governance (Rev2 §6.5) — idempotency registry, retry policy registry, dlq_replay_request ledger, schema compat, audit timeline, governance UI binding.
  • IU Event Contract (Rev2 §8) — refs-only envelope, W3C trace shape, semver schema.
  • Usage Evidence Registry (Rev2 §10) — 8-signal derivation feeding KG / governance.

This document does not design:

  • The internal mechanics of fn_iu_cut_*, fn_iu_op_* aliases, MARK/CUT contract — those are Điều 38/39 surface, owned elsewhere.
  • The Điều 45 queue/event-core schema itself — referenced read-only.
  • The state machine transitions — see 02-step-state-machine-and-workflow-ui-design.md.
  • OSS tool selection for broker / NATS / Centrifugo / OTel — see 05-oss-candidate-strategy-rev2.md.

§2. Top-level invariants (carry to every layer)

  1. Producers only produce; workers only execute. Queue does not auto-run scripts (Rev2 §6.2 + §6.3).
  2. Event bus ≠ job queue. Event bus is pub/sub fan-out; job queue is durable lease + retry + DLQ. Tables for each are distinct.
  3. Register-before-emit. Every event type must exist in event_type_registry with a valid JSON schema before any producer is allowed to emit it.
  4. Refs-only payload. Event payloads carry signal/ref, not heavy body (Rev2 §6.2 + §8 + Điều 45 §13.5). Worker fetches body from PG/Directus/DOT.
  5. W3C trace shape NOW. Every event/job carries trace_id + correlation_id + parent_span_id in W3C shape from day 1 (Rev2 §6.3 + §15 L1; memory feedback-oss-tool-adoption-state-vocab-fit-and-config-first-test).
  6. Heartbeat caller pattern. Every worker class MUST have a heartbeat caller; new workers MUST emit heartbeat from day 1 (Rev2 §6.3 + §7.4.6; Điều 45 §15.5; memory feedback-protect-legacy-silent-passive-heartbeat-from-false-heal).
  7. Idempotency mandatory. Every executor call carries an idempotency key; replay is safe by design.
  8. Nuxt boundary. Nuxt never connects to core queue / event_outbox / NOTIFY directly. Realtime through backend gateway only (Rev2 §6.4).
  9. No cross-IU vector pollution. Adapters that mirror events to external systems (Benthos / NATS / OTel) must NOT collapse multiple IUs into a single point or fan body across IU boundaries (Rev2 §13 vector law).
  10. Reversibility. Every new event type, executor class, retry policy must have rollback (deprecate event type, retire executor, freeze retry policy).

§3. Layer 1 — Event Producers

§3.1 Register-before-emit substrate

Existing event_type_registry [VL row 10]. Producer write path:

PRODUCE(event_type, payload_json, trace_id, correlation_id, parent_span_id?)
  1. Lookup event_type in event_type_registry; refuse if absent.
  2. Validate payload_json against registered JSON schema (semver-resolved).
  3. Insert into event_outbox (canonical ledger) in TX with the originating row mutation.
  4. NOTIFY 'event_outbox_tick' (Layer 2 trigger; see §4.1).

Open decision: schema compatibility mode (OD8). Design defaultevent_type_registry.compat_mode ∈ {forward, backward, full, breaking_with_policy}. Default = forward (consumers tolerate new optional fields). Breaking change requires a schema_change_policy row capturing migration plan; producer refuses to emit a new major version until policy row is present.

§3.2 Producer taxonomy (Rev2 §6.1)

Producer class Source Mechanism Examples Live?
pg_dml_trigger PG row mutation trigger function → event_outbox insert task INSERT, workflow_step UPDATE yes (rev1 §1)
iu_axis_lifecycle IU axis refresh / compose / structure ops DOT alias → outbox iu.axis_refreshed, iu.composed, iu.split_proposed yes
dot_command_lifecycle dot_iu_command_run transitions trigger → outbox dot.command_started/completed/failed yes
cut_pipeline cut_request_transition trigger → outbox cut.mark_verified, cut.completed, cut.failed yes
workflow_runtime workflow_run / step_run / task_run state changes step transition validator → outbox step.ready, step.completed, etc. (see 02-… §4.2) partial
proposal_lifecycle workflow_change_requests / generic proposal table trigger → outbox proposal.submitted, proposal.approved, proposal.rejected partial
human_decision UI RPC backend RPC handler → outbox decision.recorded, pic.assigned partial
agent_output AI agent worker worker writes via DOT → outbox agent.suggestion, agent.warning partial
external_api_adapter external webhook / poll approved adapter → outbox external.callback_received future

Capture-by-config principle (Rev2 §6.1): adding an event type requires only an event_type_registry INSERT + JSON schema; no application code redeploy. Producer dispatch table (paper-only) maps event_type → producer class → emit path.

§3.3 IU Event family (Rev2 §8 R8.1)

IU lifecycle emits these registered event types:

event_type Trigger Payload fields (refs only) Compat mode
iu.born New information_unit row + axes resolved iu_unit_id, iu_version_id, governance_state forward
iu.edited New iu_version (minor or major) iu_unit_id, iu_version_id, prior_version_id, change_kind forward
iu.split_proposed KG / human proposes split iu_unit_id, proposal_id forward
iu.split_approved Điều 32 approves split iu_unit_id, child_iu_unit_ids, approval_id forward
iu.merged Merge approved iu_unit_id (kept), merged_from_iu_unit_ids, approval_id forward
iu.deprecated Governance state → deprecated iu_unit_id, iu_version_id, reason_code forward
iu.linked KG edge created iu_unit_id, related_iu_unit_id, edge_type forward
iu.rendered Render layer materializes IU body to a surface iu_unit_id, iu_version_id, render_surface, render_context_id forward
iu.validated Axis validate pass iu_unit_id, iu_version_id, axis_kinds[], verdict forward
iu.used_in_workflow Workflow run binds IU iu_unit_id, iu_version_id, workflow_run_id, step_run_id forward

Envelope (Rev2 §8 R8.2, every IU event):

{
  "event_type": "iu.used_in_workflow",
  "event_schema_version": "1.0.0",
  "iu_unit_id": "…",
  "iu_version_id": "…",
  "workflow_run_id": "…",
  "task_run_id": null,
  "trace_id": "<32 hex>",
  "correlation_id": "<uuid>",
  "parent_span_id": "<16 hex>",
  "trace_flags": "01",
  "traceparent": "00-<trace_id>-<parent_span_id>-<trace_flags>",
  "produced_at": "2026-05-27T…Z",
  "producer_class": "workflow_runtime",
  "idempotency_key": "<producer-scoped>",
  "extra_refs": { … }
}

Payload policy (MP-D8 — refined):

  • Forbidden keys (deny-list). body_text, instruction_text, payload_blob, iu_body MUST NEVER appear in an event payload. Schema validator refuses any payload carrying these keys, regardless of event_type (Rev2 §6.2 + §8 R8.3).
  • Allowlist per event_type schema. Beyond the universal envelope (event_type / event_schema_version / trace_id / parent_span_id / trace_flags / correlation_id / produced_at / producer_class / idempotency_key / extra_refs), every additional key must be declared in the registered event_type_registry.schema_jsonb for that event_type. Unknown keys are rejected at producer write time.
  • Max payload size (configurable). dot_config event_payload.max_size_bytes.<event_type> declares per-event-type max payload bytes; default dot_config event_payload.max_size_bytes.default (proposed 4 KB). Producer write refused if serialized payload exceeds limit; producer must emit refs only and let worker fetch body from PG.
  • Sentinel + acceptance: §11 acceptance row A1.b — schema validator enforces deny-list + allowlist + size cap; rejected emits land in event_validation_audit. No event payload anywhere in the system carries IU body bytes.

§3.4 No cross-IU vector pollution (Rev2 row 21 + vector law)

When an event type is mirrored to a vector store (future), the mapping rule is "1 IU → ≥1 point, never cross-IU merge". Adapter contract enforces:

  • Each event → mirror function fetches IU body by ref from PG; never embeds multiple IUs in one chunk.
  • Mirror sink table (e.g. iu_vector_sync_point [VL]) keys on iu_unit_id + iu_version_id.
  • Gate iu_vector_sync_enabled=false respected; design assumes off until governance flips.

§3.5 Scheduled producers (pg_cron legacy, row 24)

pg_cron keeps its role for scheduled triggers but does not play queue role. Cron job emits an event into event_outbox (registered cron.tick.<job_name>); worker handles, not cron itself.


§4. Layer 2 — Broker / Event Bus + Job Queue (separate)

§4.1 Substrate split

Concern Substrate Pattern
Event bus (pub/sub fan-out) event_outbox + event_subscription + event_pending + event_read [VL rows 9, 11] Producer → outbox; broker tick fans to subscriptions; subscribers see in event_pending / mark event_read
Job queue (durable lease + retry + DLQ) job_queue + job_dead_letter + queue_heartbeat [VL row 12] Worker leases; ACK/NACK; retry per policy; poison → DLQ
Realtime topic stream realtime_gateway_topic_registry (new, paper) + transient process state Gateway maps event_outbox tail → topics (see Layer 4)

These are three substrates. Event bus and job queue never share rows. A producer emits to event_outbox (always); a router (a "subscription" with kind='job_dispatch') materializes selected events into job_queue jobs.

§4.2 Topic / routing key / priority

event_subscription
  subscription_id        uuid PK
  subscriber_class       text   -- 'worker.<class>' | 'realtime_gateway' | 'audit' | 'projection.<view>'
  event_type_pattern     text   -- exact OR glob ('iu.*', 'step.*')
  routing_filter_jsonb   jsonb  -- e.g. {"producer_class":"workflow_runtime"} OR predicate ref
  delivery_kind          text   -- 'fanout' | 'job_dispatch' | 'realtime_topic'
  priority               int    -- 0=low..9=high
  active                 bool

job_dispatch deliveries create job_queue rows with:

job_queue
  job_id                 uuid PK
  job_class              text   -- maps to executor_class_registry
  payload_refs_jsonb     jsonb  -- refs only (event_outbox_id, IU refs, run refs)
  priority               int
  lease_until            timestamptz nullable
  attempt_count          int
  last_attempt_at        timestamptz nullable
  idempotency_key        text
  trace_id / correlation_id / parent_span_id
  status                 text   -- 'queued' | 'leased' | 'completed' | 'failed' | 'dead_letter'

Properties:

  • Queue carries refs only. payload_refs_jsonb has no body text, no IU instructions, no policy text. Sentinel: row byte size cap (e.g. 4 KB) enforced by trigger.
  • Queue does not auto-execute. Lease + worker ACK pattern is the only execution path. Cron / queue does not run scripts directly (Rev2 §6.2 + §6.3 binding).
  • Priority is config-driven. Producer / subscription can set priority via event_subscription.priority; queue dispatch honors it.

§4.3 Fan-out semantics

One event_outbox row can satisfy multiple subscriptions. The broker tick:

  1. Reads new outbox rows (LSN-style cursor per subscription).
  2. For each subscription matching by event_type_pattern + routing_filter_jsonb:
    • If delivery_kind='fanout': insert into event_pending for that subscriber.
    • If delivery_kind='job_dispatch': insert into job_queue with job_class resolved.
    • If delivery_kind='realtime_topic': signal Layer 4 gateway (in-process notification).
  3. Advances per-subscription cursor on success.

Failure isolation: a single subscription failure does not block other subscriptions; failed subscription dispatch logged to subscription_dispatch_audit (paper).

§4.4 Idempotency on dispatch

The broker dispatch is itself idempotent: re-running the broker tick over the same outbox cursor produces no duplicate event_pending / job_queue rows. Mechanism: (subscription_id, event_outbox_id) UNIQUE constraint on dispatch ledger; insert-on-conflict-do-nothing pattern.

§4.5 Substrate adoption gates (Gate A / B from feedback-oss-tool-adoption-state-vocab-fit-and-config-first-test)

PG-native (event_outbox + job_queue) is the default substrate now. External broker adoption (NATS / Redis Streams) gated by:

  • Gate A — state-vocab fit (≥9-state work_state_machine). NATS / Redis transport, do not impose state vocab → PASS as transport, NOT as state authority.
  • Gate B — config-first fit (workflow/event in PG/registry, not in tool's code/yaml). NATS / Redis as transport → PASS; pg-boss / Graphile as substrate owning state → FAIL (would re-implement state machine outside PG).
  • See 05-oss-candidate-strategy-rev2.md for verdicts.

§5. Layer 3 — Consumers / Workers / Executors

§5.1 Executor class registry (Rev2 §6.3 + §11 + G3)

Schema proposal (paper-only; OD3 ownership = Điều XX referent, Điều 45 substrate cross-ref):

executor_class_registry
  executor_class_ref      text  PK     -- 'dot.iu' | 'sql.fn' | 'agent.ai.<name>' | 'human.review' | 'external.<adapter>' | 'notification.<channel>' | 'render.<surface>'
  executor_kind           text         -- 'dot' | 'sql' | 'ai' | 'human' | 'external' | 'notification' | 'render'
  config_jsonb            jsonb        -- e.g. DOT command ref, AI model id, external adapter id
  mutating                bool         -- carries Điều 35 v5.2 mutating flag
  default_retry_policy_id text  FK -> retry_policy_registry
  default_idempotency_ns  text         -- idempotency namespace
  heartbeat_caller_fn     text         -- function/job name responsible for emitting heartbeat for this class
  active                  bool
  created_at              timestamptz

Mapping:

executor_kind Underlying primitive Mutating
dot DOT command via dot_iu_command_catalog [VL row 14] (col mutating) varies per command
sql SQL function in PG varies
ai AI agent process leasing job typically false; if writing → must go via DOT
human Human PIC in UI; task picked by PIC and submitted via RPC yes (audit row)
external External API adapter varies
notification iu_notification_event [VL row 22] / external notify typically false
render Render-layer materialization (e.g. cache rebuild, MOUT view refresh) false

Key invariant (Rev2 §11.5): MOT is NOT an executor. MOT envelopes a task and calls an executor class registered in executor_class_registry. Sentinel: every task_run.executor_class_ref resolves to an executor_class_registry row; MOT runtime code has zero embedded executor logic.

§5.2 ACK/NACK + retry contract

Worker lease/release pattern:

LEASE(worker_id, job_class, lease_seconds)
  → returns job_id, payload_refs_jsonb, trace context

ACK(job_id, worker_id, outcome_jsonb)
  → job_queue.status='completed'; emit `job.completed`

NACK(job_id, worker_id, failure_jsonb, classification)
  classification ∈ 'transient' | 'permanent' | 'poison'
  transient: schedule next attempt per retry policy
  permanent: job → 'failed' + emit step.failed (permanent)
  poison:    job → DLQ (job_dead_letter) + emit `job.dead_lettered`

Worker must NACK with classification — no silent failure.

§5.3 Retry policy registry (Rev2 §14 partial)

retry_policy_registry
  retry_policy_id        text PK
  policy_name            text
  policy_kind            text   -- 'exponential_backoff' | 'fixed_delay' | 'fibonacci' | 'no_retry'
  base_delay_ms          int
  max_attempts           int
  max_delay_ms           int    -- cap for exponential
  jitter_kind            text   -- 'none' | 'full' | 'equal' | 'decorrelated'
  transient_class_filter text[] -- which failure_class codes retry under this policy
  config_jsonb           jsonb
  active                 bool

Defaults (proposal):

  • default.transient.5x.30s.exp — exponential, base 30s, max 5 attempts, max delay 30 min, full jitter.
  • default.notification.3x.10s.fixed — fixed 10s, 3 attempts.
  • default.render.10x.5s.exp — render is idempotent and cheap; 10 attempts.
  • default.human.no_retry — human steps don't auto-retry; escalation handles.

Workers resolve policy at lease time; policy can be overridden per job_class via executor_class_registry.default_retry_policy_id.

§5.4 Idempotency registry (Rev2 §14 GAP G5; OD11)

idempotency_registry
  idempotency_namespace   text   -- per-executor namespace, e.g. 'dot.iu_post_cut_axis_materialize'
  idempotency_key         text   -- producer-scoped key
  first_seen_at           timestamptz
  last_outcome_jsonb      jsonb
  observation_count       int
  PRIMARY KEY (idempotency_namespace, idempotency_key)

Properties:

  • Executor invocation always carries (namespace, key). Re-invocation returns prior outcome instead of re-running mutation.
  • Namespace per executor class avoids key collision across classes.
  • Retention: 90 days default; per-namespace override allowed.

OD11 default kept: per-executor namespace + key. Schema above is design proposal.

§5.5 Heartbeat caller pattern (Rev2 §6.3 + §7.4.6 + memory)

Binding: every worker class has a heartbeat caller mapped to queue_heartbeat. Pattern:

HEARTBEAT(worker_class, worker_instance_id, tick_payload?)
  INSERT into queue_heartbeat ON CONFLICT (worker_class, worker_instance_id)
    DO UPDATE SET last_tick_at = now(), status = 'ok', last_payload = excluded.last_payload;

Properties:

  • From day 1. New worker classes MUST emit heartbeat in the same TX as their lease/release, not "later".
  • SECURITY DEFINER wrapper (memory pattern from mig 053). Worker process calls a defined wrapper function; raw INSERT denied to non-privileged workers.
  • False-heal protection. Auto-resume of silent worker blocked unless (a) heartbeat caller successfully ticks AND (b) worker class flag is not frozen. Detection of false-heal: a silent row that flips to ok without an intervening tick → integrity warning (memory feedback-protect-legacy-silent-passive-heartbeat-from-false-heal).
  • Threshold per class. dot_config heartbeat.threshold.<worker_class> defines silent threshold. Default 60s; long-running batch can override.

§5.6 W3C trace_id (Rev2 §6.3 + §15 L1; adopt NOW)

Adopt W3C traceparent format from day 1, even before OTel integration. Shape:

trace_id          char(32)   -- 32 hex; W3C traceparent trace-id field
parent_span_id    char(16)   -- 16 hex; W3C traceparent parent-id field
trace_flags       char(2)    -- 2 hex; W3C traceparent trace-flags field (e.g. '01' = sampled)
correlation_id    uuid       -- business-level correlation across distinct trace_ids
traceparent       = '00-' || trace_id || '-' || parent_span_id || '-' || trace_flags

MP-D1 (refined wording): trace_id is only the 32 hex segment; parent_span_id is only the 16 hex segment; trace_flags is only the 2 hex segment. The full traceparent string (00-<trace_id>-<parent_span_id>-<trace_flags>) is a derived concatenation; never assign the full traceparent string to the trace_id field.

Every event_outbox row + job_queue row + heartbeat row carries trace_id (and parent_span_id where applicable). Producers without prior trace context generate a new trace_id; workers inherit and append spans. Later OTel collector (Rev2 §15 OpenTelemetry L4) ingests these shapes without migration.

Sentinel: event_outbox.trace_id IS NULL count = 0 after Phase 1 producers upgraded.

§5.7 Worker class enumerations (paper-only, illustrative)

Worker class Executor kind Typical events consumed Heartbeat threshold
worker.dot.iu_axis dot iu.axis_refresh_requested 30s
worker.dot.cut_pipeline dot cut.mark_verified (CUT scheduling) 60s
worker.sql.aggregate sql aggregate_recompute_requested 60s
worker.ai.review_suggest ai task.requires_review_suggestion 120s
worker.notification.iu notification iu.notification_event_pending 60s
worker.render.mout_view render mout.view_invalidated 60s
worker.external.<adapter> external varies per-adapter
worker.realtime.gateway (gateway process; not a job worker but a heartbeat-emitting daemon) 15s

All worker classes register in executor_class_registry; new class adoption is design-time + governance review (Điều 0-G).


§6. Layer 5 — DLQ / Recovery / Governance ledger

§6.1 DLQ table (existing)

job_dead_letter [VL row 12] keeps poison messages with full payload + failure context + classification.

§6.2 DLQ replay request ledger (G4 / OD10)

dlq_replay_request
  dlq_replay_request_id   uuid PK
  job_dead_letter_id      uuid FK
  proposed_by             uuid
  proposed_at             timestamptz
  proposal_state          text   -- 'submitted' | 'approved' | 'rejected' | 'in_progress' | 'completed' | 'failed'
  approval_id             uuid FK  -- Điều 32 approval reference (required to leave 'submitted')
  replay_strategy         text   -- 'as_is' | 'with_payload_patch' | 'requeue_to_class.<X>'
  payload_patch_jsonb     jsonb nullable
  new_idempotency_key     text
  outcome_jsonb           jsonb nullable
  trace_id                text

Properties:

  • Replay never bypasses Điều 32 approval. Sentinel: dlq_replay_request rows with proposal_state != 'submitted' MUST have non-null approval_id.
  • Replay uses fresh idempotency key in the same namespace; original key still recorded for audit.
  • Replay outcome events: dlq_replay.scheduled, dlq_replay.completed, dlq_replay.failed.

§6.3 Schema registry compatibility mode (Rev2 §6.5 + OD8)

event_type_registry.compat_mode enum:

  • forward — consumers tolerate new optional fields (DEFAULT).
  • backward — producers tolerate removed/renamed fields (rare; used for legacy event types being phased out).
  • full — both directions (used for stable contracts during migration).
  • breaking_with_policy — major version bump; producer refuses to emit until schema_change_policy row provides migration plan + cutover date + dual-write window.

Validation flow on producer write:

1. Resolve current schema version per (event_type, producer-claimed schema version).
2. Validate payload against that schema.
3. If compat_mode='breaking_with_policy' and policy row missing → REFUSE.
4. Audit version used into event_outbox.event_schema_version.

§6.4 Event lag monitoring (Rev2 §7.4.7)

fn_event_lag_compute(window, group_by) returns:

producer_class          consumer_class          p50_ms          p95_ms          p99_ms          breaches
workflow_runtime        worker.dot.iu_axis      120             480             1900            0
…

Lag = consumer_acked_at - event_outbox.produced_at. Threshold per pair in dot_config event_lag.threshold.<producer>.<consumer>.{p50,p95,p99}. Breach → governance UI red row + event_lag.breach event.

§6.5 Audit timeline (Rev2 §6.5)

vw_audit_event_timeline(p_trace_id) returns ordered list of events + jobs + state transitions correlated by trace_id. Used by governance UI drill-down. Source: event_outboxjob_queuestep_run.transitionsdot_iu_command_runiu_lifecycle_log. Single query; no Nuxt-side correlation.

§6.6 Governance UI binding (Rev2 §7.4)

Layer 5 surfaces all governance categories. Binding to 02-step-state-machine-and-workflow-ui-design.md §7 governance UI:

Governance view Source
DLQ count vw_governance_dlq_count (this design §6.1)
Silent workers queue_heartbeat joined with dot_config heartbeat.threshold.*
Schema violations event_validation_audit (paper; rejected emits go here)
Event lag breaches vw_governance_event_lag
Audit timeline vw_audit_event_timeline(trace_id)

§7. Layer 4 — Realtime Gateway (governed boundary)

§7.1 Boundary rule (Rev2 §6.4)

Nuxt MUST NOT connect directly to event_outbox, job_queue, PG LISTEN/NOTIFY channels. The backend realtime gateway is the only Nuxt-facing real-time substrate. Nuxt may host an SSE shell in a server route, but that route is a proxy shell calling the backend gateway abstraction — it does not read outbox tail nor listen on PG.

Sentinel: grep Nuxt source for LISTEN, event_outbox, job_queue, pg.connect, NOTIFY → zero matches.

§7.2 Gateway responsibilities

Concern Behavior
Topic resolution Maps subscriptions (Layer 2 delivery_kind='realtime_topic') to gateway topics by realtime_gateway_topic_registry (paper)
Permission filter For every subscriber, applies role-based filter BEFORE forwarding (Điều 37 v3.3 + Điều 33 v2.1)
Relevance filter Filters noise: governance summary only (no raw event flood); collapses bursts
Backpressure Per-client buffer + slow-consumer disconnect; client reconnect resumes from last event_seq
Delivery SSE (default) or WebSocket (if Nuxt shell connects via WS-capable transport)
Heartbeat Gateway sends :ping every 15s; client missing 2 → reconnect
Trace propagation Forwards trace_id so client log + backend trace correlate

§7.3 Realtime topic registry (paper)

realtime_gateway_topic_registry
  topic_id                text PK
  topic_name              text
  scope                   text   -- 'workflow_run' | 'governance' | 'iu' | 'audit'
  source_subscription_id  uuid FK -> event_subscription
  permission_filter_ref   text   -- predicate function name
  relevance_filter_ref    text
  summary_kind            text   -- 'delta' | 'summary' | 'redflag' | 'progress'
  active                  bool

Topics enumerated (illustrative):

  • topic.workflow_run.<workflow_run_id> — step state changes for one workflow_run; permission filter by workflow visibility.
  • topic.governance.problems — aggregate problem counts; permission filter by role.
  • topic.governance.silent_workers — silent worker alerts.
  • topic.iu.<iu_unit_id> — IU governance changes (governance state, version).
  • topic.audit.<trace_id> — single-trace timeline (drill-down).

§7.4 SSE shell pattern (Nuxt boundary, Rev2 §6.4 sửa lại)

[Browser]
   ⇅ SSE (text/event-stream)
[Nuxt server route /sse/topic/:topicId]
   ⇅ HTTP/WS (internal)
[Backend realtime gateway service]
   ⇅ broker tick / event_subscription dispatch
[PG event_outbox + event_pending + event_subscription]

Nuxt server route is shell only:

  • Accepts SSE connect, extracts session/auth.
  • Calls backend gateway abstraction (POST /gateway/subscribe) with topic_id + auth → receives stream handle.
  • Pipes backend stream events to browser SSE channel.
  • Honors backend gateway's disconnect / backpressure signals.
  • Does NOT subscribe to PG; does NOT read outbox.

Sentinel: Nuxt SSE route source contains zero PG connection imports.

§7.5 OD4 realtime gateway impl path (carried to 06-… §S4)

Default: start with Nuxt SSE shell + backend gateway as Node service (lightweight; matches state-vocab + config-first gates). Preserve Centrifugo adapter slot (Rev2 §15 L4/L5) for >1k concurrent realtime clients. No final pick now.


§8. Schema compatibility mode (Rev2 §14 + OD8)

Already introduced in §6.3. Design default: forward for new event types; breaking_with_policy requires schema_change_policy row. Reviewer can verify by checking event_type_registry has the compat_mode column populated for every event type.

Additional governance:

  • Breaking schema policy must include: producer migration window, consumer migration window, dual-write strategy, rollback path. Phase 1 task.
  • Deprecation: event type → deprecated; producer warned; consumer must migrate within window.

§9. Usage Evidence Registry (Rev2 §10 — derivation, not new substrate)

iu_usage_evidence (paper-only) materializes 8 signal classes from existing tables:

Signal Source view Derivation
co-used-in-workflow iu.used_in_workflow event joined by workflow_run_id Pairs of IU appearing in same workflow_run
co-triggered events joined by correlation_id Pairs in same trace
co-edited iu_version joined by change_request_id Pairs edited in same change
co-retrieved KG retrieval log (paper) joined by session_id Pairs retrieved together in context-pack
failure correlation step.failed events joined by correlation_id window Pairs failing in same run
repeated escalation step.escalated events grouped by step_def_id Step kinds with high escalation
repeated human correction proposal.kg_edge proposals approved Pattern of human overriding KG
event lag / DLQ correlation event_lag.breach + job.dead_lettered joined per (producer, consumer) Pairs with chronic latency / DLQ

Each signal computed by a STABLE function and persisted into iu_usage_evidence periodically (paper-only retention 365 days). Feeds KG feedback proposals (Rev2 §9) and governance UI insights.

Sentinel: KG never reads from PG directly; reads from iu_usage_evidence views.


§10. Reconcile checklist with Bắt sự kiện của PG(3).docx (Rev2 §6.6)

Lesson Embedded in this design Sentinel
Producers only produce, không execute §3 producer taxonomy + §5 executor class "Producer code does not invoke executor; producer writes only to event_outbox"
Event bus ≠ job queue §4.1 substrate split "Distinct tables; no row shared between bus and queue"
Workers consume + execute; queue does not auto-run §5.2 ACK/NACK "Cron does not run scripts; cron emits events, worker leases"
Nuxt does NOT connect core queue §7.1 boundary rule "grep Nuxt source: zero direct PG / outbox / NOTIFY connections"
Realtime gateway mandatory §7 "All real-time goes through gateway topic registry"
DLQ / retry / replay / poison isolation §6 "Every job has classification on failure; DLQ replay needs Điều 32"
ACK/NACK / timeout / idempotency / replay §5.2 + §5.4 "Worker lease enforces ACK or NACK with classification"
Schema registry + distributed tracing primitives §3.1 register-before-emit + §5.6 W3C trace_id "Producer refused without registry hit; every row carries trace_id"
Governance UI: problems / summary / drill-down §6.6 → 02-… §7 "Default route is problem-first; drill-down via trace_id"

§11. Cross-references and acceptance

Cross-refs:

  • State transitions emit events into event_outbox per §3 — see 02-step-state-machine-… §4.
  • IU brick / bundle binding emits via §3.3 IU event family — see 04-iu-centered-4mothers-binding-design.md.
  • OSS substrate gate verdicts — see 05-oss-candidate-strategy-rev2.md.
  • Schemas (executor_class_registry, idempotency_registry, dlq_replay_request, retry_policy_registry, realtime_gateway_topic_registry, iu_usage_evidence, event_validation_audit) all paper-only; gated by 06-open-decisions-and-readiness.md §S20 implementation sequencing.

Acceptance:

A1. Producers / register-before-emit / capture-by-config covered §3. A1.b. Payload policy (MP-D8): schema validator enforces forbidden-key deny-list + per-event_type allowlist + configurable event_payload.max_size_bytes.* cap; rejected emits land in event_validation_audit. A2. Event bus vs job queue cleanly separated §4. A3. Executor class registry shape + MOT-is-not-executor sentinel §5.1. A4. ACK/NACK + retry + idempotency + heartbeat + W3C trace_id covered §5.2..§5.6. A5. Realtime gateway boundary + Nuxt SSE shell + permission/relevance filter + topic registry covered §7. A6. DLQ + replay (Điều 32 approval) + schema compat + event lag + audit timeline covered §6. A7. Usage evidence schema derived from existing tables, no new substrate §9. A8. Reconcile checklist for Bắt sự kiện của PG(3).docx 9 lessons all bound §10. A9. No PG mutation; no DOT command run; no migration; substrate schemas paper-only.

End WS5 design.

Back to Knowledge Hub knowledge/dev/design/v0.6-iu-4mothers-event-foundation-rev2/03-event-5layer-realtime-dlq-design.md