KB-28B8 rev 2

IU 4-Mothers Master Design Rev2 — WS4 Step/Task State Machine + Workflow UI (DRAFT 2026-05-27)


Master Design Rev2 — Step/Task State Machine + Workflow UI (WS4)

Path: knowledge/dev/design/v0.6-iu-4mothers-event-foundation-rev2/02-step-state-machine-and-workflow-ui-design.md

Status: DRAFT Rev2 (document-only). Companion to 00-master-design-rev2.md.

Date: 2026-05-27

Authority: Rev2 brief §7 (UI), §7.2.6 (roll-up), §7.3 (9-state model), §7.3.6 (waiting facets), §7.3.7 (a11y), §7.4 (governance UI).

Boundary: state machine and transitions live in PG registry/config, not in Nuxt or in worker code. Nuxt is render shell (Điều 28 / S178). Transition validation is backend-side. Approval logic belongs to Điều 32 and is not owned here. The queue/event/heartbeat substrate belongs to Điều 45 and is not redefined here.


§1. Scope and boundaries

This document designs:

  • The 9-state floor for step/task instances (Rev2 §7.3) + waiting facets (MP3) + a11y tokens (MP4).
  • The transition matrix (Rev2 §7.3 deferred here) — actor/trigger/guard/event/audit per transition.
  • The workflow status roll-up rules (Rev2 §7.2.6 MP5).
  • The decision on derived states above 9 (Rev2 OD12).
  • The PG/config substrate (state_machine_registry) and validation function.
  • The workflow UI: Standard Process View, Runtime Progress View, long-workflow patterns (100–500+ steps), Governance UI.
  • The proposal mode shape (writes to workflow_change_requests, never workflow_registry).

This document does not design:

  • The IU brick / IU bundle internal schema — see 04-iu-centered-4mothers-binding-design.md.
  • Producer/consumer/queue/realtime substrate — see 03-event-5layer-realtime-dlq-design.md.
  • OSS tool selection or substrate adapter pick — see 05-oss-candidate-strategy-rev2.md.
  • Approval quorum — Điều 32 surface.

§2. Substrate: state_machine_registry (PG)

OD9 default proposal: the state machines for step / task / workflow_run live in PG, are declared by config, and validate transitions through a PG function.

§2.1 Tables (paper-only schema)

state_machine_def
  state_machine_id        text  PK
  scope                   text  -- 'step' | 'task' | 'workflow_run' | 'cut_request' | ...
  version                 text  semver
  active                  bool
  description             text
  created_at              timestamptz

state_def
  state_machine_id        text  FK
  state_code              text
  semantic_class          text  -- 'idle' | 'active' | 'wait' | 'red' | 'terminal'
  is_terminal             bool
  ordinal                 int
  ui_color_token          text  -- 'gray' | 'green_pale' | 'green' | ... (see §3.2)
  ui_icon_token           text  -- '○' | '▶' | '►' | ...
  ui_label_i18n_key       text  -- 'state.not_started' (Nuxt resolves)
  ui_short_message_key    text
  PRIMARY KEY (state_machine_id, state_code)

transition_def
  state_machine_id        text  FK
  from_state              text  FK -> state_def
  to_state                text  FK -> state_def
  actor_class             text  -- 'system' | 'pic_human' | 'reviewer' | 'executor_worker' | 'mow_orchestrator' | 'escalation_handler'
  trigger_kind            text  -- 'event' | 'rpc' | 'timer' | 'condition_resolved'
  trigger_ref             text  -- event_type / rpc name / timer key
  guard_kind              text  -- 'config' | 'predicate_fn'
  guard_ref               text  -- predicate function name or config rule id
  emitted_event_type      text  FK -> event_type_registry (nullable)
  audit_required          bool
  rollback_to             text  nullable -- explicit rollback edge
  proposed                bool  default false  -- proposed transitions stay false until governance approves
  PRIMARY KEY (state_machine_id, from_state, to_state, actor_class)

state_facet_def           -- §3.3 waiting sublabels and similar
  state_machine_id        text
  state_code              text
  facet_code              text  -- 'waiting_dependency' | 'waiting_human' | 'waiting_external' | 'waiting_time_gate' | ...
  ui_icon_token           text
  ui_label_i18n_key       text
  primary_ordinal         int   -- for primary picking when multiple
  PRIMARY KEY (state_machine_id, state_code, facet_code)

rollup_policy_def         -- §5 workflow roll-up rules
  state_machine_id        text  FK (workflow_run scope)
  policy_code             text  -- 'red_overrides_yellow_overrides_green' | 'mandatory_only' | ...
  policy_config_jsonb     jsonb
  active                  bool

§2.2 Runtime tables (paper-only)

step_run
  step_run_id             uuid PK
  workflow_run_id         uuid FK
  step_def_id             uuid FK
  state_machine_id        text FK
  state_code              text FK -> state_def
  waiting_facet           text nullable FK -> state_facet_def (only when state_code='waiting')
  pinned_iu_version_id    uuid FK    -- per §4 OD15 default 'pin'
  pic_id                  uuid nullable
  executor_class_ref      text nullable
  started_at              timestamptz nullable
  completed_at            timestamptz nullable
  last_transition_at      timestamptz
  trace_id                text     -- W3C shape
  correlation_id          text

task_run
  task_run_id             uuid PK
  step_run_id             uuid FK
  state_machine_id        text FK
  state_code              text FK
  waiting_facet           text nullable
  ...same envelope fields...

workflow_run
  workflow_run_id         uuid PK
  workflow_def_id         uuid FK
  state_machine_id        text FK (typically scope='workflow_run')
  state_code              text FK -> state_def   -- lifecycle code (§4.4), separate from the roll-up color
  rollup_state_code       text   -- derived per rollup_policy_def
  rollup_basis_jsonb      jsonb  -- which step_runs contributed
  last_rollup_at          timestamptz

Reuse note: tasks / task_checkpoints / task_comments [VL row 19] become the auditable persistence underneath task_run; workflows / workflow_steps / workflow_step_relations [VL row 16] underneath workflow_run. The new *_run schema is the runtime envelope; columns above are design proposals, not migrations.

§2.3 Validation function (paper-only)

fn_state_transition_validate(
    p_run_id        uuid,
    p_run_kind      text,            -- 'step' | 'task' | 'workflow_run'
    p_from_state    text,
    p_to_state      text,
    p_actor_class   text,
    p_trigger_kind  text,
    p_trigger_ref   text,
    p_evidence      jsonb,           -- pre/post checks, audit refs, idempotency key
    p_preview       bool DEFAULT false  -- dry-run: validate only, no mutation (see Properties)
) RETURNS jsonb (
    ok bool,
    transition_id text,
    emitted_event_type text nullable,
    problems text[]
)

Properties:

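  • Preview-safe: when called with p_preview:=true the function performs no mutation (dry-run). Only this preview path behaves as STABLE; the committing path writes to event_outbox and *_run, and a PG function that writes cannot be declared STABLE.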
  • Refuses if no transition_def row matches (state_machine_id, from, to, actor_class).
  • Refuses if guard_kind='predicate_fn' and predicate returns false; emits transition_refused_by_guard problem with named guard.
  • Refuses if audit_required=true and p_evidence.audit_ref is null.
  • On success, emits emitted_event_type to event_outbox and updates *_run row in single TX.
  • Idempotent: same (transition_id, idempotency_key) returns prior verdict.

Nuxt cannot call this directly; calls go through MOW orchestrator / MOT envelope / executor worker via backend route (Điều 28 / S178).
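
The refusal rules listed under Properties can be sketched outside PG as a pure dry-run check. This is a minimal illustration in Python, not the paper function itself: the TransitionDef class, the in-memory list of rows, and the problem codes no_matching_transition_def and audit_ref_missing are assumptions made for the example; only transition_refused_by_guard and the field names come from the design above.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class TransitionDef:
    from_state: str
    to_state: str
    actor_class: str
    guard: Optional[Callable[[dict], bool]]  # None stands for "no predicate_fn guard"
    guard_ref: str
    emitted_event_type: Optional[str]
    audit_required: bool


def validate_transition(defs, from_state, to_state, actor_class, evidence):
    """Dry-run verdict only: looks up the row, checks guard and audit, writes nothing."""
    match = next(
        (d for d in defs
         if (d.from_state, d.to_state, d.actor_class) == (from_state, to_state, actor_class)),
        None,
    )
    if match is None:
        # Hypothetical problem code; the design only says the call is refused.
        return {"ok": False, "emitted_event_type": None,
                "problems": ["no_matching_transition_def"]}
    problems = []
    if match.guard is not None and not match.guard(evidence):
        problems.append(f"transition_refused_by_guard:{match.guard_ref}")
    if match.audit_required and evidence.get("audit_ref") is None:
        problems.append("audit_ref_missing")  # hypothetical problem code
    ok = not problems
    return {"ok": ok,
            "emitted_event_type": match.emitted_event_type if ok else None,
            "problems": problems}


# T2 (ready -> in_progress) for the pic_human actor, with a stand-in guard.
T2 = TransitionDef("ready", "in_progress", "pic_human",
                   lambda ev: ev.get("pic_id") is not None,
                   "fn_step_pic_resolves", "step.started", True)
```

Calling validate_transition([T2], "ready", "in_progress", "pic_human", {"pic_id": "u1", "audit_ref": "a1"}) yields an ok verdict carrying step.started; dropping either evidence field yields a refusal that names the failed check.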


§3. The 9-state floor (Rev2 §7.3 binding)

§3.1 State definitions

| # | state_code | Semantics | Traffic-light | Icon | Text (vi) | Text (en) | Semantic class | Terminal? |
|---|---|---|---|---|---|---|---|---|
| 1 | not_started | Precondition not yet met | Gray | | Chưa tới lượt | Not started | idle | no |
| 2 | ready | Precondition met, awaiting PIC/executor | Green (pale) | | Sẵn sàng | Ready | active | no |
| 3 | in_progress | Currently being executed | Green | | Đang làm | In progress | active | no |
| 4 | waiting | Waiting on external dependency | Yellow | | Đang chờ | Waiting | wait | no |
| 5 | blocked | Internal blocker (missing IU / config / approval) | Yellow (dark); escalates to Red on threshold (see MP-D4) | | Bị chặn | Blocked | wait | no |
| 6 | overdue | Past deadline, still salvageable | Red (pale) | | Trễ hạn | Overdue | red | no |
| 7 | failed | Executed and failed (transient or permanent) | Red | | Thất bại | Failed | red | no |
| 8 | cannot_complete | Worker / PIC declares unable to complete in step scope | Red (dark) | | Không thể hoàn tất | Cannot complete | red | no |
| 9 | completed | Output contract met, postcondition events ready | Green (deep) | | Hoàn thành | Completed | active | yes |

This is the floor — every step/task state machine must include these 9 codes verbatim, with these semantic classes and traffic-light tokens.

MP-D4 — blocked severity escalation (UI-only; does not change core state): Default traffic-light token for blocked is ui.traffic_light.yellow_dark. The UI token escalates to ui.traffic_light.red (rendered presentation only — state_code stays blocked) when ANY of the following holds:

  • time spent blocked (now() - blocked_since) exceeds dot_config block_severity.threshold_seconds.<workflow_class> (default 4 h).
  • step is on the workflow's critical path (per fn_workflow_rollup_compute.basis critical-path flag).
  • step's deadline already breached (i.e. now() > deadline while blocked).
  • operator manual escalation via rpc step_block_escalate (requires Điều 32 governance review if blocked > N hours; threshold per class).

Escalation is purely presentational + roll-up-relevant (§5: roll-up treats blocked as wait-class by default, but red-escalated blocked rolls up as red per rollup_policy_def.blocked_escalation_treats_as_red=true). The core state machine MUST NOT branch state codes on severity; UI / roll-up consult a derived blocked_severity ∈ {yellow_dark, red} field computed by fn_step_blocked_severity(step_run_id) → text.

Sentinel: any UI rendering blocked as red MUST source its color from fn_step_blocked_severity, never from a separate state_code value.
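
The MP-D4 rule is an any-of check over four conditions, which a short sketch makes concrete. The Python below is an illustration under stated assumptions: the function name mirrors fn_step_blocked_severity but takes its inputs as plain arguments (the paper function takes only step_run_id), and the argument names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

DEFAULT_THRESHOLD = timedelta(hours=4)  # "default 4 h" from the rule list


def blocked_severity(blocked_since: datetime, now: datetime, *,
                     threshold: timedelta = DEFAULT_THRESHOLD,
                     on_critical_path: bool = False,
                     deadline: Optional[datetime] = None,
                     manually_escalated: bool = False) -> str:
    """Return 'red' if ANY escalation condition holds, else 'yellow_dark'.

    The state_code itself is never touched; only the presentation token changes.
    """
    escalated = (
        now - blocked_since > threshold
        or on_critical_path
        or (deadline is not None and now > deadline)
        or manually_escalated
    )
    return "red" if escalated else "yellow_dark"


NOW = datetime(2026, 5, 27, 12, 0, tzinfo=timezone.utc)
print(blocked_severity(NOW - timedelta(hours=1), NOW))  # yellow_dark
print(blocked_severity(NOW - timedelta(hours=5), NOW))  # red
```

Keeping severity as a derived value, rather than a tenth state code, is what lets the core state machine stay unaware of escalation.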

§3.2 Traffic-light tokens (Rev2 §7.3.1 + §7.3.7)

Design system registry (lives in PG dot_config namespace ui.traffic_light.* or Directus content registry — Điều 33 v2.1 boundary respected):

ui.traffic_light.gray
  fg #4B5563    bg #F3F4F6    contrast_ratio_fg_on_bg 9.3
  high_contrast.fg #FFFFFF  high_contrast.bg #1F2937  contrast_ratio 16.8
  shape_token "circle_hollow"

ui.traffic_light.green_pale
  fg #047857   bg #ECFDF5    contrast_ratio 8.4
  high_contrast equivalents …
  shape_token "triangle_outline"

ui.traffic_light.green
  fg #FFFFFF   bg #059669    contrast_ratio 4.6
  shape_token "triangle_filled"

ui.traffic_light.green_deep
  fg #FFFFFF   bg #047857    contrast_ratio 5.4
  shape_token "check_filled"

ui.traffic_light.yellow
  fg #92400E   bg #FEF3C7    contrast_ratio 6.5
  shape_token "hourglass"

ui.traffic_light.yellow_dark
  fg #FFFFFF   bg #B45309    contrast_ratio 5.1
  shape_token "warn_filled"

ui.traffic_light.red_pale
  fg #B91C1C   bg #FEE2E2    contrast_ratio 5.7
  shape_token "clock_alert"

ui.traffic_light.red
  fg #FFFFFF   bg #DC2626    contrast_ratio 4.8
  shape_token "cross_filled"

ui.traffic_light.red_dark
  fg #FFFFFF   bg #991B1B    contrast_ratio 7.4
  shape_token "stop_filled"

All values are paper proposals; final WCAG audit happens in Phase 1 UI. Sentinel: any new token MUST declare fg/bg/contrast_ratio + high_contrast.fg/bg + shape_token. Nuxt resolves tokens only — no hardcoded colors.
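
Because the contrast_ratio values above are paper proposals, it may help to recompute them before the Phase 1 audit. The sketch below follows the WCAG 2.1 definitions of relative luminance and contrast ratio; the function names are hypothetical and the AA thresholds (4.5:1 for text, 3:1 for icons and large text) are the ones quoted in §3.5.

```python
def _linear(channel: int) -> float:
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.1 formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """(L_lighter + 0.05) / (L_darker + 0.05); symmetric in fg and bg."""
    hi, lo = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (hi + 0.05) / (lo + 0.05)


def meets_aa(fg: str, bg: str, large_or_icon: bool = False) -> bool:
    return contrast_ratio(fg, bg) >= (3.0 if large_or_icon else 4.5)


# Example: recheck one token pair from the registry above.
print(round(contrast_ratio("#4B5563", "#F3F4F6"), 2))
```

Running every fg/bg pair through meets_aa would be one way to turn the "any new token MUST declare contrast_ratio" sentinel into a mechanical check.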

§3.3 Waiting facets (Rev2 §7.3.6 MP3)

| facet_code | Meaning | Primary ordinal |
|---|---|---|
| waiting_dependency | Awaiting upstream step/task output in same workflow | 10 |
| waiting_human | Awaiting another user's input/decision (not the current PIC) | 20 |
| waiting_external | Awaiting external API / callback / webhook / partner | 30 |
| waiting_time_gate | Awaiting cron / scheduled time / business calendar / SLA window | 40 |

Properties (MP3):

  • Facet is a UI display facet, not a new core state. step_run.state_code = 'waiting' always; step_run.waiting_facet ∈ 4-vocab.
  • Mixed wait: the primary facet is the one with the lowest primary_ordinal (dependency, then human, then external, then time_gate); secondary facets surface as chips in the UI.
  • All 4 facets share the Yellow traffic-light token (ui.traffic_light.yellow), distinguished by icon + text.
  • Schema lives in state_facet_def (PG); step_run.waiting_facet lives in step_run (PG). Nuxt zero logic.

Sentinel: any step_run.state_code='waiting' with step_run.waiting_facet IS NULL is an integrity violation.
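
The mixed-wait rule and the sentinel can be illustrated together. This Python sketch assumes the four facets and ordinals from the table above are available as a plain mapping; the function name and the error messages are hypothetical.

```python
PRIMARY_ORDINAL = {
    "waiting_dependency": 10,
    "waiting_human": 20,
    "waiting_external": 30,
    "waiting_time_gate": 40,
}


def pick_primary_facet(active_facets):
    """Return (primary_facet, secondary_chips), primary = lowest primary_ordinal."""
    facets = set(active_facets)
    unknown = facets - PRIMARY_ORDINAL.keys()
    if unknown:
        raise ValueError(f"facet outside the 4-vocab: {sorted(unknown)}")
    if not facets:
        # Mirrors the sentinel: a waiting step_run must carry a facet.
        raise ValueError("integrity violation: waiting without waiting_facet")
    ordered = sorted(facets, key=PRIMARY_ORDINAL.__getitem__)
    return ordered[0], ordered[1:]


print(pick_primary_facet(["waiting_external", "waiting_dependency"]))
# ('waiting_dependency', ['waiting_external'])
```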

§3.4 State registry shape (Rev2 §7.3.4)

Each state declared as:

state_code                'in_progress'
semantic_class            'active'
is_terminal               false
ordinal                   3
ui_color_token            'ui.traffic_light.green'
ui_icon_token             'icon.play_filled'
ui_label_i18n_key         'state.in_progress.label'
ui_short_message_key      'state.in_progress.short'

i18n is Nuxt-side resolution via Directus content registry. No business logic in Nuxt; only key→string lookup.

§3.5 Accessibility design tokens (Rev2 §7.3.7 MP4)

Binding rules:

  1. Not color-alone. Every state cell renders color + icon + text label triplet. Nuxt component layer must compose all three; missing icon or text is a render-layer bug.
  2. Tooltip / short reason. Hover/long-press renders ui_short_message_key (e.g. "Đang chờ duyệt từ KT", i.e. "Awaiting approval from KT", for waiting_human).
  3. WCAG 2.1 AA contrast. Text ≥4.5:1; icon/large text ≥3:1. Token table §3.2 satisfies; any new token must satisfy.
  4. Color-blind safe palette. Red+Green pair never the sole disambiguator — shape tokens differ (cross vs check vs triangle). Test with Deuteranopia/Protanopia/Tritanopia simulator (CI gate Phase 1).
  5. High-contrast/dark mode. Each token has paired high_contrast.fg/bg; UI activates pair when prefers-contrast:more or theme toggle. State distinguishability preserved.
  6. Screen reader. aria-label template = {state_label_en} · {short_message_en} (English fallback) + Vietnamese variant via i18n.
  7. Tokens declare-by-config. Lives in PG dot_config namespace ui.traffic_light.* or Directus content registry (Điều 33 v2.1 boundary respected); never hardcoded in Nuxt.

§4. Transition matrix (Rev2 §7.3 deferred → defined here)

§4.1 Notation

Each transition row: (from → to, actor_class, trigger_kind, trigger_ref, guard_kind, guard_ref, emitted_event_type, audit_required, rollback_to).

actor_class enum: system, pic_human, reviewer, executor_worker, mow_orchestrator, escalation_handler.

trigger_kind enum: event, rpc, timer, condition_resolved.

§4.2 Step state machine transition matrix

| # | From | To | Actor | Trigger | Guard | Emitted event | Audit | Rollback |
|---|---|---|---|---|---|---|---|---|
| T1 | not_started | ready | mow_orchestrator | condition_resolved | precondition_config resolves true | step.ready | no | |
| T2 | ready | in_progress | pic_human or executor_worker | rpc step_claim | pic_id assigned AND executor_class_ref resolved | step.started | yes | → ready (T2r) |
| T2r | in_progress | ready | pic_human | rpc step_release | actor = current pic_id | step.released | yes | |
| T3 | in_progress | waiting | executor_worker or pic_human | rpc step_wait | wait reason ∈ facet vocab | step.waiting (carries waiting_facet) | no | → in_progress (T3r) |
| T3r | waiting | in_progress | system (facet resolved) | event (e.g. dependency completed / human input received / external callback / time elapsed) | facet resolution proof | step.resumed | no | |
| T4 | in_progress | blocked | executor_worker or pic_human | rpc step_block | block reason ∈ block-reason vocab (missing IU / missing approval / config error) | step.blocked | yes | → in_progress (T4r) |
| T4r | blocked | in_progress | system or pic_human | event blocker_resolved | blocker resolution evidence ref | step.unblocked | yes | |
| T5 | in_progress | overdue | system (timer) | timer deadline_passed | now > deadline | step.overdue | no | |
| T6 | overdue | in_progress | system (timer) | condition_resolved (deadline extended) OR pic re-engages | deadline updated | step.recovered_from_overdue | yes | |
| T7 | overdue | completed | pic_human or executor_worker | rpc step_complete | output contract met AND completed_after_deadline=true written to step_run | step.completed (carries completed_after_deadline=true, overdue_first_at, sla_breach_duration) | yes | |
| T8 | in_progress | completed | pic_human or executor_worker | rpc step_complete | output contract met AND postcondition_config satisfied | step.completed | yes | → in_progress only via reopen_for_correction (T8r) with Điều 32 approval; never deletes prior completion |
| T8r | completed | in_progress | reviewer | rpc step_reopen_for_correction | Điều 32 approval ID present AND correction_reason ∈ vocab.correction_reason.* AND prior_completed_audit row written AND original_output_snapshot captured | step.reopened_for_correction (carries prior_completed_audit_id, original_output_snapshot_id, correction_reason, approval_id) | yes | |
| T9 | in_progress or waiting or blocked | failed | executor_worker | event executor_failure | failure classification ∈ {transient, permanent} | step.failed (carries classification) | yes | |
| T10 | failed (transient) | in_progress | executor_worker | event retry_attempt | retry budget not exceeded AND idempotency key present | step.retry | no | |
| T11 | failed (permanent) | cannot_complete | escalation_handler | rpc escalate | retry budget exhausted OR classification=permanent | step.escalated | yes | |
| T12 | in_progress or blocked or waiting | cannot_complete | pic_human or executor_worker | rpc declare_cannot_complete | declaration evidence ref | step.cannot_complete | yes | → in_progress (T12r) reopen_for_correction with Điều 32 |
| T12r | cannot_complete | in_progress | reviewer | rpc step_reopen_for_correction | Điều 32 approval ID present AND correction_reason ∈ vocab.correction_reason.* AND prior_cannot_complete_audit row written | step.reopened_for_correction (carries prior_audit_id, correction_reason, approval_id) | yes | |
| T13 | any non-terminal | (derived state, see §4.5) | varies | varies | varies | varies | varies | varies |

Notes:

  • Every emitted event_type must be registered in event_type_registry before this transition is allowed (Rev2 §6.1 register-before-emit; §3.1 of 03-event-5layer-…).
  • T8r and T12r are reopen_for_correction (MP-D2). They DO NOT delete the prior completed / cannot_complete record. Mandatory artifacts before transition:
    • Điều 32 approval_id present.
    • prior_completed_audit row (or prior_cannot_complete_audit) written, snapshotting the closing state's actor, timestamp, output, signoff.
    • original_output_snapshot captured (full output payload at original completion) — stored in step_run_output_snapshot (paper).
    • correction_reason from dot_config vocab.correction_reason.* (paper vocab includes data_error, policy_change, downstream_dependency_failed, regulatory_recall, etc.).
    • Audit timeline (vw_audit_event_timeline) MUST show both the original step.completed (or step.cannot_complete) event AND the subsequent step.reopened_for_correction event linked via correlation_id. Reopen is an additive event, never a deletion. Sentinel: zero step_run UPDATE that erases the original completed_at, completion_actor_id, or output_snapshot fields when reopening; reopen instead writes a new row in step_run_correction_cycle (paper) referencing both states.
  • failure_classification taxonomy lives in a separate failure_class_registry (paper-only) referenced via step.failed event payload.
  • MP-D3 — overdue → completed audit preservation. T7 MUST persist completed_after_deadline=true, overdue_first_at, and sla_breach_duration (= completed_at - deadline) on the step_run row, and the emitted step.completed event carries the same three fields. The historical overdue mark is not erased: step_run_state_history (paper, derived from transition events) retains every state visit including overdue. SLA breach roll-up + governance UI (§7) continue to show the breach even after completion. Sentinel: no T7 transition may UPDATE step_run in a way that nulls overdue_first_at or sets completed_after_deadline=false.
  • block-reason vocab lives in dot_config vocab.block_reason.* (paper).
  • pic_id assigned is a guard predicate function: fn_step_pic_resolves(p_step_run_id) → bool (paper).
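
The MP-D3 note above fixes a small piece of arithmetic, sla_breach_duration = completed_at - deadline, and an additive-only update. A minimal Python sketch of T7 under those rules follows; the dict-based step_run and the function name are assumptions for illustration, while the field names come from the note.

```python
from datetime import datetime, timedelta, timezone


def complete_overdue_step(step_run: dict, completed_at: datetime) -> dict:
    """Apply T7 (overdue -> completed) and return the step.completed payload."""
    if step_run["state_code"] != "overdue" or step_run.get("overdue_first_at") is None:
        raise ValueError("T7 applies only to an overdue step with overdue_first_at recorded")
    breach = completed_at - step_run["deadline"]
    step_run.update(
        state_code="completed",
        completed_at=completed_at,
        completed_after_deadline=True,
        sla_breach_duration=breach,
    )  # overdue_first_at is deliberately left untouched
    return {
        "event_type": "step.completed",
        "completed_after_deadline": True,
        "overdue_first_at": step_run["overdue_first_at"],
        "sla_breach_duration": breach,
    }


DEADLINE = datetime(2026, 5, 27, 10, 0, tzinfo=timezone.utc)
run = {"state_code": "overdue", "deadline": DEADLINE, "overdue_first_at": DEADLINE}
event = complete_overdue_step(run, DEADLINE + timedelta(hours=2, minutes=30))
print(event["sla_breach_duration"])  # 2:30:00
```

The row and the event carry the same three fields, so the breach stays visible in both the SLA roll-up and the audit timeline after completion.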

§4.3 Task state machine

Tasks inherit the same 9-state floor and the same transitions, with these specializations:

  • actor_class adds task_pic_human (the task's PIC, distinct from step PIC when step has multiple tasks).
  • T8 task completion carries task.completed event; step completion is triggered only when all task_runs in the step reach completed or skipped-via-branching.
  • Automated tasks (no human UI) skip T2 PIC claim — direct from ready→in_progress by executor_worker actor.

§4.4 Workflow_run state machine

workflow_run.state_code is not directly editable: it changes only through the lifecycle transitions below, which the validation function allows from explicit lifecycle RPCs or from a resolved step-level condition (§5).

| From | To | Actor | Trigger |
|---|---|---|---|
| not_started | in_progress | mow_orchestrator | rpc workflow_start |
| in_progress | completed | system | condition_resolved all_mandatory_steps_completed |
| in_progress | cancelled (derived state, §4.5) | reviewer | rpc workflow_cancel with Điều 32 approval |
| in_progress | paused (derived state, §4.5) | reviewer | rpc workflow_pause |
| paused | in_progress | reviewer | rpc workflow_resume |

Rollup state (rollup_state_code) is a separate column derived per §5 — not editable directly.

§4.5 Derived states above 9 (Rev2 OD12 decision in this design)

Rev2 §7.3.3 allows derived states with justification. OD12 defers default choice to Master Design Rev2. Decision in this design:

| Candidate | Decision | Reason |
|---|---|---|
| paused | ACCEPT as derived state (workflow_run scope) | Long-running workflows (months/years) need explicit governance-driven pause separate from blocked/waiting. Pause has clear actor (reviewer) and clear resumption RPC. Avoids overloading blocked semantics. |
| cancelled | ACCEPT as derived state (workflow_run + step + task scope) | cannot_complete is per-actor declaration; cancelled is workflow-level decision (e.g. business need disappears). Terminal but distinguishable from completed. |
| retrying | DEFER as facet, not state | Captured by retry_policy_registry + retry budget on failed (transient) state. UI surfaces "retrying (attempt n/N)" via facet on failed. Avoids state explosion. |
| escalated | DEFER as facet | Captured by waiting_facet='waiting_human' with escalation_chain_id evidence + governance UI surfaces escalations. State machine stays 9+2. |

Accepted derived states get their own state_def rows with semantic_class:

  • paused → semantic_class wait, ui_color_token ui.traffic_light.yellow_dark, ui_icon pause_filled, ui_label state.paused, terminal=no.
  • cancelled → semantic_class red, ui_color_token ui.traffic_light.red_dark, ui_icon cancel_filled, ui_label state.cancelled, terminal=yes.

MP-D5 — Derived states are extension states, not replacements. paused and cancelled extend the 9-state floor; they do not replace any of the 9 floor codes. The state machine remains a 9-state floor + N derived (currently N=2). Binding rules:

  • Every adapter, OSS tool, external mirror, UI widget, and registry consumer MUST support the full 9-state floor as a baseline. Adapters that cannot encode all 9 floor codes are not allowed (Gate A — state-vocab fit, per 05-…).
  • Adapters MAY ignore derived states OR map them to a floor equivalent (recommended mapping: paused → waiting UI-side; cancelled → cannot_complete UI-side), provided the floor view stays correct. The PG SoT keeps the precise derived code; adapters down-map only for their own surface.
  • state_def rows for derived states MUST declare is_derived=true and a floor_equivalent column pointing to the closest floor code, so adapters can mechanically map.
  • Adding a future derived state requires the same justification + state_def row + floor_equivalent; it never modifies any floor row.

Sentinel: derived states never sneak in without a state_def row + justification line + floor_equivalent. No adapter is approved that fails to cover all 9 floor codes.
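
The MP-D5 mapping rules lend themselves to a mechanical check. The following Python sketch assumes the recommended floor_equivalent values quoted above (paused → waiting, cancelled → cannot_complete); the function names are hypothetical and the Gate A check is reduced to set inclusion.

```python
FLOOR = (
    "not_started", "ready", "in_progress", "waiting", "blocked",
    "overdue", "failed", "cannot_complete", "completed",
)

# floor_equivalent per derived state, using the recommended mapping.
FLOOR_EQUIVALENT = {"paused": "waiting", "cancelled": "cannot_complete"}


def to_floor(state_code: str) -> str:
    """Down-map a state for an adapter that only understands the 9-state floor."""
    if state_code in FLOOR:
        return state_code
    if state_code in FLOOR_EQUIVALENT:
        return FLOOR_EQUIVALENT[state_code]
    raise ValueError(f"{state_code!r} has no state_def row with a floor_equivalent")


def adapter_covers_floor(supported_codes) -> bool:
    """Gate A style check: an adapter must encode all 9 floor codes."""
    return set(FLOOR) <= set(supported_codes)


print(to_floor("paused"))  # waiting
print(adapter_covers_floor(["ready", "in_progress", "completed"]))  # False
```

The PG source of truth keeps the precise derived code; only the adapter-facing view passes through to_floor.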

§4.6 Idempotency on transitions

Every transition RPC carries an idempotency_key. The transition validation function records (step_run_id, transition_id, idempotency_key) → verdict in transition_idempotency (paper, alternatively in shared idempotency_registry per 03-event-5layer-… §4.4). Re-submission returns prior verdict — no double-event, no double-audit.
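
A minimal in-memory sketch of the replay behaviour, assuming the (step_run_id, transition_id, idempotency_key) key described above; the class name and the dict standing in for transition_idempotency are hypothetical, and a real implementation would need the lookup and the write to share one transaction.

```python
class TransitionIdempotency:
    """Remember the verdict of each (run, transition, key) triple."""

    def __init__(self):
        self._verdicts = {}

    def apply(self, step_run_id, transition_id, idempotency_key, run_transition):
        key = (step_run_id, transition_id, idempotency_key)
        if key not in self._verdicts:
            # First submission: execute once and store the verdict.
            self._verdicts[key] = run_transition()
        # Re-submission: return the prior verdict, no second event or audit row.
        return self._verdicts[key]


emitted = []
registry = TransitionIdempotency()


def do_t2():
    emitted.append("step.started")
    return {"ok": True, "emitted_event_type": "step.started"}


first = registry.apply("run-1", "T2", "key-abc", do_t2)
second = registry.apply("run-1", "T2", "key-abc", do_t2)
print(len(emitted))  # 1
```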


§5. Workflow status roll-up (Rev2 §7.2.6 MP5)

§5.1 Binding rules (verbatim from MP5, expanded)

  • R-1 Red overrides yellow overrides green. Any mandatory active step in red class (failed/overdue/cannot_complete/cancelled) → workflow rollup MUST NOT be green.
  • R-2 Yellow without red. Mandatory active step in wait class (waiting/blocked/paused) but no red → workflow rollup MAX yellow.
  • R-3 Skipped / not_applicable do not count. Steps skipped via branching condition or not_applicable (e.g. else-branch) do NOT pull workflow into yellow/red. Only active mandatory steps (already entered the graph branch) count.
  • R-4 Optional vs mandatory. Steps with workflow_step_def.optional=true (config) never pull workflow status. Only mandatory steps roll up.
  • R-5 Config-driven. Roll-up rules declared in rollup_policy_def (PG); never hardcoded in Nuxt.
  • R-6 Terminal completed. All mandatory active steps completed → workflow rollup = green; AND when all also reach terminal-with-no-pending-postcondition → workflow_run transitions to completed.

§5.2 Roll-up function (paper-only)

fn_workflow_rollup_compute(p_workflow_run_id uuid)
  RETURNS jsonb (
    rollup_state_code        text  -- 'gray' | 'green' | 'yellow' | 'red'
    basis_step_run_ids       uuid[]  -- which steps were considered
    red_count                int
    yellow_count             int
    green_count              int
    skipped_count            int
    optional_excluded_count  int
    decisive_step_run_id     uuid nullable
    decisive_state_code      text nullable
    computed_at              timestamptz
  )

Function is STABLE (no mutation). MOW orchestrator calls it on every step state change event; result is written to workflow_run.rollup_state_code + last_rollup_at in a single TX.
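
Rules R-1 to R-4 and the gray edge case reduce to a filter followed by a precedence check. The Python sketch below is an illustration, not the paper function: steps are plain dicts with hypothetical optional / skipped / blocked_severity keys, and it assumes every counted step that is neither red nor wait class contributes as green.

```python
RED_CLASS = {"failed", "overdue", "cannot_complete", "cancelled"}
WAIT_CLASS = {"waiting", "blocked", "paused"}


def rollup(steps, blocked_escalation_treats_as_red=True):
    """Compute the roll-up color: red overrides yellow overrides green."""
    # R-3 / R-4: skipped and optional steps never pull the workflow status.
    counted = [s for s in steps if not s.get("optional") and not s.get("skipped")]

    def is_red(s):
        escalated = (blocked_escalation_treats_as_red
                     and s["state_code"] == "blocked"
                     and s.get("blocked_severity") == "red")
        return s["state_code"] in RED_CLASS or escalated

    red = [s for s in counted if is_red(s)]
    yellow = [s for s in counted if s["state_code"] in WAIT_CLASS and not is_red(s)]
    if not counted:
        color = "gray"      # 0 mandatory active steps
    elif red:
        color = "red"       # R-1
    elif yellow:
        color = "yellow"    # R-2
    else:
        color = "green"
    return {
        "rollup_state_code": color,
        "red_count": len(red),
        "yellow_count": len(yellow),
        "green_count": len(counted) - len(red) - len(yellow),
        "skipped_count": sum(1 for s in steps if s.get("skipped")),
        "optional_excluded_count": sum(
            1 for s in steps if s.get("optional") and not s.get("skipped")),
    }


print(rollup([{"state_code": "completed"}, {"state_code": "waiting"}])["rollup_state_code"])
# yellow
```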

§5.3 Edge cases (resolved here)

  • Workflow with 0 mandatory active steps. rollup_state_code = 'gray' (idle); workflow_run.state_code = not_started or in_progress (lifecycle independent of rollup color).
  • Workflow with cycle. Cycles forbidden at workflow_def level (workflow_step_relations cycle check enforced at proposal-accept time). Rollup function assumes DAG.
  • Sub-workflow roll-up. The sub-workflow is rolled up first; its rollup_state_code becomes the input state for the parent's containing step. Mandatory/optional applies at parent level; sub-workflow color drives parent-step color via a small bridging rule: sub-workflow red → parent step failed; sub-workflow yellow → parent step waiting (waiting_facet=waiting_dependency, since a facet is only valid on state_code='waiting'); sub-workflow green-but-not-completed → parent step in_progress; sub-workflow completed → parent step completed.
  • Workflow paused. When workflow_run.state_code='paused', rollup_state_code freezes at last computed value with a frozen_at marker. UI surfaces both: state badge Paused, rollup color from freeze.
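
The cycle check mentioned in the edge cases can be sketched with Kahn's topological sort: if not every step can be removed in dependency order, the relations contain a cycle. This is an illustrative Python version, assuming workflow_step_relations can be read as (from_step, to_step) pairs; the function name is hypothetical.

```python
from collections import defaultdict, deque


def has_cycle(step_ids, relations):
    """Return True if the directed graph of step relations is not a DAG."""
    indegree = {s: 0 for s in step_ids}
    successors = defaultdict(list)
    for src, dst in relations:
        indegree.setdefault(src, 0)
        indegree[dst] = indegree.get(dst, 0) + 1
        successors[src].append(dst)

    queue = deque(s for s, d in indegree.items() if d == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    # Any node never reaching indegree 0 sits on (or behind) a cycle.
    return removed != len(indegree)


print(has_cycle(["a", "b", "c"], [("a", "b"), ("b", "c")]))  # False
print(has_cycle(["a", "b"], [("a", "b"), ("b", "a")]))       # True
```

Running such a check at proposal-accept time is what allows the roll-up function to assume a DAG.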

§5.4 Per-state weight (deferred to Phase 1)

R7.2.6 mentions per-state weight for advanced rollup (e.g. ratio of yellow:green). Rev2 design accepts R-1..R-6 as binding and defers weighted rollup to Phase 1, after operator feedback. Sentinel: no weighted rollup until a rollup_policy_def.policy_code row with explicit weights is added.


§6. Workflow UI design — Standard, Runtime, Long-workflow

§6.1 Standard Process View (Rev2 §7.1)

  • Renders workflow definition as a directed graph: nodes = step (IU brick / IU bundle / sub-workflow boundary); edges = workflow_step_relations rows (trigger / branching condition / parallel).
  • Data source: backend gateway → workflow_registry + workflow_step_def + workflow_step_relations + IU registry. Nuxt component <StandardProcessView workflowDefId /> is zero-logic.
  • Branching renders as labeled fan-out (AND / OR / condition).
  • Parallel renders as horizontal swimlanes.
  • Sub-workflow boundary renders as collapsed node with drill-down.
  • Single layout for all roles; backend permission filter decides which step nodes are visible (some steps may carry role-restricted IU body — but body is fetched only on demand; node visibility is by step-def permission row).
  • Switch to Proposal mode (toggle in UI): every add/edit/delete becomes a draft against workflow_change_requests; no direct workflow_registry mutation. Diff view shows current vs proposed.

§6.2 Runtime Progress View (Rev2 §7.2)

  • Renders runtime instance of a workflow_run with each step_run carrying its own state + deadline + PIC + last output snapshot.
  • Drill-down: step_run → task_runs → IU content (via render layer pulling from information_unit pinned version) + IO actual (MOIT input snapshots, MOUT output view).
  • Resume-safe: backend serves a workflow_run_snapshot materialized view that includes all step/task states + pinned IU versions + last realtime event seq. Refresh restores from snapshot.
  • Long-running timeline + milestone view: see §6.3 below.
  • Realtime updates via gateway (Rev2 §6.4; design 03-event-5layer-… §5). Nuxt does NOT poll PG — subscribes to gateway topic for the workflow_run.

§6.3 Long workflow UI (100–500+ steps)

Single layout requirement (Rev2 §2 D2.5): a 500-step workflow must render with the same primitive as a 2-step one. Patterns:

  1. Zoom / pan. Continuous zoom 25%..200%; pan via drag; mouse-wheel zoom; touch pinch-zoom.
  2. Collapse subgraph. Sub-workflow nodes collapse to single rollup chip; group nodes (e.g. by workflow_categories) collapse.
  3. Critical path highlight. Backend computes critical path on each rollup; UI overlays. Critical path = the longest dependency chain reaching the earliest mandatory unfinished step.
  4. Blocked chain visualization. Steps in blocked/waiting cause downstream steps to be highlighted with a "downstream of blocker" indicator.
  5. Search. Search box queries step_def.name + iu_label + bundle_label; results highlight and pan-to.
  6. Mini-map. Bottom-right miniature with current viewport rectangle.
  7. Group by lane. Optional lane grouping by PIC role / executor class / category.
  8. Progress bar header. Workflow-level rollup + percentage (mandatory completed / mandatory total).
  9. Lazy-load step body. Step body (IU/task content) fetched only on click; node carries metadata-only by default.
  10. Single primitive — same <WorkflowGraph> component. No <LargeWorkflowGraph> sibling; the same component scales by virtualization (only visible nodes hydrate).

Sentinel: every long-workflow feature MUST work on a 2-step workflow without UI/code branching.
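
Pattern 10 (only visible nodes hydrate) comes down to a viewport intersection test that behaves identically for 2 nodes and for 500. The sketch below shows the selection logic in Python for illustration only; the actual component would live in the Nuxt render layer, and the Node / Viewport shapes and the function name are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    id: str
    x: float
    y: float
    w: float
    h: float


@dataclass(frozen=True)
class Viewport:  # expressed in graph coordinates, after zoom/pan is applied
    x: float
    y: float
    w: float
    h: float


def visible_nodes(nodes, vp):
    """Return ids of nodes whose rectangle overlaps the viewport rectangle."""
    return [
        n.id for n in nodes
        if n.x < vp.x + vp.w and n.x + n.w > vp.x
        and n.y < vp.y + vp.h and n.y + n.h > vp.y
    ]


small = [Node("a", 0, 0, 10, 10), Node("b", 100, 0, 10, 10)]
large = [Node(f"n{i}", i * 20.0, 0, 10, 10) for i in range(500)]
print(visible_nodes(small, Viewport(0, 0, 50, 50)))        # ['a']
print(len(visible_nodes(large, Viewport(0, 0, 100, 50))))  # 5
```

The same function serves both graphs, which is the point of the sentinel: no size-specific branch is needed.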

§6.4 Drill-down chain

Workflow node → Step node → Task envelope → IU content (rendered from pinned version) → IO actual (MOIT snapshot / MOUT view) → Event timeline (filtered to this trace_id).

Each level fetches via backend gateway; Nuxt holds zero state-machine logic (Điều 28 / S178).

§6.5 Proposal mode shape

When in Proposal mode (Rev2 §7.1):

  • All add/edit/delete actions create rows in workflow_change_requests with proposal_state enum (draft / submitted / review / approved / rejected / merged).
  • Diff is computed by backend (fn_workflow_change_diff(p_change_request_id)) and rendered as overlay.
  • Approval routes to Điều 32 quorum.
  • On approval, a separate macro applies the change to workflow_registry (out of scope here — Phase 1+).

OD2 default kept: reuse workflow_change_requests for workflow proposals; design a generic proposal table only for non-workflow proposals (IU split/merge → see Điều 39 G1; MOT template change; MOIT field change; MOUT view change). See 06-open-decisions-and-readiness.md §S2.


§7. Governance UI (Rev2 §7.4)

§7.1 Problem-first default view

Default landing route = /governance/problems. Shows:

  • Aggregate category counts (1 row each): DLQ messages / silent workers / overdue steps / schema violations / event lag breaches / integrity warnings / failed cuts / orphan workflows.
  • Click row → drill-down list (paginated).
  • Click list item → full audit + correlation + trace_id timeline.

§7.2 Aggregate counts (Rev2 R7.4.2)

Source views (paper-only):

  • vw_governance_dlq_count: job_dead_letter filtered to current_window.
  • vw_governance_silent_workers: queue_heartbeat with last_tick_at < now() - N per worker class.
  • vw_governance_overdue: step_run where deadline < now() AND state_code IN ('overdue','blocked','waiting').
  • vw_governance_schema_violations: event_type_registry validation rejects logged to event_validation_audit (paper).
  • vw_governance_event_lag: fn_event_lag_compute(window) returning p50/p95/p99.
  • vw_governance_integrity_warnings: iu_lifecycle_log filtered to severity='warning' or axis_inconsistency.
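
As an illustration of how one of these views reduces to a single aggregate row, the overdue predicate can be sketched over in-memory rows. This is a sketch only: the real view is paper-only SQL, and `StepRun.step_run_id` is a hypothetical column; `deadline` and `state_code` follow the view definition above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# State codes named in the vw_governance_overdue definition.
OVERDUE_STATES = {"overdue", "blocked", "waiting"}


@dataclass(frozen=True)
class StepRun:
    step_run_id: int      # hypothetical identifier column
    state_code: str
    deadline: datetime


def overdue_count(rows, now: datetime) -> int:
    """Count rows matching: deadline < now AND state_code IN OVERDUE_STATES."""
    return sum(
        1 for r in rows
        if r.deadline < now and r.state_code in OVERDUE_STATES
    )


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = [
        StepRun(1, "waiting", now - timedelta(hours=1)),   # counted
        StepRun(2, "blocked", now + timedelta(hours=1)),   # deadline not passed
    ]
    print(overdue_count(sample, now))
```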

§7.3 Concise AI/worker status (Rev2 R7.4.3)

Each worker/executor class has a backend-summarized status string (e.g. "OK, 14 jobs / 0 errors / lag 1.2s p95"). The string is generated by a periodic summarization function, not dumped from raw logs. Full logs and traces are reachable via drill-down.

MP-D9 — Summary evidence rule (binding). Any AI- or worker-generated concise summary MUST carry, alongside the human-readable string, the following structured evidence fields:

  • source_event_count — integer; number of source events (event_outbox / job_queue / heartbeat / step_run transitions) the summary aggregated. MUST be ≥1; a summary with source_event_count=0 is forbidden (no fabrication).
  • time_window: {start_at, end_at} timestamps demarcating the aggregation window.
  • generated_by: {summarizer_kind ∈ {sql_fn, ai_agent, deterministic_rule}, summarizer_ref, summarizer_version}.
  • confidence — float [0..1] (or null for deterministic rules); AI summaries MUST populate this field and surface it in UI alongside the string.
  • source_event_refs — array of event ids / outbox ids / job ids the summary aggregated; truncated to first 100 if window is large, with a separate source_event_total count preserved.

Binding rules:

  1. Summary must not hide raw evidence. Every concise status row MUST expose a drill-down link that runs vw_audit_event_timeline(...) filtered to the summary's source_event_refs or (trace_id, time_window). UI MUST render the link; backend MUST serve it.
  2. Summary must not mark a problem resolved without a source event. Any auto-resolution of a governance problem (DLQ cleared, silent worker recovered, lag breach healed, integrity warning closed) requires at least one corresponding *.resolved / *.recovered / *.healed event in event_outbox. Summarizer that flips a problem to "OK" without such an event MUST be refused at write-time (event_validation_audit row).
  3. AI-generated summaries with confidence < dot_config ai_summary.min_confidence.<category> (default 0.6) MUST be flagged in UI as low-confidence and NOT trigger auto-resolution.
  4. Sentinel: any governance row whose state flipped from red/yellow to green without a corresponding source event reference is an integrity violation (logged + raised in governance UI).
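
The evidence fields and binding rules 2 and 3 above can be sketched as a small validator. This is an illustrative sketch, not the backend implementation: the flattened field layout, the function names `build_evidence`, `validate` and `may_auto_resolve`, and the choice to set source_event_count equal to the untruncated total are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

MAX_REFS = 100  # truncation limit for source_event_refs
SUMMARIZER_KINDS = {"sql_fn", "ai_agent", "deterministic_rule"}


@dataclass
class SummaryEvidence:
    text: str
    source_event_count: int
    start_at: datetime               # time_window.start_at
    end_at: datetime                 # time_window.end_at
    summarizer_kind: str             # generated_by.summarizer_kind
    summarizer_ref: str
    summarizer_version: str
    confidence: Optional[float]      # None only for non-AI summarizers
    source_event_refs: List[str]
    source_event_total: int


def build_evidence(text, event_ids, start_at, end_at, kind, ref, version,
                   confidence=None) -> SummaryEvidence:
    ids = list(event_ids)
    return SummaryEvidence(
        text=text, source_event_count=len(ids),
        start_at=start_at, end_at=end_at,
        summarizer_kind=kind, summarizer_ref=ref, summarizer_version=version,
        confidence=confidence,
        source_event_refs=ids[:MAX_REFS],   # keep the first 100 refs only
        source_event_total=len(ids),        # full count is preserved
    )


def validate(ev: SummaryEvidence) -> List[str]:
    """Return a list of rule violations; an empty list means acceptable."""
    errors = []
    if ev.source_event_count < 1:
        errors.append("source_event_count must be >= 1")
    if ev.summarizer_kind not in SUMMARIZER_KINDS:
        errors.append("unknown summarizer_kind")
    if ev.summarizer_kind == "ai_agent" and ev.confidence is None:
        errors.append("AI summaries must populate confidence")
    if ev.confidence is not None and not (0.0 <= ev.confidence <= 1.0):
        errors.append("confidence must be within [0, 1]")
    if len(ev.source_event_refs) > MAX_REFS:
        errors.append("source_event_refs exceeds truncation limit")
    return errors


def may_auto_resolve(ev: SummaryEvidence, resolution_event_ids,
                     min_confidence: float = 0.6) -> bool:
    if validate(ev):
        return False
    if not resolution_event_ids:       # rule 2: needs a resolving event
        return False
    if ev.summarizer_kind == "ai_agent" and ev.confidence < min_confidence:
        return False                   # rule 3: low-confidence AI summary
    return True
```

The `min_confidence` default mirrors the 0.6 default of dot_config ai_summary.min_confidence.<category>; a real implementation would read the per-category value rather than take it as an argument.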

§7.4 DLQ replay / rescue (Rev2 R7.4.5)

UI workflow:

  1. Operator opens DLQ entry.
  2. Inspects payload + failure classification + retry history.
  3. Initiates replay via rpc dlq_replay_propose — writes to dlq_replay_request (G4) with proposal_state='submitted'.
  4. Approval via Điều 32; on approval, replay scheduled with idempotency key (from idempotency_registry G5).
  5. Replay outcome logged to dlq_replay_request.outcome_jsonb + emits dlq_replay.completed event.

Sentinel: every DLQ replay has a dlq_replay_request_id AND approval_id (Điều 32) — design enforces this binding.
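
The sentinel can be read as a guard that runs before a replay is scheduled. The sketch below assumes a simplified in-memory request record and a plain set standing in for idempotency_registry; field names other than dlq_replay_request_id, approval_id and proposal_state are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class DlqReplayRequest:
    dlq_replay_request_id: int
    dlq_entry_id: int                       # hypothetical column
    proposal_state: str = "submitted"
    approval_id: Optional[str] = None       # set once approval is granted
    idempotency_key: Optional[str] = None


def schedule_replay(req: DlqReplayRequest, seen_keys: Set[str]) -> bool:
    """Return True if the replay is scheduled, False if it is a duplicate.

    Raises when the request lacks the approval binding the sentinel requires.
    """
    if req.proposal_state != "approved" or req.approval_id is None:
        raise PermissionError("replay requires an approved request with approval_id")
    if req.idempotency_key is None:
        raise ValueError("replay requires an idempotency key")
    if req.idempotency_key in seen_keys:
        return False                        # already replayed once
    seen_keys.add(req.idempotency_key)
    return True
```

Keeping the duplicate case as a `False` return rather than an error is a design choice of the sketch: a repeated click on "replay" is then harmless, while a missing approval stays a hard failure.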

§7.5 Heartbeat / silent worker monitor (Rev2 R7.4.6)

  • queue_heartbeat table tracks (worker_class, last_tick_at, status, last_ok_at).
  • Worker N silent for > threshold_N (per-class config in dot_config heartbeat.threshold.*) → status flipped to silent.
  • Governance UI highlights silent workers in problem-first view.
  • False-heal protection (Rev2 §15.5 ref): silent worker auto-resume blocked until heartbeat caller has emitted ≥1 successful tick AND the worker class flag is not frozen (per feedback memory pattern).
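
A sketch of the silent flip and the false-heal guard, assuming an in-memory heartbeat record. The `frozen` and `silent_since` fields and the `"ok"` status value are hypothetical; the design only names worker_class, last_tick_at, status and last_ok_at.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Heartbeat:
    worker_class: str
    last_tick_at: datetime
    last_ok_at: Optional[datetime]
    status: str = "ok"                        # assumed non-silent status value
    frozen: bool = False                      # hypothetical worker class flag
    silent_since: Optional[datetime] = None   # hypothetical bookkeeping field


def mark_silent(hb: Heartbeat, now: datetime, threshold: timedelta) -> bool:
    """Flip status to 'silent' once the last tick is older than the threshold."""
    if hb.status != "silent" and now - hb.last_tick_at > threshold:
        hb.status = "silent"
        hb.silent_since = now
    return hb.status == "silent"


def try_resume(hb: Heartbeat) -> bool:
    """Resume only after >=1 successful tick since going silent, and not frozen."""
    ok_after_silence = (
        hb.last_ok_at is not None
        and hb.silent_since is not None
        and hb.last_ok_at > hb.silent_since
    )
    if hb.status == "silent" and ok_after_silence and not hb.frozen:
        hb.status = "ok"
        hb.silent_since = None
        return True
    return False
```

The per-class `threshold` would come from dot_config heartbeat.threshold.*; it is passed in here to keep the sketch self-contained.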

§7.6 Event lag monitor (Rev2 R7.4.7)

  • fn_event_lag_compute(window) returns p50/p95/p99 per (producer, consumer) pair.
  • Traffic-light threshold in dot_config event_lag.threshold.{p50,p95,p99} (per-pair override allowed).
  • UI shows trend line + threshold overlay.
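
To make the percentile and threshold mechanics concrete, here is a sketch for a single (producer, consumer) pair. It uses the nearest-rank percentile definition, which is an assumption; the design does not say which percentile method fn_event_lag_compute uses. Function names and the shape of the threshold dictionaries are likewise hypothetical.

```python
from typing import Dict, List, Optional, Sequence

PERCENTILES = (("p50", 50), ("p95", 95), ("p99", 99))


def percentile(samples: Sequence[float], p: int) -> float:
    """Nearest-rank percentile for an integer p in 1..100."""
    if not samples:
        raise ValueError("no lag samples in window")
    ordered = sorted(samples)
    rank = (p * len(ordered) + 99) // 100     # integer ceil(p * n / 100)
    return ordered[max(rank, 1) - 1]


def lag_summary(samples: Sequence[float]) -> Dict[str, float]:
    """p50/p95/p99 of the lag samples for one (producer, consumer) pair."""
    return {name: percentile(samples, p) for name, p in PERCENTILES}


def breaches(summary: Dict[str, float],
             thresholds: Dict[str, float],
             pair_override: Optional[Dict[str, float]] = None) -> List[str]:
    """Names of the percentiles that exceed their effective threshold."""
    effective = {**thresholds, **(pair_override or {})}
    return [name for name, _ in PERCENTILES if summary[name] > effective[name]]
```

For example, with lag samples 1..100 seconds and thresholds p50=60, p95=90, p99=120, only p95 breaches; a per-pair override of p95=100 clears it. How breaches map to traffic-light colours is left to the a11y tokens defined elsewhere in this document.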

§7.7 No raw event stream surface (Rev2 R7.4.8)

The UI never offers a "raw outbox tail" view. Drill-down is always summarized and filtered. Raw event inspection is available only through the backend vw_audit_event_timeline(trace_id), one investigation at a time.

§7.8 Same layout per role + backend filter (Rev2 R7.4.9)

There is a single UI component; permission filtering happens in the backend route. Sentinel: there is exactly one <GovernanceProblemView> component; role variance is via permission-filtered backend response.
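
A sketch of what "same layout, backend filter" could look like on the backend side: every role receives the same response shape, and only the set of rows differs. The role names and the `ROLE_PERMISSIONS` mapping are hypothetical; the category keys paraphrase the list in §7.1.

```python
from typing import Dict, List

# Category keys paraphrasing the §7.1 problem list.
CATEGORIES = (
    "dlq_messages", "silent_workers", "overdue_steps", "schema_violations",
    "event_lag_breaches", "integrity_warnings", "failed_cuts",
    "orphan_workflows",
)

# HYPOTHETICAL role-to-category permissions, for illustration only.
ROLE_PERMISSIONS = {
    "operator": set(CATEGORIES),
    "reviewer": {"overdue_steps", "integrity_warnings"},
}


def problems_response(role: str, counts: Dict[str, int]) -> List[dict]:
    """Build the /governance/problems payload for one role.

    The row shape is identical for every role; the UI component never
    branches on role, it just renders whatever rows it receives.
    """
    allowed = ROLE_PERMISSIONS.get(role, set())
    return [
        {"category": c, "count": counts.get(c, 0)}
        for c in CATEGORIES
        if c in allowed
    ]
```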


§8. Proposal mode shape (Rev2 §7.1 + OD2)

§8.1 Generic vs per-domain

  • Workflow proposals reuse workflow_change_requests (Rev2 §12 row 17 [VL]).
  • Non-workflow proposals (IU split/merge, MOT template, MOIT field, MOUT view, KG edge proposal, bundle proposal) land in a new generic proposal table (design proposal — schema in 06-open-decisions-and-readiness.md §S2).
  • Both share the same proposal_state enum and the same governance gates (Điều 32 approval for impactful proposals).

§8.2 KG proposal shape (Rev2 §9 + G1)

When KG (Điều 39) proposes a change (add edge / bundle / split / merge / re-parent / no-action), it writes to proposal (generic) with proposal_kind='kg_edge' / 'iu_bundle' / 'iu_split' / 'iu_merge' / etc. Each carries:

  • evidence_ref: iu_usage_evidence row(s).
  • affected_iu_ids array.
  • proposed_action_jsonb (DSL describing the change).
  • proposer_class='kg_feedback'.
  • proposal_state enum.

Sentinel: KG never writes any registry directly; every KG output appears as a proposal row.
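
A sketch of the KG proposal row as a plain record. The requirement that evidence and affected IUs be non-empty, the initial proposal_state of 'draft', and the element types of the id lists are assumptions; the field names follow the list above.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class KgProposal:
    proposal_kind: str                      # e.g. 'kg_edge', 'iu_split'
    evidence_ref: List[int]                 # iu_usage_evidence row ids (type assumed)
    affected_iu_ids: List[str]
    proposed_action_jsonb: Dict[str, Any]   # DSL payload, opaque here
    proposer_class: str = "kg_feedback"
    proposal_state: str = "draft"           # ASSUMED initial state


def kg_propose(kind: str, evidence_ref, affected_iu_ids, action) -> KgProposal:
    """Build a proposal row; KG output never becomes a direct registry write."""
    if not evidence_ref:
        raise ValueError("a KG proposal needs at least one iu_usage_evidence row")
    if not affected_iu_ids:
        raise ValueError("a KG proposal must name the affected IUs")
    return KgProposal(kind, list(evidence_ref), list(affected_iu_ids), dict(action))
```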


§9. Cross-references

  • Substrate (queue / outbox / heartbeat / idempotency / retry / DLQ) → 03-event-5layer-realtime-dlq-design.md.
  • IU brick / bundle / 4 Mothers binding → 04-iu-centered-4mothers-binding-design.md.
  • OSS strategy (state-vocab Gate A applies here) → 05-oss-candidate-strategy-rev2.md (binding: any adopted workflow/state OSS tool must fit the 9-state floor + derived state design).
  • Open decisions (OD2/9/10/11/12/14/15) → 06-open-decisions-and-readiness.md.

§10. Acceptance criteria for this WS

  A1. 9-state floor declared with all 5 attributes per state (Rev2 §7.3.4); see §3.1 + §3.4.
  A2. Waiting facets §3.3 cover 4 sublabels (MP3); share Yellow; primary picking rule defined.
  A3. A11y tokens §3.5 cover MP4 6 binding rules + 9 traffic-light tokens with WCAG ratios.
  A4. Transition matrix §4 covers all 9 states + rollback edges + Điều 32-gated reopens.
  A5. Derived-states decision (OD12) recorded in §4.5: paused + cancelled accepted; retrying + escalated deferred to facets.
  A6. Roll-up rules §5 cover MP5 binding R1..R6 + edge cases (no mandatory, cycle, sub-workflow, paused).
  A7. State machine + transitions + rollup live in PG state_machine_registry / rollup_policy_def; Nuxt holds zero logic.
  A8. Workflow UI §6 covers Standard + Runtime + Long-workflow patterns + proposal mode + drill-down chain.
  A9. Governance UI §7 covers Rev2 R7.4.1..R7.4.9 with concrete view names + sentinel bindings.
  A10. Proposal mode §8 keeps workflow proposals in workflow_change_requests; introduces generic proposal table only for non-workflow proposals (OD2 default refined).
  A11. No PG mutation; no migration; no DOT command run; no law enactment. All schema is paper-only.

End WS4 design.
