IU 4-Mothers Master Design Rev2 — WS4 Step/Task State Machine + Workflow UI (DRAFT 2026-05-27)
Master Design Rev2 — Step/Task State Machine + Workflow UI (WS4)
Path: knowledge/dev/design/v0.6-iu-4mothers-event-foundation-rev2/02-step-state-machine-and-workflow-ui-design.md
Status: DRAFT Rev2 (document-only). Companion to 00-master-design-rev2.md.
Date: 2026-05-27
Authority: Rev2 brief §7 (UI), §7.2.6 (roll-up), §7.3 (9-state model), §7.3.6 (waiting facets), §7.3.7 (a11y), §7.4 (governance UI).
Boundary: state machine and transitions live in PG registry/config, not in Nuxt or in worker code. Nuxt is render shell (Điều 28 / S178). Transition validation is backend-side. Approval logic is Điều 32 — not owned here. Queue/event/heartbeat substrate is Điều 45 — not redefined here.
§1. Scope and boundaries
This document designs:
- The 9-state floor for step/task instances (Rev2 §7.3) + waiting facets (MP3) + a11y tokens (MP4).
- The transition matrix (Rev2 §7.3 deferred here) — actor/trigger/guard/event/audit per transition.
- The workflow status roll-up rules (Rev2 §7.2.6 MP5).
- The decision on derived states above 9 (Rev2 OD12).
- The PG/config substrate (state_machine_registry) and validation function.
- The workflow UI: Standard Process View, Runtime Progress View, long-workflow patterns (100–500+ steps), Governance UI.
- The proposal mode shape (writes to workflow_change_requests, never workflow_registry).
This document does not design:
- The IU brick / IU bundle internal schema — see 04-iu-centered-4mothers-binding-design.md.
- Producer/consumer/queue/realtime substrate — see 03-event-5layer-realtime-dlq-design.md.
- OSS tool selection or substrate adapter pick — see 05-oss-candidate-strategy-rev2.md.
- Approval quorum — Điều 32 surface.
§2. Substrate: state_machine_registry (PG)
OD9 default proposal: state machines for step / task / workflow run in PG, declared-by-config, transition validation via PG function.
§2.1 Tables (paper-only schema)
state_machine_def
state_machine_id text PK
scope text -- 'step' | 'task' | 'workflow_run' | 'cut_request' | ...
version text semver
active bool
description text
created_at timestamptz
state_def
state_machine_id text FK
state_code text
semantic_class text -- 'idle' | 'active' | 'wait' | 'red' | 'terminal'
is_terminal bool
ordinal int
ui_color_token text -- 'gray' | 'green_pale' | 'green' | ... (see §3.2)
ui_icon_token text -- '○' | '▶' | '►' | ...
ui_label_i18n_key text -- 'state.not_started' (Nuxt resolves)
ui_short_message_key text
PRIMARY KEY (state_machine_id, state_code)
transition_def
state_machine_id text FK
from_state text FK -> state_def
to_state text FK -> state_def
actor_class text -- 'system' | 'pic_human' | 'reviewer' | 'executor_worker' | 'mow_orchestrator' | 'escalation_handler'
trigger_kind text -- 'event' | 'rpc' | 'timer' | 'condition_resolved'
trigger_ref text -- event_type / rpc name / timer key
guard_kind text -- 'config' | 'predicate_fn'
guard_ref text -- predicate function name or config rule id
emitted_event_type text FK -> event_type_registry (nullable)
audit_required bool
rollback_to text nullable -- explicit rollback edge
proposed bool default false -- proposed transitions stay false until governance approves
PRIMARY KEY (state_machine_id, from_state, to_state, actor_class)
state_facet_def -- §3.3 waiting sublabels and similar
state_machine_id text
state_code text
facet_code text -- 'waiting_dependency' | 'waiting_human' | 'waiting_external' | 'waiting_time_gate' | ...
ui_icon_token text
ui_label_i18n_key text
primary_ordinal int -- for primary picking when multiple
PRIMARY KEY (state_machine_id, state_code, facet_code)
rollup_policy_def -- §5 workflow roll-up rules
state_machine_id text FK (workflow_run scope)
policy_code text -- 'red_overrides_yellow_overrides_green' | 'mandatory_only' | ...
policy_config_jsonb jsonb
active bool
§2.2 Runtime tables (paper-only)
step_run
step_run_id uuid PK
workflow_run_id uuid FK
step_def_id uuid FK
state_machine_id text FK
state_code text FK -> state_def
waiting_facet text nullable FK -> state_facet_def (only when state_code='waiting')
pinned_iu_version_id uuid FK -- per §4 OD15 default 'pin'
pic_id uuid nullable
executor_class_ref text nullable
started_at timestamptz nullable
completed_at timestamptz nullable
last_transition_at timestamptz
trace_id text -- W3C shape
correlation_id text
task_run
task_run_id uuid PK
step_run_id uuid FK
state_machine_id text FK
state_code text FK
waiting_facet text nullable
...same envelope fields...
workflow_run
workflow_run_id uuid PK
workflow_def_id uuid FK
state_machine_id text FK (typically scope='workflow_run')
rollup_state_code text -- derived per rollup_policy_def
rollup_basis_jsonb jsonb -- which step_runs contributed
last_rollup_at timestamptz
Reuse note: tasks / task_checkpoints / task_comments [VL row 19] become the auditable persistence underneath task_run; workflows / workflow_steps / workflow_step_relations [VL row 16] underneath workflow_run. The new *_run schema is the runtime envelope; columns above are design proposals, not migrations.
§2.3 Validation function (paper-only)
fn_state_transition_validate(
p_run_id uuid,
p_run_kind text, -- 'step' | 'task' | 'workflow_run'
p_from_state text,
p_to_state text,
p_actor_class text,
p_trigger_kind text,
p_trigger_ref text,
p_evidence jsonb -- pre/post checks, audit refs, idempotency key
) RETURNS jsonb (
ok bool,
transition_id text,
emitted_event_type text nullable,
problems text[]
)
Properties:
- STABLE for dry-run / preview (no mutation when called with p_preview:=true).
- Refuses if no transition_def row matches (state_machine_id, from, to, actor_class).
- Refuses if guard_kind='predicate_fn' and predicate returns false; emits transition_refused_by_guard problem with named guard.
- Refuses if audit_required=true and p_evidence.audit_ref is null.
- On success, emits emitted_event_type to event_outbox and updates *_run row in single TX.
- Idempotent: same (transition_id, idempotency_key) returns prior verdict.
Nuxt cannot call this directly; calls go through MOW orchestrator / MOT envelope / executor worker via backend route (Điều 28 / S178).
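The refusal rules listed under Properties can be sketched in a few lines. The Python below is only an illustrative model of the lookup, guard, and audit checks, not the PG function itself: the `TransitionDef` class, the `GUARDS` registry, the `pic_assigned` guard name, and the problem strings other than `transition_refused_by_guard` are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class TransitionDef:
    """Subset of a transition_def row (hypothetical in-memory shape)."""
    from_state: str
    to_state: str
    actor_class: str
    guard_ref: Optional[str]            # name of a predicate, or None
    emitted_event_type: Optional[str]
    audit_required: bool


# Hypothetical guard registry: guard_ref -> predicate over the evidence dict.
GUARDS: dict[str, Callable[[dict], bool]] = {
    "pic_assigned": lambda ev: ev.get("pic_id") is not None,
}


def validate_transition(defs, from_state, to_state, actor_class, evidence):
    """Return a verdict dict loosely shaped like the paper RETURNS clause."""
    key = (from_state, to_state, actor_class)
    row = next((d for d in defs
                if (d.from_state, d.to_state, d.actor_class) == key), None)
    if row is None:
        # No matching transition_def row: refuse outright.
        return {"ok": False, "emitted_event_type": None,
                "problems": ["no_matching_transition_def"]}
    problems = []
    if row.guard_ref is not None and not GUARDS[row.guard_ref](evidence):
        problems.append(f"transition_refused_by_guard:{row.guard_ref}")
    if row.audit_required and evidence.get("audit_ref") is None:
        problems.append("audit_ref_missing")
    ok = not problems
    return {"ok": ok,
            "emitted_event_type": row.emitted_event_type if ok else None,
            "problems": problems}


# Example row modeled on T2 (ready -> in_progress, claimed by a human PIC).
DEFS = [TransitionDef("ready", "in_progress", "pic_human",
                      "pic_assigned", "step.started", True)]
```

The sketch collects all problems instead of stopping at the first one, which matches the `problems text[]` return shape; whether the real function short-circuits is left open by the design.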
§3. The 9-state floor (Rev2 §7.3 binding)
§3.1 State definitions
| # | state_code | Semantics | Traffic-light | Icon | Text (vi) | Text (en) | Semantic class | Terminal? |
|---|---|---|---|---|---|---|---|---|
| 1 | not_started | Precondition not yet met | Gray | ○ | Chưa tới lượt | Not started | idle | no |
| 2 | ready | Precondition met, awaiting PIC/executor | Green (pale) | ▶ | Sẵn sàng | Ready | active | no |
| 3 | in_progress | Currently being executed | Green | ► | Đang làm | In progress | active | no |
| 4 | waiting | Waiting on external dependency | Yellow | ⏳ | Đang chờ | Waiting | wait | no |
| 5 | blocked | Internal blocker (missing IU / config / approval) | Yellow (dark) — escalates to Red on threshold (see MP-D4) | ⚠ | Bị chặn | Blocked | wait | no |
| 6 | overdue | Past deadline, still salvageable | Red (pale) | ⏰ | Trễ hạn | Overdue | red | no |
| 7 | failed | Executed and failed (transient or permanent) | Red | ✕ | Thất bại | Failed | red | no |
| 8 | cannot_complete | Worker / PIC declares unable to complete in step scope | Red (dark) | ⛔ | Không thể hoàn tất | Cannot complete | red | no |
| 9 | completed | Output contract met, postcondition events ready | Green (deep) | ✓ | Hoàn thành | Completed | active | yes |
This is the floor — every step/task state machine must include these 9 codes verbatim, with these semantic classes and traffic-light tokens.
MP-D4 — blocked severity escalation (UI-only; does not change core state): Default traffic-light token for blocked is ui.traffic_light.yellow_dark. The UI token escalates to ui.traffic_light.red (rendered presentation only — state_code stays blocked) when ANY of the following holds:
- time blocked (now() - blocked_since) exceeds dot_config block_severity.threshold_seconds.<workflow_class> (default 4 h).
- step is on the workflow's critical path (per fn_workflow_rollup_compute.basis critical-path flag).
- step's deadline already breached (i.e. now() > deadline while blocked).
- operator manual escalation via rpc step_block_escalate (requires Điều 32 governance review if blocked > N hours; threshold per class).
Escalation is purely presentational + roll-up-relevant (§5: roll-up treats blocked as wait-class by default, but red-escalated blocked rolls up as red per rollup_policy_def.blocked_escalation_treats_as_red=true). The core state machine MUST NOT branch state codes on severity; UI / roll-up consult a derived blocked_severity ∈ {yellow_dark, red} field computed by fn_step_blocked_severity(step_run_id) → text.
Sentinel: any UI rendering blocked as red MUST source its color from fn_step_blocked_severity, never from a separate state_code value.
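The MP-D4 rule is a plain disjunction over four conditions, so the derived severity can be sketched as a pure function. The Python below is an illustration only: the parameter names and the in-memory inputs are assumptions, and the real fn_step_blocked_severity would read them from step_run and dot_config.

```python
from datetime import datetime, timedelta
from typing import Optional

# Default threshold taken from the text (4 h); per-class overrides would come from config.
DEFAULT_THRESHOLD = timedelta(hours=4)


def blocked_severity(blocked_since: datetime,
                     now: datetime,
                     on_critical_path: bool = False,
                     deadline: Optional[datetime] = None,
                     manually_escalated: bool = False,
                     threshold: timedelta = DEFAULT_THRESHOLD) -> str:
    """Return 'red' if ANY MP-D4 condition holds, else 'yellow_dark'.

    state_code is never touched: the result is presentation and roll-up input only.
    """
    escalate = (
        now - blocked_since > threshold                  # blocked for too long
        or on_critical_path                              # step is on the critical path
        or (deadline is not None and now > deadline)     # deadline breached while blocked
        or manually_escalated                            # operator escalation
    )
    return "red" if escalate else "yellow_dark"
```

Keeping the function free of side effects makes it usable both by the UI token lookup and by the roll-up without either one writing a second state code.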
§3.2 Traffic-light tokens (Rev2 §7.3.1 + §7.3.7)
Design system registry (lives in PG dot_config namespace ui.traffic_light.* or Directus content registry — Điều 33 v2.1 boundary respected):
ui.traffic_light.gray
fg #4B5563 bg #F3F4F6 contrast_ratio_fg_on_bg 9.3
high_contrast.fg #FFFFFF high_contrast.bg #1F2937 contrast_ratio 16.8
shape_token "circle_hollow"
ui.traffic_light.green_pale
fg #047857 bg #ECFDF5 contrast_ratio 8.4
high_contrast equivalents …
shape_token "triangle_outline"
ui.traffic_light.green
fg #FFFFFF bg #059669 contrast_ratio 4.6
shape_token "triangle_filled"
ui.traffic_light.green_deep
fg #FFFFFF bg #047857 contrast_ratio 5.4
shape_token "check_filled"
ui.traffic_light.yellow
fg #92400E bg #FEF3C7 contrast_ratio 6.5
shape_token "hourglass"
ui.traffic_light.yellow_dark
fg #FFFFFF bg #B45309 contrast_ratio 5.1
shape_token "warn_filled"
ui.traffic_light.red_pale
fg #B91C1C bg #FEE2E2 contrast_ratio 5.7
shape_token "clock_alert"
ui.traffic_light.red
fg #FFFFFF bg #DC2626 contrast_ratio 4.8
shape_token "cross_filled"
ui.traffic_light.red_dark
fg #FFFFFF bg #991B1B contrast_ratio 7.4
shape_token "stop_filled"
All values are paper proposals; final WCAG audit happens in Phase 1 UI. Sentinel: any new token MUST declare fg/bg/contrast_ratio + high_contrast.fg/bg + shape_token. Nuxt resolves tokens only — no hardcoded colors.
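Because every token must declare a contrast_ratio, it may help to recompute the paper values mechanically before the Phase 1 audit. The sketch below follows the WCAG 2.1 definitions of relative luminance and contrast ratio; the function names are illustrative and not part of the design.

```python
def _linear(channel_8bit: int) -> float:
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.1 definition)."""
    c = channel_8bit / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a '#RRGGBB' color."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """WCAG contrast ratio, always >= 1 regardless of argument order."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa_text(fg: str, bg: str) -> bool:
    """True when the pair satisfies the AA threshold for normal text (4.5:1)."""
    return contrast_ratio(fg, bg) >= 4.5
```

A registry check could run `meets_aa_text(token.fg, token.bg)` for each ui.traffic_light.* row and compare the computed ratio against the declared one.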
§3.3 Waiting facets (Rev2 §7.3.6 MP3)
| facet_code | Meaning | Primary ordinal |
|---|---|---|
| waiting_dependency | Awaiting upstream step/task output in same workflow | 10 |
| waiting_human | Awaiting another user's input/decision (not the current PIC) | 20 |
| waiting_external | Awaiting external API / callback / webhook / partner | 30 |
| waiting_time_gate | Awaiting cron / scheduled time / business calendar / SLA window | 40 |
Properties (MP3):
- Facet is a UI display facet, not a new core state. step_run.state_code = 'waiting' always; step_run.waiting_facet ∈ 4-vocab.
- Mixed wait: primary picked by lowest primary_ordinal (dependency > human > external > time_gate); secondary surface as chips in UI.
- All 4 facets share the Yellow traffic-light token (ui.traffic_light.yellow), distinguished by icon + text.
- Schema lives in state_facet_def (PG); step_run.waiting_facet lives in step_run (PG). Nuxt zero logic.
Sentinel: any step_run.state_code='waiting' with step_run.waiting_facet IS NULL is an integrity violation.
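The mixed-wait rule (lowest primary_ordinal wins, the rest become chips) can be sketched as below. The ordinals come from the facet table; the function name and the decision to raise on an empty or unknown facet set are assumptions for illustration.

```python
# Ordinals copied from the waiting-facet table.
FACET_ORDINAL = {
    "waiting_dependency": 10,
    "waiting_human": 20,
    "waiting_external": 30,
    "waiting_time_gate": 40,
}


def pick_facets(active_facets):
    """Split active facets into (primary, secondary_chips).

    Raises ValueError for an empty set, mirroring the sentinel that a
    'waiting' step without a facet is an integrity violation.
    """
    if not active_facets:
        raise ValueError("state_code='waiting' requires at least one facet")
    unknown = set(active_facets) - FACET_ORDINAL.keys()
    if unknown:
        raise ValueError(f"facet not in vocab: {sorted(unknown)}")
    ordered = sorted(set(active_facets), key=FACET_ORDINAL.__getitem__)
    return ordered[0], ordered[1:]
```

For a step waiting on both a partner callback and an upstream step, `pick_facets(["waiting_external", "waiting_dependency"])` yields the dependency facet as primary and the external facet as a chip.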
§3.4 State registry shape (Rev2 §7.3.4)
Each state declared as:
state_code 'in_progress'
semantic_class 'active'
is_terminal false
ordinal 3
ui_color_token 'ui.traffic_light.green'
ui_icon_token 'icon.play_filled'
ui_label_i18n_key 'state.in_progress.label'
ui_short_message_key 'state.in_progress.short'
i18n is Nuxt-side resolution via Directus content registry. No business logic in Nuxt; only key→string lookup.
§3.5 Accessibility design tokens (Rev2 §7.3.7 MP4)
Binding rules:
- Not color-alone. Every state cell renders color + icon + text label triplet. Nuxt component layer must compose all three; missing icon or text is a render-layer bug.
- Tooltip / short reason. Hover/long-press renders ui_short_message_key (e.g. "Đang chờ duyệt từ KT", i.e. "Awaiting approval from KT", for waiting_human).
- WCAG 2.1 AA contrast. Text ≥4.5:1; icon/large text ≥3:1. Token table §3.2 satisfies; any new token must satisfy.
- Color-blind safe palette. Red+Green pair never the sole disambiguator — shape tokens differ (cross vs check vs triangle). Test with Deuteranopia/Protanopia/Tritanopia simulator (CI gate Phase 1).
- High-contrast/dark mode. Each token has paired high_contrast.fg/bg; UI activates pair when prefers-contrast:more or theme toggle. State distinguishability preserved.
- Screen reader. aria-label template = {state_label_en} · {short_message_en} (English fallback) + Vietnamese variant via i18n.
- Tokens declare-by-config. Lives in PG dot_config vocab.ui.traffic_light.* or Directus content registry (Điều 33 v2.1 boundary respected); never hardcoded in Nuxt.
§4. Transition matrix (Rev2 §7.3 deferred → defined here)
§4.1 Notation
Each transition row: (from → to, actor_class, trigger_kind, trigger_ref, guard_kind, guard_ref, emitted_event_type, audit_required, rollback_to).
actor_class enum: system, pic_human, reviewer, executor_worker, mow_orchestrator, escalation_handler.
trigger_kind enum: event, rpc, timer, condition_resolved.
§4.2 Step state machine transition matrix
| # | From | To | Actor | Trigger | Guard | Emitted event | Audit | Rollback |
|---|---|---|---|---|---|---|---|---|
| T1 | not_started | ready | mow_orchestrator | condition_resolved | precondition_config resolves true | step.ready | no | — |
| T2 | ready | in_progress | pic_human or executor_worker | rpc step_claim | pic_id assigned AND executor_class_ref resolved | step.started | yes | → ready (T2r) |
| T2r | in_progress | ready | pic_human | rpc step_release | actor = current pic_id | step.released | yes | — |
| T3 | in_progress | waiting | executor_worker or pic_human | rpc step_wait | wait reason ∈ facet vocab | step.waiting (carries waiting_facet) | no | → in_progress (T3r) |
| T3r | waiting | in_progress | system (facet resolved) | event (e.g. dependency completed / human input received / external callback / time elapsed) | facet resolution proof | step.resumed | no | — |
| T4 | in_progress | blocked | executor_worker or pic_human | rpc step_block | block reason ∈ block-reason vocab (missing IU / missing approval / config error) | step.blocked | yes | → in_progress (T4r) |
| T4r | blocked | in_progress | system or pic_human | event blocker_resolved | blocker resolution evidence ref | step.unblocked | yes | — |
| T5 | in_progress | overdue | system (timer) | timer deadline_passed | now > deadline | step.overdue | no | — |
| T6 | overdue | in_progress | system (timer) | condition_resolved (deadline extended) OR pic re-engages | deadline updated | step.recovered_from_overdue | yes | — |
| T7 | overdue | completed | pic_human or executor_worker | rpc step_complete | output contract met AND completed_after_deadline=true written to step_run | step.completed (carries completed_after_deadline=true, overdue_first_at, sla_breach_duration) | yes | — |
| T8 | in_progress | completed | pic_human or executor_worker | rpc step_complete | output contract met AND postcondition_config satisfied | step.completed | yes | → in_progress only via reopen_for_correction T8r with Điều 32 approval — never deletes prior completion |
| T8r | completed | in_progress | reviewer | rpc step_reopen_for_correction | Điều 32 approval ID present AND correction_reason ∈ vocab.correction_reason.* AND prior_completed_audit row written AND original_output_snapshot captured | step.reopened_for_correction (carries prior_completed_audit_id, original_output_snapshot_id, correction_reason, approval_id) | yes | — |
| T9 | in_progress or waiting or blocked | failed | executor_worker | event executor_failure | failure classification ∈ {transient, permanent} | step.failed (carries classification) | yes | — |
| T10 | failed (transient) | in_progress | executor_worker | event retry_attempt | retry budget not exceeded AND idempotency key present | step.retry | no | — |
| T11 | failed (permanent) | cannot_complete | escalation_handler | rpc escalate | retry budget exhausted OR classification=permanent | step.escalated | yes | — |
| T12 | in_progress or blocked or waiting | cannot_complete | pic_human or executor_worker | rpc declare_cannot_complete | declaration evidence ref | step.cannot_complete | yes | → in_progress (T12r) reopen_for_correction with Điều 32 |
| T12r | cannot_complete | in_progress | reviewer | rpc step_reopen_for_correction | Điều 32 approval ID present AND correction_reason ∈ vocab.correction_reason.* AND prior_cannot_complete_audit row written | step.reopened_for_correction (carries prior_audit_id, correction_reason, approval_id) | yes | — |
| T13 | any non-terminal | (derived state, see §4.5) | varies | varies | varies | varies | varies | varies |
Notes:
- Every emitted event_type must be registered in event_type_registry before this transition is allowed (Rev2 §6.1 register-before-emit; §3.1 of 03-event-5layer-…).
- T8r and T12r are reopen_for_correction (MP-D2). They DO NOT delete the prior completed / cannot_complete record. Mandatory artifacts before transition:
  - Điều 32 approval_id present.
  - prior_completed_audit row (or prior_cannot_complete_audit) written, snapshotting the closing state's actor, timestamp, output, signoff.
  - original_output_snapshot captured (full output payload at original completion) — stored in step_run_output_snapshot (paper).
  - correction_reason from dot_config vocab.correction_reason.* (paper vocab includes data_error, policy_change, downstream_dependency_failed, regulatory_recall, etc.).
  - Audit timeline (vw_audit_event_timeline) MUST show both the original step.completed (or step.cannot_complete) event AND the subsequent step.reopened_for_correction event linked via correlation_id. Reopen is an additive event, never a deletion. Sentinel: zero step_run UPDATE that erases the original completed_at, completion_actor_id, or output_snapshot fields when reopening; reopen instead writes a new row in step_run_correction_cycle (paper) referencing both states.
- failure_classification taxonomy lives in a separate failure_class_registry (paper-only) referenced via step.failed event payload.
- MP-D3 — overdue → completed audit preservation. T7 MUST persist completed_after_deadline=true, overdue_first_at, and sla_breach_duration (= completed_at - deadline) on the step_run row, and the emitted step.completed event carries the same three fields. The historical overdue mark is not erased: step_run_state_history (paper, derived from transition events) retains every state visit including overdue. SLA breach roll-up + governance UI (§7) continue to show the breach even after completion. Sentinel: no T7 transition may UPDATE step_run in a way that nulls overdue_first_at or sets completed_after_deadline=false.
- block-reason vocab lives in dot_config vocab.block_reason.* (paper).
- pic_id assigned is a guard predicate function: fn_step_pic_resolves(p_step_run_id) → bool (paper).
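The MP-D3 note fixes the arithmetic for T7: sla_breach_duration = completed_at - deadline, with overdue_first_at left untouched. A minimal sketch of that update, assuming a step_run modeled as a plain dict (the function name and dict shape are hypothetical):

```python
from datetime import datetime


def complete_overdue_step(step_run: dict, completed_at: datetime) -> dict:
    """Apply T7 (overdue -> completed) and return a NEW dict.

    Preserves overdue_first_at and records the SLA breach, as MP-D3 requires.
    """
    if step_run["state_code"] != "overdue":
        raise ValueError("T7 only applies to steps in state 'overdue'")
    if step_run.get("overdue_first_at") is None:
        raise ValueError("overdue_first_at must already be recorded")
    updated = dict(step_run)  # copy: the input row is not mutated
    updated.update(
        state_code="completed",
        completed_at=completed_at,
        completed_after_deadline=True,
        sla_breach_duration=completed_at - step_run["deadline"],
    )
    return updated
```

For example, a step with a 12:00 deadline completed at 15:00 carries a three-hour sla_breach_duration, and its overdue_first_at value is identical before and after the transition.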
§4.3 Task state machine
Tasks inherit the same 9-state floor and the same transitions, with these specializations:
- actor_class adds task_pic_human (the task's PIC, distinct from step PIC when step has multiple tasks).
- T8 task completion carries task.completed event; step completion is triggered only when all task_runs in the step reach completed or skipped-via-branching.
- Automated tasks (no human UI) skip T2 PIC claim — direct from ready → in_progress by executor_worker actor.
§4.4 Workflow_run state machine
workflow_run.state_code is a roll-up derived state (§5), not a directly transitionable code. The validation function only allows transitions from explicit lifecycle RPCs:
| From | To | Actor | Trigger |
|---|---|---|---|
| not_started | in_progress | mow_orchestrator | rpc workflow_start |
| in_progress | completed | system | condition_resolved all_mandatory_steps_completed |
| in_progress | cancelled (derived state — §4.5) | reviewer | rpc workflow_cancel with Điều 32 approval |
| in_progress | paused (derived state — §4.5) | reviewer | rpc workflow_pause |
| paused | in_progress | reviewer | rpc workflow_resume |
Rollup state (rollup_state_code) is a separate column derived per §5 — not editable directly.
§4.5 Derived states above 9 (Rev2 OD12 decision in this design)
Rev2 §7.3.3 allows derived states with justification. OD12 defers default choice to Master Design Rev2. Decision in this design:
| Candidate | Decision | Reason |
|---|---|---|
| paused | ACCEPT as derived state (workflow_run scope) | Long-running workflows (months/years) need explicit governance-driven pause separate from blocked/waiting. Pause has clear actor (reviewer) and clear resumption RPC. Avoids overloading blocked semantics. |
| cancelled | ACCEPT as derived state (workflow_run + step + task scope) | cannot_complete is per-actor declaration; cancelled is workflow-level decision (e.g. business need disappears). Terminal but distinguishable from completed. |
| retrying | DEFER as facet, not state | Captured by retry_policy_registry + retry budget on failed (transient) state. UI surfaces "retrying (attempt n/N)" via facet on failed. Avoids state explosion. |
| escalated | DEFER as facet | Captured by waiting_facet='waiting_human' with escalation_chain_id evidence + governance UI surfaces escalations. State machine stays 9+2. |
Accepted derived states get their own state_def rows with semantic_class:
- paused → semantic_class wait, ui_color_token ui.traffic_light.yellow_dark, ui_icon pause_filled, ui_label state.paused, terminal=no.
- cancelled → semantic_class red, ui_color_token ui.traffic_light.red_dark, ui_icon cancel_filled, ui_label state.cancelled, terminal=yes.
MP-D5 — Derived states are extension states, not replacements. paused and cancelled extend the 9-state floor; they do not replace any of the 9 floor codes. The state machine remains a 9-state floor + N derived (currently N=2). Binding rules:
- Every adapter, OSS tool, external mirror, UI widget, and registry consumer MUST support the full 9-state floor as a baseline. Adapters that cannot encode all 9 floor codes are not allowed (Gate A — state-vocab fit, per 05-…).
- Adapters MAY ignore derived states OR map them to a floor equivalent (recommended mapping: paused → waiting UI-side; cancelled → cannot_complete UI-side), provided the floor view stays correct. The PG SoT keeps the precise derived code; adapters down-map only for their own surface.
- state_def rows for derived states MUST declare is_derived=true and a floor_equivalent column pointing to the closest floor code, so adapters can mechanically map.
- Adding a future derived state requires the same justification + state_def row + floor_equivalent; it never modifies any floor row.
Sentinel: derived states never sneak in without a state_def row + justification line + floor_equivalent. No adapter is approved that fails to cover all 9 floor codes.
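The mechanical down-mapping that MP-D5 asks of adapters could look like the sketch below. The two mappings are the recommended ones from the text; the function name and the choice to reject unregistered codes with an exception are assumptions.

```python
# The 9 floor codes, verbatim from the state table.
FLOOR_STATES = (
    "not_started", "ready", "in_progress", "waiting", "blocked",
    "overdue", "failed", "cannot_complete", "completed",
)

# floor_equivalent values for the two accepted derived states.
FLOOR_EQUIVALENT = {
    "paused": "waiting",
    "cancelled": "cannot_complete",
}


def to_floor(state_code: str) -> str:
    """Map any registered state code to the floor code an adapter can render."""
    if state_code in FLOOR_STATES:
        return state_code
    try:
        return FLOOR_EQUIVALENT[state_code]
    except KeyError:
        # A derived state without a floor_equivalent violates the sentinel.
        raise ValueError(f"no floor_equivalent registered for {state_code!r}") from None
```

An adapter that only understands the floor calls `to_floor` on read; the PG source of truth still stores `paused` or `cancelled` unchanged.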
§4.6 Idempotency on transitions
Every transition RPC carries an idempotency_key. The transition validation function records (step_run_id, transition_id, idempotency_key) → verdict in transition_idempotency (paper, alternatively in shared idempotency_registry per 03-event-5layer-… §4.4). Re-submission returns prior verdict — no double-event, no double-audit.
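The replay behaviour described above amounts to a keyed verdict cache. A minimal in-memory sketch, assuming a class named `TransitionIdempotency` that stands in for the transition_idempotency table (the real store would be a PG table written in the same TX as the transition):

```python
class TransitionIdempotency:
    """In-memory stand-in for the paper transition_idempotency table."""

    def __init__(self):
        self._verdicts = {}

    def apply(self, step_run_id, transition_id, idempotency_key, compute):
        """Run compute() once per key; later calls return the stored verdict."""
        key = (step_run_id, transition_id, idempotency_key)
        if key not in self._verdicts:
            # First submission: evaluate the transition and remember the verdict.
            self._verdicts[key] = compute()
        return self._verdicts[key]
```

Because the stored verdict is returned as-is, a retried RPC observes the same result and the event and audit side effects inside `compute` run only once.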
§5. Workflow status roll-up (Rev2 §7.2.6 MP5)
§5.1 Binding rules (verbatim from MP5, expanded)
- R-1 Red overrides yellow overrides green. Any mandatory active step in red class (failed / overdue / cannot_complete / cancelled) → workflow rollup MUST NOT be green.
- R-2 Yellow without red. Mandatory active step in wait class (waiting / blocked / paused) but no red → workflow rollup MAX yellow.
- R-3 Skipped / not_applicable do not count. Steps skipped via branching condition or not_applicable (e.g. else-branch) do NOT pull workflow into yellow/red. Only active mandatory steps (already entered the graph branch) count.
- R-4 Optional vs mandatory. Steps with workflow_step_def.optional=true (config) never pull workflow status. Only mandatory steps roll up.
- R-5 Config-driven. Roll-up rules declared in rollup_policy_def (PG); never hardcoded in Nuxt.
- R-6 Terminal completed. All mandatory active steps completed → workflow rollup = green; AND when all also reach terminal-with-no-pending-postcondition → workflow_run transitions to completed.
§5.2 Roll-up function (paper-only)
fn_workflow_rollup_compute(p_workflow_run_id uuid)
RETURNS jsonb (
rollup_state_code text -- 'gray' | 'green' | 'yellow' | 'red'
basis_step_run_ids uuid[] -- which steps were considered
red_count int
yellow_count int
green_count int
skipped_count int
optional_excluded_count int
decisive_step_run_id uuid nullable
decisive_state_code text nullable
computed_at timestamptz
)
Function is STABLE (no mutation). MOW orchestrator calls it on every step state change event; result is written to workflow_run.rollup_state_code + last_rollup_at in a single TX.
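The precedence in R-1 to R-4 can be sketched as a pure function over step rows. This Python model is an assumption-laden illustration: step_runs are plain dicts with hypothetical `optional` and `skipped` flags, any considered step outside the red and wait classes is counted as green, and red-escalated blocked steps (MP-D4) are not modeled.

```python
RED_STATES = {"failed", "overdue", "cannot_complete", "cancelled"}
WAIT_STATES = {"waiting", "blocked", "paused"}


def compute_rollup(step_runs):
    """Return a dict loosely shaped like the fn_workflow_rollup_compute result."""
    skipped = [s for s in step_runs if s.get("skipped", False)]
    optional = [s for s in step_runs
                if s.get("optional", False) and not s.get("skipped", False)]
    # R-3 / R-4: only mandatory, non-skipped steps form the basis.
    basis = [s for s in step_runs
             if not s.get("optional", False) and not s.get("skipped", False)]
    red = [s for s in basis if s["state_code"] in RED_STATES]
    yellow = [s for s in basis if s["state_code"] in WAIT_STATES]

    if not basis:
        color, decisive = "gray", None          # no mandatory active steps
    elif red:
        color, decisive = "red", red[0]         # R-1: red overrides everything
    elif yellow:
        color, decisive = "yellow", yellow[0]   # R-2: yellow only without red
    else:
        color, decisive = "green", None

    return {
        "rollup_state_code": color,
        "basis_step_run_ids": [s["step_run_id"] for s in basis],
        "red_count": len(red),
        "yellow_count": len(yellow),
        "green_count": len(basis) - len(red) - len(yellow),
        "skipped_count": len(skipped),
        "optional_excluded_count": len(optional),
        "decisive_step_run_id": decisive["step_run_id"] if decisive else None,
        "decisive_state_code": decisive["state_code"] if decisive else None,
    }
```

Note that an optional step in `failed` leaves the roll-up green here, which is the behaviour R-4 describes; computed_at is omitted to keep the sketch deterministic.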
§5.3 Edge cases (resolved here)
- Workflow with 0 mandatory active steps. rollup_state_code = 'gray' (idle); workflow_run.state_code = not_started or in_progress (lifecycle independent of rollup color).
- Workflow with cycle. Cycles forbidden at workflow_def level (workflow_step_relations cycle check enforced at proposal-accept time). Rollup function assumes DAG.
- Sub-workflow roll-up. Sub-workflow rolled up first; its rollup_state_code becomes the input state for the parent's containing step. Mandatory/optional applies at parent level; sub-workflow color drives parent-step color via a small bridging rule: sub-workflow red → parent step failed; sub-workflow yellow → parent step blocked (waiting_facet=waiting_dependency); sub-workflow green-but-not-completed → parent step in_progress; sub-workflow completed → parent step completed.
- Workflow paused. When workflow_run.state_code='paused', rollup_state_code freezes at last computed value with a frozen_at marker. UI surfaces both: state badge Paused, rollup color from freeze.
§5.4 Per-state weight (deferred to Phase 1)
Rev2 R7.2.6 mentions per-state weight for advanced rollup (e.g. ratio of yellow:green). This design accepts R-1..R-6 as binding and defers weighted rollup to Phase 1, after operator feedback. Sentinel: no weighted rollup until a rollup_policy_def.policy_code row with explicit weights is added.
§6. Workflow UI design — Standard, Runtime, Long-workflow
§6.1 Standard Process View (Rev2 §7.1)
- Renders workflow definition as a directed graph: nodes = step (IU brick / IU bundle / sub-workflow boundary); edges = workflow_step_relations rows (trigger / branching condition / parallel).
- Data source: backend gateway → workflow_registry + workflow_step_def + workflow_step_relations + IU registry. Nuxt component <StandardProcessView workflowDefId /> is zero-logic.
- Branching renders as labeled fan-out (AND / OR / condition).
- Parallel renders as horizontal swimlanes.
- Sub-workflow boundary renders as collapsed node with drill-down.
- Single layout for all roles; backend permission filter decides which step nodes are visible (some steps may carry role-restricted IU body — but body is fetched only on demand; node visibility is by step-def permission row).
- Switch to Proposal mode (toggle in UI): every add/edit/delete becomes a draft against workflow_change_requests; no direct workflow_registry mutation. Diff view shows current vs proposed.
§6.2 Runtime Progress View (Rev2 §7.2)
- Renders runtime instance of a workflow_run with each step_run carrying its own state + deadline + PIC + last output snapshot.
- Drill-down: step_run → task_runs → IU content (via render layer pulling from information_unit pinned version) + IO actual (MOIT input snapshots, MOUT output view).
- Resume-safe: backend serves a workflow_run_snapshot materialized view that includes all step/task states + pinned IU versions + last realtime event seq. Refresh restores from snapshot.
- Long-running timeline + milestone view: see §6.3 below.
- Realtime updates via gateway (Rev2 §6.4; design 03-event-5layer-… §5). Nuxt does NOT poll PG — subscribes to gateway topic for the workflow_run.
§6.3 Long workflow UI (100–500+ steps)
Single layout requirement (Rev2 §2 D2.5) — 500-step must render with same primitive as 2-step. Patterns:
- Zoom / pan. Continuous zoom 25%..200%; pan via drag; mouse-wheel zoom; touch pinch-zoom.
- Collapse subgraph. Sub-workflow nodes collapse to single rollup chip; group nodes (e.g. by workflow_categories) collapse.
- Critical path highlight. Backend computes critical path on each rollup; UI overlays. Critical path = the longest dependency chain reaching the earliest mandatory unfinished step.
- Blocked chain visualization. Steps in blocked / waiting cause downstream steps to be highlighted with a "downstream of blocker" indicator.
- Search. Search box queries step_def.name + iu_label + bundle_label; results highlight and pan-to.
- Mini-map. Bottom-right miniature with current viewport rectangle.
- Group by lane. Optional lane grouping by PIC role / executor class / category.
- Progress bar header. Workflow-level rollup + percentage (mandatory completed / mandatory total).
- Lazy-load step body. Step body (IU/task content) fetched only on click; node carries metadata-only by default.
- Single primitive — same <WorkflowGraph> component. No <LargeWorkflowGraph> sibling; the same component scales by virtualization (only visible nodes hydrate).
Sentinel: every long-workflow feature MUST work on a 2-step workflow without UI/code branching.
§6.4 Drill-down chain
Workflow node → Step node → Task envelope → IU content (rendered from pinned version) → IO actual (MOIT snapshot / MOUT view) → Event timeline (filtered to this trace_id).
Each level fetches via backend gateway; Nuxt holds zero state-machine logic (Điều 28 / S178).
§6.5 Proposal mode shape
When in Proposal mode (Rev2 §7.1):
- All add/edit/delete actions create rows in workflow_change_requests with proposal_state enum (draft / submitted / review / approved / rejected / merged).
- Diff is computed by backend (fn_workflow_change_diff(p_change_request_id)) and rendered as overlay.
- Approval routes to Điều 32 quorum.
- On approval, a separate macro applies the change to workflow_registry (out of scope here — Phase 1+).
OD2 default kept: reuse workflow_change_requests for workflow proposals; design a generic proposal table only for non-workflow proposals (IU split/merge → see Điều 39 G1; MOT template change; MOIT field change; MOUT view change). See 06-open-decisions-and-readiness.md §S2.
§7. Governance UI (Rev2 §7.4)
§7.1 Problem-first default view
Default landing route = /governance/problems. Shows:
- Aggregate category counts (1 row each): DLQ messages / silent workers / overdue steps / schema violations / event lag breaches / integrity warnings / failed cuts / orphan workflows.
- Click row → drill-down list (paginated).
- Click list item → full audit + correlation + trace_id timeline.
§7.2 Aggregate counts (Rev2 R7.4.2)
Source views (paper-only):
- vw_governance_dlq_count ← job_dead_letter filtered to current_window.
- vw_governance_silent_workers ← queue_heartbeat with last_tick_at < now() - N per worker class.
- vw_governance_overdue ← step_run where deadline < now() AND state_code IN ('overdue','blocked','waiting').
- vw_governance_schema_violations ← event_type_registry validation rejects logged to event_validation_audit (paper).
- vw_governance_event_lag ← fn_event_lag_compute(window) returning p50/p95/p99.
- vw_governance_integrity_warnings ← iu_lifecycle_log filtered to severity='warning' or axis_inconsistency.
§7.3 Concise AI/worker status (Rev2 R7.4.3)
Each worker/executor class has a backend-summarized status string (e.g. "OK, 14 jobs / 0 errors / lag 1.2s p95"). Generated by a periodic summarization function — NOT raw log dump. Drill-down for full log/trace.
MP-D9 — Summary evidence rule (binding). Any AI- or worker-generated concise summary MUST carry, alongside the human-readable string, the following structured evidence fields:
- source_event_count — integer; number of source events (event_outbox / job_queue / heartbeat / step_run transitions) the summary aggregated. MUST be ≥1; a summary with source_event_count=0 is forbidden (no fabrication).
- time_window — {start_at, end_at} timestamps demarcating the aggregation window.
- generated_by — {summarizer_kind ∈ {sql_fn, ai_agent, deterministic_rule}, summarizer_ref, summarizer_version}.
- confidence — float [0..1] (or null for deterministic rules); AI summaries MUST populate this field and surface it in UI alongside the string.
- source_event_refs — array of event ids / outbox ids / job ids the summary aggregated; truncated to first 100 if window is large, with a separate source_event_total count preserved.
Binding rules:
- Summary must not hide raw evidence. Every concise status row MUST expose a drill-down link that runs vw_audit_event_timeline(...) filtered to the summary's source_event_refs or (trace_id, time_window). UI MUST render the link; backend MUST serve it.
- Summary must not mark a problem resolved without a source event. Any auto-resolution of a governance problem (DLQ cleared, silent worker recovered, lag breach healed, integrity warning closed) requires at least one corresponding *.resolved / *.recovered / *.healed event in event_outbox. Summarizer that flips a problem to "OK" without such an event MUST be refused at write-time (event_validation_audit row).
- AI-generated summaries with confidence < dot_config ai_summary.min_confidence.<category> (default 0.6) MUST be flagged in UI as low-confidence and NOT trigger auto-resolution.
- Sentinel: any governance row whose state flipped from red/yellow to green without a corresponding source event reference is an integrity violation (logged + raised in governance UI).
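Sketch (non-binding): the auto-resolution gate implied by the binding rules can be expressed as one predicate. This is an illustrative Python model under stated assumptions: the function name may_auto_resolve, the reason codes, and the example event type names are hypothetical; only the three suffixes and the 0.6 default come from the rules above.

```python
RESOLVING_SUFFIXES = (".resolved", ".recovered", ".healed")
DEFAULT_MIN_CONFIDENCE = 0.6  # default of ai_summary.min_confidence.<category>


def may_auto_resolve(source_event_types, summarizer_kind, confidence,
                     min_confidence=DEFAULT_MIN_CONFIDENCE):
    """Return (allowed, reason) for flipping a governance problem to OK.

    source_event_types: event type strings found in event_outbox for this problem.
    """
    # Rule: at least one *.resolved / *.recovered / *.healed source event is required.
    has_resolving_event = any(t.endswith(RESOLVING_SUFFIXES) for t in source_event_types)
    if not has_resolving_event:
        return (False, "no_resolving_source_event")
    # Rule: low-confidence AI summaries must not trigger auto-resolution.
    if summarizer_kind == "ai_agent":
        if confidence is None or confidence < min_confidence:
            return (False, "low_confidence")
    return (True, "ok")
```

In this model a refusal would be logged as an event_validation_audit row by the caller; the predicate itself has no side effects.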
§7.4 DLQ replay / rescue (Rev2 R7.4.5)
UI workflow:
- Operator opens DLQ entry.
- Inspects payload + failure classification + retry history.
- Initiates replay via rpc dlq_replay_propose, which writes to dlq_replay_request (G4) with proposal_state='submitted'.
- Approval via Điều 32; on approval, replay scheduled with idempotency key (from idempotency_registry G5).
- Replay outcome logged to dlq_replay_request.outcome_jsonb + emits dlq_replay.completed event.
Sentinel: every DLQ replay has a dlq_replay_request_id AND approval_id (Điều 32) — design enforces this binding.
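Sketch (non-binding): the sentinel above amounts to a guard at scheduling time. The Python model below is illustrative only. DlqReplayRequest, propose_replay, approve and schedule_replay are hypothetical names, and the 'approved' state value is an assumption, since this section only names 'submitted'.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DlqReplayRequest:
    dlq_replay_request_id: str
    dlq_entry_id: str
    proposal_state: str = "submitted"     # state written by dlq_replay_propose
    approval_id: Optional[str] = None     # set only by the Điều 32 approval step
    idempotency_key: Optional[str] = None


def propose_replay(request_id: str, dlq_entry_id: str) -> DlqReplayRequest:
    """Model of rpc dlq_replay_propose: creates a request, never a replay."""
    return DlqReplayRequest(request_id, dlq_entry_id)


def approve(req: DlqReplayRequest, approval_id: str) -> None:
    """Model of the approval outcome ('approved' is a hypothetical state name)."""
    req.approval_id = approval_id
    req.proposal_state = "approved"


def schedule_replay(req: DlqReplayRequest, idempotency_key: str) -> dict:
    """Schedule a replay only when both bindings from the sentinel are present."""
    if not (req.dlq_replay_request_id and req.approval_id):
        raise PermissionError("replay requires dlq_replay_request_id AND approval_id")
    req.idempotency_key = idempotency_key
    return {
        "dlq_replay_request_id": req.dlq_replay_request_id,
        "approval_id": req.approval_id,
        "idempotency_key": idempotency_key,
    }
```

Calling schedule_replay on a request that is still 'submitted' raises PermissionError, which is the behaviour the sentinel is meant to guarantee.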
§7.5 Heartbeat / silent worker monitor (Rev2 R7.4.6)
- queue_heartbeat table tracks (worker_class, last_tick_at, status, last_ok_at).
- Worker N silent for > threshold_N (per-class config in dot_config heartbeat.threshold.*) → status flipped to silent.
- Governance UI highlights silent workers in problem-first view.
- False-heal protection (Rev2 §15.5 ref): silent worker auto-resume blocked until heartbeat caller has emitted ≥1 successful tick AND the worker class flag is not frozen (per feedback memory pattern).
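Sketch (non-binding): the silent-worker rule and the false-heal protection reduce to two small predicates. This Python model is illustrative; classify_worker and may_auto_resume are hypothetical names, and 'ok' is an assumed label for the non-silent status, which this section does not name.

```python
from datetime import datetime, timedelta


def classify_worker(last_tick_at: datetime, now: datetime, threshold: timedelta) -> str:
    """Return 'silent' when the worker has been quiet for longer than its class threshold."""
    return "silent" if (now - last_tick_at) > threshold else "ok"


def may_auto_resume(successful_ticks_since_silent: int, class_frozen: bool) -> bool:
    """False-heal protection: require >=1 successful tick AND a non-frozen worker class."""
    return successful_ticks_since_silent >= 1 and not class_frozen
```

For example, with a 60-second threshold a worker whose last tick was 61 seconds ago is classified silent, and it stays blocked from auto-resume while its class is frozen even after successful ticks.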
§7.6 Event lag monitor (Rev2 R7.4.7)
- fn_event_lag_compute(window) returns p50/p95/p99 per (producer, consumer) pair.
- Traffic-light threshold in dot_config event_lag.threshold.{p50,p95,p99} (per-pair override allowed).
- UI shows trend line + threshold overlay.
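Sketch (non-binding): percentile computation plus per-pair threshold override, modelled in Python. Everything here is an assumption for illustration: the nearest-rank percentile method, the threshold values, the example (producer, consumer) pair, and the function names. The mapping from breaches to traffic-light colours is left out because this section does not define it.

```python
import math

# Hypothetical defaults standing in for dot_config event_lag.threshold.{p50,p95,p99} (seconds).
DEFAULT_THRESHOLDS = {"p50": 1.0, "p95": 5.0, "p99": 10.0}
# Hypothetical per-pair override keyed by (producer, consumer).
PAIR_OVERRIDES = {("billing", "ledger"): {"p95": 2.0}}


def percentile(samples, p):
    """Nearest-rank percentile (assumed method) over a non-empty list of lag samples."""
    if not samples:
        raise ValueError("no lag samples in window")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def lag_summary(samples):
    """Return the p50/p95/p99 triple for one (producer, consumer) pair."""
    return {"p50": percentile(samples, 50),
            "p95": percentile(samples, 95),
            "p99": percentile(samples, 99)}


def thresholds_for(producer, consumer):
    """Start from the defaults, then apply any per-pair override."""
    merged = dict(DEFAULT_THRESHOLDS)
    merged.update(PAIR_OVERRIDES.get((producer, consumer), {}))
    return merged


def breaches(samples, producer, consumer):
    """List the percentiles whose lag exceeds the effective threshold."""
    summary = lag_summary(samples)
    limits = thresholds_for(producer, consumer)
    return sorted(name for name in summary if summary[name] > limits[name])
```

The same samples can be clean for one pair and breaching for another, which is the point of allowing per-pair overrides.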
§7.7 No raw event stream surface (Rev2 R7.4.8)
UI never offers a "raw outbox tail" view. Drill-down is always summarized and filtered. Raw event inspection is available only via backend vw_audit_event_timeline(trace_id), per investigation.
§7.8 Same layout per role + backend filter (Rev2 R7.4.9)
There is a single UI component; permission filtering happens in the backend route. Sentinel: there is exactly one <GovernanceProblemView> component; role variance is via permission-filtered backend response.
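Sketch (non-binding): role variance as a backend-side filter over one response shape. This Python model is illustrative; the role names, category keys, and the governance_response function are hypothetical and stand in for whatever permission source the backend route uses.

```python
# Hypothetical role -> visible problem categories mapping.
ROLE_CATEGORIES = {
    "operator": {"dlq", "silent_workers", "overdue"},
    "auditor": {"integrity_warnings", "schema_violations"},
}


def governance_response(role, rows):
    """Return the same payload shape for every role; only the row set differs."""
    allowed = ROLE_CATEGORIES.get(role, set())  # unknown roles see nothing
    return {
        "component": "GovernanceProblemView",   # the single UI component
        "rows": [row for row in rows if row["category"] in allowed],
    }
```

The UI renders whatever rows it receives and carries no role logic of its own.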
§8. Proposal mode shape (Rev2 §7.1 + OD2)
§8.1 Generic vs per-domain
- Workflow proposals reuse workflow_change_requests (Rev2 §12 row 17 [VL]).
- Non-workflow proposals (IU split/merge, MOT template, MOIT field, MOUT view, KG edge proposal, bundle proposal) land in a new generic proposal table (design proposal; schema in 06-open-decisions-and-readiness.md §S2).
- Both share the same proposal_state enum and the same governance gates (Điều 32 approval for impactful proposals).
§8.2 KG proposal shape (Rev2 §9 + G1)
When KG (Điều 39) proposes a change (add edge / bundle / split / merge / re-parent / no-action), it writes to proposal (generic) with proposal_kind='kg_edge' / 'iu_bundle' / 'iu_split' / 'iu_merge' / etc. Each carries:
- evidence_ref → iu_usage_evidence row(s).
- affected_iu_ids array.
- proposed_action_jsonb (DSL describing the change).
- proposer_class='kg_feedback'.
- proposal_state enum.
Sentinel: KG never writes any registry directly; every KG output appears as a proposal row.
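Sketch (non-binding): a KG output modelled as a proposal row and nothing else. This Python model is illustrative; make_kg_proposal is a hypothetical helper, the kind set covers only the kinds named above (the source ends the list with "etc."), requiring at least one evidence ref is an assumption read from "row(s)", and the initial proposal_state of 'submitted' is borrowed from §7.4 as an assumption.

```python
import json

# Kinds named in this section; the full enum is not specified here.
KNOWN_PROPOSAL_KINDS = {"kg_edge", "iu_bundle", "iu_split", "iu_merge"}


def make_kg_proposal(kind, evidence_refs, affected_iu_ids, proposed_action):
    """Build a generic proposal row for a KG-originated change (no registry write)."""
    if kind not in KNOWN_PROPOSAL_KINDS:
        raise ValueError(f"unknown proposal_kind: {kind}")
    if not evidence_refs:
        # Assumption: every KG proposal cites at least one iu_usage_evidence row.
        raise ValueError("at least one evidence_ref is required")
    return {
        "proposal_kind": kind,
        "evidence_ref": list(evidence_refs),
        "affected_iu_ids": list(affected_iu_ids),
        "proposed_action_jsonb": json.dumps(proposed_action, sort_keys=True),
        "proposer_class": "kg_feedback",
        "proposal_state": "submitted",   # assumed initial state
    }
```

The helper returns a row for the generic proposal table; applying the change remains a separate, approval-gated step outside this sketch.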
§9. Cross-references
- Substrate (queue / outbox / heartbeat / idempotency / retry / DLQ) → 03-event-5layer-realtime-dlq-design.md.
- IU brick / bundle / 4 Mothers binding → 04-iu-centered-4mothers-binding-design.md.
- OSS strategy (state-vocab Gate A applies here) → 05-oss-candidate-strategy-rev2.md (binding: any adopted workflow/state OSS tool must fit the 9-state floor + derived state design).
- Open decisions (OD2/9/10/11/12/14/15) → 06-open-decisions-and-readiness.md.
§10. Acceptance criteria for this WS
A1. 9-state floor declared with all 5 attributes per state (Rev2 §7.3.4) — §3.1 + §3.4.
A2. Waiting facets §3.3 cover 4 sublabels (MP3); share Yellow; primary picking rule defined.
A3. A11y tokens §3.5 cover MP4 6 binding rules + 9 traffic-light tokens with WCAG ratios.
A4. Transition matrix §4 covers all 9 states + rollback edges + Điều 32-gated reopens.
A5. Derived-states decision (OD12) recorded in §4.5 — paused + cancelled accepted; retrying + escalated deferred to facets.
A6. Roll-up rules §5 cover MP5 binding R1..R6 + edge cases (no mandatory, cycle, sub-workflow, paused).
A7. State machine + transitions + rollup live in PG state_machine_registry / rollup_policy_def — Nuxt zero logic.
A8. Workflow UI §6 covers Standard + Runtime + Long-workflow patterns + proposal mode + drill-down chain.
A9. Governance UI §7 covers Rev2 R7.4.1..R7.4.9 with concrete view names + sentinel bindings.
A10. Proposal mode §8 keeps workflow proposals in workflow_change_requests; introduces generic proposal table only for non-workflow proposals (OD2 default refined).
A11. No PG mutation; no migration; no DOT command run; no law enactment. All schema is paper-only.
End WS4 design.