Master Design Rev2 — 09 Governance/Operability/Observability Addendum (rev4 MP-D16..MP-D22)
Master Design Rev2 — Governance / Operability / Observability Second-Order Hardening (Addendum)
Path: knowledge/dev/design/v0.6-iu-4mothers-event-foundation-rev2/09-governance-operability-observability-addendum.md
Status: DRAFT Rev4 patch addendum (DOCUMENT ONLY). Companion to 00-master-design-rev2.md and 08-bidirectional-input-kaizen-governance-addendum.md. Materializes patch pack MP-D16..MP-D22. Cross-law-bound by 10-… (rev5 MP-D23..MP-D30); see §4.8: governance_problem and ops objects are born under Điều 36/0-G/29 and jurisdiction-bound under Điều 37; the cockpit/AI-ops surfaces are Điều 28 templates.
Date: 2026-05-28 · cross-linked 2026-05-28 (rev5 10-…)
Authority: Macro IU_4MOTHERS_MASTER_DESIGN_GOVERNANCE_SECOND_ORDER_PATCH_DOCUMENT_ONLY_3000X. Built strictly on Rev2 brief authority + Master Design Rev2 invariants (00-… §3) + Rev3 addendum invariants (08-… §2). Driven by GPT review knowledge/dev/reports/architecture/iu-4mothers-master-design-rev3-gpt-review-gaps-mp-d16-d20-2026-05-28.md (verdict REV3_STRONG_BUT_NEEDS_SECOND_ORDER_GOVERNANCE_PATCH). No new law surface introduced. Future framework law referent = future Điều XX (4 Mothers application layer); Điều 34 cited only as decision-path (per MP-D10).
Boundary: Hiến pháp (Constitution) v4.6.3 (PG-first / DOT 100% / no hardcode / no hidden SoT), Điều 5 (Kiến trúc 5 tầng, the five-tier architecture: do not build an upper tier on a weak lower tier), Điều 7 (Assembly First: reuse before build, OSS as adapter only), Điều 28 (Nuxt render/input shell), Điều 30 (reversibility), Điều 31 (integrity/audit), Điều 32 (approval/governance), Điều 33 v2.1 (3-layer / Nuxt never reads PG), Điều 35 (DOT governance for mutation), Điều 37 v3.3 (governance org + permission filter), Điều 38/39 (IU/KG ownership, no cross-IU vector pollution), Điều 45 v1.0 (event/queue/executor/heartbeat/state-machine boundary). All Master Design Rev2 invariants (1..26) and Rev3 addendum invariants (I21..I30) are preserved verbatim.
Forbidden in this macro (binding): No PG mutation. No Directus mutation. No Qdrant/vector write or reindex. No migration. No DOT command run. No law enactment / drafting. No implementation macro. No UI deployment. No final OSS tool selection. No raw SQL apply. No dot_config gate change. No schema creation. No code generation. No gate change. Every schema/table/view/function listed below is paper-only unless it is already verified live.
§1. Why this addendum exists
Master Design Rev2 Revision 3 (08-…, MP-D11..MP-D15) widened the design to cover bidirectional input flow, the unified MOW hierarchy canvas T6→T1, simple Kaizen intake, the MOT/JFT matrix, and the four separated UI surfaces. GPT review confirmed all five MP-D11..MP-D15 patches PASS and that Assembly First + no-production-mutation are preserved.
The same review found second-order gaps — points that are not wrong in Rev3, but become operationally ambiguous or unsafe at company scale (20 000+ concurrent items, thousands of Kaizen submissions, multi-department hierarchy). This addendum is a hardening patch, not a rewrite. It closes seven gaps:
- MP-D16: The T6→T1 business operating hierarchy is not yet explicitly mapped to the Điều 5 five-tier architecture (Hạ tầng / Cơ sở / Modules / Chuyên môn / Giám sát, i.e. infrastructure / base / modules / specialty / supervision). The two tier systems are different axes and must not be confused.
- MP-D17: The Governance Cockpit has panels (08-… §7.3) but not yet a formal operations problem-queue taxonomy + lifecycle suitable for 20 000+ items.
- MP-D18: Kaizen intake (08-… §5) is simple for the user, but needs a stronger backend anti-noise / duplicate-control / review lifecycle so it does not become noise at scale.
- MP-D19: The direct/no-op canonicalization branch (08-… §3.3) is correct but needs stricter data-quality / lineage / retention / abuse guardrails.
- MP-D20: Observability is mentioned across Rev3, but needs a compact Minimum Observability Profile that holds from day 1 with a clear human-visible vs machine-only split.
- MP-D21: The raw uploaded PG-event document's tool suggestions (Hasura, pg-boss/Graphile, Benthos, NATS, Redis, Temporal, Camunda, Airflow, Watermill, OTel/Jaeger, Prometheus/Grafana/Loki) need an explicit reconciliation under Assembly First.
- MP-D22: The next survey sequence must include a Governance Ops Survey (not only Candidate Registry Survey + Tier Registry Survey) before cockpit / agent-ops implementation.
This addendum is mandatory reading alongside 00, 02, 03, 04, 05, 06, and 08. The patch text in those files cross-links here.
Source-access note (per macro §1). The raw uploaded files Bắt sự kiện của PG.docx and 4 mẹ mở rộng.txt are not directly accessible as files in this working environment. This addendum relies on their KB consolidation/recheck sources, which ARE accessible: knowledge/dev/reports/architecture/iu-4mothers-master-design-rev3-gpt-review-gaps-mp-d16-d20-2026-05-28.md, knowledge/dev/reports/architecture/iu-4mothers-event-foundation-gpt-recheck-after-drive-upload-2026-05-27.md, and knowledge/dev/design/assembly-first-open-source-integration-critique.md. Where this addendum reconciles raw-doc tool suggestions (MP-D21) it does so against those consolidations and the Rev2 §6.6 nine-lesson checklist; it does not pretend to have read the raw .docx. If Council demands a direct raw audit, 06-… §S17 already routes it.
§2. Top-line invariants added (extending 00-… §3 and 08-… §2)
These are additive — they relax or replace none of the existing 26 invariants in 00-… §3 nor the I21..I30 in 08-… §2. They continue the 08-… "I" sequence as I31..I37 (and correspond to 00-… §3 invariants 27..33, the condensed forms).
- I31. Two orthogonal tier systems, never conflated (MP-D16). The Điều 5 five-tier architecture (Tầng 1 Hạ tầng → Tầng 5 Giám sát) is an architecture / building-layer axis. The T6→T1 operating hierarchy (Lĩnh vực → Task) is a business operating axis. They are different axes and must not replace each other. T6→T1 content lives mostly in Tầng 4 (Chuyên môn) + Tầng 5 (Giám sát) and depends on Tầng 1/2/3 readiness. No implementation may build a Tầng 4/5 feature when its required Tầng 1/2/3 substrate is missing (Điều 5 §2: never build an upper tier on a weak lower tier). (MP-D16.)
- I32. Governance problems are a typed, lifecycled operations queue (MP-D17). Every operational problem is a typed governance_problem row with a problem-class, severity, ownership, an explicit lifecycle (detected→…→closed/reopened/suppressed/waived), dedupe + grouping + suppression + SLA/SLO clock. Acknowledgement ≠ resolution; mitigation ≠ verification. Suppression/waiver requires policy and may require Điều 32 approval. Auto-resolve requires a source *.resolved / *.recovered / *.healed event (MP-D9). (MP-D17.)
- I33. Kaizen anti-noise lives in the backend, never in the user flow (MP-D18). The five-click user flow (08-… §5.1) is unchanged. Duplicate detection, clustering, rate-limit, spam/abuse flags, low-quality rejection, merge-with-credit, and the review lifecycle (received→…→archived) are backend/governance concerns. Ordinary staff UI gains zero new complexity. A rejected submission always returns a user-friendly reason; merged duplicates still credit the contributor as a supporter. (MP-D18.)
- I34. Direct canonicalization is allow-listed, audited, lineaged, and reversible (MP-D19). A staging row may take the direct branch only when every allow condition holds (permission, allowed target kind, schema-valid, size-within-limit, no structural change, no law/registry/IU-body mutation, no approval-required effect, idempotency key present, audit enabled, retention policy attached, PII/security classification checked). Any deny condition forces the workflow branch or refusal. Every direct write records lineage (staging_id ↔ canonical_target_ref), preserves a rejection reason on refusal, and is correctable by its own table's reversibility contract. (MP-D19.)
- I35. Minimum Observability Profile holds from day 1 (MP-D20). Every workflow/event/input/Kaizen/agent surface emits the minimum machine-metric set (schema-validation pass/fail, trace/correlation coverage, event lag p50/p95/p99, queue depth, lease timeout, ACK/NACK, retry, DLQ depth + replay outcome, idempotency conflict, heartbeat freshness, silent-worker count, governance-problem count by class/severity, Kaizen duplicate/noise rate, direct-canonicalization rejection rate, audio transcription failure/confidence, AI/agent task status, top blocked / cannot_complete clusters). Human-visible surfaces show problem summary / severity / owner / affected T6→T1 path / impact / age-SLA / recommended next action / drill-down / evidence-backed AI summary. Raw event tail, raw queue payload, raw spans, raw prompt/output, raw audio bytes are machine-only by default. Every summary carries generated_at; stale summaries are labelled stale; AI summary never replaces evidence and never auto-resolves without a source event. (MP-D20.)
- I36. OSS tools remain adapters with SoT-pointback, never core owners (MP-D21). Every raw-doc-suggested tool (Hasura, pg-boss/Graphile, Benthos, NATS, Redis, Temporal, Camunda, Airflow, Watermill, OTel/Jaeger/Tempo, Prometheus/Grafana/Loki/VictoriaMetrics) is reconciled to a verdict + label via Gate A (state-vocab fit) + Gate B (config-first fit); none may own the core event plane, workflow definition, approval, or any state authority; each, if ever adopted, requires an external_tool_registry SoT-pointback row. No final pick, no version pin, no implementation here. (MP-D21 + extends 00-… §3 inv 15.)
- I37. Governance Ops Survey precedes cockpit/agent-ops implementation (MP-D22). Before any Governance Cockpit or AI/Agent Ops Console implementation, a read-only IU_4MOTHERS_GOVERNANCE_OPS_SURVEY_DOCUMENT_ONLY_*X macro surveys existing task-status queues, AI-task/agent-run tables, Directus task collections, governance views/logs, worker-heartbeat data, event/problem categories, existing dashboard/ops modules, prompt/dispatch modules, approval/governance collections, audit/event timeline views, and VPS observability tooling, classifying each as verified_live / KB_reported / legacy_trace / candidate_requires_survey / known_gap with a reuse/extend/create recommendation. The survey sequence becomes: (1) Candidate Registry Survey (G7), (2) Tier Registry Survey, (3) Governance Ops Survey, then Phase 0/1 ordering decision. (MP-D22.)
These seven invariants are binding on every WS file and on every future macro that touches tier mapping, governance operations, Kaizen, input canonicalization, observability, or OSS adoption.
§3. MP-D16 — Map T6→T1 operating hierarchy onto the Điều 5 five-tier architecture
§3.1 The two tier systems are different axes
Rev3 added the T6→T1 business operating hierarchy (08-… §4.1). The project also has the Điều 5 five-tier architecture (knowledge/dev/laws/law-05-five-tiers.md, detailed in knowledge/dev/architecture/5-layers.md). These are NOT the same axis:
| Axis | What it organizes | Direction | Owner |
|---|---|---|---|
| Điều 5 five-tier architecture | How the system is built — infrastructure → supervision | Tầng 1 (bottom) → Tầng 5 (top) | Điều 5 (architecture law) |
| T6→T1 operating hierarchy | How the business operates — domain → task | T6 (broad) → T1 (atomic work) | future Điều XX (4 Mothers application layer) referent; MOW canvas surface (08-… §4) |
Binding clarification:
- T6→T1 is a business operating hierarchy (Lĩnh vực / Công ty / Phòng ban / Chuyên môn / Workflow / Task). It answers "where in the organization does this work live?"
- The Điều 5 five-tier model is an architecture / building-layer model. It answers "what substrate must exist before this feature can run?"
- They are not interchangeable and must not replace each other. A card at MOW tier T4 (Phòng ban) is a business node; it is rendered by code that lives in Điều-5 Tầng 3 (Modules), reads registries in Tầng 2 (Cơ sở), runs on Tầng 1 (Hạ tầng), and is monitored from Tầng 5 (Giám sát).
§3.2 Điều 5 five-tier definitions (canonical, from law-05-five-tiers.md)
Tầng 5: GIÁM SÁT + CẢI TIẾN (Supervision + Improvement): out-of-sync detection, auto-fix, improvement loops (2 engines)
Tầng 4: CHUYÊN MÔN (Specialty, the destination): real business processes
Tầng 3: MODULES NỀN TẢNG (Platform Modules): Table, Comment, Workflow, CI/CD (reusable modules)
Tầng 2: CƠ SỞ (Base, the raw material): Registries, Metadata, DOT, Fields, Taxonomy
Tầng 1: HẠ TẦNG (Infrastructure): VPS, Docker, PG, Directus, Nuxt, Agent Data, Qdrant
Mapping the design package's concepts onto the five tiers:
| Điều 5 tier | Holds (this design's concepts) |
|---|---|
| Tầng 1 — Hạ tầng | VPS / Docker / PostgreSQL 16 / Directus / Nuxt / Agent Data / Qdrant boundary. The backend input gateway service (08-… §3.1) and the realtime gateway service (03-… §7) are Tầng-1 runtime processes. |
| Tầng 2 — Cơ sở | All registries + metadata + DOT + fields + taxonomy: event_type_registry, state_machine_registry, executor_class_registry, field_registry/input_form_registry/output_table_registry/dot_function_registry (CRS), tier_registry, task_template/assignee_policy/deadline_policy/escalation_policy, input_routing_policy, external_tool_registry, governance_problem/governance_slo_policy/governance_suppression_policy (this addendum). DOT command catalog. |
| Tầng 3 — Modules nền tảng | The reusable modules: Table (M-003), Comment (M-001), Workflow (M-002), plus the 4 Mothers modules (MOW / MOT / MOIT / MOUT) as application-platform modules, the realtime gateway abstraction, the canonicalizer/transcription/attachment workers, and the governance UI components. These are the building blocks Tầng 4 assembles. |
| Tầng 4 — Chuyên môn (đích đến) | The real business workflows + T6→T1 operating-hierarchy content — actual SOPs, department missions, workflow definitions, task instances. T2 (Workflow) and T1 (Task) of the operating hierarchy are Tầng-4 content; T6..T3 (Lĩnh vực/Công ty/Phòng ban/Chuyên môn) are the Tầng-4 classification context. |
| Tầng 5 — Giám sát + cải tiến | Governance Cockpit, AI/Agent Ops Console, Kaizen improvement loops, SLO/SLA, the governance_problem operations queue, the Minimum Observability Profile, usage-evidence learning. The heatmaps T6→T1 in the cockpit (08-… §7.3 item 4) are a Tầng-5 view over Tầng-4 operating content. |
Where T6→T1 lives. T6→T1 content is mostly Tầng 4 (operating workflows/tasks) and Tầng 5 (its supervision + improvement), but it depends on Tầng 1/2/3 readiness: the MOW canvas (a Tầng-3 module) cannot render T6→T1 cards without the Tầng-2 tier_registry + classification rows, the Tầng-1 PG/Nuxt substrate, and the Tầng-3 state-machine/workflow modules being live.
§3.3 Readiness matrix (per T6→T1 surface → required Điều 5 substrate → status → blocker)
Status vocabulary: verified_live / KB_reported / paper_only / survey_required / known_gap. Current Điều 5 build status (from law-05-five-tiers.md §3 + 5-layers.md): Tầng 1 verified_live (stable); Tầng 2 verified_live (138 collections, 27 registries, 108 DOT tools, 17 realtime triggers, verify_counts()=0); Tầng 3 partial (KB_reported: M-001 Comment commercial, M-003 Table live, M-002 Workflow Phase 2A done / Phase 2B paused, no state machine deployed yet, M-004 Auto-Tester SSOT-only); Tầng 4 not started (correct per Điều 5 ordering); Tầng 5 partial (KB_reported: Điều 30/31/26 enacted; unified sync monitor + self-healing still gap).
| T6→T1 surface (MOW canvas + JFT) | Required Điều 5 lower-layer substrate | Status | Blocker if missing |
|---|---|---|---|
| T6 Lĩnh vực card grid | Tầng 2 tier_registry + domain classification rows (extend workflow_categories category_kind='domain' OR new) | survey_required | Tier Registry Survey (08-… §11) must confirm existing vs paper. No T6 render until tier source confirmed. |
| T5 Công ty card grid | Tầng 2 company/tenant rows (existing tenant table? survey) | survey_required | Same survey; multi-domain infra itself is known_gap (5-layers TD-086). |
| T4 Phòng ban card grid | Tầng 2 department rows + Tầng 3 permission filter (Điều 37 v3.3) | survey_required | Department registry shape unconfirmed; permission predicate must exist. |
| T3 Chuyên môn card grid | Tầng 2 specialty rows + Tầng 3 classification | survey_required | Specialty registry shape unconfirmed. |
| T2 Workflow card / Standard+Runtime Process View | Tầng 3 Workflow Module (M-002) + workflows/workflow_steps/workflow_step_relations + state_machine_registry | KB_reported (M-002 Phase 2A done; Phase 2B paused; state machine paper-only) | State machine registry (G6 / OD9) is paper_only; long-workflow UI Phase 5. T2 runtime depends on these. |
| T1 Task card / MOT-JFT envelope | Tầng 3 MOT module + tasks/task_checkpoints/task_comments + task_template/assignee_policy/deadline_policy/escalation_policy + state_machine_registry + executor_class_registry + CRS (MOIT/MOUT) | KB_reported (tasks tables live) + paper_only (template/policy registries) + CRS-gated (MOIT/MOUT) | Mass JFT generation blocked until template+policy registries land (Phase 2) AND G7 CRS closes (MP-D7). |
| T0 Field (atomic; NOT a tier) | Tầng 2 field_registry [CRS row 28] | survey_required (CRS) | MP-D7 sentinel: no executable reference by name until VL or shape-adapter. |
| Governance Cockpit (Tầng 5 view over T6→T1) | Tầng 1/2/3 all of the above + governance_problem queue (this addendum) + vw_governance_* + observability profile | paper_only (cockpit + problem queue) on KB_reported/verified_live substrate | Governance Ops Survey (MP-D22) must run before cockpit implementation; cockpit cannot precede the substrate it aggregates. |
Sentinel (MP-D16). No Phase-2 macro may schedule a T6→T1 surface whose required Tầng 1/2/3 substrate row in this matrix is paper_only / survey_required / known_gap without first landing (or surveying) that substrate. The MOW canvas (Tầng-3 module) and the Governance Cockpit (Tầng-5 view) are explicitly not the same axis as the T6→T1 business hierarchy they display.
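The MP-D16 sentinel can be read as a simple predicate over the readiness matrix. The following is a minimal sketch, not part of the design package: `SubstrateRow`, `blockers`, `may_schedule`, and the row name `workflow_module_m002` are hypothetical names chosen for illustration; only the status vocabulary and the three blocking statuses come from §3.3.

```python
from dataclasses import dataclass

# Statuses the MP-D16 sentinel treats as "substrate not yet landed" (from §3.3).
BLOCKING_STATUSES = frozenset({"paper_only", "survey_required", "known_gap"})


@dataclass(frozen=True)
class SubstrateRow:
    name: str    # substrate the surface depends on (hypothetical identifier)
    tier: int    # Điều 5 tier the substrate lives in (1, 2 or 3)
    status: str  # one value from the §3.3 status vocabulary


def blockers(required: list[SubstrateRow]) -> list[str]:
    """Names of required substrate rows that still block scheduling."""
    return [row.name for row in required if row.status in BLOCKING_STATUSES]


def may_schedule(required: list[SubstrateRow]) -> bool:
    """A surface may be scheduled only when no required row is blocking."""
    return not blockers(required)


# Illustrative input loosely mirroring the T2 row of the matrix.
t2_workflow = [
    SubstrateRow("workflow_module_m002", tier=3, status="KB_reported"),
    SubstrateRow("state_machine_registry", tier=2, status="paper_only"),
]
```

With this input, `blockers(t2_workflow)` names `state_machine_registry`, which matches the matrix: the T2 runtime view waits on the paper-only state machine registry.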
§4. MP-D17 — Governance problem queue taxonomy and lifecycle
§4.1 The problem of "panels without a queue"
Rev3 (08-… §7.3) gives the Governance Cockpit eleven panels and 02-… §7 gives problem-first views. At 20 000+ items those are displays, not an operations governance model. MP-D17 adds the operations layer: a typed, owned, lifecycled, deduplicated, SLA-clocked problem queue — the industry incident/problem/change separation applied to this system.
Incident / problem / change separation (binding):
- Problem = a governance_problem row (a detected operational condition). This addendum owns the problem concept.
- Incident = a grouping of related problems sharing a root cause (a cluster). Represented by the grouped lifecycle state + a parent governance_problem row of class *_cluster; NOT a separate table.
- Change = a remediation that mutates the system → ALWAYS a workflow_change_requests (workflow) or generic proposal (non-workflow) row (existing, 02-… §8 / 06-… §S2), gated by Điều 32. A problem may spawn a change, but a problem is never itself a change.
Sentinel: governance_problem rows never carry mutation payload; remediation always references a separate change/proposal row + approval_id.
§4.2 Problem classes
governance_problem.problem_class vocab (paper, lives in dot_config vocab.governance_problem_class.*):
dlq · silent_worker · event_lag_breach · schema_validation_failure · idempotency_conflict · overdue_cluster · blocked_escalated · cannot_complete_cluster · failed_cut · orphan_workflow · kaizen_noise_spike · ai_agent_failure_cluster · permission_anomaly · data_quality_warning · direct_canonicalization_rejection · input_abuse_or_spam.
Each class maps to an existing detection source (no new detection substrate — see reuse table §10):
| problem_class | Detection source (existing / paper view) |
|---|---|
| dlq | job_dead_letter / vw_governance_dlq_count (02-… §7.2) |
| silent_worker | queue_heartbeat + dot_config heartbeat.threshold.* (03-… §5.5) |
| event_lag_breach | fn_event_lag_compute + vw_governance_event_lag (03-… §6.4) |
| schema_validation_failure | event_validation_audit (03-… §3.3 / §6.6) |
| idempotency_conflict | idempotency_registry observation_count anomalies (03-… §5.4) |
| overdue_cluster | vw_governance_overdue grouped (02-… §7.2) |
| blocked_escalated | fn_step_blocked_severity red-escalated (02-… §3.1 MP-D4) |
| cannot_complete_cluster | step_run/task_run cannot_complete grouped (08-… §7.3 item 9) |
| failed_cut | cut_request cut_failed (existing cut pipeline; memory) |
| orphan_workflow | workflow_run with no live owner/trigger (02-… §7.1) |
| kaizen_noise_spike | vw_governance_kaizen_* duplicate/noise rate (§5 + 08-… §7.5) |
| ai_agent_failure_cluster | agent_run failures grouped (08-… §7.4) |
| permission_anomaly | render/permission refusals (04-… §2.3 MP-D6 render_permission_denied) + gateway refusals |
| data_quality_warning | direct-canonicalization data-quality checks (§6) |
| direct_canonicalization_rejection | input.rejected on direct branch (§6 + 03-… §3.4a) |
| input_abuse_or_spam | Kaizen/input rate-limit + spam flags (§5.3) |
§4.3 Severity
governance_problem.severity ∈ {critical, high, medium, low, info}. Severity is config-driven per class + context (dot_config governance_severity.<problem_class>.*), never hardcoded in UI. Severity combines with SLA-breach proximity to order the cockpit's severity-prioritized problem queue (08-… §7.3 item 1).
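To make the "config-driven, never hardcoded in UI" rule concrete, here is a small sketch of a severity lookup and a queue ordering. It rests on assumptions the source does not state: the key suffixes (`.default` and a context name) under `governance_severity.<problem_class>.*`, the field `minutes_to_sla_breach`, and the tie-breaking rule (severity first, then SLA proximity) are all hypothetical.

```python
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3, "info": 4}


def resolve_severity(config: dict[str, str], problem_class: str, context: str) -> str:
    """Look severity up in config; fall back to a class-level default key.

    The ".<context>" and ".default" suffixes are assumptions for illustration.
    """
    for key in (
        f"governance_severity.{problem_class}.{context}",
        f"governance_severity.{problem_class}.default",
    ):
        value = config.get(key)
        if value is not None:
            if value not in SEVERITY_RANK:
                raise ValueError(f"unknown severity {value!r} at {key}")
            return value
    raise KeyError(f"no severity configured for class {problem_class!r}")


def queue_order(problems: list[dict]) -> list[dict]:
    """Most severe first; within one severity, closest to SLA breach first.

    A negative minutes_to_sla_breach means the SLA is already breached.
    """
    return sorted(
        problems,
        key=lambda p: (SEVERITY_RANK[p["severity"]], p["minutes_to_sla_breach"]),
    )
```

The point of the sketch is only that the UI consumes an already-resolved severity and an already-ordered queue; how severity and SLA proximity are actually weighted remains a policy decision.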
§4.4 Lifecycle states
governance_problem.lifecycle_state vocab:
detected → grouped → triaged → acknowledged → assigned → investigating → waiting_external → waiting_human → mitigated → resolved → verified → closed; plus reopened, suppressed, waived as side states.
Binding distinctions (the core of MP-D17):
- acknowledged ≠ resolved. Acknowledgement means a human/owner has seen the problem; the underlying condition still holds. Resolution means the condition no longer holds.
- mitigated ≠ verified. Mitigation reduces impact (e.g. paused a noisy producer); verification confirms the condition is actually gone with evidence.
- resolved → verified → closed. A problem may only reach closed after verified. verified requires an evidence reference (a source *.resolved / *.recovered / *.healed event per MP-D9).
- suppressed / waived require policy. suppressed references a governance_suppression_policy row; waived additionally requires Điều 32 approval_id for any waiver that hides a high/critical problem.
- reopened is additive (never deletes prior lifecycle history); it links to the prior closed/resolved record via correlation_id (same additive discipline as MP-D2/MP-D3).
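The binding distinctions above can be expressed as guard checks on a lifecycle transition. This is a hedged sketch: it encodes only the rules stated in this section, not the full transition graph (which the source does not enumerate), and the `Problem` class and `transition_violations` function are hypothetical names. Reading "waived additionally requires" as "waived also needs a policy row" is an interpretation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Problem:
    severity: str
    lifecycle_state: str
    resolution_evidence_refs: list[str] = field(default_factory=list)
    suppression_policy_id: Optional[str] = None
    waiver_approval_id: Optional[str] = None


def transition_violations(p: Problem, target: str) -> list[str]:
    """Return the MP-D17 rules a move to `target` would break (empty = allowed)."""
    violations: list[str] = []
    if target == "closed" and p.lifecycle_state != "verified":
        violations.append("closed requires a prior verified state")
    if target == "verified":
        if p.lifecycle_state != "resolved":
            violations.append("verified requires a prior resolved state")
        if not p.resolution_evidence_refs:
            violations.append("verified requires an evidence reference")
    if target in {"suppressed", "waived"} and p.suppression_policy_id is None:
        violations.append(f"{target} requires a policy reference")
    if (
        target == "waived"
        and p.severity in {"high", "critical"}
        and p.waiver_approval_id is None
    ):
        violations.append("waiving a high/critical problem requires an approval_id")
    return violations
```

Returning a list of violations rather than a boolean keeps the refusal reason available for audit, which fits the additive-history discipline described for reopened.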
§4.5 Controls
| Control | Mechanism (paper) |
|---|---|
| dedupe | governance_problem.dedupe_key = hash(problem_class, primary_entity_ref, window_bucket); insert-on-conflict folds duplicates into one row with occurrence_count++. |
| grouping | grouped state + parent *_cluster problem; member problems link via governance_problem.parent_problem_id. |
| suppression | governance_suppression_policy (paper) — predicate + reason + scope + expiry; suppressed problems hidden from default queue, audited. |
| snooze | governance_problem.snooze_until timestamp; reappears after expiry; snooze audited. |
| waive_with_approval | governance_problem.waiver_approval_id (Điều 32) required for high/critical waivers. |
| escalation | governance_problem.escalation_chain_id → escalation_policy (08-… §6.2); fires on SLA breach. |
| owner assignment | governance_problem.assignee_id + assignee_policy (08-… §6.2); assigned lifecycle state. |
| SLA/SLO clock | governance_problem.sla_policy_id → governance_slo_policy (paper); detected_at + acknowledged_at + resolved_at drive breach minutes. |
| impact estimate | governance_problem.impact_jsonb — affected entity counts + affected T6→T1 hierarchy path + estimated SLA exposure; evidence-backed (MP-D9), never fabricated. |
| drill-down | governance_problem.trace_id / correlation_id → vw_audit_event_timeline(trace_id) (03-… §6.5). |
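The dedupe row above (hash over problem_class, primary_entity_ref, and window_bucket, with duplicates folded into occurrence_count) can be sketched in memory as follows. The hash function (SHA-256), the 15-minute bucket width, and the dict standing in for an insert-on-conflict table are assumptions; the source fixes none of them.

```python
import hashlib
import json
from datetime import datetime, timezone


def window_bucket(detected_at: datetime, window_seconds: int) -> int:
    """Integer index of the fixed-width time window containing detected_at."""
    return int(detected_at.timestamp()) // window_seconds


def dedupe_key(problem_class: str, primary_entity_ref: dict,
               detected_at: datetime, window_seconds: int = 900) -> str:
    """hash(problem_class, primary_entity_ref, window_bucket) as a hex digest."""
    payload = json.dumps(
        [problem_class, primary_entity_ref, window_bucket(detected_at, window_seconds)],
        sort_keys=True,            # {kind, id} key order must not change the hash
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def record_detection(queue: dict[str, dict], problem_class: str,
                     primary_entity_ref: dict, detected_at: datetime) -> dict:
    """Insert a new row, or fold a duplicate into the existing one."""
    key = dedupe_key(problem_class, primary_entity_ref, detected_at)
    row = queue.get(key)
    if row is None:
        row = queue[key] = {
            "dedupe_key": key,
            "problem_class": problem_class,
            "primary_entity_ref": primary_entity_ref,
            "occurrence_count": 1,
            "first_detected_at": detected_at,
            "last_detected_at": detected_at,
        }
    else:
        row["occurrence_count"] += 1
        row["last_detected_at"] = max(row["last_detected_at"], detected_at)
    return row
```

Two detections of the same silent worker five minutes apart fold into one row with `occurrence_count == 2`; a detection in a later bucket opens a new row. Whether a later bucket should instead reopen or extend the earlier problem is a design choice this sketch does not settle.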
§4.6 Auto-resolve discipline (inherits MP-D9)
A governance_problem may auto-advance to resolved (then await verified) only when a corresponding *.resolved / *.recovered / *.healed event exists in event_outbox. A summarizer flipping a problem to resolved without such an event is an integrity violation (event_validation_audit row + a new governance_problem of class schema_validation_failure/data_quality_warning). This is the MP-D9 rule (02-… §7.3) applied to the problem queue.
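A minimal sketch of the auto-resolve guard follows. It assumes the caller has already looked up the event types linked to the problem in event_outbox; the function names and the example event type strings are hypothetical, while the three suffixes come from the rule above.

```python
RESOLUTION_SUFFIXES = (".resolved", ".recovered", ".healed")


def resolution_evidence(linked_event_types: dict[str, str]) -> list[str]:
    """Ids of linked events whose type ends in a resolution suffix."""
    return sorted(
        event_id
        for event_id, event_type in linked_event_types.items()
        if event_type.endswith(RESOLUTION_SUFFIXES)
    )


def try_auto_resolve(problem: dict, linked_event_types: dict[str, str]) -> bool:
    """Advance to resolved only when source evidence exists; else leave untouched."""
    evidence = resolution_evidence(linked_event_types)
    if not evidence:
        return False  # a summarizer's opinion alone is never enough (MP-D9)
    problem["lifecycle_state"] = "resolved"
    problem["resolution_evidence_refs"] = evidence
    return True
```

The function stops at resolved on purpose: verified and closed still need their own checks, so an auto-resolved problem continues to await verification.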
§4.7 Paper registry / view shapes (paper-only — no schema creation)
governance_problem
problem_id uuid PK
problem_class text -- §4.2 vocab (dot_config)
severity text -- {critical|high|medium|low|info}
lifecycle_state text -- §4.4 vocab
primary_entity_ref jsonb -- {kind, id} the problem is about
parent_problem_id uuid nullable -- grouping (incident)
dedupe_key text
occurrence_count int
first_detected_at timestamptz
last_detected_at timestamptz
acknowledged_at timestamptz nullable
assignee_id uuid nullable
mitigated_at timestamptz nullable
resolved_at timestamptz nullable
verified_at timestamptz nullable
closed_at timestamptz nullable
reopened_count int
snooze_until timestamptz nullable
suppression_policy_id uuid nullable
waiver_approval_id uuid nullable -- Điều 32
sla_policy_id uuid nullable
escalation_chain_id uuid nullable
impact_jsonb jsonb
resolution_evidence_refs jsonb nullable -- *.resolved/*.recovered/*.healed event ids (MP-D9)
trace_id text
correlation_id uuid
governance_problem_event_link -- many-to-many problem ↔ source events
problem_id uuid FK
event_outbox_id uuid FK
link_role text -- 'detection' | 'resolution' | 'evidence'
PRIMARY KEY (problem_id, event_outbox_id, link_role)
governance_problem_assignment -- ownership history (additive)
assignment_id uuid PK
problem_id uuid FK
assignee_id uuid
assigned_by uuid
assigned_at timestamptz
unassigned_at timestamptz nullable
reason text
governance_suppression_policy
suppression_policy_id uuid PK
problem_class text nullable -- null = all classes
predicate_ref text -- predicate fn name
reason text
scope_jsonb jsonb
approval_id uuid nullable -- Điều 32 for high/critical scope
expires_at timestamptz nullable
active bool
governance_slo_policy
slo_policy_id uuid PK
scope text -- 'problem_class.<X>' | 'workflow_category.<Y>' | 'executor_class.<Z>' | 'event_subscription.<S>'
objective_jsonb jsonb -- {target, window, ack_minutes, resolve_minutes}
active bool
These are paper-only. They reuse detection sources (§4.2) rather than re-detecting; they are a triage/lifecycle layer above the existing vw_governance_* views. The cockpit panels in 08-… §7.3 read these rows; this addendum gives them their queue model.
Sentinel (MP-D17). Every governance problem the cockpit displays is a governance_problem row with a class, severity, lifecycle_state, and owner (or explicit unassigned). acknowledged/mitigated never count as resolved/verified. No closed without verified; no verified without a source *.resolved / *.recovered / *.healed event. Every remediation references a separate change/proposal row + approval_id; the problem row carries no mutation payload.
§4.8 Cross-law binding of governance/ops objects (MP-D24 + MP-D26, rev5)
The governance_problem(+_event_link/_assignment), governance_suppression_policy, governance_slo_policy, and agent_run paper registries (§4.7, 08-… §7.4) are governed objects and are bound by the rev5 cross-law patch (10-…):
- Birth / collection / species (MP-D24, Điều 36/0-G/29). When promoted from paper, each is born under the Industrial Birth Contract: a collection_registry entry (4 mandatory attributes: governance_role, purpose, species mapping, birth trigger; Điều 29), a birth_registry record per row where governed, a species_code (e.g. a SPE-GOV grouping species per Điều 29) + composition_level, created via DOT-COL-REGISTER + dot-species-map + DOT+APR (never raw psql). HC-REG / HC-SCHEMA keep them registered + described. (10-… §4.)
- Jurisdiction (MP-D26, Điều 37). Each governance/ops object type has a governance_owner agency + escalation_owner and a law_jurisdiction (Phạm vi, scope); cockpit visibility is backend-filtered per role (Layer B Directus role/field-allowlist + Điều 32): super-admin sees aggregate, not all raw rows by default; AI/Agent Ops never leaks prompts/payloads/outputs outside authority (field allowlist + MP-D20 machine-only). (10-… §6.1 jurisdiction matrix.)
- Template (MP-D23, Điều 28). The Governance Cockpit + AI/Agent Ops Console surfaces are design_templates rows (TPL-governance-cockpit, TPL-agent-ops-console) + product records with strict field allowlist (10-… §3.1).
- No "đẻ rơi" (unregistered birth) (MP-D29). No governance/ops object reaches active without birth + collection/species + owner law; orphan / phantom / nhầm-chuồng (wrong-pen, i.e. misfiled) detectors apply (10-… §9).
Sentinel (MP-D24/D26 for §4): every promoted governance/ops registry has a collection_registry+species+governance_role; each object type has a governance_owner+escalation_owner; cockpit/console respect jurisdiction + field allowlist.
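As an illustration of backend-filtered visibility, the sketch below projects rows through a field allowlist after a jurisdiction filter, and gives an aggregate-only view. It is not the Directus Layer B mechanism itself; the function names, the jurisdiction values, and the `raw_payload` field are hypothetical.

```python
from collections import Counter


def project_row(row: dict, allowlist: set[str]) -> dict:
    """Keep only allow-listed fields; everything else never leaves the backend."""
    return {k: v for k, v in row.items() if k in allowlist}


def visible_rows(rows: list[dict], jurisdictions: set[str],
                 allowlist: set[str]) -> list[dict]:
    """Rows inside the caller's jurisdiction, reduced to allow-listed fields."""
    return [
        project_row(r, allowlist)
        for r in rows
        if r["law_jurisdiction"] in jurisdictions
    ]


def aggregate_counts(rows: list[dict]) -> dict[tuple[str, str], int]:
    """Aggregate-by-default view: counts per (problem_class, severity), no raw rows."""
    return dict(Counter((r["problem_class"], r["severity"]) for r in rows))
```

Filtering before projecting matters: the jurisdiction field is used for the decision but need not itself be in the allowlist returned to the caller.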
§5. MP-D18 — Kaizen anti-noise, duplicate-control, and review lifecycle
§5.1 The user flow does not change
The five-click flow stays exactly as in 08-… §5.1: Đề xuất cải tiến (Suggest an improvement) → Thêm|Sửa|Xoá (Add|Edit|Delete) → chọn vị trí (choose location) → comment/audio → gửi (send). All MP-D18 machinery is backend/governance. Ordinary staff UI gains zero new fields, zero new decisions, zero new vocabulary (re-affirms 08-… §5.2 + I33).
§5.2 Kaizen review lifecycle (backend)
input_submission rows of input_kind='kaizen_*' carry a kaizen_lifecycle_state (paper, distinct from the generic processing_state):
received → auto_classified → duplicate_suspected → needs_clarification → accepted_for_review → (rejected_noise | converted_to_proposal) → approved → merged → (rejected_by_reviewer) → measured_after_change → archived.
- received: gateway accepted the submission (input.submitted).
- auto_classified: backend classifier assigned candidate target + intent + department + tier.
- duplicate_suspected: duplicate detector (§5.3) flagged a likely duplicate cluster.
- needs_clarification: reviewer (or classifier) requests one short clarification; user gets a friendly prompt (still inside the simple UX: at most one extra question, 08-… §5.2).
- accepted_for_review: passes anti-noise gate; enters reviewer queue.
- rejected_noise: failed anti-noise gate; user notified with a friendly reason (§5.4).
- converted_to_proposal: promoted to workflow_change_requests or generic proposal (existing routing, 08-… §5.4).
- approved / rejected_by_reviewer: Điều 32 reviewer decision on the proposal.
- merged: applied via its own change macro (Phase 1+); contributor credited.
- measured_after_change: impact measurement after the change ships (closes the improvement loop, Tầng 5).
- archived: terminal.
§5.3 Duplicate detection dimensions
A Kaizen submission is clustered for duplicate suspicion by (paper fn_kaizen_duplicate_cluster):
target_artifact_ref · intent · hierarchy_context (T6→T1 path) · actor_department · raw_text_semantic_hash · audio_transcription_hash · time_window · existing_open_proposal_refs.
A new submission whose dimensions match an open or recently-decided cluster is set to duplicate_suspected and folded into the cluster's review item.
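One way to picture the clustering step is a key built from a subset of the dimensions plus a time-window check. This sketch makes several assumptions: it uses only four of the listed dimensions, it substitutes an exact hash of whitespace- and case-normalized text for the semantic hash (a real semantic hash would also catch paraphrases), and the names `Submission`, `cluster_key`, and `assign_cluster` are hypothetical, not the paper fn_kaizen_duplicate_cluster.

```python
import hashlib
import re
from dataclasses import dataclass
from datetime import datetime, timedelta


def text_hash(raw_text: str) -> str:
    """Stand-in for raw_text_semantic_hash: exact hash of normalized text."""
    normalized = re.sub(r"\s+", " ", raw_text.strip().lower())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Submission:
    submission_id: str
    target_artifact_ref: str
    intent: str
    hierarchy_context: tuple[str, ...]  # T6→T1 path
    raw_text: str
    submitted_at: datetime


def cluster_key(s: Submission) -> tuple:
    return (s.target_artifact_ref, s.intent, s.hierarchy_context, text_hash(s.raw_text))


def assign_cluster(clusters: dict[tuple, list[Submission]], s: Submission,
                   window: timedelta) -> bool:
    """Fold s into a matching recent cluster; return True if duplicate_suspected."""
    key = cluster_key(s)
    members = clusters.get(key, [])
    is_duplicate = bool(members) and (s.submitted_at - members[-1].submitted_at) <= window
    # Outside the window the old cluster is considered stale and a new one starts.
    clusters[key] = (members + [s]) if is_duplicate else [s]
    return is_duplicate
```

Every member of a cluster stays in the list, which is what later allows each contributor to be recorded as a supporter on the single review item.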
§5.4 Anti-noise controls (backend/governance only)
| Control | Mechanism (paper) |
|---|---|
| duplicate clustering | fn_kaizen_duplicate_cluster (§5.3); duplicates merge into one review item. |
| contributor rate-limit | per-role/per-time-window cap in dot_config kaizen.rate_limit.<role>; excess flagged input_abuse_or_spam (→ governance_problem). |
| spam/abuse flag | heuristic + reviewer flag; repeated abuse raises a governance_problem class input_abuse_or_spam. |
| low-quality/empty rejection | empty/near-empty comment with no audio + no attachment → rejected_noise with friendly reason. |
| merge duplicates into one review item | cluster → single proposal; all contributors recorded as supporters. |
| contributor credit | even on merge/duplicate, contributor stays a supporter_ref on the proposal (credit preserved). |
| reviewer clarification | reviewer may set needs_clarification; user receives one short friendly question. |
| user-friendly rejection reason | rejection reason from dot_config kaizen.rejection_reason.* rendered in plain language; never an internal error code. |
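Two of the controls above, the contributor rate-limit and the low-quality rejection, can be sketched as one backend gate. Assumptions are flagged in the comments: the `min_chars` threshold for "near-empty", the config key suffix `kaizen.rejection_reason.empty`, and the choice to hold a rate-limited submission at received are all hypothetical, since the source only says excess is flagged.

```python
def anti_noise_gate(config: dict[str, str], role: str, recent_submissions: int,
                    comment: str, has_audio: bool, has_attachment: bool,
                    min_chars: int = 5) -> dict:
    """Decide the next kaizen_lifecycle_state without touching the user flow."""
    result = {"kaizen_lifecycle_state": "accepted_for_review",
              "flags": [], "user_message": None}

    # Low-quality/empty rejection: near-empty comment and nothing else attached.
    # min_chars is an assumed threshold for "near-empty".
    if len(comment.strip()) < min_chars and not has_audio and not has_attachment:
        result["kaizen_lifecycle_state"] = "rejected_noise"
        # Plain-language reason from config, never an internal error code.
        result["user_message"] = config["kaizen.rejection_reason.empty"]
        return result

    # Contributor rate-limit: recent_submissions counts earlier submissions
    # by this contributor in the current window.
    cap = int(config[f"kaizen.rate_limit.{role}"])
    if recent_submissions >= cap:
        result["flags"].append("input_abuse_or_spam")
        # Assumption: flagged submissions are held, not silently dropped.
        result["kaizen_lifecycle_state"] = "received"
    return result
```

The caller would turn an `input_abuse_or_spam` flag into a governance_problem of that class; the user sees at most the friendly message.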
§5.5 Kaizen metrics (extends 08-… §7.5)
Computed by paper STABLE functions over input_submission + proposal + workflow_change_requests lifecycle, surfaced on the cockpit Kaizen panel (Tầng 5):
submission_rate · duplicate_rate · accepted_for_review_rate · approval_rate · merge_rate · time_to_decision (p50/p95) · time_to_impact_measurement · contributor_quality_score · department_improvement_index · repeated_problem_hotspots.
A sustained duplicate_rate / noise spike raises a governance_problem of class kaizen_noise_spike (§4.2) so governance can tune rate-limits or address a confusing surface — without ever touching the user flow.
§5.6 IU relation (preserves invariants)
Kaizen never mutates IU body directly. A Kaizen targeting an IU lands as a proposal (non-workflow) or workflow_change_requests (workflow) row; an approved IU-narrative change is authored through the Điều 38/39 author lifecycle producing a new iu_version (re-affirms 08-… §9 + 00-… §3 inv 1/14). The duplicate-detection raw_text_semantic_hash / audio_transcription_hash are hashes of the suggestion text in staging — never canonical IU body.
Sentinel (MP-D18). The user-facing Kaizen flow stays five clicks with zero internal vocabulary (re-affirms 08-… §5.2). All anti-noise/duplicate/lifecycle machinery is backend. Every rejected submission returns a friendly reason; every merged duplicate credits its contributor as a supporter. Kaizen never produces more than one downstream canonical mutation per submission (re-affirms 08-… §5.6).
§6. MP-D19 — Direct canonicalization policy and data-quality guardrails
§6.1 The direct branch needs a strict gate
08-… §3.3 routes direct vs workflow per input_kind. MP-D19 hardens the direct branch: it may bypass workflow/approval ONLY behind an explicit allow-list and a data-quality gate, and every direct write must be lineaged, retained-by-policy, classified for PII/security, and reversible.
§6.2 Direct canonicalization allow conditions (ALL must hold)
The canonicalizer worker (worker.canonicalizer) admits a direct write only if every condition holds:
actor_has_permission · target_kind_allowed_by_input_routing_policy · schema_valid · size_within_limit · no_structural_change · no_law/registry/IU_body_mutation · no_approval_required_effect · idempotency_key_present · audit_enabled · retention_policy_attached · PII/security_classification_checked.
§6.3 Direct canonicalization deny conditions (ANY forces workflow branch or refusal)
Direct is denied (routed to workflow branch, or refused with reason) if it:
changes the workflow graph · changes IU canonical body · changes a field/form/output registry · changes law/governance policy · affects other departments beyond permission scope · requires approval · has transcription confidence below threshold · trips spam/abuse/rate-limit.
A denied-but-legitimate input is re-routed to the workflow branch (proposal + Điều 32); an illegitimate one is refused with input.rejected + reason and raises a governance_problem of class direct_canonicalization_rejection if the rejection rate spikes.
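The allow/deny logic of §6.2 and §6.3 can be sketched as a fail-closed router. The condition names are taken from §6.2; the function name, the `legitimate` flag (the source does not say how legitimacy is decided), and the choice to treat a failed allow condition the same way as a deny hit are assumptions.

```python
from typing import Mapping, Sequence

ALLOW_CONDITIONS = (
    "actor_has_permission",
    "target_kind_allowed_by_input_routing_policy",
    "schema_valid",
    "size_within_limit",
    "no_structural_change",
    "no_law/registry/IU_body_mutation",
    "no_approval_required_effect",
    "idempotency_key_present",
    "audit_enabled",
    "retention_policy_attached",
    "PII/security_classification_checked",
)


def route_input(
    allow: Mapping[str, bool],
    deny_hits: Sequence[str],
    legitimate: bool,
) -> str:
    """Return "direct", "workflow", or "refused" for one staged input.

    A missing allow condition counts as False, so the gate fails closed.
    Any deny hit blocks the direct branch regardless of the allow results.
    """
    direct_ok = not deny_hits and all(
        allow.get(name, False) for name in ALLOW_CONDITIONS
    )
    if direct_ok:
        return "direct"
    # Denied but legitimate inputs go to proposal + approval; others are refused.
    return "workflow" if legitimate else "refused"
```

Keeping the allow list as data rather than as branches makes it straightforward to assert that every condition was evaluated before a direct write is admitted.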
§6.4 Data lineage (staging ↔ canonical)
| Lineage element | Mechanism (paper) |
|---|---|
| forward link | input_submission.canonical_target_ref = {kind, id} once a direct write lands (08-… §3.2). |
| back link | the canonical row stores source_staging_id where its table permits (e.g. task_comments.source_staging_id, audit_note.source_staging_id). Tables that cannot carry the column rely on the audit join. |
| audit join | a dot_iu_command_run / audit row joins staging_id ↔ canonical write ↔ trace_id so vw_audit_event_timeline(trace_id) shows the full staging→canonical path. |
| rejection preservation | input_submission.rejection_reason + validation_reason_code retained even after a row is closed (never nulled). |
| correction model | a direct write is corrected by its own table's reversibility contract (soft-delete + audit for task_comments; append-only correction note for audit_note); an IU-affecting correction always reroutes through Điều 38/39 author lifecycle (never a direct staging rewrite). |
§6.5 Retention / security (paper config)
| Policy | Paper config key | Default |
|---|---|---|
| raw_text retention | dot_config retention.staging.raw_text | tunable Phase 1 (proposed 180 days) |
| audio retention | dot_config retention.staging.audio | tunable Phase 1 |
| attachment retention | dot_config retention.staging.attachment | tunable Phase 1 |
| PII/security classification | input_submission.security_classification ∈ {public, internal, confidential, pii} (paper); set at gateway | required before direct write |
| access scope for staging artifacts | gateway permission_scope_hash (matches MP-D6 cache-key shape); staging artifacts readable only within submitter/reviewer scope | backend-enforced |
| transcription confidence threshold | dot_config transcription.min_confidence | default 0.6 (mirrors MP-D9 ai_summary default) |
| human review threshold | below confidence threshold → needs_clarification / human review, never auto-direct | binding |
| abuse/spam/rate limit | dot_config kaizen.rate_limit.* + spam flags (§5.4) | tunable Phase 1 |
§6.6 Relation to IU / PG / queue
Direct canonicalization writes only allow-listed, non-structural artifacts (task_comments, audit_note, input_submission_ack). It never writes IU body, never writes a registry, never changes a workflow graph. Governance-relevant direct writes go through a DOT pair (Điều 35); lighter writes use a typed RPC that emits the standard producer events. Queue rows for canonicalization carry refs only (staging_id + executor_class_ref + idempotency key + W3C trace). No raw text/audio/attachment bytes travel in events (MP-D8). (Re-affirms 08-… §9.)
Sentinel (MP-D19). No direct-branch write occurs unless all §6.2 allow conditions hold; any §6.3 deny condition forces workflow branch or refusal. Every direct write has a lineage pair (staging_id ↔ canonical_target_ref) and a retention + security classification. Below-threshold transcription never auto-canonicalizes. Direct writes never touch IU body / registries / workflow graph / law.
§7. MP-D20 — Minimum Observability Profile
§7.1 Purpose
Rev3 scatters its observability pieces across documents (event lag 03-… §6.4, heartbeat 03-… §5.5, governance views 02-… §7.2, MP-D9 evidence 02-… §7.3, cockpit trend lines 08-… §7.3). MP-D20 binds them into one Minimum Observability Profile that every workflow/event/input/Kaizen/agent surface satisfies from day 1, with an explicit human-visible vs machine-only split. This is the industry-standard observability that the raw PG-event document emphasizes (schema registry, distributed tracing, DLQ/ACK-NACK/idempotency, governance UI), expressed within the PG-first SoT.
§7.2 Required machine metrics (every surface)
| Metric | Source (existing / paper) |
|---|---|
| event_schema_validation_pass/fail | event_validation_audit (03-… §3.3) |
| trace_id / correlation_id coverage | event_outbox.trace_id IS NULL count (03-… §5.6 sentinel) |
| event_lag p50/p95/p99 | fn_event_lag_compute (03-… §6.4) |
| queue_depth by queue | job_queue group by job_class/status (03-… §4.2) |
| lease_timeout count | job_queue.lease_until < now() while leased |
| ACK/NACK rate | worker lease/release ledger (03-… §5.2) |
| retry_count | job_queue.attempt_count + retry policy (03-… §5.3) |
| DLQ depth | job_dead_letter (03-… §6.1) |
| DLQ replay outcomes | dlq_replay_request.outcome_jsonb (03-… §6.2) |
| idempotency_conflict count | idempotency_registry.observation_count (03-… §5.4) |
| worker heartbeat freshness | queue_heartbeat.last_tick_at (03-… §5.5) |
| silent_worker count | heartbeat vs dot_config heartbeat.threshold.* |
| governance_problem_count by class/severity | governance_problem (09-… §4) |
| kaizen duplicate/noise rate | vw_governance_kaizen_* (08-… §7.5 + 09-… §5.5) |
| direct_canonicalization rejection rate | input.rejected direct branch (§3.4a + 09-… §6) |
| audio transcription failure/confidence | input.audio_transcribed.confidence (§3.4a) |
| AI/Agent task status | agent_run (08-… §7.4) |
| top blocked clusters | fn_step_blocked_severity grouped (02-… §3.1) |
| top cannot_complete clusters | step_run/task_run cannot_complete grouped (08-… §7.3 item 9) |
§7.3 Human-visible vs machine-only
Human-visible (cockpit / dashboards): problem summary · severity · owner · affected hierarchy path T6→T1 · impact estimate · age/SLA · recommended next action · drill-down link · evidence-backed AI/worker summary (MP-D9 fields).
Machine-only by default (not surfaced raw to humans): raw event tail · raw queue payload · internal retry loop · debug logs · raw trace spans · raw prompt/output payload (if sensitive) · raw audio/attachment bytes.
This is the same "no raw event noise" boundary as 00-… §3 inv 16 + 02-… §7.7, now stated as a profile rule: humans see summaries + drill-down; machines hold the raw.
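The split above can be read as an allow-list projection: a human surface receives only the named summary fields, and anything else stays machine-side. The sketch below assumes a flat dictionary record; every field identifier in it is a hypothetical rendering of the prose list in §7.3, not a schema from the design.

```python
from typing import Any, Dict

# Hypothetical field names mirroring the human-visible list in §7.3.
HUMAN_VISIBLE_FIELDS = frozenset({
    "problem_summary",
    "severity",
    "owner",
    "hierarchy_path",
    "impact_estimate",
    "age_sla",
    "recommended_next_action",
    "drill_down_link",
    "evidence_summary",
})


def to_human_view(record: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only allow-listed fields; unknown fields are dropped by default."""
    return {k: v for k, v in record.items() if k in HUMAN_VISIBLE_FIELDS}
```

An allow-list fails closed: a newly added raw field (for example a queue payload) stays hidden until someone deliberately adds it to the visible set.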
§7.4 Freshness rules
- Every dashboard summary carries generated_at.
- A summary older than its dot_config observability.staleness.<surface> window is labelled stale in the UI.
- An AI summary cannot replace evidence: the drill-down to source events is always present (MP-D9 rule 1).
- A summary cannot auto-resolve a problem without a source *.resolved / *.recovered / *.healed event (MP-D9 rule 2; §4.6).
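A minimal sketch of the staleness label and the auto-resolve guard from the freshness rules. The function names and the example event type names are assumptions; the staleness window is passed in because the source places it in dot_config observability.staleness.<surface>.

```python
from datetime import datetime, timedelta
from typing import Iterable

# Suffixes named in the freshness rules as valid resolution evidence.
RESOLVING_SUFFIXES = (".resolved", ".recovered", ".healed")


def is_stale(generated_at: datetime, staleness_window: timedelta, now: datetime) -> bool:
    """A summary is stale once it is older than its configured window."""
    return now - generated_at > staleness_window


def may_auto_resolve(source_event_types: Iterable[str]) -> bool:
    """Auto-resolve needs at least one real resolution event as evidence."""
    return any(t.endswith(RESOLVING_SUFFIXES) for t in source_event_types)
```

For example, `may_auto_resolve(["worker.recovered"])` passes the guard, while a list holding only an AI summary event does not; the event names here are illustrative.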
§7.5 Reconciling the raw PG-event document's observability emphasis
The raw Bắt sự kiện của PG.docx (via its KB consolidations) emphasizes: schema registry, distributed tracing, orchestrator/governance visibility, DLQ, ACK/NACK, idempotency. Each is already embedded and is now profile-bound:
| Raw-doc emphasis | Where embedded | Profile binding |
|---|---|---|
| schema registry / event validation | event_type_registry + event_validation_audit (03-… §3.1) | event_schema_validation_pass/fail metric (§7.2) |
| distributed tracing | W3C trace_id NOW (03-… §5.6) | trace_id/correlation_id coverage metric (§7.2) |
| DLQ / ACK-NACK / idempotency | 03-… §5.2 + §5.4 + §6 | DLQ depth + replay + idempotency_conflict metrics (§7.2) |
| orchestrator / governance visibility | governance cockpit (08-… §7) + problem queue (§4) | governance_problem_count, human-visible summary (§7.3) |
Sentinel (MP-D20). Every workflow/event/input/Kaizen/agent surface exposes the §7.2 metric set (or declares the metric not_yet_wired as a known gap, never silently absent). Human surfaces show summaries + drill-down only; raw event tail / queue payload / spans / prompts / bytes stay machine-only. Every summary carries generated_at; stale summaries are labelled; no auto-resolve without a source event.
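The "never silently absent" part of the sentinel could be checked with a small coverage test per surface. This is a sketch under assumptions: the status value "wired" and the shape of the declaration mapping are hypothetical; only not_yet_wired comes from the sentinel text.

```python
from typing import Iterable, List, Mapping

# "not_yet_wired" is from the sentinel; "wired" is an assumed counterpart.
VALID_STATUSES = frozenset({"wired", "not_yet_wired"})


def silently_absent(required: Iterable[str], declared: Mapping[str, str]) -> List[str]:
    """List required metrics a surface neither exposes nor declares as a gap."""
    return [m for m in required if declared.get(m) not in VALID_STATUSES]
```

A surface would pass the profile check only when this list is empty; a metric declared not_yet_wired is a visible known gap, not a failure.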
§8. MP-D21 — Raw PG-event document OSS/tool reconciliation under Assembly First
§8.1 Method
The raw PG-event document suggests a tool stack. Each is reconciled here under Assembly First (Điều 7) + Gate A (state-vocab fit) + Gate B (config-first fit) per 05-… §1, with a verdict label and an explicit "what it may NOT do". No final selection. No version pin. No implementation. This table extends 05-… and is mirrored there (05-… §7).
§8.2 Raw PG-event tool reconciliation table
| Tool | Verdict / label | May be used for | May NOT do (boundary) |
|---|---|---|---|
| Hasura | sandbox_reference_only OR reject_as_core_owner (L2 + L6) | Study its all-in-one event-trigger/subscription patterns | Own the core event plane; bypass the Directus/Nuxt/PG boundary (Điều 33 v2.1); become a second SoT or connect clients directly to PG |
| pg-boss / Graphile Worker | reject_as_primary_substrate_now; future_adapter_slot_preserved only if it maps states to Điều 45 lifecycle and owns no state vocabulary (L2 + L5 + L7) | Borrow PG-native queue patterns (priority/lease/backoff) implemented natively | Own the Incomex state vocabulary (9-state floor, Điều 45 §6.7); be the primary job substrate now |
| Benthos / Redpanda Connect | future_CDC_adapter_slot (L3 + L5 + L7) | External table mirroring, high-volume CDC, config-driven PG change capture (read from event_outbox, sink externally) | Be the domain-event SoT; bypass register-before-emit |
| NATS | future_transport_adapter (L4 + L7) | Multi-host worker fanout, pub/sub transport | Be an event SoT; subjects must map 1:1 to event_type_registry, messages carry event_outbox_id |
| Redis Streams | future_lightweight_stream_adapter only if Redis already operational (L4 + L7) | Lightweight stream transport when Redis is already in ops | Be the canonical audit/governance SoT |
| Temporal | future_execution_backend_adapter_after_triggers (L2 + L3 + L4) | Possibly a bounded execution engine post-Phase-6 if MOW saturates (Council-gated) | Own workflow logic/definition; MOW/PG registry stays the workflow-definition SoT |
| Camunda | reference_only or future human-workflow adapter (L2 + L6) | Reference BPMN/human-task patterns | Own approval; Điều 32 remains the approval owner |
| Airflow | batch/reference_only (L2 + L6) | Reference for future batch/data workloads separate from MOW | Be the MOW orchestrator |
| Watermill | reference_only or future glue library (L3 + L6) | Reference router/middleware patterns for a native worker library | Create any immediate dependency |
| OpenTelemetry / Jaeger / Tempo | future observability adapters (L4) | Consume the W3C trace_id stream once ubiquitous; trace UI | Become the trace SoT; PG audit (vw_audit_event_timeline) stays SoT |
| Prometheus / Grafana / Loki / VictoriaMetrics | future observability/dashboard adapter slots (L4 + L7) | Metrics scrape / dashboards / log view that reduce super-admin governance work | Replace PG governance state; governance_problem + event_outbox + dot_iu_command_run stay SoT |
§8.3 Universal conditions (all of the above)
Every adoption, if it ever happens, requires (per 05-… §5): Gate A + Gate B explicit verdict, an external_tool_registry SoT-pointback row, a documented reversibility/exit path, no double-ownership with Điều 32/35/38/39/45, Birth-registry registration (Điều 0-G), heartbeat for any process-class tool, and backend permission filter for any UI-exposed surface. No final OSS selection is made in this macro.
Sentinel (MP-D21). This addendum (and 05-… §7) mentions zero version numbers, zero CI steps, zero dockerfile lines. No tool is selected; each carries an explicit "may NOT" boundary preserving PG-first SoT + the law boundaries.
§9. MP-D22 — Governance Ops Survey + updated sequencing
§9.1 New survey macro
IU_4MOTHERS_GOVERNANCE_OPS_SURVEY_DOCUMENT_ONLY_*X — a read-only document-only macro (reads via mcp__claude_ai_Incomex_VPS__query_pg STABLE/SELECT only; writes need SSH per memory, and this survey writes nothing). It runs before any Governance Cockpit / AI-Agent Ops Console implementation.
§9.2 Survey targets
existing task-status queues · AI task queue / ai_tasks · agent_run / agent-history tables if any · Directus task collections · governance views/logs · worker-heartbeat data (queue_heartbeat) · event/problem categories · existing dashboard/ops module · existing prompt/task-dispatch module · existing approval/governance collections · existing audit/event timeline views · existing observability/logging tools on the VPS (Uptime Kuma, cron health checks per 5-layers.md).
§9.3 Survey output (per target)
A classification verified_live / KB_reported / legacy_trace / candidate_requires_survey / known_gap, plus a reuse/extend/create-paper recommendation, the risk of duplicate ownership, the law boundary owner, and a readiness verdict for cockpit implementation. Output as knowledge/dev/design/v0.6-iu-4mothers-event-foundation-rev2/<NN>-governance-ops-survey.md (paper-only).
Current best-knowledge seed for the survey (from KB, to be verified live):
- ai_tasks / agent_run are not confirmed live: historical AI dispatch went through an ops connector that returned a DISABLED/connector error (KB gpt-dispatch-attempt-agent-readonly-investigation-iu-current-position-2026-05-14.md), so treat as candidate_requires_survey.
- queue_heartbeat, event_outbox, job_queue, job_dead_letter are verified_live (Rev2 §12 / Điều 45).
- dot_iu_command_run is verified_live.
- M-002 Workflow module is KB_reported partial; a dedicated "M-005 task orchestration" artifact was not found in KB → candidate_requires_survey.
§9.4 Updated survey + phase sequencing
- Candidate Registry Survey (G7): field_registry / input_form_registry / output_table_registry / dot_function_registry (06-… §S16).
- Tier Registry Survey: T6/T5/T4/T3 sources (08-… §11 + §3.3 above).
- Governance Ops Survey: this macro (§9.2).
- Then decide Phase 0 / Phase 1 implementation order (06-… §S20), now informed by all three surveys.
Sentinel (MP-D22). No Governance Cockpit / AI-Agent Ops Console implementation macro runs before the Governance Ops Survey completes and classifies its targets. The three surveys precede the Phase 0/1 ordering decision.
§10. Existing infrastructure reuse table (decision per new concept)
For every concept introduced by MP-D16..MP-D22: reuse | extend | paper-only-new | survey-required | known-gap + reason. No new ownership.
| Concept | Decision | Substrate touched | Reason |
|---|---|---|---|
| Điều 5 ↔ T6→T1 mapping | reuse (documentation only) | law-05-five-tiers.md + 08-… §4 | No new table; a clarifying mapping + readiness matrix only. |
| Readiness matrix status | reuse | 5-layers.md build status + Rev2 §12 | Reads existing tier status; no mutation. |
| governance_problem queue | paper-only-new | reads job_dead_letter / queue_heartbeat / vw_governance_* / event_validation_audit / agent_run | No existing typed problem-lifecycle row; it is a triage layer ABOVE existing detection sources. |
| governance_problem_event_link | paper-only-new | links to event_outbox | Many-to-many provenance; no existing link table. |
| governance_problem_assignment | paper-only-new | none | Ownership history; additive. |
| governance_suppression_policy / governance_slo_policy | paper-only-new | reads dot_config | Config-first suppression/SLO; no existing registry. |
| incident grouping | reuse pattern | governance_problem.parent_problem_id | Grouping is a self-FK, not a new table. |
| change (remediation) | reuse | workflow_change_requests [VL] / generic proposal (06-… §S2) | Existing change substrate; problems spawn changes, never are changes. |
| Kaizen review lifecycle | extend | input_submission (08-… §3.2) + paper kaizen_lifecycle_state | Extends the staging row; no new table. |
| Kaizen duplicate detector | paper-only-new (fn) | reads input_submission + proposal | fn_kaizen_duplicate_cluster STABLE; no mutation. |
| Kaizen anti-noise controls | extend (config) | dot_config kaizen.* | Config-first; no hardcode. |
| Kaizen metrics | extend | vw_governance_kaizen_* (08-… §7.5) | Extends existing Kaizen panel views. |
| Direct canonicalization gate | extend | input_routing_policy (08-… §3.3) + worker.canonicalizer | Hardens existing gateway routing; no new owner. |
| Data lineage (staging↔canonical) | extend | input_submission.canonical_target_ref + canonical source_staging_id where permitted + dot_iu_command_run audit | Reuses existing audit join; adds back-link column where the table allows. |
| Retention/security policy | extend (config) | dot_config retention.* + input_submission.security_classification | Config-first; one paper column on staging. |
| Minimum Observability Profile | reuse + bind | event_validation_audit / fn_event_lag_compute / job_queue / job_dead_letter / queue_heartbeat / idempotency_registry / agent_run / governance_problem | All metrics derive from existing/paper sources; the profile is a binding contract, not new substrate. |
| Observability adapter slots | candidate slots only | external_tool_registry (06-… §S18) | Per 05-… Gate A + B; no final pick. |
| Raw PG-event tool reconciliation | reuse (documentation) | 05-… verdicts | Extends 05-… §3-4 table; no new substrate. |
| Governance Ops Survey | survey-required | reads existing queues/tables read-only | Per memory: MCP query_pg is read-only; survey writes nothing. |
Especially-check infra (macro §4.1): event_outbox (reuse — detection events), event_type_registry (reuse — register-before-emit for any new *.resolved events), job_queue (reuse — queue-depth metric), job_dead_letter (reuse — dlq problem class), queue_heartbeat (reuse — silent_worker class), workflow_change_requests (reuse — change/remediation), tasks/task_comments (reuse — direct-canonicalization targets), ai_tasks/agent-run (survey-required — not confirmed live), dot_iu_command_run (reuse — lineage audit), iu_lifecycle_log (reuse — IU-scope audit), Directus collections (reuse — admin/triage views only, Điều 33 v2.1), M-002 workflow governance (reuse — KB_reported partial), M-005 task orchestration (survey-required — not found in KB). No row introduces double-ownership.
§11. Law / no-double-ownership matrix per MP (MP-D16..MP-D22)
Format: owner law/principle | what patch does | what patch must NOT do | sentinel test.
MP-D16 (Điều 5 ↔ T6→T1 mapping)
| Owner | Patch does | Patch must NOT | Sentinel |
|---|---|---|---|
| Điều 5 (architecture tiers) | Maps T6→T1 onto five tiers + readiness matrix | Conflate the two axes; build Tầng 4/5 on weak Tầng 1/2/3 | readiness matrix blocks any surface whose lower-tier substrate is paper/survey/gap |
| Future Điều XX (4 Mothers app layer) | Names T6→T1 as application-layer operating hierarchy | Enact law text | referenced as future referent only |
MP-D17 (Governance problem queue)
| Owner | Patch does | Patch must NOT | Sentinel |
|---|---|---|---|
| Điều 37 v3.3 (governance org) | Typed problem queue + ownership + permission filter | Bypass permission for super-admin | same backend predicate for all roles |
| Điều 32 (approval) | Waiver of high/critical needs approval_id; remediation = change/proposal | Auto-mutate from a problem row | problem rows carry no mutation payload; remediation references separate change + approval |
| Điều 31 (audit) | Every lifecycle transition + grouping audited; reopen additive | Delete prior lifecycle history | reopened links via correlation_id, never erases |
| Điều 45 (event substrate) | Detection reads event_outbox/job_dead_letter/queue_heartbeat | Re-detect with a new substrate | problem classes map to existing detection sources (§4.2) |
MP-D18 (Kaizen anti-noise)
| Owner | Patch does | Patch must NOT | Sentinel |
|---|---|---|---|
| Điều 28 (Nuxt input shell) | Keeps five-click UX; anti-noise in backend | Add user-facing complexity | user-facing strings unchanged; zero internal vocabulary |
| Điều 32 (approval) | Kaizen → proposal → Điều 32 merge | Auto-apply Kaizen | every workflow-branch Kaizen has proposal_state then approval_id |
| Điều 38/39 (IU) | Kaizen targets IU only via proposal | Mutate IU body from staging | IU edits route through author lifecycle |
| Hiến pháp v4.6.3 (config-first) | Rate-limit/rejection reasons in dot_config | Hardcode anti-noise rules | functions read config, never embed thresholds |
MP-D19 (Direct canonicalization)
| Owner | Patch does | Patch must NOT | Sentinel |
|---|---|---|---|
| Hiến pháp v4.6.3 (PG-first / no hidden SoT) | Allow-listed direct writes only to non-structural canonical artifacts | Promote staging to SoT | staging never basis for canonical reads |
| Điều 35 (DOT) | Governance-relevant direct writes via DOT pair | Bypass DOT for registry/IU writes | every governance-relevant direct write has dot_iu_command_run |
| Điều 32 (approval) | Approval-required effects forced to workflow branch | Direct-write an approval-required change | deny conditions (§6.3) force workflow branch |
| Điều 30 (reversibility) + Điều 31 (audit) | Lineage + retention + correction model | Non-reversible/unaudited direct write | every direct write has lineage pair + retention + classification |
MP-D20 (Minimum Observability Profile)
| Owner | Patch does | Patch must NOT | Sentinel |
|---|---|---|---|
| Điều 45 (event/queue/heartbeat) | Metrics derive from event_outbox/job_queue/heartbeat | Mutate substrate to observe | observability reads only; emits events through standard producers |
| Điều 31 (audit) | Summaries carry generated_at + evidence; drill-down to vw_audit_event_timeline | Surface raw event tail | no raw outbox view; machine-only raw |
| Điều 37 v3.3 (governance org) | Human-visible split permission-filtered | Bypass permission | same backend predicate |
| Hiến pháp v4.6.3 (no hidden SoT) | AI summary never replaces evidence; no auto-resolve without source event | Let a summary be the SoT | MP-D9 rule enforced (§4.6 / §7.4) |
MP-D21 (OSS reconciliation)
| Owner | Patch does | Patch must NOT | Sentinel |
|---|---|---|---|
| Điều 7 (Assembly First) | Reconciles each tool to verdict + adapter label | Select/pin/implement any tool | zero version/CI/dockerfile lines |
| Hiến pháp v4.6.3 (PG-first / no hidden SoT) | SoT-pointback required for any future adoption | Let any tool own event/workflow/approval/state | each tool has explicit "may NOT" boundary |
| Điều 32 / Điều 45 / Điều 38-39 | Camunda≠approval; Temporal≠workflow def; tools≠event SoT | Blur ownership | verdict rows name the owner law each tool may not displace |
MP-D22 (Governance Ops Survey)
| Owner | Patch does | Patch must NOT | Sentinel |
|---|---|---|---|
| Design process / Điều 7 | Adds read-only survey before cockpit build | Implement or mutate during survey | survey is SELECT-only; writes nothing |
| Điều 37 v3.3 | Surveys existing governance collections for reuse | Create duplicate ownership | survey flags duplicate-ownership risk per target |
No matrix row introduces double-ownership; future Điều XX is the only NEW concern referent, unchanged from Master Design Rev2 §11.
§12. IU ↔ PG ↔ queue/event relation review (binding)
| Question | Answer |
|---|---|
| Do governance problems duplicate IU body? | No. governance_problem rows carry refs (primary_entity_ref, event ids, trace_id) + impact JSON; never IU body bytes. |
| Does Kaizen anti-noise mutate IU directly? | No. Duplicate hashes are over staging suggestion text; approved IU changes route through Điều 38/39 author lifecycle producing a new iu_version (§5.6). |
| Does direct canonicalization mutate IU? | No. Direct branch is denied for any IU-body change (§6.3); IU-affecting input reroutes to workflow branch + Điều 38/39. |
| Do staging/proposal rows carry trace/correlation? | Yes. input_submission + proposal + governance_problem all carry trace_id + correlation_id (W3C MP-D1); vw_audit_event_timeline(trace_id) reconstructs the path. |
| Does the queue/event carry body? | No. Canonicalizer/transcription/indexer queue rows carry staging_id + executor_class_ref + idempotency key + W3C trace; all input.* and governance events obey the MP-D8 deny-list. |
| Is PG canonicalization explicit? | Yes. Governance-relevant direct writes use DOT pair (Điều 35); the canonical row stores source_staging_id where permitted; audit row joins staging ↔ canonical. |
| Does usage evidence learn without auto-mutating? | Yes. governance_problem + Kaizen metrics + direct-rejection rates feed iu_usage_evidence signals (03-… §9) → KG feedback proposals only; never auto-mutate IU/registry (00-… §3 inv 14). |
| Is the vector boundary intact? | Yes. Nothing here touches iu_vector_*; iu_vector_sync_enabled=false respected; no cross-IU pollution (00-… §3 inv 20). |
| Does the cockpit/observability mutate substrate? | No. Observability reads only; problem resolution requires a real *.resolved event (§4.6); cockpit actions (DLQ replay, suppression, waiver) emit through standard producers + Điều 32. |
All answers preserve 00-… §3 invariants 1, 6, 7, 11, 14, 20 verbatim.
§13. Remaining gaps / open decisions
| Gap | Type | Resolution path |
|---|---|---|
| Tier sources T6/T5/T4/T3 (existing vs paper?) | survey gap | Tier Registry Survey (08-… §11 + §3.3) before Phase 2; multi-domain infra itself is known_gap (5-layers TD-086). |
| ai_tasks / agent_run table existence + shape | survey gap | Governance Ops Survey (§9); not confirmed live, treat as candidate_requires_survey. |
| M-005 task orchestration artifact | survey gap | Not found in KB; Governance Ops Survey confirms whether it exists. |
| governance_problem + suppression/SLO registries | paper-only, Phase 1/2 | Tracked in 06-… §S18 (patched) + §S23. Each needs Birth row (Điều 0-G). |
| kaizen_lifecycle_state + fn_kaizen_duplicate_cluster | paper-only, Phase 2 | Extends input_submission; Phase-2 DDL + function. |
| Direct-canonicalization data-quality functions + security_classification column | paper-only, Phase 1/2 | Extends input_submission + worker.canonicalizer. |
| Retention/PII config defaults | paper, Phase 1 | dot_config retention.* defaults tunable per data domain. |
| State machine registry not yet deployed (Tầng 3 gap) | known gap | G6/OD9 Phase 1 — blocks T2/T1 runtime in the readiness matrix (§3.3). |
| OSS observability/transport tool selection | open | Per 05-… Gate A + B + Council; no pick in this macro. |
| OD1 (Điều 34) | Council | Unchanged per MP-D10; Master Design Rev2 (incl. this addendum) does not depend on OD1. |
None of these blocks Master Design Rev2 approval. Each is paper-only or survey-only forward work. No new Council fork is introduced (every MP-D16..D22 item resolves within existing law boundaries + future Điều XX referent).
§14. Cross-document anchors (this addendum points back to)
- Top-level invariants + forbidden compliance + Điều 5 vs T6→T1 clarification: 00-master-design-rev2.md §3 (inv 27..33) + §3a + §12.
- Governance problem lifecycle + cockpit queue semantics: 02-step-state-machine-and-workflow-ui-design.md §7.10.
- Minimum Observability Profile + raw-doc reconciliation: 03-event-5layer-realtime-dlq-design.md §12.
- Direct canonicalization guardrails + Kaizen anti-noise + input lineage: 04-iu-centered-4mothers-binding-design.md §4.3b.
- Raw PG-event tool reconciliation table: 05-oss-candidate-strategy-rev2.md §7.
- Readiness items + Governance Ops Survey + sequencing: 06-open-decisions-and-readiness.md §S16 + §S20.1 + §S23.
- Rev3 addendum (MP-D11..MP-D15) this builds on: 08-bidirectional-input-kaizen-governance-addendum.md §14.
- Revision-4 patch log: 07-master-design-rev2-report.md §15.
§15. Acceptance for this addendum
A1. MP-D16 maps T6→T1 onto Điều 5 five tiers with the two axes clearly distinguished §3.1-§3.2; readiness matrix per surface → required substrate → status → blocker §3.3.
A2. MP-D17 governance problem taxonomy (16 classes §4.2) + 5 severities §4.3 + 12+3 lifecycle states §4.4 + controls §4.5 + ack≠resolution / mitigation≠verification + auto-resolve discipline §4.6 + paper registry shapes §4.7.
A3. MP-D18 Kaizen review lifecycle §5.2 + duplicate dimensions §5.3 + anti-noise controls §5.4 + metrics §5.5; user flow unchanged (five clicks, backend-only complexity) §5.1.
A4. MP-D19 direct canonicalization allow §6.2 / deny §6.3 conditions + lineage §6.4 + retention/security §6.5 + IU/PG/queue relation §6.6.
A5. MP-D20 Minimum Observability Profile required machine metrics §7.2 + human-visible vs machine-only split §7.3 + freshness rules §7.4 + raw-doc reconciliation §7.5.
A6. MP-D21 raw PG-event tool reconciliation table §8.2 with verdict + adapter label + "may NOT" boundary per tool; universal SoT-pointback conditions §8.3; no final pick.
A7. MP-D22 Governance Ops Survey macro §9.1 + targets §9.2 + output §9.3 + updated three-survey sequencing §9.4.
A8. Existing infra reuse decision per concept §10 — every concept marked reuse | extend | paper-only-new | survey-required | known-gap with reason; especially-check infra list covered.
A9. Law / no-double-ownership matrix per MP §11 — sentinel test on each row; no double-ownership introduced.
A10. IU↔PG↔queue relation review §12 — preserves invariants 1, 6, 7, 11, 14, 20.
A11. No user-facing Kaizen complexity added §5.1 + I33.
A12. No final OSS tool selection §8.3 + I36; zero version/CI/dockerfile.
A13. Forbidden compliance: no PG mutation, no Directus mutation, no Qdrant/vector write, no migration, no DOT command run, no law enactment, no implementation, no UI deployment, no final OSS tool pick, no dot_config gate change, no schema creation, no code generation. All schemas paper-only.
End addendum.