KB-21D6 rev 3

Master Design Rev2 — 09 Governance/Operability/Observability Addendum (rev4 MP-D16..MP-D22)


Master Design Rev2 — Governance / Operability / Observability Second-Order Hardening (Addendum)

Path: knowledge/dev/design/v0.6-iu-4mothers-event-foundation-rev2/09-governance-operability-observability-addendum.md

Status: DRAFT Rev4 patch addendum (DOCUMENT ONLY). Companion to 00-master-design-rev2.md and 08-bidirectional-input-kaizen-governance-addendum.md. Materializes patch pack MP-D16..MP-D22. Cross-law-bound by 10-… (rev5 MP-D23..MP-D30); see §4.8: governance_problem and ops objects are born under Điều 36/0-G/29 and jurisdiction-bound under Điều 37; the cockpit/AI-ops surfaces are Điều 28 templates.

Date: 2026-05-28 · cross-linked 2026-05-28 (rev5 10-…)

Authority: Macro IU_4MOTHERS_MASTER_DESIGN_GOVERNANCE_SECOND_ORDER_PATCH_DOCUMENT_ONLY_3000X. Built strictly on Rev2 brief authority + Master Design Rev2 invariants (00-… §3) + Rev3 addendum invariants (08-… §2). Driven by GPT review knowledge/dev/reports/architecture/iu-4mothers-master-design-rev3-gpt-review-gaps-mp-d16-d20-2026-05-28.md (verdict REV3_STRONG_BUT_NEEDS_SECOND_ORDER_GOVERNANCE_PATCH). No new law surface introduced. Future framework law referent = future Điều XX (4 Mothers application layer); Điều 34 cited only as decision-path (per MP-D10).

Boundary: Hiến pháp v4.6.3 (PG-first / DOT 100% / no hardcode / no hidden SoT), Điều 5 (Kiến trúc 5 tầng, the five-tier architecture: do not build an upper tier on a weak lower tier), Điều 7 (Assembly First: reuse before build, OSS as adapter only), Điều 28 (Nuxt render/input shell), Điều 30 (reversibility), Điều 31 (integrity/audit), Điều 32 (approval/governance), Điều 33 v2.1 (3-layer / Nuxt never reads PG), Điều 35 (DOT governance for mutation), Điều 37 v3.3 (governance org + permission filter), Điều 38/39 (IU/KG ownership, no cross-IU vector pollution), Điều 45 v1.0 (event/queue/executor/heartbeat/state-machine boundary). All Master Design Rev2 invariants (1..26) and Rev3 addendum invariants (I21..I30) are preserved verbatim.

Forbidden in this macro (binding): No PG mutation. No Directus mutation. No Qdrant/vector write or reindex. No migration. No DOT command run. No law enactment / drafting. No implementation macro. No UI deployment. No final OSS tool selection. No raw SQL apply. No dot_config gate change. No schema creation. No code generation. No gate change. Every schema/table/view/function listed below is paper-only unless it is already verified live.


§1. Why this addendum exists

Master Design Rev2 Revision 3 (08-…, MP-D11..MP-D15) widened the design to cover bidirectional input flow, the unified MOW hierarchy canvas T6→T1, simple Kaizen intake, the MOT/JFT matrix, and the four separated UI surfaces. GPT review confirmed all five MP-D11..MP-D15 patches PASS and that Assembly First + no-production-mutation are preserved.

The same review found second-order gaps — points that are not wrong in Rev3, but become operationally ambiguous or unsafe at company scale (20 000+ concurrent items, thousands of Kaizen submissions, multi-department hierarchy). This addendum is a hardening patch, not a rewrite. It closes seven gaps:

  1. MP-D16 — T6→T1 business operating hierarchy is not yet explicitly mapped to the Điều 5 five-tier architecture (Hạ tầng / Cơ sở / Modules / Chuyên môn / Giám sát). The two tier systems are different axes and must not be confused.
  2. MP-D17 — The Governance Cockpit has panels (08-… §7.3) but not yet a formal operations problem-queue taxonomy + lifecycle suitable for 20 000+ items.
  3. MP-D18 — Kaizen intake (08-… §5) is simple for the user, but needs a stronger backend anti-noise / duplicate-control / review lifecycle so it does not become noise at scale.
  4. MP-D19 — The direct/no-op canonicalization branch (08-… §3.3) is correct but needs stricter data-quality / lineage / retention / abuse guardrails.
  5. MP-D20 — Observability is mentioned across Rev3, but needs a compact Minimum Observability Profile that holds from day 1 with a clear human-visible vs machine-only split.
  6. MP-D21 — The raw uploaded PG-event document's tool suggestions (Hasura, pg-boss/Graphile, Benthos, NATS, Redis, Temporal, Camunda, Airflow, Watermill, OTel/Jaeger, Prometheus/Grafana/Loki) need an explicit reconciliation under Assembly First.
  7. MP-D22 — The next survey sequence must include a Governance Ops Survey (not only Candidate Registry Survey + Tier Registry Survey) before cockpit / agent-ops implementation.

This addendum is mandatory reading alongside 00, 02, 03, 04, 05, 06, and 08. The patch text in those files cross-links here.

Source-access note (per macro §1). The raw uploaded files Bắt sự kiện của PG.docx and 4 mẹ mở rộng.txt are not directly accessible as files in this working environment. This addendum relies on their KB consolidation/recheck sources, which ARE accessible: knowledge/dev/reports/architecture/iu-4mothers-master-design-rev3-gpt-review-gaps-mp-d16-d20-2026-05-28.md, knowledge/dev/reports/architecture/iu-4mothers-event-foundation-gpt-recheck-after-drive-upload-2026-05-27.md, and knowledge/dev/design/assembly-first-open-source-integration-critique.md. Where this addendum reconciles raw-doc tool suggestions (MP-D21) it does so against those consolidations and the Rev2 §6.6 nine-lesson checklist; it does not pretend to have read the raw .docx. If Council demands a direct raw audit, 06-… §S17 already routes it.


§2. Top-line invariants added (extending 00-… §3 and 08-… §2)

These are additive — they relax or replace none of the existing 26 invariants in 00-… §3 nor the I21..I30 in 08-… §2. They continue the 08-… "I" sequence as I31..I37 (and correspond to 00-… §3 invariants 27..33, the condensed forms).

  • I31. Two orthogonal tier systems, never conflated (MP-D16). The Điều 5 five-tier architecture (Tầng 1 Hạ tầng → Tầng 5 Giám sát) is an architecture / building-layer axis. The T6→T1 operating hierarchy (Lĩnh vực → Task) is a business operating axis. They are different axes and must not replace each other. T6→T1 content lives mostly in Tầng 4 (Chuyên môn) + Tầng 5 (Giám sát) and depends on Tầng 1/2/3 readiness. No implementation may build a Tầng 4/5 feature when its required Tầng 1/2/3 substrate is missing (Điều 5 §2: never build an upper tier on a weak lower tier). (MP-D16.)
  • I32. Governance problems are a typed, lifecycled operations queue (MP-D17). Every operational problem is a typed governance_problem row with a problem-class, severity, ownership, an explicit lifecycle (detected→…→closed/reopened/suppressed/waived), dedupe + grouping + suppression + SLA/SLO clock. Acknowledgement ≠ resolution; mitigation ≠ verification. Suppression/waiver requires policy and may require Điều 32 approval. Auto-resolve requires a source *.resolved/*.recovered/*.healed event (MP-D9). (MP-D17.)
  • I33. Kaizen anti-noise lives in the backend, never in the user flow (MP-D18). The five-click user flow (08-… §5.1) is unchanged. Duplicate detection, clustering, rate-limit, spam/abuse flags, low-quality rejection, merge-with-credit, and the review lifecycle (received→…→archived) are backend/governance concerns. Ordinary staff UI gains zero new complexity. A rejected submission always returns a user-friendly reason; merged duplicates still credit the contributor as a supporter. (MP-D18.)
  • I34. Direct canonicalization is allow-listed, audited, lineaged, and reversible (MP-D19). A staging row may take the direct branch only when every allow condition holds (permission, allowed target kind, schema-valid, size-within-limit, no structural change, no law/registry/IU-body mutation, no approval-required effect, idempotency key present, audit enabled, retention policy attached, PII/security classification checked). Any deny condition forces the workflow branch or refusal. Every direct write records lineage (staging_id → canonical_target_ref), preserves a rejection reason on refusal, and is correctable by its own table's reversibility contract. (MP-D19.)
  • I35. Minimum Observability Profile holds from day 1 (MP-D20). Every workflow/event/input/Kaizen/agent surface emits the minimum machine-metric set (schema-validation pass/fail, trace/correlation coverage, event lag p50/p95/p99, queue depth, lease timeout, ACK/NACK, retry, DLQ depth + replay outcome, idempotency conflict, heartbeat freshness, silent-worker count, governance-problem count by class/severity, Kaizen duplicate/noise rate, direct-canonicalization rejection rate, audio transcription failure/confidence, AI/agent task status, top blocked / cannot_complete clusters). Human-visible surfaces show problem summary / severity / owner / affected T6→T1 path / impact / age-SLA / recommended next action / drill-down / evidence-backed AI summary. Raw event tail, raw queue payload, raw spans, raw prompt/output, raw audio bytes are machine-only by default. Every summary carries generated_at; stale summaries are labelled stale; AI summary never replaces evidence and never auto-resolves without a source event. (MP-D20.)
  • I36. OSS tools remain adapters with SoT-pointback, never core owners (MP-D21). Every raw-doc-suggested tool (Hasura, pg-boss/Graphile, Benthos, NATS, Redis, Temporal, Camunda, Airflow, Watermill, OTel/Jaeger/Tempo, Prometheus/Grafana/Loki/VictoriaMetrics) is reconciled to a verdict + label via Gate A (state-vocab fit) + Gate B (config-first fit); none may own the core event plane, workflow definition, approval, or any state authority; each, if ever adopted, requires an external_tool_registry SoT-pointback row. No final pick, no version pin, no implementation here. (MP-D21 + extends 00-… §3 inv 15.)
  • I37. Governance Ops Survey precedes cockpit/agent-ops implementation (MP-D22). Before any Governance Cockpit or AI/Agent Ops Console implementation, a read-only IU_4MOTHERS_GOVERNANCE_OPS_SURVEY_DOCUMENT_ONLY_*X macro surveys existing task-status queues, AI-task/agent-run tables, Directus task collections, governance views/logs, worker-heartbeat data, event/problem categories, existing dashboard/ops modules, prompt/dispatch modules, approval/governance collections, audit/event timeline views, and VPS observability tooling — classifying each verified_live/KB_reported/legacy_trace/candidate_requires_survey/known_gap with a reuse/extend/create recommendation. The survey sequence becomes: (1) Candidate Registry Survey (G7), (2) Tier Registry Survey, (3) Governance Ops Survey, then Phase 0/1 ordering decision. (MP-D22.)

These seven invariants are binding on every WS file and on every future macro that touches tier mapping, governance operations, Kaizen, input canonicalization, observability, or OSS adoption.
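The I34 allow-list can be read as an "every condition must hold, otherwise fall back" check. The following is a minimal illustrative sketch only, not part of the paper design: the key names in `ALLOW_CONDITIONS` and the function `choose_branch` are hypothetical, and the choice between the workflow branch and outright refusal is left to the caller because this section does not specify it.

```python
# The eleven allow conditions listed in I34; the key names are illustrative.
ALLOW_CONDITIONS = (
    "permission",
    "allowed_target_kind",
    "schema_valid",
    "size_within_limit",
    "no_structural_change",
    "no_law_registry_iu_body_mutation",
    "no_approval_required_effect",
    "idempotency_key_present",
    "audit_enabled",
    "retention_policy_attached",
    "pii_security_classification_checked",
)


def choose_branch(checks):
    """Return ("direct", []) only if every allow condition is True.

    A missing key counts as a failed condition, so the check fails closed.
    """
    failed = [name for name in ALLOW_CONDITIONS if checks.get(name) is not True]
    if failed:
        return "workflow_or_refuse", failed
    return "direct", []


all_ok = {name: True for name in ALLOW_CONDITIONS}
print(choose_branch(all_ok))
print(choose_branch({**all_ok, "audit_enabled": False}))
```

Returning the list of failed conditions keeps the rejection reason available to the caller, in line with the requirement that a refusal preserves its reason.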


§3. MP-D16 — Map T6→T1 operating hierarchy onto the Điều 5 five-tier architecture

§3.1 The two tier systems are different axes

Rev3 added the T6→T1 business operating hierarchy (08-… §4.1). The project also has the Điều 5 five-tier architecture (knowledge/dev/laws/law-05-five-tiers.md, detailed in knowledge/dev/architecture/5-layers.md). These are NOT the same axis:

| Axis | What it organizes | Direction | Owner |
|---|---|---|---|
| Điều 5 five-tier architecture | How the system is built: infrastructure → supervision | Tầng 1 (bottom) → Tầng 5 (top) | Điều 5 (architecture law) |
| T6→T1 operating hierarchy | How the business operates: domain → task | T6 (broad) → T1 (atomic work) | future Điều XX (4 Mothers application layer) referent; MOW canvas surface (08-… §4) |

Binding clarification:

  • T6→T1 is a business operating hierarchy (Lĩnh vực [domain] / Công ty [company] / Phòng ban [department] / Chuyên môn [specialty] / Workflow / Task). It answers "where in the organization does this work live?"
  • The Điều 5 five-tier model is an architecture / building-layer model. It answers "what substrate must exist before this feature can run?"
  • They are not interchangeable and must not replace each other. A card at MOW tier T4 (Phòng ban) is a business node; it is rendered by code that lives in Điều-5 Tầng 3 (Modules), reads registries in Tầng 2 (Cơ sở), runs on Tầng 1 (Hạ tầng), and is monitored from Tầng 5 (Giám sát).

§3.2 Điều 5 five-tier definitions (canonical, from law-05-five-tiers.md)

Tầng 5: GIÁM SÁT + CẢI TIẾN (supervision + improvement): detect out-of-sync state, auto-fix, improvement loops (2 engines)
Tầng 4: CHUYÊN MÔN (specialty; the destination): real business processes
Tầng 3: MODULES NỀN TẢNG (foundation modules): Table, Comment, Workflow, CI/CD (reusable modules)
Tầng 2: CƠ SỞ (base; the raw materials): Registries, Metadata, DOT, Fields, Taxonomy
Tầng 1: HẠ TẦNG (infrastructure): VPS, Docker, PG, Directus, Nuxt, Agent Data, Qdrant

Mapping the design package's concepts onto the five tiers:

| Điều 5 tier | Holds (this design's concepts) |
|---|---|
| Tầng 1: Hạ tầng | VPS / Docker / PostgreSQL 16 / Directus / Nuxt / Agent Data / Qdrant boundary. The backend input gateway service (08-… §3.1) and the realtime gateway service (03-… §7) are Tầng-1 runtime processes. |
| Tầng 2: Cơ sở | All registries + metadata + DOT + fields + taxonomy: event_type_registry, state_machine_registry, executor_class_registry, field_registry/input_form_registry/output_table_registry/dot_function_registry (CRS), tier_registry, task_template/assignee_policy/deadline_policy/escalation_policy, input_routing_policy, external_tool_registry, governance_problem/governance_slo_policy/governance_suppression_policy (this addendum). DOT command catalog. |
| Tầng 3: Modules nền tảng | The reusable modules: Table (M-003), Comment (M-001), Workflow (M-002), plus the 4 Mothers modules (MOW / MOT / MOIT / MOUT) as application-platform modules, the realtime gateway abstraction, the canonicalizer/transcription/attachment workers, and the governance UI components. These are the building blocks Tầng 4 assembles. |
| Tầng 4: Chuyên môn (đích đến) | The real business workflows + T6→T1 operating-hierarchy content: actual SOPs, department missions, workflow definitions, task instances. T2 (Workflow) and T1 (Task) of the operating hierarchy are Tầng-4 content; T6..T3 (Lĩnh vực/Công ty/Phòng ban/Chuyên môn) are the Tầng-4 classification context. |
| Tầng 5: Giám sát + cải tiến | Governance Cockpit, AI/Agent Ops Console, Kaizen improvement loops, SLO/SLA, the governance_problem operations queue, the Minimum Observability Profile, usage-evidence learning. The heatmaps T6→T1 in the cockpit (08-… §7.3 item 4) are a Tầng-5 view over Tầng-4 operating content. |

Where T6→T1 lives. T6→T1 content is mostly Tầng 4 (operating workflows/tasks) and Tầng 5 (its supervision + improvement), but it depends on Tầng 1/2/3 readiness: the MOW canvas (a Tầng-3 module) cannot render T6→T1 cards without the Tầng-2 tier_registry + classification rows, the Tầng-1 PG/Nuxt substrate, and the Tầng-3 state-machine/workflow modules being live.

§3.3 Readiness matrix (per T6→T1 surface → required Điều 5 substrate → status → blocker)

Status vocabulary: verified_live / KB_reported / paper_only / survey_required / known_gap. Current Điều 5 build status (from law-05-five-tiers.md §3 + 5-layers.md): Tầng 1 verified_live (stable); Tầng 2 verified_live (138 collections, 27 registries, 108 DOT tools, 17 realtime triggers, verify_counts()=0); Tầng 3 partial (KB_reported: M-001 Comment commercial, M-003 Table live, M-002 Workflow Phase 2A done / Phase 2B paused, no state machine deployed yet, M-004 Auto-Tester SSOT-only); Tầng 4 not started (correct per Điều 5 ordering); Tầng 5 partial (KB_reported: Điều 30/31/26 enacted; unified sync monitor + self-healing still gap).

| T6→T1 surface (MOW canvas + JFT) | Required Điều 5 lower-layer substrate | Status | Blocker if missing |
|---|---|---|---|
| T6 Lĩnh vực card grid | Tầng 2 tier_registry + domain classification rows (extend workflow_categories category_kind='domain' OR new) | survey_required | Tier Registry Survey (08-… §11) must confirm existing vs paper. No T6 render until tier source confirmed. |
| T5 Công ty card grid | Tầng 2 company/tenant rows (existing tenant table? survey) | survey_required | Same survey; multi-domain infra itself is known_gap (5-layers TD-086). |
| T4 Phòng ban card grid | Tầng 2 department rows + Tầng 3 permission filter (Điều 37 v3.3) | survey_required | Department registry shape unconfirmed; permission predicate must exist. |
| T3 Chuyên môn card grid | Tầng 2 specialty rows + Tầng 3 classification | survey_required | Specialty registry shape unconfirmed. |
| T2 Workflow card / Standard+Runtime Process View | Tầng 3 Workflow Module (M-002) + workflows/workflow_steps/workflow_step_relations + state_machine_registry | KB_reported (M-002 Phase 2A done; Phase 2B paused; state machine paper-only) | State machine registry (G6 / OD9) is paper_only; long-workflow UI Phase 5. T2 runtime depends on these. |
| T1 Task card / MOT-JFT envelope | Tầng 3 MOT module + tasks/task_checkpoints/task_comments + task_template/assignee_policy/deadline_policy/escalation_policy + state_machine_registry + executor_class_registry + CRS (MOIT/MOUT) | KB_reported (tasks tables live) + paper_only (template/policy registries) + CRS-gated (MOIT/MOUT) | Mass JFT generation blocked until template+policy registries land (Phase 2) AND G7 CRS closes (MP-D7). |
| T0 Field (atomic; NOT a tier) | Tầng 2 field_registry [CRS row 28] | survey_required (CRS) | MP-D7 sentinel: no executable reference by name until VL or shape-adapter. |
| Governance Cockpit (Tầng 5 view over T6→T1) | Tầng 1/2/3 all of the above + governance_problem queue (this addendum) + vw_governance_* + observability profile | paper_only (cockpit + problem queue) on KB_reported/verified_live substrate | Governance Ops Survey (MP-D22) must run before cockpit implementation; cockpit cannot precede the substrate it aggregates. |

Sentinel (MP-D16). No Phase-2 macro may schedule a T6→T1 surface whose required Tầng 1/2/3 substrate row in this matrix is paper_only / survey_required / known_gap without first landing (or surveying) that substrate. The MOW canvas (Tầng-3 module) and the Governance Cockpit (Tầng-5 view) are explicitly not the same axis as the T6→T1 business hierarchy they display.
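The MP-D16 sentinel amounts to a gate over the readiness matrix: a surface is schedulable only if none of its required substrates is in a blocking status. A minimal sketch of that reading follows, for illustration only; `Substrate`, `blockers`, and `can_schedule` are hypothetical names, and the example rows are a simplified paraphrase of the T2 row rather than live data.

```python
from dataclasses import dataclass

# Statuses that block scheduling, per the MP-D16 sentinel.
BLOCKING_STATUSES = {"paper_only", "survey_required", "known_gap"}


@dataclass(frozen=True)
class Substrate:
    name: str
    tier: int    # Điều 5 tier the substrate belongs to (1..3)
    status: str  # one of the §3.3 status vocabulary values


def blockers(required):
    """List every required substrate whose status blocks scheduling."""
    return [
        f"Tầng {s.tier} {s.name}: {s.status}"
        for s in required
        if s.status in BLOCKING_STATUSES
    ]


def can_schedule(required):
    """A surface may be scheduled only when it has no blocking substrate."""
    return not blockers(required)


# Simplified version of the T2 Workflow row.
t2_workflow = [
    Substrate("Workflow Module (M-002)", 3, "KB_reported"),
    Substrate("state_machine_registry", 2, "paper_only"),
]
print(can_schedule(t2_workflow))
print(blockers(t2_workflow))
```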


§4. MP-D17 — Governance problem queue taxonomy and lifecycle

§4.1 The problem of "panels without a queue"

Rev3 (08-… §7.3) gives the Governance Cockpit eleven panels and 02-… §7 gives problem-first views. At 20 000+ items those are displays, not an operations governance model. MP-D17 adds the operations layer: a typed, owned, lifecycled, deduplicated, SLA-clocked problem queue — the industry incident/problem/change separation applied to this system.

Incident / problem / change separation (binding):

  • Problem = a governance_problem row (a detected operational condition). This addendum owns the problem concept.
  • Incident = a grouping of related problems sharing a root cause (a cluster). Represented by the grouped lifecycle state + a parent governance_problem row of class *_cluster; NOT a separate table.
  • Change = a remediation that mutates the system → ALWAYS a workflow_change_requests (workflow) or generic proposal (non-workflow) row (existing, 02-… §8 / 06-… §S2), gated by Điều 32. A problem may spawn a change, but a problem is never itself a change. Sentinel: governance_problem rows never carry mutation payload; remediation always references a separate change/proposal row + approval_id.

§4.2 Problem classes

governance_problem.problem_class vocab (paper, lives in dot_config vocab.governance_problem_class.*):

dlq · silent_worker · event_lag_breach · schema_validation_failure · idempotency_conflict · overdue_cluster · blocked_escalated · cannot_complete_cluster · failed_cut · orphan_workflow · kaizen_noise_spike · ai_agent_failure_cluster · permission_anomaly · data_quality_warning · direct_canonicalization_rejection · input_abuse_or_spam.

Each class maps to an existing detection source (no new detection substrate — see reuse table §10):

| problem_class | Detection source (existing / paper view) |
|---|---|
| dlq | job_dead_letter / vw_governance_dlq_count (02-… §7.2) |
| silent_worker | queue_heartbeat + dot_config heartbeat.threshold.* (03-… §5.5) |
| event_lag_breach | fn_event_lag_compute + vw_governance_event_lag (03-… §6.4) |
| schema_validation_failure | event_validation_audit (03-… §3.3 / §6.6) |
| idempotency_conflict | idempotency_registry observation_count anomalies (03-… §5.4) |
| overdue_cluster | vw_governance_overdue grouped (02-… §7.2) |
| blocked_escalated | fn_step_blocked_severity red-escalated (02-… §3.1 MP-D4) |
| cannot_complete_cluster | step_run/task_run cannot_complete grouped (08-… §7.3 item 9) |
| failed_cut | cut_request cut_failed (existing cut pipeline; memory) |
| orphan_workflow | workflow_run with no live owner/trigger (02-… §7.1) |
| kaizen_noise_spike | vw_governance_kaizen_* duplicate/noise rate (§5 + 08-… §7.5) |
| ai_agent_failure_cluster | agent_run failures grouped (08-… §7.4) |
| permission_anomaly | render/permission refusals (04-… §2.3 MP-D6 render_permission_denied) + gateway refusals |
| data_quality_warning | direct-canonicalization data-quality checks (§6) |
| direct_canonicalization_rejection | input.rejected on direct branch (§6 + 03-… §3.4a) |
| input_abuse_or_spam | Kaizen/input rate-limit + spam flags (§5.3) |

§4.3 Severity

governance_problem.severity ∈ {critical, high, medium, low, info}. Severity is config-driven per class + context (dot_config governance_severity.<problem_class>.*), never hardcoded in UI. Severity combines with SLA-breach proximity to order the cockpit's severity-prioritized problem queue (08-… §7.3 item 1).
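One way to read "severity combines with SLA-breach proximity" is a two-part sort key: severity rank first, then minutes remaining until breach. The sketch below is illustrative only. The section does not define the actual combination rule, so the lexicographic ordering is an assumption, and `SEVERITY_CONFIG` is a stand-in dictionary with invented values in place of dot_config governance_severity.<problem_class>.*.

```python
# Lower rank sorts first. The vocabulary is from §4.3.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3, "info": 4}

# Stand-in for dot_config governance_severity.<problem_class>.* (invented values).
SEVERITY_CONFIG = {
    "dlq": "high",
    "silent_worker": "critical",
    "kaizen_noise_spike": "low",
}


def severity_for(problem_class, default="medium"):
    """Look severity up in configuration instead of hardcoding it in the UI."""
    return SEVERITY_CONFIG.get(problem_class, default)


def queue_order(problems):
    """Order by severity rank, then by minutes to SLA breach (negative = breached)."""
    return sorted(
        problems,
        key=lambda p: (
            SEVERITY_RANK[severity_for(p["problem_class"])],
            p["minutes_to_breach"],
        ),
    )


problems = [
    {"id": "a", "problem_class": "dlq", "minutes_to_breach": 30},
    {"id": "b", "problem_class": "silent_worker", "minutes_to_breach": 120},
    {"id": "c", "problem_class": "dlq", "minutes_to_breach": -5},
    {"id": "d", "problem_class": "kaizen_noise_spike", "minutes_to_breach": 10},
]
print([p["id"] for p in queue_order(problems)])  # ['b', 'c', 'a', 'd']
```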

§4.4 Lifecycle states

governance_problem.lifecycle_state vocab:

detected → grouped → triaged → acknowledged → assigned → investigating → waiting_external → waiting_human → mitigated → resolved → verified → closed; plus reopened, suppressed, waived as side states.

Binding distinctions (the core of MP-D17):

  • acknowledged ≠ resolved. Acknowledgement means a human/owner has seen the problem; the underlying condition still holds. Resolution means the condition no longer holds.
  • mitigated ≠ verified. Mitigation reduces impact (e.g. paused a noisy producer); verification confirms the condition is actually gone with evidence.
  • resolved → verified → closed. A problem may only reach closed after verified. verified requires an evidence reference (a source *.resolved/*.recovered/*.healed event per MP-D9).
  • suppressed / waived require policy. suppressed references a governance_suppression_policy row; waived additionally requires Điều 32 approval_id for any waiver that hides a high/critical problem.
  • reopened is additive (never deletes prior lifecycle history); it links to the prior closed/resolved record via correlation_id (same additive discipline as MP-D2/MP-D3).
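The binding distinctions above can be expressed as guards on state transitions. The sketch below is illustrative and deliberately partial: it enforces only the guarded rules (verified needs evidence, closed needs verified, suppressed needs a policy, high/critical waivers need an approval) and does not claim a complete transition graph. `Problem`, `advance`, and `LifecycleError` are hypothetical names, and requiring `verified` to follow directly from `resolved` is an assumption drawn from the resolved → verified → closed ordering.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class LifecycleError(ValueError):
    """Raised when a transition would violate an MP-D17 distinction."""


@dataclass
class Problem:
    severity: str
    lifecycle_state: str = "detected"
    history: List[str] = field(default_factory=list)  # additive, never rewritten
    resolution_evidence_refs: List[str] = field(default_factory=list)
    suppression_policy_id: Optional[str] = None
    waiver_approval_id: Optional[str] = None


def advance(p, new_state):
    """Move to new_state if the guarded rules allow it."""
    if new_state == "verified":
        if p.lifecycle_state != "resolved":
            raise LifecycleError("verified must follow resolved")
        if not p.resolution_evidence_refs:
            raise LifecycleError("verified requires an evidence reference")
    if new_state == "closed" and p.lifecycle_state != "verified":
        raise LifecycleError("closed is reachable only after verified")
    if new_state == "suppressed" and p.suppression_policy_id is None:
        raise LifecycleError("suppressed requires a suppression policy")
    if (
        new_state == "waived"
        and p.severity in {"high", "critical"}
        and p.waiver_approval_id is None
    ):
        raise LifecycleError("high/critical waiver requires an approval id")
    p.history.append(p.lifecycle_state)  # keep the prior state; nothing is deleted
    p.lifecycle_state = new_state


demo = Problem(severity="high")
advance(demo, "acknowledged")
try:
    advance(demo, "closed")  # acknowledged is not verified
except LifecycleError as err:
    print("refused:", err)
```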

§4.5 Controls

| Control | Mechanism (paper) |
|---|---|
| dedupe | governance_problem.dedupe_key = hash(problem_class, primary_entity_ref, window_bucket); insert-on-conflict folds duplicates into one row with occurrence_count++. |
| grouping | grouped state + parent *_cluster problem; member problems link via governance_problem.parent_problem_id. |
| suppression | governance_suppression_policy (paper): predicate + reason + scope + expiry; suppressed problems hidden from default queue, audited. |
| snooze | governance_problem.snooze_until timestamp; reappears after expiry; snooze audited. |
| waive_with_approval | governance_problem.waiver_approval_id (Điều 32) required for high/critical waivers. |
| escalation | governance_problem.escalation_chain_id → escalation_policy (08-… §6.2); fires on SLA breach. |
| owner assignment | governance_problem.assignee_id + assignee_policy (08-… §6.2); assigned lifecycle state. |
| SLA/SLO clock | governance_problem.sla_policy_id → governance_slo_policy (paper); detected_at + acknowledged_at + resolved_at drive breach minutes. |
| impact estimate | governance_problem.impact_jsonb: affected entity counts + affected T6→T1 hierarchy path + estimated SLA exposure; evidence-backed (MP-D9), never fabricated. |
| drill-down | governance_problem.trace_id / correlation_id → vw_audit_event_timeline(trace_id) (03-… §6.5). |
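The dedupe control can be sketched as a deterministic key plus an upsert that folds repeats into one row. This is an illustration under stated assumptions, not the paper design: SHA-256 over a JSON encoding stands in for the unspecified `hash(...)`, `window_bucket` is assumed to be the detection time floored to a fixed window, and a plain dictionary stands in for the table's insert-on-conflict behavior. `dedupe_key` and `record_detection` are hypothetical helper names.

```python
import hashlib
import json


def dedupe_key(problem_class, primary_entity_ref, detected_epoch, window_seconds=3600):
    """Hash (problem_class, primary_entity_ref, window_bucket) into a stable key."""
    window_bucket = detected_epoch // window_seconds  # assumed bucketing rule
    payload = json.dumps(
        [problem_class, primary_entity_ref, window_bucket], sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def record_detection(queue, problem_class, primary_entity_ref, detected_epoch):
    """Insert a new row, or fold a repeat detection into the existing one."""
    key = dedupe_key(problem_class, primary_entity_ref, detected_epoch)
    row = queue.get(key)
    if row is None:
        row = {
            "problem_class": problem_class,
            "primary_entity_ref": primary_entity_ref,
            "dedupe_key": key,
            "occurrence_count": 1,
            "first_detected_at": detected_epoch,
            "last_detected_at": detected_epoch,
        }
        queue[key] = row
    else:
        row["occurrence_count"] += 1
        row["last_detected_at"] = max(row["last_detected_at"], detected_epoch)
    return row


queue = {}
entity = {"kind": "worker", "id": "w-17"}  # invented example entity
record_detection(queue, "silent_worker", entity, 7200)
record_detection(queue, "silent_worker", entity, 7300)   # same hour: folded
record_detection(queue, "silent_worker", entity, 11000)  # next hour: new row
print(len(queue), [r["occurrence_count"] for r in queue.values()])
```

Including the window bucket in the key means a condition that persists across windows produces a fresh row per window; whether that or a longer-lived key is wanted is a policy choice the section leaves open.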

§4.6 Auto-resolve discipline (inherits MP-D9)

A governance_problem may auto-advance to resolved (then await verified) only when a corresponding *.resolved / *.recovered / *.healed event exists in event_outbox. A summarizer flipping a problem to resolved without such an event is an integrity violation (event_validation_audit row + a new governance_problem of class schema_validation_failure/data_quality_warning). This is the MP-D9 rule (02-… §7.3) applied to the problem queue.
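A minimal sketch of that rule follows, for illustration only. It assumes event types are dotted strings whose final segment names the outcome; the event names in the example are invented, and `resolution_evidence` and `try_auto_resolve` are hypothetical helpers.

```python
# Suffixes that count as resolution evidence under MP-D9.
RESOLUTION_SUFFIXES = (".resolved", ".recovered", ".healed")


def resolution_evidence(linked_events):
    """Return ids of linked events whose type ends in a resolution suffix."""
    return [
        event_id
        for event_id, event_type in linked_events.items()
        if event_type.endswith(RESOLUTION_SUFFIXES)
    ]


def try_auto_resolve(problem, linked_events):
    """Advance to 'resolved' only when a source resolution event exists.

    The problem then still awaits 'verified'; it is never closed here.
    """
    evidence = resolution_evidence(linked_events)
    if not evidence:
        return False  # no source event: a summarizer must not flip the state
    problem["lifecycle_state"] = "resolved"
    problem["resolution_evidence_refs"] = evidence
    return True


problem = {"lifecycle_state": "investigating", "resolution_evidence_refs": []}
print(try_auto_resolve(problem, {"evt-1": "worker.heartbeat_missed"}))
print(try_auto_resolve(problem, {"evt-1": "worker.heartbeat_missed", "evt-2": "worker.recovered"}))
print(problem["lifecycle_state"], problem["resolution_evidence_refs"])
```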

§4.7 Paper registry / view shapes (paper-only — no schema creation)

governance_problem
  problem_id              uuid PK
  problem_class           text   -- §4.2 vocab (dot_config)
  severity                text   -- {critical|high|medium|low|info}
  lifecycle_state         text   -- §4.4 vocab
  primary_entity_ref      jsonb  -- {kind, id} the problem is about
  parent_problem_id       uuid nullable -- grouping (incident)
  dedupe_key              text
  occurrence_count        int
  first_detected_at       timestamptz
  last_detected_at        timestamptz
  acknowledged_at         timestamptz nullable
  assignee_id             uuid nullable
  mitigated_at            timestamptz nullable
  resolved_at             timestamptz nullable
  verified_at             timestamptz nullable
  closed_at               timestamptz nullable
  reopened_count          int
  snooze_until            timestamptz nullable
  suppression_policy_id   uuid nullable
  waiver_approval_id      uuid nullable   -- Điều 32
  sla_policy_id           uuid nullable
  escalation_chain_id     uuid nullable
  impact_jsonb            jsonb
  resolution_evidence_refs jsonb nullable -- *.resolved/*.recovered/*.healed event ids (MP-D9)
  trace_id                text
  correlation_id          uuid

governance_problem_event_link   -- many-to-many problem ↔ source events
  problem_id              uuid FK
  event_outbox_id         uuid FK
  link_role               text   -- 'detection' | 'resolution' | 'evidence'
  PRIMARY KEY (problem_id, event_outbox_id, link_role)

governance_problem_assignment   -- ownership history (additive)
  assignment_id           uuid PK
  problem_id              uuid FK
  assignee_id             uuid
  assigned_by             uuid
  assigned_at             timestamptz
  unassigned_at           timestamptz nullable
  reason                  text

governance_suppression_policy
  suppression_policy_id   uuid PK
  problem_class           text nullable  -- null = all classes
  predicate_ref           text           -- predicate fn name
  reason                  text
  scope_jsonb             jsonb
  approval_id             uuid nullable  -- Điều 32 for high/critical scope
  expires_at              timestamptz nullable
  active                  bool

governance_slo_policy
  slo_policy_id           uuid PK
  scope                   text   -- 'problem_class.<X>' | 'workflow_category.<Y>' | 'executor_class.<Z>' | 'event_subscription.<S>'
  objective_jsonb         jsonb  -- {target, window, ack_minutes, resolve_minutes}
  active                  bool

These are paper-only. They reuse detection sources (§4.2) rather than re-detecting; they are a triage/lifecycle layer above the existing vw_governance_* views. The cockpit panels in 08-… §7.3 read these rows; this addendum gives them their queue model.

Sentinel (MP-D17). Every governance problem the cockpit displays is a governance_problem row with a class, severity, lifecycle_state, and owner (or explicit unassigned). acknowledged/mitigated never count as resolved/verified. No closed without verified; no verified without a source *.resolved event. Every remediation references a separate change/proposal row + approval_id; the problem row carries no mutation payload.

§4.8 Cross-law binding of governance/ops objects (MP-D24 + MP-D26, rev5)

The governance_problem(+_event_link/_assignment), governance_suppression_policy, governance_slo_policy, and agent_run paper registries (§4.7, 08-… §7.4) are governed objects and are bound by the rev5 cross-law patch (10-…):

  • Birth / collection / species (MP-D24, Điều 36/0-G/29). When promoted from paper, each is born under the Industrial Birth Contract: a collection_registry entry (4 mandatory attributes — governance_role, purpose, species mapping, birth trigger; Điều 29), a birth_registry record per row where governed, a species_code (e.g. a SPE-GOV grouping species per Điều 29) + composition_level, created via DOT-COL-REGISTER + dot-species-map + DOT+APR (never raw psql). HC-REG/HC-SCHEMA keep them registered + described. (10-… §4.)
  • Jurisdiction (MP-D26, Điều 37). Each governance/ops object type has a governance_owner agency + escalation_owner and a law_jurisdiction (Phạm vi); cockpit visibility is backend-filtered per role (Layer B Directus role/field-allowlist + Điều 32) — super-admin sees aggregate, not all raw rows by default; AI/Agent Ops never leaks prompts/payloads/outputs outside authority (field allowlist + MP-D20 machine-only). (10-… §6.1 jurisdiction matrix.)
  • Template (MP-D23, Điều 28). The Governance Cockpit + AI/Agent Ops Console surfaces are design_templates rows (TPL-governance-cockpit, TPL-agent-ops-console) + product records with strict field allowlist (10-… §3.1).
  • No "đẻ rơi" (MP-D29). No governance/ops object reaches active without birth+collection/species+owner law; orphan/phantom/nhầm-chuồng detectors apply (10-… §9).

Sentinel (MP-D24/D26 for §4): every promoted governance/ops registry has a collection_registry+species+governance_role; each object type has a governance_owner+escalation_owner; cockpit/console respect jurisdiction + field allowlist.


§5. MP-D18 — Kaizen anti-noise, duplicate-control, and review lifecycle

§5.1 The user flow does not change

The five-click flow stays exactly as in 08-… §5.1: Đề xuất cải tiến (Propose an improvement) → Thêm|Sửa|Xoá (Add|Edit|Delete) → chọn vị trí (choose location) → comment/audio → gửi (send). All MP-D18 machinery is backend/governance. Ordinary staff UI gains zero new fields, zero new decisions, zero new vocabulary (re-affirms 08-… §5.2 + I33).

§5.2 Kaizen review lifecycle (backend)

input_submission rows of input_kind='kaizen_*' carry a kaizen_lifecycle_state (paper, distinct from the generic processing_state):

received → auto_classified → duplicate_suspected → needs_clarification → accepted_for_review → (rejected_noise | converted_to_proposal) → approved → merged → (rejected_by_reviewer) → measured_after_change → archived.

  • received — gateway accepted the submission (input.submitted).
  • auto_classified — backend classifier assigned candidate target + intent + department + tier.
  • duplicate_suspected — duplicate detector (§5.3) flagged a likely duplicate cluster.
  • needs_clarification — reviewer (or classifier) requests one short clarification; user gets a friendly prompt (still inside the simple UX — at most one extra question, 08-… §5.2).
  • accepted_for_review — passes anti-noise gate; enters reviewer queue.
  • rejected_noise — failed anti-noise gate; user notified with a friendly reason (§5.4).
  • converted_to_proposal — promoted to workflow_change_requests or generic proposal (existing routing, 08-… §5.4).
  • approved / rejected_by_reviewer — Điều 32 reviewer decision on the proposal.
  • merged — applied via its own change macro (Phase 1+); contributor credited.
  • measured_after_change — impact measurement after the change ships (closes the improvement loop, Tầng 5).
  • archived — terminal.

§5.3 Duplicate detection dimensions

A Kaizen submission is clustered for duplicate suspicion by (paper fn_kaizen_duplicate_cluster):

target_artifact_ref · intent · hierarchy_context (T6→T1 path) · actor_department · raw_text_semantic_hash · audio_transcription_hash · time_window · existing_open_proposal_refs.

A new submission whose dimensions match an open or recently-decided cluster is set to duplicate_suspected and folded into the cluster's review item.
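How the dimensions combine is not specified here, so the following is only one possible reading, offered as a sketch. It treats target, intent, and hierarchy path as exact-match dimensions, applies a time window, and compares a normalized-text hash. That hash is a simple stand-in: it catches whitespace and case variants only and is not a semantic hash. `text_hash`, `is_duplicate_suspect`, the field names `submitted_epoch` and `last_submitted_epoch`, and the seven-day window are all assumptions.

```python
import hashlib
import re

EXACT_DIMENSIONS = ("target_artifact_ref", "intent", "hierarchy_context")


def text_hash(raw_text):
    """Hash of lowercased, whitespace-collapsed text (stand-in, not semantic)."""
    normalized = re.sub(r"\s+", " ", raw_text.strip().lower())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def is_duplicate_suspect(submission, cluster, window_seconds=7 * 24 * 3600):
    """True when a submission matches an open cluster on every checked dimension."""
    same_target = all(submission[k] == cluster[k] for k in EXACT_DIMENSIONS)
    in_window = (
        abs(submission["submitted_epoch"] - cluster["last_submitted_epoch"])
        <= window_seconds
    )
    same_text = text_hash(submission["raw_text"]) == cluster["raw_text_hash"]
    return same_target and in_window and same_text


cluster = {
    "target_artifact_ref": "workflow:onboarding",  # invented reference
    "intent": "edit",
    "hierarchy_context": "T6/T5/T4/T3/T2",
    "last_submitted_epoch": 900_000,
    "raw_text_hash": text_hash("add a due-date column"),
}
submission = {
    "target_artifact_ref": "workflow:onboarding",
    "intent": "edit",
    "hierarchy_context": "T6/T5/T4/T3/T2",
    "submitted_epoch": 1_000_000,
    "raw_text": "  Add a  Due-Date column ",
}
print(is_duplicate_suspect(submission, cluster))
```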

§5.4 Anti-noise controls (backend/governance only)

| Control | Mechanism (paper) |
| --- | --- |
| duplicate clustering | fn_kaizen_duplicate_cluster (§5.3); duplicates merge into one review item. |
| contributor rate-limit | per-role/per-time-window cap in dot_config kaizen.rate_limit.<role>; excess flagged input_abuse_or_spam (→ governance_problem). |
| spam/abuse flag | heuristic + reviewer flag; repeated abuse raises a governance_problem class input_abuse_or_spam. |
| low-quality/empty rejection | empty/near-empty comment with no audio + no attachment → rejected_noise with friendly reason. |
| merge duplicates into one review item | cluster → single proposal; all contributors recorded as supporters. |
| contributor credit | even on merge/duplicate, contributor stays a supporter_ref on the proposal (credit preserved). |
| reviewer clarification | reviewer may set needs_clarification; user receives one short friendly question. |
| user-friendly rejection reason | rejection reason from dot_config kaizen.rejection_reason.* rendered in plain language; never an internal error code. |
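
Two of the controls above (rate-limit and empty rejection) can be sketched as a single backend gate. This is an illustration under stated assumptions, not the designed mechanism: `CONFIG` is a plain dict standing in for dot_config, the keys `kaizen.min_comment_chars` and `kaizen.rejection_reason.empty` are hypothetical, and leaving the lifecycle state unchanged on a rate-limit hit is an assumption (the table only says the excess is flagged).

```python
from dataclasses import dataclass
from typing import Optional

# Stand-in for dot_config; values are invented for the example.
CONFIG = {
    "kaizen.rate_limit.staff": 5,
    "kaizen.min_comment_chars": 3,  # hypothetical key
    "kaizen.rejection_reason.empty": "We could not find a suggestion in your message.",
}


@dataclass
class Decision:
    state: Optional[str]   # lifecycle state to set, or None to leave unchanged
    flag: Optional[str]    # governance_problem class to raise, if any
    reason: Optional[str]  # plain-language text for the user, if any


def anti_noise_gate(role: str, comment: str, has_audio: bool,
                    has_attachment: bool, recent_count: int,
                    config: dict) -> Decision:
    # Thresholds are read from config, never embedded in the function.
    if recent_count >= config[f"kaizen.rate_limit.{role}"]:
        return Decision(None, "input_abuse_or_spam", None)
    near_empty = len(comment.strip()) < config["kaizen.min_comment_chars"]
    if near_empty and not has_audio and not has_attachment:
        return Decision("rejected_noise", None,
                        config["kaizen.rejection_reason.empty"])
    return Decision("accepted_for_review", None, None)
```

The user only ever sees `reason`; `flag` and `state` stay on the backend, which matches the rule that rejection text is never an internal error code.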

§5.5 Kaizen metrics (extends 08-… §7.5)

Computed by paper STABLE functions over input_submission + proposal + workflow_change_requests lifecycle, surfaced on the cockpit Kaizen panel (Tầng 5):

submission_rate · duplicate_rate · accepted_for_review_rate · approval_rate · merge_rate · time_to_decision (p50/p95) · time_to_impact_measurement · contributor_quality_score · department_improvement_index · repeated_problem_hotspots.

A sustained duplicate_rate / noise spike raises a governance_problem of class kaizen_noise_spike (§4.2) so governance can tune rate-limits or address a confusing surface — without ever touching the user flow.
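
As an illustration of two of the metrics listed in this section, the sketch below computes duplicate_rate and time_to_decision p50/p95 over in-memory rows. The row shape (`duplicate`, `hours_to_decision`) and the nearest-rank percentile method are assumptions; the design itself places these computations in paper STABLE functions over PG, not in application code.

```python
import math


def percentile(values: list, p: float) -> float:
    """Nearest-rank percentile for p in (0, 100]; values must be non-empty."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def kaizen_metrics(rows: list) -> dict:
    """rows: dicts with 'duplicate' (bool) and 'hours_to_decision' (float or None)."""
    decided = [r["hours_to_decision"] for r in rows
               if r["hours_to_decision"] is not None]
    return {
        "duplicate_rate": (sum(1 for r in rows if r["duplicate"]) / len(rows)
                           if rows else 0.0),
        "time_to_decision_p50": percentile(decided, 50) if decided else None,
        "time_to_decision_p95": percentile(decided, 95) if decided else None,
    }
```

Undecided submissions are excluded from the latency percentiles rather than counted as zero, so an open backlog does not make time_to_decision look better than it is.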

§5.6 IU relation (preserves invariants)

Kaizen never mutates IU body directly. A Kaizen targeting an IU lands as a proposal (non-workflow) or workflow_change_requests (workflow) row; an approved IU-narrative change is authored through the Điều 38/39 author lifecycle producing a new iu_version (re-affirms 08-… §9 + 00-… §3 inv 1/14). The duplicate-detection raw_text_semantic_hash / audio_transcription_hash are hashes of the suggestion text in staging — never canonical IU body.

Sentinel (MP-D18). The user-facing Kaizen flow stays five clicks with zero internal vocabulary (re-affirms 08-… §5.2). All anti-noise/duplicate/lifecycle machinery is backend. Every rejected submission returns a friendly reason; every merged duplicate credits its contributor as a supporter. Kaizen never produces more than one downstream canonical mutation per submission (re-affirms 08-… §5.6).


§6. MP-D19 — Direct canonicalization policy and data-quality guardrails

§6.1 The direct branch needs a strict gate

08-… §3.3 routes direct vs workflow per input_kind. MP-D19 hardens the direct branch: it may bypass workflow/approval ONLY behind an explicit allow-list and a data-quality gate, and every direct write must be lineaged, retained-by-policy, classified for PII/security, and reversible.

§6.2 Direct canonicalization allow conditions (ALL must hold)

The canonicalizer worker (worker.canonicalizer) admits a direct write only if every condition holds:

actor_has_permission · target_kind_allowed_by_input_routing_policy · schema_valid · size_within_limit · no_structural_change · no_law/registry/IU_body_mutation · no_approval_required_effect · idempotency_key_present · audit_enabled · retention_policy_attached · PII/security_classification_checked.

§6.3 Direct canonicalization deny conditions (ANY forces workflow branch or refusal)

Direct is denied (routed to workflow branch, or refused with reason) if it:

changes the workflow graph · changes IU canonical body · changes a field/form/output registry · changes law/governance policy · affects other departments beyond permission scope · requires approval · has transcription confidence below threshold · trips spam/abuse/rate-limit.

A denied-but-legitimate input is re-routed to the workflow branch (proposal + Điều 32); an illegitimate one is refused with input.rejected + reason and raises a governance_problem of class direct_canonicalization_rejection if the rejection rate spikes.
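
The allow/deny rules of §6.2 and §6.3 reduce to "every allow condition true, no deny condition true, otherwise not direct". A minimal sketch follows, assuming the conditions have already been evaluated into booleans by upstream checks. The snake_case condition keys are adapted from the lists above, and the `legitimate` parameter is a placeholder, since this addendum does not define how legitimacy is decided.

```python
ALLOW_CONDITIONS = (
    "actor_has_permission", "target_kind_allowed_by_input_routing_policy",
    "schema_valid", "size_within_limit", "no_structural_change",
    "no_law_registry_iu_body_mutation", "no_approval_required_effect",
    "idempotency_key_present", "audit_enabled", "retention_policy_attached",
    "pii_security_classification_checked",
)
DENY_CONDITIONS = (
    "changes_workflow_graph", "changes_iu_canonical_body", "changes_registry",
    "changes_law_or_governance_policy", "affects_other_departments_beyond_scope",
    "requires_approval", "transcription_confidence_below_threshold",
    "trips_spam_abuse_rate_limit",
)


def route_direct(checks: dict, legitimate: bool = True) -> tuple:
    """Return (route, reasons); route is 'direct', 'workflow' or 'refused'."""
    # Fail closed: an allow condition that was never evaluated counts as false.
    missing = [c for c in ALLOW_CONDITIONS if not checks.get(c, False)]
    denied = [c for c in DENY_CONDITIONS if checks.get(c, False)]
    if not missing and not denied:
        return ("direct", [])
    reasons = missing + denied
    return ("workflow" if legitimate else "refused", reasons)
```

Returning the list of failed conditions alongside the route gives the audit row and the input.rejected reason something concrete to record.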

§6.4 Data lineage (staging ↔ canonical)

| Lineage element | Mechanism (paper) |
| --- | --- |
| forward link | input_submission.canonical_target_ref = {kind, id} once a direct write lands (08-… §3.2). |
| back link | the canonical row stores source_staging_id where its table permits (e.g. task_comments.source_staging_id, audit_note.source_staging_id). Tables that cannot carry the column rely on the audit join. |
| audit join | a dot_iu_command_run / audit row joins staging_id ↔ canonical write ↔ trace_id so vw_audit_event_timeline(trace_id) shows the full staging→canonical path. |
| rejection preservation | input_submission.rejection_reason + validation_reason_code retained even after a row is closed (never nulled). |
| correction model | a direct write is corrected by its own table's reversibility contract (soft-delete + audit for task_comments; append-only correction note for audit_note); an IU-affecting correction always reroutes through Điều 38/39 author lifecycle (never a direct staging rewrite). |

§6.5 Retention / security (paper config)

| Policy | Paper config key | Default |
| --- | --- | --- |
| raw_text retention | dot_config retention.staging.raw_text | tunable Phase 1 (proposed 180 days) |
| audio retention | dot_config retention.staging.audio | tunable Phase 1 |
| attachment retention | dot_config retention.staging.attachment | tunable Phase 1 |
| PII/security classification | input_submission.security_classification ∈ {public, internal, confidential, pii} (paper); set at gateway | required before direct write |
| access scope for staging artifacts | gateway permission_scope_hash (matches MP-D6 cache-key shape); staging artifacts readable only within submitter/reviewer scope | backend-enforced |
| transcription confidence threshold | dot_config transcription.min_confidence | default 0.6 (mirrors MP-D9 ai_summary default) |
| human review threshold | below confidence threshold → needs_clarification / human review, never auto-direct | binding |
| abuse/spam/rate limit | dot_config kaizen.rate_limit.* + spam flags (§5.4) | tunable Phase 1 |

§6.6 Relation to IU / PG / queue

Direct canonicalization writes only allow-listed, non-structural artifacts (task_comments, audit_note, input_submission_ack). It never writes IU body, never writes a registry, never changes a workflow graph. Governance-relevant direct writes go through a DOT pair (Điều 35); lighter writes use a typed RPC that emits the standard producer events. Queue rows for canonicalization carry refs only (staging_id + executor_class_ref + idempotency key + W3C trace). No raw text/audio/attachment bytes travel in events (MP-D8). (Re-affirms 08-… §9.)

Sentinel (MP-D19). No direct-branch write occurs unless all §6.2 allow conditions hold; any §6.3 deny condition forces workflow branch or refusal. Every direct write has a lineage pair (staging_id ↔ canonical_target_ref) and a retention + security classification. Below-threshold transcription never auto-canonicalizes. Direct writes never touch IU body / registries / workflow graph / law.


§7. MP-D20 — Minimum Observability Profile

§7.1 Purpose

Rev3 has observability pieces scattered (event lag 03-… §6.4, heartbeat 03-… §5.5, governance views 02-… §7.2, MP-D9 evidence 02-… §7.3, cockpit trend lines 08-… §7.3). MP-D20 binds them into one Minimum Observability Profile that every workflow/event/input/Kaizen/agent surface satisfies from day 1, with an explicit human-visible vs machine-only split. This is the standard-industry observability the raw PG-event document emphasizes (schema registry, distributed tracing, DLQ/ACK-NACK/idempotency, governance UI), expressed within the PG-first SoT.

§7.2 Required machine metrics (every surface)

| Metric | Source (existing / paper) |
| --- | --- |
| event_schema_validation_pass/fail | event_validation_audit (03-… §3.3) |
| trace_id / correlation_id coverage | event_outbox.trace_id IS NULL count (03-… §5.6 sentinel) |
| event_lag p50/p95/p99 | fn_event_lag_compute (03-… §6.4) |
| queue_depth by queue | job_queue group by job_class/status (03-… §4.2) |
| lease_timeout count | job_queue.lease_until < now() while leased |
| ACK/NACK rate | worker lease/release ledger (03-… §5.2) |
| retry_count | job_queue.attempt_count + retry policy (03-… §5.3) |
| DLQ depth | job_dead_letter (03-… §6.1) |
| DLQ replay outcomes | dlq_replay_request.outcome_jsonb (03-… §6.2) |
| idempotency_conflict count | idempotency_registry.observation_count (03-… §5.4) |
| worker heartbeat freshness | queue_heartbeat.last_tick_at (03-… §5.5) |
| silent_worker count | heartbeat vs dot_config heartbeat.threshold.* |
| governance_problem_count by class/severity | governance_problem (09-… §4) |
| kaizen duplicate/noise rate | vw_governance_kaizen_* (08-… §7.5 + 09-… §5.5) |
| direct_canonicalization rejection rate | input.rejected direct branch (§3.4a + 09-… §6) |
| audio transcription failure/confidence | input.audio_transcribed.confidence (§3.4a) |
| AI/Agent task status | agent_run (08-… §7.4) |
| top blocked clusters | fn_step_blocked_severity grouped (02-… §3.1) |
| top cannot_complete clusters | step_run/task_run cannot_complete grouped (08-… §7.3 item 9) |
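
Two of these metrics, silent_worker count and lease_timeout count, are simple time comparisons. The sketch below shows the logic over in-memory data, assuming heartbeats and jobs have already been read from queue_heartbeat and job_queue. The dict shapes, the "leased" status literal, and the worker name `worker.indexer` are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone


def silent_workers(heartbeats: dict, thresholds: dict, now: datetime,
                   default_threshold: timedelta) -> list:
    """heartbeats: worker -> last_tick_at; thresholds: worker -> max allowed silence."""
    return sorted(
        worker for worker, last_tick_at in heartbeats.items()
        if now - last_tick_at > thresholds.get(worker, default_threshold)
    )


def lease_timeout_count(jobs: list, now: datetime) -> int:
    """Count jobs still marked leased whose lease_until has already passed."""
    return sum(1 for j in jobs
               if j["status"] == "leased" and j["lease_until"] < now)
```

Per-worker thresholds would come from dot_config heartbeat.threshold.*; passing them in as a dict keeps the function free of embedded limits.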

§7.3 Human-visible vs machine-only

Human-visible (cockpit / dashboards): problem summary · severity · owner · affected hierarchy path T6→T1 · impact estimate · age/SLA · recommended next action · drill-down link · evidence-backed AI/worker summary (MP-D9 fields).

Machine-only by default (not surfaced raw to humans): raw event tail · raw queue payload · internal retry loop · debug logs · raw trace spans · raw prompt/output payload (if sensitive) · raw audio/attachment bytes.

This is the same "no raw event noise" boundary as 00-… §3 inv 16 + 02-… §7.7, now stated as a profile rule: humans see summaries + drill-down; machines hold the raw.

§7.4 Freshness rules

  • Every dashboard summary carries generated_at.
  • A summary older than its dot_config observability.staleness.<surface> window is labelled stale in the UI.
  • An AI summary cannot replace evidence — the drill-down to source events is always present (MP-D9 rule 1).
  • A summary cannot auto-resolve a problem without a source *.resolved/*.recovered/*.healed event (MP-D9 rule 2; §4.6).
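
The staleness and auto-resolve rules above can be expressed as two small predicates. This is a sketch under assumptions: the staleness window is passed in rather than read from dot_config, and event types are matched only by the `.resolved` / `.recovered` / `.healed` suffixes named in the rule. The example event names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

RESOLVING_SUFFIXES = (".resolved", ".recovered", ".healed")


def is_stale(generated_at: datetime, staleness_window: timedelta,
             now: datetime) -> bool:
    """A summary older than its configured window must be labelled stale."""
    return now - generated_at > staleness_window


def may_auto_resolve(source_event_types: list) -> bool:
    """Auto-resolve is allowed only if a real resolving source event exists."""
    return any(t.endswith(RESOLVING_SUFFIXES) for t in source_event_types)
```

A summary with no linked source events returns False from `may_auto_resolve`, so an AI-generated "looks fixed" statement alone can never close a problem.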

§7.5 Reconciling the raw PG-event document's observability emphasis

The raw Bắt sự kiện của PG.docx (via its KB consolidations) emphasizes: schema registry, distributed tracing, orchestrator/governance visibility, DLQ, ACK/NACK, idempotency. Each is already embedded and is now profile-bound:

| Raw-doc emphasis | Where embedded | Profile binding |
| --- | --- | --- |
| schema registry / event validation | event_type_registry + event_validation_audit (03-… §3.1) | event_schema_validation_pass/fail metric (§7.2) |
| distributed tracing | W3C trace_id NOW (03-… §5.6) | trace_id/correlation_id coverage metric (§7.2) |
| DLQ / ACK-NACK / idempotency | 03-… §5.2 + §5.4 + §6 | DLQ depth + replay + idempotency_conflict metrics (§7.2) |
| orchestrator / governance visibility | governance cockpit (08-… §7) + problem queue (§4) | governance_problem_count, human-visible summary (§7.3) |

Sentinel (MP-D20). Every workflow/event/input/Kaizen/agent surface exposes the §7.2 metric set (or declares the metric not_yet_wired as a known gap, never silently absent). Human surfaces show summaries + drill-down only; raw event tail / queue payload / spans / prompts / bytes stay machine-only. Every summary carries generated_at; stale summaries are labelled; no auto-resolve without a source event.


§8. MP-D21 — Raw PG-event document OSS/tool reconciliation under Assembly First

§8.1 Method

The raw PG-event document suggests a tool stack. Each is reconciled here under Assembly First (Điều 7) + Gate A (state-vocab fit) + Gate B (config-first fit) per 05-… §1, with a verdict label and an explicit "what it may NOT do". No final selection. No version pin. No implementation. This table extends 05-… and is mirrored there (05-… §7).

§8.2 Raw PG-event tool reconciliation table

| Tool | Verdict / label | May be used for | May NOT do (boundary) |
| --- | --- | --- | --- |
| Hasura | sandbox_reference_only OR reject_as_core_owner (L2 + L6) | Study its all-in-one event-trigger/subscription patterns | Own the core event plane; bypass the Directus/Nuxt/PG boundary (Điều 33 v2.1); become a second SoT or connect clients directly to PG |
| pg-boss / Graphile Worker | reject_as_primary_substrate_now; future_adapter_slot_preserved only if it maps states to Điều 45 lifecycle and owns no state vocabulary (L2 + L5 + L7) | Borrow PG-native queue patterns (priority/lease/backoff) implemented natively | Own the Incomex state vocabulary (9-state floor, Điều 45 §6.7); be the primary job substrate now |
| Benthos / Redpanda Connect | future_CDC_adapter_slot (L3 + L5 + L7) | External table mirroring, high-volume CDC, config-driven PG change capture (read from event_outbox, sink externally) | Be the domain-event SoT; bypass register-before-emit |
| NATS | future_transport_adapter (L4 + L7) | Multi-host worker fanout, pub/sub transport | Be an event SoT; subjects must map 1:1 to event_type_registry, messages carry event_outbox_id |
| Redis Streams | future_lightweight_stream_adapter only if Redis already operational (L4 + L7) | Lightweight stream transport when Redis is already in ops | Be the canonical audit/governance SoT |
| Temporal | future_execution_backend_adapter_after_triggers (L2 + L3 + L4) | Possibly a bounded execution engine post-Phase-6 if MOW saturates (Council-gated) | Own workflow logic/definition; MOW/PG registry stays the workflow-definition SoT |
| Camunda | reference_only or future human-workflow adapter (L2 + L6) | Reference BPMN/human-task patterns | Own approval; Điều 32 remains the approval owner |
| Airflow | batch/reference_only (L2 + L6) | Reference for future batch/data workloads separate from MOW | Be the MOW orchestrator |
| Watermill | reference_only or future glue library (L3 + L6) | Reference router/middleware patterns for a native worker library | Create any immediate dependency |
| OpenTelemetry / Jaeger / Tempo | future observability adapters (L4) | Consume the W3C trace_id stream once ubiquitous; trace UI | Become the trace SoT; PG audit (vw_audit_event_timeline) stays SoT |
| Prometheus / Grafana / Loki / VictoriaMetrics | future observability/dashboard adapter slots (L4 + L7) | Metrics scrape / dashboards / log view that reduce super-admin governance work | Replace PG governance state; governance_problem + event_outbox + dot_iu_command_run stay SoT |

§8.3 Universal conditions (all of the above)

Every adoption, if it ever happens, requires (per 05-… §5): Gate A + Gate B explicit verdict, an external_tool_registry SoT-pointback row, a documented reversibility/exit path, no double-ownership with Điều 32/35/38/39/45, Birth-registry registration (Điều 0-G), heartbeat for any process-class tool, and backend permission filter for any UI-exposed surface. No final OSS selection is made in this macro.

Sentinel (MP-D21). This addendum (and 05-… §7) mentions zero version numbers, zero CI steps, zero dockerfile lines. No tool is selected; each carries an explicit "may NOT" boundary preserving PG-first SoT + the law boundaries.


§9. MP-D22 — Governance Ops Survey + updated sequencing

§9.1 New survey macro

IU_4MOTHERS_GOVERNANCE_OPS_SURVEY_DOCUMENT_ONLY_*X — a read-only document-only macro (reads via mcp__claude_ai_Incomex_VPS__query_pg STABLE/SELECT only; writes need SSH per memory, and this survey writes nothing). It runs before any Governance Cockpit / AI-Agent Ops Console implementation.

§9.2 Survey targets

existing task-status queues · AI task queue / ai_tasks · agent_run / agent-history tables if any · Directus task collections · governance views/logs · worker-heartbeat data (queue_heartbeat) · event/problem categories · existing dashboard/ops module · existing prompt/task-dispatch module · existing approval/governance collections · existing audit/event timeline views · existing observability/logging tools on the VPS (Uptime Kuma, cron health checks per 5-layers.md).

§9.3 Survey output (per target)

A classification verified_live / KB_reported / legacy_trace / candidate_requires_survey / known_gap, plus a reuse/extend/create-paper recommendation, the risk of duplicate ownership, the law boundary owner, and a readiness verdict for cockpit implementation. Output as knowledge/dev/design/v0.6-iu-4mothers-event-foundation-rev2/<NN>-governance-ops-survey.md (paper-only).

Current best-knowledge seed for the survey (from KB, to be verified live): ai_tasks/agent_run are not confirmed live; historical AI dispatch went through an ops connector that returned a DISABLED/connector error (KB gpt-dispatch-attempt-agent-readonly-investigation-iu-current-position-2026-05-14.md), so treat as candidate_requires_survey. queue_heartbeat, event_outbox, job_queue, job_dead_letter are verified_live (Rev2 §12 / Điều 45). dot_iu_command_run is verified_live. M-002 Workflow module is KB_reported partial; a dedicated "M-005 task orchestration" artifact was not found in KB → candidate_requires_survey.

§9.4 Updated survey + phase sequencing

  1. Candidate Registry Survey (G7) — field_registry / input_form_registry / output_table_registry / dot_function_registry (06-… §S16).
  2. Tier Registry Survey — T6/T5/T4/T3 sources (08-… §11 + §3.3 above).
  3. Governance Ops Survey — this macro (§9.2).
  4. Then decide Phase 0 / Phase 1 implementation order (06-… §S20), now informed by all three surveys.

Sentinel (MP-D22). No Governance Cockpit / AI-Agent Ops Console implementation macro runs before the Governance Ops Survey completes and classifies its targets. The three surveys precede the Phase 0/1 ordering decision.


§10. Existing infrastructure reuse table (decision per new concept)

For every concept introduced by MP-D16..MP-D22: reuse | extend | paper-only-new | survey-required | known-gap + reason. No new ownership.

| Concept | Decision | Substrate touched | Reason |
| --- | --- | --- | --- |
| Điều 5 ↔ T6→T1 mapping | reuse (documentation only) | law-05-five-tiers.md + 08-… §4 | No new table; a clarifying mapping + readiness matrix only. |
| Readiness matrix status | reuse | 5-layers.md build status + Rev2 §12 | Reads existing tier status; no mutation. |
| governance_problem queue | paper-only-new | reads job_dead_letter / queue_heartbeat / vw_governance_* / event_validation_audit / agent_run | No existing typed problem-lifecycle row; it is a triage layer ABOVE existing detection sources. |
| governance_problem_event_link | paper-only-new | links to event_outbox | Many-to-many provenance; no existing link table. |
| governance_problem_assignment | paper-only-new | | Ownership history; additive. |
| governance_suppression_policy / governance_slo_policy | paper-only-new | reads dot_config | Config-first suppression/SLO; no existing registry. |
| incident grouping | reuse pattern | governance_problem.parent_problem_id | Grouping is a self-FK, not a new table. |
| change (remediation) | reuse | workflow_change_requests [VL] / generic proposal (06-… §S2) | Existing change substrate; problems spawn changes, never are changes. |
| Kaizen review lifecycle | extend | input_submission (08-… §3.2) + paper kaizen_lifecycle_state | Extends the staging row; no new table. |
| Kaizen duplicate detector | paper-only-new (fn) | reads input_submission + proposal | fn_kaizen_duplicate_cluster STABLE; no mutation. |
| Kaizen anti-noise controls | extend (config) | dot_config kaizen.* | Config-first; no hardcode. |
| Kaizen metrics | extend | vw_governance_kaizen_* (08-… §7.5) | Extends existing Kaizen panel views. |
| Direct canonicalization gate | extend | input_routing_policy (08-… §3.3) + worker.canonicalizer | Hardens existing gateway routing; no new owner. |
| Data lineage (staging↔canonical) | extend | input_submission.canonical_target_ref + canonical source_staging_id where permitted + dot_iu_command_run audit | Reuses existing audit join; adds back-link column where the table allows. |
| Retention/security policy | extend (config) | dot_config retention.* + input_submission.security_classification | Config-first; one paper column on staging. |
| Minimum Observability Profile | reuse + bind | event_validation_audit / fn_event_lag_compute / job_queue / job_dead_letter / queue_heartbeat / idempotency_registry / agent_run / governance_problem | All metrics derive from existing/paper sources; the profile is a binding contract, not new substrate. |
| Observability adapter slots | candidate slots only | external_tool_registry (06-… §S18) | Per 05-… Gate A + B; no final pick. |
| Raw PG-event tool reconciliation | reuse (documentation) | 05-… verdicts | Extends 05-… §3-4 table; no new substrate. |
| Governance Ops Survey | survey-required | reads existing queues/tables read-only | Per memory: MCP query_pg is read-only; survey writes nothing. |

Especially-check infra (macro §4.1): event_outbox (reuse — detection events), event_type_registry (reuse — register-before-emit for any new *.resolved events), job_queue (reuse — queue-depth metric), job_dead_letter (reuse — dlq problem class), queue_heartbeat (reuse — silent_worker class), workflow_change_requests (reuse — change/remediation), tasks/task_comments (reuse — direct-canonicalization targets), ai_tasks/agent-run (survey-required — not confirmed live), dot_iu_command_run (reuse — lineage audit), iu_lifecycle_log (reuse — IU-scope audit), Directus collections (reuse — admin/triage views only, Điều 33 v2.1), M-002 workflow governance (reuse — KB_reported partial), M-005 task orchestration (survey-required — not found in KB). No row introduces double-ownership.


§11. Law / no-double-ownership matrix per MP (MP-D16..MP-D22)

Format: owner law/principle | what patch does | what patch must NOT do | sentinel test.

MP-D16 (Điều 5 ↔ T6→T1 mapping)

| Owner | Patch does | Patch must NOT | Sentinel |
| --- | --- | --- | --- |
| Điều 5 (architecture tiers) | Maps T6→T1 onto five tiers + readiness matrix | Conflate the two axes; build Tầng 4/5 on weak Tầng 1/2/3 | readiness matrix blocks any surface whose lower-tier substrate is paper/survey/gap |
| Future Điều XX (4 Mothers app layer) | Names T6→T1 as application-layer operating hierarchy | Enact law text | referenced as future referent only |

MP-D17 (Governance problem queue)

| Owner | Patch does | Patch must NOT | Sentinel |
| --- | --- | --- | --- |
| Điều 37 v3.3 (governance org) | Typed problem queue + ownership + permission filter | Bypass permission for super-admin | same backend predicate for all roles |
| Điều 32 (approval) | Waiver of high/critical needs approval_id; remediation = change/proposal | Auto-mutate from a problem row | problem rows carry no mutation payload; remediation references separate change + approval |
| Điều 31 (audit) | Every lifecycle transition + grouping audited; reopen additive | Delete prior lifecycle history | reopened links via correlation_id, never erases |
| Điều 45 (event substrate) | Detection reads event_outbox/job_dead_letter/queue_heartbeat | Re-detect with a new substrate | problem classes map to existing detection sources (§4.2) |

MP-D18 (Kaizen anti-noise)

| Owner | Patch does | Patch must NOT | Sentinel |
| --- | --- | --- | --- |
| Điều 28 (Nuxt input shell) | Keeps five-click UX; anti-noise in backend | Add user-facing complexity | user-facing strings unchanged; zero internal vocabulary |
| Điều 32 (approval) | Kaizen → proposal → Điều 32 merge | Auto-apply Kaizen | every workflow-branch Kaizen has proposal_state then approval_id |
| Điều 38/39 (IU) | Kaizen targets IU only via proposal | Mutate IU body from staging | IU edits route through author lifecycle |
| Hiến pháp v4.6.3 (config-first) | Rate-limit/rejection reasons in dot_config | Hardcode anti-noise rules | functions read config, never embed thresholds |

MP-D19 (Direct canonicalization)

| Owner | Patch does | Patch must NOT | Sentinel |
| --- | --- | --- | --- |
| Hiến pháp v4.6.3 (PG-first / no hidden SoT) | Allow-listed direct writes only to non-structural canonical artifacts | Promote staging to SoT | staging never basis for canonical reads |
| Điều 35 (DOT) | Governance-relevant direct writes via DOT pair | Bypass DOT for registry/IU writes | every governance-relevant direct write has dot_iu_command_run |
| Điều 32 (approval) | Approval-required effects forced to workflow branch | Direct-write an approval-required change | deny conditions (§6.3) force workflow branch |
| Điều 30 (reversibility) + Điều 31 (audit) | Lineage + retention + correction model | Non-reversible/unaudited direct write | every direct write has lineage pair + retention + classification |

MP-D20 (Minimum Observability Profile)

| Owner | Patch does | Patch must NOT | Sentinel |
| --- | --- | --- | --- |
| Điều 45 (event/queue/heartbeat) | Metrics derive from event_outbox/job_queue/heartbeat | Mutate substrate to observe | observability reads only; emits events through standard producers |
| Điều 31 (audit) | Summaries carry generated_at + evidence; drill-down to vw_audit_event_timeline | Surface raw event tail | no raw outbox view; machine-only raw |
| Điều 37 v3.3 (governance org) | Human-visible split permission-filtered | Bypass permission | same backend predicate |
| Hiến pháp v4.6.3 (no hidden SoT) | AI summary never replaces evidence; no auto-resolve without source event | Let a summary be the SoT | MP-D9 rule enforced (§4.6 / §7.4) |

MP-D21 (OSS reconciliation)

| Owner | Patch does | Patch must NOT | Sentinel |
| --- | --- | --- | --- |
| Điều 7 (Assembly First) | Reconciles each tool to verdict + adapter label | Select/pin/implement any tool | zero version/CI/dockerfile lines |
| Hiến pháp v4.6.3 (PG-first / no hidden SoT) | SoT-pointback required for any future adoption | Let any tool own event/workflow/approval/state | each tool has explicit "may NOT" boundary |
| Điều 32 / Điều 45 / Điều 38-39 | Camunda≠approval; Temporal≠workflow def; tools≠event SoT | Blur ownership | verdict rows name the owner law each tool may not displace |

MP-D22 (Governance Ops Survey)

| Owner | Patch does | Patch must NOT | Sentinel |
| --- | --- | --- | --- |
| Design process / Điều 7 | Adds read-only survey before cockpit build | Implement or mutate during survey | survey is SELECT-only; writes nothing |
| Điều 37 v3.3 | Surveys existing governance collections for reuse | Create duplicate ownership | survey flags duplicate-ownership risk per target |

No matrix row introduces double-ownership; future Điều XX is the only NEW concern referent, unchanged from Master Design Rev2 §11.


§12. IU ↔ PG ↔ queue/event relation review (binding)

| Question | Answer |
| --- | --- |
| Do governance problems duplicate IU body? | No. governance_problem rows carry refs (primary_entity_ref, event ids, trace_id) + impact JSON; never IU body bytes. |
| Does Kaizen anti-noise mutate IU directly? | No. Duplicate hashes are over staging suggestion text; approved IU changes route through Điều 38/39 author lifecycle producing a new iu_version (§5.6). |
| Does direct canonicalization mutate IU? | No. Direct branch is denied for any IU-body change (§6.3); IU-affecting input reroutes to workflow branch + Điều 38/39. |
| Do staging/proposal rows carry trace/correlation? | Yes. input_submission + proposal + governance_problem all carry trace_id + correlation_id (W3C MP-D1); vw_audit_event_timeline(trace_id) reconstructs the path. |
| Does the queue/event carry body? | No. Canonicalizer/transcription/indexer queue rows carry staging_id + executor_class_ref + idempotency key + W3C trace; all input.* and governance events obey the MP-D8 deny-list. |
| Is PG canonicalization explicit? | Yes. Governance-relevant direct writes use DOT pair (Điều 35); the canonical row stores source_staging_id where permitted; audit row joins staging ↔ canonical. |
| Does usage evidence learn without auto-mutating? | Yes. governance_problem + Kaizen metrics + direct-rejection rates feed iu_usage_evidence signals (03-… §9) → KG feedback proposals only; never auto-mutate IU/registry (00-… §3 inv 14). |
| Is the vector boundary intact? | Yes. Nothing here touches iu_vector_*; iu_vector_sync_enabled=false respected; no cross-IU pollution (00-… §3 inv 20). |
| Does the cockpit/observability mutate substrate? | No. Observability reads only; problem resolution requires a real *.resolved event (§4.6); cockpit actions (DLQ replay, suppression, waiver) emit through standard producers + Điều 32. |

All answers preserve 00-… §3 invariants 1, 6, 7, 11, 14, 20 verbatim.


§13. Remaining gaps / open decisions

| Gap | Type | Resolution path |
| --- | --- | --- |
| Tier sources T6/T5/T4/T3 (existing vs paper?) | survey gap | Tier Registry Survey (08-… §11 + §3.3) before Phase 2; multi-domain infra itself is known_gap (5-layers TD-086). |
| ai_tasks / agent_run table existence + shape | survey gap | Governance Ops Survey (§9); not confirmed live; treat as candidate_requires_survey. |
| M-005 task orchestration artifact | survey gap | Not found in KB; Governance Ops Survey confirms whether it exists. |
| governance_problem + suppression/SLO registries | paper-only, Phase 1/2 | Tracked in 06-… §S18 (patched) + §S23. Each needs Birth row (Điều 0-G). |
| kaizen_lifecycle_state + fn_kaizen_duplicate_cluster | paper-only, Phase 2 | Extends input_submission; Phase-2 DDL + function. |
| Direct-canonicalization data-quality functions + security_classification column | paper-only, Phase 1/2 | Extends input_submission + worker.canonicalizer. |
| Retention/PII config defaults | paper, Phase 1 | dot_config retention.* defaults tunable per data domain. |
| State machine registry not yet deployed (Tầng 3 gap) | known gap G6/OD9 | Phase 1; blocks T2/T1 runtime in the readiness matrix (§3.3). |
| OSS observability/transport tool selection | open | Per 05-… Gate A + B + Council; no pick in this macro. |
| OD1 (Điều 34) | Council | Unchanged per MP-D10; Master Design Rev2 (incl. this addendum) does not depend on OD1. |

None of these blocks Master Design Rev2 approval. Each is paper-only or survey-only forward work. No new Council fork is introduced (every MP-D16..D22 item resolves within existing law boundaries + future Điều XX referent).


§14. Cross-document anchors (this addendum points back to)

  • Top-level invariants + forbidden compliance + Điều 5 vs T6→T1 clarification — 00-master-design-rev2.md §3 (inv 27..33) + §3a + §12.
  • Governance problem lifecycle + cockpit queue semantics — 02-step-state-machine-and-workflow-ui-design.md §7.10.
  • Minimum Observability Profile + raw-doc reconciliation — 03-event-5layer-realtime-dlq-design.md §12.
  • Direct canonicalization guardrails + Kaizen anti-noise + input lineage — 04-iu-centered-4mothers-binding-design.md §4.3b.
  • Raw PG-event tool reconciliation table — 05-oss-candidate-strategy-rev2.md §7.
  • Readiness items + Governance Ops Survey + sequencing — 06-open-decisions-and-readiness.md §S16 + §S20.1 + §S23.
  • Rev3 addendum (MP-D11..MP-D15) this builds on — 08-bidirectional-input-kaizen-governance-addendum.md §14.
  • Revision-4 patch log — 07-master-design-rev2-report.md §15.

§15. Acceptance for this addendum

  • A1. MP-D16 maps T6→T1 onto Điều 5 five tiers with the two axes clearly distinguished §3.1-§3.2; readiness matrix per surface → required substrate → status → blocker §3.3.
  • A2. MP-D17 governance problem taxonomy (16 classes §4.2) + 5 severities §4.3 + 12+3 lifecycle states §4.4 + controls §4.5 + ack≠resolution / mitigation≠verification + auto-resolve discipline §4.6 + paper registry shapes §4.7.
  • A3. MP-D18 Kaizen review lifecycle §5.2 + duplicate dimensions §5.3 + anti-noise controls §5.4 + metrics §5.5; user flow unchanged (five clicks, backend-only complexity) §5.1.
  • A4. MP-D19 direct canonicalization allow §6.2 / deny §6.3 conditions + lineage §6.4 + retention/security §6.5 + IU/PG/queue relation §6.6.
  • A5. MP-D20 Minimum Observability Profile required machine metrics §7.2 + human-visible vs machine-only split §7.3 + freshness rules §7.4 + raw-doc reconciliation §7.5.
  • A6. MP-D21 raw PG-event tool reconciliation table §8.2 with verdict + adapter label + "may NOT" boundary per tool; universal SoT-pointback conditions §8.3; no final pick.
  • A7. MP-D22 Governance Ops Survey macro §9.1 + targets §9.2 + output §9.3 + updated three-survey sequencing §9.4.
  • A8. Existing infra reuse decision per concept §10: every concept marked reuse | extend | paper-only-new | survey-required | known-gap with reason; especially-check infra list covered.
  • A9. Law / no-double-ownership matrix per MP §11: sentinel test on each row; no double-ownership introduced.
  • A10. IU↔PG↔queue relation review §12: preserves invariants 1, 6, 7, 11, 14, 20.
  • A11. No user-facing Kaizen complexity added §5.1 + I33.
  • A12. No final OSS tool selection §8.3 + I36; zero version/CI/dockerfile.
  • A13. Forbidden compliance: no PG mutation, no Directus mutation, no Qdrant/vector write, no migration, no DOT command run, no law enactment, no implementation, no UI deployment, no final OSS tool pick, no dot_config gate change, no schema creation, no code generation. All schemas paper-only.

End addendum.
