KB-35CB

05 — Scalable Governance Detection Architecture (2026-06-01)

Revision 1

Branch E. Design only — these views/models are proposed, not created. No DDL in this mission. Names are design handles.

5.1 Design constraints (from the user requirement)

  • Scale to 10⁶–10⁸ objects without full-table scans in the UI.
  • Memory-independent: rules live in PG/registry/DOT, never in agent recall.
  • Incremental where possible (rescan only changed sources).
  • Typed by object_type / source_model so a finding is attributable.
  • Pivot/count-summarisable so Registries-Pivot shows coverage without loading rows (Điều 26 5-layer: L1 summary is cheap, L3 rows are on-demand).
  • Feeds Registries-Pivot + system_issues (Điều 31) + event_outbox (Điều 45).
  • No hardcoded per-object checks in UI (Điều 28 NT-D1).

Architectural reuse: this is the same shape as the live Điều 31 contract/runner model and the Registries-Pivot count-integrity views (v_registry_leaf_set → v_count_integrity → v_count_drift → pivot). The governance layer reuses that pipeline with a governance lens.

5.2 Six detection layers

L1 Governance source inventory        (registry of registries — what sources exist)
        │  feeds
L2 v_governed_object_candidates        (every governed object, typed, from all sources)
        │  resolve owner/links
L3 v_governance_coverage               (each candidate + resolved link set + covered?)
        │  filter missing links
L4 v_governance_orphans                (only the gaps, typed + severity)
        │  aggregate
L5 v_governance_coverage_summary       (pivot: by type/source/owner/gap/severity)
        │  route
L6 system_issues + event_outbox        (issue per orphan; event per Đ45 register-before-emit)

Layer 1 — Governance source inventory

  • Purpose: the registry of registries — the authoritative list of where governed objects live. Without it, "what is everything?" depends on memory (the exact failure mode to kill).
  • Source: meta_catalog, collection_registry/directus_collections, table registry / information_schema.tables, pivot_definitions, dot_tools, label_rules, taxonomy_facets, event_type_registry, governance_registry, normative_registry, design_templates, approval_requests/apr_action_types, routes/API list (from Nuxt/nginx config or a route registry if/when one exists), workflow/task tables, and future object registries (added as rows, not code; Điều 26 §0-AU: "adding a row = INSERT, no code edits").
  • Key columns: source_id, source_kind, object_type_produced, extraction_rule_ref, owner_resolution_rule_ref, last_scanned_at, row_estimate.
  • Scale: small (tens–hundreds of sources). This table is the scale lever: adding a new object class = adding one inventory row, and the rest of the pipeline covers it automatically.
  • Refresh: on source-set change (rare); a tier-A DOT verifies no live table/registry is missing from the inventory (itself an orphan-of-inventory check).
  • Owner: GOV-COUNCIL (it is cross-system policy). Approval/gate: new inventory row = APR (medium).
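
The inventory-completeness check in the Refresh bullet can be pictured as a set difference between the live table list and the inventory. The following is a minimal sketch only, assuming SQLite as a stand-in for PG, `sqlite_master` as a stand-in for information_schema.tables, and a hypothetical table name `governance_source_inventory`; the sample tables and values are illustrative, not the real schema.

```python
import sqlite3

def sources_missing_from_inventory(conn: sqlite3.Connection) -> list[str]:
    """Return live tables that have no row in the (hypothetical) inventory table."""
    rows = conn.execute(
        """
        SELECT m.name
        FROM sqlite_master AS m
        WHERE m.type = 'table'
          AND m.name <> 'governance_source_inventory'
          AND NOT EXISTS (
                SELECT 1 FROM governance_source_inventory AS i
                WHERE i.source_id = m.name
          )
        ORDER BY m.name
        """
    ).fetchall()
    return [name for (name,) in rows]

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE governance_source_inventory (
        source_id TEXT PRIMARY KEY,
        source_kind TEXT,
        object_type_produced TEXT
    );
    CREATE TABLE dot_tools (id INTEGER PRIMARY KEY);
    CREATE TABLE label_rules (id INTEGER PRIMARY KEY);
    INSERT INTO governance_source_inventory
        VALUES ('dot_tools', 'registry', 'dot_tool');
    """
)

# label_rules exists as a table but has no inventory row yet.
missing_before = sources_missing_from_inventory(conn)

# Covering the new source is one INSERT, not a code change.
conn.execute(
    "INSERT INTO governance_source_inventory VALUES ('label_rules', 'registry', 'label_rule')"
)
missing_after = sources_missing_from_inventory(conn)
```

In this sketch the check reports `label_rules` until its inventory row is added, after which the missing list is empty.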

Layer 2 — v_governed_object_candidates

  • Purpose: normalize every governed object from every L1 source into one typed stream.
  • Source: L1 inventory drives per-source extraction (UNION ALL of per-source SELECTs, each tagged with source_id).
  • Key columns: governed_object_type, governed_object_ref, source_id, source_model (A/B/file/pg/registry), parent_ref, risk_class, lifecycle_status, born_at.
  • Scale: this is the largest relation (≈ sum of all governed-object rows). Never materialised to the UI. Use:
    • incremental: a WHERE changed_since(last_scanned_at) predicate per source where the source has a date_updated/last_seen_at (most do);
    • partition by source_model / governed_object_type so policy objects (hundreds) are scanned every cycle while record-grade objects (millions) are sampled/aggregated.
  • Refresh: incremental per source cadence (doc 11 §cadence).
  • Owner: GOV-SIV. Gate: read-only view; no approval.
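
The inventory-driven UNION ALL with a per-source incremental predicate might look like the sketch below. It assumes SQLite in place of PG, a hypothetical in-memory `INVENTORY` list standing in for the L1 table, and illustrative column names (`code`, `rule_key`, `date_updated`); `changed_since(...)` is rendered here as a plain `>` comparison on an ISO date string.

```python
import sqlite3

# Hypothetical inventory rows:
# (source_id, table, ref_column, object_type, changed_column)
INVENTORY = [
    ("dot_tools", "dot_tools", "code", "dot_tool", "date_updated"),
    ("label_rules", "label_rules", "rule_key", "label_rule", "date_updated"),
]

def build_candidates_sql(inventory) -> str:
    """Build one UNION ALL query with a per-source incremental predicate.

    Identifiers come from the trusted inventory, so they are interpolated;
    the changed-since cutoff is bound as a named parameter per source.
    """
    parts = []
    for source_id, table, ref_col, obj_type, changed_col in inventory:
        parts.append(
            f"SELECT '{obj_type}' AS governed_object_type, "
            f"{ref_col} AS governed_object_ref, "
            f"'{source_id}' AS source_id "
            f"FROM {table} WHERE {changed_col} > :since_{source_id}"
        )
    return "\nUNION ALL\n".join(parts)

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dot_tools (code TEXT, date_updated TEXT);
    CREATE TABLE label_rules (rule_key TEXT, date_updated TEXT);
    INSERT INTO dot_tools VALUES ('DOT-1', '2026-05-01'), ('DOT-2', '2026-06-01');
    INSERT INTO label_rules VALUES ('LR-1', '2026-05-20');
    """
)

# last_scanned_at per source; only rows changed after it are rescanned.
last_scanned = {"since_dot_tools": "2026-05-15", "since_label_rules": "2026-05-15"}
candidates = conn.execute(build_candidates_sql(INVENTORY), last_scanned).fetchall()
```

Only `DOT-2` and `LR-1` come back, because `DOT-1` has not changed since the last scan. Adding a source means adding one inventory row; the query builder itself does not change.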

Layer 3 — v_governance_coverage

  • Purpose: for each candidate, resolve the governance link set (doc 02 §2.2) and compute covered (doc 04 §4.3).
  • Source: L2 ⋈ governance_relations ⋈ law_jurisdiction ⋈ governance_registry ⋈ approval_requests (exceptions) ⋈ dot_tools (dot_authority) ⋈ parent-inheritance walk.
  • Key columns: governed_object_ref, governed_object_type, owner_gov_code, owner_path_kind (direct/relation/jurisdiction/exception/delegated/inherited), capability_ok, law_ref, approval_ref, audit_ref, rollback_ref, dot_authority_ref, covered (bool), required_links_missing (array).
  • Scale critical: owner resolution uses scalar/EXISTS lookups, never fan-out joins (lesson from RP: a naive LEFT JOIN pivot_definitions ON source_object fanned 160→172 rows and broke the invariant — Điều 28 double-count via SQL). Inheritance walk is bounded-depth recursive (the RP tree walks 37 nodes depth-3 cycle-free; the bound prevents runaway on 10⁸).
  • Refresh: follows L2.
  • Owner: GOV-SIV. Gate: read-only.
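
The fan-out risk and the bounded inheritance walk from the Scale bullet can be illustrated on toy data. This is a sketch under stated assumptions: SQLite stands in for PG, the `candidates` table and the `governance_relations` columns (`object_ref`, `owner_gov_code`) are hypothetical, and `resolve_inherited_owner` is an illustrative helper rather than part of the design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE candidates (governed_object_ref TEXT PRIMARY KEY);
    CREATE TABLE governance_relations (object_ref TEXT, owner_gov_code TEXT);
    INSERT INTO candidates VALUES ('obj-1'), ('obj-2'), ('obj-3');
    -- obj-1 has two relation rows; obj-3 has none.
    INSERT INTO governance_relations VALUES
        ('obj-1', 'GOV-SIV'), ('obj-1', 'GOV-MOUT'), ('obj-2', 'GOV-SIV');
    """
)

# Naive join: one output row per matching relation, so obj-1 is counted twice.
fanned = conn.execute(
    """
    SELECT count(*) FROM candidates c
    LEFT JOIN governance_relations r ON r.object_ref = c.governed_object_ref
    """
).fetchone()[0]

# EXISTS lookup: exactly one output row per candidate, whatever the match count.
coverage = conn.execute(
    """
    SELECT c.governed_object_ref,
           EXISTS (SELECT 1 FROM governance_relations r
                   WHERE r.object_ref = c.governed_object_ref) AS has_owner
    FROM candidates c
    ORDER BY c.governed_object_ref
    """
).fetchall()

def resolve_inherited_owner(ref, parent_of, owner_of, max_depth=3):
    """Walk parent links at most max_depth hops; return the first owner found.

    The hop bound also terminates the walk if the parent links form a cycle.
    """
    current = ref
    for _ in range(max_depth + 1):
        if current in owner_of:
            return owner_of[current]
        current = parent_of.get(current)
        if current is None:
            return None
    return None

inherited = resolve_inherited_owner(
    "leaf", {"leaf": "mid", "mid": "root"}, {"root": "GOV-SIV"}
)
cyclic = resolve_inherited_owner("a", {"a": "b", "b": "a"}, {})
```

Three candidates yield four rows under the naive join but three under the EXISTS form, which is the invariant the coverage view needs to preserve.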

Layer 4 — v_governance_orphans

  • Purpose: the gap set — only candidates where covered = false, each typed (doc 03 §3.2) and severity-graded (doc 03 §3.3).
  • Source: SELECT … FROM v_governance_coverage WHERE NOT covered.
  • Key columns: governed_object_ref, governed_object_type, gap_type, severity, source_id, source_model, owner_path_kind, detected_at, coalesce_key.
  • coalesce_key: stable hash of (object_ref, gap_type) — reuses the live system_issues.coalesce_key idempotency pattern so a persistent orphan does not create duplicate issues across scans (it bumps occurrence_count/last_seen_at).
  • Scale: small relative to L2 (the goal is for this to trend to zero for truth-class objects). Capped row materialization (doc 11 §threshold): above N orphans of one type, materialise the summary + top-N exemplars, and log() the truncation — never silently cap.
  • Owner: GOV-SIV. Gate: read-only.
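
The coalesce_key idempotency and the logged cap can be sketched as follows. Assumptions: SHA-256 is used as the stable hash (the document does not name one), a plain dict stands in for system_issues, the gap type `missing_owner` is a made-up value, and `cap_exemplars` expects its input to be pre-sorted so that the first N entries are the top N.

```python
import hashlib
import logging

logger = logging.getLogger("governance.orphans")

def coalesce_key(object_ref: str, gap_type: str) -> str:
    """Stable hash of (object_ref, gap_type); identical across scans."""
    raw = f"{object_ref}\x1f{gap_type}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()

def record_orphan(issues: dict, object_ref: str, gap_type: str, seen_at: str) -> None:
    """Insert a new issue or bump the existing one for the same key."""
    key = coalesce_key(object_ref, gap_type)
    issue = issues.get(key)
    if issue is None:
        issues[key] = {
            "object_ref": object_ref,
            "gap_type": gap_type,
            "occurrence_count": 1,
            "last_seen_at": seen_at,
        }
    else:
        issue["occurrence_count"] += 1
        issue["last_seen_at"] = seen_at

def cap_exemplars(orphans: list, limit: int) -> list:
    """Keep at most `limit` exemplars and log any truncation explicitly."""
    if len(orphans) > limit:
        logger.warning(
            "orphan exemplars truncated: kept %d of %d", limit, len(orphans)
        )
        return orphans[:limit]
    return orphans

issues = {}
record_orphan(issues, "obj-1", "missing_owner", "2026-06-01")
record_orphan(issues, "obj-1", "missing_owner", "2026-06-02")  # same orphan, next scan
record_orphan(issues, "obj-2", "missing_owner", "2026-06-02")
```

After the three calls there are two issues, and the `obj-1` issue carries an occurrence count of 2 rather than a duplicate row.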

Layer 5 — v_governance_coverage_summary

  • Purpose: the pivot that Registries-Pivot renders — coverage by governed_object_type × source × owner × gap_type × severity, plus the four invariant terms (doc 04 §4.2) per scope.
  • Source: v_governance_coverage + v_governance_orphans, grouped. Backed by pivot_definitions (Điều 26: counting is pivot_count() only; a grand-total bucket must be a constant-bucket VIEW, never an un-grouped count(*) — RP PIV-500 lesson).
  • Key columns: scope, group_values (jsonb), total_governed, covered, orphans, approved_exceptions, retired, coverage_pct, max_severity.
  • Scale: the summary (the Điều 26 L1 level, not detection layer L1) is at most a few thousand grouped rows, so it is cheap to render. This is what makes "reflect coverage without loading millions of rows" true (Điều 26 L1/L2 cheap, L3 on demand).
  • Owner: GOV-SIV (health) for the numbers; GOV-MOUT (render) for display. Gate: read-only; pivot_definitions INSERT = APR.
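
The shape of a grouped summary row can be illustrated in a few lines. This sketch only shows the output shape: in the design, counting goes through pivot_count() and pivot_definitions, not application code. It also assumes coverage_pct means covered / total_governed, which the document does not define here, and it omits approved_exceptions and retired.

```python
from collections import defaultdict

def summarise(coverage_rows, group_by=("governed_object_type", "source_id")):
    """Group coverage rows and count totals without returning any row detail."""
    buckets = defaultdict(lambda: {"total_governed": 0, "covered": 0, "orphans": 0})
    for row in coverage_rows:
        key = tuple(row[col] for col in group_by)
        bucket = buckets[key]
        bucket["total_governed"] += 1
        bucket["covered" if row["covered"] else "orphans"] += 1
    summary = []
    for key, bucket in sorted(buckets.items()):
        # Assumed definition: share of covered objects within the group.
        pct = round(100.0 * bucket["covered"] / bucket["total_governed"], 1)
        summary.append(
            {"group_values": dict(zip(group_by, key)), **bucket, "coverage_pct": pct}
        )
    return summary

rows = [
    {"governed_object_type": "dot_tool", "source_id": "dot_tools", "covered": True},
    {"governed_object_type": "dot_tool", "source_id": "dot_tools", "covered": True},
    {"governed_object_type": "dot_tool", "source_id": "dot_tools", "covered": False},
    {"governed_object_type": "label_rule", "source_id": "label_rules", "covered": False},
]
summary = summarise(rows)
```

Four input rows collapse to two summary rows, which is the property that lets the UI render coverage without loading the underlying objects.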

Layer 6 — Issue / event routing

  • Purpose: turn orphans into governed signals.
  • Source: v_governance_orphans → system_issues (one row per coalesce_key, severity-routed) → event_outbox (one event per registered event_type under Điều 45 §3.2 register-before-emit; signal-not-data per Điều 45 §4: the event carries governed_object_ref, never payload).
  • Scale: issue creation is idempotent (coalesce); event emission is throttled (doc 11 §throttle) and aggregated (one governance_coverage_degraded summary event per scope per cycle, not one per orphan, for the high-volume case).
  • Owner: GOV-SIV emits; routing types owned per Điều 45. Gate: event types must be registered first (doc 07) — this is design, no emit happens in this mission.
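
Register-before-emit and per-scope aggregation can be sketched as below. Assumptions: a Python set stands in for event_type_registry, a list stands in for event_outbox, the `scope` field on each orphan is hypothetical, and the event body fields (`scope`, `orphan_count`) are illustrative. No emit happens in this mission; this only shows the intended control flow.

```python
from collections import Counter

# Stand-in for event_type_registry: only registered types may be emitted.
REGISTERED_EVENT_TYPES = {"governance_coverage_degraded"}

def emit(outbox: list, event_type: str, body: dict) -> None:
    """Append to the outbox only if the event type was registered first."""
    if event_type not in REGISTERED_EVENT_TYPES:
        raise ValueError(f"event type not registered: {event_type}")
    outbox.append({"event_type": event_type, **body})

def emit_cycle_summary(outbox: list, orphans: list) -> None:
    """Emit one summary event per scope for the cycle, not one per orphan."""
    per_scope = Counter(o["scope"] for o in orphans)
    for scope, count in sorted(per_scope.items()):
        # Signal, not data: the event carries a scope and a count only.
        emit(outbox, "governance_coverage_degraded",
             {"scope": scope, "orphan_count": count})

outbox = []
orphans = [
    {"governed_object_ref": "obj-1", "scope": "registries"},
    {"governed_object_ref": "obj-2", "scope": "registries"},
    {"governed_object_ref": "obj-3", "scope": "workflows"},
]
emit_cycle_summary(outbox, orphans)
```

Three orphans in two scopes produce two events, and an attempt to emit an unregistered type raises instead of writing to the outbox.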

5.3 Why this is not a per-object UI check (anti-Đ28-violation)

The UI (Registries-Pivot) renders only L5 (a pivot the backend computed), never L2/L3/L4 logic. There is no if governed_object_type == 'x' branch in Nuxt — the screen reads a registered design_template + a pivot_definitions-backed result (Điều 28 NT-D1/NT-D3, Điều 26 1C/1E). Drill-down from L5 → L4 exemplars → L3 object → its DB substrate is the same recursive contract as the RP drill-down (doc 09 / 04-dynamic-drilldown-layer-model.md).

5.4 Mapping to existing pipeline (reuse ledger)

New layer        Reuses (live)
L1 inventory     meta_catalog (169) + check_registry_coverage + fn_birth_onboarding_full_scan
L2 candidates    RP v_registry_leaf_set shape (leaf-scoped, no meta double-count)
L3 coverage      RP v_count_integrity shape (scalar-subquery resolution)
L4 orphans       RP v_count_drift shape + system_issues.coalesce_key
L5 summary       pivot_definitions + refresh_pivot_results (statement-trigger)
L6 routing       system_issues + event_outbox + event_type_registry (Đ45)

Cross-refs: doc 06 (the DOTs that run these views), doc 07 (the issue/event types L6 needs), doc 11 (scale/cadence/throttle/partition detail).

knowledge/dev/reports/architecture/one-roof-governance-decision-pack-2026-06-01/05-scalable-detection-architecture.md