KB-43A6

07 — Scalable Detection Hardening (Branch G) (2026-06-01)

Revision 1
Tags: one-roof-governance, clause-hardening, branch-g, detection-views, source-inventory, route-registry, ground-truth, incremental-scan, 10e8-scale, unverifiable, 2026-06-01

Reviews decision-pack doc 05 (…/05-scalable-detection-architecture.md) + doc 11 (scale). Adversarial. Stress-tested at 10⁸ objects.

G0. What the pack says (summary)

Doc 05 defines a 6-layer pipeline: L1 source inventory (registry of registries) → L2 v_governed_object_candidates → L3 v_governance_coverage (resolve links, scalar/EXISTS only, no fan-out) → L4 v_governance_orphans (gaps, severity, coalesce_key) → L5 v_governance_coverage_summary (pivot) → L6 system_issues+event_outbox. Reuses RP count-integrity view shapes; UI renders only L5. Doc 11 adds partitioning by (type × source_model), incremental changed_since, cadence, throttle, bounded-depth inheritance, log() truncation, stale handling.

Overall: the pipeline shape is correct and the reuse of the proven RP view discipline (scalar resolution, leaf-scoping, pivot-only UI) is exactly right. The defects are all about the ground-truth foundation: the architecture is only as complete as L1, and L1's completeness is under-guaranteed — especially for routes/API.


G1 — L1 completeness is the load-bearing assumption and it is under-guaranteed

  • Original: §5.2 L1 = "the authoritative list of where governed objects live … without it, 'what is everything?' depends on memory (the exact failure mode to kill)." A tier-A DOT "verifies no live table/registry is missing from the inventory."
  • Recursion not closed: if a source is missing from L1, its objects are invisible — the memory-dependence merely moves to "remembering to add the L1 row." The verifier DOT must enumerate "all live sources," but how does it enumerate them without its own hardcoded list? The pack says it checks "no live table/registry is missing" but never names the ground-truth enumeration it checks against. This is the recursion's base case and it's left implicit.
  • Hardened wording — define the inventory-completeness check against named ground truths:

    "L1 completeness is verified by reconciling the inventory against the machine-enumerable ground truths: information_schema.tables (all PG tables), directus_collections (all collections), meta_catalog (all registered catalogs), pivot_definitions (all pivots), dot_tools (all DOTs), event_type_registry (all event types), normative_registry (all laws), label_rules/taxonomy_facets, and the route registry (G2). Any ground-truth object not mapped to an L1 source row = inventory_gap (critical — the detector is provably blind to it). The base case requires no memory: PG's own catalog is the floor."

  • Acceptance test: dropping a known table out of the L1 inventory (in rehearsal) makes the completeness DOT raise inventory_gap for it; the check enumerates from information_schema, not a hand list.
  • Open question: OQ-G1 — some governed objects live in files (the RP File:dot/bin/... substrate, model-B) or in nginx config, not PG. What is the file/config ground truth, and how is it enumerated without memory? (Partially answered by G2 for routes; files need a defined scan root.)
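
The reconciliation in the hardened wording can be sketched as a plain set difference. This is a minimal illustration, not the pack's implementation: the function name, the dict shape, and the idea that each ground truth has already been fetched into a list of names (for example from a query against information_schema.tables) are all assumptions.

```python
from typing import Dict, Iterable, List, Set, Tuple


def find_inventory_gaps(
    ground_truths: Dict[str, Iterable[str]],
    l1_inventory: Set[Tuple[str, str]],
) -> List[dict]:
    """Return one inventory_gap record per ground-truth object missing from L1.

    ground_truths maps a ground-truth name (hypothetical keys such as
    "information_schema.tables") to the object names it enumerates.
    l1_inventory holds (ground_truth_name, object_name) pairs already mapped
    to an L1 source row.
    """
    gaps = []
    for source_kind, names in sorted(ground_truths.items()):
        for name in sorted(set(names)):
            if (source_kind, name) not in l1_inventory:
                gaps.append(
                    {
                        "issue_type": "inventory_gap",
                        "severity": "critical",  # the detector is blind to it
                        "source_kind": source_kind,
                        "object_name": name,
                    }
                )
    return gaps
```

The point of the shape is that the left-hand side is always enumerated by the system itself, so removing a row from the inventory can only make the gap list longer, never shorter.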

G2 — Routes/API have NO ground-truth registry → the most island-prone surface is undetectable

  • Original: §5.2 L1 lists "routes/API list (from Nuxt/nginx config or a route registry if/when one exists)."
  • Critical gap: there is no route registry today (the RP routes were added to nginx + server/api/ by hand — see the RP ship history). Routes/API are the most island-prone surface (red-team #1: "new API route added without owner") and the live Direct-PG adapter is exactly an unregistered route. "If/when one exists" means route coverage is currently unverifiable — yet the pack's application matrix (doc 01 §1.4) marks "Directus / API surface" as governed-by-MOUT as if it were detectable.
  • Hardened wording:

    "A route/API registry is a prerequisite for route-class coverage. Until it exists, route coverage is UNVERIFIABLE (a distinct state from covered and from orphan), and the production gate treats UNVERIFIABLE route-class as a fail for any feature adding a route (you cannot ship an unverifiable surface). Interim ground truth: derive a route inventory from (a) the Nuxt server/api/** directory tree and (b) the nginx location blocks, reconciled — a route present in nginx/Nuxt but absent from the (derived) registry = route_orphan (high). Building the route registry is a named prerequisite macro, not an assumption."

  • Acceptance test: the Direct-PG adapter route (/api/registries-pivot/*) and any new route appear in the derived route inventory; a route in nginx with no owner = route_orphan; a feature adding an unverifiable route is gate-blocked.
  • Open question: OQ-G2 — should the route registry be a new governed table, or derived-on-scan from nginx+Nuxt? (Derived-on-scan is reuse-first and avoids a new island, but needs a defined scan root on the VPS.)
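
A rough sketch of the interim derived route inventory and its three states follows. Everything here is an assumption for illustration: the function names, the simplified handling of Nuxt file naming (method suffix and index files only, no dynamic segments), the regex-based reading of nginx location blocks, and the registry being a route-to-owner mapping that is None while no registry exists.

```python
import re
from typing import Dict, Iterable, Optional, Set

# Matches "location [modifier] <pattern> {" and captures the pattern.
_LOCATION = re.compile(r"location\s+(?:(?:=|~\*?|\^~)\s+)?([^\s{]+)\s*\{")
_METHOD_SUFFIX = re.compile(r"\.(get|post|put|patch|delete|head|options)$")


def routes_from_nuxt(files: Iterable[str]) -> Set[str]:
    """Derive route paths from file paths relative to the Nuxt root."""
    routes = set()
    for path in files:
        if not path.startswith("server/api/"):
            continue
        stem = path[len("server/"):].rsplit(".", 1)[0]  # drop the extension
        stem = _METHOD_SUFFIX.sub("", stem)  # drop ".get", ".post", ...
        if stem.endswith("/index"):
            stem = stem[: -len("/index")]
        routes.add("/" + stem)
    return routes


def routes_from_nginx(conf_text: str) -> Set[str]:
    """Collect the pattern of every location block in an nginx config."""
    return set(_LOCATION.findall(conf_text))


def classify_routes(
    discovered: Iterable[str],
    registry: Optional[Dict[str, Optional[str]]],
) -> Dict[str, str]:
    """Map each discovered route to covered, route_orphan, or UNVERIFIABLE."""
    result = {}
    for route in sorted(set(discovered)):
        if registry is None:
            result[route] = "UNVERIFIABLE"  # no registry: cannot be judged
        elif registry.get(route):
            result[route] = "covered"
        else:
            result[route] = "route_orphan"
    return result
```

Keeping UNVERIFIABLE as its own return value, rather than defaulting to covered or orphan, is what lets the gate fail on it explicitly.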

G3 — Incremental changed_since(object.date_updated) misses governance-context changes

  • Original: §5.2 L2 / doc 11 §11.2: incremental scan via WHERE changed_since(last_scanned_at) per source.
  • Trap: an object can become an orphan without the object row changing: its owner agency flips to draft (already a concern given 4 draft agencies), its governance_relations owner edge is deleted, an exception TTL expires, or a law's owner edge is removed. changed_since(object.date_updated) catches none of these, because the governance context changed and the object row did not. Coverage silently rots; the gate goes green on stale TRUEs.
  • Hardened wording: incremental scan triggers on the union of (a) object change (date_updated) and (b) governance-context change — subscribe to governance_registry status flips, governance_relations edge insert/delete, approval_requests/exception expiry, and apr_action_types changes. A governance-context change forces re-scan of all objects resolving ownership through the changed context (e.g. flipping GOV-MOUT active→draft re-scans all MOUT-owned objects). The full weekly reconciliation (§11.2) is the backstop, but context-change triggers prevent week-long blind windows.
  • Acceptance test: deleting an owner edge (in rehearsal) immediately marks the dependent objects for re-scan and surfaces them as OWNER_GAP, without waiting for the object rows to change.
  • Open question: none — reuses the Đ45 event-driven re-scan the pack already proposes (§11.2 bullet 3), just extends it to governance-context events.
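
The union trigger can be illustrated with a small selection function. This is a sketch under stated assumptions: the record fields ref and occurred_at, the event shape, and the simplification that each object carries the single gov code it last resolved ownership through (owner_gov_code) are hypothetical.

```python
from datetime import datetime
from typing import Dict, Iterable, Set


def objects_to_rescan(
    objects: Iterable[Dict[str, object]],
    context_events: Iterable[Dict[str, object]],
    last_scanned_at: datetime,
) -> Set[str]:
    """Select refs due for re-scan: changed objects plus objects whose
    governance context changed since the last scan."""
    changed_contexts = {
        e["gov_code"] for e in context_events if e["occurred_at"] > last_scanned_at
    }
    due = set()
    for obj in objects:
        object_changed = obj["date_updated"] > last_scanned_at
        context_changed = obj["owner_gov_code"] in changed_contexts
        if object_changed or context_changed:
            due.add(obj["ref"])
    return due
```

With only the first condition, an object whose owner was flipped to draft after the last scan would stay out of the result until the weekly reconciliation.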

G4 — The coverage_pct degradation threshold is an unsourced literal (island risk)

  • Original: doc 06 DOT-GOV-COVERAGE-AUDIT emits governance_coverage_degraded "when coverage_pct drops below a governed threshold."
  • Self-inconsistency: if "governed threshold" is a literal baked into the DOT, it is itself an F-ISLAND-3 violation (a policy value with no owner/approval) — the detector breaks its own rule. The pack says "governed" but doesn't say where the threshold lives.
  • Hardened wording: the degradation threshold is a row in the same COUNCIL-owned threshold-policy table as the ≤50 grouping ceiling (doc 08 §8.4) — owner_gov_code='GOV-COUNCIL', changed only via APR, never a DOT literal. Same for the stale-multiplier k (doc 11 §11.9), the materialization cap N (§11.6), and the throttle thresholds — every numeric threshold in the detection pipeline is a governed policy row, not a code literal.
  • Acceptance test: grepping the scanner DOT code for numeric thresholds finds none; all thresholds resolve from the governed policy table; CI (P8) fails a planted literal threshold.
  • Open question: none.
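
One way to keep literals out of the scanner is to make every comparison go through a resolver that refuses ungoverned values. The sketch below assumes a policy row shape with policy_key and value columns and a key named coverage_pct_min; only owner_gov_code='GOV-COUNCIL' comes from the text above, the rest is hypothetical.

```python
from typing import Dict, Iterable


class UngovernedThreshold(Exception):
    """Raised when a threshold has no single COUNCIL-owned policy row."""


def resolve_threshold(
    policy_rows: Iterable[Dict[str, str]],
    key: str,
    required_owner: str = "GOV-COUNCIL",
) -> float:
    """Look up a numeric threshold from governed policy rows, never a literal."""
    matches = [row for row in policy_rows if row["policy_key"] == key]
    if len(matches) != 1:
        raise UngovernedThreshold(f"{key}: expected 1 policy row, found {len(matches)}")
    row = matches[0]
    if row["owner_gov_code"] != required_owner:
        raise UngovernedThreshold(f"{key}: owner is {row['owner_gov_code']}")
    return float(row["value"])


def is_degraded(coverage_pct: float, policy_rows: Iterable[Dict[str, str]]) -> bool:
    """Compare coverage against the governed minimum (hypothetical key name)."""
    return coverage_pct < resolve_threshold(list(policy_rows), "coverage_pct_min")
```

A missing or wrongly owned row raises instead of falling back to a default, since a silent default would itself be an unowned policy value.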

G5 — L3 scalar-resolution discipline is right but the inheritance walk is the cardinality risk at 10⁸

  • Original: §5.2 L3 uses scalar/EXISTS (no fan-out — the RP 160→172 lesson) and bounded-depth recursive inheritance.
  • Scale stress: at 10⁸, even a bounded-depth recursive CTE per candidate is expensive if run over the leaf population. But with F1 (governance grain — leaves not individually counted), the walk runs only over containers + non-inheriting objects (thousands), so the walk is cheap iff F1 is applied. The pack's L3 doesn't reference the grain restriction, so an implementer might run L3 over all 10⁸ candidates.
  • Hardened wording: L3 (and L2) operate on the governance-grain population only (F1); leaf records never enter L2/L3 individually. State the join order: resolve container coverage first (thousands), then leaves inherit by a single container lookup (no per-leaf recursion). This is what makes "scale to 10⁸ without full-table scans" actually true.
  • Acceptance test: L2/L3 row counts are O(containers + non-inheriting), independent of leaf population; EXPLAIN shows no per-leaf recursion.
  • Open question: none.
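
The join order can be sketched in memory as two steps: a bounded-depth walk over containers only, then a dictionary lookup per leaf. The data shape (container id mapped to parent and owner) and both function names are assumptions; a real L3 would express this in the views rather than in application code.

```python
from typing import Dict, Optional


def resolve_container_coverage(
    containers: Dict[str, Dict[str, Optional[str]]],
    max_depth: int,
) -> Dict[str, bool]:
    """Step 1: resolve coverage for containers only, walking parents up to
    max_depth levels. Cost depends on the container count, not on leaves."""

    def walk(container_id: str, depth: int) -> bool:
        node = containers.get(container_id)
        if node is None or depth > max_depth:
            return False  # unknown container or depth bound reached
        if node.get("owner"):
            return True
        parent = node.get("parent")
        return parent is not None and walk(parent, depth + 1)

    return {cid: walk(cid, 0) for cid in containers}


def leaf_is_covered(leaf_container_id: str, coverage: Dict[str, bool]) -> bool:
    """Step 2: a leaf inherits by one lookup, with no per-leaf recursion."""
    return coverage.get(leaf_container_id, False)
```

The depth bound also terminates the walk if the parent links accidentally form a cycle.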

G6 — Stale handling (§11.9) is good but "STALE ≠ covered" must propagate to the gate AND the identity

  • Original: §11.9: a partition scanned older than k×cadence shows STALE + raises governance_scan_stale; "stale ≠ covered."
  • Gap: the identity (doc 04 §4.2) has four terms — covered/orphans/exceptions/retired. STALE is a fifth state with no home in the identity. An object whose coverage is stale is neither reliably covered nor confirmed orphan. If STALE objects are silently counted as covered, the identity lies.
  • Hardened wording: add stale/unverifiable as an explicit identity term (or fold into orphans with gap_type=STALE): total = covered + orphans + exceptions + retired_or_approved_ignore + **stale_unverifiable**. The gate treats stale_unverifiable (and UNVERIFIABLE route-class, G2) as fail, never pass. This also gives UNVERIFIABLE (G2) a home in the identity.
  • Acceptance test: a stale partition's objects are counted in stale_unverifiable, not covered; the feature gate fails on stale touched objects.
  • Open question: none — but note this makes the identity 5-term; council to confirm.
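
A small sketch of the 5-term identity and the stale rule follows. The term names are taken from the hardened wording; the function names and the choice to reclassify every object in a stale partition are assumptions, and only the stale part of the gate is modeled here.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

IDENTITY_TERMS = (
    "covered",
    "orphans",
    "exceptions",
    "retired_or_approved_ignore",
    "stale_unverifiable",
)


def effective_state(resolved_state: str, partition_stale: bool) -> str:
    """A result from a stale partition cannot vouch for any state."""
    return "stale_unverifiable" if partition_stale else resolved_state


def tally(objects: Iterable[Tuple[str, bool]]) -> Dict[str, int]:
    """Count objects per identity term from (resolved_state, partition_stale)."""
    counts = Counter(effective_state(state, stale) for state, stale in objects)
    return {term: counts.get(term, 0) for term in IDENTITY_TERMS}


def identity_holds(total: int, counts: Dict[str, int]) -> bool:
    """total = covered + orphans + exceptions + retired + stale_unverifiable."""
    return total == sum(counts.get(term, 0) for term in IDENTITY_TERMS)


def gate_passes_stale_rule(touched_counts: Dict[str, int]) -> bool:
    """Stale rule only: any stale_unverifiable touched object fails the gate."""
    return touched_counts.get("stale_unverifiable", 0) == 0
```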

G7 — Throttle/aggregate (§11.5/§11.11) is right; add a hard emit ceiling as a backstop

  • Original: §11.5/§11.11: one issue per (object_ref, gap_type) via coalesce_key; mass-orphan classes aggregate to one summary event per scope/cycle; log() truncation; never silent cap. (Red-team #9: "scanner reports 1M duplicate issues" — addressed by coalesce + aggregate.)
  • Residual risk: coalesce dedupes repeats of the same key, and aggregation handles known high-cardinality classes — but a new mass-orphan class not yet recognized as high-cardinality could still emit one issue per distinct (ref, gap) before aggregation kicks in (e.g. a misconfigured L1 source extracting 10⁶ distinct refs).
  • Hardened wording: add a hard per-scan emit ceiling (a governed threshold, G4): if a single scan would open more than M new issues of one type, it instead opens one aggregate issue ("N new orphans of type X — enumeration suppressed, see summary") + log()s, and raises a scan_anomaly (the detector itself may be misconfigured). This backstops the unknown high-cardinality case, not just the known ones.
  • Acceptance test: a rehearsal source that yields 10⁶ candidates produces one aggregate issue + a scan_anomaly, never 10⁶ rows.
  • Open question: none.
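
The ceiling can be sketched as a planning step that runs before anything is written to system_issues. The names are hypothetical, the candidates are assumed to be already deduplicated by coalesce_key, and ceiling_m is passed in because it is a governed threshold (G4), not a constant in the code.

```python
from collections import Counter
from typing import Dict, List


def plan_emissions(
    candidates: List[Dict[str, str]],
    ceiling_m: int,
) -> List[Dict[str, object]]:
    """Decide what a single scan may emit.

    Types with at most ceiling_m new issues are emitted one by one. A type
    above the ceiling collapses into one aggregate issue plus a scan_anomaly.
    """
    per_type = Counter(c["gap_type"] for c in candidates)
    suppressed = {t for t, n in per_type.items() if n > ceiling_m}

    planned: List[Dict[str, object]] = [
        {"kind": "issue", **c} for c in candidates if c["gap_type"] not in suppressed
    ]
    for gap_type in sorted(suppressed):
        n = per_type[gap_type]
        planned.append(
            {
                "kind": "aggregate_issue",
                "gap_type": gap_type,
                "count": n,
                "summary": f"{n} new orphans of type {gap_type}: "
                "enumeration suppressed, see summary",
            }
        )
        # The detector itself may be misconfigured, so flag the scan too.
        planned.append({"kind": "scan_anomaly", "gap_type": gap_type, "count": n})
    return planned
```

Because the check counts by type rather than matching a list of known high-cardinality classes, it also catches a class nobody anticipated.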

G8 — "UI renders only L5, never scans base tables" (§5.3/§11.8) — verify the API layer too

  • Original: §5.3/§11.8: the UI reads pre-computed L5; "the UI literally cannot scan" (Đ28 NT-D1).
  • Note (not a defect, a completeness check): the live RP anti-pattern (health.get.ts computing totalGap via reduce) is in the Nitro API layer, not the Vue layer. The pack's "no UI scan" rule must cover the server/api endpoints too, not just .vue files — the API endpoint is where the count-math island actually lived. The pack says this (doc 08 NT-D1-ext names health.get.ts) but the §11.8 phrasing ("UI") should explicitly include the Nitro/server-api tier.
  • Hardened wording: "UI/render tier" = both Vue components and Nitro/server-api endpoints; neither may compute governance/count/grouping truth; both read L5/pivot results only. CI (P8) scans both trees.
  • Acceptance test: CI scans web/server/api/**, web/pages/** and web/components/** for governance/count math; the retired health.get.ts reduce is caught.
  • Open question: none.
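
A CI check covering both trees might look like the sketch below. It is deliberately crude: the two regex patterns are hypothetical heuristics for "count math" and would need tuning against real code, and the files are passed in as a path-to-content mapping rather than read from disk.

```python
import re
from fnmatch import fnmatchcase
from typing import Dict, List, Tuple

# Render tier = Vue components/pages and Nitro server-api endpoints.
RENDER_TIER_GLOBS = ("web/server/api/*", "web/pages/*", "web/components/*")

# Hypothetical heuristics for count/aggregation math in render-tier code.
COUNT_MATH_PATTERNS = (
    re.compile(r"\.reduce\s*\("),
    re.compile(r"\bcount\s*\(", re.IGNORECASE),
)


def scan_render_tier(files: Dict[str, str]) -> List[Tuple[str, int, str]]:
    """Return (path, line number, line) for each suspected count computation
    found in a render-tier file. Files outside the globs are ignored."""
    findings = []
    for path in sorted(files):
        if not any(fnmatchcase(path, glob) for glob in RENDER_TIER_GLOBS):
            continue
        for lineno, line in enumerate(files[path].splitlines(), start=1):
            if any(p.search(line) for p in COUNT_MATH_PATTERNS):
                findings.append((path, lineno, line.strip()))
    return findings
```

A CI job would fail the build whenever the returned list is non-empty.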

G-summary — Branch G verdict

| ID | Severity | Type | Disposition |
|----|----------|------|-------------|
| G1 | high | recursion not closed | inventory-completeness check vs information_schema et al; inventory_gap=critical |
| G2 | high | gap / trap | route registry is a prerequisite; routes are UNVERIFIABLE until built |
| G3 | high | trap | incremental scan must trigger on governance-context changes, not just object changes |
| G4 | medium | island risk | all numeric thresholds are governed policy rows, not literals |
| G5 | medium | scale | L2/L3 over governance-grain only (depends on F1) |
| G6 | medium | gap | STALE/UNVERIFIABLE is a 5th identity term; gate-fails |
| G7 | low | backstop | hard per-scan emit ceiling for unknown mass-orphan classes |
| G8 | low | completeness | "UI tier" includes Nitro server-api |

G1+G2+G3 are the foundation fixes: the whole pipeline is only as trustworthy as L1's completeness and the freshness of its triggers. G2 (no route registry) is the single most concrete detection blind spot.

knowledge/dev/reports/architecture/one-roof-governance-clause-review-hardening-2026-06-01/07-scalable-detection-hardening.md