KB-70A6

31 — Backfill Existing Objects → Governance Onboarding Design (Branch A, design-only, apply NO-GO, read-only zero mutation, 2026-06-01)

Revision 1
Tags: one-roof-governance, implementation-index, backfill, legacy-onboarding, branch-a, cursor-checkpoint, source-snapshot, ruleset-version, idempotency-key, coverage-proof, retry-dlq, backfill-issue-types, resource-budget, birth-registry, no-rescan, invariant-v3, no-hardcode, no-island, apply-no-go, design-only, 2026-06-01


Path: knowledge/dev/reports/architecture/one-roof-governance-technical-addendum-and-implementation-index-2026-06-01/
Doc: 31.
Track: T6-adjacent (Branch A of the Backfill / Handoff / Input-Control addendum).
Builds on: docs 24 (T7), 25 (T6), concept canon 01–02, blocker register (doc 03), and GPT direction gpt-direction-add-backfill-handoff-queue-and-input-control-to-governance-design-2026-06-01.
Status: DESIGN ONLY. APPLY IS NO-GO. No PG/Directus/Qdrant/Nuxt mutation. No table/view/function/trigger created. No DOT/event registered. No event/job/notification emitted. No approval/enactment/version-bump. The backfill sweep itself is a later, gated build step; this doc specifies what it would do, not the doing.
Owner (proposed): GOV-SIV (Điều 31, monitoring.integrity) runs the read/detect/propose sweep; the only mutating member (apply) is a GOV-DOT (Điều 35) executor under an approved APR, which is NO-GO (SB-1/SB-2/H-2). SIV proposes; COUNCIL/owner approves; GOV-DOT executes.
Evidence base: live read-only re-verified 2026-06-01 (counts below); concept canon M-DEF-1..10.


0. §0-GOV declaration (this design's coverage hook)

§0-GOV Governance Coverage Declaration — Branch A (Backfill)
  governed_objects:   [ backfill_run, backfill_batch, backfill_cursor ]   (Class-2 governed process records)
  owner_per_scope:    { policy: GOV-COUNCIL, health: GOV-SIV, execution: GOV-DOT,
                        render: GOV-MOUT(TTL→C-5), approval: Điều32-spine, audit: GOV-SIV }
  coverage_profile:   [ process/job profile — owner, audit, rollback, issue-event, capability ]
  axes_introduced:    [ none — consumes Axis Registry (M-DEF-9), does not mint axes ]
  detection_path:     birth_registry (master born ledger) + per-class registries + meta_catalog
  issue_event_types:  [ backfill_inventory_gap, backfill_batch_failed, backfill_incomplete,
                        backfill_dlq_overflow ] (register-before-emit, Điều 45 — NOT registered here)
  exceptions:         [ governed-exception refs reused from M-DEF-6; none minted here ]

1. The problem (why backfill is a missing layer)

The T6 scanner (doc 25) is designed to enumerate the full inventory every pass: L1 rebuilds the run's object set from registries on each scan. That is correct for a small set, but it is unsafe at the live scale and leaves unaddressed four operational requirements the GPT council flagged:

  1. Existing already-born objects must be brought into governance coverage without omission and without endless full rescans.
  2. The first onboarding of a ~10⁶-object backlog must be checkpointed and resumable, not a single monolithic scan that loses all progress on failure/timeout.
  3. The result must be a provable coverage closure, not "we ran it once."
  4. It must distinguish birth-orphan (birth's job) from governance-orphan (this layer's job) so the backfill does not double-report or mislabel.

Live scale (read-only, 2026-06-01): birth_registry = 1,037,716 rows, all status='born', born_at spanning 2026-02-17 → 2026-06-01, all carrying a non-null governance_role, only 1,402 certified, across 78 distinct collection_name and 39 species_code. This is the authoritative born population the backfill must onboard. A naïve "scan everything every pass" over 1.04M rows under a 5 s read statement-timeout is not viable; backfill must be a bounded, cursored, idempotent, resumable sweep that seeds a durable candidate-state store, after which only dirty/expired groups are re-evaluated (Branch D, doc 34).


2. Authoritative source inventories (Q1 — what is authoritative)

Discover-first / no-hardcode: the backfill enumerates from registries, not a code list. Priority-ordered authoritative inventories (all live):

| Inventory | Live count | Grain / key | Role in backfill |
|---|---|---|---|
| birth_registry | 1,037,716 | (collection_name, entity_code); order (born_at, id) | master spine: the definitive "what has been born/registered" ledger; the cursor pages this |
| meta_catalog | 169 | entity_type (+ registry_collection) | class inventory: every governed object class; inventory-completeness check |
| collection_registry | 168 | code/collection_name; carries coverage_status, coverage_scope_status, coverage_exemption_reason, coverage_review_owner | collection-grain coverage decisions already recorded at the birth layer (see §3) |
| information_unit | 219 | unit_id | IU class members (IU coverage gated on OP-B/SB-3) |
| dot_tools | 309 | dot_code | DOT class members |
| pivot_definitions | 37 | code + group_spec dims | axis/pivot members (interim axis inventory until Axis Registry) |
| measurement_registry | 142 (140 enabled) | measurement_id | the data-driven detector set; contributes to ruleset_version (§5) |
| derived_objects_registry | 7 | code; carries depends_on_collections[], stale_after, recompute_status | derived-object members + the dirty/stale precedent Branch D reuses |
| binding_registry, table_registry, universal_rule_registry, law_jurisdiction, … | per-class | | additional class members per meta_catalog.registry_collection |

Rule: the per-class member registry is read from meta_catalog.registry_collection for each class — so a new class is automatically in scope when its meta_catalog row exists. If a class's source-of-truth registry is missing, the sweep raises backfill_inventory_gap and fails closed for that class (it never invents a member list). This is the governance twin of birth-orphan inventory reconciliation (concept canon 01 §9).
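The discover-first rule above can be sketched as a small resolver. This is a minimal illustration, not the sweep's implementation: `resolve_class_registries`, `InventoryGap`, and the `available_registries` set are hypothetical names, and the rows are assumed to be plain dicts carrying the `entity_type` and `registry_collection` fields of meta_catalog.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InventoryGap:
    """Hypothetical stand-in for a backfill_inventory_gap finding."""
    entity_type: str
    registry_collection: Optional[str]


def resolve_class_registries(meta_catalog_rows, available_registries):
    """Map each class to its member registry, failing closed on gaps.

    meta_catalog_rows: iterable of dicts with 'entity_type' and
        'registry_collection' (assumed shape).
    available_registries: set of registry names that actually exist.
    """
    in_scope = {}
    gaps = []
    for row in meta_catalog_rows:
        registry = row.get("registry_collection")
        if registry and registry in available_registries:
            # A new class is in scope as soon as its meta_catalog row resolves.
            in_scope[row["entity_type"]] = registry
        else:
            # Fail closed: record the gap, never invent a member list.
            gaps.append(InventoryGap(row["entity_type"], registry))
    return in_scope, gaps
```

No class name appears in the code; the class list comes entirely from the rows passed in, which is the no-hardcode property the rule asks for.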


3. Reuse: the birth layer already records collection-grain coverage decisions

collection_registry is not greenfield for coverage onboarding — it already carries a per-collection coverage decision ledger (read-only, 2026-06-01):

| coverage_status | coverage_scope_status | n |
|---|---|---|
| BIRTH_REQUIRED | IN_SCOPE | 74 |
| BIRTH_DEFERRED_NEEDS_REVIEW | USER_EXCLUDED | 31 |
| BIRTH_EXEMPT_STRUCTURAL_JUNCTION | IN_SCOPE | 20 |
| BIRTH_DEFERRED_NEEDS_REVIEW | FUTURE_SCOPE | 17 |
| BIRTH_EXEMPT_SYSTEM_LOG_OR_AUDIT | IN_SCOPE | 12 |
| BIRTH_DEFERRED_NEEDS_REVIEW | ORPHAN_REGISTRY | 7 |
| BIRTH_EXEMPT_DERIVED_CACHE | IN_SCOPE | 4 |
| BIRTH_DEFERRED_NEEDS_REVIEW | IN_SCOPE | 3 |

Design consequences (reuse-first, no second roof):

  • Governance backfill is the layer above this, exactly as governance coverage is the layer above birth (M-DEF-4). The governance candidate verdict (doc 34) reads these collection decisions and inherits them: a collection marked BIRTH_EXEMPT_* propagates an exemption hint to its members (its children do not generate governance-orphan findings — they are accounted in the invariant's retired/ignored term), and USER_EXCLUDED/FUTURE_SCOPE collections map to Class-0 / deferred candidate verdicts rather than orphans.
  • The governance onboarding state is modeled on this existing shape (a coverage_status enum + coverage_scope_status + coverage_exemption_reason + coverage_review_owner + coverage_decided_at/by) but at the governance grain and stored in the candidate-state store (doc 34), not by widening collection_registry. No collection columns are added.
  • The existing system_issues type collection_onboarding_gap (345 live) is the birth-layer analog of the governance backfill_inventory_gap; the governance type rides the same system_issues machinery (T7).
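As a worked illustration of the inheritance rule, the sketch below maps each collection decision to a hint and tallies the live ledger from this section. The hint names (`exempt`, `class_0_or_deferred`, `evaluate`) are hypothetical labels, not doc 34 verdicts. Treating every remaining combination, including ORPHAN_REGISTRY, as "evaluate normally" is an assumption; the text above only fixes the BIRTH_EXEMPT_* and USER_EXCLUDED/FUTURE_SCOPE cases.

```python
from collections import Counter

# Live collection_registry decision ledger (read-only, 2026-06-01), as (status, scope, n).
LIVE_DECISIONS = [
    ("BIRTH_REQUIRED", "IN_SCOPE", 74),
    ("BIRTH_DEFERRED_NEEDS_REVIEW", "USER_EXCLUDED", 31),
    ("BIRTH_EXEMPT_STRUCTURAL_JUNCTION", "IN_SCOPE", 20),
    ("BIRTH_DEFERRED_NEEDS_REVIEW", "FUTURE_SCOPE", 17),
    ("BIRTH_EXEMPT_SYSTEM_LOG_OR_AUDIT", "IN_SCOPE", 12),
    ("BIRTH_DEFERRED_NEEDS_REVIEW", "ORPHAN_REGISTRY", 7),
    ("BIRTH_EXEMPT_DERIVED_CACHE", "IN_SCOPE", 4),
    ("BIRTH_DEFERRED_NEEDS_REVIEW", "IN_SCOPE", 3),
]


def inherited_hint(coverage_status, coverage_scope_status):
    """Return the hint a collection decision passes down to its members."""
    if coverage_status.startswith("BIRTH_EXEMPT_"):
        return "exempt"  # accounted in the invariant's retired/ignored term
    if coverage_scope_status in {"USER_EXCLUDED", "FUTURE_SCOPE"}:
        return "class_0_or_deferred"  # never reported as orphans
    return "evaluate"  # assumption: everything else is evaluated normally


def tally(decisions):
    """Count collections per inherited hint."""
    counts = Counter()
    for status, scope, n in decisions:
        counts[inherited_hint(status, scope)] += n
    return counts
```

Under these assumptions the 168 live collections split into 36 exempt, 48 Class-0/deferred, and 84 evaluated normally.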

4. Enumeration without omission + the cursor/checkpoint (Q2, Q3, Q5, Q9)

4.1 Keyset cursor over the master spine (reuse iu_route_worker_cursor shape)

The live iu_route_worker_cursor (1 active row, iu_outbound_default, events_seen=68, attempts_written=67, dead_lettered=0) is the exact durable cursor/checkpoint precedent. The backfill reuses its shape (a per-worker row), not a new bespoke mechanism:

gov_worker_cursor  (proposed, design-only — reuses iu_route_worker_cursor columns 1:1)
  worker_name        = 'gov_backfill_sweep'
  event_domain       = 'governance'
  last_created_at    = <last processed birth_registry.born_at>     -- keyset low-water mark
  last_event_id      = <last processed birth_registry.id>          -- tie-breaker (monotone)
  last_run_at, last_run_summary(jsonb)
  events_seen, attempts_written, dead_lettered (bigint)            -- progress + DLQ counters
  metadata(jsonb)    = { ruleset_version, source_snapshot_ref, batch_size, phase }

Keyset (seek) pagination, not OFFSET: each batch reads SELECT … FROM birth_registry WHERE (born_at, id) > (:last_created_at, :last_event_id) ORDER BY born_at, id LIMIT :batch_size. Because (born_at, id) is monotone and append-only, every row is visited exactly once, with no omission, even as new births arrive during the sweep (new births have born_at ≥ now and are caught either by a later batch or by the handoff intake, doc 32). The cursor advances only after the batch is durably processed, so a crash or timeout resumes from the last committed (last_created_at, last_event_id) with zero gap and zero double-commit of the cursor.
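A minimal runnable sketch of the keyset read, using an in-memory SQLite table as a stand-in for the live PostgreSQL birth_registry (an assumption made only so the example is self-contained; row-value comparison needs SQLite 3.15 or later). The two-column table and the function names `read_batch`, `sweep`, and `demo` are hypothetical.

```python
import sqlite3


def read_batch(conn, last_born_at, last_id, batch_size):
    """Return the next page strictly after the (born_at, id) low-water mark."""
    return conn.execute(
        "SELECT born_at, id FROM birth_registry "
        "WHERE (born_at, id) > (?, ?) "
        "ORDER BY born_at, id LIMIT ?",
        (last_born_at, last_id, batch_size),
    ).fetchall()


def sweep(conn, batch_size):
    """Page the whole table by keyset and return ids in visit order."""
    low_water = ("", 0)  # sorts before every real (born_at, id) pair
    visited = []
    while True:
        batch = read_batch(conn, low_water[0], low_water[1], batch_size)
        if not batch:
            return visited
        visited.extend(row_id for _, row_id in batch)
        low_water = batch[-1]  # advance only after the batch is handled


def demo():
    """Build a tiny table with tied born_at values and sweep it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE birth_registry (id INTEGER PRIMARY KEY, born_at TEXT)")
    conn.executemany(
        "INSERT INTO birth_registry (id, born_at) VALUES (?, ?)",
        [(1, "2026-02-17"), (2, "2026-02-17"), (3, "2026-03-01"),
         (4, "2026-06-01"), (5, "2026-06-01")],
    )
    return sweep(conn, batch_size=2)
```

The id tie-breaker is what keeps rows with identical born_at values from being skipped or repeated when a batch boundary falls between them.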

4.2 Checkpoint semantics

  • Cursor is committed after the batch's candidate-state upserts are durable (transactional with the batch, or a strict happens-after).
  • last_run_summary records {scanned, candidates_relevant, orphans, exceptions, retired, class_0, deferred_birth, failed} per run for the coverage proof and observability.
  • phase in metadata: seeding (initial sweep) → reconciling (invariant close) → incremental (handoff-driven only, Branch D owns it thereafter). Backfill is a one-time seed, then the system is incremental; backfill never becomes a perpetual full scan.
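The "cursor commits with the batch" rule can be sketched as one transaction that holds both the candidate-state writes and the cursor move. This is an illustration under stated assumptions: SQLite stands in for the real store, the tables are trimmed to the columns needed, and `make_demo_db`, `run_batch`, and the `crash` flag are hypothetical helpers for showing that a failure before commit leaves the cursor where it was.

```python
import sqlite3

WORKER = "gov_backfill_sweep"


def make_demo_db():
    """Create trimmed stand-ins for the three tables involved."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE birth_registry (id INTEGER PRIMARY KEY, born_at TEXT);
        CREATE TABLE candidate_state (candidate_key TEXT PRIMARY KEY);
        CREATE TABLE gov_worker_cursor (
            worker_name TEXT PRIMARY KEY, last_created_at TEXT,
            last_event_id INTEGER, events_seen INTEGER);
        INSERT INTO gov_worker_cursor VALUES ('gov_backfill_sweep', '', 0, 0);
    """)
    conn.executemany("INSERT INTO birth_registry (id, born_at) VALUES (?, ?)",
                     [(i, "2026-02-17") for i in range(1, 6)])
    conn.commit()
    return conn


def run_batch(conn, batch_size, crash=False):
    """Process one batch; return rows committed, 0 when done, -1 on failure."""
    last_at, last_id = conn.execute(
        "SELECT last_created_at, last_event_id FROM gov_worker_cursor "
        "WHERE worker_name = ?", (WORKER,)).fetchone()
    rows = conn.execute(
        "SELECT born_at, id FROM birth_registry WHERE (born_at, id) > (?, ?) "
        "ORDER BY born_at, id LIMIT ?", (last_at, last_id, batch_size)).fetchall()
    if not rows:
        return 0
    try:
        with conn:  # one transaction: commit on success, rollback on error
            conn.executemany(
                "INSERT OR IGNORE INTO candidate_state (candidate_key) VALUES (?)",
                [(str(row_id),) for _, row_id in rows])
            if crash:
                raise RuntimeError("simulated crash before commit")
            conn.execute(
                "UPDATE gov_worker_cursor SET last_created_at = ?, "
                "last_event_id = ?, events_seen = events_seen + ? "
                "WHERE worker_name = ?",
                (rows[-1][0], rows[-1][1], len(rows), WORKER))
    except RuntimeError:
        return -1
    return len(rows)
```

After a simulated crash, neither the candidate rows nor the cursor move is visible, so the next call re-reads the same batch from the last committed position.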

4.3 No repeated full scans (Q5) — the core anti-rescan rule

A batch writes, per object/group, a candidate-state row keyed by (candidate_key, source_snapshot_ref, ruleset_version) (doc 34). Once written, that object/group is not re-evaluated until (a) Branch B marks its group dirty, (b) ruleset_version is bumped, or (c) its stale_after TTL expires (periodic audit). "Object checked forever" is never stored — the verdict is always qualified by (snapshot, ruleset, scan_time), so a later ruleset/snapshot change invalidates precisely the affected groups instead of forcing a blind 1.04M rescan.
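The three re-evaluation triggers can be written as a single predicate. The sketch below is illustrative only: the `CandidateState` fields, the `affected_groups` parameter (the groups in scope of a ruleset change, per §5), and the reason strings are assumptions, not the doc 34 schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Set


@dataclass
class CandidateState:
    """Hypothetical slice of a candidate-state row."""
    candidate_key: str
    group_key: str
    ruleset_version: str
    scanned_at: datetime
    stale_after: timedelta
    dirty: bool = False


def reevaluation_reason(state: CandidateState, active_ruleset: str,
                        affected_groups: Set[str], now: datetime) -> Optional[str]:
    """Return why a row must be re-evaluated, or None to skip it."""
    if state.dirty:
        return "dirty"  # (a) Branch B marked the group dirty
    if state.ruleset_version != active_ruleset and state.group_key in affected_groups:
        return "ruleset_bumped"  # (b) only groups in the changed rule's scope
    if now >= state.scanned_at + state.stale_after:
        return "ttl_expired"  # (c) periodic audit
    return None  # verdict still holds for this (snapshot, ruleset, scan_time)
```

A row scanned under an older ruleset but outside `affected_groups` is skipped, which is what keeps a version bump from turning into a blanket rescan.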


5. Source snapshot + ruleset version (Q-reproducibility; reuse evolution_snapshots)

  • Source snapshot — at sweep start, capture an inventory fingerprint per group: counts + max (born_at,id) + registry content hashes. Reuse the evolution_snapshots shape (snapshot_at, scope, metrics jsonb, delta_previous jsonb, notes): one snapshot row per backfill run with scope='governance.backfill' and metrics = the per-group fingerprint. The candidate-state rows reference this source_snapshot_ref. A snapshot change (counts/hash drift) dirties the affected groups (Branch D) — it does not silently invalidate everything.
  • Ruleset version — the classification rules are data, not code: the active set of measurement_registry rows (142, 140 enabled) + the M-DEF-2 coverage-profile registry + the Axis Registry + the responsibility-scope rows. ruleset_version = hash(enabled detector rows ⊕ profile registry ⊕ axis registry ⊕ scope rows). A proposed governance_ruleset registry row (design-only) records {ruleset_version, content_hash, activated_at, activated_by}. When the ruleset changes (a detector enabled/disabled, a profile tightened), the version bump dirties the groups in that rule's scope — targeted re-evaluation, never a blanket rescan. This makes every verdict reproducible: "object X was not_relevant under ruleset v7 / snapshot S12."
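One way to realize a content-derived ruleset_version is to hash a canonical serialization of the four inputs. The sketch below assumes the ⊕ in the formula means "combine in a fixed order", uses SHA-256 and JSON canonicalization as illustrative choices, and takes rows as plain dicts with an `enabled` flag on detector rows; none of these details are fixed by this design.

```python
import hashlib
import json


def _canonical(rows):
    """Serialize rows so that neither row order nor key order affects the hash."""
    return sorted(json.dumps(r, sort_keys=True, separators=(",", ":")) for r in rows)


def ruleset_version(detector_rows, profile_rows, axis_rows, scope_rows):
    """Hash the enabled detectors plus the profile, axis, and scope registries."""
    enabled = [r for r in detector_rows if r.get("enabled")]
    payload = json.dumps([
        _canonical(enabled),
        _canonical(profile_rows),
        _canonical(axis_rows),
        _canonical(scope_rows),
    ])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Enabling or disabling a detector changes the hash, while reordering rows does not, so the same rule content always yields the same version.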

6. Idempotency, dedup, conflicts (Q6, Q7)

  • Idempotency key = candidate_key = canonical_address (preferred; birth_registry.canonical_address is the stable address) falling back to (object_type, object_ref). Backfill writes are upserts keyed by (candidate_key, ruleset_version); re-running a batch is a no-op on already-seeded rows under the same ruleset. business_logic_hash/violation_hash on system_issues carry the evidence fingerprint so a re-detection increments occurrence_count rather than duplicating (T7 anti-spam).
  • Duplicates — multiple inventory rows resolving to the same canonical_address collapse to one candidate at the governance grain (M-DEF-7: roots + non-inheriting + containers; inherited leaves do not count separately). 10⁶ children under one anchored container → one candidate, Δtotal=0.
  • Conflicts — two accountable owners for the same (object × scope) is not resolved by the backfill; it raises owner_conflict (T7 #4 → GOV-COUNCIL) and the object is counted once. The backfill never adjudicates.
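The idempotency and duplicate rules can be illustrated with a dictionary standing in for the candidate-state store. This is a sketch under assumptions: `seed` and the stored payload are hypothetical, and the governance-grain collapse of inherited leaves (M-DEF-7) is not modeled; only the canonical_address collapse and the no-op rerun are shown.

```python
def candidate_key(row):
    """Prefer canonical_address; fall back to (object_type, object_ref)."""
    return row.get("canonical_address") or (row["object_type"], row["object_ref"])


def seed(store, rows, ruleset_version):
    """Upsert rows keyed by (candidate_key, ruleset_version).

    Returns the number of rows newly written; re-seeding the same batch
    under the same ruleset writes nothing.
    """
    written = 0
    for row in rows:
        key = (candidate_key(row), ruleset_version)
        if key not in store:
            store[key] = {"source_rows": 1}
            written += 1
        else:
            # Duplicate inventory row: same candidate, no second record.
            store[key]["source_rows"] += 1
    return written
```

Seeding the same batch twice under ruleset v7 writes rows only the first time; seeding it under v8 writes a fresh set, because the ruleset is part of the key.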

7. Deleted / retired / superseded (Q6) and birth↔governance precedence (Q8)

  • Retired / superseded: an object whose registry status indicates retirement (e.g. a terminal status/migration_state value, or a supersedes_id chain whose head has been superseded) gets candidate verdict retired and is counted in the invariant's retired/ignored term, not as an orphan. Tombstones are never hard-deleted; the candidate-state row persists with verdict='retired' so a future re-birth is detectable.
  • Deleted — an inventory row that disappears between snapshots: the candidate-state row is marked verdict='retired', dirty_reason='source_row_gone'; the invariant accounts for it as retired; if it reappears it is re-evaluated (no silent resurrection).
  • Birth-orphan vs governance-orphan (M-DEF-4, hard precedence): an object unborn/unregistered in birth_registry → governance backfill yields (verdict deferred_birth), raising 0 governance issues; it is the Điều 19 birth-orphan scanner's job, sharing one coalesce_key (one root cause = one issue). Only born-but-uncovered objects become governance-orphan candidates. Branch C (doc 33) enforces this at the input gate via the birth_or_registry_missing state.

8. Resume after failure, retry, DLQ (Q9)

  • Resume — the committed cursor (last_created_at, last_event_id) is the resume point; a crashed run restarts from it with no gap.
  • Per-item retry — a batch item that errors (transient: lock/timeout/degraded SB-2 view) is captured in the durable retry ledger (reuse event_pending shape: error_count, last_error, processed_at); retried with bounded backoff up to N attempts.
  • DLQ — after N failed attempts an item is dead-lettered (dead_lettered counter on the cursor; row flagged in the retry ledger) and raises backfill_batch_failed; the sweep continues (one poison item never stalls the 1.04M sweep). A DLQ exceeding a threshold raises backfill_dlq_overflow (high) to GOV-SIV.
  • Never advance the cursor past a dead-lettered low-water mark silently — the DLQ item's (born_at,id) is recorded so reconciliation (§9) knows that key is accounted-but-failed, not missed.
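A compact sketch of the per-item retry and dead-letter flow. Assumptions: items are keyed by their (born_at, id) pair, the ledger is an in-memory dict with the event_pending-style fields error_count and last_error, and backoff sleeps are left out so the example stays deterministic. `process_with_retry` is a hypothetical name.

```python
def process_with_retry(items, handler, max_attempts):
    """Run handler on each item with bounded retries.

    items: iterable of (key, payload), key being the item's (born_at, id).
    Returns (ledger, dead_lettered, completed); the sweep never stops on
    a failing item.
    """
    ledger = {}         # key -> {"error_count": int, "last_error": str}
    dead_lettered = []  # keys that exhausted max_attempts
    completed = []
    for key, payload in items:
        for attempt in range(1, max_attempts + 1):
            try:
                handler(payload)
                completed.append(key)
                break
            except Exception as exc:  # transient or poison; recorded either way
                ledger[key] = {"error_count": attempt, "last_error": str(exc)}
        else:
            # No attempt succeeded: record the key so reconciliation sees it
            # as accounted-but-failed rather than missed.
            dead_lettered.append(key)
    return ledger, dead_lettered, completed
```

Keeping the dead-lettered (born_at, id) keys explicit is what lets the coverage proof tell a failed item from an omitted one.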

9. Coverage proof — "no missed objects, no duplicate processing" (Q4)

The backfill is proven, not asserted, via Coverage Invariant v3 (concept canon 01 §8) closed at the governance grain over the seeded set:

for every scope S:
  total(S) = covered + orphans + approved_exceptions + retired/ignored + stale + deferred_birth + class_0

where every term is a count of candidate-state rows under the current (snapshot, ruleset). The proof obligations:

  1. No omission: count(distinct candidate_key seeded) + count(deferred_birth) == count(governance-grain objects in the authoritative inventory at snapshot S). The keyset cursor (§4) guarantees the LHS; the snapshot (§5) fixes the RHS. A mismatch ⇒ backfill_incomplete (the sweep is not done).
  2. No duplicate processing: count(candidate_key) == count(distinct candidate_key) (idempotency key uniqueness, §6).
  3. Invariant closes: if total ≠ Σ(terms) for any scope, raise governance_schema_drift (T7 #16). A non-closing scope is a drift finding, never a silent pass.
  4. Audit-anchored: the closing ledger is written to the run's evolution_snapshots metrics + a registry_changelog reconciliation row, so the proof is durable and re-checkable.
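The first three proof obligations reduce to count comparisons, sketched below. The function names and the `duplicate_processing` label are hypothetical (this design names no issue type for obligation 2), and the inputs are assumed to be already-computed counts and key lists rather than live queries.

```python
INVARIANT_TERMS = (
    "covered", "orphans", "approved_exceptions", "retired_ignored",
    "stale", "deferred_birth", "class_0",
)


def invariant_closes(total, term_counts):
    """Obligation 3: total(S) must equal the sum of the seven terms."""
    return total == sum(term_counts.get(term, 0) for term in INVARIANT_TERMS)


def proof_findings(candidate_keys, deferred_birth_count, inventory_count,
                   scope_totals, scope_terms):
    """Return the findings raised by obligations 1 to 3 (empty means proven)."""
    findings = []
    distinct = len(set(candidate_keys))
    if distinct + deferred_birth_count != inventory_count:
        findings.append("backfill_incomplete")         # obligation 1
    if len(candidate_keys) != distinct:
        findings.append("duplicate_processing")        # obligation 2 (label assumed)
    for scope, total in scope_totals.items():
        if not invariant_closes(total, scope_terms.get(scope, {})):
            findings.append(f"governance_schema_drift:{scope}")  # obligation 3
    return findings
```

An empty findings list is the closed proof; any entry means the sweep is not done or the scope has drifted.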

IU-grain caveat (SB-3): IU axis-grain closure is capped at 3 axes at the substrate until iu_three_axis_envelope is generalized; the proof records this caveat for IU objects (concept-true now, IU-substrate-true after SB-3).


10. Backfill issue types (proposed, register-before-emit — NOT registered)

These ride the existing system_issues taxonomy (free-text issue_type, reuse buckets) per T7, and the governance event domain (doc 32):

| issue_type | bucket | severity (base→max) | detection event (domain=governance) | route |
|---|---|---|---|---|
| backfill_inventory_gap | thiếu_quan_hệ | medium→high | governance.backfill.inventory_gap | class's source owner; else GOV-COUNCIL |
| backfill_batch_failed | silent_fail (reuse) | high | governance.backfill.batch_failed | GOV-SIV |
| backfill_incomplete | sai_lệch_dữ_liệu | high | governance.backfill.incomplete | GOV-SIV |
| backfill_dlq_overflow | silent_fail | high→critical | governance.backfill.dlq_overflow | GOV-SIV + GOV-COUNCIL |

Beyond these process issues, the backfill produces the standard 20 T7 governance findings (orphan/owner/approval-path/…) for genuinely-uncovered objects — it is a producer of doc 24's taxonomy, exactly like the T6 scanner. Anti-spam (coalesce/cooldown/ceiling/summary/heartbeat) is reused verbatim from T7; the backfill emits a governance.backfill.sweep_completed heartbeat per run so silence ≠ completion.


11. Resource budget (mirrors Branch F, doc 35 §; summarized here)

  • Batch size — bounded (e.g. 2k–5k rows/batch ⇒ ~210–520 batches over 1.04M); each batch's read must complete under the read-role statement-timeout (5 s) → keyset pagination, no OFFSET, no full-table scan.
  • Throttle / schedule — initial seed runs off-peak, rate-limited (max batches/min) with a server-load guard (pause if lock-wait/replication lag exceeds threshold).
  • Concurrency — single backfill worker per scope by default (cursor is per-worker); parallelism only by disjoint group_key ranges to avoid cursor races.
  • Sampling for huge classes — for a class with millions of inherited leaves, sample representatives for detail and report the population count (no N per-row emits) — same rule as T6 §6.5.
  • No UI full-table scan — the UI reads coverage summary views, never the raw 1.04M sweep (Điều 28; doc 35 §F).
  • Observability — cursor lag, batches done/total, DLQ depth, invariant-closure %, heartbeat freshness (reuse the iu_route_worker_cursor counters + evolution_snapshots).
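The batch-count estimate in the first bullet follows from a ceiling division over the live row count. A short check, with `batches_needed` as a hypothetical helper name:

```python
import math

BORN_ROWS = 1_037_716  # birth_registry live count, 2026-06-01


def batches_needed(total_rows, batch_size):
    """Number of keyset batches needed to page total_rows."""
    return math.ceil(total_rows / batch_size)


print(batches_needed(BORN_ROWS, 5_000))  # 208
print(batches_needed(BORN_ROWS, 2_000))  # 519
```

These exact values sit inside the rounded "~210–520 batches" range quoted above.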

12. Dependencies, gates, NO-GO

| Capability | Needs | Status |
|---|---|---|
| Read-only enumeration + candidate seeding (verdict only) | nothing live (read-only) | design + dry-run designable now |
| Owner-dependent verdicts (covered/orphan via owner) | SB-2 views live | gated; degrades (non-owner verdicts only) pre-SB-2 |
| Durable candidate-state store (where verdicts live) | SB-10 (new: candidate-state store, doc 34) | build NO-GO |
| Backfill worker cursor row/family | SB-13 (new: gov worker-cursor family; reuse iu_route_worker_cursor shape) | build NO-GO |
| Source snapshot + ruleset registry | SB-12 (new: snapshot/ruleset registry; reuse evolution_snapshots/measurement_registry) | build NO-GO |
| Emit any backfill finding/heartbeat | governance event domain registered+active (SB-11/SB-4) + register-before-emit | NO-GO |
| Propose remediation for an uncovered object | SB-1 Phase-A (C-2) | gated; degrades to draft-only |
| Apply any owner/exception row | SB-1 Phase-B + SB-2 live + approved APR + sovereign sign-off (H-2/SB-6) | NO-GO |
| Backfill ruleset ownership + legacy-bypass deadline | C-7 (new council item; extends C-6/A3 60-day) | decision pending |

No gate may be satisfied by self-approval. New blockers SB-10..SB-13 and council item C-7 are registered in doc 35 §2.


13. Verdict

Branch A backfill / legacy governance-onboarding design: COMPLETE. All ten mandated questions are answered:

  1. Authoritative inventories (birth_registry spine + meta_catalog classes + per-class registries, §2).
  2. Omission-free keyset enumeration with a reused-shape cursor/checkpoint (§4).
  3. Reproducibility via source snapshot + ruleset version (§5).
  4. Coverage proof via Invariant v3 with explicit no-omission/no-duplicate obligations (§9).
  5. The anti-rescan rule (verdict keyed by snapshot+ruleset, §4.3).
  6. Deleted/retired/superseded tombstoning (§7).
  7. Dedup/conflict at governance grain (§6).
  8. Birth↔governance precedence (§7).
  9. Resume/retry/DLQ via reused event_pending/cursor shapes (§8).
  10. Audit via registry_changelog+evolution_snapshots (§9).

The backfill reuses birth_registry, collection_registry.coverage_status, iu_route_worker_cursor, event_pending, evolution_snapshots, measurement_registry, system_issues, registry_changelog, meta_catalog: no second roof, no hardcoded object/axis list. Apply is NO-GO; nothing registered, emitted, or mutated. Next: doc 32 (durable birth→governance handoff ledger that feeds the candidate layer the backfill seeds).

File: knowledge/dev/reports/architecture/one-roof-governance-technical-addendum-and-implementation-index-2026-06-01/31-backfill-existing-objects-governance-onboarding-design.md