KB-4C86
02 — Leaf-Set Definition (Branch B)
Revision 1 · 2026-05-31
Tags: registries-pivot, leaf-set, branch-b, accounting-invariant, rollup, data-driven, no-hardcode
Verdict: LEAF_SET_RULE = EXPLICIT + DATA-DRIVEN (no hardcoded code list); 160 leaf / 9 rollup
The accounting invariant `total = counted + orphan + phantom` is only meaningful over a set with no nested rollups. `meta_catalog` (169 rows) mixes leaf object-sets with rollup/meta summary rows, so a blind SUM counts the same records twice: once in the leaf row and again in every rollup that aggregates it. This is the exact disguised-math trap Đ28 forbids.
composition_level distribution (live, all 169 rows)
| composition_level | rows | Σrecord | Σactual | Σorphan |
|---|---|---|---|---|
| atom | 77 | 3,635,579 | 3,836,259 | 0 |
| molecule | 51 | 1,670 | 1,677 | 0 |
| compound | 34 | 846 | 749 | 0 |
| material | 2 | 58 | 113 | 0 |
| meta | 3 | 517 | null | 161 |
| product | 1 | 0 | 0 | 0 |
| building | 1 | 0 | 0 | 0 |
The 9 rollup/meta rows (must be EXCLUDED from the invariant)
| code | entity_type | composition_level | record | actual | note |
|---|---|---|---|---|---|
| CAT-ALL | all | atom | 1,682,270 | 1,919,748 | grand rollup of all atoms; sits inside the atom level → the double-count trap |
| CAT-MOL | molecule_total | molecule | 774 | 766 | per-layer rollup |
| CAT-CMP | compound_total | compound | 423 | 326 | per-layer rollup |
| CAT-MAT | material_total | material | 0 | 55 | per-layer rollup |
| CAT-BLD | building_total | building | 0 | 0 | per-layer rollup |
| CAT-PRD | product_total | product | 0 | 0 | per-layer rollup |
| CAT-DOT | dot_total | meta | 307 | null | DOT registry rollup (carries 140 orphans) |
| CAT-COL | collection_total | meta | 168 | null | collection registry rollup (20 orphans) |
| CAT-SPE | species_total | meta | 42 | null | species rollup (1 orphan) |
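To make the double-count concrete, the arithmetic can be sketched from the two tables above. This is an illustration only: the dictionary and function names are hypothetical, and `leaf_record_sum` is derived here by subtraction from the published figures, not a value reported by the live system.

```python
# (rows, sum of record) per composition_level, copied from the distribution table.
LEVELS = {
    "atom": (77, 3_635_579),
    "molecule": (51, 1_670),
    "compound": (34, 846),
    "material": (2, 58),
    "meta": (3, 517),
    "product": (1, 0),
    "building": (1, 0),
}

# `record` value of each of the nine rollup/meta rows, copied from the rollup table.
ROLLUP_RECORD = {
    "CAT-ALL": 1_682_270,
    "CAT-MOL": 774,
    "CAT-CMP": 423,
    "CAT-MAT": 0,
    "CAT-BLD": 0,
    "CAT-PRD": 0,
    "CAT-DOT": 307,
    "CAT-COL": 168,
    "CAT-SPE": 42,
}


def summarize():
    """Compare a blind sum over all rows with the sum left after dropping rollups."""
    total_rows = sum(rows for rows, _ in LEVELS.values())
    blind_sum = sum(record for _, record in LEVELS.values())
    rollup_sum = sum(ROLLUP_RECORD.values())
    return {
        "total_rows": total_rows,
        "leaf_rows": total_rows - len(ROLLUP_RECORD),
        "blind_record_sum": blind_sum,
        "rollup_record_sum": rollup_sum,
        "leaf_record_sum": blind_sum - rollup_sum,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:,}")
```

On these figures the blind sum is inflated by the 1,683,984 records that the nine rollup rows carry, almost all of it from CAT-ALL.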
LEAF-SET RULE (data-driven; no hardcoded code list)
leaf_set := meta_catalog
WHERE composition_level <> 'meta' -- excludes CAT-DOT/COL/SPE
AND entity_type NOT LIKE '%_total' -- excludes CAT-MOL/CMP/MAT/BLD/PRD
AND entity_type <> 'all' -- excludes CAT-ALL (lives inside 'atom')
- Live result: 160 leaf rows (169 − 9). Cross-check inside the rehearsal: `meta_rollup_record_excluded = 517` = 307 + 168 + 42 ✓ (the three meta rows), and CAT-ALL and the layer totals are excluded by `entity_type`.
- The rule keys on existing data columns only (`composition_level`, `entity_type`); there is no hardcoded list of CAT codes. It survives new categories: any new `*_total` / `all` / `meta` row is auto-excluded; any new real object-set is auto-included.
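The auto-exclusion behaviour can be tried out on a handful of rows. The sketch below is a rehearsal aid under stated assumptions, not the production rule: it uses an in-memory SQLite table as a stand-in for PG, a three-column `meta_catalog` that is much narrower than the real one, and `HYP-*` rows that are invented for illustration. The authoritative rule remains the one in PG.

```python
import sqlite3

# The three predicates of the leaf-set rule, as written in this document.
LEAF_RULE = """
    composition_level <> 'meta'
    AND entity_type NOT LIKE '%_total'
    AND entity_type <> 'all'
"""

# (code, entity_type, composition_level)
SAMPLE_ROWS = [
    ("CAT-ALL", "all", "atom"),               # rollup from the table above
    ("CAT-MOL", "molecule_total", "molecule"),  # rollup from the table above
    ("CAT-DOT", "dot_total", "meta"),         # rollup from the table above
    ("HYP-001", "birth", "atom"),             # hypothetical leaf
    ("HYP-002", "workflow", "molecule"),      # hypothetical leaf
    ("HYP-003", "gadget_total", "compound"),  # hypothetical NEW per-layer rollup
    ("HYP-004", "registry_summary", "meta"),  # hypothetical NEW meta row
]


def leaf_codes(rows):
    """Load rows into a throwaway table and return the codes the rule keeps."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE meta_catalog "
            "(code TEXT, entity_type TEXT, composition_level TEXT)"
        )
        conn.executemany("INSERT INTO meta_catalog VALUES (?, ?, ?)", rows)
        cursor = conn.execute(
            f"SELECT code FROM meta_catalog WHERE {LEAF_RULE} ORDER BY code"
        )
        return [row[0] for row in cursor]
    finally:
        conn.close()


if __name__ == "__main__":
    print(leaf_codes(SAMPLE_ROWS))
```

Only the two hypothetical leaves survive; the new `gadget_total` and `meta` rows drop out without any code list being edited. One caveat worth keeping in mind: in SQL `LIKE`, `_` matches any single character, so `'%_total'` would also exclude an entity type such as `subtotal`. If such a name ever denotes a real object-set, escaping the underscore is an option to consider.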
Which rows are what
- Leaf / real object-set (160): each maps to a concrete substrate via `source_location` (e.g. `Directus:birth_registry`, `Directus:entity_labels`, `File:dot/bin/`). Top leaves: CAT-023 birth 980,378 · CAT-068 entity_labels 690,341 · CAT-017 system_issue 171,939 · CAT-016 changelog 66,095 · CAT-141 measurement_log 31,984.
- Rollup / derived-summary (9): the table re-aggregates leaves (per-layer `*_total`, the global `all`, and the three `meta` registry totals). These are display conveniences, not object-sets.
- Excluded from invariant computation: the 9 rollups.
Residual gaps (recorded, not hidden)
- LEAF_SET is necessary but not provably MECE. Even within the 160 leaves there may be overlap (e.g. a "junction_table" count vs an "all" count of the same physical rows). The rule removes the known rollups; it cannot guarantee mutual exclusivity. → The true cross-check anchor is PIV-500 (a grand-total pivot over the universal object substrate, e.g. `birth_registry`), not a catalog re-sum. PIV-500 is `PIVOT_MISSING` (doc 03).
- 5 leaf rows have `actual_count IS NULL` (`leaf_actual_null = 5`) → those leaves are `count_integrity_status='unverified'` (can't be reconciled until scanned).
- The leaf rule lives in PG (a view `v_registry_leaf_set`, rehearsed in doc 04). It must never be re-implemented as a Nuxt filter (Đ28 NT-D1).
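The NULL-count gap can be illustrated with a small partition step. This is a sketch for reasoning about the data, not a replacement for the PG view: the function name and the `HYP-*` rows are hypothetical, and only the `'unverified'` status is taken from this document. The point it shows is that a NULL `actual_count` must be tested explicitly, because a count of 0 is a valid, reconcilable value.

```python
from typing import Iterable, List, Optional, Tuple


def split_by_actual(
    leaves: Iterable[Tuple[str, Optional[int]]],
) -> Tuple[List[str], List[str]]:
    """Partition (code, actual_count) pairs into reconcilable and unverified codes.

    A leaf is 'unverified' only when actual_count is NULL (None here).
    Zero is a real count and stays reconcilable.
    """
    reconcilable: List[str] = []
    unverified: List[str] = []
    for code, actual_count in leaves:
        if actual_count is None:
            unverified.append(code)
        else:
            reconcilable.append(code)
    return reconcilable, unverified


if __name__ == "__main__":
    sample = [
        ("HYP-001", 10),    # hypothetical leaf with a scanned count
        ("HYP-002", None),  # hypothetical leaf never scanned
        ("HYP-003", 0),     # hypothetical leaf scanned and found empty
    ]
    ok, pending = split_by_actual(sample)
    print("reconcilable:", ok)
    print("unverified:", pending)
```

Applied to the live leaf set, the second list would be expected to hold the 5 rows counted by `leaf_actual_null`.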