KB-7B72 rev 5

P3D Pack 1 Phase 4B — Species/Composition/Registry Alignment Design


Date: 2026-05-11
Author: Opus 4.7
Status: DESIGN (no execution, no seed, no migration)
Prerequisite: Phase 4 vocab prep PASS (14 keys committed, planner resolves law_unit=plan_ok)
Directive: gpt-directive-opus-p3d-pack1-phase4b-species-composition-registry-alignment-2026-05-11.md


A. Why Phase 4B is needed

Vocab gate is open (law_unit = plan_ok). But if we migrate 86 TAC units into IU now, every new IU row would be born with species_code=NULL, composition_level=NULL, zero labels, zero edges — exactly like the 12 pilot IU rows already in production. Meanwhile, 153 other collection mappings in species_collection_map have proper species/composition wiring.

Anh Huyên's principle: "all objects must have the same standard birth." An IU born without a species is a second-class birth and creates drift between IU and the rest of the governed system. Phase 4B closes this gap before migration, not after.

Legal anchor (Điều 29 v2.0): "EVERY collection in PG is managed THE SAME WAY: species + birth + count. Missing ANY attribute = collection NOT YET MANAGED = NOT TRUSTWORTHY." IU currently lacks a species mapping, so it is NOT TRUSTWORTHY per Điều 29. Phase 4B must join IU into the existing one-system classification, not create a parallel scheme.

Compliance matrix: See p3d-pack1-phase4b-legal-alignment-addendum.md for full law-by-law crosswalk (Điều 0-B, 0-G, 29, QT-001/003R/005).


B. Existing pre-TAC registry/species/composition model

The system already has a complete pipeline built over S111–S157:

| Component | Role | Current state |
|---|---|---|
| entity_species | Master taxonomy: species_code + composition_level + hierarchy (parent_id, depth) | 40 rows live (v1.2 doc listed 21; 19 added since) |
| species_collection_map | Links each collection to its species + composition_level | 153 rows (0 for IU/UV/TAC) |
| collection_registry | Governance metadata per collection: species_code, governance_role, birth_code_strategy, migration_state | IU=COL-176 (observed/pilot, species_code=NULL) |
| birth_registry | Auto-registered births with species_code FROM species_collection_map | 12 IU rows (species_code=NULL, composition_level=NULL) |
| entity_labels | Facet-based labeling via label_rules | 91,544 rows (0 for IU/UV) |
| universal_edges | Relationship graph between entities | 2,199 rows (0 for IU/UV) |
| fn_birth_registry_auto | PG trigger: on INSERT → registers birth, fills species from species_collection_map | Active on IU; produces NULL species because no mapping exists |

Birth flow for a governed collection: INSERT → fn_birth_registry_auto reads species_collection_map → fills species_code + composition_level in birth_registry → labels applied via label_rules → edges materialized. For IU, the chain breaks at the second step: the species_collection_map lookup finds no row, so species_code and composition_level stay NULL and everything downstream (labels, edges) is empty.
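
The lookup step of this flow can be illustrated with a small sandbox. This is a minimal sketch in Python with an in-memory SQLite database, not the production PG trigger: the table columns are reduced to what the sketch needs, `register_birth` is a hypothetical stand-in for the lookup that fn_birth_registry_auto performs, and `demo_collection`, `SPE-DEMO`, and `demo_level` are placeholder values rather than live registry entries.

```python
import sqlite3

# Sandbox schema: only the columns this sketch needs (assumed, not the production DDL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE species_collection_map (
        collection_name   TEXT PRIMARY KEY,
        species_code      TEXT NOT NULL,
        composition_level TEXT NOT NULL
    );
    CREATE TABLE birth_registry (
        collection_name   TEXT NOT NULL,
        entity_id         TEXT NOT NULL,
        species_code      TEXT,
        composition_level TEXT
    );
""")


def register_birth(conn, collection_name, entity_id):
    """Register a birth, copying species/composition from the map when a row exists."""
    row = conn.execute(
        "SELECT species_code, composition_level "
        "FROM species_collection_map WHERE collection_name = ?",
        (collection_name,),
    ).fetchone()
    # No mapping row means both values are stored as NULL (the IU situation today).
    species_code, composition_level = row if row is not None else (None, None)
    conn.execute(
        "INSERT INTO birth_registry VALUES (?, ?, ?, ?)",
        (collection_name, entity_id, species_code, composition_level),
    )
    return species_code, composition_level


# Placeholder mapping for a governed collection; these are not live values.
conn.execute(
    "INSERT INTO species_collection_map VALUES (?, ?, ?)",
    ("demo_collection", "SPE-DEMO", "demo_level"),
)

mapped = register_birth(conn, "demo_collection", "demo-1")
unmapped = register_birth(conn, "information_unit", "iu-1")
print(mapped)    # ('SPE-DEMO', 'demo_level')
print(unmapped)  # (None, None)
```

In this sandbox, the only difference between the two births is whether a mapping row exists, which is why adding the mapping is enough to change the outcome without touching the registration logic.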


C. Current IU/TAC gaps (from Phase 3 evidence)

| Gap | Evidence | Impact |
|---|---|---|
| No species_collection_map for IU/UV | 0 rows | birth_registry species=NULL |
| No species_code in collection_registry for IU | COL-176 species_code=NULL | IU not classified in registry universe |
| birth_registry 12 rows with NULL species/composition | Verified all 12 | Existing IU births are second-class |
| No entity_labels for IU/UV | 0/91,544 | IU invisible to label-based queries/inspectors |
| No universal_edges for IU/UV | 0/2,199 | IU relationships not graph-navigable |
| collection_registry governance_role='observed' | COL-176 migration_state='pilot' | IU not governed: species/labels/edges not enforced |
| Birth gate Tier-0 only | fn_iu_birth_gate_layer1: no species/composition checks | Even if species_collection_map exists, birth gate won't enforce it |
| TAC completely outside registry | 0 rows in birth_registry, collection_registry, species_collection_map | TAC→IU migration can't be transparent rebadge |

D. Birth core vs registry alignment vs DOT enrichment

Three distinct layers of "making IU a proper citizen":

| Layer | What it covers | When it must happen | Can be deferred? |
|---|---|---|---|
| Birth core | What fn_iu_birth_gate_layer1/2 + fn_iu_verify_invariants enforce: canonical_address, unit_kind∈vocab, lifecycle_status, owner_ref, identity_profile keys, anchors, birth_registry row | Must be correct AT birth time | No: birth fails or produces wrong data |
| Registry alignment | species_collection_map entries, collection_registry promotion (observed→governed), birth_registry species/composition backfill | Before or immediately after migration | Yes: can backfill, but creates temporary NULL window |
| DOT enrichment | entity_labels, universal_edges, description_policy tier, DOT inspector certification | After entities exist | Yes: by design, enrichment is post-birth |

Key decision: Should species/composition become part of birth core (add to birth gate) or remain registry alignment (backfill-able)? Currently birth gate does NOT check species. Adding it would be a function/trigger patch (out of scope for Phase 4B per directive). The alternative is: ensure species_collection_map exists so fn_birth_registry_auto fills species automatically at birth, without changing birth gate logic.

Opus recommendation: Keep birth gate Tier-0 (no species enforcement). Instead, ensure species_collection_map is correct so fn_birth_registry_auto fills species passively. This achieves "the same standard birth" without patching birth gate functions. Birth gate upgrade can be a separate governance decision (Tier-1 promotion).


E. Species mapping policy options

IU needs a species. Three options:

| Option | Description | Pros | Cons |
|---|---|---|---|
| E1: New species | Create species (e.g., SPE-IUL "Information Unit — Law") in entity_species, map to IU in species_collection_map | Clean, specific, follows existing pattern (each governed collection type has its own species) | Adds to 40-species catalog; need to decide composition_level, depth, parent_id |
| E2: Existing species | Map IU to an existing species if one fits (e.g., a documentation/content species if it exists) | No catalog growth | Unlikely to fit: none of the 21 v1.2 species (DOT, workflow, page, collection, task, etc.) match "information unit" |
| E3: Defer | Keep species_code=NULL; document deferral with traceability | No catalog change | Violates "the same standard birth"; 12+86 rows stay NULL |

Hypothesis pending discovery, NOT a recommendation. E1 (new species) is plausible but cannot be locked until: (a) the full catalog of 40 live species is inspected, since an existing species may fit; (b) Điều 29 §III is checked, because if IU stays observed, a grouping species ("species-gom", e.g., SPE-GOV) may be required instead of a dedicated species; (c) QT-005 is consulted, if governance promotion to governed is chosen. Per Điều 29: "Two parallel administrative systems = a violation of Assembly First."

Open questions for GPT/User (after discovery): (1) Should IU be promoted to governed (triggering QT-005 + dedicated species) or kept observed (using a grouping species, "species-gom", per Điều 29 §III)? (2) If governed: one species for all IU, or one per unit_kind? (3) If observed: which grouping species fits?


F. Composition-level policy options

What composition_level should IU law_units have?

| Option | Composition | Reasoning |
|---|---|---|
| F1: atom | 🟢 atom | A law_unit is a self-contained text unit. Its "sections" (article, paragraph, heading) are sub-atomic field-level content, not separate entities. Matches Điều 0-B: "Contains only sub-atomic parts = atom." |
| F2: molecule | 🔵 molecule | If sections are treated as grouped field-level atoms. Matches "a group of atoms bound together." |
| F3: compound | 🟣 compound | If the law_unit parent→child hierarchy (via parent_or_container_ref) creates a process/structure. Matches "contains molecules + atoms + HAS a process." |
| F4: context-dependent | varies | Different section_types get different levels. Complex, harder to govern. |

Hypothesis pending containment analysis under Điều 0-B, NOT a recommendation. F1 (atom) is plausible if law_unit contains no other entities. Per Điều 0-B: "The only question: what does this entity contain inside?" Discovery must provide containment evidence before locking:

  1. Does tac_logical_unit parent/child structure mean IU CONTAINS other IU? Or are they siblings within a publication?
  2. Is parent_or_container_ref structural containment (molecule/compound) or just a document-tree reference (atom with metadata pointer)?
  3. Are sections (article, paragraph, heading) separate entities with IDs? If yes → molecule. If sub-atomic field values → atom.
  4. Does the IU planner or birth gate treat parent relationships as containment?

Decision LOCKED = false. Cannot determine composition_level until discovery answers these questions with live evidence.
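
How the containment answers could feed a proposal can be sketched as a small decision function. This is an illustrative reading of the three quoted Điều 0-B definitions only, not the law's full rule set: the `ContainmentEvidence` fields and the function name are hypothetical, and any combination the quoted definitions do not cover is deliberately left undecided rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ContainmentEvidence:
    """Discovery answers; None means the question has not been answered yet."""
    contains_atoms: Optional[bool]      # contains atom-level entities with their own IDs
    contains_molecules: Optional[bool]  # contains molecule-level entities
    has_process: Optional[bool]         # carries a process of its own


def propose_composition_level(ev: ContainmentEvidence) -> Optional[str]:
    """Return a proposed level, or None when the evidence is missing or not covered."""
    if None in (ev.contains_atoms, ev.contains_molecules, ev.has_process):
        return None  # decision stays unlocked until discovery fills every answer
    if not ev.contains_atoms and not ev.contains_molecules and not ev.has_process:
        return "atom"      # contains only sub-atomic parts
    if ev.contains_atoms and not ev.contains_molecules and not ev.has_process:
        return "molecule"  # a group of atoms bound together
    if ev.contains_atoms and ev.contains_molecules and ev.has_process:
        return "compound"  # molecules + atoms + a process
    return None  # combination not covered by the quoted definitions


print(propose_composition_level(ContainmentEvidence(False, False, False)))  # atom
print(propose_composition_level(ContainmentEvidence(None, False, False)))   # None
```

The output is a proposal for review only; under the no-hardcode contract, a value used in an executable prompt would still have to be verified against the live vocabulary.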


G. Birth registry policy options

12 existing IU birth_registry rows have species_code=NULL, composition_level=NULL. Any species decision creates a consistency question.

| Option | Description | Impact |
|---|---|---|
| G1: Backfill + forward | (a) Add species_collection_map for IU. (b) Backfill 12 existing birth_registry rows with the decided species/composition. (c) Future births auto-fill via fn_birth_registry_auto. | Clean: all IU rows (past + future) have same species. Requires UPDATE on birth_registry. |
| G2: Forward only | Add species_collection_map; future births get species. 12 existing rows stay NULL. Document the gap. | Simpler. But creates "before/after" split in birth_registry. |
| G3: Defer all | No species_collection_map change. All IU births continue with NULL. | Weakest. Violates alignment principle. |

Opus recommendation: G1 (backfill + forward), aligned with QT-001 procedure (not ad-hoc UPDATE). The 12 existing rows are all unit_kind=design_doc_section, lifecycle_status=draft (Phase 3 evidence). These are pilot entities. Backfilling them eliminates the NULL gap.

QT-001 alignment (5-step procedure, Birth Procedures v3.1):

  1. CHECK: species_collection_map for IU exists? birth trigger exists? (both required before backfill)
  2. COUNT: source (information_unit) rows vs birth_registry rows with non-NULL species. Gap = 12.
  3. EXECUTE: dot-birth-backfill --collection=information_unit if available, or governed SQL per QT-001 §BƯỚC 3 (Step 3) semantics (UPDATE birth_registry SET species_code, composition_level WHERE collection_name='information_unit' AND species_code IS NULL).
  4. VERIFY: count again — all 12 rows must have species_code + composition_level.
  5. INSPECT: if governed (post QT-005 promotion), run dot-inspect-pen.

Phase 4C executable prompt must implement these 5 steps explicitly. No shortcuts.
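
The CHECK, COUNT, EXECUTE, and VERIFY steps can be rehearsed in a sandbox before the Phase 4C prompt is written. The following is a minimal sketch in Python with in-memory SQLite, not the governed procedure itself: the schema is reduced to the columns named above, `backfill_births` is a hypothetical helper, and `SPE-DEMO` / `demo_level` are placeholders for whatever species and composition are eventually decided. The trigger-existence half of step 1 and step 5 (dot-inspect-pen) are out of reach of the sandbox and are omitted.

```python
import sqlite3


def backfill_births(conn, collection_name):
    """Run CHECK, COUNT, EXECUTE, VERIFY for one collection; return (gap_before, gap_after)."""
    # Step 1 CHECK: a mapping must exist before any backfill is attempted.
    mapping = conn.execute(
        "SELECT species_code, composition_level "
        "FROM species_collection_map WHERE collection_name = ?",
        (collection_name,),
    ).fetchone()
    if mapping is None:
        raise RuntimeError(f"no species_collection_map row for {collection_name}")

    # Step 2 COUNT: births still missing a species.
    gap_before = conn.execute(
        "SELECT COUNT(*) FROM birth_registry "
        "WHERE collection_name = ? AND species_code IS NULL",
        (collection_name,),
    ).fetchone()[0]

    # Step 3 EXECUTE: values come from the mapping row, never from literals.
    with conn:  # commits on success, rolls back if the UPDATE raises
        conn.execute(
            "UPDATE birth_registry SET species_code = ?, composition_level = ? "
            "WHERE collection_name = ? AND species_code IS NULL",
            (mapping[0], mapping[1], collection_name),
        )

    # Step 4 VERIFY: no row may be left with either column NULL.
    gap_after = conn.execute(
        "SELECT COUNT(*) FROM birth_registry "
        "WHERE collection_name = ? "
        "AND (species_code IS NULL OR composition_level IS NULL)",
        (collection_name,),
    ).fetchone()[0]
    if gap_after != 0:
        raise RuntimeError(f"{gap_after} rows still NULL after backfill")
    return gap_before, gap_after


# Sandbox data: 12 pilot births with NULL species, plus a placeholder mapping.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE species_collection_map (
        collection_name TEXT PRIMARY KEY, species_code TEXT, composition_level TEXT);
    CREATE TABLE birth_registry (
        collection_name TEXT, entity_id TEXT, species_code TEXT, composition_level TEXT);
""")
conn.executemany(
    "INSERT INTO birth_registry VALUES ('information_unit', ?, NULL, NULL)",
    [(f"iu-{i}",) for i in range(12)],
)
conn.execute(
    "INSERT INTO species_collection_map VALUES ('information_unit', 'SPE-DEMO', 'demo_level')"
)

result = backfill_births(conn, "information_unit")
print(result)  # (12, 0)
```

Refusing to run when the mapping is missing mirrors the ordering in step 1: the backfill has nothing authoritative to copy from until species_collection_map has an IU row.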


H. Parent-child / edge implications

Phase 3 found: parent_or_container_ref exists on information_unit but is NULL on all 12 rows. universal_edges has 0 IU/UV rows.

TAC has hierarchy: tac_logical_unit has a parent/child structure within publications. If migrated, should this become:

  • H1: parent_or_container_ref populated on IU rows (structural parent)
  • H2: universal_edges materialized (graph relationships)
  • H3: Both
  • H4: Neither (defer to DOT enrichment phase)

Opus recommendation: H1 for direct parent→child (populate parent_or_container_ref during migration). H2 defer to DOT enrichment (edges are post-birth by design, per Điều 0-B §IV). This keeps migration scope narrow while preserving structural hierarchy.
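
Option H1 amounts to translating each TAC parent link into a reference to the parent's IU row. A minimal sketch of that translation follows, under loud assumptions: the `id` / `parent_id` field names, the TAC identifiers, and the `iu://demo/...` address format are all hypothetical, since the actual tac_logical_unit columns and IU canonical_address format are not described here.

```python
def resolve_parent_refs(tac_units, iu_address_by_tac_id):
    """Map each TAC unit id to the IU address of its parent (None for roots)."""
    refs = {}
    for unit in tac_units:
        parent_id = unit["parent_id"]
        if parent_id is None:
            refs[unit["id"]] = None  # root unit: no structural parent
            continue
        if parent_id not in iu_address_by_tac_id:
            # Fail loudly instead of writing a dangling reference.
            raise KeyError(f"parent {parent_id!r} of {unit['id']!r} has no IU address")
        refs[unit["id"]] = iu_address_by_tac_id[parent_id]
    return refs


# Hypothetical sample: one root with two children.
tac_units = [
    {"id": "tac-1", "parent_id": None},
    {"id": "tac-2", "parent_id": "tac-1"},
    {"id": "tac-3", "parent_id": "tac-1"},
]
iu_address_by_tac_id = {
    "tac-1": "iu://demo/1",
    "tac-2": "iu://demo/2",
    "tac-3": "iu://demo/3",
}

parent_refs = resolve_parent_refs(tac_units, iu_address_by_tac_id)
print(parent_refs)  # {'tac-1': None, 'tac-2': 'iu://demo/1', 'tac-3': 'iu://demo/1'}
```

Nothing in this sketch writes to universal_edges, which matches the recommendation to leave H2 to the DOT enrichment phase.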


I. No-hardcode contract for species/composition

RULE: Every species_code, composition_level, and species_collection_map entry used in executable
prompts must be derived from live registry queries, not from document-level or memory-level values.

AUTHORITATIVE SOURCES:
  species_code       <- live public.entity_species.species_code
  composition_level  <- live public.entity_species.composition_level or species_collection_map.composition_level
  species_collection_map.collection_name <- live public.collection_registry.collection_name

FORBIDDEN:
  - Literal 'atom', 'molecule', 'compound' in executable INSERT without a preceding SELECT
    verifying the value exists in the target table's CHECK constraint or live vocabulary.
  - Literal species codes (e.g. 'SPE-IUL') without verifying they exist in entity_species.
  - Row counts like "21 species" or "153 mappings" used as logic (snapshots only).
  - ILIKE fuzzy matching for production species assignment.

ALLOWED:
  - Live-selected values from entity_species/species_collection_map.
  - Contract-declared values (like 'law_unit' in vocab) IF gated by a verification query.
  - Counts as snapshots in reports only.
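
One way to honor this contract in an executable step is to let every inserted value come out of a preceding live SELECT. The sketch below shows the pattern in Python with in-memory SQLite; it is an assumption-laden illustration, not the Phase 4C implementation. `insert_mapping_verified` is a hypothetical helper, the schemas are reduced to the columns named in the contract, and `SPE-DEMO` / `demo_level` are placeholder rows.

```python
import sqlite3


def insert_mapping_verified(conn, collection_name, species_code):
    """Insert a species_collection_map row built only from live-selected values."""
    # Exact match only: the contract forbids ILIKE-style fuzzy matching.
    species = conn.execute(
        "SELECT species_code, composition_level FROM entity_species WHERE species_code = ?",
        (species_code,),
    ).fetchone()
    if species is None:
        raise LookupError(f"species {species_code!r} not found in entity_species")

    collection = conn.execute(
        "SELECT collection_name FROM collection_registry WHERE collection_name = ?",
        (collection_name,),
    ).fetchone()
    if collection is None:
        raise LookupError(f"collection {collection_name!r} not found in collection_registry")

    # Every inserted value is taken from the rows just read, not from the caller's literals.
    row = (collection[0], species[0], species[1])
    conn.execute("INSERT INTO species_collection_map VALUES (?, ?, ?)", row)
    return row


# Sandbox data with placeholder values (not live registry content).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity_species (species_code TEXT PRIMARY KEY, composition_level TEXT);
    CREATE TABLE collection_registry (collection_name TEXT PRIMARY KEY);
    CREATE TABLE species_collection_map (
        collection_name TEXT PRIMARY KEY, species_code TEXT, composition_level TEXT);
    INSERT INTO entity_species VALUES ('SPE-DEMO', 'demo_level');
    INSERT INTO collection_registry VALUES ('information_unit');
""")

inserted = insert_mapping_verified(conn, "information_unit", "SPE-DEMO")
print(inserted)  # ('information_unit', 'SPE-DEMO', 'demo_level')
```

The composition_level is never passed in by the caller here; it is read from entity_species, so a stale document-level value cannot leak into the mapping.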

J. Recommendation and migration gate impact

Phase 4B is a design + discovery phase. After GPT/User review the discovery results, the decisions in E/F/G/H become inputs to a Phase 4C executable prompt (species_collection_map + collection_registry + birth_registry alignment).

Migration gate status:

vocab_gate=OPEN (law_unit resolves)
species_gate=BLOCKED (no species_collection_map for IU)
composition_gate=BLOCKED (no composition_level decision)
registry_alignment_gate=BLOCKED (collection_registry=observed/pilot)
label_gate=DEFERRED (DOT enrichment, post-migration)
edge_gate=DEFERRED (DOT enrichment, post-migration)
migration_allowed=false_until_species+composition+registry_gates_resolved
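
The gate logic above can be expressed as a small check. This is a sketch under two assumptions that the status block implies but does not spell out: a gate counts as resolved only when its status is OPEN, and DEFERRED gates never block migration. The function names are hypothetical.

```python
# Snapshot of the gate statuses listed above.
GATES = {
    "vocab_gate": "OPEN",
    "species_gate": "BLOCKED",
    "composition_gate": "BLOCKED",
    "registry_alignment_gate": "BLOCKED",
    "label_gate": "DEFERRED",
    "edge_gate": "DEFERRED",
}

# Assumption: only these gates must be OPEN; label/edge gates are post-migration by design.
REQUIRED_GATES = (
    "vocab_gate",
    "species_gate",
    "composition_gate",
    "registry_alignment_gate",
)


def blocking_gates(gates):
    """Return the required gates that are not yet OPEN, in declaration order."""
    return [name for name in REQUIRED_GATES if gates.get(name) != "OPEN"]


def migration_allowed(gates):
    """Migration is allowed only when no required gate is blocking."""
    return not blocking_gates(gates)


print(blocking_gates(GATES))     # ['species_gate', 'composition_gate', 'registry_alignment_gate']
print(migration_allowed(GATES))  # False
```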

Sequence:

  1. Phase 4B: design (this doc) + discovery prompt (read-only) → GPT/User decisions
  2. Phase 4C: executable alignment (species_collection_map INSERT, collection_registry UPDATE, birth_registry backfill) → GPT review
  3. Phase 5: TAC→IU migration design
  4. Phase 5 execution: migrate 86 units

Phase 4B Design | No execution | Species/Composition/Registry alignment before TAC→IU migration | 2026-05-11

knowledge/dev/laws/dieu44-trien-khai/design/p3d-pack1-phase4b-species-composition-registry-alignment-design.md