P3D Pack 1 Phase 4B — Species/Composition/Registry Alignment Design
Date: 2026-05-11
Author: Opus 4.7
Status: DESIGN (no execution, no seed, no migration)
Prerequisite: Phase 4 vocab prep PASS (14 keys committed, planner resolves law_unit=plan_ok)
Directive: gpt-directive-opus-p3d-pack1-phase4b-species-composition-registry-alignment-2026-05-11.md
A. Why Phase 4B is needed
Vocab gate is open (law_unit = plan_ok). But if we migrate 86 TAC units into IU now, every new IU row would be born with species_code=NULL, composition_level=NULL, zero labels, zero edges — exactly like the 12 pilot IU rows already in production. Meanwhile, 153 other collection mappings in species_collection_map have proper species/composition wiring.
Anh Huyên's principle: "all objects must have the same standard birth registration" ("tất cả các đối tượng phải có khai sinh chuẩn giống nhau"). An IU born without species is a second-class birth, creating drift between IU and the rest of the governed system. Phase 4B closes this gap before migration, not after.
Legal anchor (Điều 29 v2.0): "EVERY collection in PG is managed THE SAME WAY: species + birth + count. Missing ANY attribute = collection NOT YET MANAGED = NOT TRUSTWORTHY." IU currently lacks a species mapping, so it is NOT TRUSTWORTHY per Điều 29. Phase 4B must join IU into the existing one-system classification, not create a parallel scheme.
Compliance matrix: See p3d-pack1-phase4b-legal-alignment-addendum.md for full law-by-law crosswalk (Điều 0-B, 0-G, 29, QT-001/003R/005).
B. Existing pre-TAC registry/species/composition model
The system already has a complete pipeline built over S111–S157:
| Component | Role | Current state |
|---|---|---|
| entity_species | Master taxonomy: species_code + composition_level + hierarchy (parent_id, depth) | 40 rows live (v1.2 doc listed 21; 19 added since) |
| species_collection_map | Links each collection to its species + composition_level | 153 rows (0 for IU/UV/TAC) |
| collection_registry | Governance metadata per collection: species_code, governance_role, birth_code_strategy, migration_state | IU=COL-176 (observed/pilot, species_code=NULL) |
| birth_registry | Auto-registered births with species_code FROM species_collection_map | 12 IU rows (species_code=NULL, composition_level=NULL) |
| entity_labels | Facet-based labeling via label_rules | 91,544 rows (0 for IU/UV) |
| universal_edges | Relationship graph between entities | 2,199 rows (0 for IU/UV) |
| fn_birth_registry_auto | PG trigger: on INSERT → registers birth, fills species from species_collection_map | Active on IU; produces NULL species because no mapping exists |
Birth flow for a governed collection: INSERT → fn_birth_registry_auto reads species_collection_map → fills species_code + composition_level in birth_registry → labels applied via label_rules → edges materialized. For IU, this chain breaks at step 2 (no mapping) and everything downstream is empty.
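The break at step 2 can be reproduced in miniature. The sketch below is a stand-in only: it uses an in-memory SQLite database instead of PG, a hypothetical trigger `trg_iu_birth` in place of fn_birth_registry_auto (whose real body is not shown in this document), simplified table shapes, and placeholder values `SPE-EXAMPLE` / `example_level` that are not real registry entries.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for the PG tables named in the birth flow."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE species_collection_map (
            collection_name   TEXT PRIMARY KEY,
            species_code      TEXT NOT NULL,
            composition_level TEXT NOT NULL
        );
        CREATE TABLE information_unit (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE birth_registry (
            collection_name   TEXT,
            entity_id         INTEGER,
            species_code      TEXT,
            composition_level TEXT
        );
        -- Hypothetical trigger: registers the birth and copies species data
        -- from the mapping table; a missing mapping yields NULL.
        CREATE TRIGGER trg_iu_birth AFTER INSERT ON information_unit
        BEGIN
            INSERT INTO birth_registry
                (collection_name, entity_id, species_code, composition_level)
            VALUES (
                'information_unit',
                NEW.id,
                (SELECT species_code FROM species_collection_map
                  WHERE collection_name = 'information_unit'),
                (SELECT composition_level FROM species_collection_map
                  WHERE collection_name = 'information_unit')
            );
        END;
    """)
    return conn


conn = build_demo_db()

# Birth BEFORE the mapping exists: the lookup finds nothing.
conn.execute("INSERT INTO information_unit (title) VALUES ('born before mapping')")

# Placeholder mapping (assumed values, for illustration only).
conn.execute(
    "INSERT INTO species_collection_map "
    "VALUES ('information_unit', 'SPE-EXAMPLE', 'example_level')"
)

# Birth AFTER the mapping exists: species is filled passively.
conn.execute("INSERT INTO information_unit (title) VALUES ('born after mapping')")

rows = conn.execute(
    "SELECT entity_id, species_code, composition_level "
    "FROM birth_registry ORDER BY entity_id"
).fetchall()
print(rows)
```

In this toy setup the first birth is registered with NULL species and composition while the second picks both up from the mapping, which is the same before/after split the 12 pilot rows would show if only a forward fix were applied.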
C. Current IU/TAC gaps (from Phase 3 evidence)
| Gap | Evidence | Impact |
|---|---|---|
| No species_collection_map for IU/UV | 0 rows | birth_registry species=NULL |
| No species_code in collection_registry for IU | COL-176 species_code=NULL | IU not classified in registry universe |
| birth_registry 12 rows with NULL species/composition | Verified all 12 | Existing IU births are second-class |
| No entity_labels for IU/UV | 0/91,544 | IU invisible to label-based queries/inspectors |
| No universal_edges for IU/UV | 0/2,199 | IU relationships not graph-navigable |
| collection_registry governance_role='observed' | COL-176 migration_state='pilot' | IU not governed — species/labels/edges not enforced |
| Birth gate Tier-0 only | fn_iu_birth_gate_layer1: no species/composition checks | Even if species_collection_map exists, birth gate won't enforce it |
| TAC completely outside registry | 0 rows in birth_registry, collection_registry, species_collection_map | TAC→IU migration cannot be a transparent rebadge |
D. Birth core vs registry alignment vs DOT enrichment
Three distinct layers of "making IU a proper citizen":
| Layer | What it covers | When it must happen | Can be deferred? |
|---|---|---|---|
| Birth core | What fn_iu_birth_gate_layer1/2 + fn_iu_verify_invariants enforce: canonical_address, unit_kind∈vocab, lifecycle_status, owner_ref, identity_profile keys, anchors, birth_registry row | Must be correct AT birth time | No: birth fails or produces wrong data |
| Registry alignment | species_collection_map entries, collection_registry promotion (observed→governed), birth_registry species/composition backfill | Before or immediately after migration | Yes — can backfill, but creates temporary NULL window |
| DOT enrichment | entity_labels, universal_edges, description_policy tier, DOT inspector certification | After entities exist | Yes — by design, enrichment is post-birth |
Key decision: Should species/composition become part of birth core (add to birth gate) or remain registry alignment (backfill-able)? Currently birth gate does NOT check species. Adding it would be a function/trigger patch (out of scope for Phase 4B per directive). The alternative is: ensure species_collection_map exists so fn_birth_registry_auto fills species automatically at birth, without changing birth gate logic.
Opus recommendation: Keep birth gate Tier-0 (no species enforcement). Instead, ensure species_collection_map is correct so fn_birth_registry_auto fills species passively. This achieves the "same standard birth" principle (khai sinh chuẩn giống nhau) without patching birth gate functions. Birth gate upgrade can be a separate governance decision (Tier-1 promotion).
E. Species mapping policy options
IU needs a species. Three options:
| Option | Description | Pros | Cons |
|---|---|---|---|
| E1: New species | Create species (e.g., SPE-IUL "Information Unit — Law") in entity_species, map to IU in species_collection_map | Clean, specific, follows existing pattern (each governed collection type has its own species) | Adds to 40-species catalog; need to decide composition_level, depth, parent_id |
| E2: Existing species | Map IU to an existing species if one fits (e.g., a documentation/content species if it exists) | No catalog growth | Unlikely to fit — none of the 21 v1.2 species (DOT, workflow, page, collection, task, etc.) match "information unit" |
| E3: Defer | Keep species_code=NULL; document deferral with traceability | No catalog change | Violates the "same standard birth" principle (khai sinh chuẩn giống nhau); 12+86 rows stay NULL |
Hypothesis pending discovery, NOT a recommendation. E1 (new species) is plausible but cannot be locked before: (a) the full catalog of 40 live species is inspected, since an existing species may fit; (b) Điều 29 §III is checked: if IU stays observed, a grouping species ("species-gom", e.g., SPE-GOV) may be required instead of a dedicated species; (c) QT-005 is consulted if governance promotion to governed is chosen. Per Điều 29: "Two parallel administrative systems = a violation of Assembly First."
Open questions for GPT/User (after discovery): (1) Should IU be promoted to governed (triggering QT-005 + dedicated species) or kept observed (using a grouping "gom" species per Điều 29 §III)? (2) If governed: one species for all IU, or one per unit_kind? (3) If observed: which grouping "gom" species fits?
F. Composition-level policy options
What composition_level should IU law_units have?
| Option | Composition | Reasoning |
|---|---|---|
| F1: atom | 🟢 atom | A law_unit is a self-contained text unit. Its "sections" (article, paragraph, heading) are sub-atomic field-level content, not separate entities. Matches Điều 0-B: "Contains only sub-atomic parts = atom." |
| F2: molecule | 🔵 molecule | If sections are treated as grouped field-level atoms. Matches "a group of atoms bound together." |
| F3: compound | 🟣 compound | If the law_unit parent→child hierarchy (via parent_or_container_ref) creates a process/structure. Matches "contains molecules + atoms + HAS a process." |
| F4: context-dependent | varies | Different section_types get different levels. Complex, harder to govern. |
Hypothesis pending containment analysis under Điều 0-B, NOT a recommendation. F1 (atom) is plausible if law_unit contains no other entities. Per Điều 0-B: "The only question: what does this entity contain inside?" Discovery must provide containment evidence before locking:
- Does the tac_logical_unit parent/child structure mean IU CONTAINS other IU? Or are they siblings within a publication?
- Is parent_or_container_ref structural containment (molecule/compound) or just a document-tree reference (atom with metadata pointer)?
- Are sections (article, paragraph, heading) separate entities with IDs? If yes → molecule. If sub-atomic field values → atom.
- Does the IU planner or birth gate treat parent relationships as containment?
Decision LOCKED = false. Cannot determine composition_level until discovery answers these questions with live evidence.
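The containment test in options F1–F3 can be written down as a small decision function. This is a sketch under assumptions, not a ruling: the `ContainmentEvidence` fields are hypothetical names for what discovery would report, the mapping is one possible reading of the Điều 0-B phrases quoted above, and any combination those phrases do not cover is returned as undecided rather than guessed. The level names appear here only as labels; per the no-hardcode contract they would still need to be verified against the live vocabulary before use in an executable prompt.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ContainmentEvidence:
    """Hypothetical summary of discovery findings for one entity type."""
    contains_atoms: bool      # holds child entities that are themselves atoms
    contains_molecules: bool  # holds child entities that are molecules
    has_process: bool         # the parent/child structure carries a process


def composition_hypothesis(ev: ContainmentEvidence) -> Optional[str]:
    """Map containment evidence to a composition level, or None if unclear."""
    if not ev.contains_atoms and not ev.contains_molecules:
        # Only sub-atomic (field-level) content inside.
        return "atom"
    if ev.contains_atoms and not ev.contains_molecules:
        # A group of atoms bound together.
        return "molecule"
    if ev.contains_atoms and ev.contains_molecules and ev.has_process:
        # Molecules + atoms + a process.
        return "compound"
    # Anything else is not covered by the quoted definitions.
    return None


# F1 reading: sections are field values, parent ref is only a pointer.
f1 = composition_hypothesis(ContainmentEvidence(False, False, False))
# F2 reading: sections are separate entities with their own IDs.
f2 = composition_hypothesis(ContainmentEvidence(True, False, False))
# F3 reading: hierarchy holds molecules and atoms and carries a process.
f3 = composition_hypothesis(ContainmentEvidence(True, True, True))
# Ambiguous evidence stays undecided and goes back to GPT/User.
unclear = composition_hypothesis(ContainmentEvidence(False, True, False))
print(f1, f2, f3, unclear)
```

The point of the `None` branch is that discovery evidence which does not match a quoted definition should keep the decision unlocked instead of defaulting to a level.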
G. Birth registry policy options
12 existing IU birth_registry rows have species_code=NULL, composition_level=NULL. Any species decision creates a consistency question.
| Option | Description | Impact |
|---|---|---|
| G1: Backfill + forward | (a) Add species_collection_map for IU. (b) Backfill 12 existing birth_registry rows with the decided species/composition. (c) Future births auto-fill via fn_birth_registry_auto. | Clean: all IU rows (past + future) have same species. Requires UPDATE on birth_registry. |
| G2: Forward only | Add species_collection_map; future births get species. 12 existing rows stay NULL. Document the gap. | Simpler. But creates "before/after" split in birth_registry. |
| G3: Defer all | No species_collection_map change. All IU births continue with NULL. | Weakest. Violates alignment principle. |
Opus recommendation: G1 (backfill + forward), aligned with QT-001 procedure (not ad-hoc UPDATE). The 12 existing rows are all unit_kind=design_doc_section, lifecycle_status=draft (Phase 3 evidence). These are pilot entities. Backfilling them eliminates the NULL gap.
QT-001 alignment (5-step procedure, Birth Procedures v3.1):
- CHECK: species_collection_map for IU exists? birth trigger exists? (both required before backfill)
- COUNT: source (information_unit) rows vs birth_registry rows with non-NULL species. Gap = 12.
- EXECUTE: dot-birth-backfill --collection=information_unit if available, or governed SQL per QT-001 §BƯỚC 3 (Step 3) semantics (UPDATE birth_registry SET species_code, composition_level WHERE collection_name='information_unit' AND species_code IS NULL).
- VERIFY: count again; all 12 rows must have species_code + composition_level.
- INSPECT: if governed (post QT-005 promotion), run dot-inspect-pen.
Phase 4C executable prompt must implement these 5 steps explicitly. No shortcuts.
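As a rough illustration of how the CHECK, COUNT, EXECUTE and VERIFY steps fit together, the sketch below runs them against an in-memory SQLite stand-in. Everything in it is assumed for illustration: the table shapes are simplified, `backfill_births` and `make_demo_db` are hypothetical helper names, the mapping values `SPE-EXAMPLE` / `example_level` are placeholders, and the birth-trigger check and the INSPECT step (dot-inspect-pen) are left out because they depend on the live system.

```python
import sqlite3


def make_demo_db(with_mapping: bool) -> sqlite3.Connection:
    """Stand-in DB with 12 NULL-species IU births, like the pilot rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE species_collection_map (
            collection_name   TEXT PRIMARY KEY,
            species_code      TEXT NOT NULL,
            composition_level TEXT NOT NULL
        );
        CREATE TABLE birth_registry (
            collection_name   TEXT NOT NULL,
            entity_id         INTEGER NOT NULL,
            species_code      TEXT,
            composition_level TEXT
        );
    """)
    conn.executemany(
        "INSERT INTO birth_registry VALUES ('information_unit', ?, NULL, NULL)",
        [(i,) for i in range(1, 13)],
    )
    if with_mapping:
        # Placeholder values, NOT real registry entries.
        conn.execute(
            "INSERT INTO species_collection_map "
            "VALUES ('information_unit', 'SPE-EXAMPLE', 'example_level')"
        )
    conn.commit()
    return conn


GAP_SQL = """
    SELECT COUNT(*) FROM birth_registry
    WHERE collection_name = ?
      AND (species_code IS NULL OR composition_level IS NULL)
"""


def backfill_births(conn: sqlite3.Connection, collection: str) -> dict:
    # CHECK: the mapping must exist before any backfill is attempted.
    mapping = conn.execute(
        "SELECT species_code, composition_level FROM species_collection_map "
        "WHERE collection_name = ?",
        (collection,),
    ).fetchone()
    if mapping is None:
        raise RuntimeError(f"CHECK failed: no species mapping for {collection}")

    # COUNT: measure the gap before touching anything.
    gap_before = conn.execute(GAP_SQL, (collection,)).fetchone()[0]

    # EXECUTE: values come from the live mapping row, never from literals.
    with conn:  # commits on success, rolls back on exception
        conn.execute(
            "UPDATE birth_registry SET species_code = ?, composition_level = ? "
            "WHERE collection_name = ? AND species_code IS NULL",
            (mapping[0], mapping[1], collection),
        )

    # VERIFY: the gap must be closed.
    gap_after = conn.execute(GAP_SQL, (collection,)).fetchone()[0]
    if gap_after != 0:
        raise RuntimeError(f"VERIFY failed: {gap_after} rows still incomplete")
    return {"gap_before": gap_before, "gap_after": gap_after}


report = backfill_births(make_demo_db(with_mapping=True), "information_unit")
print(report)
```

Calling `backfill_births` on a database built with `with_mapping=False` stops at CHECK with a `RuntimeError`, which mirrors the rule that the mapping must be in place before the backfill runs.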
H. Parent-child / edge implications
Phase 3 found: parent_or_container_ref exists on information_unit but is NULL on all 12 rows. universal_edges has 0 IU/UV rows.
TAC has hierarchy: tac_logical_unit has a parent/child structure within publications. If migrated, should this become:
- H1: parent_or_container_ref populated on IU rows (structural parent)
- H2: universal_edges materialized (graph relationships)
- H3: Both
- H4: Neither (defer to DOT enrichment phase)
Opus recommendation: H1 for direct parent→child (populate parent_or_container_ref during migration). H2 defer to DOT enrichment (edges are post-birth by design, per Điều 0-B §IV). This keeps migration scope narrow while preserving structural hierarchy.
I. No-hardcode contract for species/composition
RULE: Every species_code, composition_level, and species_collection_map entry used in executable
prompts must be derived from live registry queries, not from document-level or memory-level values.
AUTHORITATIVE SOURCES:
species_code <- live public.entity_species.species_code
composition_level <- live public.entity_species.composition_level or species_collection_map.composition_level
species_collection_map.collection_name <- live public.collection_registry.collection_name
FORBIDDEN:
- Literal 'atom', 'molecule', 'compound' in executable INSERT without a preceding SELECT
verifying the value exists in the target table's CHECK constraint or live vocabulary.
- Literal species codes (e.g. 'SPE-IUL') without verifying they exist in entity_species.
- Row counts like "21 species" or "153 mappings" used as logic (snapshots only).
- ILIKE fuzzy matching for production species assignment.
ALLOWED:
- Live-selected values from entity_species/species_collection_map.
- Contract-declared values (like 'law_unit' in vocab) IF gated by a verification query.
- Counts as snapshots in reports only.
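One way to read this contract in code is a gated insert: the caller names a species code, but nothing is written until that code and the collection name have both been confirmed by live SELECTs, and the composition level is read from entity_species instead of being passed in as a literal. The sketch is hypothetical throughout: it uses an in-memory SQLite stand-in with simplified tables, `insert_mapping_gated` is an invented helper name, and `SPE-EXAMPLE` / `example_level` are placeholder values.

```python
import sqlite3


def make_registry() -> sqlite3.Connection:
    """Simplified stand-in for the three registry tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE entity_species (
            species_code      TEXT PRIMARY KEY,
            composition_level TEXT NOT NULL
        );
        CREATE TABLE collection_registry (collection_name TEXT PRIMARY KEY);
        CREATE TABLE species_collection_map (
            collection_name   TEXT PRIMARY KEY,
            species_code      TEXT NOT NULL,
            composition_level TEXT NOT NULL
        );
        -- Placeholder rows for the demo only.
        INSERT INTO entity_species VALUES ('SPE-EXAMPLE', 'example_level');
        INSERT INTO collection_registry VALUES ('information_unit');
    """)
    return conn


def insert_mapping_gated(
    conn: sqlite3.Connection, collection_name: str, species_code: str
) -> str:
    """Insert a mapping only after live verification; return the level used."""
    # Gate 1: the species code must exist in the live taxonomy.
    species = conn.execute(
        "SELECT composition_level FROM entity_species WHERE species_code = ?",
        (species_code,),
    ).fetchone()
    if species is None:
        raise LookupError(f"species_code {species_code!r} not in entity_species")

    # Gate 2: the collection name must exist in the live registry.
    known = conn.execute(
        "SELECT 1 FROM collection_registry WHERE collection_name = ?",
        (collection_name,),
    ).fetchone()
    if known is None:
        raise LookupError(f"{collection_name!r} not in collection_registry")

    # composition_level is taken from the SELECT result, not from a literal.
    with conn:
        conn.execute(
            "INSERT INTO species_collection_map "
            "(collection_name, species_code, composition_level) VALUES (?, ?, ?)",
            (collection_name, species_code, species[0]),
        )
    return species[0]


registry = make_registry()
level = insert_mapping_gated(registry, "information_unit", "SPE-EXAMPLE")
print(level)
```

Exact-match lookups are used on purpose; a fuzzy match would let a near-miss code through, which is what the ILIKE prohibition is guarding against.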
J. Recommendation and migration gate impact
Phase 4B is a design + discovery phase. After GPT/User review the discovery results, the decisions in E/F/G/H become inputs to a Phase 4C executable prompt (species_collection_map + collection_registry + birth_registry alignment).
Migration gate status:
vocab_gate=OPEN (law_unit resolves)
species_gate=BLOCKED (no species_collection_map for IU)
composition_gate=BLOCKED (no composition_level decision)
registry_alignment_gate=BLOCKED (collection_registry=observed/pilot)
label_gate=DEFERRED (DOT enrichment, post-migration)
edge_gate=DEFERRED (DOT enrichment, post-migration)
migration_allowed=false_until_species+composition+registry_gates_resolved
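The last line of the status block is effectively a rule over the other gates, and can be sketched as a tiny evaluator. This is an illustration under assumptions: it treats "resolved" as the gate value being OPEN, requires only the three gates named in the migration_allowed line, and leaves DEFERRED gates out of the decision; `REQUIRED_GATES`, `blocking_gates` and `migration_allowed` are hypothetical names.

```python
# Snapshot of the gate statuses listed above.
GATES = {
    "vocab_gate": "OPEN",
    "species_gate": "BLOCKED",
    "composition_gate": "BLOCKED",
    "registry_alignment_gate": "BLOCKED",
    "label_gate": "DEFERRED",
    "edge_gate": "DEFERRED",
}

# Assumption: only the gates named in the migration_allowed line are required.
REQUIRED_GATES = ("species_gate", "composition_gate", "registry_alignment_gate")


def blocking_gates(gates: dict) -> list:
    """Return the required gates that are not yet OPEN, in declared order."""
    return [name for name in REQUIRED_GATES if gates.get(name) != "OPEN"]


def migration_allowed(gates: dict) -> bool:
    """Migration may proceed only when no required gate is blocking."""
    return not blocking_gates(gates)


print(migration_allowed(GATES), blocking_gates(GATES))

# Hypothetical state after a successful Phase 4C alignment.
after_4c = {**GATES, **{name: "OPEN" for name in REQUIRED_GATES}}
print(migration_allowed(after_4c), blocking_gates(after_4c))
```

A missing gate key counts as blocking here, so a status report that forgets a gate cannot accidentally allow migration.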
Sequence:
- Phase 4B: design (this doc) + discovery prompt (read-only) → GPT/User decisions
- Phase 4C: executable alignment (species_collection_map INSERT, collection_registry UPDATE, birth_registry backfill) → GPT review
- Phase 5: TAC→IU migration design
- Phase 5 execution: migrate 86 units
Phase 4B Design | No execution | Species/Composition/Registry alignment before TAC→IU migration | 2026-05-11