P3D Pack 1 Phase 4C — Species Mapping Strategy + QT-001 Backfill Design
Date: 2026-05-11 | Author: Opus 4.7 | Status: DESIGN (no seed, no backfill, no migration) | Prerequisite: Phase 4B discovery PASS
A. Phase 4B discovery summary (key facts only)
- `species_collection_map`: 0 rows for IU/UV; 153 rows total. No `composition_level` column; composition comes from `entity_species` via JOIN.
- `fn_birth_registry_auto`: reads `species_collection_map WHERE collection_name = TG_TABLE_NAME AND is_primary = true LIMIT 1`. JOINs `entity_species` for composition_level.
- `entity_species`: 40 species. `SPE-GOV` (governance_infra, atom, observed). `SPE-LAW` (law, compound, governed).
- `birth_registry`: 12 IU rows, all NULL species/composition. 0 UV rows.
- `species_collection_map` has discriminator columns: `discriminator_field`, `discriminator_value`, `discriminator_operator`, `discriminator_config`.
- TAC nesting: `parent_id` column, 83/86 have parent, max depth 2. Structural containment.
- IU: `parent_or_container_ref` exists, 12 rows all NULL.
- Species tree: flat (all 40 at depth=1).
B. Why immediate global SPE-GOV mapping is not automatically safe
SPE-GOV = governance_infra, atom, observed. If mapped globally to information_unit:
| Scenario | What happens | Problem? |
|---|---|---|
| Current 12 pilot rows (design_doc_section) | Classified as governance_infra/atom | Acceptable: these ARE governance infrastructure artifacts. |
| Phase 5 migrates 86 TAC law_units into IU | Also classified as governance_infra/atom | WRONG — law content is not governance infrastructure. Atom may be wrong if nesting preserved. |
| Future IU types (e.g., policy_unit, regulation_unit) | Also governance_infra/atom | Potentially wrong — depends on nature. |
Core risk: Global mapping makes ALL future IU rows the same species/composition regardless of unit_kind. Once 86 law_units enter, they'd carry the wrong classification until manually corrected.
C. Mapping strategy options
| Option | Description | Pros | Cons |
|---|---|---|---|
| C1: Single global mapping | One species_collection_map row: information_unit → SPE-GOV (is_primary=true) | Simple, immediate Điều 29 compliance | Misclassifies law_unit; creates tech debt |
| C2: Discriminator mapping (design-only) | Multiple rows with discriminator_field=unit_kind, one per species | Architecturally correct; future-proof | fn_birth_registry_auto doesn't use discriminators (see §D) |
| C3: New dedicated species | Create species information_unit in entity_species, map globally | Accurate classification | Must decide composition_level (atom debate); single-species still classifies all unit_kinds the same |
| C4: New species per unit_kind | SPE-IUL (law_unit, compound), SPE-IUD (design_doc_section, atom), etc. | Most granular | Scales with unit_kind vocab (each new kind needs new species); fn_birth_registry_auto can't discriminate |
| C5: Staged — SPE-GOV now + reclassify at migration | Map SPE-GOV as is_primary. During Phase 5 migration, a migration-specific script assigns correct species per unit_kind | Unblocks Phase 5; keeps Điều 29 compliance | Requires migration script to handle species; birth_registry rows need post-birth correction |
| C6: Defer mapping until fn_birth_registry_auto enhanced | No mapping. Document the gap. Enhance trigger to use discriminators. Then map. | Cleanest long-term | Leaves Điều 29 violation open; function patch is separate governance decision |
D. Discriminator-based mapping feasibility — CRITICAL FINDING
Schema capability: YES
species_collection_map has columns: discriminator_field, discriminator_value, discriminator_operator, discriminator_config. These were designed for per-row species differentiation within the same collection.
Trigger function capability: NO
Phase 4B D6 captured fn_birth_registry_auto source. The species resolution query is:
WHERE scm.collection_name = TG_TABLE_NAME AND scm.is_primary = true LIMIT 1
It does NOT reference discriminator_field, discriminator_value, or any NEW record field. Discriminator columns are ignored at birth time.
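The gap can be illustrated with a small Python model of the two lookup behaviours. This is a sketch under assumptions, not the actual PL/pgSQL: the `MapRow` shape, the equality-only discriminator match, and both function names are hypothetical, and the `law_unit` → `SPE-LAW` row is an illustrative placeholder rather than an approved mapping. Only the `collection_name` / `is_primary` filter and the discriminator column names come from the Phase 4B findings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MapRow:
    """Simplified species_collection_map row (hypothetical shape)."""
    collection_name: str
    species_code: str
    is_primary: bool
    discriminator_field: Optional[str] = None
    discriminator_value: Optional[str] = None


def resolve_primary_only(rows, table_name):
    """Models the captured filter: collection_name + is_primary, first match.

    Note that the NEW record is not even a parameter here.
    """
    for row in rows:
        if row.collection_name == table_name and row.is_primary:
            return row.species_code
    return None


def resolve_with_discriminator(rows, table_name, new_record):
    """Hypothetical enhanced lookup (option D-A): discriminator rows win."""
    for row in rows:
        if (row.collection_name == table_name
                and row.discriminator_field is not None
                and new_record.get(row.discriminator_field) == row.discriminator_value):
            return row.species_code
    return resolve_primary_only(rows, table_name)


rows = [
    MapRow("information_unit", "SPE-GOV", is_primary=True),
    # Illustrative placeholder mapping, not an approved design decision.
    MapRow("information_unit", "SPE-LAW", is_primary=False,
           discriminator_field="unit_kind", discriminator_value="law_unit"),
]
law = {"unit_kind": "law_unit"}
doc = {"unit_kind": "design_doc_section"}

print(resolve_primary_only(rows, "information_unit"))             # SPE-GOV
print(resolve_with_discriminator(rows, "information_unit", law))  # SPE-LAW
print(resolve_with_discriminator(rows, "information_unit", doc))  # SPE-GOV
```

Under the primary-only behaviour every `information_unit` row resolves to the same species whatever its `unit_kind`, which is the misclassification risk described in §B.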
Discovery needed: does ANY collection use discriminators today?
Phase 4C dry-run prompt must answer: are there existing species_collection_map rows with non-NULL discriminator values? If yes, how does the system use them (maybe a different mechanism, not fn_birth_registry_auto)?
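One possible shape for that discovery check is sketched below against an in-memory SQLite stand-in. The table is reduced to the columns named in this document, and the two sample rows (collection names, species codes, discriminator values) are invented purely so the query has something to return; the real dry-run would run the equivalent read-only SELECT against the production table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE species_collection_map (
        collection_name TEXT, species_code TEXT, is_primary INTEGER,
        discriminator_field TEXT, discriminator_value TEXT
    )""")
conn.executemany(
    "INSERT INTO species_collection_map VALUES (?, ?, ?, ?, ?)",
    [
        ("some_collection", "SPE-AAA", 1, None, None),    # invented sample row
        ("other_collection", "SPE-BBB", 1, "kind", "x"),  # invented sample row
    ],
)

# Read-only: list every mapping row that carries any discriminator value.
DISCOVERY_SQL = """
    SELECT collection_name, species_code, discriminator_field, discriminator_value
    FROM species_collection_map
    WHERE discriminator_field IS NOT NULL OR discriminator_value IS NOT NULL
    ORDER BY collection_name
"""
in_use = conn.execute(DISCOVERY_SQL).fetchall()
print(len(in_use), in_use)
```

An empty result would suggest the discriminator columns are unused today; a non-empty one points to the collections whose consumers need to be traced.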
Options if discriminators are needed
| Sub-option | What | Scope |
|---|---|---|
| D-A: Enhance fn_birth_registry_auto | Patch function to check discriminator_field against NEW record | Function patch — separate governance decision, OUT OF SCOPE for Phase 4C |
| D-B: Post-birth DOT correction | Birth with primary mapping; DOT tool corrects species after birth | No function patch; requires DOT tool design |
| D-C: Migration-time explicit species | During Phase 5 TAC→IU migration, the migration script explicitly sets species in birth_registry per unit_kind | No function patch; species logic in migration script |
| D-D: Design-only, implement later | Create discriminator rows in species_collection_map for documentation; fn_birth_registry_auto ignores them until enhanced | Clean design; deferred implementation; no misclassification risk at birth |
Opus assessment: D-C (migration-time explicit species) and D-D (design-only) are the most pragmatic. D-A requires function patch governance and D-B requires new DOT tooling, so both D-A and D-B are scope creep for Phase 4C.
E. SPE-GOV vs SPE-LAW vs new species tradeoff
| Species | composition_level | management_mode | Fits IU pilot rows? | Fits law_unit? | Fits future unit_kinds? |
|---|---|---|---|---|---|
| SPE-GOV (governance_infra) | atom | observed | ✅ design_doc_section is gov infra | ❌ law content ≠ gov infra | ❓ depends |
| SPE-LAW (law) | compound | governed | ❌ pilot is observed, not governed | ✅ law content IS law | ❓ only for law-related |
| New SPE (e.g., information_unit) | must decide | must decide | ✅ if designed for IU | ✅ if composition correct | ✅ if generic enough |
Key constraint: each species carries exactly one composition_level in entity_species. If law_unit should be compound (nesting) but design_doc_section should be atom (no nesting), they CANNOT share a species without one being wrong.
This means: if IU has multiple unit_kinds with different containment properties, then either (a) use one species per unit_kind (C4), or (b) choose the most common/important composition and accept that the rest are only approximately classified.
F. Composition consequence of each species choice
Since species_collection_map has no composition_level and fn_birth_registry_auto JOINs entity_species.composition_level:
| Species choice | Composition at birth | Correct for design_doc_section? | Correct for law_unit (if nesting preserved)? |
|---|---|---|---|
| SPE-GOV | atom | ✅ (no containment) | ❌ (containment = not atom) |
| SPE-LAW | compound | ❌ (pilot rows are not compound) | ✅ (nesting = compound) |
| New species (atom) | atom | ✅ | ❌ |
| New species (compound) | compound | ❌ | ✅ |
| New species (molecule) | molecule | ⚠️ debatable | ⚠️ debatable |
No single species correctly classifies both design_doc_section (no containment → atom) and law_unit (TAC nesting → compound). This is the fundamental tension.
G. QT-001 backfill plan for 12 existing IU birth rows
Regardless of which species is chosen, 12 birth_registry rows need updating. QT-001 v3.1 requires:
- Step 1 (Check): does a `species_collection_map` entry exist for `information_unit`? Is the birth trigger active?
- Step 2 (Count gap): birth_registry rows WHERE collection_name='information_unit' AND species_code IS NULL.
- Step 3 (Execute): UPDATE birth_registry SET species_code, composition_level FROM the same JOIN as fn_birth_registry_auto (species_collection_map → entity_species).
- Step 4 (Verify): count again. All 12 should have species.
- Step 5 (Inspect): if governed → dot-inspect-pen. If observed → verify birth only.
The backfill query (dry-run in Phase 4C prompt) would simulate:
SELECT br.entity_code, scm.species_code, es.composition_level
FROM birth_registry br
JOIN species_collection_map scm ON scm.collection_name = br.collection_name AND scm.is_primary = true
JOIN entity_species es ON es.species_code = scm.species_code
WHERE br.collection_name = 'information_unit' AND br.species_code IS NULL;
This returns what each row WOULD become. Agent reports this as candidate_not_approved.
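The count, preview, and verify steps can be rehearsed end to end on a throwaway SQLite database. This is a minimal sketch, assuming simplified table shapes, placeholder entity codes (`IU-001` to `IU-012`), and SPE-GOV as a stand-in candidate; none of these reflects an approved species choice, and the UPDATE is included only to show how step 4 would confirm the gap is closed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity_species (species_code TEXT PRIMARY KEY, composition_level TEXT);
    CREATE TABLE species_collection_map (collection_name TEXT, species_code TEXT, is_primary INTEGER);
    CREATE TABLE birth_registry (entity_code TEXT PRIMARY KEY, collection_name TEXT,
                                 species_code TEXT, composition_level TEXT);
    INSERT INTO entity_species VALUES ('SPE-GOV', 'atom');
    -- Candidate mapping only; the real species choice is still pending approval.
    INSERT INTO species_collection_map VALUES ('information_unit', 'SPE-GOV', 1);
""")
# Twelve unclassified IU birth rows with placeholder entity codes.
conn.executemany(
    "INSERT INTO birth_registry VALUES (?, 'information_unit', NULL, NULL)",
    [(f"IU-{i:03d}",) for i in range(1, 13)],
)

GAP_SQL = """
    SELECT COUNT(*) FROM birth_registry
    WHERE collection_name = 'information_unit' AND species_code IS NULL
"""
DRY_RUN_SQL = """
    SELECT br.entity_code, scm.species_code, es.composition_level
    FROM birth_registry br
    JOIN species_collection_map scm
      ON scm.collection_name = br.collection_name AND scm.is_primary = 1
    JOIN entity_species es ON es.species_code = scm.species_code
    WHERE br.collection_name = 'information_unit' AND br.species_code IS NULL
    ORDER BY br.entity_code
"""

gap_before = conn.execute(GAP_SQL).fetchone()[0]   # step 2: count the gap
candidates = conn.execute(DRY_RUN_SQL).fetchall()  # dry-run preview, no writes

# Step 3, shown on the throwaway database only; Phase 4C stops at the preview.
conn.executemany(
    "UPDATE birth_registry SET species_code = ?, composition_level = ? "
    "WHERE entity_code = ?",
    [(species, level, code) for code, species, level in candidates],
)
gap_after = conn.execute(GAP_SQL).fetchone()[0]    # step 4: verify
print(gap_before, len(candidates), gap_after)      # 12 12 0
```

Feeding the UPDATE from the preview rows keeps the executed change identical to what was reported as candidate_not_approved.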
H. unit_version strategy
UV has birth_code_strategy=subordinate and 0 birth_registry rows. Options:
| Option | Description |
|---|---|
| H1: No mapping needed | UV is subordinate → no birth record → no species needed. Phase 3 confirmed fn_iu_verify_invariants i5 checks UV is NOT separately birth-registered. |
| H2: Map for completeness | Add species_collection_map row anyway. If UV ever gets birth trigger, it's pre-wired. |
| H3: Defer | Decide when UV governance changes. |
Opus assessment: H1 (no mapping needed). UV is subordinate by design — it doesn't get birth records, so species mapping has no effect. If this changes, QT-003 applies.
I. Phase 5 TAC migration implications
| Migration decision | Species impact | Composition impact |
|---|---|---|
| Preserve TAC nesting in IU via parent_or_container_ref | IU rows that have children → NOT atom | SPE-GOV (atom) wrong for parent IU rows |
| Flatten TAC — siblings only, no parent_or_container_ref | All IU rows are independent → atom valid | SPE-GOV (atom) acceptable |
| Hybrid — preserve nesting but encode as metadata, not FK | Depends on whether metadata = "containment" per Điều 0-B | Ambiguous |
This decision blocks composition choice. If Phase 5 preserves nesting → atom is wrong → SPE-GOV is wrong long-term. If Phase 5 flattens → atom is fine → SPE-GOV is viable.
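The dependency can be made concrete with a toy check. The sketch below assumes one reading of the containment rule (a row with at least one child is compound, any other row is atom) and uses a hypothetical four-row tree with invented ids; the helper name `mismatched_rows` is likewise an illustration, not an existing tool.

```python
def mismatched_rows(rows, species_composition, preserve_nesting):
    """Return ids whose required composition differs from the species' level.

    Assumed rule: a row that is some other row's parent is 'compound',
    every other row is 'atom'. Flattening drops all parent links.
    """
    if preserve_nesting:
        parents = {r["parent_id"] for r in rows if r["parent_id"] is not None}
    else:
        parents = set()
    bad = []
    for r in rows:
        required = "compound" if r["id"] in parents else "atom"
        if required != species_composition:
            bad.append(r["id"])
    return bad


# Hypothetical TAC-like tree, max depth 2 (ids are invented).
rows = [
    {"id": "A", "parent_id": None},
    {"id": "A.1", "parent_id": "A"},
    {"id": "A.1.a", "parent_id": "A.1"},
    {"id": "B", "parent_id": None},
]

print(mismatched_rows(rows, "atom", preserve_nesting=True))      # ['A', 'A.1']
print(mismatched_rows(rows, "atom", preserve_nesting=False))     # []
print(mismatched_rows(rows, "compound", preserve_nesting=True))  # ['A.1.a', 'B']
```

With nesting preserved, neither a single atom species nor a single compound species fits every row in this toy tree; with flattening, an atom species fits all of them.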
J. Options pending dry-run
The following is an option for GPT/User to evaluate, not a locked recommendation. Phase 4C dry-run must produce the evidence; GPT/User then choose.
Candidate strategy O1: Staged with fallback + future discriminator (option, NOT recommended yet)
- Phase 4C executable (if approved): Map `information_unit` to a species selected by GPT/User from the dry-run options. The species must satisfy: (a) composition_level value matches the dominant containment characteristic of the current 12 pilot rows (no nesting → atom-equivalent), (b) management_mode aligns with `collection_registry.governance_role='observed'` for IU.
- Phase 4C also: Optionally add discriminator rows in `species_collection_map` with `is_primary=false` and `discriminator_field='unit_kind'` for each `unit_kind` GPT/User wants to differentiate. These are documentation rows (`fn_birth_registry_auto` ignores them), but they record the design intent for future enhancement.
- Phase 5 migration script: When TAC `law_unit` content migrates → script explicitly sets `species_code` and `composition_level` per `unit_kind`, overriding the default. The discriminator rows are the reference table.
- Post-Phase 5: If IU is promoted to governed → QT-005 creates a dedicated species with composition matching the final containment decision.
Alternative strategies (also pending dry-run):
- O2: Defer all mapping until `fn_birth_registry_auto` is enhanced to read discriminators (separate governance decision). Leaves the Điều 29 gap open longer.
- O3: Create a new dedicated species NOW matching IU's pilot characteristics. Avoids the species-lumping risk but adds a new species before the full unit_kind landscape is known.
- O4: Map to an existing species in the law domain if the dry-run shows one whose composition matches the Phase 5 nesting decision. Requires governance promotion (QT-005).
Decision dependency: Phase 5 nesting decision (preserve TAC parent_id → IU has containment → not atom; flatten → IU is atom). Composition choice tied to this.
Updated gates:
vocab_gate = OPEN
species_mapping_gate = OPEN after Phase 4C dry-run reviewed + species choice approved
birth_backfill_gate = OPEN after species choice + QT-001 procedure
discriminator_runtime_gate = DEFERRED (fn_birth_registry_auto enhancement = separate governance)
governance_promotion_gate = DEFERRED to post-Phase 5
migration_allowed = OPEN after Phase 4C species + backfill executed
Phase 4C Design | Species mapping strategy | Discriminator analysis | QT-001 backfill | No seed | 2026-05-11