KB-2FEB rev 2

P3D Pack 1 Phase 4C — Species Mapping Strategy + QT-001 Backfill Design

Date: 2026-05-11
Author: Opus 4.7
Status: DESIGN (no seed, no backfill, no migration)
Prerequisite: Phase 4B discovery PASS


A. Phase 4B discovery summary (key facts only)

  • species_collection_map: 0 rows for information_unit (IU) or unit_version (UV). 153 rows total. No composition_level column; composition comes from entity_species via JOIN.
  • fn_birth_registry_auto: reads species_collection_map WHERE collection_name = TG_TABLE_NAME AND is_primary = true LIMIT 1. JOINs entity_species for composition_level.
  • entity_species: 40 species. SPE-GOV (governance_infra, atom, observed). SPE-LAW (law, compound, governed).
  • birth_registry: 12 IU rows, all with NULL species_code and composition_level. 0 UV rows.
  • species_collection_map has discriminator columns: discriminator_field, discriminator_value, discriminator_operator, discriminator_config.
  • TAC nesting: law_units use a parent_id column; 83 of 86 have a parent; max depth 2. This is structural containment.
  • IU: parent_or_container_ref exists, 12 rows all NULL.
  • Species tree: flat (all 40 at depth=1).

B. Why immediate global SPE-GOV mapping is not automatically safe

SPE-GOV = governance_infra, atom, observed. If mapped globally to information_unit:

| Scenario | What happens | Problem? |
|---|---|---|
| Current 12 pilot rows (design_doc_section) | Classified as governance_infra/atom | Acceptable: these ARE governance infrastructure artifacts. |
| Phase 5 migrates the 86 TAC law_units into IU | Also classified as governance_infra/atom | WRONG: law content is not governance infrastructure, and atom may be wrong if nesting is preserved. |
| Future IU types (e.g., policy_unit, regulation_unit) | Also classified as governance_infra/atom | Potentially wrong, depending on their nature. |

Core risk: a global mapping gives ALL future IU rows the same species and composition regardless of unit_kind. Once the 86 law_units enter, they would carry the wrong classification until manually corrected.


C. Mapping strategy options

| Option | Description | Pros | Cons |
|---|---|---|---|
| C1: Single global mapping | One species_collection_map row: information_unit → SPE-GOV (is_primary=true) | Simple; immediate Điều 29 compliance | Misclassifies law_unit; creates tech debt |
| C2: Discriminator mapping (design-only) | Multiple rows with discriminator_field=unit_kind, one per species | Architecturally correct; future-proof | fn_birth_registry_auto doesn't use discriminators (see §D) |
| C3: New dedicated species | Create species information_unit in entity_species, map globally | Accurate classification | Must decide composition_level (atom debate); a single species still classifies all unit_kinds the same |
| C4: New species per unit_kind | SPE-IUL (law_unit, compound), SPE-IUD (design_doc_section, atom), etc. | Most granular | Grows with the unit_kind vocabulary (each new kind needs a new species); fn_birth_registry_auto can't discriminate |
| C5: Staged (SPE-GOV now, reclassify at migration) | Map SPE-GOV as is_primary. During Phase 5 migration, a migration-specific script assigns the correct species per unit_kind | Unblocks Phase 5; keeps Điều 29 compliance | Requires the migration script to handle species; birth_registry rows need post-birth correction |
| C6: Defer mapping until fn_birth_registry_auto is enhanced | No mapping. Document the gap, enhance the trigger to use discriminators, then map | Cleanest long-term | Leaves the Điều 29 violation open; the function patch is a separate governance decision |

D. Discriminator-based mapping feasibility — CRITICAL FINDING

Schema capability: YES

species_collection_map has columns: discriminator_field, discriminator_value, discriminator_operator, discriminator_config. These were designed for per-row species differentiation within the same collection.

Trigger function capability: NO

Phase 4B D6 captured fn_birth_registry_auto source. The species resolution query is:

WHERE scm.collection_name = TG_TABLE_NAME AND scm.is_primary = true LIMIT 1

It does NOT reference discriminator_field, discriminator_value, or any NEW record field. Discriminator columns are ignored at birth time.
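To make the gap concrete, here is a minimal sketch that replays the quoted resolution query against a toy in-memory SQLite copy of species_collection_map and entity_species. The rows (a primary SPE-GOV mapping plus a hypothetical is_primary=false discriminator row for unit_kind='law_unit') are illustrative assumptions rather than production data, only the columns the query touches are modeled, and SQLite stores is_primary as 1/0. Because the WHERE clause never consults the discriminator columns or the NEW record, both unit_kinds resolve to the same species.

```python
import sqlite3


def build_toy_db() -> sqlite3.Connection:
    """Toy copy of the two tables the birth trigger reads (illustrative rows only)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE entity_species (
            species_code TEXT PRIMARY KEY,
            composition_level TEXT NOT NULL
        );
        CREATE TABLE species_collection_map (
            collection_name TEXT NOT NULL,
            species_code TEXT NOT NULL,
            is_primary INTEGER NOT NULL,   -- SQLite has no BOOLEAN; 1 = true
            discriminator_field TEXT,
            discriminator_value TEXT
        );
        INSERT INTO entity_species VALUES ('SPE-GOV', 'atom'), ('SPE-LAW', 'compound');
        -- Hypothetical mapping: SPE-GOV as primary plus a design-only discriminator row.
        INSERT INTO species_collection_map VALUES
            ('information_unit', 'SPE-GOV', 1, NULL, NULL),
            ('information_unit', 'SPE-LAW', 0, 'unit_kind', 'law_unit');
    """)
    return con


def resolve_like_trigger(con: sqlite3.Connection, table_name: str, new_record: dict):
    """Mirrors the quoted WHERE clause: new_record is accepted but never consulted."""
    return con.execute("""
        SELECT scm.species_code, es.composition_level
        FROM species_collection_map scm
        JOIN entity_species es ON es.species_code = scm.species_code
        WHERE scm.collection_name = ? AND scm.is_primary = 1
        LIMIT 1
    """, (table_name,)).fetchone()


con = build_toy_db()
for kind in ("design_doc_section", "law_unit"):
    print(kind, resolve_like_trigger(con, "information_unit", {"unit_kind": kind}))
```

Both calls return ('SPE-GOV', 'atom'): the law_unit discriminator row is present but never read.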

Discovery needed: does ANY collection use discriminators today?

Phase 4C dry-run prompt must answer: are there existing species_collection_map rows with non-NULL discriminator values? If yes, how does the system use them (maybe a different mechanism, not fn_birth_registry_auto)?
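A read-only query is enough to answer this in the dry-run. The sketch below runs one against a toy SQLite table; the two sample rows (collection_a, collection_b) are invented, and discriminator_config is modeled as plain text, which may not match its real column type. Against PostgreSQL, the same SELECT can be issued directly on species_collection_map.

```python
import sqlite3

# Read-only discovery query: any mapping row with at least one discriminator column set.
DISCOVERY_SQL = """
    SELECT collection_name, species_code, is_primary,
           discriminator_field, discriminator_operator, discriminator_value
    FROM species_collection_map
    WHERE discriminator_field IS NOT NULL
       OR discriminator_value IS NOT NULL
       OR discriminator_operator IS NOT NULL
       OR discriminator_config IS NOT NULL
    ORDER BY collection_name, species_code
"""

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE species_collection_map (
        collection_name TEXT, species_code TEXT, is_primary INTEGER,
        discriminator_field TEXT, discriminator_value TEXT,
        discriminator_operator TEXT, discriminator_config TEXT
    );
    -- Invented sample rows: one plain mapping, one discriminated mapping.
    INSERT INTO species_collection_map VALUES
        ('collection_a', 'SPE-LAW', 1, NULL, NULL, NULL, NULL),
        ('collection_b', 'SPE-GOV', 0, 'kind', 'infra', '=', NULL);
""")

rows = con.execute(DISCOVERY_SQL).fetchall()
print(f"{len(rows)} row(s) use discriminators")
for row in rows:
    print(row)
```

A zero count in production would confirm that no collection relies on discriminators today; any hit should be traced to whatever mechanism (if any) reads it.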

Options if discriminators are needed

| Sub-option | What | Scope |
|---|---|---|
| D-A: Enhance fn_birth_registry_auto | Patch the function to check discriminator_field against the NEW record | Function patch; separate governance decision, OUT OF SCOPE for Phase 4C |
| D-B: Post-birth DOT correction | Birth with the primary mapping; a DOT tool corrects species after birth | No function patch; requires DOT tool design |
| D-C: Migration-time explicit species | During Phase 5 TAC→IU migration, the migration script explicitly sets species in birth_registry per unit_kind | No function patch; species logic lives in the migration script |
| D-D: Design-only, implement later | Create discriminator rows in species_collection_map for documentation; fn_birth_registry_auto ignores them until enhanced | Clean design; deferred implementation; no misclassification risk at birth |
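For orientation only, the selection logic D-A would need can be sketched in plain Python: prefer a mapping row whose discriminator matches the incoming record, otherwise fall back to the is_primary row as today. This is a hypothetical illustration, not the function patch itself (which would be PL/pgSQL inside fn_birth_registry_auto and is out of scope here). It assumes discriminator_operator is either NULL or '=', because the other operators and the role of discriminator_config are not documented in this note; the sample rows are invented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MappingRow:
    collection_name: str
    species_code: str
    is_primary: bool
    discriminator_field: Optional[str] = None
    discriminator_value: Optional[str] = None
    discriminator_operator: Optional[str] = None  # assumption: only None or "=" handled


def resolve_species(rows: List[MappingRow], table_name: str,
                    new_record: dict) -> Optional[str]:
    """Hypothetical D-A lookup: discriminator match first, then the primary row."""
    candidates = [r for r in rows if r.collection_name == table_name]
    for r in candidates:
        if r.discriminator_field is None:
            continue
        if r.discriminator_operator not in (None, "="):
            raise ValueError(f"operator not modelled: {r.discriminator_operator!r}")
        value = new_record.get(r.discriminator_field)
        if value is not None and str(value) == r.discriminator_value:
            return r.species_code
    # Fallback mirrors today's trigger: first is_primary row for the collection.
    for r in candidates:
        if r.is_primary:
            return r.species_code
    return None


# Illustrative rows: SPE-GOV primary plus a design-only law_unit discriminator row.
rows = [
    MappingRow("information_unit", "SPE-GOV", True),
    MappingRow("information_unit", "SPE-LAW", False, "unit_kind", "law_unit", "="),
]
print(resolve_species(rows, "information_unit", {"unit_kind": "law_unit"}))
print(resolve_species(rows, "information_unit", {"unit_kind": "design_doc_section"}))
```

Keeping the primary-row fallback means collections without discriminator rows would behave exactly as they do now.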

Opus assessment: D-C (migration-time explicit species) or D-D (design-only) are the most pragmatic. D-A requires a function patch and its own governance decision; D-B requires new DOT tooling. Both D-A and D-B are scope creep for Phase 4C.


E. SPE-GOV vs SPE-LAW vs new species tradeoff

| Species | composition_level | management_mode | Fits IU pilot rows? | Fits law_unit? | Fits future unit_kinds? |
|---|---|---|---|---|---|
| SPE-GOV (governance_infra) | atom | observed | ✅ design_doc_section is gov infra | ❌ law content ≠ gov infra | ❓ depends |
| SPE-LAW (law) | compound | governed | ❌ pilot is observed, not governed | ✅ law content IS law | ❓ only for law-related kinds |
| New SPE (e.g., information_unit) | must decide | must decide | ✅ if designed for IU | ✅ if composition is correct | ✅ if generic enough |

Key constraint: each species has exactly one composition_level in entity_species. If law_unit should be compound (nesting) but design_doc_section should be atom (no nesting), they CANNOT share a species without one of them being misclassified.

This means that if IU has multiple unit_kinds with different containment properties, the options are either (a) one species per unit_kind (C4), or (b) choose the most common or important composition and accept that the rest are approximate.


F. Composition consequence of each species choice

Since species_collection_map has no composition_level and fn_birth_registry_auto JOINs entity_species.composition_level:

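| Species choice | Composition at birth | Correct for design_doc_section? | Correct for law_unit (if nesting preserved)? |
|---|---|---|---|
| SPE-GOV | atom | ✅ (no containment) | ❌ (containment = not atom) |
| SPE-LAW | compound | ❌ (pilot rows are not compound) | ✅ (nesting = compound) |
| New species (atom) | atom | ✅ (no containment) | ❌ (containment = not atom) |
| New species (compound) | compound | ❌ (pilot rows are not compound) | ✅ (nesting = compound) |
| New species (molecule) | molecule | ⚠️ debatable | ⚠️ debatable |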

No single species correctly classifies both design_doc_section (no containment → atom) and law_unit (TAC nesting → compound). This is the fundamental tension.
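The tension can be checked mechanically: assign each unit_kind the composition it needs, give each species its single composition_level, and look for a species that satisfies every unit_kind. The sketch uses only the two unit_kinds and compositions stated in this section (law_unit → compound assumes nesting is preserved); NEW-ATOM and NEW-COMPOUND are placeholder codes for the hypothetical new species rows, molecule is left out as debatable, and management_mode is not modeled.

```python
from typing import Dict, List

# Required composition per unit_kind (law_unit assumes TAC nesting is preserved).
REQUIRED: Dict[str, str] = {"design_doc_section": "atom", "law_unit": "compound"}

# Exactly one composition_level per species, as in entity_species.
# NEW-ATOM / NEW-COMPOUND are placeholder codes for hypothetical new species.
SPECIES_COMPOSITION: Dict[str, str] = {
    "SPE-GOV": "atom",
    "SPE-LAW": "compound",
    "NEW-ATOM": "atom",
    "NEW-COMPOUND": "compound",
}


def species_covering_all(required: Dict[str, str], species: Dict[str, str]) -> List[str]:
    """Species whose single composition_level is correct for every unit_kind."""
    return [code for code, comp in species.items()
            if all(comp == need for need in required.values())]


def per_kind_fit(required: Dict[str, str], species: Dict[str, str]) -> Dict[str, List[str]]:
    """C4-style view: for each unit_kind, the species whose composition fits it."""
    return {kind: [code for code, comp in species.items() if comp == need]
            for kind, need in required.items()}


print(species_covering_all(REQUIRED, SPECIES_COMPOSITION))  # → []
print(per_kind_fit(REQUIRED, SPECIES_COMPOSITION))
```

The empty first result restates the fundamental tension; the per-kind view shows that only a split along unit_kind (C4) gives every kind a correct composition.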


G. QT-001 backfill plan for 12 existing IU birth rows

Regardless of which species is chosen, 12 birth_registry rows need updating. QT-001 v3.1 requires:

  1. BƯỚC 1 (Step 1): Check whether a species_collection_map entry exists for information_unit and whether the birth trigger is active.
  2. BƯỚC 2 (Step 2): Count the gap, i.e. birth_registry rows WHERE collection_name='information_unit' AND species_code IS NULL.
  3. BƯỚC 3 (Step 3): Execute UPDATE birth_registry SET species_code, composition_level from the same JOIN as fn_birth_registry_auto (species_collection_map → entity_species).
  4. BƯỚC 4 (Step 4): Verify by counting again; the gap should be 0, with all 12 rows carrying a species.
  5. BƯỚC 5 (Step 5): Inspect. If governed, run dot-inspect-pen; if observed, verify birth only.

The backfill query (dry-run in Phase 4C prompt) would simulate:

SELECT br.entity_code, scm.species_code, es.composition_level
FROM birth_registry br
JOIN species_collection_map scm ON scm.collection_name = br.collection_name AND scm.is_primary = true
JOIN entity_species es ON es.species_code = scm.species_code
WHERE br.collection_name = 'information_unit' AND br.species_code IS NULL;

This returns what each row WOULD become. Agent reports this as candidate_not_approved.
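Steps 2 to 4 (BƯỚC 2 to 4) can be rehearsed end to end on a throwaway copy before anything touches production. The sketch below builds a toy in-memory SQLite database with 12 NULL-species information_unit birth rows and a placeholder SPE-GOV primary mapping (the real species awaits GPT/User approval), runs the dry-run SELECT above, applies the backfill, and re-counts. The UPDATE uses correlated subqueries rather than PostgreSQL's UPDATE ... FROM so it also runs on older SQLite builds; entity codes such as IU-001 are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE entity_species (species_code TEXT PRIMARY KEY, composition_level TEXT);
    CREATE TABLE species_collection_map (
        collection_name TEXT, species_code TEXT, is_primary INTEGER);
    CREATE TABLE birth_registry (
        entity_code TEXT PRIMARY KEY, collection_name TEXT,
        species_code TEXT, composition_level TEXT);
    INSERT INTO entity_species VALUES ('SPE-GOV', 'atom');
    -- Placeholder mapping; the real species awaits GPT/User approval.
    INSERT INTO species_collection_map VALUES ('information_unit', 'SPE-GOV', 1);
""")
con.executemany(
    "INSERT INTO birth_registry VALUES (?, 'information_unit', NULL, NULL)",
    [(f"IU-{i:03d}",) for i in range(1, 13)],  # invented codes for the 12 pilot rows
)

GAP_SQL = """SELECT COUNT(*) FROM birth_registry
             WHERE collection_name = 'information_unit' AND species_code IS NULL"""

# Step 2: count the gap.
gap_before = con.execute(GAP_SQL).fetchone()[0]

# Dry-run: what each row WOULD become (same JOIN as fn_birth_registry_auto).
candidates = con.execute("""
    SELECT br.entity_code, scm.species_code, es.composition_level
    FROM birth_registry br
    JOIN species_collection_map scm
      ON scm.collection_name = br.collection_name AND scm.is_primary = 1
    JOIN entity_species es ON es.species_code = scm.species_code
    WHERE br.collection_name = 'information_unit' AND br.species_code IS NULL
""").fetchall()

# Step 3: backfill via correlated subqueries over the same JOIN.
with con:
    con.execute("""
        UPDATE birth_registry
        SET species_code = (
                SELECT scm.species_code FROM species_collection_map scm
                WHERE scm.collection_name = birth_registry.collection_name
                  AND scm.is_primary = 1 LIMIT 1),
            composition_level = (
                SELECT es.composition_level
                FROM species_collection_map scm
                JOIN entity_species es ON es.species_code = scm.species_code
                WHERE scm.collection_name = birth_registry.collection_name
                  AND scm.is_primary = 1 LIMIT 1)
        WHERE collection_name = 'information_unit' AND species_code IS NULL
    """)

# Step 4: verify the gap is closed.
gap_after = con.execute(GAP_SQL).fetchone()[0]
print(gap_before, len(candidates), gap_after)
```

Comparing the candidate list with the post-update rows is a cheap way to confirm the UPDATE wrote exactly what the dry-run reported.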


H. unit_version strategy

UV has birth_code_strategy=subordinate and 0 birth_registry rows. Options:

| Option | Description |
|---|---|
| H1: No mapping needed | UV is subordinate → no birth record → no species needed. Phase 3 confirmed that fn_iu_verify_invariants i5 checks that UV is NOT separately birth-registered. |
| H2: Map for completeness | Add a species_collection_map row anyway. If UV ever gets a birth trigger, it is pre-wired. |
| H3: Defer | Decide when UV governance changes. |

Opus assessment: H1 (no mapping needed). UV is subordinate by design — it doesn't get birth records, so species mapping has no effect. If this changes, QT-003 applies.


I. Phase 5 TAC migration implications

| Migration decision | Composition impact | Species impact |
|---|---|---|
| Preserve TAC nesting in IU via parent_or_container_ref | IU rows that have children → NOT atom | SPE-GOV (atom) wrong for parent IU rows |
| Flatten TAC: siblings only, no parent_or_container_ref | All IU rows are independent → atom valid | SPE-GOV (atom) acceptable |
| Hybrid: preserve nesting but encode it as metadata, not FK | Depends on whether that metadata counts as "containment" per Điều 0-B | Ambiguous |

This decision blocks composition choice. If Phase 5 preserves nesting → atom is wrong → SPE-GOV is wrong long-term. If Phase 5 flattens → atom is fine → SPE-GOV is viable.
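Which IU rows would stop being atom under the "preserve nesting" option can be computed directly from the containment references. The sketch below is a hypothetical helper, not part of any existing tool: it treats any row that appears as another row's parent_or_container_ref as containing children (so not atom) and reports whether a single atom species stays valid. The sample hierarchy and its codes are invented, shaped only to mirror the depth-2 TAC nesting reported in §A.

```python
from typing import Dict, Optional, Set

ParentRefs = Dict[str, Optional[str]]  # entity_code -> parent_or_container_ref


def rows_with_children(parent_ref: ParentRefs) -> Set[str]:
    """Codes that appear as some other row's parent_or_container_ref."""
    return {p for p in parent_ref.values() if p is not None}


def atom_species_valid(parent_ref: ParentRefs) -> bool:
    """An atom-only species (e.g. SPE-GOV) fits only if no row contains another."""
    return not rows_with_children(parent_ref)


def max_depth(parent_ref: ParentRefs) -> int:
    """Longest parent chain (root = 0); assumes the references contain no cycles."""
    def depth(code: str) -> int:
        d = 0
        while parent_ref.get(code) is not None:
            code = parent_ref[code]
            d += 1
        return d
    return max((depth(c) for c in parent_ref), default=0)


# Invented sample: one root, two children, one grandchild (nesting preserved).
preserved: ParentRefs = {"CH-1": None, "ART-1": "CH-1", "ART-2": "CH-1", "CL-1": "ART-1"}
flattened: ParentRefs = {code: None for code in preserved}

print(sorted(rows_with_children(preserved)), atom_species_valid(preserved))
print(atom_species_valid(flattened), max_depth(preserved))
```

Run against the real migration candidate set, a non-empty rows_with_children result would be the evidence that atom (and hence SPE-GOV) is wrong long-term.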


J. Options pending dry-run

The following is an option for GPT/User to evaluate, not a locked recommendation. Phase 4C dry-run must produce the evidence; GPT/User then choose.

Candidate strategy O1: Staged with fallback + future discriminator (option, NOT recommended yet)

  1. Phase 4C executable (if approved): Map information_unit to a species selected by GPT/User from the dry-run options. The species must satisfy: (a) composition_level value matches the dominant containment characteristic of the current 12 pilot rows (no nesting → atom-equivalent), (b) management_mode aligns with collection_registry.governance_role='observed' for IU.

  2. Phase 4C also: Optionally add discriminator rows in species_collection_map with is_primary=false and discriminator_field='unit_kind' for each unit_kind GPT/User wants to differentiate. These are documentation rows — fn_birth_registry_auto ignores them — but they record the design intent for future enhancement.

  3. Phase 5 migration script: When TAC law_unit content migrates → script explicitly sets species_code and composition_level per unit_kind, overriding the default. The discriminator rows are the reference table.

  4. Post-Phase 5: If IU is promoted to governed → QT-005 creates dedicated species with composition matching the final containment decision.
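Step 3 of O1, with the discriminator rows serving as the reference table for the migration script, could look roughly like the sketch below. It is hypothetical: the reference rows, the invented entity codes (LU-001, LU-002), and the choice to fail loudly on an unmapped unit_kind are assumptions for illustration, and composition_level is taken from entity_species so the two written columns cannot drift apart.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class RefRow(NamedTuple):
    species_code: str
    is_primary: bool
    discriminator_field: Optional[str]
    discriminator_value: Optional[str]


def build_reference(rows: List[RefRow],
                    composition_by_species: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """unit_kind -> (species_code, composition_level), from non-primary unit_kind rows only."""
    ref: Dict[str, Tuple[str, str]] = {}
    for r in rows:
        if not r.is_primary and r.discriminator_field == "unit_kind":
            # composition_level always comes from entity_species, never typed by hand.
            ref[r.discriminator_value] = (r.species_code, composition_by_species[r.species_code])
    return ref


def explicit_species(migrating: List[Tuple[str, str]],
                     ref: Dict[str, Tuple[str, str]]) -> Dict[str, Tuple[str, str]]:
    """(entity_code, unit_kind) pairs -> the species/composition the script would write."""
    out: Dict[str, Tuple[str, str]] = {}
    for entity_code, unit_kind in migrating:
        if unit_kind not in ref:
            # Fail loudly rather than silently inherit the primary mapping.
            raise KeyError(f"no reference row for unit_kind={unit_kind!r} ({entity_code})")
        out[entity_code] = ref[unit_kind]
    return out


# Illustrative reference rows and entity_species compositions (not production data).
rows = [
    RefRow("SPE-GOV", True, None, None),
    RefRow("SPE-LAW", False, "unit_kind", "law_unit"),
]
composition = {"SPE-GOV": "atom", "SPE-LAW": "compound"}
ref = build_reference(rows, composition)
print(explicit_species([("LU-001", "law_unit"), ("LU-002", "law_unit")], ref))
```

Refusing unmapped kinds, instead of falling back to the primary row, keeps a migrated law_unit from silently inheriting the default species.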

Alternative strategies (also pending dry-run):

  • O2: Defer all mapping until fn_birth_registry_auto is enhanced to read discriminators (separate governance decision). Leaves Điều 29 gap open longer.
  • O3: Create a new dedicated species NOW matching IU's pilot characteristics. Avoids the species-lumping ("gom species") risk of grouping unrelated unit_kinds under one species, but adds a new species before the full unit_kind landscape is known.
  • O4: Map to an existing species in the law domain if dry-run shows one with composition matching Phase 5 nesting decision. Requires governance promotion (QT-005).

Decision dependency: the Phase 5 nesting decision (preserve TAC parent_id → IU has containment → not atom; flatten → IU is atom). The composition choice is tied to this.

Updated gates:

vocab_gate                   = OPEN
species_mapping_gate         = OPEN after Phase 4C dry-run reviewed + species choice approved
birth_backfill_gate          = OPEN after species choice + QT-001 procedure
discriminator_runtime_gate   = DEFERRED (fn_birth_registry_auto enhancement = separate governance)
governance_promotion_gate    = DEFERRED to post-Phase 5
migration_allowed            = OPEN after Phase 4C species + backfill executed
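The gate dependencies above can be encoded so that a runner refuses to open a gate before its prerequisites are met. This is a hypothetical sketch: the prerequisite flag names are paraphrased from the gate lines and are not existing flags in any tool, and the two deferred gates are simply never opened here.

```python
from enum import Enum
from typing import Dict, Set


class Gate(Enum):
    OPEN = "OPEN"
    CLOSED = "CLOSED"
    DEFERRED = "DEFERRED"


# Prerequisites paraphrased from the gate list; flag names are hypothetical.
PREREQS = {
    "vocab_gate": [],
    "species_mapping_gate": ["dry_run_reviewed", "species_choice_approved"],
    "birth_backfill_gate": ["species_choice_approved", "qt001_procedure_ready"],
    "migration_allowed": ["species_mapping_executed", "backfill_executed"],
}
# These stay deferred regardless of progress in Phase 4C.
DEFERRED_GATES = {"discriminator_runtime_gate", "governance_promotion_gate"}


def gate_states(done: Set[str]) -> Dict[str, Gate]:
    """OPEN when every prerequisite flag is in `done`, CLOSED otherwise."""
    states = {g: Gate.DEFERRED for g in DEFERRED_GATES}
    for gate, needs in PREREQS.items():
        states[gate] = Gate.OPEN if all(n in done for n in needs) else Gate.CLOSED
    return states


# Example: dry-run reviewed and species approved, nothing executed yet.
for name, state in sorted(gate_states({"dry_run_reviewed", "species_choice_approved"}).items()):
    print(f"{name:28} = {state.value}")
```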

Phase 4C Design | Species mapping strategy | Discriminator analysis | QT-001 backfill | No seed | 2026-05-11

knowledge/dev/laws/dieu44-trien-khai/design/p3d-pack1-phase4c-species-mapping-strategy-qt001-backfill-design.md