P3D Pack 1 Phase 5C1 — Species Identity Decision Memo
Date: 2026-05-11
Author: Opus 4.7 (drafter)
Status: DECISION MEMO (requires GPT/User approval of exact field values)
Design ref: design/p3d-pack1-phase5b-hybrid-nesting-species-pilot-migration-design.md (rev2)
Evidence ref: reports/p3d-pack1-phase5-tac-to-iu-migration-dryrun-report.md (Phase 5A rev6 PASS)
Mode: DECISION MEMO ONLY (no DB write, no Agent dispatch)
A. Why 5C1 is needed before 5C2
5C2 (the DIEU-35 pilot migration) calls fn_iu_create, which triggers fn_birth_registry_auto. That function reads species_collection_map WHERE is_primary=true LIMIT 1 for the target collection. If no mapping exists, new IU birth rows get a NULL species, which violates Article 29 (Điều 29): "Missing ANY attribute = UNRELIABLE".
5C1 creates the species, seeds the mapping, and backfills the 12 existing IU rows, so that when 5C2 runs the entire IU collection has a consistent, non-NULL species assignment.
Without 5C1, 5C2 would create 36 IU rows with NULL species: an immediate Article 29 regression.
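The failure mode can be illustrated with a minimal sketch of the primary-mapping lookup. It uses an in-memory SQLite table as a stand-in for the real database. The column names (collection, species_code), the collection key 'information_unit', and the helper name primary_species_for are assumptions made for illustration. They are not taken from the live schema or from fn_birth_registry_auto itself.

```python
import sqlite3

def primary_species_for(conn, collection):
    """Return the primary species_code mapped to `collection`, or None."""
    row = conn.execute(
        "SELECT species_code FROM species_collection_map "
        "WHERE collection = ? AND is_primary = 1 LIMIT 1",
        (collection,),
    ).fetchone()
    return row[0] if row else None

# Stand-in table; real column names must come from introspection.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE species_collection_map ("
    "collection TEXT, species_code TEXT, is_primary INTEGER)"
)

# Before 5C1: no mapping, so a birth row would receive a NULL species.
before = primary_species_for(conn, "information_unit")

# After 5C1 seeds the mapping, the same lookup resolves to the new species.
conn.execute(
    "INSERT INTO species_collection_map VALUES (?, ?, 1)",
    ("information_unit", "information_unit_atom"),
)
after = primary_species_for(conn, "information_unit")
```

Here `before` is None and `after` is the seeded species_code, which is the difference 5C1 is meant to guarantee before 5C2 runs.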
B. Accepted constraints from Phase 5B rev2 (GPT LOCKED)
composition_level = 'atom' (D3 hybrid → IU is atomic text unit)
management_mode = 'observed' (governance_role stays observed for pilot)
species_strategy = one_new_species_for_information_unit (E.2.A single species)
uv_species_mapping = no (subordinate, 0 birth rows)
rollback_capture = KB_artifact + VPS_log
C. Species identity options
Based on Phase 5A evidence, entity_species has 40 live rows. Naming conventions observed from the species dump:
Observed patterns (from Phase 5A §10 G7):
code column : 'SPE-' + 3 uppercase letters (e.g., SPE-AGT, SPE-GOV, SPE-DOT)
species_code column: lowercase_underscore (e.g., 'agent', 'governance_infra', 'dot_tool')
display_name : Vietnamese label (e.g., 'Agent', 'Hạ tầng Giám sát', 'DOT Tool')
status : 'active' for all 40
composition_level : 'atom' | 'compound' | 'molecule' | 'meta'
management_mode : 'governed' | 'observed' | 'excluded'
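As a hedged illustration, the observed patterns can be written as a small validator. The regular expressions are one reading of the examples above (for instance, digits in species_code are not allowed here because none were observed). The function name naming_violations is hypothetical. The sample row uses the SPE-GOV values quoted in this memo.

```python
import re

# Patterns inferred from the Phase 5A examples; they are assumptions, not schema constraints.
CODE_RE = re.compile(r"SPE-[A-Z]{3}")
SPECIES_CODE_RE = re.compile(r"[a-z]+(?:_[a-z]+)*")
COMPOSITION_LEVELS = {"atom", "compound", "molecule", "meta"}
MANAGEMENT_MODES = {"governed", "observed", "excluded"}

def naming_violations(candidate):
    """Return the names of fields that break the observed conventions."""
    problems = []
    if not CODE_RE.fullmatch(candidate.get("code", "")):
        problems.append("code")
    if not SPECIES_CODE_RE.fullmatch(candidate.get("species_code", "")):
        problems.append("species_code")
    if candidate.get("composition_level") not in COMPOSITION_LEVELS:
        problems.append("composition_level")
    if candidate.get("management_mode") not in MANAGEMENT_MODES:
        problems.append("management_mode")
    return problems

observed = {
    "code": "SPE-GOV",
    "species_code": "governance_infra",
    "composition_level": "atom",
    "management_mode": "observed",
}
print(naming_violations(observed))                         # []
print(naming_violations({**observed, "code": "SPE-gov1"}))  # ['code']
```

A check like this only confirms that a candidate matches the convention. Uniqueness against the 40 live rows still has to be verified against entity_species.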
Option A (Opus recommended)
species_code = 'information_unit_atom'
code = 'SPE-IUA'
display_name = 'Đơn vị Thông tin'
prefix = 'IUA'
composition_level = 'atom'
management_mode = 'observed'
status = 'active'
depth = 1
parent_id = <live lookup — see §F>
Rationale: the species_code is semantic (collection + composition), matching patterns like governance_infra and business_support. The code follows the SPE-XXX convention. The display name is Vietnamese per house standard ('Đơn vị Thông tin' means 'Information Unit').
Option B (shorter semantic name)
species_code = 'iu_atom'
code = 'SPE-IUA'
display_name = 'Đơn vị Thông tin'
... rest same as A
Rationale: shorter species_code, still unique. But less self-documenting than Option A.
Option C (domain-specific name)
species_code = 'law_text_unit'
code = 'SPE-LTU'
display_name = 'Đơn vị Văn bản Luật'
... rest same
Rationale: names the domain (law) explicitly. BUT Phase 5B E.2.A chose a UNIVERSAL species covering ALL IU unit_kinds (law_unit, design_doc_section, future types). A domain-specific name would be misleading once non-law IU rows exist.
Opus recommendation: Option A
Option A is the most self-documenting and universal. The name information_unit_atom states both the collection and the composition level. If future packs split by unit_kind (discriminator activation), the parent species retains a correct name.
D. Recommended option summary
species_code = 'information_unit_atom' (Option A)
code (entity prefix) = 'SPE-IUA'
display_name = 'Đơn vị Thông tin'
prefix = 'IUA'
composition_level = 'atom' (GPT LOCKED)
management_mode = 'observed' (GPT LOCKED)
status = 'active'
depth = 1 (child of infrastructure/governance root)
parent_id = <live lookup — see §F>
kg_metadata = NULL (if the column is nullable) or '{}'; no KG metadata is needed at creation
_dot_origin = 'DOT:QT-005-P3D-PACK1-5C1' (provenance tag)
All values above are Opus proposals. GPT/User locks the final values.
E. Exact fields requiring GPT/User decision
| Field | Opus proposal | GPT decision needed? | Notes |
|---|---|---|---|
| species_code | information_unit_atom | YES | Must be unique in entity_species |
| code (entity prefix) | SPE-IUA | YES | Must follow SPE-XXX convention, unique |
| display_name | Đơn vị Thông tin | YES | Vietnamese label |
| prefix | IUA | YES | Short prefix |
| composition_level | atom | No (GPT LOCKED) | |
| management_mode | observed | No (GPT LOCKED) | |
| status | active | No (standard) | |
| depth | 1 | YES | Depends on taxonomy parent |
| parent_id | <live lookup> | YES (see §F) | |
| kg_metadata | NULL | Confirm nullable | Agent verifies via introspection |
| _dot_origin | DOT:QT-005-P3D-PACK1-5C1 | YES | Provenance convention |
F. Taxonomy parent strategy
Phase 5A species dump shows depth 0 and depth 1 species but does not expose parent_id values. Three strategies:
F.1 Live lookup: use existing governance/infrastructure root
Agent introspects entity_species for depth=0 species. Looks for a root that semantically covers "governance infrastructure" or "information management". If found → new species depth=1, parent_id=that root.
Evidence-based candidate: SPE-GOV (governance_infra), whose depth is not known from the evidence (management_mode=observed, composition_level=atom). If SPE-GOV is depth 0, it could be the parent. If SPE-GOV is depth 1, we need its parent.
This must be verified live. Opus cannot determine parent from Phase 5A evidence alone because parent_id values were not in the species dump.
F.2 New root (depth 0)
If no suitable existing root exists, create the new species at depth=0 (root). This is simpler but adds a top-level species.
F.3 GPT prescribes parent species_code
GPT names the parent (e.g., "use SPE-GOV as parent" or "create at depth 0"). Agent looks up parent_id live by species_code.
Opus recommendation: F.3
GPT should prescribe the parent intent (e.g., "child of infrastructure root" or "new root"). Agent resolves the actual parent_id live from the prescribed species_code. If the prescribed parent doesn't exist, the Agent must ABORT (FIELD_UNRESOLVED_STOP) rather than improvise.
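A minimal sketch of the F.3 resolution step, assuming the parent is identified by species_code and the primary key column is named id (both to be confirmed by introspection). SQLite stands in for the real database. The exception class FieldUnresolvedStop and the function resolve_parent_id are hypothetical names. The id value 7 is made-up sample data.

```python
import sqlite3

class FieldUnresolvedStop(Exception):
    """Raised instead of guessing a value (the memo's FIELD_UNRESOLVED_STOP)."""

def resolve_parent_id(conn, parent_species_code):
    """Resolve the prescribed parent to its id, or stop if it cannot be resolved."""
    if parent_species_code is None:
        return None  # GPT prescribed "new root" (depth 0): no parent row
    rows = conn.execute(
        "SELECT id FROM entity_species WHERE species_code = ?",
        (parent_species_code,),
    ).fetchall()
    if len(rows) != 1:
        # Zero matches or an ambiguous match: abort, never improvise.
        raise FieldUnresolvedStop(
            f"parent species {parent_species_code!r} matched {len(rows)} rows"
        )
    return rows[0][0]

# Stand-in data for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entity_species (id INTEGER PRIMARY KEY, species_code TEXT)")
conn.execute("INSERT INTO entity_species VALUES (7, 'governance_infra')")

parent_id = resolve_parent_id(conn, "governance_infra")
```

Treating an ambiguous match the same as a missing one is a design choice of this sketch. It keeps the Agent from picking between candidates on its own.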
G. QT-005 schema-driven INSERT fill policy
This is the key safety mechanism GPT requested. For each column in entity_species, the prompt defines WHERE the value comes from. Agent MUST NOT invent values.
G.1 Preliminary fill matrix (based on Phase 5A evidence + semantic registry)
Agent verifies this matrix via INTROSPECT-1 (information_schema.columns for entity_species). If any column appears that is NOT in this matrix AND is NOT NULL with no default → FIELD_UNRESOLVED_STOP.
| Column (semantic concept) | Phase 5A evidence column | Fill source | Value |
|---|---|---|---|
| PK (id) | id | DB_AUTO | auto-generated |
| species_identifier | species_code | PROVIDED_BY_GPT | <decided> |
| code/entity_code | code | PROVIDED_BY_GPT | <decided> |
| prefix | prefix | PROVIDED_BY_GPT | <decided> |
| species_display | display_name | PROVIDED_BY_GPT | <decided> |
| species_composition | composition_level | PROVIDED_BY_GPT | 'atom' (GPT LOCKED) |
| species_management | management_mode | PROVIDED_BY_GPT | 'observed' (GPT LOCKED) |
| status | status | PROVIDED_BY_GPT | 'active' |
| species_hierarchy_depth | depth | PROVIDED_BY_GPT | <decided> |
| species_hierarchy_parent | parent_id | LIVE_LOOKUP | lookup by GPT-prescribed parent species_code |
| kg_metadata | kg_metadata | PROVIDED_BY_GPT_OR_NULL | <decided> or NULL (verify nullable) |
| _dot_origin | _dot_origin | PROVIDED_BY_GPT | 'DOT:QT-005-P3D-PACK1-5C1' |
| date_created | (if exists) | DB_DEFAULT | auto timestamp |
| date_updated | (if exists) | DB_DEFAULT | auto timestamp |
| created_by / user_created | (if exists) | DB_DEFAULT_OR_ACTOR | Directus actor or default |
| updated_by / user_updated | (if exists) | DB_DEFAULT_OR_ACTOR | same |
| ANY OTHER NOT NULL column without default | — | FIELD_UNRESOLVED_STOP | ABORT and report |
G.2 Fill policy rules
RULE 1: Agent introspects entity_species columns via information_schema.
RULE 2: For each NOT NULL column without a DB default:
- If column is in the fill matrix above → use prescribed source.
- If column is NOT in the matrix → FIELD_UNRESOLVED_STOP. Do NOT invent a value.
RULE 3: For nullable columns not in the matrix → NULL.
RULE 4: For columns with DB defaults → omit from INSERT (let DB fill).
RULE 5: Agent builds INSERT column list dynamically from (introspected required columns ∩ fill matrix).
RULE 6: Agent MUST NOT add columns not in the fill matrix to the INSERT unless they have a DB default.
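The rules above can be sketched as a small planning function. This is one possible reading, not the dispatch prompt's actual logic. It assumes introspection yields a name, a nullable flag, and a has_default flag per column. It also assumes that any column the matrix supplies a value for is inserted even when that column is nullable or has a default. The names plan_insert, OMIT, and FieldUnresolvedStop are hypothetical, and the nullable/default flags in the example are invented.

```python
OMIT = object()  # matrix entries the database fills itself (DB_AUTO / DB_DEFAULT)

class FieldUnresolvedStop(Exception):
    """Raised when a required column has no prescribed fill source."""

def plan_insert(columns, fill_matrix):
    """Map introspected columns to INSERT values following RULES 2 to 6."""
    plan = {}
    for col in columns:
        name = col["name"]
        required = not col["nullable"] and not col["has_default"]
        value = fill_matrix.get(name, OMIT)
        if value is not OMIT:
            plan[name] = value               # prescribed source (may be None = NULL)
        elif required:
            raise FieldUnresolvedStop(name)  # RULE 2: never invent a value
        # Otherwise omit: the DB default applies (RULE 4) or the column stays NULL (RULE 3).
    return plan

# Hypothetical introspection result.
columns = [
    {"name": "id", "nullable": False, "has_default": True},
    {"name": "species_code", "nullable": False, "has_default": False},
    {"name": "code", "nullable": False, "has_default": False},
    {"name": "kg_metadata", "nullable": True, "has_default": False},
    {"name": "date_created", "nullable": True, "has_default": True},
]
fill_matrix = {
    "id": OMIT,
    "species_code": "information_unit_atom",  # Opus proposal, pending GPT/User approval
    "code": "SPE-IUA",                        # Opus proposal, pending GPT/User approval
    "kg_metadata": None,
    "date_created": OMIT,
}
plan = plan_insert(columns, fill_matrix)
```

With these inputs the plan contains only species_code, code, and kg_metadata. Adding a NOT NULL column without a default that is missing from the matrix raises FieldUnresolvedStop.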
H. QT-001 backfill target selection policy
GPT requested SELECT-before-UPDATE pattern. Revised sequence:
H.1 SELECT targets
SELECT <birth_entity col>, <PK col>
FROM <birth_registry>
WHERE <collection_table_key col> = target_collection_primary
AND <species_identifier col> IS NULL
ORDER BY <PK col>;
→ Store as captured_backfill_targets (list of {entity_code, birth_id}).
→ Report: backfill_target_count = length of list.
H.2 Persist/preview captured targets
Write captured_backfill_targets to:
- KB: reports/p3d-pack1-phase5c1-backfill-targets-preview-<date>.md
- VPS log: /opt/incomex/logs/p3d-pack1-phase5c1-backfill-targets-<date>.log
(Phase 5A evidence was 12 rows — reference only. Live count may differ if IU rows were added/removed since Phase 5A.)
H.3 UPDATE using captured keys only
UPDATE <birth_registry>
SET <species_identifier col> = proposed_species_code,
<species_composition col> = proposed_composition_level
WHERE <PK col> = ANY (<captured birth_id list>);
→ Store affected_count = ROW_COUNT.
H.4 Verify via RETURNING cross-check
-- Additional safety: re-SELECT to confirm
SELECT <birth_entity col>, <species_identifier col>, <species_composition col>
FROM <birth_registry>
WHERE <PK col> = ANY (<captured birth_id list>);
→ ASSERT all rows have species = proposed_species_code.
→ ASSERT affected_count = backfill_target_count.
H.5 Verify no remaining NULL
SELECT count(*) FROM <birth_registry>
WHERE <collection_table_key col> = target_collection_primary
AND <species_identifier col> IS NULL;
→ must be 0.
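The H.1 to H.5 sequence can be sketched end to end as follows. This is an illustration against an in-memory SQLite table, not the production procedure. The table layout, the column names (id, entity_code, collection, species_code, composition_level), the sample entity codes, and the function name backfill_species are all assumptions. SQLite has no array form of `= ANY (...)`, so the sketch expands the captured keys into an IN list. The H.2 persistence step is left out.

```python
import sqlite3

def backfill_species(conn, collection, species_code, composition_level):
    """SELECT targets first, UPDATE by captured keys only, then verify."""
    # H.1: capture the exact rows to touch.
    targets = conn.execute(
        "SELECT entity_code, id FROM birth_registry "
        "WHERE collection = ? AND species_code IS NULL ORDER BY id",
        (collection,),
    ).fetchall()
    birth_ids = [birth_id for _, birth_id in targets]
    if not birth_ids:
        return targets, 0
    marks = ",".join("?" for _ in birth_ids)
    # H.3: update by captured primary keys, not by re-evaluating the filter.
    cur = conn.execute(
        "UPDATE birth_registry SET species_code = ?, composition_level = ? "
        f"WHERE id IN ({marks})",
        [species_code, composition_level, *birth_ids],
    )
    affected = cur.rowcount
    # H.4: re-SELECT the captured rows and compare.
    rows = conn.execute(
        f"SELECT species_code FROM birth_registry WHERE id IN ({marks})",
        birth_ids,
    ).fetchall()
    # H.5: no NULL species may remain for the collection.
    remaining = conn.execute(
        "SELECT count(*) FROM birth_registry "
        "WHERE collection = ? AND species_code IS NULL",
        (collection,),
    ).fetchone()[0]
    ok = (affected == len(birth_ids) and remaining == 0
          and all(r[0] == species_code for r in rows))
    if not ok:
        conn.rollback()
        raise RuntimeError("backfill verification failed")
    conn.commit()
    return targets, affected

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE birth_registry (id INTEGER PRIMARY KEY, entity_code TEXT, "
    "collection TEXT, species_code TEXT, composition_level TEXT)"
)
conn.executemany(
    "INSERT INTO birth_registry (entity_code, collection) VALUES (?, ?)",
    [("IU-1", "information_unit"), ("IU-2", "information_unit"), ("X-1", "other")],
)
conn.commit()
targets, affected = backfill_species(conn, "information_unit", "information_unit_atom", "atom")
```

In this sample, two rows are captured and updated, and the row belonging to the other collection is left untouched because it was never in the captured key list.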
I. Rollback and capture policy
I.1 Species seed rollback
DELETE FROM <species_collection_map> WHERE <PK> = captured_mapping_id;
DELETE FROM <entity_species> WHERE <PK> = captured_species_id;
I.2 Backfill rollback
UPDATE <birth_registry>
SET <species_identifier col> = NULL, <species_composition col> = NULL
WHERE <PK col> = ANY (<captured birth_id list>);
I.3 Capture persistence (GPT LOCKED)
All captured keys written to KB artifact + VPS log. No new DB control table.
Order of rollback: I.2 → I.1 (reverse execution order).
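To make the ordering explicit, the rollback can be sketched as a function that turns the captured keys into an ordered list of statements. The capture layout (birth_ids, mapping_id, species_id), the table and column names, and the function name rollback_steps are assumptions for illustration. Placeholders use the `?` style of Python's sqlite3 module, and the sample key values are made up.

```python
def rollback_steps(capture):
    """Build rollback statements in reverse execution order: I.2 first, then I.1."""
    steps = []
    birth_ids = list(capture["birth_ids"])
    if birth_ids:
        marks = ",".join("?" for _ in birth_ids)
        # I.2: undo the backfill on exactly the captured rows.
        steps.append((
            "I.2 backfill",
            "UPDATE birth_registry SET species_code = NULL, composition_level = NULL "
            f"WHERE id IN ({marks})",
            birth_ids,
        ))
    # I.1: remove the mapping before the species it points at.
    steps.append((
        "I.1 mapping",
        "DELETE FROM species_collection_map WHERE id = ?",
        [capture["mapping_id"]],
    ))
    steps.append((
        "I.1 species",
        "DELETE FROM entity_species WHERE id = ?",
        [capture["species_id"]],
    ))
    return steps

# Hypothetical capture, as it might be read back from the KB artifact or VPS log.
capture = {"birth_ids": [101, 102], "mapping_id": 5, "species_id": 41}
for label, sql, params in rollback_steps(capture):
    print(label, params)
```

Building the steps from the persisted capture alone means the rollback touches only rows that the forward run recorded.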
J. What remains blocked before dispatch
BLOCKER-1: Species exact identity (§E fields) — GPT/User must approve
BLOCKER-2: Taxonomy parent strategy (§F) — GPT/User must choose F.1/F.2/F.3 and prescribe parent
BLOCKER-3: kg_metadata nullable check — Agent verifies at introspection (if NOT NULL without default → FIELD_UNRESOLVED_STOP)
BLOCKER-4: _dot_origin convention — GPT/User confirms 'DOT:QT-005-P3D-PACK1-5C1' or prescribes alternative
When BLOCKER-1 and BLOCKER-2 are resolved, Opus writes the 5C1 prompt rev1 (dispatch-ready). BLOCKER-3 and BLOCKER-4 are Agent-verifiable at runtime, with a hard STOP if unresolvable.
Phase 5C1 Species Identity Decision Memo | Option A recommended | Fill policy matrix | SELECT-before-UPDATE | GPT/User decision required | 2026-05-11