P3D Pack 1 Phase 5 — Read-Only/Dry-Run TAC→IU Migration Mission Prompt (rev6)
P3D Pack 1 Phase 5 — Read-Only/Dry-Run TAC→IU Migration Mission Prompt (rev6)
Date: 2026-05-11 | Mode: READ-ONLY / DRY-RUN — no writes, no migration, no seed Style: MISSION PROMPT — semantic field registry + deterministic resolution + bounded output Design reference:
p3d-pack1-phase5-tac-to-iu-migration-design.md
0. SCOPE BLOCK
target_collection_primary = 'information_unit'
target_collection_subordinate = 'unit_version'
target_unit_kind = 'law_unit'
target_vocab_key = 'vocab.unit_kind.law_unit'
source_table_logical = 'tac_logical_unit'
source_table_version = 'tac_unit_version'
source_table_publication = 'tac_publication'
source_table_pub_member = 'tac_publication_member'
0.1 SEMANTIC CANDIDATE FIELD REGISTRY
Instead of writing column names, this prompt defines SEMANTIC CONCEPTS. Each concept has candidate labels. Agent resolves each concept against live schema using the resolution rule below.
Resolution rule (deterministic, no exceptions):
For each (concept, table):
candidates_found = columns in table whose name matches ANY candidate label
0 found → FIELD_ABSENT. Skip any query requiring this concept on this table.
1 found → RESOLVED. Use that column name.
>1 found → AMBIGUOUS_FIELD. Report ALL matching column names. Do NOT pick one.
Skip any query requiring this concept on this table.
GPT/User resolves ambiguity in review.
Registry:
| Concept ID | Semantic meaning | Candidate labels | Tables to check |
|---|---|---|---|
address_field |
Canonical/unique address identifier | canonical_address, address, uri, path, slug |
source_table_logical, target_collection_primary |
content_body |
Main text content | body, content, text, raw_content, markdown |
source_table_version, target_collection_subordinate |
content_hash |
Hash digest of content | content_hash, hash, checksum, digest |
source_table_version, target_collection_subordinate |
provenance_json_profile |
JSONB structured metadata/provenance container | content_profile, profile, metadata_profile |
target_collection_subordinate |
provenance_text_note |
Free-text provenance/origin note | provenance, provenance_note, origin |
target_collection_subordinate, source_table_version |
parent_ref |
Reference to parent entity | parent_id, parent_ref, parent_or_container_ref, container_ref |
source_table_logical, target_collection_primary |
sort_order |
Ordering/sequencing field (intra-unit) | sort_order, order, sort, position, rank, sequence |
source_table_logical |
section_type |
Classification of content section | section_type, type, kind, category |
source_table_logical, target_collection_primary |
lifecycle |
Status/lifecycle state | lifecycle_status, status, state |
source_table_logical, target_collection_primary |
publication_ref |
FK to publication table | publication_id, publication_ref, pub_id, document_id |
source_table_pub_member |
logical_unit_ref |
FK to logical unit table | logical_unit_id, unit_id, lu_id, logical_unit_ref |
source_table_pub_member |
publication_render_order |
Intra-publication ordering | render_order, pub_order, display_order, render_sequence |
source_table_pub_member |
title_field |
Title/name of content | title, name, display_name, heading |
source_table_version, target_collection_subordinate |
description_field |
Description/summary | description, summary, abstract, desc |
source_table_version, target_collection_subordinate |
| --- Registry/Species domain --- | |||
species_identifier |
Unique species code | species_code, species_id, species |
entity_species, species_collection_map, collection_registry, birth_registry |
species_display |
Human-readable species name | display_name, name, species_name, label, title |
entity_species |
species_composition |
Composition level of species | composition_level, composition, level |
entity_species, birth_registry |
species_management |
Management/governance mode of species | management_mode, mode, governance, management |
entity_species |
species_hierarchy_parent |
Parent species reference | parent_id, parent_ref, parent_species_id |
entity_species |
species_hierarchy_depth |
Tree depth | depth, tree_depth, level |
entity_species |
collection_table_key |
Collection PG table name (FK/JOIN key) | collection_name, table_name, collection |
collection_registry, species_collection_map, birth_registry |
collection_display_name |
Human-readable collection label | name, display_name, title, name_en |
collection_registry |
governance_role |
Collection governance classification | governance_role, role, governance |
collection_registry, birth_registry |
migration_state |
Migration status | migration_state, state, migration |
collection_registry |
birth_strategy |
Birth code strategy | birth_code_strategy, strategy, birth_strategy |
collection_registry |
mapping_primary |
Primary mapping flag | is_primary, primary, is_default |
species_collection_map |
disc_field |
Discriminator field name | discriminator_field, disc_field |
species_collection_map |
disc_value |
Discriminator value | discriminator_value, disc_value |
species_collection_map |
disc_operator |
Discriminator operator | discriminator_operator, disc_operator |
species_collection_map |
disc_config |
Discriminator config | discriminator_config, disc_config |
species_collection_map |
birth_entity |
Birth entity identifier | entity_code, code, entity_id |
birth_registry |
Agent must resolve ALL concepts against ALL listed tables in a single introspection pass (GATE-0 extension). Results form the "resolved field map" used by all subsequent goals. No goal may reference a column name directly — only concept IDs from this registry.
1. GATE-0 — Scope preflight + field resolution
Phase 1: Table existence
| Check | What |
|---|---|
| P1-P4 | All 4 source tables exist (relkind='r', public schema) |
| P5-P6 | Both target tables exist |
| P7 | collection_registry rows for both targets |
| P8 | dot_config key = target_vocab_key |
| P8b-P8e | Registry tables exist (relkind='r', public): entity_species, species_collection_map, collection_registry, birth_registry |
Any P1-P8e fail → STOP. P9-P14 informational (PRESENT/ABSENT, non-fatal).
Phase 2: Semantic field resolution
For EVERY (concept, table) pair in §0.1 registry:
1. Query information_schema.columns for that table
2. Check which candidate labels match actual column names
3. Record: RESOLVED(column_name) / FIELD_ABSENT / AMBIGUOUS_FIELD(list)
Output: Resolved field map — a table of (concept_id, scope_table, resolution_status, resolved_column_name_or_null). This map is THE authoritative reference for all subsequent goals. If a goal needs a concept that is ABSENT or AMBIGUOUS on a table → that goal's sub-query for that table is SKIPPED.
Phase 3: Function existence
| Check | What |
|---|---|
| P9 | fn_iu_create exists in public |
| P10 | fn_iu_create_plan exists in public |
| P11 | fn_content_hash exists in public (or equivalent hash function) |
| P12 | fn_iu_birth_gate_layer1 exists |
| P13 | fn_iu_birth_gate_layer2 exists |
| P14 | fn_iu_verify_invariants exists |
P9-P14: PRESENT or ABSENT. Non-fatal (informational for G8/G9).
2. GOALS (all column references via concept IDs from §0.1 registry — no old/deprecated concept names)
G1 — Source row accounting
- Compute:
SELECT count(*)per source table. For each source table, also query distinct values of RESOLVED categorical concepts (section_type, lifecycle) with counts. - Column references:
section_typeconcept on source_table_logical.lifecycleconcept on source_table_logical. If ABSENT → skip that breakdown. - Output: Counts per table. Distinct breakdowns for resolved concepts.
G2 — Schema alignment: source→target field mapping
- Compute: For each (source_table, target_table) pair, compare the resolved field map entries. For each concept: both RESOLVED (aligned), source RESOLVED + target ABSENT (migration gap), target RESOLVED + source ABSENT (target-only field).
- Also: List ALL columns on each table (not just registered concepts) to catch fields outside the registry.
- Output: Alignment table per pair. Gap analysis. Fields not covered by any concept (agent reports them as
UNREGISTERED_FIELDfor GPT review).
G3 — Nesting evidence
- Column reference:
parent_refconcept on source_table_logical. - If RESOLVED: Count roots (NULL parent) vs children (non-NULL). Recursive depth (WITH RECURSIVE, limit 10). Distribution by depth level. Parent→child
section_typeconcept pairs (if section_type RESOLVED). - If ABSENT: Report
parent_ref FIELD_ABSENT on source_table_logical. Nesting analysis = N/A. Nesting decision defaults to "flatten" evidence. - Output: Nesting structure. This is THE input for design §D nesting decision.
G4 — Publication structure
- Column references:
publication_refconcept on source_table_pub_member (FK to publication).logical_unit_refconcept on source_table_pub_member (FK to logical unit).publication_render_orderconcept on source_table_pub_member. - Compute: Introspect source_table_publication columns (
SELECT *for full dump). Count publications. Ifpublication_refRESOLVED: count members per publication via JOIN. Iflogical_unit_refRESOLVED: verify 1:1 with source_table_logical. Ifpublication_render_orderRESOLVED: verify ordering integrity (gaps, duplicates per publication). - Output: Publication count, membership distribution per publication, ordering analysis.
G5 — Canonical address collision check
- Column reference:
address_fieldconcept on source_table_logical AND target_collection_primary. - If both RESOLVED: Compute intersection: source addresses that already exist in target.
- If either ABSENT: Report which side is ABSENT. Collision check = N/A.
- Output: Collision count + sample (if any). Zero = PASS.
G6 — Hash/provenance projection
- Column references:
content_bodyon source_table_version + target_collection_subordinate.content_hashon source_table_version + target_collection_subordinate.provenance_json_profileon target_collection_subordinate (for structured TAC hash provenance per Phase 4B addendum §7).provenance_text_noteon target_collection_subordinate (informational only). - Function reference: fn_content_hash (from GATE-0 P11).
- If content_body + content_hash RESOLVED on source, and fn_content_hash PRESENT:
- Sample 5 source rows: compute fn_content_hash(source.content_body) and compare with source.content_hash. Show whether recipes differ (re-verify live, don't assume).
- Project: target would get fn_content_hash(migrated_body) as its hash. Source's original hash becomes provenance.
- If provenance_json_profile RESOLVED on target: Report its data_type (verify JSONB). Check if existing rows have this field populated. This is THE carrier for
source_hashes.tac_v1per Phase 4B contract. - If provenance_text_note RESOLVED on target: Report as informational (editor note, not structured hash provenance).
- If any concept ABSENT: Report which. Hash/provenance analysis partial.
- Output: Hash recipe comparison, provenance storage feasibility.
G7 — Species/composition per nesting option (NO interpretation by agent)
- Source: Live
entity_species(SELECT *for all rows + all columns). Re-verify Phase 4C governance_role match live. - Concept references:
species_compositionconcept on entity_species.species_managementconcept on entity_species.governance_roleconcept on collection_registry (for target). - Compute:
- If
species_compositionconcept RESOLVED on entity_species → query ALL distinct values of that resolved column (live, not listed from memory). - If RESOLVED → group ALL species rows by that resolved column.
- If
species_managementconcept RESOLVED on entity_species ANDgovernance_roleconcept RESOLVED on collection_registry → for each group, count species where resolvedspecies_managementvalue equals resolvedgovernance_rolevalue of target_collection_primary. Label:PLAUSIBLE(match),CONFLICT(mismatch),UNKNOWN(either concept ABSENT or AMBIGUOUS). - If
species_compositionABSENT or AMBIGUOUS → composition grouping =UNKNOWN. Still dump all species viaSELECT *for GPT review. - If
species_managementorgovernance_roleABSENT or AMBIGUOUS → PLAUSIBLE/CONFLICT labels =UNKNOWNfor all species.
- If
- Agent does NOT interpret which composition "allows containment" or "is simplest." Agent presents grouping + labels. GPT/User applies Điều 0-B.
- Output: Table:
resolved_composition_value | species_count | plausible_count | conflict_count | unknown_count | sample_species (from live SELECT *). All labelledcandidate_not_approved.
G8 — IU creation pathway analysis (bounded output)
- For each function in GATE-0 P9-P14 that is PRESENT:
pg_get_function_identity_arguments(oid)→ identity args stringproargnames→ parameter name arrayprorettype::regtype::text→ return type- If fn_iu_create: does its identity_args include a parameter whose name matches
parent_refconcept candidates? (Check by string containment on identity_args, not by assuming a specific name.)
- Trigger inventory for target_collection_primary:
SELECT tgname, tgtype, tgfoid::regproc AS function_name FROM pg_trigger WHERE tgrelid = '<target>'::regclass AND NOT tgisinternal ORDER BY tgname
- Output per function:
function_name | identity_args | return_type | parameter_names. Per trigger:trigger_name | trigger_type | function_called. - If a function is ABSENT: Report ABSENT. Mark related analysis as
UNKNOWN. - Agent does NOT interpret function behavior from source code. Agent provides identity/signature/trigger inventory. GPT/User interprets.
G9 — Birth gate field requirements (bounded output)
- For fn_iu_birth_gate_layer1 and fn_iu_birth_gate_layer2 (if PRESENT):
- Capture function source via
pg_get_functiondef. - Extract: every
NEW.<field_name>reference in the source text. List them asgate_required_fields. - For each gate_required_field: check if it exists on target_collection_primary (via resolved field map or direct introspection).
- Label:
SATISFIED(field exists on target),GAP(field absent),UNKNOWN(can't determine).
- Capture function source via
- Output: Per gate function:
field_name | exists_on_target | status (SATISFIED/GAP/UNKNOWN).
G10 — Pilot document selection candidates (metric-only, NO subjective ranking)
- Source: source_table_publication (all rows via
SELECT *). - If
publication_refRESOLVED on source_table_pub_member: JOIN to source_table_publication to count members per publication. - If
logical_unit_refRESOLVED on source_table_pub_member ANDsection_typeRESOLVED on source_table_logical: Count distinct section_type values per publication via pub_member→logical_unit JOIN (= diversity metric). - If
publication_render_orderRESOLVED on source_table_pub_member: Report ordering integrity per publication (min, max, gaps, duplicates). - Compute per publication (ALL publications, not a filtered subset):
member_count(total logical units in publication)diversity_count(distinct section_type values, if concept RESOLVED; else UNKNOWN)nesting_depth(max parent_ref depth within this publication's members, if parent_ref RESOLVED; else UNKNOWN)
- Sort: by
diversity_count DESC, member_count ASC(deterministic). If diversity_count = UNKNOWN, sort by member_count ASC only. - Output: ALL publications with metrics. Agent does NOT filter by "manageable" or "not the largest." Agent does NOT pick favorites. GPT/User selects pilot from the full list.
- All rows labelled
candidate_not_approved.
G11 — Rollback input computation
- If address_field RESOLVED on source_table_logical: Count total source rows (= projected insert count). If
publication_refRESOLVED on source_table_pub_member: group by publication to compute per-publication projected insert count for per-batch rollback sizing. - Output: Total projected migration size. Per-publication breakdown.
3. ACCEPTANCE CRITERIA
| # | Criterion | PASS |
|---|---|---|
| 1 | GATE-0 Phase 1-3 all reported | Each check: PASS/FAIL/PRESENT/ABSENT |
| 2 | Resolved field map complete | Every (concept, table) pair resolved |
| 3 | G1-G11 all attempted | Evidence, skip reason, or FIELD_ABSENT |
| 4 | Zero writes | Confirmed |
| 5 | No column name from memory | All via resolved field map |
| 6 | No "or similar" language | All resolution deterministic |
| 7 | No composition interpretation by agent | G7 groups only, no "simplest"/"allows containment" |
| 8 | G8/G9 output bounded | Identity/params/triggers/fields, with UNKNOWN |
| 9 | AMBIGUOUS_FIELD handled | Reported, not silently resolved |
| 10 | All candidates labelled | candidate_not_approved |
| 11 | Scale brittleness self-audit | §5 |
4. CONSTRAINTS
- No writes. No migration. No seed. No backfill.
- No column names outside resolved field map. Every field reference → concept ID → resolved via §0.1 registry.
- No fuzzy resolution. 0=ABSENT, 1=USE, >1=AMBIGUOUS. No "or similar."
- No composition/species interpretation. Agent groups and presents. GPT/User interprets.
- No function behavior interpretation. Agent provides signature/source/triggers. GPT/User interprets.
- Re-verify everything. Phase 3/4 findings are labels only.
- Mission-style. Agent writes queries using resolved field map.
5. SCALE BRITTLENESS SELF-AUDIT
| Check | Required |
|---|---|
| Column name from memory used in a query? | No — all via resolved field map |
| "or similar" or "equivalent" in any selector? | No |
| Agent interpreted "simplest" or "allows containment"? | No |
| AMBIGUOUS_FIELD silently resolved? | No — reported for GPT/User |
| Fixed counts as logic? | No |
| Phase 3/4 labels as facts? | No — re-verified |
| Semantic concept missing from registry? | Report as UNREGISTERED_FIELD |
6. REPORT REQUIREMENTS
Save to: knowledge/dev/laws/dieu44-trien-khai/reports/p3d-pack1-phase5-tac-to-iu-migration-dryrun-report.md
Must include: GATE-0 (tables + resolved field map + functions), G1-G11 evidence, self-audit, status flags.
Phase 5 Mission Prompt rev6 | All goals use current concept IDs | No deprecated concepts | No migration note workaround | 2026-05-11