KB-50DA rev 26

P3D Pack 1 Phase 5 — Read-Only/Dry-Run TAC→IU Migration Mission Prompt (rev6)

19 min read Revision 26
p3dpack1phase5migrationdry-runrev6semantic-field-registrydeterministicphase5aold-concepts-removedno-hardcode

P3D Pack 1 Phase 5 — Read-Only/Dry-Run TAC→IU Migration Mission Prompt (rev6)

Date: 2026-05-11 | Mode: READ-ONLY / DRY-RUN — no writes, no migration, no seed Style: MISSION PROMPT — semantic field registry + deterministic resolution + bounded output Design reference: p3d-pack1-phase5-tac-to-iu-migration-design.md


0. SCOPE BLOCK

target_collection_primary       = 'information_unit'
target_collection_subordinate   = 'unit_version'
target_unit_kind                = 'law_unit'
target_vocab_key                = 'vocab.unit_kind.law_unit'

source_table_logical            = 'tac_logical_unit'
source_table_version            = 'tac_unit_version'
source_table_publication        = 'tac_publication'
source_table_pub_member         = 'tac_publication_member'

0.1 SEMANTIC CANDIDATE FIELD REGISTRY

Instead of writing column names, this prompt defines SEMANTIC CONCEPTS. Each concept has candidate labels. Agent resolves each concept against live schema using the resolution rule below.

Resolution rule (deterministic, no exceptions):

For each (concept, table):
  candidates_found = columns in table whose name matches ANY candidate label
  
  0 found → FIELD_ABSENT. Skip any query requiring this concept on this table.
  1 found → RESOLVED. Use that column name.
  >1 found → AMBIGUOUS_FIELD. Report ALL matching column names. Do NOT pick one.
             Skip any query requiring this concept on this table.
             GPT/User resolves ambiguity in review.

Registry:

Concept ID Semantic meaning Candidate labels Tables to check
address_field Canonical/unique address identifier canonical_address, address, uri, path, slug source_table_logical, target_collection_primary
content_body Main text content body, content, text, raw_content, markdown source_table_version, target_collection_subordinate
content_hash Hash digest of content content_hash, hash, checksum, digest source_table_version, target_collection_subordinate
provenance_json_profile JSONB structured metadata/provenance container content_profile, profile, metadata_profile target_collection_subordinate
provenance_text_note Free-text provenance/origin note provenance, provenance_note, origin target_collection_subordinate, source_table_version
parent_ref Reference to parent entity parent_id, parent_ref, parent_or_container_ref, container_ref source_table_logical, target_collection_primary
sort_order Ordering/sequencing field (intra-unit) sort_order, order, sort, position, rank, sequence source_table_logical
section_type Classification of content section section_type, type, kind, category source_table_logical, target_collection_primary
lifecycle Status/lifecycle state lifecycle_status, status, state source_table_logical, target_collection_primary
publication_ref FK to publication table publication_id, publication_ref, pub_id, document_id source_table_pub_member
logical_unit_ref FK to logical unit table logical_unit_id, unit_id, lu_id, logical_unit_ref source_table_pub_member
publication_render_order Intra-publication ordering render_order, pub_order, display_order, render_sequence source_table_pub_member
title_field Title/name of content title, name, display_name, heading source_table_version, target_collection_subordinate
description_field Description/summary description, summary, abstract, desc source_table_version, target_collection_subordinate
--- Registry/Species domain ---
species_identifier Unique species code species_code, species_id, species entity_species, species_collection_map, collection_registry, birth_registry
species_display Human-readable species name display_name, name, species_name, label, title entity_species
species_composition Composition level of species composition_level, composition, level entity_species, birth_registry
species_management Management/governance mode of species management_mode, mode, governance, management entity_species
species_hierarchy_parent Parent species reference parent_id, parent_ref, parent_species_id entity_species
species_hierarchy_depth Tree depth depth, tree_depth, level entity_species
collection_table_key Collection PG table name (FK/JOIN key) collection_name, table_name, collection collection_registry, species_collection_map, birth_registry
collection_display_name Human-readable collection label name, display_name, title, name_en collection_registry
governance_role Collection governance classification governance_role, role, governance collection_registry, birth_registry
migration_state Migration status migration_state, state, migration collection_registry
birth_strategy Birth code strategy birth_code_strategy, strategy, birth_strategy collection_registry
mapping_primary Primary mapping flag is_primary, primary, is_default species_collection_map
disc_field Discriminator field name discriminator_field, disc_field species_collection_map
disc_value Discriminator value discriminator_value, disc_value species_collection_map
disc_operator Discriminator operator discriminator_operator, disc_operator species_collection_map
disc_config Discriminator config discriminator_config, disc_config species_collection_map
birth_entity Birth entity identifier entity_code, code, entity_id birth_registry

Agent must resolve ALL concepts against ALL listed tables in a single introspection pass (GATE-0 extension). Results form the "resolved field map" used by all subsequent goals. No goal may reference a column name directly — only concept IDs from this registry.


1. GATE-0 — Scope preflight + field resolution

Phase 1: Table existence

Check What
P1-P4 All 4 source tables exist (relkind='r', public schema)
P5-P6 Both target tables exist
P7 collection_registry rows for both targets
P8 dot_config key = target_vocab_key
P8b-P8e Registry tables exist (relkind='r', public): entity_species, species_collection_map, collection_registry, birth_registry

Any P1-P8e fail → STOP. P9-P14 informational (PRESENT/ABSENT, non-fatal).

Phase 2: Semantic field resolution

For EVERY (concept, table) pair in §0.1 registry:

1. Query information_schema.columns for that table
2. Check which candidate labels match actual column names
3. Record: RESOLVED(column_name) / FIELD_ABSENT / AMBIGUOUS_FIELD(list)

Output: Resolved field map — a table of (concept_id, scope_table, resolution_status, resolved_column_name_or_null). This map is THE authoritative reference for all subsequent goals. If a goal needs a concept that is ABSENT or AMBIGUOUS on a table → that goal's sub-query for that table is SKIPPED.

Phase 3: Function existence

Check What
P9 fn_iu_create exists in public
P10 fn_iu_create_plan exists in public
P11 fn_content_hash exists in public (or equivalent hash function)
P12 fn_iu_birth_gate_layer1 exists
P13 fn_iu_birth_gate_layer2 exists
P14 fn_iu_verify_invariants exists

P9-P14: PRESENT or ABSENT. Non-fatal (informational for G8/G9).


2. GOALS (all column references via concept IDs from §0.1 registry — no old/deprecated concept names)

G1 — Source row accounting

  • Compute: SELECT count(*) per source table. For each source table, also query distinct values of RESOLVED categorical concepts (section_type, lifecycle) with counts.
  • Column references: section_type concept on source_table_logical. lifecycle concept on source_table_logical. If ABSENT → skip that breakdown.
  • Output: Counts per table. Distinct breakdowns for resolved concepts.

G2 — Schema alignment: source→target field mapping

  • Compute: For each (source_table, target_table) pair, compare the resolved field map entries. For each concept: both RESOLVED (aligned), source RESOLVED + target ABSENT (migration gap), target RESOLVED + source ABSENT (target-only field).
  • Also: List ALL columns on each table (not just registered concepts) to catch fields outside the registry.
  • Output: Alignment table per pair. Gap analysis. Fields not covered by any concept (agent reports them as UNREGISTERED_FIELD for GPT review).

G3 — Nesting evidence

  • Column reference: parent_ref concept on source_table_logical.
  • If RESOLVED: Count roots (NULL parent) vs children (non-NULL). Recursive depth (WITH RECURSIVE, limit 10). Distribution by depth level. Parent→child section_type concept pairs (if section_type RESOLVED).
  • If ABSENT: Report parent_ref FIELD_ABSENT on source_table_logical. Nesting analysis = N/A. Nesting decision defaults to "flatten" evidence.
  • Output: Nesting structure. This is THE input for design §D nesting decision.

G4 — Publication structure

  • Column references: publication_ref concept on source_table_pub_member (FK to publication). logical_unit_ref concept on source_table_pub_member (FK to logical unit). publication_render_order concept on source_table_pub_member.
  • Compute: Introspect source_table_publication columns (SELECT * for full dump). Count publications. If publication_ref RESOLVED: count members per publication via JOIN. If logical_unit_ref RESOLVED: verify 1:1 with source_table_logical. If publication_render_order RESOLVED: verify ordering integrity (gaps, duplicates per publication).
  • Output: Publication count, membership distribution per publication, ordering analysis.

G5 — Canonical address collision check

  • Column reference: address_field concept on source_table_logical AND target_collection_primary.
  • If both RESOLVED: Compute intersection: source addresses that already exist in target.
  • If either ABSENT: Report which side is ABSENT. Collision check = N/A.
  • Output: Collision count + sample (if any). Zero = PASS.

G6 — Hash/provenance projection

  • Column references: content_body on source_table_version + target_collection_subordinate. content_hash on source_table_version + target_collection_subordinate. provenance_json_profile on target_collection_subordinate (for structured TAC hash provenance per Phase 4B addendum §7). provenance_text_note on target_collection_subordinate (informational only).
  • Function reference: fn_content_hash (from GATE-0 P11).
  • If content_body + content_hash RESOLVED on source, and fn_content_hash PRESENT:
    • Sample 5 source rows: compute fn_content_hash(source.content_body) and compare with source.content_hash. Show whether recipes differ (re-verify live, don't assume).
    • Project: target would get fn_content_hash(migrated_body) as its hash. Source's original hash becomes provenance.
  • If provenance_json_profile RESOLVED on target: Report its data_type (verify JSONB). Check if existing rows have this field populated. This is THE carrier for source_hashes.tac_v1 per Phase 4B contract.
  • If provenance_text_note RESOLVED on target: Report as informational (editor note, not structured hash provenance).
  • If any concept ABSENT: Report which. Hash/provenance analysis partial.
  • Output: Hash recipe comparison, provenance storage feasibility.

G7 — Species/composition per nesting option (NO interpretation by agent)

  • Source: Live entity_species (SELECT * for all rows + all columns). Re-verify Phase 4C governance_role match live.
  • Concept references: species_composition concept on entity_species. species_management concept on entity_species. governance_role concept on collection_registry (for target).
  • Compute:
    1. If species_composition concept RESOLVED on entity_species → query ALL distinct values of that resolved column (live, not listed from memory).
    2. If RESOLVED → group ALL species rows by that resolved column.
    3. If species_management concept RESOLVED on entity_species AND governance_role concept RESOLVED on collection_registry → for each group, count species where resolved species_management value equals resolved governance_role value of target_collection_primary. Label: PLAUSIBLE (match), CONFLICT (mismatch), UNKNOWN (either concept ABSENT or AMBIGUOUS).
    4. If species_composition ABSENT or AMBIGUOUS → composition grouping = UNKNOWN. Still dump all species via SELECT * for GPT review.
    5. If species_management or governance_role ABSENT or AMBIGUOUS → PLAUSIBLE/CONFLICT labels = UNKNOWN for all species.
  • Agent does NOT interpret which composition "allows containment" or "is simplest." Agent presents grouping + labels. GPT/User applies Điều 0-B.
  • Output: Table: resolved_composition_value | species_count | plausible_count | conflict_count | unknown_count | sample_species (from live SELECT *). All labelled candidate_not_approved.

G8 — IU creation pathway analysis (bounded output)

  • For each function in GATE-0 P9-P14 that is PRESENT:
    • pg_get_function_identity_arguments(oid) → identity args string
    • proargnames → parameter name array
    • prorettype::regtype::text → return type
    • If fn_iu_create: does its identity_args include a parameter whose name matches parent_ref concept candidates? (Check by string containment on identity_args, not by assuming a specific name.)
  • Trigger inventory for target_collection_primary:
    • SELECT tgname, tgtype, tgfoid::regproc AS function_name FROM pg_trigger WHERE tgrelid = '<target>'::regclass AND NOT tgisinternal ORDER BY tgname
  • Output per function: function_name | identity_args | return_type | parameter_names. Per trigger: trigger_name | trigger_type | function_called.
  • If a function is ABSENT: Report ABSENT. Mark related analysis as UNKNOWN.
  • Agent does NOT interpret function behavior from source code. Agent provides identity/signature/trigger inventory. GPT/User interprets.

G9 — Birth gate field requirements (bounded output)

  • For fn_iu_birth_gate_layer1 and fn_iu_birth_gate_layer2 (if PRESENT):
    • Capture function source via pg_get_functiondef.
    • Extract: every NEW.<field_name> reference in the source text. List them as gate_required_fields.
    • For each gate_required_field: check if it exists on target_collection_primary (via resolved field map or direct introspection).
    • Label: SATISFIED (field exists on target), GAP (field absent), UNKNOWN (can't determine).
  • Output: Per gate function: field_name | exists_on_target | status (SATISFIED/GAP/UNKNOWN).

G10 — Pilot document selection candidates (metric-only, NO subjective ranking)

  • Source: source_table_publication (all rows via SELECT *).
  • If publication_ref RESOLVED on source_table_pub_member: JOIN to source_table_publication to count members per publication.
  • If logical_unit_ref RESOLVED on source_table_pub_member AND section_type RESOLVED on source_table_logical: Count distinct section_type values per publication via pub_member→logical_unit JOIN (= diversity metric).
  • If publication_render_order RESOLVED on source_table_pub_member: Report ordering integrity per publication (min, max, gaps, duplicates).
  • Compute per publication (ALL publications, not a filtered subset):
    • member_count (total logical units in publication)
    • diversity_count (distinct section_type values, if concept RESOLVED; else UNKNOWN)
    • nesting_depth (max parent_ref depth within this publication's members, if parent_ref RESOLVED; else UNKNOWN)
  • Sort: by diversity_count DESC, member_count ASC (deterministic). If diversity_count = UNKNOWN, sort by member_count ASC only.
  • Output: ALL publications with metrics. Agent does NOT filter by "manageable" or "not the largest." Agent does NOT pick favorites. GPT/User selects pilot from the full list.
  • All rows labelled candidate_not_approved.

G11 — Rollback input computation

  • If address_field RESOLVED on source_table_logical: Count total source rows (= projected insert count). If publication_ref RESOLVED on source_table_pub_member: group by publication to compute per-publication projected insert count for per-batch rollback sizing.
  • Output: Total projected migration size. Per-publication breakdown.

3. ACCEPTANCE CRITERIA

# Criterion PASS
1 GATE-0 Phase 1-3 all reported Each check: PASS/FAIL/PRESENT/ABSENT
2 Resolved field map complete Every (concept, table) pair resolved
3 G1-G11 all attempted Evidence, skip reason, or FIELD_ABSENT
4 Zero writes Confirmed
5 No column name from memory All via resolved field map
6 No "or similar" language All resolution deterministic
7 No composition interpretation by agent G7 groups only, no "simplest"/"allows containment"
8 G8/G9 output bounded Identity/params/triggers/fields, with UNKNOWN
9 AMBIGUOUS_FIELD handled Reported, not silently resolved
10 All candidates labelled candidate_not_approved
11 Scale brittleness self-audit §5

4. CONSTRAINTS

  • No writes. No migration. No seed. No backfill.
  • No column names outside resolved field map. Every field reference → concept ID → resolved via §0.1 registry.
  • No fuzzy resolution. 0=ABSENT, 1=USE, >1=AMBIGUOUS. No "or similar."
  • No composition/species interpretation. Agent groups and presents. GPT/User interprets.
  • No function behavior interpretation. Agent provides signature/source/triggers. GPT/User interprets.
  • Re-verify everything. Phase 3/4 findings are labels only.
  • Mission-style. Agent writes queries using resolved field map.

5. SCALE BRITTLENESS SELF-AUDIT

Check Required
Column name from memory used in a query? No — all via resolved field map
"or similar" or "equivalent" in any selector? No
Agent interpreted "simplest" or "allows containment"? No
AMBIGUOUS_FIELD silently resolved? No — reported for GPT/User
Fixed counts as logic? No
Phase 3/4 labels as facts? No — re-verified
Semantic concept missing from registry? Report as UNREGISTERED_FIELD

6. REPORT REQUIREMENTS

Save to: knowledge/dev/laws/dieu44-trien-khai/reports/p3d-pack1-phase5-tac-to-iu-migration-dryrun-report.md

Must include: GATE-0 (tables + resolved field map + functions), G1-G11 evidence, self-audit, status flags.


Phase 5 Mission Prompt rev6 | All goals use current concept IDs | No deprecated concepts | No migration note workaround | 2026-05-11

Back to Knowledge Hub knowledge/dev/laws/dieu44-trien-khai/prompts/p3d-pack1-phase5-readonly-dryrun-tac-to-iu-migration-prompt.md