P3D Pack 1 Phase 5 — TAC→IU Migration Design
Date: 2026-05-11
Author: Opus 4.7
Status: DESIGN (no migration, no seed, no backfill)
Prerequisites: Phase 4 vocab PASS; Phase 4B-4D species/composition deferred to THIS phase
Decision: Option 2 (defer species/mapping/backfill to Phase 5)
A. Purpose and non-goals
Purpose: Design the TAC→IU migration so that TAC logical units become native IU law_units with correct species, composition, birth records, and provenance — in one coherent pass.
Non-goals (explicitly out of scope):
- DDL changes to any table
- Function/trigger patches (fn_birth_registry_auto, fn_iu_create, etc.)
- Directus/Nuxt/Qdrant changes
- Dropping/replacing TAC tables (TAC stays as read-only archive)
- New DOT tool development
- Species creation outside evidence-supported needs
B. Completed prerequisites (Phase 1–4D)
| Phase | What was achieved | Reference |
|---|---|---|
| Phase 1 | Inventory/design complete | (internal) |
| Phase 2 | IU schema extended (new columns on information_unit + unit_version) | (DDL reports) |
| Phase 3 | Hash/planner/birth investigation: TAC vs IU hash recipes documented, planner behavior mapped, birth gates cataloged | ...reports/p3d-pack1-phase3-...report.md |
| Phase 4 vocab | 14 vocab keys committed to dot_config; planner resolves law_unit=plan_ok | ...reports/p3d-pack1-phase4-vocab-...report.md |
| Phase 4B | Species/composition/registry discovery: 40 species, 153 mappings, IU unmapped, fn_birth_registry_auto documented, discriminator dormant | ...reports/p3d-pack1-phase4b-...report.md |
| Phase 4C | Dry-run: 8 PLAUSIBLE candidates, deterministic labels, no semantic fit confirmed | ...reports/p3d-pack1-phase4c-...report.md |
| Phase 4D | Decision memo: defer species to Phase 5; dependency chain = nesting → composition → species | ...design/p3d-pack1-phase4d-...memo.md |
C. Live TAC→IU source/target model summary
Source and target tables are scope-declared in the Phase 5 prompt (§0 SCOPE BLOCK). This section describes the conceptual mapping only — actual column names, counts, and structures come from live introspection.
Source (TAC family): TAC logical units contain the text content. TAC unit versions hold versioned content snapshots. TAC publications group logical units into published documents.
Target (IU family): IU information_unit is the master entity. IU unit_version holds versioned content. IU has fn_iu_create as the canonical writer, with birth gates (layer1/layer2) and invariants.
Key difference: TAC and IU have different canonical writers, so migration cannot INSERT rows without regard to IU's gates. It must either call fn_iu_create (which enforces birth gates) or INSERT directly with explicit gate compatibility (risky). The design must choose between these two paths.
D. Nesting strategy options
This is the FIRST decision. Everything downstream depends on it.
D1: Preserve TAC parent→child
What: Populate information_unit.parent_or_container_ref with the parent IU's id/ref during migration. IU rows that have children CONTAIN other entities.
Composition consequence (Điều 0-B): Parent IU rows are molecule or compound (they contain other IU rows). Leaf IU rows may be atom.
Species consequence: Cannot use a single-composition species for all IU rows (parent=compound, leaf=atom). Need either: per-unit_kind species, per-depth species, or a species with "mixed" composition (which Điều 0-B doesn't define).
Migration complexity: Medium — must resolve parent→child ordering within fn_iu_create or INSERT sequence. Deferred FK constraints may help (Phase 3 found fk_initially_deferred=true).
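If D1 is chosen and deferred FK constraints are not relied on, the insert sequence has to place every parent before its children. The following is a minimal sketch of that ordering, assuming each TAC unit exposes an id and a nullable parent id; the function name `parents_first` and the input shape are illustrative, not taken from the live schema.

```python
from collections import defaultdict, deque


def parents_first(units):
    """Return unit ids ordered so that every parent precedes its children.

    `units` maps unit id -> parent id (None for a root). This input shape is
    an assumption; the real TAC parent column must be detected live.
    Raises ValueError on an unknown parent or a cycle.
    """
    children = defaultdict(list)
    roots = []
    for unit_id, parent_id in units.items():
        if parent_id is None:
            roots.append(unit_id)
        elif parent_id not in units:
            raise ValueError(f"unit {unit_id!r} references unknown parent {parent_id!r}")
        else:
            children[parent_id].append(unit_id)

    # Breadth-first walk from the roots: a unit is emitted only after its parent.
    ordered = []
    queue = deque(roots)
    while queue:
        current = queue.popleft()
        ordered.append(current)
        queue.extend(children[current])

    # Units never reached from a root belong to a cycle.
    if len(ordered) != len(units):
        raise ValueError("cycle detected in parent references")
    return ordered
```

Feeding the result to the writer in order means a child never references a parent row that does not exist yet, which is the same guarantee a deferred FK would give at commit time.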
D2: Flatten
What: All IU rows are independent siblings. parent_or_container_ref stays NULL. TAC's nesting is not transferred — sort_order and publication membership serve as the structural reference.
Composition consequence: All IU rows are atom (no entity contains another entity). Simple.
Species consequence: One species with composition=atom works for all IU rows.
Migration complexity: Low — no parent ordering needed. But structural fidelity to TAC is lost.
D3: Hybrid
What: Preserve nesting in metadata (e.g., a JSONB field, or publication_member ordering) but NOT in parent_or_container_ref. Or: use universal_edges with edge_type CONTAINS instead of FK.
Composition consequence: Debatable. If containment lives only in metadata, the Điều 0-B test ("entity contains what inside?") finds only sub-atomic metadata, so every row is atom. If containment is expressed via edges, Điều 0-B is ambiguous.
Migration complexity: Medium — must decide which metadata carries the nesting information.
Assessment framework for GPT/User
The nesting decision should be made by answering:
- Render fidelity: Can the Nuxt Laws Page render the correct document structure (headings, sub-sections, nested lists) from flat IU rows + sort_order? Or does it need parent_or_container_ref for tree navigation?
- Query patterns: Will consumers query "all children of this section"? Or only "all units in this publication sorted by render_order"?
- Future editing: When a user edits a law, do they edit individual IU rows or entire publication trees? Parent→child constrains editing (can't move a child without updating parent).
Phase 5 dry-run prompt should gather evidence to inform these answers (publication structure, render_order patterns, existing IU consumer analysis).
E. Composition consequences per nesting option
| Nesting | Parent IU composition | Leaf IU composition | Single species possible? |
|---|---|---|---|
| D1 (preserve) | molecule or compound | atom | No — mixed composition |
| D2 (flatten) | n/a (no parents) | atom (all) | Yes — uniform atom |
| D3 (hybrid/metadata) | atom (containment is sub-atomic metadata) | atom | Yes — uniform atom (if Điều 0-B accepts metadata-containment as sub-atomic) |
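The table can be read as a small decision rule. The sketch below is illustrative only: the function name, the input shape (unit id mapped to parent id), and the choice of "compound" as the parent label are assumptions, since the design leaves "molecule or compound" open and the D3 result holds only if Điều 0-B accepts metadata-containment as sub-atomic.

```python
def assign_composition(units, nesting, parent_label="compound"):
    """Map each unit id to a composition label for a given nesting option.

    `units` maps unit id -> parent id (None for a root).
    `parent_label` is a placeholder: the design has not decided between
    "molecule" and "compound" for parent rows.
    """
    if nesting in ("D2", "D3"):
        # Flattened, or containment kept only in metadata: uniform atom.
        return {unit_id: "atom" for unit_id in units}
    if nesting != "D1":
        raise ValueError(f"unknown nesting option: {nesting!r}")

    # D1: any unit that appears as someone's parent contains other entities.
    parent_ids = {p for p in units.values() if p is not None}
    return {
        unit_id: (parent_label if unit_id in parent_ids else "atom")
        for unit_id in units
    }
```

Under D1 the result is mixed, which is why a single-composition species cannot cover all rows in that option.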
F. Species/mapping/QT-001 consequences per nesting option
| Nesting | Species approach | QT-001 backfill |
|---|---|---|
| D1 (preserve) | New species with composition=compound (for parent rows), or per-depth species. Discrimination by unit_kind or nesting depth. fn_birth_registry_auto can't discriminate, so the migration script assigns species explicitly. | All rows (pilot + migrated) in one pass. Parent rows get molecule or compound, leaf rows get atom. |
| D2 (flatten) | One species with composition=atom for all IU. Simpler. Can reuse an existing observed/atom species from evidence (Phase 4C: 5 PLAUSIBLE atom candidates). | All rows get same species/composition in one pass. |
| D3 (hybrid) | Same as D2 if metadata-containment = sub-atomic. | Same as D2. |
G. Migration mapping: TAC → IU
Actual column-level mapping depends on live introspection of both schemas. The prompt must discover actual columns on both sides and compute the mapping. Conceptual intent:
| TAC source | IU target | Notes |
|---|---|---|
| tac_logical_unit row | information_unit row | One-to-one per logical unit |
| tac_logical_unit.canonical_address | information_unit.canonical_address | Must be unique in IU; collision check required |
| tac_logical_unit parent hierarchy | information_unit.parent_or_container_ref (if D1) or NULL (if D2/D3) | Nesting decision drives this |
| tac_logical_unit.section_type | information_unit.section_type (if column exists) | Verify column existence |
| tac_unit_version row | unit_version row | One-to-one per version per logical unit |
| tac_unit_version content fields (title, body, description) | unit_version equivalent fields | Verify column mapping live |
| tac_unit_version.content_hash | unit_version provenance (NOT content_hash — IU recomputes its own hash) | Provenance contract from Phase 4B legal addendum §7 |
| tac_publication membership | IU publication membership (TBD — IU publication model not yet designed) | May need separate design pass |
H. Hash/provenance policy
From Phase 3 + Phase 4B legal alignment addendum:
- IU content_hash: Computed by fn_content_hash(body), which is sha256 of body only. IU birth gate expects this. Migration must let IU compute its own hash (don't copy TAC's hash into content_hash).
- TAC provenance: The original TAC content_hash (composite: title|body|description|content_profile) is preserved as a provenance reference in unit_version.content_profile.source_hashes.tac_v1. This is a JSONB field, document-only contract from Phase 4B §7. Migration may or may not populate this depending on whether the content_profile JSONB path exists.
- Phase 5 dry-run must verify: Does unit_version have a content_profile JSONB column? Can it hold source_hashes.tac_v1? If not, provenance is deferred to a separate DDL step.
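The policy above separates two things: the hash IU computes for itself, and the TAC hash carried along purely as provenance. The sketch below illustrates that separation. It is a stand-in, not the database function: the real fn_content_hash may normalize or encode the body differently, and both helper names are hypothetical.

```python
import hashlib


def iu_content_hash(body: str) -> str:
    """Stand-in for fn_content_hash(body): sha256 over the body only.

    Assumes UTF-8 encoding and no normalization; verify against the live
    function before relying on byte-for-byte equality.
    """
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def with_tac_provenance(content_profile, tac_content_hash):
    """Return a copy of content_profile with source_hashes.tac_v1 set.

    The TAC hash is stored as provenance only; it is never written to
    IU's own content_hash. Existing keys in the profile are preserved.
    """
    profile = dict(content_profile or {})
    source_hashes = dict(profile.get("source_hashes") or {})
    source_hashes["tac_v1"] = tac_content_hash
    profile["source_hashes"] = source_hashes
    return profile
```

Because the TAC hash is a composite over several fields while the IU hash covers the body alone, the two values are expected to differ for the same unit; a mismatch between them is not a verification failure.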
I. Parent/child mapping and render_order
If D1 (preserve nesting):
- TAC's parent column (detected live, probably parent_id) maps to IU's parent column (detected live, probably parent_or_container_ref)
- Must resolve INSERT ordering: parents before children, or use deferred FK constraints
- sort_order from TAC maps to IU ordering (verify column existence)
If D2/D3 (flatten or hybrid):
- parent_or_container_ref stays NULL
- sort_order preserved as metadata for render ordering within publication
- Publication membership determines document structure
J. Batch strategy and pilot-first plan
- Pilot first: Select ONE TAC publication (chosen by GPT/User from dry-run evidence, not hardcoded by Opus). Migrate its logical units + versions into IU. Verify render fidelity, gate compliance, species assignment.
- Batch migration: After pilot verified, migrate remaining publications in batches. Batch size from dry-run evidence (publication structure, typical unit count per publication).
- Transaction model: Each publication = one transaction. COMMIT per publication after verification. Rollback per publication on failure.
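The one-transaction-per-publication model can be sketched with an in-memory SQLite database standing in for the real target. Everything about the table here is a simplification for illustration: the column set, the publication_id column on information_unit (the IU publication model is not yet designed), and the direct INSERT in place of whichever write path the design selects.

```python
import sqlite3


def counts_match(conn, publication_id, expected):
    """Verification hook: migrated row count equals the expected count."""
    (actual,) = conn.execute(
        "SELECT COUNT(*) FROM information_unit WHERE publication_id = ?",
        (publication_id,),
    ).fetchone()
    return actual == expected


def migrate_publication(conn, publication_id, addresses, verify):
    """Insert one publication's units atomically. Return True if committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.executemany(
                "INSERT INTO information_unit (canonical_address, publication_id) "
                "VALUES (?, ?)",
                [(address, publication_id) for address in addresses],
            )
            if not verify(conn, publication_id, len(addresses)):
                raise RuntimeError("verification failed")
    except (sqlite3.Error, RuntimeError):
        return False
    return True


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE information_unit ("
    "id INTEGER PRIMARY KEY, "
    "canonical_address TEXT UNIQUE NOT NULL, "
    "publication_id INTEGER NOT NULL)"
)
first_ok = migrate_publication(conn, 1, ["law/1/a1", "law/1/a2"], counts_match)
# The second publication reuses an address, so the whole publication rolls back.
second_ok = migrate_publication(conn, 2, ["law/2/a1", "law/1/a1"], counts_match)
```

In the second call the first row of the batch is inserted before the collision is hit, yet it does not survive: the failure undoes the entire publication and leaves the first publication's rows intact.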
K. Verification gates
| Gate | What it checks | When |
|---|---|---|
| V1: Row accounting | TAC source count = IU migrated count (per publication) | Post-migration per batch |
| V2: No duplicate canonical_address | UNIQUE constraint on information_unit.canonical_address | Pre-migration collision check + DB constraint |
| V3: fn_iu_verify_invariants | 5 invariants pass for each migrated IU | Post-INSERT per row or per batch |
| V4: Content hash consistency | IU content_hash matches fn_content_hash(migrated_body) | Post-migration per version |
| V5: Birth registry completeness | Every migrated IU has a birth_registry row with non-NULL species/composition | Post-migration per batch |
| V6: Render fidelity | Migrated content renders identically to TAC source (0 drift) | Post-pilot manual/automated check |
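Gates V1 and the pre-migration half of V2 are plain set and count comparisons, so they can be checked before touching the database constraint. A minimal sketch follows; the function names and input shapes (per-publication count mappings, lists of addresses) are assumptions for illustration.

```python
from collections import Counter


def row_accounting_mismatches(source_counts, migrated_counts):
    """V1: return {publication: (tac_count, iu_count)} where the counts differ.

    Both arguments map a publication key to a row count. A publication
    missing from one side is treated as having zero rows there.
    """
    publications = set(source_counts) | set(migrated_counts)
    return {
        pub: (source_counts.get(pub, 0), migrated_counts.get(pub, 0))
        for pub in publications
        if source_counts.get(pub, 0) != migrated_counts.get(pub, 0)
    }


def address_collisions(incoming, existing):
    """V2 pre-check: addresses that would violate uniqueness in IU.

    Covers both duplicates inside the incoming batch and addresses that
    already exist in IU. Returns a sorted list for stable reporting.
    """
    counts = Counter(incoming)
    duplicated_in_batch = {addr for addr, n in counts.items() if n > 1}
    already_present = set(incoming) & set(existing)
    return sorted(duplicated_in_batch | already_present)
```

An empty result from both functions means the gate passes; the UNIQUE constraint remains the final authority for V2.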
L. Rollback/restore strategy
- Per-publication rollback: If a batch fails, delete only that publication's IU rows (exact keys from migration report). TAC source is untouched (read-only archive).
- No TAC deletion: TAC tables remain as-is after migration. They serve as the archive and provenance source.
- Exact-key rollback (per Phase 4 pattern): Migration script captures inserted IU keys via RETURNING. Rollback uses exact key list, not pattern matching.
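Exact-key rollback can be illustrated with SQLite as a stand-in target. This is a sketch under stated assumptions: the table is reduced to two columns, the helper names are hypothetical, and keys are captured through the cursor's lastrowid rather than the RETURNING clause the migration script would use.

```python
import sqlite3


def insert_capturing_keys(conn, addresses):
    """Insert rows and return the exact primary keys that were created."""
    keys = []
    for address in addresses:
        cursor = conn.execute(
            "INSERT INTO information_unit (canonical_address) VALUES (?)",
            (address,),
        )
        keys.append(cursor.lastrowid)
    conn.commit()
    return keys


def rollback_exact(conn, keys):
    """Delete only the rows whose keys were captured. Return rows deleted."""
    if not keys:
        return 0
    placeholders = ",".join("?" for _ in keys)
    cursor = conn.execute(
        f"DELETE FROM information_unit WHERE id IN ({placeholders})",
        list(keys),
    )
    conn.commit()
    return cursor.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE information_unit ("
    "id INTEGER PRIMARY KEY, canonical_address TEXT UNIQUE NOT NULL)"
)
# A pre-existing pilot row that a rollback must never touch.
conn.execute("INSERT INTO information_unit (canonical_address) VALUES ('pilot/1')")
conn.commit()

migrated_keys = insert_capturing_keys(conn, ["pub/1", "pub/2"])
deleted = rollback_exact(conn, migrated_keys)
```

Deleting by captured key rather than by an address pattern means a pre-existing row with a similar address cannot be removed by mistake.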
M. Post-implementation design requirement
Per operating note: after any migration execution is accepted, a post-implementation design must be produced documenting: what was migrated, how, species/composition assigned, row counts, and how to reverse. Reference: ...operating-notes/design-after-repair-implementation-rule-2026-05-11.md.
N. Recommendation and next executable pack boundaries
Phase 5 is too large for one executable pack. Proposed sub-phases:
| Sub-phase | Content | Depends on |
|---|---|---|
| 5A | Nesting decision (GPT/User, informed by dry-run evidence) | Dry-run results |
| 5B | Species/composition resolution (follows from 5A nesting decision) + species_collection_map seed + QT-001 backfill for existing + new IU rows | 5A decision |
| 5C | Pilot migration (one publication) | 5B species resolved |
| 5D | Full batch migration | 5C pilot verified |
| 5E | Post-implementation design | 5D complete |
The dry-run prompt (below) gathers evidence for 5A + early 5B/5C inputs.
Phase 5 Design | Nesting-first | No hardcode | No migration | 2026-05-11