P3D Pack 1 Phase 5 — TAC→IU Migration Design

Date: 2026-05-11
Author: Opus 4.7
Status: DESIGN (no migration, no seed, no backfill)
Prerequisites: Phase 4 vocab PASS, Phase 4B-4D species/composition deferred to THIS phase
Decision: Option 2 (defer species/mapping/backfill to Phase 5)

A. Purpose and non-goals

Purpose: Design the TAC→IU migration so that TAC logical units become native IU law_units with correct species, composition, birth records, and provenance — in one coherent pass.

Non-goals (explicitly out of scope):

DDL changes to any table
Function/trigger patches (fn_birth_registry_auto, fn_iu_create, etc.)
Directus/Nuxt/Qdrant changes
Dropping/replacing TAC tables (TAC stays as read-only archive)
New DOT tool development
Species creation outside evidence-supported needs

B. Completed prerequisites (Phase 1–4D)

Phase	What was achieved	Reference
Phase 1	Inventory/design complete	(internal)
Phase 2	IU schema extended (new columns on information_unit + unit_version)	(DDL reports)
Phase 3	Hash/planner/birth investigation: TAC vs IU hash recipes documented, planner behavior mapped, birth gates cataloged	`...reports/p3d-pack1-phase3-...report.md`
Phase 4 vocab	14 vocab keys committed to dot_config; planner resolves law_unit=plan_ok	`...reports/p3d-pack1-phase4-vocab-...report.md`
Phase 4B	Species/composition/registry discovery: 40 species, 153 mappings, IU unmapped, fn_birth_registry_auto documented, discriminator dormant	`...reports/p3d-pack1-phase4b-...report.md`
Phase 4C	Dry-run: 8 PLAUSIBLE candidates, deterministic labels, no semantic fit confirmed	`...reports/p3d-pack1-phase4c-...report.md`
Phase 4D	Decision memo: defer species to Phase 5; dependency chain = nesting → composition → species	`...design/p3d-pack1-phase4d-...memo.md`

C. Live TAC→IU source/target model summary

Source and target tables are scope-declared in the Phase 5 prompt (§0 SCOPE BLOCK). This section describes the conceptual mapping only — actual column names, counts, and structures come from live introspection.

Source (TAC family): TAC logical units contain the text content. TAC unit versions hold versioned content snapshots. TAC publications group logical units into published documents.

Target (IU family): IU information_unit is the master entity. IU unit_version holds versioned content. IU has fn_iu_create as the canonical writer, with birth gates (layer1/layer2) and invariants.

Key difference: TAC and IU have different canonical writers. Migration cannot simply INSERT into the IU tables: it must either call fn_iu_create (which enforces birth gates) or INSERT directly while satisfying every gate explicitly (risky). The design must choose one of these two paths.

D. Nesting strategy options

This is the FIRST decision. Everything downstream depends on it.

D1: Preserve TAC parent→child

What: Populate information_unit.parent_or_container_ref with the parent IU's id/ref during migration. IU rows that have children CONTAIN other entities.

Composition consequence (Điều 0-B): Parent IU rows are molecule or compound (they contain other IU rows). Leaf IU rows may be atom.

Species consequence: A single-composition species cannot cover all IU rows (parent=compound, leaf=atom). One of the following is needed: per-unit_kind species, per-depth species, or a species with "mixed" composition (which Điều 0-B doesn't define).

Migration complexity: Medium — must resolve parent→child ordering within fn_iu_create or INSERT sequence. Deferred FK constraints may help (Phase 3 found fk_initially_deferred=true).

D2: Flatten

What: All IU rows are independent siblings. parent_or_container_ref stays NULL. TAC's nesting is not transferred — sort_order and publication membership serve as the structural reference.

Composition consequence: All IU rows are atom (no entity contains another entity). Simple.

Species consequence: One species with composition=atom works for all IU rows.

Migration complexity: Low — no parent ordering needed. But structural fidelity to TAC is lost.

D3: Hybrid

What: Preserve nesting in metadata (e.g., a JSONB field, or publication_member ordering) but NOT in parent_or_container_ref. Or: use universal_edges with edge_type CONTAINS instead of FK.

Composition consequence: Debatable. If containment lives only in metadata, the Điều 0-B test ("what does the entity contain inside?") finds only sub-atomic metadata, so every row is atom. If containment is expressed via edges, Điều 0-B is ambiguous.

Migration complexity: Medium — must decide which metadata carries the nesting information.

Assessment framework for GPT/User

The nesting decision should be made by answering:

Render fidelity: Can the Nuxt Laws Page render the correct document structure (headings, sub-sections, nested lists) from flat IU rows + sort_order? Or does it need parent_or_container_ref for tree navigation?
Query patterns: Will consumers query "all children of this section"? Or only "all units in this publication sorted by render_order"?
Future editing: When a user edits a law, do they edit individual IU rows or entire publication trees? Parent→child constrains editing (can't move a child without updating parent).

The Phase 5 dry-run prompt should gather evidence to inform these answers (publication structure, render_order patterns, existing IU consumer analysis).

E. Composition consequences per nesting option

Nesting	Parent IU composition	Leaf IU composition	Single species possible?
D1 (preserve)	molecule or compound	atom	No — mixed composition
D2 (flatten)	n/a (no parents)	atom (all)	Yes — uniform atom
D3 (hybrid/metadata)	atom (containment is sub-atomic metadata)	atom	Yes — uniform atom (if Điều 0-B accepts metadata-containment as sub-atomic)
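The table above can be restated as a small classification rule. The following is a minimal sketch, not the migration's actual logic: the function names and the `parent_of` mapping are hypothetical, the label `molecule_or_compound` only reflects that this design leaves the choice between the two open, and the D3 branch assumes Điều 0-B accepts metadata containment as sub-atomic.

```python
def compositions(parent_of, nesting):
    """Assign a composition label to every unit under a nesting option.

    parent_of: hypothetical mapping of unit id -> TAC parent id (None for roots).
    nesting:   "D1" (preserve), "D2" (flatten) or "D3" (hybrid/metadata).
    """
    if nesting not in ("D1", "D2", "D3"):
        raise ValueError(f"unknown nesting option: {nesting!r}")

    if nesting in ("D2", "D3"):
        # D2: no entity contains another. D3: assumes containment kept in
        # metadata counts as sub-atomic (an open question in this design).
        return {unit: "atom" for unit in parent_of}

    # D1: any unit that appears as someone's parent contains other IU rows.
    units_with_children = {p for p in parent_of.values() if p is not None}
    return {
        unit: "molecule_or_compound" if unit in units_with_children else "atom"
        for unit in parent_of
    }


def single_species_possible(labels):
    """A single species works only if every unit has the same composition."""
    return len(set(labels.values())) <= 1
```

For a source with at least one parent and one leaf, `single_species_possible` is false under D1 and true under D2 and D3, which matches the last column of the table.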

F. Species/mapping/QT-001 consequences per nesting option

Nesting	Species approach	QT-001 backfill
D1 (preserve)	New species with composition=compound (for parent rows), or per-depth species. Discrimination by unit_kind or nesting depth. fn_birth_registry_auto can't discriminate, so the migration script assigns species explicitly.	All rows (pilot + migrated) in one pass. Parent rows get compound, leaf rows get atom.
D2 (flatten)	One species with composition=atom for all IU. Simpler. Can reuse an existing observed/atom species from evidence (Phase 4C: 5 PLAUSIBLE atom candidates).	All rows get same species/composition in one pass.
D3 (hybrid)	Same as D2 if metadata-containment = sub-atomic.	Same as D2.

G. Migration mapping: TAC → IU

Actual column-level mapping depends on live introspection of both schemas. The prompt must discover actual columns on both sides and compute the mapping. Conceptual intent:

TAC source	IU target	Notes
tac_logical_unit row	information_unit row	One-to-one per logical unit
tac_logical_unit.canonical_address	information_unit.canonical_address	Must be unique in IU; collision check required
tac_logical_unit parent hierarchy	information_unit.parent_or_container_ref (if D1) or NULL (if D2/D3)	Nesting decision drives this
tac_logical_unit.section_type	information_unit.section_type (if column exists)	Verify column existence
tac_unit_version row	unit_version row	One-to-one per version per logical unit
tac_unit_version content fields (title, body, description)	unit_version equivalent fields	Verify column mapping live
tac_unit_version.content_hash	unit_version provenance (NOT content_hash — IU recomputes its own hash)	Provenance contract from Phase 4B legal addendum §7
tac_publication membership	IU publication membership (TBD — IU publication model not yet designed)	May need separate design pass
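The canonical_address row calls for a collision check before any insert. A minimal sketch of that check follows, assuming both address lists have already been read from the live schemas; the function name and the two-part result are illustrative, not an existing tool.

```python
from collections import Counter


def address_collisions(tac_addresses, existing_iu_addresses):
    """Find canonical_address values that would violate uniqueness in IU.

    Returns two sorted lists:
      - addresses repeated within the TAC source itself
      - addresses already present on the IU side
    """
    counts = Counter(tac_addresses)
    duplicates_in_source = sorted(a for a, n in counts.items() if n > 1)
    clashes_with_target = sorted(set(tac_addresses) & set(existing_iu_addresses))
    return duplicates_in_source, clashes_with_target
```

Both lists being empty is the precondition for proceeding; the database UNIQUE constraint remains the final guard.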

H. Hash/provenance policy

From Phase 3 + Phase 4B legal alignment addendum:

IU content_hash: Computed by fn_content_hash(body) — sha256 of body only. IU birth gate expects this. Migration must let IU compute its own hash (don't copy TAC's hash into content_hash).
TAC provenance: The original TAC content_hash (composite: title|body|description|content_profile) is preserved as a provenance reference at unit_version.content_profile.source_hashes.tac_v1. This JSONB path is a document-only contract from Phase 4B §7, so the migration populates it only if the path exists on the live schema.
Phase 5 dry-run must verify: Does unit_version have a content_profile JSONB column? Can it hold source_hashes.tac_v1? If not, provenance is deferred to a separate DDL step.
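The split between the IU hash and the TAC provenance hash can be illustrated as follows. This is a sketch under stated assumptions: `iu_body_hash` is a Python stand-in for fn_content_hash and assumes UTF-8 encoding with hex output, which must be confirmed against the live function; `with_tac_provenance` is a hypothetical helper. The TAC composite hash is copied as-is rather than recomputed, because its exact serialization is not specified here.

```python
import hashlib


def iu_body_hash(body: str) -> str:
    """Stand-in for fn_content_hash(body): sha256 over the body only.

    Assumption: UTF-8 bytes in, lowercase hex out.
    """
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def with_tac_provenance(content_profile, tac_content_hash):
    """Return a copy of content_profile carrying source_hashes.tac_v1.

    The input dict is not mutated; existing source_hashes entries are kept.
    """
    profile = dict(content_profile or {})
    source_hashes = dict(profile.get("source_hashes", {}))
    source_hashes["tac_v1"] = tac_content_hash
    profile["source_hashes"] = source_hashes
    return profile
```

The key point is that the TAC hash never lands in content_hash: it travels only inside the provenance structure, while content_hash is derived from the migrated body.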

I. Parent/child mapping and render_order

If D1 (preserve nesting):

TAC's parent column (detected live, probably parent_id) maps to IU's parent column (detected live, probably parent_or_container_ref)
Must resolve INSERT ordering: parents before children, or use deferred FK constraints
sort_order from TAC maps to IU ordering (verify column existence)
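If deferred FK constraints are not used, the parents-before-children requirement amounts to a topological ordering of the TAC tree. A minimal sketch, assuming the parent links have been loaded into a mapping of unit id to parent id (the mapping shape and function name are hypothetical):

```python
from collections import defaultdict, deque


def parents_first(parent_of):
    """Order unit ids so that every parent precedes its children.

    parent_of: mapping of unit id -> parent id (None for roots).
    Raises ValueError on a missing parent or a cycle.
    """
    children = defaultdict(list)
    roots = []
    for unit_id, parent_id in parent_of.items():
        if parent_id is None:
            roots.append(unit_id)
        elif parent_id not in parent_of:
            raise ValueError(
                f"unit {unit_id!r} references missing parent {parent_id!r}"
            )
        else:
            children[parent_id].append(unit_id)

    ordered = []
    queue = deque(sorted(roots))
    while queue:  # breadth-first walk from the roots
        current = queue.popleft()
        ordered.append(current)
        queue.extend(sorted(children[current]))

    # Units on a cycle are never reachable from a root.
    if len(ordered) != len(parent_of):
        raise ValueError("cycle detected in parent references")
    return ordered
```

Siblings are sorted by id here only to make the output deterministic; a real run would more likely order siblings by sort_order.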

If D2/D3 (flatten or hybrid):

parent_or_container_ref stays NULL
sort_order preserved as metadata for render ordering within publication
Publication membership determines document structure

J. Batch strategy and pilot-first plan

Pilot first: Select ONE TAC publication (chosen by GPT/User from dry-run evidence, not hardcoded by Opus). Migrate its logical units + versions into IU. Verify render fidelity, gate compliance, species assignment.
Batch migration: After pilot verified, migrate remaining publications in batches. Batch size from dry-run evidence (publication structure, typical unit count per publication).
Transaction model: Each publication = one transaction. COMMIT per publication after verification. Rollback per publication on failure.
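The one-transaction-per-publication model can be sketched with an in-memory SQLite database standing in for the real target. Everything schema-related here is an assumption for illustration: the stand-in table has only three columns, inserts go through plain INSERT rather than fn_iu_create, and the in-transaction check is a simplified row count.

```python
import sqlite3


def make_stand_in_db():
    """In-memory stand-in; the real IU schema must be discovered live."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE information_unit ("
        " id INTEGER PRIMARY KEY,"
        " canonical_address TEXT UNIQUE NOT NULL,"
        " publication_id TEXT NOT NULL)"
    )
    return conn


def migrate_publication(conn, publication_id, addresses):
    """Insert one publication's units atomically.

    Returns the inserted ids, or [] if the transaction was rolled back.
    """
    inserted_ids = []
    try:
        with conn:  # commits on success, rolls back on any exception
            for address in addresses:
                cur = conn.execute(
                    "INSERT INTO information_unit"
                    " (canonical_address, publication_id) VALUES (?, ?)",
                    (address, publication_id),
                )
                inserted_ids.append(cur.lastrowid)
            # Simplified verification before COMMIT: row accounting.
            (count,) = conn.execute(
                "SELECT COUNT(*) FROM information_unit WHERE publication_id = ?",
                (publication_id,),
            ).fetchone()
            if count != len(addresses):
                raise RuntimeError("row accounting mismatch")
    except (sqlite3.Error, RuntimeError):
        return []
    return inserted_ids
```

A failure in one publication (for example a canonical_address collision) leaves earlier, already committed publications untouched, which is the property the batch plan relies on.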

K. Verification gates

Gate	What it checks	When
V1: Row accounting	TAC source count = IU migrated count (per publication)	Post-migration per batch
V2: No duplicate canonical_address	UNIQUE constraint on information_unit.canonical_address	Pre-migration collision check + DB constraint
V3: fn_iu_verify_invariants	5 invariants pass for each migrated IU	Post-INSERT per row or per batch
V4: Content hash consistency	IU content_hash matches fn_content_hash(migrated_body)	Post-migration per version
V5: Birth registry completeness	Every migrated IU has a birth_registry row with non-NULL species/composition	Post-migration per batch
V6: Render fidelity	Migrated content renders identically to TAC source (0 drift)	Post-pilot manual/automated check
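Gate V1 is a per-publication comparison of two count maps. A minimal sketch, assuming the counts have already been queried from each side and keyed by a publication identifier (the function name and key type are illustrative):

```python
def row_accounting(tac_counts, iu_counts):
    """Compare per-publication row counts between TAC and IU.

    Both arguments map a publication key (string) to a row count.
    Returns {publication: (tac_count, iu_count)} for every mismatch;
    an empty dict means V1 passes.
    """
    mismatches = {}
    for publication in sorted(set(tac_counts) | set(iu_counts)):
        source = tac_counts.get(publication, 0)
        target = iu_counts.get(publication, 0)
        if source != target:
            mismatches[publication] = (source, target)
    return mismatches
```

A publication present on only one side is reported with a count of 0 on the other, so missing and surplus rows surface in the same report.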

L. Rollback/restore strategy

Per-publication rollback: If a batch fails, delete only that publication's IU rows (exact keys from migration report). TAC source is untouched (read-only archive).
No TAC deletion: TAC tables remain as-is after migration. They serve as the archive and provenance source.
Exact-key rollback (per Phase 4 pattern): Migration script captures inserted IU keys via RETURNING. Rollback uses exact key list, not pattern matching.
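Exact-key rollback reduces to a parameterized DELETE over the captured key list. The sketch below uses SQLite as a stand-in and assumes an integer `id` key column on information_unit; on the real database the keys would come from the migration report (captured via RETURNING), and the actual key column must be confirmed live.

```python
import sqlite3


def delete_by_exact_keys(conn, inserted_ids):
    """Delete only the rows whose ids were captured at insert time.

    Returns the number of rows deleted. No pattern matching is involved:
    an empty key list deletes nothing.
    """
    if not inserted_ids:
        return 0
    placeholders = ", ".join("?" for _ in inserted_ids)
    with conn:  # one transaction for the whole rollback
        cur = conn.execute(
            f"DELETE FROM information_unit WHERE id IN ({placeholders})",
            list(inserted_ids),
        )
    return cur.rowcount
```

Comparing the returned count with the length of the key list gives a cheap confirmation that the rollback removed exactly what the migration inserted.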

M. Post-implementation design requirement

Per operating note: after any migration execution is accepted, a post-implementation design must be produced documenting: what was migrated, how, species/composition assigned, row counts, and how to reverse. Reference: ...operating-notes/design-after-repair-implementation-rule-2026-05-11.md.

N. Recommendation and next executable pack boundaries

Phase 5 is too large for one executable pack. Proposed sub-phases:

Sub-phase	Content	Depends on
5A	Nesting decision (GPT/User, informed by dry-run evidence)	Dry-run results
5B	Species/composition resolution (follows from 5A nesting decision) + species_collection_map seed + QT-001 backfill for existing + new IU rows	5A decision
5C	Pilot migration (one publication)	5B species resolved
5D	Full batch migration	5C pilot verified
5E	Post-implementation design	5D complete

The dry-run prompt (below) gathers evidence for 5A + early 5B/5C inputs.

Phase 5 Design | Nesting-first | No hardcode | No migration | 2026-05-11