KB-EA35

P3D Pack 1 Phase 5 — TAC→IU Migration Design

Revision 1
Tags: p3d, pack1, phase5, migration-design, nesting-first, tac-to-iu

Date: 2026-05-11
Author: Opus 4.7
Status: DESIGN (no migration, no seed, no backfill)
Prerequisites: Phase 4 vocab PASS, Phase 4B-4D species/composition deferred to THIS phase
Decision: Option 2 (defer species/mapping/backfill to Phase 5)


A. Purpose and non-goals

Purpose: Design the TAC→IU migration so that TAC logical units become native IU law_units with correct species, composition, birth records, and provenance — in one coherent pass.

Non-goals (explicitly out of scope):

  • DDL changes to any table
  • Function/trigger patches (fn_birth_registry_auto, fn_iu_create, etc.)
  • Directus/Nuxt/Qdrant changes
  • Dropping/replacing TAC tables (TAC stays as read-only archive)
  • New DOT tool development
  • Species creation outside evidence-supported needs

B. Completed prerequisites (Phase 1–4D)

Phase | What was achieved | Reference
Phase 1 | Inventory/design complete | (internal)
Phase 2 | IU schema extended (new columns on information_unit + unit_version) | (DDL reports)
Phase 3 | Hash/planner/birth investigation: TAC vs IU hash recipes documented, planner behavior mapped, birth gates cataloged | ...reports/p3d-pack1-phase3-...report.md
Phase 4 vocab | 14 vocab keys committed to dot_config; planner resolves law_unit=plan_ok | ...reports/p3d-pack1-phase4-vocab-...report.md
Phase 4B | Species/composition/registry discovery: 40 species, 153 mappings, IU unmapped, fn_birth_registry_auto documented, discriminator dormant | ...reports/p3d-pack1-phase4b-...report.md
Phase 4C | Dry-run: 8 PLAUSIBLE candidates, deterministic labels, no semantic fit confirmed | ...reports/p3d-pack1-phase4c-...report.md
Phase 4D | Decision memo: defer species to Phase 5; dependency chain = nesting → composition → species | ...design/p3d-pack1-phase4d-...memo.md

C. Live TAC→IU source/target model summary

Source and target tables are scope-declared in the Phase 5 prompt (§0 SCOPE BLOCK). This section describes the conceptual mapping only — actual column names, counts, and structures come from live introspection.

Source (TAC family): TAC logical units contain the text content. TAC unit versions hold versioned content snapshots. TAC publications group logical units into published documents.

Target (IU family): IU information_unit is the master entity. IU unit_version holds versioned content. IU has fn_iu_create as the canonical writer, with birth gates (layer1/layer2) and invariants.

Key difference: TAC and IU have different canonical writers, so migration cannot simply INSERT rows into the IU tables. It must either call fn_iu_create (which enforces the birth gates) or INSERT directly while satisfying every gate explicitly (risky). The design must choose one of these two paths.


D. Nesting strategy options

This is the FIRST decision. Everything downstream depends on it.

D1: Preserve TAC parent→child

What: Populate information_unit.parent_or_container_ref with the parent IU's id/ref during migration. IU rows that have children CONTAIN other entities.

Composition consequence (Điều 0-B): Parent IU rows are molecule or compound (they contain other IU rows). Leaf IU rows may be atom.

Species consequence: A single-composition species cannot cover all IU rows (parent=compound, leaf=atom). The options are per-unit_kind species, per-depth species, or a species with "mixed" composition (which Điều 0-B does not define).

Migration complexity: Medium — must resolve parent→child ordering within fn_iu_create or INSERT sequence. Deferred FK constraints may help (Phase 3 found fk_initially_deferred=true).
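If deferred FK constraints turn out not to be usable, the parent-before-child ordering can be computed before any INSERT. The following is a minimal sketch, assuming the hierarchy has been read into a plain mapping of unit id to parent id; the function name, the mapping shape, and the sample ids are hypothetical and not taken from the TAC schema.

```python
from collections import defaultdict, deque

def parents_first(units):
    """Return unit ids ordered so that every parent precedes its children.

    `units` maps unit id -> parent id (None for a root). This shape is an
    assumption for illustration; real column names come from introspection.
    Raises ValueError on a dangling parent reference or a cycle.
    """
    children = defaultdict(list)
    roots = []
    for unit_id, parent_id in units.items():
        if parent_id is None:
            roots.append(unit_id)
        elif parent_id not in units:
            raise ValueError(f"unit {unit_id!r} references unknown parent {parent_id!r}")
        else:
            children[parent_id].append(unit_id)

    # Breadth-first walk from the roots: a unit is emitted only after its parent.
    ordered = []
    queue = deque(sorted(roots))
    while queue:
        current = queue.popleft()
        ordered.append(current)
        queue.extend(sorted(children[current]))

    # Units caught in a cycle are never reached from a root.
    if len(ordered) != len(units):
        raise ValueError("cycle detected in parent hierarchy")
    return ordered

example = {"art-1": None, "art-1.a": "art-1", "art-1.a.i": "art-1.a", "art-2": None}
print(parents_first(example))
```

The walk is breadth-first, so all roots are emitted before any child. Any order that keeps parents ahead of children would serve the INSERT sequence equally well.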

D2: Flatten

What: All IU rows are independent siblings. parent_or_container_ref stays NULL. TAC's nesting is not transferred — sort_order and publication membership serve as the structural reference.

Composition consequence: All IU rows are atom (no entity contains another entity). Simple.

Species consequence: One species with composition=atom works for all IU rows.

Migration complexity: Low — no parent ordering needed. But structural fidelity to TAC is lost.

D3: Hybrid

What: Preserve nesting in metadata (e.g., a JSONB field, or publication_member ordering) but NOT in parent_or_container_ref. Or: use universal_edges with edge_type CONTAINS instead of FK.

Composition consequence: Debatable. If containment lives only in metadata, the Điều 0-B test ("what does the entity contain inside?") yields sub-atomic metadata, so every row is an atom. If containment is expressed via edges, Điều 0-B is ambiguous.

Migration complexity: Medium — must decide which metadata carries the nesting information.

Assessment framework for GPT/User

The nesting decision should be made by answering:

  1. Render fidelity: Can the Nuxt Laws Page render the correct document structure (headings, sub-sections, nested lists) from flat IU rows + sort_order? Or does it need parent_or_container_ref for tree navigation?
  2. Query patterns: Will consumers query "all children of this section"? Or only "all units in this publication sorted by render_order"?
  3. Future editing: When a user edits a law, do they edit individual IU rows or entire publication trees? Parent→child constrains editing (can't move a child without updating parent).

The Phase 5 dry-run prompt should gather evidence to inform these answers: publication structure, render_order patterns, and an analysis of existing IU consumers.


E. Composition consequences per nesting option

Nesting | Parent IU composition | Leaf IU composition | Single species possible?
D1 (preserve) | molecule or compound | atom | No (mixed composition)
D2 (flatten) | n/a (no parents) | atom (all) | Yes (uniform atom)
D3 (hybrid/metadata) | atom (containment is sub-atomic metadata) | atom | Yes (uniform atom, if Điều 0-B accepts metadata-containment as sub-atomic)
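The table can be restated as a small decision function, which may help when checking dry-run data against each option. This is only a sketch: the function names are hypothetical, "compound" stands in for the unresolved "molecule or compound" choice, and the D3 branch assumes Điều 0-B accepts metadata-containment as sub-atomic.

```python
def composition_for(nesting_option, has_children):
    """Composition of one IU row under a nesting option (per the table above)."""
    if nesting_option == "D1":
        # Assumption: "compound" is a placeholder for "molecule or compound".
        return "compound" if has_children else "atom"
    if nesting_option in ("D2", "D3"):
        # D2 has no parents; D3 assumes metadata-containment counts as sub-atomic.
        return "atom"
    raise ValueError(f"unknown nesting option: {nesting_option!r}")

def single_species_possible(nesting_option, units_have_children):
    """True when every row ends up with the same composition."""
    compositions = {composition_for(nesting_option, flag) for flag in units_have_children}
    return len(compositions) <= 1

# One parent and two leaves, as they would appear in the TAC hierarchy.
flags = [True, False, False]
for option in ("D1", "D2", "D3"):
    print(option, single_species_possible(option, flags))
```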

F. Species/mapping/QT-001 consequences per nesting option

Nesting | Species approach | QT-001 backfill
D1 (preserve) | New species with composition=compound (for parent rows), or per-depth species. Discrimination by unit_kind or nesting depth. fn_birth_registry_auto can't discriminate → migration script assigns explicitly. | All rows (pilot + migrated) in one pass. Parent rows get compound, leaf rows get atom (or vice versa).
D2 (flatten) | One species with composition=atom for all IU. Simpler. Can reuse an existing observed/atom species from evidence (Phase 4C: 5 PLAUSIBLE atom candidates). | All rows get same species/composition in one pass.
D3 (hybrid) | Same as D2 if metadata-containment = sub-atomic. | Same as D2.

G. Migration mapping: TAC → IU

Actual column-level mapping depends on live introspection of both schemas. The prompt must discover actual columns on both sides and compute the mapping. Conceptual intent:

TAC source | IU target | Notes
tac_logical_unit row | information_unit row | One-to-one per logical unit
tac_logical_unit.canonical_address | information_unit.canonical_address | Must be unique in IU; collision check required
tac_logical_unit parent hierarchy | information_unit.parent_or_container_ref (if D1) or NULL (if D2/D3) | Nesting decision drives this
tac_logical_unit.section_type | information_unit.section_type (if column exists) | Verify column existence
tac_unit_version row | unit_version row | One-to-one per version per logical unit
tac_unit_version content fields (title, body, description) | unit_version equivalent fields | Verify column mapping live
tac_unit_version.content_hash | unit_version provenance (NOT content_hash; IU recomputes its own hash) | Provenance contract from Phase 4B legal addendum §7
tac_publication membership | IU publication membership (TBD; IU publication model not yet designed) | May need separate design pass

H. Hash/provenance policy

From Phase 3 + Phase 4B legal alignment addendum:

  • IU content_hash: Computed by fn_content_hash(body) — sha256 of body only. IU birth gate expects this. Migration must let IU compute its own hash (don't copy TAC's hash into content_hash).
  • TAC provenance: The original TAC content_hash (composite: title|body|description|content_profile) is preserved as a provenance reference at unit_version.content_profile.source_hashes.tac_v1. This JSONB path is a document-only contract from Phase 4B §7. Migration populates it only if the content_profile JSONB path exists.
  • Phase 5 dry-run must verify: Does unit_version have a content_profile JSONB column? Can it hold source_hashes.tac_v1? If not, provenance is deferred to a separate DDL step.
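The hash policy above can be sketched outside the database to make the separation concrete: the IU hash is recomputed from the body alone, while the TAC hash is carried over untouched as provenance. The helper names are hypothetical, and the UTF-8 encoding and hex digest are assumptions; the real fn_content_hash may normalize or encode the body differently, which the dry-run should confirm.

```python
import hashlib
import json

def iu_content_hash(body: str) -> str:
    """sha256 of the body only. Encoding (UTF-8) and hex output are assumptions."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def with_tac_provenance(tac_content_hash: str, existing_profile=None) -> dict:
    """Return a content_profile dict carrying the TAC hash at source_hashes.tac_v1.

    The TAC hash is copied as-is, never recomputed, and never written to
    the IU content_hash field. Other keys in the profile are preserved.
    """
    profile = dict(existing_profile or {})
    source_hashes = dict(profile.get("source_hashes", {}))
    source_hashes["tac_v1"] = tac_content_hash
    profile["source_hashes"] = source_hashes
    return profile

body = "Example article body"
tac_hash = "tac-composite-hash-placeholder"  # stand-in value, not a real TAC hash
print(iu_content_hash(body))
print(json.dumps(with_tac_provenance(tac_hash, {"lang": "vi"}), sort_keys=True))
```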

I. Parent/child mapping and render_order

If D1 (preserve nesting):

  • TAC's parent column (detected live, probably parent_id) maps to IU's parent column (detected live, probably parent_or_container_ref)
  • Must resolve INSERT ordering: parents before children, or use deferred FK constraints
  • sort_order from TAC maps to IU ordering (verify column existence)

If D2/D3 (flatten or hybrid):

  • parent_or_container_ref stays NULL
  • sort_order preserved as metadata for render ordering within publication
  • Publication membership determines document structure

J. Batch strategy and pilot-first plan

  1. Pilot first: Select ONE TAC publication (chosen by GPT/User from dry-run evidence, not hardcoded by Opus). Migrate its logical units + versions into IU. Verify render fidelity, gate compliance, species assignment.
  2. Batch migration: After pilot verified, migrate remaining publications in batches. Batch size from dry-run evidence (publication structure, typical unit count per publication).
  3. Transaction model: Each publication = one transaction. COMMIT per publication after verification. Rollback per publication on failure.
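The one-transaction-per-publication model in step 3 can be illustrated with an in-memory SQLite database standing in for the real target. This is a sketch under stated assumptions: the three-column table, the publication_id column, and all function names are hypothetical, and the real migration would go through whichever writer path the design selects.

```python
import sqlite3

def open_demo_db():
    # isolation_level=None gives autocommit mode, so BEGIN/COMMIT/ROLLBACK are explicit.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE information_unit ("
        " canonical_address TEXT PRIMARY KEY,"
        " publication_id TEXT NOT NULL,"
        " body TEXT NOT NULL)"
    )
    return conn

def row_count_matches(conn, publication_id, expected):
    """Verification step: inserted row count must equal the source count."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM information_unit WHERE publication_id = ?",
        (publication_id,),
    ).fetchone()
    return count == expected

def migrate_publication(conn, publication_id, units, verify=row_count_matches):
    """Insert one publication in one transaction; commit only after verification."""
    conn.execute("BEGIN")
    try:
        for address, body in units:
            conn.execute(
                "INSERT INTO information_unit VALUES (?, ?, ?)",
                (address, publication_id, body),
            )
        if not verify(conn, publication_id, len(units)):
            raise RuntimeError("post-insert verification failed")
    except (sqlite3.Error, RuntimeError):
        conn.execute("ROLLBACK")  # nothing from this publication survives
        return False
    conn.execute("COMMIT")
    return True

conn = open_demo_db()
print(migrate_publication(conn, "pub-1", [("law/1", "A"), ("law/2", "B")]))
# The second publication collides on canonical_address and is rolled back whole.
print(migrate_publication(conn, "pub-2", [("law/3", "C"), ("law/1", "dup")]))
print(conn.execute("SELECT COUNT(*) FROM information_unit").fetchone()[0])
```

A failed publication leaves earlier, already-committed publications untouched, which is the property the pilot-first plan relies on.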

K. Verification gates

Gate | What it checks | When
V1: Row accounting | TAC source count = IU migrated count (per publication) | Post-migration per batch
V2: No duplicate canonical_address | UNIQUE constraint on information_unit.canonical_address | Pre-migration collision check + DB constraint
V3: fn_iu_verify_invariants | 5 invariants pass for each migrated IU | Post-INSERT per row or per batch
V4: Content hash consistency | IU content_hash matches fn_content_hash(migrated_body) | Post-migration per version
V5: Birth registry completeness | Every migrated IU has a birth_registry row with non-NULL species/composition | Post-migration per batch
V6: Render fidelity | Migrated content renders identically to TAC source (0 drift) | Post-pilot manual/automated check
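Gates V1 and V2 are simple enough to sketch as pure functions over data already pulled from both sides. The function names and input shapes below are hypothetical; in practice the counts and addresses would come from queries against the live tables.

```python
from collections import Counter

def gate_v1_row_accounting(tac_counts, iu_counts):
    """Return {publication: (tac_count, iu_count)} for every mismatch.

    Both arguments map publication id -> row count. An empty result means
    the gate passes.
    """
    publications = set(tac_counts) | set(iu_counts)
    return {
        pub: (tac_counts.get(pub, 0), iu_counts.get(pub, 0))
        for pub in sorted(publications)
        if tac_counts.get(pub, 0) != iu_counts.get(pub, 0)
    }

def gate_v2_address_collisions(incoming, existing):
    """Return addresses duplicated within the batch or already present in IU."""
    counts = Counter(incoming)
    duplicated_in_batch = {addr for addr, n in counts.items() if n > 1}
    clashes_with_existing = set(incoming) & set(existing)
    return sorted(duplicated_in_batch | clashes_with_existing)

print(gate_v1_row_accounting({"pub-1": 3, "pub-2": 5}, {"pub-1": 3, "pub-2": 4}))
print(gate_v2_address_collisions(["law/1", "law/2", "law/2"], ["law/1", "law/9"]))
```

Running V2 before any INSERT surfaces collisions early; the UNIQUE constraint remains the final safeguard.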

L. Rollback/restore strategy

  • Per-publication rollback: If a batch fails, delete only that publication's IU rows (exact keys from migration report). TAC source is untouched (read-only archive).
  • No TAC deletion: TAC tables remain as-is after migration. They serve as the archive and provenance source.
  • Exact-key rollback (per Phase 4 pattern): Migration script captures inserted IU keys via RETURNING. Rollback uses exact key list, not pattern matching.
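The exact-key pattern can be sketched as follows, again with in-memory SQLite as a stand-in. In PostgreSQL the keys would come from INSERT ... RETURNING; here cursor.lastrowid plays that role. The table layout and function names are hypothetical.

```python
import sqlite3

def insert_and_capture(conn, rows):
    """Insert rows and return the exact primary keys that were created."""
    inserted_ids = []
    for address, body in rows:
        cursor = conn.execute(
            "INSERT INTO information_unit (canonical_address, body) VALUES (?, ?)",
            (address, body),
        )
        inserted_ids.append(cursor.lastrowid)  # stand-in for RETURNING id
    conn.commit()
    return inserted_ids

def rollback_exact(conn, inserted_ids):
    """Delete only the captured keys; return how many rows were removed."""
    deleted = 0
    for unit_id in inserted_ids:
        deleted += conn.execute(
            "DELETE FROM information_unit WHERE id = ?", (unit_id,)
        ).rowcount
    conn.commit()
    return deleted

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE information_unit ("
    " id INTEGER PRIMARY KEY, canonical_address TEXT UNIQUE, body TEXT)"
)
# A pre-existing pilot row whose address resembles the migrated ones.
conn.execute("INSERT INTO information_unit (canonical_address, body) VALUES ('law/1', 'pilot')")
conn.commit()

keys = insert_and_capture(conn, [("law/10", "A"), ("law/11", "B")])
print(rollback_exact(conn, keys))
print(conn.execute("SELECT canonical_address FROM information_unit").fetchall())
```

A pattern such as canonical_address LIKE 'law/1%' would also have matched the pilot row; deleting by captured key avoids that risk.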

M. Post-implementation design requirement

Per operating note: after any migration execution is accepted, a post-implementation design must be produced documenting: what was migrated, how, species/composition assigned, row counts, and how to reverse. Reference: ...operating-notes/design-after-repair-implementation-rule-2026-05-11.md.


N. Recommendation and next executable pack boundaries

Phase 5 is too large for one executable pack. Proposed sub-phases:

Sub-phase | Content | Depends on
5A | Nesting decision (GPT/User, informed by dry-run evidence) | Dry-run results
5B | Species/composition resolution (follows from 5A nesting decision) + species_collection_map seed + QT-001 backfill for existing + new IU rows | 5A decision
5C | Pilot migration (one publication) | 5B species resolved
5D | Full batch migration | 5C pilot verified
5E | Post-implementation design | 5D complete

The dry-run prompt (below) gathers evidence for 5A + early 5B/5C inputs.


Phase 5 Design | Nesting-first | No hardcode | No migration | 2026-05-11