KB-5BA4

P3D Pack 1 Phase 5B → 5C — Hybrid Pilot Migration Execution Prompt — DRAFT

23 min read Revision 1
p3dpack1phase5bphase5cpromptdraftmigrationd3-hybriddieu-35qt-001qt-005

P3D Pack 1 Phase 5B → 5C — Hybrid Pilot Migration Execution Prompt — DRAFT

Date: 2026-05-11 Author: Opus 4.7 (drafter) Status: DRAFT — NOT DISPATCH-READY. Requires GPT/User review and resolution of §10 placeholders before any Agent dispatch. Design reference: design/p3d-pack1-phase5b-hybrid-nesting-species-pilot-migration-design.md Strategy: D3 HYBRID (per GPT review 2026-05-11) — D3a (JSON in identity_profile) primary, D3b (universal_edges) deferred Pilot: DIEU-35 Mode (when dispatched): MIGRATION — writes allowed under transaction guards; per-publication COMMIT DB target: VPS PostgreSQL (postgres container, db=directus, schema=public)


0. SCOPE BLOCK (no other literal identifiers as selectors)

# Target family (Pack 1)
target_collection_primary       = 'information_unit'
target_collection_subordinate   = 'unit_version'
target_unit_kind                = 'law_unit'
target_vocab_key                = 'vocab.unit_kind.law_unit'

# Source family (TAC) — read-only throughout
source_table_logical            = 'tac_logical_unit'
source_table_version            = 'tac_unit_version'
source_table_publication        = 'tac_publication'
source_table_pub_member         = 'tac_publication_member'

# Pilot publication
pilot_doc_code                  = 'DIEU-35'    # design choice; live re-verify exists in tac_publication

# Species (PROPOSAL — locked by GPT/User in §10)
proposed_species_code           = '<DECIDED_BY_GPT>'   # Opus proposal: 'information_unit_atom'
proposed_species_entity_code    = '<DECIDED_BY_GPT>'   # Opus proposal: 'SPE-IUA'
proposed_composition_level      = 'atom'                # Opus rationale: D3 hybrid + Điều 0-B literal
proposed_species_management     = 'observed'            # matches IU governance_role

All column names, counts, vocab keys, species codes, composition values → derive live via the semantic field registry. No hardcoded selector outside the SCOPE BLOCK above.


1. GATE-0 — Read-only existence checks (must pass before any write)

GATE-0.A   All Pack 1 tables exist (Phase 5A P1–P8)
GATE-0.B   All Pack 1 functions exist (Phase 5A P9–P14)
GATE-0.C   Semantic field registry (rev6, 31 concepts) loaded and resolution map complete
GATE-0.D   Source row counts and per-publication metrics for DIEU-35 match Phase 5A evidence:
              tac_logical_unit total = 86; DIEU-35 members = 36; section_types = 12; depth = 2
           (Live re-count, not cached.)
GATE-0.E   Canonical address collision count for DIEU-35 = 0 (re-verified live).
GATE-0.F   No write/mutation has occurred up to this point.

If any GATE-0 check fails → ABORT, report, return to GPT/User.


2. PREFLIGHT (still read-only; precedes Step 1 of each phase action)

PF-1   §F species seed prerequisites (BEFORE Step 1):
         entity_species does NOT yet contain proposed_species_entity_code (no pre-existing collision)
         species_collection_map has 0 rows for collection_name = target_collection_primary

PF-2   §G backfill prerequisites (BEFORE Step 2):
         Step 1 (species seed) COMMITTED;
         species_collection_map has exactly 1 is_primary=true row for target_collection_primary;
         birth_registry rows for target_collection_primary with species_code IS NULL = 12 (live count).

PF-3   Pilot migration prerequisites (BEFORE Step 3):
         Step 2 (backfill) COMMITTED;
         birth_registry rows with species_code IS NULL for target_collection_primary = 0;
         vocab.section_type.* exists for each of the 12 distinct section_types in DIEU-35 (live);
         vocab.publication_type.law exists in dot_config;
         vocab.unit_kind.law_unit exists (Phase 4 confirmed);
         fn_iu_create signature matches Phase 5A G8 (9 args including p_parent_ref uuid).

PF-4   Identity_profile dry-construct test:
         Build one sample identity_profile JSON for DIEU-35's first row (render_order=0).
         Confirm the 5 birth-gate-required keys are present
         (title, owner_lookup_ref, primary_section_type_ref, publication_authority_ref, publication_type_ref).
         No INSERT; pure jsonb_build_object test.

PF-5   No-collision re-verify (live, immediately before Step 3):
         SELECT count(*) FROM tac_logical_unit lu
         JOIN tac_publication_member pm ON pm.<logical_unit_ref column> = lu.id
         JOIN tac_publication p ON p.id = pm.<publication_ref column>
         JOIN information_unit iu ON iu.<address_field> = lu.<address_field>
         WHERE p.<doc_code column> = pilot_doc_code;
         Expected: 0.
         (All column names from resolved field map, not hardcoded.)

If any PF-* fails → ABORT, report, do NOT proceed to the corresponding step.


3. STEP 1 — Species + mapping seed (QT-005 step 1–3) — requires §10 placeholder resolution

GOAL: Create the new species <proposed_species_entity_code> and primary mapping for target_collection_primary, so that subsequent IU INSERTs auto-assign species. No row of information_unit is touched in this step.

Operations (metadata-only, no DDL):

  1. INSERT INTO entity_species
       (species_code, display_name, composition_level, management_mode, status, depth, parent_id, prefix, kg_metadata, ...)
     VALUES
       (<decided_by_GPT>, <decided_by_GPT>, proposed_composition_level, proposed_species_management, 'active', <decided_by_GPT>, <decided_by_GPT>, <decided_by_GPT>, <decided_by_GPT>, ...)
     RETURNING id;     -- capture for rollback

  2. INSERT INTO species_collection_map
       (collection_name, species_code, is_primary, discriminator_field, discriminator_value, discriminator_operator, discriminator_config, ...)
     VALUES
       (target_collection_primary, <proposed_species_entity_code or species_code variant>, true, NULL, NULL, NULL, NULL, ...)
     RETURNING id;     -- capture for rollback

  3. (Optional, per §F.3 of design and §10 placeholder #2.) Decide whether to seed species_collection_map row for target_collection_subordinate.
     Opus default: do NOT seed; UV is subordinate with 0 birth rows.

PASS criteria for Step 1:
  - Both INSERTs succeed.
  - SELECT count(*) FROM species_collection_map WHERE collection_name = target_collection_primary AND is_primary = true → 1.
  - No row in information_unit is modified.

Rollback for Step 1: DELETE rows by captured ids. See §6.


4. STEP 2 — QT-001 backfill of 12 existing IU birth_registry rows

GOAL: Backfill species_code and composition_level for the 12 existing IU birth records that were stranded NULL before §F seed.

Operations (metadata-only):

  Count check (live):
    SELECT count(*) FROM birth_registry
    WHERE collection_name = target_collection_primary AND species_code IS NULL;
    Expected: 12.

  Backfill:
    Prefer DOT tool if present:  dot-birth-backfill --collection=information_unit
    Fallback governed SQL:
      WITH backfill AS (
        UPDATE birth_registry
          SET species_code      = <decided_by_GPT>,
              composition_level = proposed_composition_level
        WHERE collection_name = target_collection_primary
          AND species_code IS NULL
        RETURNING entity_code
      )
      INSERT INTO <a write-once rollback capture table or transient artifact>
        SELECT entity_code FROM backfill;
    -- The RETURNING capture is mandatory for exact-key rollback.

PASS criteria for Step 2:
  - Backfill RETURNING captured 12 exact entity_code values.
  - Re-count: 0 NULL species rows for target_collection_primary.
  - No row in information_unit is modified.
  - Did NOT touch any row outside collection_name = target_collection_primary.

Rollback for Step 2: UPDATE the exact captured entity_codes back to NULL. See §6.


5. STEP 3 — DIEU-35 pilot migration (36 IU + 36 UV rows)

GOAL: Migrate all 36 logical units belonging to pilot_doc_code into native IU/UV rows under D3 hybrid (atomic IU rows, hierarchy in identity_profile JSON, parent_or_container_ref NULL).

TRANSACTION (single COMMIT or single ROLLBACK):

  BEGIN;

  -- Build source set from live JOIN; ORDER BY render_order ASC.
  -- All column names from resolved field map. Sample column names below are illustrative only.

  FOR each source_row in (
    SELECT lu.id                 AS tac_lu_id,
           lu.<address_field>    AS canonical_address,
           lu.<section_type col> AS section_type,
           lu.<sort_order col>   AS sort_order,
           lu.<owner column>     AS owner_text,
           lu.<parent_ref col>   AS tac_parent_id,
           uv.id                 AS tac_uv_id,
           uv.<title col>        AS title,
           uv.<body col>         AS body,
           uv.<description col>  AS description,
           uv.<content_hash col> AS tac_v1_hash,
           pm.<publication_render_order col> AS render_order,
           p.id                  AS tac_pub_id,
           p.<doc_code col>      AS doc_code,
           p.<publication_type col> AS publication_type
    FROM tac_logical_unit lu
    JOIN tac_unit_version uv ON uv.<lu_fk col> = lu.id
    JOIN tac_publication_member pm ON pm.<logical_unit_ref> = lu.id
    JOIN tac_publication p ON p.id = pm.<publication_ref>
    WHERE p.<doc_code col> = pilot_doc_code
    ORDER BY pm.<publication_render_order col> ASC
  ) LOOP

    -- Step A: build identity_profile JSON
    v_parent_canon := (
      SELECT lu2.<address_field>
      FROM tac_logical_unit lu2
      WHERE lu2.id = source_row.tac_parent_id
    );  -- NULL for roots

    v_identity_profile := jsonb_build_object(
      'title'                       , source_row.title,
      'owner_lookup_ref'            , source_row.owner_text,
      'primary_section_type_ref'    , 'vocab.section_type.' || source_row.section_type,
      'publication_authority_ref'   , <DECIDED_BY_GPT — placeholder #4>,
      'publication_type_ref'        , 'vocab.publication_type.' || source_row.publication_type,
      'tac_parent_anchor'           , CASE WHEN source_row.tac_parent_id IS NULL THEN NULL ELSE source_row.tac_parent_id::text END,
      'tac_parent_canonical_address', v_parent_canon,
      'publication_anchor'          , source_row.doc_code,
      'publication_render_order'    , source_row.render_order,
      'tac_sort_order'              , source_row.sort_order
    );

    -- Step B: call fn_iu_create (signature from Phase 5A G8)
    v_result := fn_iu_create(
      p_canonical_address := source_row.canonical_address,
      p_title             := source_row.title,
      p_body              := source_row.body,
      p_actor             := 'phase5c_migration_' || pilot_doc_code,
      p_unit_kind         := target_unit_kind,
      p_section_type      := source_row.section_type,
      p_owner_ref         := source_row.owner_text,
      p_publication_type  := source_row.publication_type,
      p_parent_ref        := NULL    -- D3 hybrid: no parent containment at row level
    );
    -- v_result jsonb shape: live-verify on first row; do NOT assume key names

    -- Step C: merge identity_profile (override hierarchy keys; preserve birth-gate keys fn_iu_create set)
    UPDATE information_unit
      SET identity_profile = COALESCE(identity_profile, '{}'::jsonb) || v_identity_profile
      WHERE id = (v_result->>'<iu id key from live shape>')::uuid;

    -- Step D: patch unit_version.content_profile with TAC provenance (Phase 4B addendum §7)
    UPDATE unit_version
      SET content_profile = COALESCE(content_profile, '{}'::jsonb) || jsonb_build_object(
        'source_hashes', jsonb_build_object('tac_v1', source_row.tac_v1_hash),
        'source_refs',   jsonb_build_object(
          'tac_logical_unit_id', source_row.tac_lu_id,
          'tac_unit_version_id', source_row.tac_uv_id,
          'tac_publication_id' , source_row.tac_pub_id,
          'tac_doc_code'       , source_row.doc_code,
          'tac_render_order'   , source_row.render_order
        )
      )
      WHERE id = (v_result->>'<uv id key from live shape>')::uuid;

    -- Step E: capture inserted keys for rollback
    v_rollback_iu := array_append(v_rollback_iu, (v_result->>'<iu id key>')::uuid);
    v_rollback_uv := array_append(v_rollback_uv, (v_result->>'<uv id key>')::uuid);
  END LOOP;

  -- Step F: in-transaction assertions (must all PASS or ROLLBACK)
  ASSERT (SELECT count(*) FROM information_unit
          WHERE (identity_profile->>'publication_anchor') = pilot_doc_code) = 36;

  ASSERT (SELECT count(DISTINCT (identity_profile->>'publication_render_order')::int)
          FROM information_unit
          WHERE (identity_profile->>'publication_anchor') = pilot_doc_code) = 36;

  ASSERT (SELECT count(*) FROM birth_registry br
          WHERE br.collection_name = target_collection_primary
            AND br.entity_code = ANY (v_rollback_iu::text[])
            AND br.species_code IS NULL) = 0;

  ASSERT (SELECT count(*) FROM unnest(v_rollback_iu) AS x(id)
          WHERE (fn_iu_verify_invariants(
                   (SELECT canonical_address FROM information_unit WHERE id = x.id)
                 )->>'status') <> 'OK') = 0;

  ASSERT (SELECT count(*) FROM unit_version uv
          WHERE uv.id = ANY (v_rollback_uv)
            AND uv.<content_hash col> <> fn_content_hash(uv.<body col>)) = 0;

  -- Persist captured rollback lists to an artifact path before COMMIT
  -- (e.g., write to /tmp or to a control table; design choice for executable prompt finalization)

  COMMIT;

PASS criteria for Step 3:

  • All 5 in-transaction assertions PASS.
  • Captured rollback lists persisted to a known location.
  • TAC source untouched (re-verify counts on tac_* tables match pre-step counts).

6. STEP 4 — Render fidelity check (post-COMMIT, read-only)

RF-1   Reconstruct DIEU-35 tree from migrated IU rows by self-joining on
       identity_profile->>'tac_parent_canonical_address'. Compare structure
       (parent → child pairs) with TAC source tree for DIEU-35.
       Expected drift: 0 mismatched pairs, 0 missing leaves, 0 extra leaves.

RF-2   Body comparison: for each migrated row, compare unit_version.body
       to the corresponding tac_unit_version.body. Expected: 100% match
       (no transformation should have altered body text).

RF-3   render_order range: SELECT min, max, count(DISTINCT render_order)
       from migrated set. Expected: min=0, max=35, count=36.

RF-4   Publication ordering: SELECT canonical_address FROM information_unit
       WHERE publication_anchor = pilot_doc_code ORDER BY publication_render_order;
       Compare against the same query on TAC source.
       Expected: identical address sequence.

If RF-1..4 PASS → pilot accepted; report; await GPT/User confirmation.
If any RF fails → execute post-COMMIT rollback (§7.B).

7. ROLLBACK MODEL

7.A In-transaction rollback (Step 3)

Any in-transaction assertion failure → automatic ROLLBACK. No further action needed; state returns to pre-Step-3.

7.B Post-COMMIT rollback (RF-* failure or GPT/User reject)

BEGIN;
DELETE FROM birth_registry WHERE collection_name = target_collection_primary AND entity_code = ANY (<captured iu_id list>::text[]);
DELETE FROM unit_version    WHERE id = ANY (<captured uv_id list>);
DELETE FROM information_unit WHERE id = ANY (<captured iu_id list>);
COMMIT;

Pattern-matching rollback (e.g., WHERE publication_anchor='DIEU-35') is PROHIBITED — defense in depth against accidental over-deletion.

7.C Backfill rollback (§G.5 of design)

UPDATE birth_registry SET species_code = NULL, composition_level = NULL
  WHERE collection_name = target_collection_primary AND entity_code = ANY (<captured backfill entity_code list>);

7.D Species seed rollback (§F.3 of design)

DELETE FROM species_collection_map WHERE id = <captured Step 1.2 id>;
DELETE FROM entity_species         WHERE id = <captured Step 1.1 id>;

Order matters: mapping first, then species.

7.E TAC source

Never touched. No DELETE/UPDATE on tac_* tables at any point.


8. RÀNG BUỘC LUẬT (binding)

Điều 0-B       composition_level = atom under D3 hybrid (literal "không chứa entity khác");
               see open question M.2 in design re: UV-as-contained interpretation.
Điều 0-G       species_code auto-fills via species_collection_map; this prompt seeds the mapping
               BEFORE pilot migration so all 36 new IU rows auto-assign at birth.
Điều 29        "One classification system"; new species + mapping + backfill done in this single pack.
QT-001         5-step backfill procedure followed (§2 PF-2, §4).
QT-005         If new species created: §3 Step 1 follows QT-005 step 1 (species creation); 
               steps 2–4 (mapping, backfill, verify) covered in this prompt.
               Step 5 (governance_role promotion) DEFERRED — keep observed for pilot.
Điều 44 §A     No code change; all operations metadata-driven.
§FIX_GỐC       New species creation is gốc-fix for IU species gap, not patch.
§0-AU          No hardcode; all selectors via semantic field registry + SCOPE BLOCK only.
§0-AF          Evidence captured for production-style verification. Report includes live counts.

9. REPORT REQUIREMENTS (when dispatched, future)

Save to:

knowledge/dev/laws/dieu44-trien-khai/reports/p3d-pack1-phase5c-dieu35-pilot-migration-report.md

Required sections:

A. Status flags (PASS/PARTIAL/BLOCKED, written + rolled-back booleans, per-step pass/fail)
B. GATE-0 + PF-* live evidence
C. Step 1 species seed (entity_species id, species_collection_map id)
D. Step 2 backfill (12 entity_code list, before/after counts)
E. Step 3 pilot (36 iu_id list, 36 uv_id list, render_order range, assertion results)
F. Step 4 render fidelity (RF-1..4 results)
G. Live re-verify of source unchanged (TAC table row counts)
H. Open issues / deferred work
I. Rollback artifact path (captured key lists)

Status flag template:

phase5c_status                  = PASS | PARTIAL | BLOCKED
step1_species_seed              = COMMITTED | NOT_RUN | ROLLED_BACK
step2_backfill_12_rows          = COMMITTED | NOT_RUN | ROLLED_BACK
step3_pilot_dieu35              = COMMITTED | NOT_RUN | ROLLED_BACK
step4_render_fidelity           = PASS | FAIL | NOT_RUN
no_mutation_outside_targets     = true | false
tac_source_untouched            = true | false
rollback_keys_persisted         = true | false
post_implementation_design_due  = true

10. PLACEHOLDERS REQUIRING GPT/USER RESOLUTION BEFORE DISPATCH

The following decisions are NOT inside this DRAFT and must be locked before the prompt becomes dispatch-ready. The first dispatch-ready revision (rev1) shall fill these in based on GPT/User direction.

#1  Species naming
    - species_code           (Opus proposal: 'information_unit_atom')
    - entity_species code/id (Opus proposal: 'SPE-IUA')
    - display_name (Vietnamese)
    - prefix
    - depth + parent_id in taxonomy tree
    - kg_metadata
    Status: PENDING GPT/USER

#2  UV species_collection_map seed (yes/no)
    Opus default: NO (Phase 4D recommendation G1)
    Status: PENDING GPT/USER CONFIRMATION

#3  governance_role promotion now vs defer
    Opus default: DEFER (keep observed for pilot)
    Status: PENDING GPT/USER CONFIRMATION

#4  identity_profile.publication_authority_ref value
    The vocab key for "publication_authority_ref" — TAC publications don't have
    an obvious authority field. Need fixed value (e.g., 'vocab.publication_authority.incomex')
    or live lookup mechanism.
    Status: PENDING GPT/USER CHOICE

#5  Rollback capture persistence target
    Option A: write to /tmp on VPS with timestamped filename
    Option B: insert into a transient control table (e.g., migration_rollback_capture)
    Option C: agent writes to KB artifact path
    Status: PENDING GPT/USER CHOICE

#6  D3b universal_edges enrichment (in this pack or deferred)
    Opus default: DEFER to Phase 5E (after pilot verified)
    Status: PENDING GPT/USER CONFIRMATION

#7  IU publication_member table design (in this pack or deferred)
    Opus default: DEFER (use identity_profile.publication_anchor + publication_render_order for pilot/Phase 5D)
    Status: PENDING GPT/USER CONFIRMATION

11. POST-IMPLEMENTATION DESIGN REQUIREMENT

After Agent submits the Phase 5C report and GPT/User accepts it, the assigned drafter (per operating-notes/design-after-repair-implementation-rule-2026-05-11.md) writes the post-implementation design at:

knowledge/dev/laws/dieu44-trien-khai/design/p3d-pack1-phase5c-post-implementation-design-dieu35-pilot-migration-<date>.md

Sections A–G per the operating note. Opus default assignee unless GPT directs otherwise.


12. NO-HARDCODE DISCIPLINE (carried from Phase 5A)

Allowed hardcoded selectors:
  - target_collection_primary, target_collection_subordinate, target_unit_kind
  - source_table_logical, source_table_version, source_table_publication, source_table_pub_member
  - pilot_doc_code (= 'DIEU-35') — design choice; live re-verify exists
  - target_vocab_key (= 'vocab.unit_kind.law_unit')
  - proposed_* (species code/composition) only AFTER §10 resolution; pre-resolution they are placeholders

Forbidden:
  - column names outside the rev6 semantic field registry resolution
  - row counts as gates (counts are evidence, not selectors)
  - section_type literal values outside live SELECT
  - fuzzy matchers ('like', 'or similar', 'equivalent')
  - assumptions about jsonb shape returned by fn_iu_create (live-verify first row)
  - any selector that says "the only existing X" without live count proof

13. SAFETY SUMMARY

Read-only checks done before any write: GATE-0 + PF-1..5 + dry-construct identity_profile (PF-4).
First write: §3 Step 1 species + mapping. Two rows. Reversible.
Second write: §4 Step 2 backfill 12 birth rows. Reversible.
Third write: §5 Step 3 pilot transaction. 36 IU + 36 UV + auto birth rows. Reversible.
TAC source: NEVER written.
Rollback keys: captured at every write step; pattern-matching rollback PROHIBITED.
Post-COMMIT rollback path: explicit (§7.B).
Render fidelity check: read-only (§6).

Phase 5C Execution Prompt — DRAFT | D3 hybrid (D3a) | Pilot DIEU-35 | QT-005 + QT-001 + Pilot migration | NOT DISPATCH-READY (§10 placeholders pending) | 2026-05-11

Back to Knowledge Hub knowledge/dev/laws/dieu44-trien-khai/prompts/p3d-pack1-phase5b-hybrid-pilot-migration-implementation-prompt-DRAFT.md