P7 — Pilot Migration Plan v0.2 for Đ38 Text as Code
Type: Pilot migration plan, Điều 38 Text as Code
Phase: P7 (Pilot Plan), dry-run workflow design
Status: OFFICIAL v0.2. GPT R1 (15+3) + R2 (6) + FINAL PASS. User PASS.
Drafted: 2026-04-26 | Session: S181
Drafting agent: Opus 4.6 (Desktop)
GPT review: R1 conditional PASS (15+3). R2 conditional PASS, minor (6). Final PASS.
Inputs: P5 v0.2, P5b v0.2, P6 v0.2, C1–C3, C1A, LSL-01, L1–L5, Đ24, Đ32, Đ33, Đ35
1. Objective
Design a pilot migration plan for a small subset: the process for converting markdown/KB documents to the Text as Code model. This does NOT migrate real production data.
P7 only designs the dry-run workflow. Running the actual dry-run is a separate step, taken after P7 passes and the User approves execution. The P7 output is a plan document, not data artifacts.
In plain terms: write the instructions for "putting 3 crates on the conveyor belt as a trial", without loading them yet.
2. Scope / Out of scope
2.1 In scope
Plan design: pilot subset, source inventory, mapping rules, segmentation plan, dry-run workflow, verification, rollback/export, checker readiness checklist, PASS/FAIL, go/no-go, artifact list, gap routing.
2.2 Out of scope
- No real migration / SQL / DDL / production schema
- No real dry-run execution (execution is the step after P7 PASS)
- No real checker/DOT runs
- No changes to authoritative data
- No real dry-run artifacts created
- No edits to P5/P5b/P6/C1–C3/LSL/L1–L5
- The full corpus is not brought into the pilot
- No SQL/migration/DOT calls
3. Pilot subset proposal
3.1 Three pilot documents (proposal, not final)
| # | File | Proposed doc_code | Pub_type | Complexity | Reason |
|---|---|---|---|---|---|
| 1 | HOW-TO-READ.md | D38-HOWTO (pilot proposal) | design_note | Low | Index/routing. Few sections. Tests basic segmentation. |
| 2 | C1A-segmentation-operating-model.md | D38-C1A (pilot proposal) | design_note | High | Complex operating model: many section_types, tables, edge cases. |
| 3 | P5-schema-draft-v0-2.md | D38-P5 (pilot proposal) | design_note | Very high | Schema doc: pseudo-DDL code blocks, large tables, mixed content. |
The doc_code values are pilot proposals, not final. Final doc_code values are fixed at implementation.
3.2 Pilot round 1 excludes
| Excluded | Reason |
|---|---|
| L1–L5 (legal supplements) | Different legal complexity; left for pilot round 2 |
| LSL-01 (34K chars) | Too long; raises round 1 risk |
| tham-khao/ files | Old document-centric material, not suited to a unit-centric pilot |
| Component/BOM migration | Adds significant complexity; deferred to round 2 (§8) |
| Full corpus | Risk too high for round 1 |
3.3 Gradient complexity
Low (HOWTO) → High (C1A) → Very high (P5). Issues surface progressively as complexity rises. If HOWTO fails, stop early with little wasted effort.
4. Source inventory
4.1 HOW-TO-READ.md
| Attribute | Value |
|---|---|
| Path | knowledge/dev/laws/dieu38-trien-khai/HOW-TO-READ.md |
| Purpose | Guides the Agent to read in the correct order |
| Estimated segments | ~5–8 logical units (initial estimate) |
| Section types | heading, paragraph, reference_mapping, appendix |
| Edge cases | Rule-conflict table, interpretation notes (3 separate units) |
4.2 C1A-segmentation-operating-model.md
| Attribute | Value |
|---|---|
| Path | knowledge/dev/laws/dieu38-trien-khai/C1A-segmentation-operating-model.md |
| Purpose | Segmentation operations |
| Estimated segments | ~25–35 logical units (initial estimate) |
| Section types | heading, paragraph, definition, process, governance_process, reference_mapping, invariant_list, open_decision_list, appendix |
| Edge cases | §6 vocab table, §12 invariants, §13 ODs. Some sections may exceed soft-limit. |
4.3 P5-schema-draft-v0-2.md
| Attribute | Value |
|---|---|
| Path | Exact path to be verified before dry-run execution |
| Purpose | Schema draft Unit/Publication/Metadata |
| Estimated segments | ~30–60 logical units (initial estimate; PASS tolerance 20–70 per §16) |
| Section types | heading, paragraph, technical_spec, reference_mapping, invariant_list, open_decision_list, process, changelog |
| Edge cases | Pseudo-DDL code blocks (>20 lines each) = body of the parent unit (OD-P7-01). Large PASS criteria/invariant tables. Entity map ASCII art. Patch log = changelog. |
Note: All source paths must be verified from Agent Data immediately before dry-run execution.
5. Migration target model
5.1 Target concepts (staging notation only)
The pilot produces staging notation (JSON) for each file, mapped to P5/P5b concepts. NO PG writes. NO real UUIDs. NO INSERTs.
| Concept | Staging output |
|---|---|
| logical_unit | {canonical_address, parent_address, sort_order, section_type, title, owner} |
| unit_version | {logical_unit_ref, version=1, title, body, description, lifecycle='draft', length_flag, content_hash_placeholder} |
| publication | {doc_code (pilot proposal), version='pilot-v0', pub_type, lifecycle='proposed', publication_profile: {kb_path, source_hash}} |
| publication_member | {canonical_address, render_order} |
| entity_labels (Đ24) | {canonical_address (proposed staging key, NOT confirmed entity_code format), facet, value} — mapping format deferred per OD-P5-01/OD-P5b-07 |
| section_type_vocab | Staging vocab proposal from C1A §6 candidates — NOT production seed, NOT Đ24 registry |
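As a rough illustration of the staging notation in the table above, the two core concepts could be modelled as plain dataclasses and serialized to JSON. This is only a sketch: the class names, the `to_staging_json` helper, and the sample values (`"pilot"` owner, `"ok"` length flag, `"sha256:TBD"` placeholder) are hypothetical, and nothing here touches PG or creates a UUID.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class LogicalUnit:
    canonical_address: str
    parent_address: Optional[str]  # None for a root unit
    sort_order: int
    section_type: str
    title: str
    owner: str


@dataclass
class UnitVersion:
    logical_unit_ref: str          # canonical_address of the logical unit
    version: int
    title: str
    body: Optional[str]            # None for a heading-only structural node
    description: str
    lifecycle: str
    length_flag: str
    content_hash_placeholder: str


def to_staging_json(units, versions):
    """Serialize staging rows to a JSON string; nothing is written to PG."""
    payload = {
        "logical_unit": [asdict(u) for u in units],
        "unit_version": [asdict(v) for v in versions],
    }
    return json.dumps(payload, ensure_ascii=False, indent=2)


# Hypothetical sample rows for illustration only.
unit = LogicalUnit("D38-HOWTO-S1-P1", None, 1, "heading", "Reading order", "pilot")
version = UnitVersion(
    "D38-HOWTO-S1-P1", 1, "Reading order", None, "", "draft", "ok", "sha256:TBD"
)
staging = json.loads(to_staging_json([unit], [version]))
```

Keeping the staging rows as JSON-serializable dataclasses makes the output easy to diff and review at the manual checkpoints.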
5.2 Publication version
publication.version = pilot-v0 (pilot label). Source file KB revision/hash → publication_profile.source_revision / publication_profile.source_hash. Do NOT use the KB revision as publication.version.
5.3 Provenance
provenance = PROV-AI (staging placeholder). Not yet verified against the real Đ24 FAC-PROV registry. State explicitly: this is a placeholder, and the registry must be verified before implementation.
5.4 Source vs Snapshot
Source documents remain the OFFICIAL authoritative copies until real migration is approved. The pilot snapshot is a read-only copy for the dry-run. The snapshot does NOT replace source authority.
6. Segmentation plan
6.1 Rules (C1A SR-1→SR-7)
| Rule | Pilot application |
|---|---|
| SR-1 | Clear title + edited independently → 1 unit |
| SR-2 | No title → body of the parent |
| SR-3 | Editing A forces editing B → merge |
| SR-4 | <50 words, no authority → merge |
| SR-5 | Cut by meaning, not by token/character count |
| SR-6 | Title describes the idea, not "§5 Part A" |
| SR-7 | 1 canonical parent |
6.2 Edge cases
| Case | Rule | Handling |
|---|---|---|
| Code block (DDL) | OD-P7-01 | Body of the parent unit. Split out only if it is referenced from ≥2 places or has its own lifecycle. |
| Table/matrix | SR-5 | Small (<500 words) = body. Large with its own title = separate unit. |
| Heading-only | P5/P6 | Navigation heading = structural node (body=NULL). |
| Invariant list | C1A §6 | section_type=invariant_list. Separate unit. |
| Open decision list | C1A §6 | section_type=open_decision_list. Separate unit. |
| Changelog/patch log | C1A §5.7 | section_type=changelog. Separate unit. |
| Interpretation notes | HOW-TO-READ specific | Each note = separate unit (IN-1, IN-2, IN-3). |
6.3 Canonical address format (pilot proposal)
{DOC_CODE}-S{section}-P{paragraph}[-{sub}]. Example: D38-C1A-S4-P1 (§4 SR-1).
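A small sketch of how the proposed address format could be parsed and rebuilt. It assumes that section, paragraph and sub are all numeric and that doc_code consists of uppercase letters, digits and hyphens; the plan does not fix these details, and the function names are hypothetical.

```python
import re

# Assumption: numeric section/paragraph/sub, uppercase alphanumeric doc_code parts.
ADDRESS_RE = re.compile(
    r"^(?P<doc_code>[A-Z0-9]+(?:-[A-Z0-9]+)*?)"
    r"-S(?P<section>\d+)-P(?P<paragraph>\d+)(?:-(?P<sub>\d+))?$"
)


def parse_address(address):
    """Split a pilot canonical address into its parts, or raise ValueError."""
    m = ADDRESS_RE.match(address)
    if m is None:
        raise ValueError(f"not a pilot canonical address: {address!r}")
    parts = m.groupdict()
    return {
        "doc_code": parts["doc_code"],
        "section": int(parts["section"]),
        "paragraph": int(parts["paragraph"]),
        "sub": int(parts["sub"]) if parts["sub"] is not None else None,
    }


def format_address(doc_code, section, paragraph, sub=None):
    """Build an address string from its parts; sub is optional."""
    base = f"{doc_code}-S{section}-P{paragraph}"
    return base if sub is None else f"{base}-{sub}"
```

For example, `parse_address("D38-C1A-S4-P1")` yields doc_code `D38-C1A`, section 4, paragraph 1, and no sub part.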
6.4 Manual review checkpoint
After the segmentation proposal (dry-run step 3), human review is mandatory before staging rows are generated. The Agent proposes; User/GPT approve the segmentation tree.
7. Mapping rules
7.1 Document → Publication
1 markdown file → 1 proposed publication (pilot-v0). Source path/hash → publication_profile.
7.2 Heading/section → logical_unit
## Title → logical_unit. ### Sub → child unit. Body text under heading → unit_version.body. No heading paragraph → body of parent.
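The heading-to-unit mapping above can be sketched as a simple line scanner. This is an illustrative parser, not the pilot's actual tooling: it only recognises ATX headings (`#` to `######`), skips `#` lines inside fenced code blocks so that DDL blocks stay in the parent's body, and drops any text before the first heading. The function name and node layout are hypothetical.

```python
import re

HEADING_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def parse_headings(markdown_text):
    """Return a flat list of nodes {level, title, body, parent} in source order."""
    nodes = []
    stack = []        # indexes of the currently open ancestor headings
    in_fence = False
    for line in markdown_text.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        m = None if in_fence else HEADING_RE.match(line)
        if m:
            level = len(m.group(1))
            # Close headings at the same or a deeper level.
            while stack and nodes[stack[-1]]["level"] >= level:
                stack.pop()
            parent = stack[-1] if stack else None
            nodes.append({"level": level, "title": m.group(2), "body": [], "parent": parent})
            stack.append(len(nodes) - 1)
        elif nodes:
            # Text without its own heading belongs to the nearest heading above.
            nodes[-1]["body"].append(line)
    for node in nodes:
        text = "\n".join(node["body"]).strip()
        node["body"] = text or None   # heading-only node: body stays None
    return nodes


sample = "## A\ntext a\n### B\n```\n# not a heading\n```\n## C\n"
tree = parse_headings(sample)
```

In the sample, `B` becomes a child of `A`, the fenced line stays inside `B`'s body, and `C` is a heading-only node with no body.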
7.3 unit_version fields
| Field | Source |
|---|---|
| title | Heading text (cleaned) |
| body | Content between headings |
| description | First sentence (staging heuristic — OD-P7-07, refine after pilot) |
| lifecycle | 'draft' |
| version_number | 1 |
| content_hash | SHA-256(title+body+desc) placeholder |
| length_flag | Computed from word count |
| provenance | 'PROV-AI' (staging placeholder, verify Đ24 FAC-PROV before impl) |
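The derived fields in the table above (description, content_hash, length_flag) might be computed along these lines. Several details are assumptions: the plan says only SHA-256(title+body+desc), so joining the fields with a newline is a guess; the flag names `ok` and `over_hard_limit` are hypothetical; and only the 1500-word hard limit comes from the verification checklist in §11.1.

```python
import hashlib
import re

HARD_LIMIT_WORDS = 1500  # from the verification checklist: >1500 words needs a decision


def first_sentence(body):
    """Staging heuristic (OD-P7-07): text up to the first sentence terminator."""
    text = " ".join((body or "").split())
    m = re.search(r"(.+?[.!?])(?:\s|$)", text)
    return m.group(1) if m else text


def content_hash(title, body, description):
    # Assumption: fields are joined with "\n" before hashing.
    joined = "\n".join([title, body or "", description])
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def length_flag(body):
    # Flag names are hypothetical; only the hard limit is taken from the plan.
    words = len((body or "").split())
    return "over_hard_limit" if words > HARD_LIMIT_WORDS else "ok"
```

The terminator must be followed by whitespace or end of text, so a version string such as "v0.2" does not end the sentence early.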
7.4 section_type assignment
| Pattern | Type |
|---|---|
| Root/major heading | heading |
| Narrative | paragraph |
| Definition block | definition |
| Rule/principle | principle |
| Process/steps | process |
| DDL/schema | technical_spec |
| Governance process | governance_process |
| Crosswalk table | reference_mapping |
| Multi-dim table | matrix |
| Invariants table | invariant_list |
| OD table | open_decision_list |
| Changelog | changelog |
| Appendix | appendix |
section_type_vocab seed from C1A §6 = staging vocab proposal. NOT production seed. NOT Đ24 registry. NOT parallel label registry. section_type = structural metadata SoT, separate from entity_labels.
7.5 Labels through Đ24
| Facet | Value | Scope |
|---|---|---|
| doc | D38 | All pilot units |
| topic | segmentation/schema/guide | Per document |
| layer | design | All |
Đ24 entity_code mapping deferred (OD-P5-01). canonical_address = proposed staging key, not confirmed entity_code format. Labels in entity_labels format only, no parallel registry.
7.6 publication_member
Proposed publication → draft unit_versions. render_order = sequential. Pilot does NOT enact publication.
7.7 Change-set / APR (staging simulation only)
Pilot staging may include simulated change_set (draft, apr_ref=NULL). ALL enactment = simulation only. No real APR, no real enacted status. Any "enacted simulation" clearly marked SIMULATED in staging output.
8. Component/BOM pilot scope
8.1 Option 1 — KHÔNG migrate Component/BOM round 1
3 pilot documents = design notes, no deployable components. Component/BOM adds complexity. Defer to round 2.
8.2 Future component candidate log
If the pilot finds component references, log them:
| Field | Description |
|---|---|
| source_unit | canonical_address of unit containing reference |
| phrase | Text referencing component ("guard", "template", "pattern") |
| candidate_type | guard / template / pattern / function |
| reason | Why this looks like a component reference |
| decision | defer_to_round2 |
Log = JSON array, attached to pilot report. No migration action.
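One possible way to populate this log is a plain keyword scan over each unit body. The plan does not say how references are detected, so the keyword matching below, the `find_component_candidates` name, and the wording of the `reason` field are all assumptions; a real pilot may rely on manual review instead.

```python
import json
import re

CANDIDATE_TYPES = ("guard", "template", "pattern", "function")


def find_component_candidates(source_unit, body):
    """Return log entries for candidate component references found in body."""
    entries = []
    for ctype in CANDIDATE_TYPES:
        # Match the keyword as a whole word, optionally plural.
        m = re.search(rf"\b{ctype}s?\b", body, flags=re.IGNORECASE)
        if m:
            entries.append({
                "source_unit": source_unit,
                "phrase": m.group(0),
                "candidate_type": ctype,
                "reason": f"keyword match on '{ctype}'",
                "decision": "defer_to_round2",
            })
    return entries


# Hypothetical unit text for illustration only.
log = find_component_candidates(
    "D38-C1A-S4-P1", "Apply the birth gate guard before any template renders."
)
log_json = json.dumps(log, indent=2)
```

Every entry carries `decision = defer_to_round2`, so the log records candidates without triggering any migration action.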
8.3 Pilot round 2
After round 1 PASS → pilot round 2: 1–2 component-heavy docs (SOP/operational).
9. Checker readiness
9.1 Readiness checklist (design only)
P7 lists the P6 checkers that need to be enabled. Execution is a separate step and requires the User to approve it, or to explicitly waive it for staging-only simulation.
Birth gate candidate (unit slice): BG-LU-01→06, BG-UV-01→06. Component BG-COMP/BG-BOM = SKIP round 1.
Pre-enactment candidate: PE-PUB-01→06. Pilot = proposed publication, pre-enact simulated only.
Daily ERROR smoke (simulated): DOT-LU-01→04, DOT-UV-01→03, DOT-PUB-01→03. Simulated on staging JSON.
9.2 Simulated checker (staging-only)
Simulated checks on staging JSON output:
- canonical_address unique
- parent exists + same doc_code
- section_type in staging vocab proposal
- No duplicate enacted (trivially pass — all draft)
- length_flag computed
- content_hash computed
Output = checker report JSON. No PG writes.
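The first three checks in the list could be simulated over staging rows roughly as follows. This is a sketch under assumptions: rows are plain dicts shaped like the logical_unit staging output, doc_code is taken to be everything before the last `-S` in the address, and the report layout and check names are hypothetical. It reads and returns in-memory data only.

```python
def doc_code_of(address):
    # Assumption: the doc_code is everything before the "-S{section}" part.
    return address.rsplit("-S", 1)[0]


def run_simulated_checks(units, vocab):
    """Return a checker report dict for a list of staging logical_unit rows."""
    errors = []
    seen = set()
    for u in units:
        addr = u["canonical_address"]
        if addr in seen:
            errors.append({"check": "address_unique", "address": addr})
        seen.add(addr)
    for u in units:
        addr, parent = u["canonical_address"], u.get("parent_address")
        if parent is not None:
            if parent not in seen:
                errors.append({"check": "parent_exists", "address": addr})
            elif doc_code_of(parent) != doc_code_of(addr):
                errors.append({"check": "parent_same_doc_code", "address": addr})
        if u["section_type"] not in vocab:
            errors.append({"check": "section_type_in_vocab", "address": addr})
    return {"status": "PASS" if not errors else "FAIL", "errors": errors}


# Hypothetical staging rows: the third one has a missing parent and an unknown type.
vocab = {"heading", "paragraph", "definition"}
units = [
    {"canonical_address": "D38-C1A-S1-P1", "parent_address": None, "section_type": "heading"},
    {"canonical_address": "D38-C1A-S1-P2", "parent_address": "D38-C1A-S1-P1", "section_type": "paragraph"},
    {"canonical_address": "D38-C1A-S1-P3", "parent_address": "D38-P5-S1-P1", "section_type": "memo"},
]
report = run_simulated_checks(units, vocab)
```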
10. Dry-run workflow (plan design)
10.1 Nine steps (plan — execution is post-P7)
| Step | Description | Input | Output | Production write? |
|---|---|---|---|---|
| 1 | Export source snapshot (read-only copy + SHA-256) | Pilot markdown files | Source snapshot | NO |
| 2 | Parse headings/blocks | Snapshot | Parsed tree JSON | NO |
| 3 | Propose segmentation + MANDATORY REVIEW | Parsed tree + C1A rules | Segmentation proposal → User/GPT approve | NO |
| 4 | Generate staging rows | Approved segmentation | Staging rows JSON | NO |
| 5 | Generate publication + membership | Staging rows | Publication JSON | NO |
| 6 | Generate Đ24 label mapping proposal | Staging rows | Label mapping JSON | NO |
| 7 | Run simulated checkers | Staging JSON | Checker report JSON | NO |
| 8 | Round-trip export + MANDATORY REVIEW | Staging rows | Export markdown + comparison report | NO |
| 9 | Compile pilot report → go/no-go | All outputs | Pilot review report | NO |
10.2 Manual review checkpoints
Checkpoint A (after step 3): User/GPT review segmentation proposal before generating staging rows. Checkpoint B (after step 8): User/GPT review round-trip export quality before go/no-go.
10.3 Actors
| Actor | Steps |
|---|---|
| Agent (Claude CLI/Codex) | 1–2, 4–7 (execute post-P7) |
| Desktop (Opus) | 3 (propose), 8–9 (compile) |
| GPT | Review at checkpoints A, B, 9 |
| User (Huyên) | Approve checkpoints, go/no-go |
11. Verification workflow
11.1 Checklist
| # | Check | Threshold | Severity |
|---|---|---|---|
| 1 | Unit count plausible | HOWTO: 3–12, C1A: 15–50, P5: 20–70 | WARN outside |
| 2 | No orphan unit | 0 | ERROR |
| 3 | No duplicate canonical_address | 0 | ERROR |
| 4 | Parent/sort_order tree valid | All parents exist + same doc_code | ERROR |
| 5 | No publication inline content | Pub only membership refs | ERROR |
| 6 | No draft in enacted pub | Pilot = proposed → trivially PASS | PASS |
| 7 | Section_type in staging vocab proposal | All valid | ERROR |
| 8 | Labels via Đ24 format only | No parallel registry | ERROR |
| 9 | Hard-limit decision exists | >1500 words → decision | WARN |
| 10 | Code blocks byte-preserved | DDL blocks intact | ERROR |
| 11 | Tables row-count preserved | Same row count | ERROR |
| 12 | Round-trip: no heading/body loss | All headings + body blocks present | ERROR |
| 13 | Round-trip: block count delta ≤ threshold | Delta ≤ 5% of source blocks | WARN |
| 14 | All pilot artifacts present (per §12) | 11/11 | ERROR |
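Checks #1 and #13 are simple numeric thresholds and could be evaluated as in the sketch below. The bounds and the 5% limit come from the checklist; the function names and the PASS/WARN return labels are illustrative. The delta comparison uses integer arithmetic to avoid floating-point edge cases at exactly 5%.

```python
# Bounds from check #1 (unit count plausible).
UNIT_COUNT_BOUNDS = {"HOWTO": (3, 12), "C1A": (15, 50), "P5": (20, 70)}
BLOCK_DELTA_PERCENT = 5  # from check #13


def check_unit_count(doc, count):
    """WARN when the unit count falls outside the plausible range for doc."""
    low, high = UNIT_COUNT_BOUNDS[doc]
    return "PASS" if low <= count <= high else "WARN"


def check_block_delta(source_blocks, exported_blocks):
    """WARN when the round-trip block count differs by more than 5% of source."""
    if source_blocks <= 0:
        raise ValueError("source_blocks must be positive")
    diff = abs(exported_blocks - source_blocks)
    within = diff * 100 <= BLOCK_DELTA_PERCENT * source_blocks
    return "PASS" if within else "WARN"
```

For instance, 95 exported blocks against 100 source blocks is exactly 5% and still passes, while 94 produces a WARN.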
11.2 No production change verification
| Mechanism | Description |
|---|---|
| Artifacts only in staging/sandbox directory | No Directus/PG mutation calls |
| Source file hashes unchanged | Compare pre/post hashes |
| Action log attached | Every dry-run step logged with timestamps |
| No SQL/DDL/INSERT/UPDATE/DELETE | Verify agent did not execute PG commands |
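The "source file hashes unchanged" mechanism can be sketched as a manifest built before the dry-run and re-checked afterwards. The helper names are hypothetical, and the demo runs against a temporary file rather than any real source path, since real paths must first be verified from Agent Data.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def build_manifest(paths):
    """Map each path to its SHA-256 digest (taken before the dry-run)."""
    return {str(p): sha256_of(p) for p in paths}


def changed_sources(pre_manifest):
    """Paths that are missing or whose current hash differs from the manifest."""
    changed = []
    for path, digest in pre_manifest.items():
        p = Path(path)
        if not p.is_file() or sha256_of(p) != digest:
            changed.append(path)
    return changed


# Demo on a throwaway file: unchanged first, then deliberately edited.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "HOW-TO-READ.md"
    src.write_text("## Title\nbody\n", encoding="utf-8")
    manifest = build_manifest([src])
    unchanged = changed_sources(manifest)
    src.write_text("## Title\nedited\n", encoding="utf-8")
    after_edit = changed_sources(manifest)
```

An empty result from `changed_sources` after the run is the evidence that source documents were not touched; any non-empty result corresponds to the abort condition in §16.4.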
12. Pilot artifacts to produce (at dry-run execution)
| # | Artifact | Format | Created at step |
|---|---|---|---|
| 1 | Source snapshot manifest (paths + SHA-256) | JSON | 1 |
| 2 | Parsed heading/block tree | JSON | 2 |
| 3 | Segmentation proposal (approved) | JSON | 3 |
| 4 | Staging rows (logical_unit + unit_version) | JSON | 4 |
| 5 | Publication + membership | JSON | 5 |
| 6 | Đ24 label mapping proposal | JSON | 6 |
| 7 | Simulated checker report | JSON | 7 |
| 8 | Round-trip export markdown | Markdown | 8 |
| 9 | Round-trip comparison report | JSON/Markdown | 8 |
| 10 | Future component candidate log | JSON | 3–4 (if references found) |
| 11 | Pilot review report | Markdown | 9 |
13. Rollback / Export
13.1 Rollback package (11 artifacts above)
All staging artifacts = read-only files in sandbox. Rollback = delete staging files. Source documents unchanged (§5.4).
13.2 No production side effects
Pilot KHÔNG: ghi PG, tạo schema/trigger, modify KB documents, change entity_labels, create real publication/unit records.
14. Gap routing
14.1 If the pilot finds a gap in P5/P5b/P6
| Severity | Action |
|---|---|
| High (breaks invariant or makes model unworkable) | STOP pilot. Open an amendment task for P5/P5b/P6. Re-pilot after fix. |
| Low (cosmetic, wording, minor OD clarification) | Continue pilot with note. Fix after pilot in amendment round. |
Record each gap in the pilot report with severity + affected document + proposed fix.
15. Risk register
| # | Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|---|
| 1 | Over-segmentation | Medium | Unit noise | SR-4. Compare S179 counts. |
| 2 | Under-segmentation | Low | Lost granularity | SR-1. Length flag. |
| 3 | Code block split | Low | Break DDL | OD-P7-01: body default. Verify #10. |
| 4 | Address collision | Very Low | Identity break | Simulated BG-LU-01. |
| 5 | Section_type mismatch | Medium | Misclassification | Manual review checkpoint A. |
| 6 | Description heuristic poor | Medium | Bad metadata | Accept WARN. Refine. |
| 7 | Round-trip loss | Medium | Data integrity | Verification #10–13. |
| 8 | Scope creep | Low | Delay | Strict 3-doc limit. No component round 1. |
| 9 | Agent writes PG | Very Low | Production corruption | No-production-change verification §11.2. |
16. PASS/FAIL criteria
16.1 PASS
| # | Criterion | Threshold |
|---|---|---|
| 1 | All 3 documents segmented | 3/3 |
| 2 | No ERROR in verification | 0 |
| 3 | Unit counts within estimate range | Within bounds |
| 4 | Code blocks/tables preserved | 100% |
| 5 | Simulated BG-LU/UV checks pass on staging JSON | All pass |
| 6 | Round-trip: no heading/body loss, block delta ≤5% | Measured |
| 7 | All pilot artifacts present (per §12) | 11/11 |
| 8 | No production data changed | Verified §11.2 |
16.2 PASS with minor notes
WARNs OK: section_type refinement, description quality, minor formatting diffs, length calibration.
16.3 FAIL/block
Any verification ERROR, >3 misclassified types, round-trip loses content, production changed.
16.4 Abort
Source modified during pilot. Agent writes PG. Scope exceeds 3 docs.
17. Go/No-go
Go ≠ migrate production. Go = permission to begin implementation design planning (DDL design, write path design, birth gate trigger design, DOT scheduling design). Go ≠ permission to deploy or migrate production data. Production migration needs separate approval gate.
| Decision | Condition |
|---|---|
| GO | Pilot PASS + GPT PASS + User approve |
| GO with conditions | PASS + minor notes → address before impl |
| NO-GO | FAIL → fix → re-pilot |
| ABORT | Fundamental model issue → review P5/P5b/C1A |
18. Open decisions
| Code | Question | Proposal | Phase |
|---|---|---|---|
| OD-P7-01 | Code block = body default or unit? | Body default. Split out if referenced from ≥2 places or it has its own lifecycle. | Pilot |
| OD-P7-02 | Deep nesting address | -P1-1 suffix | Pilot |
| OD-P7-03 | Staging format | JSON | Pilot |
| OD-P7-04 | Round-trip format | Markdown | Pilot |
| OD-P7-05 | Component/BOM round 2 scope | After round 1 PASS | Post-pilot |
| OD-P7-06 | section_type vocab seed scope | All 17 C1A candidates as staging proposal | Pilot |
| OD-P7-07 | Description auto-extract method | First-sentence heuristic | Pilot |
19. Constitutional check
| Law | Verdict | Notes |
|---|---|---|
| NT1/NT13 | PASS | Pilot does NOT assert production SoT. Staging only. |
| NT2 | PASS | Simulated checkers machine-checkable. |
| NT4 | PASS | Vocab/checker = config data, staging proposal. |
| NT8 | PASS | Component/BOM respected, deferred round 2. |
| NT11 | PASS | No duplicate registry. Staging vocab ≠ Đ24. |
| Đ24 | PASS | Labels entity_labels format. Mapping deferred. No parallel registry. |
| Đ32 | PASS | No real enactment. apr_ref=NULL. Simulation only. |
| Đ33 | PASS | No DDL/SQL/migration. Staging JSON only. |
| Đ35 | PASS | Checker = readiness checklist design only. |
| C1A/P5/P5b/P6 | PASS | Obeys invariants. Segmentation per C1A. Schema per P5. |
20. Final answers
Does P7 modify P5/P5b/P6? NO. If a gap is found, record it in the report and route it per §14.
Does P7 permit real migration? NOT YET. P7 = plan. Execution = post-P7 PASS.
Blockers before pilot execution? None, provided the User approves the P7 plan and approves dry-run execution. For staging-only simulation, the User may explicitly waive P6 checker implementation. All source paths must be verified from Agent Data immediately before dry-run execution.
Blockers before implementation: DDL approval, birth gate trigger impl, DOT setup, Đ24 entity_code verify.
Next steps after P7? (1) Run the pilot dry-run (execution, after User approval). (2) GPT+User review the report. (3) Go/no-go. (4) If GO → implementation design planning (DDL, triggers, DOT, migration scripts; design first, deploy later). (5) Pilot round 2 (Component/BOM).
GPT Patch log
R1 (15+3): P7=plan-only, path soften, unit count widen, pilot-v0, PROV-AI placeholder, Đ24 deferred, staging vocab proposal, measurable round-trip, checker approve/waive, component log, no-prod verify, artifact list, gap routing, APR simulated, source vs snapshot, doc_code proposal, excludes table, manual checkpoints.
R2 (6): P5 estimate/tolerance align, artifact count 11, simulated wording, pilot execution needs User approve, implementation design planning, source path verify note.
P7 v0.2 | OFFICIAL | S181 | 2026-04-26 | Opus 4.6 GPT: R1(15+3) + R2(6) + FINAL PASS | User: PASS