KB-6BA9

P7 — Pilot Migration Plan v0.2

Revision 1

P7 — Pilot Migration Plan v0.2 for Đ38 Text as Code

Type: Pilot migration plan for Article 38 (Đ38) Text as Code
Phase: P7 (Pilot Plan), dry-run workflow design
Status: OFFICIAL v0.2. GPT R1 (15+3) + R2 (6) + FINAL PASS. User PASS.
Drafted: 2026-04-26 | Session: S181
Drafting agent: Opus 4.6 (Desktop)
GPT review: R1 conditional PASS (15+3). R2 PASS with light conditions (6). Final PASS.
Inputs: P5 v0.2, P5b v0.2, P6 v0.2, C1–C3, C1A, LSL-01, L1–L5, Đ24, Đ32, Đ33, Đ35


1. Objective

Design a pilot migration plan for a small subset: the process for converting markdown/KB documents to the Text as Code model. This does NOT migrate real production data.

P7 only designs the dry-run workflow. Running the actual dry-run is a separate step after P7 PASSes and the User approves execution. P7 output = plan document, not data artifacts.

In plain terms: write the instructions for "putting 3 test crates on the conveyor belt", without loading them for real yet.


2. Scope / Out of scope

2.1 In scope

Plan design: pilot subset, source inventory, mapping rules, segmentation plan, dry-run workflow, verification, rollback/export, checker readiness checklist, PASS/FAIL, go/no-go, artifact list, gap routing.

2.2 Out of scope

  • No real migration / SQL / DDL / production schema
  • No real dry-run execution (execution = the step after P7 PASS)
  • No real checker/DOT runs
  • No changes to authoritative data
  • No real dry-run artifacts created
  • No edits to P5/P5b/P6/C1–C3/LSL/L1–L5
  • The full corpus does not enter the pilot
  • No SQL/migration/DOT calls

3. Pilot subset proposal

3.1 Three pilot documents (proposal, not final)

| # | File | Proposed doc_code | Pub_type | Complexity | Rationale |
|---|---|---|---|---|---|
| 1 | HOW-TO-READ.md | D38-HOWTO (pilot proposal) | design_note | Low | Index/routing. Few sections. Tests basic segmentation. |
| 2 | C1A-segmentation-operating-model.md | D38-C1A (pilot proposal) | design_note | High | Complex operating model: many section_types, tables, edge cases. |
| 3 | P5-schema-draft-v0-2.md | D38-P5 (pilot proposal) | design_note | Very high | Schema doc: pseudo-DDL code blocks, large tables, mixed content. |

The doc_code values are pilot proposals, not final. The final doc_code is fixed at implementation.

3.2 Pilot round 1 excludes

| Excluded | Reason |
|---|---|
| L1–L5 (legal supplements) | Different, legal-specific complexity; left for pilot round 2 |
| LSL-01 (34K chars) | Too long; raises round 1 risk |
| tham-khao/ files | Legacy document-centric files, unsuited to a unit-centric pilot |
| Component/BOM migration | Adds significant complexity; deferred to round 2 (§8) |
| Full corpus | Too risky for round 1 |

3.3 Gradient complexity

Low (HOWTO) → High (C1A) → Very high (P5). Issues surface in increasing order of difficulty. If HOWTO fails → stop early with little wasted effort.


4. Source inventory

4.1 HOW-TO-READ.md

| Attribute | Value |
|---|---|
| Path | knowledge/dev/laws/dieu38-trien-khai/HOW-TO-READ.md |
| Purpose | Guides the Agent to read the documents in the correct order |
| Estimated segments | ~5–8 logical units (initial estimate) |
| Section types | heading, paragraph, reference_mapping, appendix |
| Edge cases | Table of conflicting rules, interpretation notes (3 separate units) |

4.2 C1A-segmentation-operating-model.md

| Attribute | Value |
|---|---|
| Path | knowledge/dev/laws/dieu38-trien-khai/C1A-segmentation-operating-model.md |
| Purpose | Segmentation operations |
| Estimated segments | ~25–35 logical units (initial estimate) |
| Section types | heading, paragraph, definition, process, governance_process, reference_mapping, invariant_list, open_decision_list, appendix |
| Edge cases | §6 vocab table, §12 invariants, §13 ODs. Some sections may exceed the soft limit. |

4.3 P5-schema-draft-v0-2.md

| Attribute | Value |
|---|---|
| Path | Exact path to be verified before dry-run execution |
| Purpose | Schema draft Unit/Publication/Metadata |
| Estimated segments | ~30–60 logical units (initial estimate; PASS tolerance 20–70 per §16) |
| Section types | heading, paragraph, technical_spec, reference_mapping, invariant_list, open_decision_list, process, changelog |
| Edge cases | Pseudo-DDL code blocks (>20 lines each) = body of the parent unit (OD-P7-01). Large PASS criteria/invariant tables. Entity map ASCII art. Patch log = changelog. |

Note: All source paths must be verified from Agent Data immediately before dry-run execution.


5. Migration target model

5.1 Target concepts (staging notation only)

The pilot produces staging notation (JSON) for each file, mapped to P5/P5b concepts. It does NOT write to PG, does NOT create real UUIDs, and does NOT INSERT.

| Concept | Staging output |
|---|---|
| logical_unit | {canonical_address, parent_address, sort_order, section_type, title, owner} |
| unit_version | {logical_unit_ref, version=1, title, body, description, lifecycle='draft', length_flag, content_hash_placeholder} |
| publication | {doc_code (pilot proposal), version='pilot-v0', pub_type, lifecycle='proposed', publication_profile: {kb_path, source_hash}} |
| publication_member | {canonical_address, render_order} |
| entity_labels (Đ24) | {canonical_address (proposed staging key, NOT confirmed entity_code format), facet, value}; mapping format deferred per OD-P5-01/OD-P5b-07 |
| section_type_vocab | Staging vocab proposal from C1A §6 candidates. NOT production seed, NOT Đ24 registry |
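
As an illustration of the staging notation above, the sketch below builds the logical_unit and unit_version dictionaries for one unit and serializes them to JSON. The function name `make_staging_unit`, the `owner` value, and the sample title and body are hypothetical; only the field names come from the table.

```python
import json


def make_staging_unit(address, parent, sort_order, section_type, title, body, owner="pilot"):
    """Build staging-only dicts for one logical_unit and its first unit_version."""
    logical_unit = {
        "canonical_address": address,
        "parent_address": parent,
        "sort_order": sort_order,
        "section_type": section_type,
        "title": title,
        "owner": owner,  # hypothetical owner value
    }
    unit_version = {
        "logical_unit_ref": address,  # address string, never a real UUID
        "version": 1,
        "title": title,
        "body": body,
        "description": None,  # filled in by a later staging step
        "lifecycle": "draft",
        "length_flag": None,
        "content_hash_placeholder": None,
    }
    return {"logical_unit": logical_unit, "unit_version": unit_version}


staged = make_staging_unit(
    "D38-C1A-S4-P1", None, 1, "principle",
    "Sample unit title", "Sample body text.",  # hypothetical content
)
staging_json = json.dumps(staged, ensure_ascii=False, indent=2)
```

The output is a plain file payload; nothing in the sketch touches PG or allocates identifiers.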

5.2 Publication version

publication.version = pilot-v0 (pilot label). Source file KB revision/hash → publication_profile.source_revision / publication_profile.source_hash. Do NOT use the KB revision as publication.version.

5.3 Provenance

provenance = PROV-AI (staging placeholder). The real Đ24 FAC-PROV registry has not been verified yet. Mark this explicitly as a placeholder; the registry must be verified before implementation.

5.4 Source vs Snapshot

Source documents = OFFICIAL and authoritative until real migration approval. Pilot snapshot = read-only copy for the dry-run. The snapshot does NOT replace source authority.


6. Segmentation plan

6.1 Rules (C1A SR-1→SR-7)

| Rule | Pilot application |
|---|---|
| SR-1 | Clear title + edited independently → 1 unit |
| SR-2 | No title → body of the parent |
| SR-3 | Editing A forces an edit to B → merge |
| SR-4 | <50 words, no authority → merge |
| SR-5 | Split by meaning, not by token/character count |
| SR-6 | Title describes the idea, not "§5 Part A" |
| SR-7 | 1 canonical parent |

6.2 Edge cases

| Case | Rule | Handling |
|---|---|---|
| Code block (DDL) | OD-P7-01 | Body of the parent unit. Split out only if it is referenced from ≥2 places or has its own lifecycle. |
| Table/matrix | SR-5 | Small (<500 words) = body. Large with its own title = separate unit. |
| Heading-only | P5/P6 | Navigation heading = structural node (body=NULL). |
| Invariant list | C1A §6 | section_type=invariant_list. Separate unit. |
| Open decision list | C1A §6 | section_type=open_decision_list. Separate unit. |
| Changelog/patch log | C1A §5.7 | section_type=changelog. Separate unit. |
| Interpretation notes | HOW-TO-READ specific | Each note = separate unit (IN-1, IN-2, IN-3). |

6.3 Canonical address format (pilot proposal)

{DOC_CODE}-S{section}-P{paragraph}[-{sub}]. Example: D38-C1A-S4-P1 (§4 SR-1).
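
A minimal sketch of how this address format could be built and validated. It assumes that section, paragraph, and sub are numeric and that doc_code consists of uppercase letters, digits, and hyphens; the plan does not fix these details, and the names `build_address` and `parse_address` are hypothetical.

```python
import re

# Assumed grammar: DOC_CODE is uppercase alphanumerics joined by hyphens;
# section, paragraph, and the optional sub suffix are integers.
ADDRESS_RE = re.compile(
    r"^(?P<doc_code>[A-Z0-9]+(?:-[A-Z0-9]+)*?)"
    r"-S(?P<section>\d+)-P(?P<paragraph>\d+)(?:-(?P<sub>\d+))?$"
)


def build_address(doc_code, section, paragraph, sub=None):
    """Compose a pilot canonical address from its parts."""
    address = f"{doc_code}-S{section}-P{paragraph}"
    return f"{address}-{sub}" if sub is not None else address


def parse_address(address):
    """Return the address parts as a dict of strings, or None if it does not match."""
    match = ADDRESS_RE.match(address)
    return match.groupdict() if match else None


example = parse_address(build_address("D38-C1A", 4, 1))
```

A parser like this lets later steps derive the doc_code of a unit from its address instead of storing it twice.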

6.4 Manual review checkpoint

After the segmentation proposal (dry-run step 3) → mandatory human review before generating staging rows. The Agent proposes; User/GPT approve the segmentation tree.


7. Mapping rules

7.1 Document → Publication

1 markdown file → 1 proposed publication (pilot-v0). Source path/hash → publication_profile.

7.2 Heading/section → logical_unit

## Title → logical_unit. ### Sub → child unit. Body text under heading → unit_version.body. No heading paragraph → body of parent.
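
The heading-to-unit mapping can be sketched as a small parser. This is an illustration only: it assumes ATX-style headings and triple-backtick fences, ignores text before the first heading, and uses a hypothetical function name `parse_units`. It does not apply the SR rules, which remain a reviewed, manual decision.

```python
import re

FENCE = "`" * 3  # code fence marker
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*?)\s*$")


def parse_units(markdown):
    """Return a flat list of units; 'parent' is the index of the parent unit or None."""
    units = []
    stack = []  # indices of the headings that are currently open
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        match = None if in_fence else HEADING_RE.match(line)
        if match:
            level = len(match.group(1))
            # Close headings at the same or a deeper level.
            while stack and units[stack[-1]]["level"] >= level:
                stack.pop()
            units.append({
                "level": level,
                "title": match.group(2),
                "body_lines": [],
                "parent": stack[-1] if stack else None,
            })
            stack.append(len(units) - 1)
        elif units:
            # Text without its own heading stays in the body of the nearest heading.
            units[-1]["body_lines"].append(line)
    for unit in units:
        unit["body"] = "\n".join(unit.pop("body_lines")).strip()
    return units


sample = "\n".join([
    "## A", "intro", "### B", "text",
    FENCE, "# not a heading", FENCE,
    "## C",
])
parsed = parse_units(sample)
```

Lines inside a fenced block are kept verbatim in the body, so a `#` comment inside pseudo-DDL is not mistaken for a heading.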

7.3 unit_version fields

| Field | Source |
|---|---|
| title | Heading text (cleaned) |
| body | Content between headings |
| description | First sentence (staging heuristic per OD-P7-07; refine after pilot) |
| lifecycle | 'draft' |
| version_number | 1 |
| content_hash | SHA-256(title+body+desc) placeholder |
| length_flag | Computed from word count |
| provenance | 'PROV-AI' (staging placeholder, verify Đ24 FAC-PROV before impl) |
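
The derived fields could be computed roughly as follows. This is a sketch under stated assumptions: the 1500-word hard limit comes from the verification checklist, but the soft limit value, the flag names, and the newline separator used when hashing are hypothetical choices, not part of the plan.

```python
import hashlib
import re

HARD_LIMIT_WORDS = 1500  # from the verification checklist (>1500 words needs a decision)
SOFT_LIMIT_WORDS = 800   # hypothetical value; the plan does not state the soft limit


def first_sentence(body):
    """First-sentence heuristic (OD-P7-07): cut at the first ., ! or ? followed by whitespace."""
    text = " ".join(body.split())
    return re.split(r"(?<=[.!?])\s+", text, maxsplit=1)[0] if text else ""


def derive_fields(title, body):
    """Compute description, length_flag, and a placeholder content hash for one unit."""
    description = first_sentence(body)
    words = len(body.split())
    if words > HARD_LIMIT_WORDS:
        flag = "hard_exceeded"  # hypothetical flag names
    elif words > SOFT_LIMIT_WORDS:
        flag = "soft_exceeded"
    else:
        flag = "ok"
    # Assumption: fields are joined with newlines before hashing.
    payload = "\n".join([title, body, description]).encode("utf-8")
    return {
        "description": description,
        "length_flag": flag,
        "content_hash": hashlib.sha256(payload).hexdigest(),
    }


fields = derive_fields("Sample title", "First sentence. Second one.")
```

The heuristic will mis-cut on abbreviations such as "e.g."; the plan already accepts that as a WARN to refine after the pilot.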

7.4 section_type assignment

| Pattern | Type |
|---|---|
| Root/major heading | heading |
| Narrative | paragraph |
| Definition block | definition |
| Rule/principle | principle |
| Process/steps | process |
| DDL/schema | technical_spec |
| Governance process | governance_process |
| Crosswalk table | reference_mapping |
| Multi-dim table | matrix |
| Invariants table | invariant_list |
| OD table | open_decision_list |
| Changelog | changelog |
| Appendix | appendix |

section_type_vocab seed from C1A §6 = staging vocab proposal. NOT production seed. NOT Đ24 registry. NOT parallel label registry. section_type = structural metadata SoT, separate from entity_labels.

7.5 Labels through Đ24

| Facet | Value | Scope |
|---|---|---|
| doc | D38 | All pilot units |
| topic | segmentation/schema/guide | Per document |
| layer | design | All |

Đ24 entity_code mapping deferred (OD-P5-01). canonical_address = proposed staging key, not confirmed entity_code format. Labels in entity_labels format only, no parallel registry.

7.6 publication_member

Proposed publication → draft unit_versions. render_order = sequential. Pilot does NOT enact publication.

7.7 Change-set / APR (staging simulation only)

Pilot staging may include simulated change_set (draft, apr_ref=NULL). ALL enactment = simulation only. No real APR, no real enacted status. Any "enacted simulation" clearly marked SIMULATED in staging output.


8. Component/BOM pilot scope

8.1 Option 1: do NOT migrate Component/BOM in round 1

3 pilot documents = design notes, no deployable components. Component/BOM adds complexity. Defer to round 2.

8.2 Future component candidate log

If the pilot finds component references → record them in a log:

| Field | Description |
|---|---|
| source_unit | canonical_address of unit containing reference |
| phrase | Text referencing component ("guard", "template", "pattern") |
| candidate_type | guard / template / pattern / function |
| reason | Why this looks like a component reference |
| decision | defer_to_round2 |

Log = JSON array, attached to pilot report. No migration action.
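
One way such a log could be produced is a coarse keyword scan over staged unit bodies. The keyword matching, the `find_candidates` name, and the sample unit are assumptions for illustration; a real pass would likely need human judgment to fill in `reason`.

```python
import json
import re

CANDIDATE_TYPES = ("guard", "template", "pattern", "function")


def find_candidates(units):
    """Return log entries for units whose body mentions a candidate component keyword."""
    log = []
    for unit in units:
        for candidate_type in CANDIDATE_TYPES:
            match = re.search(rf"\b{candidate_type}s?\b", unit["body"], flags=re.IGNORECASE)
            if match:
                log.append({
                    "source_unit": unit["canonical_address"],
                    "phrase": match.group(0),
                    "candidate_type": candidate_type,
                    "reason": f"keyword '{candidate_type}' found in body",
                    "decision": "defer_to_round2",
                })
    return log


sample_units = [  # hypothetical staged unit
    {"canonical_address": "D38-C1A-S4-P1", "body": "Apply the birth-gate guard before write."},
]
candidate_log = find_candidates(sample_units)
candidate_log_json = json.dumps(candidate_log, indent=2)
```

Every entry carries `defer_to_round2`, so the log records findings without triggering any migration action.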

8.3 Pilot round 2

After round 1 PASS → pilot round 2: 1–2 component-heavy docs (SOP/operational).


9. Checker readiness

9.1 Readiness checklist (design only)

P7 lists the P6 checkers that need to be enabled. Execution = a separate step that requires User approval, or an explicit User waiver for staging-only simulation.

Birth gate candidate (unit slice): BG-LU-01→06, BG-UV-01→06. Component BG-COMP/BG-BOM = SKIP round 1.

Pre-enactment candidate: PE-PUB-01→06. Pilot = proposed publication, pre-enact simulated only.

Daily ERROR smoke (simulated): DOT-LU-01→04, DOT-UV-01→03, DOT-PUB-01→03. Simulated on staging JSON.

9.2 Simulated checker (staging-only)

Simulated checks on staging JSON output:

  • canonical_address unique
  • parent exists + same doc_code
  • section_type in staging vocab proposal
  • No duplicate enacted (trivially passes: all units are draft)
  • length_flag computed
  • content_hash computed

Output = checker report JSON. No PG writes.
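
The simulated checks listed above can be sketched as a single pass over the staging rows. The row layout, error labels, and function names are hypothetical; the vocabulary is only the set of types listed in §7.4, not the full C1A §6 candidate list, and doc_code is derived from the address under the §6.3 format.

```python
from collections import Counter

# Types listed in the §7.4 table; a partial stand-in for the staging vocab proposal.
STAGING_VOCAB = {
    "heading", "paragraph", "definition", "principle", "process",
    "technical_spec", "governance_process", "reference_mapping", "matrix",
    "invariant_list", "open_decision_list", "changelog", "appendix",
}


def doc_code_of(address):
    """Assumes the {DOC_CODE}-S{section}-P{paragraph} address format."""
    return address.rsplit("-S", 1)[0]


def run_simulated_checks(rows):
    """Run the staging-only checks and return a report dict; performs no writes."""
    errors = []
    addresses = [row["canonical_address"] for row in rows]
    known = set(addresses)
    for address, count in Counter(addresses).items():
        if count > 1:
            errors.append(("duplicate_address", address))
    enacted_seen = set()
    for row in rows:
        address = row["canonical_address"]
        parent = row.get("parent_address")
        if parent is not None:
            if parent not in known:
                errors.append(("orphan", address))
            elif doc_code_of(parent) != doc_code_of(address):
                errors.append(("cross_doc_parent", address))
        if row["section_type"] not in STAGING_VOCAB:
            errors.append(("unknown_section_type", address))
        if row.get("lifecycle") == "enacted":
            if address in enacted_seen:
                errors.append(("duplicate_enacted", address))
            enacted_seen.add(address)
        if row.get("length_flag") is None:
            errors.append(("length_flag_missing", address))
        if not row.get("content_hash"):
            errors.append(("content_hash_missing", address))
    return {"passed": not errors, "errors": errors}
```

The returned dict can be dumped to JSON as the checker report artifact.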


10. Dry-run workflow (plan design)

10.1 Nine steps (plan — execution is post-P7)

| Step | Description | Input | Output | Production write? |
|---|---|---|---|---|
| 1 | Export source snapshot (read-only copy + SHA-256) | Pilot markdown files | Source snapshot | NO |
| 2 | Parse headings/blocks | Snapshot | Parsed tree JSON | NO |
| 3 | Propose segmentation + MANDATORY REVIEW | Parsed tree + C1A rules | Segmentation proposal → User/GPT approve | NO |
| 4 | Generate staging rows | Approved segmentation | Staging rows JSON | NO |
| 5 | Generate publication + membership | Staging rows | Publication JSON | NO |
| 6 | Generate Đ24 label mapping proposal | Staging rows | Label mapping JSON | NO |
| 7 | Run simulated checkers | Staging JSON | Checker report JSON | NO |
| 8 | Round-trip export + MANDATORY REVIEW | Staging rows | Export markdown + comparison report | NO |
| 9 | Compile pilot report → go/no-go | All outputs | Pilot review report | NO |
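
Step 1 could look roughly like the sketch below: copy each source file into a sandbox directory and record its SHA-256 in a manifest. The function name, manifest layout, and the `manifest.json` filename are assumptions; enforcing read-only access is left to the sandbox's file permissions.

```python
import hashlib
import json
import shutil
from pathlib import Path


def sha256_file(path):
    """Hex SHA-256 of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def export_snapshot(source_paths, snapshot_dir):
    """Copy sources into snapshot_dir and write a manifest of paths and hashes."""
    snapshot_dir = Path(snapshot_dir)
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    manifest = []
    for src in map(Path, source_paths):
        dest = snapshot_dir / src.name
        shutil.copyfile(src, dest)  # the source file itself is only read
        manifest.append({
            "source_path": str(src),
            "snapshot_path": str(dest),
            "sha256": sha256_file(src),
        })
    manifest_path = snapshot_dir / "manifest.json"  # hypothetical filename
    manifest_path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest
```

Later steps would read only from the snapshot, while the recorded hashes give a baseline for confirming that the sources stayed unchanged.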

10.2 Manual review checkpoints

Checkpoint A (after step 3): User/GPT review segmentation proposal before generating staging rows. Checkpoint B (after step 8): User/GPT review round-trip export quality before go/no-go.

10.3 Actors

| Actor | Steps |
|---|---|
| Agent (Claude CLI/Codex) | 1–2, 4–7 (execute post-P7) |
| Desktop (Opus) | 3 (propose), 8–9 (compile) |
| GPT | Review at checkpoints A, B, 9 |
| User (Huyên) | Approve checkpoints, go/no-go |

11. Verification workflow

11.1 Checklist

| # | Check | Threshold | Severity |
|---|---|---|---|
| 1 | Unit count plausible | HOWTO: 3–12, C1A: 15–50, P5: 20–70 | WARN if outside |
| 2 | No orphan unit | 0 | ERROR |
| 3 | No duplicate canonical_address | 0 | ERROR |
| 4 | Parent/sort_order tree valid | All parents exist + same doc_code | ERROR |
| 5 | No publication inline content | Pub only membership refs | ERROR |
| 6 | No draft in enacted pub | Pilot = proposed → trivially PASS | PASS |
| 7 | Section_type in staging vocab proposal | All valid | ERROR |
| 8 | Labels via Đ24 format only | No parallel registry | ERROR |
| 9 | Hard-limit decision exists | >1500 words → decision | WARN |
| 10 | Code blocks byte-preserved | DDL blocks intact | ERROR |
| 11 | Tables row-count preserved | Same row count | ERROR |
| 12 | Round-trip: no heading/body loss | All headings + body blocks present | ERROR |
| 13 | Round-trip: block count delta ≤ threshold | Delta ≤ 5% of source blocks | WARN |
| 14 | All pilot artifacts present (per §12) | 11/11 | ERROR |
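
Checks 10, 12, and 13 lend themselves to a mechanical comparison between the source and the round-trip export. The sketch below assumes a "block" is a run of text separated by blank lines and that code blocks use triple-backtick fences; both are working assumptions, and the function names are hypothetical.

```python
import re

FENCE = "`" * 3  # code fence marker
FENCE_RE = re.compile(rf"^{FENCE}.*?^{FENCE}[ \t]*$", re.MULTILINE | re.DOTALL)


def headings(markdown):
    """Heading lines outside fenced code blocks."""
    stripped = FENCE_RE.sub("", markdown)
    return [line.strip() for line in stripped.splitlines() if re.match(r"^#{1,6}\s", line)]


def code_blocks(markdown):
    """Fenced code blocks, returned verbatim."""
    return FENCE_RE.findall(markdown)


def blocks(markdown):
    """Blank-line separated blocks (assumed definition of a 'block')."""
    return [b for b in re.split(r"\n\s*\n", markdown.strip()) if b.strip()]


def compare_round_trip(source_md, export_md, max_delta=0.05):
    """Return a list of (severity, check, detail) findings; empty means clean."""
    findings = []
    exported = set(headings(export_md))
    missing = [h for h in headings(source_md) if h not in exported]
    if missing:
        findings.append(("ERROR", "heading_loss", missing))
    if code_blocks(source_md) != code_blocks(export_md):
        findings.append(("ERROR", "code_block_changed", None))
    source_count, export_count = len(blocks(source_md)), len(blocks(export_md))
    delta = abs(source_count - export_count) / source_count if source_count else 0.0
    if delta > max_delta:
        findings.append(("WARN", "block_delta", round(delta, 3)))
    return findings
```

Comparing code blocks as exact strings is what makes the check byte-level: any whitespace change inside a DDL block surfaces as an ERROR.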

11.2 No production change verification

| Mechanism | Description |
|---|---|
| Artifacts only in staging/sandbox directory | No Directus/PG mutation calls |
| Source file hashes unchanged | Compare pre/post hashes |
| Action log attached | Every dry-run step logged with timestamps |
| No SQL/DDL/INSERT/UPDATE/DELETE | Verify agent did not execute PG commands |
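
Two of these mechanisms can be sketched in a few lines: re-hashing the sources against a pre-run manifest, and scanning the action log for SQL verbs. The manifest and log layouts (`source_path`, `sha256`, `command`) are hypothetical, and the keyword scan is a coarse filter that can raise false positives, so hits would still need a human look.

```python
import hashlib
import re
from pathlib import Path

SQL_RE = re.compile(r"\b(INSERT|UPDATE|DELETE|CREATE|ALTER|DROP)\b", re.IGNORECASE)


def changed_sources(manifest):
    """Return source paths whose current SHA-256 differs from the pre-run manifest."""
    changed = []
    for entry in manifest:
        path = Path(entry["source_path"])
        current = hashlib.sha256(path.read_bytes()).hexdigest() if path.exists() else None
        if current != entry["sha256"]:
            changed.append(entry["source_path"])
    return changed


def flag_sql_commands(action_log):
    """Return action-log entries whose command text contains an SQL verb."""
    return [entry for entry in action_log if SQL_RE.search(entry["command"])]
```

An empty result from both functions supports, but does not by itself prove, that no production change occurred.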

12. Pilot artifacts to produce (at dry-run execution)

| # | Artifact | Format | Created at step |
|---|---|---|---|
| 1 | Source snapshot manifest (paths + SHA-256) | JSON | 1 |
| 2 | Parsed heading/block tree | JSON | 2 |
| 3 | Segmentation proposal (approved) | JSON | 3 |
| 4 | Staging rows (logical_unit + unit_version) | JSON | 4 |
| 5 | Publication + membership | JSON | 5 |
| 6 | Đ24 label mapping proposal | JSON | 6 |
| 7 | Simulated checker report | JSON | 7 |
| 8 | Round-trip export markdown | Markdown | 8 |
| 9 | Round-trip comparison report | JSON/Markdown | 8 |
| 10 | Future component candidate log | JSON | 3–4 (if references found) |
| 11 | Pilot review report | Markdown | 9 |

13. Rollback / Export

13.1 Rollback package (11 artifacts above)

All staging artifacts = read-only files in sandbox. Rollback = delete staging files. Source documents unchanged (§5.4).

13.2 No production side effects

Pilot KHÔNG: ghi PG, tạo schema/trigger, modify KB documents, change entity_labels, create real publication/unit records.


14. Gap routing

14.1 If the pilot finds a gap in P5/P5b/P6

| Severity | Action |
|---|---|
| High (breaks invariant or makes model unworkable) | STOP pilot. Open an amendment task for P5/P5b/P6. Re-pilot after fix. |
| Low (cosmetic, wording, minor OD clarification) | Continue pilot with note. Fix after pilot in amendment round. |

Record each gap in the pilot report with severity + affected document + proposed fix.


15. Risk register

| # | Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|---|
| 1 | Over-segmentation | Medium | Unit noise | SR-4. Compare S179 counts. |
| 2 | Under-segmentation | Low | Lost granularity | SR-1. Length flag. |
| 3 | Code block split | Low | Break DDL | OD-P7-01: body default. Verify #10. |
| 4 | Address collision | Very Low | Identity break | Simulated BG-LU-01. |
| 5 | Section_type mismatch | Medium | Misclassification | Manual review checkpoint A. |
| 6 | Description heuristic poor | Medium | Bad metadata | Accept WARN. Refine. |
| 7 | Round-trip loss | Medium | Data integrity | Verification #10–13. |
| 8 | Scope creep | Low | Delay | Strict 3-doc limit. No component round 1. |
| 9 | Agent writes PG | Very Low | Production corruption | No-production-change verification §11.2. |

16. PASS/FAIL criteria

16.1 PASS

| # | Criterion | Threshold |
|---|---|---|
| 1 | All 3 documents segmented | 3/3 |
| 2 | No ERROR in verification | 0 |
| 3 | Unit counts within estimate range | Within bounds |
| 4 | Code blocks/tables preserved | 100% |
| 5 | Simulated BG-LU/UV checks pass on staging JSON | All pass |
| 6 | Round-trip: no heading/body loss, block delta ≤5% | Measured |
| 7 | All pilot artifacts present (per §12) | 11/11 |
| 8 | No production data changed | Verified §11.2 |

16.2 PASS with minor notes

WARNs OK: section_type refinement, description quality, minor formatting diffs, length calibration.

16.3 FAIL/block

Any verification ERROR, >3 misclassified types, round-trip loses content, production changed.

16.4 Abort

Source modified during pilot. Agent writes PG. Scope exceeds 3 docs.
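
The outcome rules in §16 can be read as a small decision function. The sketch below models only a subset of the criteria, with hypothetical parameter names and result labels; the real verdict still goes through GPT and User review.

```python
def evaluate_pilot(error_count, warn_count, misclassified_types,
                   docs_segmented, artifacts_present, production_changed,
                   source_modified=False, agent_wrote_pg=False, docs_in_scope=3):
    """Map pilot measurements to ABORT, FAIL, PASS_WITH_NOTES, or PASS."""
    # §16.4: abort conditions take priority over everything else.
    if source_modified or agent_wrote_pg or docs_in_scope > 3:
        return "ABORT"
    # §16.3: any verification ERROR, >3 misclassified types, or a production change.
    if error_count > 0 or misclassified_types > 3 or production_changed:
        return "FAIL"
    # §16.1: all 3 documents segmented and 11/11 artifacts present.
    if docs_segmented < 3 or artifacts_present < 11:
        return "FAIL"
    # §16.2: WARNs are acceptable but recorded as notes.
    return "PASS_WITH_NOTES" if warn_count > 0 else "PASS"


verdict = evaluate_pilot(
    error_count=0, warn_count=2, misclassified_types=1,
    docs_segmented=3, artifacts_present=11, production_changed=False,
)
```

Checking the abort conditions first mirrors the plan's intent that a modified source or a PG write invalidates the run regardless of other results.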


17. Go/No-go

Go ≠ migrate production. Go = permission to begin implementation design planning (DDL design, write path design, birth gate trigger design, DOT scheduling design). Go ≠ permission to deploy or migrate production data. Production migration needs separate approval gate.

| Decision | Condition |
|---|---|
| GO | Pilot PASS + GPT PASS + User approve |
| GO with conditions | PASS + minor notes → address before impl |
| NO-GO | FAIL → fix → re-pilot |
| ABORT | Fundamental model issue → review P5/P5b/C1A |

18. Open decisions

| Code | Question | Proposal | Phase |
|---|---|---|---|
| OD-P7-01 | Code block = body default or unit? | Body default. Split out if referenced from ≥2 places or it has its own lifecycle. | Pilot |
| OD-P7-02 | Deep nesting address | -P1-1 suffix | Pilot |
| OD-P7-03 | Staging format | JSON | Pilot |
| OD-P7-04 | Round-trip format | Markdown | Pilot |
| OD-P7-05 | Component/BOM round 2 scope | After round 1 PASS | Post-pilot |
| OD-P7-06 | section_type vocab seed scope | All 17 C1A candidates as staging proposal | Pilot |
| OD-P7-07 | Description auto-extract method | First-sentence heuristic | Pilot |

19. Constitutional check

| Law | Verdict | Notes |
|---|---|---|
| NT1/NT13 | PASS | Pilot does NOT assert production SoT. Staging only. |
| NT2 | PASS | Simulated checkers machine-checkable. |
| NT4 | PASS | Vocab/checker = config data, staging proposal. |
| NT8 | PASS | Component/BOM respected, deferred round 2. |
| NT11 | PASS | No duplicate registry. Staging vocab ≠ Đ24. |
| Đ24 | PASS | Labels entity_labels format. Mapping deferred. No parallel registry. |
| Đ32 | PASS | No real enactment. apr_ref=NULL. Simulation only. |
| Đ33 | PASS | No DDL/SQL/migration. Staging JSON only. |
| Đ35 | PASS | Checker = readiness checklist design only. |
| C1A/P5/P5b/P6 | PASS | Obeys invariants. Segmentation per C1A. Schema per P5. |

20. Final answers

Does P7 modify P5/P5b/P6? NO. If a gap is found → record it in the report and route it per §14.

Does P7 permit real migration? NOT YET. P7 = plan. Execution = post-P7 PASS.

Blockers before pilot execution? None, provided the User approves the P7 plan and approves dry-run execution. For a staging-only simulation, the User may explicitly waive P6 checker implementation. All source paths must be verified from Agent Data immediately before dry-run execution.

Blocker before implementation: DDL approval, birth gate trigger impl, DOT setup, Đ24 entity_code verify.

Steps after P7? (1) Run the pilot dry-run (execution, after User approval). (2) GPT+User review the report. (3) Go/no-go. (4) If GO → implementation design planning (DDL, triggers, DOT, migration scripts; design first, deploy later). (5) Pilot round 2 (Component/BOM).


GPT Patch log

R1 (15+3): P7=plan-only, path soften, unit count widen, pilot-v0, PROV-AI placeholder, Đ24 deferred, staging vocab proposal, measurable round-trip, checker approve/waive, component log, no-prod verify, artifact list, gap routing, APR simulated, source vs snapshot, doc_code proposal, excludes table, manual checkpoints.

R2 (6): P5 estimate/tolerance align, artifact count 11, simulated wording, pilot execution needs User approve, implementation design planning, source path verify note.


P7 v0.2 | OFFICIAL | S181 | 2026-04-26 | Opus 4.6 | GPT: R1(15+3) + R2(6) + FINAL PASS | User: PASS