IU-0 — Information Unit Minimum Standard & Packaging Plan (OUTLINE-E.1)
Status: OUTLINE-E.1 (uploaded) | Session: S190+ (2026-05-02)
Path: knowledge/dev/laws/dieu44-trien-khai/design/07-iu0-information-unit-minimum-standard-outline.md
Depends on: P38-XC final + P44-3/4A/5A + P11A/C/D/E + P5 v0.2 + P6 v0.2 + VRC Report rev 1 (PASS)
History: Outline A (file placement) → B (vector architecture) → C (VRC evidence) → D (IU-VP-6 delay) → E (CRUD contract + freshness + outbox) → E.1 (polish: VRC PASS, numbering, defaults finalized)
Track A Legacy Stabilization v3.1: kept separate at knowledge/dev/laws/dieu38-trien-khai/reports/legacy-vector-stabilization-outline-v3.md
§0. Five-line core summary
- IU-0 fixes one common Incomex schema for information_unit, with no separate TAC/Đ38 schema. It inherits P38-XC UMC + P44-3/4A/5A unchanged and does NOT re-design them.
- Packaging has 9 layers (identity/version/content/profile/edges/topic/checker/vector/protection), plus a MUP for the pilot.
- Vector = two systems in separate layers (VRC evidence-based): the Legacy KB Vector is kept 100% as is. The IU Vector runs in parallel (Phase 0 mapping → Phase 1 separate IU collection → Phase 2 search adapter + dedup → Phase 3 unit-aware chunking). Duplicate content across KB+IU = CRITICAL risk.
- PG CRUD Hot Path Contract (HP-1..15 + IU-HP-1..5): hot path = PG local only; cold path = async outbox worker with a 120s delay. Do-not-break Vector Guardrails VG-1..9.
- Slice-based editing + pilot dogfooding on Đ44/P38/P11/IU-0. NO DDL/code/P44-6. Track A Legacy Stabilization is kept separate.
§0.A File placement convention
| Document type | Path (verified) |
|---|---|
| Đ44 design docs | knowledge/dev/laws/dieu44-trien-khai/design/01-..06- |
| Đ38/P11 closure | knowledge/dev/laws/dieu38-trien-khai/closure/ |
| Reports | knowledge/dev/laws/dieu38-trien-khai/reports/ |
| Reviews | knowledge/dev/laws/dieu{38,44}-trien-khai/reviews/ |
| Vector reports | knowledge/current-state/reports/ |
| IU-0 | knowledge/dev/laws/dieu44-trien-khai/design/07-iu0-... |
| Legacy Stabilization | knowledge/dev/laws/dieu38-trien-khai/reports/legacy-vector-stabilization-outline-v3.md |
Verify: knowledge/dev/foundation/ = 0 documents (not active). Do not use.
§1. Context and why IU-0 is needed
§1.1 User direction, S190+
- The information-unit schema is a common standard for every Incomex object, not specific to TAC.
- Strictly avoid ending up with two schema systems.
- Put it into use early: the docs are long and hard for Agents to process.
- Vectors follow unit boundaries; they do not chunk on their own.
- The current vector system stays unchanged; the IU vector runs in parallel.
- PG CRUD must be fast; vector only serves search over the long term.
§1.2 IU-0 fills 6 gaps: UMC universal / packaging / vector contract / protection / slice editing / pilot dogfooding.
§1.3 What IU-0 does NOT do (10 guardrails)
- NO re-design of UMC/Capability/Profile/Edge/DOT.
- NO DDL/code/migration.
- NO executing queries on production.
- NO opening P44-6.
- NO editing documents that are already uploaded.
- NO finalizing the Qdrant implementation.
- NO claiming production-ready.
- NO assigning real DOT names.
- NO hardcoding threshold/schedule/vocab.
- NO uploading the outline (already uploaded as E.1).
§2. Strategic decision
information_unit is the universal information substrate for every Incomex object. There is no second schema system.
§2.1 14 illustrative unit_kind values
law_unit, workflow_step, sop_step, knowledge_atom, claim, decision, task_instruction, rule, facet_note, config_doc, binding_doc, design_doc_section, proof_note, review_comment.
§2.2 Anti-pattern
- ❌ Creating a separate schema for a single use case.
- ❌ Spawning a new family when a subtype is enough.
- ❌ Bypassing UMC fields.
§3. Universal Minimum Core, inherited from P38-XC
The 10 UMC elements (U1-U10) are inherited unchanged from P38-XC §5.2. IU-0 does NOT re-design them.
5 layers: UMC core → Profile JSONB → Edges (universal_edges) → Checker state (DOT) → Vector state (projection).
§4. Capability/profile extension, inherited from P38-XC §3 + P44-3
10 capabilities, Cap-1..10. The profile extension model goes through APR. IU-0 does NOT re-design it.
§5. Packaging model: 9 layers
| # | Layer | SoT/Owner |
|---|---|---|
| 1 | Identity (U1-U3, U7, U8) | Core columns; Cap-1 |
| 2 | Version (U6 + version chain) | Per-kind table; Cap-2 |
| 3 | Content (U5 + body + content_hash) | Per-kind body; Cap-2 |
| 4 | Profile metadata (3 JSONB profiles) | JSONB columns; Cap-1/2/3 |
| 5 | Edges (universal_edges) | universal_edges SSOT; Cap-4/5 |
| 6 | Topic labels (content_profile.topic_labels[]) | content_profile; Cap-6 |
| 7 | Checker state (conformance + drift + issues) | DOT state; Cap-9 |
| 8 | Vector sync state | Per-kind vector columns + Qdrant; Cap-10 |
| 9 | Protection/lock (enacted immutability + apr_ref) | Lifecycle metadata; Cap-8 |
MUP (Minimum Usable Package)
- MUP Core (block birth): Identity + Version + Content.
- MUP Tier 1 (recommended): Profile Tier 0 required + parent_or_container_ref.
- MUP Tier 2+ (defer pilot): Edges + topics + checker + vector + protection.
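The tier rules above can be read as a simple classification. The following is a minimal sketch under stated assumptions: the dict-of-layers representation, the `mup_tier` function, and the `tier0_complete` flag are hypothetical names used only for illustration, since IU-0 defines no code.

```python
from typing import Optional

# Layer names follow the 9-layer packaging table; representing a package
# as a dict of layers is an assumption made for this sketch.
MUP_CORE = ("identity", "version", "content")


def mup_tier(package: dict) -> Optional[str]:
    """Return the MUP tier a package reaches, or None if birth is blocked."""
    # MUP Core: a missing or empty core layer blocks birth.
    if not all(package.get(layer) for layer in MUP_CORE):
        return None
    profile = package.get("profile") or {}
    # Tier 1 (recommended): Tier 0 profile fields plus a parent/container ref.
    # "tier0_complete" is a hypothetical flag standing in for profile validation.
    if profile.get("tier0_complete") and profile.get("parent_or_container_ref"):
        return "tier1"
    return "core"
```

Tier 2+ is deliberately not modelled here because it is deferred for the pilot.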
§5.A IU-PG CRUD Performance Contract
§5.A.1 Inherits 15 rules HP-1..15
From Track A Legacy Stabilization v3.1 §0.1. Every rule applies to both the legacy KB AND the IU path.
§5.A.2 IU-specific CRUD rules
| # | Rule |
|---|---|
| IU-HP-1 | IU CRUD hot path = PG local only. Create unit → PG INSERT + outbox UPSERT → return immediately. |
| IU-HP-2 | Slice edit hot path = read slice + write slice + outbox. All PG local. |
| IU-HP-3 | Profile JSONB update = PG jsonb_set/merge. Validation local. |
| IU-HP-4 | Edge CRUD = PG local. Reverse-index rebuild is deferred to the outbox if needed. |
| IU-HP-5 | Birth gate validation = PG local only (constraints/lightweight triggers). |
§5.A.3 IU Outbox queue
Same pattern as Track A V1-B. The IU debounce key = unit_version_id. Shared vs separate outbox = OPEN IU-0-λ.
§5.A.4 Agent workflow scenarios
- Create: INSERT + outbox → <10ms → embed after 120s.
- Edit 3 times within 1 minute: 3x UPDATE + 3x outbox bump → 1 final embed.
- Read right after editing: PG SELECT <5ms (content is already in context).
- Delete: DELETE + outbox delete with a 30s delay.
- Bulk 50 units: 50x INSERT + debounce → staggered embed.
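The debounce behaviour in these scenarios can be sketched as follows. This is an illustrative in-memory model, not the outbox design: the `Outbox` class and its methods are hypothetical, and it assumes that each bump on the same debounce key resets the delay and that the three edits share one key.

```python
from dataclasses import dataclass, field

EMBED_DELAY_S = 120   # embed delay from the scenarios above
DELETE_DELAY_S = 30   # delete delay from the delete scenario


@dataclass
class Outbox:
    """In-memory stand-in for the PG outbox table (hypothetical)."""
    due: dict = field(default_factory=dict)  # unit_version_id -> (op, due_at)

    def bump(self, unit_version_id: str, now: float, op: str = "embed") -> None:
        # UPSERT on the debounce key: a later bump overwrites the earlier
        # entry and pushes the due time back (assumed reset semantics).
        delay = DELETE_DELAY_S if op == "delete" else EMBED_DELAY_S
        self.due[unit_version_id] = (op, now + delay)

    def pop_due(self, now: float) -> list:
        # Return and remove every entry whose delay has elapsed.
        ready = [(key, op) for key, (op, t) in self.due.items() if t <= now]
        for key, _ in ready:
            del self.due[key]
        return ready


outbox = Outbox()
for t in (0, 20, 40):                 # three edits within one minute
    outbox.bump("uv-1", now=t)
early = outbox.pop_due(now=120)       # nothing yet: the last bump moved due_at to 160
final = outbox.pop_due(now=160)       # exactly one embed job for the three edits
```

The hot path only touches the dictionary (standing in for a local PG write); the embed call itself would happen in whatever consumes `pop_due`.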
§6. Legacy Vector Safety + IU Vector Parallel Track
VRC rev 1 PASS (Codex 2026-05-02, GPT confirmed). The evidence is sufficient.
§6.0 Vector Reality — VRC Evidence (17 findings F1-F17)
| # | Finding | Value |
|---|---|---|
| F1 | Collection | production_documents only (single collection) |
| F2 | Dimensions/distance | 1536 / Cosine |
| F3 | Points | 11052 |
| F4 | Active KB docs | 2770 |
| F5 | Ratio | 3.99 (health: critical) |
| F6 | Embedding | text-embedding-3-small |
| F7 | Chunking | 4000 chars / 400 overlap |
| F8 | Payload | content, document_id, metadata (tags/title/source/build_id/chunk_index/total_chunks), parent_id, is_human_readable |
| F9 | NO unit_id/canonical_address/content_hash | Legacy = document-based |
| F10 | Search dedup | document_id only, no semantic, no rerank |
| F11 | Search filter | tags + status only |
| F12 | Multi-collection | NOT supported |
| F13 | Orphan/ghost | 47 orphan + 7 ghost |
| F14 | dot-vector-audit | Currently FAILING |
| F15 | PG trigger | Function exists, NOT ATTACHED |
| F16 | Update semantics | Delete all → upsert new, not atomic |
| F17 | Point ID | UUID5 deterministic |
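To make F5, F7, and F17 concrete, here is a small sketch of fixed-window chunking with overlap and deterministic point IDs. It is not the legacy pipeline's code: the function names, the UUID namespace, and the `document_id:chunk_index` name format are assumptions, since the VRC findings only record the parameters and that IDs are UUID5.

```python
import uuid

CHUNK_SIZE = 4000     # F7: characters per chunk
CHUNK_OVERLAP = 400   # F7: characters shared between neighbouring chunks


def chunk_text(text: str, size: int = CHUNK_SIZE, overlap: int = CHUNK_OVERLAP) -> list:
    """Split text into fixed-size character windows that overlap."""
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def point_id(document_id: str, chunk_index: int) -> str:
    # Deterministic UUID5 (F17). The namespace and name format are assumed.
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{document_id}:{chunk_index}"))


# F5 is the ratio of points (F3) to active KB docs (F4).
ratio = round(11052 / 2770, 2)
```

Because the ID depends only on its inputs, re-ingesting the same document yields the same point IDs, which is what makes the non-atomic delete-then-upsert in F16 at least repeatable.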
§6.1 Two vector systems in separate layers
VRC rev 1 PASS. Investigate further only if the deployment scope changes.
| Layer | Role | Rule |
|---|---|---|
| Legacy KB Vector | All KB documents | Kept 100% as is |
| IU Vector / IU Bridge | Information units | Parallel, adapter/bridge |
Legacy-first principle: on conflict, the Legacy SSOT wins. The IU bridge does NOT mutate legacy.
§6.2 IU Vector Target Principles (do NOT apply to legacy)
| # | Principle | Phase |
|---|---|---|
| IU-VP-1 | A chunk stays within the scope of a single unit_version | Phase 3 |
| IU-VP-2 | A chunk knows its unit_id, unit_version_id, canonical_address, content_hash_at_embedding | Phase 1 |
| IU-VP-3 | Content hash changes → stale → regenerate | Phase 1+ |
| IU-VP-4 | Split/merge → retire old, regenerate new | Phase 3 |
| IU-VP-5 | Vector = projection, NOT SSOT. PG wins. | Confirmed |
| IU-VP-6 | Async delayed outbox: 120s delay, debounce unit_version_id, worker poll 30s, retry 3x, backoff, dead-letter, observable. Metadata-only = no enqueue. Delete 30-60s. force_immediate = manual+approval. | Confirmed |
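The retry and dead-letter part of IU-VP-6 could look roughly like the sketch below. It is a hedged illustration only: the job dict shape, the `process` and `should_enqueue` names, and the 30s exponential backoff base are assumptions (IU-0 says "backoff" without specifying a schedule).

```python
MAX_RETRIES = 3        # IU-VP-6: retry 3x
BASE_BACKOFF_S = 30    # assumption: IU-0 does not specify the backoff base


def should_enqueue(content_changed: bool) -> bool:
    # IU-VP-6: metadata-only changes do not enqueue an embed job.
    return content_changed


def process(job: dict, embed, now: float, dead_letter: list) -> str:
    """Run one outbox job; return 'done', 'retry', or 'dead'."""
    try:
        embed(job["unit_version_id"])
        return "done"
    except Exception as exc:
        job["attempts"] = job.get("attempts", 0) + 1
        job["last_error"] = repr(exc)  # keep the failure observable
        if job["attempts"] > MAX_RETRIES:
            dead_letter.append(job)    # retries exhausted
            return "dead"
        # Exponential backoff: 30s, 60s, 120s with the assumed base.
        job["due_at"] = now + BASE_BACKOFF_S * 2 ** (job["attempts"] - 1)
        return "retry"
```

With these numbers a job that always fails is attempted four times in total (one initial attempt plus three retries) before it lands in the dead-letter list.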
§6.2.A Vector freshness 5 states
| State | Search behavior |
|---|---|
| current | Normal; preferred when duplicated |
| pending | Return the old result + freshness warning; fall back to PG |
| stale | Return the old result + stale warning |
| error | Exclude or show an error warning; alert ops |
| retired | Exclude |
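One way to read the freshness table is as a mapping from state to search flags. The sketch below is illustrative: the `Freshness` enum, the `search_behavior` function, and the flag names are hypothetical, and for the `error` state it picks the "exclude" option of the two the table allows.

```python
from enum import Enum


class Freshness(Enum):
    CURRENT = "current"
    PENDING = "pending"
    STALE = "stale"
    ERROR = "error"
    RETIRED = "retired"


def search_behavior(state: Freshness) -> dict:
    """Map a freshness state to include/warning/fallback/alert flags."""
    result = {"include": False, "warning": None, "fallback_pg": False, "alert_ops": False}
    if state is Freshness.CURRENT:
        result["include"] = True
    elif state is Freshness.PENDING:
        # Old vector is still returned, flagged, with PG as the fallback.
        result.update(include=True, warning="freshness", fallback_pg=True)
    elif state is Freshness.STALE:
        result.update(include=True, warning="stale")
    elif state is Freshness.ERROR:
        # The table allows "exclude or warn"; this sketch chooses exclude.
        result["alert_ops"] = True
    # RETIRED keeps the defaults: excluded, no warning.
    return result
```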
§6.2.B Duplicate behavior matrix
| IU vector | KB vector | Search |
|---|---|---|
| current + enacted | current | Prefer IU |
| pending | current | Return KB |
| stale/error | current | Return KB + IU warning |
| current + enacted | stale/error | Return IU |
| retired | current | Return KB |
| Does not exist | current | Return KB (legacy-only) |
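The matrix can be expressed as a small resolver. This is a sketch under assumptions: the function name, the string return values, and the use of `None` for "IU vector does not exist" are hypothetical. Combinations the matrix does not list are returned as `"unspecified"` rather than guessed.

```python
from typing import Optional


def resolve_duplicate(iu_state: Optional[str], iu_enacted: bool, kb_state: str) -> str:
    """Return 'IU', 'KB', 'KB+warn', or 'unspecified' for one duplicate pair."""
    iu_ok = iu_state == "current" and iu_enacted
    if kb_state == "current":
        if iu_ok:
            return "IU"
        if iu_state in ("stale", "error"):
            return "KB+warn"
        if iu_state in (None, "pending", "retired"):
            return "KB"
    elif kb_state in ("stale", "error") and iu_ok:
        return "IU"
    # Not covered by the matrix (for example IU current but not enacted).
    return "unspecified"
```

Surfacing the uncovered cases explicitly makes it easier to see which rows the full draft still needs to decide.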
§6.3 4 Phase migration
| Phase | Action | Impact on legacy? |
|---|---|---|
| Phase 0 | Read-only mapping adapter | 0 |
| Phase 1 | Separate IU collection + outbox ingestion | 0 |
| Phase 2 | Search adapter + freshness dedup | Read-only on legacy |
| Phase 3 | Unit-aware chunking | 0 (IU only) |
§6.4 Anti-pattern
- ❌ Chunking beyond the boundary of a unit.
- ❌ Qdrant writing body content into PG.
- ❌ Assuming the legacy payload has unit_id.
- ❌ Replacing the legacy pipeline.
- ❌ Mutating production_documents for IU.
- ❌ Forcing KB → IU conversion before vectorization.
- ❌ The IU bridge mutating legacy data.
§6.5 Duplicate Content: RISK HIGH/CRITICAL
The user stresses that this is VERY COMMON. For Phase 0, the safest option = collection separation. For details of freshness-based priority, see §6.2.B.
§6.6 Do-not-break Vector Guardrails VG-1..9
| # | Guardrail |
|---|---|
| VG-1 | Do NOT change the production_documents collection |
| VG-2 | Do NOT change the 4000/400 chunking |
| VG-3 | Do NOT re-embed in bulk |
| VG-4 | Do NOT delete orphans using new logic |
| VG-5 | Do NOT change the ingestion pipeline |
| VG-6 | Do NOT change the existing orphan checker |
| VG-7 | Do NOT force vectors onto the IU schema inside the legacy collection |
| VG-8 | Adapter first; do NOT break the old pipeline |
| VG-9 | PG/KB = SSOT; vector = projection |
§7. Protection: 9 guardrails G-1..G-9
G-1 enacted immutable / G-2 APR DDL / G-3 universal_edges SSOT / G-4 vector projection / G-5 profile registry / G-6 2-layer separation / G-7 canonical immutable / G-8 auxiliary DOT IDLE / G-9 bump major UMC migration.
§8. Slice-based editing workflow
Slice = 1 information_unit. Default = §Section atom. 7 operations: read / edit / append / split / merge / reorder / delete. Agent edit: resolve address → read slice → apply edit → new content_hash → new unit_version → outbox → return diff.
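The agent edit sequence above can be sketched end to end. Everything concrete here is an assumption made for illustration: the in-memory `store`, the `address@vN` version ID format, the list used as an outbox, and SHA-256 as the hash (IU-0 does not name a hash algorithm).

```python
import difflib
import hashlib


def edit_slice(store: dict, outbox: list, address: str, new_body: str) -> str:
    """Apply an edit to one slice and return a unified diff."""
    unit = store[address]                        # resolve address, read slice
    old_body = unit["versions"][-1]["body"]
    content_hash = hashlib.sha256(new_body.encode("utf-8")).hexdigest()
    version_id = f"{address}@v{len(unit['versions']) + 1}"
    unit["versions"].append(                     # new unit_version; old one is kept
        {"unit_version_id": version_id, "body": new_body, "content_hash": content_hash}
    )
    outbox.append(version_id)                    # cold path picks this up later
    diff = difflib.unified_diff(
        old_body.splitlines(), new_body.splitlines(),
        fromfile="before", tofile="after", lineterm="",
    )
    return "\n".join(diff)
```

Returning only the diff, rather than the whole document, is what keeps the agent's context small when a single slice changes.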
§9. Minimum Usable Package
MUP Tier 0 (block birth): U1-U8 + identity_profile.title. MUP Tier 1 (recommended): U9 + edges + topic_labels. MUP Tier 2+ (defer): vector + checker + relations.
§10. Pilot dogfooding
Priority: P0 = Đ44/P38/P11/IU-0 docs (closest data). 5 phases: Prepare → Dogfood design docs → Extend KB → TAC integration → Vector+checker. Success criteria: slice editing reduces tokens by ≥50%, edit cycle <5 turns, 0 critical schema bugs, guardrails hold.
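As a worked reading of the success criteria, the check below evaluates the four conditions for one pilot run. The function and parameter names are hypothetical, and the thresholds are passed as parameters (defaulting to the values stated above) so that they stay tunable rather than hardcoded.

```python
def pilot_passed(tokens_before: int, tokens_after: int, edit_turns: int,
                 critical_schema_bugs: int, guardrails_hold: bool,
                 min_reduction: float = 0.5, max_turns: int = 5) -> bool:
    """Check the four pilot success criteria for one measured run."""
    if tokens_before <= 0:
        raise ValueError("tokens_before must be positive")
    reduction = (tokens_before - tokens_after) / tokens_before
    return (
        reduction >= min_reduction      # tokens reduced by at least 50%
        and edit_turns < max_turns      # edit cycle under 5 turns
        and critical_schema_bugs == 0   # no critical schema bugs
        and guardrails_hold             # guardrails hold
    )
```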
§11. OPEN / TD
OPEN (13 items)
- α Path: RESOLVED.
- β Implementation pattern.
- γ unit_kind extensions dogfooding.
- δ Slice boundary final.
- ε Vector granularity per kind.
- ζ Concurrency slice editing.
- η MUP threshold strict.
- θ Profile templates dogfooding.
- ι Legacy V1-V5 stabilization (Track A).
- κ IU collection design.
- λ Shared vs separate outbox.
- μ Queue observability integration.
- ν Worker poll 30s / retry 3 / dead-letter: to be tuned during the pilot.
TD (7 items)
- Migrate design docs → IU format.
- Slice boundary automation.
- Vector regeneration scheduler.
- Guardrails G-1..9 + VG-1..9 enforcement.
- Pilot metrics dashboard.
- Legacy V1-V5 execution (Track A v3.1).
- Outbox worker implementation.
§12. Risk / Conflict
| # | Risk | Severity |
|---|---|---|
| R1 | Overlap with P38-XC UMC | High; mitigation: inherit unchanged |
| R2 | Overlap with P44-3 Profile | High; mitigation: inherit unchanged |
| R3 | Vector conflict with P5 | High; mitigation: align with P5 INV |
| R4 | Spawning a new family | High; mitigation: anti-pattern §2.2 |
| R7 | DDL/code slipped in covertly | High; mitigation: guardrails |
| R8 | Implicitly opening P44-6 | High; mitigation: guardrails |
| R9 | Guardrail violations | High; mitigation: pilot criteria |
| R10 | Contradiction IU-0 ↔ P11E | Medium; mitigation: inherit |
| R11 | Existing legacy vector drift | CRITICAL; mitigation: Track A |
| R12 | Duplicate KB+IU | HIGH/CRITICAL; mitigation: collection separation + freshness §6.2.B |
| R13 | Docker ephemeral | High; mitigation: commit to git |
| R14 | Outbox worker down → pending stuck | High; mitigation: HP-10 observability + alert |
| R15 | Dead-letter accumulation | Medium; mitigation: alert + review |
Anti self-contradiction: IU-0 ↔ P38-XC / P44-3/4A/5A / P11C/D/E / P5 / P6 = PASS.
§13. Questions for GPT/User review
| # | Question | Proposal |
|---|---|---|
| Q2 | Authority IU-0 | Logical proposal DRAFT; APR once Đ44 is enacted |
| Q3 | unit_kind extensions | Merge: proof_note→knowledge_atom; design_doc_section is new |
| Q4 | Slice boundary default | §Section atom |
| Q5 | MUP strict for every kind? | Strict + auto-generate title |
| Q6 | Vector granularity | Per kind: law=section, design=paragraph, atom=1 chunk |
| Q7 | Pilot Phase 1 scope | 3-5 docs: IU-0 + P11E + P38-XC + P44-3 + 1 Đ44 |
| Q8 | IU-0-A polish round | Yes, for pattern consistency |
| Q9 | G-9 UMC migration | Keep: UMC > profile (2 layers) |
| Q10 | Order of work after the outline | Polish → full draft |
| Q11 | Track A v3.1 PASS? | Proposed: PASS |
| Q12 | Is IU-0 E.1 sufficient to PASS to full draft? | Proposed: PASS |
| Q13 | Full draft in parallel with Track A? | In parallel; §6 Phase 1+ blocked by V1 |
IU-0 OUTLINE-E.1 | S190+ (2026-05-02) | Uploaded | History: A→B→C→D→E→E.1 | VRC rev 1 PASS | PG CRUD Hot Path Contract + outbox queue + vector freshness + duplicate matrix | Track A v3.1 kept separate