KB-3F36

IU-0 — Information Unit Minimum Standard & Packaging Plan (OUTLINE-E.1)

Revision 1

Status: OUTLINE-E.1 (uploaded) | Session: S190+ (2026-05-02)
Path: knowledge/dev/laws/dieu44-trien-khai/design/07-iu0-information-unit-minimum-standard-outline.md
Dependencies: P38-XC final + P44-3/4A/5A + P11A/C/D/E + P5 v0.2 + P6 v0.2 + VRC Report rev 1 (PASS)
History: Outline A (file placement) → B (vector architecture) → C (VRC evidence) → D (IU-VP-6 delay) → E (CRUD contract + freshness + outbox) → E.1 (polish: VRC PASS, numbering, defaults finalized)
Track A Legacy Stabilization v3.1: split out into knowledge/dev/laws/dieu38-trien-khai/reports/legacy-vector-stabilization-outline-v3.md


§0. Core summary in 5 lines

  1. IU-0 fixes one Incomex-wide standard schema for information_unit, with no separate TAC/Đ38 schema. It inherits P38-XC UMC and P44-3/4A/5A unchanged; NO redesign.
  2. 9-layer packaging (identity/version/content/profile/edges/topic/checker/vector/protection) plus an MUP for the pilot.
  3. Vector = two separately layered systems (based on VRC evidence): the Legacy KB Vector stays 100% unchanged. The IU Vector runs in parallel (Phase 0 mapping → Phase 1 separate IU collection → Phase 2 search adapter + dedup → Phase 3 unit-aware chunking). Duplicate content across KB and IU is a CRITICAL risk.
  4. PG CRUD Hot Path Contract (HP-1..15 + IU-HP-1..5): hot path = PG local only; cold path = async outbox worker with a 120 s delay. Do-not-break Vector Guardrails VG-1..9.
  5. Slice-based editing plus pilot dogfooding on Đ44/P38/P11/IU-0. NO DDL, code, or P44-6. Track A Legacy Stabilization is split out.

§0.A File placement convention

| Document type | Path (verified) |
|---|---|
| Đ44 design docs | knowledge/dev/laws/dieu44-trien-khai/design/01-..06- |
| Đ38/P11 closure | knowledge/dev/laws/dieu38-trien-khai/closure/ |
| Reports | knowledge/dev/laws/dieu38-trien-khai/reports/ |
| Reviews | knowledge/dev/laws/dieu{38,44}-trien-khai/reviews/ |
| Vector reports | knowledge/current-state/reports/ |
| IU-0 | knowledge/dev/laws/dieu44-trien-khai/design/07-iu0-... |
| Legacy Stabilization | knowledge/dev/laws/dieu38-trien-khai/reports/legacy-vector-stabilization-outline-v3.md |

Verified: knowledge/dev/foundation/ contains 0 documents (not active). Do not use it.


§1. Context and why IU-0 is needed

§1.1 User direction (S190+)

  • The information-unit schema is a common standard for every Incomex object, not specific to TAC.
  • Strictly avoid having two schema systems.
  • Put it into use early: the docs are long and hard for Agents to process.
  • Vectorization follows unit boundaries instead of cutting text on its own.
  • The current vector system stays as-is; IU vectors run in parallel.
  • PG CRUD must be fast; vectors only serve longer-term search.

§1.2 IU-0 fills 6 gaps: universal UMC / packaging / vector contract / protection / slice editing / pilot dogfooding.

§1.3 What IU-0 does NOT do (10 guardrails)

  1. NO redesign of UMC/Capability/Profile/Edge/DOT.
  2. NO DDL/code/migration.
  3. NO execution of production queries.
  4. NO opening of P44-6.
  5. NO edits to documents already uploaded.
  6. NO commitment to a Qdrant implementation.
  7. NO claim of production readiness.
  8. NO real DOT names.
  9. NO hardcoded thresholds/schedules/vocabularies.
  10. NO outline upload (already uploaded as E.1).

§2. Strategic decision

information_unit is the universal information substrate for every Incomex object. There is no second schema system.

§2.1 14 illustrative unit_kind values

law_unit, workflow_step, sop_step, knowledge_atom, claim, decision, task_instruction, rule, facet_note, config_doc, binding_doc, design_doc_section, proof_note, review_comment.

§2.2 Anti-pattern

  • ❌ Creating a separate schema for a single use case.
  • ❌ Spawning a new family when a subtype is enough.
  • ❌ Bypassing UMC fields.

§3. Universal Minimum Core (inherited from P38-XC)

The 10 UMC elements (U1-U10) are inherited unchanged from P38-XC §5.2. IU-0 does NOT redesign them.

5 layers: UMC core → Profile JSONB → Edges (universal_edges) → Checker state (DOT) → Vector state (projection).


§4. Capability/profile extension (inherited from P38-XC §3 and P44-3)

10 capabilities, Cap-1..10. Profile extension model via APR. IU-0 does NOT redesign them.


§5. Packaging model: 9 layers

| # | Layer | SoT / Owner |
|---|---|---|
| 1 | Identity (U1-U3, U7, U8) | Core columns; Cap-1 |
| 2 | Version (U6 + version chain) | Per-kind table; Cap-2 |
| 3 | Content (U5 + body + content_hash) | Per-kind body; Cap-2 |
| 4 | Profile metadata (3 JSONB profiles) | JSONB columns; Cap-1/2/3 |
| 5 | Edges (universal_edges) | universal_edges SSOT; Cap-4/5 |
| 6 | Topic labels (content_profile.topic_labels[]) | content_profile; Cap-6 |
| 7 | Checker state (conformance + drift + issues) | DOT state; Cap-9 |
| 8 | Vector sync state | Per-kind vector columns + Qdrant; Cap-10 |
| 9 | Protection/lock (enacted immutability + apr_ref) | Lifecycle metadata; Cap-8 |

MUP (Minimum Usable Package)

  • MUP Core (blocks birth): Identity + Version + Content.
  • MUP Tier 1 (recommended): Profile Tier 0 required + parent_or_container_ref.
  • MUP Tier 2+ (deferred for the pilot): Edges + topics + checker + vector + protection.

§5.A IU-PG CRUD Performance Contract

§5.A.1 Inherited rules HP-1..15 (15 rules)

From Track A Legacy Stabilization v3.1 §0.1. Every rule applies to both the legacy KB path AND the IU path.

§5.A.2 IU-specific CRUD rules

| # | Rule |
|---|---|
| IU-HP-1 | IU CRUD hot path = PG local only. Creating a unit → PG INSERT + outbox UPSERT → return immediately. |
| IU-HP-2 | Slice-edit hot path = read slice + write slice + outbox. Entirely PG local. |
| IU-HP-3 | Profile JSONB update = PG jsonb_set/merge. Validation is local. |
| IU-HP-4 | Edge CRUD = PG local. Reverse-index rebuild is deferred to the outbox if needed. |
| IU-HP-5 | Birth-gate validation = PG local only (constraints / lightweight triggers). |

§5.A.3 IU Outbox queue

Same pattern as Track A V1-B. The IU debounce key is unit_version_id. Shared vs separate outbox remains OPEN (IU-0-λ).
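A minimal sketch of IU-HP-1 with a debounced outbox, assuming a single outbox table keyed by unit_version_id. sqlite3 stands in for PostgreSQL so the sketch runs without a server; the tables information_unit and iu_outbox, their columns, and the helpers enqueue/create_unit/edit_unit are hypothetical. PostgreSQL's INSERT ... ON CONFLICT ... DO UPDATE has the same shape.

```python
import sqlite3
import time

# Hypothetical tables; names and columns are illustrative, not part of IU-0.
SCHEMA = """
CREATE TABLE information_unit (
    unit_version_id TEXT PRIMARY KEY,
    body            TEXT NOT NULL,
    content_hash    TEXT NOT NULL
);
CREATE TABLE iu_outbox (
    debounce_key TEXT PRIMARY KEY,          -- IU debounce key = unit_version_id
    op           TEXT NOT NULL,             -- 'upsert' or 'delete'
    due_at       REAL NOT NULL,             -- earliest time the worker may pick the job up
    bumps        INTEGER NOT NULL DEFAULT 1 -- how many writes were coalesced into this job
);
"""

EMBED_DELAY_S = 120.0  # IU-VP-6 default; would come from config, not a hardcoded constant


def enqueue(conn, key, op, now):
    """Debounced outbox UPSERT: a repeat write for the same key pushes due_at forward."""
    conn.execute(
        """
        INSERT INTO iu_outbox (debounce_key, op, due_at) VALUES (?, ?, ?)
        ON CONFLICT(debounce_key) DO UPDATE SET
            op = excluded.op,
            due_at = excluded.due_at,
            bumps = bumps + 1
        """,
        (key, op, now + EMBED_DELAY_S),
    )


def create_unit(conn, unit_version_id, body, content_hash, now=None):
    """IU-HP-1 hot path: local INSERT + outbox UPSERT in one transaction, then return."""
    now = time.time() if now is None else now
    with conn:  # commit on success, roll back both writes on any exception
        conn.execute(
            "INSERT INTO information_unit VALUES (?, ?, ?)",
            (unit_version_id, body, content_hash),
        )
        enqueue(conn, unit_version_id, "upsert", now)


def edit_unit(conn, unit_version_id, body, content_hash, now=None):
    """Edit hot path: UPDATE + outbox bump; no embedding call happens here."""
    now = time.time() if now is None else now
    with conn:
        conn.execute(
            "UPDATE information_unit SET body = ?, content_hash = ? WHERE unit_version_id = ?",
            (body, content_hash, unit_version_id),
        )
        enqueue(conn, unit_version_id, "upsert", now)


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
create_unit(conn, "uv-1", "v1", "h1", now=0.0)
edit_unit(conn, "uv-1", "v2", "h2", now=20.0)
edit_unit(conn, "uv-1", "v3", "h3", now=40.0)
print(conn.execute("SELECT debounce_key, due_at, bumps FROM iu_outbox").fetchall())
# [('uv-1', 160.0, 3)]
```

Because the outbox row is written in the same transaction as the unit, a failed INSERT leaves no orphan job, and the three writes collapse into one job due 120 s after the last one.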

§5.A.4 Agent workflow scenarios

  • Create: INSERT + outbox → <10 ms → embedded after 120 s.
  • Edit 3 times within 1 minute: 3x UPDATE + 3x outbox bump → 1 final embed.
  • Read right after editing: PG SELECT <5 ms (the content is already in context).
  • Delete: DELETE + outbox delete with a 30 s delay.
  • Bulk 50 units: 50x INSERT + debounce → staggered embeds.
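The scenarios above follow from the delay and debounce rules alone. A small discrete-time simulation, assuming one worker that polls every 30 s and the IU-VP-6 defaults (120 s for upserts, 30 s for deletes); Event and simulate are hypothetical names, and embedding latency is ignored.

```python
from dataclasses import dataclass

# IU-VP-6 defaults used as assumptions; real values are configurable.
UPSERT_DELAY_S = 120
DELETE_DELAY_S = 30
POLL_INTERVAL_S = 30


@dataclass(frozen=True)
class Event:
    t: int    # seconds since the start of the scenario
    key: str  # debounce key (unit_version_id)
    op: str   # "upsert" or "delete"


def simulate(events, horizon_s):
    """Return (poll_time, key, op) for every job the worker would process up to horizon_s."""
    pending = {}  # key -> (due_at, op); a newer event for the same key replaces the older one
    fired = []
    ordered = sorted(events, key=lambda e: e.t)
    i = 0
    for tick in range(0, horizon_s + 1, POLL_INTERVAL_S):
        # Register every write that happened up to this poll.
        while i < len(ordered) and ordered[i].t <= tick:
            e = ordered[i]
            delay = DELETE_DELAY_S if e.op == "delete" else UPSERT_DELAY_S
            pending[e.key] = (e.t + delay, e.op)
            i += 1
        # Process every job whose delay has elapsed.
        for key, (due_at, op) in sorted(pending.items()):
            if due_at <= tick:
                fired.append((tick, key, op))
                del pending[key]
    return fired


# Three edits to the same unit within one minute -> a single embed.
print(simulate([Event(0, "uv-1", "upsert"), Event(20, "uv-1", "upsert"), Event(50, "uv-1", "upsert")], 300))
# [(180, 'uv-1', 'upsert')]
```

In the bulk case the staggering here comes only from arrival times; a production worker might also rate-limit embedding calls.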

§6. Legacy Vector Safety + IU Vector Parallel Track

VRC rev 1 PASS (Codex 2026-05-02, confirmed by GPT). The evidence is sufficient.

§6.0 Vector Reality — VRC Evidence (17 findings F1-F17)

| # | Finding | Value |
|---|---|---|
| F1 | Collection | production_documents (the only one) |
| F2 | Dimensions / distance | 1536 / Cosine |
| F3 | Points | 11052 |
| F4 | Active KB docs | 2770 |
| F5 | Ratio (points per active doc) | 3.99 (health: critical) |
| F6 | Embedding | text-embedding-3-small |
| F7 | Chunking | 4000 chars / 400 overlap |
| F8 | Payload | content, document_id, metadata (tags/title/source/build_id/chunk_index/total_chunks), parent_id, is_human_readable |
| F9 | NO unit_id / canonical_address / content_hash | Legacy = document-based |
| F10 | Search dedup | By document_id only; no semantic dedup, no rerank |
| F11 | Search filter | tags + status only |
| F12 | Multi-collection | NOT supported |
| F13 | Orphan / ghost | 47 orphan + 7 ghost |
| F14 | dot-vector-audit | CURRENTLY FAILING |
| F15 | PG trigger | Function exists but is NOT ATTACHED |
| F16 | Update semantics | Delete all → upsert new; not atomic |
| F17 | Point ID | UUID5, deterministic |
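For orientation only, a sketch of what F5, F7 and F17 describe: fixed 4000-character windows overlapping by 400 characters, and point IDs derived deterministically with UUID5. The VRC findings do not state the exact splitting rule or the UUID5 namespace and name string, so chunk_text, point_id, the use of uuid.NAMESPACE_URL and the "document_id#chunk_index" name format are assumptions, not the legacy pipeline's code (which VG-2 keeps unchanged).

```python
import uuid

CHUNK_SIZE = 4000  # F7: characters per chunk
OVERLAP = 400      # F7: characters repeated from the previous chunk


def chunk_text(text, size=CHUNK_SIZE, overlap=OVERLAP):
    """Split text into fixed-size character windows; each starts size - overlap chars after the last."""
    if not text:
        return []
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):  # this window already reaches the end
            break
    return chunks


def point_id(document_id, chunk_index):
    """Deterministic point ID (F17): the same inputs always give the same UUID. Name format is assumed."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{document_id}#{chunk_index}"))


print(len(chunk_text("x" * 10_000)))  # 3 windows: [0, 4000), [3600, 7600), [7200, 10000)
print(round(11052 / 2770, 2))          # F5: points per active KB doc -> 3.99
```

Determinism means re-ingesting an unchanged document reproduces the same IDs; it does not make the delete-then-upsert update in F16 atomic.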

§6.1 Two separately layered vector systems

VRC rev 1 PASS. Investigate further only if the implementation scope changes.

| Layer | Role | Rule |
|---|---|---|
| Legacy KB Vector | All KB documents | Kept 100% unchanged |
| IU Vector / IU Bridge | Information units | Runs in parallel via adapter/bridge |

Legacy-first principle: on conflict, the Legacy SSOT wins. The IU bridge does NOT mutate legacy data.

§6.2 IU Vector Target Principles (NOT applied to legacy)

| # | Principle | Phase |
|---|---|---|
| IU-VP-1 | Chunks stay within a single unit_version | Phase 3 |
| IU-VP-2 | Each chunk knows unit_id, unit_version_id, canonical_address, content_hash_at_embedding | Phase 1 |
| IU-VP-3 | Content hash changes → stale → regenerate | Phase 1+ |
| IU-VP-4 | Split/merge → retire old, regenerate new | Phase 3 |
| IU-VP-5 | Vector = projection, NOT SSOT. PG wins. | Confirmed |
| IU-VP-6 | Async delayed outbox: 120 s delay, debounce on unit_version_id, worker polls every 30 s, retry 3x, backoff, dead-letter, observable. Metadata-only changes = no enqueue. Delete: 30-60 s. force_immediate = manual + approval. | Confirmed |
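A sketch of IU-VP-2, IU-VP-3 and the retry part of IU-VP-6, assuming SHA-256 for content_hash, exponential backoff with a 30 s base, and "retry 3x" meaning one attempt plus three retries. IUChunkPayload, is_stale, backoff_delay, Job and process are hypothetical names; embed is any callable that raises on failure, and nothing here sleeps or calls Qdrant.

```python
import hashlib
from dataclasses import dataclass, field

MAX_RETRIES = 3  # IU-VP-6 "retry 3x", read as 1 attempt + 3 retries (assumption)


def content_hash(body: str) -> str:
    # SHA-256 is an assumption; IU-0 does not fix the hash function.
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class IUChunkPayload:
    # IU-VP-2: each IU chunk knows which unit version it came from and what it was embedded from.
    unit_id: str
    unit_version_id: str
    canonical_address: str
    content_hash_at_embedding: str
    chunk_index: int


def is_stale(payload: IUChunkPayload, current_body: str) -> bool:
    # IU-VP-3 / IU-VP-5: PG holds the truth, so any hash mismatch marks the vector stale.
    return payload.content_hash_at_embedding != content_hash(current_body)


def backoff_delay(attempt: int, base_s: float = 30.0) -> float:
    # Exponential backoff: 30 s, 60 s, 120 s, ... (base and factor are assumptions).
    return base_s * 2 ** (attempt - 1)


@dataclass
class Job:
    unit_version_id: str
    attempts: int = 0
    errors: list = field(default_factory=list)


def process(job, embed, dead_letter):
    """Run embed(job.unit_version_id); retry with backoff, then dead-letter after MAX_RETRIES retries.

    Returns the backoff delays the worker would wait between attempts (simulated, not slept).
    """
    waits = []
    while True:
        job.attempts += 1
        try:
            embed(job.unit_version_id)
            return waits
        except Exception as exc:  # a real worker would catch narrower error types
            job.errors.append(str(exc))
            if job.attempts > MAX_RETRIES:
                dead_letter.append(job)  # kept for review instead of being silently dropped
                return waits
            waits.append(backoff_delay(job.attempts))
```

With an embedding call that always fails, the job is attempted four times, waits 30, 60 and 120 s between attempts, and ends in the dead-letter list where it stays observable.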

§6.2.A Vector freshness 5 states

| State | Search behavior |
|---|---|
| current | Normal; preferred when duplicated |
| pending | Return the old vector + freshness warning; fall back to PG |
| stale | Return the old vector + stale warning |
| error | Exclude, or return with an error warning; alert ops |
| retired | Exclude |
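The freshness table maps directly to a lookup. A minimal sketch, assuming the search layer makes four decisions per hit; Freshness, Treatment and search_treatment are hypothetical names, and the exclude_on_error switch reflects that the table leaves error handling as "exclude or warn". Choosing between duplicates is a separate step (§6.2.B).

```python
from enum import Enum
from typing import NamedTuple, Optional


class Freshness(Enum):
    CURRENT = "current"
    PENDING = "pending"
    STALE = "stale"
    ERROR = "error"
    RETIRED = "retired"


class Treatment(NamedTuple):
    include: bool           # return the hit at all?
    warning: Optional[str]  # warning attached to the hit, if any
    fallback_to_pg: bool    # also read the current body from PG?
    alert_ops: bool         # raise an operational alert?


def search_treatment(state: Freshness, exclude_on_error: bool = True) -> Treatment:
    if state is Freshness.CURRENT:
        return Treatment(True, None, False, False)
    if state is Freshness.PENDING:
        return Treatment(True, "freshness", True, False)  # old vector + warning; PG fallback
    if state is Freshness.STALE:
        return Treatment(True, "stale", False, False)      # old vector + stale warning
    if state is Freshness.ERROR:
        if exclude_on_error:
            return Treatment(False, None, False, True)
        return Treatment(True, "error", False, True)
    return Treatment(False, None, False, False)             # RETIRED: always excluded
```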

§6.2.B Duplicate behavior matrix

| IU vector | KB vector | Search returns |
|---|---|---|
| current + enacted | current | Prefer IU |
| pending | current | KB |
| stale/error | current | KB + IU warning |
| current + enacted | stale/error | IU |
| retired | current | KB |
| Does not exist | current | KB (legacy-only) |
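Read as code, the matrix is a small decision function. A sketch assuming states arrive as the strings used in §6.2.A and that None means no IU vector exists; resolve_duplicate is a hypothetical name. Combinations the matrix does not list (for example a current but not yet enacted IU vector) raise an error instead of being guessed, which keeps policy gaps visible.

```python
from typing import Optional, Tuple


def resolve_duplicate(iu_state: Optional[str], iu_enacted: bool, kb_state: str) -> Tuple[str, Optional[str]]:
    """Return (source_to_return, warning) for content indexed in the KB vector and possibly the IU vector."""
    if iu_state is None:
        if kb_state == "current":
            return ("KB", None)                     # legacy-only content
    elif iu_state == "current" and iu_enacted:
        if kb_state in ("current", "stale", "error"):
            return ("IU", None)                     # enacted, current IU vector is returned
    elif kb_state == "current":
        if iu_state in ("pending", "retired"):
            return ("KB", None)
        if iu_state in ("stale", "error"):
            return ("KB", f"IU vector {iu_state}")  # serve KB, flag the IU side
    raise ValueError(
        f"not covered by the §6.2.B matrix: IU={iu_state}, enacted={iu_enacted}, KB={kb_state}"
    )
```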

§6.3 4-phase migration

| Phase | Action | Impact on legacy |
|---|---|---|
| Phase 0 | Read-only mapping adapter | None |
| Phase 1 | Separate IU collection + outbox ingestion | None |
| Phase 2 | Search adapter + freshness dedup | Read-only on legacy |
| Phase 3 | Unit-aware chunking | None (IU only) |

§6.4 Anti-pattern

  • ❌ Chunking across unit boundaries.
  • ❌ Qdrant writing body content into PG.
  • ❌ Assuming the legacy payload has unit_id.
  • ❌ Replacing the legacy pipeline.
  • ❌ Mutating production_documents for IU.
  • ❌ Forcing KB → IU conversion before vectorization.
  • ❌ IU bridge mutating legacy data.

§6.5 Duplicate content: HIGH/CRITICAL risk

User emphasis: duplicates are VERY COMMON. The safest Phase 0 approach is collection separation. Freshness-based priority details: §6.2.B.

§6.6 Do-not-break Vector Guardrails VG-1..9

| # | Guardrail |
|---|---|
| VG-1 | DO NOT change the production_documents collection |
| VG-2 | DO NOT change the 4000/400 chunking |
| VG-3 | DO NOT mass re-embed |
| VG-4 | DO NOT delete orphans using new logic |
| VG-5 | DO NOT change the ingestion pipeline |
| VG-6 | DO NOT change the existing orphan checker |
| VG-7 | DO NOT force vectors into the IU schema inside the legacy collection |
| VG-8 | Adapter first; DO NOT break the old pipeline |
| VG-9 | PG/KB = SSOT; vector = projection |

§7. Protection: 9 guardrails, G-1..G-9

  • G-1 enacted immutable
  • G-2 APR DDL
  • G-3 universal_edges SSOT
  • G-4 vector projection
  • G-5 profile registry
  • G-6 2-layer separation
  • G-7 canonical immutable
  • G-8 auxiliary DOT IDLE
  • G-9 major version bump for UMC migration


§8. Slice-based editing workflow

Slice = one information_unit. Default slice = one §Section atom. 7 operations: read / edit / append / split / merge / reorder / delete. Agent edit flow: resolve address → read slice → apply edit → compute new content_hash → create new unit_version → enqueue outbox → return diff.
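A minimal in-memory sketch of that edit flow, assuming SHA-256 content hashes and a version id of the form address@vN; UnitVersion, edit_slice and the dict/list stand-ins for the PG tables and the outbox are hypothetical. The sketch keeps the old version, and it also assumes that an edit leaving content_hash unchanged creates no new version and queues nothing, in the spirit of the metadata-only rule in IU-VP-6.

```python
import difflib
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitVersion:
    unit_version_id: str
    canonical_address: str
    body: str
    content_hash: str


def sha256(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def edit_slice(store, index, outbox, address, new_body):
    """resolve address -> read slice -> apply edit -> new content_hash -> new unit_version -> outbox -> diff.

    store:  {unit_version_id: UnitVersion}        (stand-in for the version table)
    index:  {canonical_address: unit_version_id}  (latest version per address)
    outbox: [unit_version_id, ...]                (stand-in for the outbox queue)
    """
    old = store[index[address]]                   # resolve address and read only this slice
    new_hash = sha256(new_body)
    if new_hash == old.content_hash:
        return ""                                 # content unchanged: no version, nothing queued
    n = sum(1 for v in store.values() if v.canonical_address == address) + 1
    new_id = f"{address}@v{n}"                    # hypothetical id scheme
    store[new_id] = UnitVersion(new_id, address, new_body, new_hash)  # old version is kept
    index[address] = new_id
    outbox.append(new_id)                         # embedding happens later, on the cold path
    diff = difflib.unified_diff(
        old.body.splitlines(keepends=True),
        new_body.splitlines(keepends=True),
        fromfile=old.unit_version_id,
        tofile=new_id,
    )
    return "".join(diff)
```

Only the slice text and the returned diff pass through the Agent's context, which is where the token savings targeted in §10 would come from.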


§9. Minimum Usable Package

  • MUP Tier 0 (blocks birth): U1-U8 + identity_profile.title.
  • MUP Tier 1 (recommended): U9 + edges + topic_labels.
  • MUP Tier 2+ (deferred): vector + checker + relations.
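A sketch of a birth-gate check for these tiers. The real UMC field names for U1-U9 are defined in P38-XC §5.2 and are not reproduced here, so the keys "U1".."U9", the helper _present, and the functions missing_tier0 and mup_tier are hypothetical placeholders; topic_labels is read from content_profile.topic_labels[] as in the §5 layer table. Tier 2+ is deferred and not checked.

```python
TIER0_KEYS = [f"U{i}" for i in range(1, 9)]  # placeholders for U1-U8 (real names: P38-XC §5.2)


def _present(value) -> bool:
    # Treat None and empty values as missing.
    return value not in (None, "", [], {})


def missing_tier0(unit: dict) -> list:
    """Fields whose absence blocks birth at MUP Tier 0."""
    missing = [k for k in TIER0_KEYS if not _present(unit.get(k))]
    if not _present(unit.get("identity_profile", {}).get("title")):
        missing.append("identity_profile.title")
    return missing


def mup_tier(unit: dict) -> int:
    """Highest tier met: -1 = birth blocked, 0 = Tier 0, 1 = Tier 1 (Tier 2+ is deferred)."""
    if missing_tier0(unit):
        return -1
    tier1 = (
        _present(unit.get("U9"))
        and _present(unit.get("edges"))
        and _present(unit.get("content_profile", {}).get("topic_labels"))
    )
    return 1 if tier1 else 0
```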


§10. Pilot dogfooding

Priority: P0 = Đ44/P38/P11/IU-0 docs (closest data). 5 phases: Prepare → Dogfood design docs → Extend KB → TAC integration → Vector + checker. Success criteria: slice editing reduces tokens by ≥50%, edit cycle <5 turns, 0 critical schema bugs, guardrails hold.
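The success criteria are simple thresholds. A sketch for one measured editing task, assuming token reduction is measured as 1 - (tokens with slice editing / tokens when the whole document is loaded); how results are aggregated across tasks is not specified in the outline, and pilot_passes is a hypothetical name.

```python
def pilot_passes(tokens_full_doc, tokens_slice, edit_turns, critical_schema_bugs, guardrail_violations):
    """Evaluate the §10 success criteria for one task; returns (overall_pass, per-criterion results)."""
    reduction = 1 - tokens_slice / tokens_full_doc
    checks = {
        "token_reduction_ge_50pct": reduction >= 0.5,
        "edit_cycle_lt_5_turns": edit_turns < 5,
        "zero_critical_schema_bugs": critical_schema_bugs == 0,
        "guardrails_hold": guardrail_violations == 0,
    }
    return all(checks.values()), checks


ok, detail = pilot_passes(tokens_full_doc=12_000, tokens_slice=3_000, edit_turns=3,
                          critical_schema_bugs=0, guardrail_violations=0)
print(ok)  # True: 75% token reduction, 3 turns, no bugs, no violations
```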


§11. OPEN / TD

OPEN (13 items)

  • α Path: RESOLVED.
  • β Implementation pattern.
  • γ unit_kind extensions dogfooding.
  • δ Slice boundary final.
  • ε Vector granularity per kind.
  • ζ Concurrency slice editing.
  • η MUP threshold strict.
  • θ Profile templates dogfooding.
  • ι Legacy V1-V5 stabilization (Track A).
  • κ IU collection design.
  • λ Shared vs separate outbox.
  • μ Queue observability integration.
  • ν Worker poll 30 s / retry 3 / dead-letter: tune during the pilot.

TD (7 items)

  1. Migrate design docs → IU format.
  2. Slice boundary automation.
  3. Vector regeneration scheduler.
  4. Guardrails G-1..9 + VG-1..9 enforcement.
  5. Pilot metrics dashboard.
  6. Legacy V1-V5 execution (Track A v3.1).
  7. Outbox worker implementation.

§12. Risk / Conflict

| # | Risk | Severity | Mitigation |
|---|---|---|---|
| R1 | Overlap with P38-XC UMC | High | Inherit unchanged |
| R2 | Overlap with P44-3 Profile | High | Inherit unchanged |
| R3 | Vector conflict with P5 | High | Align with P5 INV |
| R4 | Spawning a new family | High | Anti-patterns §2.2 |
| R7 | Covert DDL/code | High | Guardrails |
| R8 | Implicitly opening P44-6 | High | Guardrails |
| R9 | Guardrail violations | High | Pilot criteria |
| R10 | Contradiction IU-0 ↔ P11E | Medium | Inheritance |
| R11 | Existing legacy vector drift | CRITICAL | Track A |
| R12 | KB+IU duplicates | HIGH/CRITICAL | Collection separation + freshness §6.2.B |
| R13 | Docker ephemeral | High | Commit to git |
| R14 | Outbox worker down → pending stuck | High | HP-10 observability + alert |
| R15 | Dead-letter accumulation | Medium | Alert + review |

Self-contradiction check: IU-0 ↔ P38-XC / P44-3/4A/5A / P11C/D/E / P5 / P6 = PASS.


§13. Questions for GPT/User review

| # | Question | Proposal |
|---|---|---|
| Q2 | Authority of IU-0 | Logical proposal, DRAFT; APR once Đ44 is enacted |
| Q3 | unit_kind extensions | Merge proof_note → knowledge_atom; design_doc_section is new |
| Q4 | Default slice boundary | §Section atom |
| Q5 | Strict MUP for every kind? | Strict + auto-generate title |
| Q6 | Vector granularity | Per kind: law = section, design = paragraph, atom = 1 chunk |
| Q7 | Pilot Phase 1 scope | 3-5 docs: IU-0 + P11E + P38-XC + P44-3 + 1 Đ44 doc |
| Q8 | IU-0-A polish round | Yes: pattern consistency |
| Q9 | G-9 UMC migration | Keep: UMC > profile (2 layers) |
| Q10 | Order after the outline | Polish → full draft |
| Q11 | Track A v3.1 PASS? | Proposed PASS |
| Q12 | Is IU-0 E.1 enough to PASS to full draft? | Proposed PASS |
| Q13 | Full draft in parallel with Track A? | In parallel; §6 Phase 1+ is blocked by V1 |

IU-0 OUTLINE-E.1 | S190+ (2026-05-02) | Uploaded | History: A→B→C→D→E→E.1 | VRC rev 1 PASS | PG CRUD Hot Path Contract + outbox queue + vector freshness + duplicate matrix | Track A v3.1 split out
