KB-1A28

IU-0 Module B — Packaging, CRUD Hot Path, Vector Architecture

Revision 1

Status: FULL DRAFT MODULAR v2, awaiting GPT/User polish review
Module: 07b | Sections: §5, §5.A, §6
Parent: 07-iu0-index-and-core.md
Track A: DONE | Schema sketch: NON-NORMATIVE EXAMPLE
Edit this file when: packaging layers, the CRUD contract, the outbox, or the vector architecture change.


§5. Packaging model: 9 layers

§5.1 Overview

Each information_unit packages 9 layers:

# | Layer | SoT | Owner/Update | When editing
1 | Identity | Core columns (U1-U3, U7, U8) | System + Agent (at birth) | U1, U2 immutable. U3 amended via APR.
2 | Version | Per-kind version table | System (auto) | An edit = INSERT a new version; never UPDATE the old one.
3 | Content | Per-kind body + content_hash | Agent writes → System computes hash | UPDATE body → recompute hash → outbox enqueue.
4 | Profile | JSONB columns per role | Agent + DOT enrichment | jsonb_set/merge in local PG. Validation is local.
5 | Edges | universal_edges | Agent + DOT | INSERT/DELETE rows. Reverse index deferred to the outbox.
6 | Topic | content_profile.topic_labels[] | Agent + DOT (Đ24) | Update the JSONB array. Must go through the Đ24 vocabulary.
7 | Checker | DOT state fields | DOT system | DOT updates conformance/drift. Never declared by the Agent.
8 | Vector | Per-kind vector columns + Qdrant | Outbox worker | Async, delayed (see §5.A).
9 | Protection | Lifecycle metadata + apr_ref | System (lifecycle transition) | enacted → content locked (→ see 07c §7 G-1).

§5.2 MUP — Minimum Usable Package (overview)

Tier | Layers | Blocks birth? | Purpose
Tier 0 | Identity + Version + Content | ✅ Blocks | The unit exists, with identity + body + version
Tier 1 | + required Profile + container ref + 1 edge | 🔶 Recommended | The unit knows what it belongs to
Tier 2+ | + Edges + topics + checker + vector + protection | ❌ Deferred | DOT enriches incrementally

Per-tier MUP details: → see 07c §9.
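A minimal Python sketch of the tier check, assuming a unit can be summarized as the set of layers it has plus a container-ref flag and an edge count. The `Layer` enum, `mup_tier`, and `deferred_gaps` are hypothetical names; the real gate lives in PG constraints and a trigger (IU-HP-5), and the authoritative per-tier rules are in 07c §9.

```python
from __future__ import annotations

from enum import Enum


class Layer(Enum):
    IDENTITY = "identity"
    VERSION = "version"
    CONTENT = "content"
    PROFILE = "profile"
    EDGES = "edges"
    TOPIC = "topic"
    CHECKER = "checker"
    VECTOR = "vector"
    PROTECTION = "protection"


# Tier 0 blocks birth: without these three layers the unit does not exist.
TIER0_LAYERS = {Layer.IDENTITY, Layer.VERSION, Layer.CONTENT}
# Tier 2+ layers are deferred; DOT fills them in over time.
TIER2_LAYERS = {Layer.EDGES, Layer.TOPIC, Layer.CHECKER, Layer.VECTOR, Layer.PROTECTION}


def mup_tier(layers: set[Layer], has_container_ref: bool, edge_count: int) -> int:
    """Return -1 if birth must be blocked, 0 if only Tier 0 is met, 1 if Tier 1 is met."""
    if not TIER0_LAYERS <= layers:
        return -1
    if Layer.PROFILE in layers and has_container_ref and edge_count >= 1:
        return 1
    return 0


def deferred_gaps(layers: set[Layer]) -> set[Layer]:
    """Tier 2+ layers still missing; these never block birth."""
    return TIER2_LAYERS - layers


unit = {Layer.IDENTITY, Layer.VERSION, Layer.CONTENT, Layer.PROFILE}
print(mup_tier(unit, has_container_ref=True, edge_count=1))  # 1
```

Only Tier 0 returns a blocking result; Tier 1 is reported but not enforced, matching the "Recommended" status in the table.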

§5.3 Layer interaction rules

Rule | Explanation
Identity before Content | unit_id + canonical_address must exist before the body is written.
Version contains Content | Each version snapshots body + content_hash.
Profile does not duplicate Core | Do not store lifecycle_status in the profile JSONB.
Edge references are valid | The target must exist (DOT verifies; this does not block birth).
Vector = projection | PG = SSOT. Stale → regenerate from PG.
Protection blocks writes | Enacted → Version + Content locked.
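The layer rules can be illustrated with a small in-memory model. This is a sketch, not the PG implementation: the class and function names, the `"draft"` starting status, and raising Python exceptions are hypothetical stand-ins for what the design assigns to PG constraints and triggers.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


class ProtectedError(Exception):
    """Raised when a write targets an enacted (protected) unit."""


@dataclass(frozen=True)
class Version:
    number: int
    body: str
    content_hash: str


@dataclass
class Unit:
    unit_id: str
    canonical_address: str
    lifecycle_status: str = "draft"  # hypothetical initial status
    profile: dict = field(default_factory=dict)
    versions: list[Version] = field(default_factory=list)

    def write_body(self, body: str) -> Version:
        # Protection blocks writes: an enacted unit has Version + Content locked.
        if self.lifecycle_status == "enacted":
            raise ProtectedError(f"{self.unit_id} is enacted; content is locked")
        # Version contains Content: each edit appends a new snapshot (body + hash);
        # earlier versions are never modified.
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        version = Version(len(self.versions) + 1, body, digest)
        self.versions.append(version)
        return version

    def set_profile(self, key: str, value: object) -> None:
        # Profile does not duplicate Core: lifecycle_status lives only in the core column.
        if key == "lifecycle_status":
            raise ValueError("lifecycle_status is a core column, not a profile field")
        self.profile[key] = value


def new_unit(unit_id: str, canonical_address: str) -> Unit:
    # Identity before Content: no body can be written until both identifiers exist.
    if not unit_id or not canonical_address:
        raise ValueError("unit_id and canonical_address are required")
    return Unit(unit_id, canonical_address)


u = new_unit("u-1", "kb/example")
u.write_body("first")
u.write_body("second")
u.lifecycle_status = "enacted"
try:
    u.write_body("third")
except ProtectedError as exc:
    print(exc)  # u-1 is enacted; content is locked
```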

§5.A IU-PG CRUD Performance Contract

Core principle: the CRUD hot path touches local PG only. No OpenAI, Qdrant, or other remote calls. Create/edit/delete → PG commits → the result is returned immediately.

§5.A.1 Hot path vs Cold path

Path | Includes | Latency | Dependency
Hot | INSERT/UPDATE/DELETE unit + version + profile + edge + outbox enqueue | <10ms | Local PG only
Cold | Outbox worker → embed → Qdrant upsert/delete | 120s+ delay | PG + OpenAI + Qdrant
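A minimal sketch of one hot-path write, assuming SQLite as a stand-in for local PG so the example runs standalone. Table and column names beyond those in the outbox sketch (§5.A.5) are hypothetical. What it shows: unit, version, and outbox rows commit or roll back together, and the function makes no OpenAI or Qdrant call.

```python
import hashlib
import sqlite3
import time

SCHEMA = """
CREATE TABLE units (
    unit_id           TEXT PRIMARY KEY,
    canonical_address TEXT NOT NULL UNIQUE
);
CREATE TABLE unit_versions (
    unit_version_id INTEGER PRIMARY KEY AUTOINCREMENT,
    unit_id         TEXT NOT NULL REFERENCES units(unit_id),
    body            TEXT NOT NULL CHECK (length(body) > 0),
    content_hash    TEXT NOT NULL
);
CREATE TABLE outbox (
    outbox_id       INTEGER PRIMARY KEY AUTOINCREMENT,
    source_kind     TEXT NOT NULL,
    source_id       TEXT NOT NULL,
    operation       TEXT NOT NULL,
    content_hash    TEXT,
    earliest_run_at REAL NOT NULL,
    status          TEXT NOT NULL DEFAULT 'pending'
);
"""

QUIET_WINDOW_S = 120


def create_unit(conn: sqlite3.Connection, unit_id: str, address: str, body: str) -> int:
    """Hot path: local DB writes only. No embedding and no vector-store call happen here."""
    content_hash = hashlib.sha256(body.encode("utf-8")).hexdigest()  # HP-4
    with conn:  # one transaction: commit on success, roll back on any exception (HP-1, HP-9)
        conn.execute("INSERT INTO units VALUES (?, ?)", (unit_id, address))
        cur = conn.execute(
            "INSERT INTO unit_versions (unit_id, body, content_hash) VALUES (?, ?, ?)",
            (unit_id, body, content_hash),
        )
        version_id = cur.lastrowid
        conn.execute(
            "INSERT INTO outbox (source_kind, source_id, operation, content_hash, earliest_run_at)"
            " VALUES ('unit_version', ?, 'upsert', ?, ?)",
            (str(version_id), content_hash, time.time() + QUIET_WINDOW_S),
        )
    return version_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
create_unit(conn, "u-1", "kb/a", "hello")
try:
    create_unit(conn, "u-2", "kb/b", "")  # the version INSERT fails the CHECK constraint
except sqlite3.IntegrityError:
    pass  # the whole transaction rolled back, including the units row for u-2
```

In PG the same shape is a single `BEGIN ... COMMIT`; the cold path only ever sees committed outbox rows.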

§5.A.2 HP-1..15: inherited from Track A

Applies to both the legacy KB path AND the IU path:

# | Rule
HP-1 | PG transaction commits before the response
HP-2 | No OpenAI call in the write path
HP-3 | No Qdrant call in the write path
HP-4 | Content hash computed locally (SHA-256)
HP-5 | Profile JSONB update = jsonb_set (native PG)
HP-6 | Edge CRUD = PG INSERT/DELETE
HP-7 | Birth gate = PG constraints + lightweight trigger
HP-8 | Lifecycle transition = PG UPDATE + constraint
HP-9 | Outbox enqueue = PG INSERT/UPSERT (same transaction)
HP-10 | Outbox worker is observable (stuck-job monitoring)
HP-11 | Exponential retry backoff
HP-12 | Dead-letter = manual review + alert
HP-13 | Debounce key = source entity ID
HP-14 | Delete = PG soft-delete + outbox delete (30-60s)
HP-15 | Bulk = staggered enqueue
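Two of these rules are simple enough to show directly. A sketch, assuming Python's standard `hashlib` for HP-4 and a hypothetical fixed `stagger_s` spacing for HP-15 (the rules name the technique, not the spacing):

```python
from __future__ import annotations

import hashlib


def content_hash(body: str) -> str:
    """HP-4: SHA-256 of the UTF-8 body, computed in-process with no remote call."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def staggered_run_times(n: int, start: float, quiet_window_s: float = 120.0,
                        stagger_s: float = 2.0) -> list[float]:
    """HP-15: give each job in a bulk enqueue its own earliest_run_at so the
    worker picks them up gradually instead of all at once.

    stagger_s is a hypothetical spacing; the rules do not fix a value."""
    return [start + quiet_window_s + i * stagger_s for i in range(n)]


times = staggered_run_times(50, start=0.0)
print(times[0], times[-1])  # 120.0 218.0
```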

§5.A.3 IU-HP-1..5 — IU-specific

# | Rule | Detail
IU-HP-1 | Create unit | INSERT unit + version + outbox → return immediately. <10ms.
IU-HP-2 | Slice edit | Read slice + UPDATE body + new hash + INSERT version + outbox. Local PG.
IU-HP-3 | Profile update | UPDATE JSONB (jsonb_set/merge). Validation is local.
IU-HP-4 | Edge CRUD | INSERT/DELETE universal_edges. Reverse index deferred to the outbox.
IU-HP-5 | Birth gate | PG constraints + trigger checking the 15 required fields. No external call.
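A sketch of IU-HP-2 under stated assumptions: a "slice" is taken to be a character range `[start, end)` of the body, and the function only builds the rows that the hot-path transaction would write. `slice_edit`, `SliceEditResult`, and the `source_id` format are hypothetical.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class SliceEditResult:
    new_body: str
    new_hash: str
    version_row: dict  # row for INSERT into the per-kind version table
    outbox_row: dict   # row for the outbox INSERT/UPSERT in the same transaction


def slice_edit(unit_id: str, body: str, start: int, end: int,
               replacement: str, next_version: int) -> SliceEditResult:
    """Replace body[start:end] and derive everything the hot-path transaction writes."""
    if not 0 <= start <= end <= len(body):
        raise ValueError("slice out of range")
    new_body = body[:start] + replacement + body[end:]
    new_hash = hashlib.sha256(new_body.encode("utf-8")).hexdigest()  # new hash, computed locally
    version_row = {"unit_id": unit_id, "version": next_version,
                   "body": new_body, "content_hash": new_hash}
    outbox_row = {"source_kind": "unit_version",
                  "source_id": f"{unit_id}:v{next_version}",  # hypothetical id format
                  "operation": "upsert", "content_hash": new_hash}
    return SliceEditResult(new_body, new_hash, version_row, outbox_row)


result = slice_edit("u-1", "hello world", 6, 11, "there", next_version=2)
print(result.new_body)  # hello there
```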

§5.A.4 Outbox defaults (locked; do not reopen)

Parameter | Value
Queue option | V1-B durable outbox
Quiet window | 120s (configurable to 180s)
Delete delay | 30–60s async
Write path | Enqueue-only
Worker poll | 30s
Max retry | 3 (exponential backoff)
Dead-letter | Manual review + alert
Debounce key (legacy) | document_id
Debounce key (IU) | unit_version_id
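The defaults can be captured as one configuration record. A sketch; `backoff_base_s` is a hypothetical value (the table fixes "exponential" and "3 retries", not the base delay), and picking 30s inside the 30-60s delete window is also only illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class OutboxDefaults:
    quiet_window_s: int = 120  # configurable, e.g. to 180
    delete_delay_s: int = 30   # allowed range is 30-60s; 30 is picked for illustration
    worker_poll_s: int = 30
    max_retry: int = 3
    backoff_base_s: int = 30   # hypothetical: the defaults say "exponential", not the base


DEFAULTS = OutboxDefaults()


def earliest_run_at(operation: str, now: float, cfg: OutboxDefaults = DEFAULTS) -> float:
    """Upserts wait for the quiet window; deletes use the shorter delete delay."""
    if operation == "delete":
        return now + cfg.delete_delay_s
    return now + cfg.quiet_window_s


def after_failure(retry_count: int, now: float,
                  cfg: OutboxDefaults = DEFAULTS) -> tuple[str, float | None]:
    """Decide the next state after a failed attempt.

    retry_count is how many retries were already used. Retries back off
    exponentially; once max_retry is spent the job is dead-lettered for
    manual review + alert."""
    if retry_count >= cfg.max_retry:
        return "dead_letter", None
    return "pending", now + cfg.backoff_base_s * 2 ** retry_count


for used in range(4):
    print(used, after_failure(used, now=0.0))
```

With these assumed values a failing job is retried 30s, 60s, and 120s after successive failures, then dead-lettered.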

§5.A.5 NON-NORMATIVE EXAMPLE — Outbox table sketch

⚠️ Logical illustration only. The physical schema is finalized via APR.

Column (sketch) | Purpose
outbox_id | PK
source_kind | kb_document / information_unit / unit_version
source_id | FK to the source entity
debounce_key | Coalesces several edits → 1 embed
operation | upsert / delete
content_hash | Hash at enqueue time; the worker skips the job if this hash is already embedded
earliest_run_at | now() + quiet_window
status | pending / processing / done / dead_letter
retry_count | Number of retries
last_error | Most recent error
timestamps | Audit

Shared vs separate outbox: OPEN IU-0-λ (→ see 07c §11).
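One way to render the sketch as DDL, shown in SQLite so it runs standalone. Assumptions, all still subject to the APR: `timestamps` is split into `created_at`/`updated_at`, and `debounce_key` is UNIQUE so each key has at most one row, which every new edit re-arms. PG's `INSERT ... ON CONFLICT ... DO UPDATE` has the same shape.

```python
import sqlite3

DDL = """
CREATE TABLE outbox (
    outbox_id       INTEGER PRIMARY KEY AUTOINCREMENT,
    source_kind     TEXT NOT NULL
                    CHECK (source_kind IN ('kb_document', 'information_unit', 'unit_version')),
    source_id       TEXT NOT NULL,
    debounce_key    TEXT NOT NULL UNIQUE,
    operation       TEXT NOT NULL CHECK (operation IN ('upsert', 'delete')),
    content_hash    TEXT,
    earliest_run_at REAL NOT NULL,
    status          TEXT NOT NULL DEFAULT 'pending'
                    CHECK (status IN ('pending', 'processing', 'done', 'dead_letter')),
    retry_count     INTEGER NOT NULL DEFAULT 0,
    last_error      TEXT,
    created_at      REAL NOT NULL,
    updated_at      REAL NOT NULL
)
"""

ENQUEUE = """
INSERT INTO outbox (source_kind, source_id, debounce_key, operation, content_hash,
                    earliest_run_at, created_at, updated_at)
VALUES (?, ?, ?, ?, ?, ?, ?, ?)
ON CONFLICT (debounce_key) DO UPDATE SET
    source_id       = excluded.source_id,
    operation       = excluded.operation,
    content_hash    = excluded.content_hash,
    earliest_run_at = excluded.earliest_run_at,
    status          = 'pending',
    retry_count     = 0,
    last_error      = NULL,
    updated_at      = excluded.updated_at
"""


def enqueue(conn, source_kind, source_id, debounce_key, operation, content_hash, now,
            quiet_window_s=120.0):
    """Insert a job, or bump the existing row for the same debounce_key.
    Each bump moves earliest_run_at forward, so rapid edits collapse into one job."""
    conn.execute(ENQUEUE, (source_kind, source_id, debounce_key, operation, content_hash,
                           now + quiet_window_s, now, now))


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
for t, h in [(0.0, "h1"), (20.0, "h2"), (40.0, "h3")]:
    enqueue(conn, "kb_document", "doc-42", "doc-42", "upsert", h, now=t)
conn.commit()
```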

§5.A.6 Agent workflow scenarios

Scenario | Hot path (PG) | Cold path (outbox)
Create | INSERT unit + version + outbox. <10ms. | Worker embeds after 120s.
Edit 3 times in 1 minute | 3x UPDATE + 3x version + 3x outbox bump. | Debounce: only the final state is embedded.
Read right after an edit | SELECT body. <5ms. | No vector needed.
Delete | Soft-delete + outbox delete enqueue (30-60s delay). | Worker deletes the vector.
Bulk 50 units | 50x INSERT + staggered outbox. | Worker processes them in turn.
Profile enrichment | DOT UPDATE JSONB. Content unchanged. | Metadata-only = no enqueue.
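The "edit 3 times in 1 minute" row can be checked with a small timeline simulation. A sketch, assuming a single debounce key, the 120s quiet window and 30s poll from §5.A.4, and a worker that embeds every due job on each poll; `simulate` is a hypothetical helper, not the worker itself.

```python
from __future__ import annotations

QUIET_WINDOW_S = 120
POLL_S = 30


def simulate(events: list[tuple[float, str, str | None]],
             horizon_s: float) -> list[tuple[float, str]]:
    """Replay edits against a debounced outbox and a polling worker.

    events: (time_s, debounce_key, new_content_hash), where new_content_hash is
    None for a metadata-only edit (profile enrichment), which enqueues nothing.
    Returns (poll_time_s, content_hash) for every embed the worker performs."""
    pending: dict[str, tuple[str, float]] = {}  # key -> (content_hash, earliest_run_at)
    embeds: list[tuple[float, str]] = []
    ordered = sorted(events, key=lambda e: e[0])
    i = 0
    tick = 0.0
    while tick <= horizon_s:
        # Hot path: every edit up to this tick enqueues or bumps its key.
        while i < len(ordered) and ordered[i][0] <= tick:
            at, key, new_hash = ordered[i]
            if new_hash is not None:
                pending[key] = (new_hash, at + QUIET_WINDOW_S)  # bump restarts the quiet window
            i += 1
        # Cold path: the worker embeds every job whose quiet window has elapsed.
        due = [k for k, (_, run_at) in pending.items() if run_at <= tick]
        for key in due:
            embeds.append((tick, pending.pop(key)[0]))
        tick += POLL_S
    return embeds


edits = [(0, "unit-1", "h1"), (20, "unit-1", "h2"), (40, "unit-1", "h3"), (50, "unit-1", None)]
print(simulate(edits, horizon_s=300))  # [(180.0, 'h3')]
```

Three edits at 0s, 20s, and 40s produce one embed of the last hash at 180s: the last bump moves `earliest_run_at` to 160s, and the next poll tick is 180s. Under these assumptions the lag after the final edit is the quiet window plus up to one poll interval.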

§6. Vector architecture — Legacy safety + IU parallel track

Based 100% on VRC evidence + Track A results. No speculation.

§6.0 Lessons learned

During session S190+, the vector design was written three times before the actual runtime state was known. The User warned three times that "not sure it is right = wrong". The Agent then investigated the runtime (VRC), found the root cause of the orphans (OGV-0), fixed it with a 2-layer defense (OGV-P0), and completed Track A. §6 only references evidence.

§6.1 Legacy vector — VRC facts

# | Fact | Value | VRC
1 | Collection | production_documents (the only one) | F1
2 | Dimensions / distance | 1536 / Cosine | F2
3 | Points | 11,052 | F3
4 | Active KB docs | 2,770 | F4
5 | Embedding | text-embedding-3-small | F6
6 | Chunking | 4000 chars / 400 overlap | F7
7 | Payload | content, document_id, metadata, parent_id, is_human_readable | F8
8 | Absent from payload | NO unit_id, canonical_address, unit_version_id, content_hash | F9
9 | Search dedup | document_id only | F10
10 | Multi-collection | NOT supported | F12
11 | Update | Delete all → upsert new; not atomic | F16
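For reasoning about point counts, fact F7 can be approximated with a plain character sliding window (step = 4000 - 400 = 3600). This is an assumption: the VRC facts record only size and overlap, not the splitter's algorithm (which may break on separators), and VG-2 forbids changing the legacy chunking in any case.

```python
from __future__ import annotations

import math

CHUNK_SIZE = 4000    # chars (VRC F7)
CHUNK_OVERLAP = 400  # chars (VRC F7)


def sliding_chunks(text: str, size: int = CHUNK_SIZE,
                   overlap: int = CHUNK_OVERLAP) -> list[str]:
    """Fixed-size character windows; consecutive windows share `overlap` chars."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    step = size - overlap
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            return chunks
        start += step


def expected_chunk_count(n_chars: int, size: int = CHUNK_SIZE,
                         overlap: int = CHUNK_OVERLAP) -> int:
    """Closed form for the loop above: one window, plus one per extra `step` chars."""
    if n_chars <= size:
        return 1
    return 1 + math.ceil((n_chars - size) / (size - overlap))


text = "".join(chr(97 + i % 26) for i in range(10_000))
chunks = sliding_chunks(text)
print(len(chunks), expected_chunk_count(len(text)))  # 3 3
```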

§6.2 Track A completion

Metric | Before | After
Orphan bug | 47 | 0
Ghost bug | 0 (7 = correct behavior) | 0
Zombie resurrection | Active | Fixed (listener guard + trigger semantic DELETE)
Trigger persistence | Ephemeral | ✅ Migration file
dot-vector-audit | FAILING | ✅ Report-only PASS
Qdrant backup | No cron | ✅ Daily 03:00

§6.3 Two systems, separate layers

Layer | Role | Rule
Legacy KB Vector | Whole KB, production_documents | Kept 100% as-is.
IU Vector | Information units | Parallel. Separate collection. Adapter/bridge.

Legacy-first: on conflict → the Legacy SSOT wins. IU does NOT mutate legacy.

§6.4 IU-VP-1..6

# | Principle | Phase
IU-VP-1 | Chunk within the scope of a single unit_version | 3
IU-VP-2 | Chunk metadata: unit_id, unit_version_id, canonical_address, content_hash_at_embedding | 1
IU-VP-3 | Content hash changes → stale → regenerate | 1+
IU-VP-4 | Split/merge → retire old, generate new | 3
IU-VP-5 | Vector = projection, NOT SSOT. PG wins. Qdrant down → CRUD keeps running. | Confirmed
IU-VP-6 | Async delayed outbox. 120s quiet window, debounce on unit_version_id, poll 30s, retry 3, dead-letter. Delete 30-60s. | Confirmed
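A sketch of the per-chunk payload required by IU-VP-2 and the staleness test of IU-VP-3. The `chunk_index` and `text` fields and the function names are hypothetical; the payload targets the separate IU collection, never production_documents (VG-1, VG-7).

```python
from __future__ import annotations

from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class IUChunkPayload:
    # IU-VP-2 metadata carried by every IU chunk
    unit_id: str
    unit_version_id: str
    canonical_address: str
    content_hash_at_embedding: str
    # hypothetical extra fields for this sketch
    chunk_index: int
    text: str


def build_payloads(unit_id: str, unit_version_id: str, canonical_address: str,
                   content_hash: str, chunks: list[str]) -> list[IUChunkPayload]:
    """IU-VP-1: every chunk comes from exactly one unit_version, so one call
    covers one version and never mixes text from different units."""
    return [
        IUChunkPayload(unit_id, unit_version_id, canonical_address, content_hash, i, chunk)
        for i, chunk in enumerate(chunks)
    ]


def is_stale(payload: IUChunkPayload, current_pg_hash: str) -> bool:
    """IU-VP-3 / IU-VP-5: PG is the SSOT, so any hash mismatch marks the vector
    stale and it is regenerated from PG, never the other way round."""
    return payload.content_hash_at_embedding != current_pg_hash


payloads = build_payloads("u-1", "uv-7", "kb/example", "h1", ["part one", "part two"])
print(asdict(payloads[0]))
print(is_stale(payloads[0], "h2"))  # True
```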

§6.5 Freshness — 5 states

State | Search behavior
current | Normal. Preferred when duplicated.
pending | Return the old result + freshness warning. Fall back to PG.
stale | Return the old result + stale warning.
error | Exclude, or return with an error warning. Alert ops.
retired | Exclude entirely.
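The state table maps directly to a lookup. A sketch; the `Treatment` fields, the action labels, and the `exclude_errors` switch (the table allows either exclusion or a warning for `error`) are assumptions.

```python
from __future__ import annotations

from enum import Enum
from typing import NamedTuple


class Freshness(Enum):
    CURRENT = "current"
    PENDING = "pending"
    STALE = "stale"
    ERROR = "error"
    RETIRED = "retired"


class Treatment(NamedTuple):
    include: bool
    warning: str | None
    action: str | None  # hypothetical follow-up label


def search_treatment(state: Freshness, exclude_errors: bool = True) -> Treatment:
    if state is Freshness.CURRENT:
        return Treatment(True, None, None)  # normal; preferred if duplicated
    if state is Freshness.PENDING:
        return Treatment(True, "freshness", "fallback_pg")  # old result + warning, PG fallback
    if state is Freshness.STALE:
        return Treatment(True, "stale", None)
    if state is Freshness.ERROR:
        # Either exclusion or an error warning is allowed; ops is alerted in both cases.
        if exclude_errors:
            return Treatment(False, None, "alert_ops")
        return Treatment(True, "error", "alert_ops")
    return Treatment(False, None, None)  # RETIRED: excluded entirely
```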

§6.6 Duplicate KB + IU matrix

IU state | KB state | Search returns
current + enacted | current | Prefer IU
pending | current | KB
stale/error | current | KB + IU warning
current + enacted | stale/error | IU
retired | current | KB
Does not exist | current | KB (legacy-only)
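A sketch of the matrix as a resolver. Combinations the matrix does not list (for example IU `current` but not yet enacted) fall back to KB here; that default is an interpretation of the legacy-first rule in §6.3, not something the matrix states.

```python
from __future__ import annotations

from typing import NamedTuple


class Decision(NamedTuple):
    source: str           # "iu" or "kb"
    warning: str | None   # warning about the IU side, if any


def resolve_duplicate(iu_state: str | None, iu_enacted: bool, kb_state: str) -> Decision:
    """iu_state is None when no IU exists for the document."""
    if iu_state is None:
        return Decision("kb", None)  # legacy-only
    if iu_state == "current" and iu_enacted and kb_state in ("current", "stale", "error"):
        return Decision("iu", None)  # enacted, current IU is preferred
    if iu_state in ("stale", "error") and kb_state == "current":
        return Decision("kb", f"iu_{iu_state}")  # KB result + IU warning
    # pending, retired, and combinations not in the matrix: legacy-first
    return Decision("kb", None)
```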

§6.7 4-phase migration

Phase | Action | Impact on legacy?
0 | Read-only mapping adapter | None
1 | Separate IU collection + outbox ingestion | None
2 | Search adapter over 2 collections + freshness dedup | Read-only on legacy
3 | Unit-aware chunking | None (IU only)

Phase 0 has not started yet. Track A, the prerequisite, is satisfied. Implementation goes through P44-6 (not yet opened).

§6.8 VG-1..9 — Do-not-break vector guardrails

# | Guardrail
VG-1 | Do NOT change production_documents
VG-2 | Do NOT replace the 4000/400 chunking
VG-3 | Do NOT bulk re-embed legacy
VG-4 | Do NOT delete orphans with new logic (Track A = 0)
VG-5 | Do NOT change the legacy ingestion pipeline
VG-6 | Do NOT change the orphan checker
VG-7 | Do NOT force the IU schema into the legacy payload
VG-8 | Adapter first; do not break the old pipeline
VG-9 | PG/KB = SSOT; vector = projection

§6.9 Anti-patterns vector

# | Anti-pattern
VA-1 | Chunking beyond the scope of a unit
VA-2 | Qdrant writing the body into PG
VA-3 | Assuming the legacy payload has unit_id (VRC F9: it does NOT)
VA-4 | Replacing the legacy pipeline before Phase 2
VA-5 | Mutating production_documents for IU
VA-6 | Forcing KB drafts → IU before vectorization
VA-7 | IU bridge mutating legacy data

Module B — §5, §5.A, §6 | Parent: 07-iu0-index-and-core.md | Evidence: VRC rev 1 + Track A DONE