KB-5950 rev 2

5000x-live — Real corpus pilot (DIEU-35)

Tags: iu-core, 5000x-live, real-corpus, dieu-35, hygiene-repaired-by-6000x

Hygiene note (6000x): title/tags renormalised from "5500x" to "5000x-live" to match the path. Content preserved verbatim except this banner.

Verdict: PASS — three-axis envelope on live legal corpus validated; no mutation.

Selection rationale

iu_three_axis_envelope corpus, grouped by axis_a_doc_code:

doc_code   iu_count   max_depth
(null)     77         0
DIEU-35    36         2
DIEU-28    27         1
DIEU-32    23         1

DIEU-35 selected — largest non-null real legal corpus, deepest hierarchy (2 levels), 36 IU.
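
The selection rule can be sketched as follows. This is a minimal illustration, assuming the rule is "largest non-null doc_code by iu_count, ties broken by max_depth"; `CorpusRow` and `select_pilot_corpus` are hypothetical names, and the rows are copied from the breakdown above.

```python
from typing import NamedTuple, Optional


class CorpusRow(NamedTuple):
    doc_code: Optional[str]  # None stands for the (null) bucket
    iu_count: int
    max_depth: int


# Rows copied from the per-doc_code breakdown above.
ROWS = [
    CorpusRow(None, 77, 0),
    CorpusRow("DIEU-35", 36, 2),
    CorpusRow("DIEU-28", 27, 1),
    CorpusRow("DIEU-32", 23, 1),
]


def select_pilot_corpus(rows):
    """Pick the largest non-null corpus; break ties on hierarchy depth."""
    candidates = [r for r in rows if r.doc_code is not None]
    return max(candidates, key=lambda r: (r.iu_count, r.max_depth))


selected = select_pilot_corpus(ROWS)
print(selected.doc_code)
```

The (null) bucket is excluded before ranking, which is why its larger count of 77 does not win.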

End-to-end verification (no mutation)

Step 1 — PG canonical: three-axis envelope rows

DIEU-35 has 36 envelope rows. Sample S4 subtree (root cb211ee6-2b61-496e-b191-ef502ea28345):

depth=0  D38-DIEU35-S4
depth=1  D38-DIEU35-S4-P1
depth=1  D38-DIEU35-S4-P2
depth=1  D38-DIEU35-S4-P3
depth=1  D38-DIEU35-S4-P4
depth=2  D38-DIEU35-S4-P1-1
depth=2  D38-DIEU35-S4-P1-2
depth=2  D38-DIEU35-S4-P1-3

8 rows from fn_iu_subtree(p_root uuid) — recursive descent operates correctly.
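
For readers without database access, the descent can be mimicked in memory. The sketch below is not the SQL body of fn_iu_subtree; it is a Python analogue, assuming parent links can be inferred from the unit codes (for example, that S4-P1-1 is a child of S4-P1). `PARENT` and `subtree` are hypothetical names.

```python
from collections import deque

# Parent links for the S4 sample, inferred from the unit codes listed above
# (assumption: a code's parent is the code with the last segment removed).
PARENT = {
    "D38-DIEU35-S4": None,
    "D38-DIEU35-S4-P1": "D38-DIEU35-S4",
    "D38-DIEU35-S4-P2": "D38-DIEU35-S4",
    "D38-DIEU35-S4-P3": "D38-DIEU35-S4",
    "D38-DIEU35-S4-P4": "D38-DIEU35-S4",
    "D38-DIEU35-S4-P1-1": "D38-DIEU35-S4-P1",
    "D38-DIEU35-S4-P1-2": "D38-DIEU35-S4-P1",
    "D38-DIEU35-S4-P1-3": "D38-DIEU35-S4-P1",
}


def subtree(root, parent=PARENT):
    """Return (depth, code) pairs for root and all descendants, level by level."""
    children = {}
    for code, par in parent.items():
        children.setdefault(par, []).append(code)
    rows, queue = [], deque([(0, root)])
    while queue:
        depth, code = queue.popleft()
        rows.append((depth, code))
        for child in sorted(children.get(code, [])):
            queue.append((depth + 1, child))
    return rows


rows = subtree("D38-DIEU35-S4")
for depth, code in rows:
    print(f"depth={depth}  {code}")
```

The breadth-first walk yields the same eight rows in the same depth-grouped order as the sample listing.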

Step 2 — Axis A (linear sort)

axis_a_sort_order deterministic: S4=5, S4-P1=6, S4-P1-1=7, S4-P1-2=8. Original-text reconstruction projection intact.
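
A determinism check of this kind might look like the sketch below. It uses only the four sort values quoted above (with the short codes as written there); `reading_order` is a hypothetical helper, and the uniqueness test is an assumption about what "deterministic" requires.

```python
# Sort orders quoted above, deliberately shuffled; other rows are omitted.
SORT_ORDER = {
    "S4-P1-2": 8,
    "S4": 5,
    "S4-P1-1": 7,
    "S4-P1": 6,
}


def reading_order(sort_order):
    """Return unit codes in axis A order; reject duplicate sort keys."""
    keys = list(sort_order.values())
    if len(keys) != len(set(keys)):
        raise ValueError("axis_a_sort_order is not unique; order is ambiguous")
    return sorted(sort_order, key=sort_order.get)


print(reading_order(SORT_ORDER))
```

Sorting on the key alone restores the original reading order regardless of the order in which rows are fetched.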

Step 3 — Axis B (semantic tags) — dict shape validated

Real DIEU-35 axis_b_tags shape (JSONB):

{"unit_kind": ["kind:law_unit"], "section_type": ["sectype:article"], "legal_document": ["doc:DIEU-35"]}
{"unit_kind": ["kind:law_unit"], "section_type": ["sectype:heading"],  "legal_document": ["doc:DIEU-35"]}
{"unit_kind": ["kind:law_unit"], "section_type": ["sectype:principle"],"legal_document": ["doc:DIEU-35"]}

This confirms the defect discovered in 5000x: axis_b_tags is a Record<string, string[]> dict, not a flat string[]. The factory composable's flattenAxisBTags(raw) helper handles both shapes, and the fix is now validated against a real corpus in this macro.
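
The dual-shape handling can be illustrated with a short sketch. This is not the composable's actual flattenAxisBTags code; it is a hypothetical Python analogue, `flatten_axis_b_tags`, that assumes the only two shapes are a dict of string lists and a flat string list.

```python
def flatten_axis_b_tags(raw):
    """Flatten axis_b_tags into a flat list of tag strings.

    Accepts either the dict shape ({facet: [tag, ...]}) or a flat list.
    """
    if raw is None:
        return []
    if isinstance(raw, dict):
        return [tag for tags in raw.values() for tag in tags]
    if isinstance(raw, list):
        return list(raw)
    raise TypeError(f"unexpected axis_b_tags shape: {type(raw).__name__}")


# First real DIEU-35 row quoted above.
dict_shape = {
    "unit_kind": ["kind:law_unit"],
    "section_type": ["sectype:article"],
    "legal_document": ["doc:DIEU-35"],
}
flat_shape = ["kind:law_unit", "sectype:article", "doc:DIEU-35"]

print(flatten_axis_b_tags(dict_shape))
print(flatten_axis_b_tags(flat_shape))
```

Both calls produce the same flat list, so downstream consumers need not know which shape was stored.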

Step 4 — Axis C (hierarchy)

Max depth 2 (S4-P1-N children), parent_id chain intact, depth column accurate.
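
One way to cross-check the depth column against the parent_id chain is to walk each row up to its root and count the hops. The sketch below assumes rows are available as a mapping of code to (parent code, stored depth); `chain_depth` and `depth_mismatches` are hypothetical names, and the parent links are inferred from the unit codes.

```python
def chain_depth(code, parent):
    """Count hops from code up to its root, guarding against cycles."""
    depth, seen = 0, {code}
    while parent[code] is not None:
        code = parent[code]
        if code in seen:
            raise ValueError(f"cycle in parent_id chain at {code}")
        seen.add(code)
        depth += 1
    return depth


def depth_mismatches(rows):
    """rows: {code: (parent_code, stored_depth)} -> codes whose depth is wrong."""
    parent = {code: par for code, (par, _) in rows.items()}
    return [c for c, (_, d) in rows.items() if chain_depth(c, parent) != d]


# Three rows from the S4 sample; parent links are an assumption.
SAMPLE = {
    "D38-DIEU35-S4": (None, 0),
    "D38-DIEU35-S4-P1": ("D38-DIEU35-S4", 1),
    "D38-DIEU35-S4-P1-1": ("D38-DIEU35-S4-P1", 2),
}

print(depth_mismatches(SAMPLE))
```

An empty result means every stored depth agrees with the length of its parent chain.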

Step 5 — Qdrant boundary check

The DIEU-35 IU set is not yet Qdrant-synced (intentionally; DIEU-35 onboarding is a separate macro). The current iu_vector_sync_point set holds 61 points for 60 unique units from the earlier pilot.iu0.test-* corpus. The per-IU boundary is preserved: the one-point surplus comes from KT-B, the only 2-chunk IU, and both chunks sit inside its boundary.
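
The point-versus-unit arithmetic can be reproduced with a small count. The data below is a synthetic stand-in, not the real iu_vector_sync_point rows: the 59 single-chunk unit names are placeholders, and only KT-B and the 61/60 totals come from this report.

```python
from collections import Counter

# Synthetic stand-in for iu_vector_sync_point: 59 single-chunk units with
# placeholder names, plus KT-B with two chunks.
points = [(f"unit-{i:02d}", 0) for i in range(59)]
points += [("KT-B", 0), ("KT-B", 1)]


def chunk_counts(points):
    """Map each unit to its number of points (chunks)."""
    return Counter(unit for unit, _chunk_index in points)


counts = chunk_counts(points)
multi = {unit: n for unit, n in counts.items() if n > 1}
print(len(points), len(counts), multi)
```

With this data the script reports 61 points, 60 units, and KT-B as the only unit with more than one chunk.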

Step 6 — Five-layer boundary

PG → Directus → Qdrant → operator_runtime → text-as-code. Every layer queried read-only this macro. No layer mutated.

Rollback / disable

Nothing to roll back. All write gates remain inert. No new rows. No deletes.

Steps to onboard DIEU-35 into Qdrant (separate macro, not executed here):

  1. Flip iu_core.vector_sync_enabled → true.
  2. Run cutter_agent/iu_core/vector_sync.py apply_iu_set for the 36 DIEU-35 IUs (per-IU boundary, bounded batches).
  3. Verify the Qdrant point count grows 61 → 61 + 36 = 97 (or by 37, to 98, if an IU exceeds the per-IU token limit).
  4. Verify iu_vector_sync_point rows grow by the same amount, all with sync_status='indexed'.
  5. Re-flip vector_sync_enabled → false.
  6. Smoke search "DIEU 35 nguyên tắc" → top 3 hits contain DIEU-35 IUs.
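
The flag handling in steps 1 and 5 and the arithmetic in step 3 can be sketched as below. This is an illustration only: `settings` is a hypothetical in-memory stand-in for wherever iu_core.vector_sync_enabled is stored, `vector_sync_window` and `expected_point_count` are hypothetical helpers, and apply_iu_set is not called.

```python
from contextlib import contextmanager

# Hypothetical in-memory stand-in for the real settings store.
settings = {"iu_core.vector_sync_enabled": False}


@contextmanager
def vector_sync_window(store, key="iu_core.vector_sync_enabled"):
    """Enable the sync gate for the duration of the block, then close it."""
    store[key] = True
    try:
        yield store
    finally:
        store[key] = False  # step 5 runs even if the sync raises


def expected_point_count(before, iu_count, extra_chunks=0):
    """Step 3 arithmetic: one point per IU plus any extra chunks."""
    return before + iu_count + extra_chunks


with vector_sync_window(settings):
    assert settings["iu_core.vector_sync_enabled"] is True
    # apply_iu_set(...) would run here; it is not called in this sketch.

print(settings["iu_core.vector_sync_enabled"], expected_point_count(61, 36))
```

Wrapping the sync in a context manager means the gate is closed again even when a batch fails, so a partial run cannot leave writes enabled.
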
Path: knowledge/dev/laws/dieu44-trien-khai/v0.6-iu-core-5000x-live-ui-ops-real-corpus-pilot-open-goal/05-real-corpus-pilot.md