5000x-live — Real corpus pilot (DIEU-35)
Hygiene note (6000x): title/tags renormalised from "5500x" to "5000x-live" to match the path. Content preserved verbatim except this banner.
Verdict: PASS — three-axis envelope on live legal corpus validated; no mutation.
Selection rationale
iu_three_axis_envelope corpus by axis_a_doc_code:
| doc_code | iu_count | max_depth |
|---|---|---|
| (null) | 77 | 0 |
| DIEU-35 | 36 | 2 |
| DIEU-28 | 27 | 1 |
| DIEU-32 | 23 | 1 |
DIEU-35 selected: the largest non-null real legal corpus (36 IUs) with the deepest hierarchy (max_depth 2, i.e. two levels below the root).
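The selection rule can be sketched in a few lines. This is a minimal sketch, assuming the grouped counts are loaded as plain tuples; CorpusStat and select_pilot_corpus are hypothetical names, and the tie-break on max_depth is an assumed ordering (DIEU-35 leads on both count and depth here, so the ordering does not change the result).

```python
from typing import NamedTuple, Optional


class CorpusStat(NamedTuple):
    """One row of the per-doc_code summary table (hypothetical in-memory shape)."""
    doc_code: Optional[str]
    iu_count: int
    max_depth: int


def select_pilot_corpus(stats: list[CorpusStat]) -> CorpusStat:
    # Rows without a doc_code are not a real legal document, so skip them.
    candidates = [s for s in stats if s.doc_code is not None]
    if not candidates:
        raise ValueError("no non-null corpus to select")
    # Largest IU count wins; deeper hierarchy breaks ties (assumed ordering).
    return max(candidates, key=lambda s: (s.iu_count, s.max_depth))


stats = [
    CorpusStat(None, 77, 0),
    CorpusStat("DIEU-35", 36, 2),
    CorpusStat("DIEU-28", 27, 1),
    CorpusStat("DIEU-32", 23, 1),
]
print(select_pilot_corpus(stats).doc_code)  # DIEU-35
```

The null bucket has the most rows (77), which is why it has to be filtered out before taking the maximum.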
End-to-end verification (no mutation)
Step 1 — PG canonical: three-axis envelope rows
DIEU-35 has 36 envelope rows. Sample S4 subtree (root cb211ee6-2b61-496e-b191-ef502ea28345):
| depth | code |
|---|---|
| 0 | D38-DIEU35-S4 |
| 1 | D38-DIEU35-S4-P1 |
| 1 | D38-DIEU35-S4-P2 |
| 1 | D38-DIEU35-S4-P3 |
| 1 | D38-DIEU35-S4-P4 |
| 2 | D38-DIEU35-S4-P1-1 |
| 2 | D38-DIEU35-S4-P1-2 |
| 2 | D38-DIEU35-S4-P1-3 |
fn_iu_subtree(p_root uuid) returns these 8 rows (the root, its 4 depth-1 children, and the 3 depth-2 children of S4-P1), so recursive descent works correctly.
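The body of fn_iu_subtree is not shown in this note. As a rough model, assuming it is a PostgreSQL WITH RECURSIVE query over parent_id, the Python sketch below expands the tree one level per iteration (the way a recursive CTE's working table grows over a tree) and reproduces the 8-row S4 sample. The iu_subtree name and the (code, parent_code) row shape are hypothetical.

```python
from collections import defaultdict
from typing import Optional


def iu_subtree(rows: list[tuple[str, Optional[str]]], root: str) -> list[tuple[int, str]]:
    """Return (depth, code) for root and every descendant, level by level.

    rows holds (code, parent_code) pairs; parent_code is None for a top-level IU.
    """
    children: dict[str, list[str]] = defaultdict(list)
    for code, parent in rows:
        if parent is not None:
            children[parent].append(code)

    out: list[tuple[int, str]] = []
    seen = {root}
    level, depth = [root], 0
    while level:
        out.extend((depth, code) for code in level)
        next_level = []
        for code in level:
            for child in children[code]:
                if child not in seen:  # guard against a corrupted, cyclic parent chain
                    seen.add(child)
                    next_level.append(child)
        level, depth = next_level, depth + 1
    return out


s4 = "D38-DIEU35-S4"
rows = [(s4, None)]
rows += [(f"{s4}-P{i}", s4) for i in range(1, 5)]
rows += [(f"{s4}-P1-{i}", f"{s4}-P1") for i in range(1, 4)]
for depth, code in iu_subtree(rows, s4):
    print(f"depth={depth} {code}")
```

Emitting whole levels keeps the output grouped by depth, matching the order of the sample rows.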
Step 2 — Axis A (linear sort)
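axis_a_sort_order is deterministic: S4=5, S4-P1=6, S4-P1-1=7, S4-P1-2=8. Each parent sorts immediately before its first child, consistent with a depth-first (pre-order) walk. The original-text reconstruction projection is intact.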
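The reconstruction projection depends on axis_a_sort_order being a total order over the IUs. A minimal sketch, assuming each row carries a hypothetical text field and that reconstruction is a plain newline join in sort order (the real projection may add headings or separators); reconstruct_text is a hypothetical name.

```python
def reconstruct_text(rows: list[dict]) -> str:
    """Join IU text in axis_a_sort_order; the 'text' key is a hypothetical field."""
    orders = [r["axis_a_sort_order"] for r in rows]
    if len(orders) != len(set(orders)):
        # Duplicate sort keys would leave the projection order undefined.
        raise ValueError("axis_a_sort_order is not unique")
    ordered = sorted(rows, key=lambda r: r["axis_a_sort_order"])
    return "\n".join(r["text"] for r in ordered)


rows = [  # deliberately shuffled input
    {"axis_a_sort_order": 7, "text": "S4-P1-1"},
    {"axis_a_sort_order": 5, "text": "S4"},
    {"axis_a_sort_order": 8, "text": "S4-P1-2"},
    {"axis_a_sort_order": 6, "text": "S4-P1"},
]
print(reconstruct_text(rows))
```

Rejecting duplicate keys up front is what makes "deterministic" checkable: with unique keys, any input order yields the same text.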
Step 3 — Axis B (semantic tags) — dict shape validated
Real DIEU-35 axis_b_tags shape (JSONB):
{"unit_kind": ["kind:law_unit"], "section_type": ["sectype:article"], "legal_document": ["doc:DIEU-35"]}
{"unit_kind": ["kind:law_unit"], "section_type": ["sectype:heading"], "legal_document": ["doc:DIEU-35"]}
{"unit_kind": ["kind:law_unit"], "section_type": ["sectype:principle"], "legal_document": ["doc:DIEU-35"]}
This confirms the defect discovered in 5000x: axis_b_tags is a Record<string, string[]> dict, not a flat string[]. The factory composable's flattenAxisBTags(raw) helper handles both shapes, and this macro validated that fix against the real corpus.
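flattenAxisBTags(raw) lives in the TypeScript factory composable and is not reproduced here. The Python analog below (flatten_axis_b_tags is a hypothetical name) illustrates the two-shape handling under assumed rules: a flat list passes through, a dict is flattened in key order, and non-string entries are dropped. The real helper's ordering and filtering may differ.

```python
import json
from typing import Optional, Union

# The two shapes: a legacy flat list, or the real dict keyed by facet name.
AxisBTags = Union[list[str], dict[str, list[str]]]


def flatten_axis_b_tags(raw: Optional[AxisBTags]) -> list[str]:
    """Python analog of flattenAxisBTags(raw): return one flat list of tag strings."""
    if raw is None:
        return []
    if isinstance(raw, list):
        return [t for t in raw if isinstance(t, str)]
    if isinstance(raw, dict):
        flat: list[str] = []
        for values in raw.values():  # json.loads keeps the JSON key order
            if isinstance(values, list):
                flat.extend(t for t in values if isinstance(t, str))
        return flat
    raise TypeError(f"unexpected axis_b_tags shape: {type(raw).__name__}")


row = json.loads('{"unit_kind": ["kind:law_unit"], "section_type": ["sectype:article"], '
                 '"legal_document": ["doc:DIEU-35"]}')
print(flatten_axis_b_tags(row))
# ['kind:law_unit', 'sectype:article', 'doc:DIEU-35']
```

Raising on an unknown shape, rather than returning an empty list, keeps a third shape from being silently treated as "no tags".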
Step 4 — Axis C (hierarchy)
Max depth is 2 (the S4-P1-N children). The parent_id chain is intact and the depth column is accurate.
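The two Axis C claims reduce to one invariant: walking parent_id up to a root must take exactly `depth` steps, with no dangling parent and no cycle. A sketch of that check, assuming rows are loaded into a dict keyed by id; check_hierarchy and the row shape are hypothetical, and the pilot's actual query is not shown.

```python
from typing import Optional


def check_hierarchy(rows: dict[str, tuple[Optional[str], int]]) -> list[str]:
    """Return problems found; an empty list means parent chain and depth column agree.

    rows maps iu_id -> (parent_id, stored_depth).
    """
    problems: list[str] = []
    for iu_id, (parent_id, stored_depth) in rows.items():
        depth, cur, seen = 0, parent_id, {iu_id}
        while cur is not None:
            if cur not in rows:
                problems.append(f"{iu_id}: dangling parent {cur}")
                break
            if cur in seen:
                problems.append(f"{iu_id}: cycle via {cur}")
                break
            seen.add(cur)
            depth += 1
            cur = rows[cur][0]
        else:  # reached a root without a break: compare walked depth with the column
            if depth != stored_depth:
                problems.append(f"{iu_id}: depth column {stored_depth}, chain says {depth}")
    return problems


s4 = "D38-DIEU35-S4"
rows = {s4: (None, 0)}
rows.update({f"{s4}-P{i}": (s4, 1) for i in range(1, 5)})
rows.update({f"{s4}-P1-{i}": (f"{s4}-P1", 2) for i in range(1, 4)})
print(check_hierarchy(rows))  # []
```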
Step 5 — Qdrant boundary check
The DIEU-35 IU set is intentionally not yet synced to Qdrant; DIEU-35 onboarding is a separate macro. The current iu_vector_sync_point set holds 61 points covering 60 unique units, all from the earlier pilot.iu0.test-* corpus. The per-IU boundary is preserved: the one extra point comes from KT-B, the only IU split into 2 chunks, and both chunks stay inside its boundary.
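With 61 points over 60 units and at least one point per unit, exactly one IU must contribute a second point. The sketch below models the boundary check under a loud assumption: that each point's payload records character offsets into its IU's text (the real payload fields are not shown in this note). boundary_summary, KT-A and KT-C are hypothetical; the data is synthetic, not the live pilot.iu0.test-* set.

```python
from collections import Counter


def boundary_summary(points: list[tuple[str, int, int]], iu_lengths: dict[str, int]):
    """Summarise Qdrant points against per-IU boundaries.

    points: (iu_id, start, end) character offsets of each chunk (assumed payload).
    iu_lengths: iu_id -> length of that IU's text.
    Returns (point_count, unique_iu_count, multi_chunk_ius, out_of_boundary_points).
    """
    per_iu = Counter(iu_id for iu_id, _, _ in points)
    out_of_boundary = [
        (iu_id, start, end)
        for iu_id, start, end in points
        if iu_id not in iu_lengths or not 0 <= start < end <= iu_lengths[iu_id]
    ]
    multi_chunk = sorted(iu for iu, n in per_iu.items() if n > 1)
    return len(points), len(per_iu), multi_chunk, out_of_boundary


# Synthetic data, not the live pilot.iu0.test-* set.
points = [("KT-A", 0, 120), ("KT-B", 0, 500), ("KT-B", 500, 830), ("KT-C", 0, 90)]
lengths = {"KT-A": 120, "KT-B": 830, "KT-C": 90}
print(boundary_summary(points, lengths))  # (4, 3, ['KT-B'], [])
```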
Step 6 — Five-layer boundary
Layers: PG → Directus → Qdrant → operator_runtime → text-as-code. This macro queried every layer read-only; no layer was mutated.
Rollback / disable
Nothing to roll back: all write gates remain inert, and no rows were inserted or deleted.
Recommended next macro: DIEU-35 Qdrant onboarding (5800x)
- Flip iu_core.vector_sync_enabled → true.
- Run cutter_agent/iu_core/vector_sync.py apply_iu_set for the 36 DIEU-35 IUs (per-IU boundary, bounded batches).
- Verify the Qdrant count grows from 61 to 61 + 36 = 97 (higher if any IU exceeds the per-IU token limit and is split into several chunks, e.g. +37, giving 98, for a single 2-chunk IU).
- Verify iu_vector_sync_point rows grow by the same amount, all with sync_status='indexed'.
- Re-flip vector_sync_enabled → false.
- Smoke search "DIEU 35 nguyên tắc" ("DIEU 35 principles") → the top 3 hits contain DIEU-35 IUs.
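The count checks in the list above reduce to simple arithmetic. A sketch, assuming one Qdrant point per chunk and that the statuses of the new iu_vector_sync_point rows can be fetched as a list; expected_qdrant_count and verify_onboarding are hypothetical names, and the per-IU chunk counts are only known after the run.

```python
def expected_qdrant_count(baseline: int, chunks_per_iu: list[int]) -> int:
    """Qdrant points after onboarding: one point per chunk, at least one chunk per IU."""
    if any(n < 1 for n in chunks_per_iu):
        raise ValueError("every IU must produce at least one chunk")
    return baseline + sum(chunks_per_iu)


def verify_onboarding(qdrant_before: int, qdrant_after: int,
                      new_sync_statuses: list[str], chunks_per_iu: list[int]) -> list[str]:
    """Return problems found after the onboarding run; an empty list means pass."""
    problems: list[str] = []
    expected = expected_qdrant_count(qdrant_before, chunks_per_iu)
    added = expected - qdrant_before
    if qdrant_after != expected:
        problems.append(f"Qdrant count {qdrant_after}, expected {expected}")
    if len(new_sync_statuses) != added:
        problems.append(f"{len(new_sync_statuses)} new sync rows, expected {added}")
    pending = sum(1 for s in new_sync_statuses if s != "indexed")
    if pending:
        problems.append(f"{pending} new sync rows not 'indexed'")
    return problems


print(expected_qdrant_count(61, [1] * 36))        # 97
print(expected_qdrant_count(61, [2] + [1] * 35))  # 98
print(verify_onboarding(61, 97, ["indexed"] * 36, [1] * 36))  # []
```

Deriving the expected total from the chunk counts, rather than hard-coding 97, covers the split-IU case without a separate rule.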