dot-iu-cutter v0.5 — Existing-Corpus Tier Normalization Plan (read-only assessment; write deferred) (2026-05-17)
Date: 2026-05-17 · Status: READ-ONLY ASSESSMENT + PLAN ONLY — no UPDATE, no write, no schema change. Parent: v0.5 pre-scale foundation.
1. Read-only assessment (grounded)
public.tac_logical_unit (86 rows, all draft/draft_only/canonical-address-v1):
| doc_code | rows | tier populated? |
|---|---|---|
| DIEU-28 | 27 | yes — root / section / unit |
| DIEU-32 | 23 | no — tier blank |
| DIEU-35 | 36 | no — tier blank |
59/86 rows have an empty tier (all of DIEU-32 + DIEU-35: 23 + 36). canonical_address is uniformly populated with the v1 grammar (D38-DIEU<NN>-ROOT / -S<n> / -S<n>-P<n>) and section_type is populated for all rows. So the missing tier is not lost information: it is derivable structurally from the already-correct canonical_address + section_type.
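
The per-document tally above can be reproduced from any read-only export of the table. The following is a minimal sketch, assuming the rows are available as dictionaries keyed by column name and that a "blank" tier may be either NULL or an empty/whitespace string. The function name `blank_tier_counts` is hypothetical and not part of dot-iu-cutter.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Optional, Tuple


def blank_tier_counts(
    rows: Iterable[Mapping[str, Optional[str]]],
) -> Dict[str, Tuple[int, int]]:
    """Return {doc_code: (total_rows, blank_tier_rows)} without modifying anything.

    Assumption: a tier counts as blank when it is None, empty, or whitespace only.
    """
    total: Counter = Counter()
    blank: Counter = Counter()
    for row in rows:
        doc = row["doc_code"]
        total[doc] += 1
        if not (row.get("tier") or "").strip():
            blank[doc] += 1
    return {doc: (total[doc], blank[doc]) for doc in sorted(total)}
```

A document whose two counts are equal is entirely untiered; a document whose blank count is zero can serve as a reference for checking a derivation rule.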
2. Why it matters for full-document scale
A full-document trial + SWEEP/lineage logic and any tier-aware query assume a consistent tier. Mixed populated/blank tier across co-resident documents → inconsistent query semantics and a latent correctness risk once the corpus grows. It must be assessed now and normalised before broad multi-document operations — but normalisation is a write, hence a separate cycle.
3. Proposed derivation rule (for the future write cycle — NOT executed)
Deterministic, idempotent, derivable purely from existing correct columns:
- canonical_address ends -ROOT → tier='root'
- canonical_address matches …-S<n> with no -P… and section_type='heading' (a section container) → tier='section'
- otherwise (leaf address …-S<n>-P<n> / …-S<n> non-heading leaf) → tier='unit'
Cross-checked against DIEU-28 (already tiered) as the oracle: the rule must reproduce DIEU-28's existing tiers exactly before it is trusted for DIEU-32/35. Any DIEU-28 mismatch ⇒ rule wrong ⇒ STOP, do not apply.
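
The rule can be sketched as a pure function for use in a dry-run. This is an illustration only, not the production implementation: the function name `derive_tier` and the regular expression are assumptions built from the v1 grammar quoted in §1, and rejecting an address that does not fit that grammar is an added fail-closed choice rather than something the plan specifies.

```python
import re

# Assumed v1 grammar from the assessment:
#   D38-DIEU<NN>-ROOT / D38-DIEU<NN>-S<n> / D38-DIEU<NN>-S<n>-P<n>
_ADDRESS = re.compile(r"D38-DIEU\d+-(?:ROOT|S\d+(?P<leaf>-P\d+)?)")


def derive_tier(canonical_address: str, section_type: str) -> str:
    """Derive tier from canonical_address + section_type (no I/O, no writes)."""
    match = _ADDRESS.fullmatch(canonical_address)
    if match is None:
        # Assumption: fail closed on anything outside the v1 grammar.
        raise ValueError(f"not a v1 canonical_address: {canonical_address!r}")
    if canonical_address.endswith("-ROOT"):
        return "root"
    has_leaf_suffix = match.group("leaf") is not None
    if not has_leaf_suffix and section_type == "heading":
        return "section"  # -S<n> heading: a section container
    return "unit"  # -S<n>-P<n>, or a non-heading -S<n> leaf
```

Because the function depends only on its two arguments, applying it repeatedly to the same row always yields the same tier, which is what makes the eventual write idempotent.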
4. Execution posture (deferred, separate authorization)
- Phase A (now, this doc): read-only assessment + derivation-rule design + oracle-validation plan. Done. No write.
- Phase B (separate, NOT authorized): a dedicated write cycle: dry-run the derivation on a restored copy, validate against the DIEU-28 oracle + manual spot review, then a command-review + sovereign prompt for the production UPDATE … SET tier=… (idempotent, append-safe, per-row, no other column touched, backup + verification). This is an ordinary data-quality UPDATE (not a cut, not append-only-ledger), so it needs its own explicit authorization and is out of scope here.
- Independent of the full-document trial scheduling (can run before it; should run before any cross-document tier-aware operation).
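
The Phase B dry-run and oracle gate could take roughly the following shape. This is a hedged sketch that writes nothing: it takes the §3 rule as a callable, checks it against the already-tiered oracle document, and only then returns the list of proposed per-row changes. The name `plan_tier_updates`, the row-as-dictionary representation, and the choice to fail when no oracle rows are present are all assumptions made for illustration.

```python
from typing import Callable, Iterable, List, Mapping, Optional, Tuple

Row = Mapping[str, Optional[str]]


def _is_blank(value: Optional[str]) -> bool:
    return not (value or "").strip()


def plan_tier_updates(
    rows: Iterable[Row],
    derive: Callable[[str, str], str],
    oracle_doc: str = "DIEU-28",
) -> List[Tuple[str, str]]:
    """Return [(canonical_address, derived_tier)] for blank-tier rows only.

    Raises RuntimeError (STOP, do not apply) if the rule fails to reproduce
    any existing tier in the oracle document.
    """
    rows = list(rows)
    oracle = [r for r in rows
              if r["doc_code"] == oracle_doc and not _is_blank(r["tier"])]
    if not oracle:
        # Assumption: an empty oracle means the gate cannot be evaluated.
        raise RuntimeError("no tiered oracle rows; gate cannot be evaluated")
    mismatches = [
        (r["canonical_address"], r["tier"])
        for r in oracle
        if derive(r["canonical_address"], r["section_type"]) != r["tier"]
    ]
    if mismatches:
        raise RuntimeError(f"oracle mismatch, rule rejected: {mismatches}")
    # Only blank rows are planned, so a re-run after applying yields [].
    return [
        (r["canonical_address"], derive(r["canonical_address"], r["section_type"]))
        for r in rows
        if _is_blank(r["tier"])
    ]
```

The returned plan is meant for manual spot review before any separately authorized write; already-tiered rows never appear in it, so no existing value would be overwritten.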
5. Open decisions for GPT
- OD-T1 Approve the §3 derivation rule (with DIEU-28 oracle gate) — or require manual curation instead of derivation.
- OD-T2 Sequence: normalise DIEU-32/35 before the volume dry-run, or treat tier-normalisation and the dry-run as independent tracks.
- OD-T3 Confirm normalisation is a separate write cycle (recommended) vs folded into another authorized change.
Boundaries / Git
Read-only assessment + plan only — no UPDATE/write/schema/commit. Git main · e93424b5ff7fa5e4b8406131977ce4339cd0856a · clean (0 lines). No hardcoding; SQL = SSOT; no vector/NoSQL. Next = GPT review; the write is a separate gated cycle.