KB-5FEA

dot-iu-cutter v0.5 — Existing-Corpus Tier Normalization Plan (read-only assessment; write deferred) (2026-05-17)

Revision 1
Tags: dot-iu-cutter, v0.5, pre-scale-foundation, tier-normalization, DIEU-32, DIEU-35, design-only, dieu44

Date: 2026-05-17 · Status: READ-ONLY ASSESSMENT + PLAN ONLY — no UPDATE, no write, no schema change. Parent: v0.5 pre-scale foundation.

1. Read-only assessment (grounded)

public.tac_logical_unit (86 rows, all draft/draft_only/canonical-address-v1):

| doc_code | rows | tier populated? |
|---|---|---|
| DIEU-28 | 27 | yes (root / section / unit) |
| DIEU-32 | 23 | no (tier blank) |
| DIEU-35 | 36 | no (tier blank) |

59/86 rows have an empty tier (all of DIEU-32 and DIEU-35). canonical_address is uniformly populated with the v1 grammar (D38-DIEU<NN>-ROOT / -S<n> / -S<n>-P<n>), and section_type is populated for all rows. tier is therefore structurally derivable from the already-correct canonical_address and section_type; the information is not lost.
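The "uniformly populated with the v1 grammar" claim can be re-checked mechanically. The sketch below is one possible reading of that grammar, not the project's own validator: it assumes D38 is a fixed literal prefix and that <NN> and <n> are plain decimal digits. The names ADDRESS_V1, address_shape and nonconforming are hypothetical.

```python
import re

# Assumed reading of the v1 grammar: D38-DIEU<NN>-ROOT, -S<n>, or -S<n>-P<n>.
# "D38" is treated as a literal prefix and <NN>/<n> as decimal digits (assumption).
ADDRESS_V1 = re.compile(r"D38-DIEU(\d+)-(ROOT|S(\d+)(?:-P(\d+))?)")


def address_shape(address: str) -> str:
    """Return 'ROOT', 'S' or 'S-P' for a v1 address; raise ValueError otherwise."""
    m = ADDRESS_V1.fullmatch(address)
    if m is None:
        raise ValueError(f"not a v1 canonical address: {address!r}")
    if m.group(2) == "ROOT":
        return "ROOT"
    # Group 4 is the <n> after -P; it is None for a bare -S<n> address.
    return "S-P" if m.group(4) is not None else "S"


def nonconforming(addresses):
    """List the addresses that do not fit the assumed v1 grammar."""
    bad = []
    for address in addresses:
        try:
            address_shape(address)
        except ValueError:
            bad.append(address)
    return bad
```

An empty result from nonconforming over all 86 canonical_address values would support the premise that tier can be derived from the address shape.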

2. Why it matters for full-document scale

A full-document trial plus SWEEP/lineage logic, and any tier-aware query, assume a consistent tier. Mixing populated and blank tier values across co-resident documents yields inconsistent query semantics and a latent correctness risk once the corpus grows. The gap must be assessed now and normalised before broad multi-document operations; normalisation is a write, however, so it belongs to a separate cycle.
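To make the inconsistency concrete, here is a small illustration on synthetic rows (not real corpus data): a filter on tier='unit' silently returns nothing for the untiered documents, even though they contain leaf rows. The function name units_per_doc is hypothetical.

```python
# Synthetic rows: (doc_code, canonical_address, tier). A blank tier is "".
rows = [
    ("DIEU-28", "D38-DIEU28-S1-P1", "unit"),
    ("DIEU-28", "D38-DIEU28-S1-P2", "unit"),
    ("DIEU-32", "D38-DIEU32-S1-P1", ""),
    ("DIEU-35", "D38-DIEU35-S1-P1", ""),
]


def units_per_doc(rows):
    """Count, per doc_code, the rows that a tier='unit' filter would return."""
    counts = {}
    for doc_code, _address, tier in rows:
        counts.setdefault(doc_code, 0)
        if tier == "unit":
            counts[doc_code] += 1
    return counts


print(units_per_doc(rows))
# → {'DIEU-28': 2, 'DIEU-32': 0, 'DIEU-35': 0}
```

The zero counts do not mean DIEU-32 and DIEU-35 have no units; they mean the filter cannot see them.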

3. Proposed derivation rule (for the future write cycle — NOT executed)

Deterministic, idempotent, derivable purely from existing correct columns:

  • canonical_address ends with -ROOT → tier='root'
  • canonical_address matches …-S<n> with no -P… and section_type='heading' (a section container) → tier='section'
  • otherwise (leaf address …-S<n>-P<n> / …-S<n> non-heading leaf) → tier='unit'

Cross-checked against DIEU-28 (already tiered) as the oracle: the rule must reproduce DIEU-28's existing tiers exactly before it is trusted for DIEU-32/35. Any DIEU-28 mismatch ⇒ rule wrong ⇒ STOP, do not apply.
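The three-branch rule and its oracle gate can be sketched as a pure function, assuming the rule exactly as stated in this section. The function names and the RuntimeError used for STOP are hypothetical, and section_type is only compared against 'heading', the one value named here.

```python
import re

# A bare section address ends in -S<n> with no -P<n> suffix.
SECTION_ADDRESS = re.compile(r"-S\d+\Z")


def derive_tier(canonical_address: str, section_type: str) -> str:
    """Apply the proposed rule: root, then section container, otherwise unit."""
    if canonical_address.endswith("-ROOT"):
        return "root"
    if SECTION_ADDRESS.search(canonical_address) and section_type == "heading":
        return "section"
    return "unit"


def oracle_mismatches(oracle_rows):
    """Compare derived tiers with existing ones.

    oracle_rows: iterable of (canonical_address, section_type, existing_tier),
    intended to be the already-tiered DIEU-28 rows.
    Returns a list of (canonical_address, existing_tier, derived_tier).
    """
    mismatches = []
    for address, section_type, existing_tier in oracle_rows:
        derived = derive_tier(address, section_type)
        if derived != existing_tier:
            mismatches.append((address, existing_tier, derived))
    return mismatches


def oracle_gate(oracle_rows):
    """Raise if the rule fails to reproduce the oracle tiers (STOP, do not apply)."""
    mismatches = oracle_mismatches(oracle_rows)
    if mismatches:
        raise RuntimeError(f"rule disagrees with oracle on {len(mismatches)} row(s)")
```

Because derive_tier reads only canonical_address and section_type, running it twice on the same row gives the same answer, which is what makes the rule deterministic and idempotent.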

4. Execution posture (deferred, separate authorization)

  • Phase A (now, this doc): read-only assessment + derivation-rule design + oracle-validation plan. Done. No write.
  • Phase B (separate, NOT authorized): a dedicated write cycle — dry-run the derivation on a restored copy, validate against the DIEU-28 oracle + manual spot review, then a command-review + sovereign prompt for the production UPDATE … SET tier=… (idempotent, append-safe, per-row, no other column touched, backup + verification). This is an ordinary data-quality UPDATE (not a cut, not append-only-ledger), so it needs its own explicit authorization and is out of scope here.
  • Scheduling: Phase B is independent of the full-document trial (it can run before the trial, and it should run before any cross-document tier-aware operation).
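A Phase B dry-run could look roughly like the sketch below. It is not authorized or executed here, and it makes several assumptions: an in-memory SQLite table stands in for the restored copy, the id column and the section_type value 'other' are hypothetical, and the rows are synthetic. It fills blank tiers only, one row at a time, checks that a second pass changes nothing, confirms no other column moved, and then rolls back.

```python
import re
import sqlite3


def derive_tier(canonical_address, section_type):
    """Proposed rule: -ROOT is root, heading at -S<n> is section, else unit."""
    if canonical_address.endswith("-ROOT"):
        return "root"
    if re.search(r"-S\d+\Z", canonical_address) and section_type == "heading":
        return "section"
    return "unit"


def build_copy():
    """Stand-in for a restored copy: in-memory table with synthetic rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tac_logical_unit (id INTEGER PRIMARY KEY, doc_code TEXT,"
        " canonical_address TEXT, section_type TEXT, tier TEXT)"
    )
    conn.executemany(
        "INSERT INTO tac_logical_unit (doc_code, canonical_address, section_type, tier)"
        " VALUES (?, ?, ?, ?)",
        [
            ("DIEU-28", "D38-DIEU28-ROOT", "heading", "root"),
            ("DIEU-28", "D38-DIEU28-S1", "heading", "section"),
            ("DIEU-32", "D38-DIEU32-ROOT", "heading", ""),
            ("DIEU-32", "D38-DIEU32-S1-P1", "other", ""),
            ("DIEU-35", "D38-DIEU35-S1", "heading", None),
        ],
    )
    conn.commit()
    return conn


def fill_blank_tiers(conn):
    """Per-row UPDATE of blank tiers only; returns the number of rows changed."""
    blank = conn.execute(
        "SELECT id, canonical_address, section_type FROM tac_logical_unit"
        " WHERE tier IS NULL OR tier = ''"
    ).fetchall()
    changed = 0
    for row_id, address, section_type in blank:
        cur = conn.execute(
            "UPDATE tac_logical_unit SET tier = ?"
            " WHERE id = ? AND (tier IS NULL OR tier = '')",
            (derive_tier(address, section_type), row_id),
        )
        changed += cur.rowcount
    return changed


def other_columns(conn):
    """Snapshot of every column except tier, for the no-other-column check."""
    return conn.execute(
        "SELECT id, doc_code, canonical_address, section_type"
        " FROM tac_logical_unit ORDER BY id"
    ).fetchall()


def dry_run(conn):
    """Run the fill twice, verify, then roll back so nothing persists."""
    before = other_columns(conn)
    first = fill_blank_tiers(conn)
    second = fill_blank_tiers(conn)  # idempotence: expected to change 0 rows
    untouched = other_columns(conn) == before
    conn.rollback()
    return first, second, untouched
```

On the synthetic copy, dry_run(build_copy()) returns (3, 0, True): three blank rows filled on the first pass, none on the second, and no other column changed. The WHERE guard on blank tier is what keeps already-tiered rows, such as DIEU-28, out of reach of the UPDATE.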

5. Open decisions for GPT

  • OD-T1 Approve the §3 derivation rule (with DIEU-28 oracle gate) — or require manual curation instead of derivation.
  • OD-T2 Sequence: normalise DIEU-32/35 before the volume dry-run, or treat tier-normalisation and the dry-run as independent tracks.
  • OD-T3 Confirm normalisation is a separate write cycle (recommended) vs folded into another authorized change.

Boundaries / Git

Read-only assessment + plan only — no UPDATE/write/schema/commit. Git main · e93424b5ff7fa5e4b8406131977ce4339cd0856a · clean (0 lines). No hardcoding; SQL = SSOT; no vector/NoSQL. Next = GPT review; the write is a separate gated cycle.

knowledge/dev/laws/dieu44-trien-khai/v0.5-pre-scale-foundation-design/dot-iu-cutter-v0.5-existing-corpus-tier-normalization-plan-2026-05-17.md