KB-1A5D

IU Core 120x Three-Axis — 02 The three-axis metadata model (migration 014)

Revision 1
Tags: dieu44, iu-core, mvp120x, three-axis, metadata, migration-014, v0.6, 2026-05-22

Goal

Give every Information Unit (IU) a metadata envelope rich enough to filter, group and thread the IU corpus along three axes (A: original-text reconstruction, B: semantic content, C: hierarchy) without creating a second source of truth.

Migration 014 — objects built (additive DDL, one BEGIN…COMMIT)

| Object | Axis | Role |
| --- | --- | --- |
| iu_metadata_tag_registry (table) | B | classifier-vocabulary registry; every tag value must be registered (FK-enforced) |
| iu_metadata_tag (table) | B | IU↔tag association; enrichment_source ∈ {derived, confirmed, inferred, proposed}; confidence in [0, 1] |
| v_iu_source_outline (view) | A | per-IU document-order outline; sort_order_step <> 1 flags a missing or duplicated source segment |
| v_iu_content_group (view) | B | IU↔tag pairs joined with the registry's kind and label |
| v_iu_metadata_envelope (view) | A+B+C | the unified per-IU three-axis envelope |
| fn_iu_reconstruct_source(text) | A | given a doc_code, returns its IUs in original linear order, plus gap_before |
| fn_iu_subtree(uuid) | C | given an IU, returns it and every descendant from iu_tree_path |
| fn_iu_metadata_refresh(uuid[]) | B | idempotently (re)derives classifier tags from authoritative columns |
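
Because Axis A stores nothing, its outline can be recomputed from any snapshot of the rows. The following minimal Python sketch shows that computation, assuming an integer sort_order that advances by exactly 1 between consecutive segments of the same doc_code. The IuRow record is hypothetical, and reading gap_before as "the step from the previous IU is not 1" (with no gap reported before a document's first IU) is an assumption about fn_iu_reconstruct_source, not its actual SQL.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class IuRow:
    # Hypothetical minimal projection of an information_unit row.
    iu_id: str
    doc_code: str
    sort_order: int


def source_outline(rows: Iterable[IuRow]) -> List[Tuple[IuRow, Optional[int]]]:
    """Order IUs per document and attach sort_order_step.

    Equivalent in spirit to
    sort_order - LAG(sort_order) OVER (PARTITION BY doc_code ORDER BY sort_order).
    The step is None for a document's first IU, 0 for a duplicate, > 1 for a gap.
    """
    ordered = sorted(rows, key=lambda r: (r.doc_code, r.sort_order))
    outline = []
    prev = None
    for row in ordered:
        if prev is not None and prev.doc_code == row.doc_code:
            step = row.sort_order - prev.sort_order
        else:
            step = None  # first IU of its document: nothing to compare with
        outline.append((row, step))
        prev = row
    return outline


def reconstruct_source(rows: Iterable[IuRow], doc_code: str) -> List[Tuple[str, bool]]:
    """Return (iu_id, gap_before) for one doc_code in original linear order."""
    same_doc = (r for r in rows if r.doc_code == doc_code)
    return [
        (row.iu_id, step is not None and step != 1)
        for row, step in source_outline(same_doc)
    ]
```

For rows with sort_order 1, 2 and 4 under one doc_code, reconstruct_source returns the three IUs in that order with gap_before set only on the third; a repeated sort_order yields a step of 0 and is flagged the same way.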

No second source of truth (gate #9)

The decisive design constraint: migration 014 adds no column to information_unit (verified by test_no_duplicate_source_of_truth).
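
One way a check like test_no_duplicate_source_of_truth could work is to snapshot the authoritative table's columns, apply the migration, and compare. The sketch below uses an in-memory SQLite database and PRAGMA table_info as a stand-in for querying information_schema.columns in PostgreSQL; the table definitions and the adds_no_column helper are hypothetical illustrations, not the real test.

```python
import sqlite3
from typing import List


def column_names(conn: sqlite3.Connection, table: str) -> List[str]:
    # PRAGMA table_info yields one row per column; field 1 is the column name.
    # The table name is interpolated, so pass only trusted identifiers.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def adds_no_column(conn: sqlite3.Connection, table: str, migration_sql: str) -> bool:
    """Apply migration_sql and report whether `table` kept exactly its columns."""
    before = column_names(conn, table)
    conn.executescript(migration_sql)
    return column_names(conn, table) == before


# Hypothetical stand-in for the authoritative table.
BASE = """CREATE TABLE information_unit (
    id TEXT PRIMARY KEY, doc_code TEXT, unit_kind TEXT,
    section_type TEXT, sort_order INTEGER);"""

# Additive: only new objects, pointing at information_unit through an FK.
ADDITIVE = """CREATE TABLE iu_metadata_tag (
    iu_id TEXT REFERENCES information_unit(id), tag_key TEXT);"""

# What gate #9 forbids: a second copy of semantic data on the authoritative row.
INVASIVE = "ALTER TABLE information_unit ADD COLUMN topic TEXT;"
```

Applied in order to a database created from BASE, the additive script passes the check and the ALTER TABLE script fails it.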

  • Axis A is a pure projection. v_iu_source_outline and fn_iu_reconstruct_source compute order from the existing doc_code/sort_order/section_* columns — they store nothing, so they cannot drift from the authoritative row.
  • Axis C reads the existing iu_tree_path, v_iu_tree, iu_relation and iu_structure_operation objects, which are already the source of truth for hierarchy.
  • Axis B is the only new stored data. Semantic tags existed nowhere before, so iu_metadata_tag competes with nothing. It is FK-guarded against information_unit(id) and iu_metadata_tag_registry(tag_key) — no orphan tag, no un-registered vocabulary.
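
The FK guard on Axis B can be exercised directly. This sketch recreates the two Axis B tables in SQLite with partly assumed column names (iu_id, tag_kind and label are illustrative; tag_key, enrichment_source and confidence come from the text) and shows which inserts the constraints reject. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, whereas a server such as PostgreSQL enforces them by default.

```python
import sqlite3

SCHEMA = """
CREATE TABLE information_unit (id TEXT PRIMARY KEY);
CREATE TABLE iu_metadata_tag_registry (
    tag_key  TEXT PRIMARY KEY,
    tag_kind TEXT NOT NULL,
    label    TEXT
);
CREATE TABLE iu_metadata_tag (
    iu_id   TEXT NOT NULL REFERENCES information_unit(id),        -- no orphan tag
    tag_key TEXT NOT NULL REFERENCES iu_metadata_tag_registry(tag_key),  -- no unregistered vocabulary
    enrichment_source TEXT NOT NULL
        CHECK (enrichment_source IN ('derived', 'confirmed', 'inferred', 'proposed')),
    confidence REAL NOT NULL CHECK (confidence BETWEEN 0 AND 1),
    PRIMARY KEY (iu_id, tag_key)
);
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(SCHEMA)
    return conn


def try_tag(conn: sqlite3.Connection, iu_id: str, tag_key: str,
            source: str = "derived", confidence: float = 1.0) -> bool:
    """Insert one tag; return False if an FK or CHECK constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO iu_metadata_tag VALUES (?, ?, ?, ?)",
                (iu_id, tag_key, source, confidence),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With 'iu-1' and 'doc:LAW-A' present, tagging iu-1 with doc:LAW-A succeeds, while an unregistered tag key, an unknown IU, a confidence of 1.5 or an unlisted enrichment_source are all refused by the database itself.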

No hardcoded vocabulary (gate #8)

fn_iu_metadata_refresh registers tag vocabulary by discovery: INSERT … SELECT DISTINCT doc_code / unit_kind / section_type FROM information_unit, so the registry rows come from live data and never from baked-in literals. Two kinds of literal do remain, and each is classified explicitly: the tag_kind set (legal_document/unit_kind/section_type/topic/subject/legal_domain) is domain vocabulary, and the 'doc:'/'kind:'/'sectype:' tag-key prefixes are namespacing conventions. test_no_hardcoded_secrets_or_ids confirms that there are zero UUID or secret literals.
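
Discovery-based registration plus idempotent derivation can be sketched with SQLite's INSERT OR IGNORE, which plays the role an ON CONFLICT DO NOTHING clause would play in PostgreSQL. The pairing of each authoritative column with a tag_kind and key prefix follows the text above, but metadata_refresh, open_db, counts and the table layout are hypothetical; the real fn_iu_metadata_refresh also takes a uuid[] argument that this sketch omits.

```python
import sqlite3
from typing import Tuple

SCHEMA = """
CREATE TABLE information_unit (
    id TEXT PRIMARY KEY, doc_code TEXT, unit_kind TEXT, section_type TEXT);
CREATE TABLE iu_metadata_tag_registry (
    tag_key TEXT PRIMARY KEY, tag_kind TEXT NOT NULL, label TEXT);
CREATE TABLE iu_metadata_tag (
    iu_id TEXT NOT NULL REFERENCES information_unit(id),
    tag_key TEXT NOT NULL REFERENCES iu_metadata_tag_registry(tag_key),
    enrichment_source TEXT NOT NULL,
    confidence REAL NOT NULL,
    PRIMARY KEY (iu_id, tag_key));
"""

# (authoritative column, tag_kind, tag-key prefix). Only namespacing and kind
# literals live here; the tag values themselves are discovered from the rows.
AXES = [
    ("doc_code", "legal_document", "doc:"),
    ("unit_kind", "unit_kind", "kind:"),
    ("section_type", "section_type", "sectype:"),
]


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def metadata_refresh(conn: sqlite3.Connection) -> None:
    """Register vocabulary from live data, then derive tags; safe to re-run."""
    with conn:
        for column, kind, prefix in AXES:  # column names come from AXES only
            conn.execute(
                "INSERT OR IGNORE INTO iu_metadata_tag_registry (tag_key, tag_kind, label) "
                f"SELECT DISTINCT ? || {column}, ?, {column} FROM information_unit "
                f"WHERE {column} IS NOT NULL",
                (prefix, kind),
            )
            conn.execute(
                "INSERT OR IGNORE INTO iu_metadata_tag "
                f"SELECT id, ? || {column}, 'derived', 1.0 FROM information_unit "
                f"WHERE {column} IS NOT NULL",
                (prefix,),
            )


def counts(conn: sqlite3.Connection) -> Tuple[int, int]:
    """Return (registry rows, tag rows)."""
    reg = conn.execute("SELECT COUNT(*) FROM iu_metadata_tag_registry").fetchone()[0]
    tags = conn.execute("SELECT COUNT(*) FROM iu_metadata_tag").fetchone()[0]
    return reg, tags
```

Running metadata_refresh a second time leaves both counts unchanged, which is what makes the refresh safe to re-run; a NULL section_type simply produces no sectype: tag for that IU.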

Confirmed vs inferred metadata

enrichment_source separates trusted from machine-suggested:

  • derived: computed from an authoritative information_unit column (confidence 1.000);
  • confirmed: human-ratified;
  • inferred / proposed: machine-suggested and NOT authoritative.

This macro populates only derived tags. The inferred lane for topic/subject tags is registered as a capability (the tag_kind vocabulary and the enrichment_source values exist) but is deliberately left empty, so no semantic content is fabricated. Populating it (e.g. from an NLP or vector pass) is left to a later macro.
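
A consumer that must act only on authoritative metadata can split tags on enrichment_source. The Python sketch below assumes a hypothetical in-memory Tag record mirroring an iu_metadata_tag row, treats derived and confirmed as trusted, rejects unknown sources instead of guessing, and orders suggestions by confidence for review. The topic:/subject: keys in the usage are made up for illustration; this macro writes no such tags.

```python
from typing import Iterable, List, NamedTuple, Tuple

TRUSTED = frozenset({"derived", "confirmed"})
MACHINE_SUGGESTED = frozenset({"inferred", "proposed"})


class Tag(NamedTuple):
    # Hypothetical in-memory mirror of one iu_metadata_tag row.
    iu_id: str
    tag_key: str
    enrichment_source: str
    confidence: float


def split_by_trust(tags: Iterable[Tag]) -> Tuple[List[Tag], List[Tag]]:
    """Return (trusted, suggested); unknown sources raise instead of being guessed."""
    trusted: List[Tag] = []
    suggested: List[Tag] = []
    for tag in tags:
        if not 0.0 <= tag.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {tag}")
        if tag.enrichment_source in TRUSTED:
            trusted.append(tag)
        elif tag.enrichment_source in MACHINE_SUGGESTED:
            suggested.append(tag)
        else:
            raise ValueError(f"unknown enrichment_source: {tag.enrichment_source!r}")
    # Strongest suggestions first, so a reviewer sees likely hits early.
    suggested.sort(key=lambda t: t.confidence, reverse=True)
    return trusted, suggested
```

On this macro's data every tag is derived, so the suggested list is always empty until the inferred lane is populated.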

Reversibility

rollback/014_three_axis_metadata.rollback.sql drops every view, function and table that migration 014 created. The information_unit table is untouched, so the rollback cannot lose authoritative data. iu_metadata_tag can also be truncated at any time: its derived tags are rebuilt from information_unit by fn_iu_metadata_refresh.
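
Both reversibility claims can be checked mechanically. The sketch below, again on in-memory SQLite, derives only the kind: axis through a reduced, hypothetical stand-in for fn_iu_metadata_refresh, empties the tag table (SQLite has no TRUNCATE, so DELETE FROM takes its place), rebuilds it, and then runs an abbreviated rollback script that drops the view before the tables. The real rollback also drops the functions, which SQLite cannot express.

```python
import sqlite3
from typing import List, Tuple

MIGRATION = """
CREATE TABLE iu_metadata_tag_registry (
    tag_key TEXT PRIMARY KEY, tag_kind TEXT NOT NULL, label TEXT);
CREATE TABLE iu_metadata_tag (
    iu_id TEXT NOT NULL REFERENCES information_unit(id),
    tag_key TEXT NOT NULL REFERENCES iu_metadata_tag_registry(tag_key),
    enrichment_source TEXT NOT NULL, confidence REAL NOT NULL,
    PRIMARY KEY (iu_id, tag_key));
CREATE VIEW v_iu_content_group AS
    SELECT t.iu_id, t.tag_key, r.tag_kind, r.label
    FROM iu_metadata_tag AS t JOIN iu_metadata_tag_registry AS r USING (tag_key);
"""

# Dependents first: the view, then the FK-holding tag table, then the registry.
# information_unit is never named here, so its rows cannot be touched.
ROLLBACK = """
DROP VIEW IF EXISTS v_iu_content_group;
DROP TABLE IF EXISTS iu_metadata_tag;
DROP TABLE IF EXISTS iu_metadata_tag_registry;
"""


def refresh_kind_tags(conn: sqlite3.Connection) -> None:
    # Hypothetical, reduced stand-in for fn_iu_metadata_refresh: 'kind:' axis only.
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO iu_metadata_tag_registry "
            "SELECT DISTINCT 'kind:' || unit_kind, 'unit_kind', unit_kind "
            "FROM information_unit WHERE unit_kind IS NOT NULL")
        conn.execute(
            "INSERT OR IGNORE INTO iu_metadata_tag "
            "SELECT id, 'kind:' || unit_kind, 'derived', 1.0 "
            "FROM information_unit WHERE unit_kind IS NOT NULL")


def truncate_tags(conn: sqlite3.Connection) -> None:
    with conn:
        conn.execute("DELETE FROM iu_metadata_tag")


def tag_snapshot(conn: sqlite3.Connection) -> List[Tuple]:
    return sorted(conn.execute(
        "SELECT iu_id, tag_key, enrichment_source, confidence FROM iu_metadata_tag"))


def units_snapshot(conn: sqlite3.Connection) -> List[Tuple]:
    return sorted(conn.execute("SELECT * FROM information_unit"))
```

Snapshotting before truncate_tags and comparing after refresh_kind_tags shows the derived tags come back identical, and running ROLLBACK afterwards leaves information_unit exactly as it was.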

Knowledge Hub path: knowledge/dev/laws/dieu44-trien-khai/v0.6-iu-core-120x-three-axis-metadata-delivery-autocut-textcode-open-goal/02-three-axis-metadata-model.md