IU Core 120x Three-Axis — 02 The three-axis metadata model (migration 014)
Goal
Give every Information Unit a metadata envelope rich enough to filter / group / thread the IU corpus along three axes — original-text reconstruction, semantic content, and hierarchy — without creating any second source of truth.
Migration 014 — objects built (additive DDL, one BEGIN…COMMIT)
| Object | Axis | Role |
|---|---|---|
| iu_metadata_tag_registry (table) | B | classifier-vocabulary registry; every tag value must be registered (FK-enforced) |
| iu_metadata_tag (table) | B | IU↔tag association; enrichment_source ∈ derived/confirmed/inferred/proposed; confidence 0–1 |
| v_iu_source_outline (view) | A | per-IU document-order outline; sort_order_step <> 1 flags a missing/duplicate source segment |
| v_iu_content_group (view) | B | IU↔tag joined with the registry's kind/label |
| v_iu_metadata_envelope (view) | A+B+C | the unified per-IU three-axis envelope |
| fn_iu_reconstruct_source(text) | A | given a doc_code, return its IUs in original linear order + gap_before |
| fn_iu_subtree(uuid) | C | given an IU, return it + every descendant from iu_tree_path |
| fn_iu_metadata_refresh(uuid[]) | B | idempotently (re)derive classifier tags from authoritative columns |
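
The sort_order_step check listed for v_iu_source_outline can be sketched as a window query. This is a minimal, runnable stand-in built on Python's sqlite3 module, not the migration's own DDL. It assumes that sort_order is an integer, that the step is the difference from the previous row within the same doc_code, and that the first row of a document has no step (NULL). The helper names outline_with_steps and flagged are hypothetical, and window functions need SQLite 3.25 or newer.

```python
import sqlite3


def outline_with_steps(rows):
    """rows: iterable of (id, doc_code, sort_order).

    Returns (id, doc_code, sort_order, sort_order_step) in document order.
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE information_unit ("
        "id TEXT PRIMARY KEY, doc_code TEXT, sort_order INTEGER)"
    )
    con.executemany("INSERT INTO information_unit VALUES (?, ?, ?)", rows)
    result = con.execute(
        """
        SELECT id, doc_code, sort_order,
               -- assumed definition: distance to the previous IU of the same document
               sort_order - LAG(sort_order) OVER (
                   PARTITION BY doc_code ORDER BY sort_order, id
               ) AS sort_order_step
        FROM information_unit
        ORDER BY doc_code, sort_order, id
        """
    ).fetchall()
    con.close()
    return result


def flagged(rows):
    """Ids whose step is present and differs from 1."""
    return [r[0] for r in outline_with_steps(rows) if r[3] is not None and r[3] != 1]
```

Under these assumptions a step greater than 1 points at a missing segment and a step of 0 at a duplicate. Nothing is stored, which matches the projection-only role of Axis A.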
No second source of truth (gate #9)
The decisive design constraint: migration 014 adds no column to
information_unit (verified by test_no_duplicate_source_of_truth).
- Axis A is a pure projection. v_iu_source_outline and fn_iu_reconstruct_source compute order from the existing doc_code / sort_order / section_* columns; they store nothing, so they cannot drift from the authoritative row.
- Axis C reads the existing iu_tree_path / v_iu_tree / iu_relation / iu_structure_operation, which are already the source of truth for hierarchy.
- Axis B is the only new stored data. Semantic tags existed nowhere before, so iu_metadata_tag competes with nothing. It is FK-guarded against information_unit(id) and iu_metadata_tag_registry(tag_key): no orphan tag, no unregistered vocabulary.
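
The two foreign keys on iu_metadata_tag can be illustrated with a small SQLite model. This is a sketch under stated assumptions, not the migration's schema: the column names iu_id and tag_kind, the composite primary key, and the CHECK constraints are hypothetical, and only the two FK targets and the enrichment_source values come from the text. The helpers make_db and try_tag are illustrative names.

```python
import sqlite3


def make_db():
    con = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when enabled per connection.
    con.execute("PRAGMA foreign_keys = ON")
    con.executescript(
        """
        CREATE TABLE information_unit (id TEXT PRIMARY KEY);
        CREATE TABLE iu_metadata_tag_registry (
            tag_key  TEXT PRIMARY KEY,
            tag_kind TEXT NOT NULL
        );
        CREATE TABLE iu_metadata_tag (
            iu_id   TEXT NOT NULL REFERENCES information_unit(id),
            tag_key TEXT NOT NULL REFERENCES iu_metadata_tag_registry(tag_key),
            enrichment_source TEXT NOT NULL
                CHECK (enrichment_source IN
                       ('derived', 'confirmed', 'inferred', 'proposed')),
            confidence REAL NOT NULL CHECK (confidence BETWEEN 0 AND 1),
            PRIMARY KEY (iu_id, tag_key)  -- assumed uniqueness rule
        );
        """
    )
    return con


def try_tag(con, iu_id, tag_key, source="derived", confidence=1.0):
    """Attempt to attach a tag; return False if a constraint rejects it."""
    try:
        with con:  # commits on success, rolls back on error
            con.execute(
                "INSERT INTO iu_metadata_tag VALUES (?, ?, ?, ?)",
                (iu_id, tag_key, source, confidence),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this shape, a tag pointing at a missing IU or at an unregistered tag_key is rejected by the database itself rather than by application code.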
No hardcoded vocabulary (gate #8)
fn_iu_metadata_refresh registers tag vocabulary by discovery:
INSERT … SELECT DISTINCT doc_code / unit_kind / section_type FROM information_unit — the registry rows come from live data, never from baked
literals. The tag_kind set (legal_document/unit_kind/section_type/
topic/subject/legal_domain) is classified as domain vocabulary, and the
'doc:'/'kind:'/'sectype:' tag-key prefixes are namespacing conventions; neither counts as a hardcoded value.
test_no_hardcoded_secrets_or_ids confirms 0 uuid/secret literals.
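
Registration by discovery might look like the following SQLite sketch. It is illustrative only: the pairing of each column with a prefix and a tag_kind is inferred from the names above, INSERT OR IGNORE stands in for whatever conflict handling the real function uses, and make_corpus and discover_registry are hypothetical helpers.

```python
import sqlite3

# (source column, tag-key prefix, tag_kind); pairing assumed from the names in the text
DISCOVERY_LANES = [
    ("doc_code", "doc:", "legal_document"),
    ("unit_kind", "kind:", "unit_kind"),
    ("section_type", "sectype:", "section_type"),
]


def make_corpus(rows):
    """rows: iterable of (id, doc_code, unit_kind, section_type)."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE information_unit (
            id TEXT PRIMARY KEY, doc_code TEXT, unit_kind TEXT, section_type TEXT
        );
        CREATE TABLE iu_metadata_tag_registry (
            tag_key TEXT PRIMARY KEY, tag_kind TEXT NOT NULL
        );
        """
    )
    con.executemany("INSERT INTO information_unit VALUES (?, ?, ?, ?)", rows)
    con.commit()
    return con


def discover_registry(con):
    """Register every distinct live value; safe to run repeatedly."""
    for column, prefix, tag_kind in DISCOVERY_LANES:
        # column comes from the fixed list above, never from user input
        con.execute(
            f"INSERT OR IGNORE INTO iu_metadata_tag_registry (tag_key, tag_kind) "
            f"SELECT DISTINCT ? || {column}, ? "
            f"FROM information_unit WHERE {column} IS NOT NULL",
            (prefix, tag_kind),
        )
    con.commit()
    return [
        r[0]
        for r in con.execute(
            "SELECT tag_key FROM iu_metadata_tag_registry ORDER BY tag_key"
        )
    ]
```

The only literals in the sketch are the prefixes and kind names; every registered value itself is read from information_unit.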
Confirmed vs inferred metadata
enrichment_source separates trusted from machine-suggested:
- derived: computed from an authoritative information_unit column (this macro's population; confidence 1.000)
- confirmed: human-ratified
- inferred / proposed: machine-suggested, NOT authoritative
This macro populates only derived tags. The topic/subject inferred lane
is registered as a capability (the tag_kind vocabulary + the
enrichment_source values exist) but is left empty — no semantic content is
fabricated. Populating it (e.g. from an NLP/vector pass) is a later macro.
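
A consumer that wants to keep the two lanes apart could partition tags on enrichment_source, as in this minimal sketch. The function name split_lanes and the (tag_key, enrichment_source, confidence) tuple layout are assumptions made for illustration; the four source values and their grouping come from the text above.

```python
TRUSTED = frozenset({"derived", "confirmed"})
SUGGESTED = frozenset({"inferred", "proposed"})


def split_lanes(tags):
    """tags: iterable of (tag_key, enrichment_source, confidence).

    Returns (trusted_keys, suggested_keys), preserving input order.
    """
    trusted, suggested = [], []
    for tag_key, source, _confidence in tags:
        if source in TRUSTED:
            trusted.append(tag_key)
        elif source in SUGGESTED:
            suggested.append(tag_key)
        else:
            # Unknown values are rejected rather than silently trusted.
            raise ValueError(f"unknown enrichment_source: {source!r}")
    return trusted, suggested
```

Because this macro writes only derived tags, the suggested lane would come back empty for the current corpus.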
Reversibility
rollback/014_three_axis_metadata.rollback.sql drops every view, function and
table. information_unit is untouched, so the rollback cannot lose
authoritative data. iu_metadata_tag is truncatable — the derived tags rebuild
from information_unit at any time via fn_iu_metadata_refresh.
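
The truncate-and-rebuild property can be checked with a small SQLite model. This is a sketch under assumptions, not the real fn_iu_metadata_refresh: it derives a single lane (unit_kind), omits the registry for brevity, uses DELETE because SQLite has no TRUNCATE, and the names make_db, refresh_derived and snapshot are hypothetical.

```python
import sqlite3


def make_db(units):
    """units: iterable of (id, unit_kind)."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE information_unit (id TEXT PRIMARY KEY, unit_kind TEXT);
        CREATE TABLE iu_metadata_tag (
            iu_id TEXT NOT NULL,
            tag_key TEXT NOT NULL,
            enrichment_source TEXT NOT NULL,
            confidence REAL NOT NULL,
            PRIMARY KEY (iu_id, tag_key)
        );
        """
    )
    con.executemany("INSERT INTO information_unit VALUES (?, ?)", units)
    con.commit()
    return con


def refresh_derived(con):
    """(Re)derive tags from the authoritative column; repeat runs add nothing."""
    con.execute(
        """
        INSERT OR IGNORE INTO iu_metadata_tag
            (iu_id, tag_key, enrichment_source, confidence)
        SELECT id, 'kind:' || unit_kind, 'derived', 1.0
        FROM information_unit
        WHERE unit_kind IS NOT NULL
        """
    )
    con.commit()


def snapshot(con):
    return con.execute(
        "SELECT iu_id, tag_key, enrichment_source, confidence "
        "FROM iu_metadata_tag ORDER BY iu_id, tag_key"
    ).fetchall()
```

Emptying the tag table and calling refresh_derived again yields the same snapshot as before, which is the sense in which derived tags are disposable.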