dot-iu-cutter v0.1 — P0-1 canonical_address Migration Design
Date: 2026-05-15
Status: P0 MIGRATION DESIGN (Item 2 of 6)
Scope: DESIGN ONLY. No DDL, no SQL, no CREATE/ALTER TABLE, no column DDL, no migration execution, no PG mutation, no backfill executed.
Master: migration-design/dot-iu-cutter-v0.1-p0-migration-design-master-2026-05-15.md
1. Purpose
P0-1 introduces canonical_address as a stable, human-readable, first-class identity field on tac_logical_unit (existing table) so that every IU has an authoritative address (e.g. "Đ44 §5.3.1") used for:
- Manifest per-unit-block identity reference (P0-2).
- Round-trip verification ordering (P0-4, D1 §4.7).
- Citation discipline (D11 §4.13 consumer contract).
- Cross-document reference resolution (D6 §4.2).
- Đ0-G birth-gate authority distinction (enacted/draft/runtime).
2. Source Design References
- D6 Assembly Axes & Metadata Contract — §4.2 (axis-1 metadata), §4.8 (vocabulary discipline), §6 (schema gap item 1).
- D7 UOSL Compatibility Note — §4.3 (G1 identity hint), §6 (schema gap item 1).
- D8 §6.1 consolidated schema gap item 1.
- D2 §4.2 manifest per-unit block field canonical_address.
- Đ24 Step 1 ratified authority enum [enacted, draft, runtime] (cross-law with Đ0-G).
- P0 Schema Planning §5.1 P0-1 detail.
3. Logical Object / Table Intent
Target table: tac_logical_unit (existing — augmented; NOT a new table).
New fields added on tac_logical_unit:
- canonical_address (primary new field)
- authority (Đ0-G distinction)
- companion supporting fields per §4
Companion table (open decision): canonical_address_alias for handling renames / supersessions (see §9 item 4).
Target DB: directus (existing). Target Schema: TAC (existing). Target Layer: Kho (data persistence on canonical units).
4. Proposed Fields at Conceptual Level
4.1 On tac_logical_unit (augmentation)
| Field name | Type-class | Nullable | Notes |
|---|---|---|---|
| canonical_address | text | YES initially (bootstrap); NOT NULL after backfill | stable, human-readable address; format spec in §6 |
| canonical_address_format_version | text (semver) | YES | format version used at creation; supports format migration |
| authority | enum-ref to Đ24 group 10 ([enacted, draft, runtime]) | NO after backfill | Đ0-G birth-gate distinction |
| birth_gate_class | enum-ref | YES | further Đ0-G classification if needed (open decision §9) |
| address_collision_at | timestamp UTC | YES | last collision detection timestamp |
| superseded_by_unit_id | FK to tac_logical_unit | YES | for supersession chains |
| supersedes_unit_id | FK to tac_logical_unit | YES | reverse pointer |
4.2 canonical_address_alias (companion; supports renames / supersessions; OPEN whether table or JSONB column on tac_logical_unit)
| Field name | Type-class | Nullable | Notes |
|---|---|---|---|
| alias_id | bigserial OR uuid | NO | row identifier |
| unit_id | FK to tac_logical_unit | NO | current canonical unit |
| address_text | text | NO | historical or alternate address |
| address_format_version | text semver | NO | format at the time this alias was valid |
| alias_kind | enum-ref | NO | values: previous_canonical / rename / redirect / external_reference |
| valid_from | timestamp UTC | NO | when this alias started being recognized |
| valid_until | timestamp UTC | YES | null = currently valid |
| created_by | text actor | NO | actor recording the alias |
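To illustrate how the validity window on an alias row could be read, here is a minimal Python sketch. It is illustrative only and not part of this design: the `Alias` dataclass carries a subset of the fields above, `resolve_alias` is a hypothetical helper, and treating `valid_until` as an exclusive bound is an assumption the table does not state.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Optional


@dataclass(frozen=True)
class Alias:
    # Subset of the conceptual fields; alias_id, address_format_version
    # and created_by are omitted for brevity.
    unit_id: int
    address_text: str
    alias_kind: str  # previous_canonical / rename / redirect / external_reference
    valid_from: datetime
    valid_until: Optional[datetime] = None  # None = currently valid


def resolve_alias(address: str, aliases: Iterable[Alias], at: datetime) -> Optional[int]:
    """Return the unit_id an alias pointed to at time `at`, or None.

    Assumption: valid_until is exclusive (the alias stops applying at that instant).
    """
    for alias in aliases:
        if alias.address_text != address:
            continue
        if alias.valid_from <= at and (alias.valid_until is None or at < alias.valid_until):
            return alias.unit_id
    return None


# Usage with made-up data: one closed rename window, then an open redirect.
t0 = datetime(2026, 1, 1, tzinfo=timezone.utc)
t1 = datetime(2026, 3, 1, tzinfo=timezone.utc)
aliases = [
    Alias(unit_id=7, address_text="Đ44 §5.3", alias_kind="rename", valid_from=t0, valid_until=t1),
    Alias(unit_id=9, address_text="Đ44 §5.3", alias_kind="redirect", valid_from=t1),
]
```

The same lookup would apply whether aliases end up in a separate table or a JSONB column; only the source of the `aliases` iterable changes.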
5. Field Ownership / Vocabulary Dependency
| Field | Vocabulary owner |
|---|---|
| canonical_address text content | Đ24 controls format vocabulary (per Đ24 Step 1, §5.1 and §5.2 of Đ24 closure: section_type/unit_kind/body_source_policy partially inform address syntax) |
| canonical_address_format_version | cutter-local (semver) |
| authority enum | Đ24 Step 1 ratified: [enacted, draft, runtime] cross-law with Đ0-G |
| birth_gate_class | Đ0-G governance; open decision §9 |
| alias_kind enum | Đ24 (recommend cutter-local v0.1 with Đ24 confirm path) |
6. Canonical Address Format (proposed)
format_version: 1.0.0
syntax_grammar (conceptual; not regex):
  for_law_artifacts: "Đ{N}[ §{path_segments_dot_separated}]"
    examples:
      - "Đ44"
      - "Đ44 §5.3.1"
      - "Đ44 §12.7"
  for_design_artifacts: "{document_slug}/{path_segments_dot_separated}"
    examples:
      - "dot-iu-cutter-v0.1/D6/§4.2"
      - "dot-iu-cutter-v0.1/D11/§4.4"
  for_code_or_other_artifacts: "{ns}:{symbol}[#{revision}]"
    examples:
      - "tac:fn_iu_create"
      - "tac:tac_logical_unit#schema_v3"
uniqueness_scope:
  per_source_revision: YES (canonical_address is unique per source_revision OR globally; open decision §9 item 1)
  global: deferred decision
mutability:
  immutable_after_publish: by default YES
  rename_allowed_via_alias: YES (alias_kind = rename); old address remains queryable via canonical_address_alias
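As an illustration of the law-artifact form above, the following Python sketch parses and re-formats such addresses. It is illustrative only and not part of this design: the grammar above is conceptual, so the regular expression is just one possible reading of it, and `LawAddress`, `parse_law_address` and `format_law_address` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional, Tuple

# One possible reading of "Đ{N}[ §{path_segments_dot_separated}]" (format_version 1.0.0).
# Assumption: N and every path segment are decimal integers.
_LAW_RE = re.compile(r"^Đ(?P<law>\d+)(?: §(?P<path>\d+(?:\.\d+)*))?$")


class LawAddress(NamedTuple):
    law_number: int
    path: Tuple[int, ...]  # empty tuple means the whole law, e.g. "Đ44"


def parse_law_address(text: str) -> Optional[LawAddress]:
    """Return the parsed address, or None if `text` is not in the law-artifact form."""
    match = _LAW_RE.match(text)
    if match is None:
        return None
    path = match.group("path")
    segments = tuple(int(s) for s in path.split(".")) if path else ()
    return LawAddress(int(match.group("law")), segments)


def format_law_address(addr: LawAddress) -> str:
    """Inverse of parse_law_address."""
    if not addr.path:
        return f"Đ{addr.law_number}"
    return f"Đ{addr.law_number} §" + ".".join(str(s) for s in addr.path)
```

Parsing segments into integers means parsed addresses compare numerically, so "Đ44 §5.3.1" sorts before "Đ44 §12.7", which a plain string comparison would not give. Whether round-trip ordering should be numeric is not stated in this design and is assumed here.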
7. Authority / Enacted / Draft / Runtime Distinction (Đ0-G)
authority_values: [enacted, draft, runtime] # Đ24 Step 1 ratified
authority_semantics_per_dieu0g:
  enacted: official, fully-promulgated law/artifact; most authoritative
  draft: controlled draft; provisional; e.g. Đ44 itself is currently draft
  runtime: operational/runtime artifact; derived; e.g. cutter-generated IUs from runtime sessions
collision_under_authority:
  rule: a (canonical_address, source_revision) pair may have multiple rows ONLY if they have different `authority` values
  precedence_for_resolution_in_retrieval:
    1. enacted (winner if exists)
    2. draft (winner if no enacted)
    3. runtime (winner if no enacted/draft)
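The precedence order above can be sketched in a few lines of Python. This is illustrative only and not part of this design: rows are modelled as plain dicts, and `pick_winner` is a hypothetical helper name.

```python
from typing import Iterable, Optional

# Precedence from the section above: enacted, then draft, then runtime.
AUTHORITY_PRECEDENCE = ("enacted", "draft", "runtime")


def pick_winner(rows: Iterable[dict]) -> Optional[dict]:
    """Pick the row with the most authoritative `authority` value.

    `rows` are hypothetical dicts sharing one (canonical_address, source_revision)
    pair; by the collision rule, each authority value appears at most once.
    """
    by_authority = {row["authority"]: row for row in rows}
    for authority in AUTHORITY_PRECEDENCE:
        if authority in by_authority:
            return by_authority[authority]
    return None
```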
8. Collision Policy (P0 Schema Planning §5.1 open decision 3)
collision_detection_trigger: at MARK stage (D1 §4.3 collision check)
collision_outcomes:
  no_existing_address_at_same_revision: proceed (no collision)
  existing_address_same_authority_same_revision: BLOCK; this is a duplicate-cut attempt; emit collision_status='supersedes' on manifest + route to G-2 backlog
  existing_address_different_authority_same_revision: ALLOW (per §7 precedence rule); annotate
  existing_address_same_authority_different_revision: prior cut superseded if current revision is newer; emit collision_status='supersedes'; supersession chain via superseded_by_unit_id
  existing_address_in_alias_history: emit collision_status='prior_cut_present'; require reviewer attention
collision_resolution:
  by_authority: enacted > draft > runtime
  by_revision: newer wins (within same authority)
  by_supersession_chain: latest in chain wins
  fall_back_to_human_review: if rules above don't yield a single winner
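The outcome keys above can be read as a classification over existing rows. The Python sketch below is illustrative only and not part of this design: rows are hypothetical dicts, `classify_collision` is an invented name, and the order in which outcomes are checked when more than one could apply is a choice made for this sketch.

```python
from typing import Iterable, Set


def classify_collision(candidate: dict, existing: Iterable[dict], alias_addresses: Set[str]) -> str:
    """Return the collision_outcomes key that applies to a candidate unit.

    Assumptions: each row has `canonical_address`, `authority` and
    `source_revision` keys; `alias_addresses` is the set of address_text
    values recorded in alias history.
    """
    address = candidate["canonical_address"]
    same_address = [r for r in existing if r["canonical_address"] == address]
    same_rev = [r for r in same_address if r["source_revision"] == candidate["source_revision"]]

    if any(r["authority"] == candidate["authority"] for r in same_rev):
        return "existing_address_same_authority_same_revision"       # BLOCK
    if same_rev:
        return "existing_address_different_authority_same_revision"  # ALLOW, annotate
    if any(r["authority"] == candidate["authority"] for r in same_address):
        return "existing_address_same_authority_different_revision"  # supersession candidate
    if address in alias_addresses:
        return "existing_address_in_alias_history"                   # reviewer attention
    return "no_existing_address_at_same_revision"                    # proceed
```

The sketch only names the outcome; deciding which revision is newer, and therefore which unit is superseded, depends on how source_revision values are ordered, which this document leaves open.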
9. Open Decisions
1. Uniqueness scope: unique (canonical_address, source_revision) per authority OR globally? Recommendation: per (authority, source_revision); revisit at Đ44 G1 identity ratification.
2. Mutation policy: once published, can canonical_address change? Recommendation: NO direct mutation; renames go through canonical_address_alias with alias_kind='rename'.
3. Indexing strategy: single-column index, composite (canonical_address, source_revision), OR additionally trigram (pg_trgm) for fuzzy lookup. Recommendation: composite + trigram for partial-match retrieval. Index strategy is FUTURE migration execution detail; not designed here.
4. Alias storage: separate canonical_address_alias table OR JSONB array column on tac_logical_unit. Recommendation: separate table (queryable lifecycle); JSONB only if Đ44 prefers profile-JSON pattern.
5. birth_gate_class field: distinct enum from authority OR derivable from authority + other signals. Recommendation: keep distinct field, nullable; populate from Đ0-G rules later; open for Đ0-G governance.
6. Backfill strategy for existing tac_logical_unit rows: nullable at first, then populate via deterministic derivation rule from existing identifiers; OR migration-time backfill function. Backfill is FUTURE migration execution; not designed here.
7. Format extensions: handling of cross-language artifacts (Vietnamese vs English law identifiers). Recommendation: format_version mechanism handles this; v1.0.0 supports Đ-prefix; v1.1.0+ may add localized prefixes via Đ24 ratification.
8. canonical_address for non-law non-design artifacts (code symbols, reports, runbooks): format syntax recommended in §6 but Đ24 confirmation needed.
9. Constraint enforcement: should canonical_address be enforced unique at PG constraint level OR application level v0.1. Recommendation: application level v0.1 (loose); add PG unique constraint post-backfill in a separate migration. Constraint DDL is FUTURE.
10. NULL canonical_address on legacy rows: accept nullable column on TAC rows that pre-date P0-1 (legacy IUs)? Recommendation: yes during bootstrap; backfill rule + post-backfill NOT NULL constraint via FUTURE migration.
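For the application-level uniqueness option, a check along the following lines could be run before any constraint exists. The Python sketch is illustrative only and not part of this design: it assumes the recommended scope of (authority, source_revision), models rows as dicts, and uses the hypothetical name `find_duplicate_keys`.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def find_duplicate_keys(rows: Iterable[dict]) -> List[Tuple[str, str, str]]:
    """Return (authority, source_revision, canonical_address) keys seen more than once.

    Rows whose canonical_address is None are skipped, since legacy rows may
    stay NULL during bootstrap.
    """
    counts = Counter(
        (row["authority"], row["source_revision"], row["canonical_address"])
        for row in rows
        if row.get("canonical_address") is not None
    )
    return sorted(key for key, n in counts.items() if n > 1)
```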
10. Lifecycle
[unit creation via fn_iu_create]
↓
canonical_address assigned (computed by MARK per format_version)
↓
unit lifecycle:
draft → review → published → (superseded OR archived)
↓ ↓
alias entries recorded on key transitions
↓
on rename: alias_kind='rename'; old canonical_address moves to alias
on supersession: superseded_by_unit_id set; supersedes_unit_id reverse
on collision: collision_status emitted to manifest; G-2 backlog notified
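The unit lifecycle in the diagram above can be read as a small transition table. The Python sketch below is illustrative only and not part of this design: it assumes transitions are forward-only (the diagram does not say whether a unit in review can return to draft), and `ALLOWED_TRANSITIONS` and `can_transition` are hypothetical names.

```python
# Transition table read off the lifecycle diagram:
# draft -> review -> published -> (superseded OR archived).
# Assumption: forward-only; superseded and archived are terminal.
ALLOWED_TRANSITIONS = {
    "draft": {"review"},
    "review": {"published"},
    "published": {"superseded", "archived"},
    "superseded": set(),
    "archived": set(),
}


def can_transition(current: str, target: str) -> bool:
    """Return True if the lifecycle diagram allows current -> target."""
    return target in ALLOWED_TRANSITIONS.get(current, set())
```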
11. Dependencies
upstream_dependencies:
  governance:
    - Đ24 Step 1 ratified (authority enum, body_source_policy partially informs address syntax)
    - Đ0-G base/draft/runtime distinction (cross-law via Đ24 group 10)
    - Đ44 outcome A.6 #5 (first-class column policy: canonical_address IS first-class per Decision 5 spirit)
  schema:
    - existing tac_logical_unit table
    - existing tac_unit_version
    - existing tac_publication (for publication-membership cross-checks)
  no_p0_data_dependency_on_other_p0_items: true (P0-1 is independent root after P0-5)
downstream_dependents:
  - P0-2 manifest_unit_block (references canonical_address per row)
  - P0-3 cut_change_set (references units by canonical_address in change-set payload)
  - P0-4 verify_result (round-trip uses canonical_address ordering)
  - P0-6 review_decision (review findings cite canonical_address)
  - all retrieval / citation surfaces (D11 §4.13 consumer contract)
operational_dependencies:
  - backfill plan for existing tac_logical_unit rows (FUTURE migration execution)
  - format_version registry (cutter-local; may move to Đ24 if format extensions needed)
12. Risks
| Risk | Severity | Mitigation in this design |
|---|---|---|
| Backfill of large existing tac_logical_unit population could be slow / partial | Standard | format_version supports gradual rollout; nullable initially; FUTURE migration execution plan |
| Format ambiguity for non-law artifacts | Standard | open decision §9 item 8; Đ24 ratification path |
| Alias chain cycles | Standard | application-level cycle detection (similar to P0-5 dependency graph); FUTURE constraint |
| Collision policy mis-application | Standard | §8 explicit precedence rules; fall back to human review |
| Authority enum drift (Đ0-G changes) | Standard | format_version mechanism; Đ0-G cross-law signature required for changes |
| canonical_address exposed in retrieval but maps to restricted unit | HIGH | cross-link with G-5 audience filter; retrieval layer must check audience BEFORE returning canonical_address (P3 work, NOT P0); address itself can be safely stored Internal |
| Localization (vi vs en) of address prefixes | Low | format_version mechanism handles; deferred via §9 item 7 |
| Index performance regression on existing TAC queries | Standard | index strategy is FUTURE; recommend benchmarking pre-migration |
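The application-level cycle detection named as a mitigation above could use a visited set while following chain pointers. The Python sketch below is illustrative only and not part of this design: it walks superseded_by_unit_id pointers held in a hypothetical in-memory map, and `latest_in_chain` is an invented name. The same visited-set technique would apply to any other pointer chain.

```python
from typing import Dict, Optional


def latest_in_chain(unit_id: int, superseded_by: Dict[int, Optional[int]]) -> int:
    """Follow superseded_by_unit_id pointers to the end of the chain.

    `superseded_by` maps unit_id -> superseded_by_unit_id (None or a missing
    key when the unit has not been superseded). Raises ValueError on a cycle.
    """
    seen = set()
    current = unit_id
    while True:
        if current in seen:
            raise ValueError(f"supersession cycle detected at unit {current}")
        seen.add(current)
        nxt = superseded_by.get(current)
        if nxt is None:
            return current
        current = nxt
```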
13. Đ32 Risk Review Notes
proposed_risk_class: Standard
review_inputs_for_dieu32:
  - logical design content (this document)
  - format spec (§6)
  - authority distinction model (§7)
  - collision policy (§8)
  - cross-law dependencies (Đ24, Đ0-G)
  - migration execution preconditions:
    - backfill rule documented and reviewed
    - low-traffic migration window
    - rollback plan: column stays nullable until backfill is complete; the column addition can be rolled back by dropping the column
    - Đ24 confirms format vocabulary acceptance
    - Đ0-G confirms authority enum semantics
review_outputs_expected:
  - Đ32 approval / approval_with_notes
  - format spec ratification by Đ24
  - authority enum re-confirmation by Đ0-G
  - migration execution preconditions confirmed
review_authority: Đ32 council + Đ24 vocabulary owner + Đ0-G birth-gate owner co-sign
review_phase: NOT_STARTED
Special Đ32 attention:
- Backfill data exposure: during backfill, derived canonical_addresses are computed and written; if backfill is misapplied, addresses may be wrong, leading to retrieval issues and citation drift. Backfill plan must be reviewed BEFORE execution.
- Authority misclassification: if authority is wrongly assigned (e.g., draft for an enacted law), audience filter (FUTURE) may misroute. Cross-link with Đ0-G.
- Trigram index PII concern: fuzzy match could leak partial matches across audience boundaries; NOT a P0 issue (retrieval is P3) but documented for cross-link.
14. Explicit Confirmation
no_ddl_written: true
no_sql_written: true
no_create_table_or_alter_table_in_this_document: true
no_column_ddl_in_this_document: true
no_index_ddl: true
no_constraint_ddl: true
no_backfill_executed: true
no_migration_executed: true
no_pg_mutation: true
no_qdrant_mutation: true
no_data_writes: true
no_implementation_planning: true
no_existing_file_modified: true
output_form: logical_design_only