KB-379F

dot-iu-cutter v0.1 — P0-1 canonical_address Migration Design

Revision 1
Tags: dot-iu-cutter, migration-design, p0-1, canonical-address, tac-logical-unit, no-ddl, rev5d

Date: 2026-05-15
Status: P0 MIGRATION DESIGN, Item 2 of 6
Scope: DESIGN ONLY. No DDL, no SQL, no CREATE/ALTER TABLE, no column DDL, no migration execution, no PG mutation, no backfill executed.
Master: migration-design/dot-iu-cutter-v0.1-p0-migration-design-master-2026-05-15.md


1. Purpose

P0-1 introduces canonical_address as a stable, human-readable, first-class identity field on tac_logical_unit (existing table) so that every IU has an authoritative address (e.g. "Đ44 §5.3.1") used for:

  • Manifest per-unit-block identity reference (P0-2).
  • Round-trip verification ordering (P0-4, D1 §4.7).
  • Citation discipline (D11 §4.13 consumer contract).
  • Cross-document reference resolution (D6 §4.2).
  • Đ0-G birth-gate authority distinction (enacted / draft / runtime).

2. Source Design References

  • D6 Assembly Axes & Metadata Contract — §4.2 (axis-1 metadata), §4.8 (vocabulary discipline), §6 (schema gap item 1).
  • D7 UOSL Compatibility Note — §4.3 (G1 identity hint), §6 (schema gap item 1).
  • D8 §6.1 consolidated schema gap item 1.
  • D2 §4.2 manifest per-unit block field canonical_address.
  • Đ24 Step 1 ratified authority enum [enacted, draft, runtime] (cross-law with Đ0-G).
  • P0 Schema Planning §5.1 P0-1 detail.

3. Logical Object / Table Intent

Target table: tac_logical_unit (existing — augmented; NOT a new table).

New fields added on tac_logical_unit:

  • canonical_address (primary new field)
  • authority (Đ0-G distinction)
  • companion supporting fields per §4

Companion table (open decision): canonical_address_alias for handling renames / supersessions (see §9 item 4).

Target DB: directus (existing). Target Schema: TAC (existing). Target Layer: Kho (data persistence on canonical units).

4. Proposed Fields at Conceptual Level

4.1 On tac_logical_unit (augmentation)

| Field name | Type class | Nullable | Notes |
|---|---|---|---|
| canonical_address | text | YES initially (bootstrap); NOT NULL after backfill | stable, human-readable address; format spec in §6 |
| canonical_address_format_version | text (semver) | YES | format version used at creation; supports format migration |
| authority | enum-ref to Đ24 group 10 ([enacted, draft, runtime]) | NO after backfill | Đ0-G birth-gate distinction |
| birth_gate_class | enum-ref | YES | further Đ0-G classification if needed (open decision §9 item 5) |
| address_collision_at | timestamp (UTC) | YES | timestamp of the last collision detection |
| superseded_by_unit_id | FK to tac_logical_unit | YES | forward pointer in supersession chains |
| supersedes_unit_id | FK to tac_logical_unit | YES | reverse pointer |
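As an illustration only, the §4.1 augmentation can be modelled as a Python dataclass. The class name LogicalUnitAddressFields, the unit_id key, and the backfill_complete helper are hypothetical. The sketch only mirrors the nullability rules in the table above: canonical_address and authority may be null during bootstrap, and both must be set once backfill is complete.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class Authority(str, Enum):
    """Đ24 Step 1 ratified authority enum (cross-law with Đ0-G)."""
    ENACTED = "enacted"
    DRAFT = "draft"
    RUNTIME = "runtime"


@dataclass
class LogicalUnitAddressFields:
    """Hypothetical in-memory view of the P0-1 fields on tac_logical_unit."""
    unit_id: int                                            # assumed key name
    canonical_address: Optional[str] = None                 # nullable during bootstrap
    canonical_address_format_version: Optional[str] = None  # e.g. "1.0.0"
    authority: Optional[Authority] = None                   # NOT NULL after backfill
    birth_gate_class: Optional[str] = None                  # open decision §9 item 5
    address_collision_at: Optional[datetime] = None         # UTC timestamp
    superseded_by_unit_id: Optional[int] = None             # forward supersession pointer
    supersedes_unit_id: Optional[int] = None                # reverse pointer

    def backfill_complete(self) -> bool:
        # Both fields that become NOT NULL after backfill must be populated.
        return self.canonical_address is not None and self.authority is not None
```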

4.2 canonical_address_alias (companion; supports renames / supersessions; OPEN whether table or JSONB column on tac_logical_unit)

| Field name | Type class | Nullable | Notes |
|---|---|---|---|
| alias_id | bigserial OR uuid | NO | row identifier |
| unit_id | FK to tac_logical_unit | NO | current canonical unit |
| address_text | text | NO | historical or alternate address |
| address_format_version | text (semver) | NO | format version in force while this alias was valid |
| alias_kind | enum-ref | NO | values: previous_canonical / rename / redirect / external_reference |
| valid_from | timestamp (UTC) | NO | when this alias started being recognized |
| valid_until | timestamp (UTC) | YES | null = currently valid |
| created_by | text (actor) | NO | actor recording the alias |
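A minimal sketch of the alias row from §4.2, assuming the separate-table option and treating [valid_from, valid_until) as a half-open interval. The source does not say whether valid_until is inclusive, so that part is an assumption. CanonicalAddressAlias, is_valid_at, and resolve_alias are hypothetical names. alias_id is shown as an int, although §4.2 leaves bigserial vs uuid open.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Iterable, List, Optional


class AliasKind(str, Enum):
    PREVIOUS_CANONICAL = "previous_canonical"
    RENAME = "rename"
    REDIRECT = "redirect"
    EXTERNAL_REFERENCE = "external_reference"


@dataclass(frozen=True)
class CanonicalAddressAlias:
    alias_id: int                    # bigserial vs uuid is still open
    unit_id: int                     # current canonical unit
    address_text: str
    address_format_version: str
    alias_kind: AliasKind
    valid_from: datetime             # UTC
    valid_until: Optional[datetime]  # None = currently valid
    created_by: str

    def is_valid_at(self, at: datetime) -> bool:
        # Assumed half-open interval [valid_from, valid_until).
        if at < self.valid_from:
            return False
        return self.valid_until is None or at < self.valid_until


def resolve_alias(aliases: Iterable[CanonicalAddressAlias],
                  address: str, at: datetime) -> List[int]:
    """Return the unit_ids whose alias matches `address` at time `at`."""
    return sorted({a.unit_id for a in aliases
                   if a.address_text == address and a.is_valid_at(at)})
```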

5. Field Ownership / Vocabulary Dependency

| Field | Vocabulary owner |
|---|---|
| canonical_address (text content) | Đ24 controls format vocabulary (per Đ24 Step 1, §5.1 and §5.2 of the Đ24 closure: section_type / unit_kind / body_source_policy partially inform address syntax) |
| canonical_address_format_version | cutter-local (semver) |
| authority (enum) | Đ24 Step 1 ratified: [enacted, draft, runtime], cross-law with Đ0-G |
| birth_gate_class | Đ0-G governance; open decision §9 item 5 |
| alias_kind (enum) | Đ24 (recommend cutter-local for v0.1 with a Đ24 confirm path) |

6. Canonical Address Format (proposed)

format_version: 1.0.0
syntax_grammar (conceptual; not regex):
  for_law_artifacts: "Đ{N}[ §{path_segments_dot_separated}]"
    examples:
      - "Đ44"
      - "Đ44 §5.3.1"
      - "Đ44 §12.7"
  for_design_artifacts: "{document_slug}/{path_segments_dot_separated}"
    examples:
      - "dot-iu-cutter-v0.1/D6/§4.2"
      - "dot-iu-cutter-v0.1/D11/§4.4"
  for_code_or_other_artifacts: "{ns}:{symbol}[#{revision}]"
    examples:
      - "tac:fn_iu_create"
      - "tac:tac_logical_unit#schema_v3"
uniqueness_scope:
  per_source_revision: YES, proposed default (unique per (canonical_address, source_revision, authority), consistent with §7; per-revision vs global scope remains open decision §9 item 1)
  global: deferred decision
mutability:
  immutable_after_publish: by default YES
  rename_allowed_via_alias: YES (alias_kind = rename); old address remains queryable via canonical_address_alias
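§6 states the grammar conceptually and explicitly not as a regex. The sketch below is one hypothetical regex rendering of format_version 1.0.0. It assumes law numbers and law path segments are decimal digits, so non-numeric law identifiers are out of scope for this sketch. It also assumes namespaces are lowercase identifiers and that design paths are "/"-separated segments, as in the examples. It accepts each §6 example and rejects anything else. The pattern names, ParsedAddress, and parse_canonical_address are assumptions.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical regex rendering of the conceptual v1.0.0 grammar.
LAW_RE = re.compile(r"Đ(?P<law>\d+)(?: §(?P<path>\d+(?:\.\d+)*))?")
DESIGN_RE = re.compile(r"(?P<slug>[a-z0-9][a-z0-9.\-]*)/(?P<path>[^/\s]+(?:/[^/\s]+)*)")
CODE_RE = re.compile(
    r"(?P<ns>[a-z][a-z0-9_]*):(?P<symbol>[A-Za-z_][A-Za-z0-9_]*)"
    r"(?:#(?P<rev>[A-Za-z0-9_.\-]+))?"
)


class ParsedAddress(NamedTuple):
    kind: str                      # "law" | "design" | "code"
    head: str                      # law number, document slug, or namespace
    path: tuple                    # path segments (empty for a bare law)
    revision: Optional[str] = None  # only for code artifacts


def parse_canonical_address(text: str) -> ParsedAddress:
    if m := LAW_RE.fullmatch(text):
        path = tuple(m["path"].split(".")) if m["path"] else ()
        return ParsedAddress("law", m["law"], path)
    if m := CODE_RE.fullmatch(text):
        return ParsedAddress("code", m["ns"], (m["symbol"],), m["rev"])
    if m := DESIGN_RE.fullmatch(text):
        return ParsedAddress("design", m["slug"], tuple(m["path"].split("/")))
    raise ValueError(f"not a v1.0.0 canonical_address: {text!r}")
```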

7. Authority / Enacted / Draft / Runtime Distinction (Đ0-G)

authority_values: [enacted, draft, runtime]  # Đ24 Step 1 ratified
authority_semantics_per_dieu0g:
  enacted: official, fully-promulgated law/artifact — most authoritative
  draft: controlled draft — provisional; e.g. Đ44 itself is currently draft
  runtime: operational/runtime artifact — derived; e.g. cutter-generated IUs from runtime sessions
collision_under_authority:
  rule: a (canonical_address, source_revision) pair may have multiple rows ONLY if they have different `authority` values
  precedence_for_resolution_in_retrieval:
    1. enacted (winner if exists)
    2. draft (winner if no enacted)
    3. runtime (winner if no enacted/draft)
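The §7 precedence rule reduces to ranking the rows that share a (canonical_address, source_revision) pair. Below is a minimal Python sketch, assuming rows are plain dicts with an "authority" key. resolve_by_authority and AUTHORITY_RANK are hypothetical names. The sketch also rejects input that violates the §7 rule of at most one row per authority value, and it raises KeyError for any authority outside the ratified enum.

```python
from typing import Iterable, Optional

# Precedence from §7: a lower rank wins.
AUTHORITY_RANK = {"enacted": 0, "draft": 1, "runtime": 2}


def resolve_by_authority(rows: Iterable[dict]) -> Optional[dict]:
    """Pick the retrieval winner among rows sharing (canonical_address, source_revision).

    Rows are hypothetical dicts carrying at least an "authority" key.
    Returns None when there are no rows.
    """
    rows = list(rows)
    authorities = [r["authority"] for r in rows]
    # §7: multiple rows are allowed only when their authority values differ.
    if len(set(authorities)) != len(authorities):
        raise ValueError("duplicate authority for the same (canonical_address, source_revision)")
    return min(rows, key=lambda r: AUTHORITY_RANK[r["authority"]], default=None)
```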

8. Collision Policy (P0 Schema Planning §5.1 open decision 3)

collision_detection_trigger: at MARK stage (D1 §4.3 collision check)
collision_outcomes:
  no_existing_address_at_same_revision: proceed (no collision)
  existing_address_same_authority_same_revision: BLOCK; this is a duplicate-cut attempt — emit collision_status='supersedes' on manifest + route to G-2 backlog
  existing_address_different_authority_same_revision: ALLOW (per §7 precedence rule); annotate
  existing_address_same_authority_different_revision: prior cut superseded if current revision is newer; emit collision_status='supersedes'; supersession chain via superseded_by_unit_id
  existing_address_in_alias_history: emit collision_status='prior_cut_present'; require reviewer attention
collision_resolution:
  by_authority: enacted > draft > runtime
  by_revision: newer wins (within same authority)
  by_supersession_chain: latest in chain wins
  fall_back_to_human_review: if rules above don't yield a single winner
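The MARK-stage outcomes in §8 can be read as an ordered decision procedure. The sketch below is one hypothetical reading, and several parts of it are assumptions: the order of the checks, the Outcome names, and integer-comparable source revisions. It also routes cases that §8 does not list, such as an existing same-authority address at a newer revision, to human review. The collision_status values it returns are the ones §8 names for each case.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional, Tuple


class Outcome(str, Enum):
    PROCEED = "proceed"
    BLOCK_DUPLICATE = "block_duplicate"
    ALLOW_ANNOTATED = "allow_annotated"
    SUPERSEDE_PRIOR = "supersede_prior"
    REVIEW_ALIAS = "review_alias"
    HUMAN_REVIEW = "human_review"


@dataclass(frozen=True)
class ExistingCut:
    authority: str
    source_revision: int  # assumption: revisions compare as integers


def classify_collision(
    authority: str,
    source_revision: int,
    existing: Iterable[ExistingCut],
    in_alias_history: bool,
) -> Tuple[Outcome, Optional[str]]:
    """Return (outcome, collision_status) for a cut at MARK stage.

    The check order is an assumption; collision_status values follow §8.
    """
    existing = list(existing)
    same_rev = [e for e in existing if e.source_revision == source_revision]
    if any(e.authority == authority for e in same_rev):
        # Duplicate-cut attempt: BLOCK and route to the G-2 backlog.
        return Outcome.BLOCK_DUPLICATE, "supersedes"
    if same_rev:
        # Different authority at the same revision: allowed by §7, annotated.
        return Outcome.ALLOW_ANNOTATED, None
    if any(e.authority == authority and e.source_revision < source_revision
           for e in existing):
        # A newer revision supersedes the prior cut (set superseded_by_unit_id).
        return Outcome.SUPERSEDE_PRIOR, "supersedes"
    if in_alias_history:
        return Outcome.REVIEW_ALIAS, "prior_cut_present"
    if existing:
        # Cases that §8 does not list fall back to human review.
        return Outcome.HUMAN_REVIEW, None
    return Outcome.PROCEED, None
```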

9. Open Decisions

  1. Uniqueness scope — unique (canonical_address, source_revision) per authority OR globally? Recommendation: per (authority, source_revision); revisit at Đ44 G1 identity ratification.
  2. Mutation policy — once published, can canonical_address change? Recommendation: NO direct mutation; renames go through canonical_address_alias with alias_kind='rename'.
  3. Indexing strategy — single-column index, composite (canonical_address, source_revision), OR additionally trigram (pg_trgm) for fuzzy lookup. Recommendation: composite + trigram for partial-match retrieval. Index strategy is FUTURE migration execution detail; not designed here.
  4. Alias storage — separate canonical_address_alias table OR JSONB array column on tac_logical_unit. Recommendation: separate table (queryable lifecycle); JSONB only if Đ44 prefers profile-JSON pattern.
  5. birth_gate_class field — distinct enum from authority OR derivable from authority + other signals. Recommendation: keep distinct field, nullable; populate from Đ0-G rules later; open for Đ0-G governance.
  6. Backfill strategy for existing tac_logical_unit rows — nullable at first, then populate via deterministic derivation rule from existing identifiers; OR migration-time backfill function. Backfill is FUTURE migration execution; not designed here.
  7. Format extensions — handling of cross-language artifacts (Vietnamese vs English law identifiers). Recommendation: format_version mechanism handles this; v1.0.0 supports Đ-prefix; v1.1.0+ may add localized prefixes via Đ24 ratification.
  8. canonical_address for non-law non-design artifacts — code symbols, reports, runbooks — format syntax recommended in §6 but Đ24 confirmation needed.
  9. Constraint enforcement: should canonical_address uniqueness be enforced at the PG constraint level or at the application level in v0.1? Recommendation: application level in v0.1 (loose); add a PG unique constraint post-backfill in a separate migration. Constraint DDL is FUTURE.
  10. NULL canonical_address on legacy rows — accept nullable column on TAC rows that pre-date P0-1 (legacy IUs)? Recommendation: yes during bootstrap; backfill rule + post-backfill NOT NULL constraint via FUTURE migration.

10. Lifecycle

[unit creation via fn_iu_create]
   ↓
canonical_address assigned (computed by MARK per format_version)
   ↓
unit lifecycle:
   draft → review → published → (superseded OR archived)
   ↓                                ↓
   alias entries recorded on key transitions
   ↓
on rename: alias_kind='rename'; old canonical_address moves to alias
on supersession: superseded_by_unit_id set; supersedes_unit_id reverse
on collision: collision_status emitted to manifest; G-2 backlog notified
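The §10 lifecycle can be sketched as a small state machine. This sketch assumes that the linear path in the diagram is the complete set of legal transitions and that renames are allowed in any state. TRANSITIONS, Unit, and its methods are hypothetical. rename follows §9 item 2: the current address is recorded as an alias with alias_kind='rename' before the new address replaces it, so it is never overwritten directly.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical transition table for the unit lifecycle in §10.
TRANSITIONS: Dict[str, Set[str]] = {
    "draft": {"review"},
    "review": {"published"},
    "published": {"superseded", "archived"},
    "superseded": set(),
    "archived": set(),
}


class Unit:
    def __init__(self, canonical_address: str) -> None:
        self.state = "draft"
        self.canonical_address = canonical_address
        self.aliases: List[Tuple[str, str]] = []  # (address_text, alias_kind)

    def transition(self, new_state: str) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state

    def rename(self, new_address: str) -> None:
        # No direct mutation: the old address moves into the alias history first.
        self.aliases.append((self.canonical_address, "rename"))
        self.canonical_address = new_address
```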

11. Dependencies

upstream_dependencies:
  governance:
    - Đ24 Step 1 ratified (authority enum, body_source_policy partially informs address syntax)
    - Đ0-G base/draft/runtime distinction (cross-law via Đ24 group 10)
    - Đ44 outcome A.6 #5 (first-class column policy: canonical_address IS first-class per Decision 5 spirit)
  schema:
    - existing tac_logical_unit table
    - existing tac_unit_version
    - existing tac_publication (for publication-membership cross-checks)
  no_p0_data_dependency_on_other_p0_items: true (P0-1 is independent root after P0-5)
downstream_dependents:
  - P0-2 manifest_unit_block (references canonical_address per row)
  - P0-3 cut_change_set (references units by canonical_address in change-set payload)
  - P0-4 verify_result (round-trip uses canonical_address ordering)
  - P0-6 review_decision (review findings cite canonical_address)
  - all retrieval / citation surfaces (D11 §4.13 consumer contract)
operational_dependencies:
  - backfill plan for existing tac_logical_unit rows (FUTURE migration execution)
  - format_version registry (cutter-local; may move to Đ24 if format extensions needed)

12. Risks

| Risk | Severity | Mitigation in this design |
|---|---|---|
| Backfill of a large existing tac_logical_unit population could be slow or partial | Standard | format_version supports gradual rollout; nullable initially; FUTURE migration execution plan |
| Format ambiguity for non-law artifacts | Standard | open decision §9 item 8; Đ24 ratification path |
| Alias chain cycles | Standard | application-level cycle detection (similar to the P0-5 dependency graph); FUTURE constraint |
| Collision policy misapplication | Standard | explicit precedence rules in §8; fall back to human review |
| Authority enum drift (Đ0-G changes) | Standard | format_version mechanism; Đ0-G cross-law signature required for changes |
| canonical_address exposed in retrieval but maps to a restricted unit | HIGH | cross-link with G-5 audience filter; retrieval layer must check audience BEFORE returning canonical_address (P3 work, NOT P0); the address itself is safe to store at Internal classification |
| Localization (vi vs en) of address prefixes | Low | format_version mechanism handles it; deferred via §9 item 7 |
| Index performance regression on existing TAC queries | Standard | index strategy is FUTURE; recommend benchmarking pre-migration |
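Application-level cycle detection addresses the alias-chain-cycle risk and also supports the by_supersession_chain rule in §8 (latest in chain wins). It can be as simple as following the superseded_by_unit_id pointers while remembering which units have already been visited. Below is a minimal sketch, assuming the pointers are loaded into a dict keyed by unit_id. latest_in_chain is a hypothetical name.

```python
from typing import Dict, Optional


def latest_in_chain(start: int, superseded_by: Dict[int, Optional[int]]) -> int:
    """Follow superseded_by_unit_id links from `start` to the head of the chain.

    `superseded_by` maps unit_id -> superseded_by_unit_id (None or absent at the head).
    Raises ValueError if the walk revisits a unit, i.e. the chain contains a cycle.
    """
    seen = {start}
    current = start
    while (nxt := superseded_by.get(current)) is not None:
        if nxt in seen:
            raise ValueError(f"supersession cycle detected at unit {nxt}")
        seen.add(nxt)
        current = nxt
    return current
```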

13. Đ32 Risk Review Notes

proposed_risk_class: Standard
review_inputs_for_dieu32:
  - logical design content (this document)
  - format spec (§6)
  - authority distinction model (§7)
  - collision policy (§8)
  - cross-law dependencies (Đ24, Đ0-G)
  - migration execution preconditions:
    - backfill rule documented and reviewed
    - low-traffic migration window
    - rollback plan: columns stay nullable until backfill completes, so the change can be rolled back by dropping the new columns
    - Đ24 confirms format vocabulary acceptance
    - Đ0-G confirms authority enum semantics
review_outputs_expected:
  - Đ32 approval / approval_with_notes
  - format spec ratification by Đ24
  - authority enum re-confirmation by Đ0-G
  - migration execution preconditions confirmed
review_authority: Đ32 council + Đ24 vocabulary owner + Đ0-G birth-gate owner co-sign
review_phase: NOT_STARTED

Special Đ32 attention:

  • Backfill data exposure — during backfill, derived canonical_addresses are computed and written; if backfill is misapplied, addresses may be wrong → retrieval issues, citation drift. Backfill plan must be reviewed BEFORE execution.
  • Authority misclassification — if authority is wrongly assigned (e.g., draft for an enacted law), audience filter (FUTURE) may misroute. Cross-link with Đ0-G.
  • Trigram index PII concern — fuzzy match could leak partial matches across audience boundaries; NOT a P0 issue (retrieval is P3) but documented for cross-link.

14. Explicit Confirmation

no_ddl_written: true
no_sql_written: true
no_create_table_or_alter_table_in_this_document: true
no_column_ddl_in_this_document: true
no_index_ddl: true
no_constraint_ddl: true
no_backfill_executed: true
no_migration_executed: true
no_pg_mutation: true
no_qdrant_mutation: true
no_data_writes: true
no_implementation_planning: true
no_existing_file_modified: true
output_form: logical_design_only
Path: knowledge/dev/laws/dieu44-trien-khai/migration-design/dot-iu-cutter-v0.1-p0-1-canonical-address-migration-design-2026-05-15.md