KB-5B13

dot-iu-cutter v0.2 — canonical_address Reconciliation Discovery (2026-05-15)

tags: dieu44-trien-khai, dot-iu-cutter, v0.2, canonical-address, reconciliation, discovery, read-only, 2026-05-15

document_path: knowledge/dev/laws/dieu44-trien-khai/v0.2-planning/dot-iu-cutter-v0.2-canonical-address-reconciliation-discovery-2026-05-15.md
revision: r1
date: 2026-05-15
author: Agent (Claude Code CLI, Opus 4.7 1M)
sovereign: User / anh Huyên
verifier: GPT (Đ32 HIGH-risk path — anticipated)
secondary: Opus
phase: v0.2 planning — discovery (read-only)
mutation_performed: false
ddl_written: false
migration_started: false

§1 — Purpose

Read-only discovery of the existing production state of canonical_address in the directus database, to determine whether the v0.1 P0-1 design can be applied as-drafted, must be revised, or must be substantially rethought before any v0.2 DDL is authored.

Inputs that motivated this discovery:

  • v0.1 production execution finding F-2 (pre-existing column).
  • v0.2 scope backlog item v0.2-D-4 (canonical_address reconciliation).
  • v0.1 P0-1 migration design (the intended-design baseline).

§2 — Read-Only Inspection Method

target_environment: VPS 38.242.240.89, production postgres container, db=directus
access: ssh + docker exec -i postgres psql (read-only queries only)
queries_executed:
  - information_schema.columns          (column metadata)
  - pg_indexes                          (index inventory)
  - pg_constraint                       (constraint inventory)
  - SELECT count(*) / COUNT(col)        (row stats)
  - SELECT col … ORDER BY col           (sample values)
queries_NOT_executed:
  - any INSERT/UPDATE/DELETE/TRUNCATE
  - any CREATE/ALTER/DROP
  - any backfill/migration
mutation_performed: NO
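
The read-only discipline listed above can be checked mechanically before a statement is sent to psql. The following is a minimal sketch, not the tooling used for this inspection: `is_read_only`, the keyword list, and the two sample statements are hypothetical illustrations, and a keyword filter is deliberately conservative rather than a substitute for a read-only database role.

```python
import re

# Hypothetical guard: keywords mirror the queries_NOT_executed list above.
FORBIDDEN = ("INSERT", "UPDATE", "DELETE", "TRUNCATE", "CREATE", "ALTER", "DROP")
_FORBIDDEN_RE = re.compile(r"\b(" + "|".join(FORBIDDEN) + r")\b", re.IGNORECASE)


def is_read_only(sql: str) -> bool:
    """Return True if the statement starts with SELECT and names no mutating keyword."""
    stripped = sql.strip()
    if not stripped.upper().startswith("SELECT"):
        return False
    return _FORBIDDEN_RE.search(stripped) is None


# Illustrative statements in the spirit of queries_executed (exact SQL is an assumption).
COLUMN_METADATA = (
    "SELECT column_name, data_type, is_nullable "
    "FROM information_schema.columns "
    "WHERE table_schema = 'public' AND table_name = 'tac_logical_unit'"
)
ROW_STATS = "SELECT count(*), count(canonical_address) FROM public.tac_logical_unit"

for statement in (COLUMN_METADATA, ROW_STATS):
    print(is_read_only(statement))
```

Whole-word matching means column names such as created_at or updated_at do not trip the guard, while a statement that smuggles a second, mutating command after a SELECT is rejected.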

§3 — Current Production State (verbatim from inspection)

3.1 public.tac_logical_unit.canonical_address — column metadata

column_name:               canonical_address
data_type:                 text
character_maximum_length:  NULL (unlimited)
is_nullable:               NO
column_default:            (none)
ordinal_position:          2 (immediately after id; first-class field)

3.2 public.tac_logical_unit — full column list (current production)

#   column              type          nullable
1   id                  uuid          NO
2   canonical_address   text          NO
3   doc_code            text          NO
4   parent_id           uuid          YES
5   sort_order          integer       NO
6   section_type        text          NO
7   section_code        text          YES
8   owner               text          NO
9   identity_profile    jsonb         NO
10  tier                text          YES
11  lifecycle_status    text          NO
12  created_at          timestamptz   NO
13  updated_at          timestamptz   NO

3.3 Indexes on public.tac_logical_unit

tac_logical_unit_pkey                       UNIQUE btree (id)
tac_logical_unit_canonical_address_key      UNIQUE btree (canonical_address)
idx_tac_lu_doc_code                         btree (doc_code)
idx_tac_lu_identity_profile_gin             GIN (identity_profile)
idx_tac_lu_lifecycle                        btree (lifecycle_status)
idx_tac_lu_parent (partial)                 btree (parent_id) WHERE parent_id IS NOT NULL
idx_tac_lu_section_type                     btree (section_type)

3.4 Constraints on public.tac_logical_unit

tac_logical_unit_pkey                       PRIMARY KEY (id)
tac_logical_unit_canonical_address_key      UNIQUE (canonical_address)            ← global uniqueness ALREADY enforced
tac_logical_unit_parent_id_fkey             FK (parent_id) → tac_logical_unit(id)
tac_logical_unit_section_type_fkey          FK (section_type) → tac_section_type_vocab(code)
tac_logical_unit_lifecycle_status_fkey      FK (lifecycle_status) → tac_lu_lifecycle_vocab(code)
tac_logical_unit_sort_order_check           CHECK (sort_order >= 0)

3.5 Row statistics

row_total:                                  86
canonical_address_non_null:                 86
canonical_address_null:                     0
distinct_canonical_address_values:          86   (100% unique; matches DB unique constraint)
lifecycle_distribution_sampled:             all 86 sampled rows show lifecycle_status='draft_only'

3.6 Sample values (86 rows in production — full inventory)

Pattern observed: D{doc_num}-DIEU{N}-{S{seg}|ROOT}[-P{N}][-{N}]

D38-DIEU28-ROOT, D38-DIEU28-S0, D38-DIEU28-S1, D38-DIEU28-S1-P1, D38-DIEU28-S1-P2,
D38-DIEU28-S2, D38-DIEU28-S2-P1..P5, D38-DIEU28-S3, D38-DIEU28-S3-P1..P5,
D38-DIEU28-S4..S11, D38-DIEU28-S8-P1, D38-DIEU28-S8-P2,

D38-DIEU32-ROOT, D38-DIEU32-S0..S9, D38-DIEU32-S2-P1..P4, D38-DIEU32-S3-P1..P5, D38-DIEU32-S4-P1..P3,

D38-DIEU35-ROOT, D38-DIEU35-S0..S15, D38-DIEU35-S4-P1, D38-DIEU35-S4-P1-1..P1-3,
D38-DIEU35-S4-P2..P4, D38-DIEU35-S6-P1..P7, D38-DIEU35-S8-P1..P5

Document scopes present: D38-DIEU28, D38-DIEU32, D38-DIEU35 only (Đ44 itself is NOT in tac_logical_unit yet).
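
The compact listing above uses `..` as range shorthand, so the 86-row total is not obvious at a glance. The sketch below expands the shorthand and counts the distinct addresses. The `expand` helper and its reading of the `..` notation are assumptions made for this illustration, and the listing is transcribed by hand from this section rather than re-queried from production.

```python
import re

LISTING = """
D38-DIEU28-ROOT, D38-DIEU28-S0, D38-DIEU28-S1, D38-DIEU28-S1-P1, D38-DIEU28-S1-P2,
D38-DIEU28-S2, D38-DIEU28-S2-P1..P5, D38-DIEU28-S3, D38-DIEU28-S3-P1..P5,
D38-DIEU28-S4..S11, D38-DIEU28-S8-P1, D38-DIEU28-S8-P2,
D38-DIEU32-ROOT, D38-DIEU32-S0..S9, D38-DIEU32-S2-P1..P4, D38-DIEU32-S3-P1..P5, D38-DIEU32-S4-P1..P3,
D38-DIEU35-ROOT, D38-DIEU35-S0..S15, D38-DIEU35-S4-P1, D38-DIEU35-S4-P1-1..P1-3,
D38-DIEU35-S4-P2..P4, D38-DIEU35-S6-P1..P7, D38-DIEU35-S8-P1..P5
"""


def expand(entry: str) -> list[str]:
    """Expand 'STEM<a>..<tail><b>' into STEM<a> through STEM<b>; plain entries pass through."""
    if ".." not in entry:
        return [entry]
    left, right = entry.split("..")
    # The stem is everything before the trailing number of the left-hand side.
    stem, start = re.fullmatch(r"(.*?)(\d+)", left).groups()
    end = re.fullmatch(r".*?(\d+)", right).group(1)
    return [f"{stem}{n}" for n in range(int(start), int(end) + 1)]


entries = [e.strip() for e in LISTING.replace("\n", " ").split(",") if e.strip()]
# A set collapses any entry that the listing happens to mention twice.
addresses = {addr for entry in entries for addr in expand(entry)}
scopes = sorted({"-".join(a.split("-")[:2]) for a in addresses})

print(len(addresses))  # 86
print(scopes)          # ['D38-DIEU28', 'D38-DIEU32', 'D38-DIEU35']
```

The per-scope counts are 27 (D38-DIEU28), 23 (D38-DIEU32) and 36 (D38-DIEU35), which sum to the 86 rows reported in §3.5.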

3.7 Sister columns named canonical_address (system-wide)

public.tac_logical_unit         text NOT NULL    (86 rows; the column this whole reconciliation is about)
public.information_unit         text NOT NULL    (98 rows non-null)
public.unit_edit_draft          text NOT NULL    (13 rows non-null)
public.iu_notification_event    text NOT NULL    (row count not measured separately; NOT NULL)
public.event_outbox             text NOT NULL    (44,726 rows non-null)
public.event_pending            text NOT NULL    (0 rows non-null at inspection time)
public.birth_registry           text NULLABLE    (0 rows non-null at inspection time)
sandbox_tac.logical_unit        text             (sandbox schema)

canonical_address is a system-wide vocabulary, not a tac_logical_unit-only field. Any reconciliation impacts 7+ tables, of which event_outbox is the largest (44,726 rows).

3.8 FKs referencing tac_logical_unit

tac_logical_unit (parent_id → id)                    self-FK (hierarchy)
tac_change_set_member (logical_unit_id → id)         data
tac_publication_member (logical_unit_id → id)        data
tac_unit_version (logical_unit_id → id)              data

All cross-table references use tac_logical_unit.id (uuid PK), not canonical_address as a foreign-key target. The sister-table canonical_address columns are denormalized text mirrors, not FK references.
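
Because the mirrors are plain text rather than FK references, nothing in the schema stops a mirror value from pointing at an address that tac_logical_unit does not hold. This discovery did not run such a comparison. A later check could be sketched as a set difference, as below, where `find_orphans` and all sample values are hypothetical.

```python
def find_orphans(master: set[str], mirror: set[str]) -> set[str]:
    """Addresses present in a mirror table but absent from tac_logical_unit."""
    return mirror - master


# Hypothetical samples; real sets would come from read-only SELECTs on each table.
master = {"D38-DIEU28-ROOT", "D38-DIEU28-S0", "D38-DIEU28-S1"}
mirror = {"D38-DIEU28-S0", "D38-DIEU28-S1", "D38-DIEU99-S1"}  # last value is invented

orphans = find_orphans(master, mirror)
print(sorted(orphans))  # ['D38-DIEU99-S1']
```

An orphan is not necessarily an error, since a mirror table may legitimately carry addresses from another source. The sketch only surfaces candidates for review.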

3.9 Companion fields proposed by v0.1 P0-1 design — production state

Proposed by P0-1 §4.1                 Present in production today?
canonical_address                     YES (text, NOT NULL, UNIQUE)
canonical_address_format_version      NO
authority (enacted/draft/runtime)     NO column; lifecycle_status exists with FK to tac_lu_lifecycle_vocab(code); observed value draft_only (different vocabulary)
birth_gate_class                      NO
address_collision_at                  NO
superseded_by_unit_id                 NO
supersedes_unit_id                    NO
canonical_address_alias table         NO

Other columns NOT in the P0-1 design but present in production: doc_code, sort_order, section_code, owner, identity_profile (jsonb), tier. The identity_profile jsonb column (GIN-indexed) may already hold some of the address-related metadata that the P0-1 design intended to break out into discrete columns. This is NOT verified in this discovery and is left as a v0.2 design question (§6.4; carried forward in this document as F-DISC-6).


§4 — Does the Existing Column Match v0.1 / P0-1 Intended Design?

match_verdict: NO — substantial mismatch on multiple axes

4.1 Format syntax — DIVERGENT

p0_1_design_syntax_for_law_artifacts:    "Đ{N}[ §{path}]"     example: "Đ44 §5.3.1"
production_actual_syntax:                "D{doc}-DIEU{N}-{S{n}|ROOT}[-P{n}][-{n}]"   example: "D38-DIEU28-S2-P1"

differences:
  - prefix: production uses "D{doc}-" outer prefix that scopes by source document (e.g., D38 = document 38); P0-1 design has no document-scope prefix
  - article marker: production uses "DIEU{N}" (ASCII spelling of the Vietnamese "Điều", meaning "Article"); P0-1 design uses the Vietnamese diacritic form "Đ{N}"
  - separator: production uses hyphens with token labels (S, P); P0-1 design uses "§" + dot-separated path
  - depth notation: production uses positional codes (S2-P1-3); P0-1 design uses dot-path under §
  - non-law artifact formats (design D6/D11, code symbols) per P0-1 §6: NO evidence in production data — only law-document addresses currently exist
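
The two syntaxes are distinct enough that a parser can tell them apart from the string alone. The sketch below encodes both shapes as regular expressions. The production pattern follows the shape observed in §3.6. The P0-1 pattern assumes the `§` path is a dot-separated run of numbers, which is inferred only from the single example "Đ44 §5.3.1". `classify` is a hypothetical helper.

```python
import re

# Production shape: D{doc}-DIEU{N}-{S{n}|ROOT}[-P{n}][-{n}]
PRODUCTION_RE = re.compile(r"D(\d+)-DIEU(\d+)-(ROOT|S\d+)(?:-P(\d+))?(?:-(\d+))?")

# P0-1 law-artifact shape: Đ{N}[ §{path}]; the numeric dot-path grammar is an assumption.
P0_1_RE = re.compile(r"Đ(\d+)(?: §(\d+(?:\.\d+)*))?")


def classify(address: str) -> str:
    """Label an address string by which of the two syntaxes it fully matches."""
    if PRODUCTION_RE.fullmatch(address):
        return "production"
    if P0_1_RE.fullmatch(address):
        return "p0_1_design"
    return "unknown"


for sample in ("D38-DIEU28-S2-P1", "D38-DIEU35-S4-P1-3", "Đ44 §5.3.1", "D38-DIEU28"):
    print(sample, "->", classify(sample))
```

No string can satisfy both patterns, because one requires the ASCII "D…-DIEU" prefix and the other the "Đ" prefix. A format-version stamp could therefore be derived from existing values without ambiguity.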

4.2 Uniqueness scope — DIVERGENT

p0_1_design_recommendation:  per (authority, source_revision); application-layer v0.1; PG constraint FUTURE
production_actual:           GLOBAL uniqueness; PG constraint ALREADY in place (tac_logical_unit_canonical_address_key)

Production is stricter than the P0-1 design recommended. This is operationally safer but means:

  • the design's collision policy §8 (multiple rows per address with different authority) is INCOMPATIBLE with the current DB constraint
  • supersession via supersedes/superseded_by FK columns (P0-1 §4.1) requires CURRENT and PRIOR units to have different canonical_address values — cannot exist with same canonical_address simultaneously
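
The incompatibility can be reproduced in miniature. The sketch below uses an in-memory SQLite database as a stand-in for PostgreSQL, since both reject duplicate keys under a UNIQUE constraint. The table names `lu_global` and `lu_scoped` are hypothetical, and the authority column is added to both toy tables only for comparison; production has no such column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Current production rule: the address is unique on its own.
conn.execute(
    "CREATE TABLE lu_global ("
    "canonical_address TEXT NOT NULL UNIQUE, authority TEXT NOT NULL)"
)
# P0-1 style rule (illustrative): the address is unique only within an authority.
conn.execute(
    "CREATE TABLE lu_scoped ("
    "canonical_address TEXT NOT NULL, authority TEXT NOT NULL, "
    "UNIQUE (canonical_address, authority))"
)


def try_insert(table: str, address: str, authority: str) -> bool:
    """Return True if the row is accepted, False if a uniqueness rule rejects it."""
    try:
        conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (address, authority))
        return True
    except sqlite3.IntegrityError:
        return False


results = {
    table: [try_insert(table, "D38-DIEU28-S2-P1", auth) for auth in ("draft", "enacted")]
    for table in ("lu_global", "lu_scoped")
}
print(results)  # {'lu_global': [True, False], 'lu_scoped': [True, True]}
```

Under the global rule the second row with the same address is refused regardless of authority, which is the behaviour the P0-1 collision policy would need either to accept or to change.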

4.3 Authority distinction — ABSENT

P0-1 §7 requires an authority text column with values {enacted, draft, runtime} (Đ24 Step 1 ratified vocabulary, cross-law with Đ0-G).

Production has no authority column. The closest existing field is lifecycle_status (FK to tac_lu_lifecycle_vocab(code)), with observed value draft_only. This is a different vocabulary serving a different concern (publication lifecycle vs birth-gate authority).

4.4 Supersession / alias model — ABSENT

P0-1 §4.1 requires superseded_by_unit_id + supersedes_unit_id FKs and §4.2 proposes a canonical_address_alias companion table for rename history. Neither exists in production.

The production model treats canonical_address as immutable and globally unique with no formal supersession or alias mechanism wired to the schema.

4.5 Format-version mechanism — ABSENT

P0-1 §4.1 introduces canonical_address_format_version for forward-compat migration of the format syntax. Production has no such column. The observed format (D38-DIEU28-S2-P1) is implicit v0; any P0-1 design must either adopt this v0 format as canonical or version-stamp existing rows during reconciliation.

4.6 Sister-table coupling — UNANTICIPATED BY P0-1

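P0-1 §3 declared the design scope as augmenting tac_logical_unit only. In production, 7 other tables (§3.7) already store canonical_address as duplicated text. Measured non-null counts are 98 (information_unit), 13 (unit_edit_draft), 44,726 (event_outbox), 0 (event_pending) and 0 (birth_registry); iu_notification_event and sandbox_tac.logical_unit were not counted. Any change to the canonical syntax, uniqueness rule, or format version would need a coordinated migration across all of them, chiefly event_outbox (44,726 rows).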

This is not a backfill problem inside tac_logical_unit (86 rows). It is a system-wide vocabulary problem with 44,000+ rows of coupled data.
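
The scale difference can be made concrete with the counts from §3.7. This is plain arithmetic on the measured figures; tables whose counts were not measured are left out rather than estimated.

```python
# Non-null canonical_address counts as measured in §3.7; unmeasured tables are omitted.
measured_mirror_rows = {
    "information_unit": 98,
    "unit_edit_draft": 13,
    "event_outbox": 44_726,
    "event_pending": 0,
    "birth_registry": 0,
}
tac_logical_unit_rows = 86

mirror_total = sum(measured_mirror_rows.values())
outbox_share = measured_mirror_rows["event_outbox"] / mirror_total

print(mirror_total)                                 # 44837
print(f"{outbox_share:.1%}")                        # 99.8%
print(round(mirror_total / tac_logical_unit_rows))  # 521
```

The measured mirrors hold roughly 521 times as many address values as tac_logical_unit itself, and event_outbox accounts for nearly all of them.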


§5 — Risks

5.1 Risks if REUSED as-is (no changes to existing column)

R-reuse-1  format_mismatch_with_P0_design
  severity: HIGH
  description: P0-2 manifest_unit_block, P0-3 cut_change_set, P0-4 verify_result all cite canonical_address per row. If production format ("D38-DIEU28-S2-P1") is used unchanged, the entire P0 design's address-syntax assumption (Đ-prefix + §) must be revised. JSON-schemas, parser logic, citation discipline all anchor on the syntax.
  mitigation: revise P0-1/P0-2/P0-4/P0-6 design files to adopt the production syntax as canonical.

R-reuse-2  authority_distinction_unsupported
  severity: HIGH
  description: P0-design's birth-gate authority (enacted/draft/runtime) cannot be derived from the existing schema. Multiple verification paths in P0-4 depend on the authority being known.
  mitigation: add authority column in v0.2 (this is additive and safe); decide how to populate it (default 'draft', or a backfill rule based on lifecycle_status or on inspection of the identity_profile jsonb column).

R-reuse-3  global_uniqueness_constraint_incompatible_with_collision_policy
  severity: standard
  description: P0-1 §7 allows multiple rows with same canonical_address differing by authority; the existing UNIQUE constraint forbids this. P0-1's collision policy §8 needs revision.
  mitigation: adopt global-uniqueness policy and drop the multi-authority-same-address proposal; or augment the unique constraint with authority (UNIQUE (canonical_address, authority)) — a constraint change.

R-reuse-4  supersession_model_unimplemented
  severity: standard
  description: superseded_by_unit_id / supersedes_unit_id / canonical_address_alias do not exist in production. P0-design uses them as the rename pathway.
  mitigation: defer to a separate v0.2 sub-item; alias storage decision (table vs JSONB) is P0-1 §9 item 4 — still open.

R-reuse-5  sister_table_drift
  severity: HIGH (because of 44,726 event_outbox rows)
  description: tac_logical_unit.canonical_address and event_outbox.canonical_address are coupled; if tac changes format, event_outbox rows referencing prior addresses become stale.
  mitigation: any format change must be planned with a coordinated cross-table migration; reuse-as-is avoids this risk entirely.
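
For R-reuse-2, one possible shape of a backfill rule is sketched below. The mapping is purely illustrative and is not a decision of this discovery: it covers only the single lifecycle_status value observed in production, and `derive_authority` is a hypothetical name. Since §4.3 notes that lifecycle_status serves a different concern, the sketch refuses to guess for any unmapped value.

```python
AUTHORITY_VOCAB = {"enacted", "draft", "runtime"}  # vocabulary cited from P0-1 §7

# Hypothetical mapping: only the one observed lifecycle_status value is covered.
LIFECYCLE_TO_AUTHORITY = {"draft_only": "draft"}


def derive_authority(lifecycle_status: str) -> str:
    """Map lifecycle_status to authority, raising instead of guessing for unmapped values."""
    try:
        authority = LIFECYCLE_TO_AUTHORITY[lifecycle_status]
    except KeyError:
        raise ValueError(
            f"no authority rule for lifecycle_status={lifecycle_status!r}"
        ) from None
    if authority not in AUTHORITY_VOCAB:
        raise ValueError(f"mapped authority {authority!r} is outside the vocabulary")
    return authority


print(derive_authority("draft_only"))  # draft
```

Failing loudly on unknown values keeps a backfill from silently stamping rows with an authority that nobody decided.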

5.2 Risks if ALTERED (e.g., change type/constraint/format on existing column)

R-alter-1  reader_writer_impact
  severity: HIGH
  description: public.fn_event_unread, the existing INSERT trigger that tests `COALESCE(NEW.canonical_address, '') = ''`, and any other code reading/writing this column on tac_logical_unit, information_unit, event_outbox, etc. would all need verification under the new shape.
  mitigation: inventory all callers; v0.2 design must read the trigger and function source; do not alter without that inventory.

R-alter-2  large_table_rewrite
  severity: HIGH
  description: event_outbox has 44,726 rows; rewriting/migrating canonical_address across that table is a potentially long-running operation.
  mitigation: scope alteration to tac_logical_unit only; or stage migration with a backfill window.

R-alter-3  unique_constraint_change
  severity: standard
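  description: changing the unique constraint scope (e.g., to (canonical_address, authority)) requires dropping and rebuilding the unique index; if the old constraint is dropped before the new one is in place, there is a brief window without the unique guarantee, and concurrent inserts during that window could create duplicates.
  mitigation: do the constraint change inside a single transaction with an explicit table lock; or build the replacement index first with CREATE UNIQUE INDEX CONCURRENTLY (which cannot run inside a transaction block), attach it with ALTER TABLE ... ADD CONSTRAINT ... UNIQUE USING INDEX, and only then drop the old constraint.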

R-alter-4  application_code_in_flight
  severity: standard
  description: Nuxt, dot, agent-data application code reads canonical_address; deploying a schema change without a coordinated app-version handshake creates a window of incompatibility.
  mitigation: any altering reconciliation must couple a schema change to an app deploy/handshake — out of scope of v0.2 P0 work; belongs to v0.3+.
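
The caller inventory asked for under R-alter-1 could start as a plain text scan over routine sources. The sketch below is an assumption about how such a scan might look: the routine names and bodies are invented for illustration (only the COALESCE test is quoted from this document), and real sources would have to be read from the database catalog and the application repositories.

```python
import re

# Whole-word match: this does not flag longer names such as canonical_address_alias.
COLUMN_RE = re.compile(r"\bcanonical_address\b")


def references_column(source: str) -> bool:
    """True if the source text mentions the canonical_address identifier."""
    return COLUMN_RE.search(source) is not None


def inventory(sources: dict[str, str]) -> list[str]:
    """Sorted names of routines whose source text mentions canonical_address."""
    return sorted(name for name, body in sources.items() if references_column(body))


# Hypothetical inputs for illustration only.
sources = {
    "insert_trigger_fn": (
        "IF COALESCE(NEW.canonical_address, '') = '' "
        "THEN RAISE EXCEPTION 'missing'; END IF;"
    ),
    "unrelated_fn": "UPDATE tac_logical_unit SET updated_at = now() WHERE id = NEW.id;",
}

print(inventory(sources))  # ['insert_trigger_fn']
```

A text scan can miss dynamically built SQL, so its output is a starting list for manual review rather than a complete inventory.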

5.3 Risks if IGNORED (proceed with v0.2 without reconciling)

R-ignore-1  double_truth_emerges
  severity: HIGH
  description: if v0.2 adds new address-related columns (e.g., canonical_address_v2, address_authority) without addressing existing canonical_address, the system ends up with two address sources of truth — readers must choose, citations diverge, retrieval is non-deterministic.
  mitigation: do NOT add new address-named columns until reconciliation is decided.

R-ignore-2  manifest_envelope_unit_block_design_blocked
  severity: HIGH
  description: P0-2 (manifest) references canonical_address per row. Without a decision on which address shape v0.2 binds to, P0-2 cannot be designed correctly.
  mitigation: reconciliation must complete BEFORE P0-2 dry-run authoring.

R-ignore-3  verify_round_trip_ambiguous
  severity: HIGH
  description: P0-4 axis-1 round-trip ordering uses canonical_address; ambiguous address vocabulary breaks the verify guarantee.
  mitigation: reconciliation must complete BEFORE P0-4 design refresh.

R-ignore-4  governance_signal_misleading
  severity: standard
  description: decision_backlog entries, change-set payloads, and verify_result findings would cite an address shape that is not the canonical one used elsewhere in the system.
  mitigation: address shape must be a single decided form before any CUT/VERIFY work begins.

§6 — Findings Carried Forward

F-DISC-1  production format syntax (D{doc}-DIEU{N}-{S|ROOT}[-P{N}][-{N}]) differs fundamentally from P0-1 design syntax (Đ{N}[ §{path}]); v0.2 must reconcile syntax before P0-2/P0-4 design refresh.
F-DISC-2  canonical_address already enforced text NOT NULL UNIQUE at DB level — stricter than P0-1 design proposed; P0-1 collision policy §8 must be revised.
F-DISC-3  authority column does not exist; lifecycle_status (FK vocab) serves a different concern; v0.2 must decide authority introduction strategy.
F-DISC-4  canonical_address is system-wide vocabulary spanning 7 production tables including event_outbox (44,726 rows); reconciliation scope is far larger than P0-1 imagined.
F-DISC-5  no supersession / no alias / no format_version columns present; P0-1 design treated them as additive — they remain additive, but must be specified in coordination with the existing globally-unique column.
F-DISC-6  identity_profile jsonb column on tac_logical_unit (GIN-indexed) may already store some address-related metadata; UNVERIFIED in this discovery; v0.2 design must inspect.
F-DISC-7  all 86 rows currently lifecycle_status='draft_only'; no published units exist; reconciliation may proceed under low-traffic conditions but app reader/writer audit still required.
F-DISC-8  no FK uses canonical_address as the target; cross-table references use tac_logical_unit.id (uuid). Address-text fields on sister tables are denormalized mirrors.

§7 — What This Discovery Does NOT Do

no_DDL_written: TRUE
no_SQL_mutation: TRUE
no_ALTER_TABLE: TRUE
no_INSERT_UPDATE_DELETE: TRUE
no_migration: TRUE
no_change_to_tac_logical_unit: TRUE
no_change_to_cutter_governance: TRUE
no_deploy: TRUE
no_CUT_VERIFY: TRUE
no_v0_2_design_advanced: TRUE  (this file remains discovery-only; design happens in subsequent docs after option selection + GPT approval)
output_form: read_only_discovery_documentation

§8 — Cross-References

v0_1_P0_1_design_baseline:
  knowledge/dev/laws/dieu44-trien-khai/migration-design/dot-iu-cutter-v0.1-p0-1-canonical-address-migration-design-2026-05-15.md
v0_1_production_handoff:
  knowledge/dev/laws/dieu44-trien-khai/execution/dot-iu-cutter-v0.1-production-handoff-status-2026-05-15.md
v0_1_p0_schema_planning_package:
  knowledge/dev/laws/dieu44-trien-khai/planning/dot-iu-cutter-v0.1-p0-schema-planning-package-2026-05-15.md
v0_2_scope_backlog:
  knowledge/dev/laws/dieu44-trien-khai/planning/dot-iu-cutter-v0.2-scope-backlog-2026-05-15.md
v0_2_canonical_address_options:
  knowledge/dev/laws/dieu44-trien-khai/v0.2-planning/dot-iu-cutter-v0.2-canonical-address-reconciliation-options-2026-05-15.md
v0_2_canonical_address_report:
  knowledge/dev/laws/dieu44-trien-khai/v0.2-planning/dot-iu-cutter-v0.2-canonical-address-reconciliation-report-2026-05-15.md
production_pre_migration_schema_evidence (canonical_address pre-existing):
  /opt/incomex/backups/dieu44_exec_2026-05-15/directus_schema_pre_20260515T141429Z.sql  sha256 638307fd62d4b1aa087ce7f70f42112c4c6185a2e44d8144a1d859029515668a
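
Anyone re-reading the schema evidence may want to confirm that the file still matches the recorded digest. The sketch below streams a file through SHA-256 using the standard library. `sha256_of` and `matches_recorded_digest` are hypothetical helper names; the expected value is the digest recorded above.

```python
import hashlib

# Digest recorded in this document for the pre-migration schema dump.
EXPECTED = "638307fd62d4b1aa087ce7f70f42112c4c6185a2e44d8144a1d859029515668a"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded_digest(path: str, expected: str = EXPECTED) -> bool:
    """Compare a file's digest with the recorded value, ignoring hex letter case."""
    return sha256_of(path) == expected.lower()
```

Usage would be `matches_recorded_digest(path_to_dump)` on a host that holds the backup file. A False result means the file differs from the one inspected for this discovery.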

End of discovery document.
