KB-1E21

dot-iu-cutter v0.2 — BR-2 identity_profile JSONB Discovery (2026-05-15)


document_path: knowledge/dev/laws/dieu44-trien-khai/v0.2-planning/dot-iu-cutter-v0.2-br-2-identity-profile-jsonb-discovery-2026-05-15.md
revision: r1
date: 2026-05-15
author: Agent (Claude Code CLI, Opus 4.7 1M)
phase: v0.2 planning — BR-2 read-only discovery
mutation_performed: false

§1 — Purpose

Resolve BR-2 (reconciliation report §4): inspect public.tac_logical_unit.identity_profile (jsonb, GIN-indexed) to determine whether the v0.1 P0-1 design's companion vocabulary (authority, canonical_address_format_version, etc.) already lives inside the jsonb. The output is purely a finding: is there a risk of double-storage if Phase α adds explicit columns?


§2 — Read-Only Method

queries_executed (all SELECT-only):
  - DISTINCT jsonb_object_keys
  - frequency per key
  - value_type per key (jsonb_typeof)
  - ILIKE filter for authority-like / format-version-like / address-like keys
  - sample shapes (LIMIT 5 with jsonb_pretty)
no_mutation: TRUE
no_DDL: TRUE

§3 — Distinct Top-Level Keys

total_distinct_top_level_keys: 3
keys:
  body_sha256
  canonical_address
  source_span

3.1 Frequency per key

body_sha256        → 27 rows (out of 86)
canonical_address  → 27 rows
source_span        → 27 rows

Implication: only 27 of 86 rows have a non-trivial identity_profile. The remaining 59 rows hold the empty object '{}': the column is NOT NULL, so the value cannot be NULL, and no top-level keys were found in those rows. The three keys are all-or-nothing: each of the 27 populated rows carries all three, and no row carries only one or two.
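
The all-or-nothing property can be re-checked outside the database once the jsonb values are fetched. The following is a minimal sketch, assuming each identity_profile has been loaded as a Python dict; `classify_profile` and the `profiles` list are hypothetical names used for illustration, and the input is one §5 sample shape plus an empty object, not the production data.

```python
# Hypothetical helper: classify identity_profile objects by key completeness.
EXPECTED_KEYS = {"body_sha256", "canonical_address", "source_span"}


def classify_profile(profile: dict) -> str:
    """Return 'empty', 'complete', or 'partial' for one identity_profile."""
    keys = set(profile)
    if not keys:
        return "empty"
    if keys == EXPECTED_KEYS:
        return "complete"
    # Some but not all expected keys, or a key outside the expected set.
    return "partial"


# Illustrative input only: one sample shape from §5 plus an empty object.
profiles = [
    {
        "body_sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
        "source_span": {"end_line": 1, "start_line": 1},
        "canonical_address": "D38-DIEU28-ROOT",
    },
    {},
]

counts = {}
for profile in profiles:
    label = classify_profile(profile)
    counts[label] = counts.get(label, 0) + 1

print(counts)
```

Run over the full table, the finding above would correspond to 27 'complete', 59 'empty', and no 'partial' entries.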

3.2 Value type per key

body_sha256        → string   (27 rows)
canonical_address  → string   (27 rows)
source_span        → object   (27 rows; shape: {start_line:int, end_line:int})
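
For readers who want to reproduce the per-key type tally away from the database, the sketch below mirrors the type names that jsonb_typeof reports (object, array, string, number, boolean, null). It assumes the jsonb values have been decoded into Python objects; the function names `jsonb_typeof` and `key_type_counts` are illustrative, and the single input row is a §5 sample, not the full result set.

```python
from collections import Counter


def jsonb_typeof(value) -> str:
    """Map a decoded JSON value to the type name jsonb_typeof would report."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # must be tested before int: bool subclasses int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {value!r}")


def key_type_counts(profiles) -> Counter:
    """Count (key, type) pairs across the top level of each profile."""
    counts = Counter()
    for profile in profiles:
        for key, value in profile.items():
            counts[(key, jsonb_typeof(value))] += 1
    return counts


# Illustrative input only: one sample shape from §5.
sample = [
    {
        "body_sha256": "87c2850b9bd87854abd5bb9d57576f7fb01cb123fb9bc94888bfa76945e32704",
        "source_span": {"end_line": 8, "start_line": 2},
        "canonical_address": "D38-DIEU28-S0",
    }
]

for (key, type_name), n in sorted(key_type_counts(sample).items()):
    print(f"{key:18} {type_name:7} {n}")
```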

§4 — Targeted Key Searches

4.1 Authority-like keys

ILIKE filters tested: '%authority%', '%enacted%', '%draft%', '%runtime%', '%birth%', '%gate%'
keys_found: NONE (0 rows)

4.2 Format-version-like keys

ILIKE filters tested: '%version%', '%format%', '%schema_v%'
keys_found: NONE (0 rows)

4.3 Address-like keys

ILIKE filters tested: '%address%', '%canonical%', '%path%', '%citation%', '%dieu%', '%section%'
keys_found:
  canonical_address  → 27 rows

Only canonical_address matches, and in each of the 27 rows where it appears it duplicates the value of the canonical_address column (see §8).
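
The three filter groups can be replayed against the key list from §3 without touching the database. This is a rough sketch, not the queries that were actually executed: `ilike` is a hypothetical helper that translates a LIKE pattern to a regular expression ('%' matches any run of characters, '_' matches exactly one character) and ignores LIKE escape sequences. Note that under LIKE semantics the underscore in '%schema_v%' is itself a single-character wildcard.

```python
import re


def ilike(text: str, pattern: str) -> bool:
    """Approximate SQL ILIKE: case-insensitive, '%' = any run, '_' = one char."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    regex = "".join(parts)
    return re.fullmatch(regex, text, flags=re.IGNORECASE | re.DOTALL) is not None


# Top-level keys reported in §3.
KEYS = ["body_sha256", "canonical_address", "source_span"]

# Filter groups as listed in §4.1 to §4.3.
FILTERS = {
    "authority": ["%authority%", "%enacted%", "%draft%", "%runtime%", "%birth%", "%gate%"],
    "format_version": ["%version%", "%format%", "%schema_v%"],
    "address": ["%address%", "%canonical%", "%path%", "%citation%", "%dieu%", "%section%"],
}

matches = {
    group: sorted({key for key in KEYS for pat in patterns if ilike(key, pat)})
    for group, patterns in FILTERS.items()
}
print(matches)
```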


§5 — Sample Shapes (verbatim)

// row id 09e5a5a5-…  canonical_address column = "D38-DIEU28-ROOT"
{
    "body_sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    "source_span": {
        "end_line": 1,
        "start_line": 1
    },
    "canonical_address": "D38-DIEU28-ROOT"
}

// row id 2bab00a6-…  canonical_address column = "D38-DIEU28-S0"
{
    "body_sha256": "87c2850b9bd87854abd5bb9d57576f7fb01cb123fb9bc94888bfa76945e32704",
    "source_span": {
        "end_line": 8,
        "start_line": 2
    },
    "canonical_address": "D38-DIEU28-S0"
}

Observation: body_sha256 value e3b0c4…b7852b855 is the SHA-256 of the empty string (well-known constant). Several "ROOT" or section-header rows store this sentinel, indicating the row's body content is empty/heading-only.
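
The sentinel is easy to confirm with the standard library. The sketch below computes the SHA-256 of zero bytes and compares it with the value stored in the first sample; `EMPTY_BODY_SHA256` and `is_empty_body` are hypothetical names introduced here for illustration.

```python
import hashlib

# SHA-256 digest of zero bytes, as a lowercase hex string.
EMPTY_BODY_SHA256 = hashlib.sha256(b"").hexdigest()


def is_empty_body(body_sha256: str) -> bool:
    """True when the stored hash is the empty-input SHA-256 sentinel."""
    return body_sha256.lower() == EMPTY_BODY_SHA256


# body_sha256 values copied from the two §5 samples.
root_hash = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
s0_hash = "87c2850b9bd87854abd5bb9d57576f7fb01cb123fb9bc94888bfa76945e32704"

print(is_empty_body(root_hash), is_empty_body(s0_hash))
```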


§6 — Does Authority-Like Data Already Exist?

authority_already_stored_in_identity_profile: NO
authority_already_stored_elsewhere_on_tac_logical_unit: NO
nearest_existing_concept:
  - lifecycle_status (text, FK to tac_lu_lifecycle_vocab)
    - currently all 86 rows show value 'draft_only'
    - SEMANTICALLY DISTINCT from P0-1 authority enum {enacted, draft, runtime}
    - lifecycle_status governs publication lifecycle; authority governs Đ0-G birth-gate distinction
v0_2_phase_alpha_implication:
  adding an explicit `authority` column to tac_logical_unit is SAFE from double-storage risk
  (nothing to reconcile against existing jsonb; the only field to coordinate with is lifecycle_status, which governs a different concern)

§7 — Does Format-Version-Like Data Already Exist?

format_version_already_stored_in_identity_profile: NO
format_version_already_stored_elsewhere_on_tac_logical_unit: NO
canonical_address_format_implied_by_data: D{doc}-DIEU{N}-{S|ROOT}[-P{n}][-{n}]  (consistent across all 86 production rows)
v0_2_phase_alpha_implication:
  adding an explicit `canonical_address_format_version` column to tac_logical_unit is SAFE from double-storage risk
  (no existing field carries this concept)
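
One way to make the implied format checkable is a regular expression. The sketch below is an interpretation, not a ratified grammar: it assumes "{S|ROOT}" means either the literal ROOT or "S" followed by an optional number (so that the §5 sample "D38-DIEU28-S0" matches), and that each "{n}" and "{N}" is a decimal integer. `ADDRESS_RE`, `parse_address`, and the group names are hypothetical.

```python
import re

# ASSUMPTION: "{S|ROOT}" is read as ROOT, or "S" plus an optional number.
# ASSUMPTION: {doc}, {N}, and {n} are decimal integers.
ADDRESS_RE = re.compile(
    r"^D(?P<doc>\d+)"
    r"-DIEU(?P<dieu>\d+)"
    r"-(?P<unit>ROOT|S\d*)"
    r"(?:-P(?P<p>\d+))?"
    r"(?:-(?P<n>\d+))?$"
)


def parse_address(address: str):
    """Return the named parts of a canonical address, or None if it does not match."""
    match = ADDRESS_RE.match(address)
    return match.groupdict() if match else None


# The two canonical_address values shown in §5.
for address in ("D38-DIEU28-ROOT", "D38-DIEU28-S0"):
    print(address, parse_address(address))
```

Whatever syntax is eventually ratified under BR-5 would replace these assumptions; the sketch only shows that the two sample addresses fit one reading of the implied pattern.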

§8 — Does canonical_address-Like Data Already Exist Inside the JSONB?

canonical_address_already_stored_in_identity_profile: YES (duplicating the column value)
extent: 27 of 86 rows (the 27 rows whose identity_profile is non-empty)
match: 100% of those 27 cases — the jsonb value equals the column value verbatim
why_duplicated: unclear from inspection alone; likely an artifact of how those 27 rows were ingested (an importer that wrote both the column AND the jsonb mirror)
risk_of_double_storage: LOW — it is the SAME value, not a divergent value
followup_recommendation: cleanup is not blocking for v0.2; can be normalized in a separate cosmetic pass later (drop the jsonb key once readers are confirmed to use only the column)
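
Before any later cleanup pass, the "same value, not a divergent value" claim can be re-verified with a small comparison. This is a minimal sketch, assuming rows are available as (id, column value, decoded identity_profile) tuples; `mirror_status` is a hypothetical helper, and the row ids are kept truncated exactly as they appear in §5.

```python
def mirror_status(column_value: str, profile: dict) -> str:
    """Compare the canonical_address column with its jsonb mirror, if any."""
    if "canonical_address" not in profile:
        return "absent"
    if profile["canonical_address"] == column_value:
        return "match"
    return "divergent"


# Illustrative rows: the two §5 samples plus one row with an empty profile.
rows = [
    ("09e5a5a5-…", "D38-DIEU28-ROOT", {"canonical_address": "D38-DIEU28-ROOT"}),
    ("2bab00a6-…", "D38-DIEU28-S0", {"canonical_address": "D38-DIEU28-S0"}),
    ("(hypothetical empty-profile row)", "D38-DIEU28-S1", {}),
]

summary = {}
for row_id, column_value, profile in rows:
    status = mirror_status(column_value, profile)
    summary[status] = summary.get(status, 0) + 1

print(summary)
```

Any 'divergent' count above zero would raise the double-storage risk from LOW and should be resolved before the jsonb key is dropped.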

§9 — Other Observations

body_sha256:
  - 27 rows hold a sha256 string of the row's body content
  - several rows hold the well-known empty-string sha256 (e3b0c4…)
  - this is unrelated to canonical_address but is useful birth-gate evidence; the Phase β supersession design may want to leverage body_sha256 for content-change detection

source_span:
  - 27 rows hold {start_line, end_line} int pairs pointing into a source document
  - unrelated to canonical_address itself but useful for P0-2 manifest_unit_block design (source_span_start / source_span_end fields are already part of the v0.1 P0-2 design)
  - v0.2 manifest_unit_block design CAN read from this source_span jsonb sub-object (if a per-row mirror is wanted) OR re-derive at MARK time

§10 — Recommendation for Phase α

recommendation_for_v0_2_phase_alpha:

  add_authority_column:
    safe: YES
    type: text
    nullable: YES initially (backfill required for 86 existing rows)
    constraint: CHECK or FK to a Đ24 vocabulary table (TBD; see BR-4)
    backfill_rule: derive from lifecycle_status or doc_code; design under BR-4

  add_canonical_address_format_version_column:
    safe: YES
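    type: text (version label; the example default 'd38-v0' is not a strict semver string, so the exact format is to be settled under BR-5)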
    nullable: NO with DEFAULT 'd38-v0' (or whatever Đ24 ratifies under BR-5)
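    backfill_for_existing_rows: trivial; in PostgreSQL, adding a NOT NULL column with a constant DEFAULT populates all 86 existing rows with that constant, so no separate UPDATE is needed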

  add_canonical_address_alias_table:
    safe: YES
    placement: tac schema (per P0-1 §3 recommendation) OR cutter_governance (per X-1 placement decision; revisit)
    no_conflict_with_jsonb: confirmed — no alias-related data lives in identity_profile

  do_NOT_add_columns_that_would_duplicate_jsonb_data:
    body_sha256_as_column: NO — already covered by jsonb; if Phase β needs a column, defer that decision
    source_span_as_columns: NO at the tac_logical_unit level (already in jsonb); P0-2 manifest_unit_block has its own source_span_start/source_span_end fields — those are on manifest_unit_block, not tac_logical_unit

  cosmetic_cleanup_NOT_blocking:
    consider dropping `canonical_address` key from identity_profile in a separate cosmetic pass after confirming no reader depends on it; NOT part of Phase α

br_2_blocker_status: RESOLVED
followup_for_br_4: authority column type/vocabulary is TBD — depends on BR-4 (Đ0-G authority backfill rule design)
followup_for_br_5: canonical_address_format_version DEFAULT value depends on BR-5 (Đ24 ratification of production syntax as v1)

§11 — Hard Boundaries

no_DDL_written: TRUE
no_SQL_mutation: TRUE
no_ALTER_TABLE: TRUE
no_INSERT_UPDATE_DELETE: TRUE
no_migration: TRUE
no_design_authored: TRUE (only findings + recommendation; Phase α DDL design happens later under explicit prompt)
output_form: br_2_read_only_discovery

End of BR-2 discovery.
