KB-4251
dot-iu-cutter v0.1 — HB-04 X-7 Canonicalization Rule v0.1 Prose Ratification Closure
Revision 1
Tags: dot-iu-cutter, blocker-closure, hb-04, x-7, canonicalization-rule, v0.1, dieu24, dieu44, no-code, no-execution, no-ddl, rev5d
Date: 2026-05-15
Status: HB-04 CLOSURE RECORD, closed_with_notes
Trigger: GPT review of HB-06 returned PASS (2026-05-15). The user has explicitly authorized batch closure of HB-01, HB-02, HB-03, and HB-04.
Scope: CLOSURE RECORD ONLY. No code, no canonicalization library implementation, no DDL, no SQL, no migration, no PG mutation, no Directus mutation, no Qdrant/vector mutation, no execution.
1. Scope
HB-04 ratifies the canonicalization rule v0.1 prose per X-7. The v0.1 placeholder bound at X-A (NFC + LF + trim) is now elevated to ratified prose covering the markdown source_kind v0.1 default, together with the binding that canonicalization_rule_used must be recorded on every verify_result row.
hb_04_scope:
in_scope:
- ratify canonicalization rule v0.1 prose at the level needed to unblock CTE-03 scaffolding
- bind canonicalization_rule_used field requirement on verify_result
- record Đ24 + Đ44 sign-off attribution
- record downstream effect on CTE-03
not_in_scope:
- implement the canonicalization library (CTE-03; separate engineering session)
- issue per-source_kind extensions for code (ast_node) or binary (byte) — FUTURE via D4 capability intake (PEF-05)
- create or alter any table
- write any code
2. Source References
- reviews/dot-iu-cutter-v0.1-hb-06-operational-seats-closure-gpt-review-2026-05-15.md (PASS; authorizes batch closure)
- blocker-closure/dot-iu-cutter-v0.1-hb-06-operational-seats-closure-2026-05-15.md
- ratification/dot-iu-cutter-v0.1-x-a-source-span-drift-unit-ratification-2026-05-15.md §3.3 + §3.4 (v0.1 placeholder accepted)
- implementation-planning/dot-iu-cutter-v0.1-p0-canonicalization-rule-v0.1-planning-note-2026-05-15.md §4 (prose plan)
- migration-design/dot-iu-cutter-v0.1-p0-4-verify-result-migration-design-2026-05-15.md §6 + §14 (mid-cycle change risk + canonicalization_rule_used requirement)
- migration-design/dot-iu-cutter-v0.1-p0-2-manifest-envelope-unit-block-migration-design-2026-05-15.md §9 item 4 (source_span unit alignment)
- risk-review/dot-iu-cutter-v0.1-p0-cross-cutting-decision-register-2026-05-15.md §3.7 (X-7 options + recommendation)
- blocker-closure/dot-iu-cutter-v0.1-p0-workstream-b-vocabulary-schema-canonicalization-2026-05-15.md §5 (HB-04 acceptance criteria)
3. Decision Recorded
decision_id: HB-04
cross_cutting_decision_resolved: X-7
selected_option: dieu24_ratified_prose_for_markdown_v0_1_with_byte_to_token_conversion
canonicalization_rule_v0_1_prose_ratified:
identifier_proposal: canon-md-v0.1.0
identifier_status: working identifier accepted; Đ24 retains authority to set final identifier at execution-phase DDL authoring without re-opening HB-04
scope: markdown source_kind v0.1 default
rule_steps (in order; idempotent):
1: read source bytes as UTF-8
2: strip the UTF-8 BOM (EF BB BF) at file start if present; source_span byte offsets are counted from the first post-BOM byte (offset 0)
3: apply NFC unicode normalization
4: normalize line endings: each CRLF pair → one LF, then each remaining lone CR → LF
5: trim trailing whitespace per line (whitespace = space U+0020 and tab U+0009 only; other Unicode whitespace is not trimmed)
6: enforce exactly one LF at file end (recommendation v0.1; Đ24 ratifies)
7: tokenize into canonical_tokens (token boundary rule below)
canonical_token_boundary_definition_v0_1:
basis: per-line tokenization with intra-line tokens split on whitespace
intra_line_tokens: maximal runs of non-whitespace Unicode code points (UTF-8 encoded)
whitespace: space (U+0020) or tab (U+0009)
line_boundary: LF acts as token separator; not itself a token
consecutive_blank_lines: preserved as-is (markdown semantic content; not collapsed)
canonical_token_identity: the token's UTF-8 byte content after NFC normalization
canonical_token_position_form: (line_index, intra_line_token_index) tuple — bound v0.1; flat-sequence-index alternative is NOT chosen v0.1
byte_offset_to_canonical_token_position_mapping_algorithm:
1: read post-BOM bytes
2: walk bytes from offset 0, applying rule steps 3-7 progressively to maintain byte→codepoint→token correspondence
3: for each byte_span_start: locate the first canonical_token whose codepoints include or follow that byte
4: for each byte_span_end: locate the last canonical_token whose codepoints precede or include that byte
5: emit (start_token_position, end_token_position) per byte_span
determinism: required
performance_class: O(n) over document size acceptable v0.1
axis_1_drift_unit_binding: canonical_token (per X-A; reaffirmed here)
drift_threshold_default: 0 (any drift = FAIL; non-zero allowance requires explicit Đ32 policy)
per_source_kind_extension_policy:
markdown_v0_1: this prose applies
non_markdown_source_kind_v0_1: axis_1_status='not_applicable' (out of scope v0.1)
code_source_kind: FUTURE ast_node rule via D4 capability intake (PEF-05) + Đ24 ratification
binary_source_kind: FUTURE byte rule via D4 capability intake (PEF-05) + Đ24 ratification
mid_cycle_rule_change_handling:
prohibited: no in-place rule change; any change requires D4 capability intake + Đ24 ratification of a NEW rule version (new identifier)
legacy_verify_results: retain their canonicalization_rule_used value; immutable
canonicalization_rule_used_field_binding:
table: verify_result (P0-4)
type_class: text (SemVer-style identifier)
nullability: NOT NULL
immutability: immutable after row insert
default_v0_1: canon-md-v0.1.0 (or final Đ24 identifier)
audit_use:
- allows reproduction of historical drift calculations
- prevents ghost drift on rule version change
- supports rule-version impact analysis
requirement: MUST be populated on every verify_result row
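The rule steps, token boundary, and byte→token mapping above are concrete enough to illustrate. The following is a non-normative Python sketch offered as a reading aid only: it is not the CTE-03 canonicalization library, and the names canonicalize, tokenize, map_byte_span, and Token are hypothetical. It relies on two properties worth checking in review: space, tab, CR, and LF are single ASCII bytes that never occur inside a UTF-8 multi-byte sequence, and NFC neither creates these characters nor composes across them, so token boundaries can be located on the post-BOM bytes while each token's identity is its NFC text. Points the prose leaves open are treated as labelled assumptions: byte_span_end is exclusive, a span that covers no token maps to None, and step 6 drops trailing blank lines at end of file.

```python
"""Non-normative sketch of canon-md-v0.1.0 (markdown source_kind).

Reading aid only; NOT the CTE-03 library. ASSUMPTION marks fill gaps
the prose leaves open.
"""
import re
import unicodedata
from typing import NamedTuple, Optional, Tuple

BOM = b"\xef\xbb\xbf"
WS = " \t"                              # only U+0020 and U+0009 are whitespace
_LINE_END = re.compile(rb"\r\n|\r|\n")  # CRLF is tried before a lone CR
_TOKEN = re.compile(rb"[^ \t]+")        # maximal run of non-space/tab bytes

Position = Tuple[int, int]              # (line_index, intra_line_token_index)


class Token(NamedTuple):
    position: Position
    text: str          # canonical_token_identity: NFC form of the token
    byte_start: int    # post-BOM offset of the token's first byte
    byte_end: int      # post-BOM offset one past the token's last byte


def strip_bom(raw: bytes) -> bytes:
    """Step 2: drop a leading EF BB BF; all offsets are post-BOM."""
    return raw[len(BOM):] if raw.startswith(BOM) else raw


def canonicalize(raw: bytes) -> str:
    """Rule steps 1-6: the canonical text of a markdown source."""
    text = strip_bom(raw).decode("utf-8")                        # steps 1-2
    text = unicodedata.normalize("NFC", text)                    # step 3
    text = text.replace("\r\n", "\n").replace("\r", "\n")        # step 4
    text = "\n".join(ln.rstrip(WS) for ln in text.split("\n"))   # step 5
    # Step 6. ASSUMPTION: trailing blank lines at EOF are dropped, and an
    # empty document becomes a single LF.
    return text.rstrip("\n") + "\n"


def tokenize(raw: bytes) -> list[Token]:
    """Step 7 with byte bookkeeping (mapping steps 1-2).

    Boundaries are found on post-BOM bytes: space, tab, CR and LF never
    occur inside a UTF-8 multi-byte sequence, and NFC does not compose
    across them, so these boundaries match those of the canonical text.
    """
    body = strip_bom(raw)
    body.decode("utf-8")                # step 1: reject invalid UTF-8 up front
    tokens: list[Token] = []
    pos, line_index = 0, 0
    while True:
        eol = _LINE_END.search(body, pos)
        end = eol.start() if eol else len(body)
        for i, m in enumerate(_TOKEN.finditer(body, pos, end)):
            text = unicodedata.normalize("NFC", m.group().decode("utf-8"))
            tokens.append(Token((line_index, i), text, m.start(), m.end()))
        if eol is None:
            return tokens
        pos, line_index = eol.end(), line_index + 1


def map_byte_span(tokens: list[Token], span_start: int,
                  span_end: int) -> Optional[Tuple[Position, Position]]:
    """Mapping steps 3-5 for one post-BOM byte span.

    ASSUMPTION: span_end is exclusive, so the span's last byte is
    span_end - 1; a span that covers no token maps to None.
    """
    # Step 3: first token that includes or follows byte span_start.
    first = next((t for t in tokens if t.byte_end > span_start), None)
    # Step 4: last token that includes or precedes the span's last byte.
    last = next((t for t in reversed(tokens) if t.byte_start < span_end), None)
    if first is None or last is None or first.position > last.position:
        return None
    return first.position, last.position                        # step 5


if __name__ == "__main__":
    raw = "\ufeffCafe\u0301 au  lait \r\n\r\n\tfin\r".encode("utf-8")
    print(repr(canonicalize(raw)))      # 'Café au  lait\n\n\tfin\n'
    toks = tokenize(raw)
    print(map_byte_span(toks, 7, 15))   # ((0, 1), (0, 2))
    print(map_byte_span(toks, 9, 11))   # None: the span is only whitespace
```

In the sample, the decomposed "e" + U+0301 at bytes 0 to 5 becomes the single NFC token "Café" at position (0, 0), and the blank CRLF line still advances line_index, so "fin" lands at (2, 0). The linear scans in map_byte_span stay within the O(n) performance_class; bisecting on byte offsets would be an obvious refinement.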
4. Authority / Sign-Off
authorities_signing:
primary_signers:
- Đ24 (vocabulary owner) — ratifies canonicalization rule v0.1 prose
- Đ44 (family registry custodian) — accepts cross-family alignment: verify_result (verify_family) references the rule identifier; manifest_unit_block (manifest_family) source_span uses byte offsets that the rule maps via §3.3 algorithm
secondary_signers:
- GPT (policy reviewer; PASS upstream on cross-cutting register and X-7 recommendation; PASS on canonicalization planning note)
- User / anh Huyên (sovereign authority)
- Opus / Agent (record-keeping side)
what_each_authority_accepts:
Đ24:
- full prose for canonicalization rule v0.1 (markdown scope, step ordering, BOM/line-ending/trim/exactly-one-LF policies, canonical_token boundary, byte→token mapping algorithm, per-source_kind extension policy, mid-cycle change handling)
- canonicalization_rule_used field binding on verify_result
- working identifier canon-md-v0.1.0 (if renamed, the final identifier is set in the Đ24 ratification artefact)
Đ44:
- cross-family alignment between verify_result and manifest_unit_block via byte-span → canonical-token-position conversion
- reaffirms X-A binding
GPT:
- cross-cutting register §3.7 recommendation matches the closure
User / anh Huyên:
- sovereign acceptance per the explicit prompt
5. Acceptance Criteria
acceptance_criteria_for_hb_04:
v0_1_prose_ratified:
status: RATIFIED (scope, steps, token boundary, mapping algorithm, extension policy, mid-cycle handling)
canonicalization_rule_used_field_binding_recorded:
status: BOUND (NOT NULL + immutable on verify_result)
identifier_recorded:
status: ASSIGNED (working: canon-md-v0.1.0; Đ24 may finalize it in the ratification artefact)
signing_attribution_recorded:
status: ATTRIBUTED (Đ24 + Đ44 primary; GPT + User + Opus/Agent secondary)
no_canonicalization_library_implemented:
status: confirmed (CTE-03 remains OPEN; now ready_to_close)
no_code_written:
status: confirmed
no_DDL:
status: confirmed
hb_04_acceptance_state: ALL SEVEN criteria satisfied; closed_with_notes
6. Downstream Effects
downstream_effects_of_hb_04_closure:
CTE_03_canonicalization_library_scaffolding:
status_before: blocked (waiting on HB-04)
status_after: ready_to_close (HB-04 ratified)
next_action: open engineering session to scaffold the application-layer canonicalization library implementing the v0.1 prose; G-3 oversight
note: CTE-03 is NOT closed by this closure
HB_05_rollback_test_plan_dry_run:
status_before: blocked
status_after: still blocked (terminal node; many upstream blockers remain)
status_change: none; HB-05 cannot close until CTE-03 and its other upstream blockers have all closed
note: HB-04 contributes the parallel chain HB-04 → CTE-03 → HB-05; CTE-03, CTE-02, CTE-04, HB-07, and HB-09 must all still close before HB-05 can close
Step_6_DDL (P0-4 verify_result execution):
note: the first DDL of Step 6 requires the canonicalization_rule_used field to reference a Đ24-ratified identifier; HB-04 satisfies the identifier-existence requirement
status_change: pre-execution gate intact
HB_01_HB_02_HB_03_HB_06_HB_07_HB_08_HB_09_CTE_01_CTE_02_CTE_04:
status_change: none (independent of HB-04)
what_HB_04_does_NOT_do:
- implement the canonicalization library (CTE-03; engineering work; separate session)
- run any conversion
- emit any canonical_token stream
- bind per-source_kind extensions for code or binary (FUTURE; PEF-05)
- alter any table (verify_result.canonicalization_rule_used field DDL is execution-phase task)
- issue the final Đ24 ratification artefact under ratification/ (Đ24 may produce a separate ratification file referencing this closure; per the user prompt, this closure record is sufficient for HB-04 acceptance)
7. Status
HB_04_status: closed_with_notes
HB_04_closure_authority: Đ24 + Đ44 (per cross-cutting register §3.7 + user prompt 2026-05-15)
HB_04_closure_signers:
- Đ24 vocabulary owner (primary)
- Đ44 family registry custodian (primary)
- GPT (policy reviewer)
- User / anh Huyên (sovereign authority)
- Opus / Agent (record-keeping)
execution_authorized: false
implementation_allowed: false
ddl_allowed: false
migration_allowed: false
canonicalization_library_scaffolded: false (CTE-03 remains OPEN; now ready_to_close)
canonicalization_library_executed: false
notes_carried_forward:
- working identifier: canon-md-v0.1.0; if renamed, Đ24 may finalize it in a separate ratification artefact under ratification/
- per-source_kind extensions for code (ast_node) and binary (byte) FUTURE via D4 capability intake (PEF-05)
- canonical_token_position_form chosen v0.1: (line_index, intra_line_token_index) tuple
- mid-cycle rule change requires a new rule version with new identifier; legacy verify_result rows remain immutable
- CTE-03 scaffolding is engineering work; G-3 oversight; HB-04 does not implement
8. Hard Boundaries Confirmation
no_canonicalization_library_implemented: true (CTE-03 remains OPEN)
no_canonical_token_stream_emitted: true
no_conversion_run: true
no_code_written: true
no_ddl_written: true
no_sql_written: true
no_table_created: true
no_table_altered: true (verify_result.canonicalization_rule_used field DDL is execution-phase task)
no_migration_script_written: true
no_migration_executed: true
no_pg_mutation: true
no_qdrant_mutation: true
no_directus_mutation: true
no_data_writes: true
no_per_source_kind_extension_for_code_or_binary_in_this_file: true (FUTURE; PEF-05)
no_execution: true
no_phase_prior_file_modified: true
output_form: hb_04_closure_record_in_markdown_only