KB-3F6F

dot-iu-cutter Methodology — Mark → Review → Cut + Split/Merge Lifecycle — 2026-05-14

Revision 1

Purpose

This document corrects the design direction for dot-iu-cutter.

The core problem is not merely building a cutter that can call fn_iu_create; Phase 5C2 already proved that execution is feasible. The core problem is the semantic decision process:

Where should the document be cut?
Why exactly there?
Who/what decides?
How is the decision reviewed?
How does the system correct bad cuts later by split/merge?
How does an executor understand the rules without hidden tribal knowledge?

Therefore, dot-iu-cutter must be designed as a closed-loop decision workflow, not just a migration utility.

Constitutional basis

  • NT14 executor perspective: the design must be written so the implementing agent can execute it without guessing hidden intent.
  • NT13 PG First / PG Native / PG Driven: final state and manifests must be persisted and traceable in PG/IU-compatible structures, not only in ephemeral prompts.
  • NT15 design before implementation: Mark/Review/Cut and Split/Merge lifecycle must be specified before implementation.
  • Zero Trust: if the cutter cannot prove no loss/no overlap/no invalid classification, it must stop or escalate.

Core principle

Cutting is a decision, not a mechanical operation.
Execution must be downstream of an explicit semantic manifest.

The cutter has three conceptual layers:

  1. MARK — semantic segmentation decision: AI marks intended information units.
  2. REVIEW — independent AI review/repair of the manifest.
  3. CUT — deterministic execution from the approved manifest.

The tool must separate these layers. A wrong mark must be fixable without changing the execution engine. A safer engine must be upgradeable without rewriting the marking principles.

Default authority model

Human is not in the normal loop.

AI Marker proposes manifest.
AI Reviewer validates/repairs manifest.
System cuts only after manifest PASS.
Human is notified, not asked, unless escalation criteria trigger.

Human/User escalation only for:

  • Constitution/law-level ambiguity that changes meaning.
  • New section_type/vocab needed.
  • Source appears internally contradictory or corrupted.
  • AI reviewer cannot produce a no-loss/no-overlap manifest.
  • Split/merge would change canonical meaning, not just structure.

MARK stage — semantic manifest

Input:

  • source document or TAC publication;
  • source path/doc_code/version;
  • source text with stable line numbers or source spans;
  • existing vocab and section_type registry;
  • prior IU/cut history if any.

Output is a cut manifest, not immediate writes.

Each proposed unit must include:

manifest_id
source_doc_ref
source_version_ref
unit_index
source_start_line/source_end_line OR source_span byte/char offsets
canonical_address_proposal
title
body_span_policy
section_type
unit_kind
parent_manifest_id
hierarchy_depth
body_source_policy
semantic_role
cut_reason
merge_or_split_notes
confidence
review_required_flags
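
The field list above maps naturally onto a typed record. A minimal sketch, assuming line-based spans (1-indexed, inclusive) rather than byte/char offsets and a confidence value in [0, 1]; both are assumptions, and `ManifestUnit` is a hypothetical class name. Only structural sanity is checked at construction; semantic checks belong to REVIEW.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ManifestUnit:
    """One proposed unit in a cut manifest; field names follow the list above."""
    manifest_id: str
    source_doc_ref: str
    source_version_ref: str
    unit_index: int
    source_start_line: int           # assumed 1-indexed, inclusive
    source_end_line: int
    canonical_address_proposal: str
    title: str
    body_span_policy: str
    section_type: str
    unit_kind: str
    parent_manifest_id: Optional[str]
    hierarchy_depth: int
    body_source_policy: str
    semantic_role: str
    cut_reason: str
    merge_or_split_notes: str = ""
    confidence: float = 0.0          # assumed range [0, 1]
    review_required_flags: Tuple[str, ...] = ()

    def __post_init__(self) -> None:
        # Structural sanity only; the reviewer judges semantics.
        if self.source_start_line < 1 or self.source_end_line < self.source_start_line:
            raise ValueError(f"invalid span {self.source_start_line}-{self.source_end_line}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")
```

Freezing the record reflects that a manifest is an artifact to be reviewed, not mutated in place; a reviewer patch would produce new records.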

Minimum cut_reason examples:

  • heading_boundary
  • semantic_role_change
  • independent_principle
  • independent_process_step
  • checklist_block
  • technical_spec_block
  • code_block_as_artifact
  • table_as_independent_spec
  • container_heading
  • too_large_split
  • cross_reference_boundary

Semantic cutting rules

Rule 1 — Natural structural boundaries

Markdown headings (#, ##, ###, etc.) are natural cut candidates, but they are not automatically final.

  • A heading with substantive body may become one unit.
  • A heading with children and no body becomes a container heading unit.
  • A heading plus all children must not be flattened into one unit if children carry different semantic roles.
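
The first two bullets can be sketched as a heading classifier. This is an illustration only, assuming ATX-style headings and ignoring fenced code (so a `#` comment inside a fence would be misread); `classify_headings` and the `review` label for an empty leaf heading are hypothetical. The third bullet, whether children carry different roles, remains a MARK judgment and is not modeled.

```python
import re
from typing import List, Tuple

HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def classify_headings(lines: List[str]) -> List[Tuple[str, int, str]]:
    """Return (title, level, kind) per heading.

    kind: 'unit' if the heading has its own body text, 'container_heading'
    if it has no body but has child headings, 'review' otherwise.
    """
    heads = []
    for i, line in enumerate(lines):
        m = HEADING.match(line)
        if m:
            heads.append((i, len(m.group(1)), m.group(2)))
    result = []
    for n, (i, level, title) in enumerate(heads):
        next_i = heads[n + 1][0] if n + 1 < len(heads) else len(lines)
        has_body = any(l.strip() for l in lines[i + 1:next_i])
        # The next heading is a child exactly when it is deeper than this one.
        has_children = n + 1 < len(heads) and heads[n + 1][1] > level
        if has_body:
            kind = "unit"
        elif has_children:
            kind = "container_heading"
        else:
            kind = "review"          # empty leaf heading: flag for review
        result.append((title, level, kind))
    return result
```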

Rule 2 — Semantic role changes are cut boundaries

Even without headings, cut when the role changes, e.g.:

  • principle → process;
  • process → checklist;
  • requirement → technical spec;
  • definition → example;
  • governance rule → metric;
  • narrative → code/config/table.
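
Once blocks carry role labels, the boundary rule itself is mechanical. A minimal sketch, assuming the AI marker has already assigned a semantic_role to each contiguous block (the labeling is the hard, semantic part and is not shown); `cut_on_role_change` is a hypothetical helper. Adjacent blocks with the same role are grouped, and every role change starts a new candidate unit.

```python
from typing import List, Tuple

Block = Tuple[int, int, str]  # (start_line, end_line, semantic_role), in source order


def cut_on_role_change(blocks: List[Block]) -> List[Block]:
    """Group consecutive same-role blocks; a role change starts a new unit
    (cut_reason=semantic_role_change)."""
    units: List[Block] = []
    for start, end, role in blocks:
        if units and units[-1][2] == role:
            prev_start, _, _ = units[-1]
            units[-1] = (prev_start, end, role)   # same role: extend current unit
        else:
            units.append((start, end, role))      # role changed: new boundary
    return units
```

Grouping same-role blocks is only a first pass; Rule 3 may still split a long same-role run into independently actionable units.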

Rule 3 — One unit should be actionable

A good information unit is the smallest chunk that another workflow can address, edit, cite, validate, or link independently.

Bad cuts:

  • Too large: contains multiple independently actionable decisions.
  • Too small: the fragment cannot be interpreted without adjacent text, unless it is a valid heading/container unit.
  • Mixed role: one unit mixes requirement + process + code + checklist in a way that blocks typed edges.

Rule 4 — Tables/checklists/code blocks

  • A table is its own unit if it defines schema, mapping, matrix, checklist, or decision evidence.
  • A code block is its own unit if it is executable/config/spec and can be linked to code/report.
  • A checklist is its own unit when items are evaluated together.
  • A small inline example can remain inside the parent if not independently actionable.
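
Detecting candidate spans for code blocks and tables can be done before any semantic judgment. A sketch, assuming Markdown sources with backtick fences and GitHub-style pipe tables (tilde fences and other table syntaxes are not handled); `artifact_spans` is a hypothetical helper. Whether each span becomes its own unit is still decided by the rules above, since a small inline example may stay in its parent.

```python
from typing import List, Tuple


def artifact_spans(lines: List[str]) -> List[Tuple[int, int, str]]:
    """Return (start, end, kind) spans, 1-indexed and inclusive, for fenced
    code blocks and pipe tables."""
    spans = []
    i, n = 0, len(lines)
    while i < n:
        stripped = lines[i].lstrip()
        if stripped.startswith("```"):
            j = i + 1
            while j < n and not lines[j].lstrip().startswith("```"):
                j += 1
            end = min(j, n - 1)              # unterminated fence runs to end of input
            spans.append((i + 1, end + 1, "code_block"))
            i = end + 1
        elif stripped.startswith("|"):
            j = i
            while j < n and lines[j].lstrip().startswith("|"):
                j += 1
            spans.append((i + 1, j, "table"))  # j is the 1-indexed last table line
            i = j
        else:
            i += 1
    return spans
```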

Rule 5 — Size thresholds are review triggers, not blind rules

Suggested defaults:

very_small_body_chars < 50 => review unless heading/container
large_body_chars > 5000 => review for split
very_large_body_chars > 9000 => must split or justify

The AI may keep a large unit if splitting would destroy meaning, but it must write a justification.
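
The thresholds translate directly into review flags rather than hard rejections. A minimal sketch using the suggested defaults; the flag names (`REVIEW_VERY_SMALL`, `REVIEW_FOR_SPLIT`, `MUST_SPLIT_OR_JUSTIFY`, `VERY_LARGE_JUSTIFIED`) are hypothetical, and treating any non-blank justification string as "justified" is a simplification the reviewer would still check.

```python
from typing import List

# Suggested defaults from the thresholds above.
VERY_SMALL_BODY_CHARS = 50
LARGE_BODY_CHARS = 5000
VERY_LARGE_BODY_CHARS = 9000


def size_review_flags(body_chars: int, is_heading_or_container: bool = False,
                      justification: str = "") -> List[str]:
    """Map body size to review flags; flag names are hypothetical."""
    flags: List[str] = []
    if body_chars < VERY_SMALL_BODY_CHARS and not is_heading_or_container:
        flags.append("REVIEW_VERY_SMALL")
    if body_chars > VERY_LARGE_BODY_CHARS:
        # Must split, or carry a written justification for the reviewer.
        flags.append("VERY_LARGE_JUSTIFIED" if justification.strip()
                     else "MUST_SPLIT_OR_JUSTIFY")
    elif body_chars > LARGE_BODY_CHARS:
        flags.append("REVIEW_FOR_SPLIT")
    return flags
```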

Rule 6 — Completeness and no-overlap

The manifest must cover the intended source exactly:

no missing source spans
no overlapping source spans
stable ordering
parent-child consistency
source hash recorded

Excluded boilerplate must be explicitly classified, not silently dropped.
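
The completeness and no-overlap requirements can be checked by counting how often each source line is claimed. A sketch, assuming line-based spans (1-indexed, inclusive) and SHA-256 as the source hash; the doc requires a hash but does not name an algorithm, and `check_coverage` / `source_hash` are hypothetical helpers. Parent-child consistency is not covered here. Excluded boilerplate spans count as covered because they are explicitly classified.

```python
import hashlib
from typing import Iterable, List, Tuple

Span = Tuple[int, int]  # 1-indexed, inclusive line range


def check_coverage(total_lines: int, spans: List[Span],
                   excluded: Iterable[Span] = ()) -> Tuple[List[int], List[int], bool]:
    """Return (missing_lines, overlapping_lines, ordered)."""
    counts = [0] * (total_lines + 1)          # index 0 unused
    for start, end in list(spans) + list(excluded):
        if not 1 <= start <= end <= total_lines:
            raise ValueError(f"span {start}-{end} outside source 1-{total_lines}")
        for line in range(start, end + 1):
            counts[line] += 1
    missing = [n for n in range(1, total_lines + 1) if counts[n] == 0]
    overlapping = [n for n in range(1, total_lines + 1) if counts[n] > 1]
    starts = [start for start, _ in spans]
    return missing, overlapping, starts == sorted(starts)


def source_hash(text: str) -> str:
    """SHA-256 of the UTF-8 source text (algorithm choice is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

A manifest passes this part of Rule 6 only when both returned lists are empty and `ordered` is true.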

Rule 7 — Body source policy

Use explicit body policy:

PRESERVE_BODY_FROM_SOURCE
SYNTHESIZE_TITLE_FOR_HEADING_NULL_BODY
CONTAINER_HEADING_NO_BODY_IN_SOURCE
EXCLUDED_BOILERPLATE
DERIVED_SUMMARY_NOT_ALLOWED_BY_DEFAULT

Derived summaries are not allowed as replacement body unless a separate design authorizes it.
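
The policy list works best as a closed enumeration, so an unlisted policy cannot appear in a manifest. A sketch under stated assumptions: `BodySourcePolicy` and `body_policy_errors` are hypothetical names; reading SYNTHESIZE_TITLE_FOR_HEADING_NULL_BODY and CONTAINER_HEADING_NO_BODY_IN_SOURCE as "no body may be written" is an interpretation; and EXCLUDED_BOILERPLATE is not checked because it is assumed not to produce a unit body.

```python
from enum import Enum
from typing import List, Optional


class BodySourcePolicy(Enum):
    PRESERVE_BODY_FROM_SOURCE = "PRESERVE_BODY_FROM_SOURCE"
    SYNTHESIZE_TITLE_FOR_HEADING_NULL_BODY = "SYNTHESIZE_TITLE_FOR_HEADING_NULL_BODY"
    CONTAINER_HEADING_NO_BODY_IN_SOURCE = "CONTAINER_HEADING_NO_BODY_IN_SOURCE"
    EXCLUDED_BOILERPLATE = "EXCLUDED_BOILERPLATE"
    DERIVED_SUMMARY_NOT_ALLOWED_BY_DEFAULT = "DERIVED_SUMMARY_NOT_ALLOWED_BY_DEFAULT"


# Interpretation: these policies mean the unit carries no body text.
NO_BODY_POLICIES = {
    BodySourcePolicy.SYNTHESIZE_TITLE_FOR_HEADING_NULL_BODY,
    BodySourcePolicy.CONTAINER_HEADING_NO_BODY_IN_SOURCE,
}


def body_policy_errors(policy: BodySourcePolicy, body: Optional[str],
                       source_body: str,
                       derived_summary_authorized: bool = False) -> List[str]:
    """Check a proposed unit body against its declared policy."""
    errors: List[str] = []
    if policy is BodySourcePolicy.PRESERVE_BODY_FROM_SOURCE and body != source_body:
        errors.append("body differs from source span")
    if policy in NO_BODY_POLICIES and body:
        errors.append("policy expects no body")
    if (policy is BodySourcePolicy.DERIVED_SUMMARY_NOT_ALLOWED_BY_DEFAULT
            and not derived_summary_authorized):
        errors.append("derived summary needs a separate authorizing design")
    return errors
```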

Rule 8 — Vocab-first classification

Every unit must use an existing section_type/unit_kind unless the manifest flags NEW_VOCAB_REQUIRED.

AI must not invent new types silently.
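
The vocab-first rule yields three outcomes: known vocab passes, flagged new vocab escalates, and unflagged new vocab is rejected as silent invention. A minimal sketch, assuming the registry is available as in-memory sets; `check_vocab` and its return strings are hypothetical.

```python
from typing import Iterable, Set


def check_vocab(section_type: str, unit_kind: str,
                known_section_types: Set[str], known_unit_kinds: Set[str],
                flags: Iterable[str] = ()) -> str:
    """Return 'OK', 'ESCALATE_NEW_VOCAB', or 'REJECT_SILENT_INVENTION'."""
    unknown = [value for value, known in ((section_type, known_section_types),
                                          (unit_kind, known_unit_kinds))
               if value not in known]
    if not unknown:
        return "OK"
    if "NEW_VOCAB_REQUIRED" in flags:
        return "ESCALATE_NEW_VOCAB"      # new vocab is a human-escalation trigger
    return "REJECT_SILENT_INVENTION"
```

Routing the flagged case to escalation, not acceptance, follows the escalation list above, where a new section_type/vocab is a human trigger.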

Rule 9 — Edge readiness

Every cut should preserve enough provenance for later professional graph links:

law unit → requirement → process → code → report

If a chunk is likely to become an edge source/target, prefer an independently addressable unit.

REVIEW stage — independent manifest review

Input: source + manifest.

The reviewer must produce:

manifest_review_status=PASS|PATCHED_PASS|BLOCKED
coverage_pass=true|false
no_overlap_pass=true|false
semantic_cohesion_pass=true|false
section_type_pass=true|false
hierarchy_pass=true|false
size_policy_pass=true|false
edge_readiness_pass=true|false
split_merge_recommendations=<list>
human_escalation_required=true|false

Review checks:

  1. Coverage: every intended line/span accounted for.
  2. No overlap: no source span is cut twice; a span may be deliberately referenced from another unit but never duplicated as body.
  3. Semantic cohesion: each unit has one dominant role.
  4. Actionability: unit can be cited/edited/linked independently.
  5. Hierarchy: parent/child structure matches meaning.
  6. Vocab: no invented section_type/unit_kind.
  7. Size: large/small units justified.
  8. Body policy: null/empty/title/body rules explicit.
  9. Edge readiness: enough metadata to connect later to requirements/process/code/report.

The reviewer may patch the manifest only when the correction is mechanical and confidence is high; otherwise it blocks or escalates.
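
The reviewer's output fields can be assembled from the individual check results. A sketch, assuming the check booleans are computed after any patch; `build_review_report` is a hypothetical helper, and mapping `human_escalation_required=true` to BLOCKED (until a human resolves it) is an assumption, since the doc keeps escalation as a separate flag. A missing check raises rather than defaulting, in line with the Zero Trust principle.

```python
from typing import Dict, Iterable

CHECKS = (
    "coverage_pass", "no_overlap_pass", "semantic_cohesion_pass",
    "section_type_pass", "hierarchy_pass", "size_policy_pass",
    "edge_readiness_pass",
)


def build_review_report(checks: Dict[str, bool], patched: bool = False,
                        human_escalation_required: bool = False,
                        split_merge_recommendations: Iterable[str] = ()) -> dict:
    """Assemble the reviewer output; any failed check or escalation blocks."""
    missing = [c for c in CHECKS if c not in checks]
    if missing:
        raise ValueError(f"missing checks: {missing}")   # never assume a pass
    if human_escalation_required or not all(checks[c] for c in CHECKS):
        status = "BLOCKED"
    elif patched:
        status = "PATCHED_PASS"
    else:
        status = "PASS"
    report = {"manifest_review_status": status}
    report.update({c: checks[c] for c in CHECKS})
    report["split_merge_recommendations"] = list(split_merge_recommendations)
    report["human_escalation_required"] = human_escalation_required
    return report
```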

CUT stage — deterministic execution

The cut stage reads only an approved manifest.

The cut stage must not make semantic boundary decisions; its only discretion is a hard safety stop. It executes:

  • preflight gates;
  • fn_iu_create canonical writer;
  • profile/provenance patches;
  • birth trigger verification;
  • rollback key dual-write;
  • V-1..V-10 validation;
  • report.

If manifest and live source differ, stop and require re-mark.
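
The cut stage's two gates, an approved manifest and an unchanged source, come before any execution step. A sketch, assuming the manifest records its review status and a SHA-256 source hash; `execute_cut`, `StopAndRemark`, and the step callables are hypothetical stand-ins for preflight gates, the fn_iu_create writer, provenance patches, and the rest, whose real signatures are not given here. Transactions and rollback are not modeled.

```python
import hashlib
from typing import Callable, List, Tuple


class StopAndRemark(Exception):
    """Raised when the live source no longer matches the approved manifest."""


def execute_cut(manifest: dict, live_source: str,
                steps: List[Tuple[str, Callable[[list], None]]]) -> List[str]:
    """Run ordered execution steps only for an approved, up-to-date manifest.

    manifest keys (assumed): 'review_status', 'source_hash', 'units'.
    """
    if manifest["review_status"] not in ("PASS", "PATCHED_PASS"):
        raise PermissionError("cut stage reads only an approved manifest")
    live_hash = hashlib.sha256(live_source.encode("utf-8")).hexdigest()
    if live_hash != manifest["source_hash"]:
        raise StopAndRemark("manifest and live source differ; re-mark required")
    completed = []
    for name, step in steps:          # e.g. preflight, writer, validation, report
        step(manifest["units"])       # steps execute; they never re-decide boundaries
        completed.append(name)
    return completed
```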

Split/Merge lifecycle

Cuts are not permanent. The system must support correction.

Split one unit into many

Use when:

  • unit contains multiple semantic roles;
  • unit too large;
  • later edge/linking needs sub-addressable parts;
  • a user or AI discovers that edit workflows are too coarse.

Required metadata:

operation=split
source_unit_id
new_unit_ids
source_version_id
split_reason
span_mapping
semantic_mapping
old_canonical_address
new_canonical_addresses
supersedes_relation
audit_actor
audit_time
rollback_plan

Split must preserve old unit history. Do not silently delete history.
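
A split record can be validated mechanically before it is applied. A sketch, assuming the record is a dict with the fields listed above and that span_mapping maps each new unit id to a (start, end) line range inside the parent unit; `validate_split` is a hypothetical helper. It checks that all fields are present, that child spans tile the parent span with no gap or overlap, and that the old canonical address is not reused.

```python
from typing import List, Tuple

REQUIRED_SPLIT_FIELDS = (
    "operation", "source_unit_id", "new_unit_ids", "source_version_id",
    "split_reason", "span_mapping", "semantic_mapping",
    "old_canonical_address", "new_canonical_addresses",
    "supersedes_relation", "audit_actor", "audit_time", "rollback_plan",
)


def validate_split(record: dict, parent_span: Tuple[int, int]) -> List[str]:
    """Return a list of problems; an empty list means the split is well-formed."""
    errors = [f"missing {f}" for f in REQUIRED_SPLIT_FIELDS if f not in record]
    if errors:
        return errors
    if record["operation"] != "split":
        errors.append("operation must be 'split'")
    if set(record["span_mapping"]) != set(record["new_unit_ids"]):
        return errors + ["span_mapping keys must match new_unit_ids"]
    # Child spans must tile the parent span exactly: no gap, no overlap.
    expected = parent_span[0]
    for start, end in sorted(record["span_mapping"].values()):
        if start != expected:
            errors.append(f"gap or overlap at line {start}, expected {expected}")
        expected = end + 1
    if expected != parent_span[1] + 1:
        errors.append("child spans do not end where the parent span ends")
    if record["old_canonical_address"] in record["new_canonical_addresses"]:
        errors.append("old canonical address reused for changed semantics")
    return errors
```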

Merge many units into one

Use when:

  • units are too small/non-actionable;
  • meaning depends on adjacency;
  • previous split created artificial fragmentation;
  • review finds edge/linking noise.

Required metadata:

operation=merge
source_unit_ids
new_unit_id
merge_reason
canonical_address_policy
superseded_by_relation
history_preserved=true
rollback_plan
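
A merge record can be checked the same way. A sketch, assuming the record is a dict with the fields above and that the source units' line spans are known; `validate_merge` is a hypothetical helper, and requiring the merged units to be contiguous is an added assumption (it keeps the merged body a single source span), not a rule stated in this document.

```python
from typing import Dict, List, Tuple

REQUIRED_MERGE_FIELDS = (
    "operation", "source_unit_ids", "new_unit_id", "merge_reason",
    "canonical_address_policy", "superseded_by_relation",
    "history_preserved", "rollback_plan",
)


def validate_merge(record: dict, unit_spans: Dict[str, Tuple[int, int]]) -> List[str]:
    """unit_spans maps source_unit_id -> (start, end) lines in one source document."""
    errors = [f"missing {f}" for f in REQUIRED_MERGE_FIELDS if f not in record]
    if errors:
        return errors
    if record["operation"] != "merge":
        errors.append("operation must be 'merge'")
    if record["history_preserved"] is not True:
        errors.append("history_preserved must be true")
    ids = record["source_unit_ids"]
    if len(ids) < 2:
        errors.append("merge needs at least two source units")
    unknown = [u for u in ids if u not in unit_spans]
    if unknown:
        return errors + [f"unknown source units: {unknown}"]
    # Sketch assumption: merged units must be adjacent in the source.
    spans = sorted(unit_spans[u] for u in ids)
    for (_, prev_end), (start, _) in zip(spans, spans[1:]):
        if start != prev_end + 1:
            errors.append(f"units not contiguous at line {start}")
    return errors
```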

Canonical address policy for split/merge

  • Never silently reuse an address for changed semantics.
  • Prefer new child suffixes for split.
  • Preserve redirects/aliases in identity_profile or edge registry.
  • Mark old units as superseded/deprecated, not erased, unless the operation is rolled back before production commit.
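
The address policy can be illustrated with a child-suffix allocator and a redirect registry. A sketch under stated assumptions: the `<old>.<n>` suffix format is hypothetical, the plain dict `registry` stands in for identity_profile or the edge registry, and `split_addresses` / `supersede` are hypothetical helpers. The old entry is marked superseded and kept, never deleted.

```python
from typing import Dict, List, Set


def split_addresses(old_address: str, count: int, taken: Set[str]) -> List[str]:
    """Allocate child-suffix addresses ('law/4' -> 'law/4.1', 'law/4.2', ...),
    never reusing old_address and skipping suffixes already taken."""
    out: List[str] = []
    n = 1
    while len(out) < count:
        candidate = f"{old_address}.{n}"
        if candidate not in taken:
            out.append(candidate)
        n += 1
    return out


def supersede(registry: Dict[str, dict], old_address: str,
              new_addresses: List[str]) -> None:
    """Mark the old unit superseded (not erased) and keep redirect aliases."""
    entry = registry[old_address]
    entry["status"] = "superseded"
    entry["redirects_to"] = list(new_addresses)
    for addr in new_addresses:
        registry[addr] = {"status": "active", "aliases": [old_address]}
```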

Versioning

Split/merge creates new unit/version records or governed lifecycle changes. It must be auditable as a structural edit, not a content edit.

Human escalation policy

Default: AI decides.

Escalate to user only if:

  • semantic ambiguity affects legal/governance meaning;
  • new vocab/type is needed;
  • competing valid cuts have materially different downstream edges;
  • source appears corrupted or shows data loss;
  • split/merge changes enacted/canonical law semantics.

Design implication for current Agent prompt

The existing agent-dot-iu-cutter-v0.1-design-prompt-2026-05-14.md is insufficient if interpreted as only a transaction/tool design. It must be patched so the primary deliverable is:

Cut Decision Workflow Design

with the tool/transaction layer as execution backend.

Ask Agent to revise the cutter design prompt or directly draft the design with these priorities:

  1. Mark → Review → Cut workflow.
  2. Semantic manifest schema.
  3. AI reviewer and escalation rules.
  4. Split/Merge lifecycle and metadata.
  5. Cut principles written from executor perspective.
  6. Only after that: the command/tool implementation surface.

Final flags

methodology_status=APPROVED_FOR_DESIGN_INPUT
cutter_is_decision_workflow_not_only_tool=true
human_default_in_loop=false
ai_mark_review_default=true
split_merge_required_in_design=true
nt14_executor_perspective_required=true

Source: knowledge/dev/laws/dieu44-trien-khai/design/dot-iu-cutter-methodology-mark-review-cut-and-split-merge-principles-2026-05-14.md