KB-3F6F

dot-iu-cutter Methodology — Mark → Review → Cut + Split/Merge Lifecycle — 2026-05-14

Revision 1

Tags: dot-iu-cutter, methodology, mark-review-cut, split-merge, nt14, semantic-cutting, 2026-05-14

Purpose

This document corrects the design direction for dot-iu-cutter.

The core problem is not merely building a cutter that can call fn_iu_create. Phase 5C2 proved execution is feasible. The core problem is the semantic decision process:

Where should the document be cut?
Why exactly there?
Who/what decides?
How is the decision reviewed?
How does the system correct bad cuts later by split/merge?
How does an executor understand the rules without hidden tribal knowledge?

Therefore, dot-iu-cutter must be designed as a closed-loop decision workflow, not just a migration utility.

Constitutional basis

NT14 executor perspective: the design must be written so the implementing agent can execute it without guessing hidden intent.
NT13 PG First / PG Native / PG Driven: final state and manifests must be persisted/traceable in PG/IU-compatible structures, not only ephemeral prompts.
NT15 design before implementation: Mark/Review/Cut and Split/Merge lifecycle must be specified before implementation.
Zero Trust: if the cutter cannot prove no loss/no overlap/no invalid classification, it must stop or escalate.

Core principle

Cutting is a decision, not a mechanical operation.
Execution must be downstream of an explicit semantic manifest.

The cutter has three conceptual layers:

MARK — semantic segmentation decision: AI marks intended information units.
REVIEW — independent AI review/repair of the manifest.
CUT — deterministic execution from the approved manifest.

The tool must separate these layers. A wrong mark must be fixable without changing the execution engine. A safer engine must be upgradeable without rewriting the marking principles.

Default authority model

Human is not in the normal loop.

AI Marker proposes manifest.
AI Reviewer validates/repairs manifest.
System cuts only after manifest PASS.
Human is notified, not asked, unless escalation criteria trigger.

Human/User escalation only for:

Constitution/law-level ambiguity that changes meaning.
New section_type/vocab needed.
Source appears internally contradictory or corrupted.
AI reviewer cannot produce a no-loss/no-overlap manifest.
Split/merge would change canonical meaning, not just structure.

MARK stage — semantic manifest

Input:

source document or TAC publication;
source path/doc_code/version;
source text with stable line numbers or source spans;
existing vocab and section_type registry;
prior IU/cut history if any.

Output is a cut manifest, not immediate writes.

Each proposed unit must include:

manifest_id
source_doc_ref
source_version_ref
unit_index
source_start_line/source_end_line OR source_span byte/char offsets
canonical_address_proposal
title
body_span_policy
section_type
unit_kind
parent_manifest_id
hierarchy_depth
body_source_policy
semantic_role
cut_reason
merge_or_split_notes
confidence
review_required_flags
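
The field list above can be sketched as a typed record. This is a minimal illustration, not the approved schema: the field names come from the list, but every type, default, and the location_problems helper are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ManifestUnit:
    """One proposed unit in a cut manifest.

    Field names follow the list above. All types and defaults are
    assumptions made for illustration.
    """
    manifest_id: str
    source_doc_ref: str
    source_version_ref: str
    unit_index: int
    canonical_address_proposal: str
    title: str
    section_type: str
    unit_kind: str
    body_source_policy: str
    semantic_role: str
    cut_reason: str
    confidence: float
    # Location: either an inclusive line range or a (start, end) offset pair.
    source_start_line: Optional[int] = None
    source_end_line: Optional[int] = None
    source_span: Optional[tuple[int, int]] = None
    body_span_policy: Optional[str] = None
    parent_manifest_id: Optional[str] = None
    hierarchy_depth: int = 0
    merge_or_split_notes: str = ""
    review_required_flags: list[str] = field(default_factory=list)

    def location_problems(self) -> list[str]:
        """Return problems with the source location (hypothetical helper)."""
        problems = []
        has_lines = (self.source_start_line is not None
                     and self.source_end_line is not None)
        has_span = self.source_span is not None
        if has_lines == has_span:
            problems.append("exactly one of line range or source_span is required")
        if has_lines and self.source_start_line > self.source_end_line:
            problems.append("source_start_line is after source_end_line")
        return problems
```

Keeping the location check on the record lets the Marker reject a malformed unit before the Reviewer ever sees it.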

Minimum cut_reason examples:

heading_boundary
semantic_role_change
independent_principle
independent_process_step
checklist_block
technical_spec_block
code_block_as_artifact
table_as_independent_spec
container_heading
too_large_split
cross_reference_boundary

Semantic cutting rules

Rule 1 — Natural structural boundaries

Markdown headings (#, ##, ###, etc.) are natural cut candidates, but not blindly final.

A heading with substantive body may become one unit.
A heading with children and no body becomes a container heading unit.
A heading plus all children must not be flattened into one unit if children carry different semantic roles.

Rule 2 — Semantic role changes are cut boundaries

Even without headings, cut when the role changes, e.g.:

principle → process;
process → checklist;
requirement → technical spec;
definition → example;
governance rule → metric;
narrative → code/config/table.

Rule 3 — One unit should be actionable

A good information unit is the smallest chunk that another workflow can address, edit, cite, validate, or link independently.

Bad cuts:

Too large: contains multiple independently actionable decisions.
Too small: the fragment cannot be interpreted without adjacent text, unless it is a valid heading/container.
Mixed role: one unit mixes requirement + process + code + checklist in a way that blocks typed edges.

Rule 4 — Tables/checklists/code blocks

A table is its own unit if it defines schema, mapping, matrix, checklist, or decision evidence.
A code block is its own unit if it is executable/config/spec and can be linked to code/report.
A checklist is its own unit when items are evaluated together.
A small inline example can remain inside the parent if not independently actionable.

Rule 5 — Size thresholds are review triggers, not blind rules

Suggested defaults:

very_small_body_chars < 50 => review unless heading/container
large_body_chars > 5000 => review for split
very_large_body_chars > 9000 => must split or justify

The AI may keep a large unit if splitting would destroy meaning, but must write justification.
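
The suggested defaults above can be sketched as a function that returns a review trigger instead of forcing a cut. The threshold values come from the defaults; the flag names and the function itself are hypothetical.

```python
from typing import Optional

# Thresholds taken from the suggested defaults above.
VERY_SMALL_BODY_CHARS = 50
LARGE_BODY_CHARS = 5000
VERY_LARGE_BODY_CHARS = 9000


def size_review_flag(body_chars: int,
                     is_heading_or_container: bool = False) -> Optional[str]:
    """Map a body size to a review trigger, or None when no trigger applies.

    The returned flag names are assumptions for illustration. The flag only
    requests review; the decision to split or to keep the unit with a
    written justification stays with the Marker and Reviewer.
    """
    if body_chars > VERY_LARGE_BODY_CHARS:
        return "MUST_SPLIT_OR_JUSTIFY"
    if body_chars > LARGE_BODY_CHARS:
        return "REVIEW_FOR_SPLIT"
    if body_chars < VERY_SMALL_BODY_CHARS and not is_heading_or_container:
        return "REVIEW_TOO_SMALL"
    return None
```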

Rule 6 — Completeness and no-overlap

The manifest must cover the intended source exactly:

no missing source spans
no overlapping source spans
stable ordering
parent-child consistency
source hash recorded

Excluded boilerplate must be explicitly classified, not silently dropped.
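
A minimal sketch of the coverage and no-overlap check, assuming spans are inclusive line ranges and that excluded boilerplate is listed as a span like any other unit. The function name and return shape are hypothetical.

```python
def check_coverage(spans, first_line, last_line):
    """Find gaps and overlaps in a list of inclusive (start, end) line spans.

    Assumes every span lies inside [first_line, last_line] and has
    start <= end. Returns (gaps, overlaps); both empty means the spans
    cover the intended source exactly once.
    """
    gaps, overlaps = [], []
    expected = first_line  # next line that still needs an owner
    for start, end in sorted(spans):
        if start > expected:
            # Lines between the previous span and this one are unclaimed.
            gaps.append((expected, start - 1))
        elif start < expected:
            # This span starts inside lines that are already claimed.
            overlaps.append((start, min(end, expected - 1)))
        expected = max(expected, end + 1)
    if expected <= last_line:
        gaps.append((expected, last_line))
    return gaps, overlaps
```

A manifest passes only when both lists are empty. Any non-empty result is a stop condition under Zero Trust, not a warning.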

Rule 7 — Body source policy

Use explicit body policy:

PRESERVE_BODY_FROM_SOURCE
SYNTHESIZE_TITLE_FOR_HEADING_NULL_BODY
CONTAINER_HEADING_NO_BODY_IN_SOURCE
EXCLUDED_BOILERPLATE
DERIVED_SUMMARY_NOT_ALLOWED_BY_DEFAULT

Derived summaries are not allowed as replacement body unless a separate design authorizes it.

Rule 8 — Vocab-first classification

Every unit must use an existing section_type/unit_kind unless the manifest flags NEW_VOCAB_REQUIRED.

AI must not invent new types silently.
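
As a sketch of vocab-first classification, the check below flags a unit instead of accepting an unknown type. NEW_VOCAB_REQUIRED is the flag named above; the function name and the registry contents in the usage note are hypothetical.

```python
def vocab_flags(section_type, unit_kind, known_section_types, known_unit_kinds):
    """Return review flags for a unit's classification.

    The registries are passed in as sets. How they are loaded from the
    vocab and section_type registry is outside this sketch.
    """
    flags = []
    if section_type not in known_section_types or unit_kind not in known_unit_kinds:
        flags.append("NEW_VOCAB_REQUIRED")
    return flags
```

For example, with a registry of {"principle", "process"}, a unit marked section_type="metric" is flagged NEW_VOCAB_REQUIRED and escalates instead of being written.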

Rule 9 — Edge readiness

Every cut should preserve enough provenance for later professional graph links:

law unit → requirement → process → code → report

If a chunk is likely to become an edge source/target, prefer an independently addressable unit.

REVIEW stage — independent manifest review

Input: source + manifest.

The reviewer must produce:

manifest_review_status=PASS|PATCHED_PASS|BLOCKED
coverage_pass=true|false
no_overlap_pass=true|false
semantic_cohesion_pass=true|false
section_type_pass=true|false
hierarchy_pass=true|false
size_policy_pass=true|false
edge_readiness_pass=true|false
split_merge_recommendations=<list>
human_escalation_required=true|false

Review checks:

Coverage: every intended line/span accounted for.
No overlap: no source span is cut twice. A span may be deliberately referenced by another unit, but it is never duplicated as body.
Semantic cohesion: each unit has one dominant role.
Actionability: unit can be cited/edited/linked independently.
Hierarchy: parent/child structure matches meaning.
Vocab: no invented section_type/unit_kind.
Size: large/small units justified.
Body policy: null/empty/title/body rules explicit.
Edge readiness: enough metadata to connect later to requirements/process/code/report.

Reviewer may patch the manifest if the correction is mechanical and confidence is high. If not, it blocks or escalates.
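
The reviewer output can be sketched as a record whose status is derived from the individual checks. The field names follow the output list above. The derivation rule (any failed check or escalation blocks, a patched manifest passes as PATCHED_PASS) is an assumption, as is the patched field.

```python
from dataclasses import dataclass, field

# Check fields taken from the reviewer output list.
CHECKS = (
    "coverage_pass",
    "no_overlap_pass",
    "semantic_cohesion_pass",
    "section_type_pass",
    "hierarchy_pass",
    "size_policy_pass",
    "edge_readiness_pass",
)


@dataclass
class ManifestReview:
    coverage_pass: bool
    no_overlap_pass: bool
    semantic_cohesion_pass: bool
    section_type_pass: bool
    hierarchy_pass: bool
    size_policy_pass: bool
    edge_readiness_pass: bool
    patched: bool = False  # hypothetical: reviewer applied a mechanical fix
    human_escalation_required: bool = False
    split_merge_recommendations: list = field(default_factory=list)

    def failed_checks(self):
        """Names of the checks that did not pass."""
        return [name for name in CHECKS if not getattr(self, name)]

    @property
    def manifest_review_status(self):
        # Assumed rule: one failed check or an escalation blocks the cut.
        if self.human_escalation_required or self.failed_checks():
            return "BLOCKED"
        return "PATCHED_PASS" if self.patched else "PASS"
```

Deriving the status from the checks means a reviewer cannot report PASS while one of the individual gates is false.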

CUT stage — deterministic execution

The cut stage reads only an approved manifest.

The cut stage must not decide semantic boundaries except for hard stop safety. It executes:

preflight gates;
fn_iu_create canonical writer;
profile/provenance patches;
birth trigger verification;
rollback key dual-write;
V-1..V-10 validation;
report.

If manifest and live source differ, stop and require re-mark.
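
A minimal sketch of the stop-on-drift gate, assuming the manifest records a SHA-256 hash of the source text at MARK time. The hash algorithm, the function names, and the exception type are assumptions; the methodology only requires that a source hash is recorded.

```python
import hashlib


class SourceDriftError(RuntimeError):
    """Raised when the live source no longer matches the marked source."""


def source_hash(text: str) -> str:
    """Hash of the source text (SHA-256 over UTF-8 bytes, an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def assert_source_unchanged(manifest_source_hash: str, live_text: str) -> None:
    """Stop the cut if the live source differs from the marked source."""
    live_hash = source_hash(live_text)
    if live_hash != manifest_source_hash:
        raise SourceDriftError(
            f"live source hash {live_hash[:12]} does not match manifest hash "
            f"{manifest_source_hash[:12]}; re-mark required"
        )
```

Running this gate in preflight keeps the CUT stage from applying line spans to a source they were not marked against.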

Split/Merge lifecycle

Cuts are not permanent. The system must support correction.

Split one unit into many

Use when:

unit contains multiple semantic roles;
unit too large;
later edge/linking needs sub-addressable parts;
user/AI discovers edit workflows are too coarse.

Required metadata:

operation=split
source_unit_id
new_unit_ids
source_version_id
split_reason
span_mapping
semantic_mapping
old_canonical_address
new_canonical_addresses
supersedes_relation
audit_actor
audit_time
rollback_plan

Split must preserve old unit history. Do not silently delete history.
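
One way to check span_mapping for a split, assuming it is a list of inclusive child line ranges that must partition the source unit's range exactly. That interpretation and the function name are assumptions.

```python
def split_span_mapping_ok(parent_span, child_spans):
    """True if the child spans partition the parent span with no gap or overlap.

    Spans are inclusive (start_line, end_line) pairs.
    """
    parent_start, parent_end = parent_span
    cursor = parent_start  # next parent line that must be claimed
    for start, end in sorted(child_spans):
        if start != cursor or end < start:
            return False  # gap, overlap, or reversed range
        cursor = end + 1
    return cursor == parent_end + 1
```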

Merge many units into one

Use when:

units are too small/non-actionable;
meaning depends on adjacency;
previous split created artificial fragmentation;
review finds edge/linking noise.

Required metadata:

operation=merge
source_unit_ids
new_unit_id
merge_reason
canonical_address_policy
superseded_by_relation
history_preserved=true
rollback_plan

Canonical address policy for split/merge

Never silently reuse an address for changed semantics.
Prefer new child suffixes for split.
Preserve redirects/aliases in identity_profile or edge registry.
Mark old units as superseded/deprecated, not erased, unless the change is rolled back before production commit.
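
The child-suffix preference can be sketched as follows. The address format is not defined in this document, so the dot-plus-number suffix, the example address, and the function name are all hypothetical. The point illustrated is that an address already taken, including one held by a superseded unit, is never handed out again.

```python
def child_addresses(parent_address, count, taken):
    """Propose `count` new child addresses under a parent address.

    `taken` is the set of addresses that already exist, including
    superseded ones. The ".N" suffix scheme is an assumption.
    """
    proposed = []
    n = 1
    while len(proposed) < count:
        candidate = f"{parent_address}.{n}"
        if candidate not in taken:
            proposed.append(candidate)
        n += 1
    return proposed
```

For example, child_addresses("DOC-A/s3", 2, {"DOC-A/s3.1"}) skips the taken suffix and proposes DOC-A/s3.2 and DOC-A/s3.3.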

Versioning

Split/merge creates new unit/version records or governed lifecycle changes. It must be auditable as a structural edit, not a content edit.

Human escalation policy

Default: AI decides.

Escalate to user only if:

semantic ambiguity affects legal/governance meaning;
new vocab/type is needed;
competing valid cuts have materially different downstream edges;
source appears corrupted or shows data loss;
split/merge changes enacted/canonical law semantics.
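
The escalation criteria above can be sketched as a fixed trigger set that the workflow checks before notifying or asking the user. NEW_VOCAB_REQUIRED is the flag named under Rule 8; the other flag names and the function are hypothetical.

```python
# One flag per escalation criterion listed above. Names are assumptions
# except NEW_VOCAB_REQUIRED.
ESCALATION_TRIGGERS = frozenset({
    "GOVERNANCE_MEANING_AMBIGUITY",
    "NEW_VOCAB_REQUIRED",
    "COMPETING_CUTS_DIFFERENT_EDGES",
    "SOURCE_CORRUPTED",
    "CANONICAL_SEMANTICS_CHANGE",
})


def escalation_reasons(flags):
    """Return the sorted flags that require asking the user.

    An empty result means the AI decides and the user is only notified.
    """
    return sorted(set(flags) & ESCALATION_TRIGGERS)
```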

Design implication for current Agent prompt

The existing agent-dot-iu-cutter-v0.1-design-prompt-2026-05-14.md is insufficient if interpreted as only a transaction/tool design. It must be patched so the primary deliverable is:

Cut Decision Workflow Design

with the tool/transaction layer as execution backend.

Recommended next Agent task

Ask Agent to revise the cutter design prompt or directly draft the design with these priorities:

Mark → Review → Cut workflow.
Semantic manifest schema.
AI reviewer and escalation rules.
Split/Merge lifecycle and metadata.
Cut principles written from executor perspective.
Then only after that: command/tool implementation surface.

Final flags

methodology_status=APPROVED_FOR_DESIGN_INPUT
cutter_is_decision_workflow_not_only_tool=true
human_default_in_loop=false
ai_mark_review_default=true
split_merge_required_in_design=true
nt14_executor_perspective_required=true
