dot-iu-cutter Methodology — Mark → Review → Cut + Split/Merge Lifecycle — 2026-05-14
Purpose
This document corrects the design direction for dot-iu-cutter.
The core problem is not merely building a cutter that can call fn_iu_create. Phase 5C2 proved execution is feasible. The core problem is the semantic decision process:
Where should the document be cut?
Why exactly there?
Who/what decides?
How is the decision reviewed?
How does the system correct bad cuts later by split/merge?
How does an executor understand the rules without hidden tribal knowledge?
Therefore, dot-iu-cutter must be designed as a closed-loop decision workflow, not just a migration utility.
Constitutional basis
- NT14 executor perspective: the design must be written so the implementing agent can execute it without guessing hidden intent.
- NT13 PG First / PG Native / PG Driven: final state and manifests must be persisted/traceable in PG/IU-compatible structures, not only ephemeral prompts.
- NT15 design before implementation: Mark/Review/Cut and Split/Merge lifecycle must be specified before implementation.
- Zero Trust: if the cutter cannot prove no loss/no overlap/no invalid classification, it must stop or escalate.
Core principle
Cutting is a decision, not a mechanical operation.
Execution must be downstream of an explicit semantic manifest.
The cutter has three conceptual layers:
- MARK — semantic segmentation decision: AI marks intended information units.
- REVIEW — independent AI review/repair of the manifest.
- CUT — deterministic execution from the approved manifest.
The tool must separate these layers. A wrong mark must be fixable without changing the execution engine. A safer engine must be upgradeable without rewriting the marking principles.
Default authority model
Human is not in the normal loop.
AI Marker proposes manifest.
AI Reviewer validates/repairs manifest.
System cuts only after manifest PASS.
Human is notified, not asked, unless escalation criteria trigger.
Human/User escalation only for:
- Constitution/law-level ambiguity that changes meaning.
- New section_type/vocab needed.
- Source appears internally contradictory or corrupted.
- AI reviewer cannot produce a no-loss/no-overlap manifest.
- Split/merge would change canonical meaning, not just structure.
MARK stage — semantic manifest
Input:
- source document or TAC publication;
- source path/doc_code/version;
- source text with stable line numbers or source spans;
- existing vocab and section_type registry;
- prior IU/cut history if any.
Output is a cut manifest, not immediate writes.
Each proposed unit must include:
manifest_id
source_doc_ref
source_version_ref
unit_index
source_start_line/source_end_line OR source_span byte/char offsets
canonical_address_proposal
title
body_span_policy
section_type
unit_kind
parent_manifest_id
hierarchy_depth
body_source_policy
semantic_role
cut_reason
merge_or_split_notes
confidence
review_required_flags
Minimum cut_reason examples:
heading_boundary
semantic_role_change
independent_principle
independent_process_step
checklist_block
technical_spec_block
code_block_as_artifact
table_as_independent_spec
container_heading
too_large_split
cross_reference_boundary
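As an illustration only, one proposed unit could be held in a typed record like the sketch below. The field names come from the list above; the Python types, the default values, the 0..1 confidence range, and the decision to treat an unlisted cut_reason as a problem are assumptions, since the source gives only minimum examples. The sketch uses the line-number form of the source span; byte/char offsets would need an alternative pair of fields. The sample values are made up.

```python
from dataclasses import dataclass, field
from typing import Optional

# Minimum cut_reason values named in this document (the real registry may be larger).
CUT_REASONS = frozenset({
    "heading_boundary", "semantic_role_change", "independent_principle",
    "independent_process_step", "checklist_block", "technical_spec_block",
    "code_block_as_artifact", "table_as_independent_spec",
    "container_heading", "too_large_split", "cross_reference_boundary",
})


@dataclass
class ManifestUnit:
    manifest_id: str
    source_doc_ref: str
    source_version_ref: str
    unit_index: int
    source_start_line: int          # inclusive, assumed 1-based
    source_end_line: int            # inclusive
    canonical_address_proposal: str
    title: str
    body_span_policy: str
    section_type: str
    unit_kind: str
    body_source_policy: str
    semantic_role: str
    cut_reason: str
    confidence: float               # assumed to be in [0, 1]
    parent_manifest_id: Optional[str] = None
    hierarchy_depth: int = 0
    merge_or_split_notes: str = ""
    review_required_flags: list[str] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return mechanical problems a marker can catch before review."""
        found = []
        if self.source_end_line < self.source_start_line:
            found.append("source span is inverted")
        if self.cut_reason not in CUT_REASONS:
            found.append(f"unknown cut_reason: {self.cut_reason}")
        if not 0.0 <= self.confidence <= 1.0:
            found.append("confidence outside [0, 1]")
        return found


# Hypothetical sample values, for illustration only.
example = ManifestUnit(
    manifest_id="m-001", source_doc_ref="doc-example", source_version_ref="v1",
    unit_index=0, source_start_line=1, source_end_line=12,
    canonical_address_proposal="doc-example/1", title="Purpose",
    body_span_policy="lines", section_type="principle", unit_kind="section",
    body_source_policy="PRESERVE_BODY_FROM_SOURCE", semantic_role="principle",
    cut_reason="heading_boundary", confidence=0.9,
)
```

A marker could call problems() on every unit and add any findings to review_required_flags instead of failing outright, which keeps the decision with the REVIEW stage.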
Semantic cutting rules
Rule 1 — Natural structural boundaries
Markdown headings (#, ##, ###, etc.) are natural cut candidates, but not blindly final.
- A heading with substantive body may become one unit.
- A heading with children and no body becomes a container heading unit.
- A heading plus all children must not be flattened into one unit if children carry different semantic roles.
Rule 2 — Semantic role changes are cut boundaries
Even without headings, cut when the role changes, e.g.:
- principle → process;
- process → checklist;
- requirement → technical spec;
- definition → example;
- governance rule → metric;
- narrative → code/config/table.
Rule 3 — One unit should be actionable
A good information unit is the smallest chunk that another workflow can address, edit, cite, validate, or link independently.
Bad cuts:
- Too large: contains multiple independently actionable decisions.
- Too small: fragment cannot be interpreted without adjacent text unless it is a valid heading/container.
- Mixed role: one unit mixes requirement + process + code + checklist in a way that blocks typed edges.
Rule 4 — Tables/checklists/code blocks
- A table is its own unit if it defines schema, mapping, matrix, checklist, or decision evidence.
- A code block is its own unit if it is executable/config/spec and can be linked to code/report.
- A checklist is its own unit when items are evaluated together.
- A small inline example can remain inside the parent if not independently actionable.
Rule 5 — Size thresholds are review triggers, not blind rules
Suggested defaults:
very_small_body_chars < 50 => review unless heading/container
large_body_chars > 5000 => review for split
very_large_body_chars > 9000 => must split or justify
The AI may keep a large unit if splitting would destroy meaning, but it must record a written justification in the manifest.
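The thresholds above can be read as a small classification function. This is a minimal sketch: the three numeric limits are the suggested defaults from this rule, while the function name, parameters, and returned flag names are hypothetical.

```python
VERY_SMALL_BODY_CHARS = 50
LARGE_BODY_CHARS = 5000
VERY_LARGE_BODY_CHARS = 9000


def size_review_flag(body_chars: int,
                     is_heading_or_container: bool = False,
                     has_justification: bool = False) -> str:
    """Map a body length to a review trigger; it never splits anything itself."""
    if body_chars > VERY_LARGE_BODY_CHARS:
        # Must split unless the marker wrote a justification.
        return "LARGE_JUSTIFIED" if has_justification else "MUST_SPLIT_OR_JUSTIFY"
    if body_chars > LARGE_BODY_CHARS:
        return "REVIEW_FOR_SPLIT"
    if body_chars < VERY_SMALL_BODY_CHARS and not is_heading_or_container:
        return "REVIEW_TOO_SMALL"
    return "OK"
```

The function only raises a flag; whether to split remains a semantic decision for the marker and reviewer, consistent with thresholds being review triggers rather than blind rules.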
Rule 6 — Completeness and no-overlap
The manifest must cover the intended source exactly:
no missing source spans
no overlapping source spans
stable ordering
parent-child consistency
source hash recorded
Excluded boilerplate must be explicitly classified, not silently dropped.
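A minimal sketch of the coverage, overlap, and ordering checks for line-based spans follows. It assumes inclusive (start, end) line numbers and that excluded boilerplate appears in the list as its own classified span; the function name and return shape are hypothetical.

```python
def check_spans(spans, first_line, last_line):
    """Check that spans cover first_line..last_line exactly once.

    spans: list of (start, end) inclusive line ranges in manifest order.
    Returns (missing, overlaps, ordered), where missing and overlaps are
    lists of (start, end) ranges and ordered is True if the manifest
    order matches source order.
    """
    missing, overlaps = [], []
    expected = first_line  # next line that still needs to be covered
    for start, end in sorted(spans):
        if start > expected:
            missing.append((expected, start - 1))
        elif start < expected:
            overlaps.append((start, min(end, expected - 1)))
        expected = max(expected, end + 1)
    if expected <= last_line:
        missing.append((expected, last_line))
    ordered = list(spans) == sorted(spans)
    return missing, overlaps, ordered
```

For example, check_spans([(1, 3), (3, 5), (8, 9)], 1, 10) reports lines 6-7 and line 10 as missing and line 3 as overlapping, so the manifest would not pass this rule.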
Rule 7 — Body source policy
Use explicit body policy:
PRESERVE_BODY_FROM_SOURCE
SYNTHESIZE_TITLE_FOR_HEADING_NULL_BODY
CONTAINER_HEADING_NO_BODY_IN_SOURCE
EXCLUDED_BOILERPLATE
DERIVED_SUMMARY_NOT_ALLOWED_BY_DEFAULT
Derived summaries are not allowed as replacement body unless a separate design authorizes it.
Rule 8 — Vocab-first classification
Every unit must use an existing section_type/unit_kind unless the manifest flags NEW_VOCAB_REQUIRED.
AI must not invent new types silently.
Rule 9 — Edge readiness
Every cut should preserve enough provenance for later professional graph links:
law unit → requirement → process → code → report
If a chunk is likely to become an edge source/target, prefer an independently addressable unit.
REVIEW stage — independent manifest review
Input: source + manifest.
The reviewer must produce:
manifest_review_status=PASS|PATCHED_PASS|BLOCKED
coverage_pass=true|false
no_overlap_pass=true|false
semantic_cohesion_pass=true|false
section_type_pass=true|false
hierarchy_pass=true|false
size_policy_pass=true|false
edge_readiness_pass=true|false
split_merge_recommendations=<list>
human_escalation_required=true|false
Review checks:
- Coverage: every intended line/span accounted for.
- No overlap: no source span is cut twice; a span may be deliberately referenced by another unit, but it is never duplicated as body.
- Semantic cohesion: each unit has one dominant role.
- Actionability: unit can be cited/edited/linked independently.
- Hierarchy: parent/child structure matches meaning.
- Vocab: no invented section_type/unit_kind.
- Size: large/small units justified.
- Body policy: null/empty/title/body rules explicit.
- Edge readiness: enough metadata to connect later to requirements/process/code/report.
Reviewer may patch the manifest if the correction is mechanical and confidence is high. If not, it blocks or escalates.
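One way to derive manifest_review_status from the individual check results is sketched below. The mapping is an assumption: PASS when every check passes untouched, PATCHED_PASS when every check passes after a reviewer patch, BLOCKED otherwise or whenever escalation is required. The function name and argument shapes are hypothetical.

```python
REVIEW_CHECKS = (
    "coverage_pass", "no_overlap_pass", "semantic_cohesion_pass",
    "section_type_pass", "hierarchy_pass", "size_policy_pass",
    "edge_readiness_pass",
)


def manifest_review_status(checks: dict, patched: bool,
                           human_escalation_required: bool = False) -> str:
    """Combine per-check booleans into PASS, PATCHED_PASS, or BLOCKED."""
    # A missing check counts as a failure: nothing passes by omission.
    all_pass = all(checks.get(name, False) for name in REVIEW_CHECKS)
    if human_escalation_required or not all_pass:
        return "BLOCKED"
    return "PATCHED_PASS" if patched else "PASS"
```

Treating an absent check as failed follows the Zero Trust basis: the reviewer has to prove each property rather than leave it unstated.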
CUT stage — deterministic execution
The cut stage reads only an approved manifest.
The cut stage must not decide semantic boundaries except for hard stop safety. It executes:
- preflight gates;
- fn_iu_create canonical writer;
- profile/provenance patches;
- birth trigger verification;
- rollback key dual-write;
- V-1..V-10 validation;
- report.
If manifest and live source differ, stop and require re-mark.
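The stop condition above can be sketched as a preflight gate that compares the hash recorded at mark time with the live source. SHA-256 over UTF-8 text is an assumption (the document only requires that a source hash be recorded), and the function names and return values are hypothetical.

```python
import hashlib


def source_hash(text: str) -> str:
    """Hash the source text; SHA-256 is an assumed choice of algorithm."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def preflight_source_gate(manifest_source_hash: str, live_text: str,
                          review_status: str) -> str:
    """Decide whether the cut stage may proceed. No semantic decisions here."""
    if review_status not in ("PASS", "PATCHED_PASS"):
        return "STOP_MANIFEST_NOT_APPROVED"
    if source_hash(live_text) != manifest_source_hash:
        # The source changed after marking; the old spans can no longer be trusted.
        return "STOP_REQUIRE_REMARK"
    return "PROCEED"
```

The gate never tries to repair a drifted manifest; it only stops, which keeps boundary decisions in the MARK and REVIEW stages.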
Split/Merge lifecycle
Cuts are not permanent. The system must support correction.
Split one unit into many
Use when:
- unit contains multiple semantic roles;
- unit too large;
- later edge/linking needs sub-addressable parts;
- user/AI discovers edit workflows are too coarse.
Required metadata:
operation=split
source_unit_id
new_unit_ids
source_version_id
split_reason
span_mapping
semantic_mapping
old_canonical_address
new_canonical_addresses
supersedes_relation
audit_actor
audit_time
rollback_plan
Split must preserve old unit history. Do not silently delete history.
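A split can be checked mechanically before it is applied: the new units' spans should partition the old unit's span with no gap and no overlap. The sketch below assumes span_mapping maps each new unit id to an inclusive (start, end) line range; that shape and the function name are hypothetical.

```python
def validate_split(source_span, span_mapping) -> bool:
    """Return True if span_mapping partitions source_span into 2+ parts.

    source_span: (start, end) inclusive span of the unit being split.
    span_mapping: dict of new_unit_id -> (start, end) inclusive span.
    """
    if len(span_mapping) < 2:
        return False  # a split must produce more than one unit
    start, end = source_span
    expected = start
    for child_start, child_end in sorted(span_mapping.values()):
        if child_start != expected or child_end < child_start:
            return False  # gap, overlap, or inverted child span
        expected = child_end + 1
    return expected == end + 1
```

This covers only the structural side of a split. The semantic_mapping, supersedes_relation, and rollback_plan fields still have to be supplied and reviewed separately.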
Merge many units into one
Use when:
- units are too small/non-actionable;
- meaning depends on adjacency;
- previous split created artificial fragmentation;
- review finds edge/linking noise.
Required metadata:
operation=merge
source_unit_ids
new_unit_id
merge_reason
canonical_address_policy
superseded_by_relation
history_preserved=true
rollback_plan
Canonical address policy for split/merge
- Never silently reuse an address for changed semantics.
- Prefer new child suffixes for split.
- Preserve redirects/aliases in identity_profile or edge registry.
- Mark old units as superseded/deprecated, not erased, unless the operation is rolled back before production commit.
Versioning
Split/merge creates new unit/version records or governed lifecycle changes. It must be auditable as a structural edit, not a content edit.
Human escalation policy
Default: AI decides.
Escalate to user only if:
- semantic ambiguity affects legal/governance meaning;
- new vocab/type is needed;
- competing valid cuts have materially different downstream edges;
- source appears corrupted or shows signs of data loss;
- split/merge changes enacted/canonical law semantics.
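The escalation criteria above can be expressed as a fixed trigger table, so that escalation is a lookup rather than a judgment call at run time. In this sketch only NEW_VOCAB_REQUIRED is a flag named in this document; the other flag names and both function names are hypothetical.

```python
# Flag name -> escalation criterion from the list above.
ESCALATION_TRIGGERS = {
    "LEGAL_OR_GOVERNANCE_AMBIGUITY": "semantic ambiguity affects legal/governance meaning",
    "NEW_VOCAB_REQUIRED": "new vocab/type is needed",
    "COMPETING_CUTS_DIFFERENT_EDGES": "competing valid cuts have materially different downstream edges",
    "SOURCE_CORRUPTED": "source appears corrupted/data-loss",
    "SPLIT_MERGE_CHANGES_LAW_SEMANTICS": "split/merge changes enacted/canonical law semantics",
}


def escalation_reasons(flags) -> list:
    """Return the criteria triggered by the given flags, in stable order."""
    return [ESCALATION_TRIGGERS[f] for f in sorted(set(flags))
            if f in ESCALATION_TRIGGERS]


def human_escalation_required(flags) -> bool:
    """True only when at least one listed criterion is triggered."""
    return bool(escalation_reasons(flags))
```

Flags outside the table, such as a low-confidence marker, do not escalate by themselves, which matches the default that the AI decides and the human is only notified.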
Design implication for current Agent prompt
The existing agent-dot-iu-cutter-v0.1-design-prompt-2026-05-14.md is insufficient if interpreted as only a transaction/tool design. It must be patched so the primary deliverable is:
Cut Decision Workflow Design
with the tool/transaction layer as execution backend.
Recommended next Agent task
Ask Agent to revise the cutter design prompt or directly draft the design with these priorities:
- Mark → Review → Cut workflow.
- Semantic manifest schema.
- AI reviewer and escalation rules.
- Split/Merge lifecycle and metadata.
- Cut principles written from executor perspective.
- Only after all of the above: command/tool implementation surface.
Final flags
methodology_status=APPROVED_FOR_DESIGN_INPUT
cutter_is_decision_workflow_not_only_tool=true
human_default_in_loop=false
ai_mark_review_default=true
split_merge_required_in_design=true
nt14_executor_perspective_required=true