dot-iu-cutter v0.1 — Operational Problem Statement Rev2 — C1A Integrated — 2026-05-14
Status
status=DRAFT_FOR_USER_APPROVAL
agent_dispatch_allowed=false
implementation_allowed=false
purpose=approve_problem_statement_before_design
This is not an Agent task. This document is the proposed operational problem statement for User approval before any Agent is asked to design.
0. Change from Rev1
Rev1 correctly identified Mark → Review → Cut, but did not fully integrate the existing canonical segmentation law.
Rev2 fixes that.
Controlling segmentation foundation:
knowledge/dev/laws/dieu38-trien-khai/C1A-segmentation-operating-model.md
C1A is OFFICIAL, User PASS / GPT PASS. It already answers the foundational question:
By what rules does the Agent cut a document into information pieces?
Therefore, dot-iu-cutter must not invent a new segmentation law. It must operationalize C1A into an automated closed-loop process.
1. Concise process problem statement
Design an operational process so that when the user says:
Cut law A
or, later:
Cut document X
the system can automatically execute a closed-loop workflow:
Resolve source
→ check existing cut/history/collisions
→ MARK semantic cut manifest under C1A rules
→ REVIEW manifest under independent AI review
→ CUT deterministically from approved manifest
→ VERIFY by round-trip/no-loss/no-overlap/invariants
→ REPORT result and rollback keys
→ if later wrong granularity is found, correct by governed SPLIT/MERGE lifecycle
The core design problem is not “how to call fn_iu_create.” Phase 5C2 proved that execution is feasible. The core design problem is:
How does the system decide where to cut, prove the decision is safe, detect errors, and repair structure later without relying on hidden human judgment?
Default authority:
AI decides and reviews by default.
Human/User is final approver of this problem statement and later high-risk policies, but is not in the normal per-document cutting loop.
Human escalation is exceptional, not normal.
2. Canonical rules that must be inherited from C1A
The future design must explicitly inherit these C1A elements.
2.1 Three-question test — primary cut test
Every candidate unit must pass C1A §3.2:
- Clear title? Can the unit be named so another agent understands the main idea without opening it?
- Independently editable? Can this unit be edited without necessarily editing another unit?
- Not too hard to edit? Is it short and simple enough that review stays practical?
These three questions are the default semantic unit test. They must appear in the Mark and Review stages.
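As an illustration of how Mark and Review could both carry the test result, here is a minimal sketch. The class and field names are hypothetical (not taken from C1A); only the three questions themselves come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreeQuestionResult:
    """Answers to the three C1A §3.2 questions for one candidate unit.

    Field names are illustrative assumptions, not C1A identifiers.
    """
    clear_title: bool             # can the unit be named so another agent gets the main idea?
    independently_editable: bool  # can it be edited without necessarily editing another unit?
    not_too_hard_to_edit: bool    # is it short/simple enough that review stays practical?

    def passed(self) -> bool:
        # A candidate unit passes only if all three answers are yes.
        return (
            self.clear_title
            and self.independently_editable
            and self.not_too_hard_to_edit
        )

    def failed_questions(self) -> list[str]:
        # Names of the questions answered "no", for the review record.
        answers = {
            "clear_title": self.clear_title,
            "independently_editable": self.independently_editable,
            "not_too_hard_to_edit": self.not_too_hard_to_edit,
        }
        return [name for name, ok in answers.items() if not ok]
```

Recording which question failed, rather than a single boolean, would let the Review stage explain why a candidate was merged or re-cut.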
2.2 SR-1 → SR-7 — official segmentation rules
Design must carry forward:
- SR-1: section with clear title + independently editable → one logical unit.
- SR-2: no clear title → body of parent, not its own unit.
- SR-3: if editing A necessarily pulls B → merge/group as one unit.
- SR-4: too short + no authority → merge with parent/sibling.
- SR-5: cut by meaning, not mechanically.
- SR-6: title must describe meaning; no mechanical A/B/C names.
- SR-7: each unit has exactly one canonical parent in the structural tree.
2.3 OD-PILOT edge cases
Design must carry forward:
- OD-01 Code/config block: default body of parent; separate only if independently referenced, versioned, testable, or reusable.
- OD-02 Heading-only: valid unit if it has authority/governance role; otherwise structural/navigation node.
- OD-03 Mission/instruction block: keep atomic even if long unless it can be split into independent missions.
- OD-04 section_type: controlled vocabulary; no silent invention.
- OD-05 Hard-limit matrix/table: split by semantic dimension unless matrix must be read as a whole and has approved exception.
- OD-06 Field responsibility matrix: split by object family when independently editable.
2.4 NL1–NL4 and length management
Design must apply:
- NL1 Unit-Centric — unit is center.
- NL2 Semantic Unit Rule — cut when title clear + separately editable + not too hard to edit.
- NL3 Risk-tiered Authority — agent authority depends on risk tier and lifecycle.
- NL4 Length as Trigger — length warns/reviews; it does not mechanically cut.
Length rules:
normal <= 500 words default
soft-limit 500–1500 words default
hard-limit >1500 words default
These thresholds are defaults/review triggers, not blind rules. Publishing or enacting a unit that is at hard-limit requires one of:
split
re-segment
length exception approved
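The length rules can be sketched as a flag plus a publish gate. This is a minimal illustration, assuming that a unit of exactly 500 words counts as normal (the two bands above both list 500) and that the three resolutions are recorded as plain strings; the function names are hypothetical.

```python
from typing import Optional

# Resolutions that allow a hard-limit unit to be published/enacted (from the list above).
HARD_LIMIT_RESOLUTIONS = {"split", "re-segment", "length exception approved"}


def length_flag(word_count: int, normal_max: int = 500, soft_max: int = 1500) -> str:
    """Classify a unit by word count using the default thresholds.

    Assumption: exactly `normal_max` words is still "normal".
    The flag only triggers review; it never cuts anything by itself (NL4).
    """
    if word_count < 0:
        raise ValueError("word_count must be non-negative")
    if word_count <= normal_max:
        return "normal"
    if word_count <= soft_max:
        return "soft-limit"
    return "hard-limit"


def may_publish(flag: str, resolution: Optional[str] = None) -> bool:
    """A hard-limit unit needs one of the listed resolutions before publish/enact."""
    if flag != "hard-limit":
        return True
    return resolution in HARD_LIMIT_RESOLUTIONS
```

Keeping the thresholds as parameters reflects that they are defaults rather than fixed rules.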
2.5 C1A invariants CI-1 → CI-12
Design must not violate, especially:
- CI-3: publication does not contain inline content, only references unit_versions.
- CI-4: label doc=X is not publication membership.
- CI-6: canonical address stable for same logical unit.
- CI-8: no mechanical A/B/C split; title must describe meaning.
- CI-9: no parallel label registry.
- CI-11: each unit has one canonical parent.
- CI-12: every new unit goes through birth gate.
2.6 Lessons from P10A/P10B/Phase5C2
The design must integrate these empirical lessons:
- P10A D35 v1 segmentation showed why a root must not duplicate full document body.
- P10A D35 v2 showed section_type diversity and semantic split of §4/§6 are required.
- P10B D32 proved round-trip 0 drift is a reliable validation gate.
- Phase5C2 proved bounded transaction + rollback keys + fn_iu_create writer path works.
- DIEU-32 proved heading/container body policy must distinguish TAC representation from IU body requirements.
3. Questions the design must answer
Group A — Source and command
Q1. How does the user command work?
- Input: “Cut law A”.
- System resolves source by doc_code/name/path.
- If exactly one match: continue.
- If none/multiple: ask one clarification question.
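The resolution rule for Q1 could look like the sketch below. The catalog shape and the case-insensitive substring match are assumptions made for illustration; the source only states that resolution is by doc_code/name/path and that zero or multiple matches lead to one clarification question.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Resolution:
    status: str                       # "RESOLVED" or "CLARIFY" (hypothetical labels)
    source_ref: Optional[str] = None  # set only when exactly one source matched
    question: Optional[str] = None    # the single clarification question otherwise


def resolve_source(query: str, catalog: dict[str, str]) -> Resolution:
    """Resolve a user query against a catalog of doc_code/name/path -> source_ref.

    Matching rule (assumption): case-insensitive substring match on the key.
    """
    needle = query.strip().lower()
    matches = sorted({ref for key, ref in catalog.items() if needle in key.lower()})
    if len(matches) == 1:
        return Resolution("RESOLVED", source_ref=matches[0])
    if not matches:
        return Resolution(
            "CLARIFY",
            question=f"No source matches '{query}'. Which document do you mean?",
        )
    return Resolution(
        "CLARIFY",
        question=f"'{query}' matches several sources: {', '.join(matches)}. Which one?",
    )
```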
Q2. What is the source of truth?
- KB markdown for new documents.
- TAC publication for existing TAC sources.
- Future source modes allowed only if they declare canonical source and source hash.
Q3. What if the source was already cut?
- Collision/history check is mandatory.
- If already cut, system must not blindly duplicate.
- It must classify: status-only, split/merge, supersede/re-cut, or block for review.
Group B — Marking / segmentation decision
Q4. What does MARK produce?
A manifest, not writes.
Minimum fields:
manifest_id
source_doc_ref
source_version_ref
source_hash
source_mode
unit_index
source_start_line/source_end_line or byte/char span
canonical_address_proposal
title
section_type
unit_kind
parent_manifest_id
hierarchy_depth
body_source_policy
semantic_role
cut_reason
C1A_rule_refs
three_question_test_result
confidence
review_required_flags
length_flag
edge_readiness_notes
split_merge_notes
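One possible shape for a manifest entry is sketched below. The field names follow the list above; the Python types, the defaults, and the `problems` checks are assumptions, and the line-span variant is chosen over byte/char spans purely for brevity.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ManifestUnit:
    """One proposed unit in a MARK manifest. Types are illustrative assumptions."""
    manifest_id: str
    source_doc_ref: str
    source_version_ref: str
    source_hash: str
    source_mode: str
    unit_index: int
    source_start_line: int
    source_end_line: int
    canonical_address_proposal: str
    title: str
    section_type: str
    unit_kind: str
    parent_manifest_id: Optional[str]   # assumed None for the root unit
    hierarchy_depth: int
    body_source_policy: str
    semantic_role: str
    cut_reason: str
    C1A_rule_refs: list[str]
    three_question_test_result: dict[str, bool]
    confidence: float
    review_required_flags: list[str] = field(default_factory=list)
    length_flag: str = "normal"
    edge_readiness_notes: str = ""
    split_merge_notes: str = ""

    def problems(self) -> list[str]:
        """Cheap structural checks; semantic review happens in REVIEW."""
        issues = []
        if self.source_start_line > self.source_end_line:
            issues.append("source span is inverted")
        if not self.title.strip():
            issues.append("title is empty")
        if not self.cut_reason.strip():
            issues.append("cut_reason is empty")
        if not self.C1A_rule_refs:
            issues.append("C1A_rule_refs is empty")
        if not 0.0 <= self.confidence <= 1.0:
            issues.append("confidence is outside [0, 1]")
        return issues
```

Because MARK produces a manifest and no writes, an object like this can be checked, patched, and rejected without touching any stored unit.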
Q5. How does AI decide a cut boundary?
Expected answer:
- First apply C1A three-question test.
- Then apply SR-1→SR-7.
- Use structure as evidence, not as blind rule.
- Semantic role changes can create cuts even without headings.
- Length triggers review, not mechanical split.
- Every decision must carry cut_reason and C1A_rule_refs.
Q6. How are section_type and unit_kind chosen?
Expected answer:
- Use controlled vocabulary.
- Do not invent silently.
- If no type fits, set NEW_VOCAB_REQUIRED and escalate.
- section_type must support later edge/professional linking.
Q7. How are heading/container and body policies handled?
Expected answer:
- Inherit C1A OD-02 and Phase5C2 policy.
- Heading with authority/governance role can be unit.
- Approved body policy:
SYNTHESIZE_TITLE iff section_type='heading' AND body IS NULL AND children>0
PRESERVE iff body IS NOT NULL
BLOCK iff body IS NULL AND not heading-container
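The three body-policy rules translate directly into a small decision function. This sketch assumes that "heading-container" means a unit with section_type 'heading' and at least one child, and that an empty string counts as a non-NULL body (mirroring SQL NULL semantics); the function name is hypothetical.

```python
from typing import Optional


def body_policy(section_type: str, body: Optional[str], children: int) -> str:
    """Return the approved body policy for one unit.

    PRESERVE          iff body IS NOT NULL
    SYNTHESIZE_TITLE  iff section_type='heading' AND body IS NULL AND children>0
    BLOCK             iff body IS NULL AND not heading-container
    """
    if body is not None:
        return "PRESERVE"
    if section_type == "heading" and children > 0:
        return "SYNTHESIZE_TITLE"
    return "BLOCK"
```

The three outcomes are mutually exclusive and cover every input, so no unit can fall through without a policy.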
Group C — Review / decision authority
Q8. Who reviews the manifest?
Expected answer:
- AI review by default.
- Reviewer must be role-separated from Marker, even if same agent performs second pass.
- Human not in normal loop.
Q9. What does REVIEW check?
Expected answer:
coverage/no-loss
no-overlap
C1A three-question test per unit
SR/OD rule compliance
semantic cohesion
actionability
section_type/vocab correctness
hierarchy and one-parent rule
length flags and exceptions
body policy
edge readiness
round-trip feasibility
Review output:
manifest_review_status=PASS|PATCHED_PASS|BLOCKED
human_escalation_required=true|false
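Two of the REVIEW checks, coverage/no-loss and no-overlap, are mechanical and can be sketched as below. The sketch assumes flat, inclusive, 1-based line spans for body-carrying units; the remaining checks (semantic cohesion, vocabulary, and so on) need semantic review and are not modeled, and PATCHED_PASS is left out for brevity.

```python
def check_spans(
    spans: list[tuple[int, int]], total_lines: int
) -> dict[str, list[tuple[int, int]]]:
    """Find uncovered line ranges (gaps) and doubly covered ranges (overlaps).

    Each span is (start_line, end_line), inclusive and 1-based (assumption).
    """
    gaps: list[tuple[int, int]] = []
    overlaps: list[tuple[int, int]] = []
    expected = 1  # next line that still needs coverage
    for start, end in sorted(spans):
        if start > expected:
            gaps.append((expected, start - 1))
        elif start < expected:
            overlaps.append((start, min(end, expected - 1)))
        expected = max(expected, end + 1)
    if expected <= total_lines:
        gaps.append((expected, total_lines))
    return {"gaps": gaps, "overlaps": overlaps}


def span_review_status(spans: list[tuple[int, int]], total_lines: int) -> str:
    """PASS only if the spans tile the source exactly; otherwise BLOCKED."""
    result = check_spans(spans, total_lines)
    return "BLOCKED" if result["gaps"] or result["overlaps"] else "PASS"
```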
Q10. When is human escalation required?
Expected answer:
Human escalation only for:
- source ambiguity that AI cannot resolve;
- new vocab/type required;
- suspected data loss/corruption;
- competing valid cuts with different legal/governance meaning;
- high/highest-risk finalization if law requires it;
- a split/merge that changes enacted canonical meaning.
Group D — Cut / verify / rollback
Q11. How does CUT execute?
Expected answer:
- CUT executes only from approved manifest.
- Use canonical writer only (fn_iu_create for IU path).
- No direct IU/UV insert.
- Per-document/publication bounded transaction.
- Birth trigger must fire.
- Profile/provenance must include manifest and C1A decision evidence.
Q12. How does the system prove the cut is correct?
Expected answer:
- Mandatory round-trip verification.
- Reconstruct from created pieces and compare to canonical source or declared normalizer.
- No-loss/no-overlap must hold.
- For representation conversions, e.g. heading-title synthesis, provenance and V-3b' policy must prove equivalence.
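A round-trip check can be sketched as "reassemble, normalize, compare hashes". The normalizer here (newline normalization plus trailing-whitespace stripping) and the newline join between pieces are stand-in assumptions for whatever declared normalizer the design adopts; heading-title synthesis is not modeled.

```python
import hashlib


def normalize(text: str) -> str:
    """Hypothetical declared normalizer: unify newlines, drop trailing whitespace."""
    lines = text.replace("\r\n", "\n").split("\n")
    return "\n".join(line.rstrip() for line in lines).strip("\n")


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def round_trip_ok(source: str, pieces: list[str]) -> bool:
    """Rebuild the document from the created pieces, in order, and compare.

    Assumption: pieces are joined with a single newline.
    Any lost, duplicated, or reordered content changes the hash.
    """
    rebuilt = "\n".join(pieces)
    return sha256_hex(normalize(rebuilt)) == sha256_hex(normalize(source))
```

Comparing hashes of normalized text keeps the verdict binary, which suits an automatic gate.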
Q13. What happens if verification fails?
Expected answer:
- Content/integrity error → rollback automatically using exact keys.
- Semantic granularity issue after successful round-trip → Split/Merge lifecycle, not rollback.
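The routing rule for Q13 is small enough to state as a function. The return labels are hypothetical; the rule itself (integrity failure rolls back, a granularity problem found after a clean round-trip goes to Split/Merge) comes from the two bullets above.

```python
def corrective_action(round_trip_passed: bool, granularity_issue: bool) -> str:
    """Choose the corrective path after verification.

    A content/integrity failure always wins: it is rolled back, never patched
    through Split/Merge. Granularity is only considered once content is intact.
    """
    if not round_trip_passed:
        return "ROLLBACK"
    if granularity_issue:
        return "SPLIT_MERGE"
    return "NONE"
```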
Q14. How does rollback work?
Expected answer:
- Exact-key rollback only.
- Rollback keys dual-written to KB + VPS log before COMMIT.
- Pattern deletion prohibited.
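The rollback rules can be illustrated with an in-memory sketch. The two Python lists stand in for the KB artifact and the VPS log, and the dict stands in for the unit store; none of these reflect the real storage, and all names are hypothetical.

```python
class CutTransaction:
    """Collects the exact keys of everything created in one bounded cut."""

    def __init__(self, kb_log: list, vps_log: list) -> None:
        self.kb_log = kb_log      # stand-in for the KB artifact
        self.vps_log = vps_log    # stand-in for the VPS log
        self.created_keys: list[str] = []
        self.committed = False

    def record_created(self, key: str) -> None:
        self.created_keys.append(key)

    def commit(self) -> list[str]:
        keys = list(self.created_keys)
        # Dual-write the rollback keys first; only then mark the commit.
        self.kb_log.append(keys)
        self.vps_log.append(list(keys))
        self.committed = True
        return keys


def rollback_exact(store: dict[str, str], keys: list[str]) -> None:
    """Delete exactly the recorded keys. No prefix or pattern matching."""
    missing = [k for k in keys if k not in store]
    if missing:
        raise KeyError(f"rollback keys not found: {missing}")
    for k in keys:
        del store[k]
```

Refusing to proceed when a key is missing keeps the rollback from silently doing less than the log claims.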
Group E — Split/Merge correction lifecycle
Q15. How does Split work?
Expected answer:
Mark split points
Review split manifest
Create new units/versions through canonical writer
Mark old unit superseded, not deleted
Create split_from/supersedes/superseded_by relations
Reassign/propose edge reassignment
Round-trip verify new units equal old unit content or accepted normalized representation
Report
Required metadata:
operation=split
source_unit_id
source_unit_version_id
new_unit_ids
split_reason
span_mapping
semantic_mapping
old_canonical_address
new_canonical_addresses
edge_reassignment_plan
rollback_plan
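The span_mapping field makes the split round-trip check mechanical. The sketch below assumes span_mapping maps each new unit id to a (start, end) character range in the old unit body, with end exclusive; that encoding is an assumption, not something the source fixes.

```python
def verify_split(old_body: str, span_mapping: dict[str, tuple[int, int]]) -> dict[str, str]:
    """Check that the new units tile the old body exactly and return their bodies.

    span_mapping: new_unit_id -> (start, end) character offsets, end exclusive.
    Raises ValueError on any gap, overlap, empty span, or uncovered tail.
    """
    cursor = 0
    bodies: dict[str, str] = {}
    for unit_id, (start, end) in sorted(span_mapping.items(), key=lambda kv: kv[1]):
        if start != cursor:
            raise ValueError(f"{unit_id}: expected span to start at {cursor}, got {start}")
        if end <= start:
            raise ValueError(f"{unit_id}: empty or inverted span")
        bodies[unit_id] = old_body[start:end]
        cursor = end
    if cursor != len(old_body):
        raise ValueError(f"old body not fully covered: stopped at {cursor}")
    # Round-trip: the new unit bodies, in span order, must equal the old body.
    if "".join(bodies.values()) != old_body:
        raise ValueError("reassembled content differs from old unit")
    return bodies
```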
Q16. How does Merge work?
Expected answer:
Mark merge candidate units
Review merge decision
Create new merged unit
Mark old units superseded
Preserve aliases/redirects
Reassign/propose edge reassignment
Round-trip verify merged content equals ordered old content
Report
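For merge, the round-trip check and the supersession bookkeeping might look like this. The separator between old bodies and the record fields are assumptions; the source only requires that merged content equals the ordered old content and that old units are superseded rather than deleted.

```python
def verify_merge(merged_body: str, ordered_old_bodies: list[str], separator: str = "\n\n") -> bool:
    """Merged content must equal the old bodies in order (separator is an assumption)."""
    return merged_body == separator.join(ordered_old_bodies)


def supersede(old_unit_ids: list[str], new_unit_id: str) -> dict[str, dict[str, str]]:
    """Mark old units as superseded and point them at the merged unit.

    Nothing is deleted; the superseded_by link doubles as a redirect.
    """
    return {
        uid: {"status": "superseded", "superseded_by": new_unit_id}
        for uid in old_unit_ids
    }
```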
Group F — Simplicity and operability
Q17. Can this run in one operation?
Expected answer:
Normal case yes:
Resolve → Mark → Review → Cut → Verify → Report
Correction case:
Find issue → Mark structural change → Review → Apply Split/Merge → Verify → Report
Q18. How do agents remember the process?
Expected answer:
The detailed design must reduce to two state machines:
Cut: Resolve → Mark → Review → Cut → Verify → Report
Fix structure: Detect → Mark → Review → Split/Merge → Verify → Report
All detailed gates map to one of these states.
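The two state machines are simple enough to express as ordered lists with one transition function. The "Blocked" and "Done" terminal labels are hypothetical additions for the sketch; the state names come from the two flows above.

```python
CUT_FLOW = ["Resolve", "Mark", "Review", "Cut", "Verify", "Report"]
FIX_FLOW = ["Detect", "Mark", "Review", "Split/Merge", "Verify", "Report"]


def next_state(flow: list[str], current: str, gate_passed: bool) -> str:
    """Advance one step in a linear flow.

    A failed gate stops the flow ("Blocked"); passing the last state ends it ("Done").
    Raises ValueError if `current` is not a state of the flow.
    """
    index = flow.index(current)
    if not gate_passed:
        return "Blocked"
    if index + 1 < len(flow):
        return flow[index + 1]
    return "Done"
```

Because both flows are strictly linear, every detailed gate can be attached to exactly one state without branching logic.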
Q19. How does this support all document kinds?
Expected answer:
Same workflow, different unit_kind, section_type profile, render policy:
law → law_unit
design doc → design_doc_section
process → process_section
report → report_section
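The mapping above could be held as a lookup table, so the workflow stays identical across document kinds. This sketch covers only unit_kind; section_type profiles and render policies are not modeled, and raising on an unknown kind is an assumption in the spirit of the "no silent invention" rule.

```python
UNIT_KIND_BY_DOC_KIND = {
    "law": "law_unit",
    "design doc": "design_doc_section",
    "process": "process_section",
    "report": "report_section",
}


def unit_kind_for(doc_kind: str) -> str:
    """Look up the unit_kind for a document kind; never invent one silently."""
    try:
        return UNIT_KIND_BY_DOC_KIND[doc_kind]
    except KeyError:
        raise ValueError(f"NEW_VOCAB_REQUIRED: no unit_kind for {doc_kind!r}") from None
```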
Q20. How are decisions persisted?
Expected answer:
Persist or make persistable:
source resolution
manifest
manifest review
execution report
round-trip result
rollback keys
split/merge operation
policy exceptions
v0.1 may use KB artifacts; design must state PG-native direction.
4. Expected approach for answering the questions
The design should answer the above questions with this approach:
4.1 Do not write a new segmentation law
Use C1A as canonical segmentation law.
dot-iu-cutter adds automation process and execution controls around C1A.
4.2 Separate decision from execution
- MARK decides.
- REVIEW validates/repairs/blocks.
- CUT executes.
Execution backend must not silently invent cut boundaries.
4.3 Use manifest as audit object
The manifest is the durable decision record.
Every unit must record:
- source span;
- title;
- type;
- parent;
- C1A rules used;
- three-question test result;
- cut reason;
- confidence;
- review flags.
4.4 Require round-trip verification
No successful cut without reconstruction and comparison.
This is the concrete answer to:
After cutting, does anyone check the result?
Yes: the system checks by reassembling and comparing. Human is not needed for normal cases.
4.5 Treat semantic mistakes as structural lifecycle, not silent edits
If a cut is content-wrong, rollback.
If a cut is semantically suboptimal, use Split/Merge with supersession/history and edge reassignment.
4.6 Keep human out by default, but define escalation clearly
AI should decide normal cuts.
Human should only approve the problem statement, policy changes, high-risk exceptions, and semantic/legal ambiguities that AI cannot resolve.
4.7 Keep the process simple for executors
The design must be rich internally, but executor-facing workflow must remain:
Resolve → Mark → Review → Cut → Verify → Report
If the design cannot be expressed this simply, it is not ready.
5. Acceptance criteria for User approval of the problem statement
This problem statement is ready for Agent design only if User accepts:
- C1A is canonical segmentation law.
- Mark → Review → Cut is the default operating model.
- AI decides cuts by default; human escalation is exceptional.
- Manifest is mandatory before execution.
- Independent AI review is mandatory before cut.
- Round-trip verification is mandatory after cut.
- Content/integrity failure rolls back automatically.
- Semantic-granularity failure is handled by Split/Merge lifecycle.
- Split/Merge preserves history and uses supersession, not silent overwrite.
- Phase5C2 body policy and patched V-3 semantics are carried forward.
- Process must remain reducible to simple state machines.
- No Agent design or implementation starts until User approves this problem statement.
6. Current decision needed
User should approve, amend, or reject this Rev2 problem statement.
No Agent work should be started until User approval.
Final flags
problem_statement_rev2_status=DRAFT_FOR_USER_APPROVAL
c1a_integrated=true
agent_design_allowed=false_until_user_approval
implementation_allowed=false
next_step=user_review_problem_statement_rev2