dot-iu-cutter v0.1: Operational Problem Statement for “Cắt luật A” (“Cut law A”), 2026-05-14
0. Purpose
This document defines the operational problem before asking Agent to design the cutter.
The goal is not merely to split a document into N pieces. The goal is a closed-loop operating process where the system can answer, by design:
What source is being cut?
Where are cut boundaries?
Why these boundaries?
Who/what decides?
Who/what reviews?
How does the system prove no content loss or overlap?
What if the cut is wrong?
How do split/merge corrections work?
When does human escalation happen?
How does this become simple enough for agents to execute repeatedly?
Target operator command:
Cắt luật A (English: "Cut law A")
Target system behavior:
Resolve → Mark → Review → Cut → Round-trip verify → Report → If needed Split/Merge lifecycle
A human is not in the normal loop. The system escalates to a human only when the design says the AI cannot safely decide.
1. Constitutional / governance requirements
- NT14 executor perspective: rules must be executable by agents without hidden interpretation.
- NT13 PG First / PG Native / PG Driven: final manifest, decisions, evidence, rollback keys, and reports must be persisted or persistable as system state.
- NT15 design before implementation: no tool implementation until this operational problem is answered.
- Zero Trust: if no-loss/no-overlap/round-trip cannot be proven, stop or rollback.
- Anti-hardcode: no document-specific hardcoded split arrays; every cut is source-derived and manifest-backed.
2. Core answer: Mark → Review → Cut
The accepted operating model is:
MARK = AI makes semantic cut decision and emits manifest.
REVIEW = independent AI verifies/repairs/blocks manifest.
CUT = deterministic execution from reviewed manifest.
This separates decision from execution:
- The manifest is the decision artifact.
- The cutter engine is the execution artifact.
- Split/Merge are correction workflows on previously made decisions.
3. Mandatory design questions and accepted answers
Q1. How does the user command work?
User says:
Cắt luật A (Cut law A)
Cắt văn bản X (Cut document X)
Cắt file Y (Cut file Y)
System resolves the source automatically from KB/PG metadata.
Accepted answer:
- If exactly one source resolves, continue.
- If no source or multiple plausible sources resolve, ask one clarification question.
- This is one of the few default reasons to ask a human.
Design must include source resolution rules and ambiguity handling.
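The resolution rule above can be sketched as follows. This is a minimal illustration, not the designed resolver: `resolve_source`, `ResolveResult`, and the `catalog` dictionary (a stand-in for KB/PG metadata) are hypothetical names, and substring matching on titles is an assumed matching strategy.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class ResolveStatus(Enum):
    RESOLVED = "resolved"
    NEEDS_CLARIFICATION = "needs_clarification"


@dataclass(frozen=True)
class ResolveResult:
    status: ResolveStatus
    source_ref: Optional[str]
    candidates: Tuple[str, ...]


def resolve_source(query: str, catalog: dict) -> ResolveResult:
    """Resolve a user phrase against a catalog of source_ref -> title.

    `catalog` is a hypothetical stand-in for KB/PG metadata.
    """
    needle = query.casefold().strip()
    matches = tuple(
        sorted(ref for ref, title in catalog.items() if needle in title.casefold())
    )
    if len(matches) == 1:
        # Exactly one source resolves: continue without asking anyone.
        return ResolveResult(ResolveStatus.RESOLVED, matches[0], matches)
    # Zero or several plausible sources: ask one clarification question.
    return ResolveResult(ResolveStatus.NEEDS_CLARIFICATION, None, matches)
```

Returning the candidate list alongside the status lets the single clarification question name the plausible sources instead of asking an open-ended question.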
Q2. What is the source of truth?
Accepted answer:
- For new text documents: KB markdown file is canonical source.
- For already-TAC publications: TAC publication can be source mode.
- Future source modes are allowed, but each mode must declare canonical source and source hash.
Before cutting, system must check:
- source exists;
- source hash/version captured;
- existing IU collision for same doc_code/canonical namespace;
- whether source was previously cut.
If already cut:
- default is not to recut blindly;
- system must classify as existing_cut_detected and choose: status only, split/merge, supersede/re-cut, or block for review.
Q3. Where does AI cut?
Accepted answer: three-layer decision model.
Layer 1 — Structure
- Markdown headings are natural cut candidates.
- Heading boundaries must be considered, but not blindly treated as final.
- Heading with children and no body may become heading/container unit.
Layer 2 — Semantics
AI may split inside a heading block when semantic role changes, e.g.:
- principle → process;
- process → checklist;
- requirement → technical_spec;
- governance rule → metric;
- text → table/code block;
- one paragraph contains multiple independent principles.
Layer 3 — Size / actionability flags
Size is a review trigger, not an automatic rule.
Suggested defaults:
body_chars > 5000 => review for possible split
body_chars > 9000 => must split or justify
body_chars < 50 => review unless heading/container
AI must explain decision via cut_reason and confidence.
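The suggested size defaults could be turned into review flags along these lines. The thresholds come from the list above; the function name `size_flags` and the flag strings are hypothetical, chosen only for illustration.

```python
def size_flags(body_chars: int, is_heading_container: bool = False) -> list:
    """Return review flags for a unit; size triggers review, it never cuts by itself."""
    flags = []
    if body_chars > 9000:
        flags.append("MUST_SPLIT_OR_JUSTIFY")
    elif body_chars > 5000:
        flags.append("REVIEW_POSSIBLE_SPLIT")
    # Very small units are suspicious unless they are heading/container units.
    if body_chars < 50 and not is_heading_container:
        flags.append("REVIEW_TOO_SMALL")
    return flags
```

The function only emits flags. Whether to split, merge, or keep the unit with a justification remains a MARK/REVIEW decision recorded in cut_reason.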
Q4. What is MARK output?
MARK outputs a manifest; it performs no writes.
Minimum manifest fields:
manifest_id
source_doc_ref
source_version_ref
source_hash
source_mode
unit_index
source_start_line/source_end_line OR source_span
canonical_address_proposal
title
body_span_policy
section_type
unit_kind
parent_manifest_id
hierarchy_depth
body_source_policy
semantic_role
cut_reason
confidence
review_required_flags
edge_readiness_notes
split_merge_notes
Manifest must satisfy:
no missing intended source spans
no overlapping body spans
stable ordering
parent-child consistency
source hash bound to manifest
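The first three invariants (no missing spans, no overlapping spans, stable ordering) can be checked mechanically. The sketch below assumes line-based spans given as inclusive (source_start_line, source_end_line) pairs in manifest order, and assumes every line between `first_line` and `last_line` is intended content (no excluded boilerplate). `check_spans` is a hypothetical helper name.

```python
def check_spans(spans: list, first_line: int, last_line: int) -> list:
    """Return a list of problems; an empty list means full, non-overlapping coverage."""
    problems = []
    expected = first_line  # next line that must be covered
    for i, (start, end) in enumerate(spans):
        if end < start:
            problems.append(f"unit {i}: inverted span {start}-{end}")
            continue
        if start < expected:
            problems.append(f"unit {i}: overlap or out-of-order at line {start}")
        elif start > expected:
            problems.append(f"unit {i}: gap, lines {expected}-{start - 1} uncovered")
        expected = max(expected, end + 1)
    if expected <= last_line:
        problems.append(f"tail gap, lines {expected}-{last_line} uncovered")
    return problems
```

A manifest that returns any problem here would fail REVIEW before CUT ever runs.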
Q5. Who reviews manifest?
Accepted answer:
- AI reviews by default.
- Review must be logically independent from Mark, even if the same model/session performs both roles.
- Human is optional escalation, not default.
REVIEW checks:
coverage/no-loss
no-overlap
semantic cohesion
actionability
section_type correctness
unit_kind correctness
hierarchy correctness
size flags
body policy correctness
vocab existence
edge readiness
round-trip feasibility
Review output:
manifest_review_status=PASS|PATCHED_PASS|BLOCKED
human_escalation_required=true|false
Q6. How is “cut correctness” checked after execution?
Accepted answer: round-trip verification is mandatory.
After CUT:
- Render all newly created pieces in manifest order/hierarchy.
- Compare reconstructed text against canonical source or expected render.
- If there is 0 drift, PASS.
- If there is content drift, roll back using exact keys.
This turns “did we cut correctly?” into a measurable test.
Round-trip verification must distinguish:
- exact byte equality if expected;
- normalized equality if source mode declares normalizer;
- accepted representation conversion, e.g. heading NULL body → title body with provenance.
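A minimal sketch of the first two comparison modes, assuming pieces are already rendered as strings in manifest order. The `normalize` function is a hypothetical normalizer (NFC, trailing whitespace stripped, one final newline). A real source mode would declare its own. The third case, accepted representation conversion, is not covered here.

```python
import hashlib
import unicodedata


def normalize(text: str) -> str:
    """Hypothetical normalizer: NFC, no trailing spaces, exactly one final newline."""
    lines = [line.rstrip() for line in unicodedata.normalize("NFC", text).splitlines()]
    return "\n".join(lines).strip("\n") + "\n"


def sha256(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def round_trip(source: str, pieces: list, mode: str = "exact") -> dict:
    """Rebuild the text from pieces and compare it with the canonical source."""
    if mode not in ("exact", "normalized"):
        raise ValueError(f"unknown mode: {mode}")
    rebuilt = "".join(pieces)  # pieces must already be in manifest order
    if mode == "normalized":
        source, rebuilt = normalize(source), normalize(rebuilt)
    # In normalized mode the hashes describe the normalized texts.
    return {
        "mode": mode,
        "status": "PASS" if source == rebuilt else "DRIFT",
        "source_sha256": sha256(source),
        "rebuilt_sha256": sha256(rebuilt),
    }
```

Recording the mode in the result keeps a normalized PASS from being mistaken for byte equality in later audits.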
Q7. What if something is wrong?
Two classes:
A. Content/integrity error
Examples:
- lost text;
- duplicated text;
- wrong ordering;
- TAC/KB source mutated;
- birth missing;
- invalid hash/body policy.
Action:
rollback automatically before/after commit using exact keys, then report.
B. Semantic cut-quality error
Examples:
- one unit contains two independently actionable concepts;
- two units should have been one;
- section_type wrong;
- hierarchy awkward;
- downstream edge/linking reveals wrong granularity.
Action:
use Split/Merge lifecycle; do not treat as content failure if round-trip passed.
Q8. How does Split work?
Split is structural correction, not ad hoc editing.
Use when:
- unit has multiple semantic roles;
- unit too large/coarse;
- downstream workflow needs smaller addressable units;
- edge/linking precision is poor.
Required process:
- MARK split points inside existing unit.
- REVIEW split manifest.
- CREATE new units/versions through canonical writer.
- Mark old unit as superseded; do not erase history.
- Create split_from/supersedes/superseded_by relations.
- Reassign or propose reassignment of edges.
- Round-trip verify combined new units equal old unit content or expected normalized representation.
- Report.
Required metadata:
operation=split
source_unit_id
source_unit_version_id
new_unit_ids
old_canonical_address
new_canonical_addresses
split_reason
span_mapping
semantic_mapping
supersedes_relation
edge_reassignment_plan
audit_actor
audit_time
rollback_plan
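The split round-trip requirement can be illustrated with character offsets inside the old unit body. This is a sketch under assumptions: `plan_split` and `verify_split` are hypothetical names, the body is non-empty, and span_mapping is represented as simple start/end offsets.

```python
def plan_split(body: str, cut_offsets: list) -> list:
    """Build a span mapping from offsets where each new unit begins."""
    bounds = [0, *cut_offsets, len(body)]
    # Offsets must be strictly increasing and strictly inside the body.
    if bounds != sorted(set(bounds)):
        raise ValueError("cut offsets must be strictly increasing and inside the body")
    return [
        {"start": s, "end": e, "text": body[s:e]}
        for s, e in zip(bounds, bounds[1:])
    ]


def verify_split(body: str, parts: list) -> bool:
    """Combined new units must equal the old unit content exactly."""
    return "".join(p["text"] for p in parts) == body
```

For example, splitting a unit whose role changes from principle to process at one offset yields two parts whose concatenation reproduces the old body, which is the condition the split round-trip step checks.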
Q9. How does Merge work?
Merge is structural correction.
Use when:
- units are too small/non-actionable;
- meaning depends on adjacency;
- previous split created artificial fragmentation;
- edge/linking noise is high.
Required process:
- MARK merge candidate units.
- REVIEW merge decision.
- CREATE new merged unit through canonical writer.
- Mark old units as superseded.
- Preserve aliases/redirects.
- Reassign/propose reassignment of edges.
- Round-trip verify merged content equals ordered old content or expected normalized representation.
- Report.
Required metadata:
operation=merge
source_unit_ids
new_unit_id
merge_reason
canonical_address_policy
superseded_by_relation
edge_reassignment_plan
history_preserved=true
rollback_plan
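A merge record carrying part of the metadata above might be assembled as follows. The input keys (`unit_id`, `unit_index`, `body`), the adjacency check, and the `merged_body` field are assumptions made for illustration. The remaining required fields (canonical_address_policy, edge_reassignment_plan, rollback_plan) are omitted for brevity.

```python
def merge_units(units: list, new_unit_id: str, merge_reason: str) -> dict:
    """Build a merge record from two or more adjacent units (hypothetical shape)."""
    ordered = sorted(units, key=lambda u: u["unit_index"])
    indexes = [u["unit_index"] for u in ordered]
    # Only adjacent units may merge, so ordered old content stays contiguous.
    if len(ordered) < 2 or indexes != list(range(indexes[0], indexes[0] + len(indexes))):
        raise ValueError("merge needs two or more adjacent units")
    return {
        "operation": "merge",
        "source_unit_ids": [u["unit_id"] for u in ordered],
        "new_unit_id": new_unit_id,
        "merge_reason": merge_reason,
        # Old units are superseded, never deleted.
        "superseded_by_relation": {u["unit_id"]: new_unit_id for u in ordered},
        "history_preserved": True,
        "merged_body": "".join(u["body"] for u in ordered),
    }
```

Because `merged_body` is the ordered concatenation of the old bodies, the merge round-trip check reduces to comparing it with the content written by the canonical writer.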
Q10. How does rollback work?
Accepted answer:
- Exact-key rollback only.
- Rollback keys must be dual-written to KB + VPS log before COMMIT.
- Pattern deletion is prohibited.
- Split/Merge and Cut all use the same exact-key rollback discipline.
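The exact-key discipline can be sketched with in-memory stand-ins. Here `store` plays the role of the unit table, and `kb_journal` and `vps_journal` stand in for the KB and VPS logs; all names and shapes are hypothetical.

```python
def commit_units(store: dict, new_units: dict, kb_journal: list, vps_journal: list) -> list:
    """Journal the exact keys in both logs first, then write the units."""
    keys = list(new_units)
    clashes = [k for k in keys if k in store]
    if clashes:
        raise KeyError(f"keys already exist: {clashes}")
    # Dual-write the rollback keys before any unit is written.
    kb_journal.extend(keys)
    vps_journal.extend(keys)
    store.update(new_units)
    return keys


def rollback_exact(store: dict, rollback_keys: list) -> list:
    """Remove only the journaled keys; never delete by pattern or prefix."""
    removed = []
    for key in rollback_keys:
        if key in store:
            del store[key]
            removed.append(key)
    return removed
```

Rolling back from the journal rather than from a pattern such as "everything under this doc_code" is what keeps pre-existing units safe when a cut fails.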
Q11. Does the entire “Cắt luật A” run in one operation?
Accepted answer: yes, for normal cases.
Default flow:
Resolve source
→ collision/history check
→ MARK manifest
→ REVIEW manifest
→ CUT via canonical writer
→ round-trip verify
→ report
The user receives a status:
PASS: Law A was cut into N pieces, round-trip 0 drift.
FAIL_ROLLED_BACK: rolled back, reason [...]
BLOCKED_NEEDS_CLARIFICATION: source ambiguous / vocab missing / suspected corruption / etc.
Q12. Is this only for laws?
Accepted answer: no.
Same workflow applies to multiple document kinds. Only unit_kind, section_type profiles, and render policies vary.
Examples:
law → unit_kind=law_unit
design doc → unit_kind=design_doc_section
process → unit_kind=process_section
report → unit_kind=report_section
Q13. How are decisions persisted?
Design must persist or make persistable:
- source resolution result;
- mark manifest;
- review report;
- execution report;
- rollback keys;
- round-trip result;
- split/merge operations;
- policy exceptions.
At minimum, v0.1 may store these in KB artifacts, but the design must show future PG-native manifest tables if needed.
Q14. How does AI know section_type/unit_kind?
Design must specify a vocab-first classifier:
- Try existing vocab.
- If no matching type, flag NEW_VOCAB_REQUIRED.
- Do not invent type silently.
- If type ambiguity does not affect execution, choose best type with confidence and review flag.
- If type ambiguity affects governance/render/edges, escalate.
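One possible shape for the vocab-first rules above, assuming the classifier receives scored candidate types. The function name, the `margin` used to define ambiguity, and the flag strings other than NEW_VOCAB_REQUIRED are hypothetical.

```python
def classify_section_type(candidates: list, vocab: set,
                          affects_execution: bool, margin: float = 0.2) -> dict:
    """candidates is a list of (type_name, score) pairs; only vocab types may win."""
    known = sorted(
        ((t, s) for t, s in candidates if t in vocab),
        key=lambda c: c[1],
        reverse=True,
    )
    if not known:
        # Never invent a type silently: flag it and escalate.
        return {"section_type": None, "flags": ["NEW_VOCAB_REQUIRED"], "escalate": True}
    best_type, best_score = known[0]
    # Assumed definition: ambiguous when the runner-up is within `margin`.
    ambiguous = len(known) > 1 and best_score - known[1][1] < margin
    if ambiguous and affects_execution:
        return {"section_type": None, "flags": ["TYPE_AMBIGUOUS"], "escalate": True}
    flags = ["REVIEW_TYPE_AMBIGUOUS"] if ambiguous else []
    return {"section_type": best_type, "confidence": best_score,
            "flags": flags, "escalate": False}
```

Here `affects_execution` is a single boolean standing in for the question of whether the ambiguity touches governance, render, or edges. A fuller design would derive it from policy rather than take it as input.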
Q15. How are body policies decided?
Use explicit body policy:
PRESERVE_BODY_FROM_SOURCE
SYNTHESIZE_TITLE_FOR_HEADING_NULL_BODY
CONTAINER_HEADING_NO_BODY_IN_SOURCE
EXCLUDED_BOILERPLATE
BLOCK_NULL_BODY_UNSUPPORTED
From Phase 5C2, approved rule:
SYNTHESIZE_TITLE iff section_type='heading' AND body IS NULL AND children>0
PRESERVE iff body IS NOT NULL
BLOCK iff body IS NULL AND not heading-container
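The Phase 5C2 rule reads directly as a three-way decision. In the sketch below, mapping the short names SYNTHESIZE_TITLE, PRESERVE, and BLOCK onto the full policy names listed earlier is an assumption, and `body_policy` is a hypothetical function name.

```python
from typing import Optional


def body_policy(section_type: str, body: Optional[str], children: int) -> str:
    """Pick a body policy; None models a NULL body."""
    if body is not None:
        return "PRESERVE_BODY_FROM_SOURCE"
    if section_type == "heading" and children > 0:
        return "SYNTHESIZE_TITLE_FOR_HEADING_NULL_BODY"
    # NULL body that is not a heading container is unsupported.
    return "BLOCK_NULL_BODY_UNSUPPORTED"
```

An empty string counts as a present body here and is preserved. Only a true NULL triggers the synthesize or block branches.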
Q16. How does the process remain simple enough?
Design must present a minimal state machine agents can remember:
Resolve → Mark → Review → Cut → Verify → Report
Split/Merge is a separate correction flow:
Find issue → Mark structural change → Review → Apply → Verify → Report
All details must map to these simple states.
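The main flow can be reduced to a small linear state machine. This is a sketch only: each step is a caller-supplied function returning True on success, and the mapping from a failing state to a user-facing status is an assumption. A failing Cut or Verify handler is assumed to have already rolled back using exact keys.

```python
from enum import Enum


class State(Enum):
    RESOLVE = 1
    MARK = 2
    REVIEW = 3
    CUT = 4
    VERIFY = 5
    REPORT = 6


ORDER = [State.RESOLVE, State.MARK, State.REVIEW, State.CUT, State.VERIFY, State.REPORT]


def run(steps: dict) -> tuple:
    """Run Resolve..Verify in order; always finish in Report."""
    visited = []
    outcome = "PASS"
    for state in ORDER[:-1]:
        visited.append(state)
        if not steps[state]():
            # Assumed mapping: failures after writes begin mean rollback happened.
            if state in (State.CUT, State.VERIFY):
                outcome = "FAIL_ROLLED_BACK"
            else:
                outcome = "BLOCKED_NEEDS_CLARIFICATION"
            break
    visited.append(State.REPORT)
    return outcome, visited
```

Every run ends in Report regardless of where it stops, so the user always receives one of the three statuses.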
4. Escalation matrix
| Situation | Default action |
|---|---|
| Source ambiguous | Ask one clarification question |
| Source missing | Block and report |
| Already cut | Show status; choose split/merge/supersede path by policy |
| Missing vocab/type | Escalate unless low-risk mapped type exists |
| No-loss/no-overlap fails | Review/repair once; if still fail, block |
| Round-trip drift | Rollback automatically |
| Non-heading NULL body | Block |
| Heading NULL body with children | Apply synthesize-title policy |
| Very large unit | Review for split; justify if kept |
| Very small non-heading unit | Review for merge; justify if kept |
| Split/merge changes legal meaning | Human escalation |
| Pure structural split/merge | AI can decide and report |
5. Design acceptance criteria
Agent’s design will be rejected unless it answers all of the following:
- Can a user say “Cắt luật A” without specifying file path in normal cases?
- Can the system resolve source and collision/history state?
- Does MARK produce a concrete manifest with spans, reasons, types, hierarchy, confidence?
- Does REVIEW independently check no-loss/no-overlap/semantic cohesion/vocab/hierarchy/body policy?
- Does CUT execute only from an approved manifest?
- Is round-trip verification mandatory?
- Does the design define rollback before and after commit?
- Does the design define Split and Merge with history preservation?
- Does the design define who decides and when human escalation occurs?
- Does the design cover already-cut documents?
- Does the design cover KB markdown and TAC source modes?
- Does the design carry Phase 5C2 body policy and V-3 semantics?
- Is it simple enough to reduce to Resolve → Mark → Review → Cut → Verify → Report?
- Does it avoid hardcoded per-document logic?
- Does it make edge/professional linking possible later?
6. Next step
This problem statement must become the controlling input for Agent design.
Patch/replace the Agent design prompt so Agent must produce a design answering this problem statement, not merely a technical cutter spec.
Final flags
operational_problem_statement_status=APPROVED_INPUT
main_problem=cut_decision_operating_process
technical_cutter=execution_backend_only
human_default_in_loop=false
ai_decides_by_default=true
round_trip_required=true
split_merge_required=true
agent_design_can_start_after_prompt_patch=true