KB-56DB

dot-iu-cutter v0.1: Operational Problem Statement for “Cắt luật A” (“Cut law A”), 2026-05-14

Revision 1

Tags: dot-iu-cutter, operational-problem-statement, cat-luat-a, mark-review-cut, split-merge, requirements, 2026-05-14

0. Purpose

This document defines the operational problem that must be answered before the Agent is asked to design the cutter.

The goal is not merely to split a document into N pieces. The goal is a closed-loop operating process where the system can answer, by design:

What source is being cut?
Where are cut boundaries?
Why these boundaries?
Who/what decides?
Who/what reviews?
How does the system prove no content loss or overlap?
What if the cut is wrong?
How do split/merge corrections work?
When does human escalation happen?
How does this become simple enough for agents to execute repeatedly?

Target operator command:

Cắt luật A ("Cut law A")

Target system behavior:

Resolve → Mark → Review → Cut → Round-trip verify → Report → If needed Split/Merge lifecycle

A human is not in the normal loop. The system escalates to a human only when the design says the AI cannot safely decide.

1. Constitutional / governance requirements

NT14 executor perspective: rules must be executable by agents without hidden interpretation.
NT13 PG First / PG Native / PG Driven: final manifest, decisions, evidence, rollback keys, and reports must be persisted or persistable as system state.
NT15 design before implementation: no tool implementation until this operational problem is answered.
Zero Trust: if no-loss/no-overlap/round-trip cannot be proven, stop or rollback.
Anti-hardcode: no document-specific hardcoded split arrays; every cut is source-derived and manifest-backed.

2. Core answer: Mark → Review → Cut

The accepted operating model is:

MARK  = AI makes semantic cut decision and emits manifest.
REVIEW = independent AI verifies/repairs/blocks manifest.
CUT = deterministic execution from reviewed manifest.

This separates decision from execution:

The manifest is the decision artifact.
The cutter engine is the execution artifact.
Split/Merge are correction workflows on previously made decisions.

3. Mandatory design questions and accepted answers

Q1. How does the user command work?

The user says one of:

Cắt luật A ("Cut law A")
Cắt văn bản X ("Cut document X")
Cắt file Y ("Cut file Y")

System resolves the source automatically from KB/PG metadata.

Accepted answer:

If exactly one source resolves, continue.
If no source resolves, or several plausible sources do, ask one clarification question.
This is one of the few default reasons to ask a human.

Design must include source resolution rules and ambiguity handling.
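The resolution rule above can be sketched as follows. This is a minimal illustration, not the design itself: the catalog shape (a mapping from source reference to known names), the Resolution type, the status strings, and the example paths are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Resolution:
    status: str                 # "RESOLVED", "NOT_FOUND", or "AMBIGUOUS" (hypothetical names)
    source_ref: Optional[str]   # set only when exactly one source matched
    question: Optional[str]     # the single clarification question, if one is needed

def resolve_source(query: str, catalog: dict[str, list[str]]) -> Resolution:
    """catalog maps a source reference to its known names (assumed metadata shape)."""
    needle = query.strip().lower()
    matches = sorted(ref for ref, names in catalog.items()
                     if any(needle == name.lower() for name in names))
    if len(matches) == 1:
        return Resolution("RESOLVED", matches[0], None)
    if not matches:
        return Resolution("NOT_FOUND", None,
                          f"No source matches '{query}'. Which document do you mean?")
    return Resolution("AMBIGUOUS", None,
                      f"'{query}' matches {', '.join(matches)}. Which one should be cut?")

catalog = {
    "kb/law-a.md": ["Luật A"],
    "kb/law-b.md": ["Luật B", "Law B"],
    "kb/law-b-draft.md": ["Law B"],
}
print(resolve_source("luật a", catalog).status)  # exactly one match
print(resolve_source("Law B", catalog).question)  # two plausible sources
```

Only the RESOLVED branch lets the flow continue; both other branches produce exactly one question for the human.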

Q2. What is the source of truth?

Accepted answer:

For new text documents: KB markdown file is canonical source.
For already-TAC publications: TAC publication can be source mode.
Future source modes are allowed, but each mode must declare canonical source and source hash.

Before cutting, system must check:

the source exists;
the source hash/version has been captured;
whether an existing IU collides on the same doc_code/canonical namespace;
whether the source was previously cut.

If already cut:

the default is not to re-cut blindly;
the system must classify the case as existing_cut_detected and choose one of: status only, split/merge, supersede/re-cut, or block for review.

Q3. Where does AI cut?

Accepted answer: three-layer decision model.

Layer 1 — Structure

Markdown headings are natural cut candidates.
Heading boundaries must be considered, but not blindly treated as final.
A heading with children and no body may become a heading/container unit.

Layer 2 — Semantics

AI may split inside a heading block when semantic role changes, e.g.:

principle → process;
process → checklist;
requirement → technical_spec;
governance rule → metric;
text → table/code block;
one paragraph contains multiple independent principles.

Layer 3 — Size / actionability flags

Size is a review trigger, not an automatic rule.

Suggested defaults:

body_chars > 5000 => review for possible split
body_chars > 9000 => must split or justify
body_chars < 50 => review unless heading/container

The AI must explain each decision via cut_reason and confidence.
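A minimal sketch of the suggested size defaults as review flags. The thresholds come from the list above; the function name and the flag strings are hypothetical, and the function only raises flags, it never splits anything by itself.

```python
def size_flags(body_chars: int, is_heading_container: bool = False) -> list[str]:
    """Turn the suggested size defaults into review flags (flag names are illustrative)."""
    flags = []
    if body_chars > 9000:
        flags.append("MUST_SPLIT_OR_JUSTIFY")
    elif body_chars > 5000:
        flags.append("REVIEW_FOR_SPLIT")
    # Very small units are suspicious unless they are heading/container units.
    if body_chars < 50 and not is_heading_container:
        flags.append("REVIEW_SMALL_UNIT")
    return flags

print(size_flags(6200))
print(size_flags(12, is_heading_container=True))
```

Because size is a trigger rather than a rule, a reviewer can keep a flagged unit as long as the justification is recorded.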

Q4. What is MARK output?

MARK outputs a manifest; it performs no writes.

Minimum manifest fields:

manifest_id
source_doc_ref
source_version_ref
source_hash
source_mode
unit_index
source_start_line/source_end_line OR source_span
canonical_address_proposal
title
body_span_policy
section_type
unit_kind
parent_manifest_id
hierarchy_depth
body_source_policy
semantic_role
cut_reason
confidence
review_required_flags
edge_readiness_notes
split_merge_notes

Manifest must satisfy:

no missing intended source spans
no overlapping body spans
stable ordering
parent-child consistency
source hash bound to manifest
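The coverage, overlap, and ordering invariants can be checked mechanically. The sketch below assumes line-based spans (source_start_line/source_end_line, 1-based and inclusive) and assumes that every intended line must belong to exactly one unit, so deliberately excluded text would need its own span. UnitSpan and check_spans are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UnitSpan:
    unit_index: int
    start_line: int  # inclusive, 1-based
    end_line: int    # inclusive

def check_spans(units: list[UnitSpan], first_line: int, last_line: int) -> list[str]:
    """Return a list of problems; an empty list means the spans tile the intended range."""
    problems = []
    ordered = sorted(units, key=lambda u: u.start_line)
    # Stable ordering: unit_index must increase with position in the source.
    if [u.unit_index for u in ordered] != sorted(u.unit_index for u in units):
        problems.append("unit_index order does not follow source order")
    expected = first_line
    for u in ordered:
        if u.end_line < u.start_line:
            problems.append(f"unit {u.unit_index}: inverted span")
        if u.start_line > expected:
            problems.append(f"gap: lines {expected}-{u.start_line - 1} not covered")
        elif u.start_line < expected:
            problems.append(f"overlap: unit {u.unit_index} starts at {u.start_line}, expected {expected}")
        expected = max(expected, u.end_line + 1)
    if expected <= last_line:
        problems.append(f"gap: lines {expected}-{last_line} not covered")
    return problems

print(check_spans([UnitSpan(1, 1, 10), UnitSpan(2, 12, 20)], 1, 20))
```

Parent-child consistency and hash binding are separate checks that this sketch does not cover.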

Q5. Who reviews manifest?

Accepted answer:

The AI reviews by default.
Review must be logically independent from Mark, even if the same model or session performs both roles.
A human is an optional escalation, not the default.

REVIEW checks:

coverage/no-loss
no-overlap
semantic cohesion
actionability
section_type correctness
unit_kind correctness
hierarchy correctness
size flags
body policy correctness
vocab existence
edge readiness
round-trip feasibility

Review output:

manifest_review_status=PASS|PATCHED_PASS|BLOCKED
human_escalation_required=true|false
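A minimal sketch of how the cutter could consume this review output, assuming line-based spans that are 1-based and inclusive and a manifest held as a list of dicts. The function name cut and the choice of PermissionError are illustrative only.

```python
def cut(source_lines: list[str], manifest: list[dict], review_status: str) -> list[str]:
    """Deterministically slice the source according to a reviewed manifest."""
    # CUT refuses to run unless REVIEW approved the manifest.
    if review_status not in {"PASS", "PATCHED_PASS"}:
        raise PermissionError(f"manifest not approved: {review_status}")
    pieces = []
    for unit in sorted(manifest, key=lambda u: u["unit_index"]):
        start, end = unit["source_start_line"], unit["source_end_line"]
        pieces.append("".join(source_lines[start - 1:end]))
    return pieces

source_lines = ["# A\n", "one\n", "# B\n", "two\n"]
manifest = [
    {"unit_index": 2, "source_start_line": 3, "source_end_line": 4},
    {"unit_index": 1, "source_start_line": 1, "source_end_line": 2},
]
print(cut(source_lines, manifest, "PASS"))
```

The function makes no decisions of its own: the same manifest and source always yield the same pieces, which is what keeps the semantic decision in MARK and REVIEW.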

Q6. How is “cut correctness” checked after execution?

Accepted answer: round-trip verification is mandatory.

After CUT:

Render all newly created pieces in manifest order/hierarchy.
Compare the reconstructed text against the canonical source or the expected render.
If there is zero drift, PASS.
If the content drifted, roll back using exact keys.

This turns “did we cut correctly?” into a measurable test.

Round-trip verification must distinguish:

exact byte equality if expected;
normalized equality if source mode declares normalizer;
accepted representation conversion, e.g. heading NULL body → title body with provenance.
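The first two comparison modes can be sketched as below. This is an illustration under assumptions: pieces are plain strings already in manifest order, the result keys and the ROLLBACK status string are hypothetical, and the third mode (accepted representation conversion) is not modeled.

```python
from typing import Callable, Optional

def round_trip(source: str, pieces: list[str],
               normalizer: Optional[Callable[[str], str]] = None) -> dict:
    """Rebuild the text from the cut pieces and compare it with the source."""
    rebuilt = "".join(pieces)  # pieces are assumed to be in manifest order
    if rebuilt == source:
        return {"status": "PASS", "match": "exact"}
    # Normalized equality applies only when the source mode declares a normalizer.
    if normalizer is not None and normalizer(rebuilt) == normalizer(source):
        return {"status": "PASS", "match": "normalized"}
    return {"status": "ROLLBACK", "match": "none"}

def strip_trailing_whitespace(text: str) -> str:
    """Example normalizer: ignore trailing spaces on each line and the final newline."""
    return "\n".join(line.rstrip() for line in text.splitlines())

print(round_trip("a\nb\n", ["a\n", "b\n"]))
print(round_trip("a \nb\n", ["a\n", "b\n"], strip_trailing_whitespace))
print(round_trip("a\nb\n", ["a\n"]))
```

Reporting which mode matched keeps a normalized pass from being mistaken for byte equality.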

Q7. What if something is wrong?

Two classes:

A. Content/integrity error

Examples:

lost text;
duplicated text;
wrong ordering;
TAC/KB source mutated;
birth missing;
invalid hash/body policy.

Action:

roll back automatically (before or after commit) using exact keys, then report.

B. Semantic cut-quality error

Examples:

one unit contains two independently actionable concepts;
two units should have been one;
section_type wrong;
hierarchy awkward;
downstream edge/linking reveals wrong granularity.

Action:

use the Split/Merge lifecycle; do not treat this as a content failure if the round-trip passed.

Q8. How does Split work?

Split is a structural correction, not ad hoc editing.

Use when:

unit has multiple semantic roles;
unit too large/coarse;
downstream workflow needs smaller addressable units;
edge/linking precision is poor.

Required process:

MARK split points inside existing unit.
REVIEW split manifest.
CREATE new units/versions through canonical writer.
Mark old unit as superseded; do not erase history.
Create split_from / supersedes / superseded_by relations.
Reassign or propose reassignment of edges.
Round-trip verify combined new units equal old unit content or expected normalized representation.
Report.

Required metadata:

operation=split
source_unit_id
source_unit_version_id
new_unit_ids
old_canonical_address
new_canonical_addresses
split_reason
span_mapping
semantic_mapping
supersedes_relation
edge_reassignment_plan
audit_actor
audit_time
rollback_plan

Q9. How does Merge work?

Merge is a structural correction.

Use when:

units are too small/non-actionable;
meaning depends on adjacency;
previous split created artificial fragmentation;
edge/linking noise is high.

Required process:

MARK merge candidate units.
REVIEW merge decision.
CREATE new merged unit through canonical writer.
Mark old units as superseded.
Preserve aliases/redirects.
Reassign/propose reassignment of edges.
Round-trip verify merged content equals ordered old content or expected normalized representation.
Report.

Required metadata:

operation=merge
source_unit_ids
new_unit_id
merge_reason
canonical_address_policy
superseded_by_relation
edge_reassignment_plan
history_preserved=true
rollback_plan
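Split and Merge end with the same kind of content check, run in opposite directions. The sketch below assumes exact equality with plain concatenation (a declared normalizer could replace the comparison), and the helper names and the shape of the supersede record are hypothetical; only the field names operation, source_unit_ids, and history_preserved come from the metadata lists above.

```python
def verify_split(old_body: str, new_bodies: list[str]) -> bool:
    """The new units, joined in order, must reproduce the old unit."""
    return "".join(new_bodies) == old_body

def verify_merge(old_bodies: list[str], merged_body: str) -> bool:
    """The merged unit must equal the old units joined in their original order."""
    return "".join(old_bodies) == merged_body

def supersede_record(operation: str, source_unit_ids: list[str],
                     new_unit_ids: list[str], reason: str) -> dict:
    """Describe the correction without deleting anything; old units stay as history."""
    if operation not in {"split", "merge"}:
        raise ValueError(f"unknown operation: {operation}")
    return {
        "operation": operation,
        "source_unit_ids": list(source_unit_ids),
        "new_unit_ids": list(new_unit_ids),
        "reason": reason,
        "superseded_by": {old: list(new_unit_ids) for old in source_unit_ids},
        "history_preserved": True,
    }

old = "Principle text.\nProcess steps.\n"
parts = ["Principle text.\n", "Process steps.\n"]
print(verify_split(old, parts), verify_merge(parts, old))
print(supersede_record("split", ["u1"], ["u1a", "u1b"], "two semantic roles")["superseded_by"])
```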

Q10. How does rollback work?

Accepted answer:

Exact-key rollback only.
Rollback keys must be dual-written to KB + VPS log before COMMIT.
Pattern deletion is prohibited.
Split/Merge and Cut all use the same exact-key rollback discipline.
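A minimal in-memory sketch of this discipline. The dict stands in for the real store and the two lists stand in for the KB and VPS logs; all names and keys here are hypothetical. The point it illustrates is the ordering (keys are logged twice before the write) and that rollback deletes only keys found in both logs.

```python
def commit_units(store: dict, units: dict, kb_log: list, vps_log: list) -> list[str]:
    """Write units only after their rollback keys are recorded in both logs."""
    keys = list(units)
    kb_log.append(list(keys))   # dual-write happens before the commit
    vps_log.append(list(keys))
    store.update(units)
    return keys

def rollback(store: dict, keys: list[str], kb_log: list, vps_log: list) -> list[str]:
    """Delete exactly the listed keys; refuse anything that was not logged twice."""
    logged = ({k for batch in kb_log for k in batch}
              & {k for batch in vps_log for k in batch})
    unknown = [k for k in keys if k not in logged]
    if unknown:
        raise ValueError(f"refusing to delete keys that were never logged: {unknown}")
    removed = []
    for k in keys:
        if k in store:  # exact-key match only, never a prefix or pattern
            del store[k]
            removed.append(k)
    return removed

store = {"iu:old-doc/1": "existing unit"}
kb_log, vps_log = [], []
keys = commit_units(store, {"iu:law-a/1": "a", "iu:law-a/2": "b"}, kb_log, vps_log)
print(rollback(store, keys, kb_log, vps_log), store)
```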

Q11. Does the entire “Cắt luật A” run in one operation?

Accepted answer: yes, for normal cases.

Default flow:

Resolve source
→ collision/history check
→ MARK manifest
→ REVIEW manifest
→ CUT via canonical writer
→ round-trip verify
→ report

The user receives a status:

PASS: Law A has been cut into N pieces, round-trip 0 drift.
FAIL_ROLLED_BACK: rolled back, reason [...]
BLOCKED_NEEDS_CLARIFICATION: source ambiguous / vocab missing / suspected corruption / etc.

Q12. Is this only for laws?

Accepted answer: no.

Same workflow applies to multiple document kinds. Only unit_kind, section_type profiles, and render policies vary.

Examples:

law → unit_kind=law_unit
design doc → unit_kind=design_doc_section
process → unit_kind=process_section
report → unit_kind=report_section

Q13. How are decisions persisted?

Design must persist or make persistable:

source resolution result;
mark manifest;
review report;
execution report;
rollback keys;
round-trip result;
split/merge operations;
policy exceptions.

A minimum v0.1 may store these as KB artifacts, but the design must show future PG-native manifest tables if needed.

Q14. How does AI know section_type/unit_kind?

Design must specify a vocab-first classifier:

Try the existing vocab first.
If no type matches, flag NEW_VOCAB_REQUIRED.
Do not invent a type silently.
If the type ambiguity does not affect execution, choose the best type with a confidence and a review flag.
If the type ambiguity affects governance/render/edges, escalate.
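One way to express these rules in code, as a sketch under assumptions: candidates are (type, confidence) proposals from the AI, two candidates count as ambiguous when their confidences differ by less than a margin, and the margin value, the function name, and the TYPE_AMBIGUOUS flag strings are all invented for illustration. Only NEW_VOCAB_REQUIRED comes from the text above.

```python
def classify_section_type(candidates: list[tuple[str, float]], vocab: set[str],
                          affects_governance: bool, margin: float = 0.15) -> dict:
    """Pick a section_type from the existing vocab, or flag instead of inventing one."""
    known = sorted((c for c in candidates if c[0] in vocab),
                   key=lambda c: c[1], reverse=True)
    if not known:
        # Nothing in the vocab fits: never invent a type silently.
        return {"section_type": None, "flags": ["NEW_VOCAB_REQUIRED"], "escalate": True}
    best_type, best_conf = known[0]
    ambiguous = len(known) > 1 and best_conf - known[1][1] < margin
    if ambiguous and affects_governance:
        return {"section_type": None, "flags": ["TYPE_AMBIGUOUS"], "escalate": True}
    flags = ["REVIEW_TYPE_AMBIGUOUS"] if ambiguous else []
    return {"section_type": best_type, "confidence": best_conf,
            "flags": flags, "escalate": False}

vocab = {"principle", "process", "checklist"}
print(classify_section_type([("principle", 0.8), ("process", 0.7)], vocab, False))
print(classify_section_type([("metric", 0.9)], vocab, False))
```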

Q15. How are body policies decided?

Use explicit body policy:

PRESERVE_BODY_FROM_SOURCE
SYNTHESIZE_TITLE_FOR_HEADING_NULL_BODY
CONTAINER_HEADING_NO_BODY_IN_SOURCE
EXCLUDED_BOILERPLATE
BLOCK_NULL_BODY_UNSUPPORTED

From Phase 5C2, approved rule:

SYNTHESIZE_TITLE iff section_type='heading' AND body IS NULL AND children>0
PRESERVE iff body IS NOT NULL
BLOCK iff body IS NULL AND not heading-container
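The Phase 5C2 rule reads directly as a three-way decision. The sketch assumes that SYNTHESIZE_TITLE, PRESERVE, and BLOCK are shorthand for the corresponding explicit policy names listed above; CONTAINER_HEADING_NO_BODY_IN_SOURCE and EXCLUDED_BOILERPLATE are not covered by that rule and are left out. The function name is hypothetical.

```python
from typing import Optional

def body_policy(section_type: str, body: Optional[str], children: int) -> str:
    """Apply the approved rule: preserve, synthesize a title, or block."""
    if body is not None:
        return "PRESERVE_BODY_FROM_SOURCE"
    if section_type == "heading" and children > 0:
        return "SYNTHESIZE_TITLE_FOR_HEADING_NULL_BODY"
    # NULL body on anything that is not a heading with children is unsupported.
    return "BLOCK_NULL_BODY_UNSUPPORTED"

print(body_policy("heading", None, 3))
print(body_policy("principle", None, 0))
```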

Q16. How does the process remain simple enough?

The design must present a minimal state machine that agents can remember:

Resolve → Mark → Review → Cut → Verify → Report

Split/Merge is a separate correction flow:

Find issue → Mark structural change → Review → Apply → Verify → Report

All details must map to these simple states.
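Both flows can share one small runner, sketched below. The handler signature (a function that takes a context dict and returns True to continue), the STOPPED_AT_ status strings, and the choice to always run the final report state are assumptions of this sketch, not requirements stated above.

```python
from typing import Callable

CUT_FLOW = ["resolve", "mark", "review", "cut", "verify", "report"]
CORRECTION_FLOW = ["find_issue", "mark", "review", "apply", "verify", "report"]

def run_flow(flow: list[str], handlers: dict[str, Callable[[dict], bool]],
             ctx: dict) -> dict:
    """Run states in order; stop at the first failing state, then always report."""
    trace = []
    for state in flow[:-1]:
        trace.append(state)
        if not handlers[state](ctx):
            ctx["failed_at"] = state
            break
    trace.append(flow[-1])
    handlers[flow[-1]](ctx)  # the report state runs on success and on failure
    failed = ctx.get("failed_at")
    status = "PASS" if failed is None else f"STOPPED_AT_{failed.upper()}"
    return {"status": status, "trace": trace}

handlers = {state: (lambda ctx: True) for state in CUT_FLOW}
print(run_flow(CUT_FLOW, handlers, {}))
handlers["review"] = lambda ctx: False  # simulate a BLOCKED manifest
print(run_flow(CUT_FLOW, handlers, {}))
```

Keeping each flow as a flat list of state names is one way to honor the requirement that every detail maps back to a state an agent can remember.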

4. Escalation matrix

Situation	Default action
Source ambiguous	Ask one clarification question
Source missing	Block and report
Already cut	Show status; choose split/merge/supersede path by policy
Missing vocab/type	Escalate unless low-risk mapped type exists
No-loss/no-overlap fails	Review/repair once; if it still fails, block
Round-trip drift	Rollback automatically
Non-heading NULL body	Block
Heading NULL body with children	Apply synthesize-title policy
Very large unit	Review for split; justify if kept
Very small non-heading unit	Review for merge; justify if kept
Split/merge changes legal meaning	Human escalation
Pure structural split/merge	AI can decide and report

5. Design acceptance criteria

The Agent’s design will be rejected unless it answers all of the following:

Can a user say “Cắt luật A” without specifying a file path in normal cases?
Can the system resolve source and collision/history state?
Does MARK produce a concrete manifest with spans, reasons, types, hierarchy, confidence?
Does REVIEW independently check no-loss/no-overlap/semantic cohesion/vocab/hierarchy/body policy?
Does CUT execute only from an approved manifest?
Is round-trip verification mandatory?
Does the design define rollback before and after commit?
Does the design define Split and Merge with history preservation?
Does the design define who decides and when human escalation occurs?
Does the design cover already-cut documents?
Does the design cover KB markdown and TAC source modes?
Does the design carry Phase 5C2 body policy and V-3 semantics?
Is it simple enough to reduce to Resolve → Mark → Review → Cut → Verify → Report?
Does it avoid hardcoded per-document logic?
Does it make edge/professional linking possible later?

6. Next step

This problem statement must become the controlling input for the Agent design.

Patch or replace the Agent design prompt so that the Agent must produce a design answering this problem statement, not merely a technical cutter spec.

Final flags

operational_problem_statement_status=APPROVED_INPUT
main_problem=cut_decision_operating_process
technical_cutter=execution_backend_only
human_default_in_loop=false
ai_decides_by_default=true
round_trip_required=true
split_merge_required=true
agent_design_can_start_after_prompt_patch=true