dot-iu-cutter v0.1 — Segmentation Health & Usage Feedback Design (D3)
Date: 2026-05-15 | Status: DESIGN DRAFT | Baseline: rev5d §7.E, §7.G, §12.8 | Scope: DESIGN ONLY.
1. Purpose
Define how F2 (Health / Correction) operates: how the system observes post-cut usage, detects mis-segmentations, classifies signals, runs the Segmentation Health Report, and selects between Split / Merge / Edge / Thread / Context-Pack / NoAction — each as a governed lifecycle action, never an automatic structural change.
2. Scope
- Health signals catalog and classification
- Segmentation Health Report cadence and content
- Detect → Review → Action → Verify → Report lifecycle for F2
- Split / Merge / Edge / Thread / Context-Pack / NoAction decision matrix
- Evidence bundles per action
Out of scope: thread object lifecycle (D9); retrieval-side signals (D11); Decision Backlog Registry mechanics (D5).
3. Dependencies
- rev5d §7.E, §7.G, §12.8
- D1 (operational state machine), D2 (manifest contract), D9 (thread signals consumed)
- C1A (segmentation rules remain binding)
- Đ32 (high-risk approval), Đ37 (escalation), Đ39 (universal_edges first), Đ24 (vocabulary)
4. Key Decisions
4.1 Flow 2 Backbone
Observe → Detect → Review → {Split | Merge | Edge | Thread | Context-Pack | NoAction} → Verify → Report
F2 uses the same state machine vocabulary as F1; only the entry trigger differs (Health Report event / complaint / signal threshold).
4.2 Signal Catalog (Q22; criterion 9, 11)
| Signal | Description | Classification |
|---|---|---|
| `co_citation` | Two units repeatedly cited together | requires_instrumentation |
| `co_edit` | Two units edited together in same change-set | requires_instrumentation |
| `co_retrieval` | Two units retrieved together in same query | requires_instrumentation |
| `edge_density_overlap` | Units share many edges → semantic overlap | available_now (after Đ39 hooks land) |
| `context_pack_dependency` | One unit appears only as supporting in another's pack | requires_instrumentation |
| `orphan_or_underused_unit` | Unit with no edges and no retrieval hits | requires_instrumentation |
| `misclassification_signal` | section_type/unit_kind mismatches observed behavior | available_now |
| `expected_artifact_missing` | From D9 thread expected chain | available_now hook; data requires_instrumentation |
| `noisy_retrieval` | From D11 retrieval feedback | requires_instrumentation |
| `wrong_link` | From D9 link rejection | available_now |
| `length_drift` | Unit length grew beyond C1A NL band | available_now |
| `overlap_growth` | Span overlap with sibling appeared after edit | available_now |
| `user_complaint` | Human reported confusing/wrong unit | available_now |
| `user_ai_disagreement` | From D9 §4.12 | available_now |
| `wrong_audience_result` | From D11; security event | available_now (security path) |
Classification rule (criterion 11): every signal carries one of available_now / requires_instrumentation / future_capability. v0.1 ships the catalog and hooks; collection completeness depends on instrumentation.
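The classification rule can be made concrete with a small sketch. The enum and dictionary below are illustrative only: the names `SignalClass`, `SIGNAL_CATALOG`, and `collectable_now` are assumptions rather than part of any existing schema, and only a subset of the catalog with an unqualified classification is shown.

```python
from enum import Enum


class SignalClass(Enum):
    """The three classifications every signal must carry (criterion 11)."""
    AVAILABLE_NOW = "available_now"
    REQUIRES_INSTRUMENTATION = "requires_instrumentation"
    FUTURE_CAPABILITY = "future_capability"


# Hypothetical in-memory form of part of the catalog in section 4.2.
# Signals with qualified classifications (for example
# expected_artifact_missing) are left out to keep the sketch simple.
SIGNAL_CATALOG = {
    "co_citation": SignalClass.REQUIRES_INSTRUMENTATION,
    "co_edit": SignalClass.REQUIRES_INSTRUMENTATION,
    "co_retrieval": SignalClass.REQUIRES_INSTRUMENTATION,
    "misclassification_signal": SignalClass.AVAILABLE_NOW,
    "wrong_link": SignalClass.AVAILABLE_NOW,
    "length_drift": SignalClass.AVAILABLE_NOW,
    "user_complaint": SignalClass.AVAILABLE_NOW,
}


def collectable_now(catalog):
    """Return the names of signals that need no new instrumentation."""
    return sorted(
        name for name, cls in catalog.items()
        if cls is SignalClass.AVAILABLE_NOW
    )
```

Using an enum means a signal cannot be registered without exactly one classification, which is the property the rule asks for.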
4.3 Signal Aggregation
Signals are PG-persisted events. The system computes per-unit and per-thread aggregate scores:
| Aggregate | Use |
|---|---|
| `unit_health_score` | Composite from signals targeting this unit |
| `thread_health_score` | Composite from signals targeting threads containing this unit |
| `coupling_score` | Edge-density + co-citation + co-retrieval blend |
Score formulas are policy; v0.1 defines the hooks and routes scores to Decision Backlog (D5) for tuning.
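As an illustration of what a score hook might look like, the sketch below blends the three `coupling_score` inputs with fixed weights. The weight values, the `CouplingWeights` name, and the assumption that each input is already normalized to the range 0 to 1 are all hypothetical; the actual formula is a policy decision routed through D5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CouplingWeights:
    """Placeholder weights; real values are policy (D5), not defined here."""
    edge_density: float = 0.4
    co_citation: float = 0.3
    co_retrieval: float = 0.3


def coupling_score(edge_density, co_citation, co_retrieval,
                   weights=CouplingWeights()):
    """Weighted average of three inputs, each assumed to lie in [0, 1]."""
    inputs = (edge_density, co_citation, co_retrieval)
    if any(not 0.0 <= value <= 1.0 for value in inputs):
        raise ValueError("inputs must be normalized to [0, 1]")
    total = weights.edge_density + weights.co_citation + weights.co_retrieval
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    blended = (weights.edge_density * edge_density
               + weights.co_citation * co_citation
               + weights.co_retrieval * co_retrieval)
    # Dividing by the weight sum keeps the result in [0, 1] even if the
    # policy supplies weights that do not sum to 1.
    return blended / total
```

Keeping the weights in a separate, immutable object lets a policy change swap them without touching the scoring code.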
4.4 Segmentation Health Report (Q24; criterion 10)
Trigger conditions (any one fires a report):
- N new units since last report (N policy; default placeholder).
- Y days since last report.
- M events of any health signal type.
- Strong coupling detected (`coupling_score` > threshold).
- User complaint received.
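The trigger conditions above can be sketched as a single evaluation function. Everything named here (`ReportPolicy`, `WindowStats`, `fired_triggers`, the trigger labels) is hypothetical, and the sketch assumes that "M events" means the total event count across all signal types, which the text leaves open.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ReportPolicy:
    """N, Y, M and the coupling threshold; all values are policy decisions."""
    new_units_n: int
    max_age_days_y: int
    signal_events_m: int
    coupling_threshold: float


@dataclass(frozen=True)
class WindowStats:
    """Observations accumulated since the last report."""
    new_units: int
    last_report_at: datetime
    signal_events: int
    max_coupling_score: float
    user_complaints: int


def fired_triggers(stats, policy, now):
    """Return every trigger that fires; any non-empty result starts a report."""
    fired = []
    if stats.new_units >= policy.new_units_n:
        fired.append("new_units")
    if now - stats.last_report_at >= timedelta(days=policy.max_age_days_y):
        fired.append("report_age")
    if stats.signal_events >= policy.signal_events_m:
        fired.append("signal_events")
    if stats.max_coupling_score > policy.coupling_threshold:
        fired.append("strong_coupling")
    if stats.user_complaints > 0:
        fired.append("user_complaint")
    return fired
```

Returning the full list rather than a boolean lets the report record which condition caused it in its `trigger` field.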
Report content:
report_id
generated_at
window_start / window_end
trigger
units_in_scope (count + sample)
signals_by_class (table)
high_priority_findings (list)
recommended_actions (per finding)
escalations_routed (to Đ37 / Đ32)
related_decision_backlog_entries (refs to D5)
The report is PG-persisted, with KB markdown mirror for human reading.
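A minimal sketch of the report record and its markdown mirror follows. The field names come from the list above; the field types, the split of `units_in_scope` into a count and a sample, and the `render_markdown` helper are assumptions for illustration, not a persistence schema.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class SegmentationHealthReport:
    report_id: str
    generated_at: datetime
    window_start: datetime
    window_end: datetime
    trigger: str
    units_in_scope_count: int
    units_in_scope_sample: list = field(default_factory=list)
    signals_by_class: dict = field(default_factory=dict)      # class -> event count
    high_priority_findings: list = field(default_factory=list)
    recommended_actions: dict = field(default_factory=dict)   # finding -> action
    escalations_routed: list = field(default_factory=list)
    related_decision_backlog_entries: list = field(default_factory=list)


def render_markdown(report):
    """Render a human-readable mirror of the persisted report."""
    lines = [
        f"Segmentation Health Report {report.report_id}",
        f"Generated: {report.generated_at.isoformat()}",
        f"Window: {report.window_start.isoformat()} to {report.window_end.isoformat()}",
        f"Trigger: {report.trigger}",
        f"Units in scope: {report.units_in_scope_count}",
        "",
        "| Class | Events |",
        "|---|---|",
    ]
    for cls, count in sorted(report.signals_by_class.items()):
        lines.append(f"| {cls} | {count} |")
    lines.append("")
    for finding in report.high_priority_findings:
        action = report.recommended_actions.get(finding, "(none recorded)")
        lines.append(f"- {finding}: {action}")
    return "\n".join(lines)
```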
4.5 Detect → Review → Action Lifecycle (Q25; criterion 7, 8, 9)
Detect: signals + scores cross threshold
→ Evidence bundle: collect raw signals, units, thread refs, retrieval samples, edits
→ AI proposal: recommend one of {Split, Merge, Edge, Thread, Context-Pack, NoAction}
→ Independent review (Đ37): PASS / FAIL / NEEDS_HUMAN
→ On PASS: apply chosen action with rollback key
→ Verify: round-trip (axis-1) + axis-2 coverage check + signal recheck
→ Report: PG-persisted; mirrored to KB
Auto-action is forbidden. Even low-risk findings require review. (rev5d P7: prefer graph enrichment before structural change; criterion 8.)
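The review gate in this lifecycle can be sketched as below. The three review outcomes and six actions come from the text; the function name, the returned step labels, and the handling of FAIL and NEEDS_HUMAN are assumptions, since this section only specifies what happens on PASS.

```python
from enum import Enum
from typing import Optional


class ReviewOutcome(Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    NEEDS_HUMAN = "NEEDS_HUMAN"


ACTIONS = frozenset(
    {"Split", "Merge", "Edge", "Thread", "Context-Pack", "NoAction"}
)


def next_step(proposed_action: str, review: Optional[ReviewOutcome]) -> str:
    """Decide what follows an AI proposal; nothing is applied without review."""
    if proposed_action not in ACTIONS:
        raise ValueError(f"unknown action: {proposed_action}")
    if review is None:
        # No review outcome means no action, even for low-risk findings.
        raise PermissionError("auto-action is forbidden: review is required")
    if review is ReviewOutcome.PASS:
        return "apply_with_rollback_key"
    if review is ReviewOutcome.NEEDS_HUMAN:
        return "route_to_human"      # assumed handling, not specified here
    return "reject_and_report"       # assumed handling for FAIL
```

Note that `NoAction` passes through the same gate as every other action, consistent with the rule that it is reviewed rather than silently chosen.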
4.6 Action Decision Matrix (Q15, Q16, Q17)
| Situation | Recommended action |
|---|---|
| Two units always travel together AND not independently meaningful | Merge (with full lifecycle: superseded, aliases, redirects, edge reassignment) |
| One unit grew internally heterogeneous (C1A 3-question test now fails) | Split |
| Units are independently meaningful but often travel together | Edge (universal_edges, Đ39) or Context-Pack (retrieval-side bundling) |
| Units share thread relevance but not direct coupling | Thread (membership in semantic thread, D9) |
| Observed signal is noise / unit health is acceptable | NoAction |
The decision is recorded with its rationale and full evidence bundle, regardless of choice. NoAction is a first-class outcome, not a way of skipping the workflow.
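The matrix can be read as a function from observed conditions to candidate actions. The sketch below is one possible encoding; the parameter names and the order in which rows are checked are assumptions, and the output is only a recommendation that still goes through review.

```python
def recommend_actions(*, travel_together: bool,
                      independently_meaningful: bool,
                      internally_heterogeneous: bool,
                      shares_thread_relevance: bool) -> list:
    """Map the situations in the decision matrix to candidate actions."""
    if internally_heterogeneous:
        # The unit now fails the C1A 3-question test.
        return ["Split"]
    if travel_together and not independently_meaningful:
        return ["Merge"]
    if travel_together and independently_meaningful:
        # Graph or retrieval-side enrichment; both units survive.
        return ["Edge", "Context-Pack"]
    if shares_thread_relevance:
        return ["Thread"]
    return ["NoAction"]
```

For example, two units that are often retrieved together but each stand on their own yield `["Edge", "Context-Pack"]`, never `["Merge"]`.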
4.7 Split Lifecycle (Q15; criterion 7)
Input: failing unit with evidence bundle
→ Propose new unit set (with new manifest fragment per D2)
→ REVIEW (D2 checklist applies)
→ Old unit marked superseded; new units created
→ History preserved; spans/roles migrated; edges reassigned per universal_edges
→ Aliases/redirects from old → new
→ VERIFY: axis-1 round-trip + axis-2 coverage + signal recheck
→ Report
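The supersede, alias, and edge-reassignment steps of a split can be sketched in memory as follows. This is a simplified illustration under stated assumptions: edges are modeled as `(source, target, type)` tuples, `heir_for_edge` is a hypothetical reviewer-approved mapping from each affected edge to the new unit that inherits it, and none of the names reflect the actual `universal_edges` schema.

```python
from dataclasses import dataclass


@dataclass
class SplitResult:
    superseded: str   # old unit id, now marked superseded
    aliases: dict     # old unit id -> list of new unit ids (redirects)
    edges: list       # full edge list after reassignment


def apply_split(old_id, new_ids, edges, heir_for_edge):
    """Reassign every edge touching old_id to an explicitly chosen new unit."""
    if len(new_ids) < 2:
        raise ValueError("a split must produce at least two units")
    new_edges = []
    for edge in edges:
        src, dst, kind = edge
        if old_id not in (src, dst):
            new_edges.append(edge)   # unrelated edge, kept unchanged
            continue
        heir = heir_for_edge.get(edge)
        if heir not in new_ids:
            # Refuse to guess: every affected edge needs a reviewed heir.
            raise ValueError(f"edge {edge} has no valid heir")
        new_edges.append((
            heir if src == old_id else src,
            heir if dst == old_id else dst,
            kind,
        ))
    return SplitResult(
        superseded=old_id,
        aliases={old_id: list(new_ids)},
        edges=new_edges,
    )
```

Failing on an unmapped edge, rather than picking a default heir, keeps the reassignment auditable.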
4.8 Merge Lifecycle (Q16; criterion 7, P7)
Input: two units with strong, justified coupling
→ Pre-condition: evidence shows units are NOT independently meaningful (P7 guardrail)
→ If P7 doubt remains → recommend Edge/Thread/Context-Pack first
→ Propose new merged unit
→ REVIEW (D2 checklist; canonical parent uniqueness)
→ Old units superseded; new unit created
→ Aliases/redirects; edges reassigned
→ VERIFY + Report
Merge-by-coupling-alone is forbidden. P7 requires evidence that units are not independently meaningful.
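The P7 guardrail amounts to a three-valued check on the evidence. In the sketch below, `independently_meaningful` is `False` only when the evidence bundle positively shows the units are not independently meaningful, and `None` represents remaining doubt; the function name and return labels are hypothetical.

```python
from typing import Optional


def merge_precondition(strong_coupling: bool,
                       independently_meaningful: Optional[bool]) -> str:
    """Apply the P7 guardrail before a merge may be proposed."""
    if not strong_coupling:
        return "not_a_merge_candidate"
    if independently_meaningful is False:
        # Positive evidence that the units cannot stand alone.
        return "propose_merge"
    # True or None (doubt): coupling alone never justifies a merge.
    return "prefer_edge_thread_or_context_pack"
```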
4.9 Edge / Thread / Context-Pack (Q17; criterion 8)
Preferred over structural change when units remain independently meaningful:
- Edge (Đ39 `universal_edges`): typed relation captures the coupling; both units survive.
- Thread (D9 membership): if coupling is part of a domain axis spanning multiple units across documents.
- Context-Pack (D11): retrieval-side bundling; both units always returned together when queried.
The action choice is reviewed; once applied, signals are re-evaluated post-action.
4.10 NoAction
A NoAction outcome is fully documented:
- Why the signal does not warrant change.
- What threshold would change the decision in a future cycle.
- An entry in Decision Backlog Registry (D5) with `next_review_date`.
NoAction is explicit, not silent dismissal.
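One way to enforce that a NoAction outcome is never silent is to make the record itself reject missing documentation. The sketch below assumes a hypothetical `NoActionRecord` shape; only `next_review_date` is a name taken from the text.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class NoActionRecord:
    finding_id: str
    rationale: str            # why the signal does not warrant change
    revisit_threshold: str    # what would change the decision next cycle
    next_review_date: date    # carried into the D5 backlog entry

    def __post_init__(self):
        # An undocumented NoAction would be a silent dismissal.
        if not self.rationale.strip():
            raise ValueError("NoAction requires a rationale")
        if not self.revisit_threshold.strip():
            raise ValueError("NoAction requires a revisit threshold")
```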
4.11 Verification After F2 Actions
For any structural change (Split/Merge): axis-1 round-trip MUST re-pass against the canonical source representation; affected publications must re-render to 0 drift. For edge/thread/context-pack: axis-2 coverage check + signal recheck.
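The verification requirements can be summarized as a lookup from action to required checks. The check labels are hypothetical names for the checks described in this section and in the Split lifecycle (§4.7); what NoAction requires at this stage is not stated in the text, so the sketch returns an empty list for it as an assumption.

```python
STRUCTURAL_ACTIONS = frozenset({"Split", "Merge"})
ENRICHMENT_ACTIONS = frozenset({"Edge", "Thread", "Context-Pack"})


def required_checks(action: str) -> list:
    """Return the verification checks that must pass after an F2 action."""
    if action in STRUCTURAL_ACTIONS:
        return [
            "axis1_round_trip",                 # against the canonical source
            "publication_rerender_zero_drift",  # affected publications
            "axis2_coverage",
            "signal_recheck",
        ]
    if action in ENRICHMENT_ACTIONS:
        return ["axis2_coverage", "signal_recheck"]
    if action == "NoAction":
        return []  # assumption: nothing was changed, so nothing to verify
    raise ValueError(f"unknown action: {action}")
```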
4.12 Reporting
Every F2 cycle ends with a Report (PG-persisted, KB mirror). Reports carry rollback keys; rollback for F2 follows the D1 §4.8 model.
4.13 Routing of Signals (Q43)
All signals route to:
- Segmentation Health Report (this deliverable).
- Decision Backlog Registry (D5) for governance.
- Đ37 escalation queue for high-risk findings (Đ32).
No new notification system (criterion 38).
4.14 Thread-Side Signals Consumed (Q41, Q42)
D3 consumes thread signals from D9 (missing_link, wrong_link, stale, overbroad, too_narrow, expected_artifact_missing) and retrieval signals from D11 (noisy_thread, wrong_audience_result, weak_thread). These contribute to the unit/thread health scoring.
5. PG Storage per Object (Design Intent — No DDL)
| Object | Target DB | Layer | Notes |
|---|---|---|---|
| `health_signal_event` | directus | Não | Per-event raw record |
| `unit_health_score` | directus | Não | Per-unit aggregate (view or table) |
| `thread_health_score` | directus | Não | Per-thread aggregate |
| `coupling_score` | directus | Não | Pair-level aggregate |
| `segmentation_health_report` | directus | Kho | Persisted reports |
| `f2_action_decision` | directus | Kho | Action chosen + rationale + evidence |
| `evidence_bundle` (F2) | directus | Não | JSONB envelope |
6. Schema Gaps
- `health_signal_event` table: no current capability.
- Aggregate views (`unit_health_score`, `thread_health_score`, `coupling_score`): definitions and refresh policy.
- `segmentation_health_report`: persistence schema.
- `f2_action_decision`: distinct from F1 manifest; may share envelope shape.
- Co-edit / co-citation / co-retrieval instrumentation: `requires_instrumentation`.
- Edge reassignment audit trail: required by Split/Merge lifecycles.
- Alias / redirect table: for superseded units; may exist in TAC, verify.
- Threshold policy table: per-signal and per-aggregate thresholds; current policy storage unclear.
7. Law References
| Surface | Law |
|---|---|
| Segmentation rules (post-action verify) | C1A |
| Risk gating | Đ32 |
| Roles / escalation | Đ37 |
| Universal edges authority | Đ39 |
| Vocabulary | Đ24 |
| Manifest-as-code for action decisions | Đ38 |
8. Open Questions
- Threshold values for signal aggregates — defer to policy (D5).
- How co-edit signals are captured given the current CDC infrastructure — defer to D4 capability intake.
- Should `coupling_score` blend retrieval and edge signals with fixed weights or learned weights? Recommendation: fixed weights v0.1, learned later via capability intake.
- Cadence defaults (N, Y, M) for Segmentation Health Report: policy decision.
9. Coverage
Questions covered (primary): Q15, Q16, Q17, Q22, Q23, Q24, Q25.
Questions covered (secondary): Q41, Q42, Q43, Q44.
Acceptance criteria covered:
- 7 (split/merge lifecycle)
- 8 (edge/context-pack/no-action, not auto-merge)
- 9 (post-cut usage review)
- 10 (Segmentation Health Report)
- 11 (signal classification)
- 35 (missing/wrong link detection — supporting D9)
- 36 (thread split/merge — supporting D9)
- 38 (no parallel notification — supporting)
Schema gaps: 8 named (see §6).
Law dependencies: C1A, Đ24, Đ32, Đ37, Đ38, Đ39.
Open questions: 4 (see §8).
Law conflicts encountered: none. P7 guardrail enforced (no merge-by-coupling-alone).