dot-iu-cutter v0.1 — Segmentation Health & Usage Feedback Design (D3)

Date: 2026-05-15
Status: DESIGN DRAFT
Baseline: rev5d §7.E, §7.G, §12.8
Scope: DESIGN ONLY.

1. Purpose

Define how F2 (Health / Correction) operates: how the system observes post-cut usage, detects mis-segmentations, classifies signals, runs the Segmentation Health Report, and selects among Split / Merge / Edge / Thread / Context-Pack / NoAction. Each of these is a governed lifecycle action, never an automatic structural change.

2. Scope

Health signals catalog and classification
Segmentation Health Report cadence and content
Detect → Review → Action → Verify → Report lifecycle for F2
Split / Merge / Edge / Thread / Context-Pack / NoAction decision matrix
Evidence bundles per action

Out of scope: thread object lifecycle (D9); retrieval-side signals (D11); Decision Backlog Registry mechanics (D5).

3. Dependencies

rev5d §7.E, §7.G, §12.8
D1 (operational state machine), D2 (manifest contract), D9 (thread signals consumed)
C1A (segmentation rules remain binding)
Đ32 (high-risk approval), Đ37 (escalation), Đ39 (universal_edges first), Đ24 (vocabulary)

4. Key Decisions

4.1 Flow 2 Backbone

Observe → Detect → Review → {Split | Merge | Edge | Thread | Context-Pack | NoAction} → Verify → Report

F2 uses the same state machine vocabulary as F1; only the entry trigger differs (Health Report event / complaint / signal threshold).

4.2 Signal Catalog (Q22; criterion 9, 11)

Signal	Description	Classification
`co_citation`	Two units repeatedly cited together	requires_instrumentation
`co_edit`	Two units edited together in same change-set	requires_instrumentation
`co_retrieval`	Two units retrieved together in same query	requires_instrumentation
`edge_density_overlap`	Units share many edges → semantic overlap	available_now (after Đ39 hooks land)
`context_pack_dependency`	One unit appears only as supporting in another's pack	requires_instrumentation
`orphan_or_underused_unit`	Unit with no edges and no retrieval hits	requires_instrumentation
`misclassification_signal`	section_type/unit_kind mismatches observed behavior	available_now
`expected_artifact_missing`	From D9 thread expected chain	available_now hook; data requires_instrumentation
`noisy_retrieval`	From D11 retrieval feedback	requires_instrumentation
`wrong_link`	From D9 link rejection	available_now
`length_drift`	Unit length grew beyond C1A NL band	available_now
`overlap_growth`	Span overlap with sibling appeared after edit	available_now
`user_complaint`	Human reported confusing/wrong unit	available_now
`user_ai_disagreement`	From D9 §4.12	available_now
`wrong_audience_result`	From D11; security event	available_now (security path)

Classification rule (criterion 11): every signal carries one of available_now / requires_instrumentation / future_capability. v0.1 ships the catalog and hooks; collection completeness depends on instrumentation.
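The classification rule can be sketched as a small typed catalog. This is an illustrative sketch only: the names `SignalClass`, `SignalDef`, `CATALOG`, and `collectable_now` are hypothetical and not part of the design, and only four catalog rows are reproduced.

```python
from dataclasses import dataclass
from enum import Enum


class SignalClass(Enum):
    """The three classifications named by criterion 11."""
    AVAILABLE_NOW = "available_now"
    REQUIRES_INSTRUMENTATION = "requires_instrumentation"
    FUTURE_CAPABILITY = "future_capability"


@dataclass(frozen=True)
class SignalDef:
    # Hypothetical record shape; the design does not fix one.
    name: str
    classification: SignalClass


# A few rows copied from the catalog above; the rest follow the same shape.
CATALOG = {
    s.name: s
    for s in (
        SignalDef("co_citation", SignalClass.REQUIRES_INSTRUMENTATION),
        SignalDef("length_drift", SignalClass.AVAILABLE_NOW),
        SignalDef("user_complaint", SignalClass.AVAILABLE_NOW),
        SignalDef("noisy_retrieval", SignalClass.REQUIRES_INSTRUMENTATION),
    )
}


def collectable_now(catalog):
    """Names of signals whose data needs no new instrumentation."""
    return sorted(
        name
        for name, sig in catalog.items()
        if sig.classification is SignalClass.AVAILABLE_NOW
    )
```

Making the classification a required field means a signal cannot enter the catalog unclassified, which is the property criterion 11 asks for.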

4.3 Signal Aggregation

Signals are PG-persisted events. The system computes per-unit and per-thread aggregate scores:

Aggregate	Use
`unit_health_score`	Composite from signals targeting this unit
`thread_health_score`	Composite from signals targeting threads containing this unit
`coupling_score`	Edge-density + co-citation + co-retrieval blend

Score formulas are policy; v0.1 defines the hooks and routes scores to Decision Backlog (D5) for tuning.
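As one possible shape for the `coupling_score` hook, the sketch below blends the three inputs with fixed weights. The weights, the assumption that inputs are normalized to [0, 1], and the function signature are all placeholders; the actual formula is a policy decision routed through D5.

```python
def coupling_score(edge_density, co_citation, co_retrieval,
                   weights=(0.5, 0.25, 0.25)):
    """Fixed-weight blend of three inputs, each assumed normalized to [0, 1].

    The default weights are illustrative placeholders, not policy values.
    """
    inputs = (edge_density, co_citation, co_retrieval)
    if any(not 0.0 <= x <= 1.0 for x in inputs):
        raise ValueError("inputs must be normalized to [0, 1]")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * x for w, x in zip(weights, inputs))
```

Keeping the weights as a parameter leaves room for policy tuning without changing the hook itself.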

4.4 Segmentation Health Report (Q24; criterion 10)

Trigger conditions (any one fires a report):

N new units since last report (N is set by policy; the default is a placeholder).
Y days since last report.
M events of any health signal type.
Strong coupling detected (coupling_score > threshold).
User complaint received.
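The "any one fires" rule above can be sketched as follows. All policy values are placeholders, the type and field names are hypothetical, and "M events" is assumed here to mean a total count across signal types.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ReportPolicy:
    # Placeholder values; N, Y, M and the threshold are policy decisions.
    new_units_n: int = 50
    max_days_y: int = 30
    signal_events_m: int = 100
    coupling_threshold: float = 0.8


@dataclass
class WindowStats:
    # Hypothetical summary of activity since the last report.
    new_units: int
    last_report_at: datetime
    signal_events: int  # assumed: total across all signal types
    max_coupling_score: float
    complaints: int


def fired_triggers(stats, policy, now):
    """Return the names of all trigger conditions that currently hold."""
    fired = []
    if stats.new_units >= policy.new_units_n:
        fired.append("new_units")
    if now - stats.last_report_at >= timedelta(days=policy.max_days_y):
        fired.append("elapsed_days")
    if stats.signal_events >= policy.signal_events_m:
        fired.append("signal_events")
    if stats.max_coupling_score > policy.coupling_threshold:
        fired.append("strong_coupling")
    if stats.complaints > 0:
        fired.append("user_complaint")
    return fired
```

A report is due whenever the returned list is non-empty; the list itself can populate the report's `trigger` field.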

Report content:

report_id
generated_at
window_start / window_end
trigger
units_in_scope (count + sample)
signals_by_class (table)
high_priority_findings (list)
recommended_actions (per finding)
escalations_routed (to Đ37 / Đ32)
related_decision_backlog_entries (refs to D5)

The report is PG-persisted, with KB markdown mirror for human reading.
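For illustration, the report content listed above could be carried as a single record and serialized to JSON before persistence. The field names come from the list; the Python types, the class name, and the sample values are assumptions, since this design defines no schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SegmentationHealthReport:
    report_id: str
    generated_at: str   # assumed ISO 8601 timestamp
    window_start: str
    window_end: str
    trigger: str
    units_in_scope: dict      # assumed shape: {"count": int, "sample": [ids]}
    signals_by_class: dict    # assumed shape: classification -> event count
    high_priority_findings: list = field(default_factory=list)
    recommended_actions: dict = field(default_factory=dict)  # finding -> action
    escalations_routed: list = field(default_factory=list)
    related_decision_backlog_entries: list = field(default_factory=list)


# Sample values are invented for the example.
report = SegmentationHealthReport(
    report_id="shr-0001",
    generated_at="2026-05-15T09:00:00Z",
    window_start="2026-05-01T00:00:00Z",
    window_end="2026-05-15T00:00:00Z",
    trigger="user_complaint",
    units_in_scope={"count": 2, "sample": ["u1", "u2"]},
    signals_by_class={"available_now": 3, "requires_instrumentation": 0},
)
payload = json.dumps(asdict(report), sort_keys=True)
```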

4.5 Detect → Review → Action Lifecycle (Q25; criterion 7, 8, 9)

Detect: signals + scores cross threshold
  → Evidence bundle: collect raw signals, units, thread refs, retrieval samples, edits
  → AI proposal: recommend one of {Split, Merge, Edge, Thread, Context-Pack, NoAction}
  → Independent review (Đ37): PASS / FAIL / NEEDS_HUMAN
  → On PASS: apply chosen action with rollback key
  → Verify: round-trip (axis-1) + axis-2 coverage check + signal recheck
  → Report: PG-persisted; mirrored to KB

Auto-action is forbidden. Even low-risk findings require review. (rev5d P7: prefer graph enrichment before structural change; criterion 8.)
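The review gate can be sketched as a guard that refuses to apply anything without a PASS verdict, whatever the risk level. The `Proposal` shape, the `ReviewRequired` exception, and the rollback key format are hypothetical.

```python
import uuid
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Verdict(Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    NEEDS_HUMAN = "NEEDS_HUMAN"


class ReviewRequired(Exception):
    """Raised when an action is attempted without a PASS verdict."""


@dataclass
class Proposal:
    action: str   # one of Split, Merge, Edge, Thread, Context-Pack, NoAction
    risk: str     # hypothetical label, e.g. "low" or "high"
    verdict: Optional[Verdict] = None


def apply_action(proposal):
    """Apply a reviewed proposal and return a rollback key.

    Risk level is deliberately ignored: low-risk findings need review too.
    """
    if proposal.verdict is not Verdict.PASS:
        raise ReviewRequired(f"{proposal.action} has no PASS verdict")
    # Placeholder key format; the real rollback model is defined in D1.
    return f"rb-{uuid.uuid4()}"
```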

4.6 Action Decision Matrix (Q15, Q16, Q17)

Situation	Recommended action
Two units always travel together AND not independently meaningful	Merge (with full lifecycle: superseded, aliases, redirects, edge reassignment)
One unit grew internally heterogeneous (C1A 3-question test now fails)	Split
Units are independently meaningful but often travel together	Edge (universal_edges, Đ39) or Context-Pack (retrieval-side bundling)
Units share thread relevance but not direct coupling	Thread (membership in semantic thread, D9)
Observed signal is noise / unit health is acceptable	NoAction

The decision is recorded with rationale and the full evidence bundle, regardless of choice. NoAction is a first-class outcome, not a way of skipping the workflow.
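The matrix can be read as a rule function that yields a recommendation for the AI proposal step. The `Finding` fields and the order in which rows are checked are assumptions made for the sketch; the output is a recommendation that still goes through review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    # Hypothetical boolean summary of the evidence bundle.
    travel_together: bool
    independently_meaningful: bool
    internally_heterogeneous: bool   # C1A 3-question test now fails
    shared_thread_relevance: bool


def recommend(finding):
    """Return the recommended action(s) for a finding, per the matrix."""
    if finding.internally_heterogeneous:
        return ["Split"]
    if finding.travel_together and not finding.independently_meaningful:
        return ["Merge"]
    if finding.travel_together and finding.independently_meaningful:
        # Graph enrichment is preferred over structural change.
        return ["Edge", "Context-Pack"]
    if finding.shared_thread_relevance:
        return ["Thread"]
    return ["NoAction"]
```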

4.7 Split Lifecycle (Q15; criterion 7)

Input: failing unit with evidence bundle
  → Propose new unit set (with new manifest fragment per D2)
  → REVIEW (D2 checklist applies)
  → Old unit marked superseded; new units created
  → History preserved; spans/roles migrated; edges reassigned per universal_edges
  → Aliases/redirects from old → new
  → VERIFY: axis-1 round-trip + axis-2 coverage + signal recheck
  → Report
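The alias/redirect step implies that a reference to a superseded unit must still resolve to live units, possibly through several generations of splits. A minimal sketch, assuming a hypothetical in-memory redirect map from an old unit id to the ids that replaced it:

```python
def resolve(unit_id, redirects):
    """Follow redirects from a unit id to the live units that replaced it.

    `redirects` maps a superseded unit id to a list of successor ids.
    A unit with no entry is treated as live. Cycles are tolerated.
    """
    live, seen, stack = set(), set(), [unit_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        successors = redirects.get(current)
        if successors:
            stack.extend(successors)
        else:
            live.add(current)
    return sorted(live)
```

A merge fits the same shape: both old units redirect to the single merged unit, so `resolve` returns one id for either of them.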

4.8 Merge Lifecycle (Q16; criterion 7, P7)

Input: two units with strong, justified coupling
  → Pre-condition: evidence shows units are NOT independently meaningful (P7 guardrail)
  → If P7 doubt remains → recommend Edge/Thread/Context-Pack first
  → Propose new merged unit
  → REVIEW (D2 checklist; canonical parent uniqueness)
  → Old units superseded; new unit created
  → Aliases/redirects; edges reassigned
  → VERIFY + Report

Merge-by-coupling-alone is forbidden. P7 requires evidence that units are not independently meaningful.

4.9 Edge / Thread / Context-Pack (Q17; criterion 8)

Preferred over structural change when units remain independently meaningful:

Edge (Đ39 universal_edges): typed relation captures the coupling; both units survive.
Thread (D9 membership): if coupling is part of a domain axis spanning multiple units across documents.
Context-Pack (D11): retrieval-side bundling; both units are always returned together when queried.

The action choice is reviewed; once applied, signals are re-evaluated post-action.

4.10 NoAction

A NoAction outcome is fully documented:

Why the signal does not warrant change.
What threshold would change the decision in a future cycle.
An entry in Decision Backlog Registry (D5) with next_review_date.

NoAction is explicit, not silent dismissal.
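One way to keep NoAction from becoming a silent dismissal is to make its three required elements mandatory on the record. The class and field names other than `next_review_date` are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class NoActionRecord:
    finding_id: str
    rationale: str           # why the signal does not warrant change
    revisit_threshold: str   # what would change the decision next cycle
    next_review_date: date   # carried into the Decision Backlog entry (D5)

    def __post_init__(self):
        # Reject records that leave the justification blank.
        if not self.rationale.strip():
            raise ValueError("NoAction requires a rationale")
        if not self.revisit_threshold.strip():
            raise ValueError("NoAction requires a revisit threshold")
```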

4.11 Verification After F2 Actions

For any structural change (Split/Merge): axis-1 round-trip MUST re-pass against the canonical source representation; affected publications must re-render to 0 drift. For edge/thread/context-pack: axis-2 coverage check + signal recheck.
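The rule above amounts to a lookup from action type to required checks. The check identifiers are hypothetical labels, and returning an empty list for NoAction is an assumption, since this section does not state what verification a NoAction outcome needs.

```python
STRUCTURAL = {"Split", "Merge"}
ENRICHMENT = {"Edge", "Thread", "Context-Pack"}


def required_checks(action):
    """Return the verification checks an applied F2 action must pass."""
    if action in STRUCTURAL:
        return ["axis1_round_trip", "publication_rerender_zero_drift"]
    if action in ENRICHMENT:
        return ["axis2_coverage", "signal_recheck"]
    if action == "NoAction":
        return []  # assumption: nothing was changed, so nothing to verify
    raise ValueError(f"unknown action: {action}")
```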

4.12 Reporting

Every F2 cycle ends with a Report (PG-persisted, KB mirror). Reports carry rollback keys; rollback for F2 follows the D1 §4.8 model.

4.13 Routing of Signals (Q43)

All signals route to:

Segmentation Health Report (this deliverable).
Decision Backlog Registry (D5) for governance.
Đ37 escalation queue for high-risk findings (Đ32).

No new notification system (criterion 38).

4.14 Thread-Side Signals Consumed (Q41, Q42)

D3 consumes thread signals from D9 (missing_link, wrong_link, stale, overbroad, too_narrow, expected_artifact_missing) and retrieval signals from D11 (noisy_thread, wrong_audience_result, weak_thread). These contribute to the unit/thread health scoring.

5. PG Storage per Object (Design Intent — No DDL)

Object	Target DB	Layer	Notes
`health_signal_event`	directus	Não	Per-event raw record
`unit_health_score`	directus	Não	Per-unit aggregate (view or table)
`thread_health_score`	directus	Não	Per-thread aggregate
`coupling_score`	directus	Não	Pair-level aggregate
`segmentation_health_report`	directus	Kho	Persisted reports
`f2_action_decision`	directus	Kho	Action chosen + rationale + evidence
`evidence_bundle` (F2)	directus	Não	JSONB envelope

6. Schema Gaps

health_signal_event table — no current capability.
Aggregate views (unit_health_score, thread_health_score, coupling_score) — definitions and refresh policy.
segmentation_health_report — persistence schema.
f2_action_decision — distinct from F1 manifest; may share envelope shape.
Co-edit / co-citation / co-retrieval instrumentation — requires_instrumentation.
Edge reassignment audit trail — required by Split/Merge lifecycles.
Alias / redirect table — for superseded units; may already exist in TAC (to be verified).
Threshold policy table — per-signal and per-aggregate thresholds; current policy storage unclear.

7. Law References

Surface	Law
Segmentation rules (post-action verify)	C1A
Risk gating	Đ32
Roles / escalation	Đ37
Universal edges authority	Đ39
Vocabulary	Đ24
Manifest-as-code for action decisions	Đ38

8. Open Questions

Threshold values for signal aggregates — defer to policy (D5).
How co-edit signals are captured given the current CDC infrastructure — defer to D4 capability intake.
Should coupling_score blend retrieval and edge signals with fixed weights or learned weights? Recommendation: fixed weights v0.1, learned later via capability intake.
Cadence defaults (N, Y, M) for Segmentation Health Report — policy decision.

9. Coverage

Questions covered (primary): Q15, Q16, Q17, Q22, Q23, Q24, Q25.
Questions covered (secondary): Q41, Q42, Q43, Q44.

Acceptance criteria covered:

7 (split/merge lifecycle)
8 (edge/context-pack/no-action, not auto-merge)
9 (post-cut usage review)
10 (Segmentation Health Report)
11 (signal classification)
35 (missing/wrong link detection — supporting D9)
36 (thread split/merge — supporting D9)
38 (no parallel notification — supporting)

Schema gaps: 8 named (see §6).

Law dependencies: C1A, Đ24, Đ32, Đ37, Đ38, Đ39.

Open questions: 4 (see §8).

Law conflicts encountered: none. P7 guardrail enforced (no merge-by-coupling-alone).