KB-5609

dot-iu-cutter v0.1 — Segmentation Health and Usage Feedback Design

Revision 1
Tags: dot-iu-cutter, design, segmentation-health, usage-feedback, rev5d

dot-iu-cutter v0.1 — Segmentation Health & Usage Feedback Design (D3)

Date: 2026-05-15
Status: DESIGN DRAFT
Baseline: rev5d §7.E, §7.G, §12.8
Scope: DESIGN ONLY.


1. Purpose

Define how F2 (Health / Correction) operates: how the system observes post-cut usage, detects mis-segmentations, classifies signals, runs the Segmentation Health Report, and selects between Split / Merge / Edge / Thread / Context-Pack / NoAction — each as a governed lifecycle action, never an automatic structural change.

2. Scope

  • Health signals catalog and classification
  • Segmentation Health Report cadence and content
  • Detect → Review → Action → Verify → Report lifecycle for F2
  • Split / Merge / Edge / Thread / Context-Pack / NoAction decision matrix
  • Evidence bundles per action

Out of scope: thread object lifecycle (D9); retrieval-side signals (D11); Decision Backlog Registry mechanics (D5).

3. Dependencies

  • rev5d §7.E, §7.G, §12.8
  • D1 (operational state machine), D2 (manifest contract), D9 (thread signals consumed)
  • C1A (segmentation rules remain binding)
  • Đ32 (high-risk approval), Đ37 (escalation), Đ39 (universal_edges first), Đ24 (vocabulary)

4. Key Decisions

4.1 Flow 2 Backbone

Observe → Detect → Review → {Split | Merge | Edge | Thread | Context-Pack | NoAction} → Verify → Report

F2 uses the same state machine vocabulary as F1; only the entry trigger differs (a Health Report event, a complaint, or a signal threshold).

4.2 Signal Catalog (Q22; criterion 9, 11)

| Signal | Description | Classification |
| --- | --- | --- |
| co_citation | Two units repeatedly cited together | requires_instrumentation |
| co_edit | Two units edited together in same change-set | requires_instrumentation |
| co_retrieval | Two units retrieved together in same query | requires_instrumentation |
| edge_density_overlap | Units share many edges → semantic overlap | available_now (after Đ39 hooks land) |
| context_pack_dependency | One unit appears only as supporting in another's pack | requires_instrumentation |
| orphan_or_underused_unit | Unit with no edges and no retrieval hits | requires_instrumentation |
| misclassification_signal | section_type/unit_kind mismatches observed behavior | available_now |
| expected_artifact_missing | From D9 thread expected chain | available_now hook; data requires_instrumentation |
| noisy_retrieval | From D11 retrieval feedback | requires_instrumentation |
| wrong_link | From D9 link rejection | available_now |
| length_drift | Unit length grew beyond C1A NL band | available_now |
| overlap_growth | Span overlap with sibling appeared after edit | available_now |
| user_complaint | Human reported confusing/wrong unit | available_now |
| user_ai_disagreement | From D9 §4.12 | available_now |
| wrong_audience_result | From D11; security event | available_now (security path) |

Classification rule (criterion 11): every signal carries one of available_now / requires_instrumentation / future_capability. v0.1 ships the catalog and hooks; collection completeness depends on instrumentation.
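
The classification rule can be sketched as a small lookup, shown here in Python purely for illustration. The names `Classification`, `SIGNAL_CATALOG`, and `classify` are hypothetical. The catalog qualifies two entries (`edge_density_overlap` and `expected_artifact_missing`); collapsing each to a single class is a simplifying assumption of this sketch, not a decision taken by the design.

```python
from enum import Enum


class Classification(Enum):
    AVAILABLE_NOW = "available_now"
    REQUIRES_INSTRUMENTATION = "requires_instrumentation"
    FUTURE_CAPABILITY = "future_capability"


A = Classification.AVAILABLE_NOW
R = Classification.REQUIRES_INSTRUMENTATION

# Signal names and classes are taken from the catalog table.
SIGNAL_CATALOG = {
    "co_citation": R,
    "co_edit": R,
    "co_retrieval": R,
    "edge_density_overlap": A,       # assumption: treated as available once the hooks land
    "context_pack_dependency": R,
    "orphan_or_underused_unit": R,
    "misclassification_signal": A,
    "expected_artifact_missing": R,  # assumption: hook exists, data still needs instrumentation
    "noisy_retrieval": R,
    "wrong_link": A,
    "length_drift": A,
    "overlap_growth": A,
    "user_complaint": A,
    "user_ai_disagreement": A,
    "wrong_audience_result": A,      # handled on the security path
}


def classify(signal_type: str) -> Classification:
    """Return the class of a cataloged signal; reject anything uncataloged."""
    try:
        return SIGNAL_CATALOG[signal_type]
    except KeyError:
        raise ValueError(f"unknown signal type: {signal_type!r}") from None
```

Rejecting unknown signal types keeps the invariant that every signal carries exactly one classification.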

4.3 Signal Aggregation

Signals are PG-persisted events. The system computes per-unit and per-thread aggregate scores:

| Aggregate | Use |
| --- | --- |
| unit_health_score | Composite from signals targeting this unit |
| thread_health_score | Composite from signals targeting threads containing this unit |
| coupling_score | Edge-density + co-citation + co-retrieval blend |

Score formulas are policy; v0.1 defines the hooks and routes scores to Decision Backlog (D5) for tuning.
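
As an illustration of what a `coupling_score` hook might compute, the sketch below blends the three inputs with fixed weights. The weight values, the `CouplingWeights` type, and the requirement that inputs are pre-normalized to [0, 1] are all assumptions; the actual formula remains a policy matter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CouplingWeights:
    # Placeholder weights, not policy values.
    edge_density: float = 0.4
    co_citation: float = 0.3
    co_retrieval: float = 0.3


def coupling_score(edge_density: float, co_citation: float, co_retrieval: float,
                   weights: CouplingWeights = CouplingWeights()) -> float:
    """Weighted mean of three inputs, each assumed normalized to [0, 1]."""
    inputs = {
        "edge_density": edge_density,
        "co_citation": co_citation,
        "co_retrieval": co_retrieval,
    }
    for name, value in inputs.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    total = weights.edge_density + weights.co_citation + weights.co_retrieval
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    blended = (weights.edge_density * edge_density
               + weights.co_citation * co_citation
               + weights.co_retrieval * co_retrieval)
    # Dividing by the weight total keeps the score in [0, 1] for any weights.
    return blended / total
```

Normalizing by the weight total means a later policy change to the weights does not shift the score range, so thresholds stay comparable.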

4.4 Segmentation Health Report (Q24; criterion 10)

Trigger conditions (any one fires a report):

  • N new units since the last report (N is set by policy; the default is a placeholder).
  • Y days since last report.
  • M events of any health signal type.
  • Strong coupling detected (coupling_score > threshold).
  • User complaint received.
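
The "any one fires" rule above can be sketched as follows. `TriggerPolicy`, `WindowStats`, and `fired_triggers` are hypothetical names, and reading "M events" as a total across all signal types is an assumption of this sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class TriggerPolicy:
    new_units_n: int           # N
    days_y: int                # Y
    signal_events_m: int       # M
    coupling_threshold: float


@dataclass(frozen=True)
class WindowStats:
    new_units: int
    signal_events: int         # assumption: counted across all signal types
    max_coupling_score: float
    user_complaints: int


def fired_triggers(stats: WindowStats, policy: TriggerPolicy,
                   last_report_at: datetime, now: datetime) -> list[str]:
    """Return the names of every trigger that fires; a non-empty list means 'report'."""
    fired = []
    if stats.new_units >= policy.new_units_n:
        fired.append("new_units")
    if now - last_report_at >= timedelta(days=policy.days_y):
        fired.append("elapsed_days")
    if stats.signal_events >= policy.signal_events_m:
        fired.append("signal_events")
    if stats.max_coupling_score > policy.coupling_threshold:
        fired.append("strong_coupling")
    if stats.user_complaints > 0:
        fired.append("user_complaint")
    return fired
```

Returning every fired trigger, rather than a single boolean, lets the report's `trigger` field record why it was generated.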

Report content:

report_id
generated_at
window_start / window_end
trigger
units_in_scope (count + sample)
signals_by_class (table)
high_priority_findings (list)
recommended_actions (per finding)
escalations_routed (to Đ37 / Đ32)
related_decision_backlog_entries (refs to D5)

The report is PG-persisted, with KB markdown mirror for human reading.
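
One possible in-memory shape for the report and its markdown mirror is sketched below. The field names follow the list above; the field types, the split of `units_in_scope` into a count and a sample, and the `to_markdown` layout are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class SegmentationHealthReport:
    report_id: str
    generated_at: datetime
    window_start: datetime
    window_end: datetime
    trigger: str
    units_in_scope_count: int
    units_in_scope_sample: list[str] = field(default_factory=list)
    signals_by_class: dict[str, int] = field(default_factory=dict)
    high_priority_findings: list[str] = field(default_factory=list)
    recommended_actions: dict[str, str] = field(default_factory=dict)  # finding -> action
    escalations_routed: list[str] = field(default_factory=list)
    related_decision_backlog_entries: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render a human-readable mirror of the persisted report."""
        lines = [
            f"# Segmentation Health Report {self.report_id}",
            "",
            f"- generated_at: {self.generated_at.isoformat()}",
            f"- window: {self.window_start.isoformat()} to {self.window_end.isoformat()}",
            f"- trigger: {self.trigger}",
            f"- units_in_scope: {self.units_in_scope_count}",
            "",
            "| class | events |",
            "| --- | --- |",
        ]
        lines += [f"| {cls} | {n} |" for cls, n in sorted(self.signals_by_class.items())]
        lines += ["", "## High-priority findings", ""]
        for finding in self.high_priority_findings:
            action = self.recommended_actions.get(finding, "(none recorded)")
            lines.append(f"- {finding}: {action}")
        return "\n".join(lines)
```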

4.5 Detect → Review → Action Lifecycle (Q25; criterion 7, 8, 9)

Detect: signals + scores cross threshold
  → Evidence bundle: collect raw signals, units, thread refs, retrieval samples, edits
  → AI proposal: recommend one of {Split, Merge, Edge, Thread, Context-Pack, NoAction}
  → Independent review (Đ37): PASS / FAIL / NEEDS_HUMAN
  → On PASS: apply chosen action with rollback key
  → Verify: round-trip (axis-1) + axis-2 coverage check + signal recheck
  → Report: PG-persisted; mirrored to KB

Auto-action is forbidden. Even low-risk findings require review. (rev5d P7: prefer graph enrichment before structural change; criterion 8.)
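
The ordering constraint and the review gate can be sketched as a minimal state holder. `F2Cycle`, `Verdict`, and the step names are hypothetical; the sketch blocks any apply step that lacks a PASS verdict and does not model what happens after FAIL or NEEDS_HUMAN, which the design leaves to Đ37.

```python
import uuid
from enum import Enum


class Verdict(Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    NEEDS_HUMAN = "NEEDS_HUMAN"


STEPS = ("detect", "evidence", "proposal", "review", "apply", "verify", "report")


class F2Cycle:
    """Enforces step order and forbids applying an action without a PASS review."""

    def __init__(self):
        self.completed = []
        self.proposal = None
        self.verdict = None
        self.rollback_key = None

    def _advance(self, step):
        if len(self.completed) == len(STEPS):
            raise RuntimeError("cycle already reported")
        expected = STEPS[len(self.completed)]
        if step != expected:
            raise RuntimeError(f"expected step {expected!r}, got {step!r}")
        self.completed.append(step)

    def detect(self):
        self._advance("detect")

    def collect_evidence(self):
        self._advance("evidence")

    def propose(self, action):
        self._advance("proposal")
        self.proposal = action

    def review(self, verdict):
        self._advance("review")
        self.verdict = verdict

    def apply(self):
        # Checked before advancing so a blocked apply leaves the state untouched.
        if self.verdict is not Verdict.PASS:
            raise PermissionError("action requires a PASS verdict from independent review")
        self._advance("apply")
        self.rollback_key = uuid.uuid4().hex  # placeholder rollback key
        return self.rollback_key

    def verify(self):
        self._advance("verify")

    def report(self):
        self._advance("report")
```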

4.6 Action Decision Matrix (Q15, Q16, Q17)

| Situation | Recommended action |
| --- | --- |
| Two units always travel together AND not independently meaningful | Merge (with full lifecycle: superseded, aliases, redirects, edge reassignment) |
| One unit grew internally heterogeneous (C1A 3-question test now fails) | Split |
| Units are independently meaningful but often travel together | Edge (universal_edges, Đ39) or Context-Pack (retrieval-side bundling) |
| Units share thread relevance but not direct coupling | Thread (membership in semantic thread, D9) |
| Observed signal is noise / unit health is acceptable | NoAction |

The decision is recorded with its rationale and full evidence bundle, regardless of the choice. NoAction is a first-class outcome, not a way to skip the workflow.
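
The matrix can be read as a rule table that produces a proposal for review, never an applied change. In the sketch below the `Finding` fields and the order in which rules are tried are assumptions; the matrix itself does not define precedence when several situations hold at once.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    SPLIT = "Split"
    MERGE = "Merge"
    EDGE = "Edge"
    THREAD = "Thread"
    CONTEXT_PACK = "Context-Pack"
    NO_ACTION = "NoAction"


@dataclass(frozen=True)
class Finding:
    fails_c1a_three_question_test: bool = False
    travel_together: bool = False
    independently_meaningful: bool = True
    share_thread_relevance: bool = False


def recommend(finding: Finding) -> list[Action]:
    """Return candidate actions for review; nothing is applied here."""
    if finding.fails_c1a_three_question_test:
        return [Action.SPLIT]
    if finding.travel_together and not finding.independently_meaningful:
        return [Action.MERGE]
    if finding.travel_together:
        # Units stay independently meaningful: enrich the graph instead of merging.
        return [Action.EDGE, Action.CONTEXT_PACK]
    if finding.share_thread_relevance:
        return [Action.THREAD]
    return [Action.NO_ACTION]
```

Coupling alone never yields Merge in this sketch: the Merge branch also requires evidence that the units are not independently meaningful.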

4.7 Split Lifecycle (Q15; criterion 7)

Input: failing unit with evidence bundle
  → Propose new unit set (with new manifest fragment per D2)
  → REVIEW (D2 checklist applies)
  → Old unit marked superseded; new units created
  → History preserved; spans/roles migrated; edges reassigned per universal_edges
  → Aliases/redirects from old → new
  → VERIFY: axis-1 round-trip + axis-2 coverage + signal recheck
  → Report
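
The supersede, redirect, and edge-reassignment steps can be sketched against a toy in-memory store. `Unit`, `Store`, `apply_split`, and the `edge_owner` mapping (a reviewed decision about which new unit inherits each edge) are hypothetical; real persistence, span and role migration, and verification are out of scope for this sketch.

```python
from dataclasses import dataclass, field

Edge = tuple[str, str, str]  # (source unit, edge type, target unit)


@dataclass
class Unit:
    unit_id: str
    status: str = "active"  # "active" or "superseded"
    superseded_by: list[str] = field(default_factory=list)


@dataclass
class Store:
    units: dict[str, Unit] = field(default_factory=dict)
    redirects: dict[str, list[str]] = field(default_factory=dict)
    edges: list[Edge] = field(default_factory=list)


def apply_split(store: Store, old_id: str, new_ids: list[str],
                edge_owner: dict[Edge, str]) -> None:
    old = store.units[old_id]
    if old.status != "active":
        raise ValueError(f"{old_id} is not active")
    if len(new_ids) < 2:
        raise ValueError("a split needs at least two new units")
    if any(new_id in store.units for new_id in new_ids):
        raise ValueError("new unit ids must not already exist")
    # Every edge touching the old unit needs an explicit new owner; none is dropped.
    for edge in store.edges:
        if old_id in (edge[0], edge[2]) and edge_owner.get(edge) not in new_ids:
            raise ValueError(f"edge {edge} has no reviewed owner among the new units")

    for new_id in new_ids:
        store.units[new_id] = Unit(new_id)
    old.status = "superseded"  # the old unit stays in the store, so history survives
    old.superseded_by = list(new_ids)
    store.redirects[old_id] = list(new_ids)

    def move(edge: Edge) -> Edge:
        owner = edge_owner.get(edge)
        src = owner if edge[0] == old_id else edge[0]
        dst = owner if edge[2] == old_id else edge[2]
        return (src, edge[1], dst)

    store.edges = [move(edge) for edge in store.edges]
```

All validation runs before any mutation, so a rejected split leaves the store unchanged.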

4.8 Merge Lifecycle (Q16; criterion 7, P7)

Input: two units with strong, justified coupling
  → Pre-condition: evidence shows units are NOT independently meaningful (P7 guardrail)
  → If P7 doubt remains → recommend Edge/Thread/Context-Pack first
  → Propose new merged unit
  → REVIEW (D2 checklist; canonical parent uniqueness)
  → Old units superseded; new unit created
  → Aliases/redirects; edges reassigned
  → VERIFY + Report

Merge-by-coupling-alone is forbidden. P7 requires evidence that units are not independently meaningful.

4.9 Edge / Thread / Context-Pack (Q17; criterion 8)

Preferred over structural change when units remain independently meaningful:

  • Edge (Đ39 universal_edges): typed relation captures the coupling; both units survive.
  • Thread (D9 membership): if coupling is part of a domain axis spanning multiple units across documents.
  • Context-Pack (D11): retrieval-side bundling; both units are always returned together when queried.

The action choice is reviewed; once applied, signals are re-evaluated post-action.

4.10 NoAction

A NoAction outcome is fully documented:

  • Why the signal does not warrant change.
  • What threshold would change the decision in a future cycle.
  • An entry in Decision Backlog Registry (D5) with next_review_date.

NoAction is explicit, not silent dismissal.
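
A NoAction record that cannot be created without the three required elements might look like the sketch below. The `NoActionRecord` type, its field names other than `next_review_date`, and the rule that the review date must lie after the recording date are assumptions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class NoActionRecord:
    finding_id: str
    rationale: str              # why the signal does not warrant change
    revisit_threshold: str      # what would change the decision in a future cycle
    backlog_entry_ref: str      # reference to the Decision Backlog Registry (D5) entry
    recorded_on: date
    next_review_date: date

    def __post_init__(self):
        for name in ("rationale", "revisit_threshold", "backlog_entry_ref"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must not be empty")
        if self.next_review_date <= self.recorded_on:
            raise ValueError("next_review_date must be after recorded_on")
```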

4.11 Verification After F2 Actions

For any structural change (Split/Merge): axis-1 round-trip MUST re-pass against the canonical source representation; affected publications must re-render to 0 drift. For edge/thread/context-pack: axis-2 coverage check + signal recheck.
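
The per-action verification requirements can be sketched as a lookup plus a check for failed or missing results. The check names are hypothetical labels. For Split and Merge the sketch takes the union of the checks named here and in the Split lifecycle, and it assumes NoAction requires no post-action check, which the design does not state.

```python
STRUCTURAL_CHECKS = (
    "axis1_round_trip",
    "axis2_coverage",
    "signal_recheck",
    "publication_rerender_zero_drift",
)
ENRICHMENT_CHECKS = ("axis2_coverage", "signal_recheck")

REQUIRED_CHECKS = {
    "Split": STRUCTURAL_CHECKS,
    "Merge": STRUCTURAL_CHECKS,
    "Edge": ENRICHMENT_CHECKS,
    "Thread": ENRICHMENT_CHECKS,
    "Context-Pack": ENRICHMENT_CHECKS,
    "NoAction": (),  # assumption: nothing was changed, so nothing to verify
}


def failed_checks(action: str, results: dict[str, bool]) -> list[str]:
    """Return required checks that failed or were never run for this action."""
    if action not in REQUIRED_CHECKS:
        raise ValueError(f"unknown action: {action!r}")
    return [name for name in REQUIRED_CHECKS[action] if not results.get(name, False)]
```

Treating a missing result as a failure means a skipped check cannot pass verification by omission.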

4.12 Reporting

Every F2 cycle ends with a Report (PG-persisted, KB mirror). Reports carry rollback keys; rollback for F2 follows the D1 §4.8 model.

4.13 Routing of Signals (Q43)

All signals route to:

  • Segmentation Health Report (this deliverable).
  • Decision Backlog Registry (D5) for governance.
  • Đ37 escalation queue for high-risk findings (Đ32).

No new notification system (criterion 38).
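
Read literally, the routing rule sends every signal to the first two destinations and adds the escalation queue only for high-risk findings. A minimal sketch under that reading follows; the destination labels and the `high_risk` flag are hypothetical, and how a finding is judged high-risk is left to Đ32.

```python
def route(signal_type: str, high_risk: bool) -> list[str]:
    """Return the existing destinations a signal is routed to; no new channel is created."""
    destinations = ["segmentation_health_report", "decision_backlog_registry"]
    if high_risk:
        destinations.append("escalation_queue")
    return destinations
```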

4.14 Thread-Side Signals Consumed (Q41, Q42)

D3 consumes thread signals from D9 (missing_link, wrong_link, stale, overbroad, too_narrow, expected_artifact_missing) and retrieval signals from D11 (noisy_thread, wrong_audience_result, weak_thread). These contribute to the unit/thread health scoring.

5. PG Storage per Object (Design Intent — No DDL)

| Object | Target DB | Layer | Notes |
| --- | --- | --- | --- |
| health_signal_event | directus | Não | Per-event raw record |
| unit_health_score | directus | Não | Per-unit aggregate (view or table) |
| thread_health_score | directus | Não | Per-thread aggregate |
| coupling_score | directus | Não | Pair-level aggregate |
| segmentation_health_report | directus | Kho | Persisted reports |
| f2_action_decision | directus | Kho | Action chosen + rationale + evidence |
| evidence_bundle (F2) | directus | Não | JSONB envelope |

6. Schema Gaps

  1. health_signal_event table — no current capability.
  2. Aggregate views (unit_health_score, thread_health_score, coupling_score) — definitions and refresh policy.
  3. segmentation_health_report — persistence schema.
  4. f2_action_decision — distinct from F1 manifest; may share envelope shape.
  5. Co-edit / co-citation / co-retrieval instrumentation (classified requires_instrumentation).
  6. Edge reassignment audit trail — required by Split/Merge lifecycles.
  7. Alias / redirect table — for superseded units; may already exist in TAC (to be verified).
  8. Threshold policy table — per-signal and per-aggregate thresholds; current policy storage unclear.

7. Law References

| Surface | Law |
| --- | --- |
| Segmentation rules (post-action verify) | C1A |
| Risk gating | Đ32 |
| Roles / escalation | Đ37 |
| Universal edges authority | Đ39 |
| Vocabulary | Đ24 |
| Manifest-as-code for action decisions | Đ38 |

8. Open Questions

  1. Threshold values for signal aggregates — defer to policy (D5).
  2. How co-edit signals are captured given the current CDC infrastructure — defer to D4 capability intake.
  3. Should coupling_score blend retrieval and edge signals with fixed weights or learned weights? Recommendation: fixed weights in v0.1, learned weights later via capability intake.
  4. Cadence defaults (N, Y, M) for Segmentation Health Report — policy decision.

9. Coverage

Questions covered (primary): Q15, Q16, Q17, Q22, Q23, Q24, Q25.

Questions covered (secondary): Q41, Q42, Q43, Q44.

Acceptance criteria covered:

  • 7 (split/merge lifecycle)
  • 8 (edge/context-pack/no-action, not auto-merge)
  • 9 (post-cut usage review)
  • 10 (Segmentation Health Report)
  • 11 (signal classification)
  • 35 (missing/wrong link detection — supporting D9)
  • 36 (thread split/merge — supporting D9)
  • 38 (no parallel notification — supporting)

Schema gaps: 8 named (see §6).

Law dependencies: C1A, Đ24, Đ32, Đ37, Đ38, Đ39.

Open questions: 4 (see §8).

Law conflicts encountered: none. P7 guardrail enforced (no merge-by-coupling-alone).

knowledge/dev/laws/dieu44-trien-khai/design/dot-iu-cutter-v0.1-segmentation-health-design-2026-05-15.md