dot-iu-cutter v0.1 — Cross-Temporal Semantic Threading Design (D9)

Date: 2026-05-15
Status: DESIGN DRAFT
Baseline: rev5d §12, §13.1
Scope: DESIGN ONLY

1. Purpose

Define Semantic Threads — living professional-domain axes that connect IUs across documents, object types, and time. Define their objects, intake flow, enrichment loop, membership lifecycle, and detection signals. The thread is not an edge type; it is a higher-order construct that contains memberships, evidence, and signals.

2. Scope

Definition of semantic_thread and family
Semantic Intake Flow
Semantic Enrichment Loop
Membership lifecycle (candidate → accepted/rejected/needs_human → stale/superseded)
Negative knowledge for rejected proposals
Missing/wrong link detection
Thread split/merge lifecycle
User-directed vs system-discovered threads (both first-class)
Industry standards usage discipline (SKOS conceptual; W3C PROV; CDC; co-citation)

Out of scope: thread-first retrieval and consumer UX (D11), health-signal handling pipeline (D3), capability intake feeding extraction quality (D4), legal mapping summary (D10).

3. Dependencies

rev5d §12 (full), §13.1 (constraints), §13.2.4 (Đ24 vocabulary)
C1A (units must already pass birth gate)
Đ24 (no parallel taxonomy), Đ32 (high-risk approval), Đ33/Đ43 (PG placement), Đ37 (governance roles), Đ38 (manifest as code), Đ39 (universal_edges first), Đ44/UOSL (schema compat target), Đ0-G (birth gate)
D1 (operational design), D2 (manifest contract), D6 (axis-2 metadata), D11 (retrieval consumer)

4. Key Decisions

4.1 Thread Definition (Q37)

A Semantic Thread is:

A living professional-domain axis (Vietnamese: chuỗi chuyên môn, "chain of professional expertise").
Connects units across documents, object types, and time.
Lives in PG (not a document).
Compatible with universal_edges (Đ39) and UOSL (Đ44) — does NOT replace them.
Has two formation sources (system-discovered + user-directed) that converge into the same governance lifecycle.

A thread is NOT an edge. It is a higher-order container.

Rationale: rev5d §12.2, §13.1.1.

4.2 Thread Object Family (Q39)

Conceptual objects (PG-first design intent; final placement in D7/D10):

Object	Role
`semantic_thread`	The living domain/topic
`semantic_thread_membership`	Approved: unit X is in thread Y
`semantic_thread_candidate`	Proposed but not yet reviewed; carries confidence
`semantic_thread_evidence`	W3C PROV-style evidence record
`semantic_thread_health_signal`	Missing/wrong/stale/overbroad/too-narrow/etc.
`semantic_thread_negative_knowledge`	Persisted rejection (P9.34)
`semantic_thread_expected_chain`	JSONB declaration of expected lifecycle artifacts

universal_edges-first rule: before proposing a separate semantic_thread_membership table, evaluate whether universal_edges (with edge_kind, status, evidence payload, lifecycle) suffices. If universal_edges is sufficient → memberships are edges with edge_kind = 'thread_member'; thread metadata lives separately. If not → flag schema gap and document the missing capability.

Rationale: rev5d §12.4, §13.1.7, §13.2.5; acceptance criteria 33 and 39.

4.3 Semantic Intake Flow (Q38, Q40)

New unit/object (birth event from F1 cut, or imported artifact)
  → Extract signals:
       available_now: section_type, Đ24 labels, semantic_role, tsvector keywords, citations, embedding
       requires_instrumentation: co-edit, co-citation, co-retrieval
       future_capability: full lifecycle-chain awareness, deep KG traversal
  → Match against existing threads (keyword, label, embedding, edge proximity)
  → Produce candidate memberships, each with confidence + evidence bundle
  → Persist to semantic_thread_candidate (status = 'proposed')
  → Auto-accept gate (see 4.4) OR route to review
  → On acceptance → semantic_thread_membership (via universal_edges or table per 4.2)
  → On rejection → semantic_thread_negative_knowledge

Evidence types (Q40):

Structural: section_type, label, semantic_role, canonical_address.
Reference: citation, code symbol reference, manifest cross-link.
Statistical: embedding similarity, tsvector match, co-citation.
Behavioral (requires_instrumentation): co-edit, co-retrieval.
Human: user direction.
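The matching and candidate-production steps of the intake flow can be sketched as below. This is an illustration only: the names `Evidence`, `Candidate`, and `propose_candidates`, the dict shapes for units and threads, and the mean-based confidence are all assumptions, not part of rev5d. Only two of the available_now signals (labels and citations) are used.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    evidence_type: str  # "structural", "reference", "statistical", "behavioral", "human"
    signal: str         # e.g. "label", "citation", "embedding"
    score: float        # hypothetical per-signal strength in [0, 1]


@dataclass
class Candidate:
    unit_id: str
    thread_id: str
    evidence: list
    status: str = "proposed"

    @property
    def confidence(self) -> float:
        # Placeholder aggregation (mean of signal scores); the real formula is a policy decision.
        return sum(e.score for e in self.evidence) / len(self.evidence)


def propose_candidates(unit: dict, threads: list) -> list:
    """Match one unit against existing threads using label overlap and citations."""
    candidates = []
    for thread in threads:
        evidence = []
        shared_labels = set(unit["labels"]) & set(thread["labels"])
        if shared_labels:
            # Structural evidence: fraction of the thread's labels found on the unit.
            evidence.append(Evidence("structural", "label", len(shared_labels) / len(thread["labels"])))
        if set(unit["citations"]) & set(thread["member_addresses"]):
            # Reference evidence: the unit cites an address that is already a thread member.
            evidence.append(Evidence("reference", "citation", 1.0))
        if evidence:
            candidates.append(Candidate(unit["id"], thread["id"], evidence))
    return candidates
```

Each returned candidate carries its evidence bundle and starts in status 'proposed', matching the persistence step in the flow; threads with no supporting signal produce no candidate at all.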

4.4 Auto-Accept Risk Gate (Đ32; rev5d §13.1.5)

A candidate may be auto-accepted ONLY when ALL hold:

Risk class is low (no legal/governance/code-impact implication).
≥ 2 independent evidence signals support the membership.
No conflicting semantic_thread_negative_knowledge entry for this (unit, thread) pair.
Policy explicitly allows auto-accept for this thread or domain.

Otherwise the candidate moves to candidate_for_review and is routed to the appropriate reviewer per Đ37. High-risk candidates go to human review per Đ32.
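A minimal sketch of the gate, assuming hypothetical names (`GateInput`, `route_candidate`) and assuming that "independent" signals can be approximated by distinct signal names. The actual independence test and risk classification are policy decisions not defined here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateInput:
    risk_class: str                  # "low", "medium", "high" (assumed vocabulary)
    evidence_signals: tuple          # names of supporting signals, e.g. ("label", "citation")
    has_negative_knowledge: bool     # conflicting entry exists for this (unit, thread) pair
    policy_allows_auto_accept: bool  # explicit policy flag for the thread or domain


def route_candidate(c: GateInput) -> str:
    """Return the routing outcome for one candidate membership."""
    if c.risk_class == "high":
        return "human_review"  # Đ32: high risk is never auto-accepted
    all_hold = (
        c.risk_class == "low"
        and len(set(c.evidence_signals)) >= 2  # distinct names stand in for independence
        and not c.has_negative_knowledge
        and c.policy_allows_auto_accept
    )
    return "auto_accept" if all_hold else "candidate_for_review"
```

The gate is deliberately conjunctive: failing any single condition sends the candidate to review rather than lowering its confidence.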

Rationale: acceptance criterion 33; rev5d §13.1.5; Đ32.

4.5 Semantic Enrichment Loop (Q26 supporting; criterion 41)

New data → intake → candidates → evidence → AI/policy review
   → accepted: enrich thread (universal_edges + thread record)
   → rejected: persist as negative_knowledge
   → next-cycle matching uses both positive memberships and negative knowledge
   → TAC progress improves extraction; KG progress improves discovery
   → positive recursion (P10)

Negative knowledge suppresses re-proposal unless materially different evidence emerges. "Materially different" criteria (proposed for design-phase definition):

Confidence increases by ≥ threshold (e.g., 0.2 absolute), OR
New independent evidence type appears (e.g., previously only embedding similarity; now also citation), OR
Source revision of one side has changed (revision-aware re-eval).

Final thresholds are policy decisions, recorded in D5 backlog.
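The three proposed criteria can be expressed as a single predicate. This is a sketch under stated assumptions: the function name, the dict keys (`confidence`, `evidence_types`, `source_revisions`), and the default threshold of 0.2 (taken from the example above) are illustrative, not decided policy.

```python
def is_materially_different(at_rejection: dict, now: dict, confidence_threshold: float = 0.2) -> bool:
    """Compare evidence recorded at rejection time with the evidence of a new proposal."""
    # Rounding avoids float artifacts such as 0.7 - 0.5 == 0.19999999999999996.
    gain = round(now["confidence"] - at_rejection["confidence"], 9)
    confidence_gain = gain >= confidence_threshold

    # A new independent evidence type, e.g. citation appearing next to embedding similarity.
    new_types = set(now["evidence_types"]) - set(at_rejection["evidence_types"])

    # Revision-aware re-evaluation: either side (unit or thread) has a new source revision.
    revision_changed = now["source_revisions"] != at_rejection["source_revisions"]

    return confidence_gain or bool(new_types) or revision_changed
```

A re-proposal would be suppressed whenever this returns False for the stored negative-knowledge entry.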

4.6 Membership Lifecycle (Q44; criterion 33)

candidate → [review] → accepted | rejected | needs_human
accepted   → [usage]  → stale | superseded
rejected   → [new evidence materially different] → re-candidate
needs_human → [escalation closed] → accepted | rejected

Each membership row carries: status, confidence, evidence_bundle (JSONB, PROV envelope), reviewed_by, reviewed_at, provenance ∈ {accepted_by_user, accepted_by_policy, accepted_by_ai, candidate_by_ai, rejected_by_user, rejected_by_review}.
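The lifecycle above reads naturally as a small state machine. The sketch below assumes that stale and superseded are terminal, which the text does not state explicitly; `TRANSITIONS` and `transition` are hypothetical names.

```python
# Allowed status transitions, taken from the lifecycle diagram.
TRANSITIONS = {
    "candidate": {"accepted", "rejected", "needs_human"},
    "accepted": {"stale", "superseded"},
    "rejected": {"candidate"},  # re-candidate, only with materially different evidence
    "needs_human": {"accepted", "rejected"},
    "stale": set(),       # assumed terminal
    "superseded": set(),  # assumed terminal
}


def transition(current: str, target: str, *, materially_different: bool = False) -> str:
    """Validate a status change and return the new status."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {target}")
    if current == "rejected" and not materially_different:
        raise ValueError("re-candidate requires materially different evidence")
    return target
```

Keeping the table as data makes it straightforward to mirror the same rule set in a PG check or trigger later, should the design call for it.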

Rationale: rev5d §13.1.4, §12.7.

4.7 Negative Knowledge (criterion 34)

semantic_thread_negative_knowledge records rejection with: unit_id, thread_id, evidence_at_rejection (JSONB), reviewed_by, reviewed_at, reason_class (semantic_mismatch, scope_too_narrow, scope_too_broad, wrong_authority, contradictory_evidence, user_explicit_reject).

Re-proposal must cite the negative-knowledge entry and demonstrate material difference.

Rationale: rev5d §13.1.8.

4.8 Missing/Wrong Link Detection (Q41, Q42; criterion 35)

Missing-link signals:

Signal	Source
`high_similarity_unlinked`	Embedding clustering finds high-similarity pair with no edge
`co_retrieval_no_edge`	Retrieval logs (requires_instrumentation)
`expected_artifact_missing`	`semantic_thread_expected_chain` declares a required artifact kind that has no member
`cited_without_edge`	Reports cite a canonical_address with no `universal_edge`

Wrong-link signals:

Signal	Source
`reviewer_rejection`	Explicit rejection in REVIEW
`retrieval_noise`	Returned but flagged irrelevant (D11)
`contradiction`	Semantic role conflict
`low_confidence_persistent`	Confidence remained low after multiple cycles

Thread-level anomalies (Q44):

Signal	Trigger
`overbroad`	> 50 members and high heterogeneity
`too_narrow`	< 3 members and stale
`stale`	No activity > N days (N is policy)
`noisy_retrieval`	D11 retrieval flagged with high `noisy_context_rate`

All signals route to Decision Backlog Registry (D5) and/or Segmentation Health Report (D3). No new notification system is created (criterion 38; Đ37).
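The thread-level triggers can be sketched as a pure function. How heterogeneity is measured is still an open question, so it is passed in as a precomputed number; `thread_anomalies`, `heterogeneity_threshold`, and `stale_after_days` are hypothetical names, with the latter standing in for the policy value N.

```python
from datetime import datetime, timedelta


def thread_anomalies(
    member_count: int,
    heterogeneity: float,
    last_activity: datetime,
    now: datetime,
    *,
    stale_after_days: int,
    heterogeneity_threshold: float,
) -> list:
    """Return the thread-level anomaly signals that currently apply."""
    signals = []
    is_stale = now - last_activity > timedelta(days=stale_after_days)
    if member_count > 50 and heterogeneity > heterogeneity_threshold:
        signals.append("overbroad")
    if member_count < 3 and is_stale:
        signals.append("too_narrow")
    if is_stale:
        signals.append("stale")
    return signals
```

The function only detects; routing the resulting signals to D5 or D3 stays outside it, in line with the rule that no new notification system is created.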

4.9 Expected Lifecycle Chain (Q46; rev5d §13.1.6)

Each thread has an OPTIONAL expected_chain JSONB declaration:

{
  "expected_artifacts": [
    {"kind": "law", "authority": "enacted", "required": true},
    {"kind": "design", "required": true},
    {"kind": "code", "required": false},
    {"kind": "test_report", "required": true},
    {"kind": "incident_report", "required": false}
  ]
}

The intake flow detects missing required artifacts and emits expected_artifact_missing signals. The chain is a hook, not a hard schema; v0.1 ships the hook even if no chains are populated yet.
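Detection of missing required artifacts could look like the following sketch. The function name and the shape of the emitted signal dict are assumptions; only the expected_artifacts structure comes from the declaration above.

```python
def missing_required_artifacts(expected_chain: dict, member_kinds: set) -> list:
    """Emit one expected_artifact_missing signal per required kind with no member."""
    signals = []
    for artifact in expected_chain.get("expected_artifacts", []):
        if artifact.get("required") and artifact["kind"] not in member_kinds:
            signals.append({"signal": "expected_artifact_missing", "kind": artifact["kind"]})
    return signals


chain = {
    "expected_artifacts": [
        {"kind": "law", "authority": "enacted", "required": True},
        {"kind": "design", "required": True},
        {"kind": "code", "required": False},
        {"kind": "test_report", "required": True},
        {"kind": "incident_report", "required": False},
    ]
}

# A thread whose members so far are one law and one design document.
signals = missing_required_artifacts(chain, {"law", "design"})
```

Because the chain is optional, a thread with no declaration yields no signals, which keeps the hook harmless until chains are populated.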

4.10 Governance Mapping (Q45; criterion 37)

Mapping to Đ37:

Role	Threading responsibility
Owner (per thread)	Approves user-directed thread creation; resolves conflicts
Reviewer (Đ37 roles)	Reviews `candidate_for_review` memberships
Escalation queue (Đ37)	High-risk per Đ32
Council / AI Council	Resolves contested rejections; oversees overbroad/too_narrow signals
Registry custodian	Decision Backlog Registry sweeps (D5)

If no Đ37 mapping exists for a threading concern → governance gap; record in D5; do NOT invent a parallel role.

4.11 Thread Split / Merge Lifecycle (Q44; criterion 36)

Split (one thread → two): old marked superseded, references retained; memberships migrated with provenance trail; aliases preserved; Decision Backlog records the rationale.

Merge (two threads → one): both old marked superseded; new thread acquires memberships; aliases preserved; negative knowledge from both is unioned with conflict review.

A split or merge is a governance action, not an automatic operation. Auto-merge based on similarity is forbidden (rev5d P7, criterion 8).
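The bookkeeping of a merge, once governance has approved it, might be sketched as follows. This simplifies heavily: negative knowledge is reduced to a set of rejected unit ids per thread, provenance trails are omitted, and all names (`merge_threads`, the dict keys) are hypothetical. The function performs no similarity check and is meant to run only as the result of an explicit governance action.

```python
def merge_threads(a: dict, b: dict, new_id: str) -> dict:
    """Combine two approved-for-merge threads into one new thread record."""
    members = a["members"] | b["members"]
    rejected = a["rejected"] | b["rejected"]  # negative knowledge from both, unioned

    # A unit accepted in one thread but rejected in the other needs conflict review.
    conflicts = members & rejected

    aliases = {a["id"], b["id"]} | set(a["aliases"]) | set(b["aliases"])
    return {
        "thread": {"id": new_id, "members": members, "rejected": rejected, "aliases": sorted(aliases)},
        "superseded": [a["id"], b["id"]],  # both old threads are marked superseded
        "needs_conflict_review": sorted(conflicts),
    }
```

Conflicting units are flagged rather than resolved, so that a reviewer decides which side of the union prevails.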

4.12 User-Directed Threads (Q40; criterion 40)

User-directed threads are first-class, equal in standing to AI-discovered threads.

User may: create thread, assign memberships, reject AI suggestions, request discovery, merge/split threads.
User direction is authoritative intent, not automatic graph truth (rev5d §13.1.4).
Memberships from user direction carry provenance = accepted_by_user.
AI may flag inconsistencies (e.g., user adds a unit that contradicts negative knowledge) but does NOT silently override the user.
Inconsistency between user direction and AI evidence → health signal user_ai_disagreement → governance queue.
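The interaction between user direction and negative knowledge can be sketched as below. The function name and record shapes are assumptions; the point illustrated is that the user's assignment is recorded as given, while the disagreement is surfaced as a signal instead of being silently resolved.

```python
def apply_user_assignment(unit_id: str, thread_id: str, negative_knowledge: set):
    """Record a user-directed membership and flag any conflict with prior rejections."""
    membership = {
        "unit_id": unit_id,
        "thread_id": thread_id,
        "status": "accepted",
        "provenance": "accepted_by_user",
    }
    signals = []
    if (unit_id, thread_id) in negative_knowledge:
        # Do not override the user; route the disagreement to the governance queue.
        signals.append({"signal": "user_ai_disagreement", "unit_id": unit_id, "thread_id": thread_id})
    return membership, signals
```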

4.13 Industry Standards Discipline (criterion 39; rev5d §12.3, §13.1.2, §13.1.3)

SKOS — conceptual model for thread hierarchy/relations. v0.1 does NOT introduce RDF/SPARQL/triple-store.
BERTopic / embedding clustering — used for candidate discovery only.
Entity linking — IU content → thread membership candidate.
Co-citation — bibliometrics for co-occurrence; treated as evidence, not decision.
W3C PROV — evidence envelope discipline. Minimum: evidence_source, evidence_method, generated_by, generated_at, source_object_id, target_thread_or_object, confidence, review_status, reviewer, decision_reason.
CDC — PG trigger + NOTIFY/LISTEN for intake triggers.
Use PG-native features to the maximum: tsvector, pg_trgm, triggers, materialized views, NOTIFY/LISTEN. Qdrant is used for vectors.
No new stack (Neo4j, triple store, ML pipeline) unless a gap is proven and approved per Đ32.
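The minimum W3C PROV-style envelope listed above can be checked with a few lines. This sketch tests key presence only, on the assumption that fields such as reviewer and decision_reason may legitimately be null before review; `REQUIRED_ENVELOPE_FIELDS` and `missing_envelope_fields` are hypothetical names, and the JSONB shape itself is not yet codified.

```python
# Minimum field set for an evidence envelope, as listed in the W3C PROV item.
REQUIRED_ENVELOPE_FIELDS = (
    "evidence_source",
    "evidence_method",
    "generated_by",
    "generated_at",
    "source_object_id",
    "target_thread_or_object",
    "confidence",
    "review_status",
    "reviewer",
    "decision_reason",
)


def missing_envelope_fields(envelope: dict) -> list:
    """Return required field names absent from the envelope, in declaration order."""
    return [name for name in REQUIRED_ENVELOPE_FIELDS if name not in envelope]
```

An empty result means the envelope meets the minimum; it says nothing about whether the values themselves are plausible.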

4.14 Positive Recursion (Q28 supporting; P10; criterion 41)

The threading subsystem is part of positive recursion:

New TAC capabilities → improved extraction → better intake.
New KG capabilities → improved candidate discovery.
Approved memberships and negative knowledge → smarter next-cycle matching.
Capability intake (D4) records these improvements.

5. PG Storage per Object (Design Intent — No DDL)

Object	Target DB	Layer	Notes
semantic_thread	directus	Kho	Living domain record
semantic_thread_membership	directus	Kho	Prefer reuse of `universal_edges` with `edge_kind='thread_member'`; if reuse impossible, flag gap
semantic_thread_candidate	directus	Não	Lifecycle status, confidence
semantic_thread_evidence	directus	Não	JSONB PROV envelope
semantic_thread_health_signal	directus	Não	Routes to D3 / D5
semantic_thread_negative_knowledge	directus	Kho	Persistent rejection memory
semantic_thread_expected_chain	directus	Não	JSONB on thread record

Final placement of separate-table vs universal_edges reuse: must be resolved in D7 (UOSL compat) and D10 (Legal Alignment) per Đ33/Đ43, Đ39.

6. Schema Gaps

semantic_thread core table — needs placement decision; not present today.
universal_edges extension — needs edge_kind value thread_member and possibly status/confidence fields; believed to be a gap in the current Đ39 schema (to be verified).
semantic_thread_candidate lifecycle status enum — Đ24 vocabulary placement.
evidence_bundle PROV envelope — JSONB shape; minimum field set listed in §4.13 but not codified.
semantic_thread_negative_knowledge persistence — no current capability for rejection memory.
expected_chain JSONB — new field on thread record.
semantic_thread_health_signal routing — must reuse existing Decision Backlog Registry (D5), not new system.
CDC trigger plumbing — PG triggers / NOTIFY-LISTEN on tac_logical_unit birth event; instrumentation gap.
Co-edit / co-retrieval instrumentation — requires_instrumentation (criterion 11).
user_ai_disagreement health signal — vocabulary gap.

7. Law References

Surface	Law
Universal edges first	Đ39
Vocabulary	Đ24
Risk approval	Đ32
Governance roles	Đ37
Manifest-as-code style evidence	Đ38
Schema compat target	Đ44
PG placement	Đ33 / Đ43
Birth gate	Đ0-G

8. Open Questions

Exact threshold values for auto-accept and material-difference (defer to policy doc / D5 backlog).
Where semantic_thread lives if universal_edges doesn't fit — directus vs separate DB. Defer to D7/D10.
How is "thread heterogeneity" measured for overbroad detection — embedding variance? Edge density? Defer to D3.
Should semantic_thread_negative_knowledge decay over time, or is rejection permanent until material difference? Recommendation: permanent until material difference, with optional periodic review.

9. Coverage

Questions covered (primary): Q37, Q38, Q39, Q40, Q41, Q42, Q43, Q44, Q45, Q46.
Questions covered (secondary): Q22, Q32.

Acceptance criteria covered:

31 (cross-temporal semantic threading)
32 (semantic intake flow)
33 (candidate/thread membership lifecycle)
34 (negative knowledge)
35 (missing/wrong link detection)
36 (thread split/merge lifecycle)
37 (governance mapping to Đ37; see §4.10)
38 (no parallel notification — supporting)
39 (industry standards leverage)
40 (user-directed + system-discovered both first-class)
41 (TAC/KG progress feeds quality)

Schema gaps: 10 named (see §6).

Law dependencies: Đ24, Đ32, Đ33/Đ43, Đ37, Đ38, Đ39, Đ44, Đ0-G.

Open questions: 4 (see §8).

Law conflicts encountered: none.