KB-7831

dot-iu-cutter v0.1 — Cross-Temporal Semantic Threading Design

Revision 1
Tags: dot-iu-cutter, design, semantic-threading, cross-temporal, rev5d

dot-iu-cutter v0.1 — Cross-Temporal Semantic Threading Design (D9)

Date: 2026-05-15
Status: DESIGN DRAFT
Baseline: rev5d §12, §13.1
Scope: DESIGN ONLY.


1. Purpose

Define Semantic Threads — living professional-domain axes that connect IUs across documents, object types, and time. Define their objects, intake flow, enrichment loop, membership lifecycle, and detection signals. The thread is not an edge type; it is a higher-order construct that contains memberships, evidence, and signals.

2. Scope

  • Definition of semantic_thread and family
  • Semantic Intake Flow
  • Semantic Enrichment Loop
  • Membership lifecycle (candidate → accepted/rejected/needs_human → stale/superseded)
  • Negative knowledge for rejected proposals
  • Missing/wrong link detection
  • Thread split/merge lifecycle
  • User-directed vs system-discovered threads (both first-class)
  • Industry standards usage discipline (SKOS conceptual; W3C PROV; CDC; co-citation)

Out of scope: thread-first retrieval and consumer UX (D11), health-signal handling pipeline (D3), capability intake feeding extraction quality (D4), legal mapping summary (D10).

3. Dependencies

  • rev5d §12 (full), §13.1 (constraints), §13.2.4 (Đ24 vocabulary)
  • C1A (units must already pass birth gate)
  • Đ24 (no parallel taxonomy), Đ32 (high-risk approval), Đ33/Đ43 (PG placement), Đ37 (governance roles), Đ38 (manifest as code), Đ39 (universal_edges first), Đ44/UOSL (schema compat target), Đ0-G (birth gate)
  • D1 (operational design), D2 (manifest contract), D6 (axis-2 metadata), D11 (retrieval consumer)

4. Key Decisions

4.1 Thread Definition (Q37)

A Semantic Thread is:

  • A living professional-domain axis (chuỗi chuyên môn).
  • Connects units across documents, object types, and time.
  • Lives in PG (not a document).
  • Compatible with universal_edges (Đ39) and UOSL (Đ44) — does NOT replace them.
  • Has two formation sources (system-discovered + user-directed) that converge into the same governance lifecycle.

A thread is NOT an edge. It is a higher-order container.

Rationale: rev5d §12.2, §13.1.1.

4.2 Thread Object Family (Q39)

Conceptual objects (PG-first design intent; final placement in D7/D10):

| Object | Role |
|---|---|
| semantic_thread | The living domain/topic |
| semantic_thread_membership | Approved: unit X is in thread Y |
| semantic_thread_candidate | Proposed but not yet reviewed; carries confidence |
| semantic_thread_evidence | W3C PROV-style evidence record |
| semantic_thread_health_signal | Missing/wrong/stale/overbroad/too-narrow/etc. |
| semantic_thread_negative_knowledge | Persisted rejection (P9.34) |
| semantic_thread_expected_chain | JSONB declaration of expected lifecycle artifacts |

universal_edges-first rule: before proposing a separate semantic_thread_membership table, evaluate whether universal_edges (with edge_kind, status, evidence payload, lifecycle) suffices. If universal_edges is sufficient → memberships are edges with edge_kind = 'thread_member'; thread metadata lives separately. If not → flag schema gap and document the missing capability.

Rationale: rev5d §12.4, §13.1.7, §13.2.5; acceptance criteria 33, 39.

4.3 Semantic Intake Flow (Q38, Q40)

New unit/object (birth event from F1 cut, or imported artifact)
  → Extract signals:
       available_now: section_type, Đ24 labels, semantic_role, tsvector keywords, citations, embedding
       requires_instrumentation: co-edit, co-citation, co-retrieval
       future_capability: full lifecycle-chain awareness, deep KG traversal
  → Match against existing threads (keyword, label, embedding, edge proximity)
  → Produce candidate memberships, each with confidence + evidence bundle
  → Persist to semantic_thread_candidate (status = 'proposed')
  → Auto-accept gate (see 4.4) OR route to review
  → On acceptance → semantic_thread_membership (via universal_edges or table per 4.2)
  → On rejection → semantic_thread_negative_knowledge

Evidence types (Q40):

  • Structural: section_type, label, semantic_role, canonical_address.
  • Reference: citation, code symbol reference, manifest cross-link.
  • Statistical: embedding similarity, tsvector match, co-citation.
  • Behavioral (requires_instrumentation): co-edit, co-retrieval.
  • Human: user direction.
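
The matching and candidate-production steps of the intake flow can be sketched as follows. This is a minimal sketch that assumes units and threads are reduced to sets of Đ24 labels and keywords. The `Unit`, `Thread`, and `Candidate` classes, the `propose_candidates` function, and the confidence formula are hypothetical illustrations, not part of rev5d.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Unit:  # hypothetical stand-in for a newly born unit
    unit_id: str
    labels: frozenset    # Đ24 labels (available_now)
    keywords: frozenset  # tsvector-style keywords (available_now)


@dataclass(frozen=True)
class Thread:  # hypothetical stand-in for an existing thread
    thread_id: str
    labels: frozenset
    keywords: frozenset


@dataclass
class Candidate:
    unit_id: str
    thread_id: str
    confidence: float
    evidence: list = field(default_factory=list)
    status: str = "proposed"  # initial status per the intake flow


def propose_candidates(unit, threads):
    """Match one new unit against existing threads and emit candidates."""
    candidates = []
    for thread in threads:
        evidence = []
        shared_labels = unit.labels & thread.labels
        if shared_labels:
            evidence.append({"type": "structural", "method": "label",
                             "matched": sorted(shared_labels)})
        shared_keywords = unit.keywords & thread.keywords
        if shared_keywords:
            evidence.append({"type": "statistical", "method": "tsvector_match",
                             "matched": sorted(shared_keywords)})
        if not evidence:
            continue  # no signal fired, so no candidate is produced
        # Placeholder confidence (assumption): share of the two signals that fired.
        confidence = len(evidence) / 2
        candidates.append(
            Candidate(unit.unit_id, thread.thread_id, confidence, evidence))
    return candidates


unit = Unit("iu-1", frozenset({"design"}), frozenset({"threading", "membership"}))
threads = [
    Thread("t-threading", frozenset({"design"}), frozenset({"threading"})),
    Thread("t-billing", frozenset({"finance"}), frozenset({"invoice"})),
]
result = propose_candidates(unit, threads)
```

Only the first thread yields a candidate here, because the second shares neither a label nor a keyword with the unit. Embedding and edge-proximity matching are left out of the sketch.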

4.4 Auto-Accept Risk Gate (Đ32; rev5d §13.1.5)

A candidate may be auto-accepted ONLY when ALL hold:

  1. Risk class is low (no legal/governance/code-impact implication).
  2. ≥ 2 independent evidence signals support the membership.
  3. No conflicting semantic_thread_negative_knowledge entry for this (unit, thread) pair.
  4. Policy explicitly allows auto-accept for this thread or domain.

Otherwise the candidate goes to candidate_for_review and routes to the appropriate reviewer per Đ37. High-risk goes to human review per Đ32.
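
The four conditions above can be expressed as a single conjunction. The sketch below assumes each condition has already been evaluated upstream; the `GateInput` class and `route_candidate` function are hypothetical names, and reviewer selection per Đ37 is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateInput:  # hypothetical container for pre-evaluated gate conditions
    risk_class: str                  # "low", or any other value for elevated risk
    independent_signals: int         # count of independent evidence signals
    has_negative_knowledge: bool     # conflicting entry for this (unit, thread) pair
    policy_allows_auto_accept: bool  # explicit policy for this thread or domain


def route_candidate(c: GateInput) -> str:
    """Return 'auto_accept' only when all four conditions hold."""
    all_hold = (
        c.risk_class == "low"
        and c.independent_signals >= 2
        and not c.has_negative_knowledge
        and c.policy_allows_auto_accept
    )
    return "auto_accept" if all_hold else "candidate_for_review"
```

For example, `route_candidate(GateInput("low", 2, False, True))` passes the gate, while flipping any single field sends the candidate to review.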

Rationale: acceptance criterion 33; rev5d §13.1.5; Đ32.

4.5 Semantic Enrichment Loop (Q26 supporting; criterion 41)

New data → intake → candidates → evidence → AI/policy review
   → accepted: enrich thread (universal_edges + thread record)
   → rejected: persist as negative_knowledge
   → next-cycle matching uses both positive memberships and negative knowledge
   → TAC progress improves extraction; KG progress improves discovery
   → positive recursion (P10)

Negative knowledge suppresses re-proposal unless materially different evidence emerges. "Materially different" criteria (proposed for design-phase definition):

  • Confidence increases by ≥ threshold (e.g., 0.2 absolute), OR
  • New independent evidence type appears (e.g., previously only embedding similarity; now also citation), OR
  • Source revision of one side has changed (revision-aware re-eval).

Final thresholds are policy decisions, recorded in D5 backlog.
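
As an illustration, the three proposed criteria combine with a logical OR. The sketch below is an assumption-laden example: `ProposalSnapshot` and `is_materially_different` are hypothetical names, and the 0.2 default only mirrors the example threshold quoted above, not a decided policy value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposalSnapshot:  # hypothetical view of a proposal at one point in time
    confidence: float
    evidence_types: frozenset  # e.g. {"embedding", "citation"}
    unit_revision: str
    thread_revision: str


def is_materially_different(rejected, proposed, confidence_threshold=0.2):
    """True when a new proposal may override an earlier rejection."""
    # Criterion 1: confidence rose by at least the (policy-defined) threshold.
    confidence_gain = (
        proposed.confidence - rejected.confidence >= confidence_threshold
    )
    # Criterion 2: an evidence type appears that was absent at rejection time.
    new_evidence_type = bool(proposed.evidence_types - rejected.evidence_types)
    # Criterion 3: the source revision of either side has changed.
    revision_changed = (
        proposed.unit_revision != rejected.unit_revision
        or proposed.thread_revision != rejected.thread_revision
    )
    return confidence_gain or new_evidence_type or revision_changed


rejected = ProposalSnapshot(0.5, frozenset({"embedding"}), "r1", "r1")
with_citation = ProposalSnapshot(0.5, frozenset({"embedding", "citation"}), "r1", "r1")
unchanged = ProposalSnapshot(0.55, frozenset({"embedding"}), "r1", "r1")
```

Here `with_citation` qualifies because a new evidence type appeared, while `unchanged` does not: its confidence gain is below the threshold and nothing else differs.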

4.6 Membership Lifecycle (Q44; criterion 33)

candidate → [review] → accepted | rejected | needs_human
accepted   → [usage]  → stale | superseded
rejected   → [new evidence materially different] → re-candidate
needs_human → [escalation closed] → accepted | rejected

Each membership row carries: status, confidence, evidence_bundle (JSONB, PROV envelope), reviewed_by, reviewed_at, provenance ∈ {accepted_by_user, accepted_by_policy, accepted_by_ai, candidate_by_ai, rejected_by_user, rejected_by_review}.
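
The lifecycle above is a small state machine, which can be sketched as a transition table. The names `ALLOWED_TRANSITIONS`, `can_transition`, and `transition` are hypothetical; treating stale and superseded as terminal is an assumption drawn from the diagram showing no outgoing transitions for them.

```python
# Hypothetical transition table derived from the lifecycle diagram.
ALLOWED_TRANSITIONS = {
    "candidate": {"accepted", "rejected", "needs_human"},
    "accepted": {"stale", "superseded"},
    "rejected": {"candidate"},  # only with materially different evidence
    "needs_human": {"accepted", "rejected"},
    "stale": set(),             # assumed terminal in this sketch
    "superseded": set(),        # assumed terminal in this sketch
}


def can_transition(current: str, target: str) -> bool:
    """Check whether the lifecycle permits moving from current to target."""
    return target in ALLOWED_TRANSITIONS.get(current, set())


def transition(current: str, target: str) -> str:
    """Return the new status, or raise if the move is not permitted."""
    if not can_transition(current, target):
        raise ValueError(f"illegal membership transition: {current} -> {target}")
    return target
```

The table does not itself check the material-difference condition on the rejected to candidate move; that check would sit in the caller.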

Rationale: rev5d §13.1.4, §12.7.

4.7 Negative Knowledge (criterion 34)

semantic_thread_negative_knowledge records rejection with: unit_id, thread_id, evidence_at_rejection (JSONB), reviewed_by, reviewed_at, reason_class (semantic_mismatch, scope_too_narrow, scope_too_broad, wrong_authority, contradictory_evidence, user_explicit_reject).

Re-proposal must cite the negative-knowledge entry and demonstrate material difference.
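
A rejection record and the citation requirement might look like the sketch below. The field names and reason classes come from the text above; the `NegativeKnowledgeStore` class and its in-memory dictionary are hypothetical stand-ins for whatever PG storage is eventually chosen, and the material-difference check is deliberately left to a separate step.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class ReasonClass(Enum):
    SEMANTIC_MISMATCH = "semantic_mismatch"
    SCOPE_TOO_NARROW = "scope_too_narrow"
    SCOPE_TOO_BROAD = "scope_too_broad"
    WRONG_AUTHORITY = "wrong_authority"
    CONTRADICTORY_EVIDENCE = "contradictory_evidence"
    USER_EXPLICIT_REJECT = "user_explicit_reject"


@dataclass(frozen=True)
class NegativeKnowledge:
    unit_id: str
    thread_id: str
    evidence_at_rejection: dict
    reviewed_by: str
    reviewed_at: datetime
    reason_class: ReasonClass


class NegativeKnowledgeStore:  # hypothetical in-memory stand-in
    def __init__(self):
        self._by_pair = {}

    def record(self, entry):
        self._by_pair[(entry.unit_id, entry.thread_id)] = entry

    def lookup(self, unit_id, thread_id):
        return self._by_pair.get((unit_id, thread_id))

    def blocks_reproposal(self, unit_id, thread_id, cited_entry=None):
        """A re-proposal is blocked unless it cites the stored entry for the pair."""
        stored = self.lookup(unit_id, thread_id)
        return stored is not None and cited_entry != stored


entry = NegativeKnowledge(
    unit_id="iu-7",
    thread_id="t-1",
    evidence_at_rejection={"embedding_similarity": 0.62},
    reviewed_by="reviewer-a",
    reviewed_at=datetime(2026, 5, 15, tzinfo=timezone.utc),
    reason_class=ReasonClass.SEMANTIC_MISMATCH,
)
store = NegativeKnowledgeStore()
store.record(entry)
```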

Rationale: rev5d §13.1.8.

4.8 Missing/Wrong Link Detection (criterion 35)

Missing-link signals:

| Signal | Source |
|---|---|
| high_similarity_unlinked | Embedding clustering finds high-similarity pair with no edge |
| co_retrieval_no_edge | Retrieval logs (requires_instrumentation) |
| expected_artifact_missing | semantic_thread_expected_chain mentions an artifact type with no member |
| cited_without_edge | Reports cite a canonical_address with no universal_edge |
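
To make the high_similarity_unlinked signal concrete, the sketch below compares unit embeddings pairwise and reports similar pairs that share no edge. It is a brute-force illustration in plain Python: the function names and the 0.9 threshold are assumptions, and a real deployment would presumably query the vector store rather than loop over all pairs.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def high_similarity_unlinked(embeddings, edges, threshold=0.9):
    """Report unit pairs with similarity >= threshold that share no edge."""
    linked = {frozenset(edge) for edge in edges}  # edges treated as undirected
    signals = []
    for (a, vec_a), (b, vec_b) in combinations(sorted(embeddings.items()), 2):
        score = cosine(vec_a, vec_b)
        if score >= threshold and frozenset((a, b)) not in linked:
            signals.append({"signal": "high_similarity_unlinked",
                            "pair": (a, b), "score": score})
    return signals


embeddings = {
    "iu-1": [1.0, 0.0],
    "iu-2": [0.99, 0.1],
    "iu-3": [0.0, 1.0],
}
unlinked = high_similarity_unlinked(embeddings, edges=[])
already_linked = high_similarity_unlinked(embeddings, edges=[("iu-2", "iu-1")])
```

With no edges, the pair (iu-1, iu-2) is reported; once an edge between them exists, no signal is emitted.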

Wrong-link signals:

| Signal | Source |
|---|---|
| reviewer_rejection | Explicit rejection in REVIEW |
| retrieval_noise | Returned but flagged irrelevant (D11) |
| contradiction | Semantic role conflict |
| low_confidence_persistent | Confidence remained low after multiple cycles |

Thread-level anomalies (Q44):

| Signal | Trigger |
|---|---|
| overbroad | > 50 members and high heterogeneity |
| too_narrow | < 3 members and stale |
| stale | No activity > N days (N is policy) |
| noisy_retrieval | D11 retrieval flagged with high noisy_context_rate |
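
The first three triggers can be evaluated from a few thread-level facts, as in the sketch below. It assumes heterogeneity is already available as a number, which is still an open question for D3; `thread_anomalies`, `stale_after_days`, and `heterogeneity_limit` are hypothetical names, and noisy_retrieval is omitted because it depends on D11 data.

```python
from datetime import date, timedelta


def thread_anomalies(member_count, heterogeneity, last_activity, today,
                     stale_after_days, heterogeneity_limit):
    """Return the thread-level anomaly signals that currently apply."""
    is_stale = (today - last_activity).days > stale_after_days
    signals = []
    if member_count > 50 and heterogeneity > heterogeneity_limit:
        signals.append("overbroad")
    if member_count < 3 and is_stale:
        signals.append("too_narrow")
    if is_stale:
        signals.append("stale")
    return signals


today = date(2026, 5, 15)
big_mixed = thread_anomalies(60, 0.8, today - timedelta(days=1), today, 90, 0.5)
tiny_idle = thread_anomalies(2, 0.1, today - timedelta(days=120), today, 90, 0.5)
healthy = thread_anomalies(10, 0.1, today, today, 90, 0.5)
```

Because too_narrow requires staleness, a small but active thread raises no signal in this sketch.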

All signals route to Decision Backlog Registry (D5) and/or Segmentation Health Report (D3). No new notification system is created (criterion 38; Đ37).

4.9 Expected Lifecycle Chain (Q46; rev5d §13.1.6)

Each thread has an OPTIONAL expected_chain JSONB declaration:

{
  "expected_artifacts": [
    {"kind": "law", "authority": "enacted", "required": true},
    {"kind": "design", "required": true},
    {"kind": "code", "required": false},
    {"kind": "test_report", "required": true},
    {"kind": "incident_report", "required": false}
  ]
}

The intake flow detects missing required artifacts and emits expected_artifact_missing signals. The chain is a hook, not a hard schema; v0.1 ships the hook even if no chains are populated yet.
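
Detection of missing required artifacts reduces to a set difference over artifact kinds. The sketch below reuses the JSON declaration shown above; the function name `missing_required_artifacts` and the idea of passing in the kinds of current members as a plain collection are assumptions made for illustration.

```python
import json

EXPECTED_CHAIN = json.loads("""
{
  "expected_artifacts": [
    {"kind": "law", "authority": "enacted", "required": true},
    {"kind": "design", "required": true},
    {"kind": "code", "required": false},
    {"kind": "test_report", "required": true},
    {"kind": "incident_report", "required": false}
  ]
}
""")


def missing_required_artifacts(expected_chain, member_kinds):
    """Emit one expected_artifact_missing signal per required kind with no member."""
    present = set(member_kinds)
    return [
        {"signal": "expected_artifact_missing", "kind": artifact["kind"]}
        for artifact in expected_chain.get("expected_artifacts", [])
        if artifact.get("required", False) and artifact["kind"] not in present
    ]


signals = missing_required_artifacts(EXPECTED_CHAIN, {"law", "design", "code"})
```

A thread with no declared chain (an empty dictionary) produces no signals, which matches the hook being optional.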

4.10 Governance Mapping (Q45; criterion 37)

Mapping to Đ37:

| Role | Threading responsibility |
|---|---|
| Owner (per thread) | Approves user-directed thread creation; resolves conflicts |
| Reviewer (Đ37 roles) | Reviews candidate_for_review memberships |
| Escalation queue (Đ37) | High-risk per Đ32 |
| Council / AI Council | Resolves contested rejections; oversees overbroad/too_narrow signals |
| Registry custodian | Decision Backlog Registry sweeps (D5) |

If no Đ37 mapping exists for a threading concern → governance gap; record in D5; do NOT invent a parallel role.

4.11 Thread Split / Merge Lifecycle (Q44; criterion 36)

Split (one thread → two): old marked superseded, references retained; memberships migrated with provenance trail; aliases preserved; Decision Backlog records the rationale.

Merge (two threads → one): both old marked superseded; new thread acquires memberships; aliases preserved; negative knowledge from both is unioned with conflict review.

A split or merge is a governance action, not an automatic operation. Auto-merge based on similarity is forbidden (rev5d P7, criterion 8).
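
One possible reading of "unioned with conflict review" is sketched below: a unit accepted in one source thread but rejected in the other is held out of both sets until a reviewer decides. This interpretation, the `plan_merge` name, and the use of bare unit ids are assumptions. The function only prepares a proposal for the governance action and performs no merge itself.

```python
def plan_merge(members_a, members_b, rejected_a, rejected_b):
    """Prepare a merge proposal; conflicting units are set aside for review."""
    members = set(members_a) | set(members_b)
    rejected = set(rejected_a) | set(rejected_b)
    # A conflict is a unit accepted in one thread and rejected in the other.
    conflicts = members & rejected
    return {
        "members": members - conflicts,
        "negative_knowledge": rejected - conflicts,
        "needs_conflict_review": conflicts,
    }


plan = plan_merge(
    members_a={"iu-1", "iu-2"},
    members_b={"iu-3"},
    rejected_a={"iu-9"},
    rejected_b={"iu-2"},
)
```

In this example iu-2 is a member of the first thread but was rejected from the second, so it lands in `needs_conflict_review` rather than in either merged set.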

4.12 User-Directed Threads (Q40; criterion 40)

User-directed threads are first-class, on equal footing with AI-discovered (system-discovered) threads.

  • User may: create thread, assign memberships, reject AI suggestions, request discovery, merge/split threads.
  • User direction is authoritative intent, not automatic graph truth (rev5d §13.1.4).
  • Memberships from user direction carry provenance = accepted_by_user.
  • AI may flag inconsistencies (e.g., user adds a unit that contradicts negative knowledge) but does NOT silently override the user.
  • Inconsistency between user direction and AI evidence → health signal user_ai_disagreement → governance queue.
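
The last three rules can be sketched together: the user's assignment is recorded as given, and any clash with negative knowledge surfaces as a signal rather than an override. The `apply_user_assignment` function and the dictionary shapes are hypothetical; only the provenance value and the signal name come from the text above.

```python
def apply_user_assignment(unit_id, thread_id, negative_pairs):
    """Record a user-directed membership and flag any clash with negative knowledge."""
    membership = {
        "unit_id": unit_id,
        "thread_id": thread_id,
        "status": "accepted",
        "provenance": "accepted_by_user",
    }
    signals = []
    if (unit_id, thread_id) in negative_pairs:
        # The user is not overridden; the disagreement is routed to governance.
        signals.append({
            "signal": "user_ai_disagreement",
            "unit_id": unit_id,
            "thread_id": thread_id,
            "route": "governance_queue",
        })
    return membership, signals


negative_pairs = {("iu-7", "t-1")}
clash_membership, clash_signals = apply_user_assignment("iu-7", "t-1", negative_pairs)
clean_membership, clean_signals = apply_user_assignment("iu-8", "t-1", negative_pairs)
```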

4.13 Industry Standards Discipline (criterion 39; rev5d §12.3, §13.1.2, §13.1.3)

  • SKOS — conceptual model for thread hierarchy/relations. v0.1 does NOT introduce RDF/SPARQL/triple-store.
  • BERTopic / embedding clustering — used for candidate discovery only.
  • Entity linking — IU content → thread membership candidate.
  • Co-citation — bibliometrics for co-occurrence; treated as evidence, not decision.
  • W3C PROV — evidence envelope discipline. Minimum: evidence_source, evidence_method, generated_by, generated_at, source_object_id, target_thread_or_object, confidence, review_status, reviewer, decision_reason.
  • CDC — PG trigger + NOTIFY/LISTEN for intake triggers.
  • PG-native maximum: tsvector, pg_trgm, triggers, materialized views, NOTIFY/LISTEN. Qdrant for vectors.
  • No new stack (Neo4j, triple store, ML pipeline) unless a gap is proven and approved per Đ32.
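
The W3C PROV minimum field set listed above lends itself to a simple completeness check before an evidence record is persisted. The sketch below assumes the envelope arrives as a dictionary; `REQUIRED_EVIDENCE_FIELDS` and `missing_evidence_fields` are hypothetical names, and the example values are invented for illustration.

```python
# Minimum evidence envelope fields, copied from the W3C PROV bullet above.
REQUIRED_EVIDENCE_FIELDS = (
    "evidence_source",
    "evidence_method",
    "generated_by",
    "generated_at",
    "source_object_id",
    "target_thread_or_object",
    "confidence",
    "review_status",
    "reviewer",
    "decision_reason",
)


def missing_evidence_fields(envelope):
    """Return required field names absent from the envelope, in listed order."""
    return [name for name in REQUIRED_EVIDENCE_FIELDS if name not in envelope]


# Illustrative envelope with made-up values; two required fields are left out.
partial_envelope = {
    "evidence_source": "citation",
    "evidence_method": "manifest_cross_link",
    "generated_by": "intake",
    "generated_at": "2026-05-15T00:00:00Z",
    "source_object_id": "iu-1",
    "target_thread_or_object": "t-threading",
    "confidence": 0.8,
    "review_status": "proposed",
}
gaps = missing_evidence_fields(partial_envelope)
```

The check only tests for presence of keys; value types and allowed vocabularies are not validated here.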

4.14 Positive Recursion (Q28 supporting; P10; criterion 41)

The threading subsystem is part of positive recursion:

  • New TAC capabilities → improved extraction → better intake.
  • New KG capabilities → improved candidate discovery.
  • Approved memberships and negative knowledge → smarter next-cycle matching.
  • Capability intake (D4) records these improvements.

5. PG Storage per Object (Design Intent — No DDL)

| Object | Target DB | Layer | Notes |
|---|---|---|---|
| semantic_thread | directus | Kho | Living domain record |
| semantic_thread_membership | directus | Kho | Prefer reuse of universal_edges with edge_kind='thread_member'; if reuse impossible, flag gap |
| semantic_thread_candidate | directus | Não | Lifecycle status, confidence |
| semantic_thread_evidence | directus | Não | JSONB PROV envelope |
| semantic_thread_health_signal | directus | Não | Routes to D3 / D5 |
| semantic_thread_negative_knowledge | directus | Kho | Persistent rejection memory |
| semantic_thread_expected_chain | directus | Não | JSONB on thread record |

Final placement of separate-table vs universal_edges reuse: must be resolved in D7 (UOSL compat) and D10 (Legal Alignment) per Đ33/Đ43, Đ39.

6. Schema Gaps

  1. semantic_thread core table — needs placement decision; not present today.
  2. universal_edges extension — needs edge_kind value thread_member and possibly status/confidence fields; current Đ39 schema gap (verify).
  3. semantic_thread_candidate lifecycle status enum — Đ24 vocabulary placement.
  4. evidence_bundle PROV envelope — JSONB shape; minimum field set listed in §4.13 but not codified.
  5. semantic_thread_negative_knowledge persistence — no current capability for rejection memory.
  6. expected_chain JSONB — new field on thread record.
  7. semantic_thread_health_signal routing — must reuse existing Decision Backlog Registry (D5), not new system.
  8. CDC trigger plumbing — PG triggers / NOTIFY-LISTEN on tac_logical_unit birth event; instrumentation gap.
  9. Co-edit / co-retrieval instrumentation: requires_instrumentation (criterion 11).
  10. user_ai_disagreement health signal — vocabulary gap.

7. Law References

| Surface | Law |
|---|---|
| Universal edges first | Đ39 |
| Vocabulary | Đ24 |
| Risk approval | Đ32 |
| Governance roles | Đ37 |
| Manifest-as-code style evidence | Đ38 |
| Schema compat target | Đ44 |
| PG placement | Đ33 / Đ43 |
| Birth gate | Đ0-G |

8. Open Questions

  1. Exact threshold values for auto-accept and material-difference (defer to policy doc / D5 backlog).
  2. Where semantic_thread lives if universal_edges doesn't fit — directus vs separate DB. Defer to D7/D10.
  3. How is "thread heterogeneity" measured for overbroad detection — embedding variance? Edge density? Defer to D3.
  4. Should semantic_thread_negative_knowledge decay over time, or is rejection permanent until material difference? Recommendation: permanent until material difference, with optional periodic review.

9. Coverage

Questions covered (primary): Q37, Q38, Q39, Q40, Q41, Q42, Q43, Q44, Q45, Q46.

Questions covered (secondary): Q22, Q32.

Acceptance criteria covered:

  • 31 (cross-temporal semantic threading)
  • 32 (semantic intake flow)
  • 33 (candidate/thread membership lifecycle)
  • 34 (negative knowledge)
  • 35 (missing/wrong link detection)
  • 36 (thread split/merge lifecycle)
  • 38 (no parallel notification — supporting)
  • 39 (industry standards leverage)
  • 40 (user-directed + system-discovered both first-class)
  • 41 (TAC/KG progress feeds quality)

Schema gaps: 10 named (see §6).

Law dependencies: Đ24, Đ32, Đ33/Đ43, Đ37, Đ38, Đ39, Đ44, Đ0-G.

Open questions: 4 (see §8).

Law conflicts encountered: none.
