KB-78F9

QDRANT FULL AUDIT — Vector Sync Investigation


Date: 2026-04-05 | Status: AUDIT COMPLETE — 0 code changes


1. Actual Orphan Count (NOT hand-calculated)

POST /kb/audit-sync response:
  total_documents: 877
  total_vectors: 1,324
  ratio: 1.51 (NORMAL — chunking)
  ghost_count: 26 (docs without vectors)
  orphan_count: 4 (vectors without docs)
  status: needs_cleanup

PREVIOUS ESTIMATE WAS WRONG: Earlier reports said "444 orphans" by subtracting docs from vectors. That method is invalid because documents >4000 chars are chunked into MULTIPLE vectors, so vectors outnumber docs even when nothing is orphaned. A ratio of 1.51 (1,324 / 877) is expected.
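A minimal sketch of why the subtraction overcounts. It assumes fixed-size, non-overlapping chunks of 4000 chars; the report gives only the threshold, so the splitting strategy, the function name, and the sample lengths are illustrative assumptions.

```python
import math

CHUNK_SIZE = 4000  # threshold from the report; the splitting strategy is assumed


def expected_vector_count(doc_length: int, chunk_size: int = CHUNK_SIZE) -> int:
    """Vectors produced for one document under fixed-size, non-overlapping chunking."""
    if doc_length <= 0:
        return 0  # nothing to embed
    return math.ceil(doc_length / chunk_size)


# Hypothetical document lengths (chars), all healthy and fully indexed.
lengths = [1200, 3900, 4001, 9500]
vectors = sum(expected_vector_count(n) for n in lengths)
naive_orphans = vectors - len(lengths)

# 7 vectors for 4 docs: the naive method reports 3 "orphans" where none exist.
print(vectors, naive_orphans)
```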

ACTUAL orphans: only 4 vectors. ACTUAL ghosts: 26 docs without vectors.
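The orphan and ghost counts are set differences on document IDs, not a difference of totals. The sketch below shows that comparison; it is not the actual /kb/audit-sync implementation, which this report does not show. The function name and sample IDs are hypothetical, and it reports per document ID, whereas the endpoint counts orphan vectors.

```python
def audit_sync(doc_ids, vector_doc_ids):
    """Compare PG document IDs with the document IDs found in vector payloads."""
    docs = set(doc_ids)
    vecs = set(vector_doc_ids)  # many chunks collapse to one document_id
    return {
        "ghosts": sorted(docs - vecs),   # docs without vectors
        "orphans": sorted(vecs - docs),  # vectors whose doc no longer exists
    }


# Hypothetical data: "a" is chunked into three vectors, "c" was never embedded.
docs = ["a", "b", "c"]
vector_payload_ids = ["a", "a", "a", "b", "deleted-doc"]
result = audit_sync(docs, vector_payload_ids)
print(result)
```

Chunking does not distort this result because duplicate document IDs disappear when the payload IDs are collected into a set.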

4 Orphan Vectors (vectors for deleted docs)

document_id | Cause
knowledge/test/upload-check-20260405 | Test doc created + deleted this session
knowledge__current-state__project-progress-tracker.md | Doc deleted then recreated (PG direct update)
knowledge__current-state__reports__kb-protection-phase2-report | Same pattern
knowledge__current-state__reports__trigger-guard-d26-p3-report | Same pattern

Root cause (deleted-then-recreated docs): a direct PG UPDATE (bypassing the API) creates a new PG key, but Qdrant retains the vectors under the old key format.

26 Ghost Docs (docs without vectors)

Category | Count | Examples
Empty/folder docs | 3 | "", knowledge/dev, knowledge/dev/blueprints
Task comments | 13 | operations/tasks/comments/comment-*
Legacy reports | 2 | mission-count-verify-report, mission-registry-pg-report
Registries (Directus sync) | 3 | registries/workflows/wf-1, etc.
Test/moved | 2 | test/conn-audit-moved, test/f1-moved
Tasks | 1 | operations/tasks/task-22

Note: the categories listed above account for 24 of the 26 ghosts; the remaining 2 are not itemized in this report.

Root cause (ghosts): task comments and registries are synced from Directus by directus_sync.py, which writes to PG via the event system but does NOT call the vector layer, so no embedding is generated.


2. Code Path Map — ALL write paths to kb_documents + Qdrant

PATH 1: API endpoints (CORRECT — syncs both PG + Qdrant)
  POST /documents → server.py:create_document()
    → pg_store.upsert() + vector_store.upsert_document()
  PUT /documents/{id} → server.py:update_document()
    → pg_store.upsert() + vector_store.upsert_document()
  DELETE /documents/{id} → server.py:delete_document()
    → pg_store.soft_delete() + vector_store.delete_document()

PATH 2: directus_sync.py (PARTIAL — Directus→PG only, NO vectors)
  Event: document.created/updated/deleted from Directus flows
    → directus_sync.py handles Directus→Directus (NOT Agent Data→Qdrant)
    NOTE: directus_sync listens for events AFTER API writes.
    It syncs TO Directus, not FROM Directus. No vector bypass here.

PATH 3: PG direct SQL (BYPASS — no Qdrant)
  SSH → psql → UPDATE kb_documents SET data = ...
    → PG triggers fire (snapshot, audit, updated_at)
    → Qdrant NOT touched
  Used by: agent sessions (S166-KB report updates), manual fixes

PATH 4: Reconcile scripts (PARTIAL)
  reconcile-knowledge.sh/py → Directus→Agent Data API → correct path
  reconcile-tasks.sh/py → Directus→Agent Data (tasks, not KB)

PATH 5: MCP tools
  upload_document → calls API POST /documents → correct path
  get_document_for_rewrite → read only
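
The difference between PATH 1 and PATH 3 can be sketched with in-memory stand-ins. The method names (upsert, soft_delete, upsert_document, delete_document) come from the map above; the classes, the wrapper functions, and the sample data are hypothetical and only illustrate the coupling, not the real server.py code.

```python
class InMemoryPG:
    """Hypothetical stand-in for pg_store (kb_documents)."""

    def __init__(self):
        self.rows = {}
        self.deleted = set()

    def upsert(self, doc_id, content):
        self.rows[doc_id] = content
        self.deleted.discard(doc_id)

    def soft_delete(self, doc_id):
        self.deleted.add(doc_id)


class InMemoryVectors:
    """Hypothetical stand-in for vector_store (Qdrant)."""

    def __init__(self):
        self.embedded_text = {}

    def upsert_document(self, doc_id, content):
        self.embedded_text[doc_id] = content  # stands in for embed + upsert

    def delete_document(self, doc_id):
        self.embedded_text.pop(doc_id, None)


def write_document(pg, vectors, doc_id, content):
    """PATH 1 shape: every content write touches both stores."""
    pg.upsert(doc_id, content)
    vectors.upsert_document(doc_id, content)


def remove_document(pg, vectors, doc_id):
    """PATH 1 shape: deletes are mirrored to the vector store."""
    pg.soft_delete(doc_id)
    vectors.delete_document(doc_id)


pg, vectors = InMemoryPG(), InMemoryVectors()
write_document(pg, vectors, "doc-1", "first version")
in_sync_after_api_write = vectors.embedded_text["doc-1"] == pg.rows["doc-1"]

# PATH 3 shape: a direct row update never reaches the vector store.
pg.rows["doc-1"] = "edited via direct SQL"
in_sync_after_sql_write = vectors.embedded_text["doc-1"] == pg.rows["doc-1"]

print(in_sync_after_api_write, in_sync_after_sql_write)  # True False
```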

Bypass Paths Causing Issues

Path | Issue | Impact
PG direct SQL | Modifies kb_documents without touching Qdrant | Stale vectors (orphans with old doc_id format)
Task/comment sync | Written to PG but never embedded | 13 ghost task comments
Registry sync | Written to PG but no embed call | 3 ghost registry items
Folder/empty docs | No content to embed | 3 ghost empty docs

3. DOT/Cron Inventory for Vector Layer

Currently ALIVE on VPS

Tool/Cron | Path | Schedule | Status
dot-vector-audit | /opt/incomex/dot/bin/ | NOT in crontab | Manual only
dot-vector-audit-schedule | /opt/incomex/dot/bin/ | macOS LaunchAgent (not VPS) | N/A on VPS
qdrant-backup.sh | /opt/incomex/scripts/ | NOT in crontab | Manual only
dot-orphan-scanner | /opt/incomex/dot/bin/ | daily 2:30AM | Directus orphans (NOT Qdrant)

DEAD (Cloud/CICD)

Tool | Was | Status
GH Actions vector-audit.yml | CICD workflow | DISABLED (S167)
Cloud Scheduler | GCP cron | DEAD (infra retired)

GAP: No automated vector audit is running on the VPS. dot-vector-audit exists but is NOT scheduled.


4. PG Triggers — Do they touch Qdrant?

Trigger | Function | Calls Qdrant?
trg_kb_snapshot_update | fn_kb_snapshot | NO
trg_kb_snapshot_delete | fn_kb_snapshot | NO
trg_kb_audit | fn_kb_audit | NO
trg_kb_truncation_guard | fn_kb_truncation_guard | NO
trg_kb_updated_at | fn_kb_updated_at | NO

PG triggers are PG-only. Vector sync happens ONLY in application layer (server.py).


5. Fix Proposal (PERMANENT — CQ-1, NT-12)

Fix 1: Cleanup (one-time)

# Delete 4 orphan vectors
POST /kb/cleanup-orphans  (dry_run=false)
# Reindex 26 ghost docs (embed missing content)
POST /kb/reindex-missing
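
A hedged sketch of scripting the two cleanup calls. The endpoint paths come from this report; the base URL is hypothetical, and passing dry_run as a query parameter is an assumption, since the report does not say how the flag is transmitted. The sketch only builds the requests and does not send them.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # hypothetical; replace with the real API host


def build_cleanup_requests(base_url: str = BASE_URL, dry_run: bool = True):
    """Build (but do not send) the two one-time cleanup requests."""
    # Assumption: dry_run travels as a query parameter.
    query = urlencode({"dry_run": str(dry_run).lower()})
    return [
        Request(f"{base_url}/kb/cleanup-orphans?{query}", method="POST"),
        Request(f"{base_url}/kb/reindex-missing", method="POST"),
    ]


for req in build_cleanup_requests(dry_run=True):
    print(req.get_method(), req.full_url)
```

Defaulting to dry_run=True is a design choice of the sketch: the destructive call has to be requested explicitly, after the dry-run output has been compared with the 4 orphans listed in section 1.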

Fix 2: Schedule dot-vector-audit on VPS cron (PERMANENT)

# Add to crontab: daily 4:30AM UTC, assuming the VPS clock is set to UTC (after dot-kb-verify)
30 4 * * * /opt/incomex/dot/bin/dot-vector-audit --heal --local >> /var/log/incomex/dot-vector-audit.log 2>&1

DOT pairing: dot-vector-audit (Tier A, heal) ↔ dot-kb-verify (Tier A, detect)

Fix 3: Prevent future ghosts from non-embeddable docs

Documents that should NOT be embedded (task comments, empty folders, registries): add a skip_embedding: true flag to their metadata. Once the API respects the flag, these docs are skipped at embed time and no longer show up as ghosts.
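
A minimal sketch of how the flag could be honored on both the write side and the audit side. The skip_embedding key is from the proposal above; the document shape (a dict with "metadata" and "content" keys) and both function names are assumptions for illustration.

```python
def should_embed(document: dict) -> bool:
    """Decide whether a document needs vectors at all."""
    metadata = document.get("metadata") or {}
    if metadata.get("skip_embedding") is True:
        return False  # explicitly opted out (task comments, registries, ...)
    content = document.get("content") or ""
    return bool(content.strip())  # empty/folder docs have nothing to embed


def is_ghost(document: dict, has_vectors: bool) -> bool:
    """A ghost is a doc that should have vectors but has none."""
    return should_embed(document) and not has_vectors


# Hypothetical documents.
comment = {"content": "LGTM", "metadata": {"skip_embedding": True}}
folder = {"content": "", "metadata": {}}
report = {"content": "Audit findings ...", "metadata": {}}

print(is_ghost(comment, False), is_ghost(folder, False), is_ghost(report, False))
```

Applying the same predicate in the audit keeps intentionally unembedded docs out of ghost_count, so a nonzero count signals a real gap.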

Fix 4: Prevent PG direct SQL bypass

Document the rule: NEVER update kb_documents via SQL when the change affects content that Qdrant embeds. Use the API endpoints instead. Direct PG SQL is acceptable only for metadata fields that do not affect content or embeddings.


Summary

Metric | Value
Actual orphans | 4 (not 444)
Ghost docs | 26 (no vectors)
Ratio docs:vectors | 1.51 (normal, chunking)
Bypass paths found | 1 (PG direct SQL)
DOT gap | dot-vector-audit NOT scheduled
Fix effort | ~30 min (cleanup + cron + document rule)

Audit complete | 0 code changes | Fix proposal ready for approval