QDRANT FULL AUDIT — Vector Sync Investigation
Date: 2026-04-05 | Status: AUDIT COMPLETE — 0 code changes
1. Actual Orphan Count (NOT hand-calculated)
POST /kb/audit-sync response:
total_documents: 877
total_vectors: 1,324
ratio: 1.51 (NORMAL — chunking)
ghost_count: 26 (docs without vectors)
orphan_count: 4 (vectors without docs)
status: needs_cleanup
PREVIOUS ESTIMATE WAS WRONG: Earlier reports claimed "444 orphans" by subtracting the document count from the vector count. That method is invalid because any document longer than 4000 chars is chunked into MULTIPLE vectors, so a vectors-per-document ratio above 1 (here 1.51) is expected.
ACTUAL orphans: only 4 vectors. ACTUAL ghosts: 26 docs without vectors.
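The difference between the naive subtraction and a real orphan/ghost count can be sketched as a set comparison. This is a minimal illustration, not the actual /kb/audit-sync implementation: the function name `audit_sync`, the input shapes (document IDs from PG, one document_id per vector from Qdrant), and counting orphans per vector are all assumptions.

```python
from collections import Counter


def audit_sync(pg_doc_ids, vector_doc_ids):
    """Compare PG document IDs with the document_id carried by each vector.

    pg_doc_ids: iterable of IDs present in kb_documents (assumed shape).
    vector_doc_ids: iterable with one document_id per vector/chunk (assumed shape).
    """
    docs = set(pg_doc_ids)
    chunks_per_doc = Counter(vector_doc_ids)
    vectorized = set(chunks_per_doc)

    ghosts = sorted(docs - vectorized)      # docs without vectors
    orphan_ids = sorted(vectorized - docs)  # vectors whose doc no longer exists
    orphan_vectors = sum(chunks_per_doc[d] for d in orphan_ids)

    total_vectors = sum(chunks_per_doc.values())
    return {
        "total_documents": len(docs),
        "total_vectors": total_vectors,
        "ratio": round(total_vectors / len(docs), 2) if docs else None,
        "ghost_count": len(ghosts),
        "orphan_count": orphan_vectors,
        # What the earlier reports computed; inflated by chunking.
        "naive_difference": total_vectors - len(docs),
    }


# Toy data: "a" is chunked into 3 vectors, "b" has 1, "c" is a ghost, "x" is an orphan.
report = audit_sync(["a", "b", "c"], ["a", "a", "a", "b", "x"])
print(report)
```

In the toy data the naive difference is 2 even though there is only one orphan vector, which is the same kind of overcount that produced the "444" figure.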
4 Orphan Vectors (vectors for deleted docs)
| document_id | Cause |
|---|---|
| knowledge/test/upload-check-20260405 | Test doc created + deleted this session |
| knowledge__current-state__project-progress-tracker.md | Doc deleted then recreated (PG direct update) |
| knowledge__current-state__reports__kb-protection-phase2-report | Same pattern |
| knowledge__current-state__reports__trigger-guard-d26-p3-report | Same pattern |
Root cause: a direct PG UPDATE (bypassing the API) creates a new PG key, but Qdrant retains the vectors under the old key format.
26 Ghost Docs (docs without vectors)
| Category | Count | Examples |
|---|---|---|
| Empty/folder docs | 3 | "", knowledge/dev, knowledge/dev/blueprints |
| Task comments | 13 | operations/tasks/comments/comment-* |
| Legacy reports | 2 | mission-count-verify-report, mission-registry-pg-report |
| Registries (Directus sync) | 3 | registries/workflows/wf-1, etc. |
| Test/moved | 2 | test/conn-audit-moved, test/f1-moved |
| Tasks | 1 | operations/tasks/task-22 |
Root cause (ghosts): Task comments and registries are synced from Directus via directus_sync.py, which writes to PG through the event system but does NOT call the vector layer, so no embedding is generated.
2. Code Path Map — ALL write paths to kb_documents + Qdrant
PATH 1: API endpoints (CORRECT — syncs both PG + Qdrant)
POST /documents → server.py:create_document()
→ pg_store.upsert() + vector_store.upsert_document()
PUT /documents/{id} → server.py:update_document()
→ pg_store.upsert() + vector_store.upsert_document()
DELETE /documents/{id} → server.py:delete_document()
→ pg_store.soft_delete() + vector_store.delete_document()
PATH 2: directus_sync.py (PARTIAL — Directus→PG only, NO vectors)
Event: document.created/updated/deleted from Directus flows
→ directus_sync.py handles Directus→Directus (NOT Agent Data→Qdrant)
NOTE: directus_sync listens for events AFTER API writes.
It syncs TO Directus, not FROM Directus. No vector bypass here.
PATH 3: PG direct SQL (BYPASS — no Qdrant)
SSH → psql → UPDATE kb_documents SET data = ...
→ PG triggers fire (snapshot, audit, updated_at)
→ Qdrant NOT touched
Used by: agent sessions (S166-KB report updates), manual fixes
PATH 4: Reconcile scripts (PARTIAL)
reconcile-knowledge.sh/py → Directus→Agent Data API → correct path
reconcile-tasks.sh/py → Directus→Agent Data (tasks, not KB)
PATH 5: MCP tools
upload_document → calls API POST /documents → correct path
get_document_for_rewrite → read only
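The contrast between PATH 1 and PATH 3 can be reproduced with in-memory stand-ins. This is a sketch under stated assumptions: the method names `upsert`, `soft_delete`, `upsert_document` and `delete_document` come from the map above, but their signatures, the two `InMemory*` classes, and the helpers `write_via_api`, `delete_via_api` and `find_stale` are hypothetical and do not reflect the real server.py.

```python
class InMemoryPgStore:
    """Stand-in for pg_store; tracks content and a soft-delete flag only."""

    def __init__(self):
        self.rows = {}

    def upsert(self, doc_id, content):
        self.rows[doc_id] = {"content": content, "deleted": False}

    def soft_delete(self, doc_id):
        if doc_id in self.rows:
            self.rows[doc_id]["deleted"] = True


class InMemoryVectorStore:
    """Stand-in for vector_store; stores the text that was 'embedded'."""

    def __init__(self):
        self.indexed = {}

    def upsert_document(self, doc_id, content):
        self.indexed[doc_id] = content

    def delete_document(self, doc_id):
        self.indexed.pop(doc_id, None)


def write_via_api(pg_store, vector_store, doc_id, content):
    # PATH 1: every write touches both stores.
    pg_store.upsert(doc_id, content)
    vector_store.upsert_document(doc_id, content)


def delete_via_api(pg_store, vector_store, doc_id):
    pg_store.soft_delete(doc_id)
    vector_store.delete_document(doc_id)


def find_stale(pg_store, vector_store):
    """Return live doc IDs whose indexed text differs from the PG content."""
    live = {d: r["content"] for d, r in pg_store.rows.items() if not r["deleted"]}
    return sorted(d for d, c in live.items() if vector_store.indexed.get(d) != c)


pg, vectors = InMemoryPgStore(), InMemoryVectorStore()
write_via_api(pg, vectors, "knowledge/example", "v1")
in_sync_before = find_stale(pg, vectors)

# PATH 3: simulate a direct SQL UPDATE that skips the vector layer.
pg.rows["knowledge/example"]["content"] = "v2"
stale_after = find_stale(pg, vectors)
print(in_sync_before, stale_after)
```

The point of the sketch is only that drift appears as soon as one store is written without the other; it says nothing about how the real stores detect it.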
Bypass Paths Causing Issues
| Path | Issue | Impact |
|---|---|---|
| PG direct SQL | Modifies kb_documents without touching Qdrant | Stale vectors (orphans with old doc_id format) |
| Task/comment sync | Written to PG but never embedded | 13 ghost task comments |
| Registry sync | Written to PG but no embed call | 3 ghost registry items |
| Folder/empty docs | No content to embed | 3 ghost empty docs |
3. DOT/Cron Inventory for Vector Layer
Currently ALIVE on VPS
| Tool/Cron | Path | Schedule | Status |
|---|---|---|---|
| dot-vector-audit | /opt/incomex/dot/bin/ | NOT in crontab | Manual only |
| dot-vector-audit-schedule | /opt/incomex/dot/bin/ | macOS LaunchAgent (not VPS) | N/A on VPS |
| qdrant-backup.sh | /opt/incomex/scripts/ | NOT in crontab | Manual only |
| dot-orphan-scanner | /opt/incomex/dot/bin/ | daily 2:30AM | Directus orphans (NOT Qdrant) |
DEAD (Cloud/CICD)
| Tool | Was | Status |
|---|---|---|
| GH Actions vector-audit.yml | CICD workflow | DISABLED (S167) |
| Cloud Scheduler | GCP cron | DEAD (infra retired) |
GAP: No automated vector audit running on VPS. dot-vector-audit exists but NOT scheduled.
4. PG Triggers — Do they touch Qdrant?
| Trigger | Function | Calls Qdrant? |
|---|---|---|
| trg_kb_snapshot_update | fn_kb_snapshot | NO |
| trg_kb_snapshot_delete | fn_kb_snapshot | NO |
| trg_kb_audit | fn_kb_audit | NO |
| trg_kb_truncation_guard | fn_kb_truncation_guard | NO |
| trg_kb_updated_at | fn_kb_updated_at | NO |
PG triggers are PG-only. Vector sync happens ONLY in application layer (server.py).
5. Fix Proposal (PERMANENT — CQ-1, NT-12)
Fix 1: Cleanup (one-time)
# Delete 4 orphan vectors
POST /kb/cleanup-orphans (dry_run=false)
# Reindex 26 ghost docs (embed missing content)
POST /kb/reindex-missing
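The two cleanup calls could be scripted roughly as follows. Only the endpoint paths and the dry_run option come from this report; the base URL, sending dry_run in a JSON body (rather than as a query parameter), and the helper names are assumptions to adjust against the real API.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # hypothetical; replace with the real API host


def build_post(path, payload=None):
    """Build (but do not send) a JSON POST request for the given path."""
    body = json.dumps(payload or {}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def cleanup_plan(dry_run=True):
    """Return the requests for Fix 1 in the order they should run."""
    return [
        # Assumption: dry_run is accepted in the JSON body.
        build_post("/kb/cleanup-orphans", {"dry_run": dry_run}),
        build_post("/kb/reindex-missing"),
    ]


def run(plan):
    """Send each request and print the HTTP status (needs a reachable server)."""
    for req in plan:
        with urllib.request.urlopen(req, timeout=30) as resp:
            print(req.full_url, resp.status)


plan = cleanup_plan(dry_run=True)
for req in plan:
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Printing the plan first and calling `run(cleanup_plan(dry_run=False))` only after review keeps the destructive step explicit; re-running POST /kb/audit-sync afterwards would confirm the counts.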
Fix 2: Schedule dot-vector-audit on VPS cron (PERMANENT)
# Add to crontab — daily 4:30AM UTC (after dot-kb-verify)
30 4 * * * /opt/incomex/dot/bin/dot-vector-audit --heal --local >> /var/log/incomex/dot-vector-audit.log 2>&1
DOT pairing: dot-vector-audit (Tier A, heal) ↔ dot-kb-verify (Tier A, detect)
Fix 3: Prevent future ghosts from non-embeddable docs
Documents that should NOT be embedded (task comments, empty folders, registries): add a skip_embedding: true flag in metadata. The API respects the flag and skips embedding, so these docs are no longer counted as ghosts.
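A possible shape for the flag check, as a sketch only: the `skip_embedding` key is the one proposed here, while the document layout (a dict with `metadata` and `content`), the treatment of empty content as non-embeddable, and the function names `should_embed` and `is_ghost` are assumptions.

```python
def should_embed(doc):
    """Return True if the document should get vectors.

    Assumes doc is a dict with optional "metadata" and "content" keys.
    """
    metadata = doc.get("metadata") or {}
    if metadata.get("skip_embedding") is True:
        return False
    # Assumption: empty/folder docs have nothing to embed either.
    return bool((doc.get("content") or "").strip())


def is_ghost(doc, has_vectors):
    """A doc is a ghost only if it should be embedded but has no vectors."""
    return should_embed(doc) and not has_vectors


comment = {"content": "LGTM", "metadata": {"skip_embedding": True}}
folder = {"content": "", "metadata": {}}
report_doc = {"content": "Phase 2 report body", "metadata": {}}

print(is_ghost(comment, has_vectors=False))
print(is_ghost(folder, has_vectors=False))
print(is_ghost(report_doc, has_vectors=False))
```

Applying the same predicate in both the write path and the audit would keep the two from disagreeing about which docs are expected to have vectors.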
Fix 4: Prevent PG direct SQL bypass
Document the rule: NEVER update kb_documents via SQL if Qdrant sync is needed. Use API endpoints. PG direct SQL is only for metadata fields that don't affect content/embedding.
Summary
| Metric | Value |
|---|---|
| Actual orphans | 4 (not 444) |
| Ghost docs | 26 (no vectors) |
| Ratio vectors:docs | 1.51 (normal, chunking) |
| Bypass paths found | 1 (PG direct SQL) |
| DOT gap | dot-vector-audit NOT scheduled |
| Fix effort | ~30 min (cleanup + cron + document rule) |
Audit complete | 0 code changes | Fix proposal ready for approval