KB-6197

OGV P0 Fix — Soft-Delete Vector Resurrection (2026-05-03)

Revision 1

title: OGV P0 Fix — Soft-Delete Vector Resurrection
date: 2026-05-03
status: COMPLETE
scope: agent-data listener + fn_kb_notify_vector_sync trigger + targeted Qdrant cleanup

OGV P0 Fix Report — 2026-05-03

Bug

Soft-deleting a kb_documents row (DELETE /documents/{id}) caused the vector to be resurrected in Qdrant. Flow:

API delete → Qdrant delete  ✅
           → PG UPDATE sets data.deleted_at = now  →  trigger emits op='UPDATE'
           → listener calls upsert_document() → vector re-created  ❌ (orphan)

Root cause:

  • fn_kb_notify_vector_sync did not distinguish a soft-delete UPDATE from an ordinary content UPDATE. The soft-deleted row's body still has content, so it passed the body filters and the trigger emitted op='UPDATE'.
  • pg_vector_listener.py had no deleted_at / vector_status guard before upserting.
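The failure can be reproduced in a small in-memory model of the pre-fix pipeline. This is a sketch, not the production code. `FakeVectorStore`, `old_trigger`, `old_listener`, and `api_soft_delete` are hypothetical stand-ins for Qdrant, fn_kb_notify_vector_sync, pg_vector_listener.py, and the DELETE /documents/{id} handler. The `content` key and the `MIN_BODY_LEN` threshold are assumed values for the body filter.

```python
"""Hypothetical model of the pre-fix soft-delete flow (not the production code)."""
from typing import Dict, Optional

MIN_BODY_LEN = 20  # assumed threshold for the empty/short body filter


class FakeVectorStore:
    """In-memory stand-in for the Qdrant collection, keyed by document_id."""

    def __init__(self) -> None:
        self.points: Dict[str, str] = {}

    def upsert_document(self, doc_id: str, body: str) -> None:
        self.points[doc_id] = body

    def delete_document(self, doc_id: str) -> None:
        self.points.pop(doc_id, None)


def old_trigger(op: str, new_data: dict) -> Optional[str]:
    """Pre-fix trigger: only the body filter decides; deleted_at is never checked."""
    if len(new_data.get("content") or "") < MIN_BODY_LEN:
        return None  # skipped by the empty/short body filter
    return op  # a soft-deleted row still has content, so op stays 'UPDATE'


def old_listener(store: FakeVectorStore, op: Optional[str], doc_id: str, row: dict) -> None:
    """Pre-fix listener: upserts on INSERT/UPDATE with no deleted_at guard."""
    if op in ("INSERT", "UPDATE"):
        store.upsert_document(doc_id, row["content"])
    elif op == "DELETE":
        store.delete_document(doc_id)


def api_soft_delete(store: FakeVectorStore, doc_id: str, row: dict) -> None:
    """API delete path: Qdrant delete first, then the PG soft-delete UPDATE."""
    store.delete_document(doc_id)               # step 1: vector removed
    row["deleted_at"] = "2026-05-03T12:00:00Z"  # step 2: PG UPDATE sets deleted_at
    op = old_trigger("UPDATE", row)             # trigger fires on that UPDATE
    old_listener(store, op, doc_id, row)        # listener upserts the vector again


store = FakeVectorStore()
row = {"content": "a sufficiently long document body", "deleted_at": None}
store.upsert_document("doc-1", row["content"])
api_soft_delete(store, "doc-1", row)
print("doc-1" in store.points)  # True: the vector has been resurrected
```

The vector is gone after step 1 and back after the listener runs. This is the ❌ (orphan) in the flow above.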

Fix: defense in depth (two independent layers, plus an API path review)

Layer 1: Listener guard

File: agent_data/pg_vector_listener.py (host SSOT: /opt/incomex/docker/agent-data-repo/, copied into the container). A container restart is required because the PG LISTEN thread is daemonized and has no hot-reload path.

In _handle_notification, in the INSERT/UPDATE branch, before the upsert:

  • if PG row missing → call store.delete_document (orphan-cleanup) and return
  • if deleted_at IS NOT NULL OR vector_status == 'deleted' → call store.delete_document and return
  • otherwise → existing empty/short body skip + upsert (unchanged)

Empty/short body filter preserved.
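Below is a minimal sketch of the guard order described above. It makes three assumptions:

  • the handler re-reads the row from PG by document id (implied by the "PG row missing" check)
  • the row exposes `deleted_at`, `vector_status`, and a `content` body
  • the names `handle_upsert_notification`, `fetch_row`, `FakeVectorStore`, and `MIN_BODY_LEN` are hypothetical and do not come from the actual pg_vector_listener.py code

```python
"""Hypothetical sketch of the Layer 1 listener guard (not the production code)."""
from typing import Callable, Dict, List, Optional, Tuple

MIN_BODY_LEN = 20  # assumed threshold for the existing empty/short body filter


class FakeVectorStore:
    """In-memory stand-in for the vector store; records every call."""

    def __init__(self) -> None:
        self.points: Dict[str, str] = {}
        self.calls: List[Tuple[str, str]] = []

    def upsert_document(self, doc_id: str, body: str) -> None:
        self.calls.append(("upsert", doc_id))
        self.points[doc_id] = body

    def delete_document(self, doc_id: str) -> None:
        self.calls.append(("delete", doc_id))
        self.points.pop(doc_id, None)


def handle_upsert_notification(
    store: FakeVectorStore,
    doc_id: str,
    fetch_row: Callable[[str], Optional[dict]],
) -> str:
    """INSERT/UPDATE branch: guards run before any upsert. Returns the action taken."""
    row = fetch_row(doc_id)  # current PG state; the notify payload may be stale
    if row is None:
        store.delete_document(doc_id)  # PG row missing: orphan cleanup
        return "deleted:missing-row"
    if row.get("deleted_at") is not None or row.get("vector_status") == "deleted":
        store.delete_document(doc_id)  # soft-deleted: delete, never upsert
        return "deleted:soft-deleted"
    body = row.get("content") or ""
    if len(body) < MIN_BODY_LEN:
        return "skipped:short-body"  # existing filter, unchanged
    store.upsert_document(doc_id, body)
    return "upserted"


pg_rows = {
    "live": {"content": "an active document with enough text", "deleted_at": None},
    "gone": {"content": "a soft-deleted document body text", "deleted_at": "2026-05-03"},
    "flag": {"content": "status flag set but no timestamp", "vector_status": "deleted"},
}
store = FakeVectorStore()
results = {
    d: handle_upsert_notification(store, d, pg_rows.get)
    for d in ("live", "gone", "flag", "missing")
}
print(results)
```

The soft-delete checks run before the body filter. As a result, a stale UPDATE notify for a soft-deleted row turns into a delete rather than an upsert.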

Layer 2: Trigger semantic DELETE

Object: public.fn_kb_notify_vector_sync (PG, db incomex_metadata, applied as role workflow_admin).

For UPDATE:

  • if OLD.data->>'deleted_at' IS NULL AND NEW.data->>'deleted_at' IS NOT NULL → emit pg_notify('kb_vector_sync', {op:'DELETE',...}) and RETURN NEW
  • if NEW.data->>'deleted_at' IS NOT NULL (already deleted, any further UPDATE) → suppress notify entirely (defense-in-depth)

All existing skip filters preserved: comments, registries, empty key, empty/short body.
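The trigger's UPDATE branch reduces to a small decision function. The real object is PL/pgSQL. The Python model below only mirrors the branch order so that it can be exercised without a database.

Several details are assumptions:

  • the `content` key and the `MIN_BODY_LEN` threshold
  • placing the deleted_at checks ahead of the existing skip filters

The comments/registries filters are omitted because their predicates are not described here.

```python
"""Python model of the Layer 2 trigger decision for UPDATE (hypothetical, not the PL/pgSQL)."""
from typing import Optional

MIN_BODY_LEN = 20  # assumed threshold for the empty/short body filter


def update_notify_op(old_data: dict, new_data: dict, key: str) -> Optional[str]:
    """Return the op to send on the 'kb_vector_sync' channel, or None to suppress.

    A missing key and an explicit None both count as NULL, like data->>'deleted_at'.
    """
    was_deleted = old_data.get("deleted_at") is not None
    is_deleted = new_data.get("deleted_at") is not None
    if not was_deleted and is_deleted:
        return "DELETE"  # soft-delete transition: semantic DELETE
    if is_deleted:
        return None  # already soft-deleted: suppress any further notify
    # Existing skip filters (comments/registries checks omitted: not specified).
    if not key:
        return None  # empty key
    if len(new_data.get("content") or "") < MIN_BODY_LEN:
        return None  # empty/short body
    return "UPDATE"


body = "x" * 100
deleted = {"content": body, "deleted_at": "2026-05-03T12:00:00Z"}
cases = {
    "soft-delete": update_notify_op({"content": body}, deleted, "doc-1"),
    "edit-after-delete": update_notify_op(deleted, dict(deleted, content=body + "!"), "doc-1"),
    "normal-edit": update_notify_op({"content": body}, {"content": body + "!"}, "doc-1"),
    "short-body": update_notify_op({"content": body}, {"content": "tiny"}, "doc-1"),
}
print(cases)
# {'soft-delete': 'DELETE', 'edit-after-delete': None, 'normal-edit': 'UPDATE', 'short-body': None}
```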

Layer 3: API path review

With Layer 1 (the listener never resurrects a soft-deleted document) and Layer 2 (the trigger never emits a resurrecting notify in the first place), the API delete path is safe. The existing order (Qdrant delete → PG soft-delete) is retained. Under Layer 2, the notify emitted by the soft-delete UPDATE is a DELETE, which is a no-op because the API has already removed the vector. Under Layer 1, even a stale UPDATE notify would delete rather than upsert.
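The claim that either layer alone closes the bug can be checked by enumerating all four on/off combinations in a toy model. This is a hypothetical simulation. `run_soft_delete` and its set-based vector store are illustrative, and each layer is reduced to the single check described above.

```python
"""Toy model: does the API soft-delete path leave a vector behind?"""
from itertools import product


def run_soft_delete(trigger_fixed: bool, listener_fixed: bool) -> bool:
    """Simulate API delete -> PG soft-delete -> trigger -> listener.

    Returns True if the document still has a vector at the end (resurrection).
    """
    vectors = {"doc-1"}  # the vector exists before the delete request
    old_row = {"deleted_at": None}
    new_row = {"deleted_at": "2026-05-03T12:00:00Z"}

    vectors.discard("doc-1")  # API step 1: Qdrant delete
    # API step 2: the PG soft-delete UPDATE fires the trigger.
    transition = old_row["deleted_at"] is None and new_row["deleted_at"] is not None
    op = "DELETE" if (trigger_fixed and transition) else "UPDATE"

    # Listener: the fixed version re-reads the row and refuses to upsert deleted rows.
    if op == "DELETE" or (listener_fixed and new_row["deleted_at"] is not None):
        vectors.discard("doc-1")
    else:
        vectors.add("doc-1")
    return "doc-1" in vectors


for trigger_fixed, listener_fixed in product([False, True], repeat=2):
    resurrected = run_soft_delete(trigger_fixed, listener_fixed)
    print(f"trigger_fixed={trigger_fixed} listener_fixed={listener_fixed} resurrected={resurrected}")
```

Only the combination with both layers unfixed resurrects the vector. That is why the existing API order could be kept.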

Files changed

File | Change
agent_data/pg_vector_listener.py | +23/-1, guards added
PG function fn_kb_notify_vector_sync | replaced via CREATE OR REPLACE FUNCTION

Listener archived: /opt/incomex/backups/ogv-p0-2026-05-03/pg_vector_listener.py.orig
Trigger source archived: /opt/incomex/backups/ogv-p0-2026-05-03/fn_kb_notify_vector_sync.orig.sql
Git commit (auto-snapshot): 31f5ce7 (agent_data/pg_vector_listener.py | 24 +++++++++++++++++++++++-)

Restart action

Reason: listener is a daemon thread instantiated at process import; no hot-reload mechanism. docker restart incomex-agent-data was required and performed once. App came back healthy in ~2 min (Qdrant probe OK 11512 vectors, PG->Qdrant vector sync listener started).

Tests — Phase 3 (8/8 PASS)

Test namespace: test/ogv-p0/<timestamp>

ID | Test | Result
TEST-1 | create test doc → vector created | PASS (qdrant_chunks=1)
TEST-2 | soft-delete → Qdrant erased | PASS (qdrant_chunks=0)
TEST-3 | wait 30s → no resurrection | PASS (qdrant_chunks=0)
TEST-4 | active update via PUT → upsert OK | PASS (qdrant_chunks=1)
TEST-5 | metadata-only PUT → re-embed observed | PASS (before=1, after=1)
TEST-6 | smoke /chat search × 2 | PASS (both queries returned answers)
TEST-7 | cleanup test docs | PASS
TEST-8 | baseline check (no test residue) | PASS (0 test docs in qdrant)

Phase 4 — Targeted cleanup of 47 orphans

  • Pre-cleanup snapshot: production_documents-7363544529537161-2026-05-03-12-39-07.snapshot (152 MB, checksum ec048a5824e91051e292013a5cd6565e38ef5bb55d019d13e352033e66ed866f)
  • Dry-run listed exactly 47 document_ids (Qdrant ∩ PG soft-deleted).
  • Delete loop: one Qdrant delete per document_id with filter {key: document_id, match: {value: <id>}}. No broad filter.
  • 6 correct-behavior ghosts left intact (no reindex).
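The dry-run and delete loop can be sketched as follows. The body built by `delete_body` uses the Qdrant REST filter shape accepted by `POST /collections/{collection_name}/points/delete`.

The rest is hypothetical and added for illustration; it is not the script that was actually run:

  • `find_orphans` and `cleanup`
  • the `send_delete` callback, which would wrap that HTTP call
  • the expected-count guard

```python
"""Hypothetical sketch of the Phase 4 dry-run and per-document_id delete loop."""
from typing import Callable, Dict, Iterable, List


def find_orphans(qdrant_doc_ids: Iterable[str], pg_soft_deleted_ids: Iterable[str]) -> List[str]:
    """Dry-run: document_ids that have vectors in Qdrant AND are soft-deleted in PG."""
    return sorted(set(qdrant_doc_ids) & set(pg_soft_deleted_ids))


def delete_body(doc_id: str) -> Dict:
    """Request body matching exactly one document_id (no broad filter)."""
    return {"filter": {"must": [{"key": "document_id", "match": {"value": doc_id}}]}}


def cleanup(orphans: List[str], send_delete: Callable[[Dict], None], expected: int) -> int:
    """Issue one delete per document_id; abort if the dry-run count is unexpected."""
    if len(orphans) != expected:
        raise RuntimeError(f"dry-run found {len(orphans)} orphans, expected {expected}")
    for doc_id in orphans:
        send_delete(delete_body(doc_id))  # one narrowly scoped request per document
    return len(orphans)


qdrant_ids = {"a", "b", "c", "d"}
pg_soft_deleted = {"b", "d", "z"}
orphans = find_orphans(qdrant_ids, pg_soft_deleted)
sent: List[Dict] = []  # collects request bodies instead of calling Qdrant
deleted = cleanup(orphans, sent.append, expected=2)
print(orphans, deleted)
print(sent[0])
```

Matching on the document_id value keeps each request scoped to a single document. This is the property the cleanup relied on instead of a broad deleted_at/status filter.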

Before / after — three groups

Group | Pre-fix | Post-fix
Orphan bug (Qdrant + PG soft-deleted) | 47 | 0
Orphan no-PG (Qdrant only) | 0 | 0
Ghost correct-behavior (PG active, no Qdrant) | 6 | 6 (unchanged)
Qdrant points (total) | 11,512 | 11,512 (test churn netted to zero; 47 orphan chunks deleted, regular ingest may have added a similar number)
PG active docs | 2,828 | 2,828
PG soft-deleted docs | 939 | 939

(Note: changes in the total Qdrant point count fall within normal sync churn during the ~10 min test window. The per-document orphan/ghost counts are the authoritative metric, and they are clean.)
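The three groups are set operations over distinct document_id values. A document can have several Qdrant chunks, so the Qdrant ids here are the distinct document_id payload values, not point ids. The function below is a hypothetical sketch of how such counts can be derived; it is not the query used for this report.

```python
"""Hypothetical sketch: derive the report's three groups from id sets."""
from typing import Dict, Set


def classify(
    qdrant_doc_ids: Set[str], pg_active: Set[str], pg_soft_deleted: Set[str]
) -> Dict[str, Set[str]]:
    """Split documents into the three groups used in the before/after table."""
    pg_all = pg_active | pg_soft_deleted
    return {
        "orphan_bug": qdrant_doc_ids & pg_soft_deleted,  # vector kept after soft-delete
        "orphan_no_pg": qdrant_doc_ids - pg_all,  # vector with no PG row at all
        "ghost": pg_active - qdrant_doc_ids,  # active in PG, no vector
    }


groups = classify(
    qdrant_doc_ids={"a1", "a2", "s1", "x9"},
    pg_active={"a1", "a2", "a3"},
    pg_soft_deleted={"s1", "s2"},
)
print({name: sorted(ids) for name, ids in groups.items()})
# {'orphan_bug': ['s1'], 'orphan_no_pg': ['x9'], 'ghost': ['a3']}
```

Counting per document rather than per point is what makes these groups insensitive to the total-point churn mentioned in the note.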

Backup / rollback assets

  • Qdrant snapshot pre-fix: production_documents-...-2026-05-03-03-31-45.snapshot (137 MB, checksum c9f3b1a2f4fcbc0987bd2b1882c37019cc7fb5d5d91cc32982546ea8209624a7)
  • Qdrant snapshot pre-cleanup: production_documents-...-2026-05-03-12-39-07.snapshot (152 MB)
  • PG dump: /opt/incomex/backups/pg/pre-ogv-p0-20260503.dump (25 MB)
  • Listener original: /opt/incomex/backups/ogv-p0-2026-05-03/pg_vector_listener.py.orig
  • Trigger original SQL: /opt/incomex/backups/ogv-p0-2026-05-03/fn_kb_notify_vector_sync.orig.sql

Rollback plan

Layer | Action
Listener (2.1) | cp pg_vector_listener.py.orig … && docker cp … && docker restart incomex-agent-data
Trigger (2.2) | psql -U workflow_admin -f fn_kb_notify_vector_sync.orig.sql
Cleanup (Phase 4) | Restore Qdrant from 2026-05-03-12-39-07.snapshot

No-mutation-outside-scope statement

The following were NOT touched:

  • Embedding pipeline / OpenAI integration / chunking 4000/400
  • production_documents collection name/schema/dimension
  • Re-embedding (zero new embeddings except the two test docs, which were cleaned)
  • Reindex of 6 ghosts (correct behavior, left intact)
  • Broad vector deletes by indirect filter (deleted_at/status) — used per-document_id only
  • P44-6 / IU / outbox scope

Git activity

  • 31f5ce7 auto-snapshot 2026-05-03 04:00 — agent_data/pg_vector_listener.py (+23/-1)
  • Trigger function lives in PG, not in repo — captured in archive only.