KB-161D

OGV-0 — Orphan/Ghost Root Cause Report

Revision 1 | Tags: ogv-0, orphan-vector, ghost-vector, root-cause, read-only, 2026-05-03


Date: 2026-05-03 | Mode: READ-ONLY | 0 mutations to production data/config/code/container

Executive Summary

  • Orphans found: 47 (VRC baseline: 47, delta: +0)
  • Ghosts found: 6 (VRC baseline: 7, delta: -1)
    • Correct-behavior ghosts (too_short/empty_body/skipped): 6
    • Actual ghost bugs: 0
  • Primary root cause: soft-delete vector resurrection. The API delete removes the Qdrant vectors and then sets deleted_at in PG; the PG trigger emits an UPDATE event; the listener treats every UPDATE as an upsert without checking deleted_at, so the vectors are recreated. Confidence: LIKELY overall, CONFIRMED for items with HTTP DELETE log evidence.
  • Ongoing leak: YES (latent and recently observed). Latest orphan deleted_at: 2026-05-02; the code path remains deployed. Observed rate: 47 over 2026-04-06..2026-05-02 ≈ 1.8/day, but clustered.
  • VRC delta: orphan count matches. Ghost count is lower by 1. The current manual audit finds no embeddable ghost bug, so the delta most likely means that one previously missing vector was reindexed, or that the active-doc set changed after VRC. The exact old ghost ID is UNKNOWN because the full VRC list was not available through the truncated KB read.

Step Evidence / 3 Declaration Statements

  • Permanent: the root cause is not the 47 records themselves; prevention must change delete/listener semantics so deleted docs cannot be re-upserted.
  • Can it go wrong: the PG trigger/listener must branch on deleted_at and/or emit DELETE semantics on soft delete; cleanup alone can go wrong again.
  • 100% automatic: after prevention, DOT/audit should detect mismatches and the deletion path should converge without manual cleanup.
  • Steps 1-2 done before code: read skill, OR, constitution, TD-131/VRC/OGV review, then read production code in container.
  • Step 3 code: N/A, read-only investigation, no code changes.
  • Steps 4-5 verify: manual Qdrant scroll + PG SELECT, no /kb/audit-sync, no cleanup/reindex.
  • Step 6 report: this document at the requested KB path. OR/TD update: N/A because the mission is a read-only root-cause report, not a prevention implementation.

VRC Reconciliation

Metric | VRC (2026-05-02) | Agent (2026-05-03) | Delta | Explanation
Orphans | 47 | 47 | +0 | Match after using production code semantics: compare Qdrant payload.document_id to PG data.document_id, not the encoded PG key.
Ghosts | 7 | 6 | -1 | All current active docs missing vectors are empty or short folders/test fixtures. The delta likely means the data changed, or one prior ghost was reindexed or deleted after VRC.
Qdrant unique document_ids | unknown | 2867 | N/A | Full pagination: 116 pages, 11502 points.
PG active docs | unknown | 2826 | N/A | Active = deleted_at IS NULL and status != deleted, using JSONB schema.
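The comparison behind these counts can be sketched as a set difference. This is a minimal illustration, not the audit code actually used: the function names and the sample rows are hypothetical, and only the "active" rule (deleted_at IS NULL and status != deleted) and the document_id comparison come from the report. As a consistency check on the reported figures, 2867 - 47 = 2820 and 2826 - 6 = 2820, so both sides agree on the number of documents that are active and have vectors.

```python
from typing import Any, Iterable, List, Mapping, Tuple


def is_active(data: Mapping[str, Any]) -> bool:
    """Active rule quoted in the report: deleted_at IS NULL and status != 'deleted'."""
    return data.get("deleted_at") is None and data.get("status") != "deleted"


def reconcile(
    qdrant_doc_ids: Iterable[str],
    pg_rows: Iterable[Mapping[str, Any]],
) -> Tuple[List[str], List[str]]:
    """Return (orphans, ghosts) as sorted lists of document_id.

    orphans: document_ids with vectors but no active PG document
    ghosts:  active PG documents with no vectors
    """
    vector_ids = set(qdrant_doc_ids)  # many chunks collapse to one document_id
    active_ids = {row["document_id"] for row in pg_rows if is_active(row)}
    return sorted(vector_ids - active_ids), sorted(active_ids - vector_ids)


# Hypothetical sample data (not taken from production).
sample_pg = [
    {"document_id": "docs/a.md", "deleted_at": None, "status": "active"},
    {"document_id": "docs/b.md", "deleted_at": "2026-04-06T13:00:56+00:00", "status": "active"},
    {"document_id": "docs/c.md", "deleted_at": None, "status": "active"},
]
sample_qdrant = ["docs/a.md", "docs/b.md", "docs/b.md"]  # docs/b.md has two chunks

orphans, ghosts = reconcile(sample_qdrant, sample_pg)
print(orphans, ghosts)
```

Comparing on data.document_id rather than on the PG key matters because the key is stored in an encoded form, as noted in the Orphans row.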

Orphan Classification Table

# document_id PG exists? PG status deleted_at created_at body_len vector_status Qdrant chunks Classification Confidence Evidence
1 knowledge/current-state/tests/hardtest-1.md YES 2026-04-09T03:39:59.623440+00:00 2026-04-09T03:38:43.166070+00:00 122 deleted 1 soft_delete_no_sync LIKELY PG soft-deleted at 2026-04-09 03:39:59; Qdrant chunks=1; listener handles UPDATE as upsert and ignores deleted_at
2 knowledge/current-state/tests/hardtest-2.md YES 2026-04-09T03:43:02.100029+00:00 2026-04-09T03:40:21.210023+00:00 122 deleted 1 soft_delete_no_sync LIKELY PG soft-deleted at 2026-04-09 03:43:02; Qdrant chunks=1; listener handles UPDATE as upsert and ignores deleted_at
3 knowledge/current-state/tests/hardtest-3.md YES 2026-04-09T03:44:31.998007+00:00 2026-04-09T03:43:24.367854+00:00 122 deleted 1 soft_delete_no_sync LIKELY PG soft-deleted at 2026-04-09 03:44:31; Qdrant chunks=1; listener handles UPDATE as upsert and ignores deleted_at
4 knowledge/current-state/tests/hardtest-4.md YES 2026-04-09T03:45:57.902249+00:00 2026-04-09T03:44:52.616443+00:00 122 deleted 1 soft_delete_no_sync LIKELY PG soft-deleted at 2026-04-09 03:45:57; Qdrant chunks=1; listener handles UPDATE as upsert and ignores deleted_at
5 knowledge/current-state/tests/hardtest-5.md YES 2026-04-09T03:47:29.169901+00:00 2026-04-09T03:46:26.157366+00:00 122 deleted 1 soft_delete_no_sync LIKELY PG soft-deleted at 2026-04-09 03:47:29; Qdrant chunks=1; listener handles UPDATE as upsert and ignores deleted_at
6 knowledge/current-state/tests/s174-latency-test.md YES 2026-04-09T03:11:00.536375+00:00 2026-04-09T03:10:49.318481+00:00 188 deleted 1 soft_delete_no_sync LIKELY PG soft-deleted at 2026-04-09 03:11:00; Qdrant chunks=1; listener handles UPDATE as upsert and ignores deleted_at
7 knowledge/dev/architecture/collection-classification-law.md YES 2026-04-06T13:00:56.373374+00:00 2026-03-21T15:23:03.368720+00:00 4236 deleted 2 soft_delete_no_sync LIKELY PG soft-deleted at 2026-04-06 13:00:56; Qdrant chunks=2; listener handles UPDATE as upsert and ignores deleted_at
8 knowledge/dev/architecture/dieu26-new-registries-counting-law-draft.md YES 2026-04-06T13:00:52.413707+00:00 2026-03-27T03:19:45.894306+00:00 12587 deleted 4 soft_delete_no_sync LIKELY PG soft-deleted at 2026-04-06 13:00:52; Qdrant chunks=4; listener handles UPDATE as upsert and ignores deleted_at
9 knowledge/dev/architecture/dieu28-display-technology-law-v2-draft.md YES 2026-04-06T13:00:55.011146+00:00 2026-04-01T03:28:30.063690+00:00 8520 deleted 3 soft_delete_no_sync LIKELY PG soft-deleted at 2026-04-06 13:00:55; Qdrant chunks=3; listener handles UPDATE as upsert and ignores deleted_at
10 knowledge/dev/architecture/dieu32-approval-law-draft.md YES 2026-04-06T13:00:59.951212+00:00 2026-03-28T07:12:18.853198+00:00 4695 deleted 2 soft_delete_no_sync LIKELY PG soft-deleted at 2026-04-06 13:00:59; Qdrant chunks=2; listener handles UPDATE as upsert and ignores deleted_at
11 knowledge/dev/architecture/dieu33-postgresql-law-draft.md YES 2026-04-06T13:01:01.452053+00:00 2026-03-28T07:34:11.424843+00:00 11985 deleted 4 soft_delete_no_sync LIKELY PG soft-deleted at 2026-04-06 13:01:01; Qdrant chunks=4; listener handles UPDATE as upsert and ignores deleted_at
12 knowledge/dev/architecture/dieu34-workflow-law-draft.md YES 2026-04-06T13:01:02.646711+00:00 2026-03-28T13:32:13.356065+00:00 10841 deleted 4 soft_delete_no_sync LIKELY PG soft-deleted at 2026-04-06 13:01:02; Qdrant chunks=4; listener handles UPDATE as upsert and ignores deleted_at
13 knowledge/dev/architecture/dieu35-dot-governance-law-draft.md YES 2026-04-06T13:01:03.897405+00:00 2026-03-31T03:45:48.500152+00:00 19174 deleted 6 soft_delete_no_sync LIKELY PG soft-deleted at 2026-04-06 13:01:03; Qdrant chunks=6; listener handles UPDATE as upsert and ignores deleted_at
14 knowledge/dev/architecture/dieu37-governance-organization-law-draft.md YES 2026-04-06T13:01:05.095875+00:00 2026-04-01T08:41:12.710328+00:00 15148 deleted 5 soft_delete_no_sync LIKELY PG soft-deleted at 2026-04-06 13:01:05; Qdrant chunks=5; listener handles UPDATE as upsert and ignores deleted_at
15 knowledge/dev/architecture/dieu38-normative-document-law-draft.md YES 2026-04-06T13:01:06.224105+00:00 2026-04-02T07:35:17.946967+00:00 5169 deleted 2 soft_delete_no_sync LIKELY PG soft-deleted at 2026-04-06 13:01:06; Qdrant chunks=2; listener handles UPDATE as upsert and ignores deleted_at
16 knowledge/dev/architecture/dieu39-knowledge-graph-law-draft.md YES 2026-04-06T13:01:07.489204+00:00 2026-04-03T06:21:25.561866+00:00 21172 deleted 6 soft_delete_no_sync LIKELY PG soft-deleted at 2026-04-06 13:01:07; Qdrant chunks=6; listener handles UPDATE as upsert and ignores deleted_at
17 knowledge/dev/architecture/label-law.md YES 2026-04-06T13:00:51.262627+00:00 2026-03-14T23:09:40.387963+00:00 12627 deleted 4 soft_delete_no_sync LIKELY PG soft-deleted at 2026-04-06 13:00:51; Qdrant chunks=4; listener handles UPDATE as upsert and ignores deleted_at
18 knowledge/dev/architecture/lark-collection-namespace-amendment-v1.md YES 2026-04-13T05:37:01.599726+00:00 2026-04-13T05:29:02.614269+00:00 5518 deleted 2 soft_delete_no_sync LIKELY PG soft-deleted at 2026-04-13 05:37:01; Qdrant chunks=2; listener handles UPDATE as upsert and ignores deleted_at
19 knowledge/dev/architecture/nd-36-01-semantic-relationship-infrastructure-draft.md YES 2026-04-06T13:01:08.704538+00:00 2026-04-06T10:27:30.513972+00:00 3026 deleted 1 soft_delete_no_sync LIKELY PG soft-deleted at 2026-04-06 13:01:08; Qdrant chunks=1; listener handles UPDATE as upsert and ignores deleted_at
20 knowledge/dev/architecture/regression-protection-law.md YES 2026-04-06T13:00:57.526249+00:00 2026-03-22T09:23:12.337121+00:00 5329 deleted 2 soft_delete_no_sync LIKELY PG soft-deleted at 2026-04-06 13:00:57; Qdrant chunks=2; listener handles UPDATE as upsert and ignores deleted_at
21 knowledge/dev/architecture/system-integrity-law.md YES 2026-04-06T13:00:58.666852+00:00 2026-03-23T04:46:48.935175+00:00 25717 deleted 8 soft_delete_no_sync LIKELY PG soft-deleted at 2026-04-06 13:00:58; Qdrant chunks=8; listener handles UPDATE as upsert and ignores deleted_at
22 knowledge/dev/lark/nd-lark-01-feasibility-assignment-review-v1-gpt.md YES 2026-04-13T03:40:34.267141+00:00 2026-04-12T07:31:38.556422+00:00 14149 deleted 4 soft_delete_no_sync LIKELY PG soft-deleted at 2026-04-13 03:40:34; Qdrant chunks=4; listener handles UPDATE as upsert and ignores deleted_at
23 knowledge/dev/lark/nd-lark-01-reconstruction-blueprint-plan-v1-gpt.md YES 2026-04-13T03:40:31.119621+00:00 2026-04-12T05:39:29.221496+00:00 14188 deleted 4 soft_delete_no_sync LIKELY PG soft-deleted at 2026-04-13 03:40:31; Qdrant chunks=4; listener handles UPDATE as upsert and ignores deleted_at
24 knowledge/dev/lark/nd-lark-mvp-lark-to-pg-to-pdf-v1-gpt.md YES 2026-04-13T03:40:38.015826+00:00 2026-04-13T02:12:31.778466+00:00 7395 deleted 2 soft_delete_no_sync LIKELY PG soft-deleted at 2026-04-13 03:40:38; Qdrant chunks=2; listener handles UPDATE as upsert and ignores deleted_at
25 knowledge/dev/laws/amend-d37-d38-post-amend-lifecycle-draft.md YES 2026-04-19T11:18:45.478236+00:00 2026-04-19T10:55:07.698810+00:00 7529 deleted 3 soft_delete_no_sync CONFIRMED PG soft-deleted at 2026-04-19 11:18:45; Qdrant chunks=3; HTTP DELETE 200 in agent-data logs; HTTP PUT 200 before delete in logs; listener handles UPDATE as upsert and ignores deleted_at
26 knowledge/dev/laws/dieu38-phu-luc-02a-inventory.md YES 2026-04-24T08:55:05.867731+00:00 2026-04-24T08:19:17.941515+00:00 8155 deleted 3 soft_delete_no_sync CONFIRMED PG soft-deleted at 2026-04-24 08:55:05; Qdrant chunks=3; HTTP DELETE 200 in agent-data logs; listener handles UPDATE as upsert and ignores deleted_at
27 knowledge/dev/laws/dieu38-trien-khai/02a-inventory.md YES 2026-04-26T06:41:24.721660+00:00 2026-04-24T08:54:45.896061+00:00 10970 deleted 4 soft_delete_no_sync LIKELY PG soft-deleted at 2026-04-26 06:41:24; Qdrant chunks=4; HTTP PUT 200 before delete in logs; listener handles UPDATE as upsert and ignores deleted_at
28 knowledge/dev/laws/dieu38-trien-khai/02b-solution-approach.md YES 2026-04-26T06:41:28.774968+00:00 2026-04-24T08:53:58.194591+00:00 8095 deleted 3 soft_delete_no_sync LIKELY PG soft-deleted at 2026-04-26 06:41:28; Qdrant chunks=3; HTTP PUT 200 before delete in logs; listener handles UPDATE as upsert and ignores deleted_at
29 knowledge/dev/laws/dieu38-trien-khai/02c0-legal-alignment.md YES 2026-04-26T06:41:33.678609+00:00 2026-04-24T09:21:04.962206+00:00 12241 deleted 4 soft_delete_no_sync LIKELY PG soft-deleted at 2026-04-26 06:41:33; Qdrant chunks=4; HTTP PUT 200 before delete in logs; listener handles UPDATE as upsert and ignores deleted_at
30 knowledge/dev/laws/dieu38-trien-khai/02c1-text-unit-catalog.md YES 2026-04-26T06:41:38.374568+00:00 2026-04-24T09:48:20.650441+00:00 11870 deleted 4 soft_delete_no_sync LIKELY PG soft-deleted at 2026-04-26 06:41:38; Qdrant chunks=4; listener handles UPDATE as upsert and ignores deleted_at
31 knowledge/dev/laws/dieu38-trien-khai/02c2-component-catalog.md YES 2026-04-26T06:41:43.415331+00:00 2026-04-24T10:30:40.289437+00:00 19294 deleted 6 soft_delete_no_sync LIKELY PG soft-deleted at 2026-04-26 06:41:43; Qdrant chunks=6; listener handles UPDATE as upsert and ignores deleted_at
32 knowledge/dev/laws/dieu38-trien-khai/02c3-metadata-governance.md YES 2026-04-26T06:41:47.795493+00:00 2026-04-24T10:51:02.768065+00:00 17538 deleted 6 soft_delete_no_sync LIKELY PG soft-deleted at 2026-04-26 06:41:47; Qdrant chunks=6; listener handles UPDATE as upsert and ignores deleted_at
33 knowledge/dev/laws/dieu38-trien-khai/02d0-legal-unlock-memo.md YES 2026-04-26T06:41:51.174570+00:00 2026-04-24T15:00:20.779093+00:00 1383 deleted 1 soft_delete_no_sync LIKELY PG soft-deleted at 2026-04-26 06:41:51; Qdrant chunks=1; listener handles UPDATE as upsert and ignores deleted_at
34 knowledge/dev/laws/dieu38-trien-khai/02dx-cross-check-matrix.md YES 2026-04-26T06:41:55.205092+00:00 2026-04-24T14:59:57.353172+00:00 1511 deleted 1 soft_delete_no_sync LIKELY PG soft-deleted at 2026-04-26 06:41:55; Qdrant chunks=1; listener handles UPDATE as upsert and ignores deleted_at
35 knowledge/dev/laws/dieu38-trien-khai/02dy-reuse-memo.md YES 2026-04-26T06:41:58.645723+00:00 2026-04-24T15:01:07.178560+00:00 1561 deleted 1 soft_delete_no_sync LIKELY PG soft-deleted at 2026-04-26 06:41:58; Qdrant chunks=1; listener handles UPDATE as upsert and ignores deleted_at
36 knowledge/dev/laws/dieu38-trien-khai/02dz-legal-appendix-list.md YES 2026-04-26T06:42:02.512472+00:00 2026-04-24T15:00:43.011083+00:00 1074 deleted 1 soft_delete_no_sync LIKELY PG soft-deleted at 2026-04-26 06:42:02; Qdrant chunks=1; listener handles UPDATE as upsert and ignores deleted_at
37 knowledge/dev/laws/dieu38-trien-khai/handoff-tac1-concept-complete.md YES 2026-04-26T06:42:06.434402+00:00 2026-04-24T11:04:12.368506+00:00 5302 deleted 2 soft_delete_no_sync LIKELY PG soft-deleted at 2026-04-26 06:42:06; Qdrant chunks=2; listener handles UPDATE as upsert and ignores deleted_at
38 knowledge/dev/laws/dieu38-trien-khai/handoff-tac2-legal-complete-design-open.md YES 2026-04-26T06:42:10.923127+00:00 2026-04-25T07:43:06.900785+00:00 5589 deleted 2 soft_delete_no_sync LIKELY PG soft-deleted at 2026-04-26 06:42:10; Qdrant chunks=2; listener handles UPDATE as upsert and ignores deleted_at
39 knowledge/dev/laws/dieu38-trien-khai/legal-unlock-completion-report-l1-l5.md YES 2026-04-26T06:42:14.625744+00:00 2026-04-25T07:37:15.237097+00:00 6770 deleted 2 soft_delete_no_sync LIKELY PG soft-deleted at 2026-04-26 06:42:14; Qdrant chunks=2; listener handles UPDATE as upsert and ignores deleted_at
40 knowledge/dev/laws/dieu41-luat-van-hanh-ma-vps-v1.0.md YES 2026-04-18T15:41:01.528665+00:00 2026-04-14T07:47:02.868753+00:00 16065 deleted 5 soft_delete_no_sync CONFIRMED PG soft-deleted at 2026-04-18 15:41:01; Qdrant chunks=5; HTTP DELETE 200 in agent-data logs; HTTP PUT 200 before delete in logs; listener handles UPDATE as upsert and ignores deleted_at
41 knowledge/dev/planning/nd-lark-reconstruction-blueprint-plan-v1-gpt.md YES 2026-04-12T05:39:33.104891+00:00 2026-04-12T05:36:06.393875+00:00 14188 deleted 4 soft_delete_no_sync LIKELY PG soft-deleted at 2026-04-12 05:39:33; Qdrant chunks=4; listener handles UPDATE as upsert and ignores deleted_at
42 knowledge/external-systems/lark-base-88-data-flow.md YES 2026-04-10T07:08:33.544354+00:00 2026-04-10T06:56:43.852645+00:00 13104 deleted 4 soft_delete_no_sync LIKELY PG soft-deleted at 2026-04-10 07:08:33; Qdrant chunks=4; listener handles UPDATE as upsert and ignores deleted_at
43 knowledge/external-systems/lark-base-88-phai-cu-blueprint.md YES 2026-04-10T07:04:11.903147+00:00 2026-04-10T03:44:36.676437+00:00 8439 deleted 3 soft_delete_no_sync LIKELY PG soft-deleted at 2026-04-10 07:04:11; Qdrant chunks=3; listener handles UPDATE as upsert and ignores deleted_at
44 knowledge/external-systems/lark-base-registry.md YES 2026-04-10T07:05:16.696687+00:00 2026-04-10T06:04:15.793561+00:00 4519 deleted 2 soft_delete_no_sync LIKELY PG soft-deleted at 2026-04-10 07:05:16; Qdrant chunks=2; listener handles UPDATE as upsert and ignores deleted_at
45 knowledge/test/delete-sync-v2 YES 2026-04-06T14:51:12.777221+00:00 2026-04-06T14:50:58.746779+00:00 20 deleted 1 soft_delete_no_sync LIKELY PG soft-deleted at 2026-04-06 14:51:12; Qdrant chunks=1; listener handles UPDATE as upsert and ignores deleted_at
46 knowledge/test/delete-sync-verify YES 2026-04-06T14:47:09.151694+00:00 2026-04-06T14:45:40.548123+00:00 52 deleted 1 soft_delete_no_sync LIKELY PG soft-deleted at 2026-04-06 14:47:09; Qdrant chunks=1; listener handles UPDATE as upsert and ignores deleted_at
47 test/api-health-probe YES 2026-05-02T19:30:06.936043+00:00 2026-05-02T19:30:03.573136+00:00 14 deleted 1 soft_delete_no_sync CONFIRMED PG soft-deleted at 2026-05-02 19:30:06; Qdrant chunks=1; HTTP DELETE 200 in agent-data logs; HTTP PUT 200 before delete in logs; listener handles UPDATE as upsert and ignores deleted_at

Ghost Classification Table

# | document_id | created_at | body_len | vector_status | Classification | Confidence | Evidence
1 | knowledge/current-state/templates/test_empty.md.tmpl | 2026-04-17T14:58:22.671714+00:00 | 2 | pending | too_short | CONFIRMED | body_len=2 < 10; trigger skips short content
2 | knowledge/dev | 2026-02-26T13:05:06.905597+00:00 | 0 | none | empty_body | CONFIRMED | body empty; trigger/reindex skip empty content
3 | knowledge/dev/blueprints | 2026-03-02T20:19:31.067359+00:00 | 0 | none | empty_body | CONFIRMED | body empty; trigger/reindex skip empty content
4 | knowledge/dev/laws | 2026-04-06T12:50:09.276724+00:00 | 0 | none | empty_body | CONFIRMED | body empty; trigger/reindex skip empty content
5 | test/conn-audit-moved | 2026-03-01T01:03:57.448556+00:00 | 0 | none | empty_body | CONFIRMED | body empty; trigger/reindex skip empty content
6 | test/f1-moved | 2026-03-01T01:35:34.352257+00:00 | 0 | none | empty_body | CONFIRMED | body empty; trigger/reindex skip empty content
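The classification rule used in this table can be written down as a small function. This is a sketch under assumptions: the threshold of 10 is taken from the evidence column ("body_len=2 < 10"), while the function name and the "ghost_bug" label for an embeddable document without vectors are illustrative and do not come from the production audit.

```python
from collections import Counter

MIN_EMBEDDABLE_LEN = 10  # threshold quoted in the evidence column of the report


def classify_ghost(body_len: int) -> str:
    """Classify an active document that has no vectors in Qdrant."""
    if body_len == 0:
        return "empty_body"   # correct behavior: nothing to embed
    if body_len < MIN_EMBEDDABLE_LEN:
        return "too_short"    # correct behavior: trigger skips short content
    return "ghost_bug"        # hypothetical label: embeddable but missing vectors


# body_len values of the six ghosts listed in the table above.
ghost_body_lens = [2, 0, 0, 0, 0, 0]
summary = Counter(classify_ghost(n) for n in ghost_body_lens)
print(dict(summary))
```

Applied to the six rows above, no document lands in the bug category, which matches the "Actual ghost bugs: 0" line in the Executive Summary.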

Root Cause Frequency

Root cause | Orphan count | Ghost count | Confidence | Key evidence
soft_delete_no_sync / vector resurrection | 47 | 0 | LIKELY: 43, CONFIRMED: 4 | All 47 are PG soft-deleted and still have Qdrant chunks; delete_document() deletes vectors then updates deleted_at; listener handles UPDATE as upsert and ignores deleted_at.
empty_body | 0 | 5 | CONFIRMED | body_len=0; trigger/reindex skip empty content.
too_short | 0 | 1 | CONFIRMED | body_len<10; trigger skips short content.

Trigger Disappearance Analysis

  • PG volume type: persistent bind mount /opt/workflow/postgres/data:/var/lib/postgresql/data; postgres container created 2026-03-25T02:07:09Z.
  • Trigger current status: present/enabled. trg_kb_vector_sync AFTER INSERT OR DELETE OR UPDATE ON public.kb_documents EXECUTE FUNCTION fn_kb_notify_vector_sync().
  • Trigger DDL in migration scripts: a grep under /opt/incomex for trg_kb_vector_sync / fn_kb_notify_vector_sync found NO evidence; the trigger is likely a manual/live DB object unless it was created by a missing external process.
  • Docker rebuild between 2026-04-05 and 2026-05-02: YES for agent-data (Created=2026-04-17T02:15:53Z, RestartCount=0); NO evidence of a PG rebuild after 2026-03-25.
  • Root cause trigger loss: NOT SUPPORTED by current evidence. Trigger exists; defect is trigger/listener semantics for soft delete. Confidence: LIKELY.

Listener Health Timeline

  • Agent-data container: Created=2026-04-17T02:15:53Z, RestartCount=0.
  • Startup logs show PG->Qdrant vector sync listener started and service healthy.
  • 2026-04-17 05:34-05:35: PG connection errors/retries occurred.
  • Repeated Qdrant errors/timeouts appear on 2026-04-17, 04-18, 04-19, 04-22, 04-23, 04-24, 04-25, 04-28, 04-29, 04-30, 05-02, 05-03, including failed delete attempts for registry docs. These are a contributing risk, but the 47 current orphans are explained without assuming the listener was down.

Orphan Timestamp Clustering

deleted_at date | count
2026-04-06 | 16
2026-04-09 | 6
2026-04-10 | 3
2026-04-12 | 1
2026-04-13 | 4
2026-04-18 | 1
2026-04-19 | 1
2026-04-24 | 1
2026-04-26 | 13
2026-05-02 | 1

Pattern: clustered batches on 2026-04-06 and 2026-04-26 plus later single test/API deletions. This indicates batch cleanup/move events plus ongoing latent leak, not a one-time hard-delete-only event.
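The clustering and the quoted rate can be reproduced with a few lines. This is a minimal sketch: cluster_by_day is a hypothetical helper, the four sample timestamps are copied from orphan rows 1, 7, 8 and 47, and the per-day counts are the ones listed in the table above. The rate divides by the 26-day span between the first and last deleted_at date.

```python
from collections import Counter
from datetime import date, datetime
from typing import Iterable


def cluster_by_day(timestamps: Iterable[str]) -> Counter:
    """Group ISO-8601 deleted_at timestamps by calendar date."""
    return Counter(datetime.fromisoformat(ts).date() for ts in timestamps)


# deleted_at values of orphan rows 1, 7, 8 and 47 from the classification table.
sample = [
    "2026-04-09T03:39:59.623440+00:00",
    "2026-04-06T13:00:56.373374+00:00",
    "2026-04-06T13:00:52.413707+00:00",
    "2026-05-02T19:30:06.936043+00:00",
]
by_day = cluster_by_day(sample)

# Per-day orphan counts as listed in the clustering table.
table_counts = {
    date(2026, 4, 6): 16, date(2026, 4, 9): 6, date(2026, 4, 10): 3,
    date(2026, 4, 12): 1, date(2026, 4, 13): 4, date(2026, 4, 18): 1,
    date(2026, 4, 19): 1, date(2026, 4, 24): 1, date(2026, 4, 26): 13,
    date(2026, 5, 2): 1,
}
total = sum(table_counts.values())
span_days = (max(table_counts) - min(table_counts)).days
rate_per_day = total / span_days
top_two_batches = sum(sorted(table_counts.values(), reverse=True)[:2])

print(total, span_days, round(rate_per_day, 1), top_two_batches)
```

The two largest batches (2026-04-06 and 2026-04-26) account for 29 of the 47 orphans, which is why the average daily rate understates how bursty the leak is.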

Write Path Audit

  • kb_documents actual schema is JSONB KV: key text, data jsonb, updated_at timestamptz; fields such as document_id, deleted_at, vector_status are inside data.
  • No source/created_by/origin columns exist in kb_documents, so source aggregation is N/A.
  • Directus directus_activity is accessible, but query for collection = kb_documents after 2026-04-05 returned no grouped rows.
  • Recent deletes include the 47 orphan items and many non-orphan deleted docs; the orphan set is exactly soft-deleted docs whose vectors remain.

Comparison Baseline Snapshot

  • PG kb_documents: total 3765; deleted 939; active 2826; vector_status ready 2818; pending 2; error 0; null/empty 1; synced 0.
  • Qdrant production_documents: points_count=11502, indexed_vectors_count=11090, status green, optimizer ok.
  • Qdrant scroll evidence: 116 pages, page size 100 except the last page (2 points), 2867 unique document_ids.
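The full-pagination scroll behind these numbers follows a simple loop: request a page, collect its points, and continue from next_page_offset until it is empty. The sketch below assumes a caller-supplied fetch_page function (hypothetical) that performs the POST to /collections/production_documents/points/scroll and returns the parsed JSON; the fake fetcher stands in for Qdrant so the loop can be exercised offline. It is not the script used in the investigation.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Page = Dict[str, Any]


def scroll_all(
    fetch_page: Callable[[Optional[Any]], Page],
    max_pages: int = 10_000,
) -> Tuple[List[Dict[str, Any]], int]:
    """Follow next_page_offset until it is None; return (points, page_count)."""
    points: List[Dict[str, Any]] = []
    offset: Optional[Any] = None
    pages = 0
    while True:
        result = fetch_page(offset)["result"]
        points.extend(result["points"])
        pages += 1
        offset = result.get("next_page_offset")
        if offset is None:
            return points, pages
        if pages >= max_pages:  # safety guard against an endless scroll
            raise RuntimeError("scroll did not terminate")


def make_fake_fetch(total: int, limit: int = 100) -> Callable[[Optional[Any]], Page]:
    """Stand-in for the HTTP call; mimics the scroll response shape."""
    def fetch(offset: Optional[Any]) -> Page:
        start = offset or 0
        end = min(start + limit, total)
        pts = [{"id": i, "payload": {"document_id": f"doc-{i % 7}"}}
               for i in range(start, end)]
        return {"result": {"points": pts,
                           "next_page_offset": end if end < total else None}}
    return fetch


points, pages = scroll_all(make_fake_fetch(total=11502, limit=100))
unique_ids = {p["payload"]["document_id"] for p in points}
print(len(points), pages, len(unique_ids))
```

With 11502 points and a limit of 100, the loop makes 116 requests (115 full pages plus a final page of 2), which matches the page count reported above. The count of unique IDs in this demo reflects only the fake data.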

Proposed Cleanup Groups (NO cleanup, proposal only)

Group | Count | Safe to cleanup? | Pre-condition | Notes
Soft-deleted docs with remaining Qdrant chunks | 47 | Yes, after prevention patch | Patch listener/delete semantics first; run read-only audit immediately before cleanup | Cleanup now would hide evidence and leak can recur.
Correct-behavior ghosts | 6 | No cleanup needed | None | Empty/short docs should not have vectors. Consider excluding from ghost bug metric.

Proposed Prevention Fixes

Fix | Blocks root cause | Priority
In pg_vector_listener.py, if UPDATE doc has deleted_at set, call delete_document instead of upsert. | Soft-delete vector resurrection | P0
In fn_kb_notify_vector_sync, emit semantic DELETE when NEW.data->>deleted_at changes from null to non-null. | Makes PG event meaning explicit | P0
In delete API, after PG soft-delete, avoid a second UPDATE upsert path or include tombstone semantics. | Prevents API delete from recreating vectors | P0
Adjust audit to classify empty/short/skipped ghosts separately from bugs. | Prevents false-positive ghost alerts | P1
Add DOT read-only audit with pagination and active-doc JSONB semantics. | Detects regressions automatically | P1
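The first fix can be illustrated with an in-memory model of the delete sequence. This is a sketch under stated assumptions, not the code in pg_vector_listener.py: InMemoryVectorStore, upsert_document, the "content" field, and both handler names are hypothetical. Only delete_document, the deleted_at field, and the order of operations (delete vectors, then update deleted_at, then an UPDATE event reaches the listener) come from the report.

```python
from typing import Any, Callable, Dict, Mapping


class InMemoryVectorStore:
    """Hypothetical stand-in for the Qdrant-backed vector store."""

    def __init__(self) -> None:
        self.vectors: Dict[str, str] = {}

    def upsert_document(self, document_id: str, content: str) -> None:
        self.vectors[document_id] = content

    def delete_document(self, document_id: str) -> None:
        self.vectors.pop(document_id, None)

    def has_vectors(self, document_id: str) -> bool:
        return document_id in self.vectors


def handle_event_naive(store: InMemoryVectorStore, op: str, data: Mapping[str, Any]) -> None:
    """Behavior described in the report: every UPDATE is treated as an upsert."""
    if op == "DELETE":
        store.delete_document(data["document_id"])
    else:
        store.upsert_document(data["document_id"], data.get("content", ""))


def handle_event_guarded(store: InMemoryVectorStore, op: str, data: Mapping[str, Any]) -> None:
    """Proposed fix: a row with deleted_at set is always a delete."""
    if op == "DELETE" or data.get("deleted_at") is not None:
        store.delete_document(data["document_id"])
    else:
        store.upsert_document(data["document_id"], data.get("content", ""))


Handler = Callable[[InMemoryVectorStore, str, Mapping[str, Any]], None]


def simulate_api_delete(handler: Handler) -> bool:
    """Replay the delete sequence; return True if vectors exist afterwards."""
    store = InMemoryVectorStore()
    doc = {"document_id": "docs/example.md", "content": "hello world", "deleted_at": None}
    handler(store, "INSERT", doc)                       # document created and indexed
    store.delete_document(doc["document_id"])          # API delete, step 1: remove vectors
    soft_deleted = {**doc, "deleted_at": "2026-05-02T19:30:06+00:00"}
    handler(store, "UPDATE", soft_deleted)             # step 2: PG update fires the trigger
    return store.has_vectors(doc["document_id"])


resurrected_naive = simulate_api_delete(handle_event_naive)
resurrected_guarded = simulate_api_delete(handle_event_guarded)
print(resurrected_naive, resurrected_guarded)
```

The guard makes the listener idempotent with respect to soft deletes: regardless of the order in which the API and the trigger act, a row carrying deleted_at converges to "no vectors", which is the property the three P0 fixes aim for.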

Unknowns

  • Exact VRC ghost list from 2026-05-02: UNKNOWN because get_document returned truncated content and semantic search did not expose the list. Delta -1 cannot be mapped to an ID.
  • Exact creation mechanism for trg_kb_vector_sync: UNKNOWN; DB object exists but no DDL file found under /opt/incomex.
  • Exact per-item notification processing logs for all 47: UNKNOWN; container logs contain HTTP DELETE evidence for a subset and code/timestamp evidence for all.

No-mutation Statement

Agent confirms: NO mutation of any production data, config, code, or container. Did not call /kb/cleanup-orphans, /kb/reindex-missing, or /kb/audit-sync; did not DELETE/UPDATE/INSERT PG rows; did not DELETE/UPDATE Qdrant points; did not restart services. The only write was the upload of this report to the requested KB path.

Full Command Log

  • sed -n '1,260p' .claude/skills/incomex-rules.md
  • search_knowledge('operating rules SSOT'); search_knowledge('hiến pháp v4.0 constitution'); search_knowledge('Điều 38 triển khai vector sync Qdrant Directus Agent Data orphan ghost')
  • get_document('knowledge/dev/ssot/vps/vps-operating-rules.md'); get_document('knowledge/dev/laws/constitution.md'); get_document('knowledge/current-state/reports/td131-vector-sync-investigation.md')
  • ssh contabo 'docker ps --format ...'
  • docker exec incomex-agent-data env | grep QDRANT; docker exec postgres env | grep POSTGRES
  • Qdrant POST /collections/production_documents/points/scroll loop limit=100 until next_page_offset empty (116 pages)
  • psql incomex_metadata: SELECT column_name,data_type FROM information_schema.columns WHERE table_name='kb_documents'
  • psql incomex_metadata: SELECT jsonb_agg(...) FROM kb_documents (read-only snapshot)
  • local compare: Qdrant payload.document_id vs PG data.document_id for active docs
  • docker exec incomex-agent-data sed -n ... /app/agent_data/server.py / vector_store.py / pg_vector_listener.py
  • docker inspect postgres; docker inspect postgres Mounts; docker volume ls | grep postgres
  • cat /opt/incomex/docker/docker-compose.yml | grep -A 30 -B 5 'postgres:'
  • find/grep /opt/incomex for trigger DDL and git log --since=2026-04-01 '*.sql' '*.py'
  • docker logs --since 720h incomex-agent-data | grep -i pg-vector-sync/LISTEN/error/reconnect
  • psql incomex_metadata: SELECT pg_trigger/pg_proc definitions for trg_kb_vector_sync and fn_kb_notify_vector_sync
  • psql incomex_metadata: baseline counts + orphan deleted_at clustering + recent deletes
  • Qdrant GET /collections/production_documents for points_count
  • psql directus: SELECT action,count(*) FROM directus_activity WHERE collection='kb_documents' AND timestamp>'2026-04-05' GROUP BY action
  • docker logs --since 720h incomex-agent-data | grep -F -f orphan_ids (read-only log correlation)