VRC — Vector Reality Check: Agent Data + Langroid + Qdrant
Date: 2026-05-02 Status: INVESTIGATION COMPLETE — 0 mutations to data/config/code/container Agent: Codex Mode: READ-ONLY
Executive Verdict
- The current legacy vector store is document-based. Point payloads carry `document_id`, `content`, `metadata.chunk_index`, and `metadata.total_chunks`; there is no `unit_id`, `canonical_address`, or `unit_version_id`.
- IU vectors should run in parallel, not replace the legacy store. The legacy KB vector store continues to serve the KB independently.
- Duplicate content risk level: HIGH. Current search dedups only by `document_id`; if a KB draft and an IU final sit under two different `document_id` values, there is no logic to suppress or prioritize either.
Evidence Table
| Claim | Evidence | Source |
|---|---|---|
| Runtime container names were discovered, not assumed | incomex-agent-data, postgres, incomex-qdrant, uptime-kuma | docker ps --format ... on VPS |
| Active Qdrant collection is production_documents | Qdrant collections API returned only production_documents | GET http://qdrant:6333/collections from incomex-agent-data |
| Collection config is 1536-dim cosine | vectors.size=1536, distance=Cosine, points_count=11052, status=green | GET /collections/production_documents |
| Legacy payload is document/chunk based | payload fields: content, document_id, metadata, parent_id, is_human_readable; metadata has chunk_index, total_chunks | POST /points/scroll limit 3 |
| Chunker is a custom char splitter | _split_text, CHUNK_SIZE=4000, CHUNK_OVERLAP=400 | /app/agent_data/vector_store.py:15-80 |
| Upsert embeds per chunk with deterministic UUID5 point IDs | uuid5(NAMESPACE_DNS, f"{document_id}:chunk:{idx}") | /app/agent_data/vector_store.py:186-227 |
| Search filters only tags/status; not source/unit | filter_tags -> metadata.tags; filter_status -> metadata.status | /app/agent_data/vector_store.py:254-310 |
| Search dedups chunks by document only | seen_docs keyed by document_id | /app/agent_data/vector_store.py:294-308 |
| /chat bypasses Langroid internal RAG in default raw mode | _retrieve_query_context() then _build_raw_reply; ! prefix bypasses DocChatAgent RAG in summarized mode | /app/agent_data/server.py:886-928, 953-958 |
| PG trigger function exists but no trigger is installed on kb_documents | information_schema.triggers returned 0 rows; function source exists | PG query on incomex_metadata |
| Current drift exists | read-only /kb/audit-sync returned orphan_count=47, ghost_count=7 | POST /kb/audit-sync {"auto_heal": false} |
| Scheduled vector audit is broken and unsafe if fixed as-is | crontab calls dot-vector-audit --heal --local; latest log repeats FAIL (connection refused) to localhost:8000; source says --heal sends auto_heal:true | crontab -l, /var/log/incomex/dot-vector-audit.log, /opt/incomex/dot/bin/dot-vector-audit |
| Agent Data health checks Qdrant/PG/OpenAI and flags the vector/doc ratio as critical | /health returned qdrant/postgres/openai ok, document_count=2770, vector_point_count=11052, ratio=3.99, sync_status=critical | GET /health inside container |
VRC-1..VRC-13 Answers
VRC-1: Actual ingestion flow
Write paths table:
| Path | Trigger | Calls vector? | Code file | Evidence |
|---|---|---|---|---|
| API/MCP create_document, new doc | POST /documents or MCP upload_document | Yes, _sync_vector_entry() after PG set_doc | /app/agent_data/server.py:1278-1381 | vector_status=pending, then _sync_vector_entry(... content.body ...) |
| API/MCP create with upsert=true, existing doc | POST /documents?upsert=true | Yes, delete old vectors then sync new | /app/agent_data/server.py:1292-1321 | _delete_vector_entry(doc_id) then _sync_vector_entry() |
| API/MCP update_document, content changed | PUT /documents/{doc_id} | Yes, delete old vectors then sync new | /app/agent_data/server.py:1467-1487 | content_changed gates delete+sync |
| API/MCP update_document, metadata-only | PUT /documents/{doc_id} | No re-embed | /app/agent_data/server.py:1467-1497 | logs skip_reembed |
| API/MCP delete_document | DELETE /documents/{doc_id} | Yes, delete by document_id filter | /app/agent_data/server.py:1604-1647, /app/agent_data/vector_store.py:398-414 | _delete_vector_entry(doc_id) before marking PG deleted |
| PG direct SQL into kb_documents | Intended trigger fn_kb_notify_vector_sync | Currently no, because no trigger is installed | PG information_schema.triggers | 0 trigger rows for kb_documents; function exists but unattached |
| PG listener | LISTEN kb_vector_sync | Would call upsert/delete if a NOTIFY arrives | /app/agent_data/pg_vector_listener.py:37-95 | handles DELETE, INSERT, UPDATE; skips body <10 |
| Directus sync knowledge_documents | Agent Data -> Directus mirror | No Qdrant call found | /app/agent_data/directus_sync.py:267-390 | SQL UPDATE/INSERT to knowledge_documents, no vector_store |
| /kb/reindex | Manual API | Yes, bulk re-embed of all active docs | /app/agent_data/server.py:1965-2037 | write endpoint; not called |
| /kb/reindex-missing | Manual API | Yes, re-embed ghosts | /app/agent_data/server.py:2329-2344 | write endpoint; not called |
| /kb/cleanup-orphans | Manual API | Deletes orphan vectors when dry_run=false | /app/agent_data/server.py:2053-2109 | write endpoint; not called |
| /kb/audit-sync | Manual/cron | Read-only only when auto_heal=false; writes when true | /app/agent_data/server.py:2241-2326 | only auto_heal:false was called |
PG trigger function filter logic:
- Skip task comments: keys like `operations/tasks/comments/%` or `operations__tasks__comments__%`.
- Skip registries: keys like `registries/%` or `registries__%`.
- Skip empty key.
- Skip INSERT/UPDATE with `length(NEW.data->'content'->>'body') < 10`.
- Then `pg_notify('kb_vector_sync', json_build_object('op', TG_OP, 'key', v_key, 'document_id', ...))`.
- Evidence: `pg_get_functiondef('fn_kb_notify_vector_sync')`.
- Critical runtime fact: `SELECT ... FROM information_schema.triggers WHERE event_object_table='kb_documents'` returned 0 rows.
VRC-2: Langroid/Chunking rule
Chunking config:
| Parameter | Value | Code file:line | Evidence |
|---|---|---|---|
| Splitter class | Custom _split_text; not a Langroid splitter for production upsert | /app/agent_data/vector_store.py:41-80 | char-based function |
| Max chars/tokens | 4000 chars default | /app/agent_data/vector_store.py:15-17 | QDRANT_CHUNK_SIZE default 4000 |
| Overlap | 400 chars default | /app/agent_data/vector_store.py:15-17 | QDRANT_CHUNK_OVERLAP default 400 |
| Separator | paragraph \n\n, sentence ". ", word-space fallback | /app/agent_data/vector_store.py:56-71 | rfind boundary selection |
| Min content to embed | API path skips only non-string/blank; PG listener skips <10 chars | /app/agent_data/server.py:546-547, /app/agent_data/pg_vector_listener.py:71-73 | direct API can embed short non-blank content; the listener cannot |
| Embedding truncation | Each chunk embedding input is truncated to 6000 chars | /app/agent_data/vector_store.py:152-156 | truncated = text[:6000] |
| Embedding model | default text-embedding-3-small; env override QDRANT_EMBED_MODEL | /app/agent_data/vector_store.py:94 | runtime env did not show an override |
| Chunk ID generation | UUID5 of document_id:chunk:{idx} | /app/agent_data/vector_store.py:213-216 | deterministic point ID |
| Chunk fields | metadata.chunk_index, metadata.total_chunks | /app/agent_data/vector_store.py:200-210 | payload builder |
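A minimal sketch of the described splitting behavior, assuming the 4000/400 defaults and the paragraph -> sentence -> word boundary preference via rfind (this is not the container's actual _split_text, which was only read in place):

```python
# Minimal sketch of a char-based splitter with the observed defaults
# (CHUNK_SIZE=4000, CHUNK_OVERLAP=400) and boundary preference
# paragraph -> sentence -> space. Illustrative only; not the real _split_text.

def split_text(text: str, chunk_size: int = 4000, overlap: int = 400) -> list[str]:
    chunks: list[str] = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            window = text[start:end]
            # Prefer the latest paragraph break, then sentence end, then space.
            for sep in ("\n\n", ". ", " "):
                cut = window.rfind(sep)
                if cut > 0:
                    end = start + cut + len(sep)
                    break
        chunks.append(text[start:end])
        if end >= len(text):
            break
        next_start = end - overlap
        start = next_start if next_start > start else end  # guarantee forward progress
    return chunks
```

The progress guard matters: when a boundary is found very early in the window, stepping back by the full overlap could otherwise stall the loop.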
Langroid status: AgentData subclasses DocChatAgent, and agent_config.vecdb uses Langroid QdrantDBConfig, but /chat explicitly bypasses DocChatAgent internal RAG after doing custom retrieval. Evidence: /app/agent_data/main.py:20-23, 53-68; /app/agent_data/server.py:313-346, 953-958.
VRC-3: Qdrant collections
Collections table:
| Collection name | Vector dim | Distance | Point count | Status | Purpose |
|---|---|---|---|---|---|
| production_documents | 1536 | Cosine | 11052 | green | Active legacy KB vectors; configured by QDRANT_COLLECTION |
No staging/dev collection found via Qdrant API. Qdrant version: 1.16.3.
VRC-4: Qdrant payload fields — SAMPLE READ-ONLY
Sample payload (3 points, content redacted for length):
Point 1:

```json
{
  "id": "00071df5-8461-5166-93a3-0d1300db6ec8",
  "payload": {
    "content": "[long markdown, redacted]",
    "document_id": "context-pack/20260430-220013-092d70/ENTITIES_OVERVIEW.md",
    "metadata": {
      "tags": ["dieu43", "context-pack", "build"],
      "title": "Đ43 context-pack 20260430-220013-092d70 section=entities_overview",
      "source": "dieu43_context_pack_publish",
      "build_id": "20260430-220013-092d70",
      "chunk_index": 0,
      "total_chunks": 1
    },
    "parent_id": null,
    "is_human_readable": false
  }
}
```

Point 2:

```json
{
  "id": "00097689-3d59-5c42-9673-aec6e52c5879",
  "payload": {
    "content": "[long markdown chunk, redacted]",
    "document_id": "context-pack/20260424-100012-68cb8a/DOT_REGISTRY.md",
    "metadata": {
      "tags": ["dieu43", "context-pack", "build"],
      "title": "Đ43 context-pack 20260424-100012-68cb8a section=dot_registry",
      "source": "dieu43_context_pack_publish",
      "build_id": "20260424-100012-68cb8a",
      "chunk_index": 9,
      "total_chunks": 14
    },
    "parent_id": null,
    "is_human_readable": false
  }
}
```

Point 3:

```json
{
  "id": "000ae675-edec-5546-9a49-ae1e4937b44a",
  "payload": {
    "content": "[long markdown chunk, redacted]",
    "document_id": "context-pack/20260422-070005-6af1da/PROJECT_MAP.md",
    "metadata": {
      "tags": ["dieu43", "context-pack", "build"],
      "title": "Đ43 context-pack 20260422-070005-6af1da section=project_map",
      "source": "dieu43_context_pack_publish",
      "build_id": "20260422-070005-6af1da",
      "chunk_index": 1,
      "total_chunks": 7
    },
    "parent_id": null,
    "is_human_readable": false
  }
}
```
Payload fields summary:
| Field name | Type | Present in all 3? | Example value |
|---|---|---|---|
| content | string | Yes | markdown chunk |
| document_id | string | Yes | context-pack/.../PROJECT_MAP.md |
| metadata | object | Yes | tags/title/source/build_id/chunk fields |
| metadata.tags | array of strings | Yes | ["dieu43","context-pack","build"] |
| metadata.title | string | Yes | context-pack title |
| metadata.source | string | Yes | dieu43_context_pack_publish |
| metadata.build_id | string | Yes | 20260422-070005-6af1da |
| metadata.chunk_index | integer | Yes | 1 |
| metadata.total_chunks | integer | Yes | 7 |
| parent_id | string/null | Yes | null |
| is_human_readable | boolean | Yes | false |
| unit_id | absent | No | not present |
| canonical_address | absent | No | not present |
| unit_version_id | absent | No | not present |
| content_hash | absent in samples | No | not present |
Point ID format is UUID string, deterministic by code.
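The deterministic point ID scheme can be reproduced with the standard library (helper name is illustrative; the scheme itself is the one quoted from vector_store.py):

```python
import uuid

# Deterministic chunk point ID as described for vector_store.py: UUID5 over
# "{document_id}:chunk:{idx}" in the DNS namespace. Re-upserting the same
# document therefore rewrites the same point IDs instead of creating new ones.
def chunk_point_id(document_id: str, idx: int) -> str:
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, f"{document_id}:chunk:{idx}"))
```

Because the ID is a pure function of document_id and chunk index, the same inputs always yield the same UUID across runs and hosts.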
VRC-5: Orphan/ghost vector audit
Orphan/ghost audit:
| Checker | File/endpoint | Schedule | Auto-clean? | Log location |
|---|---|---|---|---|
| dot-vector-audit | /opt/incomex/dot/bin/dot-vector-audit | daily 04:30 UTC | Intended yes (crontab passes --heal), but currently failing before the endpoint call | /var/log/incomex/dot-vector-audit.log |
| /kb/audit-sync | /app/agent_data/server.py:2241-2326 | no internal schedule | No when auto_heal=false; yes when auto_heal=true | app logs |
| /kb/cleanup-orphans | /app/agent_data/server.py:2053-2109 | no internal schedule | Yes if dry_run=false | app logs |
| /kb/reindex-missing | /app/agent_data/server.py:2329-2344 | no internal schedule | Reindexes missing vectors | app logs |
Audit logic: _run_audit() loads active PG doc IDs from kb_documents, loads unique Qdrant document_id values, computes orphan_ids = qdrant_ids - pg_ids and ghost_ids = pg_ids - qdrant_ids. Evidence: /app/agent_data/server.py:2112-2150.
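The audit's set arithmetic can be sketched directly (the function name here is hypothetical; the actual implementation is _run_audit()):

```python
# Sketch of the audit comparison described above: orphans are Qdrant
# document_id values with no live PG row; ghosts are live PG docs with no vectors.
def audit_sync(pg_active_ids: set[str], qdrant_doc_ids: set[str]) -> dict:
    orphan_ids = qdrant_doc_ids - pg_active_ids   # vectors without a live doc
    ghost_ids = pg_active_ids - qdrant_doc_ids    # docs without any vector
    return {
        "orphan_count": len(orphan_ids),
        "ghost_count": len(ghost_ids),
        "orphan_ids": sorted(orphan_ids),
        "ghost_ids": sorted(ghost_ids),
    }
```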
VRC-6: Update/delete semantics
Update semantics:
| Operation | Logic | Atomic? | Retry on fail? | Code file:line |
|---|---|---|---|---|
| upsert | split content, embed each chunk, batch upsert(..., wait=True) | Qdrant upsert batch is waited on, but not transactional with PG | SDK helper wrapped by sync_retry; per-chunk embedding can fail before upsert | /app/agent_data/vector_store.py:163-252, 371-375 |
| update content | PG update first, delete all old vectors, then upsert new chunks | Not atomic across PG/Qdrant; a failure between delete and upsert can create a ghost | Delete+sync wrapped in try, but no rollback | /app/agent_data/server.py:1448-1487 |
| delete | delete Qdrant vectors by document_id filter, then mark PG deleted | Not atomic | sync_retry on the Qdrant delete; the PG update follows | /app/agent_data/server.py:1604-1647, /app/agent_data/vector_store.py:398-414 |
| multi-chunk update | delete all old chunks, then insert the new chunk set | Not atomic | retry helper per Qdrant call; no per-doc transaction | same as above |
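The ghost-creating window in the update path can be illustrated with an in-memory stand-in for the vector store (purely illustrative; no real Qdrant client is involved and the class is hypothetical):

```python
# Illustration of the non-atomic delete-then-upsert window: the update path
# deletes all old chunk points first, then upserts new ones. If the process
# dies in between, the PG row survives with zero vectors -- a "ghost".
class FakeVectorStore:
    def __init__(self):
        self.points: dict[str, dict] = {}  # point_id -> payload

    def delete_by_document(self, document_id: str) -> None:
        self.points = {k: v for k, v in self.points.items()
                       if v["document_id"] != document_id}

    def upsert_chunks(self, document_id: str, chunks: list[str]) -> None:
        for idx, chunk in enumerate(chunks):
            self.points[f"{document_id}:chunk:{idx}"] = {
                "document_id": document_id, "content": chunk}

store = FakeVectorStore()
store.upsert_chunks("kb/doc-1", ["old chunk a", "old chunk b"])
points_before_update = len(store.points)
store.delete_by_document("kb/doc-1")  # step 1 of an update succeeds...
# ...if the process crashes here, the upsert of new chunks never runs,
# leaving a live PG doc with no vectors (a ghost).
```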
VRC-7: Current scale
Scale snapshot:
| Metric | Value | Source |
|---|---|---|
| KB docs total | 3709 rows | SELECT count(*) FROM kb_documents |
| KB active docs | 2770 | deleted_at IS NULL and /health |
| Qdrant points total | 11052 | Qdrant collection info and /health |
| Ratio points/docs | 3.99 | /health |
| Ghosts (doc without vector) | 7 | /kb/audit-sync auto_heal=false |
| Orphans (vector document_id without live doc) | 47 | /kb/audit-sync auto_heal=false |
| vector_status distribution | deleted=939, ready=2762, pending=2, none=5, blank=1 | PG group query |
| Health sync status | critical | /health data_integrity.sync_status |
VRC-8: Do-not-touch invariants
Do-not-touch invariants:
| Item | Type | Location | Ephemeral? | Reasoning |
|---|---|---|---|---|
| production_documents collection | Qdrant collection | incomex-qdrant | No, persisted at /opt/incomex/docker/qdrant/data | Active KB vector store |
| vector dimensions/distance | Qdrant schema | 1536/Cosine | No | Must match text-embedding-3-small; changing breaks retrieval |
| point payload contract | Qdrant payload | content, document_id, metadata, parent_id, is_human_readable | No | /chat and audit depend on document_id and content |
| QDRANT_COLLECTION=production_documents | runtime config | /opt/incomex/docker/docker-compose.yml:99-101 | No | Agent Data reads/writes this collection |
| Qdrant API key/env | secret/config | compose env and container env | No | Required by all Qdrant calls |
| OpenAI API key/env | secret/config | compose env and container env | No | Required for embedding |
| kb_documents table | PG metadata store | incomex_metadata.public.kb_documents | No | SSOT for Agent Data document metadata |
| fn_kb_notify_vector_sync() | PG function | incomex_metadata.public | No | Intended PG->Qdrant listener payload contract |
| Agent Data code inside container | runtime code | /app/agent_data/* | Yes if patched in container; current docker diff only showed runtime files under /root, /tmp, credentials, and an /app dir marker | VPS/container is SSOT for runtime behavior |
| dot-vector-audit cron | monitoring cron | root crontab | No | Currently scheduled but failing; --heal is a write behavior |
| Qdrant backup script | backup | /opt/incomex/scripts/qdrant-backup.sh | No | Last local snapshot observed 2026-04-03; script creates Qdrant snapshots |
| Docker restart policies | recovery | compose/inspect | No | Agent Data and Qdrant use unless-stopped |
VRC-9: Safe IU Vector architecture proposal (evidence-based)
Proposed IU vector architecture:
| Phase | Action | Impact on legacy? | Pre-condition | Risk |
|---|---|---|---|---|
| Phase 0 | Read-only mapping adapter: map kb_document_path/document_id to IU canonical_address outside Qdrant | None | IU tables and canonical address mapping exist; no Qdrant writes | Incomplete dedup, because the legacy payload lacks IU fields |
| Phase 1 | Prefer IU-specific collection over payload enrichment for first writable phase | No change to legacy collection | IU embedding job + collection creation APR + rollback plan | Managing two collections and merge logic |
| Phase 2 | IU-specific projection/search adapter across KB collection + IU collection | Legacy still independent | Query API that can target both and combine results | Ranking/dedup complexity |
| Phase 3 | Unit-aware chunking strictly within IU boundaries | Legacy chunking untouched | Benchmarks, content_hash/canonical_address dedup, performance proof | Bad chunk boundary rules can reduce recall |
Reasoning: current legacy payload has no IU identifiers and current search only supports tags/status filters. Mutating legacy payload or chunking would directly affect /chat, audit, and KB behavior.
VRC-10: Duplicate Content Resolution
Capabilities and current behavior:
- Payload filtering in current code exists only for `metadata.tags` and `metadata.status`; there is no `source=kb/iu` or unit field. Evidence: /app/agent_data/vector_store.py:275-290.
- Qdrant itself accepts collection-scoped search/filter calls; the current Agent Data `_qdrant_search()` targets exactly `self.collection`. Evidence: /app/agent_data/vector_store.py:383-389.
- Current Agent Data has no multi-collection search support. Evidence: one `QDRANT_COLLECTION`, and `_retrieve_query_context()` calls one `store.search()`.
- Qdrant ranking returns nearest vectors; current Agent Data dedups chunks by `document_id` only. It does not dedup semantically similar content across different documents. Evidence: /app/agent_data/vector_store.py:292-310.
- `/chat` exposes filters `tags`, `tenant_id`, and `status`, but the vector path ignores `tenant_id`. Evidence: /app/agent_data/server.py:231-259, 1116-1123.
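The document-level dedup described above can be sketched as follows (names are illustrative; the real logic is the seen_docs loop in vector_store.py). It also shows why a KB draft and an IU final under different document_id values both survive:

```python
# Sketch of document-level dedup: hits arrive ordered by Qdrant score, and
# only the first (best) chunk per document_id is kept. Two documents with
# identical content but different document_id values BOTH pass through.
def dedup_by_document(hits: list[dict], top_k: int = 5) -> list[dict]:
    seen_docs: set[str] = set()
    results: list[dict] = []
    for hit in hits:  # hits already sorted by descending score
        doc_id = hit["document_id"]
        if doc_id in seen_docs:
            continue  # later chunks of an already-seen doc are dropped
        seen_docs.add(doc_id)
        results.append(hit)
        if len(results) >= top_k:
            break
    return results
```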
Duplicate options:
| Option | Description | Pros | Cons | Feasibility (per VRC) |
|---|---|---|---|---|
| A: Collection separation | IU vectors use a separate collection; search targets KB, IU, or both via an adapter | Clean separation, no legacy mutation | Needs a second collection and merge code | High for Phase 1+, because the current legacy store is one collection and collection choice is config/code-scoped |
| B: Payload tagging + filter | Same collection; add source=kb/iu and filter at search time | Simple concept | Requires mutating the legacy collection's payload/search contract; current code has no source filter | Low for Phase 0, medium later with an APR |
| C: Priority ranking + dedup | Search KB + IU, dedup by canonical_address/content_hash, prefer the enacted IU version | Best quality | Requires IU metadata/hashes and a search merge layer | Best target for Phase 2, not Phase 0 |
Phase 0 safest option: a read-only precursor to Option A: no IU vectors yet, build the mapping adapter only; once writes are approved, use collection separation.
Why: current legacy has live drift (47 orphan, 7 ghost), broken scheduled audit, critical ratio, and no IU payload fields. Touching legacy payload/chunking/search in Phase 0 risks breaking KB retrieval and audit while the monitoring loop is already unreliable.
Conditions to move further:
- The IU schema has stable `canonical_address`, `unit_version_id`, and `content_hash`.
- The `dot-vector-audit` schedule is fixed to a correct URL and its write behavior is explicitly approved.
- The new IU collection has backup/restore and a read-only verifier before ingestion.
- The search adapter can return source labels and dedup by canonical/content hash.
Risk if choosing wrong:
- Same collection/tagging too early can pollute legacy KB search and make rollback hard.
- Replacing legacy chunking can change current recall and invalidate existing audit ratios.
- Cross-collection merge without dedup can double-answer common KB draft + IU final content.
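A hypothetical Phase 2 merge adapter, sketching Option C under the stated conditions (nothing like this exists in Agent Data today; field names content_hash, canonical_address, and source are assumptions from the proposal, not observed payload fields):

```python
# Hypothetical Phase 2 merge: hits from the KB and IU collections are merged,
# deduped by content_hash (falling back to canonical_address, then
# document_id), and an IU hit always beats a KB hit with the same content.
def merge_kb_iu(kb_hits: list[dict], iu_hits: list[dict], top_k: int = 5) -> list[dict]:
    best: dict[str, dict] = {}
    # IU hits are processed first so they hold priority over same-content KB hits.
    for source, hits in (("iu", iu_hits), ("kb", kb_hits)):
        for hit in hits:
            key = (hit.get("content_hash")
                   or hit.get("canonical_address")
                   or hit["document_id"])
            prev = best.get(key)
            if prev is not None:
                if prev["source"] == "iu" and source == "kb":
                    continue  # the enacted IU unit beats the KB draft
                if prev["score"] >= hit["score"]:
                    continue  # within a source, keep the higher-scoring hit
            best[key] = dict(hit, source=source)
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)[:top_k]
```

This addresses the double-answer risk above: a KB draft and an IU final sharing a content hash collapse to a single, IU-labeled result.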
VRC-11: Search retrieval behavior
Search retrieval behavior:
| Behavior | Actual implementation | Code file:line | Evidence |
|---|---|---|---|
| Search endpoint | POST /chat; sync endpoint | /app/agent_data/server.py:825-928 | calls _retrieve_query_context() |
| Collection queried | Single configured collection QDRANT_COLLECTION, at runtime production_documents | /app/agent_data/vector_store.py:86-94, 383-389 | self.collection |
| Payload filter support | Tags/status only | /app/agent_data/vector_store.py:275-290 | metadata.tags, metadata.status |
| Filter exposed to API caller | filters.tags, filters.tenant_id, filters.status; the vector path ignores tenant_id | /app/agent_data/server.py:231-259, 1116-1123 | no tenant passed to store.search() |
| Rerank logic | None found | /app/agent_data/vector_store.py:292-310 | raw score ordering from Qdrant, then document dedup |
| Dedup logic | Dedups matching chunks by document_id only | /app/agent_data/vector_store.py:294-308 | seen_docs |
| Top-K default | 5, configurable 1-20 | /app/agent_data/server.py:254-260 | Field(default=5, ge=1, le=20) |
| Duplicate handling, KB + IU same content | None for different document_id; both can appear | Inference from search code | dedup key is only document_id |
| Response fields to caller | document_id, snippet, score, metadata; response/content/session/usage | /app/agent_data/server.py:140-160, 1127-1133 | Pydantic models |
| Multi-collection search | Not implemented in Agent Data | /app/agent_data/vector_store.py:383-389 | one collection_name=self.collection |
VRC-12: Orphan/missing vector detection — current state
Orphan/missing vector detection — current state:
| Check | Status | Last run | Result | Evidence |
|---|---|---|---|---|
| dot-vector-audit in crontab | Active entry, but operationally failing | daily 04:30 UTC scheduled | Calls a wrong local URL and fails | crontab + log |
| dot-vector-audit last log | Failing repeatedly | latest tail shows repeated failures | Pre-flight check... FAIL (connection refused) | /var/log/incomex/dot-vector-audit.log |
| /kb/audit-sync read-only? | Read-only only with auto_heal=false | N/A | safe call made with false | /app/agent_data/server.py:2248-2267 |
| Current orphan count | Active drift | 2026-05-02 investigation | 47 | /kb/audit-sync |
| Current ghost count | Active drift | 2026-05-02 investigation | 7 | /kb/audit-sync |
| Alert/notification on drift | Partial | Uptime Kuma monitors the /api/health Qdrant keyword and the health endpoint | health shows critical, but the monitor keyword only checks qdrant OK, not the critical ratio | Uptime Kuma DB, /health |
| Auto-cleanup active? | Scheduled intent yes, actual no | cron daily | --heal configured but cannot connect | crontab + log |
| Vector:Doc ratio healthy? | No | live | 3.99, critical | /health |
VRC-13: Vector ↔ data sync monitoring — current state
Vector sync monitoring — current state:
| System | Component | Status (active/inactive/unknown) | Evidence |
|---|---|---|---|
| PG→Qdrant listener | startup thread, reconnect loop | Code active, runtime effect limited because no PG trigger installed | /app/agent_data/resilient_client.py:469-475; trigger query returned 0 |
| Agent Data /health | Qdrant connection check | Active | /health reports qdrant/postgres/openai and data_integrity |
| Uptime Kuma | Agent Data / Qdrant monitor | Active | monitors: Agent Data Health, Qdrant Health, Docker Services; default Telegram notification active |
| Docker restart policy | container recovery | Active | unless-stopped for Agent Data and Qdrant |
| vector_status stuck check | pending/error detection | Unknown/no dedicated active check found | PG has pending/none/blank; no cron specifically found for vector_status stuck |
| Qdrant backup | schedule/location | Script exists; current schedule not found in crontab; last observed local snapshot 2026-04-03 | /opt/incomex/scripts/qdrant-backup.sh; /opt/incomex/backups/qdrant |
| Embedding API failure | retry/timeout/fallback | Retry wrapper exists; errors are recorded as vector_status=error; no fallback embedding model | /app/agent_data/vector_store.py:147-161, 240-252; resilient_client.py:45-48, 191-220 |
| dot-vector-audit | orphan/ghost periodic | Scheduled but failing | crontab + log |
Gap summary:
- The missing PG trigger means direct PG writes do not notify the listener.
- `dot-vector-audit --heal --local` is scheduled but failing because it checks `http://localhost:8000` from the host, where no service is listening.
- The Uptime Kuma Qdrant monitor checks qdrant `"status":"ok"`, but not `data_integrity.sync_status=critical`.
- No evidence of an active vector_status stuck alert.
- The Qdrant backup script exists, but no current cron entry was found and the last observed local snapshot is 2026-04-03.
Do-not-touch Invariants (required Patch 6 format)
| Invariant | Why it must not change | Evidence | Required approval if change |
|---|---|---|---|
| Legacy collection production_documents | Active KB search/audit collection | Qdrant API returned only this collection; compose sets QDRANT_COLLECTION | Separate APR + rollback plan |
| Vector dimension 1536 and Cosine distance | Must match existing OpenAI embeddings and search assumptions | Qdrant collection config | Separate APR + rollback plan |
| Payload keys document_id and content | Search/audit/delete depend on them | vector_store.search(), delete_document(), _run_audit() | Separate APR + rollback plan |
| Metadata chunk fields | Multi-chunk dedup and traceability depend on them | payload samples and vector_store.py:200-210 | Separate APR + rollback plan |
| Legacy chunking 4000/400 | Existing vector ratios and retrieval behavior depend on it | vector_store.py:15-17, health baseline comments | Separate APR + rollback plan |
| Delete-by-document_id semantics | Cleanup/delete removes all chunks for a doc | vector_store.py:398-414 | Separate APR + rollback plan |
| kb_documents JSONB store | Agent Data metadata SSOT | pg_store.py:80-111, server.py paths | Separate APR + rollback plan |
| /kb/audit-sync auto_heal=true behavior | It can reindex/delete vectors | server.py:2269-2326 | Separate APR + rollback plan |
| Qdrant storage path /opt/incomex/docker/qdrant/data | Persistent vector data | compose lines 32-35 and docker mount | Separate APR + rollback plan |
| API keys and embedding env vars | Required for vector operations | compose lines 95-115, env names observed | Separate APR + rollback plan |
| Docker restart policy unless-stopped | Current recovery behavior | docker inspect and compose | Separate APR + rollback plan |
| Uptime Kuma health monitors | Current external alert path | Uptime Kuma DB monitor rows | Separate APR + rollback plan |
Unknowns
- Exact reason the PG trigger is absent: the function exists but no trigger rows exist; migration history was not mutated or inspected beyond read-only evidence.
- Whether the Qdrant backup has an external scheduler outside the root crontab: the script and snapshots exist, but the root crontab did not show qdrant-backup.sh.
- Whether the listener thread is currently alive internally: code starts it, but logs did not show pg-vector-sync messages in the sampled window.
- Exact Qdrant global/multi-collection API features beyond the installed version: Agent Data does not implement multi-collection search; the IU design should not rely on unavailable app-layer support.
- Full alert delivery status in Telegram: Uptime Kuma has an active default Telegram notification, but no live alert was triggered during this investigation.
Recommended Safe Architecture
- Legacy vector: untouched.
- IU bridge/vector: parallel.
- Duplicate resolution: Phase 0 mapping adapter only; Phase 1 new IU collection; Phase 2 merge + dedup adapter.
- Migration phases:
- Phase 0: read-only mapping adapter, no vector writes.
- Phase 1: create IU collection through approved APR; do not enrich legacy payload yet.
- Phase 2: multi-source retrieval adapter; return `source`, `canonical_address`, `content_hash`, and score.
- Phase 3: unit-aware chunking only after benchmarking and rollback design.
Risk Table
| Risk | Severity | Mitigate |
|---|---|---|
| Legacy drift already exists | High | Fix monitor/report first; do not add IU writes into legacy collection |
| Missing PG trigger means direct PG changes bypass vector sync | High | Separate APR to restore trigger or formalize API-only write path |
| Scheduled audit is configured with --heal but failing | High | Separate APR: correct URL/mode, decide dry-run vs heal, add rollback |
| Duplicate KB + IU search results | High | Collection separation + adapter dedup by canonical/content_hash |
| Same-collection payload tagging pollutes legacy search | Medium/High | Avoid in Phase 0; require APR and backfill verifier |
| Update deletes old chunks before new upsert | Medium | Add verifier/retry before relying on new IU ingestion |
| Qdrant backup freshness unclear | Medium | Verify/restore backup schedule before IU collection writes |
No-mutation Statement
The agent confirms: NO mutation of any Qdrant points/collections/schema/payloads, PG rows/schema/triggers, runtime code/config/containers, or any restart/rebuild/re-embed/cleanup was performed during the investigation. The only mutation: creating this markdown report at the requested path. The only audit endpoint called was /kb/audit-sync with {"auto_heal": false}, after reading the source to confirm that branch is read-only.
Commands run (read-only; secrets redacted where relevant):
```shell
sed -n '1,260p' .claude/skills/incomex-rules.md
rg -n "search_knowledge|operating rules SSOT|hiến pháp|constitution|knowledge" -S .claude knowledge
search_knowledge("operating rules SSOT")
search_knowledge("hiến pháp v4.0 constitution")
search_knowledge("vector Qdrant Agent Data Langroid kb_documents sync law điều 38")
search_knowledge("law-01-foundation-principles vector sync Directus Agent Data Qdrant")
batch_read(["knowledge/dev/ssot/vps/vps-operating-rules.md","knowledge/dev/laws/constitution.md","knowledge/dev/laws/law-14-no-duplicate.md","knowledge/dev/laws/law-01-foundation-principles.md"])
ssh -i ~/.ssh/contabo_vps contabo 'docker ps --format "table {{.Names}}\t{{.Image}}\t{{.Status}}\t{{.Ports}}"'
ssh -i ~/.ssh/contabo_vps contabo 'hostname; whoami; date -Is; pwd'
ssh -i ~/.ssh/contabo_vps contabo 'crontab -l 2>&1'
ssh -i ~/.ssh/contabo_vps contabo 'docker exec incomex-agent-data find /app -maxdepth 3 -type f | sort'
ssh -i ~/.ssh/contabo_vps contabo 'docker exec incomex-agent-data grep -RIn "...vector/search/chunk patterns..." /app --include="*.py"'
ssh -i ~/.ssh/contabo_vps contabo 'docker exec incomex-agent-data env | grep -Ei "qdrant|embed|openai|vector|collection|postgres|database|directus"'
ssh -i ~/.ssh/contabo_vps contabo 'docker exec incomex-agent-data nl -ba /app/agent_data/vector_store.py | sed -n ...'
ssh -i ~/.ssh/contabo_vps contabo 'docker exec incomex-agent-data nl -ba /app/agent_data/server.py | sed -n ...'
ssh -i ~/.ssh/contabo_vps contabo 'docker exec incomex-agent-data nl -ba /app/agent_data/pg_vector_listener.py | sed -n ...'
ssh -i ~/.ssh/contabo_vps contabo 'docker exec postgres psql -U directus -d directus -Atc "SELECT datname FROM pg_database ..."'
ssh -i ~/.ssh/contabo_vps contabo 'docker exec postgres psql -U directus -d incomex_metadata -Atc "SELECT concat(schemaname,'.',tablename) ..."'
ssh -i ~/.ssh/contabo_vps contabo 'docker exec postgres psql -U directus -d incomex_metadata -c "SELECT trigger_name ... information_schema.triggers ..."'
ssh -i ~/.ssh/contabo_vps contabo 'docker exec postgres psql -U directus -d incomex_metadata -c "SELECT pg_get_functiondef(...)"'
ssh -i ~/.ssh/contabo_vps contabo 'docker exec postgres psql -U directus -d incomex_metadata -c "SELECT count...; SELECT vector_status..."'
ssh -i ~/.ssh/contabo_vps contabo 'docker exec incomex-agent-data python -c "GET Qdrant /collections with api-key from env"'
ssh -i ~/.ssh/contabo_vps contabo 'docker exec incomex-agent-data python -c "GET Qdrant /collections/production_documents with api-key from env"'
ssh -i ~/.ssh/contabo_vps contabo 'docker exec incomex-agent-data python -c "POST Qdrant /points/scroll limit 3 payload only"'
ssh -i ~/.ssh/contabo_vps contabo 'docker exec incomex-agent-data python -c "GET Qdrant / version"'
ssh -i ~/.ssh/contabo_vps contabo 'ls -l /opt/incomex/dot/bin/dot-vector-audit; sed -n "1,260p" /opt/incomex/dot/bin/dot-vector-audit'
ssh -i ~/.ssh/contabo_vps contabo 'docker exec incomex-agent-data python -c "POST /kb/audit-sync auto_heal=false"'
ssh -i ~/.ssh/contabo_vps contabo 'tail -80 /var/log/incomex/dot-vector-audit.log'
ssh -i ~/.ssh/contabo_vps contabo 'docker exec incomex-agent-data python -c "GET /health"'
ssh -i ~/.ssh/contabo_vps contabo 'docker inspect incomex-agent-data ...; docker inspect incomex-qdrant ...'
ssh -i ~/.ssh/contabo_vps contabo 'docker inspect ... mounts'
ssh -i ~/.ssh/contabo_vps contabo 'find /opt/incomex -maxdepth 3 ... docker-compose/.env'
ssh -i ~/.ssh/contabo_vps contabo 'grep -RIn "qdrant|agent-data|QDRANT|OPENAI|embedding|6333|8000" ... | sed redact'
ssh -i ~/.ssh/contabo_vps contabo 'nl -ba /opt/incomex/docker/docker-compose.yml | sed -n ...'
ssh -i ~/.ssh/contabo_vps contabo 'docker diff incomex-agent-data'
ssh -i ~/.ssh/contabo_vps contabo 'find /opt/incomex -maxdepth 4 -iname "*vector*" -o -iname "*qdrant*"'
ssh -i ~/.ssh/contabo_vps contabo 'ls -l /opt/incomex/scripts/qdrant-backup.sh; sed -n ...; find /opt/incomex/backups/qdrant ...'
ssh -i ~/.ssh/contabo_vps contabo 'docker exec uptime-kuma sqlite3 /app/data/kuma.db "select ... from monitor ..."'
ssh -i ~/.ssh/contabo_vps contabo 'docker exec uptime-kuma sqlite3 /app/data/kuma.db "select ... from notification ..."'
ssh -i ~/.ssh/contabo_vps contabo 'docker logs --tail/--since ... incomex-agent-data | grep ...'
```