VRC — Vector Reality Check: Agent Data + Langroid + Qdrant
Date: 2026-05-02 Status: INVESTIGATION COMPLETE — 0 mutations to data/config/code/container Agent: Codex Mode: READ-ONLY
Executive Verdict
- The current legacy vector store is document-based. Point payloads carry `document_id`, `content`, `metadata.chunk_index`, and `metadata.total_chunks`; there is no `unit_id`, `canonical_address`, or `unit_version_id`.
- IU vectors should run in parallel, not replace the legacy store. The legacy KB vector store continues to serve the KB independently.
- Duplicate content risk level: HIGH. Current search dedups only by `document_id`; if a KB draft and an IU final sit under two different `document_id` values, there is no logic to suppress or prioritize either.
Evidence Table
| Claim | Evidence | Source |
|---|---|---|
| Runtime container names were discovered, not assumed | incomex-agent-data, postgres, incomex-qdrant, uptime-kuma | docker ps --format ... on VPS |
| Active Qdrant collection is production_documents | Qdrant collections API returned only production_documents | GET http://qdrant:6333/collections from incomex-agent-data |
| Collection config is 1536-dim cosine | vectors.size=1536, distance=Cosine, points_count=11052, status=green | GET /collections/production_documents |
| Legacy payload is document/chunk based | payload fields: content, document_id, metadata, parent_id, is_human_readable; metadata has chunk_index, total_chunks | POST /points/scroll limit 3 |
| Chunker is a custom char splitter | _split_text, CHUNK_SIZE=4000, CHUNK_OVERLAP=400 | /app/agent_data/vector_store.py:15-80 |
| Upsert embeds per chunk with deterministic UUID5 point IDs | uuid5(NAMESPACE_DNS, f"{document_id}:chunk:{idx}") | /app/agent_data/vector_store.py:186-227 |
| Search filters only tags/status; not source/unit | filter_tags -> metadata.tags; filter_status -> metadata.status | /app/agent_data/vector_store.py:254-310 |
| Search dedups chunks by document only | seen_docs keyed by document_id | /app/agent_data/vector_store.py:294-308 |
| /chat bypasses Langroid internal RAG in default raw mode | _retrieve_query_context() then _build_raw_reply; ! prefix bypasses DocChatAgent RAG in summarized mode | /app/agent_data/server.py:886-928, 953-958 |
| PG trigger function exists but no trigger is installed on kb_documents | information_schema.triggers returned 0 rows; function source exists | PG query on incomex_metadata |
| Current drift exists | read-only /kb/audit-sync returned orphan_count=47, ghost_count=7 | POST /kb/audit-sync {"auto_heal": false} |
| Scheduled vector audit is broken and unsafe if fixed as-is | crontab calls dot-vector-audit --heal --local; latest log repeats FAIL (connection refused) to localhost:8000; source says --heal sends auto_heal:true | crontab -l, /var/log/incomex/dot-vector-audit.log, /opt/incomex/dot/bin/dot-vector-audit |
| Agent Data health checks Qdrant/PG/OpenAI and flags the vector/doc ratio as critical | /health returned qdrant/postgres/openai ok, document_count=2770, vector_point_count=11052, ratio=3.99, sync_status=critical | GET /health inside container |
VRC-1..VRC-13 Answers
VRC-1: Actual ingestion flow
Write paths table:
| Path | Trigger | Calls vector? | Code file | Evidence |
|---|---|---|---|---|
| API/MCP create_document, new doc | POST /documents or MCP upload_document | Yes, _sync_vector_entry() after PG set_doc | /app/agent_data/server.py:1278-1381 | vector_status=pending, then _sync_vector_entry(... content.body ...) |
| API/MCP create with upsert=true, existing doc | POST /documents?upsert=true | Yes, delete old vectors then sync new | /app/agent_data/server.py:1292-1321 | _delete_vector_entry(doc_id) then _sync_vector_entry() |
| API/MCP update_document, content changed | PUT /documents/{doc_id} | Yes, delete old vectors then sync new | /app/agent_data/server.py:1467-1487 | content_changed gates delete+sync |
| API/MCP update_document, metadata-only | PUT /documents/{doc_id} | No re-embed | /app/agent_data/server.py:1467-1497 | logs skip_reembed |
| API/MCP delete_document | DELETE /documents/{doc_id} | Yes, delete by document_id filter | /app/agent_data/server.py:1604-1647, /app/agent_data/vector_store.py:398-414 | _delete_vector_entry(doc_id) before marking PG deleted |
| PG direct SQL into kb_documents | Intended trigger fn_kb_notify_vector_sync | Currently no, because no trigger is installed | PG information_schema.triggers | 0 trigger rows for kb_documents; function exists but unattached |
| PG listener | LISTEN kb_vector_sync | Would call upsert/delete if a NOTIFY arrives | /app/agent_data/pg_vector_listener.py:37-95 | handles DELETE, INSERT, UPDATE; skips body <10 |
| Directus sync knowledge_documents | Agent Data -> Directus mirror | No Qdrant call found | /app/agent_data/directus_sync.py:267-390 | SQL UPDATE/INSERT to knowledge_documents, no vector_store |
| /kb/reindex | Manual API | Yes, bulk re-embed of all active docs | /app/agent_data/server.py:1965-2037 | write endpoint; not called |
| /kb/reindex-missing | Manual API | Yes, re-embed ghosts | /app/agent_data/server.py:2329-2344 | write endpoint; not called |
| /kb/cleanup-orphans | Manual API | Deletes orphan vectors when dry_run=false | /app/agent_data/server.py:2053-2109 | write endpoint; not called |
| /kb/audit-sync | Manual/cron | Read-only only when auto_heal=false; writes when true | /app/agent_data/server.py:2241-2326 | only auto_heal:false was called |
PG trigger function filter logic:
- Skip task comments: keys like `operations/tasks/comments/%` or `operations__tasks__comments__%`.
- Skip registries: keys like `registries/%` or `registries__%`.
- Skip empty key.
- Skip INSERT/UPDATE with `length(NEW.data->'content'->>'body') < 10`.
- Then `pg_notify('kb_vector_sync', json_build_object('op', TG_OP, 'key', v_key, 'document_id', ...))`.
- Evidence: `pg_get_functiondef('fn_kb_notify_vector_sync')`.
- Critical runtime fact: `SELECT ... FROM information_schema.triggers WHERE event_object_table='kb_documents'` returned 0 rows.
VRC-2: Langroid/Chunking rule
Chunking config:
| Parameter | Value | Code file:line | Evidence |
|---|---|---|---|
| Splitter class | Custom _split_text; not a Langroid splitter for production upsert | /app/agent_data/vector_store.py:41-80 | char-based function |
| Max chars/tokens | 4000 chars default | /app/agent_data/vector_store.py:15-17 | QDRANT_CHUNK_SIZE default 4000 |
| Overlap | 400 chars default | /app/agent_data/vector_store.py:15-17 | QDRANT_CHUNK_OVERLAP default 400 |
| Separator | paragraph \n\n, sentence ". ", word-space fallback | /app/agent_data/vector_store.py:56-71 | rfind boundary selection |
| Min content to embed | API path skips only non-string/blank; PG listener skips <10 chars | /app/agent_data/server.py:546-547, /app/agent_data/pg_vector_listener.py:71-73 | direct API can embed short non-blank content; the listener cannot |
| Embedding truncation | Each chunk embedding input is truncated to 6000 chars | /app/agent_data/vector_store.py:152-156 | truncated = text[:6000] |
| Embedding model | default text-embedding-3-small; env override QDRANT_EMBED_MODEL | /app/agent_data/vector_store.py:94 | runtime env did not show an override |
| Chunk ID generation | UUID5 of document_id:chunk:{idx} | /app/agent_data/vector_store.py:213-216 | deterministic point ID |
| Chunk fields | metadata.chunk_index, metadata.total_chunks | /app/agent_data/vector_store.py:200-210 | payload builder |
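A minimal sketch of the described splitting behavior, assuming the 4000/400 defaults and the paragraph -> sentence -> word boundary preference via rfind (this is not the container's actual _split_text, which was only read in place):

```python
# Minimal sketch of a char-based splitter with the observed defaults
# (CHUNK_SIZE=4000, CHUNK_OVERLAP=400) and boundary preference
# paragraph -> sentence -> space. Illustrative only; not the real _split_text.

def split_text(text: str, chunk_size: int = 4000, overlap: int = 400) -> list[str]:
    chunks: list[str] = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            window = text[start:end]
            # Prefer the latest paragraph break, then sentence end, then space.
            for sep in ("\n\n", ". ", " "):
                cut = window.rfind(sep)
                if cut > 0:
                    end = start + cut + len(sep)
                    break
        chunks.append(text[start:end])
        if end >= len(text):
            break
        next_start = end - overlap
        start = next_start if next_start > start else end  # guarantee forward progress
    return chunks
```

The progress guard matters: when a boundary is found very early in the window, stepping back by the full overlap could otherwise stall the loop.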
Langroid status: AgentData subclasses DocChatAgent, and agent_config.vecdb uses Langroid QdrantDBConfig, but /chat explicitly bypasses DocChatAgent internal RAG after doing custom retrieval. Evidence: /app/agent_data/main.py:20-23, 53-68; /app/agent_data/server.py:313-346, 953-958.
VRC-3: Qdrant collections
Collections table:
| Collection name | Vector dim | Distance | Point count | Status | Purpose |
|---|---|---|---|---|---|
| production_documents | 1536 | Cosine | 11052 | green | Active legacy KB vectors; configured by QDRANT_COLLECTION |
No staging/dev collection found via Qdrant API. Qdrant version: 1.16.3.
VRC-4: Qdrant payload fields — SAMPLE READ-ONLY
Sample payload (3 points, content redacted for length):
Point 1:

```json
{
  "id": "00071df5-8461-5166-93a3-0d1300db6ec8",
  "payload": {
    "content": "[long markdown, redacted]",
    "document_id": "context-pack/20260430-220013-092d70/ENTITIES_OVERVIEW.md",
    "metadata": {
      "tags": ["dieu43", "context-pack", "build"],
      "title": "Đ43 context-pack 20260430-220013-092d70 section=entities_overview",
      "source": "dieu43_context_pack_publish",
      "build_id": "20260430-220013-092d70",
      "chunk_index": 0,
      "total_chunks": 1
    },
    "parent_id": null,
    "is_human_readable": false
  }
}
```

Point 2:

```json
{
  "id": "00097689-3d59-5c42-9673-aec6e52c5879",
  "payload": {
    "content": "[long markdown chunk, redacted]",
    "document_id": "context-pack/20260424-100012-68cb8a/DOT_REGISTRY.md",
    "metadata": {
      "tags": ["dieu43", "context-pack", "build"],
      "title": "Đ43 context-pack 20260424-100012-68cb8a section=dot_registry",
      "source": "dieu43_context_pack_publish",
      "build_id": "20260424-100012-68cb8a",
      "chunk_index": 9,
      "total_chunks": 14
    },
    "parent_id": null,
    "is_human_readable": false
  }
}
```

Point 3:

```json
{
  "id": "000ae675-edec-5546-9a49-ae1e4937b44a",
  "payload": {
    "content": "[long markdown chunk, redacted]",
    "document_id": "context-pack/20260422-070005-6af1da/PROJECT_MAP.md",
    "metadata": {
      "tags": ["dieu43", "context-pack", "build"],
      "title": "Đ43 context-pack 20260422-070005-6af1da section=project_map",
      "source": "dieu43_context_pack_publish",
      "build_id": "20260422-070005-6af1da",
      "chunk_index": 1,
      "total_chunks": 7
    },
    "parent_id": null,
    "is_human_readable": false
  }
}
```
Payload fields summary:
| Field name | Type | Present in all 3? | Example value |
|---|---|---|---|
| content | string | Yes | markdown chunk |
| document_id | string | Yes | context-pack/.../PROJECT_MAP.md |
| metadata | object | Yes | tags/title/source/build_id/chunk fields |
| metadata.tags | array of strings | Yes | ["dieu43","context-pack","build"] |
| metadata.title | string | Yes | context-pack title |
| metadata.source | string | Yes | dieu43_context_pack_publish |
| metadata.build_id | string | Yes | 20260422-070005-6af1da |
| metadata.chunk_index | integer | Yes | 1 |
| metadata.total_chunks | integer | Yes | 7 |
| parent_id | string/null | Yes | null |
| is_human_readable | boolean | Yes | false |
| unit_id | absent | No | not present |
| canonical_address | absent | No | not present |
| unit_version_id | absent | No | not present |
| content_hash | absent in samples | No | not present |
Point ID format is UUID string, deterministic by code.
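The deterministic point ID scheme can be reproduced with the standard library (helper name is illustrative; the scheme itself is the one quoted from vector_store.py):

```python
import uuid

# Deterministic chunk point ID as described for vector_store.py: UUID5 over
# "{document_id}:chunk:{idx}" in the DNS namespace. Re-upserting the same
# document therefore rewrites the same point IDs instead of creating new ones.
def chunk_point_id(document_id: str, idx: int) -> str:
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, f"{document_id}:chunk:{idx}"))
```

Because the ID is a pure function of document_id and chunk index, the same inputs always yield the same UUID across runs and hosts.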
VRC-5: Orphan/ghost vector audit
Orphan/ghost audit:
| Checker | File/endpoint | Schedule | Auto-clean? | Log location |
|---|---|---|---|---|
| dot-vector-audit | /opt/incomex/dot/bin/dot-vector-audit | daily 04:30 UTC | Intended yes (crontab passes --heal), but currently failing before the endpoint call | /var/log/incomex/dot-vector-audit.log |
| /kb/audit-sync | /app/agent_data/server.py:2241-2326 | no internal schedule | No when auto_heal=false; yes when auto_heal=true | app logs |
| /kb/cleanup-orphans | /app/agent_data/server.py:2053-2109 | no internal schedule | Yes if dry_run=false | app logs |
| /kb/reindex-missing | /app/agent_data/server.py:2329-2344 | no internal schedule | Reindexes missing vectors | app logs |
Audit logic: _run_audit() loads active PG doc IDs from kb_documents, loads unique Qdrant document_id values, computes orphan_ids = qdrant_ids - pg_ids and ghost_ids = pg_ids - qdrant_ids. Evidence: /app/agent_data/server.py:2112-2150.
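The audit's set arithmetic can be sketched directly (the function name here is hypothetical; the actual implementation is _run_audit()):

```python
# Sketch of the audit comparison described above: orphans are Qdrant
# document_id values with no live PG row; ghosts are live PG docs with no vectors.
def audit_sync(pg_active_ids: set[str], qdrant_doc_ids: set[str]) -> dict:
    orphan_ids = qdrant_doc_ids - pg_active_ids   # vectors without a live doc
    ghost_ids = pg_active_ids - qdrant_doc_ids    # docs without any vector
    return {
        "orphan_count": len(orphan_ids),
        "ghost_count": len(ghost_ids),
        "orphan_ids": sorted(orphan_ids),
        "ghost_ids": sorted(ghost_ids),
    }
```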
VRC-6: Update/delete semantics
Update semantics:
| Operation | Logic | Atomic? | Retry on fail? | Code file:line |
|---|---|---|---|---|
| upsert | split content, embed each chunk, batch upsert(..., wait=True) | Qdrant upsert batch is waited on, but not transactional with PG | SDK helper wrapped by sync_retry; per-chunk embedding can fail before upsert | /app/agent_data/vector_store.py:163-252, 371-375 |
| update content | PG update first, delete all old vectors, then upsert new chunks | Not atomic across PG/Qdrant; a failure between delete and upsert can create a ghost | Delete+sync wrapped in try, but no rollback | /app/agent_data/server.py:1448-1487 |
| delete | delete Qdrant vectors by document_id filter, then mark PG deleted | Not atomic | sync_retry on the Qdrant delete; the PG update follows | /app/agent_data/server.py:1604-1647, /app/agent_data/vector_store.py:398-414 |
| multi-chunk update | delete all old chunks, then insert the new chunk set | Not atomic | retry helper per Qdrant call; no per-doc transaction | same as above |
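The ghost-creating window in the update path can be illustrated with an in-memory stand-in for the vector store (purely illustrative; no real Qdrant client is involved and the class is hypothetical):

```python
# Illustration of the non-atomic delete-then-upsert window: the update path
# deletes all old chunk points first, then upserts new ones. If the process
# dies in between, the PG row survives with zero vectors -- a "ghost".
class FakeVectorStore:
    def __init__(self):
        self.points: dict[str, dict] = {}  # point_id -> payload

    def delete_by_document(self, document_id: str) -> None:
        self.points = {k: v for k, v in self.points.items()
                       if v["document_id"] != document_id}

    def upsert_chunks(self, document_id: str, chunks: list[str]) -> None:
        for idx, chunk in enumerate(chunks):
            self.points[f"{document_id}:chunk:{idx}"] = {
                "document_id": document_id, "content": chunk}

store = FakeVectorStore()
store.upsert_chunks("kb/doc-1", ["old chunk a", "old chunk b"])
points_before_update = len(store.points)
store.delete_by_document("kb/doc-1")  # step 1 of an update succeeds...
# ...if the process crashes here, the upsert of new chunks never runs,
# leaving a live PG doc with no vectors (a ghost).
```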
VRC-7: Current scale
Scale snapshot:
| Metric | Value | Source |
|---|---|---|
| KB docs total | 3709 rows | SELECT count(*) FROM kb_documents |
| KB active docs | 2770 | deleted_at IS NULL and /health |
| Qdrant points total | 11052 | Qdrant collection info and /health |
| Ratio points/docs | 3.99 | /health |
| Ghosts (doc without vector) | 7 | /kb/audit-sync auto_heal=false |
| Orphans (vector document_id without live doc) | 47 | /kb/audit-sync auto_heal=false |
| vector_status distribution | deleted=939, ready=2762, pending=2, none=5, blank=1 | PG group query |
| Health sync status | critical | /health data_integrity.sync_status |
VRC-8: Do-not-touch invariants
Do-not-touch invariants:
| Item | Type | Location | Ephemeral? | Reasoning |
|---|---|---|---|---|
| production_documents collection | Qdrant collection | incomex-qdrant | No, persisted at /opt/incomex/docker/qdrant/data | Active KB vector store |
| vector dimensions/distance | Qdrant schema | 1536/Cosine | No | Must match text-embedding-3-small; changing breaks retrieval |
| point payload contract | Qdrant payload | content, document_id, metadata, parent_id, is_human_readable | No | /chat and audit depend on document_id and content |
| QDRANT_COLLECTION=production_documents | runtime config | /opt/incomex/docker/docker-compose.yml:99-101 | No | Agent Data reads/writes this collection |
| Qdrant API key/env | secret/config | compose env and container env | No | Required by all Qdrant calls |
| OpenAI API key/env | secret/config | compose env and container env | No | Required for embedding |
| kb_documents table | PG metadata store | incomex_metadata.public.kb_documents | No | SSOT for Agent Data document metadata |
| fn_kb_notify_vector_sync() | PG function | incomex_metadata.public | No | Intended PG->Qdrant listener payload contract |
| Agent Data code inside container | runtime code | /app/agent_data/* | Yes if patched in container; current docker diff only showed runtime files under /root, /tmp, credentials, and an /app dir marker | VPS/container is SSOT for runtime behavior |
| dot-vector-audit cron | monitoring cron | root crontab | No | Currently scheduled but failing; --heal is a write behavior |
| Qdrant backup script | backup | /opt/incomex/scripts/qdrant-backup.sh | No | Last local snapshot observed 2026-04-03; script creates Qdrant snapshots |
| Docker restart policies | recovery | compose/inspect | No | Agent Data and Qdrant use unless-stopped |
VRC-9: Safe IU Vector architecture proposal (evidence-based)
Proposed IU vector architecture:
| Phase | Action | Impact on legacy? | Pre-condition | Risk |
|---|---|---|---|---|
| Phase 0 | Read-only mapping adapter: map kb_document_path/document_id to IU canonical_address outside Qdrant | None | IU tables and canonical address mapping exist; no Qdrant writes | Incomplete dedup, because the legacy payload lacks IU fields |
| Phase 1 | Prefer IU-specific collection over payload enrichment for first writable phase | No change to legacy collection | IU embedding job + collection creation APR + rollback plan | Managing two collections and merge logic |
| Phase 2 | IU-specific projection/search adapter across KB collection + IU collection | Legacy still independent | Query API that can target both and combine results | Ranking/dedup complexity |
| Phase 3 | Unit-aware chunking strictly within IU boundaries | Legacy chunking untouched | Benchmarks, content_hash/canonical_address dedup, performance proof | Bad chunk boundary rules can reduce recall |
Reasoning: current legacy payload has no IU identifiers and current search only supports tags/status filters. Mutating legacy payload or chunking would directly affect /chat, audit, and KB behavior.
VRC-10: Duplicate Content Resolution
Capabilities and current behavior:
- Payload filtering in current code exists only for `metadata.tags` and `metadata.status`; there is no `source=kb/iu` or unit field. Evidence: /app/agent_data/vector_store.py:275-290.
- Qdrant itself accepts collection-scoped search/filter calls; the current Agent Data `_qdrant_search()` targets exactly `self.collection`. Evidence: /app/agent_data/vector_store.py:383-389.
- Current Agent Data has no multi-collection search support. Evidence: one `QDRANT_COLLECTION`, and `_retrieve_query_context()` calls one `store.search()`.
- Qdrant ranking returns nearest vectors; current Agent Data dedups chunks by `document_id` only. It does not dedup semantically similar content across different documents. Evidence: /app/agent_data/vector_store.py:292-310.
- `/chat` exposes filters `tags`, `tenant_id`, and `status`, but the vector path ignores `tenant_id`. Evidence: /app/agent_data/server.py:231-259, 1116-1123.
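The document-level dedup described above can be sketched as follows (names are illustrative; the real logic is the seen_docs loop in vector_store.py). It also shows why a KB draft and an IU final under different document_id values both survive:

```python
# Sketch of document-level dedup: hits arrive ordered by Qdrant score, and
# only the first (best) chunk per document_id is kept. Two documents with
# identical content but different document_id values BOTH pass through.
def dedup_by_document(hits: list[dict], top_k: int = 5) -> list[dict]:
    seen_docs: set[str] = set()
    results: list[dict] = []
    for hit in hits:  # hits already sorted by descending score
        doc_id = hit["document_id"]
        if doc_id in seen_docs:
            continue  # later chunks of an already-seen doc are dropped
        seen_docs.add(doc_id)
        results.append(hit)
        if len(results) >= top_k:
            break
    return results
```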
Duplicate options:
| Option | Description | Pros | Cons | Feasibility (per VRC) |
|---|---|---|---|---|
| A: Collection separation | IU vectors use a separate collection; search targets KB, IU, or both via an adapter | Clean separation, no legacy mutation | Needs a second collection and merge code | High for Phase 1+, because the current legacy store is one collection and collection choice is config/code-scoped |
| B: Payload tagging + filter | Same collection; add source=kb/iu and filter at search time | Simple concept | Requires mutating the legacy collection's payload/search contract; current code has no source filter | Low for Phase 0, medium later with an APR |
| C: Priority ranking + dedup | Search KB + IU, dedup by canonical_address/content_hash, prefer the enacted IU version | Best quality | Requires IU metadata/hashes and a search merge layer | Best target for Phase 2, not Phase 0 |
Phase 0 safest option: a read-only precursor to Option A: no IU vectors yet, build the mapping adapter only; once writes are approved, use collection separation.
Why: current legacy has live drift (47 orphan, 7 ghost), broken scheduled audit, critical ratio, and no IU payload fields. Touching legacy payload/chunking/search in Phase 0 risks breaking KB retrieval and audit while the monitoring loop is already unreliable.
Conditions to move further:
- The IU schema has stable `canonical_address`, `unit_version_id`, and `content_hash`.
- The `dot-vector-audit` schedule is fixed to a correct URL and its write behavior is explicitly approved.
- The new IU collection has backup/restore and a read-only verifier before ingestion.
- The search adapter can return source labels and dedup by canonical/content hash.
Risk if choosing wrong:
- Same collection/tagging too early can pollute legacy KB search and make rollback hard.
- Replacing legacy chunking can change current recall and invalidate existing audit ratios.
- Cross-collection merge without dedup can double-answer common KB draft + IU final content.
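A hypothetical Phase 2 merge adapter, sketching Option C under the stated conditions (nothing like this exists in Agent Data today; field names content_hash, canonical_address, and source are assumptions from the proposal, not observed payload fields):

```python
# Hypothetical Phase 2 merge: hits from the KB and IU collections are merged,
# deduped by content_hash (falling back to canonical_address, then
# document_id), and an IU hit always beats a KB hit with the same content.
def merge_kb_iu(kb_hits: list[dict], iu_hits: list[dict], top_k: int = 5) -> list[dict]:
    best: dict[str, dict] = {}
    # IU hits are processed first so they hold priority over same-content KB hits.
    for source, hits in (("iu", iu_hits), ("kb", kb_hits)):
        for hit in hits:
            key = (hit.get("content_hash")
                   or hit.get("canonical_address")
                   or hit["document_id"])
            prev = best.get(key)
            if prev is not None:
                if prev["source"] == "iu" and source == "kb":
                    continue  # the enacted IU unit beats the KB draft
                if prev["score"] >= hit["score"]:
                    continue  # within a source, keep the higher-scoring hit
            best[key] = dict(hit, source=source)
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)[:top_k]
```

This addresses the double-answer risk above: a KB draft and an IU final sharing a content hash collapse to a single, IU-labeled result.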
VRC-11: Search retrieval behavior
Search retrieval behavior:
| Behavior | Actual implementation | Code file:line | Evidence |
|---|---|---|---|
| Search endpoint | POST /chat; sync endpoint | /app/agent_data/server.py:825-928 | calls _retrieve_query_context() |
| Collection queried | Single configured collection QDRANT_COLLECTION, at runtime production_documents | /app/agent_data/vector_store.py:86-94, 383-389 | self.collection |
| Payload filter support | Tags/status only | /app/agent_data/vector_store.py:275-290 | metadata.tags, metadata.status |
| Filter exposed to API caller | filters.tags, filters.tenant_id, filters.status; the vector path ignores tenant_id | /app/agent_data/server.py:231-259, 1116-1123 | no tenant passed to store.search() |
| Rerank logic | None found | /app/agent_data/vector_store.py:292-310 | raw score ordering from Qdrant, then document dedup |
| Dedup logic | Dedups matching chunks by document_id only | /app/agent_data/vector_store.py:294-308 | seen_docs |
| Top-K default | 5, configurable 1-20 | /app/agent_data/server.py:254-260 | Field(default=5, ge=1, le=20) |
| Duplicate handling, KB + IU same content | None for different document_id; both can appear | Inference from search code | dedup key is only document_id |
| Response fields to caller | document_id, snippet, score, metadata; response/content/session/usage | /app/agent_data/server.py:140-160, 1127-1133 | Pydantic models |
| Multi-collection search | Not implemented in Agent Data | /app/agent_data/vector_store.py:383-389 | one collection_name=self.collection |
VRC-12: Orphan/missing vector detection — current state
Orphan/missing vector detection — current state:
| Check | Status | Last run | Result | Evidence |
|---|---|---|---|---|
| dot-vector-audit in crontab | Active entry, but operationally failing | daily 04:30 UTC scheduled | Calls a wrong local URL and fails | crontab + log |
| dot-vector-audit last log | Failing repeatedly | latest tail shows repeated failures | Pre-flight check... FAIL (connection refused) | /var/log/incomex/dot-vector-audit.log |
| /kb/audit-sync read-only? | Read-only only with auto_heal=false | N/A | safe call made with false | /app/agent_data/server.py:2248-2267 |
| Current orphan count | Active drift | 2026-05-02 investigation | 47 | /kb/audit-sync |
| Current ghost count | Active drift | 2026-05-02 investigation | 7 | /kb/audit-sync |
| Alert/notification on drift | Partial | Uptime Kuma monitors the /api/health Qdrant keyword and the health endpoint | health shows critical, but the monitor keyword only checks qdrant OK, not the critical ratio | Uptime Kuma DB, /health |
| Auto-cleanup active? | Scheduled intent yes, actual no | cron daily | --heal configured but cannot connect | crontab + log |
| Vector:Doc ratio healthy? | No | live | 3.99, critical | /health |
VRC-13: Vector ↔ data sync monitoring — current state
Vector sync monitoring — current state:
| System | Component | Status (active/inactive/unknown) | Evidence |
|---|---|---|---|
| PG→Qdrant listener | startup thread, reconnect loop | Code active, runtime effect limited because no PG trigger installed | /app/agent_data/resilient_client.py:469-475; trigger query returned 0 |
| Agent Data /health | Qdrant connection check | Active | /health reports qdrant/postgres/openai and data_integrity |
| Uptime Kuma | Agent Data / Qdrant monitor | Active | monitors: Agent Data Health, Qdrant Health, Docker Services; default Telegram notification active |
| Docker restart policy | container recovery | Active | unless-stopped for Agent Data and Qdrant |
| vector_status stuck check | pending/error detection | Unknown/no dedicated active check found | PG has pending/none/blank; no cron specifically found for vector_status stuck |
| Qdrant backup | schedule/location | Script exists; current schedule not found in crontab; last observed local snapshot 2026-04-03 | /opt/incomex/scripts/qdrant-backup.sh; /opt/incomex/backups/qdrant |
| Embedding API failure | retry/timeout/fallback | Retry wrapper exists; errors are recorded as vector_status=error; no fallback embedding model | /app/agent_data/vector_store.py:147-161, 240-252; resilient_client.py:45-48, 191-220 |
| dot-vector-audit | orphan/ghost periodic | Scheduled but failing | crontab + log |
Gap summary:
- The missing PG trigger means direct PG writes do not notify the listener.
- `dot-vector-audit --heal --local` is scheduled but failing because it checks `http://localhost:8000` from the host, where no service is listening.
- The Uptime Kuma Qdrant monitor checks qdrant `"status":"ok"`, but not `data_integrity.sync_status=critical`.
- No evidence of an active vector_status stuck alert.
- The Qdrant backup script exists, but no current cron entry was found and the last observed local snapshot is 2026-04-03.
Do-not-touch Invariants (required Patch 6 format)
| Invariant | Why it must not change | Evidence | Required approval if change |
|---|---|---|---|
| Legacy collection production_documents | Active KB search/audit collection | Qdrant API returned only this collection; compose sets QDRANT_COLLECTION | Separate APR + rollback plan |
| Vector dimension 1536 and Cosine distance | Must match existing OpenAI embeddings and search assumptions | Qdrant collection config | Separate APR + rollback plan |
| Payload keys document_id and content | Search/audit/delete depend on them | vector_store.search(), delete_document(), _run_audit() | Separate APR + rollback plan |
| Metadata chunk fields | Multi-chunk dedup and traceability depend on them | payload samples and vector_store.py:200-210 | Separate APR + rollback plan |
| Legacy chunking 4000/400 | Existing vector ratios and retrieval behavior depend on it | vector_store.py:15-17, health baseline comments | Separate APR + rollback plan |
| Delete-by-document_id semantics | Cleanup/delete removes all chunks for a doc | vector_store.py:398-414 | Separate APR + rollback plan |
| kb_documents JSONB store | Agent Data metadata SSOT | pg_store.py:80-111, server.py paths | Separate APR + rollback plan |
| /kb/audit-sync auto_heal=true behavior | It can reindex/delete vectors | server.py:2269-2326 | Separate APR + rollback plan |
| Qdrant storage path /opt/incomex/docker/qdrant/data | Persistent vector data | compose lines 32-35 and docker mount | Separate APR + rollback plan |
| API keys and embedding env vars | Required for vector operations | compose lines 95-115, env names observed | Separate APR + rollback plan |
| Docker restart policy unless-stopped | Current recovery behavior | docker inspect and compose | Separate APR + rollback plan |
| Uptime Kuma health monitors | Current external alert path | Uptime Kuma DB monitor rows | Separate APR + rollback plan |
Unknowns
- Exact reason the PG trigger is absent: the function exists but no trigger rows exist; migration history was not mutated or inspected beyond read-only evidence.
- Whether the Qdrant backup has an external scheduler outside the root crontab: the script and snapshots exist, but the root crontab did not show qdrant-backup.sh.
- Whether the listener thread is currently alive internally: code starts it, but logs did not show pg-vector-sync messages in the sampled window.
- Exact Qdrant global/multi-collection API features beyond the installed version: Agent Data does not implement multi-collection search; the IU design should not rely on unavailable app-layer support.
- Full alert delivery status in Telegram: Uptime Kuma has an active default Telegram notification, but no live alert was triggered during this investigation.
Recommended Safe Architecture
- Legacy vector: untouched.
- IU bridge/vector: parallel.
- Duplicate resolution: Phase 0 mapping adapter only; Phase 1 new IU collection; Phase 2 merge + dedup adapter.
- Migration phases:
- Phase 0: read-only mapping adapter, no vector writes.
- Phase 1: create IU collection through approved APR; do not enrich legacy payload yet.
- Phase 2: multi-source retrieval adapter; return `source`, `canonical_address`, `content_hash`, and score.
- Phase 3: unit-aware chunking only after benchmarking and rollback design.
Risk Table
| Risk | Severity | Mitigate |
|---|---|---|
| Legacy drift already exists | High | Fix monitor/report first; do not add IU writes into legacy collection |
| Missing PG trigger means direct PG changes bypass vector sync | High | Separate APR to restore trigger or formalize API-only write path |
| Scheduled audit is configured with --heal but failing | High | Separate APR: correct URL/mode, decide dry-run vs heal, add rollback |
| Duplicate KB + IU search results | High | Collection separation + adapter dedup by canonical/content_hash |
| Same-collection payload tagging pollutes legacy search | Medium/High | Avoid in Phase 0; require APR and backfill verifier |
| Update deletes old chunks before new upsert | Medium | Add verifier/retry before relying on new IU ingestion |
| Qdrant backup freshness unclear | Medium | Verify/restore backup schedule before IU collection writes |
No-mutation Statement
The agent confirms: NO mutation of any Qdrant points/collections/schema/payloads, PG rows/schema/triggers, runtime code/config/containers, or any restart/rebuild/re-embed/cleanup was performed during the investigation. The only mutation: creating this markdown report at the requested path. The only audit endpoint called was /kb/audit-sync with {"auto_heal": false}, after reading the source to confirm that branch is read-only.
Commands run (read-only; secrets redacted where relevant):
```shell
sed -n '1,260p' .claude/skills/incomex-rules.md
rg -n "search_knowledge|operating rules SSOT|hiến pháp|constitution|knowledge" -S .claude knowledge
search_knowledge("operating rules SSOT")
search_knowledge("hiến pháp v4.0 constitution")
search_knowledge("vector Qdrant Agent Data Langroid kb_documents sync law điều 38")
search_knowledge("law-01-foundation-principles vector sync Directus Agent Data Qdrant")
batch_read(["knowledge/dev/ssot/vps/vps-operating-rules.md","knowledge/dev/laws/constitution.md","knowledge/dev/laws/law-14-no-duplicate.md","knowledge/dev/laws/law-01-foundation-principles.md"])
ssh -i ~/.ssh/contabo_vps contabo 'docker ps --format "table {{.Names}}\t{{.Image}}\t{{.Status}}\t{{.Ports}}"'
ssh -i ~/.ssh/contabo_vps contabo 'hostname; whoami; date -Is; pwd'
ssh -i ~/.ssh/contabo_vps contabo 'crontab -l 2>&1'
ssh -i ~/.ssh/contabo_vps contabo 'docker exec incomex-agent-data find /app -maxdepth 3 -type f | sort'
ssh -i ~/.ssh/contabo_vps contabo 'docker exec incomex-agent-data grep -RIn "...vector/search/chunk patterns..." /app --include="*.py"'
ssh -i ~/.ssh/contabo_vps contabo 'docker exec incomex-agent-data env | grep -Ei "qdrant|embed|openai|vector|collection|postgres|database|directus"'
ssh -i ~/.ssh/contabo_vps contabo 'docker exec incomex-agent-data nl -ba /app/agent_data/vector_store.py | sed -n ...'
ssh -i ~/.ssh/contabo_vps contabo 'docker exec incomex-agent-data nl -ba /app/agent_data/server.py | sed -n ...'
ssh -i ~/.ssh/contabo_vps contabo 'docker exec incomex-agent-data nl -ba /app/agent_data/pg_vector_listener.py | sed -n ...'
ssh -i ~/.ssh/contabo_vps contabo 'docker exec postgres psql -U directus -d directus -Atc "SELECT datname FROM pg_database ..."'
ssh -i ~/.ssh/contabo_vps contabo 'docker exec postgres psql -U directus -d incomex_metadata -Atc "SELECT concat(schemaname,'.',tablename) ..."'
ssh -i ~/.ssh/contabo_vps contabo 'docker exec postgres psql -U directus -d incomex_metadata -c "SELECT trigger_name ... information_schema.triggers ..."'
ssh -i ~/.ssh/contabo_vps contabo 'docker exec postgres psql -U directus -d incomex_metadata -c "SELECT pg_get_functiondef(...)"'
ssh -i ~/.ssh/contabo_vps contabo 'docker exec postgres psql -U directus -d incomex_metadata -c "SELECT count...; SELECT vector_status..."'
ssh -i ~/.ssh/contabo_vps contabo 'docker exec incomex-agent-data python -c "GET Qdrant /collections with api-key from env"'
ssh -i ~/.ssh/contabo_vps contabo 'docker exec incomex-agent-data python -c "GET Qdrant /collections/production_documents with api-key from env"'
ssh -i ~/.ssh/contabo_vps contabo 'docker exec incomex-agent-data python -c "POST Qdrant /points/scroll limit 3 payload only"'
ssh -i ~/.ssh/contabo_vps contabo 'docker exec incomex-agent-data python -c "GET Qdrant / version"'
ssh -i ~/.ssh/contabo_vps contabo 'ls -l /opt/incomex/dot/bin/dot-vector-audit; sed -n "1,260p" /opt/incomex/dot/bin/dot-vector-audit'
ssh -i ~/.ssh/contabo_vps contabo 'docker exec incomex-agent-data python -c "POST /kb/audit-sync auto_heal=false"'
ssh -i ~/.ssh/contabo_vps contabo 'tail -80 /var/log/incomex/dot-vector-audit.log'
ssh -i ~/.ssh/contabo_vps contabo 'docker exec incomex-agent-data python -c "GET /health"'
ssh -i ~/.ssh/contabo_vps contabo 'docker inspect incomex-agent-data ...; docker inspect incomex-qdrant ...'
ssh -i ~/.ssh/contabo_vps contabo 'docker inspect ... mounts'
ssh -i ~/.ssh/contabo_vps contabo 'find /opt/incomex -maxdepth 3 ... docker-compose/.env'
ssh -i ~/.ssh/contabo_vps contabo 'grep -RIn "qdrant|agent-data|QDRANT|OPENAI|embedding|6333|8000" ... | sed redact'
ssh -i ~/.ssh/contabo_vps contabo 'nl -ba /opt/incomex/docker/docker-compose.yml | sed -n ...'
ssh -i ~/.ssh/contabo_vps contabo 'docker diff incomex-agent-data'
ssh -i ~/.ssh/contabo_vps contabo 'find /opt/incomex -maxdepth 4 -iname "*vector*" -o -iname "*qdrant*"'
ssh -i ~/.ssh/contabo_vps contabo 'ls -l /opt/incomex/scripts/qdrant-backup.sh; sed -n ...; find /opt/incomex/backups/qdrant ...'
ssh -i ~/.ssh/contabo_vps contabo 'docker exec uptime-kuma sqlite3 /app/data/kuma.db "select ... from monitor ..."'
ssh -i ~/.ssh/contabo_vps contabo 'docker exec uptime-kuma sqlite3 /app/data/kuma.db "select ... from notification ..."'
ssh -i ~/.ssh/contabo_vps contabo 'docker logs --tail/--since ... incomex-agent-data | grep ...'
```