P3D Agent Prompt — Vector/Search Freshness Audit READ-ONLY
Date: 2026-05-10
Author: GPT-5.5 Thinking / Incomex Hội đồng AI
Purpose: Investigate why newly uploaded KB reports/prompts are not reliably searchable via searchKnowledge/vector.
Recommended effort: medium-high
Mode: READ-ONLY INVESTIGATION ONLY
0. Context
We observed that KB documents written by createDocument are available via direct path reads (getDocument / batchReadDocuments), but semantic search sometimes returns weak, stale, noisy, or incomplete results. The cause may be an indexing freshness issue, vector drift, a ranking issue, a metadata/title boost issue, or a mismatch between the KB document store and the legacy Qdrant vector store.
Important distinction:
- Legacy KB vector search already exists and is document-based in Qdrant collection production_documents.
- IU/vector-per-unit is a future plan and has NOT been implemented yet.
- Do not confuse legacy KB vector sync with future IU vector chunking.
Prior evidence:
knowledge/dev/laws/dieu38-trien-khai/reports/vector-reality-check-agent-data-qdrant-2026-05-02.md
Current GPT health check on 2026-05-10 showed:
qdrant=ok
postgres=ok
openai=ok
document_count=2403
vector_point_count=4967
ratio=2.07
sync_status=warning
webhooks_registered=0
webhooks_active=0
listeners=1
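The reported ratio is consistent with vector_point_count divided by document_count. A minimal sketch of that arithmetic, assuming this is how /health derives the field (the source does not confirm the formula):

```python
# Values copied from the 2026-05-10 health check above.
document_count = 2403
vector_point_count = 4967

# Assumption: ratio = vector points per document, rounded to 2 decimals.
ratio = round(vector_point_count / document_count, 2)
print(ratio)  # 2.07
```

A ratio near 2 means about two points per document on average. That is compatible with chunked documents, so the ratio alone does not prove drift.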
1. Mission
Run a READ-ONLY audit to determine the real cause of current search/vector unreliability.
Questions to answer:
- Are newly created KB documents being vectorized immediately after createDocument?
- Are recent documents present in PG but missing in Qdrant?
- Are recent documents present in Qdrant but ranked poorly?
- Is searchKnowledge using Qdrant vector search only, hybrid search, or additional keyword logic?
- Is title/path/document_id being boosted or ignored?
- Is there a lag between KB write and vector availability?
- Are webhooks/triggers/listeners actually connected for direct PG writes and API writes?
- Is the warning sync_status caused by ghosts/orphans, ratio threshold, pending vector_status rows, or other drift?
- What minimal safe fix is recommended, if any, without touching legacy vector behavior prematurely?
2. Hard boundaries
- No mutation.
- No reindex.
- No auto_heal.
- No /kb/reindex, /kb/reindex-missing, /kb/cleanup-orphans with write mode.
- No Qdrant point deletion/upsert.
- No DB INSERT/UPDATE/DELETE.
- No trigger creation.
- No restart/redeploy.
- No config/code change.
- No DOT script execution except read-only inspection.
- Do not run dot-vector-audit --heal.
- Do not implement IU vector.
- Do not change production_documents collection.
3. Documents to use as test set
Use these recent docs as known targets. Verify direct PG/KB existence and vector/search behavior:
knowledge/dev/laws/dieu44-trien-khai/reviews/gpt-review-p3d-step1-reauthored-spec-and-pack1-directive-2026-05-10.md
knowledge/dev/laws/dieu44-trien-khai/prompts/p3d-pack1-readonly-inventory-prompt.md
knowledge/dev/laws/dieu44-trien-khai/directives/gpt-directive-agent-run-step1-checkpoint-and-pack1-inventory-readonly-2026-05-10.md
knowledge/dev/laws/dieu44-trien-khai/prompts/p3d-agent-copy-paste-run-step1-checkpoint-and-pack1-inventory-2026-05-10.md
knowledge/dev/laws/dieu44-trien-khai/prompts/p3d-agent-vector-search-freshness-audit-readonly-2026-05-10.md
For each target:
- Check KB/PG document exists.
- Check vector_status if available.
- Check Qdrant points for document_id if accessible.
- Check number of chunks / point payload.
- Run or simulate search queries and record if target appears in top 5/top 10/top 20.
4. Suggested read-only checks
C1 — Agent Data health
Call or inspect /health read-only. Record:
document_count
vector_point_count
ratio
sync_status
qdrant/postgres/openai status
listeners
webhooks_registered/webhooks_active
C2 — Audit sync read-only only
If endpoint exists and source confirms auto_heal=false is read-only, call:
POST /kb/audit-sync {"auto_heal": false}
Record orphan_count and ghost_count. Do not auto-heal.
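The C2 call can be sketched as a request builder that refuses any auto_heal value other than false. This is only an illustration: BASE_URL is a hypothetical placeholder, the helper name is invented here, and the request is built but never sent. Confirm in source that auto_heal=false is read-only before sending anything.

```python
import json
import urllib.request

# Hypothetical base URL; replace with the real Agent Data host.
BASE_URL = "http://agent-data.example.internal"


def build_audit_sync_request(base_url: str, auto_heal: bool = False) -> urllib.request.Request:
    """Build (but do not send) the audit-sync request from C2."""
    if auto_heal:
        # Hard boundary from section 2: no auto_heal in this audit.
        raise ValueError("auto_heal must be False in READ-ONLY mode")
    body = json.dumps({"auto_heal": False}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/kb/audit-sync",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_audit_sync_request(BASE_URL)
print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Keeping the guard inside the builder means a mistyped argument fails loudly instead of triggering a heal.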
C3 — Recent document vector presence
For target document IDs, inspect Qdrant points by document_id filter. Record:
document_id
point_count
chunk_indexes
metadata.title
metadata.tags
payload content prefix
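A minimal sketch of the C3 lookup, assuming the Qdrant REST scroll endpoint (POST /collections/production_documents/points/scroll) is reachable read-only. The payload layout is an assumption: document_id and chunk_index as top-level payload keys, title and tags under metadata, and the text under content. Verify the real field names against an actual point before relying on this.

```python
from typing import Any


def build_scroll_body(document_id: str, limit: int = 100) -> dict[str, Any]:
    """Request body for a read-only Qdrant scroll filtered by document_id."""
    return {
        # Assumption: document_id is a top-level payload key.
        "filter": {"must": [{"key": "document_id", "match": {"value": document_id}}]},
        "limit": limit,
        "with_payload": True,
        "with_vector": False,  # vectors are not needed for this audit
    }


def summarize_points(document_id: str, points: list[dict[str, Any]]) -> dict[str, Any]:
    """Reduce scroll results to the fields C3 asks to record."""
    payloads = [p.get("payload") or {} for p in points]
    first = payloads[0] if payloads else {}
    meta = first.get("metadata") or {}
    return {
        "document_id": document_id,
        "point_count": len(points),
        "chunk_indexes": sorted(pl["chunk_index"] for pl in payloads if "chunk_index" in pl),
        "metadata.title": meta.get("title"),
        "metadata.tags": meta.get("tags"),
        "content_prefix": (first.get("content") or "")[:80],
    }


# Hypothetical points, shaped like the assumed payload layout.
sample_points = [
    {"id": 1, "payload": {"document_id": "example/doc.md", "chunk_index": 1,
                          "metadata": {"title": "Example", "tags": ["demo"]},
                          "content": "second chunk"}},
    {"id": 2, "payload": {"document_id": "example/doc.md", "chunk_index": 0,
                          "metadata": {"title": "Example", "tags": ["demo"]},
                          "content": "first chunk"}},
]
summary = summarize_points("example/doc.md", sample_points)
print(summary["point_count"], summary["chunk_indexes"])  # 2 [0, 1]
```

A point_count of 0 for a document that exists in PG is the signal for "present in PG but missing in Qdrant".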
C4 — Search/ranking test
Run searchKnowledge-equivalent or Agent Data chat/search endpoint if available with exact and semantic queries:
Examples:
"GPT Review P3D Step 1 Re-authored Spec Pack 1 Directive"
"p3d-pack1-readonly-inventory-prompt revision 2"
"gpt-directive-agent-run-step1-checkpoint-and-pack1-inventory-readonly"
"vector search freshness audit readonly 2026-05-10"
For each query, record whether expected target appears in top 5/top 10/top 20.
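The top 5/top 10/top 20 bucketing can be sketched as below. It assumes the search response can be reduced to an ordered list of document IDs and that IDs match the target exactly; if the real results carry a different prefix or extension, normalize before comparing. The function name and sample IDs are illustrative only.

```python
def rank_bucket(result_ids: list[str], target_id: str) -> str:
    """Return 'top5', 'top10', 'top20', or 'miss' for the target's first hit."""
    for position, doc_id in enumerate(result_ids[:20], start=1):
        if doc_id == target_id:
            if position <= 5:
                return "top5"
            if position <= 10:
                return "top10"
            return "top20"
    return "miss"


# Hypothetical ranked results: the target sits at position 7.
results = [f"other-{i}.md" for i in range(6)] + ["target.md"] + ["tail.md"]
print(rank_bucket(results, "target.md"))   # top10
print(rank_bucket(results, "absent.md"))   # miss
```

Only the first occurrence is counted, so a document returned as several chunks is not credited twice.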
C5 — Write path / vector sync code inspection
Read-only inspect relevant code paths and confirm current behavior:
- API/MCP createDocument vector sync path.
- updateDocument content change vector sync path.
- metadata-only update skip behavior.
- PG listener and trigger status.
- whether kb_documents trigger exists.
- whether webhooks are registered/active.
- whether search dedups by document_id only.
- whether search boosts title/path/document_id.
C6 — Cron/audit monitoring status
Inspect read-only:
- dot-vector-audit schedule.
- whether it still points to wrong localhost URL.
- whether it is configured with --heal.
- latest logs.
- Qdrant backup schedule if visible.
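The C6 checks on a schedule entry can be sketched as plain string inspection, which keeps the audit read-only. The crontab line, the install path, and the AGENT_DATA_URL variable below are hypothetical; only dot-vector-audit and --heal come from this prompt.

```python
import re


def inspect_cron_line(line: str) -> dict[str, bool]:
    """Flag the three C6 conditions in a single schedule line."""
    tokens = line.split()
    return {
        "runs_dot_vector_audit": any(t.endswith("dot-vector-audit") for t in tokens),
        "has_heal_flag": "--heal" in tokens,
        "mentions_localhost": bool(re.search(r"\b(localhost|127\.0\.0\.1)\b", line)),
    }


# Hypothetical crontab entry, for illustration only.
sample = "0 3 * * * AGENT_DATA_URL=http://localhost:8000 /opt/dot/bin/dot-vector-audit --heal"
print(inspect_cron_line(sample))
```

For this sample all three flags come back True, which is the combination the audit should report rather than fix.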
5. Required report
Upload report to:
knowledge/dev/laws/dieu44-trien-khai/reports/p3d-vector-search-freshness-audit-report.md
Report must include:
phase_status=PASS|PARTIAL|BLOCKED
mode=READ_ONLY_INVESTIGATION
no_mutation_performed=true
kb_direct_read_ok=true|false
qdrant_status=ok|warning|fail
sync_status=<value>
recent_docs_vectorized=all|partial|none|unknown
search_ranking_quality=good|noisy|bad|unknown
root_cause=<short>
root_cause_confidence=high|medium|low
recommended_next_action=<short>
unsafe_actions_not_taken=<list>
Also include a table:
Target doc | KB exists | vector points | search top5/top10/top20 | notes
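Before uploading, the key=value block can be checked against the allowed values listed above. A minimal sketch follows; the validator and its names are illustrative, not an existing tool, and the sample values are placeholders rather than audit results.

```python
ALLOWED = {
    "phase_status": {"PASS", "PARTIAL", "BLOCKED"},
    "kb_direct_read_ok": {"true", "false"},
    "qdrant_status": {"ok", "warning", "fail"},
    "recent_docs_vectorized": {"all", "partial", "none", "unknown"},
    "search_ranking_quality": {"good", "noisy", "bad", "unknown"},
    "root_cause_confidence": {"high", "medium", "low"},
}
FIXED = {"mode": "READ_ONLY_INVESTIGATION", "no_mutation_performed": "true"}
FREE_TEXT = ["sync_status", "root_cause", "recommended_next_action", "unsafe_actions_not_taken"]


def validate_report(fields: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the block is well-formed."""
    problems = []
    for key, allowed in ALLOWED.items():
        if fields.get(key) not in allowed:
            problems.append(f"{key}: expected one of {sorted(allowed)}")
    for key, value in FIXED.items():
        if fields.get(key) != value:
            problems.append(f"{key}: expected {value}")
    for key in FREE_TEXT:
        if not fields.get(key):
            problems.append(f"{key}: missing")
    return problems


# Placeholder values only; not findings.
draft = {
    "phase_status": "PARTIAL", "mode": "READ_ONLY_INVESTIGATION",
    "no_mutation_performed": "true", "kb_direct_read_ok": "true",
    "qdrant_status": "ok", "sync_status": "warning",
    "recent_docs_vectorized": "unknown", "search_ranking_quality": "unknown",
    "root_cause": "TBD", "root_cause_confidence": "low",
    "recommended_next_action": "TBD", "unsafe_actions_not_taken": "TBD",
}
print(validate_report(draft))  # []
```

The check covers only format. It cannot confirm that no_mutation_performed=true is actually true.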
6. Expected final response from Agent
Return only:
vector_audit_status=PASS|PARTIAL|BLOCKED
report_path=knowledge/dev/laws/dieu44-trien-khai/reports/p3d-vector-search-freshness-audit-report.md
root_cause=<short>
confidence=high|medium|low
no_mutation_performed=true
recommended_next_action=<short>