KB-67A7

P3D Agent Prompt — Vector/Search Freshness Audit READ-ONLY

Revision 1

prompt, vector, search, freshness, qdrant, agent-data, readonly, p3d, 2026-05-10

Date: 2026-05-10
Author: GPT-5.5 Thinking / Incomex AI Council (Hội đồng AI)
Purpose: Investigate why newly uploaded KB reports/prompts are not reliably searchable via searchKnowledge/vector.
Recommended effort: medium-high
Mode: READ-ONLY INVESTIGATION ONLY

0. Context

We observed that KB documents written by createDocument are available via direct path reads (getDocument / batchReadDocuments), but semantic search sometimes returns weak, stale, noisy, or incomplete results. The cause may be an indexing freshness issue, vector drift, a ranking issue, a metadata/title boost issue, or a mismatch between the KB document store and the legacy Qdrant vector store.

Important distinction:

Legacy KB vector search already exists and is document-based, stored in the Qdrant collection production_documents.
IU/vector-per-unit is a future plan and has NOT been implemented yet.
Do not confuse legacy KB vector sync with future IU vector chunking.

Prior evidence:

knowledge/dev/laws/dieu38-trien-khai/reports/vector-reality-check-agent-data-qdrant-2026-05-02.md

Current GPT health check on 2026-05-10 showed:

qdrant=ok
postgres=ok
openai=ok
document_count=2403
vector_point_count=4967
ratio=2.07
sync_status=warning
webhooks_registered=0
webhooks_active=0
listeners=1
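
The reported ratio is consistent with vector_point_count divided by document_count, which would mean roughly two Qdrant points per document on average. A minimal check, assuming that definition (the actual formula used by the health endpoint is not confirmed here):

```python
# Values copied from the 2026-05-10 health check above.
document_count = 2403
vector_point_count = 4967

# Assumption: ratio = vector points per KB document.
ratio = round(vector_point_count / document_count, 2)
print(ratio)
```

A ratio above 1 is not by itself evidence of drift if documents are split into several chunks; the audit should confirm which threshold, if any, drives sync_status=warning.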

1. Mission

Run a READ-ONLY audit to determine the real cause of current search/vector unreliability.

Questions to answer:

Are newly created KB documents being vectorized immediately after createDocument?
Are recent documents present in PG but missing in Qdrant?
Are recent documents present in Qdrant but ranked poorly?
Is searchKnowledge using Qdrant vector search only, hybrid search, or additional keyword logic?
Is title/path/document_id being boosted or ignored?
Is there a lag between KB write and vector availability?
Are webhooks/triggers/listeners actually connected for direct PG writes and API writes?
Is sync_status=warning caused by ghosts/orphans, a ratio threshold, pending vector_status rows, or other drift?
What minimal safe fix is recommended, if any, without touching legacy vector behavior prematurely?

2. Hard boundaries

No mutation.
No reindex.
No auto_heal.
No calls to /kb/reindex, /kb/reindex-missing, or /kb/cleanup-orphans in write mode.
No Qdrant point deletion/upsert.
No DB INSERT/UPDATE/DELETE.
No trigger creation.
No restart/redeploy.
No config/code change.
No DOT script execution except read-only inspection.
Do not run dot-vector-audit --heal.
Do not implement IU vector.
Do not change production_documents collection.
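
Some of the boundaries above can be checked mechanically before a command leaves the agent. The following is a minimal sketch, assuming the agent routes SQL and Agent Data HTTP calls through its own wrapper. The function names are hypothetical, the keyword list is deliberately conservative rather than complete, and the HTTP check covers Agent Data endpoints only (not Qdrant's own API).

```python
import re
from typing import Optional

# Write-capable SQL keywords; any statement containing one is rejected.
_SQL_WRITE = re.compile(
    r"\b(INSERT|UPDATE|DELETE|TRUNCATE|ALTER|DROP|CREATE|GRANT|REINDEX)\b",
    re.IGNORECASE,
)

# Endpoints named in the boundaries above; this sketch blocks them outright.
_BLOCKED_PATHS = ("/kb/reindex", "/kb/reindex-missing", "/kb/cleanup-orphans")


def sql_is_read_only(statement: str) -> bool:
    """True only for one SELECT/SHOW/EXPLAIN statement with no write keyword."""
    stripped = statement.strip().rstrip(";")
    if not stripped or ";" in stripped:  # empty input or stacked statements
        return False
    first_word = stripped.split(None, 1)[0].upper()
    return first_word in {"SELECT", "SHOW", "EXPLAIN"} and not _SQL_WRITE.search(stripped)


def http_call_allowed(method: str, path: str, body: Optional[dict] = None) -> bool:
    """Allow GETs, plus POST /kb/audit-sync only when auto_heal is exactly False."""
    if path.rstrip("/") in _BLOCKED_PATHS:
        return False
    if method.upper() == "GET":
        return True
    return (
        method.upper() == "POST"
        and path == "/kb/audit-sync"
        and body is not None
        and body.get("auto_heal") is False
    )
```

A guard like this errs on the side of refusing: a SELECT that merely mentions a write keyword in a string literal is rejected too, which is acceptable for a read-only audit.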

3. Documents to use as test set

Use these recent docs as known targets. Verify direct PG/KB existence and vector/search behavior:

knowledge/dev/laws/dieu44-trien-khai/reviews/gpt-review-p3d-step1-reauthored-spec-and-pack1-directive-2026-05-10.md
knowledge/dev/laws/dieu44-trien-khai/prompts/p3d-pack1-readonly-inventory-prompt.md
knowledge/dev/laws/dieu44-trien-khai/directives/gpt-directive-agent-run-step1-checkpoint-and-pack1-inventory-readonly-2026-05-10.md
knowledge/dev/laws/dieu44-trien-khai/prompts/p3d-agent-copy-paste-run-step1-checkpoint-and-pack1-inventory-2026-05-10.md
knowledge/dev/laws/dieu44-trien-khai/prompts/p3d-agent-vector-search-freshness-audit-readonly-2026-05-10.md

For each target:

Check that the KB/PG document exists.
Check vector_status if available.
Check Qdrant points for the document_id if accessible.
Check the number of chunks and the point payload.
Run or simulate search queries and record whether the target appears in the top 5, top 10, or top 20.

4. Suggested read-only checks

C1 — Agent Data health

Call or inspect /health read-only. Record:

document_count
vector_point_count
ratio
sync_status
qdrant/postgres/openai status
listeners
webhooks_registered/webhooks_active

C2 — Audit sync (read-only)

If the endpoint exists and the source code confirms that auto_heal=false is read-only, call:

POST /kb/audit-sync {"auto_heal": false}

Record orphan_count and ghost_count. Do not auto-heal.
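
As an illustration, the request can be built with the body fixed in code so that auto_heal can never be switched on by accident. This is a sketch only: BASE_URL is a hypothetical placeholder, and the response field names orphan_count and ghost_count are assumed to sit at the top level of the JSON, which the endpoint source should confirm.

```python
import json
import urllib.request

BASE_URL = "http://agent-data.example.invalid"  # hypothetical placeholder


def build_audit_sync_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) POST /kb/audit-sync with auto_heal hard-coded to false."""
    body = {"auto_heal": False}
    return urllib.request.Request(
        url=f"{base_url}/kb/audit-sync",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize_audit(response_json: dict) -> dict:
    """Pick out the two counts the report needs; missing keys become None."""
    return {
        "orphan_count": response_json.get("orphan_count"),
        "ghost_count": response_json.get("ghost_count"),
    }
```

Sending the request (for example with urllib.request.urlopen) should happen only after the code inspection has confirmed that auto_heal=false performs no writes.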

C3 — Recent document vector presence

For target document IDs, inspect Qdrant points by document_id filter. Record:

document_id
point_count
chunk_indexes
metadata.title
metadata.tags
payload content prefix
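
One read-only way to gather these fields is Qdrant's scroll endpoint (POST /collections/production_documents/points/scroll), which lists points matching a payload filter without modifying them. The sketch below builds the request body and summarizes the returned points. The payload layout is an assumption: it supposes top-level document_id, chunk_index, and content keys plus a nested metadata object, and the real key names should be taken from an actual point.

```python
COLLECTION = "production_documents"  # collection name from the context section


def scroll_body(document_id: str, limit: int = 100) -> dict:
    """Body for Qdrant's points/scroll call, filtered to one document_id."""
    return {
        "filter": {"must": [{"key": "document_id", "match": {"value": document_id}}]},
        "limit": limit,
        "with_payload": True,
        "with_vector": False,  # vectors are not needed for this audit
    }


def summarize_points(document_id: str, points: list) -> dict:
    """Summarize the points list found under result.points in the scroll response."""
    payloads = [p.get("payload") or {} for p in points]
    first = payloads[0] if payloads else {}
    meta = first.get("metadata") or {}
    return {
        "document_id": document_id,
        "point_count": len(points),
        # Assumed key name: chunk_index
        "chunk_indexes": sorted(p["chunk_index"] for p in payloads if "chunk_index" in p),
        "metadata.title": meta.get("title"),
        "metadata.tags": meta.get("tags"),
        # Assumed key name: content
        "content_prefix": (first.get("content") or "")[:80],
    }
```

A point_count of zero for a document that exists in PG is the clearest sign of a missing vector; a non-zero count with poor search placement points toward ranking instead.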

C4 — Search/ranking test

Run a searchKnowledge-equivalent call, or the Agent Data chat/search endpoint if available, with both exact and semantic queries:

Examples:

"GPT Review P3D Step 1 Re-authored Spec Pack 1 Directive"
"p3d-pack1-readonly-inventory-prompt revision 2"
"gpt-directive-agent-run-step1-checkpoint-and-pack1-inventory-readonly"
"vector search freshness audit readonly 2026-05-10"

For each query, record whether the expected target appears in the top 5, top 10, or top 20.
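
The bucketing can be made uniform across queries with a small helper. This sketch assumes each search result can be reduced to a document path or ID string, in ranked order; rank_bucket is a hypothetical name.

```python
def rank_bucket(results: list, target: str) -> str:
    """Return 'top5', 'top10', 'top20', or 'miss' for the first hit on target."""
    for position, doc_id in enumerate(results[:20], start=1):
        if doc_id == target:
            if position <= 5:
                return "top5"
            if position <= 10:
                return "top10"
            return "top20"
    return "miss"


# Usage with placeholder IDs: the target sits at position 6.
ranked = ["a", "b", "c", "d", "e", "target-doc"]
print(rank_bucket(ranked, "target-doc"))  # top10
```

If search returns several chunks per document, deduplicate by document_id before bucketing so that one document does not occupy multiple positions.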

C5 — Write path / vector sync code inspection

Inspect the relevant code paths read-only and confirm current behavior:

API/MCP createDocument vector sync path.
updateDocument content-change vector sync path.
Metadata-only update skip behavior.
PG listener and trigger status.
Whether a kb_documents trigger exists.
Whether webhooks are registered/active.
Whether search deduplicates by document_id only.
Whether search boosts title/path/document_id.
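
For the trigger and vector_status items, PostgreSQL's system catalog can be queried without touching data. The statements below are a sketch held as Python strings so they can be passed to whatever read-only client is available. The pg_trigger columns are standard PostgreSQL; the assumption that vector_status is a column on kb_documents is not confirmed by this prompt.

```python
# User-defined triggers on kb_documents. tgenabled is 'O' (enabled),
# 'D' (disabled), 'R' (replica only), or 'A' (always).
TRIGGERS_ON_KB_DOCUMENTS = """
SELECT tgname, tgenabled
FROM pg_trigger
WHERE tgrelid = 'kb_documents'::regclass
  AND NOT tgisinternal
"""

# Assumption: vector_status is a column on kb_documents.
VECTOR_STATUS_COUNTS = """
SELECT vector_status, count(*) AS n
FROM kb_documents
GROUP BY vector_status
ORDER BY n DESC
"""

READ_ONLY_QUERIES = {
    "triggers_on_kb_documents": TRIGGERS_ON_KB_DOCUMENTS,
    "vector_status_counts": VECTOR_STATUS_COUNTS,
}

for name, sql in READ_ONLY_QUERIES.items():
    print(f"-- {name}{sql}")
```

An empty result from the first query would indicate that direct PG writes have no trigger to notify the listener, which would be relevant to the freshness question.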

C6 — Cron/audit monitoring status

Inspect read-only:

The dot-vector-audit schedule.
Whether it still points to the wrong localhost URL.
Whether it is configured with --heal.
The latest logs.
The Qdrant backup schedule, if visible.
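
The schedule can be checked for --heal and localhost targets by parsing the schedule text rather than running the script. This is a sketch, assuming the schedule is a crontab-style line that has already been read as a string; inspect_cron_line is a hypothetical helper, and how the real script receives its URL is not known here.

```python
import shlex


def inspect_cron_line(line: str) -> dict:
    """Tokenize one crontab-style line and flag the items C6 asks about."""
    tokens = shlex.split(line, comments=True)  # text after '#' is dropped
    runs_audit = any(t.endswith("dot-vector-audit") for t in tokens)
    has_heal = any(t == "--heal" or t.startswith("--heal=") for t in tokens)
    return {
        "runs_dot_vector_audit": runs_audit,
        "has_heal_flag": runs_audit and has_heal,
        "localhost_tokens": [t for t in tokens if "localhost" in t or "127.0.0.1" in t],
    }
```

Because comments are dropped during tokenizing, a commented-out entry reports no audit run and no --heal flag, which matches what cron would actually execute.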

5. Required report

Upload report to:

knowledge/dev/laws/dieu44-trien-khai/reports/p3d-vector-search-freshness-audit-report.md

Report must include:

phase_status=PASS|PARTIAL|BLOCKED
mode=READ_ONLY_INVESTIGATION
no_mutation_performed=true
kb_direct_read_ok=true|false
qdrant_status=ok|warning|fail
sync_status=<value>
recent_docs_vectorized=all|partial|none|unknown
search_ranking_quality=good|noisy|bad|unknown
root_cause=<short>
root_cause_confidence=high|medium|low
recommended_next_action=<short>
unsafe_actions_not_taken=<list>

Also include a table:

Target doc | KB exists | vector points | search top5/top10/top20 | notes
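
Before upload, the report fields can be checked against the allowed values listed above. This is a minimal sketch that assumes the fields are held as a dict of strings; validate_report and table_row are hypothetical helpers.

```python
# Allowed values copied from the field list above.
ALLOWED = {
    "phase_status": {"PASS", "PARTIAL", "BLOCKED"},
    "mode": {"READ_ONLY_INVESTIGATION"},
    "no_mutation_performed": {"true"},
    "kb_direct_read_ok": {"true", "false"},
    "qdrant_status": {"ok", "warning", "fail"},
    "recent_docs_vectorized": {"all", "partial", "none", "unknown"},
    "search_ranking_quality": {"good", "noisy", "bad", "unknown"},
    "root_cause_confidence": {"high", "medium", "low"},
}
# Fields that take free text but must not be empty.
FREE_TEXT = ("sync_status", "root_cause", "recommended_next_action", "unsafe_actions_not_taken")


def validate_report(fields: dict) -> list:
    """Return a list of problems; an empty list means the fields are well-formed."""
    errors = []
    for key, allowed in ALLOWED.items():
        if fields.get(key) not in allowed:
            errors.append(f"{key}: expected one of {sorted(allowed)}, got {fields.get(key)!r}")
    for key in FREE_TEXT:
        if not fields.get(key):
            errors.append(f"{key}: missing or empty")
    return errors


def table_row(doc: str, kb_exists: bool, points: int, bucket: str, notes: str = "") -> str:
    """Render one row in the column order of the table header above."""
    return f"{doc} | {'yes' if kb_exists else 'no'} | {points} | {bucket} | {notes}"
```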

6. Expected final response from Agent

Return only:

vector_audit_status=PASS|PARTIAL|BLOCKED
report_path=knowledge/dev/laws/dieu44-trien-khai/reports/p3d-vector-search-freshness-audit-report.md
root_cause=<short>
confidence=high|medium|low
no_mutation_performed=true
recommended_next_action=<short>
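
The final response can be derived from the report fields so that the two never disagree. The sketch below assumes vector_audit_status mirrors phase_status and confidence mirrors root_cause_confidence; that mapping is an inference from the field names, not something this prompt states.

```python
REPORT_PATH = (
    "knowledge/dev/laws/dieu44-trien-khai/reports/"
    "p3d-vector-search-freshness-audit-report.md"
)


def final_response(report: dict, report_path: str = REPORT_PATH) -> str:
    """Render the six-line final response from already validated report fields."""
    if report.get("no_mutation_performed") != "true":
        raise ValueError("refusing to report: no_mutation_performed is not 'true'")
    lines = [
        f"vector_audit_status={report['phase_status']}",  # assumed mapping
        f"report_path={report_path}",
        f"root_cause={report['root_cause']}",
        f"confidence={report['root_cause_confidence']}",  # assumed mapping
        "no_mutation_performed=true",
        f"recommended_next_action={report['recommended_next_action']}",
    ]
    return "\n".join(lines)
```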