P3D — Vector Search Reliability Hardening — Implementation Report
Date: 2026-05-11
Mode: IMPLEMENTATION (compose build durability + canary + recency + audit classification + unified contract)
Author: Claude Opus 4.7
Prompt ref: knowledge/dev/laws/dieu44-trien-khai/prompts/p3d-vector-search-reliability-hardening-implementation-prompt.md rev2
Design ref: knowledge/dev/laws/dieu44-trien-khai/design/p3d-vector-search-reliability-hardening-design.md
Status Fields
phase_status=PASS
mode=IMPLEMENTATION
search_boost_behavior=PASS
production_durability=PASS
compose_build_added=true
agent_data_rebuilt=true
agent_data_force_recreated=true
canary_status=PASS
recency_tiebreak=SKIPPED_METADATA_UNAVAILABLE
recency_safe_for_semantic=true
audit_warning_cleanup=IMPLEMENTED
audit_sync_status_after=clean
qdrant_points_before=5015
qdrant_points_after=5015
qdrant_mutation_performed=false
db_write_performed=false
pg_schema_mutation_performed=false
no_reindex_performed=true
rollback_performed=false
health_after=healthy
unified_search_contract_ready=true
§1. Preflight
| Field | Value |
|---|---|
| AGENT_DATA | incomex-agent-data (was Up 27 min healthy) |
| QDRANT | incomex-qdrant (Up 7 weeks healthy) |
| COMPOSE_FILE | /opt/incomex/docker/docker-compose.yml |
| REPO_DIR | /opt/incomex/docker/agent-data-repo (git clean, head ff2fc25) |
| Dockerfile | present; COPY agent_data/ /app/agent_data/ confirmed |
| Health before | status=healthy, services qdrant/postgres/openai all ok |
| points_count before | 5015 (status green) |
Backup directory: /tmp/p3d-vector-hardening-backup-20260511-050538/
- vector_store.py.before md5: ec3038e4aca71f8a3b10209eb4ec21ff
- server.py.before md5: ed31c1126b8453dc572a72b5aa1b8ed8
- docker-compose.yml.before md5: 581b85067b4b19cc7fcd96b6d350dfba
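The recorded checksums can be re-verified before relying on the backup for a rollback. The following is a minimal sketch, not part of the delivered pack: the helper names `file_md5` and `verify_backup` are hypothetical, and only the filenames and digests come from this report.

```python
import hashlib
from pathlib import Path

# Checksums exactly as recorded in this report.
EXPECTED_MD5 = {
    "vector_store.py.before": "ec3038e4aca71f8a3b10209eb4ec21ff",
    "server.py.before": "ed31c1126b8453dc572a72b5aa1b8ed8",
    "docker-compose.yml.before": "581b85067b4b19cc7fcd96b6d350dfba",
}


def file_md5(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(backup_dir: Path, expected: dict) -> dict:
    """Map each filename to True only if the file exists and its digest matches."""
    results = {}
    for name, md5 in expected.items():
        path = backup_dir / name
        results[name] = path.is_file() and file_md5(path) == md5
    return results
```

Calling `verify_backup(Path("/tmp/p3d-vector-hardening-backup-20260511-050538"), EXPECTED_MD5)` on the host should return `True` for all three files if the backup is intact.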
§2. Deployment Durability — Compose build: Added
Change
  agent-data:
    image: agent-data-local:latest
+   build:
+     context: ./agent-data-repo
+     dockerfile: Dockerfile
    container_name: incomex-agent-data
Only agent-data touched. No other service, env, network, or volume modified.
Validation
docker compose config --quiet → OK
diff backup/before → 3 lines added, no other change
Build + Force-recreate
docker compose build agent-data → OK (~130s; layered base reused; agent_data/ baked from host)
docker compose up -d --force-recreate agent-data → Container recreated
docker inspect → State.Health.Status: healthy (≈65s start-up)
Proof of durability
- New container image md5s differ from the baked-in 2026-04-17 image (new build).
- Container /app/agent_data/vector_store.py md5 after recreate: 7eb74143f53e6a5fd34a54f1ffbeab38 (matches new build).
- Container /app/agent_data/server.py md5: 70b8d82460c64993d43170561c998126.
- Canary PASS after --force-recreate (see §3). This confirms the rerank patch survives a container recreate. The earlier docker cp hotfix is now redundant.
Compose file location note
The compose file lives outside the agent-data-repo git tree at /opt/incomex/docker/docker-compose.yml. Per prompt §9, the compose change was not forced into the agent-data-repo. Backup retained at /tmp/p3d-vector-hardening-backup-20260511-050538/docker-compose.yml.before.
- docker-compose.yml md5 after: 93c004b03d59140c1570a8df863b8055
- docker-compose.yml.before md5: 581b85067b4b19cc7fcd96b6d350dfba
§3. Canary — /opt/incomex/dot/bin/dot-search-canary
Script behavior
- Resolves AGENT_DATA_API_KEY from env → /opt/incomex/docker/.env → container env (silent fallback chain).
- Auto-detects endpoint: tries localhost:8000, then nginx public https://vps.incomexsaigoncorp.vn/api, then docker exec tunnel.
- Calls POST /mcp/tools/search_knowledge with {"query": ..., "limit": 10}.
- Parses result.context[] (real response shape, captured from live API).
- For each test, scans top results for the expected slug substring in document_id.
- Emits one line per test (PASS/FAIL + rank + matched doc_id). Exits non-zero on any FAIL.
T1–T8 results (both before and after --force-recreate)
| # | Query | Expected slug | Max rank | Pre-rebuild | Post-recreate |
|---|---|---|---|---|---|
| T1 | GPT Review P3D Step 1 Re-authored Spec Pack 1 Directive | gpt-review-p3d-step1-reauthored-spec-and-pack1-directive | ≤2 | rank 1 PASS | rank 1 PASS |
| T2 | p3d-pack1-readonly-inventory-prompt revision 2 | p3d-pack1-readonly-inventory-prompt | =1 | rank 1 PASS | rank 1 PASS |
| T3 | gpt-directive-agent-run-step1-checkpoint-and-pack1-inventory-readonly | gpt-directive-agent-run-step1-checkpoint-and-pack1-inventory-readonly | =1 | rank 1 PASS | rank 1 PASS |
| T4 | vector search freshness root cause | p3d-vector-search-freshness-audit | ≤3 | rank 1 PASS | rank 1 PASS |
| T5 | operating rules SSOT | operating-rules | ≤3 | rank 1 PASS | rank 1 PASS |
| T6 | vector search unified contract SSOT | vector-search | ≤3 | rank 1 PASS | rank 1 PASS |
| T7 | P3D information unit text-as-code requirements spec | p3d-information-unit-text-as-code-requirements-spec | =1 | rank 1 PASS | rank 1 PASS |
| T8 | hiến pháp constitution v4 | constitution | ≤3 | rank 1 PASS | rank 1 PASS |
canary_status=PASS in both runs. Log files:
- /tmp/p3d-search-canary-before-rebuild-20260511-050538.log
- /tmp/p3d-search-canary-after-recreate-20260511-050538.log
Permissions: 755. No mutation of KB or Qdrant performed by the script.
§4. Recency Tie-Break
Audit of available metadata
Qdrant payload metadata keys observed by scrolling 3 sample points: tags, title, chunk_index, total_chunks, sometimes source. No timestamp fields (created_at, updated_at, modified_at, etc.) are stored in the current payload schema.
Implementation
vector_store.py adds:
_TS_KEYS = ("updated_at", "updatedAt", "modified_at", "modifiedAt",
            "created_at", "createdAt")

def _extract_timestamp(metadata) -> datetime | None: ...

def _recency_boost(metadata, now) -> tuple[float, str | None]:
    ts = _extract_timestamp(metadata)
    if ts is None:
        return 0.0, None  # SKIPPED — no payload mutation, no PG query
    age_days = max(0.0, (now - ts).total_seconds() / 86400.0)
    if age_days >= 30.0:
        return 0.0, "recency_aged_out"
    return 0.005 * (1.0 - age_days / 30.0), f"recency_age_{age_days:.1f}d"
In _apply_path_title_boost, recency is applied only when:
1. SEARCH_RECENCY_TIEBREAK env flag is on (default true).
2. The candidate already has a non-zero path/title/tag/dir boost (boost > 0).
3. Timestamp metadata is present.
If condition 3 fails for the legacy production_documents collection (current state), recency contributes 0.0 — pure semantic queries are unaffected. No Qdrant mutation. No per-candidate PG query.
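The three gating conditions and the linear decay can be exercised in isolation. The sketch below is a standalone approximation under stated assumptions: `extract_timestamp` assumes timestamps are ISO 8601 strings (the report does not show the real parser), `gated_recency` is a hypothetical wrapper, and the env flag is modelled as a plain `enabled` argument rather than read from SEARCH_RECENCY_TIEBREAK.

```python
from datetime import datetime, timezone
from typing import Optional

TS_KEYS = ("updated_at", "updatedAt", "modified_at", "modifiedAt",
           "created_at", "createdAt")


def extract_timestamp(metadata: dict) -> Optional[datetime]:
    """Return the first parseable timestamp (assumed ISO 8601 string), else None."""
    for key in TS_KEYS:
        raw = metadata.get(key)
        if not isinstance(raw, str):
            continue
        try:
            ts = datetime.fromisoformat(raw)
        except ValueError:
            continue
        # Treat naive timestamps as UTC so subtraction with `now` is valid.
        return ts if ts.tzinfo else ts.replace(tzinfo=timezone.utc)
    return None


def recency_boost(metadata: dict, now: datetime) -> float:
    """Linear decay from 0.005 at age 0 to 0.0 at 30 days, as in the report."""
    ts = extract_timestamp(metadata)
    if ts is None:
        return 0.0
    age_days = max(0.0, (now - ts).total_seconds() / 86400.0)
    if age_days >= 30.0:
        return 0.0
    return 0.005 * (1.0 - age_days / 30.0)


def gated_recency(boost: float, metadata: dict, now: datetime,
                  enabled: bool = True) -> float:
    """Apply recency only when the flag is on and the candidate is already boosted."""
    if not enabled or boost <= 0.0:
        return 0.0
    return recency_boost(metadata, now)
```

With these definitions, a boosted candidate updated 15 days ago receives 0.005 × (1 − 15/30) = 0.0025, while a candidate with no path/title boost, or a payload without any timestamp key, receives 0.0.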
Status
recency_tiebreak=SKIPPED_METADATA_UNAVAILABLE
recency_safe_for_semantic=true
The code path is in place; future payload schema (e.g., when ingest writes updated_at into payload, or when IU vector lands) will activate the tiebreak automatically with no further code change.
§5. Audit Warning Cleanup — Code Classification
Change in _run_audit
Added pure code classifier:
def _classify_non_vectorizable(doc_id, body):
    if not doc_id:
        return "empty_document_id"
    base = doc_id.rsplit("/", 1)[-1]
    if "." not in base:
        return "directory_like_path"
    if base.endswith(".tmpl"):
        return "template_file"
    if body is not None and len(body.strip()) < 10:
        return "body_too_short"
    return None
_run_audit now partitions ghost ids into actionable_ghost_ids vs non_vectorizable[]. Only actionable ghosts set status=needs_cleanup. No DB write. No vector_status update. No Qdrant write. No reindex.
Audit-sync result (auto_heal=false) after recreate
{
  "total_documents": 2426,
  "total_vectors": 5015,
  "ghost_count": 0,
  "raw_ghost_count": 5,
  "non_vectorizable_count": 5,
  "non_vectorizable": [
    {"document_id": "", "reason": "empty_document_id"},
    {"document_id": "knowledge/current-state/templates/test_empty.md.tmpl", "reason": "template_file"},
    {"document_id": "knowledge/dev", "reason": "directory_like_path"},
    {"document_id": "knowledge/dev/blueprints", "reason": "directory_like_path"},
    {"document_id": "knowledge/dev/laws", "reason": "directory_like_path"}
  ],
  "orphan_count": 0,
  "status": "clean",
  "recommendations": []
}
All 5 known ghost entries are now classified as non-vectorizable. status=clean. No misleading ghost warning.
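The partition into actionable ghosts and non-vectorizable entries can be reproduced against the five ids in the result above. This is a sketch under assumptions: the classifier mirrors the one shown in this section, while `partition_ghosts` and its input shape (a mapping of document_id to body, with `None` when the body is unknown) are hypothetical and may differ from what `_run_audit` actually does.

```python
from typing import Optional


def classify_non_vectorizable(doc_id: str, body: Optional[str] = None) -> Optional[str]:
    """Return a reason string for ids that should never have vectors, else None."""
    if not doc_id:
        return "empty_document_id"
    base = doc_id.rsplit("/", 1)[-1]
    if "." not in base:
        return "directory_like_path"
    if base.endswith(".tmpl"):
        return "template_file"
    if body is not None and len(body.strip()) < 10:
        return "body_too_short"
    return None


def partition_ghosts(ghosts: dict):
    """Split ghost ids into (actionable_ids, non_vectorizable_entries, status)."""
    actionable, non_vectorizable = [], []
    for doc_id, body in ghosts.items():
        reason = classify_non_vectorizable(doc_id, body)
        if reason is None:
            actionable.append(doc_id)
        else:
            non_vectorizable.append({"document_id": doc_id, "reason": reason})
    # Only actionable ghosts should flip the audit status.
    status = "needs_cleanup" if actionable else "clean"
    return actionable, non_vectorizable, status


# The five raw ghost ids reported by audit-sync in this run.
REPORTED_GHOSTS = {
    "": None,
    "knowledge/current-state/templates/test_empty.md.tmpl": None,
    "knowledge/dev": None,
    "knowledge/dev/blueprints": None,
    "knowledge/dev/laws": None,
}
```

Running `partition_ghosts(REPORTED_GHOSTS)` yields no actionable ids, five classified entries, and status `clean`, which matches the audit output above.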
/health data_integrity.sync_status
Still reports warning because ratio=2.07 > 2.0 (the chunked-doc ratio threshold). The prompt explicitly preferred classification over raising the threshold; the threshold logic is left untouched in this pack. Flagged for separate future work if desired (raise to 3.0 or compute expected ratio from chunk distribution).
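The reported ratio can be checked from the audit counts. The sketch assumes the ratio is total_vectors / total_documents, which reproduces the 2.07 figure; the function name `ratio_status` and the `ok` label are hypothetical, and the real /health logic may differ.

```python
def ratio_status(total_vectors: int, total_documents: int, threshold: float = 2.0) -> str:
    """Return 'warning' when vectors-per-document exceeds the threshold."""
    if total_documents == 0:
        return "warning"
    ratio = total_vectors / total_documents
    return "warning" if ratio > threshold else "ok"


# Counts from the audit-sync result in this report.
ratio = 5015 / 2426
print(f"ratio={ratio:.2f}")
print(ratio_status(5015, 2426))                 # current 2.0 threshold
print(ratio_status(5015, 2426, threshold=3.0))  # the 3.0 option mentioned above
```

Under this assumption the 2.0 threshold is exceeded by a small margin, so the warning reflects normal chunking rather than drift; raising the threshold to 3.0 would clear it.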
audit_warning_cleanup=IMPLEMENTED
audit_sync_status_after=clean
§6. Unified KB/IU Search Contract
_apply_path_title_boost(query, candidates, top_k, collection_name=None) now accepts an optional collection_name parameter. QdrantVectorStore.search passes self.collection (default production_documents). Same rerank logic will serve future IU vector collection — no fork required.
The candidate dict gains _collection for downstream specialization without behavior change.
Future IU vector payload schema (documented but not implemented here): unit_id, canonical_address, unit_version_id, content_hash. When the IU collection is created and payload includes timestamps, the recency tiebreak (§4) will activate automatically.
unified_search_contract_ready=true
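The shape of the contract can be illustrated with a toy rerank that accepts `collection_name` and tags each candidate with `_collection`. Only the signature and the `_collection` key come from this report; the boost rule below (0.01 per query term found in the document_id) is a placeholder and not the production rule, and the `score` field name is assumed.

```python
from typing import Optional


def apply_path_title_boost(query: str, candidates: list, top_k: int,
                           collection_name: Optional[str] = None) -> list:
    """Toy rerank: add a small placeholder boost and tag the source collection."""
    terms = [t for t in query.lower().split() if t]
    reranked = []
    for cand in candidates:
        doc_id = str(cand.get("document_id", "")).lower()
        # Placeholder rule (assumption): 0.01 per query term present in the id.
        boost = 0.01 * sum(1 for t in terms if t in doc_id)
        reranked.append({
            **cand,
            "score": cand.get("score", 0.0) + boost,
            "_collection": collection_name,
        })
    reranked.sort(key=lambda c: c["score"], reverse=True)
    return reranked[:top_k]
```

Because the collection is passed in rather than hard-coded, the same function could serve both production_documents and a future IU collection, with downstream code branching on `_collection` only where needed.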
§7. Validation Before Rebuild
python3 -m py_compile vector_store.py server.py → OK
docker compose config --quiet → OK
git diff --stat
agent_data/server.py | 59 +++++++ (+52 / −7)
agent_data/vector_store.py | 85 +++++++ (+82 / −3)
§8. Post-Recreate Qdrant Verification
| Field | Before | After |
|---|---|---|
| points_count | 5015 | 5015 |
| status | green | green |
| indexed_vectors_count | n/a | matches |
Delta = 0. No Qdrant mutation performed.
§9. Git Commit
[main eaf2140] P3D: harden vector search rerank — recency tiebreak + audit classify + collection_name
2 files changed, 134 insertions(+), 10 deletions(-)
Parent: ff2fc25 (P3D vector search: app-layer path/title boost rerank).
Repo: /opt/incomex/docker/agent-data-repo. Compose change recorded separately (lives outside repo).
§10. Acceptance Criteria
| AC | Criterion | Result |
|---|---|---|
| AC-1 | docker compose build agent-data succeeds from host repo | PASS |
| AC-2 | docker compose up -d --force-recreate agent-data → boost still works | PASS (canary 8/8) |
| AC-3 | Canary script runs T1–T8, all PASS | PASS |
| AC-4 | Recency tie-break behavior verified | n/a: SKIPPED_METADATA_UNAVAILABLE (code path present, gated) |
| AC-5 | Pure semantic queries unaffected | PASS (T4/T5/T6/T8 unchanged; recency contributes 0 when boost=0 by design) |
| AC-6 | Audit sync_status meaningful / not noisy | PASS (audit-sync status=clean; ghost_count=0; raw_ghost=5 classified) |
| AC-7 | Ghost count excludes empty/folder/short docs | PASS (all 5 classified) |
| AC-8 | Qdrant point count unchanged | PASS (5015 = 5015) |
| AC-9 | Health = healthy | PASS |
| AC-10 | _apply_path_title_boost accepts collection_name | PASS |
§11. Rollback (not executed)
If regression: restore from /tmp/p3d-vector-hardening-backup-20260511-050538/:
BACKUP_DIR=/tmp/p3d-vector-hardening-backup-20260511-050538
cp "$BACKUP_DIR/vector_store.py.before" /opt/incomex/docker/agent-data-repo/agent_data/vector_store.py
cp "$BACKUP_DIR/server.py.before" /opt/incomex/docker/agent-data-repo/agent_data/server.py
cp "$BACKUP_DIR/docker-compose.yml.before" /opt/incomex/docker/docker-compose.yml
rm -f /opt/incomex/dot/bin/dot-search-canary
cd /opt/incomex/docker
docker compose build agent-data
docker compose up -d --force-recreate agent-data
§12. Warnings / Deferred Items
- data_integrity.sync_status=warning at /health is unchanged because ratio=2.07 > 2.0. Per prompt §4.4, we preferred classification over a threshold change. Audit-sync is now clean; the /health ratio warning is informational only.
- Recency activation depends on the payload schema. Today's payload lacks timestamps. To turn on the tiebreak materially, future upsert paths (or IU vector ingest) should write updated_at into the payload. No retro upsert was performed in this pack.
- Cron canary is not installed in this pack (design §D notes it as optional after GPT approval). The script is ready and can be added to crontab in a future change.
- Compose file is outside agent-data-repo; the change is recorded via backup + md5 checksums in §2 instead of a git commit.
§13. Evidence Index
- Preflight + run log: /tmp/p3d-vector-hardening-20260511-050538.log (terminal capture)
- Backup: /tmp/p3d-vector-hardening-backup-20260511-050538/
- Pre-rebuild canary log: /tmp/p3d-search-canary-before-rebuild-20260511-050538.log
- Post-recreate canary log: /tmp/p3d-search-canary-after-recreate-20260511-050538.log
- Qdrant after: /tmp/p3d-qdrant-after-20260511-050538.txt (points_count=5015 status=green)
- Audit after: /tmp/p3d-audit-after-20260511-050538.json
- Git commit: eaf2140 on main in /opt/incomex/docker/agent-data-repo