KB-2522

P3D — Vector Search Reliability Hardening — Implementation Report

Revision 1

Date: 2026-05-11
Mode: IMPLEMENTATION (compose build durability + canary + recency + audit classification + unified contract)
Author: Claude Opus 4.7
Prompt ref: knowledge/dev/laws/dieu44-trien-khai/prompts/p3d-vector-search-reliability-hardening-implementation-prompt.md rev2
Design ref: knowledge/dev/laws/dieu44-trien-khai/design/p3d-vector-search-reliability-hardening-design.md


Status Fields

phase_status=PASS
mode=IMPLEMENTATION
search_boost_behavior=PASS
production_durability=PASS
compose_build_added=true
agent_data_rebuilt=true
agent_data_force_recreated=true
canary_status=PASS
recency_tiebreak=SKIPPED_METADATA_UNAVAILABLE
recency_safe_for_semantic=true
audit_warning_cleanup=IMPLEMENTED
audit_sync_status_after=clean
qdrant_points_before=5015
qdrant_points_after=5015
qdrant_mutation_performed=false
db_write_performed=false
pg_schema_mutation_performed=false
no_reindex_performed=true
rollback_performed=false
health_after=healthy
unified_search_contract_ready=true

§1. Preflight

| Field | Value |
|---|---|
| AGENT_DATA | incomex-agent-data (was Up 27 min healthy) |
| QDRANT | incomex-qdrant (Up 7 weeks healthy) |
| COMPOSE_FILE | /opt/incomex/docker/docker-compose.yml |
| REPO_DIR | /opt/incomex/docker/agent-data-repo (git clean, head ff2fc25) |
| Dockerfile | present; COPY agent_data/ /app/agent_data/ confirmed |
| Health before | status=healthy, services qdrant/postgres/openai all ok |
| points_count before | 5015 (status green) |

Backup directory: /tmp/p3d-vector-hardening-backup-20260511-050538/

  • vector_store.py.before md5 ec3038e4aca71f8a3b10209eb4ec21ff
  • server.py.before md5 ed31c1126b8453dc572a72b5aa1b8ed8
  • docker-compose.yml.before md5 581b85067b4b19cc7fcd96b6d350dfba

§2. Deployment Durability — Compose build: Added

Change

   agent-data:
     image: agent-data-local:latest
+    build:
+      context: ./agent-data-repo
+      dockerfile: Dockerfile
     container_name: incomex-agent-data

Only agent-data touched. No other service, env, network, or volume modified.

Validation

docker compose config --quiet         → OK
diff backup/before → 3 lines added, no other change

Build + Force-recreate

docker compose build agent-data       → OK (~130s; layered base reused; agent_data/ baked from host)
docker compose up -d --force-recreate agent-data → Container recreated
docker inspect → State.Health.Status: healthy (≈65s start-up)

Proof of durability

  • File md5s in the new container differ from those in the previously baked-in image dated 2026-04-17, which confirms a new build.
  • Container /app/agent_data/vector_store.py md5 after recreate: 7eb74143f53e6a5fd34a54f1ffbeab38 (matches new build).
  • Container /app/agent_data/server.py md5: 70b8d82460c64993d43170561c998126.
  • Canary PASS after --force-recreate (see §3), confirming that the rerank patch survives container recreation. The earlier docker cp hotfix is now redundant.

Compose file location note

The compose file lives outside the agent-data-repo git tree at /opt/incomex/docker/docker-compose.yml. Per prompt §9, the compose change was not forced into the agent-data-repo. Backup retained at /tmp/p3d-vector-hardening-backup-20260511-050538/docker-compose.yml.before.

  • docker-compose.yml md5 after: 93c004b03d59140c1570a8df863b8055
  • docker-compose.yml.before md5: 581b85067b4b19cc7fcd96b6d350dfba

§3. Canary — /opt/incomex/dot/bin/dot-search-canary

Script behavior

  • Resolves AGENT_DATA_API_KEY from env → /opt/incomex/docker/.env → container env (silent fallback chain).
  • Auto-detects endpoint: tries localhost:8000, then nginx public https://vps.incomexsaigoncorp.vn/api, then docker exec tunnel.
  • Calls POST /mcp/tools/search_knowledge with {"query": ..., "limit": 10}.
  • Parses result.context[] (real response shape, captured from live API).
  • For each test, scans top results for expected slug substring in document_id.
  • Emits one line per test (PASS/FAIL + rank + matched doc_id). Exit non-zero on any FAIL.
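The per-test check described above (scan result.context[] for the expected slug, compare the rank against the allowed maximum, emit one line) can be sketched as follows. This is an illustration under stated assumptions, not the canary script itself: the function names find_rank and check, the output line format, and the sample document_id values are all hypothetical.

```python
from typing import Optional


def find_rank(context: list[dict], expected_slug: str) -> Optional[int]:
    """Return the 1-based rank of the first result whose document_id
    contains expected_slug, or None if nothing matches."""
    for rank, item in enumerate(context, start=1):
        if expected_slug in str(item.get("document_id", "")):
            return rank
    return None


def check(test_id: str, context: list[dict], expected_slug: str,
          max_rank: int) -> tuple[bool, str]:
    """Evaluate one canary test and build its one-line report."""
    rank = find_rank(context, expected_slug)
    ok = rank is not None and rank <= max_rank
    if rank is None:
        detail = "no match"
    else:
        detail = f"rank {rank} {context[rank - 1]['document_id']}"
    return ok, f"{test_id} {'PASS' if ok else 'FAIL'} {detail}"


# Hypothetical fragment shaped like result.context[]; paths are made up.
sample_context = [
    {"document_id": "knowledge/dev/ssot/operating-rules.md"},
    {"document_id": "knowledge/dev/other.md"},
]
ok, line = check("T5", sample_context, "operating-rules", max_rank=3)
print(line)
```

A driver would run check once per test and exit non-zero if any call returns ok=False, which matches the exit behavior listed above.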

T1–T8 results (both before and after --force-recreate)

| # | Query | Expected slug | Max rank | Pre-rebuild | Post-recreate |
|---|---|---|---|---|---|
| T1 | GPT Review P3D Step 1 Re-authored Spec Pack 1 Directive | gpt-review-p3d-step1-reauthored-spec-and-pack1-directive | ≤2 | rank 1 PASS | rank 1 PASS |
| T2 | p3d-pack1-readonly-inventory-prompt revision 2 | p3d-pack1-readonly-inventory-prompt | =1 | rank 1 PASS | rank 1 PASS |
| T3 | gpt-directive-agent-run-step1-checkpoint-and-pack1-inventory-readonly | gpt-directive-agent-run-step1-checkpoint-and-pack1-inventory-readonly | =1 | rank 1 PASS | rank 1 PASS |
| T4 | vector search freshness root cause | p3d-vector-search-freshness-audit | ≤3 | rank 1 PASS | rank 1 PASS |
| T5 | operating rules SSOT | operating-rules | ≤3 | rank 1 PASS | rank 1 PASS |
| T6 | vector search unified contract SSOT | vector-search | ≤3 | rank 1 PASS | rank 1 PASS |
| T7 | P3D information unit text-as-code requirements spec | p3d-information-unit-text-as-code-requirements-spec | =1 | rank 1 PASS | rank 1 PASS |
| T8 | hiến pháp constitution v4 | constitution | ≤3 | rank 1 PASS | rank 1 PASS |

canary_status=PASS in both runs. Log files:

  • /tmp/p3d-search-canary-before-rebuild-20260511-050538.log
  • /tmp/p3d-search-canary-after-recreate-20260511-050538.log

Permissions: 755. No mutation of KB or Qdrant performed by the script.


§4. Recency Tie-Break

Audit of available metadata

Qdrant payload metadata keys observed by scrolling 3 sample points: tags, title, chunk_index, total_chunks, sometimes source. No timestamp fields (created_at, updated_at, modified_at, etc.) are stored in the current payload schema.

Implementation

vector_store.py adds:

_TS_KEYS = ("updated_at", "updatedAt", "modified_at", "modifiedAt",
            "created_at", "createdAt")

def _extract_timestamp(metadata) -> datetime | None: ...
def _recency_boost(metadata, now) -> tuple[float, str | None]:
    ts = _extract_timestamp(metadata)
    if ts is None:
        return 0.0, None        # SKIPPED — no payload mutation, no PG query
    age_days = max(0.0, (now - ts).total_seconds() / 86400.0)
    if age_days >= 30.0:
        return 0.0, "recency_aged_out"
    return 0.005 * (1.0 - age_days / 30.0), f"recency_age_{age_days:.1f}d"

In _apply_path_title_boost, recency is applied only when:

  1. SEARCH_RECENCY_TIEBREAK env flag is on (default true).
  2. The candidate already has a non-zero path/title/tag/dir boost (boost > 0).
  3. Timestamp metadata is present.

Condition 3 currently fails for the legacy production_documents collection, because its payload carries no timestamp. Recency therefore contributes 0.0 and pure semantic queries are unaffected. No Qdrant mutation. No per-candidate PG query.

Status

recency_tiebreak=SKIPPED_METADATA_UNAVAILABLE
recency_safe_for_semantic=true

The code path is in place; future payload schema (e.g., when ingest writes updated_at into payload, or when IU vector lands) will activate the tiebreak automatically with no further code change.


§5. Audit Warning Cleanup — Code Classification

Change in _run_audit

Added pure code classifier:

def _classify_non_vectorizable(doc_id, body):
    if not doc_id: return "empty_document_id"
    base = doc_id.rsplit("/", 1)[-1]
    if "." not in base: return "directory_like_path"
    if base.endswith(".tmpl"): return "template_file"
    if body is not None and len(body.strip()) < 10: return "body_too_short"
    return None

_run_audit now partitions ghost ids into actionable_ghost_ids vs non_vectorizable[]. Only actionable ghosts set status=needs_cleanup. No DB write. No vector_status update. No Qdrant write. No reindex.
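The partition step can be sketched as below. The classifier is copied from the excerpt above (expanded onto separate lines); partition_ghosts is a hypothetical helper, and it assumes that status depends only on actionable ghosts, whereas the real _run_audit presumably also considers orphans. The document ids are the five ghost ids reported by the audit; their bodies are not shown in this report, so None is used.

```python
def classify_non_vectorizable(doc_id, body):
    """Return a reason string for non-vectorizable docs, else None."""
    if not doc_id:
        return "empty_document_id"
    base = doc_id.rsplit("/", 1)[-1]
    if "." not in base:
        return "directory_like_path"
    if base.endswith(".tmpl"):
        return "template_file"
    if body is not None and len(body.strip()) < 10:
        return "body_too_short"
    return None


def partition_ghosts(ghosts):
    """Split (doc_id, body) pairs into actionable vs non-vectorizable."""
    actionable_ghost_ids, non_vectorizable = [], []
    for doc_id, body in ghosts:
        reason = classify_non_vectorizable(doc_id, body)
        if reason is None:
            actionable_ghost_ids.append(doc_id)
        else:
            non_vectorizable.append({"document_id": doc_id, "reason": reason})
    return {
        "ghost_count": len(actionable_ghost_ids),
        "raw_ghost_count": len(actionable_ghost_ids) + len(non_vectorizable),
        "non_vectorizable_count": len(non_vectorizable),
        "non_vectorizable": non_vectorizable,
        # Assumption: only actionable ghosts flip the status.
        "status": "needs_cleanup" if actionable_ghost_ids else "clean",
    }


ghosts = [
    ("", None),
    ("knowledge/current-state/templates/test_empty.md.tmpl", None),
    ("knowledge/dev", None),
    ("knowledge/dev/blueprints", None),
    ("knowledge/dev/laws", None),
]
report = partition_ghosts(ghosts)
print(report["status"], report["ghost_count"], report["raw_ghost_count"])
```

Because every one of the five ids is classified, none remains actionable and the sketch reports a clean status with a raw count of 5.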

Audit-sync result (auto_heal=false) after recreate

{
  "total_documents": 2426,
  "total_vectors": 5015,
  "ghost_count": 0,
  "raw_ghost_count": 5,
  "non_vectorizable_count": 5,
  "non_vectorizable": [
    {"document_id": "", "reason": "empty_document_id"},
    {"document_id": "knowledge/current-state/templates/test_empty.md.tmpl", "reason": "template_file"},
    {"document_id": "knowledge/dev", "reason": "directory_like_path"},
    {"document_id": "knowledge/dev/blueprints", "reason": "directory_like_path"},
    {"document_id": "knowledge/dev/laws", "reason": "directory_like_path"}
  ],
  "orphan_count": 0,
  "status": "clean",
  "recommendations": []
}

All 5 known ghost entries are now classified as non-vectorizable. status=clean. No misleading ghost warning.

/health data_integrity.sync_status

/health still reports sync_status=warning because the vector-to-document ratio (5015 / 2426 ≈ 2.07) exceeds the 2.0 chunked-doc threshold. The prompt explicitly preferred classification over raising the threshold, so the threshold logic is left untouched in this pack. It is flagged for separate future work if desired (raise the threshold to 3.0, or compute the expected ratio from the chunk distribution).
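The ratio check behind this warning is simple arithmetic on the two totals from the audit output. The sketch below is illustrative only: the function names and the "ok" label for the non-warning state are assumptions, since the report shows only the warning value.

```python
def vector_ratio(total_vectors: int, total_documents: int) -> float:
    """Vectors per document; guards against an empty document table."""
    if total_documents <= 0:
        raise ValueError("total_documents must be positive")
    return total_vectors / total_documents


def sync_status(ratio: float, threshold: float = 2.0) -> str:
    # "ok" is a placeholder label, not taken from the /health payload.
    return "warning" if ratio > threshold else "ok"


ratio = vector_ratio(5015, 2426)
print(f"{ratio:.2f}", sync_status(ratio), sync_status(ratio, threshold=3.0))
```

With the current totals the ratio is about 2.07, so the check warns at a threshold of 2.0 and would stop warning at the proposed 3.0.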

audit_warning_cleanup=IMPLEMENTED
audit_sync_status_after=clean

§6. Unified KB/IU Search Contract

_apply_path_title_boost(query, candidates, top_k, collection_name=None) now accepts an optional collection_name parameter. QdrantVectorStore.search passes self.collection (default production_documents). The same rerank logic will serve the future IU vector collection, so no fork is required.

The candidate dict gains _collection for downstream specialization without behavior change.
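The shape of this contract can be illustrated with a toy reranker. Only the signature and the _collection tag come from this report; the boost rule (a flat +0.05 when the query appears in document_id), the _final_score key, and the collection name iu_units are hypothetical placeholders, since the real boost rules are not reproduced here.

```python
def apply_path_title_boost(query, candidates, top_k, collection_name=None):
    """Toy rerank: tag each candidate with its collection, add a
    placeholder boost, and return the top_k by boosted score."""
    reranked = []
    for cand in candidates:
        tagged = dict(cand)  # copy so the caller's dicts are not mutated
        tagged["_collection"] = collection_name
        doc_id = str(cand.get("document_id", "")).lower()
        boost = 0.05 if query.lower() in doc_id else 0.0  # placeholder rule
        tagged["_final_score"] = cand["score"] + boost
        reranked.append(tagged)
    reranked.sort(key=lambda c: c["_final_score"], reverse=True)
    return reranked[:top_k]


candidates = [
    {"document_id": "knowledge/a.md", "score": 0.80},
    {"document_id": "knowledge/operating-rules.md", "score": 0.78},
]
kb = apply_path_title_boost("operating-rules", candidates, 2,
                            collection_name="production_documents")
iu = apply_path_title_boost("operating-rules", candidates, 2,
                            collection_name="iu_units")
print([c["document_id"] for c in kb], kb[0]["_collection"], iu[0]["_collection"])
```

Both calls produce the same ordering; only the _collection tag differs, which is the property that lets one rerank function serve both collections.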

Future IU vector payload schema (documented but not implemented here): unit_id, canonical_address, unit_version_id, content_hash. When the IU collection is created and payload includes timestamps, the recency tiebreak (§4) will activate automatically.

unified_search_contract_ready=true

§7. Validation Before Rebuild

python3 -m py_compile vector_store.py server.py  → OK
docker compose config --quiet                    → OK
git diff --stat
  agent_data/server.py       | 59 +++++++  (+52 / −7)
  agent_data/vector_store.py | 85 +++++++  (+82 / −3)

§8. Post-Recreate Qdrant Verification

| Metric | Before | After |
|---|---|---|
| points_count | 5015 | 5015 |
| status | green | green |

indexed_vectors_count: matches

Delta = 0. No Qdrant mutation performed.


§9. Git Commit

[main eaf2140] P3D: harden vector search rerank — recency tiebreak + audit classify + collection_name
 2 files changed, 134 insertions(+), 10 deletions(-)

Parent: ff2fc25 (P3D vector search: app-layer path/title boost rerank).

Repo: /opt/incomex/docker/agent-data-repo. Compose change recorded separately (lives outside repo).


§10. Acceptance Criteria

| AC | Criterion | Result |
|---|---|---|
| AC-1 | docker compose build agent-data succeeds from host repo | PASS |
| AC-2 | docker compose up -d --force-recreate agent-data → boost still works | PASS (canary 8/8) |
| AC-3 | Canary script runs T1–T8, all PASS | PASS |
| AC-4 | Recency tie-break behavior verified | n/a: SKIPPED_METADATA_UNAVAILABLE (code path present, gated) |
| AC-5 | Pure semantic queries unaffected | PASS (T4/T5/T6/T8 unchanged; recency contributes 0 when boost=0 by design) |
| AC-6 | Audit sync_status meaningful / not noisy | PASS (audit-sync status=clean; ghost_count=0; raw_ghost=5 classified) |
| AC-7 | Ghost count excludes empty/folder/short docs | PASS (all 5 classified) |
| AC-8 | Qdrant point count unchanged | PASS (5015 = 5015) |
| AC-9 | Health = healthy | PASS |
| AC-10 | _apply_path_title_boost accepts collection_name | PASS |

§11. Rollback (not executed)

If regression: restore from /tmp/p3d-vector-hardening-backup-20260511-050538/:

cp $BACKUP_DIR/vector_store.py.before  /opt/incomex/docker/agent-data-repo/agent_data/vector_store.py
cp $BACKUP_DIR/server.py.before        /opt/incomex/docker/agent-data-repo/agent_data/server.py
cp $BACKUP_DIR/docker-compose.yml.before /opt/incomex/docker/docker-compose.yml
rm -f /opt/incomex/dot/bin/dot-search-canary
cd /opt/incomex/docker
docker compose build agent-data
docker compose up -d --force-recreate agent-data

§12. Warnings / Deferred Items

  1. data_integrity.sync_status=warning at /health is unchanged because ratio=2.07 > 2.0. Per prompt §4.4, we preferred classification over threshold change. Audit-sync now clean; the /health ratio-warning is informational only.
  2. Recency activation depends on payload schema. Today's payload lacks timestamps. To turn on the tiebreak materially, future upsert paths (or IU vector ingest) should write updated_at into the payload. No retro upsert performed in this pack.
  3. Cron canary not installed in this pack (design §D notes it as optional after GPT approval). The script is ready and can be added to crontab in a future change.
  4. Compose file is outside agent-data-repo — change recorded via backup + md5 checksums in §2 instead of git commit.

§13. Evidence Index

  • Preflight + run log: /tmp/p3d-vector-hardening-20260511-050538.log (terminal capture)
  • Backup: /tmp/p3d-vector-hardening-backup-20260511-050538/
  • Pre-rebuild canary log: /tmp/p3d-search-canary-before-rebuild-20260511-050538.log
  • Post-recreate canary log: /tmp/p3d-search-canary-after-recreate-20260511-050538.log
  • Qdrant after: /tmp/p3d-qdrant-after-20260511-050538.txt (points_count=5015 status=green)
  • Audit after: /tmp/p3d-audit-after-20260511-050538.json
  • Git commit: eaf2140 on main in /opt/incomex/docker/agent-data-repo
Report path: knowledge/dev/laws/dieu44-trien-khai/reports/p3d-vector-search-reliability-hardening-implementation-report.md