KB-22BF

P3D — Vector Search Reliability Hardening Design


Date: 2026-05-11
Author: Opus 4.7
Directive: gpt-directive-opus-vector-search-reliability-hardening-pack-2026-05-11.md
Mode: DESIGN ONLY


§A. Current Accepted PASS State

Search boost implementation PASS: T1 rank 5→1, T2 rank 3→1, T3/T4 no regression, T5/T6 semantic preserved. Code: vector_store.py +146 lines, single _apply_path_title_boost() function. Feature flag SEARCH_RERANK_ENABLED defaults true. Git commit ff2fc25 on host repo.

§B. Remaining Production Risks

| # | Risk | Severity | Current state |
|---|------|----------|---------------|
| 1 | Container recreate loses patch | HIGH | docker cp hotfix in writable layer; compose uses baked image without build: |
| 2 | No regression test after deploy | MEDIUM | Manual T1-T6 only; no automated canary |
| 3 | No recency tie-break | LOW | Two docs with same boost → older may win |
| 4 | Noisy audit warnings | LOW | 5 ghost entries (empty/folder/short); sync_status=warning always |
| 5 | IU vector will need same rerank | FUTURE | Contract documented but not enforced in code |

§C. Deployment Durability Plan

Root cause: compose file has image: agent-data-local:latest but NO build: directive. Image was built manually (2026-04-17) and tagged. Code changes on host don't propagate to image unless manually rebuilt.

Fix: Add a build: block to the compose service definition, alongside the existing image: key:

agent-data:
  image: agent-data-local:latest
  build:
    context: ./agent-data-repo
    dockerfile: Dockerfile
  # ... rest unchanged

With this change:

  • docker compose build agent-data → rebuilds image from source including ff2fc25
  • docker compose up -d → uses rebuilt image
  • Container recreate → uses image with patch baked in
  • Existing docker compose restart → still works (same container)

Preflight: Verify that a Dockerfile exists in agent-data-repo/ and that its COPY instructions include the agent_data/ directory.
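The preflight can be automated with a small check. The following is a minimal sketch, not part of the accepted design: the helper names copies_package and preflight are hypothetical, the parser only understands shell-form COPY lines (not the JSON array form), and it ignores .dockerignore, so a pass here still needs a quick manual look.

```python
from __future__ import annotations

from pathlib import Path


def copies_package(dockerfile_text: str, package: str = "agent_data") -> bool:
    """True if a shell-form COPY line names the package dir or the whole context."""
    for line in dockerfile_text.splitlines():
        tokens = line.split()
        if len(tokens) < 3 or tokens[0].upper() != "COPY":
            continue
        # Everything between COPY and the destination is a source; skip flags
        # such as --chown=... or --from=...
        sources = [t for t in tokens[1:-1] if not t.startswith("--")]
        for src in sources:
            normalized = src.removeprefix("./").rstrip("/")
            if normalized in (package, ".", ""):
                return True
    return False


def preflight(repo_dir: str) -> list[str]:
    """Return a list of problems; an empty list means the preflight passed."""
    dockerfile = Path(repo_dir) / "Dockerfile"
    if not dockerfile.is_file():
        return [f"missing {dockerfile}"]
    if not copies_package(dockerfile.read_text(encoding="utf-8")):
        return ["no COPY instruction covers agent_data/"]
    return []
```

Calling preflight("./agent-data-repo") before docker compose build agent-data would catch the two failure modes named above. Requires Python 3.9+ for str.removeprefix.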

Rollback: Remove the build: block from compose → back to manual image management.

§D. Canary/Regression Test Plan

Create /opt/incomex/dot/bin/dot-search-canary — a bash script that:

  1. Runs 8 search queries via Agent Data HTTP API (POST /mcp with search_knowledge tool)
  2. Parses JSON response to extract rank of expected target
  3. Reports PASS/FAIL per query + overall verdict
  4. Logs to /var/log/incomex/dot-search-canary.log

Test cases (8 queries):

| # | Query | Expected doc (basename) | Max acceptable rank |
|---|-------|-------------------------|---------------------|
| T1 | GPT Review P3D Step 1 Re-authored Spec Pack 1 Directive | gpt-review-p3d-step1-reauthored-spec-and-pack1-directive | ≤2 |
| T2 | p3d-pack1-readonly-inventory-prompt revision 2 | p3d-pack1-readonly-inventory-prompt | =1 |
| T3 | gpt-directive-agent-run-step1-checkpoint-and-pack1-inventory-readonly | gpt-directive-agent-run-step1-checkpoint-and-pack1-inventory-readonly | =1 |
| T4 | vector search freshness root cause | p3d-vector-search-freshness-audit-report | ≤3 |
| T5 | operating rules SSOT | operating-rules | ≤3 |
| T6 | vector search unified contract SSOT | p3d-vector-search-unified-contract-ssot | ≤2 |
| T7 | P3D information unit text-as-code requirements spec | p3d-information-unit-text-as-code-requirements-spec | =1 |
| T8 | hiến pháp constitution v4 | constitution | ≤2 |
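The per-query verdict in steps 2 and 3 reduces to finding the first result whose basename matches and comparing its position to the allowed maximum. Below is a sketch of that logic in Python for readability; the canary itself is planned as a bash script. The HTTP call is left out because the response shape of POST /mcp is not specified here, so result_ids is assumed to be the ordered list of document IDs already extracted from the response. CanaryCase, rank_of and evaluate are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional, Sequence


@dataclass(frozen=True)
class CanaryCase:
    case_id: str
    query: str
    expected_basename: str
    max_rank: int  # "=1" in the table is max_rank=1, "≤3" is max_rank=3


def basename(document_id: str) -> str:
    """'knowledge/dev/foo.md' -> 'foo' (directory and last extension stripped)."""
    return PurePosixPath(document_id).stem


def rank_of(expected_basename: str, result_ids: Sequence[str]) -> Optional[int]:
    """1-based position of the first matching result, or None if absent."""
    for position, document_id in enumerate(result_ids, start=1):
        if basename(document_id) == expected_basename:
            return position
    return None


def evaluate(case: CanaryCase, result_ids: Sequence[str]) -> str:
    rank = rank_of(case.expected_basename, result_ids)
    passed = rank is not None and rank <= case.max_rank
    verdict = "PASS" if passed else "FAIL"
    return f"{case.case_id} {verdict} rank={rank} max={case.max_rank}"


t5 = CanaryCase("T5", "operating rules SSOT", "operating-rules", 3)
print(evaluate(t5, ["knowledge/dev/other-doc.md", "knowledge/dev/operating-rules.md"]))
# prints: T5 PASS rank=2 max=3
```

Taking the first match means several chunks of the same document count once, at their best position. A target that is missing from the results reports rank=None and fails.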

Cron integration (optional, after GPT approval): add to the daily cron alongside dot-vector-audit:

35 4 * * * /opt/incomex/dot/bin/dot-search-canary >> /var/log/incomex/dot-search-canary.log 2>&1

§E. Recency Tie-Break Design

Principle: Only break ties, never override semantic or boost ranking.

Implementation: In _apply_path_title_boost(), after computing final_score = original_score + boost:

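# Assumes `from datetime import datetime, timezone` at module top and that
# metadata["updated_at"] is an ISO-8601 string carrying a UTC offset.
# Only when boost > 0 (path/title match exists):
if boost > 0 and metadata.get("updated_at"):
    updated_at = datetime.fromisoformat(metadata["updated_at"])
    now = datetime.now(timezone.utc)
    # Tiny recency nudge: max +0.005 for docs updated in the last 24h,
    # decaying linearly to 0 over 30 days. Future timestamps clamp to age 0.
    age_days = max(0, (now - updated_at).days)
    recency = max(0.0, 0.005 * (1 - age_days / 30))
    final_score += recency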

Key constraints:

  • Max recency boost = +0.005 (small next to semantic scores of ~0.5-0.7; it can only reorder results whose final scores differ by less than 0.005)
  • Only applies when boost > 0 (path/title already matched) — pure semantic queries unaffected
  • Decays to 0 after 30 days → no perpetual bias
  • Feature flag: SEARCH_RECENCY_TIEBREAK=true (separate from rerank flag)
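To make the constraints above concrete, here is a standalone sketch of the nudge as a pure function, with a few worked values. The function name recency_nudge and the constants are illustrative; only the 0.005 cap, the 30-day decay and the boost > 0 gate come from the design.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_RECENCY = 0.005  # cap from the design
DECAY_DAYS = 30      # nudge reaches 0 at this age


def recency_nudge(boost: float, updated_at: Optional[datetime], now: datetime) -> float:
    """Recency term to add to final_score; 0.0 when the gate is closed."""
    if boost <= 0 or updated_at is None:
        return 0.0  # pure semantic hits are never touched
    age_days = max(0, (now - updated_at).days)  # whole days, future dates clamp to 0
    return max(0.0, MAX_RECENCY * (1 - age_days / DECAY_DAYS))


now = datetime(2026, 5, 11, 12, 0, tzinfo=timezone.utc)
fresh = recency_nudge(0.1, now - timedelta(hours=2), now)     # full nudge
halfway = recency_nudge(0.1, now - timedelta(days=15), now)   # half of the cap
stale = recency_nudge(0.1, now - timedelta(days=45), now)     # fully decayed
semantic = recency_nudge(0.0, now - timedelta(hours=2), now)  # gate closed
print(fresh, halfway, stale, semantic)
```

With two documents at the same original_score + boost, the one updated two hours ago ends 0.005 ahead of one updated 45 days ago, which is the AC-4 scenario.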

§F. Audit Warning Cleanup Design

Current state: sync_status=warning caused by:

  • ratio > 2.0 threshold (normal for chunked docs)
  • 5 ghost entries: "" (empty-id), knowledge/dev, knowledge/dev/blueprints, knowledge/dev/laws, test_empty.md.tmpl
  • 1 pending row with 2-char body

Proposed fix (code patch in server.py audit logic):

  1. Exclude folder/empty docs from ghost count: if document_id is empty, a directory path (no file extension), or body length < 10 → exclude from ghost list
  2. Adjust ratio threshold: ratio > 2.0 is normal for chunked collections. Raise to 3.0, or better: compute expected ratio from actual chunk distribution and flag only significant deviation
  3. Mark test template as excluded: set vector_status='excluded' for test_empty.md.tmpl (2-char body, intentionally not vectorizable)

No Qdrant mutation. No reindex. No auto-heal. Only audit reporting logic + 1 PG update (test template status).
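The exclusion rule in item 1 can be sketched as a single predicate. This is an illustration under assumptions, not the server.py patch: the function name is hypothetical, and "directory path" is approximated as "last path component has no file extension", which would also catch extensionless files such as README.

```python
from __future__ import annotations

from pathlib import PurePosixPath
from typing import Optional

MIN_BODY_CHARS = 10  # threshold from the proposed fix


def is_excluded_from_ghosts(document_id: str, body: Optional[str]) -> bool:
    """True if the entry should not be counted as a ghost in the audit."""
    if not document_id:
        return True  # empty-id entry
    if PurePosixPath(document_id).suffix == "":
        return True  # directory-like path (no file extension)
    if len(body or "") < MIN_BODY_CHARS:
        return True  # body too short to be vectorizable
    return False


# The five ghost entries listed above; the template has a 2-char body.
ghosts = [
    ("", None),
    ("knowledge/dev", None),
    ("knowledge/dev/blueprints", None),
    ("knowledge/dev/laws", None),
    ("test_empty.md.tmpl", "{}"),
]
print(all(is_excluded_from_ghosts(doc_id, body) for doc_id, body in ghosts))
```

All five current ghost entries are excluded by this predicate, while a normal document with an extension and a real body still counts if its vectors go missing.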

§G. Unified KB/IU Search Contract

Already documented in design/p3d-vector-search-unified-contract-ssot.md (2026-05-11). This hardening pack formalizes it in code:

  1. _apply_path_title_boost() accepts collection_name parameter (default production_documents)
  2. Same function serves future IU collection — no fork
  3. IU payload adds: logical_unit_id, unit_version_id, unit_kind, canonical_address, content_hash
  4. IU chunking never crosses unit/version boundary (enforced at chunk time, not search time)
  5. Canary test set is extensible — add IU test cases when IU vector exists

Not implemented now: IU vector collection, IU chunking, IU payload schema. Only code structure prepared.
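As an illustration of points 3 and 4 only, the sketch below chunks each unit independently so that no chunk can span two units or versions, and copies the five payload fields onto every chunk. Everything beyond those five field names is an assumption: the text field, the fixed-size character chunking, and the names IUPayload and chunk_units are hypothetical and not a proposed schema.

```python
from __future__ import annotations

from typing import List, TypedDict


class IUPayload(TypedDict):
    logical_unit_id: str
    unit_version_id: str
    unit_kind: str
    canonical_address: str
    content_hash: str
    text: str  # hypothetical field holding the unit (or chunk) text


def chunk_units(units: List[IUPayload], max_chars: int) -> List[IUPayload]:
    """Split each unit on its own; chunks never mix text from two units."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks: List[IUPayload] = []
    for unit in units:
        text = unit["text"]
        for start in range(0, len(text), max_chars):
            chunk: IUPayload = {**unit, "text": text[start:start + max_chars]}
            chunks.append(chunk)
    return chunks


units: List[IUPayload] = [
    {"logical_unit_id": "u1", "unit_version_id": "v1", "unit_kind": "rule",
     "canonical_address": "doc/a#1", "content_hash": "h1", "text": "aaaaa"},
    {"logical_unit_id": "u2", "unit_version_id": "v1", "unit_kind": "rule",
     "canonical_address": "doc/a#2", "content_hash": "h2", "text": "bbb"},
]
for chunk in chunk_units(units, max_chars=4):
    print(chunk["logical_unit_id"], chunk["text"])
```

Because the boundary is enforced when chunks are produced, the search-side rerank needs no knowledge of units, which is what lets the same _apply_path_title_boost() serve both collections.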

§H. Non-Goals

  • ❌ IU vector implementation
  • ❌ Qdrant collection replacement
  • ❌ Bulk reindex
  • ❌ TAC/IU migration
  • ❌ Pack 1 resume
  • ❌ New embedding model
  • ❌ BM25/ElasticSearch
  • ❌ Full rewrite of Agent Data search

§I. Rollback Plan

| Component | Rollback |
|-----------|----------|
| Compose build: | Remove build: block → back to manual image |
| Recency tie-break | Set SEARCH_RECENCY_TIEBREAK=false or revert code |
| Audit cleanup | Revert audit logic; set test template back to pending |
| Canary script | Delete script from /opt/incomex/dot/bin/ |
| Image rebuild | docker compose build from backup branch |

§J. Acceptance Criteria

| # | Criterion | Verification |
|---|-----------|--------------|
| AC-1 | docker compose build agent-data succeeds from host repo | Build log |
| AC-2 | docker compose up -d --force-recreate agent-data → search boost still works | Run canary after recreate |
| AC-3 | Canary script runs T1-T8, all PASS | Script output |
| AC-4 | Recency tie-break: 2 docs same boost → newer ranks first | Construct test case |
| AC-5 | Recency tie-break: pure semantic query → no recency effect | T4/T5 unchanged |
| AC-6 | Audit sync_status = healthy or meaningful warning (not noisy) | /kb/audit-sync |
| AC-7 | Ghost count excludes empty/folder/short docs | Audit output |
| AC-8 | Qdrant point count unchanged | Before = after |
| AC-9 | Health = healthy | /health |
| AC-10 | _apply_path_title_boost() accepts collection_name param | Code inspect |


knowledge/dev/laws/dieu44-trien-khai/design/p3d-vector-search-reliability-hardening-design.md