P3D — Vector Search Reliability Hardening Design
Date: 2026-05-11 | Author: Opus 4.7 | Directive: gpt-directive-opus-vector-search-reliability-hardening-pack-2026-05-11.md | Mode: DESIGN ONLY
§A. Current Accepted PASS State
Search boost implementation PASS: T1 rank 5→1, T2 rank 3→1, T3/T4 no regression, T5/T6 semantic preserved. Code: vector_store.py +146 lines, single _apply_path_title_boost() function. Feature flag SEARCH_RERANK_ENABLED defaults true. Git commit ff2fc25 on host repo.
§B. Remaining Production Risks
| # | Risk | Severity | Current state |
|---|---|---|---|
| 1 | Container recreate loses patch | HIGH | docker cp hotfix in writable layer; compose uses baked image without build: |
| 2 | No regression test after deploy | MEDIUM | Manual T1-T6 only; no automated canary |
| 3 | No recency tie-break | LOW | Two docs with same boost → older may win |
| 4 | Noisy audit warnings | LOW | 5 ghost entries (empty/folder/short); sync_status=warning always |
| 5 | IU vector will need same rerank | FUTURE | Contract documented but not enforced in code |
§C. Deployment Durability Plan
Root cause: compose file has image: agent-data-local:latest but NO build: directive. Image was built manually (2026-04-17) and tagged. Code changes on host don't propagate to image unless manually rebuilt.
Fix: add a build: section next to image: in the compose service definition, as follows:
    agent-data:
      image: agent-data-local:latest
      build:
        context: ./agent-data-repo
        dockerfile: Dockerfile
      # ... rest unchanged
With this change:
- docker compose build agent-data → rebuilds image from source, including ff2fc25
- docker compose up -d → uses rebuilt image
- Container recreate → uses image with patch baked in
- Existing docker compose restart → still works (same container)
Preflight: Verify Dockerfile exists in agent-data-repo/ and COPY includes agent_data/ directory.
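The preflight check could be automated along these lines. This is a minimal sketch, not part of the design: the helper names `copy_includes` and `preflight` are hypothetical, and the parser only handles the shell form of COPY on a single line (no JSON-array form, no line continuations).

```python
from pathlib import Path


def copy_includes(dockerfile_text: str, required: str = "agent_data") -> bool:
    """Return True if a COPY instruction brings `required` in from the build context."""
    wanted = required.strip("/")
    for raw in dockerfile_text.splitlines():
        parts = raw.strip().split()
        if len(parts) < 3 or parts[0].upper() != "COPY":
            continue
        # COPY --from=<stage> copies from another build stage, not from the host repo.
        if any(p.startswith("--from") for p in parts[1:]):
            continue
        # Remaining flags (e.g. --chown=...) are skipped; the last token is the destination.
        sources = [p for p in parts[1:-1] if not p.startswith("--")]
        for src in sources:
            if src.startswith("./"):
                src = src[2:]
            src = src.rstrip("/")
            if src in (".", ""):
                return True  # whole build context is copied
            if src == wanted or src.startswith(wanted + "/"):
                return True
    return False


def preflight(repo_dir: str) -> bool:
    """Check that the Dockerfile exists and copies the agent_data/ directory."""
    dockerfile = Path(repo_dir) / "Dockerfile"
    return dockerfile.is_file() and copy_includes(dockerfile.read_text())
```

Calling `preflight("./agent-data-repo")` before `docker compose build agent-data` would catch a missing Dockerfile or a COPY that omits the patched package.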
Rollback: Remove build: line from compose → back to manual image management.
§D. Canary/Regression Test Plan
Create /opt/incomex/dot/bin/dot-search-canary — a bash script that:
- Runs 8 search queries via Agent Data HTTP API (POST /mcp with search_knowledge tool)
- Parses JSON response to extract rank of expected target
- Reports PASS/FAIL per query + overall verdict
- Logs to /var/log/incomex/dot-search-canary.log
Test cases (8 queries):
| # | Query | Expected doc (basename) | Max acceptable rank |
|---|---|---|---|
| T1 | GPT Review P3D Step 1 Re-authored Spec Pack 1 Directive | gpt-review-p3d-step1-reauthored-spec-and-pack1-directive | ≤2 |
| T2 | p3d-pack1-readonly-inventory-prompt revision 2 | p3d-pack1-readonly-inventory-prompt | =1 |
| T3 | gpt-directive-agent-run-step1-checkpoint-and-pack1-inventory-readonly | gpt-directive-agent-run-step1-checkpoint-and-pack1-inventory-readonly | =1 |
| T4 | vector search freshness root cause | p3d-vector-search-freshness-audit-report | ≤3 |
| T5 | operating rules SSOT | operating-rules | ≤3 |
| T6 | vector search unified contract SSOT | p3d-vector-search-unified-contract-ssot | ≤2 |
| T7 | P3D information unit text-as-code requirements spec | p3d-information-unit-text-as-code-requirements-spec | =1 |
| T8 | hiến pháp constitution v4 | constitution | ≤2 |
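The PASS/FAIL rule in the table can be sketched as follows. The canary itself is planned as a bash script; this Python sketch only illustrates the rank check. It assumes the search response has already been reduced to an ordered list of document ids, and the names `rank_of` and `verdict` are hypothetical. Both "=1" and "≤N" rows map to a single maximum acceptable rank.

```python
from typing import Optional, Sequence


def rank_of(expected_basename: str, result_ids: Sequence[str]) -> Optional[int]:
    """Return the 1-based rank of the first result whose basename matches, else None."""
    for position, doc_id in enumerate(result_ids, start=1):
        basename = doc_id.rstrip("/").rsplit("/", 1)[-1]
        # Drop a trailing extension such as ".md" before comparing.
        stem = basename.rsplit(".", 1)[0] if "." in basename else basename
        if stem == expected_basename:
            return position
    return None


def verdict(expected_basename: str, max_rank: int, result_ids: Sequence[str]) -> str:
    """PASS when the expected document appears at or above max_rank."""
    rank = rank_of(expected_basename, result_ids)
    return "PASS" if rank is not None and rank <= max_rank else "FAIL"


# Illustrative result list (made-up ids, not real search output).
results = [
    "knowledge/dev/a.md",
    "knowledge/dev/operating-rules.md",
    "knowledge/dev/laws/constitution.md",
]
print(verdict("operating-rules", 3, results))  # rank 2, limit 3
print(verdict("constitution", 2, results))     # rank 3, limit 2
```

A missing target is treated as FAIL, so an empty or malformed response cannot produce a false PASS.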
Cron integration (optional, after GPT approval): add to the daily cron alongside dot-vector-audit:
35 4 * * * /opt/incomex/dot/bin/dot-search-canary >> /var/log/incomex/dot-search-canary.log 2>&1
§E. Recency Tie-Break Design
Principle: Only break ties, never override semantic or boost ranking.
Implementation: In _apply_path_title_boost(), after computing final_score = original_score + boost:
    # Only when boost > 0 (path/title match exists):
    if boost > 0 and metadata.get("updated_at"):
        # Tiny recency nudge: max +0.005 for docs updated in last 24h
        # Decays to 0 over 30 days
        age_days = (now - updated_at).days
        recency = max(0, 0.005 * (1 - age_days / 30))
        final_score += recency
Key constraints:
- Max recency boost = +0.005 (negligible vs semantic scores ~0.5-0.7)
- Only applies when boost > 0 (path/title already matched) — pure semantic queries unaffected
- Decays to 0 after 30 days → no perpetual bias
- Feature flag: SEARCH_RECENCY_TIEBREAK=true (separate from rerank flag)
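The constraints above can be captured in a small standalone function, which also gives a way to construct the "same boost, newer ranks first" test case. This is a sketch under assumptions: `recency_nudge` is a hypothetical helper, `updated_at` is assumed to be a timezone-aware datetime, the `enabled` argument stands in for reading SEARCH_RECENCY_TIEBREAK, and clamping future timestamps to age 0 is an added guard not stated in the design.

```python
from datetime import datetime, timedelta, timezone

MAX_RECENCY = 0.005  # ceiling from the constraints above
DECAY_DAYS = 30      # the nudge reaches 0 at this age


def recency_nudge(boost, updated_at, now, enabled=True):
    """Return the recency tie-break term; 0.0 unless a path/title boost applied."""
    if not enabled or boost <= 0 or updated_at is None:
        return 0.0
    # Clamp so a timestamp in the future cannot exceed MAX_RECENCY (assumed guard).
    age_days = max(0, (now - updated_at).days)
    return max(0.0, MAX_RECENCY * (1 - age_days / DECAY_DAYS))


# Two documents with identical semantic score and boost (illustrative values).
now = datetime(2026, 5, 11, tzinfo=timezone.utc)
docs = [
    {"id": "older", "score": 0.60, "boost": 0.10, "updated_at": now - timedelta(days=20)},
    {"id": "newer", "score": 0.60, "boost": 0.10, "updated_at": now - timedelta(days=1)},
]
ranked = sorted(
    docs,
    key=lambda d: d["score"] + d["boost"] + recency_nudge(d["boost"], d["updated_at"], now),
    reverse=True,
)
print([d["id"] for d in ranked])  # newer first
```

Because the term is capped at 0.005, it can only reorder documents whose boosted scores are already within that margin of each other.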
§F. Audit Warning Cleanup Design
Current state: sync_status=warning caused by:
- ratio > 2.0 threshold (normal for chunked docs)
- 5 ghost entries: "" (empty-id), knowledge/dev, knowledge/dev/blueprints, knowledge/dev/laws, test_empty.md.tmpl
- 1 pending row with 2-char body
Proposed fix (code patch in server.py audit logic):
- Exclude folder/empty docs from ghost count: if document_id is empty, a directory path (no file extension), or body length < 10 → exclude from ghost list
- Adjust ratio threshold: ratio > 2.0 is normal for chunked collections. Raise to 3.0, or better: compute expected ratio from actual chunk distribution and flag only significant deviation
- Mark test template as excluded: set vector_status='excluded' for test_empty.md.tmpl (2-char body, intentionally not vectorizable)
No Qdrant mutation. No reindex. No auto-heal. Only audit reporting logic + 1 PG update (test template status).
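The ghost exclusion rule could look roughly like this. It is a sketch only: `is_excluded_from_ghosts` is a hypothetical name, not a function in server.py, and "directory path" is approximated as "the last path segment has no file extension", which would misclassify extensionless files.

```python
import os.path

MIN_BODY_LEN = 10  # from the "body length < 10" rule above


def is_excluded_from_ghosts(document_id, body):
    """Return True when an entry should be left out of the ghost list."""
    if not document_id:
        return True  # empty id
    # Approximation: no extension on the last segment means a folder entry.
    _, ext = os.path.splitext(document_id.rstrip("/"))
    if not ext:
        return True
    return len(body or "") < MIN_BODY_LEN  # too short to be vectorizable


# The five ghost entries listed above; the 2-char body is a placeholder.
ghosts = [
    ("", ""),
    ("knowledge/dev", ""),
    ("knowledge/dev/blueprints", ""),
    ("knowledge/dev/laws", ""),
    ("test_empty.md.tmpl", "xx"),
]
print(all(is_excluded_from_ghosts(doc_id, body) for doc_id, body in ghosts))
```

The check is read-only on its inputs, which is consistent with the constraint that only audit reporting logic changes.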
§G. Unified KB/IU Search Contract
Already documented in design/p3d-vector-search-unified-contract-ssot.md (2026-05-11). This hardening pack formalizes the contract in code:
- _apply_path_title_boost() accepts collection_name parameter (default production_documents)
- Same function serves future IU collection; no fork
- IU payload adds: logical_unit_id, unit_version_id, unit_kind, canonical_address, content_hash
- IU chunking never crosses unit/version boundary (enforced at chunk time, not search time)
- Canary test set is extensible — add IU test cases when IU vector exists
Not implemented now: IU vector collection, IU chunking, IU payload schema. Only code structure prepared.
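For orientation only, the chunk-time boundary rule might eventually be expressed like this. Nothing here is implemented or specified: `chunk_units`, the `text` field, the string-typed ids, and the fixed-size slicing are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List


def chunk_units(units: Iterable[Dict[str, str]], max_chars: int = 500) -> List[Dict[str, str]]:
    """Split each unit on its own so no chunk spans two units or versions."""
    chunks = []
    for unit in units:
        text = unit["text"]
        # Slicing restarts for every unit, so leftover text is never merged
        # with the next unit's text.
        for start in range(0, len(text), max_chars):
            chunks.append({
                "logical_unit_id": unit["logical_unit_id"],
                "unit_version_id": unit["unit_version_id"],
                "text": text[start:start + max_chars],
            })
    return chunks


# Hypothetical units with made-up ids.
units = [
    {"logical_unit_id": "u1", "unit_version_id": "v1", "text": "aaaaa"},
    {"logical_unit_id": "u2", "unit_version_id": "v1", "text": "bbb"},
]
chunks = chunk_units(units, max_chars=4)
print([(c["logical_unit_id"], c["text"]) for c in chunks])
```

The short tail of the first unit stays a separate chunk rather than being padded with text from the second unit, which is the property the contract asks for.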
§H. Non-Goals
- ❌ IU vector implementation
- ❌ Qdrant collection replacement
- ❌ Bulk reindex
- ❌ TAC/IU migration
- ❌ Pack 1 resume
- ❌ New embedding model
- ❌ BM25/ElasticSearch
- ❌ Full rewrite of Agent Data search
§I. Rollback Plan
| Component | Rollback |
|---|---|
| Compose build: | Remove build: line → back to manual image |
| Recency tie-break | Set SEARCH_RECENCY_TIEBREAK=false or revert code |
| Audit cleanup | Revert audit logic; set test template back to pending |
| Canary script | Delete script from /opt/incomex/dot/bin/ |
| Image rebuild | docker compose build from backup branch |
§J. Acceptance Criteria
| # | Criterion | Verification |
|---|---|---|
| AC-1 | docker compose build agent-data succeeds from host repo | Build log |
| AC-2 | docker compose up -d --force-recreate agent-data → search boost still works | Run canary after recreate |
| AC-3 | Canary script runs T1-T8, all PASS | Script output |
| AC-4 | Recency tie-break: 2 docs same boost → newer ranks first | Construct test case |
| AC-5 | Recency tie-break: pure semantic query → no recency effect | T4/T5 unchanged |
| AC-6 | Audit sync_status = healthy or meaningful warning (not noisy) | /kb/audit-sync |
| AC-7 | Ghost count excludes empty/folder/short docs | Audit output |
| AC-8 | Qdrant point count unchanged | Before = after |
| AC-9 | Health = healthy | /health |
| AC-10 | _apply_path_title_boost() accepts collection_name param | Code inspect |