GPT Review — P3D Vector/Search Freshness Audit + Stop Pack 1
Date: 2026-05-11
Reviewer: GPT-5.5 Thinking / Incomex AI Council (Hội đồng AI)
Reviewed:
- knowledge/dev/laws/dieu44-trien-khai/reports/p3d-vector-search-freshness-audit-report.md
- Opus supplemental investigation on TAC/IU parallel systems
Verdict
STOP P3D Pack 1 continuation temporarily.
The production KB vector/search layer is a foundation dependency for the text-as-code build process. It must be stabilized before continuing further TAC↔IU work.
Main finding
The currently observed search problem is caused neither by vector freshness nor by missing vectors for the recent P3D documents.
Agent evidence:
- Five recent target documents exist in KB/PG.
- All five have vector_status=ready.
- All five have Qdrant points.
- All five are retrievable in the top 20 results.
- Two near-exact targets are not rank 1 because older, semantically similar documents outrank them.
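The rank check behind these findings can be expressed as a small helper. This is a minimal sketch that assumes search results arrive as an ordered list of document IDs. The names target_rank and classify, and the example IDs, are hypothetical. They are not taken from the audit tooling.

```python
from typing import Optional, Sequence


def target_rank(result_ids: Sequence[str], target_id: str, top_k: int = 20) -> Optional[int]:
    """Return the 1-based rank of target_id within the first top_k results, or None."""
    for position, doc_id in enumerate(result_ids[:top_k], start=1):
        if doc_id == target_id:
            return position
    return None


def classify(rank: Optional[int]) -> str:
    """Map a rank onto the outcome categories used in the audit findings."""
    if rank is None:
        return "not_in_top_k"
    if rank == 1:
        return "rank_1"
    return "in_top_k_not_rank_1"


# Hypothetical result list: an older, semantically similar report outranks the target.
results = ["p3d-older-similar-report.md", "p3d-pack1-readonly-inventory-prompt.md", "unrelated.md"]
rank = target_rank(results, "p3d-pack1-readonly-inventory-prompt.md")
print(rank, classify(rank))  # prints: 2 in_top_k_not_rank_1
```

In this framing, "retrievable in top 20" and "not rank 1" are separate outcomes. A freshness problem would show up as not_in_top_k. A ranking problem shows up as in_top_k_not_rank_1.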
Confirmed root cause:
SEARCH_RANKING_NO_PATH_TITLE_BOOST_NOT_VECTOR_FRESHNESS
Confidence: high.
Is TAC technical debt causing this issue?
Direct cause: No
The unreliable searchKnowledge behavior originates in the legacy KB vector/search layer:
- Qdrant collection: production_documents.
- Legacy payload: document_id, content, metadata, and chunk fields.
- Normal search path: Qdrant vector search first.
- PG keyword fallback runs only when Qdrant returns no hits or errors.
- No path/title/document_id boost in the normal Qdrant path.
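The normal search path listed above reduces to the control flow below. This is a simplified sketch. It assumes the vector and keyword backends are injected as callables that return lists of hit dictionaries. The names search_knowledge_legacy, fake_qdrant, fake_pg and failing_qdrant are hypothetical and are used only for illustration. They are not the real searchKnowledge internals.

```python
from typing import Callable, Dict, List

Hit = Dict[str, object]


def search_knowledge_legacy(
    query: str,
    qdrant_search: Callable[[str], List[Hit]],
    pg_keyword_search: Callable[[str], List[Hit]],
) -> List[Hit]:
    """Qdrant vector search first; PG keyword fallback only on empty results or errors."""
    try:
        hits = qdrant_search(query)
    except Exception:
        # Qdrant error: fall back to Postgres keyword search.
        return pg_keyword_search(query)
    if not hits:
        # No vector hits: fall back to Postgres keyword search.
        return pg_keyword_search(query)
    # Vector hits exist: they are returned in raw vector-score order with no
    # path/title/document_id boost, so an exact filename match can be
    # outranked by an older, semantically similar document.
    return hits


def fake_qdrant(query: str) -> List[Hit]:
    # Hypothetical vector hits: the older, similar report scores higher.
    return [
        {"document_id": "older-similar-report.md", "score": 0.91},
        {"document_id": "exact-target.md", "score": 0.88},
    ]


def fake_pg(query: str) -> List[Hit]:
    # Hypothetical keyword hit that would have found the exact target.
    return [{"document_id": "exact-target.md", "score": 1.0}]


def failing_qdrant(query: str) -> List[Hit]:
    raise ConnectionError("qdrant unavailable")


print([h["document_id"] for h in search_knowledge_legacy("exact-target.md", fake_qdrant, fake_pg)])
```

The fallback is gated on an empty or failed vector call. As a result, the keyword path never gets a chance to promote an exact filename match whenever Qdrant returns any hits at all. This is consistent with the ranking failure described in the audit.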
This exists independently of TAC/IU reconciliation.
Indirect future risk: Yes
Opus is correct that TAC and IU are currently parallel systems:
- TAC has 86 real law units and render proof.
- Native IU has gateway/edit/save machinery and pilot rows.
- Reconciliation is not done yet.
This does not explain today’s exact search-ranking failure, but it is a real architecture debt that will make future vector/search worse if ignored:
- duplicate KB/TAC/IU representations can create duplicate search results;
- legacy vectors have no unit_id, canonical_address, or unit_version_id;
- IU vector is not implemented yet;
- without TAC↔IU reconciliation, future unit-level vectors cannot be canonical.
Therefore:
- Fix legacy KB search ranking first.
- Then continue TAC↔IU reconciliation.
- Do not implement IU vector until the canonical unit boundary is settled.
Production risk assessment
The current vector system is operational but not reliable enough for production-grade agent workflows:
- Qdrant/Postgres/OpenAI are OK.
- Recent docs are vectorized.
- sync_status=warning still exists.
- Search ranking is noisy for exact path/title/document lookups.
- The health warning is mostly driven by the ratio threshold and noisy ghost accounting, not by missing recent P3D docs.
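One way to stop the ratio threshold from being skewed by documents that were never expected to have vectors is to filter them out before computing the ratio. This is a minimal sketch that assumes each record exposes a kind, a body and a vector_status. The DocRecord type, the kind field, the 200-character cutoff and the 0.05 threshold are hypothetical values, not the audit's real configuration. Ghost-point accounting is not modeled here.

```python
from dataclasses import dataclass
from typing import List

MIN_BODY_CHARS = 200  # hypothetical cutoff below which a doc counts as "short"
WARN_RATIO = 0.05     # hypothetical missing-vector ratio that triggers a warning


@dataclass
class DocRecord:
    document_id: str
    kind: str           # hypothetical: "file" or "folder"
    body: str
    vector_status: str  # e.g. "ready"


def is_vectorizable(doc: DocRecord) -> bool:
    """Folders, empty docs and very short docs are not expected to have vectors."""
    return doc.kind != "folder" and len(doc.body.strip()) >= MIN_BODY_CHARS


def naive_missing_ratio(docs: List[DocRecord]) -> float:
    """Share of all records not 'ready', including ones that should never be vectorized."""
    if not docs:
        return 0.0
    return sum(1 for d in docs if d.vector_status != "ready") / len(docs)


def missing_vector_ratio(docs: List[DocRecord]) -> float:
    """Share of vectorizable records whose vector_status is not 'ready'."""
    eligible = [d for d in docs if is_vectorizable(d)]
    if not eligible:
        return 0.0
    return sum(1 for d in eligible if d.vector_status != "ready") / len(eligible)


def sync_status(docs: List[DocRecord]) -> str:
    return "warning" if missing_vector_ratio(docs) > WARN_RATIO else "ok"


long_body = "x" * 300
docs = [
    DocRecord("reports/", "folder", "", "pending"),
    DocRecord("empty.md", "file", "", "pending"),
    DocRecord("p3d-report.md", "file", long_body, "ready"),
    DocRecord("p3d-prompt.md", "file", long_body, "ready"),
]
print(naive_missing_ratio(docs), missing_vector_ratio(docs), sync_status(docs))  # prints: 0.5 0.0 ok
```

With the filter in place, a genuinely missing recent document still raises the warning. Folder and empty placeholders no longer do.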
This is acceptable for ad-hoc semantic recall, but not acceptable as a production planning dependency where agents must reliably find exact current reports/prompts.
Required pause
Pause:
- Pack 1 implementation.
- Pack 1 DDL planning beyond read-only inventory.
- IU vector design/implementation.
- Any migration based on search assumptions.
Allowed:
- Read-only Pack 1 inventory if already queued.
- Vector/search fix design and review.
- Production-safe search behavior patch after review.
Next required pack
Create and execute a production-safe pack:
P3D_VECTOR_SEARCH_HYBRID_PATH_TITLE_BOOST_PACK
Goal:
- Make exact path/title/document_id queries deterministic or near-deterministic.
- Preserve legacy Qdrant collection and payload.
- Avoid reindex as first response.
- Avoid touching IU vector.
- Keep changes small, reversible, and tested.
Pack scope recommendation
Design first, then implementation only after GPT/User review.
Required capabilities:
- Exact document_id / path slug boost.
- Exact or substring boost on metadata.title.
- Optional keyword-overlap boost using title + path + body snippet.
- Hybrid rerank over Qdrant results and/or PG keyword candidates.
- Deterministic behavior for exact filenames such as p3d-pack1-readonly-inventory-prompt.md.
- Search output should preserve the original Qdrant score and include boost reason/debug fields if possible.
- The audit health warning should not confuse empty/folder/short docs with genuinely missing recent docs.
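The boost capabilities above can be combined into a rerank pass over candidates that have already been retrieved. This is a minimal sketch under several assumptions:
- Each candidate carries a document_id, a path, a metadata title, a body snippet and the original Qdrant score.
- Qdrant and PG keyword candidates are merged upstream, with PG-only candidates given a qdrant_score of 0.0.

The Candidate type, the rerank function and all boost weights are hypothetical. They are not values agreed in this review and would need tuning and tests inside the pack.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set

# Hypothetical boost weights; real values should come out of the pack's review and tests.
EXACT_ID_OR_PATH_BOOST = 10.0
EXACT_TITLE_BOOST = 5.0
TITLE_SUBSTRING_BOOST = 2.0
KEYWORD_OVERLAP_WEIGHT = 0.5


@dataclass
class Candidate:
    document_id: str
    path: str
    title: str
    snippet: str
    qdrant_score: float  # original vector score, never modified
    final_score: float = 0.0
    boost_reasons: List[str] = field(default_factory=list)


def _tokens(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rerank(query: str, candidates: List[Candidate]) -> List[Candidate]:
    """Annotate candidates in place with boosts and return a new list sorted by final_score."""
    q = query.strip().lower()
    if not q:
        return list(candidates)  # nothing to boost on; keep incoming order
    q_tokens = _tokens(q)
    for c in candidates:
        boost = 0.0
        reasons: List[str] = []
        slug = c.path.lower().rsplit("/", 1)[-1]
        if q in (c.document_id.lower(), c.path.lower(), slug):
            boost += EXACT_ID_OR_PATH_BOOST
            reasons.append("exact_id_or_path")
        title = c.title.lower()
        if title and q == title:
            boost += EXACT_TITLE_BOOST
            reasons.append("exact_title")
        elif title and q in title:
            boost += TITLE_SUBSTRING_BOOST
            reasons.append("title_substring")
        if q_tokens:
            overlap = len(q_tokens & _tokens(f"{c.title} {c.path} {c.snippet}")) / len(q_tokens)
            if overlap > 0:
                boost += KEYWORD_OVERLAP_WEIGHT * overlap
                reasons.append(f"keyword_overlap={overlap:.2f}")
        c.final_score = c.qdrant_score + boost
        c.boost_reasons = reasons
    # sorted() is stable even with reverse=True, so ties keep the incoming order.
    return sorted(candidates, key=lambda c: c.final_score, reverse=True)


candidates = [
    Candidate("doc-old", "knowledge/dev/reports/p3d-pack1-planning-notes.md",
              "P3D Pack 1 planning notes", "pack1 readonly inventory discussion", 0.91),
    Candidate("doc-new", "knowledge/dev/prompts/p3d-pack1-readonly-inventory-prompt.md",
              "P3D Pack 1 read-only inventory prompt", "prompt body", 0.88),
]
ranked = rerank("p3d-pack1-readonly-inventory-prompt.md", candidates)
print([c.document_id for c in ranked])  # prints: ['doc-new', 'doc-old']
```

The boosts are additive on top of the untouched qdrant_score, so the original vector score and the boost reasons stay visible for debugging. An exact path or ID match dominates, which makes exact filename lookups near-deterministic. Queries without an exact match are still ordered largely by vector score. No vectors are written, which keeps the change inside the hard boundaries.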
Hard boundaries
- No Qdrant collection replacement.
- No bulk reindex as first step.
- No auto-heal.
- No deleting or upserting of points unless the implementation strictly requires it; prefer a search-code-only change with no vector writes.
- No IU vector implementation.
- No TAC↔IU migration.
- No schema migration unless explicitly approved in a later pack.
- No production deploy without tests and rollback.
Status
vector_audit_review=PASS_ACCEPTED
root_cause=SEARCH_RANKING_NO_PATH_TITLE_BOOST_NOT_VECTOR_FRESHNESS
pack1_status=PAUSED_FOR_VECTOR_SEARCH_RELIABILITY
next_required_pack=P3D_VECTOR_SEARCH_HYBRID_PATH_TITLE_BOOST_PACK
implementation_allowed=false_until_prompt_review