KB-648F

GPT Review — P3D Vector/Search Freshness Audit + Stop Pack 1

Revision 1

gpt-review vector search freshness p3d stop-pack1 2026-05-11

Date: 2026-05-11
Reviewer: GPT-5.5 Thinking / Incomex AI Council
Reviewed:

knowledge/dev/laws/dieu44-trien-khai/reports/p3d-vector-search-freshness-audit-report.md

Opus supplemental investigation on TAC/IU parallel systems

Verdict

STOP P3D Pack 1 continuation temporarily.

The production KB vector/search layer is a foundation dependency for the text-as-code build process. It must be stabilized before continuing further TAC↔IU work.

Main finding

The currently observed search problem is caused neither by stale vectors nor by missing vectors for the recent P3D documents.

Agent evidence:

Five recent target documents exist in KB/PG.
All five have vector_status=ready.
All five have Qdrant points.
All five are retrievable in top 20.
Two near-exact targets are not rank 1 because older, semantically similar documents outrank them.
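The rank observation above can be made measurable with a small helper. This is a minimal sketch, assuming search results arrive as an ordered list of dicts with a `document_id` key; the function name `rank_of` and the sample document IDs are hypothetical and not taken from the audit.

```python
from typing import Mapping, Optional, Sequence


def rank_of(
    results: Sequence[Mapping[str, object]],
    target_id: str,
    top_k: int = 20,
) -> Optional[int]:
    """Return the 1-based rank of target_id within the first top_k results, or None."""
    for position, hit in enumerate(results[:top_k], start=1):
        if hit.get("document_id") == target_id:
            return position
    return None


# Hypothetical result list: an older, semantically similar document
# scores slightly higher than the intended target.
sample = [
    {"document_id": "older-similar-report.md", "score": 0.83},
    {"document_id": "target-report.md", "score": 0.81},
]
print(rank_of(sample, "target-report.md"))
```

A result of `None` would indicate a retrieval problem, while any rank greater than 1 points to ranking, which is the case the audit describes.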

Confirmed root cause:

SEARCH_RANKING_NO_PATH_TITLE_BOOST_NOT_VECTOR_FRESHNESS

Confidence: high.

Is TAC technical debt causing this issue?

Direct cause: No

The current unreliable searchKnowledge behavior is in the legacy KB vector/search layer:

Qdrant collection: production_documents.
Legacy payload: document_id, content, metadata, chunk fields.
Normal search path: Qdrant vector first.
PG keyword fallback only runs when Qdrant has no hits or errors.
No path/title/document_id boost in normal Qdrant path.

This exists independently of TAC/IU reconciliation.
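The control flow described above can be sketched as follows. This is an illustration of the described behavior only, not the actual searchKnowledge code: `legacy_search`, `vector_search`, and `keyword_search` are hypothetical names, and the two backends are passed in as plain callables so the sketch runs without Qdrant or Postgres.

```python
from typing import Any, Callable, Dict, List

Hit = Dict[str, Any]


def legacy_search(
    query: str,
    vector_search: Callable[[str], List[Hit]],
    keyword_search: Callable[[str], List[Hit]],
) -> List[Hit]:
    """Vector first; keyword fallback only on error or zero hits; no boost."""
    try:
        hits = vector_search(query)
    except Exception:
        # Fallback case 1: the vector backend raised an error.
        return keyword_search(query)
    if not hits:
        # Fallback case 2: the vector backend returned nothing.
        return keyword_search(query)
    # Normal path: hits come back in pure vector-score order, so an exact
    # path, title, or document_id match gets no advantage.
    return hits


# Stub backends (hypothetical data) showing the failure mode.
def fake_vector(query: str) -> List[Hit]:
    return [
        {"document_id": "older-similar-report.md", "score": 0.83},
        {"document_id": query, "score": 0.81},
    ]


def fake_keyword(query: str) -> List[Hit]:
    return [{"document_id": query, "score": 1.0}]


top = legacy_search("p3d-pack1-readonly-inventory-prompt.md", fake_vector, fake_keyword)[0]
print(top["document_id"])
```

Because the vector backend returned hits, the keyword path never runs, and the exact filename stays at rank 2 behind the older document.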

Indirect future risk: Yes

Opus is correct that TAC and IU are currently parallel systems:

TAC has 86 real law units and render proof.
Native IU has gateway/edit/save machinery and pilot rows.
Reconciliation is not done yet.

This does not explain today’s exact search-ranking failure, but it is a real architecture debt that will make future vector/search worse if ignored:

duplicate KB/TAC/IU representations can create duplicate search results;
legacy vector has no unit_id, canonical_address, or unit_version_id;
IU vector is not implemented yet;
without TAC↔IU reconciliation, future unit-level vector cannot be canonical.

Therefore:

Fix legacy KB search ranking first.
Then continue TAC↔IU reconciliation.
Do not implement IU vector until the canonical unit boundary is settled.

Production risk assessment

The current vector system is operational but not reliable enough for a production-grade agent workflow:

Qdrant/Postgres/OpenAI are OK.
Recent docs are vectorized.
sync_status=warning still exists.
Search ranking is noisy for exact path/title/document lookup.
The health warning stems mostly from the ratio threshold and noisy ghost accounting, not from missing recent P3D docs.

This is acceptable for ad-hoc semantic recall, but not acceptable as a production planning dependency where agents must reliably find exact current reports/prompts.

Required pause

Pause:

Pack 1 implementation.
Pack 1 DDL planning beyond read-only inventory.
IU vector design/implementation.
Any migration based on search assumptions.

Allowed:

Read-only Pack 1 inventory if already queued.
Vector/search fix design and review.
Production-safe search behavior patch after review.

Next required pack

Create and execute a production-safe pack:

P3D_VECTOR_SEARCH_HYBRID_PATH_TITLE_BOOST_PACK

Goal:

Make exact path/title/document_id queries deterministic or near-deterministic.
Preserve legacy Qdrant collection and payload.
Avoid reindex as first response.
Avoid touching IU vector.
Keep changes small, reversible, and tested.

Pack scope recommendation

Design first, then implementation only after GPT/User review.

Required capabilities:

Exact document_id / path slug boost.
metadata.title exact/substring boost.
Optional keyword overlap boost using title + path + body snippet.
Hybrid rerank over Qdrant results and/or PG keyword candidates.
Deterministic behavior for exact filenames such as p3d-pack1-readonly-inventory-prompt.md.
Search output should preserve original Qdrant score and include boost reason/debug fields if possible.
Audit health warning should not confuse empty/folder/short docs with real missing recent docs.
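The reranking capabilities listed above could look roughly like the sketch below. It is one possible shape under stated assumptions, not the reviewed design: the boost weights, the field names `qdrant_score`, `final_score`, and `boost_reasons`, and the sample paths in the usage are all hypothetical. It assumes each hit carries `document_id`, `score`, and `metadata.title`, and it leaves out the optional body-snippet overlap.

```python
import re
from typing import Any, Dict, List, Set

# Hypothetical weights; real values would need tuning and review.
BOOST_EXACT_ID = 1.0
BOOST_TITLE = 0.5
BOOST_KEYWORD = 0.2


def _terms(text: str) -> Set[str]:
    """Lowercase alphanumeric tokens of a string."""
    return {t for t in re.split(r"[^a-z0-9]+", text.lower()) if t}


def rerank(query: str, hits: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Rerank vector hits with path/title/keyword boosts, keeping the original score."""
    q = query.strip().lower()
    q_terms = _terms(q)
    reranked = []
    for hit in hits:
        doc_id = str(hit.get("document_id", "")).lower()
        title = str((hit.get("metadata") or {}).get("title", "")).lower()
        base = float(hit.get("score", 0.0))
        boost, reasons = 0.0, []
        # Exact document_id, or query equal to the last path segment.
        if q and q in (doc_id, doc_id.rsplit("/", 1)[-1]):
            boost += BOOST_EXACT_ID
            reasons.append("exact_document_id_or_path_slug")
        # Title exact or substring match.
        if q and title and q in title:
            boost += BOOST_TITLE
            reasons.append("title_match")
        # Fraction of query terms found in path + title.
        if q_terms:
            overlap = len(q_terms & _terms(doc_id + " " + title)) / len(q_terms)
            if overlap > 0:
                boost += BOOST_KEYWORD * overlap
                reasons.append("keyword_overlap")
        reranked.append(
            {**hit, "qdrant_score": base, "final_score": base + boost, "boost_reasons": reasons}
        )
    # Highest final score first; ties broken by document_id so output is stable.
    reranked.sort(key=lambda h: (-h["final_score"], str(h.get("document_id", ""))))
    return reranked


hits = [
    {"document_id": "reports/older-inventory-report.md", "score": 0.83,
     "metadata": {"title": "Older inventory report"}},
    {"document_id": "reports/p3d-pack1-readonly-inventory-prompt.md", "score": 0.81,
     "metadata": {"title": "P3D Pack 1 read-only inventory prompt"}},
]
for h in rerank("p3d-pack1-readonly-inventory-prompt.md", hits):
    print(h["document_id"], h["qdrant_score"], h["boost_reasons"])
```

The rerank runs purely on results already returned, so it needs no vector writes or reindex, and the untouched `qdrant_score` keeps the boost auditable and easy to switch off.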

Hard boundaries

No Qdrant collection replacement.
No bulk reindex as first step.
No auto-heal.
No deleting/upserting points unless the search-code change strictly requires it; prefer a search-code-only change with no vector writes.
No IU vector implementation.
No TAC↔IU migration.
No schema migration unless explicitly approved in a later pack.
No production deploy without tests and rollback.

Status

vector_audit_review=PASS_ACCEPTED
root_cause=SEARCH_RANKING_NO_PATH_TITLE_BOOST_NOT_VECTOR_FRESHNESS
pack1_status=PAUSED_FOR_VECTOR_SEARCH_RELIABILITY
next_required_pack=P3D_VECTOR_SEARCH_HYBRID_PATH_TITLE_BOOST_PACK
implementation_allowed=false_until_prompt_review