KB-648F

GPT Review — P3D Vector/Search Freshness Audit + Stop Pack 1


Date: 2026-05-11
Reviewer: GPT-5.5 Thinking / Incomex AI Council (Hội đồng AI)
Reviewed:

  • knowledge/dev/laws/dieu44-trien-khai/reports/p3d-vector-search-freshness-audit-report.md
  • Opus supplemental investigation on TAC/IU parallel systems

Verdict

STOP P3D Pack 1 continuation temporarily.

The production KB vector/search layer is a foundation dependency for the text-as-code build process. It must be stabilized before continuing further TAC↔IU work.

Main finding

The observed search problem is caused neither by stale vectors nor by missing vectors for the recent P3D documents.

Agent evidence:

  • Five recent target documents exist in KB/PG.
  • All five have vector_status=ready.
  • All five have Qdrant points.
  • All five are retrievable in top 20.
  • Two near-exact-match targets are not ranked first because older, semantically similar documents outrank them.
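
The retrieval evidence above can be expressed as a small rank check. This is a minimal sketch, not the audit agent's actual code: the function names, the status labels, and the default top-k of 20 (taken from the evidence list) are illustrative assumptions.

```python
from typing import Optional, Sequence


def rank_of(target_id: str, result_ids: Sequence[str]) -> Optional[int]:
    """Return the 1-based rank of target_id in result_ids, or None if absent."""
    for position, doc_id in enumerate(result_ids, start=1):
        if doc_id == target_id:
            return position
    return None


def classify(target_id: str, result_ids: Sequence[str], top_k: int = 20) -> str:
    """Label one audit target (labels are hypothetical, for illustration only)."""
    rank = rank_of(target_id, result_ids[:top_k])
    if rank is None:
        return "NOT_RETRIEVED"            # would point to a freshness or indexing gap
    if rank == 1:
        return "RANK_1"
    return "RETRIEVED_BUT_OUTRANKED"      # vector exists; ranking is the problem
```

A target that lands in the third bucket has a vector and is retrievable, which is why the finding points at ranking rather than freshness.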

Confirmed root cause:

SEARCH_RANKING_NO_PATH_TITLE_BOOST_NOT_VECTOR_FRESHNESS

Confidence: high.

Is TAC technical debt causing this issue?

Direct cause: No

The currently unreliable searchKnowledge behavior originates in the legacy KB vector/search layer:

  • Qdrant collection: production_documents.
  • Legacy payload: document_id, content, metadata, chunk fields.
  • Normal search path: Qdrant vector first.
  • PG keyword fallback only runs when Qdrant has no hits or errors.
  • No path/title/document_id boost in normal Qdrant path.

This exists independently of TAC/IU reconciliation.
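
The control flow described above can be sketched roughly as follows. This is an assumption-laden illustration, not the production searchKnowledge code: `legacy_search`, `vector_search`, and `keyword_search` are hypothetical names, and the two backends are passed in as plain callables so the sketch stays self-contained.

```python
from typing import Callable, Dict, List

Hit = Dict[str, object]


def legacy_search(
    query: str,
    vector_search: Callable[[str], List[Hit]],   # stands in for the Qdrant query
    keyword_search: Callable[[str], List[Hit]],  # stands in for the PG keyword query
) -> List[Hit]:
    """Vector first; keyword fallback only when the vector path is empty or fails."""
    try:
        hits = vector_search(query)
    except Exception:
        hits = []  # a vector error is treated like "no hits"
    if hits:
        # Returned in vector order: no path/title/document_id boost is applied.
        return hits
    return keyword_search(query)
```

Under this flow, any non-empty vector result suppresses the keyword path entirely, so an exact filename query never benefits from a literal match.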

Indirect future risk: Yes

Opus is correct that TAC and IU are currently parallel systems:

  • TAC has 86 real law units and render proof.
  • Native IU has gateway/edit/save machinery and pilot rows.
  • Reconciliation is not done yet.

This does not explain today’s search-ranking failure, but it is real architecture debt that will degrade future vector/search quality if ignored:

  • duplicate KB/TAC/IU representations can create duplicate search results;
  • legacy vector has no unit_id, canonical_address, or unit_version_id;
  • IU vector is not implemented yet;
  • without TAC↔IU reconciliation, future unit-level vector cannot be canonical.

Therefore:

  • Fix legacy KB search ranking first.
  • Then continue TAC↔IU reconciliation.
  • Do not implement IU vector until the canonical unit boundary is settled.

Production risk assessment

The current vector system is operating, but it is not sufficiently reliable for a production-grade agent workflow:

  • Qdrant/Postgres/OpenAI are OK.
  • Recent docs are vectorized.
  • sync_status=warning still exists.
  • Search ranking is noisy for exact path/title/document lookup.
  • Health warning is mostly ratio-threshold + noisy ghost accounting, not missing recent P3D docs.

This is acceptable for ad-hoc semantic recall, but not acceptable as a production planning dependency where agents must reliably find exact current reports/prompts.
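
One way to make the health warning less noisy is to separate documents that are expected to have no vector from documents that are genuinely missing one. The sketch below is only an illustration under stated assumptions: the `DocRecord` fields, the `split_missing` helper, and the 50-character threshold are hypothetical and do not describe the existing health check.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class DocRecord:
    document_id: str
    is_folder: bool    # hypothetical flag: the entry is a folder, not a document
    content: str
    has_vector: bool   # hypothetical flag: at least one vector point exists


def split_missing(
    docs: Iterable[DocRecord], min_chars: int = 50
) -> Tuple[List[str], List[str]]:
    """Return (expected_unindexed, real_missing) ids for docs without vectors."""
    expected_unindexed: List[str] = []
    real_missing: List[str] = []
    for doc in docs:
        if doc.has_vector:
            continue
        # Folders, empty docs, and very short docs are accounting noise.
        if doc.is_folder or len(doc.content.strip()) < min_chars:
            expected_unindexed.append(doc.document_id)
        else:
            real_missing.append(doc.document_id)
    return expected_unindexed, real_missing
```

A warning driven only by the second list would reflect real gaps rather than ratio noise.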

Required pause

Pause:

  • Pack 1 implementation.
  • Pack 1 DDL planning beyond read-only inventory.
  • IU vector design/implementation.
  • Any migration based on search assumptions.

Allowed:

  • Read-only Pack 1 inventory if already queued.
  • Vector/search fix design and review.
  • Production-safe search behavior patch after review.

Next required pack

Create and execute a production-safe pack:

P3D_VECTOR_SEARCH_HYBRID_PATH_TITLE_BOOST_PACK

Goal:

  • Make exact path/title/document_id queries deterministic or near-deterministic.
  • Preserve legacy Qdrant collection and payload.
  • Avoid reindex as first response.
  • Avoid touching IU vector.
  • Keep changes small, reversible, and tested.

Pack scope recommendation

Design first, then implementation only after GPT/User review.

Required capabilities:

  1. Exact document_id / path slug boost.
  2. metadata.title exact/substring boost.
  3. Optional keyword overlap boost using title + path + body snippet.
  4. Hybrid rerank over Qdrant results and/or PG keyword candidates.
  5. Deterministic behavior for exact filenames such as p3d-pack1-readonly-inventory-prompt.md.
  6. Search output should preserve original Qdrant score and include boost reason/debug fields if possible.
  7. Audit health warning should not confuse empty/folder/short docs with real missing recent docs.
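
Capabilities 1, 2, 4, and 6 could be prototyped as a pure rerank step over candidates that have already been retrieved, which requires no vector writes. The following is a minimal sketch under stated assumptions, not the pack design: the `Candidate` type, the `rerank` and `_add` helpers, the reason labels, and the boost weights are all hypothetical, and the additive weights assume vector scores roughly in the 0..1 range.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Candidate:
    document_id: str       # legacy payload field, assumed here to hold the path
    title: str             # assumed to come from metadata.title
    qdrant_score: float    # original vector score, never overwritten
    boost: float = 0.0
    boost_reasons: List[str] = field(default_factory=list)

    @property
    def final_score(self) -> float:
        return self.qdrant_score + self.boost


def _add(candidate: Candidate, amount: float, reason: str) -> None:
    candidate.boost += amount
    candidate.boost_reasons.append(reason)


def rerank(query: str, candidates: List[Candidate]) -> List[Candidate]:
    """Apply exact-match boosts, then sort with a deterministic tie-break."""
    q = query.strip().lower()
    for c in candidates:
        c.boost, c.boost_reasons = 0.0, []  # reset so reranking is repeatable
        if not q:
            continue
        doc_id = c.document_id.strip().lower()
        slug = doc_id.rsplit("/", 1)[-1]    # last path segment, e.g. the filename
        title = c.title.strip().lower()
        if q == doc_id:
            _add(c, 1.0, "document_id_exact")
        elif q == slug:
            _add(c, 0.8, "path_slug_exact")
        if title and q == title:
            _add(c, 0.6, "title_exact")
        elif title and q in title:
            _add(c, 0.3, "title_substring")
    # Ties on score fall back to document_id so the order is stable across runs.
    return sorted(candidates, key=lambda c: (-c.final_score, c.document_id))
```

With this shape, a query for the exact filename p3d-pack1-readonly-inventory-prompt.md would lift the matching candidate above an older document with a higher vector score, while `qdrant_score` and `boost_reasons` remain available as debug fields.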

Hard boundaries

  • No Qdrant collection replacement.
  • No bulk reindex as first step.
  • No auto-heal.
  • No deleting or upserting of points unless the search-code-only change strictly requires it; prefer no vector writes at all.
  • No IU vector implementation.
  • No TAC↔IU migration.
  • No schema migration unless explicitly approved in a later pack.
  • No production deploy without tests and rollback.

Status

vector_audit_review=PASS_ACCEPTED
root_cause=SEARCH_RANKING_NO_PATH_TITLE_BOOST_NOT_VECTOR_FRESHNESS
pack1_status=PAUSED_FOR_VECTOR_SEARCH_RELIABILITY
next_required_pack=P3D_VECTOR_SEARCH_HYBRID_PATH_TITLE_BOOST_PACK
implementation_allowed=false_until_prompt_review
knowledge/dev/laws/dieu44-trien-khai/reviews/gpt-review-p3d-vector-search-freshness-audit-and-stop-pack1-2026-05-11.md