KB-5219

GPT Review — INV Search/Vector Hygiene Context-Pack Prompt

Revision 1
Tags: gpt-review, vector-hygiene, context-pack, search-pollution, prompt-approved-with-patches

Date: 2026-05-05
Reviewer: GPT-5.5 Thinking / Incomex AI Council (Hội đồng AI)
Reviewed: knowledge/dev/laws/dieu44-trien-khai/prompts/inv-search-vector-hygiene-context-pack-prompt.md rev1

Verdict

Direction: PASS. Safe to dispatch after a small patch.

The prompt is correctly read-only and asks the right core questions:

  • count context-pack footprint;
  • measure search pollution;
  • inspect metadata/filterability;
  • read Đ43 lifecycle;
  • compare industry-aligned options;
  • recommend staged handling.

This is the correct next step. Do not clean, delete, or deindex anything before the evidence is in.

Required small patches before dispatch

P1 — Add KB/history search for prior search/vector design

Before treating this as new work, the Agent should search for and read existing docs on:

  • vector
  • Qdrant
  • search
  • retrieval
  • context-pack
  • embedding
  • hot/cold
  • TTL
  • dedup
  • rerank

Add a section:

Search KB for existing search/vector/retrieval design docs and list which are canonical, draft, or absent.
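A minimal sketch of the P1 inventory step, assuming KB docs can be exported as records with `path`, `title`, and `status` (`canonical` or `draft`). Those field names and the sample records are hypothetical, and the real KB search would likely be full-text rather than the naive substring match used here:

```python
# Search terms from P1; matching is a naive case-insensitive substring test.
SEARCH_TERMS = [
    "vector", "Qdrant", "search", "retrieval", "context-pack",
    "embedding", "hot/cold", "TTL", "dedup", "rerank",
]


def prior_design_inventory(docs, terms=SEARCH_TERMS):
    """Map each term to canonical/draft doc paths, or "absent" if nothing matches.

    `docs` is a list of dicts with hypothetical keys "path", "title", "status".
    Any status other than "canonical" (including a missing one) counts as draft.
    """
    report = {}
    for term in terms:
        needle = term.lower()
        hits = [d for d in docs
                if needle in (d["title"] + " " + d["path"]).lower()]
        if not hits:
            report[term] = "absent"
            continue
        report[term] = {
            "canonical": [d["path"] for d in hits if d.get("status") == "canonical"],
            "draft": [d["path"] for d in hits if d.get("status") != "canonical"],
        }
    return report


# Hypothetical sample records, for illustration only.
docs = [
    {"path": "knowledge/dev/design/qdrant-setup.md",
     "title": "Qdrant vector setup", "status": "canonical"},
    {"path": "knowledge/dev/drafts/rerank-notes.md",
     "title": "Rerank notes", "status": "draft"},
]
for term, result in prior_design_inventory(docs).items():
    print(term, result)
```

Substring matching gives false positives for short terms (`TTL` also matches `settle`), so the real run should use the KB's own search or word-boundary matching.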

P2 — Distinguish four storage/index layers

The prompt already names KB/vector/filesystem/PG, which is good. The report should explicitly separate:

  1. Source-of-truth docs — canonical laws/design/process/reports.
  2. Generated snapshots — context-pack builds.
  3. Search index / vector index — retrieval layer.
  4. Runtime context cache — files used by agents/tools.

This matters because the right answer may be “keep files, deindex from hot vector,” not “delete docs.”
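One way to keep the four layers apart in the report is to tag every inventory item with a layer before counting. This is a sketch only: the `store` tags (`kb`, `fs`, `vector`) and the runtime cache prefix are hypothetical, and treating any path with a `context-pack` directory segment as a generated snapshot is an assumption based on the paths named in the prompt:

```python
from collections import Counter
from enum import Enum


class Layer(Enum):
    SOURCE_OF_TRUTH = "source-of-truth docs"
    GENERATED_SNAPSHOT = "generated snapshots"
    SEARCH_INDEX = "search/vector index"
    RUNTIME_CACHE = "runtime context cache"


# Hypothetical location of files used by agents/tools at runtime.
RUNTIME_CACHE_PREFIX = "/tmp/agent-context/"


def classify(store, path):
    """Assign one inventory item to a layer.

    `store` is a hypothetical tag for where the item was found:
    "vector" (search/vector index), "fs" (filesystem), or "kb" (KB docs).
    """
    if store == "vector":
        return Layer.SEARCH_INDEX
    if store == "fs" and path.startswith(RUNTIME_CACHE_PREFIX):
        return Layer.RUNTIME_CACHE
    # Any directory segment named "context-pack" marks a generated build.
    if "context-pack" in path.split("/")[:-1]:
        return Layer.GENERATED_SNAPSHOT
    return Layer.SOURCE_OF_TRUTH


def footprint(items):
    """Count (store, path) items per layer so each layer is reported separately."""
    return Counter(classify(store, path) for store, path in items)


# Hypothetical inventory: the same snapshot appears as a file and as a vector entry.
items = [
    ("kb", "knowledge/dev/laws/dieu44-trien-khai/prompts/inv-search-vector-hygiene-context-pack-prompt.md"),
    ("kb", "knowledge/current-state/context-pack/b-10/state.md"),
    ("vector", "knowledge/current-state/context-pack/b-10/state.md"),
    ("fs", "/tmp/agent-context/state.md"),
]
for layer, count in footprint(items).items():
    print(layer.value, count)
```

Because one context-pack doc can exist both as a snapshot file and as a vector entry, per-layer counts show where the footprint actually sits, which is what separates deindexing from deleting.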

P3 — Add retrieval policy recommendation, not just storage recommendation

Ask the Agent to propose a default retrieval policy:

  • default include/exclude prefixes;
  • when to include context-pack;
  • canonical-first vs snapshot-first;
  • metadata filters to apply;
  • rerank/dedup step if available.
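The policy elements listed above could be captured as a small config object applied to raw search hits. A hedged sketch, assuming each hit carries a score, a path, a hypothetical `canonical` metadata flag, and a hypothetical `content_hash` for dedup; the canonical-first step is a stable re-sort standing in for a real reranker:

```python
from dataclasses import dataclass


@dataclass
class Hit:
    path: str
    score: float
    canonical: bool = False   # hypothetical metadata flag
    content_hash: str = ""    # hypothetical, used for dedup


@dataclass
class RetrievalPolicy:
    exclude_prefixes: tuple = ()          # default-excluded path prefixes
    include_context_pack: bool = False    # only when the caller asks explicitly
    canonical_first: bool = True
    dedup: bool = True


def is_context_pack(path):
    return "context-pack" in path.split("/")[:-1]


def apply_policy(hits, policy):
    """Filter, dedup, and reorder raw hits according to the policy."""
    kept, seen = [], set()
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if any(hit.path.startswith(p) for p in policy.exclude_prefixes):
            continue
        if is_context_pack(hit.path) and not policy.include_context_pack:
            continue
        if policy.dedup and hit.content_hash:
            if hit.content_hash in seen:
                continue  # keep only the highest-scoring copy
            seen.add(hit.content_hash)
        kept.append(hit)
    if policy.canonical_first:
        # Stable sort: canonical hits move up, score order is kept within each group.
        kept.sort(key=lambda h: not h.canonical)
    return kept


# Hypothetical hits for one query.
hits = [
    Hit("knowledge/current-state/context-pack/b-10/state.md", 0.95, False, "h1"),
    Hit("knowledge/dev/laws/dieu43.md", 0.80, True, "h2"),
    Hit("knowledge/dev/reports/r1.md", 0.90, False, "h3"),
    Hit("knowledge/dev/reports/r1-copy.md", 0.85, False, "h3"),
]
print([h.path for h in apply_policy(hits, RetrievalPolicy())])
print([h.path for h in apply_policy(hits, RetrievalPolicy(include_context_pack=True))])
```

With the default policy the snapshot is dropped and the duplicate report collapses to one hit; passing `include_context_pack=True` models the explicit context mode.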

P4 — Add “latest-only” detection

Ask the Agent to identify the latest context-pack build_id and whether knowledge/current-state/context-pack/ is only a README placeholder or a real live pointer.

Questions:

  • Is there a stable live/latest path distinct from historical context-pack/<build_id>/?
  • If not, what metadata can identify latest build?
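A sketch of the two P4 checks, assuming each build exposes a creation timestamp in its metadata (a hypothetical field) and that a live directory holding only `README.md` counts as a placeholder. Picking the latest build by timestamp rather than by `build_id` avoids assuming that build IDs sort lexically:

```python
from datetime import datetime


def latest_build(builds):
    """Return the build_id with the newest creation time, or None if there are none.

    `builds` maps build_id -> ISO-8601 timestamp (hypothetical metadata field).
    Offsets like "+00:00" parse on Python 3.7+; a trailing "Z" needs 3.11+.
    """
    if not builds:
        return None
    return max(builds, key=lambda b: datetime.fromisoformat(builds[b]))


def classify_live_dir(filenames):
    """Classify the files found directly under knowledge/current-state/context-pack/."""
    names = set(filenames)
    if not names:
        return "empty"
    if names == {"README.md"}:
        return "placeholder"  # no live pointer: latest must come from metadata
    return "live"  # possible live pointer; confirm by inspection


# Hypothetical build metadata.
builds = {
    "b-9": "2026-05-03T08:00:00+00:00",
    "b-10": "2026-05-04T08:00:00+00:00",
}
print(latest_build(builds))              # b-10
print(max(builds))                       # b-9 (lexical order, misleading)
print(classify_live_dir(["README.md"]))  # placeholder
```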

P5 — Add risk note for deleting KB docs

The prompt already forbids deletion, but the report should explicitly assess the risk:

  • deleting context-pack docs may break audit/history;
  • deindexing from vector/search is safer than deleting source docs;
  • cold/archive storage may be enough.

P6 — Add success metric

Ask the Agent to propose measurable success criteria for any future fix:

  • context-pack share in top-20 for canonical queries drops below X%;
  • canonical docs appear in top-5 for known queries;
  • latest context-pack still retrievable when explicitly requested.
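The P6 criteria can be made mechanical with a small evaluation over a fixed set of known queries. A sketch under stated assumptions: each run is a ranked list of paths, the expected canonical path per query is supplied by hand, the latest build counts as retrieved if its `build_id` appears as a path segment, and `max_share` stands in for the unspecified X%. Here every query must stay under the threshold, a stricter reading than an average:

```python
def is_context_pack(path):
    return "context-pack" in path.split("/")[:-1]


def context_pack_share(ranked_paths, k=20):
    """Fraction of the top-k results that are context-pack docs."""
    top = ranked_paths[:k]
    if not top:
        return 0.0
    return sum(is_context_pack(p) for p in top) / len(top)


def check_success(canonical_runs, explicit_run, latest_build_id, max_share):
    """Evaluate the three P6 criteria.

    canonical_runs: list of (ranked_paths, expected_canonical_path) per known query.
    explicit_run: ranked paths for a query that explicitly asks for the context-pack.
    max_share: placeholder for the unspecified X%, as a fraction (e.g. 0.3).
    """
    shares = [context_pack_share(paths, k=20) for paths, _ in canonical_runs]
    return {
        "max_context_pack_share": max(shares, default=0.0),
        "share_ok": all(s < max_share for s in shares),
        "canonical_top5_ok": all(expected in paths[:5]
                                 for paths, expected in canonical_runs),
        "latest_retrievable": any(latest_build_id in p.split("/")
                                  for p in explicit_run[:20]),
    }


# Hypothetical ranked results for two known queries.
runs = [
    (["knowledge/dev/laws/dieu43.md",
      "knowledge/current-state/context-pack/b-10/state.md",
      "knowledge/dev/reports/r1.md",
      "knowledge/dev/design/d1.md"], "knowledge/dev/laws/dieu43.md"),
    (["knowledge/dev/design/d1.md", "knowledge/dev/laws/dieu44.md"],
     "knowledge/dev/laws/dieu44.md"),
]
explicit = ["knowledge/current-state/context-pack/b-10/state.md"]
print(check_success(runs, explicit, "b-10", max_share=0.3))
```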

Dispatch decision

After applying P1–P6, dispatch the Agent.

If Opus wants to keep the prompt short, P1, P2, P3, and P6 are mandatory; P4 and P5 are recommended.

Boundaries

Remain strict:

  • read-only;
  • no delete;
  • no deindex;
  • no DOT patch;
  • no Đ43 patch;
  • no vector config mutation;
  • no cleanup.

Strategic position

The likely long-term architecture is not “remove context-pack,” but tiering:

  • canonical docs in hot/default retrieval;
  • latest context-pack in hot-lite or explicit context mode;
  • historical context-pack in cold/archive, excluded from default vector retrieval;
  • metadata filter + dedup/rerank in search layer;
  • TTL/retention for generated snapshots.

The investigation should confirm or refute this with evidence.
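The tiering hypothesis can be expressed as a per-document tier assignment. This is a sketch of the hypothesis, not a finding: tier names follow the list above, the `context-pack/<build_id>/` layout comes from the prompt, and the TTL value and the `expired` label are hypothetical. Expired snapshots are only flagged for retention handling, never deleted, in line with the boundaries:

```python
from datetime import datetime, timedelta


def build_id_of(path):
    """Return <build_id> from .../context-pack/<build_id>/<file>, else None."""
    parts = path.split("/")
    if "context-pack" not in parts[:-1]:
        return None
    i = parts.index("context-pack")
    # A build directory must sit between context-pack/ and the file name.
    return parts[i + 1] if i + 1 < len(parts) - 1 else None


def tier_for(path, latest_build_id, built_at=None, now=None, ttl=None):
    """Assign a retrieval tier following the proposed tiering (hypothesis only)."""
    if "context-pack" not in path.split("/")[:-1]:
        return "hot"       # non-snapshot (canonical) docs: hot/default retrieval
    if build_id_of(path) == latest_build_id:
        return "hot-lite"  # latest build: explicit context mode only
    if all(v is not None for v in (built_at, now, ttl)) and now - built_at > ttl:
        return "expired"   # past TTL: flag for retention review, do not delete
    return "cold"          # historical build: excluded from default retrieval


now = datetime(2026, 5, 5)
print(tier_for("knowledge/dev/laws/dieu43.md", "b-10"))                        # hot
print(tier_for("knowledge/current-state/context-pack/b-10/state.md", "b-10"))  # hot-lite
print(tier_for("knowledge/current-state/context-pack/b-9/state.md", "b-10"))   # cold
print(tier_for("knowledge/current-state/context-pack/b-1/state.md", "b-10",
               built_at=datetime(2026, 1, 1), now=now,
               ttl=timedelta(days=90)))                                         # expired
```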

knowledge/dev/laws/dieu44-trien-khai/reviews/gpt-review-inv-search-vector-hygiene-prompt-2026-05-05.md