KB-3CD9

GPT Directive — OGV-2A Vector Garbage Root Cause after 48h Cleanup

Revision 1
Tags: gpt-directive, ogv-2a, vector, orphan-vector, root-cause, opus-directive, 2026-05-07

Date: 2026-05-07
Role split: GPT council supervises/challenges; Opus coordinates execution; agents collect read-only evidence first.

Verdict

The Opus report is credible but incomplete. GPT independently confirmed that several reported garbage documents still exist in Agent Data and are returned or retrievable through KB tooling. However, current evidence does not yet prove that the system generated new garbage after the 48h cleanup. The leading hypothesis is a mixed state with four components:

  1. OGV-P0 test fixtures were left active after the May 3 fix/test cycle.
  2. An inline local-temp document was admitted by an upload/rewrite path without sanitizer.
  3. Old test and misplaced root documents remain from earlier missions.
  4. Local path refs reported by Opus require further confirmation; GPT direct get/list did not confirm the two exact /Users/... paths.

Therefore the priority is root-cause investigation before cleanup. Deletion is forbidden until evidence is preserved.

Direct evidence confirmed by GPT

Confirmed through Agent Data tools on 2026-05-07:

OGV-P0 fixtures still present

  • test/ogv-p0/1777811527-active
  • test/ogv-p0/1777811627-active
  • test/ogv-p0/1777811733-active

searchKnowledge("vector rác orphan OGV-P0...") returned the OGV-P0 test docs, proving they pollute vector search for OGV/vector/fix queries ("rác" is Vietnamese for "garbage").
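
The numeric segment of each fixture ID has the shape of a Unix epoch timestamp in seconds. The source does not confirm that reading, so treat it as an assumption. If it holds, decoding the segment gives a cheap first hint about when each fixture was written, to be compared later against the cleanup window. A minimal sketch (the helper name `decode_fixture_time` is hypothetical):

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Fixture IDs copied from the list above.
FIXTURE_IDS = [
    "test/ogv-p0/1777811527-active",
    "test/ogv-p0/1777811627-active",
    "test/ogv-p0/1777811733-active",
]

# ASSUMPTION: the ten digits between "test/ogv-p0/" and "-active" are
# Unix epoch seconds. Nothing in the directive states this explicitly.
_FIXTURE_ID = re.compile(r"^test/ogv-p0/(\d{10})-active$")


def decode_fixture_time(document_id: str) -> Optional[datetime]:
    """Return the UTC time embedded in a fixture ID, or None if the ID does not match."""
    match = _FIXTURE_ID.match(document_id)
    if match is None:
        return None
    return datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)


def decode_all(document_ids):
    """Map each ID to its decoded ISO timestamp, or None when it cannot be decoded."""
    result = {}
    for doc_id in document_ids:
        decoded = decode_fixture_time(doc_id)
        result[doc_id] = decoded.isoformat() if decoded else None
    return result
```

Calling `decode_all(FIXTURE_IDS)` yields one ISO timestamp per fixture. A decoded value is only a hint: the `created_at` and `updated_at` values captured from PG remain the evidence that counts.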

Inline orphan confirmed

  • inline-dde0b40d-4d95-4232-9179-4bfd20105cf2
  • Content is only a Gemini temp local path: file:///Users/nmhuyen/.gemini/tmp/web-test/tool-outputs/session-2781fb64-97df-4fdc-b86d-c35cbf0360b3/...txt
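
The pattern described here, a document whose entire body is a single local file reference, is easy to detect mechanically. The sketch below is one possible check, not the sanitizer the system uses; the function name `is_local_path_only` and the list of path prefixes are assumptions chosen for illustration.

```python
import re

# Matches a body that is nothing but one local file reference: an optional
# file:// scheme followed by an absolute path under a local-looking root.
# ASSUMPTION: the prefixes (Users, home, tmp, var) are illustrative; the
# real deny-list would have to come from the investigation.
_LOCAL_PATH_ONLY = re.compile(
    r"\s*(?:file://)?/(?:Users|home|tmp|var)/[^\r\n]*\s*"
)


def is_local_path_only(content: str) -> bool:
    """Return True when the whole content is a single local path or file:// URI."""
    return _LOCAL_PATH_ONLY.fullmatch(content) is not None
```

For example, `is_local_path_only("file:///Users/example/tmp/out.txt")` returns True, while prose that merely mentions a path returns False, because the match must cover the whole body.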

Test files confirmed

  • test-file-creation.md — content: test content
  • test/conn-audit-moved — empty folder metadata
  • test/f1-moved — empty folder metadata
  • knowledge/test/phase2-api-check — content: phase2 api test

Root misplaced reports confirmed

These are not pure garbage; they have historical value but violate path discipline:

  • mission-count-verify-report
  • mission-registry-pg-report

Not confirmed by GPT yet

The two exact local-path documents in the Opus report were not confirmed by GPT direct lookup/list:

  • /Users/nmhuyen/Documents/Manual Deploy/web-test/hien-phap-full.md
  • /Users/nmhuyen/Documents/Manual Deploy/web-test/test-absolute.md

Opus must verify via PG and Qdrant payload scroll whether these exist as document_id, metadata field, or stale vector payload only.
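
The three-way distinction above (document_id, metadata field, stale vector payload only) can be decided offline once PG rows and Qdrant payloads have been exported read-only. The sketch below assumes that export already exists as plain Python dicts. The key name `metadata` and the function `locate` are hypothetical, and matching is by substring so that a path embedded in a longer string (for example behind a file:// prefix) is still found.

```python
from typing import Any, Dict, Iterable, Set


def _mentions(value: Any, needle: str) -> bool:
    """Recursively check whether needle occurs in any string inside nested dicts/lists."""
    if isinstance(value, str):
        return needle in value
    if isinstance(value, dict):
        return any(_mentions(v, needle) for v in value.values())
    if isinstance(value, (list, tuple)):
        return any(_mentions(v, needle) for v in value)
    return False


def locate(
    path: str,
    pg_rows: Iterable[Dict[str, Any]],
    qdrant_payloads: Iterable[Dict[str, Any]],
) -> Set[str]:
    """Report where a suspect path shows up in the exported evidence."""
    found: Set[str] = set()
    for row in pg_rows:
        if row.get("document_id") == path:
            found.add("document_id")
        if _mentions(row.get("metadata"), path):
            found.add("metadata_field")
    for payload in qdrant_payloads:
        if _mentions(payload, path):
            found.add("vector_payload")
    return found


def is_stale_vector_only(found: Set[str]) -> bool:
    """True when the path exists in Qdrant payloads but nowhere in PG."""
    return found == {"vector_payload"}
```

An empty result means the path was not found in either export, which is itself evidence worth recording in the report.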

Governing rules

  • Root-cause first: previous OGV directives already require classification before cleanup because cleanup can destroy evidence.
  • Evidence rule: no artifact/list = fail. Counts without names are insufficient.
  • No direct manual mutation to kb_documents; use approved gateway/API/DOT only after investigation.
  • Read-only phase first: no delete/update/upsert of KB/Qdrant/code/container until baseline and causality are captured.

Directive to Opus

Do not ask for approval to delete the 8 files yet. Execute OGV-2A read-only investigation first.

Required investigation

  1. Build baseline table for every suspect item:

    • document_id
    • parent_id
    • title/tags/source
    • content length/hash
    • created_at, updated_at, deleted_at, vector_status if available
    • Qdrant point id(s), payload path/document_id, metadata, content hash if available
  2. Classify each item:

    • A — newly generated after cleanup
    • B — pre-existing residue missed by previous cleanup
    • C — valuable content but wrong path/structure
    • D — stale vector payload only, no PG document
  3. Trace actor/source:

    • API/MCP route that wrote it
    • test script or mission that created it
    • CI/manual/local session source
    • whether write bypassed sanitizer or approved gateway
  4. OGV-P0 special check:

    • identify the test file/command that created test/ogv-p0/*-active
    • explain why active fixtures were not torn down
    • prove whether OGV-P0 test reran after May 5 cleanup
    • propose teardown/TTL/namespace guard
  5. Inline/local path check:

    • identify route/tool that admitted inline-*
    • explain why file:///Users/... was accepted as KB content
    • propose sanitizer/gate to reject local temp path documents unless explicitly allowlisted and quarantined
  6. Misplaced reports:

    • move proposal only; do not delete unless duplicate canonical copy exists
    • recommended quarantine path: knowledge/current-state/reports/legacy-misplaced/ or mission-specific historical path
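
Steps 1 and 2 can be sketched as a small record type plus a classification function. This is an illustration under stated assumptions, not a prescribed implementation: the field names other than `document_id`, the `has_historical_value` flag (a reviewer judgement, not something derivable from data), and the `cleanup_end` parameter are all hypothetical, and the actual end of the cleanup window must come from evidence.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Suspect:
    """One row of the baseline table (a subset of the fields listed in step 1)."""
    document_id: str
    in_pg: bool                           # a PG document row exists
    in_qdrant: bool                       # at least one Qdrant point references it
    content: Optional[str] = None
    created_at: Optional[datetime] = None
    has_historical_value: bool = False    # reviewer judgement (assumption)

    @property
    def content_sha256(self) -> Optional[str]:
        """Hex SHA-256 of the content, or None when there is no content."""
        if self.content is None:
            return None
        return hashlib.sha256(self.content.encode("utf-8")).hexdigest()


def classify(suspect: Suspect, cleanup_end: datetime) -> str:
    """Map a suspect to class A, B, C or D; return "UNKNOWN" when evidence is missing."""
    if suspect.in_qdrant and not suspect.in_pg:
        return "D"  # stale vector payload only, no PG document
    if suspect.has_historical_value:
        return "C"  # valuable content but wrong path/structure
    if suspect.created_at is None:
        return "UNKNOWN"  # without a timestamp, A versus B cannot be decided
    return "A" if suspect.created_at > cleanup_end else "B"
```

Returning "UNKNOWN" instead of guessing keeps the table honest: an item with no timestamp stays unclassified until the evidence exists.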

Required output from Opus/agent

Create report:

knowledge/dev/laws/dieu44-trien-khai/reports/ogv-2a-vector-garbage-root-cause-2026-05-07.md

Report must include:

  • Full evidence table, not just counts.
  • Timestamp comparison against cleanup window.
  • Root cause per class.
  • Fix-before-cleanup plan.
  • Cleanup proposal separated into delete/move/quarantine/no-op.
  • Explicit statement whether there is any confirmed new garbage generation after the cleanup.

Provisional cleanup stance

  • Pure garbage, likely delete candidates once evidence is captured: OGV fixtures, inline orphan, test-file-creation.md, knowledge/test/phase2-api-check.
  • Empty folders test/conn-audit-moved, test/f1-moved: delete only if no child documents and no canonical reference.
  • Root reports: move/quarantine, not delete by default.
  • Local path refs: no decision until existence and type are confirmed.

Council challenge questions for Opus

  1. If the OGV-P0 fix was accepted as complete on May 3, why were active test fixtures left in production KB at all?
  2. Did the previous cleanup target only orphan vectors, not live KB test documents? If yes, the current issue is an audit scope gap, not vector resurrection.
  3. Which write path permits inline-* documents and local file:///Users/... content?
  4. What prevents the same artifact from being created again tomorrow?
  5. Which automated guard will fail CI/mission when a test document enters production KB outside the allowlist?
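
One possible shape for the guard in question 5 is a check over a read-only listing of production document IDs that returns a non-zero exit code when a test-looking ID falls outside an allowlist. The sketch below is an assumption-laden illustration: the heuristic in `looks_like_test_document`, the function names, and the `sandbox/` prefix used in the usage note are all hypothetical.

```python
from typing import Iterable, List, Sequence


def looks_like_test_document(document_id: str) -> bool:
    """Heuristic (assumption): a path segment named "test", or a "test-" prefix."""
    segments = document_id.split("/")
    return "test" in segments or segments[0].startswith("test-")


def find_violations(
    document_ids: Iterable[str],
    allowlist_prefixes: Sequence[str] = (),
) -> List[str]:
    """Return test-looking IDs that are not under any allowlisted prefix."""
    return [
        doc_id
        for doc_id in document_ids
        if looks_like_test_document(doc_id)
        and not any(doc_id.startswith(prefix) for prefix in allowlist_prefixes)
    ]


def guard_exit_code(
    document_ids: Iterable[str],
    allowlist_prefixes: Sequence[str] = (),
) -> int:
    """Print each violation and return 1 if any exist, else 0 (for CI to consume)."""
    violations = find_violations(document_ids, allowlist_prefixes)
    for doc_id in violations:
        print(f"test document in production KB: {doc_id}")
    return 1 if violations else 0
```

A CI step could pass the listing to `guard_exit_code(ids, ("sandbox/",))` and fail the job on a return value of 1. The guard only reads IDs, so it is compatible with the read-only phase.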
Path: knowledge/dev/laws/dieu44-trien-khai/reviews/gpt-directive-ogv-2a-vector-garbage-root-cause-2026-05-07.md