KB-7F9A

S170-CLARIFY — PA5 Remove _fs_key Recommended

11 min read Revision 1
reportinvestigationdoc-id_fs_keyPA5rootcauses1702026-04-07

S170-CLARIFY — Root Cause Deepening + PA5 Recommendation

Date: 2026-04-07 | Status: INVESTIGATED — PA5 recommended, chờ duyệt


Q0: TẠI SAO _fs_key CÒN TỒN TẠI?

Firestore còn dùng ở đâu trong production code?

GREP evidence (agent_data/ + mcp_server/):

  • pg_store.py:1 — comment: "replaces Firestore" (documentation only)
  • pg_store.py:6 — comment: "mimicking Firestore's document model" (documentation only)
  • memory.py:3 — comment: "Migration S109-CP2: Replaced Firestore" (documentation only)
  • memory.py:79FirestoreChatHistory = PostgresChatHistory (backward-compat alias)
  • mcp_server/server.py:194 — comment: "List documents from Firestore KB" (STALE comment)
  • mcp_server/stdio_server.py:404 — comment: "Primary: list from Firestore KB" (STALE comment)

Production requirements.txt: google-cloud-firestore NOT present. Only in requirements-dev.txt. Dockerfile: uses requirements.txt (production only). Firestore package NOT in image. server.py: No import firestore, no _firestore() function. Zero Firestore calls.

Kết luận Q0: _fs_key là DEAD LEGACY CODE.

Firestore retired (S109). _fs_key was kept because:

  1. S109 migration (PR #289-293) replaced Firestore with pg_store but kept the encoding to avoid migrating 900 existing keys
  2. No one questioned whether the encoding was still needed for PG
  3. pg_store.py line 6 literally says "mimicking Firestore's document model" — the design CHOSE to keep the Firestore pattern

Rào cản kỹ thuật để xoá:

  • 15 call sites in server.py → change to pass doc_id directly (remove _fs_key() wrapper)
  • 1,807 rows in kb_documentsUPDATE SET key = replace(key, '__', '/')
  • 2,468 rows in kb_documents_historyUPDATE SET document_key = replace(document_key, '__', '/')
  • 3,040 rows in kb_audit_logUPDATE SET document_key = replace(document_key, '__', '/')
  • 5 PG triggers → fn_kb_notify_vector_sync uses both __ and / patterns (needs simplify)
  • Total: 7,315 rows + 15 code lines + 1 PG function

Scale (CQ-3): At 1M rows, migration takes ~10 seconds (UPDATE ... SET key = replace(key,'__','/')). PG handles this trivially. Not a concern.


A1: Exact Call Sites (15, not 17)

# File:Line Context Type Risk if _fs_key removed
1 server.py:481 Session sentinel check READ Remove wrapper
2 server.py:757 /ingest write WRITE Pass doc_id direct
3 server.py:1163 move (deprecated 501) READ Dead code
4 server.py:1210 move ancestor READ Dead code
5 server.py:1230 create_document READ+WRITE Pass doc_id direct
6 server.py:1336 update_document READ+WRITE Pass doc_id direct
7 server.py:1481 move_document READ Dead code (501)
8 server.py:1549 delete_document READ Pass doc_id direct
9 server.py:1614 get_document READ Pass doc_id direct
10 server.py:1694 patch_document READ+WRITE Pass doc_id direct
11 server.py:1803 batch_documents READ Pass doc_id direct
12 server.py:1884 list_kb READ Pass doc_id direct
13 server.py:1962 reindex vector_status WRITE Pass doc_id direct
14 server.py:2100 reindex_missing read READ Pass doc_id direct
15 server.py:2137 reindex_missing status WRITE Pass doc_id direct

3 dead code (move_document deprecated). 12 active — all trivial: remove _fs_key() wrapper.

A2: Physical SSOT

SELECT count(*) FROM kb_documents WHERE key LIKE '%/%';  → 0
SELECT count(*) FROM kb_documents WHERE key LIKE '%__%'; → 1,806

PHYSICAL SSOT is __ format. Every single key uses __. Zero keys with /. Report S170 saying "/" is SSOT was about LOGICAL intent, not physical reality.

A3: Sync status

AD knowledge active: 563
Directus agentdata records: 576

Difference: 13 Directus-only records (from before current sync setup). 0 orphans per DOT-317.

A4: First commit

a974eb2 — feat: upsert flag + status filter for Directus Flow pipeline (WEB-60) (#245)

This is the EARLIEST commit in VPS git. _fs_key predates even this — it was in the codebase from Firestore era.


B: Phương Án So Sánh (5 PA × 6 cột)

PA Mô tả PG FIRST? Vĩnh viễn? Cơ hội nhầm? DOT cặp? Tự động 100%?
PA1 Python normalize in listener ❌ App fix ❌ 14 sites unfixed CÒN — new listener = same bug DOT-317 ❌ Manual norm each listener
PA2 PG generated column doc_id ✅ Partial __ key remains CÒN — events still use raw __ DOT-317 ❌ App must read new column
PA3 PA2 + CHECK on events ✅ Partial __ key remains Ít hơn PA2 DOT-317 ❌ CHECK only prevents, not eliminates
PA4 PG DOMAIN type ✅ Partial __ key remains Ít hơn PA2 DOT-317 ❌ Domain on column, not on key
PA5 XOÁ _fs_key + migrate key→/ ✅✅ PG key IS logical path ✅ 1 format 0 — no encoding DOT-317 ✅ No conversion step

PA5 vượt trội mọi PA khác vì:

  1. Vĩnh viễn: Loại bỏ encoding layer. PG key = logical path = Directus path = Qdrant path.
  2. 0 cơ hội nhầm: Không có _fs_key → không có mismatch. Listener mới không cần normalize.
  3. PG FIRST: PG key = SSOT. Không derived. Không generated. Direct.
  4. Scale: 7,315 row UPDATE ~ 1 second. 15 code lines remove wrapper. Trivial.

C: 5 Câu OR Section XII cho PA5

C1: Thiết kế kỹ thuật

-- Step 1: Migrate keys (single transaction, <1s)
BEGIN;
UPDATE kb_documents SET key = replace(key, '__', '/') WHERE key LIKE '%__%';
UPDATE kb_documents_history SET document_key = replace(document_key, '__', '/') WHERE document_key LIKE '%__%';
UPDATE kb_audit_log SET document_key = replace(document_key, '__', '/') WHERE document_key LIKE '%__%';
COMMIT;

-- Step 2: Simplify fn_kb_notify_vector_sync (remove __ patterns)
-- Step 3: Add CHECK constraint
ALTER TABLE kb_documents ADD CONSTRAINT chk_key_no_double_underscore
  CHECK (key NOT LIKE '%__%');

-- Step 4: Remove _fs_key from server.py (15 lines: remove wrapper call)
-- Step 5: Rebuild Docker image

C2: Quy trình

  • QT: Data Migration (Đ33 PostgreSQL Law)
  • Backup trước migration (pg_dump)
  • Freeze tag cho rollback

C3: DOT cặp (CP-12)

  • DOT-317 (sync-orphan-scan, Tier A, daily): detect lệch AD↔Directus
  • DOT_KB_PROTECT (Tier B, event triggers): protect history/audit
  • CHECK constraint: PG-level enforcement (no __ in key)

C4: Quản trị thay đổi (Đ31)

  • Contract: kb_documents.key uses / separator. __ PROHIBITED by CHECK.
  • Listener mới vi phạm? PG CHECK blocks INSERT/UPDATE with __. Automatic.
  • Test: INSERT INTO kb_documents (key, data) VALUES ('test__bad', '{}') → ERROR.

C5: Tự động 100%?

  • Migration: 1 SQL script (auto)
  • CHECK constraint: PG enforces (auto)
  • Code change: remove 15 _fs_key() calls (manual, one-time)
  • After deploy: 100% automatic — no conversion step anywhere

D: Diagram BEFORE/AFTER

BEFORE (hiện tại)

User POST /documents {doc_id: "knowledge/dev/laws/file.md"}
  ↓
server.py: doc_key = _fs_key(doc_id)  →  "knowledge__dev__laws__file.md"
  ↓
PG kb_documents.key = "knowledge__dev__laws__file.md"  ← PHYSICAL (__) 
  ↓
emit(DOCUMENT_CREATED, {doc_id: "knowledge/dev/laws/file.md"})  ← LOGICAL (/)
  ↓                                              ↓
directus_sync: _should_sync("knowledge/...")  Qdrant: doc_id = "knowledge/dev/laws/file.md"
  ↓
Directus: source_id = "agentdata:knowledge/dev/laws/file.md"

⚠️ DELETE /documents/knowledge__dev__laws__file.md
  ↓
doc_id = "knowledge__dev__laws__file.md"  ← URL (__) ≠ logical (/)
emit(DOCUMENT_DELETED, {doc_id: "knowledge__dev__..."})
  ↓
directus_sync: _should_sync("knowledge__dev__...") → FALSE ❌ ORPHAN

AFTER (PA5)

User POST /documents {doc_id: "knowledge/dev/laws/file.md"}
  ↓
server.py: doc_key = doc_id  (NO _fs_key)
  ↓
PG kb_documents.key = "knowledge/dev/laws/file.md"  ← SAME FORMAT
  ↓
emit(DOCUMENT_CREATED, {doc_id: "knowledge/dev/laws/file.md"})
  ↓                                              ↓
directus_sync: _should_sync("knowledge/...")  Qdrant: doc_id = "knowledge/dev/laws/file.md"
  ↓
Directus: source_id = "agentdata:knowledge/dev/laws/file.md"

✅ DELETE /documents/knowledge/dev/laws/file.md
  ↓
doc_id = "knowledge/dev/laws/file.md"  ← SAME FORMAT
emit(DOCUMENT_DELETED, {doc_id: "knowledge/dev/..."})
  ↓
directus_sync: _should_sync("knowledge/dev/...") → TRUE ✅ SYNCED

Legacy Purge Candidates (ghi nhận cho S171)

File/Config Legacy Type Tự tin "đã chết" Evidence
memory.py:79 FirestoreChatHistory = PostgresChatHistory Firestore alias Cao Backward-compat, no caller in prod
pg_store.py:1,6 comments "replaces Firestore", "mimicking Firestore" Firestore comment Cao Documentation only
mcp_server/server.py:194 "Firestore KB" comment Firestore comment Cao Stale comment
mcp_server/stdio_server.py:404 "Firestore KB" comment Firestore comment Cao Stale comment
requirements-dev.txt google-cloud-firestore==2.21.0 Firestore dep Cao Not in prod requirements
137 Firestore references in tests/ Firestore test mocks Cao Mock _firestore that doesn't exist
main.py:111,179 gcs_ingest tool GCS Cao Returns "GCS ingestion is disabled"
scripts/e2e_gcs_setup.py GCS Cao E2E fixture upload to GCS
.github/workflows/artifact-registry-cd.yml Artifact Registry Trung May still be used for image push
fn_kb_notify_vector_sync __ patterns Firestore encoding Cao Will be removed by PA5

VERIFY (5 self-checks)

# Check Result
1 7 questions + Q0 with evidence ✅ grep output, PG queries, git log
2 15 call sites table ✅ 15 rows, classified READ/WRITE
3 5 PA × 6 columns ✅ All cells filled with reasoning
4 PA5 recommended + 5 OR questions ✅ DDL, DOT, contract, auto
5 BEFORE/AFTER diagrams ✅ Text-based flow

Q0 answered: _fs_key is dead Firestore legacy. PA5 (remove) is the only permanent fix. Chờ Desktop duyệt.