S170-CLARIFY — PA5 Remove _fs_key Recommended
S170-CLARIFY — Root Cause Deepening + PA5 Recommendation
Date: 2026-04-07 | Status: INVESTIGATED — PA5 recommended, chờ duyệt
Q0: TẠI SAO _fs_key CÒN TỒN TẠI?
Firestore còn dùng ở đâu trong production code?
GREP evidence (agent_data/ + mcp_server/):
pg_store.py:1— comment: "replaces Firestore" (documentation only)pg_store.py:6— comment: "mimicking Firestore's document model" (documentation only)memory.py:3— comment: "Migration S109-CP2: Replaced Firestore" (documentation only)memory.py:79—FirestoreChatHistory = PostgresChatHistory(backward-compat alias)mcp_server/server.py:194— comment: "List documents from Firestore KB" (STALE comment)mcp_server/stdio_server.py:404— comment: "Primary: list from Firestore KB" (STALE comment)
Production requirements.txt: google-cloud-firestore NOT present. Only in requirements-dev.txt.
Dockerfile: uses requirements.txt (production only). Firestore package NOT in image.
server.py: No import firestore, no _firestore() function. Zero Firestore calls.
Kết luận Q0: _fs_key là DEAD LEGACY CODE.
Firestore retired (S109). _fs_key was kept because:
- S109 migration (PR #289-293) replaced Firestore with pg_store but kept the encoding to avoid migrating 900 existing keys
- No one questioned whether the encoding was still needed for PG
pg_store.pyline 6 literally says "mimicking Firestore's document model" — the design CHOSE to keep the Firestore pattern
Rào cản kỹ thuật để xoá:
- 15 call sites in
server.py→ change to pass doc_id directly (remove_fs_key()wrapper) - 1,807 rows in
kb_documents→UPDATE SET key = replace(key, '__', '/') - 2,468 rows in
kb_documents_history→UPDATE SET document_key = replace(document_key, '__', '/') - 3,040 rows in
kb_audit_log→UPDATE SET document_key = replace(document_key, '__', '/') - 5 PG triggers → fn_kb_notify_vector_sync uses both
__and/patterns (needs simplify) - Total: 7,315 rows + 15 code lines + 1 PG function
Scale (CQ-3): At 1M rows, migration takes ~10 seconds (UPDATE ... SET key = replace(key,'__','/')). PG handles this trivially. Not a concern.
A1: Exact Call Sites (15, not 17)
| # | File:Line | Context | Type | Risk if _fs_key removed |
|---|---|---|---|---|
| 1 | server.py:481 | Session sentinel check | READ | Remove wrapper |
| 2 | server.py:757 | /ingest write | WRITE | Pass doc_id direct |
| 3 | server.py:1163 | move (deprecated 501) | READ | Dead code |
| 4 | server.py:1210 | move ancestor | READ | Dead code |
| 5 | server.py:1230 | create_document | READ+WRITE | Pass doc_id direct |
| 6 | server.py:1336 | update_document | READ+WRITE | Pass doc_id direct |
| 7 | server.py:1481 | move_document | READ | Dead code (501) |
| 8 | server.py:1549 | delete_document | READ | Pass doc_id direct |
| 9 | server.py:1614 | get_document | READ | Pass doc_id direct |
| 10 | server.py:1694 | patch_document | READ+WRITE | Pass doc_id direct |
| 11 | server.py:1803 | batch_documents | READ | Pass doc_id direct |
| 12 | server.py:1884 | list_kb | READ | Pass doc_id direct |
| 13 | server.py:1962 | reindex vector_status | WRITE | Pass doc_id direct |
| 14 | server.py:2100 | reindex_missing read | READ | Pass doc_id direct |
| 15 | server.py:2137 | reindex_missing status | WRITE | Pass doc_id direct |
3 dead code (move_document deprecated). 12 active — all trivial: remove _fs_key() wrapper.
A2: Physical SSOT
SELECT count(*) FROM kb_documents WHERE key LIKE '%/%'; → 0
SELECT count(*) FROM kb_documents WHERE key LIKE '%__%'; → 1,806
PHYSICAL SSOT is __ format. Every single key uses __. Zero keys with /. Report S170 saying "/" is SSOT was about LOGICAL intent, not physical reality.
A3: Sync status
AD knowledge active: 563
Directus agentdata records: 576
Difference: 13 Directus-only records (from before current sync setup). 0 orphans per DOT-317.
A4: First commit
a974eb2 — feat: upsert flag + status filter for Directus Flow pipeline (WEB-60) (#245)
This is the EARLIEST commit in VPS git. _fs_key predates even this — it was in the codebase from Firestore era.
B: Phương Án So Sánh (5 PA × 6 cột)
| PA | Mô tả | PG FIRST? | Vĩnh viễn? | Cơ hội nhầm? | DOT cặp? | Tự động 100%? |
|---|---|---|---|---|---|---|
| PA1 | Python normalize in listener | ❌ App fix | ❌ 14 sites unfixed | CÒN — new listener = same bug | DOT-317 | ❌ Manual norm each listener |
| PA2 | PG generated column doc_id |
✅ Partial | ❌ __ key remains |
CÒN — events still use raw __ |
DOT-317 | ❌ App must read new column |
| PA3 | PA2 + CHECK on events | ✅ Partial | ❌ __ key remains |
Ít hơn PA2 | DOT-317 | ❌ CHECK only prevents, not eliminates |
| PA4 | PG DOMAIN type | ✅ Partial | ❌ __ key remains |
Ít hơn PA2 | DOT-317 | ❌ Domain on column, not on key |
| PA5 | XOÁ _fs_key + migrate key→/ |
✅✅ PG key IS logical path | ✅ 1 format | 0 — no encoding | DOT-317 | ✅ No conversion step |
PA5 vượt trội mọi PA khác vì:
- Vĩnh viễn: Loại bỏ encoding layer. PG key = logical path = Directus path = Qdrant path.
- 0 cơ hội nhầm: Không có
_fs_key→ không có mismatch. Listener mới không cần normalize. - PG FIRST: PG key = SSOT. Không derived. Không generated. Direct.
- Scale: 7,315 row UPDATE ~ 1 second. 15 code lines remove wrapper. Trivial.
C: 5 Câu OR Section XII cho PA5
C1: Thiết kế kỹ thuật
-- Step 1: Migrate keys (single transaction, <1s)
BEGIN;
UPDATE kb_documents SET key = replace(key, '__', '/') WHERE key LIKE '%__%';
UPDATE kb_documents_history SET document_key = replace(document_key, '__', '/') WHERE document_key LIKE '%__%';
UPDATE kb_audit_log SET document_key = replace(document_key, '__', '/') WHERE document_key LIKE '%__%';
COMMIT;
-- Step 2: Simplify fn_kb_notify_vector_sync (remove __ patterns)
-- Step 3: Add CHECK constraint
ALTER TABLE kb_documents ADD CONSTRAINT chk_key_no_double_underscore
CHECK (key NOT LIKE '%__%');
-- Step 4: Remove _fs_key from server.py (15 lines: remove wrapper call)
-- Step 5: Rebuild Docker image
C2: Quy trình
- QT: Data Migration (Đ33 PostgreSQL Law)
- Backup trước migration (pg_dump)
- Freeze tag cho rollback
C3: DOT cặp (CP-12)
- DOT-317 (sync-orphan-scan, Tier A, daily): detect lệch AD↔Directus
- DOT_KB_PROTECT (Tier B, event triggers): protect history/audit
- CHECK constraint: PG-level enforcement (no
__in key)
C4: Quản trị thay đổi (Đ31)
- Contract:
kb_documents.keyuses/separator.__PROHIBITED by CHECK. - Listener mới vi phạm? PG CHECK blocks INSERT/UPDATE with
__. Automatic. - Test:
INSERT INTO kb_documents (key, data) VALUES ('test__bad', '{}')→ ERROR.
C5: Tự động 100%?
- Migration: 1 SQL script (auto)
- CHECK constraint: PG enforces (auto)
- Code change: remove 15
_fs_key()calls (manual, one-time) - After deploy: 100% automatic — no conversion step anywhere
D: Diagram BEFORE/AFTER
BEFORE (hiện tại)
User POST /documents {doc_id: "knowledge/dev/laws/file.md"}
↓
server.py: doc_key = _fs_key(doc_id) → "knowledge__dev__laws__file.md"
↓
PG kb_documents.key = "knowledge__dev__laws__file.md" ← PHYSICAL (__)
↓
emit(DOCUMENT_CREATED, {doc_id: "knowledge/dev/laws/file.md"}) ← LOGICAL (/)
↓ ↓
directus_sync: _should_sync("knowledge/...") Qdrant: doc_id = "knowledge/dev/laws/file.md"
↓
Directus: source_id = "agentdata:knowledge/dev/laws/file.md"
⚠️ DELETE /documents/knowledge__dev__laws__file.md
↓
doc_id = "knowledge__dev__laws__file.md" ← URL (__) ≠ logical (/)
emit(DOCUMENT_DELETED, {doc_id: "knowledge__dev__..."})
↓
directus_sync: _should_sync("knowledge__dev__...") → FALSE ❌ ORPHAN
AFTER (PA5)
User POST /documents {doc_id: "knowledge/dev/laws/file.md"}
↓
server.py: doc_key = doc_id (NO _fs_key)
↓
PG kb_documents.key = "knowledge/dev/laws/file.md" ← SAME FORMAT
↓
emit(DOCUMENT_CREATED, {doc_id: "knowledge/dev/laws/file.md"})
↓ ↓
directus_sync: _should_sync("knowledge/...") Qdrant: doc_id = "knowledge/dev/laws/file.md"
↓
Directus: source_id = "agentdata:knowledge/dev/laws/file.md"
✅ DELETE /documents/knowledge/dev/laws/file.md
↓
doc_id = "knowledge/dev/laws/file.md" ← SAME FORMAT
emit(DOCUMENT_DELETED, {doc_id: "knowledge/dev/..."})
↓
directus_sync: _should_sync("knowledge/dev/...") → TRUE ✅ SYNCED
Legacy Purge Candidates (ghi nhận cho S171)
| File/Config | Legacy Type | Tự tin "đã chết" | Evidence |
|---|---|---|---|
memory.py:79 FirestoreChatHistory = PostgresChatHistory |
Firestore alias | Cao | Backward-compat, no caller in prod |
pg_store.py:1,6 comments "replaces Firestore", "mimicking Firestore" |
Firestore comment | Cao | Documentation only |
mcp_server/server.py:194 "Firestore KB" comment |
Firestore comment | Cao | Stale comment |
mcp_server/stdio_server.py:404 "Firestore KB" comment |
Firestore comment | Cao | Stale comment |
requirements-dev.txt google-cloud-firestore==2.21.0 |
Firestore dep | Cao | Not in prod requirements |
| 137 Firestore references in tests/ | Firestore test mocks | Cao | Mock _firestore that doesn't exist |
main.py:111,179 gcs_ingest tool |
GCS | Cao | Returns "GCS ingestion is disabled" |
scripts/e2e_gcs_setup.py |
GCS | Cao | E2E fixture upload to GCS |
.github/workflows/artifact-registry-cd.yml |
Artifact Registry | Trung | May still be used for image push |
fn_kb_notify_vector_sync __ patterns |
Firestore encoding | Cao | Will be removed by PA5 |
VERIFY (5 self-checks)
| # | Check | Result |
|---|---|---|
| 1 | 7 questions + Q0 with evidence | ✅ grep output, PG queries, git log |
| 2 | 15 call sites table | ✅ 15 rows, classified READ/WRITE |
| 3 | 5 PA × 6 columns | ✅ All cells filled with reasoning |
| 4 | PA5 recommended + 5 OR questions | ✅ DDL, DOT, contract, auto |
| 5 | BEFORE/AFTER diagrams | ✅ Text-based flow |
Q0 answered: _fs_key is dead Firestore legacy. PA5 (remove) is the only permanent fix. Chờ Desktop duyệt.