S175 Fix Execution Report v3 — Commit Bug Fix + Backfill
Date: 2026-04-09
Session: S175 P3 RESUME (Claude Code)
Mode: Edit allowed (per main Desktop approval)
Goal: Debug and fix the Directus drift root cause; backfill NULL source_id; smoke test; hardtest
Summary
The root cause of the S175 drift has been identified and fixed:
H3 CONFIRMED: the _db_conn() context manager in directus_sync.py sets
autocommit=False but does NOT call conn.commit() after the yield. Every UPDATE
executed successfully but was silently rolled back when the connection closed, so
the handler returned "updated" while the DB stayed unchanged.
Fix: one added line, conn.commit() after yield conn. Verified end-to-end by
smoke test: row id=964 went version 1→2→3, the content gained and then lost the marker, and no new row was created.
TASK 1: Debug H1 vs H3
Hypothesis test
H1 (NULL source_id → INSERT new row): REJECTED
SELECT id, file_path, source_id, is_current_version, version_number, date_updated
FROM public.knowledge_documents WHERE file_path LIKE '%assembly-step1-inventory%';
→ 510 | knowledge/current-state/reports/assembly-step1-inventory.md
| agentdata:knowledge/current-state/reports/assembly-step1-inventory.md (NOT NULL)
| t | 1 | NULL
Row 510 has source_id SET, not NULL. The writer takes the UPDATE branch, not INSERT. Codex's smoke test hit exactly row 510 and created no new row. H1 is rejected.
H3 root cause
@contextmanager
def _db_conn():
    conn = psycopg2.connect(_db_dsn())
    conn.autocommit = False
    try:
        yield conn  # ← the UPDATE runs here
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()  # ← NO commit() before close → ROLLBACK
A grep for "commit" across the whole file finds only line 89, autocommit = False.
There is no conn.commit() anywhere. Every transaction was silently rolled back.
Evidence: row 510 has date_created=2026-03-05, date_updated=NULL, version_number=1;
it was not changed at all by Codex's smoke test.
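The failure mode can be reproduced without a PostgreSQL instance. The sketch below is an analogy only: it uses the standard-library sqlite3 module as a stand-in for psycopg2 (an assumption, not the code in directus_sync.py), and the `docs` table, `db_conn`, `read_version`, and `demo` names are hypothetical. Like the psycopg2 case described above, a pending transaction that is never committed is lost when the connection closes, even though the UPDATE itself reports one affected row.

```python
import os
import sqlite3
import tempfile
from contextlib import contextmanager


@contextmanager
def db_conn(path, commit_on_success):
    """Same shape as _db_conn(); commit_on_success toggles the one-line fix."""
    conn = sqlite3.connect(path)
    try:
        yield conn
        if commit_on_success:
            conn.commit()  # the line that was missing
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()  # without a prior commit, pending changes are discarded


def read_version(path):
    with db_conn(path, commit_on_success=False) as conn:
        row = conn.execute(
            "SELECT version_number FROM docs WHERE id = 964"
        ).fetchone()
        return row[0]


def demo():
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        # Seed one row (hypothetical table standing in for knowledge_documents).
        with db_conn(path, commit_on_success=True) as conn:
            conn.execute(
                "CREATE TABLE docs (id INTEGER PRIMARY KEY, version_number INTEGER)"
            )
            conn.execute("INSERT INTO docs VALUES (964, 1)")

        # Buggy variant: the UPDATE "succeeds" but is never committed.
        with db_conn(path, commit_on_success=False) as conn:
            cur = conn.execute("UPDATE docs SET version_number = 2 WHERE id = 964")
            reported_rows = cur.rowcount
        after_buggy = read_version(path)

        # Fixed variant: commit after the yield.
        with db_conn(path, commit_on_success=True) as conn:
            conn.execute("UPDATE docs SET version_number = 2 WHERE id = 964")
        after_fixed = read_version(path)

        return reported_rows, after_buggy, after_fixed
    finally:
        os.remove(path)
```

Calling `demo()` returns the row count reported by the uncommitted UPDATE, followed by the stored version after the buggy and the fixed variants, which mirrors the "handler says updated, DB unchanged" symptom.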
TASK 2: Backfill 12 clean NULL rows
Discovery: 3 of 15 NULL rows are README dup pairs
15 NULL rows in total:
- 3 in README dup groups (id 298, 299, 301): the NULL side of 3 pairs with existing rows 345/346/347
- 12 clean (id 300, 302, 961, 971, 975, 977, 978, 979, 980, 981, 982, 983)
First backfill attempt (count=15) failed with:
ERROR: duplicate key value violates unique constraint "idx_kd_current_source_id_unique"
DETAIL: Key (source_id)=(agentdata:knowledge/current-state/README.md) already exists.
Reported to Desktop, approved Option A: backfill 12 clean, defer 3 README to followup.
Backfill execution
BEGIN;
UPDATE public.knowledge_documents
SET source_id = 'agentdata:' || file_path, date_updated = NOW()
WHERE is_current_version=true AND source_id IS NULL AND file_path IS NOT NULL
AND id NOT IN (298, 299, 301);
SELECT COUNT(*) AS still_null FROM public.knowledge_documents
WHERE is_current_version=true AND source_id IS NULL;
COMMIT;
Output:
BEGIN
UPDATE 12
still_null
------------
3
COMMIT
Verify post-commit: 12 backfilled rows all have date_updated=2026-04-09 09:21:36.07504
and source_id format agentdata:<file_path>. 3 README NULL rows preserved as planned.
TASK 3: Fix _db_conn() commit bug
Code diff
File: /opt/incomex/docker/agent-data-repo/agent_data/directus_sync.py lines 85-97
  @contextmanager
  def _db_conn():
      conn = psycopg2.connect(_db_dsn())
      conn.autocommit = False
      try:
          yield conn
+         conn.commit()
      except Exception:
          conn.rollback()
          raise
      finally:
          conn.close()
Build evidence
- py_compile OK
- Backup: /root/backup/directus_sync.py.20260409T092227Z
- docker build -t agent-data-local:latest . → exit 0, image manifest bfe092449032e23ee9ad091f5a327e11b29f4b5d9757aa9c40cace5a588dd5b7
- docker compose up -d --force-recreate agent-data → Started
- docker inspect ... .State.Health.Status → healthy
- /api/health → status=healthy, services={qdrant: ok, postgres: ok, openai: ok}
TASK 4: Smoke test rerun
Test doc
knowledge/current-state/reports/agent-data-connectivity-check-gpt-2026-03-31.md
- id=964 (NOT in 15 NULL list, NOT assembly-step1)
- Both Agent Data (rev=1) and Directus exist
- source_id matches: agentdata:knowledge/current-state/reports/...
Before update
id=964 | version_number=1 | date_updated=NULL | content_len=1356 (Directus) / 1356 (Agent Data)
is_current_version=t
Update via PUT /documents/{doc_id} (correct schema: document_id + patch + update_mask)
PUT /api/documents/.../agent-data-connectivity-check-gpt-2026-03-31.md
{
"document_id": "...",
"patch": {"content": {"body": <orig + marker>, "mime_type": "text/markdown"}},
"update_mask": ["content"]
}
→ {"id":"...","status":"updated","revision":2}
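As a minimal sketch of the request schema shown above (document_id + patch + update_mask), the body can be assembled as follows. The helper name `build_update_request` is hypothetical and is not part of the Agent Data API; only the field names are taken from the request logged in this report.

```python
import json


def build_update_request(document_id, body, mime_type="text/markdown"):
    """Build the JSON body for PUT /documents/{doc_id} (hypothetical helper).

    update_mask lists the top-level fields of `patch` that should be applied.
    """
    return {
        "document_id": document_id,
        "patch": {"content": {"body": body, "mime_type": mime_type}},
        "update_mask": ["content"],
    }


def to_json(payload):
    # Serialize without escaping non-ASCII document text.
    return json.dumps(payload, ensure_ascii=False)
```

The same builder covers both the marker update and the restore step, since only `body` differs between the two requests.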
After update (5s wait)
id=964 | version_number=2 | date_updated=2026-04-09 09:26:55.973472 | content_len=1389
has_marker=t | is_current_version=t | current_count=1
ALL VERIFY POINTS PASS:
- ✓ version_number incremented (1→2)
- ✓ date_updated NOT NULL (was NULL, now 09:26:55)
- ✓ content contains the marker (SMOKE-TEST-S175-MARKER-VIEC4)
- ✓ is_current_version=true (preserved)
- ✓ current_count=1 (no new row)
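The verify points above could be automated along these lines. This is a sketch under assumptions: the `RowSnapshot` type and `verify_update` function are hypothetical, and the snapshots are expected to be filled from the same SQL queries used in this report rather than from any existing helper.

```python
from dataclasses import dataclass
from typing import Optional

MARKER = "SMOKE-TEST-S175-MARKER-VIEC4"


@dataclass
class RowSnapshot:
    """State of one knowledge_documents row at a point in time (hypothetical)."""
    version_number: int
    date_updated: Optional[str]
    content: str
    is_current_version: bool
    current_count: int  # number of current rows sharing this source_id


def verify_update(before, after, marker=MARKER):
    """Return each verify point as a name -> bool mapping."""
    return {
        "version_incremented": after.version_number == before.version_number + 1,
        "date_updated_set": after.date_updated is not None,
        "marker_present": marker in after.content,
        "still_current": after.is_current_version,
        "no_new_row": after.current_count == 1,
    }
```

A run passes when `all(verify_update(before, after).values())` is true; a failed point is identified by its key.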
Restore
PUT same endpoint with original body (1356 bytes)
→ {"status":"updated","revision":3}
After restore:
id=964 | version_number=3 | date_updated=2026-04-09 09:28:14.72428 | content_len=1356
has_marker=f
Marker removed. Content restored. End-to-end fix verified.
Note on logs
No explicit "Directus sync" log line is visible at INFO level in the container output. The proof of work is in the DB (version_number incremented twice, content changed as expected). The logger may be at DEBUG level, or output from pytest-style fire-and-forget tasks may not surface.
TASK 5: P4 hardtest, 4 scenarios
Scenario 1: Doc update reflected in <60s, no new row
PASS (covered by the Task 4 smoke test). Version went 1→2→3, content updated/restored, no duplicate row created. Latency: SQL changes visible immediately after PUT 200 OK.
Scenario 2: Move doc → old current=false, new current=true
N/A by design. Move endpoint deprecated by S170:
POST /documents/.../move
→ {"code":"NOT_IMPLEMENTED","message":"move_document is deprecated. Use:
(1) upload_document to new path, (2) delete old path."}
The atomic writer's _select_rows_for_source handles same-source-id duplicates
correctly (sets old current=false in first UPDATE), but cross-path move is now
a 2-step operation outside writer scope.
Scenario 3: INSERT duplicate current → DB rejects via UNIQUE
PASS.
BEGIN;
INSERT INTO knowledge_documents (...)
VALUES (..., 'agentdata:knowledge/current-state/reports/agent-data-connectivity-check-gpt-2026-03-31.md');
ROLLBACK;
Output:
BEGIN
ROLLBACK
ERROR: duplicate key value violates unique constraint "idx_kd_current_source_id_unique"
DETAIL: Key (source_id)=(agentdata:...) already exists.
Partial UNIQUE (source_id) WHERE is_current_version=true AND source_id IS NOT NULL
correctly enforces 1-current-row-per-source-id invariant.
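The behavior of such a partial unique index can be illustrated in isolation. The sketch below uses an in-memory SQLite database as a stand-in for PostgreSQL (an assumption; SQLite also supports partial indexes), a reduced three-column version of the table, and hypothetical row ids 1000 and 1001.

```python
import sqlite3


def demo_partial_unique():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE knowledge_documents ("
        " id INTEGER PRIMARY KEY,"
        " source_id TEXT,"
        " is_current_version INTEGER NOT NULL)"
    )
    # Uniqueness applies only to current rows with a non-NULL source_id.
    conn.execute(
        "CREATE UNIQUE INDEX idx_kd_current_source_id_unique"
        " ON knowledge_documents (source_id)"
        " WHERE is_current_version = 1 AND source_id IS NOT NULL"
    )
    sid = "agentdata:knowledge/current-state/README.md"
    conn.execute("INSERT INTO knowledge_documents VALUES (345, ?, 1)", (sid,))

    # Allowed: NULL source_id rows fall outside the index predicate.
    conn.execute("INSERT INTO knowledge_documents VALUES (298, NULL, 1)")
    conn.execute("INSERT INTO knowledge_documents VALUES (299, NULL, 1)")

    # Allowed: a non-current (historical) row for the same source_id.
    conn.execute("INSERT INTO knowledge_documents VALUES (1000, ?, 0)", (sid,))

    # Rejected: a second current row for the same source_id.
    try:
        conn.execute("INSERT INTO knowledge_documents VALUES (1001, ?, 1)", (sid,))
        rejected = False
    except sqlite3.IntegrityError:
        rejected = True

    count = conn.execute("SELECT COUNT(*) FROM knowledge_documents").fetchone()[0]
    conn.close()
    return rejected, count
```

The NULL rows being accepted here is the same gap noted for the 3 deferred README rows: the index cannot constrain rows it does not cover.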
Scenario 4: 3 README dup groups intact
PASS.
id | file_path | source_id | is_current_version
-----+-----------------------------------+---------------------------------------------+--------------------
298 | knowledge/current-state/README.md | | t ← NULL preserved
345 | knowledge/current-state/README.md | agentdata:knowledge/current-state/README.md | t
299 | knowledge/current-tasks/README.md | | t ← NULL preserved
346 | knowledge/current-tasks/README.md | agentdata:knowledge/current-tasks/README.md | t
300 | knowledge/dev/README.md | agentdata:knowledge/dev/README.md | t ← backfilled (was NULL, no conflict)
301 | knowledge/other/README.md | | t ← NULL preserved
347 | knowledge/other/README.md | agentdata:knowledge/other/README.md | t
3 NULL-side rows (298, 299, 301) preserved untouched. knowledge/dev/README.md (300)
was backfilled because no conflict pair existed (no row 348).
TASK 6: Followup note
Created: knowledge/current-state/issues/s175-readme-duplicates-followup.md
- Tag: s175-followup
- Documents 6 rows of 3 README dup groups
- Plan for next session cleanup
- Recommendation to add source_id NOT NULL after cleanup
Known Limitations
- Move endpoint deprecated: Cross-path moves require 2 separate operations (upload to new path + delete old). The atomic writer doesn't handle this case automatically. If a doc is renamed in Agent Data, the old Directus row stays with the old source_id forever (current=true) until someone DELETEs it.
- 3 README NULL rows still NULL: Deferred to followup. Until then, these rows can't be enforced by the partial UNIQUE constraint (NULL excluded).
- No "Directus sync" INFO log: End-to-end test confirms writer works, but logger output for the listener is invisible (likely DEBUG level or fire-and-forget swallowed). Consider raising log level for sync events.
- Out of scope (still TODO from earlier S175):
  - limit=500 in dot-knowledge-sync-agentdata line 87 (other 10 hardcoded limit sites)
  - Event bus retry / persistent queue (no retry on container restart)
  - 18 duplicate law document rows from earlier batch sync bug (not yet deduped)
Lessons Learned
1. A context manager with autocommit=False MUST call conn.commit() after the yield
The @contextmanager + psycopg2 + autocommit=False pattern is a dangerous combination
when the commit is missing. PG silently rolls back every change when the connection closes.
The code review checklist must include: "any autocommit=False? grep commit?"
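The checklist item can be approximated mechanically. The following is a rough sketch, not a real linter: it only does a text search, the function name `missing_commit` is hypothetical, and it would miss a commit performed in another module or flag code that commits through a wrapper.

```python
import re

# Matches e.g. "conn.autocommit = False"
AUTOCOMMIT_OFF = re.compile(r"\.autocommit\s*=\s*False")
# Matches e.g. "conn.commit()"; "autocommit" alone does not match
COMMIT_CALL = re.compile(r"\.commit\s*\(")


def missing_commit(source: str) -> bool:
    """True when the source disables autocommit but never calls .commit()."""
    disables = AUTOCOMMIT_OFF.search(source) is not None
    commits = COMMIT_CALL.search(source) is not None
    return disables and not commits


BUGGY = """
conn.autocommit = False
try:
    yield conn
finally:
    conn.close()
"""

FIXED = BUGGY.replace("yield conn\n", "yield conn\n    conn.commit()\n")
```

Run over a file's text, `missing_commit` flags the pre-fix shape of `_db_conn()` and stays quiet once the commit line is present.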
2. A data backfill MUST check for conflicts with UNIQUE constraints BEFORE the UPDATE
On the first attempt, Claude Code ran the backfill on 15 rows without checking, hit a UNIQUE violation on the first row (298), and the whole transaction rolled back. The correct procedure:
- SELECT the rows to be updated
- LEFT JOIN against existing rows on the UNIQUE column
- Identify conflicts up front
- Skip the conflicts, or resolve them before the UPDATE
Safe pattern:
UPDATE ... WHERE ... AND NOT EXISTS (
SELECT 1 FROM ... WHERE conflicting_unique_key = ...
);
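The pre-check can also be done client-side before any UPDATE is issued. A minimal sketch, assuming the NULL rows and the existing current source_id values have already been fetched with SELECTs; `plan_backfill` is a hypothetical helper, and the sample rows are the README rows listed under the fourth hardtest scenario.

```python
def plan_backfill(null_rows, existing_current_source_ids, prefix="agentdata:"):
    """Split NULL-source_id rows into (safe_ids, conflict_ids).

    null_rows: iterable of (id, file_path) for current rows with source_id NULL.
    existing_current_source_ids: source_id values already held by current rows.
    """
    taken = set(existing_current_source_ids)
    safe, conflicts = [], []
    for row_id, file_path in null_rows:
        candidate = prefix + file_path
        if candidate in taken:
            conflicts.append(row_id)
        else:
            safe.append(row_id)
            # Also guards against two NULL rows that share one file_path.
            taken.add(candidate)
    return safe, conflicts


NULL_ROWS = [
    (298, "knowledge/current-state/README.md"),
    (299, "knowledge/current-tasks/README.md"),
    (300, "knowledge/dev/README.md"),
    (301, "knowledge/other/README.md"),
]
EXISTING = [
    "agentdata:knowledge/current-state/README.md",  # row 345
    "agentdata:knowledge/current-tasks/README.md",  # row 346
    "agentdata:knowledge/other/README.md",          # row 347
]
```

With this input, `plan_backfill(NULL_ROWS, EXISTING)` separates row 300 from the three conflicting README rows, which is the split that was eventually applied by hand with `id NOT IN (298, 299, 301)`.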
3. The "source_id NULL" pattern is debt from the old schema, which never enforced it
Before S175 P2 there was no constraint on source_id. Some legacy rows have source_id NULL (15 rows before this backfill, 3 after). Once the 3 README dup groups are cleaned up, add:
ALTER TABLE knowledge_documents ALTER COLUMN source_id SET NOT NULL;
This blocks the pattern permanently. It must be done in the followup session, NOT in S175.
Files Modified
- /opt/incomex/docker/agent-data-repo/agent_data/directus_sync.py (1 line added)
- /opt/incomex/docker/.env (DIRECTUS_DB_* vars, done by Codex earlier)
- /opt/incomex/docker/docker-compose.yml (env passthrough, done by Codex earlier)
Backup Files (Rollback Targets)
- /root/backup/s175-knowledge_documents-20260409T072635Z.sql (P0 DB snapshot)
- /root/backup/directus_sync.py.20260409T075910Z (pre-Codex rewrite)
- /root/backup/directus_sync.py.20260409T092227Z (Claude Code's commit fix)
Image
Built: agent-data-local:latest
Manifest list: sha256:bfe092449032e23ee9ad091f5a327e11b29f4b5d9757aa9c40cace5a588dd5b7
Container ID: incomex-agent-data (healthy)