KB-21EE

S175 Fix Execution Report v3 — Commit Bug Fix + Backfill

Revision 1
Tags: report, s175, fix, v3, directus-sync, commit-bug

Date: 2026-04-09
Session: S175 P3 RESUME (Claude Code)
Mode: Edit allowed (per main Desktop approval)
Goal: Debug and fix the Directus drift root cause; backfill NULL source_id; smoke test; hardtest

Summary

The root cause of the S175 drift has been identified and fixed:

H3 CONFIRMED: the _db_conn() context manager in directus_sync.py sets autocommit=False but does NOT call conn.commit() after the yield. Every UPDATE executed successfully, but psycopg2 silently rolled it back when the connection closed, so the handler returned "updated" while the DB stayed unchanged.

Fix: one added line, conn.commit() after yield conn. Verified end-to-end by the smoke test: row id=964 went version 1→2→3, the content gained and then lost the marker, and no new row was created.

TASK 1: Debug H1 vs H3

Hypothesis test

H1 (NULL source_id → INSERT new row): REJECTED

SELECT id, file_path, source_id, is_current_version, version_number, date_updated
FROM public.knowledge_documents WHERE file_path LIKE '%assembly-step1-inventory%';
→ 510 | knowledge/current-state/reports/assembly-step1-inventory.md
    | agentdata:knowledge/current-state/reports/assembly-step1-inventory.md (NOT NULL)
    | t | 1 | NULL

Row 510 has source_id set, not NULL. The writer takes the UPDATE branch, not the INSERT branch. Codex's smoke test hit exactly row 510 and created no new row. H1 is rejected.

H3 root cause

@contextmanager
def _db_conn():
    conn = psycopg2.connect(_db_dsn())
    conn.autocommit = False
    try:
        yield conn          # ← the UPDATE runs here
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()        # ← no commit() before close → ROLLBACK

Grepping the whole file for "commit" finds only line 89, autocommit = False. There is no conn.commit() anywhere, so every transaction was silently rolled back.

Evidence: row 510 has date_created=2026-03-05, date_updated=NULL, version_number=1; it was not changed at all by Codex's smoke test.
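
The silent rollback can be reproduced without PostgreSQL. The sketch below is an illustration only: it uses Python's built-in sqlite3 as a stand-in for psycopg2 (both discard an uncommitted transaction when the connection is closed), and the docs table and all helper names are hypothetical, not taken from directus_sync.py.

```python
import os
import sqlite3
import tempfile
from contextlib import contextmanager


def make_db():
    """Create a throwaway database file holding one row (id=510, version 1)."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, version_number INTEGER)")
    conn.execute("INSERT INTO docs VALUES (510, 1)")
    conn.commit()
    conn.close()
    return path


@contextmanager
def buggy_conn(path):
    """Same shape as the buggy _db_conn(): no commit() before close()."""
    conn = sqlite3.connect(path)
    try:
        yield conn
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()  # the pending transaction is discarded here


def read_version(path):
    """Read version_number through a fresh connection."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute(
            "SELECT version_number FROM docs WHERE id = 510"
        ).fetchone()[0]
    finally:
        conn.close()


def run_buggy_update():
    """Return (rows the UPDATE reported, version a new connection sees)."""
    path = make_db()
    try:
        with buggy_conn(path) as conn:
            cur = conn.execute("UPDATE docs SET version_number = 2 WHERE id = 510")
            rows_touched = cur.rowcount
        return rows_touched, read_version(path)
    finally:
        os.remove(path)
```

The UPDATE reports one affected row, which is why a handler built on this pattern can answer "updated", yet a fresh connection still reads the old version.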

TASK 2: Backfill 12 clean NULL rows

Discovery: 3 of 15 NULL rows are README dup pairs

15 NULL rows in total:

  • 3 in README dup groups (id 298, 299, 301): the NULL side of 3 pairs with existing rows 345/346/347
  • 12 clean (id 300, 302, 961, 971, 975, 977, 978, 979, 980, 981, 982, 983)

First backfill attempt (count=15) failed with:

ERROR: duplicate key value violates unique constraint "idx_kd_current_source_id_unique"
DETAIL: Key (source_id)=(agentdata:knowledge/current-state/README.md) already exists.

Reported to Desktop, which approved Option A: backfill the 12 clean rows and defer the 3 README rows to a followup.

Backfill execution

BEGIN;
UPDATE public.knowledge_documents
SET source_id = 'agentdata:' || file_path, date_updated = NOW()
WHERE is_current_version=true AND source_id IS NULL AND file_path IS NOT NULL
  AND id NOT IN (298, 299, 301);
SELECT COUNT(*) AS still_null FROM public.knowledge_documents
WHERE is_current_version=true AND source_id IS NULL;
COMMIT;

Output:

BEGIN
UPDATE 12
 still_null
------------
          3
COMMIT

Verify post-commit: 12 backfilled rows all have date_updated=2026-04-09 09:21:36.07504 and source_id format agentdata:<file_path>. 3 README NULL rows preserved as planned.

TASK 3: Fix _db_conn() commit bug

Code diff

File: /opt/incomex/docker/agent-data-repo/agent_data/directus_sync.py lines 85-97

 @contextmanager
 def _db_conn():
     conn = psycopg2.connect(_db_dsn())
     conn.autocommit = False
     try:
         yield conn
+        conn.commit()
     except Exception:
         conn.rollback()
         raise
     finally:
         conn.close()
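
A small regression check for the corrected shape might look like the following. It is a sketch under stated assumptions: sqlite3 stands in for psycopg2 so the example runs anywhere, and db_conn, the docs table, and check_commit_and_rollback are hypothetical names. It exercises both paths of the context manager: commit when the body succeeds, rollback when the body raises.

```python
import os
import sqlite3
import tempfile
from contextlib import contextmanager


@contextmanager
def db_conn(path):
    """Fixed shape: commit after a clean yield, roll back on any exception."""
    conn = sqlite3.connect(path)
    try:
        yield conn
        conn.commit()  # reached only when the with-body raised nothing
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()


def check_commit_and_rollback():
    """Return the version that survives one good update and one failed update."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        with db_conn(path) as conn:
            conn.execute(
                "CREATE TABLE docs (id INTEGER PRIMARY KEY, version_number INTEGER)"
            )
            conn.execute("INSERT INTO docs VALUES (964, 1)")
        # Successful update: must be persisted.
        with db_conn(path) as conn:
            conn.execute("UPDATE docs SET version_number = 2 WHERE id = 964")
        # Failing update: must be rolled back.
        try:
            with db_conn(path) as conn:
                conn.execute("UPDATE docs SET version_number = 99 WHERE id = 964")
                raise RuntimeError("simulated handler failure")
        except RuntimeError:
            pass
        with db_conn(path) as conn:
            return conn.execute(
                "SELECT version_number FROM docs WHERE id = 964"
            ).fetchone()[0]
    finally:
        os.remove(path)
```

Placing the commit inside the try block means a failing commit is also routed through the rollback branch before the connection is closed.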

Build evidence

  • py_compile OK
  • Backup: /root/backup/directus_sync.py.20260409T092227Z
  • docker build -t agent-data-local:latest . → exit 0, image manifest bfe092449032e23ee9ad091f5a327e11b29f4b5d9757aa9c40cace5a588dd5b7
  • docker compose up -d --force-recreate agent-data → Started
  • docker inspect ... .State.Health.Status → healthy
  • /api/health → status=healthy, services={qdrant: ok, postgres: ok, openai: ok}

TASK 4: Smoke test rerun

Test doc

knowledge/current-state/reports/agent-data-connectivity-check-gpt-2026-03-31.md

  • id=964 (NOT in 15 NULL list, NOT assembly-step1)
  • Exists in both Agent Data (rev=1) and Directus
  • source_id matches: agentdata:knowledge/current-state/reports/...

Before update

id=964 | version_number=1 | date_updated=NULL | content_len=1356 (Directus) / 1356 (Agent Data)
is_current_version=t

Update via PUT /documents/{doc_id} (correct schema: document_id + patch + update_mask)

PUT /api/documents/.../agent-data-connectivity-check-gpt-2026-03-31.md
{
  "document_id": "...",
  "patch": {"content": {"body": <orig + marker>, "mime_type": "text/markdown"}},
  "update_mask": ["content"]
}
→ {"id":"...","status":"updated","revision":2}
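
The request body above can be assembled with a small helper. This is a minimal sketch: the field names (document_id, patch, update_mask, body, mime_type) come from the request shown, while build_update_request, the placeholder document id in the usage note, and the way the marker is appended to the body are assumptions made for illustration.

```python
import json

MARKER = "SMOKE-TEST-S175-MARKER-VIEC4"


def build_update_request(document_id, original_body, marker=None):
    """Build the PUT payload; pass marker=None to restore the original body."""
    body = original_body
    if marker is not None:
        # Assumption: the marker is appended on its own line.
        body = f"{original_body}\n{marker}\n"
    return {
        "document_id": document_id,
        "patch": {"content": {"body": body, "mime_type": "text/markdown"}},
        "update_mask": ["content"],
    }


def to_json(payload):
    """Serialize the payload the way an HTTP client would send it."""
    return json.dumps(payload, ensure_ascii=False)
```

Calling build_update_request("some-doc-id", text, MARKER) gives the update payload, and calling it again without a marker gives the restore payload used later in the test.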

After update (5s wait)

id=964 | version_number=2 | date_updated=2026-04-09 09:26:55.973472 | content_len=1389
has_marker=t | is_current_version=t | current_count=1

ALL VERIFY POINTS PASS:

  • ✓ version_number increased (1→2)
  • ✓ date_updated NOT NULL (was NULL, now 09:26:55)
  • ✓ content contains the marker (SMOKE-TEST-S175-MARKER-VIEC4)
  • ✓ is_current_version=true (preserved)
  • ✓ current_count=1 (no new row)
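
The five verify points above can be turned into a repeatable check. The sketch below is illustrative only: RowSnapshot and verify_update are hypothetical names, and the "before" values for has_marker and current_count are assumed, since the report records them only after the update.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowSnapshot:
    version_number: int
    date_updated: Optional[str]
    has_marker: bool
    is_current_version: bool
    current_count: int


def verify_update(before, after):
    """Evaluate each verify point; every value should be True after a good sync."""
    return {
        "version_incremented": after.version_number == before.version_number + 1,
        "date_updated_set": after.date_updated is not None,
        "marker_present": after.has_marker,
        "still_current": after.is_current_version,
        "no_new_row": after.current_count == 1,
    }


# Values for row id=964 as reported above (before-state marker/count assumed).
BEFORE = RowSnapshot(1, None, False, True, 1)
AFTER = RowSnapshot(2, "2026-04-09 09:26:55.973472", True, True, 1)
```

Feeding the same snapshot in as both before and after reproduces the pre-fix symptom: the handler reports success but the version, timestamp, and marker checks all fail.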

Restore

PUT same endpoint with original body (1356 bytes)
→ {"status":"updated","revision":3}

After restore:

id=964 | version_number=3 | date_updated=2026-04-09 09:28:14.72428 | content_len=1356
has_marker=f

Marker removed. Content restored. End-to-end fix verified.

Note on logs

No explicit "Directus sync" log line is visible at INFO level in the container output. The proof of work is in the DB (version_number increased twice, and the content changed as expected). The logger may be at DEBUG level, or pytest-style fire-and-forget tasks may not surface their output.
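
The DEBUG-level hypothesis is easy to picture with the standard logging module. In this sketch the logger name and the message text are hypothetical; nothing here is taken from the actual listener code.

```python
import io
import logging


def capture_sync_log(level):
    """Return what a handler prints when the logger runs at the given level."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    logger = logging.getLogger("directus_sync_demo")  # hypothetical logger name
    logger.handlers = [handler]
    logger.propagate = False
    logger.setLevel(level)
    # A sync event logged at DEBUG, as the note above suspects.
    logger.debug("Directus sync: updated row id=964")
    handler.flush()
    return stream.getvalue()
```

With the logger at INFO the function returns an empty string, and with the logger at DEBUG it returns the message, which matches the behaviour the note suspects.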

TASK 5: P4 hardtest, 4 scenarios

Scenario 1: Doc update reflected in <60s, no new row

PASS (covered by the TASK 4 smoke test). The version went 1→2→3, the content was updated and then restored, and no duplicate row was created. Latency: the SQL changes were visible immediately after the PUT returned 200 OK.

Scenario 2: Move doc → old current=false, new current=true

N/A by design. Move endpoint deprecated by S170:

POST /documents/.../move
→ {"code":"NOT_IMPLEMENTED","message":"move_document is deprecated. Use:
  (1) upload_document to new path, (2) delete old path."}

The atomic writer's _select_rows_for_source handles same-source-id duplicates correctly (sets old current=false in first UPDATE), but cross-path move is now a 2-step operation outside writer scope.

Scenario 3: INSERT duplicate current → DB rejects with UNIQUE violation

PASS.

BEGIN;
INSERT INTO knowledge_documents (...)
VALUES (..., 'agentdata:knowledge/current-state/reports/agent-data-connectivity-check-gpt-2026-03-31.md');
ROLLBACK;

Output:

BEGIN
ROLLBACK
ERROR: duplicate key value violates unique constraint "idx_kd_current_source_id_unique"
DETAIL: Key (source_id)=(agentdata:...) already exists.

Partial UNIQUE (source_id) WHERE is_current_version=true AND source_id IS NOT NULL correctly enforces 1-current-row-per-source-id invariant.
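
The behaviour of such a partial unique index can be tried locally. The sketch below is an approximation: it uses sqlite3 (which also supports partial unique indexes) instead of PostgreSQL, a cut-down three-column table, and a made-up source_id value; only the index name and its predicate follow the report.

```python
import sqlite3


def demo_partial_unique():
    """Return (duplicate current row rejected?, number of rows stored)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE knowledge_documents (
               id INTEGER PRIMARY KEY,
               source_id TEXT,
               is_current_version INTEGER NOT NULL)"""
    )
    conn.execute(
        """CREATE UNIQUE INDEX idx_kd_current_source_id_unique
               ON knowledge_documents (source_id)
               WHERE is_current_version = 1 AND source_id IS NOT NULL"""
    )
    rows = [
        (1, "agentdata:a.md", 1),  # the current row
        (2, "agentdata:a.md", 0),  # an old version: outside the index, allowed
        (3, None, 1),              # NULL source_id: outside the index
        (4, None, 1),              # second NULL: also allowed
    ]
    conn.executemany("INSERT INTO knowledge_documents VALUES (?, ?, ?)", rows)
    try:
        # A second current row with the same source_id must be rejected.
        conn.execute(
            "INSERT INTO knowledge_documents VALUES (5, 'agentdata:a.md', 1)"
        )
        rejected = False
    except sqlite3.IntegrityError:
        rejected = True
    count = conn.execute("SELECT COUNT(*) FROM knowledge_documents").fetchone()[0]
    conn.close()
    return rejected, count
```

The two NULL rows also show why the remaining README rows escape the constraint: rows with a NULL source_id fall outside the index predicate.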

Scenario 4: 3 README dup groups intact

PASS.

 id  |             file_path             |                  source_id                  | is_current_version
-----+-----------------------------------+---------------------------------------------+--------------------
 298 | knowledge/current-state/README.md |                                             | t   ← NULL preserved
 345 | knowledge/current-state/README.md | agentdata:knowledge/current-state/README.md | t
 299 | knowledge/current-tasks/README.md |                                             | t   ← NULL preserved
 346 | knowledge/current-tasks/README.md | agentdata:knowledge/current-tasks/README.md | t
 300 | knowledge/dev/README.md           | agentdata:knowledge/dev/README.md           | t   ← backfilled (was NULL, no conflict)
 301 | knowledge/other/README.md         |                                             | t   ← NULL preserved
 347 | knowledge/other/README.md         | agentdata:knowledge/other/README.md         | t

3 NULL-side rows (298, 299, 301) preserved untouched. knowledge/dev/README.md (300) was backfilled because no conflict pair existed (no row 348).

TASK 6: Followup note

Created: knowledge/current-state/issues/s175-readme-duplicates-followup.md

  • Tag: s175-followup
  • Documents 6 rows of 3 README dup groups
  • Plan for next session cleanup
  • Recommendation to add source_id NOT NULL after cleanup

Known Limitations

  1. Move endpoint deprecated — Cross-path moves require 2 separate operations (upload to new path + delete old). The atomic writer doesn't handle this case automatically. If a doc is renamed in Agent Data, the old Directus row stays with the old source_id forever (current=true) until someone DELETEs it.

  2. 3 README NULL rows still NULL — Deferred to followup. Until then, these rows can't be enforced by the partial UNIQUE constraint (NULL excluded).

  3. No "Directus sync" INFO log — End-to-end test confirms writer works, but logger output for the listener is invisible (likely DEBUG level or fire-and-forget swallowed). Consider raising log level for sync events.

  4. Out of scope (still TODO from earlier S175):

    • limit=500 in dot-knowledge-sync-agentdata line 87 (plus 10 other hardcoded limit sites)
    • Event bus retry / persistent queue (no retry on container restart)
    • 18 duplicate law document rows from earlier batch sync bug (not yet deduped)

Lessons Learned

1. An autocommit=False context manager MUST call conn.commit() after the yield

The combination of @contextmanager, psycopg2, and autocommit=False is dangerous when the commit is missing. PostgreSQL silently rolls back every change when the connection closes. The code review checklist must include: "any autocommit=False? grep commit?"
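
The checklist item can be automated with a rough text scan. This is a heuristic sketch, not a real linter: missing_commit is a hypothetical helper that works at whole-file granularity, exactly like the grep described above, so it cannot tell which connection a commit belongs to.

```python
import re

# Matches assignments such as "conn.autocommit = False".
AUTOCOMMIT_OFF = re.compile(r"\.autocommit\s*=\s*False")
# Matches any method call such as "conn.commit()".
COMMIT_CALL = re.compile(r"\.commit\s*\(")


def missing_commit(source: str) -> bool:
    """True if the source disables autocommit yet never calls .commit()."""
    return bool(AUTOCOMMIT_OFF.search(source)) and not COMMIT_CALL.search(source)
```

Run over each Python file in a repository, a True result flags a file for manual review; a False result is not proof that every code path commits.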

2. A data backfill MUST check for conflicts with UNIQUE constraints BEFORE the UPDATE

On the first attempt, Claude Code ran the backfill over all 15 rows without checking, hit a UNIQUE violation on the first row (298), and the whole transaction rolled back. The correct procedure:

  • SELECT the rows that need updating
  • LEFT JOIN them against existing rows on the UNIQUE column
  • Identify the conflicts first
  • Skip the conflicts, or resolve them, before the UPDATE

Safe pattern:

UPDATE ... WHERE ... AND NOT EXISTS (
  SELECT 1 FROM ... WHERE conflicting_unique_key = ...
);
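
The same pre-check can be done in application code before any UPDATE is issued. The sketch below is illustrative: split_backfill_candidates is a hypothetical helper, and the sample rows are the README rows listed under the fourth hardtest scenario.

```python
def split_backfill_candidates(null_rows, existing_source_ids):
    """Split (id, file_path) rows into safe ids and ids that would conflict.

    existing_source_ids holds the source_id values already present on
    current rows, i.e. the values protected by the UNIQUE index.
    """
    taken = set(existing_source_ids)
    safe, conflicts = [], []
    for row_id, file_path in null_rows:
        candidate = "agentdata:" + file_path
        if candidate in taken:
            conflicts.append(row_id)
        else:
            safe.append(row_id)
            taken.add(candidate)  # also catches duplicates inside the batch
    return safe, conflicts


NULL_ROWS = [
    (298, "knowledge/current-state/README.md"),
    (299, "knowledge/current-tasks/README.md"),
    (300, "knowledge/dev/README.md"),
    (301, "knowledge/other/README.md"),
]
EXISTING = {
    "agentdata:knowledge/current-state/README.md",  # row 345
    "agentdata:knowledge/current-tasks/README.md",  # row 346
    "agentdata:knowledge/other/README.md",          # row 347
}
```

With this sample, row 300 is the only safe candidate and rows 298, 299, and 301 are reported as conflicts, which matches the exclusion list used in the backfill.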

3. The "source_id NULL" pattern is debt from the old, unenforced schema

Before S175 P2 there was no constraint on source_id. Some legacy rows have a NULL source_id (15 rows before this backfill, 3 remaining after it). Once the 3 README dup groups are cleaned up, the following should be added:

ALTER TABLE knowledge_documents ALTER COLUMN source_id SET NOT NULL;

This blocks the pattern permanently. It should be done in the followup session, NOT in S175.

Files Modified

  • /opt/incomex/docker/agent-data-repo/agent_data/directus_sync.py (1 line added)
  • /opt/incomex/docker/.env (DIRECTUS_DB_* vars — done by Codex earlier)
  • /opt/incomex/docker/docker-compose.yml (env passthrough — done by Codex earlier)

Backup Files (Rollback Targets)

  • /root/backup/s175-knowledge_documents-20260409T072635Z.sql (P0 DB snapshot)
  • /root/backup/directus_sync.py.20260409T075910Z (pre-Codex rewrite)
  • /root/backup/directus_sync.py.20260409T092227Z (Claude Code's commit fix)

Image

Built: agent-data-local:latest
Manifest list: sha256:bfe092449032e23ee9ad091f5a327e11b29f4b5d9757aa9c40cace5a588dd5b7
Container ID: incomex-agent-data (healthy)