KB-6AF1

9000x-onboarding · 05 — Healthcheck SQL fix (vector_boundary false-positive)

3 min read Revision 1
iu-corev0.69000xhealthcheckvector-boundarysql-fix

9000x — Healthcheck vector_boundary SQL fix

Defect surfaced by live onboarding

Pre-9000x, iu_vector_sync_point held 60 unique IUs across 61 indexed points — just one multi-chunk IU (d3ad5874-… with 2 chunks). The healthcheck SQL was written to tolerate exactly that single multi-chunk:

-- before (cutter_agent/iu_core/healthcheck.py @ vector_boundary surface):
SELECT count(*) AS sync_point_rows,
       count(DISTINCT unit_id) AS unique_units,
       (count(DISTINCT unit_id) = count(*)
        OR count(*) - count(DISTINCT unit_id) <= 1) AS per_iu_boundary_ok
  FROM public.iu_vector_sync_point WHERE sync_status='indexed';

After 9000x onboarding the table has 141 unique IUs across 149 indexed points — 8 multi-chunk IUs (1 KT-B + 7 from DIEU-35). The old check returned false because 149 - 141 = 8 > 1, and the Mac cron at 11:10:00Z flipped vector_boundary.ok=false with reason per-IU boundary breach — a false positive.

Root semantic

The DB-level CHECK iu_vector_sync_point_boundary_chk forbids duplicate (unit_id, chunk_index) AND requires consistent chunk_count across the rows of one IU. Multi-chunk IUs are FINE as long as their rows have distinct chunk_index. The pre-9000x SQL was a stand-in for that invariant; it was correct only while the corpus had at most one multi-chunk IU.

Fix applied

-- after:
SELECT count(*) AS sync_point_rows,
       count(DISTINCT unit_id) AS unique_units,
       NOT EXISTS (
         SELECT 1 FROM public.iu_vector_sync_point
          WHERE sync_status='indexed' AND unit_id IS NOT NULL
          GROUP BY unit_id, chunk_index
         HAVING count(*) > 1
       ) AS per_iu_boundary_ok
  FROM public.iu_vector_sync_point WHERE sync_status='indexed';

This surfaces the EXACT same invariant as the DB CHECK with a single GROUP BY scan. Live run returns 149|141|t — GREEN.

Verification

  • All 7 healthcheck surfaces GREEN after the fix (overall_ok=True):
three_axis_cache:     in_sync (table=163, view=163)
directus_collection:  163 rows / 1 read-permission
qdrant_collection:    iu_core_iu_chunks (149 indexed)
auto_refresh_trigger: gate=false fires_24h=1
vector_boundary:      149 pts / 141 unique  (per_iu_boundary_ok=true)
write_gates:          all 6 inert
operator_runtime:     open_runs=0 failed_24h=0 active_leases=0
  • All 10 tests in tests/test_iu_core_5000x_healthcheck.py still pass (they use a fake fixture per_iu_boundary_ok=True/False, not the SQL itself).
  • Mac cron will pick up the fix on the next */10 fire (the wrapper invokes cutter_agent/iu_core/healthcheck.py directly from the repo).

Lesson

Healthcheck SQL that approximates an invariant by row arithmetic is brittle. When the DB layer already enforces the invariant via a CHECK, the healthcheck should surface the SAME predicate (GROUP BY + HAVING) so the operator sees what the DB sees — not a heuristic that holds only for a transient corpus shape.

Back to Knowledge Hub knowledge/dev/laws/dieu44-trien-khai/v0.6-iu-core-9000x-qdrant-onboarding-piece-platform-open-goal/05-healthcheck-boundary-sql-fix.md