KB-5E1E rev 2

S167H-FIX Data Quality Fix Report

Date: 2026-03-26 | Agent: Claude CLI (claude-go) | PR: #636 (MERGED) | Branch: fix/s167h-data-quality | Post-deploy verified: 2026-03-26T08:04 UTC

Findings Fixed

Finding #2225: _dot_origin NULL (2,079 records)

Breakdown:

Collection | NULL count
universal_edges | 2,040
collection_registry | 32
meta_catalog | 4
entity_species | 3
14 other collections | 0

Root cause: Records created before DOT tracking was established.

Guard: DEFAULT 'DOT:UNKNOWN' added to all 18 managed collections (verified: 18/18).

Backfill: NULL -> 'LEGACY|S167H|2026-03-26' (compliant with the fn_validate_dot_origin trigger).

Result: 0 NULL remaining (post-deploy verified).
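
The guard and backfill above can be sketched as generated SQL text. This is a minimal illustration, not the migration that shipped in PR #636: it assumes each collection maps to a PostgreSQL table of the same name, it lists only the four collections named in the breakdown, and the helper names (guard_sql, backfill_sql, null_count_sql, build_plan) are hypothetical.

```python
# Sketch only: builds SQL strings, it does not connect to a database.
DEFAULT_ORIGIN = "DOT:UNKNOWN"                # guard value quoted in this report
BACKFILL_ORIGIN = "LEGACY|S167H|2026-03-26"   # backfill value quoted in this report

# Only the four collections named in the breakdown; the other 14 are not listed.
COLLECTIONS = [
    "universal_edges",
    "collection_registry",
    "meta_catalog",
    "entity_species",
]


def guard_sql(table: str) -> str:
    """Statement that makes future inserts default to DOT:UNKNOWN."""
    return (
        f"ALTER TABLE {table} ALTER COLUMN _dot_origin "
        f"SET DEFAULT '{DEFAULT_ORIGIN}';"
    )


def backfill_sql(table: str) -> str:
    """Statement that rewrites existing NULLs to the legacy marker."""
    return (
        f"UPDATE {table} SET _dot_origin = '{BACKFILL_ORIGIN}' "
        f"WHERE _dot_origin IS NULL;"
    )


def null_count_sql(table: str) -> str:
    """Verification query; it should return 0 once the backfill has run."""
    return f"SELECT count(*) FROM {table} WHERE _dot_origin IS NULL;"


def build_plan(tables):
    """Guard first, then backfill, then verify, for every table in order."""
    plan = []
    for table in tables:
        plan += [guard_sql(table), backfill_sql(table), null_count_sql(table)]
    return plan


if __name__ == "__main__":
    print("\n".join(build_plan(COLLECTIONS)))
```

Setting the default before the backfill means no new NULL can appear between the two steps, which is why the sketch orders them that way.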

Finding #2226: Broken universal_edges (2,040 edges)

Root cause: Scanner false positive. The A3 query checked only 5 entity tables, but universal_edges references 20+ tables, so codes that exist in the unchecked tables were reported as broken.

Breakdown of "broken" codes (actually valid in their own tables):

source_collection | count | Why the scanner missed it
entity_dependencies | 141 | table not in query
checkpoint_instances | 88 | table not in query
task_comments | 80 | uses integer IDs, no code column
workflows | 80 | uses integer IDs, not process_code
taxonomy | 76 | table not in query
taxonomy_facets | 1 | table not in query

Fix: Expanded the A3 query from 5 to 20 entity tables and added numeric ID handling (workflows id::text, task_comments id::text).

Cleanup: 1 genuinely broken edge deleted (id=2897, CAT-100 -> LBL-101, S143 test residue).

Result: 0 broken edges (post-deploy verified).
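
The false positive can be reproduced in miniature. The sketch below is an in-memory illustration, not the A3 query itself: the rows, the codes, the source_code field name, and the choice of entity_species as the only table in the narrow index are all hypothetical. It shows why an index over too few tables flags valid edges, and why integer IDs have to be compared as text.

```python
# Hypothetical sample rows; real table contents are not part of this report.
TABLES = {
    "entity_species": [{"code": "SPE-001"}],
    "taxonomy": [{"code": "TAX-001"}],
    "workflows": [{"id": 7}],  # integer primary key, no code column
}

# Which column identifies a row in each table (assumed column names).
KEY_COLUMNS = {"entity_species": "code", "taxonomy": "code", "workflows": "id"}

# "source_collection" appears in the report; "source_code" is an assumed name.
EDGES = [
    {"id": 1, "source_collection": "entity_species", "source_code": "SPE-001"},
    {"id": 2, "source_collection": "taxonomy", "source_code": "TAX-001"},
    {"id": 3, "source_collection": "workflows", "source_code": "7"},
    {"id": 4, "source_collection": "taxonomy", "source_code": "TAX-999"},
]


def build_key_index(table_names):
    """Map each table to its set of keys, cast to text like id::text."""
    index = {}
    for name in table_names:
        column = KEY_COLUMNS[name]
        index[name] = {str(row[column]) for row in TABLES[name]}
    return index


def find_broken_edges(edges, index):
    """Return ids of edges whose source key is absent from the index."""
    broken = []
    for edge in edges:
        keys = index.get(edge["source_collection"], set())
        if edge["source_code"] not in keys:
            broken.append(edge["id"])
    return broken


# Stand-in for the original narrow query: only one table is indexed.
narrow_index = build_key_index(["entity_species"])
# Stand-in for the expanded query: every referenced table is indexed.
full_index = build_key_index(list(TABLES))

print(find_broken_edges(EDGES, narrow_index))  # [2, 3, 4]
print(find_broken_edges(EDGES, full_index))    # [4]
```

With the narrow index, edges 2 and 3 are reported even though their targets exist. With the full index only edge 4 remains, the analogue of the single genuinely broken edge that was deleted.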

Edge Guard Decision

Scanner detection every 6 hours is considered sufficient for now. FK constraints are impractical because universal_edges references 20+ source tables. This is tracked as tech debt, to be revisited when the system stabilizes.

Post-Deploy Verification (run_id: s167h-post-deploy)

PG Data Checks

Check | Result
_dot_origin NULL across 18 collections | 0
DEFAULT guard set | 18/18
Broken edge #2897 deleted | true
Open system_issues | 1 (watchdog)

Scanner Run

Run ID: s167h-post-deploy
Measurements: 10
PASS: 9 | FAIL: 0 | ERROR: 0
WATCHDOG: alive
Pass Rate: 100.0% (9/9)
Issues Created: 0 | Reopened: 0
Auto-Resolved: 0
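
The figures in the scanner run fit together as follows. This is a small arithmetic sketch under one assumption that the report implies but does not state: the watchdog is the one measurement out of 10 that is not scored as PASS, FAIL, or ERROR, which leaves 9 in the denominator. The function name pass_rate is hypothetical.

```python
def pass_rate(passed: int, failed: int, errored: int) -> float:
    """Percentage of scored measurements that passed, to one decimal place."""
    scored = passed + failed + errored
    if scored == 0:
        raise ValueError("no scored measurements")
    return round(100.0 * passed / scored, 1)


MEASUREMENTS = 10
WATCHDOG = 1  # assumption: reported as alive, not counted in PASS/FAIL/ERROR
PASSED, FAILED, ERRORED = 9, 0, 0

# The scored results plus the watchdog account for all 10 measurements.
assert PASSED + FAILED + ERRORED + WATCHDOG == MEASUREMENTS

print(pass_rate(PASSED, FAILED, ERRORED))  # 100.0
```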

Production URL Evidence (section 0-AF)

System Issues:

curl -s https://vps.incomexsaigoncorp.vn/api/registry/system-issues
{"totals":{"all":1,"critical":1,"warning":0,"info":0,"group_count":1}}

Health:

curl -s https://vps.incomexsaigoncorp.vn/api/health
{"status":"healthy","data_integrity":{"document_count":590,"vector_point_count":866,"ratio":1.47,"sync_status":"ok"}}

Key Pages:

URL | Status
/knowledge/registries | 200
/knowledge/registries/health | 200
/knowledge/registries/species | 200
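
The health response quoted above can be checked mechanically. The sketch below parses that exact JSON body and recomputes the ratio. It assumes that ratio means vector_point_count divided by document_count, rounded to two decimals. The numbers are consistent with that reading, but the report does not define the field. The function name check_health is hypothetical.

```python
import json

# Response body copied from the Health check in this report.
HEALTH_BODY = (
    '{"status":"healthy","data_integrity":{"document_count":590,'
    '"vector_point_count":866,"ratio":1.47,"sync_status":"ok"}}'
)


def check_health(body: str) -> dict:
    """Parse a health response and report three simple pass/fail checks."""
    payload = json.loads(body)
    integrity = payload["data_integrity"]
    # Assumed definition: ratio = vector points per document, 2 decimals.
    recomputed = round(
        integrity["vector_point_count"] / integrity["document_count"], 2
    )
    return {
        "healthy": payload["status"] == "healthy",
        "sync_ok": integrity["sync_status"] == "ok",
        "ratio_consistent": recomputed == integrity["ratio"],
    }


if __name__ == "__main__":
    print(check_health(HEALTH_BODY))
```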

Before/After (full journey S167G -> S167H-FIX)

Metric | S167G Start | S167H-INV | S167H-FIX (post-deploy)
Open issues | 6 | 3 | 1 (watchdog)
Critical | 3 | 1 | 1 (watchdog)
Warning | 3 | 2 | 0
Pass rate | 60.0% | 77.8% | 100.0%
_dot_origin NULL | 2,079 | 2,079 | 0
Broken edges (scanner) | 2,040 | 2,040 | 0
Measurements | 11 | 10 | 10

Self Check

# | Question | Result
1 | Operating Rules read? | PASS
2 | Assembly Gate? | PASS
3 | _dot_origin breakdown documented? | PASS
4 | Root cause clear? | PASS
5 | Guard added (DEFAULT 18/18)? | PASS
6 | Backfill: 0 NULL remaining? | PASS
7 | Broken edges breakdown? | PASS
8 | Root cause: scanner false positive? | PASS
9 | Edge guard: scanner sufficient + tech debt? | PASS
10 | Cleanup: 0 broken remaining? | PASS
11 | CI GREEN + merged? | PASS (PR #636)
12 | Deploy + verify production? | PASS (s167h-post-deploy)
13 | Pass rate >= 90%? | PASS (100%)
14 | Production URL evidence? | PASS

S167H-FIX DONE. Pass rate 100%. 0 warnings. Post-deploy verified.