KB-4F36
Dieu 37 P1: Health Performance Fix Report
6 min read Revision 1
Dieu 37 P1: Health Performance Fix Report
Date: 2026-04-02
Script: dot/bin/dot-collection-health
Version: v2.0.0 -> v2.1.1
PRs: #662 (H14-H17 PG-only), #663 (Checks 1,2,11 bulk SQL)
Changes Summary
| Check | Before (v2.0.0) | After (v2.1.1) | Method |
|---|---|---|---|
| Check 1 (Tier-1 fields) | Per-collection loop (133+ SSH queries) | 1 bulk SQL | CROSS JOIN + NOT EXISTS |
| Check 2 (Tier-2 fields) | Per-collection loop | 1 bulk SQL | Same pattern |
| Check 9 (Group invalid) | Hardcoded English list (WRONG) | Cross-reference via collection_registry | Dynamic subquery |
| Check 11 (sync fields) | Per-collection loop (80+ SSH queries) | 1 bulk SQL | CROSS JOIN + NOT EXISTS |
| H14 (FK constraints) | 133+ Directus API calls (curl) |
2 SQL queries (directus_fields + information_schema) | PG-only, 0 API |
| H15 (Permissions) | N*M individual SQL (5 policies x 133 cols) | 3 bulk INSERT...WHERE NOT EXISTS | CTE with RETURNING |
| H16 (Species mapping) | Per-collection loop | 2 bulk queries | INSERT...ON CONFLICT |
| H17 (No DELETE) | Per-collection loop | 1 SELECT + 1 bulk DELETE | Retroactive all governed |
| Governance_role | Not validated | Unknown roles flagged | governed/observed/excluded recognized |
Key Design Decisions
- Assembly First (NT8+NT11): directus_fields, directus_permissions, directus_policies are PG tables. Read via SQL, never API.
- Luat Bao toan retroactive: H17 applies to ALL governed collections from collection_registry, not just 6 new tables.
- Check 9 fix: Old hardcoded list
('CMS','LOG','REGISTRY',...)was English; actual data is Vietnamese (cau_truc,cong_cu...). No SSOT table for atom_group exists. Fixed via cross-reference: atom_group is valid if its meta_catalog collection also exists in collection_registry. - governance_role validation: 3 stable roles recognized (governed, observed, excluded). Any new role = governance law change = flagged + system_issues.
Production Verification Evidence (VPS 38.242.240.89)
V4: Full health < 60s
real 0m18.910s
user 0m2.020s
sys 0m4.289s
PASS - 18.9s (was 4m37s with v2.1.0, which was already improved from v2.0.0). 14.7x improvement.
V5: system_issues logged
SELECT id, title, issue_type, source, detected_at FROM system_issues
WHERE source='dot-collection-health' ORDER BY detected_at DESC LIMIT 5;
4053|DELETE permission found: pivot_definitions|collection_onboarding_gap|dot-collection-health|2026-04-02 04:44:17.959
4052|DELETE permission found: birth_registry|collection_onboarding_gap|dot-collection-health|2026-04-02 04:44:17.758
4051|DELETE permission found: entity_species|collection_onboarding_gap|dot-collection-health|2026-04-02 04:44:17.606
4050|DELETE permission found: entity_species|collection_onboarding_gap|dot-collection-health|2026-04-02 04:44:17.414
4049|DELETE permission found: entity_species|collection_onboarding_gap|dot-collection-health|2026-04-02 04:44:17.199
PASS - system_issues logged for DELETE removals, permissions gaps, species gaps.
V6: DELETE permissions on ALL governed = 0
SELECT count(*) FROM directus_permissions dp
WHERE dp.action='delete'
AND dp.collection IN (
SELECT collection_name FROM collection_registry
WHERE governance_role='governed' AND status IS DISTINCT FROM 'retired'
);
0
PASS - Zero DELETE permissions on governed collections.
V7: Test DELETE -> health auto-removes
-- Inserted test DELETE permission on dot_tools (id=1227)
INSERT INTO directus_permissions ... VALUES ('dot_tools','delete',...) RETURNING id;
1227|dot_tools|delete
-- Ran health -> output:
[WARN] dot_tools: DELETE permission (policy: AI Agent Policy) — removing...
DELETE 1
[OK] FIXED: removed 1 DELETE permissions
[WARN] CHECK 17: 1 governed collections had DELETE permissions (removed)
-- system_issues logged:
DELETE permission found: dot_tools|2026-04-02 04:49:35.703
PASS - Self-healing confirmed: auto-removed + logged.
V8: Idempotent (second run = 0 changes)
17 checks, 6 issue(s), 0 auto-fix(es), compliance 95.9%
[OK] CHECK 14 PASS
[OK] CHECK 15 PASS
[OK] CHECK 17 PASS
PASS - 0 auto-fixes on second run. H14/H15/H17 all PASS.
V9: No hardcoded numbers
grep -n 'CHECKS=\|COUNT=\|TOTAL=' dot/bin/dot-collection-health
# Only =0 initializations and $((CHECKS+1)) increments
PASS - No magic numbers.
Known Issues (not in scope)
- Check 3: 15 active in meta_catalog but missing in PG (pre-existing)
- Check 4: 20 orphans in table_registry (pre-existing)
- Check 8: 139 field_type conflicts (pre-existing)
- Check 10: 142 NULL atom_group (pre-existing)
- Check 16: 25 governed collections with NULL species_code (data gap, not fixable by script)
Anti-patterns Documented
- AP: Directus API does NOT create PG FK constraints -> DOT must ALTER TABLE
- AP: Script using Directus API when PG table has same data = violates NT8 Assembly First
- AP: Verify must check positive (sufficient) + negative (forbidden)
- AP: New governance rules must be retroactive for existing data
- AP: Hardcoded English enum list broke when data migrated to Vietnamese