KB-6ABC

recovery-runbook.md

VPS Recovery Runbook

Last updated: 2026-02-28 | Status: Active

Overview

This runbook covers recovery procedures for the INCOMEX VPS infrastructure (38.242.240.89). All services run as Docker containers managed by Docker Compose.

Service Architecture

Service        Container            Port              Health Check
MySQL 8.0      incomex-mysql        3306 (internal)   mysqladmin ping
Qdrant         incomex-qdrant       6333 (internal)   TCP check
Directus 11.5  incomex-directus     8055 (internal)   /server/info
Agent Data     incomex-agent-data   8000 (internal)   /info
Nuxt SSR       incomex-nuxt         3000 (internal)   TCP 3000
Nginx          incomex-nginx        80, 443           N/A
Uptime Kuma    uptime-kuma          3001              Built-in
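
The container names in the table can be swept in one pass to see which service is unhealthy. The following is a minimal sketch, assuming the containers define a Docker HEALTHCHECK (the compose file is not shown here, so this is an assumption); containers without one are reported by their run state instead. The function name stack_status is illustrative and not part of the existing scripts.

```shell
# Container names are taken from the Service Architecture table.
CONTAINERS="incomex-mysql incomex-qdrant incomex-directus incomex-agent-data incomex-nuxt incomex-nginx uptime-kuma"

# Print "<container> <state>" for each container.
# Uses the health status when a HEALTHCHECK exists, otherwise the run state.
# A container that docker cannot inspect is reported as "missing".
stack_status() {
  local name state
  for name in $CONTAINERS; do
    state=$(docker inspect --format \
      '{{if .State.Health}}{{.State.Health.Status}}{{else}}{{.State.Status}}{{end}}' \
      "$name" 2>/dev/null) || state="missing"
    printf '%s %s\n' "$name" "$state"
  done
}
```

Running stack_status on the VPS prints one line per service, which makes it easier to decide between Scenario 1 (one container down) and Scenario 2 (full stack down).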

Scenario 1: Single Container Crash

Symptoms: One service down, others healthy

Recovery:

ssh -i ~/.ssh/contabo_vps root@38.242.240.89
cd /opt/incomex/docker
docker compose up -d <service-name>
docker ps --format "table {{.Names}}\t{{.Status}}"

Verification: Run /opt/incomex/scripts/test-mcp-connectivity.sh
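
Before restarting a crashed container it usually helps to capture why it stopped, since a restart can hide the cause. A minimal sketch, assuming only standard Docker CLI fields; the function name crash_context is hypothetical:

```shell
# Print the exit code, OOM-kill flag, and last 50 log lines of a container.
# $1: container name, e.g. incomex-nuxt
crash_context() {
  local name="$1"
  docker inspect --format 'exit={{.State.ExitCode}} oom={{.State.OOMKilled}}' "$name" || return 1
  docker logs --tail 50 "$name" 2>&1
}
```

An exit code of 137 together with oom=true points at memory pressure rather than an application bug, in which case a plain restart is likely to fail again.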

Scenario 2: Full Stack Down

Symptoms: All services unreachable, VPS accessible via SSH

Recovery:

ssh -i ~/.ssh/contabo_vps root@38.242.240.89
cd /opt/incomex/docker
docker compose down
docker compose up -d
sleep 90  # Wait for all health checks
docker ps --format "table {{.Names}}\t{{.Status}}"
/opt/incomex/scripts/test-mcp-connectivity.sh

Post-recovery checks:

  1. MCP connectivity: /opt/incomex/scripts/test-mcp-connectivity.sh
  2. Config integrity: /opt/incomex/scripts/check-config-integrity.sh
  3. Event system listeners >= 1 (reported by the Agent Data /info endpoint)
  4. Uptime Kuma dashboard: http://38.242.240.89:3001
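
The fixed sleep 90 in the recovery steps can end too early on a slow start and wastes time on a fast one. A polling alternative might look like the sketch below. It assumes the containers expose a Docker HEALTHCHECK and falls back to the run state when they do not; the function name and default values are illustrative.

```shell
# Wait until a container reports "healthy" (or "running" when it has no
# HEALTHCHECK). Gives up after a timeout.
# $1: container name  $2: timeout in seconds (default 90)
# $3: poll interval in seconds (default 5)
wait_for_container() {
  local name="$1" timeout="${2:-90}" interval="${3:-5}" waited=0 state
  while [ "$waited" -lt "$timeout" ]; do
    state=$(docker inspect --format \
      '{{if .State.Health}}{{.State.Health.Status}}{{else}}{{.State.Status}}{{end}}' \
      "$name" 2>/dev/null)
    case "$state" in
      healthy|running) return 0 ;;
    esac
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "timeout waiting for $name (last state: ${state:-unknown})" >&2
  return 1
}
```

For example, wait_for_container incomex-mysql 90 returns as soon as MySQL is healthy and fails loudly if it is not healthy after 90 seconds.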

Scenario 3: Agent Data Deployment

Standard deploy after PR merge:

ssh -i ~/.ssh/contabo_vps root@38.242.240.89
cd /opt/incomex/docker
docker compose pull agent-data
docker compose up -d agent-data
sleep 45  # Startup + warm-up
curl -sf https://vps.incomexsaigoncorp.vn/api/health

Rollback (if new image fails):

docker images --format "{{.Repository}}:{{.Tag}} {{.CreatedAt}}" | grep agent-data
# Pin the previous image tag in a docker-compose override file, then recreate the container:
docker compose up -d agent-data
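
The override step can be sketched as follows. This assumes the service is named agent-data in docker-compose.yml (as the pull command above suggests) and that no other docker-compose.override.yml is in use, because the sketch overwrites it. The function name is hypothetical, and the image reference must be copied from the docker images output, not guessed.

```shell
# Write a compose override that pins agent-data to a previous image.
# $1: full image reference, copied from the `docker images` listing
# $2: directory holding docker-compose.yml (default /opt/incomex/docker)
# NOTE: overwrites any existing docker-compose.override.yml in that directory.
pin_agent_data_image() {
  local image="$1" dir="${2:-/opt/incomex/docker}"
  [ -n "$image" ] || { echo "usage: pin_agent_data_image <image:tag> [dir]" >&2; return 2; }
  cat > "$dir/docker-compose.override.yml" <<EOF
services:
  agent-data:
    image: $image
EOF
}
```

Docker Compose reads docker-compose.override.yml automatically when it sits next to docker-compose.yml, so no extra -f flag is needed. Remove the override file once a fixed image has been deployed; otherwise later pulls keep using the pinned tag.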

Scenario 4: Database Recovery

MySQL (Directus data)

Backups at /opt/incomex/backups/mysql/ (daily 2AM, 7-day retention)

docker exec -i incomex-mysql mysql -u root -p<password> <db> < backup-file.sql
docker restart incomex-directus
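
Choosing which backup file to restore is easy to get wrong under pressure. A minimal sketch for picking the newest non-empty dump, assuming backups are stored as uncompressed .sql files (the restore command above suggests this, but the backup script is not shown here); the function name is illustrative:

```shell
# Print the newest non-empty .sql file in a backup directory.
# $1: backup directory (default /opt/incomex/backups/mysql)
latest_sql_backup() {
  local dir="${1:-/opt/incomex/backups/mysql}" f newest=""
  for f in "$dir"/*.sql; do
    [ -s "$f" ] || continue            # skip missing or zero-byte files
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
      newest="$f"
    fi
  done
  [ -n "$newest" ] || { echo "no usable .sql backup in $dir" >&2; return 1; }
  printf '%s\n' "$newest"
}
```

Skipping zero-byte files guards against restoring from a dump that failed partway through, for example when the disk was full at backup time.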

Qdrant (Vector DB)

Backups at /opt/incomex/backups/qdrant/ (daily 3AM, 7-day retention)

docker compose stop qdrant
cp -r /opt/incomex/backups/qdrant/latest/* /opt/incomex/docker/qdrant/data/
docker compose up -d qdrant
curl -X POST https://vps.incomexsaigoncorp.vn/api/kb/reindex -H "X-API-Key: $API_KEY"
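
The cp step overwrites the live Qdrant data, so a copy of the current state is worth keeping in case the backup turns out to be worse than what it replaces. A minimal sketch, meant to run after docker compose stop qdrant and before the cp; the function name and the archive location are assumptions:

```shell
# Archive a data directory before it is overwritten.
# $1: directory to archive (default: the Qdrant data directory)
# $2: directory to write the archive into
# Prints the archive path on success.
snapshot_dir() {
  local src="${1:-/opt/incomex/docker/qdrant/data}" dest="${2:-/opt/incomex/backups/qdrant}"
  local archive="$dest/pre-restore-$(date +%Y%m%d-%H%M%S).tar.gz"
  tar -czf "$archive" -C "$src" . || return 1
  printf '%s\n' "$archive"
}
```

Check free space with df -h first; the archive can be as large as the data directory itself.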

Scenario 5: SSL Certificate Issue

certbot certificates
certbot renew --force-renewal
docker restart incomex-nginx
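
A forced renewal is only needed when the certificate is actually close to expiry or broken. The sketch below checks the remaining validity first. It assumes OpenSSL is installed on the host; the certificate path should be taken from the certbot certificates output. The function name and the 14-day default are illustrative.

```shell
# Return 0 if the certificate stays valid for at least N more days.
# $1: path to a PEM certificate  $2: days (default 14)
cert_valid_for_days() {
  local cert="$1" days="${2:-14}"
  openssl x509 -checkend $((days * 86400)) -noout -in "$cert" >/dev/null
}
```

If the function returns non-zero for the path that certbot reports, continue with the renewal commands above.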

Scenario 6: Disk Full

df -h /
/opt/incomex/scripts/disk-monitor.sh  # Auto-prunes at 85%
docker system prune -f
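
To confirm whether the disk is really over the 85% level before pruning, the usage figure can be read directly from df. A minimal sketch with hypothetical function names; how disk-monitor.sh itself measures usage is not shown in this runbook:

```shell
# Print the usage percentage (integer, no % sign) of the filesystem holding $1.
disk_usage_pct() {
  df -P "${1:-/}" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Return 0 when usage is at or above the threshold.
# $1: path (default /)  $2: threshold percent (default 85)
disk_over_threshold() {
  local pct
  pct=$(disk_usage_pct "${1:-/}") || return 2
  [ "$pct" -ge "${2:-85}" ]
}
```

docker system df shows how much of the space is held by images, containers, volumes, and build cache. Note that docker system prune -f also removes stopped containers, so run it while the stack is up, or expect docker compose up -d to recreate them afterwards.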

Scenario 7: MCP Transport Failure

Diagnosis: /opt/incomex/scripts/test-mcp-connectivity.sh

Common fixes:

  1. Nginx config corrupt: docker restart incomex-nginx
  2. Agent Data unresponsive: docker restart incomex-agent-data (wait 45s)
  3. DNS issue: dig +short vps.incomexsaigoncorp.vn (confirm the record points at the VPS)
  4. API key mismatch: grep AGENT_DATA_API_KEY /opt/incomex/docker/.env
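
For the first fix, restarting Nginx while its configuration is invalid leaves the proxy down, so it is safer to validate the config first. A minimal sketch, assuming the nginx binary is on the PATH inside the incomex-nginx container and that the container is still running; the function name is hypothetical:

```shell
# Validate the Nginx configuration inside the container and restart only
# when it parses. $1: container name (default incomex-nginx)
safe_nginx_restart() {
  local name="${1:-incomex-nginx}"
  if docker exec "$name" nginx -t; then
    docker restart "$name"
  else
    echo "nginx -t failed in $name; fix the config before restarting" >&2
    return 1
  fi
}
```

If the container is already stopped, docker exec fails as well; in that case docker logs --tail 50 incomex-nginx usually shows the config error that prevented startup.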

Scenario 8: Directus Sync Not Working

Diagnosis: Check the event system listener count via the Agent Data /info endpoint

Fix (if listeners=0):

  • Verify DIRECTUS_ADMIN_TOKEN and DIRECTUS_URL in docker-compose.yml
  • Verify .env has DIRECTUS_ADMIN_TOKEN value
  • Recreate: docker compose up -d agent-data
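
The .env check can be done without printing the secret to the terminal or shell history. A minimal sketch, assuming the usual KEY=value format in .env; the function name is illustrative:

```shell
# Return 0 if KEY is set to a non-empty value in an env file.
# Does not print the value. Quotes around the value are ignored,
# so KEY="" counts as empty.
# $1: variable name  $2: env file (default /opt/incomex/docker/.env)
env_has_value() {
  local key="$1" file="${2:-/opt/incomex/docker/.env}" value
  value=$(grep -E "^${key}=" "$file" | tail -n 1 | cut -d= -f2-)
  value=$(printf '%s' "$value" | tr -d "\"'")
  [ -n "$value" ]
}
```

For example, env_has_value DIRECTUS_ADMIN_TOKEN || echo "token missing in .env" flags an empty or absent token before the container is recreated.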

Key Files

Path                                     Purpose
/opt/incomex/docker/docker-compose.yml   Stack definition
/opt/incomex/docker/.env                 Environment variables (secrets)
/opt/incomex/scripts/                    Monitoring and backup scripts
/opt/incomex/backups/                    MySQL and Qdrant backups
/opt/incomex/.checksums/                 Config integrity baselines
/opt/incomex/.uptime-kuma-admin-pass     Uptime Kuma admin password

Monitoring

  • Uptime Kuma: 4 monitors (MCP, Agent Data, Directus, OPS Proxy) at http://38.242.240.89:3001
  • Cron MCP test: Every 5 min -> /var/log/mcp-health.log
  • Cron config check: Hourly -> /var/log/config-integrity.log
  • Backups: MySQL daily 2AM, Qdrant daily 3AM, 7-day retention
  • Docker logs: max 50MB x 3 files per container