Docker Recovery

Recover the platform when containers fail to start, networks drift, or the local Docker environment becomes unstable.

Common symptoms

  • Containers exit immediately after startup
  • Prometheus cannot reach the application target
  • Grafana loads but shows no data
  • Docker Desktop is running but services are unreachable

Step 1 — Verify Docker health

docker info
docker version

Confirm Docker Desktop is running and the daemon is available before troubleshooting the stack itself.
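
If the check needs to run as part of a script, it can be wrapped in a small function that stops early when the daemon is down. This is a minimal sketch for a POSIX shell; the function name docker_ready is illustrative and not part of the platform. It relies on docker info exiting with a non-zero status when the daemon cannot be reached.

```shell
# docker_ready: succeed only if the Docker CLI can reach the daemon.
# Hypothetical helper name; it wraps the same `docker info` call used above.
docker_ready() {
    if docker info >/dev/null 2>&1; then
        echo "Docker daemon is reachable"
    else
        echo "Docker daemon is not reachable; start Docker Desktop first" >&2
        return 1
    fi
}
```

Calling docker_ready || exit 1 at the top of a recovery script avoids running the later steps against a stopped daemon.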

Step 2 — Inspect container state

docker ps -a

In the STATUS column, look for containers that are restarting repeatedly, have exited, or report as unhealthy.
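
On a busy host the full listing can be long. The sketch below filters it down to the problem containers only. It assumes the usual status strings printed by docker ps ("Exited (1) ...", "Restarting (1) ...", "Up ... (unhealthy)"); the function name flag_problem_containers is hypothetical.

```shell
# flag_problem_containers: read "name<TAB>status" lines on stdin and print
# only those whose status starts with Exited or Restarting, or that
# contain the "(unhealthy)" health marker.
flag_problem_containers() {
    awk -F '\t' '$2 ~ /^(Exited|Restarting)/ || $2 ~ /\(unhealthy\)/'
}

# Intended use (needs a running daemon):
#   docker ps -a --format '{{.Names}}\t{{.Status}}' | flag_problem_containers
```

An empty result suggests every container is either running without a health check or reporting healthy.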

Step 3 — Restart the compose stack

docker compose down
docker compose up -d
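
docker compose down removes the stack's containers and its default network, so the following up -d recreates both; named volumes are kept unless -v is passed. A possible wrapper for the restart is sketched below. The function name restart_stack is hypothetical, and the --remove-orphans flag is an optional addition that is not part of the commands above.

```shell
# restart_stack: recreate the stack, then show the status of each service.
# Run it from the directory that contains the compose file.
restart_stack() {
    # --remove-orphans also removes containers for services that are no
    # longer defined in the compose file.
    docker compose down --remove-orphans || return 1
    docker compose up -d || return 1
    docker compose ps
}
```

Stopping at the first failing command keeps the error from down or up visible instead of burying it under later output.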

Step 4 — Review logs

docker compose logs --tail=100
docker compose logs prometheus
docker compose logs grafana
docker compose logs auth-service
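
When the logs are noisy, a keyword filter can surface the likely failures first. The sketch below is one way to do that; the keyword list and the function name error_lines are assumptions, and should be adjusted to the messages the services actually emit.

```shell
# error_lines: keep only log lines that mention common failure keywords
# (case-insensitive). Always exits 0, even when nothing matches.
error_lines() {
    grep -iE 'error|fatal|panic|connection refused' || true
}

# Intended use (needs a running stack):
#   docker compose logs --no-color --tail=100 | error_lines
```

The --no-color flag strips ANSI colour codes from the service prefixes so that the filtered output stays readable when redirected to a file.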

Step 5 — Reset stale resources if needed

docker network prune
docker volume prune

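Prune with care. docker network prune deletes every network that no container is using, and docker volume prune deletes unused volumes together with the data in them, which cannot be recovered. Both commands ask for confirmation first; skip the volume prune if a volume holds data you still need.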
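
Before pruning, it may help to list what is at risk. The sketch below only reads state and deletes nothing; the function name preview_prune is hypothetical. It uses the dangling=true filter of docker volume ls, which matches volumes not referenced by any container.

```shell
# preview_prune: list candidates for pruning without removing anything.
preview_prune() {
    echo "Volumes not referenced by any container:"
    docker volume ls --filter dangling=true --format '{{.Name}}'
    echo "All networks (compare against the networks the stack expects):"
    docker network ls --format '{{.Name}}'
}
```

If a volume in the first list belongs to Prometheus or Grafana, back it up or leave it in place before running the prune commands.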

Step 6 — Validate recovery

  • All containers show a status of Up (and healthy, where a health check is defined)
  • The application's /metrics endpoint responds successfully
  • Prometheus target is UP
  • Grafana dashboards begin rendering data again
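
The Prometheus part of this checklist can be checked from the command line through the /api/v1/targets endpoint of the Prometheus HTTP API. The sketch below is a rough check that assumes Prometheus listens on its default port 9090 on localhost; the function name targets_all_up is hypothetical, and grep is used instead of a JSON parser to avoid an extra dependency.

```shell
# targets_all_up: read a Prometheus /api/v1/targets JSON response on stdin.
# Succeeds only if at least one target health value is present and every
# one of them is "up".
targets_all_up() {
    healths=$(grep -Eo '"health"[[:space:]]*:[[:space:]]*"[a-z]+"')
    if [ -z "$healths" ]; then
        echo "no targets found" >&2
        return 1
    fi
    if printf '%s\n' "$healths" | grep -qv '"up"'; then
        echo "at least one target is not up" >&2
        return 1
    fi
    echo "all targets up"
}

# Intended use (the host and port are assumptions):
#   curl -fsS http://localhost:9090/api/v1/targets | targets_all_up
```

The same curl -fsS pattern can be pointed at the application's /metrics URL to confirm that it returns a successful response.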

Related runbooks