Deployment & Maintenance

Using Claude Code? Ask it to “deploy Galaxy to Docker Desktop” or “deploy Galaxy to Lima” and it will run through the setup steps below automatically. The repo includes a CLAUDE.md with deployment instructions that Claude Code follows.

Environments

| Environment | Platform | Namespace | Overlay | Browser Ports | Images |
|---|---|---|---|---|---|
| Local Dev | Docker Desktop | galaxy-dev | k8s/overlays/local-dev | 30000 (web) | Local builds (galaxy-*) |
| Staging | Docker Desktop | galaxy-staging | k8s/overlays/staging | 31000 (web) | GHCR (ghcr.io/erikevenson/galaxy-*) |
| Lima | Lima VM + k3s | galaxy-lima | k8s/overlays/lima | 31000 (web) | GHCR (ghcr.io/erikevenson/galaxy-*) |

Local Dev and Staging can run simultaneously on Docker Desktop because they use different namespaces and NodePorts. Staging and Lima share host ports (31000–31002, plus 31090–31091 for monitoring), so only one of them can run at a time.

Local Dev (Docker Desktop)

Prerequisites

  • Docker Desktop with Kubernetes enabled
  • kubectl configured for your cluster
  • mkcert installed for TLS certificate generation

First-Time Deployment

  1. Generate TLS certificates:
    ./scripts/setup-tls.sh galaxy-dev
    
  2. Create secrets:
    ./scripts/create-secrets.sh galaxy-dev
    

    Save the output — it shows the generated admin password and JWT secret.

  3. Build all service images:
    ./scripts/build-images.sh
    
  4. Deploy to Kubernetes:
    ./scripts/deploy-k8s.sh galaxy-dev
    
  5. Run database migrations:
    kubectl apply -f k8s/base/migration-job.yaml -n galaxy-dev
    kubectl wait --for=condition=complete job/db-migration -n galaxy-dev --timeout=60s
    

Updating Services

After making code changes:

  1. Bump the version: scripts/bump-version.sh <version>
  2. Rebuild changed services: scripts/build-images.sh (or build individually)
  3. Re-deploy: scripts/deploy-k8s.sh galaxy-dev
  4. If the migration job image also changed, delete and re-run it:
    kubectl delete job db-migration -n galaxy-dev
    kubectl apply -f k8s/base/migration-job.yaml -n galaxy-dev
    kubectl wait --for=condition=complete job/db-migration -n galaxy-dev --timeout=60s
    

For stateful services (physics, tick-engine, api-gateway, players, galaxy), pause the game before redeploying to prevent position jumps. See Service Restarts below.

Service Endpoints

| Service | URL |
|---|---|
| Web Client | https://localhost:30000 |
| API Gateway (direct) | https://localhost:30002 |

The web client’s nginx reverse proxy forwards /api/ and /ws requests to the API gateway, so browsers only need port 30000. Port 30002 is available for direct API access (e.g., the galaxy-admin CLI). The admin view is built into the web client — access it from the View menu.
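The proxy rules might look roughly like the following nginx sketch. This is illustrative only: the upstream host name `api-gateway` and port 8000 are assumptions based on the in-cluster service name and the health-check table, and the real config may differ.

```nginx
# Hypothetical sketch of the web client's reverse-proxy rules.
location /api/ {
    proxy_pass https://api-gateway:8000;
    proxy_set_header Host $host;
}

location /ws {
    proxy_pass https://api-gateway:8000;
    # WebSocket connections need the HTTP/1.1 upgrade headers.
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
}
```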

Staging (Docker Desktop)

Staging runs alongside local dev on the same Docker Desktop cluster but pulls pre-built images from GHCR instead of using local builds. This validates that CI-built images work correctly before deploying to Lima or production.

How Images Are Delivered

Staging and Lima pull pre-built images from GitHub Container Registry (GHCR). The CI pipeline (.github/workflows/build-push.yml) automatically builds and pushes multi-arch images to ghcr.io/erikevenson/galaxy-* on every push to main.

If the GitHub repository is public, no authentication is needed. If packages are private, log in to GHCR first:

echo $GHCR_TOKEN | docker login ghcr.io -u USERNAME --password-stdin

Prerequisites

  • Docker Desktop with Kubernetes enabled (same as local dev)
  • CI build completed (images available on GHCR)

First-Time Deployment

  1. Generate TLS certificates:
    ./scripts/setup-tls.sh galaxy-staging
    
  2. Create secrets:
    ./scripts/create-secrets.sh galaxy-staging
    

    Save the output — it shows the generated admin password and JWT secret.

  3. Deploy to Kubernetes:
    ./scripts/deploy-k8s.sh galaxy-staging
    

    Kubernetes will automatically pull images from GHCR.

  4. Run database migrations:
    kubectl apply -f k8s/base/migration-job.yaml -n galaxy-staging
    kubectl wait --for=condition=complete job/db-migration -n galaxy-staging --timeout=60s
    

Updating Services

When new code is pushed to main, CI builds and pushes updated images to GHCR. To deploy the new version:

  1. Bump the version: scripts/bump-version.sh <version> (updates all overlays)
  2. Re-apply: scripts/deploy-k8s.sh galaxy-staging
  3. If the migration job image also changed, delete and re-run it:
    kubectl delete job db-migration -n galaxy-staging
    kubectl apply -f k8s/base/migration-job.yaml -n galaxy-staging
    kubectl wait --for=condition=complete job/db-migration -n galaxy-staging --timeout=60s
    

Service Endpoints

| Service | URL |
|---|---|
| Web Client | https://localhost:31000 |
| API Gateway (direct) | https://localhost:31002 |
| Prometheus | http://localhost:31090 |
| Grafana | http://localhost:31091 |

The web client’s nginx reverse proxy forwards /api/ and /ws requests to the API gateway, so browsers only need port 31000. Port 31002 is available for direct API access (e.g., the galaxy-admin CLI). The admin view is built into the web client — access it from the View menu.

Lima (k3s)

A local Lima VM running k3s validates the full cloud deployment workflow (GHCR image pulls, Kustomize overlays, non-Docker-Desktop storage) before moving to production infrastructure.

How Images Are Delivered

Same as Staging — pulls from GHCR.

Before deploying to Lima, ensure CI has completed successfully — otherwise k3s will fail to pull the images. Check the Actions tab for the latest build status.

If the GitHub repository is public, no authentication is needed for image pulls. If packages are private, configure GHCR authentication during VM provisioning by setting the GHCR_TOKEN environment variable (see specs/architecture/lima-staging.md for details).

Prerequisites

  • Lima installed (brew install lima)
  • kubectl installed
  • mkcert installed for TLS certificate generation
  • CI build completed (images available on GHCR)

VM Setup

  1. Start the Lima VM:
    limactl start lima/galaxy-staging.yaml
    
  2. Extract kubeconfig:
    scripts/lima-kubeconfig.sh
    
  3. Set KUBECONFIG (required for all subsequent kubectl and scripts/ commands):
    export KUBECONFIG=~/.kube/config-lima-galaxy
    

    Add this to your shell profile (.bashrc, .zshrc) to persist across sessions.

  4. Verify k3s is ready:
    kubectl get nodes        # Should show one Ready node
    kubectl get sc           # Should show local-path as default
    

First-Time Deployment

All commands below assume KUBECONFIG is set to ~/.kube/config-lima-galaxy.

  1. Create TLS secrets:
    scripts/setup-tls.sh galaxy-lima
    
  2. Create application secrets:
    scripts/create-secrets.sh galaxy-lima
    

    Save the output — it shows the generated admin password and JWT secret.

  3. Deploy all services (includes ConfigMaps, infrastructure, and application pods):
    scripts/deploy-k8s.sh galaxy-lima
    
  4. Run database migrations:
    kubectl apply -k k8s/overlays/lima/ -l app.kubernetes.io/name=db-migration
    kubectl wait --for=condition=complete job/db-migration -n galaxy-lima --timeout=120s
    
  5. Verify:
    kubectl get pods -n galaxy-lima          # All pods should be Running
    curl -k https://localhost:31000          # Web client
    curl -k https://localhost:31002/api/status  # API gateway status
    

Updating Services

When new code is pushed to main, CI builds and pushes updated images to GHCR. To deploy the new version:

  1. Bump the version: scripts/bump-version.sh <version> (updates all overlays including Lima)
  2. Re-apply: scripts/deploy-k8s.sh galaxy-lima
  3. If the migration job image also changed, delete and re-run it:
    kubectl delete job db-migration -n galaxy-lima
    kubectl apply -k k8s/overlays/lima/ -l app.kubernetes.io/name=db-migration
    kubectl wait --for=condition=complete job/db-migration -n galaxy-lima --timeout=120s
    

Service Endpoints

| Service | URL |
|---|---|
| Web Client | https://localhost:31000 |
| API Gateway (direct) | https://localhost:31002 |
| Prometheus | http://localhost:31090 |
| Grafana | http://localhost:31091 |

The web client’s nginx reverse proxy forwards /api/ and /ws requests to the API gateway, so browsers only need port 31000. Port 31002 is available for direct API access (e.g., the galaxy-admin CLI). The admin view is built into the web client — access it from the View menu.

Port Forwarding

Lima forwards host ports to the VM’s k3s NodePorts:

| Host Port | Guest Port | Service |
|---|---|---|
| 16443 | 6443 | k3s API server |
| 31000 | 31000 | Web client |
| 31002 | 31002 | API gateway |
| 31090 | 31090 | Prometheus |
| 31091 | 31091 | Grafana |

The k3s API uses host port 16443 (not 6443) to avoid conflict with Docker Desktop Kubernetes.
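In a Lima template, these mappings are expressed with `portForwards` entries along these lines (a sketch; the actual contents of lima/galaxy-staging.yaml may differ):

```yaml
# Sketch of the port-forward section of a Lima VM template.
portForwards:
  - guestPort: 6443
    hostPort: 16443   # k3s API server (16443 avoids Docker Desktop's 6443)
  - guestPort: 31000
    hostPort: 31000   # web client
  - guestPort: 31002
    hostPort: 31002   # API gateway
  - guestPort: 31090
    hostPort: 31090   # Prometheus
  - guestPort: 31091
    hostPort: 31091   # Grafana
```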

Teardown

limactl stop galaxy-staging
limactl delete galaxy-staging   # Removes VM and disk
rm ~/.kube/config-lima-galaxy

Database Migrations

Migrations are managed with Alembic and run as a Kubernetes Job.

Running Migrations

# Apply the migration job
kubectl apply -f k8s/base/migration-job.yaml -n galaxy-dev

# Wait for completion
kubectl wait --for=condition=complete job/db-migration -n galaxy-dev --timeout=60s

# Check logs
kubectl logs job/db-migration -n galaxy-dev

Re-running After New Migrations

If you’ve added new migration files, delete the old job first:

kubectl delete job db-migration -n galaxy-dev
kubectl apply -f k8s/base/migration-job.yaml -n galaxy-dev

Service Restarts

Stateless Services (No Pause Required)

web-client serves static files via nginx. It can be restarted at any time without affecting game state.

# Build new image
docker build -t galaxy-web-client:<version> --no-cache -f Dockerfile .

# Deploy
kubectl set image deployment/web-client web-client=galaxy-web-client:<version> -n galaxy-dev
kubectl rollout status deployment/web-client -n galaxy-dev

Stateful Services (Pause First)

For physics, tick-engine, api-gateway, players, and galaxy services:

  1. Pause the game via the admin dashboard or CLI
  2. Build and deploy the updated service
  3. If physics was restarted, also restart tick-engine (it needs to reinitialize physics state)
  4. Resume the game

Failure to pause before restarting stateful services can cause position jumps or state inconsistencies.

Version Management

Always bump the version before building images:

./scripts/bump-version.sh 1.15.0

This updates version strings across all services and Kubernetes manifests. The version is baked into images at build time, so building before bumping results in stale version numbers.

The script verifies each sed replacement succeeded — if a file format changes and the pattern no longer matches, the script exits with a non-zero status and names the file that failed.
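The verified-sed pattern can be sketched as a small shell function. This is illustrative only: the real bump-version.sh may be structured quite differently, and `bump_in_file` is a hypothetical helper name.

```shell
# Sketch of "sed with verification": replace a version string in a file,
# failing loudly (and naming the file) if the expected pattern is absent.
bump_in_file() {
  file=$1; old=$2; new=$3
  if ! grep -q "$old" "$file"; then
    echo "version pattern '$old' not found in $file" >&2
    return 1
  fi
  # Portable in-place edit: write to a temp file, then replace the original.
  sed "s/$old/$new/g" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

Because the function returns non-zero when the pattern is missing, a caller looping over manifests can abort the bump as soon as one file's format has drifted.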

Backup and Recovery

Snapshots

Game state snapshots (via the admin dashboard or CLI) capture the complete in-game state: all ship positions, velocities, fuel levels, and game time. Use these for quick state recovery.

PostgreSQL Backups

A CronJob runs daily at 2:00 AM UTC, creating SQL dumps:

# Check backup status
kubectl get cronjob postgres-backup -n galaxy-dev

# Manual backup
kubectl create job --from=cronjob/postgres-backup manual-backup -n galaxy-dev

# View available backups
kubectl exec -n galaxy-dev postgres-0 -- ls /backup/

# Restore from a backup (copy the dump out of the pod first)
kubectl cp galaxy-dev/postgres-0:/backup/backup-file.sql ./backup-file.sql
kubectl exec -i -n galaxy-dev postgres-0 -- psql -U galaxy -d galaxy < backup-file.sql

Backups are retained for 7 days on a 2Gi persistent volume.
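A CronJob of this shape would produce the behavior described above. This is a sketch only: the image, volume claim name, and dump path are assumptions, not the project's actual manifest.

```yaml
# Hypothetical sketch of a daily pg_dump CronJob.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgres-backup
spec:
  schedule: "0 2 * * *"          # daily at 02:00 UTC
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: backup
              image: postgres:16
              command: ["/bin/sh", "-c"]
              args:
                - pg_dump -h postgres -U galaxy galaxy > /backup/galaxy-$(date +%F).sql
              volumeMounts:
                - name: backup
                  mountPath: /backup
          volumes:
            - name: backup
              persistentVolumeClaim:
                claimName: postgres-backup-pvc
```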

Redis Persistence

Redis uses Append-Only File (AOF) persistence with everysec fsync. Data survives pod restarts. Redis stores the live game state (positions, velocities, tick counter), with a 150MB memory limit and noeviction policy.
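In redis.conf terms, that combination corresponds to directives like these (a sketch of the relevant settings only, not the project's full Redis configuration):

```conf
appendonly yes               # enable AOF persistence
appendfsync everysec         # fsync the AOF once per second
maxmemory 150mb              # cap live game state at 150MB
maxmemory-policy noeviction  # reject writes rather than silently evict keys
```

`noeviction` is the safe choice here: evicting a ship's position or the tick counter would corrupt game state, so hitting the memory ceiling should surface as an error instead.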

Configuration Reference

Game Configuration (ConfigMap: galaxy-config)

| Key | Default | Description |
|---|---|---|
| SERVER_NAME | galaxy-dev | Server instance name displayed in the status bar |
| TICK_RATE | 1.0 | Ticks per second (0.1–100 Hz) |
| SNAPSHOT_INTERVAL | 60 | Auto-snapshot interval in seconds |
| LOG_LEVEL | INFO | Logging verbosity |

Service Endpoints (ConfigMap: galaxy-config)

| Key | Default | Description |
|---|---|---|
| TICK_ENGINE_GRPC_HOST | tick-engine:50051 | Tick engine gRPC address |
| PHYSICS_GRPC_HOST | physics:50051 | Physics service gRPC address |
| PLAYERS_GRPC_HOST | players:50051 | Players service gRPC address |
| GALAXY_GRPC_HOST | galaxy:50051 | Galaxy service gRPC address |

Secrets (Secret: galaxy-secrets)

| Key | Description |
|---|---|
| JWT_SECRET_KEY | JWT signing key (min 256 bits) |
| POSTGRES_PASSWORD | Database password |
| ADMIN_USERNAME | Bootstrap admin username |
| ADMIN_PASSWORD | Bootstrap admin password |

Resource Limits

| Service | Memory | CPU |
|---|---|---|
| physics | 256Mi–512Mi | 200m–1000m |
| api-gateway | 128Mi–256Mi | 100m–500m |
| tick-engine | 128Mi–256Mi | 100m–500m |
| players | 128Mi–256Mi | 100m–500m |
| galaxy | 128Mi–256Mi | 100m–500m |
| web-client | 32Mi–64Mi | 10m–100m |

Service Startup Order

Services use init containers to wait for their dependencies:

  1. postgres, redis — Infrastructure (no dependencies)
  2. galaxy, players — Depend on postgres
  3. physics — Depends on redis, galaxy
  4. tick-engine — Depends on redis, postgres, physics, galaxy
  5. api-gateway — Depends on postgres, redis, tick-engine, players, physics
  6. web-client — No dependencies (stateless)
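An init container implementing such a wait typically looks like the fragment below. This is a generic sketch, not the project's actual manifest: the image and the probe target are assumptions.

```yaml
# Sketch: block pod startup until postgres answers on its service port.
initContainers:
  - name: wait-for-postgres
    image: busybox:1.36
    command:
      - sh
      - -c
      - until nc -z postgres 5432; do echo waiting for postgres; sleep 2; done
```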

Health Checks

All services expose health endpoints:

| Service | Liveness | Readiness | Port |
|---|---|---|---|
| api-gateway | /health/live | /health/ready | 8000 (HTTPS) |
| tick-engine | /health/live | /health/ready | 8001 |
| physics | /health/live | /health/ready | 8002 |
| players | /health/live | /health/ready | 8003 |
| galaxy | /health/live | /health/ready | 8004 |
| web-client | /health | /health | 8443 (HTTPS) |
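Wired into a Deployment, the probes for a service such as tick-engine would look roughly like this (the paths and port come from the table above; the delay and period values are illustrative assumptions):

```yaml
livenessProbe:
  httpGet:
    path: /health/live
    port: 8001
  initialDelaySeconds: 5
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /health/ready
    port: 8001
  periodSeconds: 5
```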


Galaxy — Kubernetes-based multiplayer space game
