Deployment Scripts¶
Sapari's deployment is driven by bash scripts in scripts/deployment/. The same scripts are used by GitHub Actions CD and by operators SSH'd into a server. No deploy logic lives in YAML -- everything lives in the scripts.
This page explains what each script does, when to use it, and what to expect. For a one-page quick reference of commands, see scripts/README.md.
Conventions (all scripts)¶
- Run as the `deploy` user (except `setup-server.sh`, which is root).
- Auto-detect environment from the hostname pattern (`*prod*` / `*staging*`) or the `SAPARI_ENV` override.
- Read `/home/deploy/sapari/.env` (one file per server, not per env name).
- All have `--help` -- the definitive per-script reference.
- All use `set -euo pipefail` -- fail fast on unset vars or pipeline errors.
- All output is color-coded when stdout is a TTY (plain text when piped).
- All log deploy/rollback events to `/home/deploy/sapari/deploys.log` for history.
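The environment auto-detection above can be sketched as follows (a sketch only -- the function name and fallback behavior are assumptions; see `_lib.sh` for the real logic):

```shell
# SAPARI_ENV wins if set; otherwise match the hostname pattern.
detect_env() {
  if [ -n "${SAPARI_ENV:-}" ]; then
    echo "$SAPARI_ENV"
    return 0
  fi
  case "$(hostname)" in
    *prod*)    echo "production" ;;
    *staging*) echo "staging" ;;
    *)         echo "cannot detect environment from hostname" >&2; return 1 ;;
  esac
}
```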
setup-server.sh (root, one-time)¶
Hardens a fresh Hetzner server. Run once immediately after provisioning.
What it does:
1. Sets hostname to sapari-prod or sapari-staging (prompts if not already set)
2. Creates deploy user with sudo + docker group membership (passwordless sudo for CD)
3. Copies root's SSH keys to deploy user; disables root SSH + password auth
4. Configures UFW: deny incoming except 443 (public) and 22 (from operator IP)
5. Installs unattended-upgrades and fail2ban (defense in depth)
6. Installs Docker via the official script
7. Generates an SSH deploy key for GitHub at /home/deploy/.ssh/github_deploy_key (operator adds the public key to the repo's Deploy Keys)
8. Configures ~/.ssh/config to use that key for github.com
Port 80 is intentionally NOT opened -- Caddy uses Cloudflare DNS-01 ACME challenge, not HTTP-01. The firewall stays minimal.
Idempotent: yes. Re-running skips already-complete steps.
Required argument: --my-ip <YOUR_IP> (restricts SSH to that IP only).
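Step 4 boils down to a minimal ruleset along these lines (a sketch, not the script's exact commands -- the real script also skips rules that already exist when re-run):

```shell
# Deny everything inbound, then open only 443 (public) and 22 (operator IP).
ufw default deny incoming
ufw default allow outgoing
ufw allow 443/tcp
ufw allow from "$MY_IP" to any port 22 proto tcp   # the --my-ip value
ufw --force enable
# Note: no 80/tcp -- DNS-01 ACME needs no inbound HTTP.
```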
first-deploy.sh (deploy user, one-time)¶
Bootstraps the application on a fresh server after setup-server.sh. Assumes the operator has:
1. Cloned the repo to /home/deploy/sapari
2. Created /home/deploy/sapari/.env with production values (copy from backend/.env.production.example)
3. Set permissions: chmod 600 .env
What it does:
1. Validates preconditions (deploy user, repo clone, env file with secure permissions, Docker installed)
2. Pulls all images from GHCR (backend, Caddy build context, Redis, RabbitMQ)
3. Runs alembic upgrade head via the migrate service
4. Runs seed_all.py (creates tiers, admin user, Stripe products -- idempotent)
5. Starts all services with docker compose up -d
6. Waits up to 60s for all healthchecks to pass
7. Runs health.sh to verify endpoints respond
Exit codes: 0=success; 1=precondition failed; 2=image pull; 3=migration; 4=seed; 5=startup; 6=health check.
When it's done, the operator still has to: configure DNS (point backend-<env>.sapari.io to the server's IP with a gray cloud), wait for Caddy to obtain a Let's Encrypt cert, and set up the Cloudflare Worker + Pages binding + Access policy.
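The precondition pass (step 1) is a series of guard checks; a minimal sketch of the env-file check, assuming the paths above (helper name is hypothetical):

```shell
# Guard: the env file must exist and be readable only by its owner (chmod 600).
check_env_file() {
  local f=$1
  [ -f "$f" ] || { echo "missing $f" >&2; return 1; }
  local mode
  mode=$(stat -c '%a' "$f")
  [ "$mode" = "600" ] || { echo "$f must be chmod 600 (got $mode)" >&2; return 1; }
}
```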
deploy.sh (deploy user, on every deploy)¶
Standard deploy. CD calls this via SSH; operators call it for hotfixes.
Flow (critical ordering):
1. `git fetch && git reset --hard origin/<branch>` -- the server's repo is a deploy artifact; any local edits are blown away
2. Capture the currently-running image SHA (for the deploy log + rollback hint)
3. docker compose pull -- pull the new image (via the floating staging or production tag)
4. docker compose run --rm migrate -- run migrations BEFORE starting new containers
5. docker compose up -d -- recreate containers using the new image
6. Wait up to 30s for healthchecks
7. health.sh --quiet to verify end-to-end
8. Log the deploy event
Critical safety property: If the migration fails, the old code keeps running on the old schema. New code never starts. This is the point of ordering migration before container restart -- the old code on old schema is always a valid state; new code on old schema is not.
If step 7 (health) fails, the script prints the previous SHA and recommends rollback.sh <previous-sha>. The migration is NOT auto-rolled back -- operator decides.
Exit codes: 0=healthy; 1=precondition; 2=image pull; 3=migration (services NOT restarted); 4=startup; 5=health check.
Optional argument: --tag <sha> to pin a specific image instead of the floating tag.
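The migrate-before-restart guarantee can be modeled in a few lines (stand-in commands, not the real compose calls; in the actual script, `set -euo pipefail` provides the same short-circuit):

```shell
# If the migrate step (first arg) fails, we stop with exit code 3
# (matching deploy.sh's code for migration failure) before the
# container-recreation step ever runs -- old containers keep serving.
simulate_deploy() {
  "$1" || return 3     # stand-in for: docker compose run --rm migrate
  echo "recreated"     # stand-in for: docker compose up -d
}
```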
rollback.sh (deploy user, emergency)¶
Rolls back to a specific image SHA when deploy.sh broke something.
Migration safety check:
Each backend image carries a LABEL sapari.alembic_head="<revision>" baked in at build time. When rolling back, the script compares this label to the live DB's current alembic version:
- Same head: safe rollback (no migration was applied in the failed deploy). Proceeds normally.
- Different head: script aborts with a warning. Options:
  - Manually run `alembic downgrade <old-head>` first, then retry
  - Restore the DB via Neon time travel (6h restore window on free tier)
  - Pass `--ignore-migration-warning` if you're confident the migration was backwards-compatible
Required argument: the SHA or tag to roll back to. Use health.sh --history to find recent SHAs.
Warning after success: CD will overwrite the rollback on the next push to staging / main. To make the rollback stick, revert the offending commit on the branch.
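The decision above reduces to a small comparison; a sketch (the helper name and argument shape are assumptions, not the script's actual interface):

```shell
# Compare the image's baked-in alembic head against the live DB's head.
check_rollback_safety() {
  local image_head=$1 db_head=$2 flag=${3:-}
  if [ "$image_head" = "$db_head" ]; then
    echo "safe"        # no migration was applied in the failed deploy
  elif [ "$flag" = "--ignore-migration-warning" ]; then
    echo "forced"      # operator asserts the migration was backwards-compatible
  else
    echo "image head $image_head != DB head $db_head; aborting" >&2
    return 1
  fi
}
```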
restart.sh (deploy user, ops)¶
Restarts a service (or all) without changing the image. Use after editing .env or for a stuck worker.
Always uses docker compose up -d, not docker compose restart. Reason: restart does not re-read env_file, so env changes are silently dropped. up -d computes a config hash and recreates the container only if something actually changed.
- Default: `./scripts/deployment/restart.sh` restarts all services (brief per-service downtime during recreation)
- Granular: `./scripts/deployment/restart.sh <service>` restarts one service
- Force: `./scripts/deployment/restart.sh --force <service>` recreates even if the config hash is unchanged (for stuck containers)
- List: `./scripts/deployment/restart.sh --list` prints available service names
Dependencies: docker compose up -d <service> starts the service's depends_on deps if not running, but does not touch dependents (services that depend ON it). Restarting redis does not auto-restart web, even though web depends on redis.
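So after an `.env` change that affects several services, the operator bounces each one explicitly (service names here are illustrative; use `--list` for the real ones):

```shell
# Dependents are not auto-recreated, so restart the dependency
# and its consumers one by one after editing .env:
./scripts/deployment/restart.sh redis
./scripts/deployment/restart.sh web
```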
health.sh (any user, diagnostic)¶
Read-only health check used internally by deploy.sh, rollback.sh, and first-deploy.sh, and run manually by operators.
What it checks:
1. Containers -- all expected services are running (compared against the canonical list in _lib.sh)
2. API liveness -- GET /health via docker compose exec web (the expose: port isn't host-bound)
3. API readiness -- GET /health/ready returns {"status": "healthy" | "unhealthy"}. For per-dependency breakdown, use the admin panel's System Health page (requires login).
4. Queue depths -- RabbitMQ Management API on port 15672 with creds from env file. Warns if any queue > 50 messages.
Modes:
- health.sh -- human-readable output (default)
- health.sh --json -- machine-readable JSON (for external monitors)
- health.sh --quiet -- exit code only (CD uses this)
- health.sh --history -- show the last 10 deploy log entries
Exit codes: 0=all green; 1=can't reach backend at all; 2=containers missing; 3=readiness failed; 4=queue depth over threshold.
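An external monitor consuming `--quiet` might map those exit codes like this (a sketch; the mapping simply mirrors the table above):

```shell
# Translate health.sh --quiet exit codes into human-readable alerts.
describe_health() {
  case "$1" in
    0) echo "all green" ;;
    1) echo "backend unreachable" ;;
    2) echo "containers missing" ;;
    3) echo "readiness failed" ;;
    4) echo "queue depth over threshold" ;;
    *) echo "unknown exit code $1" >&2; return 1 ;;
  esac
}
```

Usage: `./scripts/deployment/health.sh --quiet; describe_health $?`.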
run-task.sh (deploy user, ad-hoc)¶
Invokes a Python task in a one-shot container (using the taskiq-scheduler service -- it has the full backend image and matching env).
Modes:
- run-task.sh <script> -- runs backend/scripts/<script>.py (e.g., seed_all, seed_stripe_products, seed_trial_credits)
- run-task.sh --module <path> -- runs python -m <path> for ad-hoc modules
- run-task.sh --list -- lists available scripts
Use cases: backfills, manual re-seeding, testing a worker pipeline step in isolation.
Shared Library (_lib.sh)¶
Every script sources this. Provides:
- Color-coded logging (`log_info`, `log_ok`, `log_warn`, `log_error`)
- Environment detection from hostname
- `dc()` wrapper around `docker compose -f docker-compose.prod.yml --env-file .env`
- `wait_for_healthy <seconds>` -- polls `docker compose ps` until all services report healthy
- `ALL_SERVICES` array (canonical service names)
- `log_deploy` / `read_deploy_history` -- append-only deploy log
- `require_user` / `require_root` -- user assertions
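`wait_for_healthy` is a bounded polling loop; here is a generic sketch with the check command injected (the real version polls `docker compose ps` until all services report healthy):

```shell
# Poll a check command once per second until it succeeds or the deadline passes.
wait_until() {
  local timeout=$1; shift
  local deadline=$((SECONDS + timeout))   # SECONDS is bash's built-in timer
  until "$@"; do
    (( SECONDS < deadline )) || return 1  # timed out
    sleep 1
  done
}
```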
Deploy Log¶
Every deploy and rollback appends a line to /home/deploy/sapari/deploys.log:
```
2026-04-13T22:15:33Z deploy production sha=3ee133e exit=0 duration=24s previous=0f72072
2026-04-13T22:18:11Z rollback production sha=0f72072 exit=0 duration=15s previous=3ee133e
```
View with health.sh --history. Useful for answering "what SHA was running 30 minutes ago when things broke?"
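Given the fixed line format, that question is answerable with a one-liner; a sketch (the helper name is hypothetical; ISO-8601 timestamps compare correctly as plain strings):

```shell
# Print the SHA that was live at time t, per the append-only log:
# the last deploy/rollback event at or before t wins.
sha_at() {
  awk -v t="$1" '$1 <= t { sub(/^sha=/, "", $4); sha = $4 } END { print sha }' "$2"
}
```

Example: `sha_at "$(date -u -d '30 minutes ago' +%FT%TZ)" deploys.log` (GNU date shown).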
Integration with GitHub Actions¶
The CD workflow joins the tailnet before SSHing, because the server's SSH port is firewalled to tailnet IPs only (100.64.0.0/10):
```yaml
- uses: tailscale/github-action@v3
  with:
    oauth-client-id: ${{ secrets.TS_OAUTH_CLIENT_ID }}
    oauth-secret: ${{ secrets.TS_OAUTH_SECRET }}
    tags: tag:ci
    version: latest
- uses: appleboy/ssh-action@v1
  with:
    host: ${{ secrets.SSH_HOST }} # tailnet IP
    username: deploy
    key: ${{ secrets.SSH_KEY }}
    script: |
      cd /home/deploy/sapari
      ./scripts/deployment/deploy.sh
```
All complexity is in the script, version-controlled with the rest of the code.
Two deploy workflows: deploy.yml auto-triggers on workflow_run after a successful Build on the staging branch. deploy-production.yml is workflow_dispatch only -- requires typed YES confirm input and optionally pins a specific image SHA. (Private repos without GitHub Team can't use required-reviewer environment gates; the typed-confirm is the deliberate-action substitute.)
Rollback works the same way -- rollback.yml is workflow_dispatch, takes env + SHA, validates image in GHCR, then runs rollback.sh <sha>.
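The typed-confirm gate in deploy-production.yml is just a strict string check early in the workflow; a sketch (the input name and exact message are assumptions):

```shell
# Fail fast unless the operator typed exactly YES (case-sensitive).
confirm_gate() {
  [ "$1" = "YES" ] || { echo "refusing: confirmation input must be exactly YES" >&2; return 1; }
}
```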
Key Files¶
| File | Purpose |
|---|---|
| `scripts/deployment/_lib.sh` | Shared helpers, sourced by all scripts |
| `scripts/deployment/setup-server.sh` | One-time server hardening |
| `scripts/deployment/first-deploy.sh` | One-time app bootstrap |
| `scripts/deployment/deploy.sh` | Standard deploy |
| `scripts/deployment/rollback.sh` | Roll back to a specific image SHA |
| `scripts/deployment/restart.sh` | Restart services without changing the image |
| `scripts/deployment/health.sh` | Diagnostic health check |
| `scripts/deployment/run-task.sh` | Manual one-off Python task |
| `scripts/README.md` | One-page operator cheatsheet |