
Sapari Infrastructure Architecture

Context

Sapari is an AI-powered video editing platform that automates post-production for content creators: silence removal, false start detection, profanity filtering, asset compositing, and caption generation. Users interact with a browser-based editor (React 19) backed by a FastAPI API, with heavy processing handled by background workers consuming from RabbitMQ priority queues via TaskIQ.

The infrastructure progresses through versions as load grows. v1.0 is the launch architecture optimized for cost. v2.0 is the endgame with GPU acceleration and Kubernetes autoscaling. Intermediate versions split the workload incrementally.

Current version: v1.0 (launch)


Design Principles

  1. Scale vertically as long as possible. A bigger box is operationally cheaper than a cluster.
  2. No hard lock-in. Every external service has a migration path. Workers are Docker containers that only need queue access.
  3. Colocate compute and I/O. Workers, Redis, and temporary video storage live on the same Hetzner machine (or same datacenter at v1.2+) for low latency.
  4. Minimize ops surface. Plain Docker Compose at v1.x. Move to k3s only when horizontal scaling forces it.
  5. Defer expensive infrastructure. GPU server (€184/mo) is justified only when transcription costs on the OpenAI API exceed the GPU's amortized cost.

Version Progression

| Version | Topology | Cost/mo | Trigger to upgrade |
| --- | --- | --- | --- |
| v1.0 | Single CCX23 (everything) + CX33 staging + Cloudflare edge | ~€39 | Launch state |
| v1.1 | Vertical scale -- CCX33 or CCX43 | ~€70-130 | RAM consistently >75% during peak |
| v1.2 | Horizontal split -- API on one box, workers on another, WireGuard between | ~€80-140 | Concurrent renders regularly queueing |
| v1.3 | Managed data plane -- Upstash Redis + CloudAMQP RabbitMQ | ~€100-160 | Self-hosted Redis/RabbitMQ becomes ops burden |
| v2.0 | k3s cluster, GEX44 GPU, KEDA autoscaling, NVENC, local Whisper | ~€225 + variable | Worker box capacity exceeded OR transcription cost > $200/mo |

Each version is a fully working state. You can stay at any version indefinitely.


v1.0: Launch Architecture (current)

Overview

```mermaid
graph TB
    subgraph Cloudflare
        CF_PAGES["Cloudflare Pages<br/><i>Frontend (React 19) + Landing (Astro)</i>"]
        CF_R2["R2 Storage<br/><i>3 buckets: raw, exports, assets</i>"]
        CF_DNS["DNS + CDN"]
        CF_WORKER["Workers<br/><i>/api/* proxy to backend</i>"]
        CF_ACCESS["Access<br/><i>Staging gate (GitHub OAuth)</i>"]
    end

    subgraph Neon
        POSTGRES_PROD[("PostgreSQL<br/><i>sapari-production project</i>")]
        POSTGRES_STG[("PostgreSQL<br/><i>sapari-staging project</i>")]
    end

    subgraph "Production Server (Hetzner CCX23, HIL (Hillsboro, US-West) or ASH (Ashburn, US-East))"
        CADDY_P["Caddy 2 (TLS via DNS-01)"]
        API_P["FastAPI Backend"]
        WORKERS_P["6 Workers + Scheduler<br/><i>analysis, render, download, proxy, asset-edit, email</i>"]
        REDIS_P[("Redis 7<br/><i>cache + sessions + pub/sub</i>")]
        RMQ_P[("RabbitMQ 3<br/><i>task broker + management UI</i>")]
    end

    subgraph "Staging Server (Hetzner CX33, HIL (Hillsboro, US-West) or ASH (Ashburn, US-East))"
        CADDY_S["Caddy 2"]
        API_S["FastAPI Backend"]
        WORKERS_S["6 Workers + Scheduler"]
        REDIS_S[("Redis 7")]
        RMQ_S[("RabbitMQ 3")]
    end

    %% User flows -- production
    USER["User Browser"] -- "HTTPS" --> CF_DNS
    CF_DNS --> CF_PAGES
    CF_DNS --> CF_WORKER
    CF_WORKER -- "/api/*" --> CADDY_P
    CF_WORKER -- "static" --> CF_PAGES
    CADDY_P --> API_P
    API_P --> POSTGRES_PROD
    API_P -- "presigned" --> CF_R2
    API_P --> REDIS_P
    API_P --> RMQ_P
    RMQ_P --> WORKERS_P
    WORKERS_P --> POSTGRES_PROD
    WORKERS_P --> CF_R2
    WORKERS_P --> REDIS_P

    %% Staging gated by Access
    DEV["Developer"] -- "GitHub login" --> CF_ACCESS
    CF_ACCESS --> CADDY_S
    CADDY_S --> API_S
```

Physical Layout

| Machine | Spec | Cost | Role |
| --- | --- | --- | --- |
| Hetzner CCX23 (production) | 4 vCPU dedicated, 16 GB RAM, 160 GB SSD, HIL or ASH | €31.99/mo + 20% backups (€6.40) | Backend API + 6 workers + scheduler + Redis + RabbitMQ + Caddy |
| Hetzner CX33 (staging) | 4 vCPU shared, 8 GB RAM, 80 GB SSD, HIL or ASH | €6.99/mo + 20% backups (€1.40) | Same stack as production |

Managed Services

| Service | Provider | Plan | Purpose |
| --- | --- | --- | --- |
| Postgres (production) | Neon | Free tier (separate project) | 100 CU-hrs/mo, 0.5 GB storage, 6h restore |
| Postgres (staging) | Neon | Free tier (separate project) | Same as above, isolated |
| Object storage | Cloudflare R2 | Free tier (10 GB) | 3 buckets per env, free egress |
| DNS + CDN + Pages + Workers + Access | Cloudflare | Free tier | Frontend hosting, API proxy, staging gate |
| Payments | Stripe | Pay-as-you-go | Subscriptions, credit billing |
| Email | Postmark | Pay-as-you-go | Transactional email |
| Observability | Logfire | Free tier | Structured logs, OpenTelemetry traces |

Networking

```mermaid
graph LR
    USER["User Browser"]

    subgraph "Cloudflare Edge"
        WORKER["Worker proxy"]
        PAGES["Pages (static assets)"]
    end

    subgraph "Hetzner Server (Production)"
        CADDY["Caddy<br/>api.sapari.io<br/>(TLS via DNS-01)"]
        BACKEND["Backend container"]
    end

    USER -- "https://app.sapari.io" --> WORKER
    WORKER -- "/api/* (HTTPS)" --> CADDY
    WORKER -- "static" --> PAGES
    CADDY --> BACKEND
```

Why Cloudflare Worker proxy: Frontend hardcodes relative /api/v1 path (shared/api/client.ts:7) and cookies use SameSite=Strict in production (auth/session/manager.py:18,523). Cross-origin would require code changes + weakening CSRF posture. The Worker makes frontend and backend appear same-origin to the browser. Backend cookies are host-only (no domain= set, verified manager.py:526-544), so they bind naturally to the Worker's domain.

Why DNS-only (gray cloud) on api.sapari.io: Browser never sees this domain. Worker fetches it directly. Adding Cloudflare proxy would double-proxy (CF -> CF -> Caddy -> Backend), adding latency.

TLS strategy: Caddy uses Cloudflare DNS-01 ACME challenge (no port 80 exposed). UFW only allows 443 publicly + 22 from operator IP.
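A minimal Caddyfile sketch of this setup. It assumes a Caddy build that includes the caddy-dns/cloudflare plugin and a `CLOUDFLARE_API_TOKEN` with DNS-edit rights; the backend address is illustrative:

```caddyfile
api.sapari.io {
    # ACME DNS-01 through Cloudflare -- no port 80 / HTTP-01 listener needed.
    tls {
        dns cloudflare {env.CLOUDFLARE_API_TOKEN}
    }
    # Forward everything to the backend container on the Compose network.
    reverse_proxy backend:8000
}
```

Because the challenge is solved via DNS records, UFW can keep 80 closed entirely.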

Stack on Each Server

Single Docker Compose file (docker-compose.prod.yml), shared between staging and production with different env files:

| Container | Image | Purpose | Memory limit |
| --- | --- | --- | --- |
| caddy | caddy:2-alpine | TLS + reverse proxy | 128m |
| backend | ghcr.io/sapari-backend (target: prod) | FastAPI single worker | 512m |
| analysis-worker | same image | Whisper API + LLM | 1g |
| render-worker | same image | FFmpeg libx264 | 3g |
| download-worker | same image | yt-dlp + audio extraction + waveform + thumbnail | 1g |
| proxy-worker | same image | FFmpeg H.264 480p re-encode for non-web-compatible clips | 2g (2 vCPUs) |
| asset-edit-worker | same image | Image operations | 512m |
| email-worker | same image | Postmark API | 512m |
| scheduler | same image | TaskIQ cron | 256m |
| redis | redis:7-alpine | Cache + sessions + pub/sub | 256m |
| rabbitmq | rabbitmq/Dockerfile (rabbitmq:3.13.7-management-alpine + delayed-message-exchange plugin) | Broker + management UI | 512m |

Single image, multiple commands: All backend containers run from the same Docker image. They differ only in the command (fastapi run vs taskiq worker analysis_broker etc.). Trade-off: code changes affect all containers, but build time stays low and operations stay simple. See INFRASTRUCTURE_PROVISIONING_PLAN.md for detail.
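The single-image pattern, sketched in Compose terms (service names match the table above; the exact `taskiq worker` module paths are illustrative):

```yaml
services:
  backend:
    image: ghcr.io/sapari-backend:latest
    command: fastapi run app/main.py
    mem_limit: 512m
  render-worker:
    image: ghcr.io/sapari-backend:latest   # same image, different command
    command: taskiq worker app.tasks:render_broker
    mem_limit: 3g
```

One build feeds every container, so a deploy is one image pull plus per-service restarts.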

Deployment

GitHub Actions builds + pushes Docker images to GHCR on push to staging or main. Server SSH is firewalled to the tailnet only, so deploy workflows run tailscale/github-action@v3 to join the tailnet before the appleboy/ssh-action step, then execute ./scripts/deployment/deploy.sh on the server. Cloudflare Pages auto-deploys frontend + landing on the same push.

CD branch flow:

  • staging branch -> Build -> auto-deploy to staging server (+ staging Pages)
  • main branch -> Build only. Production deploys are manual via Actions -> Deploy Production -> Run workflow with typed YES confirmation. Landing Pages still auto-deploys from main.

Production uses a manual trigger (workflow_dispatch + typed confirm) rather than a required-reviewer gate, because private repos without GitHub Team can't use reviewer or wait-timer environment protection rules. The typed confirm is the deliberate-action equivalent.
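The typed confirm can be expressed directly in the workflow. A sketch (input, job, and runner names are illustrative):

```yaml
on:
  workflow_dispatch:
    inputs:
      confirm:
        description: "Type YES to deploy to production"
        required: true

jobs:
  deploy:
    # Hard stop unless the operator typed the exact confirmation string.
    if: ${{ github.event.inputs.confirm == 'YES' }}
    runs-on: ubuntu-latest
```

Anything other than the literal `YES` skips the job entirely, which is the deliberate-action gate described above.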

Deploy = stop-then-start per service. Brief downtime per service (seconds) during docker compose up -d. React Query retries + maintenance screen handle it.

Backups

| Data | Mechanism | Cost |
| --- | --- | --- |
| Postgres | Neon built-in 6h time travel | Free |
| R2 buckets | Cloudflare's built-in durability + application-level protection (soft deletes, IntegrityError handling on ClipFile, reconcile cron). R2 doesn't currently support S3-style versioning. | None (built-in) |
| Server data (Redis, RabbitMQ, Caddy, .env files) | Hetzner automated backups (toggle at provisioning) | 20% surcharge per server |
| .env secrets | Stored in 1Password / Bitwarden as you create them | Free |

No backup scripts in v1.0 -- everything is a managed-service toggle.

Cost Summary (v1.0)

| Item | Provider | Cost |
| --- | --- | --- |
| CCX23 production server | Hetzner | €31.99 |
| Hetzner backups (production) | Hetzner | €6.40 |
| CX33 staging server | Hetzner | €6.99 |
| Hetzner backups (staging) | Hetzner | €1.40 |
| Postgres (both projects) | Neon | Free |
| R2 storage | Cloudflare | Free |
| Cloudflare Pages + Workers + Access + DNS | Cloudflare | Free |
| Logfire | Logfire | Free |
| **Total monthly baseline** | | **€46.78 / ~$50** |

Variable costs:

  • LLM API (DeepSeek + OpenAI Whisper + GPT-5.x for analysis steps) -- per-token
  • Postmark email -- $1.25 per 10K emails
  • Stripe -- 2.9% + 30¢ per transaction (standard)
  • R2 beyond 10 GB storage -- $0.015/GB/mo
  • Neon Launch upgrade if 100 CU-hrs/mo exceeded -- ~$20-30/mo per project


v1.1: Vertical Scale

Trigger: Production CCX23 RAM consistently >75% during peak hours, or render queue regularly >2 deep.

Change: Resize the Hetzner box in-place (1-2 minutes downtime during reboot). Same architecture, more resources.

| Upgrade | Spec | Cost/mo |
| --- | --- | --- |
| CCX23 -> CCX33 | 8 vCPU dedicated, 32 GB RAM | €62.99 + €12.60 backups |
| CCX23 -> CCX43 | 16 vCPU dedicated, 64 GB RAM | €125.49 + €25.10 backups |

What changes:

  • Increase TASKIQ_WORKER_CONCURRENCY from 1 to 2 or 3 in .env.production (more concurrent renders)
  • Increase mem_limit on workers in docker-compose.prod.yml (more headroom)
  • Optionally raise FFMPEG_THREADS if CPU is consistently underutilized
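Sketched as a Compose tweak (the new limits are illustrative; the doc keeps concurrency in .env.production, shown inline here for brevity):

```yaml
services:
  render-worker:
    mem_limit: 6g                        # was 3g on the CCX23
    environment:
      TASKIQ_WORKER_CONCURRENCY: "2"     # was 1
```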

What doesn't change: Domain layout, Cloudflare config, services, deployment process. Just a bigger box.


v1.2: Horizontal Split (API + Workers)

Trigger: Even at CCX43 (€125/mo), workers and API contend for resources. Worker OOM is starting to affect API responsiveness.

Change: Move workers to a dedicated server. API stays on its own. WireGuard tunnel between them for private Redis + RabbitMQ access.

```mermaid
graph TB
    subgraph "API Server (CCX23)"
        CADDY_API["Caddy"]
        API["FastAPI"]
    end

    subgraph "Worker Server (CCX33 or larger)"
        WORKERS["6 workers + scheduler"]
        REDIS[("Redis")]
        RMQ[("RabbitMQ")]
    end

    USER -- "https://app.sapari.io" --> CF["Cloudflare Worker proxy"]
    CF --> CADDY_API
    CADDY_API --> API
    API -- "WireGuard 10.0.0.0/24" --> REDIS
    API -- "WireGuard" --> RMQ
    WORKERS --> REDIS
    WORKERS --> RMQ
    API --> NEON[("Neon Postgres")]
    WORKERS --> NEON
    WORKERS --> R2[("R2")]
    API --> R2
```

Cost example: CCX23 API (€32) + CCX33 workers (€63) + backups (€19) = ~€114/mo + same managed services.

What changes:

  • Provision second Hetzner box, set up WireGuard between API box and worker box
  • API box .env.production points Redis/RabbitMQ to 10.0.0.2 (worker box's WireGuard IP)
  • Workers move to the new box, API stays on the old one
  • Backup Redis + RabbitMQ data on the worker box (Hetzner backups still cover both)
  • DNS unchanged (api.sapari.io still points to API box)
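A minimal wg0.conf sketch for the API box. Keys and the public endpoint are placeholders; 10.0.0.2 is the worker box, as above:

```ini
# /etc/wireguard/wg0.conf on the API box (10.0.0.1)
[Interface]
PrivateKey = <api-box-private-key>
Address = 10.0.0.1/24
ListenPort = 51820

[Peer]
# Worker box hosting Redis + RabbitMQ
PublicKey = <worker-box-public-key>
Endpoint = <worker-box-public-ip>:51820
AllowedIPs = 10.0.0.2/32
PersistentKeepalive = 25
```

With this up, the API box reaches Redis/RabbitMQ at 10.0.0.2 without exposing either service publicly.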

What doesn't change: Cloudflare layer, Neon, R2, deployment scripts (just deploy to two servers instead of one).


v1.3: Managed Data Plane

Trigger: Self-hosting Redis and RabbitMQ becomes operational burden. RabbitMQ broker config drifts, Redis memory pressure causes evictions, you've spent more time tuning them than the value they provide.

Change: Replace self-hosted Redis with Upstash, RabbitMQ with CloudAMQP. WireGuard tunnel goes away (back to public TLS for everything).

| Service | Provider | Free tier | Paid tier |
| --- | --- | --- | --- |
| Redis | Upstash | 256 MB, 500K commands/day | Pay-per-request beyond |
| RabbitMQ | CloudAMQP | 1M messages/mo | $19/mo for Cluster Standard |

What changes:

  • Both servers connect to managed Redis/RabbitMQ over public TLS
  • Server budget shrinks (no need for the worker box's RAM to host Redis/RabbitMQ)
  • WireGuard goes away
  • Backup story simplifies (managed services back themselves up)
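The cutover is an env-file edit. Variable names and hostnames here are illustrative; note the TLS schemes (`rediss://`, `amqps://`):

```shell
# .env.production -- point both boxes at the managed endpoints
REDIS_URL=rediss://default:<password>@<instance>.upstash.io:6379
RABBITMQ_URL=amqps://<user>:<password>@<instance>.rmq.cloudamqp.com/<vhost>
```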

What doesn't change: Cloudflare layer, Neon, R2, application code (env vars point at new endpoints).


v2.0: GPU + Kubernetes

Trigger: EITHER (a) worker box capacity exceeded even with managed data plane, OR (b) OpenAI Whisper API costs exceed ~$200/mo (~33,000 minutes of audio at $0.006/min).

Change: Full architecture from the original infrastructure plan. GEX44 GPU server runs analysis (local Whisper) + render (NVENC) + asset-edit. CCX13 control plane runs k3s + the remaining workers + RabbitMQ/Redis (self-hosted again, or kept on the managed plans from v1.3). KEDA autoscales render workers per queue depth. Burst CCX13 nodes spin up on demand.

v2.0 Architecture (full)

This was the original infrastructure plan. Preserving it as the v2.0 endgame:

Physical Infrastructure

All Hetzner resources in HIL (Hillsboro, US-West) or ASH (Ashburn, US-East) datacenter, vSwitch private network (10.0.0.0/24).

| Machine | Spec | Role | Cost | Always on? |
| --- | --- | --- | --- | --- |
| GEX44 | i5-13500 (14 cores), RTX 4000 20GB GPU, 64GB RAM, 2×1.92TB NVMe | Analysis + render + asset-edit workers, local Whisper | €184/mo + €264 setup | Yes |
| CCX13 #1 | 2 vCPU, 8GB RAM, 80GB SSD | k3s control plane, RabbitMQ, Redis, download + email workers | $13.49/mo | Yes |
| CCX13 #2 | 2 vCPU, 8GB RAM, 80GB SSD | k3s agent, overflow render workers | $13.49/mo | On-demand (autoscaled) |

k3s Cluster

```mermaid
graph TB
    SERVER["k3s Server (CCX13 #1)<br/>Control plane + etcd"]
    AGENT1["k3s Agent (GEX44)<br/>Labels: gpu=true, role=worker"]
    AGENT2["k3s Agent (CCX13 #2)<br/>Labels: role=burst-render"]

    SERVER --> AGENT1
    SERVER --> AGENT2
```

KEDA Autoscaling

  • Render worker: 0-6 replicas, 1 pod per pending task in queue, 5-min cooldown
  • Analysis worker: 1-2 replicas (GPU shared via semaphore), scales when queue >3
  • Asset-edit worker: 0-3 replicas, 2 tasks per pod, 2-min cooldown

When pending pods exceed GEX44 capacity, cluster-autoscaler provisions a CCX13 burst node (~2-3 minute provisioning + k3s join time).
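The render-worker policy above maps to a KEDA ScaledObject roughly like this (a sketch: the Deployment name, queue name, and the env var carrying the RabbitMQ address are illustrative):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: render-worker
spec:
  scaleTargetRef:
    name: render-worker        # Deployment to scale
  minReplicaCount: 0
  maxReplicaCount: 6
  cooldownPeriod: 300          # 5-min cooldown before scaling back to zero
  triggers:
    - type: rabbitmq
      metadata:
        queueName: render
        mode: QueueLength
        value: "1"             # 1 pod per pending task
        hostFromEnv: RABBITMQ_URL
```

Analysis and asset-edit workers get sibling ScaledObjects with their own replica bounds and cooldowns.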

GPU Acceleration

The RTX 4000 has independent NVENC/NVDEC and CUDA engines, enabling concurrent Whisper transcription (CUDA) and FFmpeg rendering (NVENC). Render workers detect GPU at startup and use h264_nvenc; CPU-only nodes (burst) fall back to libx264.
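The startup detection can be a small probe plus a one-line decision. A sketch, assuming ffmpeg is on PATH; the function names are illustrative, not the real worker code:

```python
import subprocess

def available_encoders() -> set[str]:
    """Parse `ffmpeg -encoders` output into a set of encoder names."""
    out = subprocess.run(
        ["ffmpeg", "-hide_banner", "-encoders"],
        capture_output=True, text=True, check=True,
    ).stdout
    names = set()
    for line in out.splitlines():
        parts = line.split()
        # Encoder lines look like: " V....D h264_nvenc  NVIDIA NVENC H.264 encoder"
        # (skip the legend lines such as " V..... = Video").
        if len(parts) >= 2 and parts[0][0] in "VAS" and parts[1] != "=":
            names.add(parts[1])
    return names

def pick_h264_encoder(encoders: set[str]) -> str:
    # GPU nodes expose h264_nvenc; CPU-only burst nodes fall back to libx264.
    return "h264_nvenc" if "h264_nvenc" in encoders else "libx264"
```

A render worker would call `pick_h264_encoder(available_encoders())` once at startup and cache the result for all subsequent FFmpeg invocations.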

Pluggable Transcription

TranscriptionRouter selects a backend based on availability and cost:

  • LocalWhisperBackend (GPU): preferred, ~€0/min amortized, 1 concurrent (semaphore)
  • OpenAIWhisperBackend: overflow when GPU busy, $0.006/min, 5 concurrent
  • GroqWhisperBackend (future): $0.003/min if added
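The routing rule reduces to "cheapest backend with a free slot wins". A sketch under that assumption; class and field names are illustrative, not the real TranscriptionRouter API:

```python
from dataclasses import dataclass

@dataclass
class TranscriptionBackend:
    name: str
    cost_per_min: float   # USD per audio minute
    free_slots: int       # remaining concurrency (semaphore permits)

def route(backends: list[TranscriptionBackend]) -> TranscriptionBackend:
    """Pick the cheapest backend with capacity: local Whisper first, API overflow next."""
    available = [b for b in backends if b.free_slots > 0]
    if not available:
        raise RuntimeError("no transcription capacity")
    return min(available, key=lambda b: b.cost_per_min)
```

With the GPU idle, `route` picks local Whisper; once its single semaphore slot is taken, jobs overflow to the OpenAI backend at $0.006/min.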

Networking

WireGuard tunnel from API host (still on Hetzner or moved to Render/FastAPI Cloud) to control plane. Internal Hetzner traffic uses vSwitch.

Migration from v1.x to v2.0

This is a substantial migration, not a toggle. Major steps:

  1. Provision GEX44 + CCX13 #1
  2. Install k3s, NVIDIA drivers, container toolkit
  3. Deploy backend image + workers as k8s manifests (existing Dockerfile works)
  4. Wire KEDA against existing RabbitMQ
  5. DNS cutover for api.sapari.io (or split into separate subdomains per worker pool)
  6. Decommission v1.x server

Not in scope until v2.0 trigger fires. Documented as the endgame, not the next step.


Observability (all versions)

  • Logfire (free tier) -- structured logs + OpenTelemetry traces. FastAPI, SQLAlchemy, Redis, Pydantic AI all instrumented.
  • System Health admin page -- in-app dashboard for component status, queue depths, server resources. Auto-refreshes every 10s.
  • RabbitMQ Management UI (port 15672 via SSH tunnel) -- queue depth, message rates, per-priority breakdown.
  • Hetzner Cloud console -- CPU, RAM, disk graphs.
  • Sentry (frontend only at v1.0) -- backend Sentry is a follow-up.

Migration Paths Summary

| Component | v1.x escape | v2.0 escape |
| --- | --- | --- |
| Server hosting (Hetzner) | Resize box | Add GEX44 as k3s agent |
| Postgres (Neon) | Neon Launch -> Scale | Self-hosted on Hetzner |
| Object storage (R2) | (no escape needed -- free egress) | Same |
| API hosting | Stay self-hosted, OR move to FastAPI Cloud / Render | Same |
| Workers | Single box -> two boxes -> k3s cluster | k3s with KEDA |
| Redis | Self-hosted -> Upstash | Either |
| RabbitMQ | Self-hosted -> CloudAMQP | Either |

Open Questions

  1. When to split API + workers (v1.0 -> v1.2)? Concrete trigger: 2+ concurrent renders queueing for >5 min during peak. Vague "feels slow" doesn't qualify.

  2. When to add GPU (v1.x -> v2.0)? Math-based: when the monthly OpenAI Whisper bill exceeds the Hetzner GEX44 cost (€184). At $0.006/min, that's roughly 30,700 minutes of audio per month -- equivalent to ~500 medium-length videos.

  3. When to move backend off Hetzner self-host? Likely never. FastAPI Cloud might be tempting, but custom domain support and pricing are still maturing.

  4. NVENC quality vs CPU libx264? NVENC at preset p5 is faster but slightly lower quality. CPU libx264 produces smaller files at the same quality. v2.0 could offer "high quality" exports routed to CPU burst nodes.
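The GPU breakeven in question 2 is a one-line computation (sanity check only; it treats the euro price roughly at par with the dollar and ignores GPU setup fees and power-user effects):

```python
def breakeven_minutes(gpu_cost_per_month: float, api_cost_per_min: float) -> float:
    """Audio minutes/month at which the Whisper API bill matches the GPU's fixed cost."""
    return gpu_cost_per_month / api_cost_per_min

# GEX44 at EUR 184/mo vs OpenAI Whisper at $0.006/min
minutes = breakeven_minutes(184, 0.006)   # ~30,667 minutes/month
```

Plugging in $200 instead of €184 gives the ~33,000-minute figure used as the v2.0 trigger.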