Data Flow

This page walks through how data moves through Sapari from upload to final export. Understanding this flow helps when debugging issues or adding new features.

The Complete Journey

A video goes through several stages in Sapari:

flowchart LR
    A[Upload] --> B[Process]
    B --> C[Analyze]
    C --> D[Review]
    D --> E[Render]
    E --> F[Download]

    style A fill:#ff3300,color:#fff
    style F fill:#ff3300,color:#fff

Each stage involves different components, but the pattern is consistent: the API receives a request, queues a background task via RabbitMQ, and publishes events via Redis pub/sub when done.

Stage 1: Upload

The upload process uses presigned URLs so clients upload directly to R2 without going through our servers. Two paths, picked by file size:

  • Single-PUT for files < 25 MiB (MULTIPART_CUTOFF_BYTES) — one presigned URL, one PUT.
  • Multipart for files ≥ 25 MiB — parallel parts (4 concurrent at 16 MiB each), per-part retry, mid-stream resume.

The dispatch happens in the frontend hook (useUploadClip / useUploadAsset); the user-facing API is the same.
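
The size-based dispatch can be sketched as follows. This is a minimal Python illustration, not the actual frontend hook: only `MULTIPART_CUTOFF_BYTES` and the 16 MiB part size come from the text above, and the function names are hypothetical.

```python
import math

MIB = 1024 * 1024
MULTIPART_CUTOFF_BYTES = 25 * MIB  # from the text: files below this use a single PUT
PART_SIZE_BYTES = 16 * MIB         # from the text: each multipart part is 16 MiB


def choose_upload_path(size_bytes: int) -> str:
    """Return "single_put" below the cutoff and "multipart" at or above it."""
    return "single_put" if size_bytes < MULTIPART_CUTOFF_BYTES else "multipart"


def parts_count(size_bytes: int) -> int:
    """Number of parts needed to cover the file (the last part may be shorter)."""
    return max(1, math.ceil(size_bytes / PART_SIZE_BYTES))
```

Under these assumptions a 100 MiB file takes the multipart path and needs 7 parts: six full 16 MiB parts plus one 4 MiB remainder.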

Single-PUT (small files)

sequenceDiagram
    participant Client
    participant API
    participant DB
    participant R2

    Client->>API: POST /clips/presign
    API->>DB: Create Clip + ClipFile records
    API->>R2: Generate presigned PUT URL
    API->>Client: {upload_url, content_type, clip_uuid}
    Client->>R2: PUT file bytes (direct upload)
    Client->>API: POST /clips/{uuid}/confirm
    API->>R2: HEAD object (read actual Content-Length for quota recheck)
    API->>API: Queue process_clip_artifacts

Multipart (large files, ≥ 25 MiB)

sequenceDiagram
    participant Client
    participant API
    participant DB
    participant R2

    Client->>API: POST /clips/multipart/initiate
    API->>R2: create_multipart_upload (~100-200 ms)
    API->>R2: presign N initial part URLs
    API->>DB: Create Clip + ClipFile (status=MULTIPART_INITIATED, upload_id in metadata_json)
    API->>Client: {upload_id, parts_count, parts: [{part_number, url}, ...50]}

    par Parallel parts (capped at 4)
        Client->>R2: PUT part 1 → ETag
        Client->>R2: PUT part 2 → ETag
        Client->>R2: PUT part N → ETag
    end

    Note over Client,API: Refill: GET /multipart/parts/urls?from=51&to=100<br/>(when window drops below LOOKAHEAD = 50)
    Note over Client: ETags round-trip byte-exact (with R2's literal quotes)<br/>— stripping fails complete with InvalidPart

    Client->>API: POST /clips/multipart/complete (parts: [{part_number, etag}])
    API->>R2: complete_multipart_upload + HEAD (Convention #16 split-session, no DB held)
    API->>DB: Status MULTIPART_INITIATED → UPLOADED + storage accounting
    API->>API: Queue process_clip_artifacts
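
The two notes in the diagram (URL refill and byte-exact ETags) can be illustrated with a small client-side sketch. `LOOKAHEAD = 50` and the `{part_number, etag}` shape come from the diagram; the function names and the exact refill trigger are assumptions, so treat this as one plausible reading rather than the real hook logic.

```python
from typing import Dict, List, Optional, Tuple

LOOKAHEAD = 50  # from the diagram note


def next_refill_range(
    highest_presigned: int, next_part: int, parts_count: int
) -> Optional[Tuple[int, int]]:
    """Return the (from, to) part range to request, or None if no refill is needed.

    highest_presigned: largest part number the client already holds a URL for.
    next_part: next part number the client will start uploading.
    """
    if highest_presigned >= parts_count:
        return None  # every part already has a URL
    unused = highest_presigned - next_part + 1
    if unused >= LOOKAHEAD:
        return None  # the window of unused URLs is still full
    start = highest_presigned + 1
    end = min(highest_presigned + LOOKAHEAD, parts_count)
    return (start, end)


def build_complete_parts(etags_by_part: Dict[int, str]) -> List[Dict[str, object]]:
    """Build the complete payload, passing each ETag through unmodified.

    The surrounding quotes returned by R2 are kept on purpose: the note above
    says stripping them makes the complete call fail with InvalidPart.
    """
    return [
        {"part_number": n, "etag": etags_by_part[n]}
        for n in sorted(etags_by_part)
    ]
```

With 120 parts and the initial batch of 50 URLs, the first refill in this sketch requests parts 51 to 100, matching the query string shown in the diagram.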

Cancel and resume: the client calls POST /multipart/abort to release the R2 parts and delete the Clip + ClipFile (or UserAsset + AssetFile) rows. A cron job sweeps orphaned uploads older than STUCK_MULTIPART_MINUTES = 180 (URL expiry + 60 min) with the same deletion semantics. There is no FAILED recovery state for an in-flight multipart upload: a zombie row with a Retry button could not actually resume, because R2's abort destroys the parts.

The frontend persists {upload_id, key, parts_uploaded} in IndexedDB, keyed by file fingerprint, so transient interruptions (page refresh, network drop) can resume via GET /multipart/parts while the backend row is still MULTIPART_INITIATED. An explicit cancel clears the IndexedDB entry.
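
A rough sketch of the two time-dependent pieces, the cron sweep and client-side resume. `STUCK_MULTIPART_MINUTES` comes from the text; `is_stuck` and `remaining_parts` are hypothetical helpers, and whether the real sweep compares against creation time or last activity is not stated here.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List

STUCK_MULTIPART_MINUTES = 180  # from the text: URL expiry + 60 min


def is_stuck(initiated_at: datetime, now: datetime) -> bool:
    """True once an upload has been in flight longer than the sweep threshold."""
    return now - initiated_at > timedelta(minutes=STUCK_MULTIPART_MINUTES)


def remaining_parts(parts_count: int, parts_uploaded: Iterable[int]) -> List[int]:
    """Part numbers still to upload after a resume (parts are 1-indexed)."""
    done = set(parts_uploaded)
    return [n for n in range(1, parts_count + 1) if n not in done]
```

On resume, the client would upload only the part numbers returned by `remaining_parts`, using the persisted `parts_uploaded` list as input.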

For YouTube imports, the flow is simpler:

sequenceDiagram
    participant Client
    participant API
    participant Worker

    Client->>API: POST /clips/youtube-import {url}
    API->>API: fetch_video_info (sync, bounded wait_for) — reject if duration > cap
    API->>API: Create Clip + ClipFile records
    API->>API: Queue download_youtube_video
    API->>Client: {clip_uuid}
    Worker->>Worker: Download + process
    Worker-->>Client: ClipReadyEvent (SSE)
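
The duration pre-check can be sketched with `asyncio.wait_for`. The cap, the timeout, the exception type, and the stubbed `fetch_video_info` below are all placeholders for illustration; the real values and metadata lookup are not given on this page.

```python
import asyncio

MAX_DURATION_SECONDS = 3 * 60 * 60  # hypothetical cap; the real value is not given here
INFO_TIMEOUT_SECONDS = 10.0         # hypothetical bound for the metadata lookup


class ImportRejected(Exception):
    """Raised when the import should be refused before any rows are created."""


async def fetch_video_info(url: str) -> dict:
    """Stand-in for the real metadata lookup; returns a fixed duration."""
    await asyncio.sleep(0)
    return {"url": url, "duration_seconds": 600}


async def precheck_youtube_import(url: str, fetch=fetch_video_info) -> dict:
    """Fetch metadata with a bounded wait and reject over-long videos."""
    try:
        info = await asyncio.wait_for(fetch(url), timeout=INFO_TIMEOUT_SECONDS)
    except asyncio.TimeoutError as exc:
        raise ImportRejected("metadata lookup timed out") from exc
    if info["duration_seconds"] > MAX_DURATION_SECONDS:
        raise ImportRejected("video exceeds the duration cap")
    return info
```

Rejecting before the Clip and ClipFile rows exist means a refused import leaves nothing behind to clean up.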

Stage 2: Process

The download_broker handles video processing. Proxy generation runs on a separate proxy_broker / taskiq-proxy-worker so CPU-heavy re-encodes don't block audio extraction for subsequent imports:

flowchart TB
    subgraph DW["Download Worker (download_broker)"]
        A[Pick up task] --> B[Download video]
        B --> C[Extract audio]
        C --> D[Generate waveform]
        D --> E{Web compatible?}
        E -->|Yes| F[Update ClipFile]
        E -->|No| G[Enqueue on proxy_broker]
        G --> F
        F --> H[Publish ClipReadyEvent]
    end
    subgraph PW["Proxy Worker (proxy_broker)"]
        P1[Pick up generate_clip_proxy] --> P2[Chained FFmpeg: 480p H.264 + 10x20 sprite]
        P2 --> P3[Upload proxy.mp4 + sprite.jpg]
        P3 --> P4[Set proxy_key, sprite_key, sprite_seconds_per_tile]
    end
    G -.-> P1

The extracted audio is 16kHz mono MP3, optimized for Whisper. The waveform is an array of ~100 peaks per second for timeline visualization.
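
As an illustration of those two outputs, the sketch below builds an FFmpeg argument list for 16 kHz mono MP3 and reduces raw samples to about 100 peaks per second. The exact FFmpeg invocation and the peak algorithm used by the worker are not documented here, so both functions are assumptions.

```python
from typing import List, Sequence

PEAKS_PER_SECOND = 100  # from the text: roughly 100 peaks per second


def audio_extract_args(src: str, dst: str) -> List[str]:
    """FFmpeg arguments for 16 kHz mono MP3 output (an assumed invocation)."""
    return [
        "ffmpeg", "-i", src,
        "-vn",                 # drop the video stream
        "-ac", "1",            # mono
        "-ar", "16000",        # 16 kHz sample rate
        "-c:a", "libmp3lame",  # MP3 encoder
        dst,
    ]


def waveform_peaks(samples: Sequence[float], sample_rate: int) -> List[float]:
    """Reduce samples to one absolute peak per 1/PEAKS_PER_SECOND of a second."""
    bucket = max(1, sample_rate // PEAKS_PER_SECOND)
    return [
        max(abs(s) for s in samples[i:i + bucket])
        for i in range(0, len(samples), bucket)
    ]
```

At 16 kHz each bucket covers 160 samples, so one second of audio yields 100 peaks.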

Stage 3: Analyze

Analysis runs on the analysis_broker as a pipeline of steps:

flowchart TB
    A[Load Audio] --> B[Transcribe]
    B --> C[Detect Silences]
    B --> D[Detect False Starts]
    B --> P[Detect Profanity]
    C --> E[Validate Edits]
    D --> E
    P --> E
    E --> F[Create Edits]
    F --> G[Update Project]
    G --> H[Publish AnalysisCompleteEvent]

    style B fill:#ff3300,color:#fff
    style C fill:#ff6633,color:#fff
    style D fill:#ff6633,color:#fff
    style P fill:#ef4444,color:#fff

Silence, false start, and profanity detection run in parallel since they're independent. The pipeline publishes progress events after each step.
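
A minimal sketch of that fan-out, assuming the detectors are coroutines that each take the transcript. The detector bodies are stubs with made-up results, and `asyncio.gather` is one way to run them concurrently; the real pipeline may schedule its steps differently.

```python
import asyncio
from typing import Dict, List


async def detect_silences(transcript: Dict) -> List[Dict]:
    """Stub detector: returns one hard-coded silence."""
    return [{"type": "silence", "start_ms": 1000, "end_ms": 1800}]


async def detect_false_starts(transcript: Dict) -> List[Dict]:
    """Stub detector: returns one hard-coded false start."""
    return [{"type": "false_start", "start_ms": 4000, "end_ms": 5200}]


async def detect_profanity(transcript: Dict) -> List[Dict]:
    """Stub detector: finds nothing."""
    return []


async def run_detectors(transcript: Dict) -> List[Dict]:
    """Run the three independent detectors concurrently and merge the results."""
    silences, false_starts, profanity = await asyncio.gather(
        detect_silences(transcript),
        detect_false_starts(transcript),
        detect_profanity(transcript),
    )
    return silences + false_starts + profanity
```

The merged list is what a later validation step would receive in this sketch.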

Stage 4: Review

This stage happens in the frontend:

flowchart LR
    A[Fetch Edits] --> B[Display Timeline]
    B --> C{User Action}
    C -->|Toggle| D[PATCH /edits/{id}]
    C -->|Adjust| D
    C -->|Add Cut| F[POST /edits]
    C -->|Save Draft| E[POST /drafts]
    D --> B
    E --> B
    F --> B

Users see the transcript with detected edits highlighted. They can:

  • Toggle edits on/off
  • Adjust edit boundaries by dragging
  • Add manual cuts via the ADD CUT button (shown in cyan)
  • Save drafts with different edit configurations

Edit types: silence (detected pauses), false_start (detected repetitions), profanity (detected swear words), manual (user-created cuts).

Edits also have an action field: cut removes video+audio, mute keeps video but silences/bleeps audio (used for profanity).
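
One way to model the type and action fields is a pair of enums on a small record. The enum values are taken from the text above; the class names, the millisecond fields, and the `enabled` flag are illustrative assumptions rather than the actual schema.

```python
from dataclasses import dataclass
from enum import Enum


class EditType(str, Enum):
    SILENCE = "silence"          # detected pauses
    FALSE_START = "false_start"  # detected repetitions
    PROFANITY = "profanity"      # detected swear words
    MANUAL = "manual"            # user-created cuts


class EditAction(str, Enum):
    CUT = "cut"    # removes video and audio
    MUTE = "mute"  # keeps video, silences or bleeps audio


@dataclass
class Edit:
    type: EditType
    action: EditAction
    start_ms: int         # hypothetical field name
    end_ms: int           # hypothetical field name
    enabled: bool = True  # hypothetical: flipped by the toggle in review
```

For example, `Edit(EditType.PROFANITY, EditAction.MUTE, 12000, 12400)` would describe a muted swear word.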

Stage 5: Render

When the user triggers a render, we snapshot the current edit state:

sequenceDiagram
    participant Client
    participant API
    participant DB
    participant Worker
    participant R2

    Client->>API: POST /exports {name, settings}
    API->>DB: Create Export with edit_snapshot
    API->>API: Queue render_export
    API->>Client: {export_uuid, status: pending}

    Worker->>DB: Load Export + edit_snapshot
    Worker->>R2: Download clips
    Worker->>Worker: Apply cuts (FFmpeg)
    Worker->>Worker: Audio processing (optional)
    Worker->>R2: Upload rendered video
    Worker->>DB: Update Export status
    Worker-->>Client: ExportCompleteEvent (SSE)

The snapshot means users can keep editing while a render is in progress - the render uses the frozen state.
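
The snapshot idea, and how cut edits turn into the segments that survive the render, can be sketched as below. The dictionary keys and both function names are assumptions; the real `edit_snapshot` format is not shown on this page.

```python
import copy
from typing import Dict, List, Tuple


def snapshot_edits(edits: List[Dict]) -> List[Dict]:
    """Deep-copy the enabled edits so later changes do not affect the render."""
    return copy.deepcopy([e for e in edits if e.get("enabled", True)])


def keep_segments(duration_ms: int, snapshot: List[Dict]) -> List[Tuple[int, int]]:
    """Invert the cut edits into the (start, end) ranges that are kept."""
    cuts = sorted(
        (e["start_ms"], e["end_ms"]) for e in snapshot if e["action"] == "cut"
    )
    kept: List[Tuple[int, int]] = []
    cursor = 0
    for start, end in cuts:
        if start > cursor:
            kept.append((cursor, start))
        cursor = max(cursor, end)  # also handles overlapping cuts
    if cursor < duration_ms:
        kept.append((cursor, duration_ms))
    return kept
```

Mute edits are ignored by `keep_segments` because they do not change the video timeline in this sketch.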

Audio Processing (when "Audio Clean" enabled):

  1. Noise Reduction - FFT-based removal of background noise (AC, fans, room tone)
  2. LUFS Normalization - Adjusts loudness to -14 LUFS (YouTube/Spotify standard)
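
The two audio steps map naturally onto an FFmpeg audio filter chain. The sketch below uses FFmpeg's `afftdn` (FFT-based denoiser) and `loudnorm` filters as stand-ins; this page does not say which filters or parameters the render worker actually uses, so the chain is an assumption.

```python
from typing import Optional

TARGET_LUFS = -14  # from the text: YouTube/Spotify loudness target


def audio_clean_filter(enabled: bool = True) -> Optional[str]:
    """Return an FFmpeg -af filter chain, or None when Audio Clean is off."""
    if not enabled:
        return None
    # afftdn: FFT-based noise reduction; loudnorm: loudness normalization
    # with I set to the integrated loudness target in LUFS.
    return f"afftdn,loudnorm=I={TARGET_LUFS}"
```

The returned string would be passed as the value of FFmpeg's `-af` option.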

Stage 6: Download

Completed exports live in R2:

sequenceDiagram
    participant Client
    participant API
    participant R2

    Client->>API: GET /exports/{id}/download
    API->>R2: Generate presigned GET URL
    API->>Client: {url, expires_in: 3600}
    Client->>R2: GET (direct download)

Asset Editing

Assets (user-uploaded videos/images) can be trimmed or have audio extracted. This uses a fire-and-forget pattern: the API returns immediately while processing happens in the background. Trim bounds (start_ms, end_ms, cuts[i].end_ms) are validated against AssetFile.duration_ms at the API layer before the worker is enqueued, and out-of-range values return 400. If the asset is still processing (duration_ms IS NULL), the bounds check is skipped and the worker silently clamps the values at render time.
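
The bounds check can be sketched as a pure function. `OutOfRange` stands in for whatever produces the 400 response, the cuts are modeled as `(start_ms, end_ms)` pairs, and the `start_ms >= end_ms` rule is an extra assumption not stated above.

```python
from typing import List, Optional, Tuple


class OutOfRange(ValueError):
    """Maps to an HTTP 400 response in this sketch."""


def validate_trim_bounds(
    duration_ms: Optional[int], cuts: List[Tuple[int, int]]
) -> bool:
    """Return True if the bounds were checked, False if the check was skipped."""
    if duration_ms is None:
        # Asset still processing: skip here and let the worker clamp later.
        return False
    for start_ms, end_ms in cuts:
        if start_ms < 0 or end_ms > duration_ms:
            raise OutOfRange(f"cut {start_ms}-{end_ms} exceeds {duration_ms} ms")
        if start_ms >= end_ms:  # assumed rule, not stated in the text
            raise OutOfRange(f"cut {start_ms}-{end_ms} is empty or reversed")
    return True
```

Returning a flag rather than silently passing makes the skipped-check case visible to the caller.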

sequenceDiagram
    participant Client
    participant API
    participant DB
    participant Worker
    participant R2

    Client->>API: POST /assets/{uuid}/edit
    Note right of Client: {cuts: [...], extract_audio, save_mode}
    API->>DB: Create pending Asset copy
    API->>API: Queue edit_asset task
    API->>Client: {new_asset_uuid}
    Note right of Client: Returns immediately

    Client->>Client: Poll asset list (refetchInterval)

    Worker->>R2: Download source asset
    Worker->>Worker: FFmpeg multi-cut processing
    Worker->>R2: Upload result
    Worker->>DB: Update Asset status → uploaded

    Client->>API: GET /assets (polling)
    API->>Client: Asset now has status: uploaded

Key features:

  • Multi-cut support: Multiple regions can be removed in a single operation using FFmpeg filter_complex
  • Fire-and-forget: New asset created with status: pending, updated to uploaded when done
  • Save modes: copy creates a new asset, replace overwrites the original
  • Polling: Frontend polls while any asset has status: pending (edit / YouTube download in flight) or status: processing (upload-confirm worker probing ffprobe / waveform / thumbnail). SSE events (asset_ready / asset_failed) invalidate the same cache; polling is the fallback for SSE drops.

Event Flow

Events tie everything together:

flowchart LR
    subgraph Workers
        W1[Download]
        W2[Analysis]
        W3[Render]
        W4[Asset Edit]
    end

    subgraph Redis
        PS[(Pub/Sub)]
    end

    subgraph API
        SSE[SSE Endpoint]
    end

    subgraph Frontend
        ES[EventSource]
        RQ[React Query]
    end

    W1 -->|Publish| PS
    W2 -->|Publish| PS
    W3 -->|Publish| PS
    PS -->|Subscribe| SSE
    SSE -->|Stream| ES
    ES -->|Invalidate| RQ

Note: Asset Edit uses polling instead of SSE since assets are user-scoped (not project-scoped) and updates are infrequent.

Each project has its own channel: project:{uuid}:events. The frontend subscribes when you open a project and invalidates React Query caches when events arrive.
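
A small sketch of the channel naming and of what a single streamed event might look like on the wire. The channel format comes from the text and the framing follows the standard Server-Sent Events format; the helper names, the `clip_ready` event name, and the payload shape are assumptions.

```python
import json


def project_channel(project_uuid: str) -> str:
    """Redis pub/sub channel for one project's events."""
    return f"project:{project_uuid}:events"


def format_sse(event_type: str, payload: dict) -> str:
    """Serialize one event as an SSE frame (terminated by a blank line)."""
    return f"event: {event_type}\ndata: {json.dumps(payload)}\n\n"
```

On the browser side, an EventSource listener for the named event would receive the JSON string and could invalidate the matching React Query cache keys.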

Storage Layout

All files live in R2 with a predictable structure:

flowchart TB
    subgraph Clips["sapari-raw bucket"]
        C1["clips/{prefix}/{uuid}/"]
        C1 --> C2[original.mp4]
        C1 --> C3[audio.mp3]
        C1 --> C4[proxy.mp4]
        C1 --> C5[waveform.json]
    end

    subgraph Exports["sapari-exports bucket"]
        E1["exports/{project}/{export}/"]
        E1 --> E2[Final Cut v1.mp4]
    end

    subgraph Assets["sapari-assets bucket"]
        A1["assets/{prefix}/{uuid}/"]
        A1 --> A2[video.mp4]
        A1 --> A3[thumbnail.jpg]
    end

The {prefix} is the first 2 characters of the UUID. This helps S3/R2 distribute files across partitions for better performance.
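
The key layout for clips can be sketched as a one-line builder. The path pattern and the two-character prefix rule come from the diagram and text above; the function names are hypothetical.

```python
def storage_prefix(uuid: str) -> str:
    """First two characters of the UUID, used to spread keys across partitions."""
    return uuid[:2]


def clip_key(clip_uuid: str, filename: str) -> str:
    """Object key for a clip artifact, e.g. original.mp4 or audio.mp3."""
    return f"clips/{storage_prefix(clip_uuid)}/{clip_uuid}/{filename}"
```

For a clip UUID starting with `3f`, every artifact for that clip lands under `clips/3f/<uuid>/`.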

Debugging Tips

When something goes wrong, trace through the stages:

flowchart TB
    A[Issue Reported] --> B{Which stage?}
    B -->|Upload| C[Check ClipFile status]
    B -->|Process| D[Check worker logs]
    B -->|Analysis| E[Check Whisper API / LLM logs]
    B -->|Render| F[Check Export.error_message]
    B -->|Events| G[Check Redis pub/sub]

    C --> H[PENDING = upload incomplete]
    C --> I[PROCESSING = worker running]
    C --> J[FAILED = check error_message]
