Data Flow

This page walks through how data moves through Sapari from upload to final export. Understanding this flow helps when debugging issues or adding new features.

The Complete Journey

A video goes through several stages in Sapari:

flowchart LR
    A[Upload] --> B[Process]
    B --> C[Analyze]
    C --> D[Review]
    D --> E[Render]
    E --> F[Download]

    style A fill:#ff3300,color:#fff
    style F fill:#ff3300,color:#fff

Each stage involves different components, but the pattern is consistent: the API receives a request, queues a background task via RabbitMQ, and publishes events via Redis pub/sub when done.

Stage 1: Upload

The upload process uses presigned URLs so clients upload directly to R2 without going through our servers. Two paths, picked by file size:

- Single-PUT for files < 25 MiB (MULTIPART_CUTOFF_BYTES): one presigned URL, one PUT.
- Multipart for files ≥ 25 MiB: parallel parts (4 concurrent at 16 MiB each), per-part retry, mid-stream resume.

The dispatch happens in the frontend hook (useUploadClip / useUploadAsset); the user-facing API is the same.
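
The size-based dispatch can be sketched in a few lines. This is a minimal illustration rather than the hook's actual code: `choose_upload_path` and `part_count` are hypothetical names, and the constants are taken from the description above.

```python
import math

MIB = 1024 * 1024
MULTIPART_CUTOFF_BYTES = 25 * MIB  # cutoff named in the text
PART_SIZE_BYTES = 16 * MIB         # part size from the multipart description


def choose_upload_path(size_bytes: int) -> str:
    """Return 'single_put' below the cutoff, 'multipart' at or above it."""
    if size_bytes < MULTIPART_CUTOFF_BYTES:
        return "single_put"
    return "multipart"


def part_count(size_bytes: int) -> int:
    """Number of parts needed; the last part may be shorter than the rest."""
    return max(1, math.ceil(size_bytes / PART_SIZE_BYTES))
```

Under these assumptions a 100 MiB file splits into 7 parts, the last one 4 MiB long.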

Single-PUT (small files)

sequenceDiagram
    participant Client
    participant API
    participant DB
    participant R2

    Client->>API: POST /clips/presign
    API->>DB: Create Clip + ClipFile records
    API->>R2: Generate presigned PUT URL
    API->>Client: {upload_url, content_type, clip_uuid}
    Client->>R2: PUT file bytes (direct upload)
    Client->>API: POST /clips/{uuid}/confirm
    API->>R2: HEAD object (read actual Content-Length for quota recheck)
    API->>API: Queue process_clip_artifacts

Multipart (large files, ≥ 25 MiB)

sequenceDiagram
    participant Client
    participant API
    participant DB
    participant R2

    Client->>API: POST /clips/multipart/initiate
    API->>R2: create_multipart_upload (~100-200 ms)
    API->>R2: presign N initial part URLs
    API->>DB: Create Clip + ClipFile (status=MULTIPART_INITIATED, upload_id in metadata_json)
    API->>Client: {upload_id, parts_count, parts: [{part_number, url}, ...50]}

    par Parallel parts (capped at 4)
        Client->>R2: PUT part 1 → ETag
        Client->>R2: PUT part 2 → ETag
        Client->>R2: PUT part N → ETag
    end

    Note over Client,API: Refill: GET /multipart/parts/urls?from=51&to=100<br/>(when window drops below LOOKAHEAD = 50)
    Note over Client: ETags round-trip byte-exact (with R2's literal quotes)<br/>— stripping fails complete with InvalidPart

    Client->>API: POST /clips/multipart/complete (parts: [{part_number, etag}])
    API->>R2: complete_multipart_upload + HEAD (Convention #16 split-session, no DB held)
    API->>DB: Status MULTIPART_INITIATED → UPLOADED + storage accounting
    API->>API: Queue process_clip_artifacts
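
The ETag note in the diagram is easy to get wrong, so here is a minimal sketch of building the body for the complete call. `build_complete_payload` is a hypothetical helper and the payload shape is inferred from the diagram; the point is that each ETag is passed through untouched, quotes included. Sorting by part number is an assumption based on S3-compatible APIs expecting parts in ascending order.

```python
def build_complete_payload(uploaded: dict[int, str]) -> dict:
    """Build the complete-call body from {part_number: etag}.

    ETags are forwarded exactly as R2 returned them, including the
    surrounding double quotes.
    """
    parts = [
        {"part_number": number, "etag": etag}
        for number, etag in sorted(uploaded.items())
    ]
    return {"parts": parts}


# Parts may finish out of order because uploads run in parallel.
payload = build_complete_payload({2: '"def456"', 1: '"abc123"'})
```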

Cancel + resume: the client calls POST /multipart/abort, which releases the R2 parts and deletes the Clip + ClipFile (or UserAsset + AssetFile) rows. A cron job sweeps orphaned uploads older than STUCK_MULTIPART_MINUTES = 180 (URL expiry + 60 min) with the same deletion semantics. There is no FAILED recovery state for an in-flight multipart upload: a zombie row with a Retry button could not actually resume, because R2's abort destroys the parts.

The frontend persists {upload_id, key, parts_uploaded} in IndexedDB, keyed by file fingerprint, so transient interruptions (page refresh, network drop) can resume via GET /multipart/parts while the backend row is still MULTIPART_INITIATED. An explicit cancel clears the IndexedDB entry.
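
A rough sketch of the two decisions involved: whether the cron should sweep a row, and which parts a resuming client still has to send. Both function names are hypothetical, and using a `created_at` timestamp for the age check is an assumption; the real sweep may key off a different column.

```python
from datetime import datetime, timedelta, timezone

STUCK_MULTIPART_MINUTES = 180  # from the text: URL expiry + 60 min


def is_stuck(status: str, created_at: datetime, now: datetime) -> bool:
    """True when an in-flight multipart row is old enough to be swept."""
    age = now - created_at
    return (
        status == "MULTIPART_INITIATED"
        and age > timedelta(minutes=STUCK_MULTIPART_MINUTES)
    )


def remaining_parts(parts_count: int, parts_uploaded: set[int]) -> list[int]:
    """Part numbers (1-based) still to upload when resuming."""
    return [n for n in range(1, parts_count + 1) if n not in parts_uploaded]
```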

For YouTube imports, the flow is simpler:

sequenceDiagram
    participant Client
    participant API
    participant Worker

    Client->>API: POST /clips/youtube-import {url}
    API->>API: fetch_video_info (sync, bounded wait_for) — reject if duration > cap
    API->>API: Create Clip + ClipFile records
    API->>API: Queue download_youtube_video
    API->>Client: {clip_uuid}
    Worker->>Worker: Download + process
    Worker-->>Client: ClipReadyEvent (SSE)
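
The pre-check in the second step can be sketched as below. This is an illustration under assumptions, not the endpoint's code: the cap and timeout values are invented, `ImportRejected` and `check_import` are hypothetical names, and `fetch_video_info` is a stand-in for the real blocking lookup.

```python
import asyncio

MAX_DURATION_S = 3600   # hypothetical cap; the real value is not given here
INFO_TIMEOUT_S = 10.0   # hypothetical bound for the metadata lookup


class ImportRejected(Exception):
    """Raised when the import should be refused before any rows are created."""


def fetch_video_info(url: str) -> dict:
    """Stand-in for the blocking metadata lookup."""
    return {"url": url, "duration_s": 120}


async def check_import(url: str, fetch=fetch_video_info) -> dict:
    """Run the blocking fetch in a thread, bounded by asyncio.wait_for."""
    try:
        info = await asyncio.wait_for(
            asyncio.to_thread(fetch, url), timeout=INFO_TIMEOUT_S
        )
    except asyncio.TimeoutError:
        raise ImportRejected("metadata lookup timed out") from None
    if info["duration_s"] > MAX_DURATION_S:
        raise ImportRejected("video exceeds duration cap")
    return info
```

One caveat of this approach: `wait_for` stops waiting at the timeout, but the worker thread keeps running until the blocking call returns.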

Stage 2: Process

The download_broker handles video processing. Proxy generation runs on a separate proxy_broker / taskiq-proxy-worker so CPU-heavy re-encodes don't block audio extraction for subsequent imports:

flowchart TB
    subgraph DW["Download Worker (download_broker)"]
        A[Pick up task] --> B[Download video]
        B --> C[Extract audio]
        C --> D[Generate waveform]
        D --> E{Web compatible?}
        E -->|Yes| F[Update ClipFile]
        E -->|No| G[Enqueue on proxy_broker]
        G --> F
        F --> H[Publish ClipReadyEvent]
    end
    subgraph PW["Proxy Worker (proxy_broker)"]
        P1[Pick up generate_clip_proxy] --> P2[Chained FFmpeg: 480p H.264 + 10x20 sprite]
        P2 --> P3[Upload proxy.mp4 + sprite.jpg]
        P3 --> P4[Set proxy_key, sprite_key, sprite_seconds_per_tile]
    end
    G -.-> P1

The extracted audio is 16 kHz mono MP3, optimized for Whisper. The waveform is an array of ~100 peaks per second for timeline visualization.
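
To make the waveform density concrete, here is a minimal sketch of peak extraction over decoded samples. It assumes samples arrive as floats at the extraction sample rate and that a peak is the maximum absolute amplitude in a window; the worker's actual method is not described here and may differ.

```python
SAMPLE_RATE = 16_000      # matches the extracted audio
PEAKS_PER_SECOND = 100    # approximate density from the text


def waveform_peaks(samples: list[float]) -> list[float]:
    """Max absolute amplitude per window of 160 samples (16000 / 100)."""
    window = SAMPLE_RATE // PEAKS_PER_SECOND
    return [
        max(abs(s) for s in samples[i:i + window])
        for i in range(0, len(samples), window)
    ]
```

One second of audio yields 100 peaks, so a ten-minute clip needs about 60,000 values.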

Stage 3: Analyze

Analysis runs on the analysis_broker as a pipeline of steps:

flowchart TB
    A[Load Audio] --> B[Transcribe]
    B --> C[Detect Silences]
    B --> D[Detect False Starts]
    B --> P[Detect Profanity]
    C --> E[Validate Edits]
    D --> E
    P --> E
    E --> F[Create Edits]
    F --> G[Update Project]
    G --> H[Publish AnalysisCompleteEvent]

    style B fill:#ff3300,color:#fff
    style C fill:#ff6633,color:#fff
    style D fill:#ff6633,color:#fff
    style P fill:#ef4444,color:#fff

Silence, false start, and profanity detection run in parallel since they're independent. The pipeline publishes progress events after each step.
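
The fan-out after transcription might look like the sketch below. The detector bodies are stubs with made-up results, and `asyncio.gather` is one possible way to run them concurrently; the pipeline may use a different mechanism.

```python
import asyncio


async def detect_silences(transcript: dict) -> list[dict]:
    return [{"type": "silence", "start_ms": 1000, "end_ms": 1800}]  # stub


async def detect_false_starts(transcript: dict) -> list[dict]:
    return [{"type": "false_start", "start_ms": 4000, "end_ms": 5200}]  # stub


async def detect_profanity(transcript: dict) -> list[dict]:
    return []  # stub: nothing detected


async def run_detectors(transcript: dict) -> list[dict]:
    """Run the three independent detectors concurrently and merge results."""
    results = await asyncio.gather(
        detect_silences(transcript),
        detect_false_starts(transcript),
        detect_profanity(transcript),
    )
    merged = [edit for group in results for edit in group]
    return sorted(merged, key=lambda edit: edit["start_ms"])


edits = asyncio.run(run_detectors({"words": []}))
```

The merged, time-ordered list is what a validation step would then receive.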

Stage 4: Review

This stage happens in the frontend:

flowchart LR
    A[Fetch Edits] --> B[Display Timeline]
    B --> C{User Action}
    C -->|Toggle| D["PATCH /edits/{id}"]
    C -->|Adjust| D
    C -->|Add Cut| F[POST /edits]
    C -->|Save Draft| E[POST /drafts]
    D --> B
    E --> B
    F --> B

Users see the transcript with detected edits highlighted. They can:

- Toggle edits on/off
- Adjust edit boundaries by dragging
- Add manual cuts via the ADD CUT button (shown in cyan)
- Save drafts with different edit configurations

Edit types: silence (detected pauses), false_start (detected repetitions), profanity (detected swear words), manual (user-created cuts).

Edits also have an action field: cut removes video+audio, mute keeps video but silences/bleeps audio (used for profanity).
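
A minimal model of these two fields, for illustration only. The enum values come from the text; the `Edit` dataclass, its `enabled` flag (standing in for the on/off toggle), and `active_cuts` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class EditType(str, Enum):
    SILENCE = "silence"
    FALSE_START = "false_start"
    PROFANITY = "profanity"
    MANUAL = "manual"


class EditAction(str, Enum):
    CUT = "cut"    # removes video and audio
    MUTE = "mute"  # keeps video, silences or bleeps audio


@dataclass
class Edit:
    type: EditType
    action: EditAction
    start_ms: int
    end_ms: int
    enabled: bool = True  # hypothetical flag for the on/off toggle


def active_cuts(edits: list[Edit]) -> list[tuple[int, int]]:
    """Ranges that would actually be removed from the output."""
    return [
        (e.start_ms, e.end_ms)
        for e in edits
        if e.enabled and e.action is EditAction.CUT
    ]
```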

Stage 5: Render

When the user triggers a render, we snapshot the current edit state:

sequenceDiagram
    participant Client
    participant API
    participant DB
    participant Worker
    participant R2

    Client->>API: POST /exports {name, settings}
    API->>DB: Create Export with edit_snapshot
    API->>API: Queue render_export
    API->>Client: {export_uuid, status: pending}

    Worker->>DB: Load Export + edit_snapshot
    Worker->>R2: Download clips
    Worker->>Worker: Apply cuts (FFmpeg)
    Worker->>Worker: Audio processing (optional)
    Worker->>R2: Upload rendered video
    Worker->>DB: Update Export status
    Worker-->>Client: ExportCompleteEvent (SSE)

The snapshot means users can keep editing while a render is in progress - the render uses the frozen state.
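
The effect of the snapshot can be shown with plain dictionaries. This is a sketch of the idea only: `create_export` is a hypothetical function and the real Export record is a database row, not a dict.

```python
import copy


def create_export(name: str, current_edits: list[dict]) -> dict:
    """Freeze the edit state at request time with a deep copy."""
    return {
        "name": name,
        "status": "pending",
        "edit_snapshot": copy.deepcopy(current_edits),
    }


edits = [{"id": 1, "start_ms": 0, "end_ms": 500, "enabled": True}]
export = create_export("Final Cut v1", edits)

# The user keeps editing while the render is queued.
edits[0]["enabled"] = False
```

The export still sees the edit as enabled, because the worker reads the snapshot rather than the live rows.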

Audio Processing (when "Audio Clean" enabled):

1. Noise Reduction - FFT-based removal of background noise (AC, fans, room tone)
2. LUFS Normalization - Adjusts loudness to -14 LUFS (YouTube/Spotify standard)
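
The arithmetic behind the normalization step is simple enough to show. This sketch assumes a single static gain computed from a measured integrated loudness; production normalizers typically also account for true peak and loudness range, and the function names are hypothetical.

```python
TARGET_LUFS = -14.0  # target from the text


def normalization_gain_db(measured_lufs: float,
                          target_lufs: float = TARGET_LUFS) -> float:
    """Gain in dB needed to move the measured loudness to the target."""
    return target_lufs - measured_lufs


def db_to_linear(gain_db: float) -> float:
    """Convert a dB gain to a linear amplitude multiplier."""
    return 10 ** (gain_db / 20)
```

A recording measured at -20 LUFS needs +6 dB, which is roughly a 2x amplitude multiplier.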

Stage 6: Download

Completed exports live in R2:

sequenceDiagram
    participant Client
    participant API
    participant R2

    Client->>API: GET /exports/{id}/download
    API->>R2: Generate presigned GET URL
    API->>Client: {url, expires_in: 3600}
    Client->>R2: GET (direct download)

Asset Editing

Assets (user-uploaded videos/images) can be trimmed or have audio extracted. This uses a fire-and-forget pattern: the API returns immediately while processing happens in the background. Trim bounds (start_ms, end_ms, cuts[i].end_ms) are validated against AssetFile.duration_ms at the API layer before the worker is enqueued; out-of-range values return 400. If the asset is still processing (duration_ms IS NULL), the bounds check is skipped and the worker silently clamps the values at render time.
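
The bounds check described above can be sketched as follows. `validate_trim_bounds` and `BoundsError` are hypothetical names, only the `cuts` list is checked here, and requiring `start < end` is an added assumption.

```python
from typing import Optional


class BoundsError(ValueError):
    """Would map to an HTTP 400 at the API layer."""


def validate_trim_bounds(cuts: list[dict], duration_ms: Optional[int]) -> bool:
    """Return True if bounds were checked, False if the check was skipped."""
    if duration_ms is None:
        # Asset still processing: skip here, the worker clamps later.
        return False
    for cut in cuts:
        start, end = cut["start_ms"], cut["end_ms"]
        if not (0 <= start < end <= duration_ms):
            raise BoundsError(f"cut {start}-{end} outside 0-{duration_ms}")
    return True
```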

sequenceDiagram
    participant Client
    participant API
    participant DB
    participant Worker
    participant R2

    Client->>API: POST /assets/{uuid}/edit
    Note right of Client: {cuts: [...], extract_audio, save_mode}
    API->>DB: Create pending Asset copy
    API->>API: Queue edit_asset task
    API->>Client: {new_asset_uuid}
    Note right of Client: Returns immediately

    Client->>Client: Poll asset list (refetchInterval)

    Worker->>R2: Download source asset
    Worker->>Worker: FFmpeg multi-cut processing
    Worker->>R2: Upload result
    Worker->>DB: Update Asset status → uploaded

    Client->>API: GET /assets (polling)
    API->>Client: Asset now has status: uploaded

Key features:

- Multi-cut support: Multiple regions can be removed in a single operation using FFmpeg filter_complex
- Fire-and-forget: New asset created with status: pending, updated to uploaded when done
- Save modes: copy creates a new asset, replace overwrites the original
- Polling: Frontend polls while any asset has status: pending (edit / YouTube download in flight) or status: processing (upload-confirm worker probing ffprobe / waveform / thumbnail). SSE events (asset_ready / asset_failed) invalidate the same cache; polling is the fallback for SSE drops.
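
One way to express multi-cut removal with filter_complex is to invert the cuts into kept ranges, trim each range, and concatenate them. The sketch below assumes a single input with one video and one audio stream; the function names are hypothetical and the worker's actual filter graph may differ.

```python
def keep_ranges(cuts: list[tuple[int, int]],
                duration_ms: int) -> list[tuple[int, int]]:
    """Invert cut ranges into kept ranges (cuts may be unsorted or overlap)."""
    kept, cursor = [], 0
    for start, end in sorted(cuts):
        if start > cursor:
            kept.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < duration_ms:
        kept.append((cursor, duration_ms))
    return kept


def build_filter_complex(ranges: list[tuple[int, int]]) -> str:
    """trim/atrim each kept range, reset timestamps, then concat in order."""
    chains, labels = [], ""
    for i, (start, end) in enumerate(ranges):
        s, e = start / 1000, end / 1000
        chains.append(f"[0:v]trim=start={s}:end={e},setpts=PTS-STARTPTS[v{i}]")
        chains.append(f"[0:a]atrim=start={s}:end={e},asetpts=PTS-STARTPTS[a{i}]")
        labels += f"[v{i}][a{i}]"
    chains.append(f"{labels}concat=n={len(ranges)}:v=1:a=1[outv][outa]")
    return ";".join(chains)
```

Callers would need to handle the case where nothing is kept, since a concat of zero segments is not a valid graph.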

Event Flow

Events tie everything together:

flowchart LR
    subgraph Workers
        W1[Download]
        W2[Analysis]
        W3[Render]
        W4[Asset Edit]
    end

    subgraph Redis
        PS[(Pub/Sub)]
    end

    subgraph API
        SSE[SSE Endpoint]
    end

    subgraph Frontend
        ES[EventSource]
        RQ[React Query]
    end

    W1 -->|Publish| PS
    W2 -->|Publish| PS
    W3 -->|Publish| PS
    PS -->|Subscribe| SSE
    SSE -->|Stream| ES
    ES -->|Invalidate| RQ

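Note: Asset Edit is not wired into the project channel because assets are user-scoped (not project-scoped) and updates are infrequent. The frontend relies on polling for asset status, with the asset_ready / asset_failed SSE events invalidating the same cache when they arrive.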

Each project has its own channel: project:{uuid}:events. The frontend subscribes when you open a project and invalidates React Query caches when events arrive.
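
A small sketch of the two string formats involved: the channel name from the text and a Server-Sent Events frame. The event name `clip_ready` and the use of a named `event:` field are assumptions for illustration; only the channel pattern comes from this page.

```python
import json


def project_channel(project_uuid: str) -> str:
    """Redis pub/sub channel for one project's events."""
    return f"project:{project_uuid}:events"


def sse_frame(event_type: str, payload: dict) -> str:
    """Format one Server-Sent Events message (terminated by a blank line)."""
    return f"event: {event_type}\ndata: {json.dumps(payload)}\n\n"


frame = sse_frame("clip_ready", {"clip_uuid": "abc"})
```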

Storage Layout

All files live in R2 with a predictable structure:

flowchart TB
    subgraph Clips["sapari-raw bucket"]
        C1["clips/{prefix}/{uuid}/"]
        C1 --> C2[original.mp4]
        C1 --> C3[audio.mp3]
        C1 --> C4[proxy.mp4]
        C1 --> C5[waveform.json]
    end

    subgraph Exports["sapari-exports bucket"]
        E1["exports/{project}/{export}/"]
        E1 --> E2[Final Cut v1.mp4]
    end

    subgraph Assets["sapari-assets bucket"]
        A1["assets/{prefix}/{uuid}/"]
        A1 --> A2[video.mp4]
        A1 --> A3[thumbnail.jpg]
    end

The {prefix} is the first 2 characters of the UUID. This helps S3/R2 distribute files across partitions for better performance.
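
Key construction under this layout can be sketched as below. The helper names are hypothetical; the path shapes follow the diagram.

```python
def clip_key(clip_uuid: str, filename: str) -> str:
    """Object key in the clips bucket, e.g. clips/3f/<uuid>/original.mp4."""
    prefix = clip_uuid[:2]  # first 2 characters of the UUID
    return f"clips/{prefix}/{clip_uuid}/{filename}"


def asset_key(asset_uuid: str, filename: str) -> str:
    """Object key in the assets bucket, using the same prefix scheme."""
    return f"assets/{asset_uuid[:2]}/{asset_uuid}/{filename}"
```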

Debugging Tips

When something goes wrong, trace through the stages:

flowchart TB
    A[Issue Reported] --> B{Which stage?}
    B -->|Upload| C[Check ClipFile status]
    B -->|Process| D[Check worker logs]
    B -->|Analysis| E[Check Whisper API / LLM logs]
    B -->|Render| F[Check Export.error_message]
    B -->|Events| G[Check Redis pub/sub]

    C --> H[PENDING = upload incomplete]
    C --> I[PROCESSING = worker running]
    C --> J[FAILED = check error_message]
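
The upload branch of the flowchart amounts to a lookup table. A minimal sketch, with `triage_clip_status` as a hypothetical helper for a debugging script:

```python
STATUS_HINTS = {
    "PENDING": "upload incomplete",
    "PROCESSING": "worker running",
    "FAILED": "check error_message",
}


def triage_clip_status(status: str) -> str:
    """Map a ClipFile status to the first thing to check."""
    return STATUS_HINTS.get(status.upper(), "no hint listed for this status")
```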
