Render Pipeline

The render pipeline takes a project with edits and produces a final video. It downloads the source clips, applies cuts with FFmpeg, and uploads the result.

Pipeline Structure

flowchart LR
    A[Download Clips] --> B[Apply Cuts]
    B --> C[Upload Export]
    C --> D[Update Export]

    style A fill:#ff3300,color:#fff
    style B fill:#ff3300,color:#fff

Each step is sequential - we need the clips before we can cut them, and we need the cut video before we can upload it.

Triggering a Render

sequenceDiagram
    participant Client
    participant API
    participant DB
    participant Redis
    participant Worker
    participant R2

    Client->>API: POST /projects/{uuid}/exports
    API->>DB: Create Export with edit_snapshot
    API->>Redis: Queue render_export
    API->>Client: {export_uuid, status: pending}

    Worker->>Redis: Pick up task
    Worker-->>Client: ExportStartedEvent (SSE)
    Worker->>DB: Load Export + edit_snapshot
    Worker-->>Client: ExportProgressEvent 5%
    Worker->>R2: Download clips
    Worker-->>Client: ExportProgressEvent 30%
    Worker->>Worker: Apply cuts (FFmpeg)
    Worker-->>Client: ExportProgressEvent 85%
    Worker->>R2: Upload rendered video
    Worker->>DB: Update Export status
    Worker-->>Client: ExportCompleteEvent (SSE)

POST /api/v1/projects/{project_uuid}/exports
{
    "name": "Final Cut v1"
}

This creates an Export record with a snapshot of the current edits, then queues a render_export task.
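As a rough client-side illustration, the request above can be built with the standard library alone. This is a minimal sketch: the base URL and project UUID are placeholders, the `build_export_request` helper is hypothetical, and authentication headers are omitted because the source does not describe them.

```python
import json
import urllib.request


def build_export_request(base_url: str, project_uuid: str, name: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that triggers a render."""
    url = f"{base_url}/api/v1/projects/{project_uuid}/exports"
    body = json.dumps({"name": name}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Placeholder host and UUID, for illustration only.
request = build_export_request("https://example.invalid", "1234", "Final Cut v1")
```

Passing `request` to `urllib.request.urlopen` would send it; per the sequence diagram, the response carries the `export_uuid` and a `pending` status.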

Edit Snapshots

When you trigger a render, we freeze the current active edits into edit_snapshot:

flowchart TB
    subgraph Project["Project State"]
        E1[Edit 1: active]
        E2[Edit 2: inactive]
        E3[Edit 3: active]
    end

    subgraph Trigger["Render Triggered"]
        S[Snapshot active edits]
    end

    subgraph Export["Export Record"]
        ES["edit_snapshot: [Edit 1, Edit 3]"]
    end

    E1 --> S
    E3 --> S
    S --> ES

    subgraph Later["User keeps editing"]
        E1 -->|Toggle off| E1X[Edit 1: inactive]
        E2 -->|Toggle on| E2X[Edit 2: active]
    end

    ES -.->|Unaffected| R[Render uses frozen snapshot]

{
    "edits": [
        {"start_ms": 1000, "end_ms": 2500, "type": "SILENCE", "action": "cut"},
        {"start_ms": 5000, "end_ms": 6200, "type": "FALSE_START", "action": "cut"},
        {"start_ms": 8000, "end_ms": 8500, "type": "PROFANITY", "action": "mute"}
    ],
    "settings": {
        "resolution": "1080p",
        "audio_censorship": "bleep"
    }
}

This means you can toggle edits after triggering a render without affecting the in-progress export. You can also re-render with different edits by triggering another export.
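The freezing behaviour can be sketched as follows. The `Edit` dataclass, its `active` flag, and the `freeze_snapshot` helper are assumptions made for illustration; the real models may differ. The point is that the snapshot holds copies, so later toggles do not reach it.

```python
import copy
from dataclasses import asdict, dataclass


@dataclass
class Edit:
    start_ms: int
    end_ms: int
    type: str
    action: str
    active: bool = True  # hypothetical flag for the toggle state


def freeze_snapshot(edits: list[Edit], settings: dict) -> dict:
    """Copy the currently active edits into a plain dict, detached from the project."""
    frozen = [
        {key: value for key, value in asdict(edit).items() if key != "active"}
        for edit in edits
        if edit.active
    ]
    return {"edits": frozen, "settings": copy.deepcopy(settings)}


edits = [
    Edit(1000, 2500, "SILENCE", "cut"),
    Edit(5000, 6200, "FALSE_START", "cut", active=False),
    Edit(8000, 8500, "PROFANITY", "mute"),
]
snapshot = freeze_snapshot(edits, {"resolution": "1080p", "audio_censorship": "bleep"})

# The user keeps editing; the snapshot is unaffected.
edits[0].active = False
edits[1].active = True
```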

Steps

DownloadClipsStep

Downloads all clips for the project and concatenates them if there are multiple.

  1. Query clips in display order
  2. Generate presigned download URLs (1 hour expiry)
  3. Download via httpx streaming (faster than FFmpeg's HTTP input)
  4. If multiple clips, concatenate with FFmpeg

The output is a single video file ready for cutting.
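The source does not say how the concatenation is performed. One common approach is FFmpeg's concat demuxer, sketched below under that assumption. The `build_concat_command` helper and the file names are hypothetical, and stream copy (`-c copy`) only works when all clips share codec parameters, so the real step may re-encode instead.

```python
import tempfile
from pathlib import Path


def build_concat_command(clips: list[Path], list_file: Path, output: Path) -> list[str]:
    """Write a concat-demuxer list file and return the FFmpeg argument list."""
    lines = []
    for clip in clips:
        # The concat list format requires single quotes in paths to be escaped.
        escaped = clip.as_posix().replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    list_file.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0",
        "-i", str(list_file),
        "-c", "copy",
        str(output),
    ]


with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    clips = [workdir / "clip_0.mp4", workdir / "clip_1.mp4"]  # already in display order
    list_file = workdir / "concat.txt"
    command = build_concat_command(clips, list_file, workdir / "source.mp4")
    listing = list_file.read_text(encoding="utf-8")
```

The returned list can be passed to `subprocess.run` once the clips have actually been downloaded.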

ApplyCutsStep

Applies the edit snapshot using FFmpeg's select filter. Edits are separated by action:

  • CUT edits: Remove video and audio segments
  • MUTE edits: Keep video, silence or bleep audio

flowchart TB
    A[Read edit_snapshot] --> B[Separate CUT vs MUTE edits]
    B --> C[Calculate segments to keep from CUTs]
    C --> D[Remap MUTE timestamps for cuts]
    D --> E{Audio censorship mode?}
    E -->|none| F[No audio processing]
    E -->|mute| G[Apply volume=0 filter]
    E -->|bleep| H[Mix with 1kHz sine tone]
    F --> I[FFmpeg render]
    G --> I
    H --> I
    I --> J[Output video]

    style G fill:#f59e0b,color:#fff
    style H fill:#ef4444,color:#fff

  1. Read edit_snapshot from the export record
  2. Separate edits by action field (CUT vs MUTE)
  3. Calculate segments to keep (inverse of CUT edits only)
  4. Remap MUTE edit timestamps to account for removed content
  5. Build FFmpeg filter graph based on audio_censorship setting
  6. Run FFmpeg
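Steps 3 and 4 above are interval arithmetic, and a small sketch may make them concrete. The function names are hypothetical and the sketch assumes millisecond ranges as in the snapshot; it is not the pipeline's actual code.

```python
def segments_to_keep(cuts: list[tuple[int, int]], duration_ms: int) -> list[tuple[int, int]]:
    """Return the source ranges that survive, i.e. the inverse of the CUT edits."""
    keep = []
    cursor = 0
    for start, end in sorted(cuts):
        start = max(0, min(start, duration_ms))
        end = max(0, min(end, duration_ms))
        if start > cursor:
            keep.append((cursor, start))
        cursor = max(cursor, end)  # handles overlapping cuts
    if cursor < duration_ms:
        keep.append((cursor, duration_ms))
    return keep


def remap_range(start_ms: int, end_ms: int, keep: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Map a source-time range (e.g. a MUTE edit) onto the output timeline."""
    pieces: list[tuple[int, int]] = []
    offset = 0  # output time at which the current kept segment begins
    for seg_start, seg_end in keep:
        lo, hi = max(start_ms, seg_start), min(end_ms, seg_end)
        if lo < hi:
            new_start = offset + lo - seg_start
            new_end = offset + hi - seg_start
            if pieces and pieces[-1][1] == new_start:
                pieces[-1] = (pieces[-1][0], new_end)  # merge across a removed cut
            else:
                pieces.append((new_start, new_end))
        offset += seg_end - seg_start
    return pieces


# The two CUT edits and one MUTE edit from the example snapshot, on a 10 s source.
keep = segments_to_keep([(1000, 2500), (5000, 6200)], duration_ms=10_000)
muted = remap_range(8000, 8500, keep)
```

With the example snapshot, 2.7 seconds are removed before the MUTE edit, so its 8000-8500 ms range lands at 5300-5800 ms in the output.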

Mute Mode - Silences audio during profanity:

ffmpeg -i input.mp4 \
  -vf "select='...'" \
  -af "aselect='...',volume=enable='between(t,5,7)':volume=0" \
  output.mp4

Bleep Mode - Overlays 1kHz tone during profanity using filter_complex:

ffmpeg -i input.mp4 -filter_complex "
  [0:v]select='...',setpts=...[vout];
  [0:a]aselect='...',volume=enable='between(t,5,7)':volume=0[main];
  sine=frequency=1000:duration=15.5,aformat=channel_layouts=stereo,
    volume='if(between(t,5,7),0.25,0)':eval=frame[bleep];
  [main][bleep]amix=inputs=2:normalize=0[aout]
" -map "[vout]" -map "[aout]" output.mp4

The bleep tone is a standard 1kHz sine wave at 25% volume, mixed only during mute regions.
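The commands above elide the select and setpts expressions. The sketch below shows one plausible way to generate them from millisecond ranges. The helper names are hypothetical, and the `setpts=N/FRAME_RATE/TB` and `asetpts=N/SR/TB` expressions are the common idiom from the FFmpeg filter documentation, assumed here rather than taken from the pipeline.

```python
def between_expr(ranges_ms: list[tuple[int, int]]) -> str:
    """Join ranges into an FFmpeg expression that is non-zero inside any range."""
    return "+".join(
        f"between(t,{start / 1000:.3f},{end / 1000:.3f})" for start, end in ranges_ms
    )


def build_cut_filters(keep_ms: list[tuple[int, int]]) -> tuple[str, str]:
    """Return (video, audio) filters that keep only the given source ranges."""
    expr = between_expr(keep_ms)
    # Assumed timestamp regeneration; the source elides the real expressions.
    return (
        f"select='{expr}',setpts=N/FRAME_RATE/TB",
        f"aselect='{expr}',asetpts=N/SR/TB",
    )


def build_mute_filter(mute_ms: list[tuple[int, int]]) -> str:
    """Silence the main audio inside the (already remapped) mute ranges."""
    return f"volume=enable='{between_expr(mute_ms)}':volume=0"


def build_bleep_source(mute_ms: list[tuple[int, int]], duration_s: float) -> str:
    """1kHz sine at 25% volume, audible only inside the mute ranges."""
    expr = between_expr(mute_ms)
    return (
        f"sine=frequency=1000:duration={duration_s},"
        "aformat=channel_layouts=stereo,"
        f"volume='if({expr},0.25,0)':eval=frame"
    )


video_filter, audio_filter = build_cut_filters([(0, 1000), (2500, 5000)])
mute_filter = build_mute_filter([(5300, 5800)])
bleep_source = build_bleep_source([(5300, 5800)], duration_s=15.5)
```

The single quotes protect the commas inside `between(...)` from the filtergraph parser, so these strings should be passed to FFmpeg as a single argument without a shell.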

Audio Processing (Optional)

When the user enables "Audio Clean", two processing filters are applied:

flowchart LR
    A[Source Audio] --> B[Noise Reduction]
    B --> C[LUFS Normalization]
    C --> D[Output Audio]

    style B fill:#3b82f6,color:#fff
    style C fill:#10b981,color:#fff

1. Noise Reduction (afftdn)

FFT-based spectral analysis removes constant background noise:

  • Fan/AC hum
  • Room tone
  • Computer noise

afftdn=nf=-25

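The noise floor of -25 dB provides moderate reduction without affecting voice quality. The nf option accepts values from -80 to -20 dB: values nearer -20 treat more of the signal as noise and are more aggressive, which may introduce artifacts, while lower values (e.g., -40) are gentler.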

2. LUFS Loudness Normalization (loudnorm)

Adjusts overall loudness to broadcast standards using EBU R128:

loudnorm=I=-14:TP=-1.5:LRA=11

| Parameter | Value | Purpose |
| --- | --- | --- |
| I (Integrated) | -14 LUFS | Target loudness (YouTube/Spotify standard) |
| TP (True Peak) | -1.5 dBTP | Prevents clipping on lossy codecs |
| LRA (Loudness Range) | 11 LU | Preserves natural dynamics |

Why -14 LUFS?

  • YouTube normalizes to -14 LUFS (quieter content gets boosted, louder gets reduced)
  • Spotify uses -14 LUFS for podcasts
  • Matches typical professional podcast/video loudness

Processing Order

Noise reduction runs BEFORE normalization. This prevents the normalizer from amplifying background noise when boosting quiet audio.

# Combined filter chain
-af "afftdn=nf=-25,loudnorm=I=-14:TP=-1.5:LRA=11"

Asset Overlay Compositing

When asset edits with visual_mode=overlay are present, the render pipeline composites them onto the main video using FFmpeg's overlay filter. Each overlay goes through:

loop (images only) → trim → setpts (PTS shift) → scale → hflip → vflip → rotate → colorchannelmixer (opacity) → overlay

| Filter | When Applied | Purpose |
| --- | --- | --- |
| loop=-1:size=1:start=0 | Image inputs (no asset_duration_ms) | Loop the single frame so the overlay stream fills the window; without this the overlay shows for one frame then the filter holds the last frame, producing a static image regardless of content |
| trim=duration=W | Always | Cap the overlay stream to the enable window duration |
| setpts=PTS-STARTPTS+S/TB | Always | Shift overlay's PTS so its frame 0 arrives at main t=start_ms. Without this, a video overlay starts playing at main t=0, exhausts before the enable window opens, and freezes on the last frame |
| scale | Always | Size overlay to overlay_size_percent of video |
| hflip | overlay_flip_h=true | Mirror horizontally |
| vflip | overlay_flip_v=true | Mirror vertically |
| rotate | overlay_rotation_deg > 0 | Rotate by N degrees (converted to radians) |
| colorchannelmixer | overlay_opacity_percent < 100 | Apply transparency |
| overlay | Always | Position on main video with time-based enable |

Multiple overlays are chained sequentially, each composited onto the result of the previous.
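A minimal sketch of how such a chain might be assembled as a filter_complex string follows. The `Overlay` dataclass, its field names, and the stream labels are hypothetical. The `format=rgba` step is an assumption added so that `colorchannelmixer=aa=...` has an alpha channel to act on; it is not part of the documented chain.

```python
import math
from dataclasses import dataclass


@dataclass
class Overlay:
    input_index: int            # FFmpeg input number of the asset (main video is 0)
    start_ms: int
    end_ms: int
    size_percent: float         # overlay width as a percentage of the main width
    x: int = 0
    y: int = 0
    is_image: bool = False      # True when asset_duration_ms is absent
    flip_h: bool = False
    flip_v: bool = False
    rotation_deg: float = 0.0
    opacity_percent: float = 100.0


def overlay_filters(ov: Overlay, main_width: int) -> list[str]:
    """Per-overlay filters, in the order the table above lists them."""
    steps = []
    if ov.is_image:
        steps.append("loop=-1:size=1:start=0")
    steps.append(f"trim=duration={(ov.end_ms - ov.start_ms) / 1000:.3f}")
    steps.append(f"setpts=PTS-STARTPTS+{ov.start_ms / 1000:.3f}/TB")
    steps.append(f"scale={round(main_width * ov.size_percent / 100)}:-1")
    if ov.flip_h:
        steps.append("hflip")
    if ov.flip_v:
        steps.append("vflip")
    if ov.rotation_deg > 0:
        steps.append(f"rotate={math.radians(ov.rotation_deg):.4f}")
    if ov.opacity_percent < 100:
        steps.append("format=rgba")  # assumption: ensure an alpha channel exists
        steps.append(f"colorchannelmixer=aa={ov.opacity_percent / 100:.2f}")
    return steps


def build_overlay_graph(overlays: list[Overlay], main_width: int) -> str:
    """Chain overlays so each is composited onto the previous result."""
    parts = []
    base = "0:v"
    for i, ov in enumerate(overlays, start=1):
        window = f"between(t,{ov.start_ms / 1000:.3f},{ov.end_ms / 1000:.3f})"
        parts.append(f"[{ov.input_index}:v]{','.join(overlay_filters(ov, main_width))}[ov{i}]")
        parts.append(f"[{base}][ov{i}]overlay={ov.x}:{ov.y}:enable='{window}'[v{i}]")
        base = f"v{i}"
    return ";".join(parts)


logo = Overlay(1, 3000, 5000, size_percent=20, is_image=True, opacity_percent=50)
clip = Overlay(2, 6000, 9000, size_percent=40, x=100, y=50, flip_h=True)
graph = build_overlay_graph([logo, clip], main_width=1920)
```

The final label (`[v2]` here) is what a caller would pass to `-map`.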

Asset B-roll Compositing (REPLACE + INSERT)

When asset edits with visual_mode=replace or insert are present, the render uses a two-pass pipeline. Pass 1 applies CUT edits to produce a trimmed main video; Pass 2 splits that trimmed video into main/asset segments and concatenates them via FFmpeg's concat filter.

concat is strict about input uniformity — it rejects the whole filter graph with Failed to configure output pad on Parsed_concat_N if any two inputs differ on:

  • Video dimensions or sample aspect ratio. Every segment is scaled and padded to the exact output_width × output_height computed from the main's aspect, then setsar=1 forces square pixels. The variable-width scale=-2:'min(H,ih)' fallback is only used when output dimensions aren't known; within the B-roll path they always are.
  • Audio sample rate, channel layout, or sample format. Every audio sub-chain (real [0:a]atrim, real [N:a]atrim, and synthesized anullsrc) ends with aresample=48000,aformat=sample_fmts=fltp:sample_rates=48000:channel_layouts=stereo so heterogeneous sources (e.g. a 44.1kHz mono screen recording + a 48kHz stereo broll) concatenate cleanly.

Image inputs (PNG/JPEG assets in REPLACE or INSERT) have a single frame at PTS=0. Without intervention, concat emits one frame for a multi-second window and the user sees the image "blink". Image segments are detected by the absence of asset_duration_ms, and the pipeline prepends loop=-1:size=1:start=0,trim=duration=D to their filter chain so the single frame fills the window.

Silent inputs — screen recordings without microphone, muted brolls, PNGs — are probed via has_audio_stream() (ffprobe) before the command is built. The main's probe feeds main_has_audio; asset probes feed inputs_without_audio: frozenset[int]. Any segment whose underlying input has no audio stream synthesizes anullsrc instead of referencing [N:a] — FFmpeg would otherwise reject the filter graph with Stream specifier ':a' ... matches no streams.

REPLACE audio semantics. REPLACE swaps video visually but keeps the main's audio continuous. VideoReplaceSegment.audio_mode defaults to "original_only" for REPLACE segments, pulling audio from [0:a]atrim={output_start}:{output_end} rather than from the asset. The other audio modes (asset_only, mix, none) still fall through to the pre-existing asset-audio or silence branches; original_only is the only new branch. INSERT keeps its legacy default of "mix" because INSERT adds a new segment of time and asset audio is the natural choice there.
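The audio rules above (a uniform tail on every sub-chain, and anullsrc for inputs without audio) can be sketched as a single helper. The function name, the labels, and the `asetpts=PTS-STARTPTS` reset after atrim are assumptions; the tail string and the `inputs_without_audio` set come from the description above. Treating the main input (index 0) through the same set is a simplification of the separate main_has_audio flag.

```python
# Tail that every audio sub-chain ends with, as described above.
AUDIO_TAIL = (
    "aresample=48000,"
    "aformat=sample_fmts=fltp:sample_rates=48000:channel_layouts=stereo"
)


def segment_audio_chain(
    input_index: int,
    start_s: float,
    end_s: float,
    out_label: str,
    inputs_without_audio: frozenset[int],
) -> str:
    """Build one audio sub-chain for a concat segment."""
    if input_index in inputs_without_audio:
        # No audio stream to reference, so synthesize silence of the right length.
        duration = end_s - start_s
        source = f"anullsrc=channel_layout=stereo:sample_rate=48000,atrim=duration={duration:.3f}"
    else:
        # Assumed timestamp reset so the trimmed audio starts at zero.
        source = f"[{input_index}:a]atrim={start_s:.3f}:{end_s:.3f},asetpts=PTS-STARTPTS"
    return f"{source},{AUDIO_TAIL}[{out_label}]"


silent_inputs = frozenset({2})  # e.g. input 2 is a PNG or a muted B-roll clip
main_audio = segment_audio_chain(0, 2.0, 4.5, "a0", silent_inputs)
asset_audio = segment_audio_chain(2, 0.0, 2.5, "a1", silent_inputs)
```

Because both results end in the same tail, the concat filter sees identical sample rate, channel layout, and sample format on every audio input.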

Main Video Transforms

The main video can be flipped horizontally and/or vertically. These transforms are applied in the FFmpeg filter chain after scaling/padding but before subtitle burn-in:

select → setpts → scale/pad → hflip/vflip → subtitles

| Filter | When Applied | Purpose |
| --- | --- | --- |
| hflip | video_flip_h=true | Mirror video horizontally |
| vflip | video_flip_v=true | Mirror video vertically |

Both transforms also apply in the live preview via CSS scale(-1, 1) / scale(1, -1) on the <video> element only (overlays and controls remain unaffected).

Main Volume Control

The main_volume_percent setting (0-100%) adjusts the main video's audio level before mute/normalization processing:

# Volume at 50%
-af "volume=0.50"

Processing order: volume → mute censorship → noise reduction → LUFS normalization.
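The processing order can be expressed as a small builder that appends filters in that sequence. This is a sketch only: the function and parameter names are hypothetical, and it covers the mute censorship mode rather than bleep, which needs filter_complex.

```python
def build_audio_chain(
    main_volume_percent: int = 100,
    mute_ranges_s: tuple[tuple[float, float], ...] = (),
    audio_clean: bool = False,
) -> str:
    """Assemble the -af value: volume, then mute, then noise reduction, then loudnorm."""
    filters = []
    if main_volume_percent != 100:
        filters.append(f"volume={main_volume_percent / 100:.2f}")
    if mute_ranges_s:
        expr = "+".join(f"between(t,{start},{end})" for start, end in mute_ranges_s)
        filters.append(f"volume=enable='{expr}':volume=0")
    if audio_clean:
        # Noise reduction before normalization, so boosted audio carries less noise.
        filters.append("afftdn=nf=-25")
        filters.append("loudnorm=I=-14:TP=-1.5:LRA=11")
    return ",".join(filters)


chain = build_audio_chain(50, ((5, 7),), audio_clean=True)
```

An empty return value means no audio filtering is needed, in which case the caller would omit `-af` entirely.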

UploadExportStep

Uploads the rendered video to R2:

  • Bucket: STORAGE_BUCKET_EXPORTS
  • Key: exports/{project_uuid}/{export_uuid}/{filename}.mp4

UpdateExportStep

Finalizes the export record:

  • Sets status to COMPLETED
  • Stores storage_key
  • Publishes ExportCompleteEvent

Timeouts

Long videos need time to process:

| Step | Timeout |
| --- | --- |
| Download clips | 15 minutes |
| Apply cuts | 30 minutes |
| Upload export | 10 minutes |
| Total pipeline | 1 hour |

If a step exceeds its timeout, the export fails and we set status to FAILED with an error message.
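One way to enforce per-step limits like these is `asyncio.wait_for`. The sketch below assumes the steps are coroutines, which the source does not confirm, and uses hypothetical step keys and a hypothetical `run_step` helper.

```python
import asyncio

# Limits from the table above, in seconds. The dictionary keys are illustrative.
STEP_TIMEOUTS_S = {
    "download_clips": 15 * 60,
    "apply_cuts": 30 * 60,
    "upload_export": 10 * 60,
}
PIPELINE_TIMEOUT_S = 60 * 60


async def run_step(name, step, timeouts=STEP_TIMEOUTS_S):
    """Await one step, converting an overrun into a descriptive TimeoutError."""
    try:
        return await asyncio.wait_for(step(), timeout=timeouts[name])
    except asyncio.TimeoutError:
        raise TimeoutError(f"{name} exceeded its {timeouts[name]}s timeout") from None


async def slow_step():
    await asyncio.sleep(1)


# Demonstration with a deliberately tiny limit so the overrun happens quickly.
try:
    asyncio.run(run_step("apply_cuts", slow_step, {"apply_cuts": 0.01}))
    timeout_message = None
except TimeoutError as exc:
    timeout_message = str(exc)
```

The total pipeline limit could be applied the same way, by wrapping the whole sequence of `run_step` calls in another `asyncio.wait_for` with `PIPELINE_TIMEOUT_S`. The raised message is the kind of text that would end up in the export's error message.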

Error Handling

flowchart TB
    A[Step Executing] --> B{Success?}
    B -->|Yes| C[Next Step]
    B -->|No| D[Catch Exception]
    D --> E[Set status = FAILED]
    E --> F[Store error_message]
    F --> G[Publish ExportFailedEvent]
    G --> H[Cleanup temp files]
    H --> I[Frontend shows error]
    I --> J{User action}
    J -->|Retry| A

If any step fails:

  1. The pipeline catches the exception
  2. Sets export.status = FAILED
  3. Stores the error message in export.error_message
  4. Publishes ExportFailedEvent
  5. Cleans up temporary files

The frontend shows the error to the user so they can retry.
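The failure path can be sketched as a try/except around the step loop. The `Export` dataclass, the `publish` callback, and the step signature are simplified stand-ins for illustration, not the pipeline's real interfaces.

```python
import shutil
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional


@dataclass
class Export:
    status: str = "PENDING"
    error_message: Optional[str] = None


def run_pipeline(
    export: Export,
    steps: list[Callable[[Export], None]],
    publish: Callable[[str], None],
    workdir: Path,
) -> bool:
    """Run steps in order; on any exception follow the five failure steps above."""
    try:
        for step in steps:
            step(export)
    except Exception as exc:  # 1. catch the exception
        export.status = "FAILED"  # 2. mark the export as failed
        export.error_message = str(exc)  # 3. store the error message
        publish("ExportFailedEvent")  # 4. notify subscribers
        shutil.rmtree(workdir, ignore_errors=True)  # 5. clean up temporary files
        return False
    return True


def failing_step(export: Export) -> None:
    raise RuntimeError("ffmpeg exited with code 1")


events: list[str] = []
workdir = Path(tempfile.mkdtemp())
export = Export()
succeeded = run_pipeline(export, [failing_step], events.append, workdir)
```

Cleanup after a successful run is left out here because the text above only describes the failure case.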

Downloading Exports

Once complete, users can download via:

GET /api/v1/exports/{export_uuid}/download

This returns a presigned download URL valid for 1 hour.

Key Files

| Component | Location |
| --- | --- |
| Pipeline definition | backend/src/workers/render/pipeline.py |
| Task entry | backend/src/workers/render/tasks.py |
| Pipeline steps | backend/src/workers/render/steps.py |
| FFmpeg utilities | backend/src/workers/render/ffmpeg/ |
