Render Pipeline¶

The render pipeline takes a project with edits and produces a final video. It downloads the source clips, applies cuts with FFmpeg, and uploads the result.

Pipeline Structure¶

flowchart LR
    A[Download Clips] --> B[Apply Cuts]
    B --> C[Upload Export]
    C --> D[Update Export]

    style A fill:#ff3300,color:#fff
    style B fill:#ff3300,color:#fff

Each step is sequential: the clips must be downloaded before they can be cut, and the cut video must exist before it can be uploaded.

Triggering a Render¶

sequenceDiagram
    participant Client
    participant API
    participant DB
    participant Redis
    participant Worker
    participant R2

    Client->>API: POST /projects/{uuid}/exports
    API->>DB: Create Export with edit_snapshot
    API->>Redis: Queue render_export
    API->>Client: {export_uuid, status: pending}

    Worker->>Redis: Pick up task
    Worker-->>Client: ExportStartedEvent (SSE)
    Worker->>DB: Load Export + edit_snapshot
    Worker-->>Client: ExportProgressEvent 5%
    Worker->>R2: Download clips
    Worker-->>Client: ExportProgressEvent 30%
    Worker->>Worker: Apply cuts (FFmpeg)
    Worker-->>Client: ExportProgressEvent 85%
    Worker->>R2: Upload rendered video
    Worker->>DB: Update Export status
    Worker-->>Client: ExportCompleteEvent (SSE)

POST /api/v1/projects/{project_uuid}/exports
{
    "name": "Final Cut v1"
}

This creates an Export record with a snapshot of the current edits, then queues a render_export task.
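As a rough illustration, the request above could be built from a client like this. This is only a sketch using the Python standard library: the base URL and the helper name `build_export_request` are assumptions, and any authentication the API requires is omitted.

```python
import json
import urllib.request


def build_export_request(base_url: str, project_uuid: str, name: str) -> urllib.request.Request:
    """Build (but do not send) the POST that triggers a render.

    `base_url` is a placeholder; only the path and body come from the docs.
    """
    url = f"{base_url}/api/v1/projects/{project_uuid}/exports"
    body = json.dumps({"name": name}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_export_request("https://example.invalid", "1234", "Final Cut v1")
# Sending it would be: urllib.request.urlopen(req)
```

The response is expected to carry the export UUID with a pending status, as shown in the sequence diagram.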

Edit Snapshots¶

When you trigger a render, we freeze the current active edits into edit_snapshot:

flowchart TB
    subgraph Project["Project State"]
        E1[Edit 1: active]
        E2[Edit 2: inactive]
        E3[Edit 3: active]
    end

    subgraph Trigger["Render Triggered"]
        S[Snapshot active edits]
    end

    subgraph Export["Export Record"]
        ES["edit_snapshot: [Edit 1, Edit 3]"]
    end

    E1 --> S
    E3 --> S
    S --> ES

    subgraph Later["User keeps editing"]
        E1 -->|Toggle off| E1X[Edit 1: inactive]
        E2 -->|Toggle on| E2X[Edit 2: active]
    end

    ES -.->|Unaffected| R[Render uses frozen snapshot]

{
    "edits": [
        {"start_ms": 1000, "end_ms": 2500, "type": "SILENCE", "action": "cut"},
        {"start_ms": 5000, "end_ms": 6200, "type": "FALSE_START", "action": "cut"},
        {"start_ms": 8000, "end_ms": 8500, "type": "PROFANITY", "action": "mute"}
    ],
    "settings": {
        "resolution": "1080p",
        "audio_censorship": "bleep"
    }
}

This means you can toggle edits after triggering a render without affecting the in-progress export. You can also re-render with different edits by triggering another export.
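The freezing behavior can be sketched in a few lines. The `Edit` dataclass and `build_edit_snapshot` below are hypothetical stand-ins for the real models, intended only to show why later toggles cannot leak into an export: the snapshot holds copies, not references.

```python
import copy
from dataclasses import asdict, dataclass


@dataclass
class Edit:
    # Hypothetical model; field names mirror the snapshot JSON above.
    start_ms: int
    end_ms: int
    type: str
    action: str
    active: bool = True


def build_edit_snapshot(edits: list, settings: dict) -> dict:
    """Freeze the currently active edits into a plain, JSON-friendly dict."""
    frozen = []
    for edit in edits:
        if edit.active:
            row = asdict(edit)  # copies the values out of the live object
            row.pop("active")   # every snapshotted edit is active by definition
            frozen.append(row)
    return {"edits": frozen, "settings": copy.deepcopy(settings)}


edits = [
    Edit(1000, 2500, "SILENCE", "cut"),
    Edit(5000, 6200, "FALSE_START", "cut", active=False),
    Edit(8000, 8500, "PROFANITY", "mute"),
]
snapshot = build_edit_snapshot(edits, {"resolution": "1080p", "audio_censorship": "bleep"})

# Toggling edits afterwards does not touch the snapshot.
edits[0].active = False
edits[1].active = True
```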

Steps¶

DownloadClipsStep¶

Downloads all clips for the project and concatenates them if there are multiple.

1. Query clips in display order
2. Generate presigned download URLs (1 hour expiry)
3. Download via httpx streaming (faster than FFmpeg's HTTP input)
4. If multiple clips, concatenate with FFmpeg

The output is a single video file ready for cutting.

ApplyCutsStep¶

Applies the edit snapshot using FFmpeg's select filter. Edits are separated by action:

- CUT edits: Remove video and audio segments
- MUTE edits: Keep video, silence or bleep audio

flowchart TB
    A[Read edit_snapshot] --> B[Separate CUT vs MUTE edits]
    B --> C[Calculate segments to keep from CUTs]
    C --> D[Remap MUTE timestamps for cuts]
    D --> E{Audio censorship mode?}
    E -->|none| F[No audio processing]
    E -->|mute| G[Apply volume=0 filter]
    E -->|bleep| H[Mix with 1kHz sine tone]
    F --> I[FFmpeg render]
    G --> I
    H --> I
    I --> J[Output video]

    style G fill:#f59e0b,color:#fff
    style H fill:#ef4444,color:#fff

1. Read edit_snapshot from the export record
2. Separate edits by action field (CUT vs MUTE)
3. Calculate segments to keep (inverse of CUT edits only)
4. Remap MUTE edit timestamps to account for removed content
5. Build FFmpeg filter graph based on audio_censorship setting
6. Run FFmpeg
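The segment inversion and timestamp remapping described above might look roughly like the following. The function names are illustrative rather than the ones used in the codebase, and the sketch assumes CUT ranges do not overlap one another.

```python
def keep_segments(cuts: list, duration_ms: int) -> list:
    """Return the ranges that survive, i.e. the inverse of the CUT ranges."""
    kept = []
    cursor = 0
    for start, end in sorted(cuts):
        # Clamp each cut to the video bounds before using it.
        start = min(max(start, 0), duration_ms)
        end = min(max(end, 0), duration_ms)
        if start > cursor:
            kept.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < duration_ms:
        kept.append((cursor, duration_ms))
    return kept


def remap_ms(t_ms: int, cuts: list) -> int:
    """Map a source timestamp to output time by subtracting content removed before it."""
    removed = 0
    for start, end in sorted(cuts):  # assumes non-overlapping cuts
        if t_ms <= start:
            break
        removed += min(t_ms, end) - start
    return t_ms - removed


cuts = [(1000, 2500), (5000, 6200)]
kept = keep_segments(cuts, duration_ms=10_000)
mute_out = (remap_ms(8000, cuts), remap_ms(8500, cuts))
```

With the example snapshot from earlier, the 8000-8500 ms mute region shifts 2700 ms earlier in the output because 1500 ms and 1200 ms of content were cut before it.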

Mute Mode - Silences audio during profanity:

ffmpeg -i input.mp4 \
  -vf "select='...'" \
  -af "aselect='...',volume=enable='between(t,5,7)':volume=0" \
  output.mp4

Bleep Mode - Overlays 1kHz tone during profanity using filter_complex:

ffmpeg -i input.mp4 -filter_complex "
  [0:v]select='...',setpts=...[vout];
  [0:a]aselect='...',volume=enable='between(t,5,7)':volume=0[main];
  sine=frequency=1000:duration=15.5,aformat=channel_layouts=stereo,
    volume='if(between(t,5,7),0.25,0)':eval=frame[bleep];
  [main][bleep]amix=inputs=2:normalize=0[aout]
" -map "[vout]" -map "[aout]" output.mp4

The bleep tone is a standard 1kHz sine wave at 25% volume, mixed only during mute regions.
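For illustration, the elided select expression and the mute filter in the commands above could be generated from millisecond ranges as follows. This is a sketch with hypothetical helper names; it assumes the mute ranges passed in have already been remapped to output time.

```python
def between_expr(ranges_ms: list) -> str:
    """Join ranges into an FFmpeg expression that is non-zero inside any range."""
    return "+".join(
        f"between(t,{start / 1000:.3f},{end / 1000:.3f})" for start, end in ranges_ms
    )


def select_filter(kept_ms: list, audio: bool = False) -> str:
    """Build select (video) or aselect (audio) for the segments to keep."""
    name = "aselect" if audio else "select"
    return f"{name}='{between_expr(kept_ms)}'"


def mute_filter(mute_ms: list) -> str:
    """Build a volume filter that drops to zero only during the mute ranges."""
    return f"volume=enable='{between_expr(mute_ms)}':volume=0"


video_select = select_filter([(0, 1000), (2500, 5000)])
audio_mute = mute_filter([(5300, 5800)])
```

Single quotes around the expression keep the commas inside `between(...)` from being parsed as filter separators.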

Audio Processing (Optional)¶

When the user enables "Audio Clean", two processing filters are applied:

flowchart LR
    A[Source Audio] --> B[Noise Reduction]
    B --> C[LUFS Normalization]
    C --> D[Output Audio]

    style B fill:#3b82f6,color:#fff
    style C fill:#10b981,color:#fff

1. Noise Reduction (afftdn)

FFT-based spectral analysis removes constant background noise:

- Fan/AC hum
- Room tone
- Computer noise

afftdn=nf=-25

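The noise floor of -25 dB provides moderate reduction without affecting voice quality. afftdn treats signal below the noise floor as noise, so higher values (closer to the filter's -20 dB maximum) are more aggressive and may introduce artifacts, while lower values (e.g., -40) are gentler.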

2. LUFS Loudness Normalization (loudnorm)

Normalizes overall loudness to a fixed target using FFmpeg's EBU R128 loudnorm filter:

loudnorm=I=-14:TP=-1.5:LRA=11

Parameter	Value	Purpose
`I` (Integrated)	-14 LUFS	Target loudness (YouTube/Spotify standard)
`TP` (True Peak)	-1.5 dBTP	Prevents clipping on lossy codecs
`LRA` (Loudness Range)	11 LU	Preserves natural dynamics

Why -14 LUFS?

- YouTube normalizes to -14 LUFS (quieter content gets boosted, louder gets reduced)
- Spotify uses -14 LUFS for podcasts
- Matches typical professional podcast/video loudness

Processing Order

Noise reduction runs BEFORE normalization. This prevents the normalizer from amplifying background noise when boosting quiet audio.

# Combined filter chain
-af "afftdn=nf=-25,loudnorm=I=-14:TP=-1.5:LRA=11"

Asset Overlay Compositing¶

When asset edits with visual_mode=overlay are present, the render pipeline composites them onto the main video using FFmpeg's overlay filter. Each overlay goes through:

loop (images only) → trim → setpts (PTS shift) → scale → hflip → vflip → rotate → colorchannelmixer (opacity) → overlay

Filter	When Applied	Purpose
`loop=-1:size=1:start=0`	Image inputs (no `asset_duration_ms`)	Loop the single frame so the overlay stream fills the window; without this the overlay shows for one frame then the filter holds the last frame, producing a static image regardless of content
`trim=duration=W`	Always	Cap the overlay stream to the enable window duration
`setpts=PTS-STARTPTS+S/TB`	Always	Shift overlay's PTS so its frame 0 arrives at main `t=start_ms`. Without this, a video overlay starts playing at main `t=0`, exhausts before the enable window opens, and freezes on the last frame
`scale`	Always	Size overlay to `overlay_size_percent` of video
`hflip`	`overlay_flip_h=true`	Mirror horizontally
`vflip`	`overlay_flip_v=true`	Mirror vertically
`rotate`	`overlay_rotation_deg > 0`	Rotate by N degrees (converted to radians)
`colorchannelmixer`	`overlay_opacity_percent < 100`	Apply transparency
`overlay`	Always	Position on main video with time-based enable

Multiple overlays are chained sequentially, each composited onto the result of the previous.
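The per-overlay preparation chain from the table can be sketched as a string builder. This is an approximation under stated assumptions: `OverlayEdit` is a hypothetical container for the fields named above, the width-based `scale` expression is a guess at how `overlay_size_percent` is applied, and the final `overlay` filter (positioning and enable window) is left out.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class OverlayEdit:
    # Hypothetical container; field names follow the table above.
    start_ms: int
    end_ms: int
    overlay_size_percent: float = 100.0
    overlay_flip_h: bool = False
    overlay_flip_v: bool = False
    overlay_rotation_deg: float = 0.0
    overlay_opacity_percent: float = 100.0
    asset_duration_ms: Optional[int] = None  # None means a still image


def overlay_prep_chain(edit: OverlayEdit, main_width: int) -> str:
    """Build the filters applied to one overlay input before the overlay filter."""
    start_s = edit.start_ms / 1000
    window_s = (edit.end_ms - edit.start_ms) / 1000
    parts = []
    if edit.asset_duration_ms is None:
        parts.append("loop=-1:size=1:start=0")  # repeat the single image frame
    parts.append(f"trim=duration={window_s:.3f}")
    parts.append(f"setpts=PTS-STARTPTS+{start_s:.3f}/TB")  # shift to the window start
    # Assumption: size is a percentage of the main width, rounded down to even.
    width = int(main_width * edit.overlay_size_percent / 100) // 2 * 2
    parts.append(f"scale={width}:-2")
    if edit.overlay_flip_h:
        parts.append("hflip")
    if edit.overlay_flip_v:
        parts.append("vflip")
    if edit.overlay_rotation_deg > 0:
        parts.append(f"rotate={math.radians(edit.overlay_rotation_deg):.6f}")
    if edit.overlay_opacity_percent < 100:
        # Scales the alpha channel; the stream needs an alpha-capable pixel format.
        parts.append(f"colorchannelmixer=aa={edit.overlay_opacity_percent / 100:.2f}")
    return ",".join(parts)


chain = overlay_prep_chain(
    OverlayEdit(2000, 5000, overlay_size_percent=50, overlay_flip_h=True,
                overlay_opacity_percent=80),
    main_width=1920,
)
```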

Subtitle ordering. When an overlay-composite render also has subtitles, the subtitles filter is applied AFTER the final overlay rather than baked into the pre-overlay vbase chain. Captions therefore render on top of asset overlays, matching the b-roll-composite path's existing pattern. The simple-trim path (no overlays) still applies subtitles at the end of {crop}{scale}{transform}{subtitles}, unchanged.

Caption position resolution + snapshot keying. Free-form caption coordinates ride in settings_snapshot["captions"] as three optional fields: global caption_x / caption_y (0-1 normalized) and caption_overrides, a per-line map. _resolve_caption_placement (workers/render/subtitles.py) resolves each line's ASS placement with priority per-line override → global → none (style alignment). A positioned line is emitted as {\an5\pos(x,y)} (center anchor, matching the preview's translate(-50%)) plus symmetric per-event margins: MarginL = x_px - min(x_px, W - x_px) and MarginR = W - x_px - min(x_px, W - x_px). libass bounds the \pos wrap width by the event margins, so a positioned caption wraps in-frame (within 2 × the distance to the nearest edge, centered on x) instead of overflowing the edge; an unpositioned line keeps 0,0 margins (style default).

Per-line keys are the caption_line UUID as a string. CaptionSettings.caption_overrides is deliberately typed dict[str, CaptionPositionOverride], not dict[UUID, ...]: the snapshot lives in a Postgres JSON column and json.dumps rejects UUID dict keys, so a dict[UUID, ...] would crash crud_exports.create at serialization time. A field_validator enforces that every key parses as a UUID without paying the serialization cost.

Drop-on-missing is the invariant: an override whose line was deleted between snapshot and render is dropped silently. The resolver looks the override up by the surviving line's id and never migrates an orphaned override onto a neighbor or the first line. Captions are otherwise live-queried at render time (load_caption_lines), not snapshotted, so edited text and deletions both reflect the latest state.
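The placement priority and the symmetric margin arithmetic can be illustrated with a small sketch. The helper names are hypothetical, and overrides are modeled here as plain `(x, y)` tuples rather than the real CaptionPositionOverride type.

```python
from typing import Optional


def resolve_xy(line_id: str, caption_x: Optional[float], caption_y: Optional[float],
               overrides: dict) -> Optional[tuple]:
    """Per-line override first, then the global position, else None (style alignment)."""
    override = overrides.get(line_id)  # lookup by the surviving line's id only
    if override is not None:
        return override
    if caption_x is not None and caption_y is not None:
        return (caption_x, caption_y)
    return None


def symmetric_margins(x_norm: float, frame_width: int) -> tuple:
    """Return (MarginL, MarginR) so the wrap box is centered on x and stays in frame."""
    x_px = round(x_norm * frame_width)
    half = min(x_px, frame_width - x_px)  # distance to the nearest edge
    return x_px - half, frame_width - x_px - half


def ass_pos_tag(x_px: int, y_px: int) -> str:
    """Center-anchored ASS position override tag."""
    return f"{{\\an5\\pos({x_px},{y_px})}}"


overrides = {"line-a": (0.1, 0.2)}
margins = symmetric_margins(0.25, 1920)
```

For a 1920 px wide frame and x = 0.25, the caption sits 480 px from the left edge, so the wrap box spans 960 px: no left margin and a 960 px right margin.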

Asset B-roll Compositing (REPLACE + INSERT)¶

When asset edits with visual_mode=replace or insert are present, the render uses a two-pass pipeline. Pass 1 applies CUT edits to produce a trimmed main video; Pass 2 splits that trimmed video into main/asset segments and concatenates them via FFmpeg's concat filter.

concat is strict about input uniformity — it rejects the whole filter graph with Failed to configure output pad on Parsed_concat_N if any two inputs differ on:

Video dimensions or sample aspect ratio. Every segment is scaled and padded to the exact output_width × output_height computed from the main's aspect, then setsar=1 forces square pixels. The variable-width scale=-2:'min(H,ih)' fallback is only used when output dimensions aren't known; within the B-roll path they always are.
Audio sample rate, channel layout, or sample format. Every audio sub-chain (real [0:a]atrim, real [N:a]atrim, and synthesized anullsrc) ends with aresample=48000,aformat=sample_fmts=fltp:sample_rates=48000:channel_layouts=stereo so heterogeneous sources (e.g. a 44.1kHz mono screen recording + a 48kHz stereo broll) concatenate cleanly.

Image inputs (PNG/JPEG assets in REPLACE or INSERT) have a single frame at PTS=0. Without intervention, concat emits one frame for a multi-second window and the user sees the image "blink". Image segments are detected by the absence of asset_duration_ms, and their chain is prefixed with loop=-1:size=1:start=0,trim=duration=D to fill the window.

Silent inputs — screen recordings without microphone, muted brolls, PNGs — are probed via has_audio_stream() (ffprobe) before the command is built. The main's probe feeds main_has_audio; asset probes feed inputs_without_audio: frozenset[int]. Any segment whose underlying input has no audio stream synthesizes anullsrc instead of referencing [N:a] — FFmpeg would otherwise reject the filter graph with Stream specifier ':a' ... matches no streams.
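A sketch of how a per-segment audio sub-chain might be chosen, combining the uniform tail with the anullsrc fallback for silent inputs. The function name is hypothetical, and the `asetpts=PTS-STARTPTS` reset after `atrim` is an assumption based on common FFmpeg practice rather than something the docs state.

```python
AUDIO_TAIL = (
    "aresample=48000,"
    "aformat=sample_fmts=fltp:sample_rates=48000:channel_layouts=stereo"
)


def audio_sub_chain(input_index: int, start_s: float, end_s: float,
                    inputs_without_audio: frozenset) -> str:
    """Return the audio chain for one concat segment.

    Inputs with no audio stream get synthesized silence instead of [N:a],
    which FFmpeg would reject because the stream specifier matches nothing.
    """
    if input_index in inputs_without_audio:
        duration = end_s - start_s
        return f"anullsrc=r=48000:cl=stereo,atrim=duration={duration:.3f},{AUDIO_TAIL}"
    return (
        f"[{input_index}:a]atrim={start_s:.3f}:{end_s:.3f},"
        f"asetpts=PTS-STARTPTS,{AUDIO_TAIL}"
    )


silent = audio_sub_chain(2, 0.0, 3.5, frozenset({2}))
real = audio_sub_chain(0, 4.0, 7.5, frozenset({2}))
```

Because every branch ends with the same tail, the concat filter sees identical sample rate, format, and channel layout on all audio inputs.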

REPLACE audio semantics. REPLACE swaps video visually but keeps the main's audio continuous. VideoReplaceSegment.audio_mode defaults to "original_only" for REPLACE segments, pulling audio from [0:a]atrim={output_start}:{output_end} rather than from the asset. The other audio modes (asset_only, mix, none) still fall through to the pre-existing asset-audio or silence branches; original_only is the only new branch.

INSERT audio semantics. INSERT uses a different schema shape from REPLACE/OVERLAY: instead of audio_mode, INSERT edits carry two booleans, insert_audio_enabled (default True) and insert_allow_overlapping_audio (default False). EditCreate / EditUpdate validators reject audio_mode when visual_mode == INSERT, which surfaces frontend bugs loudly rather than silently dropping the value. The legacy AudioMode enum is translated to the two booleans at edit-create time (see workers/analysis/edits/step.py::_build_asset_edit_create): ASSET_ONLY / MIX → enabled=True; MIX additionally sets overlapping=True; ORIGINAL_ONLY / NONE → enabled=False. The analysis pipeline previously crashed on this mismatch: visual_mode=insert paired with the default audio_mode='original_only' failed EditCreate's validator.
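The enum-to-boolean translation amounts to a small mapping. The sketch below re-declares a stand-in `AudioMode` enum so it is self-contained; the real enum and `_build_asset_edit_create` may be shaped differently.

```python
from enum import Enum


class AudioMode(str, Enum):
    # Stand-in for the legacy enum; values follow the names used in this page.
    ORIGINAL_ONLY = "original_only"
    ASSET_ONLY = "asset_only"
    MIX = "mix"
    NONE = "none"


def insert_audio_flags(mode: AudioMode) -> dict:
    """Translate a legacy audio mode into the two INSERT booleans."""
    return {
        "insert_audio_enabled": mode in (AudioMode.ASSET_ONLY, AudioMode.MIX),
        "insert_allow_overlapping_audio": mode is AudioMode.MIX,
    }


flags = {mode.name: insert_audio_flags(mode) for mode in AudioMode}
```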

Insert-anchored assets (issue #234). Non-INSERT asset edits whose inside_insert_edit_id is set bypass the main-time shift entirely. After shift_edits_for_inserts and remap_asset_edits pass them through unchanged, apply_inside_insert_anchors runs and schedules each one at <that insert's output_start_ms> + insert_offset_ms. The asset's own insert_overlap_modes[anchor_id] decides the end behavior: 'merge' (default) keeps the full visible duration and lets it overflow into post-insert main video (audio soundtracking case); 'anchor' clips the end at insert.output_end_ms, dropping the asset entirely if the resulting duration is ≤ 0. Anchored assets that reference an insert no longer in the export snapshot are dropped with a warning — better than silently misplacing them at start_ms=0.
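The scheduling rule for an anchored asset reduces to a few lines of arithmetic. This sketch uses a hypothetical `schedule_anchored` helper and plain integers; it is not the signature of apply_inside_insert_anchors.

```python
from typing import Optional


def schedule_anchored(insert_output_start_ms: int, insert_output_end_ms: int,
                      insert_offset_ms: int, duration_ms: int,
                      mode: str = "merge") -> Optional[tuple]:
    """Place an anchored asset relative to its insert's output position.

    Returns (start_ms, end_ms) in output time, or None if the asset is dropped.
    """
    start = insert_output_start_ms + insert_offset_ms
    end = start + duration_ms
    if mode == "anchor":
        # Clip at the end of the insert; drop if nothing remains visible.
        end = min(end, insert_output_end_ms)
        if end - start <= 0:
            return None
    # 'merge' keeps the full duration and may overflow past the insert.
    return (start, end)


merged = schedule_anchored(10_000, 14_000, 1_000, 5_000, mode="merge")
clipped = schedule_anchored(10_000, 14_000, 1_000, 5_000, mode="anchor")
dropped = schedule_anchored(10_000, 14_000, 4_000, 1_000, mode="anchor")
```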

Per-asset crop reframe (issue #235). Each REPLACE / INSERT video or image segment runs through _maybe_crop_chain(edit, target_w, target_h) in build_video_replace_segments. When the edit's asset_crop_enabled is False, target dims are unknown, or AssetFile.width/height are missing, the helper returns None and the segment takes today's uniform letterbox scale_part. Otherwise the helper writes a crop_chain of scale=<cover>,crop=<region>,scale=<exact target>,setsar=1 onto the VideoReplaceSegment. Cover-scale is the inverse of letterbox's min(...) — it uses max(...) so one axis exceeds the target — and the final exact-target rescale absorbs the ±1 px rounding from the even-pixel constraint that build_crop_filter enforces for H.264 chroma alignment. The crop region itself comes from _compute_crop_region (Python port of frontend/shared/lib/cropUtils.ts:computeCropRegion). _build_asset_video_chain reads seg.crop_chain and injects it in place of the uniform scale_part; both the video-input and image-input branches go through the same swap. The output is concat-uniform either way: same final WxH, same SAR. Source dimensions reach the renderer via the new asset_dimensions: dict[str, tuple[int, int]] carrier on DownloadResult (populated by _download_assets) and an asset_dimensions kwarg threaded through _prepare_asset_inputs.

Main Video Transforms¶

The main video can be flipped horizontally and/or vertically. These transforms are applied in the FFmpeg filter chain after scaling/padding but before subtitle burn-in:

select → setpts → scale/pad → hflip/vflip → subtitles

Filter	When Applied	Purpose
`hflip`	`video_flip_h=true`	Mirror video horizontally
`vflip`	`video_flip_v=true`	Mirror video vertically

Both transforms also apply in the live preview via CSS scale(-1, 1) / scale(1, -1) on the <video> element only (overlays and controls remain unaffected).

Main Volume Control¶

The main_volume_percent setting (0-100%) adjusts the main video's audio level before mute/normalization processing:

# Volume at 50%
-af "volume=0.50"

Processing order: volume → mute censorship → noise reduction → LUFS normalization.
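To make the ordering concrete, here is a sketch that assembles the audio filter chain in that sequence. The function and parameter names are assumptions; the filter strings themselves are the ones shown earlier on this page.

```python
from typing import Optional


def build_audio_chain(main_volume_percent: int = 100,
                      mute_filter: Optional[str] = None,
                      audio_clean: bool = False) -> str:
    """Assemble -af filters in order: volume, mute, noise reduction, loudnorm."""
    parts = []
    if main_volume_percent != 100:
        parts.append(f"volume={main_volume_percent / 100:.2f}")
    if mute_filter:
        parts.append(mute_filter)
    if audio_clean:
        # Denoise before normalizing so loudnorm does not boost background noise.
        parts.append("afftdn=nf=-25")
        parts.append("loudnorm=I=-14:TP=-1.5:LRA=11")
    return ",".join(parts)


chain = build_audio_chain(
    main_volume_percent=50,
    mute_filter="volume=enable='between(t,5,7)':volume=0",
    audio_clean=True,
)
```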

UploadExportStep¶

Uploads the rendered video to R2:

Bucket: STORAGE_BUCKET_EXPORTS
Key: exports/{project_uuid}/{export_uuid}/{filename}.mp4

UpdateExportStep¶

Finalizes the export record:

Sets status to COMPLETED
Stores storage_key
Publishes ExportCompleteEvent

Timeouts¶

Long videos need time to process:

Step	Timeout
Download clips	15 minutes
Apply cuts	30 minutes
Upload export	10 minutes
Total pipeline	1 hour

If a step exceeds its timeout, the export fails and we set status to FAILED with an error message.
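One way to enforce per-step limits like these is `asyncio.wait_for`. Whether the worker is actually asyncio-based is an assumption, as are the dictionary keys and the `run_step` helper; only the durations come from the table above.

```python
import asyncio

# Durations from the table above, in seconds. Key names are illustrative.
STEP_TIMEOUTS_S = {
    "download_clips": 15 * 60,
    "apply_cuts": 30 * 60,
    "upload_export": 10 * 60,
}
PIPELINE_TIMEOUT_S = 60 * 60


async def run_step(name, step, timeouts=STEP_TIMEOUTS_S):
    """Run one step coroutine function, failing with a clear message on timeout."""
    limit = timeouts[name]
    try:
        return await asyncio.wait_for(step(), timeout=limit)
    except asyncio.TimeoutError:
        raise RuntimeError(f"{name} exceeded its {limit}s timeout") from None
```

The raised error message is what would end up in the export's error message when the status is set to FAILED.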

Error Handling¶

flowchart TB
    A[Step Executing] --> B{Success?}
    B -->|Yes| C[Next Step]
    B -->|No| D[Catch Exception]
    D --> E[Set status = FAILED]
    E --> F[Store error_message]
    F --> G[Publish ExportFailedEvent]
    G --> H[Cleanup temp files]
    H --> I[Frontend shows error]
    I --> J{User action}
    J -->|Retry| A

If any step fails:

1. The pipeline catches the exception
2. Sets export.status = FAILED
3. Stores the error message in export.error_message
4. Publishes ExportFailedEvent
5. Cleans up temporary files

The frontend shows the error to the user so they can retry.
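The failure path can be sketched as a small runner. The `Export` dataclass, the `publish` callback, and the `cleanup` callback are hypothetical stand-ins; the sketch also assumes temporary files are cleaned up on success as well as on failure.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Export:
    # Minimal stand-in for the export record.
    status: str = "PENDING"
    error_message: Optional[str] = None


def run_pipeline(export: Export, steps: list,
                 publish: Callable, cleanup: Callable) -> bool:
    """Run steps in order; on any exception, mark the export failed and notify."""
    try:
        for step in steps:
            step(export)
    except Exception as exc:
        export.status = "FAILED"
        export.error_message = str(exc)
        publish("ExportFailedEvent", export)
        return False
    finally:
        cleanup()  # runs after the failure handling above, and also on success
    return True
```

A retry from the frontend would simply trigger the pipeline again from the first step.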

Downloading Exports¶

Once complete, users can download via:

GET /api/v1/exports/{export_uuid}/download

This returns a presigned download URL valid for 1 hour.

Key Files¶

Component	Location
Pipeline definition	`backend/src/workers/render/pipeline.py`
Task entry	`backend/src/workers/render/tasks.py`
Pipeline steps	`backend/src/workers/render/steps.py`
DB helpers (Convention #16 shape 2)	`backend/src/workers/render/data.py`
FFmpeg utilities	`backend/src/workers/render/ffmpeg/`
