Download Pipeline¶
When users upload a video or import from YouTube, we process it to extract audio, generate waveforms, and optionally create a web-compatible proxy.
Two Entry Points¶
YouTube Import¶
POST /api/v1/projects/{project_uuid}/clips/youtube-import
{ "url": "https://youtube.com/watch?v=..." }
The endpoint fetches video metadata synchronously, inside the request, before anything is queued. The yt-dlp call is wrapped in asyncio.to_thread and asyncio.wait_for, bounded by YOUTUBE_METADATA_TIMEOUT_SECONDS. The request is rejected with a 400 ValidationError if the video duration exceeds MAX_YOUTUBE_DURATION_SECONDS (3600, matching the render pipeline timeout), if metadata cannot be fetched, or if the fetch times out. If validation passes, the endpoint creates a Clip and a ClipFile record and queues download_youtube_video. Each import creates a new ClipFile (per-user storage, no deduplication). Convention #13 applies: the duration cap is enforced against resolved video metadata, not against URL shape alone.
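The validation step can be sketched roughly as follows. This is a minimal illustration, not the actual endpoint code: fetch_metadata is a hypothetical blocking callable standing in for the yt-dlp wrapper, the timeout value is a placeholder, the local ValidationError class stands in for the API's 400 error, and treating a missing duration as a rejection is an assumption.

```python
import asyncio

MAX_YOUTUBE_DURATION_SECONDS = 3600    # from the docs above
YOUTUBE_METADATA_TIMEOUT_SECONDS = 15  # placeholder value (assumption)


class ValidationError(Exception):
    """Stand-in for the API's 400 ValidationError."""


async def validate_youtube_import(url, fetch_metadata,
                                  timeout=YOUTUBE_METADATA_TIMEOUT_SECONDS):
    """Fetch metadata off the event loop and enforce the duration cap.

    `fetch_metadata` is a hypothetical blocking callable that returns a dict
    with a "duration" key in seconds.
    """
    try:
        # Run the blocking call in a worker thread and bound the wait.
        metadata = await asyncio.wait_for(
            asyncio.to_thread(fetch_metadata, url), timeout=timeout
        )
    except asyncio.TimeoutError as exc:
        raise ValidationError("Timed out fetching video metadata") from exc
    except Exception as exc:
        raise ValidationError("Could not fetch video metadata") from exc

    duration = metadata.get("duration")
    if duration is None:
        # Assumption: an unknown duration is treated like a failed fetch.
        raise ValidationError("Video duration is unknown")
    if duration > MAX_YOUTUBE_DURATION_SECONDS:
        raise ValidationError(
            f"Video is {duration}s; the limit is {MAX_YOUTUBE_DURATION_SECONDS}s"
        )
    return metadata
```

Because the check runs on resolved metadata, a URL that looks valid is still rejected when the video behind it is too long.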
Presigned Upload¶
POST /api/v1/projects/{project_uuid}/clips/presign
{ "filename": "video.mp4", "content_type": "video/mp4" }
Returns a presigned URL. Client uploads directly to R2, then calls:
This queues process_clip_artifacts.
download_youtube_video Task¶
Downloads from YouTube and processes the video:
- Download with yt-dlp - Up to 1080p, best audio
- Upload original - Store in R2 at clips/{prefix}/{uuid}/{filename}
- Extract audio - 16kHz mono MP3 for Whisper
- Generate waveform - Array of peaks for timeline visualization
- Check codecs - Determine if proxy is needed
- Update ClipFile - Store paths and metadata
- Publish ClipReadyEvent - Notify frontend
If the video codec isn't web-compatible (HEVC, ProRes), we queue generate_clip_proxy onto the separate proxy_broker (queue: proxy). Keeping proxy generation off the download queue means audio extraction for the next import doesn't have to wait behind a slow FFmpeg re-encode.
process_clip_artifacts Task¶
For user-uploaded videos (already in R2):
- Download from R2 - Fetch the uploaded file
- Extract audio - Same 16kHz mono MP3
- Generate waveform - Same peaks array
- Check codecs - Same compatibility check
- Update ClipFile - Store paths
- Publish ClipReadyEvent
generate_clip_proxy Task¶
Runs on the dedicated proxy_broker (queue: proxy), executed by the taskiq-proxy-worker container. CPU-heavy re-encodes can take 1-3× source duration, so decoupling this broker from download_broker prevents audio extraction for newly imported clips from queuing behind a long transcode.
Creates a web-compatible preview for videos that browsers can't play natively, plus a timeline scrub sprite for instant thumbnails:
- Download original - From R2
- Transcode + sprite - Chained FFmpeg single-decode pass: 480p H.264 + AAC audio (with +faststart and dense -g 60 -keyint_min 30 -sc_threshold 0 keyframes for ~2s seek granularity) plus a 10×20 grid sprite (160×90 tiles, JPEG) from the same -i input
- Upload both - Proxy to clips/{prefix}/{uuid}/proxy.mp4, sprite to clips/{prefix}/{uuid}/sprite.jpg
- Update ClipFile - Set proxy_key, sprite_key, and sprite_seconds_per_tile
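One way the single-decode transcode + sprite step could be assembled is shown below. This is a sketch under assumptions, not the task's real command line: the function name, the libx264 encoder choice, the filter graph labels, and reading "10×20" as 10 columns by 20 rows are all illustrative. Only the keyframe flags, +faststart, the 480p target, and the 160×90 tile size come from the description above.

```python
def build_proxy_and_sprite_args(src, proxy_out, sprite_out, seconds_per_tile):
    """Return an FFmpeg argv that decodes `src` once and writes two outputs."""
    filter_graph = (
        # Split the decoded video so both outputs share a single decode.
        "[0:v]split=2[for_proxy][for_sprite];"
        # Proxy branch: scale to 480 lines, keeping the width even.
        "[for_proxy]scale=-2:480[proxy];"
        # Sprite branch: one frame per tile interval, 160x90 tiles, 10x20 grid.
        f"[for_sprite]fps=1/{seconds_per_tile},scale=160:90,tile=10x20[sprite]"
    )
    return [
        "ffmpeg", "-y", "-i", src,
        "-filter_complex", filter_graph,
        # Output 1: H.264 + AAC proxy with dense keyframes and faststart.
        "-map", "[proxy]", "-map", "0:a?",
        "-c:v", "libx264",
        "-g", "60", "-keyint_min", "30", "-sc_threshold", "0",
        "-c:a", "aac",
        "-movflags", "+faststart",
        proxy_out,
        # Output 2: a single JPEG containing the whole tile grid.
        "-map", "[sprite]", "-frames:v", "1",
        sprite_out,
    ]
```

The returned list can be passed to subprocess.run(args, check=True). The "0:a?" mapping makes the audio stream optional, so silent sources do not fail the transcode.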
The proxy is used for preview in the timeline. The sprite powers instant scrub thumbnails while dragging the playhead. The original is used for final render. Sprite density (seconds_per_tile) is chosen at generation time as max(1, ceil(duration_s / 200)) — short clips get 1s/tile, long clips scale down so the sprite stays a fixed 10×20 grid. See TIER_3_SPRITE_PLAN.md for the full design.
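The density rule is easy to check by hand. The sketch below implements the formula as stated, with 200 being the tile count of the 10×20 grid. The tile_position helper is hypothetical and assumes tiles are laid out row by row with 10 columns, which the text above does not confirm.

```python
import math

SPRITE_COLUMNS = 10
SPRITE_ROWS = 20
SPRITE_TILES = SPRITE_COLUMNS * SPRITE_ROWS  # 200


def sprite_seconds_per_tile(duration_s):
    """max(1, ceil(duration_s / 200)), as described above."""
    return max(1, math.ceil(duration_s / SPRITE_TILES))


def tile_position(time_s, seconds_per_tile):
    """Hypothetical lookup: (row, column) of the tile covering `time_s`.

    Assumes row-major layout with SPRITE_COLUMNS tiles per row.
    """
    index = min(int(time_s // seconds_per_tile), SPRITE_TILES - 1)
    return divmod(index, SPRITE_COLUMNS)


print(sprite_seconds_per_tile(90))    # 1
print(sprite_seconds_per_tile(201))   # 2
print(sprite_seconds_per_tile(3600))  # 18
print(tile_position(45, 2))           # (2, 2)
```

A 90-second clip uses only the first 90 tiles at 1s each, while a one-hour clip fills the grid at 18s per tile.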
Audio Extraction¶
We use FFmpeg to extract audio in Whisper-compatible format:
- -ar 16000 - 16kHz sample rate (Whisper requirement)
- -ac 1 - Mono channel
- -f mp3 - MP3 format (good compression, fast)
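Put together, the flags above could form a command like the one built here. This is a sketch: the function name and the extra -y (overwrite) and -vn (drop video) flags are assumptions, and the real code in audio.py may differ.

```python
def build_audio_extract_args(src, dst):
    """Return an FFmpeg argv that writes 16kHz mono MP3 audio from `src`."""
    return [
        "ffmpeg",
        "-y",            # overwrite the output if it exists (assumption)
        "-i", src,
        "-vn",           # ignore the video stream (assumption)
        "-ar", "16000",  # 16kHz sample rate
        "-ac", "1",      # mono
        "-f", "mp3",     # MP3 container/format
        dst,
    ]
```

Running it would look like subprocess.run(build_audio_extract_args("in.mp4", "audio.mp3"), check=True), which raises if FFmpeg exits with a non-zero status.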
Waveform Generation¶
The waveform is an array of peak amplitudes used for the timeline visualization.
We generate ~100 peaks per second of video. The frontend renders these as vertical bars in the timeline.
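To illustrate the idea, here is a minimal peak computation. It assumes the input is a flat sequence of mono samples at 16kHz and that a peak is the maximum absolute amplitude in each window. The actual logic in waveform.py is not shown in this document and may differ.

```python
def compute_peaks(samples, sample_rate=16000, peaks_per_second=100):
    """Reduce raw samples to roughly `peaks_per_second` peak amplitudes."""
    # 16000 Hz / 100 peaks per second = 160 samples per peak.
    window = max(1, sample_rate // peaks_per_second)
    return [
        max(abs(s) for s in samples[start:start + window])
        for start in range(0, len(samples), window)
    ]


# Tiny example: 4 samples at 4 Hz, 2 peaks per second -> windows of 2 samples.
print(compute_peaks([0.1, -0.5, 0.2, 0.3], sample_rate=4, peaks_per_second=2))
# [0.5, 0.3]
```

At these defaults a one-hour clip produces about 360,000 peaks, which is small enough to ship to the frontend as JSON.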
Codec Compatibility¶
Web browsers can play:
- Video: H.264, VP8, VP9
- Audio: AAC, MP3, Opus
If we detect non-compatible codecs (HEVC, ProRes, AV1 in some browsers), we generate a proxy for preview. The original stays intact for rendering.
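The check can be thought of as a simple allowlist. The sketch below is hypothetical: it assumes codec names in the lowercase form ffprobe reports (h264, hevc, prores, and so on) and assumes an unsupported audio codec also triggers a proxy, which the text above does not state explicitly.

```python
WEB_VIDEO_CODECS = {"h264", "vp8", "vp9"}
WEB_AUDIO_CODECS = {"aac", "mp3", "opus"}


def needs_proxy(video_codec, audio_codec=None):
    """Return True when the clip should get a web-compatible proxy."""
    if video_codec.lower() not in WEB_VIDEO_CODECS:
        return True
    # Assumption: clips without audio pass; unsupported audio needs a proxy.
    if audio_codec is not None and audio_codec.lower() not in WEB_AUDIO_CODECS:
        return True
    return False


print(needs_proxy("hevc", "aac"))  # True
print(needs_proxy("h264", "aac"))  # False
```

An allowlist errs toward generating a proxy for anything unrecognized, AV1 included, which is the safer default for preview playback.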
Storage Keys¶
clips/{prefix}/{uuid}/{original_filename} # Original video
clips/{prefix}/{uuid}/audio.mp3 # Extracted audio
clips/{prefix}/{uuid}/proxy.mp4 # Web-compatible proxy
clips/{prefix}/{uuid}/sprite.jpg # Timeline scrub sprite (10x20 grid)
clips/{prefix}/{uuid}/waveform.json # Peak data
The {prefix} is the first 2 characters of the UUID, which helps S3 distribute files across partitions.
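As a quick illustration of the layout, the helper below derives every key from a clip UUID. The function name and the dictionary shape are hypothetical, and it assumes the UUID is handled in its string form.

```python
def clip_storage_keys(clip_uuid, original_filename):
    """Build the storage keys for one clip from its UUID string."""
    prefix = clip_uuid[:2]  # first 2 characters of the UUID
    base = f"clips/{prefix}/{clip_uuid}"
    return {
        "original": f"{base}/{original_filename}",
        "audio": f"{base}/audio.mp3",
        "proxy": f"{base}/proxy.mp4",
        "sprite": f"{base}/sprite.jpg",
        "waveform": f"{base}/waveform.json",
    }


keys = clip_storage_keys("3f2a9c1e-0b7d-4e5a-9c3f-1a2b3c4d5e6f", "video.mp4")
print(keys["audio"])
# clips/3f/3f2a9c1e-0b7d-4e5a-9c3f-1a2b3c4d5e6f/audio.mp3
```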
Key Files¶
| Component | Location |
|---|---|
| Download task | backend/src/workers/download/tasks.py:download_youtube_video |
| Process task | backend/src/workers/download/tasks.py:process_clip_artifacts |
| Proxy task | backend/src/workers/download/tasks.py:generate_clip_proxy |
| yt-dlp wrapper | backend/src/workers/download/youtube.py |
| Audio extraction | backend/src/workers/download/audio.py |
| Waveform generation | backend/src/infrastructure/waveform.py |