Download Pipeline¶
When users upload a video or import from YouTube, we process it to extract audio, generate waveforms, and optionally create a web-compatible proxy.
Two Entry Points¶
YouTube Import¶
POST /api/v1/projects/{project_uuid}/clips/youtube-import
{ "url": "https://youtube.com/watch?v=..." }
Fetches video metadata synchronously, within the request (yt-dlp, wrapped in asyncio.to_thread + asyncio.wait_for with YOUTUBE_METADATA_TIMEOUT_SECONDS). The request is rejected with 400 ValidationError if the video duration exceeds MAX_YOUTUBE_DURATION_SECONDS (3600, matching the render pipeline timeout), if metadata cannot be fetched, or if the fetch times out. Otherwise it creates a Clip + ClipFile record and queues download_youtube_video. Each import creates a new ClipFile (per-user storage, no deduplication). Convention #13 applies: the duration cap is enforced against resolved video metadata, not against URL shape alone.
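The validation step above can be sketched roughly as follows. This is a minimal illustration, not the backend's actual code: fetch_metadata_blocking is a hypothetical stand-in for the blocking yt-dlp call, the timeout value is assumed, and treating a missing duration as a rejection is also an assumption.

```python
import asyncio

# Assumed value for illustration; the real setting lives in backend config.
YOUTUBE_METADATA_TIMEOUT_SECONDS = 10.0
MAX_YOUTUBE_DURATION_SECONDS = 3600  # value stated in the text


class ValidationError(Exception):
    """Stand-in for the API's 400 ValidationError."""


def fetch_metadata_blocking(url: str) -> dict:
    # Hypothetical placeholder for the blocking yt-dlp metadata lookup.
    return {"duration": 125}


async def validate_youtube_import(url: str, fetch=fetch_metadata_blocking) -> dict:
    try:
        # Run the blocking fetch in a worker thread and bound how long we wait.
        info = await asyncio.wait_for(
            asyncio.to_thread(fetch, url),
            timeout=YOUTUBE_METADATA_TIMEOUT_SECONDS,
        )
    except asyncio.TimeoutError as exc:
        raise ValidationError("metadata fetch timed out") from exc
    except Exception as exc:
        raise ValidationError("could not fetch video metadata") from exc

    duration = info.get("duration")
    # Assumption: a missing duration is rejected like an unfetchable video.
    if duration is None or duration > MAX_YOUTUBE_DURATION_SECONDS:
        raise ValidationError("video exceeds the maximum duration")
    return info
```

Note that asyncio.wait_for stops waiting on timeout but cannot interrupt the worker thread, so the blocking call may keep running in the background until it returns.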
Presigned Upload (single-PUT, files < 25 MiB)¶
POST /api/v1/projects/{project_uuid}/clips/presign
{ "filename": "video.mp4", "content_type": "video/mp4" }
Returns a presigned URL. The client uploads directly to R2, then calls the confirm endpoint, which queues process_clip_artifacts.
Multipart Upload (files ≥ 25 MiB)¶
For files at or above MULTIPART_CUTOFF_BYTES (25 MiB), the client uses the multipart endpoints under /clips/multipart/:
POST /clips/multipart/initiate → upload_id + initial 50 part URLs
PUT <part URL> (× N parts in parallel)
POST /clips/multipart/complete → finalize, kick process_clip_artifacts
POST /clips/multipart/abort → cancel + release R2 parts
GET /clips/multipart/parts → list parts already on R2 (resume)
GET /clips/multipart/parts/urls → refill URLs as upload progresses
ClipFile lifecycle: PENDING → MULTIPART_INITIATED → UPLOADED, with FAILED reachable only after UPLOADED. Rows still in MULTIPART_INITIATED are never flipped to FAILED; they are deleted on user cancel, on complete-time over-quota, or by the cron sweep. The complete endpoint queues process_clip_artifacts, mirroring the single-PUT confirm path.
Stuck multipart uploads (no complete/abort within STUCK_MULTIPART_MINUTES, default 180) are swept by the recover_stuck_tasks cron. The sweep calls abort_multipart_upload on R2 to release any orphaned parts and deletes the Clip + ClipFile rows. Deleting the rows keeps the user from ending up with a zombie FAILED card whose Retry can't actually resume, because R2's abort destroys the parts. See the Multipart Upload sections of docs/backend/api.md for full request/response schemas.
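As a rough sketch of the sweep condition, assuming the row carries a timestamp for when the multipart upload was initiated (the initiated_at parameter and the should_sweep helper are hypothetical names, not taken from the codebase):

```python
from datetime import datetime, timedelta, timezone
from enum import Enum

STUCK_MULTIPART_MINUTES = 180  # default stated in the text


class ClipFileStatus(Enum):
    PENDING = "PENDING"
    MULTIPART_INITIATED = "MULTIPART_INITIATED"
    UPLOADED = "UPLOADED"
    FAILED = "FAILED"


def should_sweep(status: ClipFileStatus, initiated_at: datetime, now: datetime) -> bool:
    """True when an upload has sat in MULTIPART_INITIATED past the cutoff."""
    # Only rows that never reached complete/abort are candidates.
    if status is not ClipFileStatus.MULTIPART_INITIATED:
        return False
    return now - initiated_at > timedelta(minutes=STUCK_MULTIPART_MINUTES)
```

Rows for which this returns true would then be aborted on R2 and deleted, rather than marked FAILED.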
download_youtube_video Task¶
Downloads from YouTube and processes the video:
- Download with yt-dlp - Up to 1080p, best audio
- Upload original - Store in R2 at clips/{prefix}/{uuid}/{filename}
- Extract audio - 16kHz mono MP3 for Whisper
- Generate waveform - Array of peaks for timeline visualization
- Check codecs - Determine if proxy is needed
- Update ClipFile - Store paths and metadata
- Publish ClipReadyEvent - Notify frontend
If the video codec isn't web-compatible (HEVC, ProRes), we queue generate_clip_proxy onto the separate proxy_broker (queue: proxy). Keeping proxy generation off the download queue means audio extraction for the next import doesn't have to wait behind a slow FFmpeg re-encode.
process_clip_artifacts Task¶
For user-uploaded videos (already in R2):
- Download from R2 - Fetch the uploaded file
- Probe codecs - ffprobe video + audio codec names. If no audio stream is present (iOS ReplayKit screen recordings, etc.) the task switches to the silent branch:
  - Synthesize a silent WAV matching the source duration via anullsrc so audio_key consumers don't have to handle None
  - No separate waveform handling is needed; the synthesized track produces a zeroed waveform naturally
  - Mark has_audio=False on the ClipFile so the analysis pipeline and frontend can gate audio-dependent steps
- Extract audio (audible branch only) - 16kHz mono WAV for Whisper
- Generate waveform - Peaks array
- Update ClipFile - Store paths + has_audio
- Publish ClipReadyEvent
The proxy transcoder uses -map 0:a? (optional) so the same audio-less inputs survive proxy generation without a second ffprobe roundtrip — missing audio is dropped at the filter graph instead of failing the encode.
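The silent branch and the optional audio map could be expressed as FFmpeg argument lists along these lines. This is only a sketch: the helper names are hypothetical, the 16 kHz mono layout is assumed to match the audible branch, and the -y flag and explicit video map are additions for illustration.

```python
def silent_audio_cmd(duration_s: float, out_path: str, sample_rate: int = 16000) -> list[str]:
    """Build an FFmpeg command that writes silence of the given length."""
    return [
        "ffmpeg", "-y",
        # anullsrc is a lavfi source that generates silent audio.
        "-f", "lavfi",
        "-i", f"anullsrc=channel_layout=mono:sample_rate={sample_rate}",
        # Trim the otherwise endless source to the clip's duration.
        "-t", f"{duration_s:.3f}",
        out_path,
    ]


def proxy_stream_maps() -> list[str]:
    """Stream selection for the proxy: video required, audio optional."""
    # The trailing "?" makes the audio map optional, so inputs without an
    # audio stream do not fail the encode.
    return ["-map", "0:v:0", "-map", "0:a?"]
```

For example, silent_audio_cmd(12.5, "audio.wav") yields a command that writes 12.5 seconds of silence.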
generate_clip_proxy Task¶
Runs on the dedicated proxy_broker (queue: proxy), executed by the taskiq-proxy-worker container. CPU-heavy re-encodes can take 1-3× source duration, so decoupling this broker from download_broker prevents audio extraction for newly imported clips from queuing behind a long transcode.
Creates a web-compatible preview for videos that browsers can't play natively, plus a timeline scrub sprite for instant thumbnails:
- Download original - From R2
- Transcode + sprite - Chained FFmpeg single-decode pass: 480p H.264 + AAC audio (with +faststart and dense keyframes via -g 60 -keyint_min 30 -sc_threshold 0, for ~2s seek granularity) plus a 10×20 grid sprite (160×90 tiles, JPEG) from the same -i
- Upload both - Proxy to clips/{prefix}/{uuid}/proxy.mp4, sprite to clips/{prefix}/{uuid}/sprite.jpg
- Update ClipFile - Set proxy_key, sprite_key, and sprite_seconds_per_tile
The proxy is used for preview in the timeline. The sprite powers instant scrub thumbnails while dragging the playhead. The original is used for final render. Sprite density (seconds_per_tile) is chosen at generation time as max(1, ceil(duration_s / 200)): clips up to 200 seconds get 1s/tile, and longer clips get more seconds per tile so the sprite stays a fixed 10×20 grid (200 tiles). See TIER_3_SPRITE_PLAN.md for the full design.
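The density formula is easy to check with a few values. In the sketch below, seconds_per_tile follows the formula from the text, while tile_index is a hypothetical helper showing how a playhead time might map to a tile; the actual frontend lookup may differ.

```python
import math

TILE_COUNT = 200  # the fixed 10×20 grid


def seconds_per_tile(duration_s: float) -> int:
    """Seconds of video covered by each sprite tile."""
    return max(1, math.ceil(duration_s / TILE_COUNT))


def tile_index(t: float, spt: int) -> int:
    """Hypothetical lookup: which tile covers playhead time t (seconds)."""
    # Clamp so times outside the clip still resolve to a valid tile.
    return min(TILE_COUNT - 1, max(0, int(t // spt)))


print(seconds_per_tile(90))    # 1
print(seconds_per_tile(201))   # 2
print(seconds_per_tile(3600))  # 18
```

A one-hour clip therefore gets one tile per 18 seconds, which fills exactly 200 tiles.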
Failed-Clip Retry¶
Clips that exit the pipeline at status = FAILED can be re-queued via POST /api/v1/projects/{project_uuid}/clips/{clip_uuid}/retry. The service inspects the clip's artifact state and dispatches via RetryTaskKind:
- YOUTUBE_DOWNLOAD - storage_key is NULL; re-queues download_youtube_video.
- GENERATE_PROXY - audio_key set, proxy_key NULL, source codec non-web-compatible; re-queues generate_clip_proxy directly. Skipping process_clip_artifacts is required: it would short-circuit on the audio_key idempotency check and never re-queue proxy.
- PROCESS_ARTIFACTS - fallback for everything else.
The underlying tasks HEAD their R2 outputs before re-running FFmpeg, so retrying work already on R2 finalizes the DB write without burning compute.
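The dispatch rules can be sketched as a small pure function. The ClipArtifacts dataclass, the pick_retry_task name, the enum values, and the order of the checks are all assumptions made for illustration; only the three RetryTaskKind names and their conditions come from the text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

# Video codecs the text lists as browser-playable (ffprobe codec names).
WEB_COMPATIBLE_VIDEO = frozenset({"h264", "vp8", "vp9"})


class RetryTaskKind(Enum):
    YOUTUBE_DOWNLOAD = "youtube_download"
    GENERATE_PROXY = "generate_proxy"
    PROCESS_ARTIFACTS = "process_artifacts"


@dataclass
class ClipArtifacts:
    storage_key: Optional[str]
    audio_key: Optional[str]
    proxy_key: Optional[str]
    video_codec: Optional[str]


def pick_retry_task(a: ClipArtifacts) -> RetryTaskKind:
    # Nothing stored yet: the YouTube download itself has to run again.
    if a.storage_key is None:
        return RetryTaskKind.YOUTUBE_DOWNLOAD
    needs_proxy = a.video_codec is not None and a.video_codec not in WEB_COMPATIBLE_VIDEO
    # Audio exists but the proxy is missing: go straight to proxy generation.
    if a.audio_key is not None and a.proxy_key is None and needs_proxy:
        return RetryTaskKind.GENERATE_PROXY
    return RetryTaskKind.PROCESS_ARTIFACTS
```

Keeping the decision free of I/O makes each branch straightforward to unit test.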
Audio Extraction¶
We use FFmpeg to extract audio in Whisper-compatible format:
- -ar 16000 - 16kHz sample rate (Whisper requirement)
- -ac 1 - Mono channel
- -f mp3 - MP3 format (good compression, fast)
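Put together, the flags above might be assembled into a command like the one below. The helper name is hypothetical, and -y (overwrite output) and -vn (drop video) are additions for the sketch rather than flags confirmed by the pipeline.

```python
def extract_audio_cmd(src_path: str, dst_path: str) -> list[str]:
    """Build an FFmpeg command that extracts 16 kHz mono MP3 audio."""
    return [
        "ffmpeg", "-y",
        "-i", src_path,
        "-vn",           # ignore the video stream
        "-ar", "16000",  # 16 kHz sample rate
        "-ac", "1",      # mono
        "-f", "mp3",     # MP3 container/codec
        dst_path,
    ]
```

The list can be passed directly to subprocess.run(cmd, check=True), which avoids shell quoting issues with unusual filenames.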
Waveform Generation¶
The waveform is an array of peak amplitudes used for the timeline visualization. We generate ~100 peaks per second of video, and the frontend renders these as vertical bars in the timeline.
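One simple way to produce such peaks is to take the maximum absolute sample in fixed-size windows. This is a sketch of the general technique, assuming samples are already decoded to floats in [-1, 1]; the actual implementation in waveform.py may use a different method.

```python
def compute_peaks(samples: list[float], sample_rate: int, peaks_per_second: int = 100) -> list[float]:
    """Reduce raw samples to one peak amplitude per window."""
    # At 16 kHz and 100 peaks/s, each window covers 160 samples.
    window = max(1, sample_rate // peaks_per_second)
    return [
        max(abs(s) for s in samples[i:i + window])
        for i in range(0, len(samples), window)
    ]


print(compute_peaks([0.0, 0.5, -1.0, 0.25], sample_rate=200))  # [0.5, 1.0]
```

An all-zero track, such as the synthesized silent audio, produces an all-zero peak list without special handling.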
Codec Compatibility¶
Web browsers can play:
- Video: H.264, VP8, VP9
- Audio: AAC, MP3, Opus
If we detect non-compatible codecs (HEVC, ProRes, AV1 in some browsers, animated GIF), we generate a proxy for preview. The original stays intact for rendering.
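A codec check along these lines could drive the proxy decision. The needs_proxy helper is hypothetical, the codec strings are the names ffprobe reports, and whether an incompatible audio codec alone triggers a proxy is an assumption of this sketch.

```python
from typing import Optional

WEB_VIDEO_CODECS = frozenset({"h264", "vp8", "vp9"})
WEB_AUDIO_CODECS = frozenset({"aac", "mp3", "opus"})


def needs_proxy(video_codec: str, audio_codec: Optional[str]) -> bool:
    """True when a browser is unlikely to play the source as-is."""
    if video_codec.lower() not in WEB_VIDEO_CODECS:
        return True
    # Assumption: a clip without audio never needs a proxy for audio reasons.
    return audio_codec is not None and audio_codec.lower() not in WEB_AUDIO_CODECS
```

For example, needs_proxy("hevc", "aac") is true, while needs_proxy("h264", None) is false.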
Storage Keys¶
clips/{prefix}/{uuid}/{original_filename} # Original video
clips/{prefix}/{uuid}/audio.mp3 # Extracted audio
clips/{prefix}/{uuid}/proxy.mp4 # Web-compatible proxy
clips/{prefix}/{uuid}/sprite.jpg # Timeline scrub sprite (10x20 grid)
clips/{prefix}/{uuid}/waveform.json # Peak data
The {prefix} is the first 2 characters of the UUID, which helps S3 distribute files across partitions.
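The key layout can be derived from the clip UUID alone. The clip_keys helper and its dictionary field names are hypothetical; the paths follow the layout listed above.

```python
def clip_keys(clip_uuid: str, original_filename: str) -> dict[str, str]:
    """Build the storage keys for one clip."""
    prefix = clip_uuid[:2]  # first 2 characters of the UUID
    base = f"clips/{prefix}/{clip_uuid}"
    return {
        "original": f"{base}/{original_filename}",
        "audio": f"{base}/audio.mp3",
        "proxy": f"{base}/proxy.mp4",
        "sprite": f"{base}/sprite.jpg",
        "waveform": f"{base}/waveform.json",
    }


keys = clip_keys("3f2a9c1e-0000-4000-8000-000000000000", "video.mp4")
print(keys["proxy"])  # clips/3f/3f2a9c1e-0000-4000-8000-000000000000/proxy.mp4
```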
Key Files¶
| Component | Location |
|---|---|
| Download task | backend/src/workers/download/tasks.py:download_youtube_video |
| Process task | backend/src/workers/download/tasks.py:process_clip_artifacts |
| Proxy task | backend/src/workers/download/tasks.py:generate_clip_proxy |
| Retry endpoint | backend/src/interfaces/api/v1/clips.py:retry_clip_processing |
| yt-dlp + audio extraction + codec probes | backend/src/workers/download/media.py |
| DB helpers (Convention #16 shape 2) | backend/src/workers/download/context.py |
| Waveform generation | backend/src/infrastructure/waveform.py |