# Analysis Pipeline
The analysis pipeline takes a video and produces edits (regions to cut). It transcribes the audio with Whisper, then detects silences from word gaps and finds false starts with an LLM.
## Pipeline Structure
The pipeline is a DAG defined in `backend/src/workers/analysis/pipeline.py`:
```mermaid
flowchart TB
    A[Load Audio] --> B[Transcribe]
    B --> C[Detect Silences]
    B --> D[Detect False Starts]
    B --> IT[Improve Transcript]
    B --> IA[Insert Fixed Assets]
    IT --> P[Censor Profanity]
    C --> E[Validate Edits]
    D --> E
    E --> F[Create Edits]
    P --> F
    IA --> F
    F --> G[Update Project]
    style B fill:#ff3300,color:#fff
    style C fill:#ff6633,color:#fff
    style D fill:#ff6633,color:#fff
    style P fill:#ef4444,color:#fff
```
Silence detection, false start detection, transcript improvement, and fixed asset insertion run in parallel after transcription. Profanity censoring runs after transcript improvement (so the matcher sees cleaned-up, natural text). All branches merge at the Create Edits step.
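The fan-out after transcription can be sketched with `asyncio.gather`; the step functions below are hypothetical stand-ins, not the real pipeline API:

```python
import asyncio

# Hypothetical stand-ins for the real pipeline steps.
async def detect_silences(words):     return [("SILENCE", 420, 1900)]
async def detect_false_starts(words): return [("FALSE_START", 4000, 4800)]
async def improve_transcript(words):  return words
async def insert_fixed_assets(words): return []

async def analyze(words):
    # Fan out: the four post-transcription branches are independent,
    # so they can run concurrently.
    silences, false_starts, improved, assets = await asyncio.gather(
        detect_silences(words),
        detect_false_starts(words),
        improve_transcript(words),
        insert_fixed_assets(words),
    )
    # All branches merge into one edit list before persistence.
    return silences + false_starts + assets

edits = asyncio.run(analyze(["I", "think", "we"]))
print(len(edits))  # 2
```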
## Triggering Analysis
Analysis is triggered via the API:
```mermaid
sequenceDiagram
    participant Client
    participant API
    participant Redis
    participant Worker
    participant Whisper as Whisper API
    participant LLM as DeepSeek / GPT-5
    Client->>API: POST /projects/{uuid}/analyze
    API->>Redis: Queue analyze_project
    API->>Client: 202 Accepted
    Worker->>Redis: Pick up task
    Worker->>Worker: Load audio from R2
    Worker->>Whisper: Send audio
    Whisper->>Worker: Transcription + word timing
    par Parallel Detection
        Worker->>Worker: Detect silences
    and
        Worker->>LLM: Detect false starts
        LLM->>Worker: False start regions
    end
    Worker->>Worker: Validate & merge edits
    Worker->>API: Create Edit records
    Worker-->>Client: AnalysisCompleteEvent (SSE)
```
```
POST /api/v1/projects/{project_uuid}/analyze

{
  "pacing_level": 50,
  "false_start_sensitivity": 50,
  "language": "en"
}
```
This queues an `analyze_project` task on the analysis broker. The task runs the pipeline and publishes events as steps complete.
## Steps

### LoadAudioStep
Downloads the pre-extracted audio from R2. During clip processing, we extract audio at 16kHz mono (Whisper-compatible) and store it alongside the video.
- Input: `project_uuid`
- Output: `Path` to local audio file
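The upstream extraction is typically an ffmpeg invocation; a sketch of the command builder (paths and helper name hypothetical, flags are standard ffmpeg):

```python
from pathlib import Path

def ffmpeg_extract_cmd(video: Path, out_wav: Path) -> list[str]:
    # Hypothetical command builder: 16 kHz mono WAV, Whisper-compatible.
    return [
        "ffmpeg", "-i", str(video),
        "-vn",           # drop the video stream
        "-ac", "1",      # mono
        "-ar", "16000",  # 16 kHz sample rate
        str(out_wav),
    ]

cmd = ffmpeg_extract_cmd(Path("clip.mp4"), Path("clip.wav"))
print(" ".join(cmd))
```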
### TranscribeStep
Sends the audio to OpenAI's Whisper API.
- Input: Audio file path
- Output: `TranscriptionResult` with words, timestamps, and duration
The result includes word-level timing like:
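The exact schema is defined by `TranscriptionResult`; the shape below is illustrative only (field names assumed):

```python
# Illustrative shape only; real field names come from TranscriptionResult.
result = {
    "text": "I think we should start over",
    "duration_ms": 2800,
    "words": [
        {"word": "I",     "start_ms": 0,    "end_ms": 120},
        {"word": "think", "start_ms": 140,  "end_ms": 420},
        {"word": "we",    "start_ms": 1900, "end_ms": 2000},
    ],
}
print(result["words"][1]["word"])  # think
```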
### DetectSilencesStep
Analyzes gaps between words to find pauses worth cutting.
- Input: Transcription + audio file
- Output: `list[SilenceRegion]` with start/end ms and confidence
The pacing_level parameter (0-100) controls sensitivity. Higher values mean more aggressive silence detection - a fast-paced video might use 70, a contemplative one might use 30.
We also analyze the audio waveform to confirm silences aren't just transcription gaps but actual quiet periods.
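A minimal sketch of the word-gap pass; the pacing-to-threshold mapping here is an assumption, not the shipped formula:

```python
def detect_silence_gaps(words, pacing_level: int):
    """Find inter-word gaps worth cutting.

    `words` are (text, start_ms, end_ms) tuples. Assumed mapping:
    pacing 0 -> 2000 ms gap threshold, pacing 100 -> 500 ms.
    """
    threshold_ms = 2000 - 15 * pacing_level
    gaps = []
    for prev, nxt in zip(words, words[1:]):
        gap = nxt[1] - prev[2]  # next word's start minus previous word's end
        if gap >= threshold_ms:
            gaps.append((prev[2], nxt[1]))
    return gaps

words = [("I", 0, 120), ("think", 140, 420), ("we", 1900, 2000)]
print(detect_silence_gaps(words, pacing_level=50))  # [(420, 1900)]
```

In the real step these candidate regions would then be checked against the waveform before becoming edits.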
### DetectFalseStartsStep
Uses an LLM to find repeated phrases where someone started a sentence, stopped, and tried again.
- Input: Transcription text
- Output: `list[FalseStartRegion]` with abandoned/completed text
The LLM prompt asks it to find patterns like "I think... I think we should" where the first "I think" should be cut. For long transcripts, words are split into overlapping chunks and processed independently.
The false_start_sensitivity parameter (0-100) controls how aggressive detection is. Higher values mean more detections (lower confidence threshold).
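The overlapping chunking for long transcripts can be sketched as follows (chunk size and overlap are assumptions):

```python
def chunk_words(words: list[str], size: int = 200, overlap: int = 50) -> list[list[str]]:
    # Overlap lets a false start that straddles a chunk boundary appear
    # whole in at least one chunk. Sizes here are illustrative.
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        if start + size >= len(words):
            break
    return chunks

chunks = chunk_words([f"w{i}" for i in range(450)])
print(len(chunks))  # 3
```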
### ValidateEditsStep
Cleans up the detected edits before saving:
- Merges overlapping edits (within 50ms proximity)
- Prefers `FALSE_START` type when merging (more significant)
- Optionally uses an LLM to judge edit quality
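The merge rule can be sketched like this; the tuple representation and exact tie-breaking are assumptions that mirror the preferences above:

```python
def merge_edits(edits, proximity_ms: int = 50):
    """Merge edits that overlap or sit within `proximity_ms` of each other.

    `edits` are (start_ms, end_ms, type) tuples; FALSE_START wins on merge.
    """
    merged = []
    for start, end, kind in sorted(edits):
        if merged and start <= merged[-1][1] + proximity_ms:
            last_start, last_end, last_kind = merged[-1]
            winner = "FALSE_START" if "FALSE_START" in (kind, last_kind) else kind
            merged[-1] = (last_start, max(last_end, end), winner)
        else:
            merged.append((start, end, kind))
    return merged

edits = [(1000, 2500, "SILENCE"), (2530, 3000, "FALSE_START"), (5000, 5400, "SILENCE")]
print(merge_edits(edits))
# [(1000, 3000, 'FALSE_START'), (5000, 5400, 'SILENCE')]
```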
### CreateEditsStep
Bulk creates Edit records in the database, tagged with the current analysis_run_id. Old runs' edits are preserved (no clearing).
```python
Edit(
    project_id=project.id,
    analysis_run_id=run.uuid,  # Tags edit to this run
    type=EditType.SILENCE,     # or FALSE_START, PROFANITY, ASSET
    start_ms=1000,
    end_ms=2500,
    active=True,
    confidence=0.85,
    reason="Word gap of 1.5s detected between words",
    reason_tag="word_gap",
)
```
### UpdateProjectStep
Marks the project as analyzed and stores metadata:
- Sets `status` to `ANALYZED`
- Stores transcript text in `transcript` (project convenience copy)
- Stores word timing in `transcript_words`
- Generates caption lines tagged with `analysis_run_id`
- Publishes `AnalysisCompleteEvent`
The task then stores transcript + counts on the AnalysisRun record and sets project.active_run_id to the new run.
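The run swap amounts to pointer bookkeeping; a sketch with hypothetical attribute names:

```python
from types import SimpleNamespace

def finalize_run(project, run, transcript: str, edit_counts: dict) -> None:
    # Store results on the run, then make it the project's active run.
    # Attribute names are illustrative, not the real ORM models.
    run.transcript = transcript
    run.edit_counts = edit_counts
    project.status = "ANALYZED"
    project.active_run_id = run.uuid

project = SimpleNamespace(status=None, active_run_id=None)
run = SimpleNamespace(uuid="run-2", transcript=None, edit_counts=None)
finalize_run(project, run, "I think we should start over",
             {"SILENCE": 12, "FALSE_START": 4})
print(project.active_run_id)  # run-2
```

Because edits are tagged with `analysis_run_id` rather than cleared, swapping `active_run_id` back would restore an older run's edits.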
## Edit Types
```mermaid
flowchart LR
    subgraph Detection["Detection Sources"]
        W[Word Gaps] --> S[SILENCE]
        WF[Waveform Analysis] --> S
        LLM[LLM Analysis] --> F[FALSE_START]
    end
    subgraph Output["Edit Records"]
        S --> E[Edit Record]
        F --> E
    end
    E --> V{User Review}
    V -->|Toggle On| R[Include in Render]
    V -->|Toggle Off| X[Exclude from Render]
```
| Type | Source | Action | Description |
|---|---|---|---|
| `SILENCE` | Word gaps + waveform | CUT | Pauses in speech |
| `FALSE_START` | LLM analysis | CUT | Repeated/abandoned phrases |
| `PROFANITY` | Dictionary matcher | MUTE | Words to censor (bleep/silence) |
Users can toggle edits on/off before rendering. Only active edits are applied during render.
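Render-time filtering is then a simple predicate on the `active` flag (dict shape illustrative):

```python
def edits_for_render(edits: list[dict]) -> list[dict]:
    # Only edits the user left toggled on are applied during render.
    return [e for e in edits if e["active"]]

edits = [
    {"start_ms": 1000, "end_ms": 2500, "active": True},
    {"start_ms": 4000, "end_ms": 4800, "active": False},
]
print(len(edits_for_render(edits)))  # 1
```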
## Key Files
| Component | Location |
|---|---|
| Pipeline definition | `backend/src/workers/analysis/pipeline.py` |
| Task entry | `backend/src/workers/analysis/tasks.py` |
| Silence detection | `backend/src/workers/analysis/silence/` |
| False start detection | `backend/src/workers/analysis/false_starts/` |
| Edit creation | `backend/src/workers/analysis/edits/step.py` |
| Transcription | `backend/src/workers/analysis/transcription/step.py` |