
Timeline Coordinate Systems

Sapari uses two coordinate systems for positioning edits and captions on the timeline. Understanding the distinction is critical when working with the frontend.

Main-Video-Relative Time (Backend)

The backend stores all edit positions relative to the main video, where 0 = the first frame of the uploaded video. This is the canonical coordinate system — it never changes regardless of what intro/outro assets exist.

```
Main video: [0ms ────────────────────── 60000ms]
Edit at:     [5000ms ── 8000ms]   (silence removal)
```

All Edit records in the database use this coordinate system. The start_ms and end_ms fields always reference the main video timeline.

For multi-clip projects, the "main video" is the concatenation of all Clip rows in display_order, and edit.end_ms is bounded by SUM(ClipFile.duration_ms) across the project's clips. EditService.create and EditService.update enforce end_ms ≤ total via clip.utils.get_project_total_duration_ms. That helper's docstring covers how the ready, not-ready, and empty states are handled. In short: edits on a project with no clips are rejected, edits on a project where any clip is still processing skip the check, and the bulk-insert worker path is intentionally unvalidated (trusted caller).

Two exceptions bypass the ceiling. Both stem from the insert mechanism extending output_duration_ms at render time (workers/render/steps.py adds total_insert_ms):

(1) INSERT-mode asset edits. These are bounded instead by their raw duration (end_ms - start_ms ≤ MAX_VIDEO_DURATION_MS), as a defense against deferred exploits.

(2) Non-INSERT asset edits positioned inside an insert (inside_insert_edit_id set). Anchor mode positions them via insert_offset_ms, and the renderer intersects them against the host insert's output range. Merge mode encodes start_ms = splice_point + offset_within_insert so the visual cursor lands at the user's drop point; this legitimately puts end_ms > total, because shift_edits_for_inserts accounts for the host's expansion.

All other asset edits (OVERLAY, REPLACE, and NONE without inside_insert_edit_id) follow the same ceiling as non-asset edits, because the renderer doesn't extend the output for them. Any visible truncation is recorded in the data (clamped at submit) rather than handled at render time.
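The ceiling rules above can be condensed into one decision function. This is a minimal TypeScript sketch for illustration only; the real checks live in Python in EditService.create/EditService.update and clip.utils.get_project_total_duration_ms. The names EditBounds, ProjectTotal, assetMode and validateEditBounds are hypothetical, MAX_VIDEO_DURATION_MS is passed in as a parameter because its value isn't given here, and the order of the checks is an assumption.

```typescript
// Hypothetical shapes: only start_ms, end_ms and inside_insert_edit_id are real field names.
type AssetMode = "INSERT" | "OVERLAY" | "REPLACE" | "NONE";

interface EditBounds {
  start_ms: number;
  end_ms: number;
  assetMode?: AssetMode; // undefined for non-asset edits (assumed representation)
  inside_insert_edit_id?: string | null;
}

// Stand-in for the three states get_project_total_duration_ms distinguishes.
type ProjectTotal =
  | { kind: "empty" } // project has no clips
  | { kind: "processing" } // at least one clip is not ready yet
  | { kind: "ready"; totalMs: number }; // SUM(ClipFile.duration_ms)

// Returns an error message, or null when the edit is accepted.
function validateEditBounds(
  edit: EditBounds,
  project: ProjectTotal,
  maxVideoDurationMs: number, // MAX_VIDEO_DURATION_MS, value not assumed here
): string | null {
  if (project.kind === "empty") return "project has no clips";

  // Exception 1: INSERT assets extend the output, so only their raw length is capped.
  if (edit.assetMode === "INSERT") {
    return edit.end_ms - edit.start_ms <= maxVideoDurationMs
      ? null
      : "insert exceeds MAX_VIDEO_DURATION_MS";
  }

  // Exception 2: non-INSERT assets anchored inside an insert may end past the main video.
  if (edit.assetMode !== undefined && edit.inside_insert_edit_id) return null;

  // Some clip still processing: the total is unknown, so the check is skipped.
  if (project.kind === "processing") return null;

  // Everything else (non-asset edits, OVERLAY/REPLACE/NONE) obeys the ceiling.
  return edit.end_ms <= project.totalMs ? null : "end_ms exceeds project duration";
}
```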

Body-drag of OVERLAY/REPLACE asset edits conditionally auto-extends end_ms. On move, Dashboard.handleAdjustEdit recomputes end_ms = min(newStart + naturalVisible, totalDurationMs), but only when the asset is at or beyond its natural visible length (mainDuration >= naturalVisibleMs - 1). The visible portion then grows when the user drags left into more available space, matching the "drag-left = less cropped" mental model. If the user has explicitly trimmed the asset (mainDuration < naturalVisibleMs), the trimmed duration is preserved on move; without that guard, moving a trimmed asset would resize it back to its full source duration. INSERT edits preserve duration on move (only the splice point shifts), and so do non-asset edits.

The auto-extend path is the only move-time clamp to totalDurationMs. The default path, mainEnd = mainStart + mainDuration, is intentionally unclamped: a non-insert asset dropped INSIDE an existing INSERT region is encoded as start_ms = splice + offset_into_insert and can legitimately have start_ms > totalDurationMs. A universal clamp there would produce end_ms <= start_ms, trip the renderer's width-≤0 guard, and make the asset silently vanish on release (issue #234).
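The move-time rule can be expressed as a pure function. This is an illustrative TypeScript sketch, not Dashboard.handleAdjustEdit itself: the EditKind union and the endAfterMove name and signature are hypothetical, and naturalVisibleMs is taken as an input because how it is derived isn't described here.

```typescript
// Hypothetical classification of the edit being dragged.
type EditKind = "INSERT" | "OVERLAY" | "REPLACE" | "NONE" | "non-asset";

// Computes end_ms after a body-drag moves the edit's start to newStart.
function endAfterMove(
  kind: EditKind,
  mainStart: number, // start_ms before the move
  mainEnd: number, // end_ms before the move
  newStart: number, // start_ms after the move
  naturalVisibleMs: number, // asset's untrimmed visible length
  totalDurationMs: number, // main-video total
): number {
  const mainDuration = mainEnd - mainStart;
  // Auto-extend only OVERLAY/REPLACE assets that are not explicitly trimmed
  // (the "- 1" gives a 1 ms tolerance).
  const autoExtend =
    (kind === "OVERLAY" || kind === "REPLACE") &&
    mainDuration >= naturalVisibleMs - 1;
  if (autoExtend) {
    // The only move-time clamp to totalDurationMs.
    return Math.min(newStart + naturalVisibleMs, totalDurationMs);
  }
  // Default path: keep the duration, deliberately unclamped, because an asset
  // inside an INSERT can legitimately have start_ms > totalDurationMs.
  return newStart + mainDuration;
}
```

Note that the final branch never returns a value at or below newStart for a positive duration, which is exactly what keeps the renderer's width-≤0 guard from firing.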

Creating a non-insert asset INSIDE an existing INSERT region: Dashboard.handleAssetInsert mirrors the drag-into-insert flow when the picker fires at a playhead inside an INSERT. The asset is anchored via two first-class schema fields: inside_insert_edit_id (FK → edit.uuid) and insert_offset_ms (offset into that insert's main-time range). The renderer reads these in apply_inside_insert_anchors and schedules the asset at <anchored insert's output_start_ms> + insert_offset_ms, bypassing the main-time shift entirely. A denormalized start_ms = splice_point + offset_into_insert is also written so that legacy reload-time heuristics keep working.

The asset's insert_overlap_modes[anchor_id] selects one of three modes per insert: 'merge' (the default; plays for the full visible duration and can overflow into the post-insert main video, the soundtracking case), 'split' (the asset is split around the insert and the insert region is skipped), or 'anchor' (the asset is clipped to the host insert's output_end_ms, with the overflow rendered dimmed in the editor; the "only play during the insert" case). The common case is an audio asset placed under an inserted video clip to soundtrack it (merge). INSERT-inside-INSERT is not representable and bails silently; intro/outro fixed regions also bail silently, since there is no editId to overlap with. Before these schema fields existed, the asset picker was a silent no-op inside inserts and the asset appeared to "disappear" at export time (issue #234).
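The anchored scheduling above reduces to a little arithmetic. A minimal sketch, assuming the host insert's output range is already known; InsertOutputRange, scheduleAnchoredAsset and denormalizedStartMs are hypothetical names (the real logic is in the renderer's apply_inside_insert_anchors), and 'split' mode is left out because it splits the asset rather than scheduling a single range.

```typescript
type AnchoredMode = "merge" | "anchor";

interface InsertOutputRange {
  output_start_ms: number; // where the host insert begins in the rendered output
  output_end_ms: number; // where it ends
}

interface ScheduledRange {
  startMs: number;
  endMs: number;
}

// Schedules an asset anchored inside an insert, in output time.
function scheduleAnchoredAsset(
  host: InsertOutputRange,
  insertOffsetMs: number, // insert_offset_ms
  visibleMs: number, // the asset's visible duration
  mode: AnchoredMode,
): ScheduledRange {
  // Position is relative to the host insert, so no main-time shift applies.
  const startMs = host.output_start_ms + insertOffsetMs;
  const naturalEnd = startMs + visibleMs;
  // 'merge' may overflow past the insert; 'anchor' is clipped to the insert's end.
  const endMs = mode === "anchor" ? Math.min(naturalEnd, host.output_end_ms) : naturalEnd;
  return { startMs, endMs };
}

// Denormalized start_ms, kept only for legacy reload-time heuristics.
function denormalizedStartMs(splicePointMs: number, insertOffsetMs: number): number {
  return splicePointMs + insertOffsetMs;
}
```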

Deleting an INSERT with anchored children: EditService.delete runs _repoint_anchored_children before the row is removed — anchored children are moved to the splice point (start_ms = deleted_insert.start_ms, preserving each child's visible duration), their inside_insert_edit_id + insert_offset_ms are cleared, and the deleted insert's UUID is stripped from each child's insert_overlap_modes. The FK has ON DELETE SET NULL as a safety net, but the renderer can't schedule an asset with a NULL anchor and a stale offset — the service-layer repoint is the primary path.
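The repoint step can be sketched as a pure transformation over the affected rows. This TypeScript version only illustrates what _repoint_anchored_children is described as doing; the AnchoredChild and DeletedInsert shapes are a hypothetical subset of the Edit row, and the sketch assumes a child's visible duration is end_ms - start_ms.

```typescript
type OverlapMode = "merge" | "split" | "anchor";

// Hypothetical subset of the Edit row relevant to anchoring.
interface AnchoredChild {
  uuid: string;
  start_ms: number;
  end_ms: number;
  inside_insert_edit_id: string | null;
  insert_offset_ms: number | null;
  insert_overlap_modes: Record<string, OverlapMode>;
}

interface DeletedInsert {
  uuid: string;
  start_ms: number; // the splice point
}

function repointAnchoredChildren(
  children: AnchoredChild[],
  deleted: DeletedInsert,
): AnchoredChild[] {
  return children.map((child) => {
    // Children anchored elsewhere (or not at all) are left untouched.
    if (child.inside_insert_edit_id !== deleted.uuid) return child;
    const visibleMs = child.end_ms - child.start_ms;
    const modes = { ...child.insert_overlap_modes };
    delete modes[deleted.uuid]; // strip the deleted insert's entry
    return {
      ...child,
      start_ms: deleted.start_ms, // move to the splice point
      end_ms: deleted.start_ms + visibleMs, // keep the visible duration
      inside_insert_edit_id: null,
      insert_offset_ms: null,
      insert_overlap_modes: modes,
    };
  });
}
```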

Effective Timeline (Frontend)

When an intro clip exists, the frontend needs to display it before the main video on the timeline. This shifts everything forward by introOffsetMs (the intro's duration).

```
Intro:      [0ms ── 3000ms]
Main video: [3000ms ────────────────────── 63000ms]
Edit at:     [8000ms ── 11000ms]  (same edit, shifted by 3000ms)
Outro:      [63000ms ── 66000ms]
```

The effective timeline is what users see and interact with. It includes intro, main video, and outro as one continuous strip.

The Shift: introOffsetMs

```ts
// intro: the project's intro clip, if one exists
const introOffsetMs = intro ? intro.duration_ms : 0;
```

Converting Backend → Frontend (Display)

Add the offset to show edits at correct positions on the effective timeline:

```ts
const displayStartMs = edit.start_ms + introOffsetMs;
const displayEndMs = edit.end_ms + introOffsetMs;
```

Converting Frontend → Backend (Save)

Subtract the offset before sending to the API:

```ts
const backendStartMs = displayStartMs - introOffsetMs;
const backendEndMs = displayEndMs - introOffsetMs;
```
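Both directions can live in one pair of helpers so the offset is always applied symmetrically. A minimal sketch; IntroClip, MsRange, computeIntroOffsetMs, toDisplayRange and toBackendRange are hypothetical names, not existing Dashboard.tsx functions.

```typescript
interface IntroClip {
  duration_ms: number;
}

interface MsRange {
  start_ms: number;
  end_ms: number;
}

// introOffsetMs is the intro's duration when an intro exists, else 0.
function computeIntroOffsetMs(intro: IntroClip | null | undefined): number {
  return intro ? intro.duration_ms : 0;
}

// Backend (main-video-relative) → frontend (effective timeline), for display.
function toDisplayRange(r: MsRange, introOffsetMs: number): MsRange {
  return { start_ms: r.start_ms + introOffsetMs, end_ms: r.end_ms + introOffsetMs };
}

// Frontend (effective timeline) → backend, e.g. after a drag and before saving.
function toBackendRange(r: MsRange, introOffsetMs: number): MsRange {
  return { start_ms: r.start_ms - introOffsetMs, end_ms: r.end_ms - introOffsetMs };
}
```

With a 3000 ms intro, the edit stored at 5000 to 8000 ms displays at 8000 to 11000 ms, matching the effective-timeline diagram, and converting back yields the original range.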

Display Data Pattern

All coordinate conversion happens once in Dashboard.tsx, producing display* versions of backend data:

| Raw (backend) | Display (frontend) | How |
|---|---|---|
| edits | displayEdits | Shift non-fixed edits by introOffsetMs; add fixed-position assets at their correct effective positions |
| captionLines | displayCaptionLines | Shift timestamps by introOffsetMs |

Components receive the display* versions and never need to do offset math themselves.
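Because every display* value is derived in one place, a single generic shift helper can serve both edits and captions. A sketch assuming caption lines carry start_ms/end_ms fields; the real captionLines shape may differ (for example, per-word timestamps would need shifting too), and Timed, CaptionLine and shiftForDisplay are hypothetical names. Fixed-position edits must still be filtered out before calling it.

```typescript
interface Timed {
  start_ms: number;
  end_ms: number;
}

// Returns shifted copies on the effective timeline; the input is not mutated.
// Do not pass fixed-position edits through this helper.
function shiftForDisplay<T extends Timed>(items: readonly T[], introOffsetMs: number): T[] {
  return items.map((item) => ({
    ...item,
    start_ms: item.start_ms + introOffsetMs,
    end_ms: item.end_ms + introOffsetMs,
  }));
}

// Hypothetical caption line shape, for illustration only.
interface CaptionLine extends Timed {
  text: string;
}
```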

Fixed-Position Assets

Intro, outro, watermark, and background audio have fixedPosition set. These assets must not be shifted like normal edits — they have their own positioning logic:

| Asset | Effective timeline position |
|---|---|
| Intro | 0 → intro.duration_ms |
| Outro | effectiveTotalDurationMs - outro.duration_ms → effectiveTotalDurationMs |
| Watermark | 0 → effectiveTotalDurationMs (spans entire timeline) |
| Background Audio | 0 → effectiveTotalDurationMs |
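The positions follow from the effective timeline's layout, in which effectiveTotalDurationMs is the intro, main video, and outro laid end to end, as in the effective-timeline diagram. A sketch with hypothetical names (FixedKind, TimelineDurations, fixedAssetRange); a missing intro or outro is treated as zero-length.

```typescript
type FixedKind = "intro" | "outro" | "watermark" | "backgroundAudio";

interface TimelineDurations {
  introMs: number; // 0 when there is no intro
  mainMs: number; // main-video total
  outroMs: number; // 0 when there is no outro
}

function effectiveTotalDurationMs(d: TimelineDurations): number {
  return d.introMs + d.mainMs + d.outroMs;
}

// Effective-timeline range for a fixed-position asset.
function fixedAssetRange(kind: FixedKind, d: TimelineDurations): { start_ms: number; end_ms: number } {
  const total = effectiveTotalDurationMs(d);
  switch (kind) {
    case "intro":
      return { start_ms: 0, end_ms: d.introMs };
    case "outro":
      return { start_ms: total - d.outroMs, end_ms: total };
    case "watermark":
    case "backgroundAudio":
      // Both span the entire effective timeline.
      return { start_ms: 0, end_ms: total };
  }
}
```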

The displayEdits computation filters out fixed-position edits from the normal shift, then re-adds them at the correct positions:

```ts
// 1. Filter out fixed-position edits; they are positioned separately in step 3
const shiftableEdits = edits.filter(e => !e.fixedPosition);

// 2. Shift normal edits from main-video-relative to effective-timeline time
const shifted = shiftableEdits.map(e => ({
  ...e,
  start_ms: e.start_ms + introOffsetMs,
  end_ms: e.end_ms + introOffsetMs,
}));

// 3. Add fixed assets at their effective-timeline positions
//    (remaining edit fields elided)
const introEdit = { start_ms: 0, end_ms: introDuration /* , ... */ };
const outroEdit = { start_ms: effectiveTotalDurationMs - outroDuration, end_ms: effectiveTotalDurationMs /* , ... */ };
const watermarkEdit = { start_ms: 0, end_ms: effectiveTotalDurationMs /* , ... */ };
```

Per-Asset Crop Reframe (issue #235)

REPLACE and INSERT video/image assets default to letterbox when their source aspect differs from the project's target. Users can opt into cover-crop reframe per edit via four fields on the Edit row: asset_crop_enabled, asset_crop_zoom (∈ [1.0, 10.0]), asset_crop_pan_x (∈ [-1.0, 1.0]), asset_crop_pan_y (∈ [-1.0, 1.0]).

Encoding decision: UI-state, not render-state. The persisted form is zoom + pan, not a frozen CropRegion {x, y, w, h}. Rationale: if the user later changes the project aspect ratio, zoom/pan adapts automatically (the renderer recomputes the crop window against the new target aspect, the same way the main-video crop already does at Dashboard.tsx:752). A frozen region would stay matched to the old target aspect and produce a fresh letterbox on the new one.

Renderer chain. When asset_crop_enabled is set, build_video_replace_segments calls _maybe_crop_chain(edit, target_w, target_h) for each asset segment. That helper returns None when crop is disabled, the target dimensions are unknown, or the asset's source dimensions (AssetFile.width/height) are missing; the renderer then takes the existing letterbox path.

When a chain is built, it is scale=<cover>,crop=<region>,scale=<exact target>,setsar=1. The cover scale uses FFmpeg's max(...) (the inverse of letterbox's min(...)); the crop region comes from _compute_crop_region (a Python port of frontend/shared/lib/cropUtils.ts:computeCropRegion); and the final exact-target rescale absorbs the ±1 px rounding introduced by build_crop_filter's even-pixel constraint. _build_asset_video_chain reads seg.crop_chain and injects it in place of the uniform letterbox scale_part; the same shape works for both the video-input and image-input branches.
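The cover-crop geometry can be illustrated with a simplified TypeScript sketch. It is not a copy of computeCropRegion or _compute_crop_region, whose exact formula isn't reproduced here. It assumes pan -1/+1 maps to the left/top and right/bottom edges of the available slack, crops directly in source pixels (rather than the renderer's cover-scale, crop, rescale order), and rounds width/height down to even pixels. Like _maybe_crop_chain, it returns null when crop is disabled or any dimension is unknown, which is the letterbox fallback that also covers legacy assets with NULL AssetFile.width/height. CropSettings, coverCropRegion and cropFilterChain are hypothetical names; crop=w:h:x:y, scale=w:h and setsar=1 are standard FFmpeg filters.

```typescript
interface CropSettings {
  enabled: boolean; // asset_crop_enabled
  zoom: number; // asset_crop_zoom, documented range [1.0, 10.0]
  panX: number; // asset_crop_pan_x, [-1.0, 1.0]
  panY: number; // asset_crop_pan_y, [-1.0, 1.0]
}

interface CropRegion { x: number; y: number; w: number; h: number; }

const clamp = (v: number, lo: number, hi: number): number => Math.min(hi, Math.max(lo, v));
const evenFloor = (v: number): number => Math.floor(v / 2) * 2;

// Returns null (meaning: letterbox) when crop can't or shouldn't be applied.
function coverCropRegion(
  crop: CropSettings,
  srcW: number | null, srcH: number | null,
  targetW: number | null, targetH: number | null,
): CropRegion | null {
  if (!crop.enabled || !srcW || !srcH || !targetW || !targetH) return null;
  const zoom = clamp(crop.zoom, 1, 10);
  const targetAspect = targetW / targetH;
  // Largest target-aspect window that fits inside the source (zoom = 1).
  const baseW = srcW / srcH > targetAspect ? srcH * targetAspect : srcW;
  const baseH = baseW / targetAspect;
  const w = evenFloor(baseW / zoom);
  const h = evenFloor(baseH / zoom);
  // Pan slides the window across the leftover slack: -1 = left/top, 0 = centered, +1 = right/bottom.
  const x = Math.round(((srcW - w) / 2) * (1 + clamp(crop.panX, -1, 1)));
  const y = Math.round(((srcH - h) / 2) * (1 + clamp(crop.panY, -1, 1)));
  return { x, y, w, h };
}

// Crop, then rescale to the exact target to absorb the even-pixel rounding.
function cropFilterChain(r: CropRegion, targetW: number, targetH: number): string {
  return `crop=${r.w}:${r.h}:${r.x}:${r.y},scale=${targetW}:${targetH},setsar=1`;
}
```

Because the region is computed from the target dimensions at render time, changing the project aspect ratio changes the window without touching the stored zoom/pan, which is the point of the UI-state encoding.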

Legacy assets (uploaded before the asset-dim probe migration a4c9e2b6f085) have NULL AssetFile.width/height. The UI shows the crop toggle as disabled with a re-upload tooltip, and the renderer's _maybe_crop_chain returns None, falling back to letterbox. There is no silent corruption: the data round-trips and simply renders without crop.

Frontend preview parity. VideoWindow splits its render tree into two sibling layers inside the outer crop container: a cropped layer holding the main clips (and hidden audio elements) under the project main-video crop transform, and an uncropped asset layer holding intro/outro/INSERT clips plus REPLACE B-roll. The split mirrors the renderer, where the main-video crop applies only to MAIN segments, not to asset segments. Within the asset layer, each asset clip independently swaps object-contain → object-cover and applies computeCropTransform(zoom, panX, panY) when its own assetCropEnabled is true (otherwise it letterboxes). Before the split, intro/outro/INSERT clips inherited the main-video crop and compounded it with their own per-asset crop, so the zoom visibly doubled. After the split, the two crops compose the same way as in the export: independently, per layer.
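The layer split boils down to a routing rule plus a per-clip fit choice. A hypothetical sketch: the ClipRole union, layerFor and assetObjectFit are illustrative names rather than VideoWindow's actual props; only the object-contain/object-cover class names come from the description above.

```typescript
type ClipRole = "main" | "mainAudio" | "intro" | "outro" | "insert" | "replaceBroll";
type Layer = "cropped" | "asset";

// Main clips (and their hidden audio) get the project main-video crop;
// every asset clip goes to the uncropped layer, mirroring the renderer.
function layerFor(role: ClipRole): Layer {
  return role === "main" || role === "mainAudio" ? "cropped" : "asset";
}

// Inside the asset layer, each clip chooses its own fit independently.
function assetObjectFit(assetCropEnabled: boolean): "object-contain" | "object-cover" {
  return assetCropEnabled ? "object-cover" : "object-contain";
}
```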

Swap-mode reset. Dashboard.handleAssetInserted explicitly resets all four crop fields when the user replaces an asset under an existing edit — the persisted zoom/pan was framed against the old asset's source aspect, so carrying it over would produce nonsense framing on a different-aspect asset.
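The swap-mode reset amounts to writing a fixed patch over the four crop fields. A sketch; the default values (crop off, zoom 1.0, centered pan) are an assumption inferred from the documented ranges, and AssetCropFields, CROP_RESET and withCropReset are hypothetical names.

```typescript
interface AssetCropFields {
  asset_crop_enabled: boolean;
  asset_crop_zoom: number;
  asset_crop_pan_x: number;
  asset_crop_pan_y: number;
}

// Assumed neutral values: letterbox, no zoom, centered.
const CROP_RESET: AssetCropFields = {
  asset_crop_enabled: false,
  asset_crop_zoom: 1.0,
  asset_crop_pan_x: 0,
  asset_crop_pan_y: 0,
};

// Applied when the asset under an existing edit is swapped; other fields are kept.
function withCropReset<T extends AssetCropFields>(edit: T): T {
  return { ...edit, ...CROP_RESET };
}
```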

Common Gotchas

Shifting fixed-position edits: Raw edits from the backend include intro/outro/watermark. If you shift ALL edits by introOffsetMs, fixed assets end up at the wrong positions. Always filter them out first.

Watermark span: When intro/outro exist, the watermark needs start_ms: 0, end_ms: effectiveTotalDurationMs — not just the main video portion.

Drag operations: When users drag edits on the timeline, the drag position is in effective-timeline coordinates. Subtract introOffsetMs before calling updateEditLocally or saving to the backend.

Seeking to edits: When seeking the video player to an edit from raw edits state, add introOffsetMs to get the correct effective-timeline position.

