Skip to content

Waveform Editor

The Waveform tab inside the Subtitle Editor is a full editing surface, not just a viewer. It offers a large active-cue bar above the wave, text labels inside every region, visible keyframe and quality markers, and toolbar sliders for amplitude and playback rate.

The actual subtitle file (SRT or ASS) is rewritten only when you press Save in the modal — every drag and shortcut just stages an in-memory edit, so you can experiment freely.

┌─────────────────────────────────────────────────────────────┐
│ Cue 23 / 471 00:01:23.400 → 00:01:25.800 (2.400 s) │
│ Sie haben mich nicht ernst genommen. │ ← Active cue bar (A)
│ Aber das war ein Fehler. │
├─────────────────────────────────────────────────────────────┤
│ 0 5 10 15 20 25 … │ ← Sticky timeline (G)
│ ████░ ░░░ ████ ░░ █████ ░░░░ ████ │ ← Waveform with cue regions
│ "Hallo" "Du da?" "Stop!" "Ja, …" "Hört mich an." │ ← Region labels (B)
│ ╿ ╿ ╿ ╿ ╿ ╿ │ ← Keyframe markers (C, opt-in)
│ ▂▂▂▂▂▂▂▂▂▂▂▂▂▂ ▒▒ ▂▂▂▂▂▂▂ ▓▓ ▂▂▂▂▂▂▂▂▂▂▂ │ ← Gap/overlap markers (F)
└─────────────────────────────────────────────────────────────┘
  1. Open any cached subtitle from Library → Movie/Episode detail.
  2. Click Edit on a sidecar pill — the Subtitle Editor opens.
  3. Switch to the Waveform tab. The audio track is extracted on demand (cached per audio stream), so the first open per video may take a few seconds.

If the source video has multiple audio tracks (Japanese / English / commentary), a track picker shows up next to the toolbar — switching re-extracts and renders the chosen track.

  1. Click a cue in the Cues tab to select it. The selected region highlights on the waveform.
  2. Drag the region edges to retime start/end. The drag-end commit snaps to the closest anchor (see below) and enforces a minimum cue duration of 80 ms.
  3. Drag the region body to shift both edges by the same delta — useful for nudging a whole line without changing its length.
  4. Use the L/R click map for fast set:
    • Left click anywhere on the waveform → sets the selected cue’s start to that time (snapped).
    • Right click → sets the end.

The header pill on the cue turns “edited” once you’ve staged a change; Save writes it back to disk.

Drag-end and click commits use a three-pool snap algorithm: the closest anchor inside its tolerance wins; ties favour keyframes, then scenes, then neighbouring cues.

PoolDefault toleranceSource
Keyframes150 msffprobe -skip_frame nokey against the video
Scene changes200 msPySceneDetect (lazy-imported; install scenedetect to enable)
Neighbouring cue boundaries80 msThe currently parsed subtitle file

Scene markers paint as thin amber lines on the waveform. The “N scene cuts” status pill confirms how many were detected; if the host doesn’t have scenedetect installed, the pool stays empty and the algorithm falls back to keyframes + neighbours.

The editor follows Aegisub’s Medusa-style layout. Press ? at any time to open a shortcut overlay listing every action.

GroupKeysAction
PlaybackSpacePlay / pause
TimingS / DSet start / end at playhead
TimingF / GSplit at cursor / merge with next cue
Navigation / Seek 100 ms back / forward
NavigationShift+← / Shift+→Seek 1 s back / forward
Navigation / Select previous / next cue
Zoom+ / -Zoom in / out (multiplicative)
Help?Open this overlay

Shortcuts are intentionally suspended while the help modal is open so the close key never collides with another action.

ControlPurpose
Play/PauseToggles playback. Same as Space.
Edit / LockedDrag-edit on/off. Disabled in read-only mode.
ShortcutsOpens the shortcut overlay.
SpectrogramLayers a spectrogram view (BSD-3-Clause WaveSurfer plugin). Persisted in localStorage.
ScrubPlays a tiny audio window (~30 Hz throttled) while you drag, so you can hear what you’re snapping to.
Keyframes (v0.84+)Toggles thin teal vertical lines at every video keyframe so you can see the snap targets you’re aligning to. Off by default — keyframes are dense (~one per 0.5 s) and clutter the wave. Persisted.
Zoom slider (horizontal)1–50 px/sec. Linear slider; the +/- keys do exponential steps.
Amplitude zoom slider (vertical) (v0.84+)1×–5× scaling of the wave’s barHeight. Crucial for editing quiet dialogue — without it whispers look flat. Persisted.
Playback rate slider (v0.84+)0.5×–2.0× with preservesPitch=true, so slowing dialogue down doesn’t make it sound chipmunk-ish. Aegisub-style audition workflow. Persisted.
Auto-centerKeeps the selected cue in view as you switch cues.
Audio trackHidden when the video has only one audio stream.

Two coloured bars along the bottom 6 px of the wave flag adjacent-cue defects:

  • Red bar — Two cues overlap (next.start < prev.end). The bar spans the overlap interval; hover for Overlap: cues N ↔ N+1. Most players render overlapped cues unpredictably; the editor should fix these before save.
  • Amber bar — Tight gap (default < 80 ms). Spans the gap; hover for Tight gap (<80 ms): cues N → N+1. Whether you fix or keep depends on the source — fast dialogue often demands tight chaining.

Both layers are always on — they’re the kind of quality signal you want to see permanently. To suppress them in a per-component config, set showGapOverlapMarkers={false} on WaveformEditor (no UI toggle shipped because the bars are intentionally cheap).

When the active cue is an ASS karaoke line (\k<cs>, \K<cs>, \kf<cs>, \ko<cs> overrides), the editor paints a thin purple tick per syllable on top of the waveform. This is display-only. Sublarr deliberately does not retime karaoke — that workflow stays in Aegisub.

  • Read-only fall-back: if your install doesn’t grant write access to the subtitle file, the Waveform tab still loads but the Edit/Locked toggle stays locked.
  • Scene detection is best-effort: PySceneDetect is lazy-imported. If it’s missing you’ll just see no scene markers — no warning, no failure. To enable, install the optional dep on the host or rebuild the Docker image with pip install scenedetect.
  • Per-track caching: Audio extraction caches per (video_path, mtime, track_index). Switching tracks repeatedly during a session is cheap.