
Settings — Transcription

Transcription is the fallback when none of the configured subtitle providers has a match: Sublarr extracts the audio track, runs it through a speech-to-text model, and writes the result as a subtitle file. It’s compute-heavy and accuracy varies by language and audio quality, but it’s the only way to subtitle obscure content where no community release exists.

| Backend | Where it runs | When to pick it |
| --- | --- | --- |
| Whisper (built-in) | Inside Sublarr's container, CPU or GPU | Self-contained, no extra service. Slow on CPU. |
| WhisperAI / OpenAI Whisper API | OpenAI's hosted endpoint | Fast, paid per minute. No GPU needed locally. |
| Subgen | Self-hosted Subgen container | Best for batch / GPU users; keeps CPU and GPU pressure off Sublarr. |

The bundled Whisper runs faster-whisper against a model file you configure. Models are auto-downloaded on first use to /config/whisper/models/.

| Setting | Default | Values | Effect |
| --- | --- | --- | --- |
| Enabled | off | toggle | Master switch for the built-in backend. |
| Model | base | tiny / base / small / medium / large-v3 | Larger = more accurate but slower and more memory. |
| Device | auto | auto / cpu / cuda | auto picks GPU when available. |
| Compute type | int8 | int8 / float16 / float32 | Quantisation. int8 on CPU is fast; float16 on GPU is the quality sweet spot. |
| Language hint | empty | ISO 639-1 code | Biases the model toward a known source language. Empty = auto-detect. |
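As a rough sketch of how these settings map onto faster-whisper (the `WhisperModel` constructor and `transcribe` call are the real faster-whisper API; the `settings` dict and helper are illustrative, not Sublarr's actual code):

```python
def whisper_kwargs(settings: dict) -> dict:
    """Translate the settings table above into
    faster_whisper.WhisperModel constructor arguments."""
    return {
        "model_size_or_path": settings.get("model", "base"),
        "device": settings.get("device", "auto"),          # auto / cpu / cuda
        "compute_type": settings.get("compute_type", "int8"),
        "download_root": "/config/whisper/models",         # auto-download target
    }

kwargs = whisper_kwargs({"model": "small", "device": "cuda",
                         "compute_type": "float16"})
# With faster-whisper installed, the backend then does roughly:
# model = faster_whisper.WhisperModel(**kwargs)
# segments, info = model.transcribe("audio.wav", language="ja")
print(kwargs["compute_type"])  # float16
```

The language hint is passed through to `transcribe(language=...)`; leaving it empty lets the model auto-detect.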
OpenAI's Whisper API runs the large-v2 model on OpenAI's infrastructure. Quality is excellent; cost is per minute of audio.

| Setting | Effect |
| --- | --- |
| Enabled | Master switch. |
| API key | OpenAI API key with Whisper access. |
| Model | Default whisper-1. |
| Price (per min) | Default $0.006. Update it if OpenAI changes pricing; verify against your invoice. |
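Sublarr makes the API call for you; for reference, the equivalent raw request against OpenAI's transcription endpoint looks roughly like this (the filename is a placeholder, and `OPENAI_API_KEY` is assumed to be set):

```shell
# One-off transcription request; response_format=srt returns a subtitle file.
curl https://api.openai.com/v1/audio/transcriptions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -F file=@episode-audio.mp3 \
  -F model=whisper-1 \
  -F response_format=srt
```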

Subgen is a separate self-hosted service for Whisper transcription with a queue. Recommended when you have a GPU and want to keep transcription off Sublarr’s process.

| Setting | Effect |
| --- | --- |
| Enabled | Master switch. |
| Subgen URL | Base URL of the Subgen service. |
| API key | Auth token (when configured on Subgen). |
| Model | Override Subgen's default; usually leave empty. |

Sublarr offers transcription in three places:

| Trigger | When |
| --- | --- |
| Manual: Library → Sidecar → Transcribe | One-off; you click. |
| Auto: when no provider returns a match | Configurable per profile (Settings → Subtitles → Languages → Transcribe on miss). |
| Batch: Wanted → Transcribe selected | Same as manual but for many items. |
| Audio quality | Whisper medium accuracy | large-v3 accuracy |
| --- | --- | --- |
| Studio dialogue, single speaker | ~95% | ~98% |
| TV / movie dialogue, mixed | ~85% | ~92% |
| Anime with overlapping dialogue, music | ~65–80% | ~80–88% |
| Noisy field recording | ~50% | ~65% |

Numbers are approximate accuracies (1 − word error rate, WER); treat them as ballparks, not guarantees.

For OpenAI’s Whisper API at $0.006 / min:

| Content | Approx. cost |
| --- | --- |
| 24-min anime episode | ~$0.14 |
| 50-min TV episode | ~$0.30 |
| 110-min movie | ~$0.66 |
| 12-episode anime season (24 min each) | ~$1.73 |

Self-hosted Whisper or Subgen has no per-minute cost — only your hardware time.
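The table above is plain per-minute arithmetic; a back-of-envelope helper (assuming the $0.006/min rate, which you should verify against current pricing):

```python
PRICE_PER_MIN = 0.006  # USD; assumed rate, check OpenAI's current pricing

def api_cost(minutes: float) -> float:
    """Cost in USD for transcribing `minutes` of audio."""
    return round(minutes * PRICE_PER_MIN, 2)

for label, mins in [("24-min anime episode", 24),
                    ("50-min TV episode", 50),
                    ("110-min movie", 110),
                    ("12-episode anime season", 12 * 24)]:
    print(f"{label}: ${api_cost(mins):.2f}")
```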

The raw Whisper output is rough — long monologue lines, no cue breaks, [BLANK_AUDIO] markers. Sublarr applies a post-processing pass:

| Step | Purpose |
| --- | --- |
| Cue splitting | Insert breaks at natural pause boundaries. |
| Punctuation | Re-capitalisation and final punctuation. |
| Min cue duration | Merge cues shorter than 80 ms with the next. |
| Strip Whisper noise tags | Remove [BLANK_AUDIO], [Music], etc. |
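An illustrative sketch (not Sublarr's actual code) of two of these steps: stripping noise tags, then merging cues shorter than the 80 ms minimum into their successor.

```python
import re
from dataclasses import dataclass

# Tags Whisper emits for non-speech audio; the real list is longer.
NOISE_TAGS = re.compile(r"\[(?:BLANK_AUDIO|Music|Applause)\]")
MIN_CUE_SECONDS = 0.08  # 80 ms minimum cue duration

@dataclass
class Cue:
    start: float  # seconds
    end: float
    text: str

def postprocess(cues: list[Cue]) -> list[Cue]:
    # Strip noise tags and drop cues that end up empty.
    cleaned = []
    for c in cues:
        text = NOISE_TAGS.sub("", c.text).strip()
        if text:
            cleaned.append(Cue(c.start, c.end, text))
    # Merge too-short cues into the following cue.
    merged, i = [], 0
    while i < len(cleaned):
        c = cleaned[i]
        if (c.end - c.start) < MIN_CUE_SECONDS and i + 1 < len(cleaned):
            nxt = cleaned[i + 1]
            merged.append(Cue(c.start, nxt.end, f"{c.text} {nxt.text}"))
            i += 2
        else:
            merged.append(c)
            i += 1
    return merged
```

For example, a `[BLANK_AUDIO]` cue is dropped entirely, and a 50 ms cue reading "Hey" merges with the following "there." into one cue spanning both.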

The post-processing is shared with the regular subtitle pipeline — see Post-Processing for the full set of fixes.

In the same place as provider-downloaded subtitles: a sidecar next to the source video, named per Settings → Subtitles → Format & Naming. The provider field in Activity → History shows whisper:built-in, whisper:api, or subgen so you can audit which backend produced each subtitle.
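A minimal sketch of sidecar naming, assuming the common `video-stem.language.ext` pattern (the actual pattern is whatever you configure under Settings → Subtitles → Format & Naming):

```python
from pathlib import Path

def sidecar_path(video: str, lang: str, ext: str = "srt") -> Path:
    """Place the subtitle next to the source video,
    e.g. S01E01.mkv + 'en' -> S01E01.en.srt."""
    v = Path(video)
    return v.with_name(f"{v.stem}.{lang}.{ext}")

print(sidecar_path("/media/Show/S01E01.mkv", "en"))
```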