
Settings — Transcription

Transcription is the fallback when none of the configured subtitle providers has a match: Sublarr extracts the audio track, runs it through a speech-to-text model, and writes the result as a subtitle file. It’s compute-heavy and accuracy varies by language and audio quality, but it’s the only way to subtitle obscure content where no community release exists.

| Backend | Where it runs | When to pick it |
| --- | --- | --- |
| Whisper (built-in) | Inside Sublarr's container, CPU or GPU | Self-contained, no extra service. Slow on CPU. |
| WhisperAI / OpenAI Whisper API | OpenAI's hosted endpoint | Fast, paid per minute. No GPU needed locally. |
| Subgen | Self-hosted Subgen container | Best for batch / GPU users; keeps CPU and GPU pressure off Sublarr. |

The bundled Whisper runs faster-whisper against a model file you configure. Models are auto-downloaded on first use to /config/whisper/models/.

| Setting | Default | Values | Effect |
| --- | --- | --- | --- |
| Enabled | off | toggle | Master switch for the built-in backend. |
| Model | base | tiny / base / small / medium / large-v3 | Larger = more accurate but slower and more memory. |
| Device | auto | auto / cpu / cuda | auto picks GPU when available. |
| Compute type | int8 | int8 / float16 / float32 | Quantisation. int8 on CPU is fast; float16 on GPU is the quality sweet spot. |
| Language hint | empty | ISO 639-1 code | Biases the model toward a known source language. Empty = auto-detect. |
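As a rough sketch of how these settings map onto faster-whisper (the `WhisperModel` constructor and `transcribe` call are the real faster-whisper API; the `settings` dict and helper are illustrative, not Sublarr's actual code):

```python
def whisper_kwargs(settings: dict) -> dict:
    """Translate the settings table above into
    faster_whisper.WhisperModel constructor arguments."""
    return {
        "model_size_or_path": settings.get("model", "base"),
        "device": settings.get("device", "auto"),          # auto / cpu / cuda
        "compute_type": settings.get("compute_type", "int8"),
        "download_root": "/config/whisper/models",         # auto-download target
    }

kwargs = whisper_kwargs({"model": "small", "device": "cuda",
                         "compute_type": "float16"})
# With faster-whisper installed, the backend then does roughly:
# model = faster_whisper.WhisperModel(**kwargs)
# segments, info = model.transcribe("audio.wav", language="ja")
print(kwargs["compute_type"])  # float16
```

The language hint is passed through to `transcribe(language=...)`; leaving it empty lets the model auto-detect.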
OpenAI's Whisper API runs the large-v2 model on OpenAI's infrastructure. Quality is excellent; cost is per minute of audio.

| Setting | Effect |
| --- | --- |
| Enabled | Master switch. |
| API key | OpenAI API key with Whisper access. |
| Model | Default whisper-1. |
| Price (per min) | Default $0.006. Update it if OpenAI changes pricing; verify against your invoice. |
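Sublarr makes the API call for you; for reference, the equivalent raw request against OpenAI's transcription endpoint looks roughly like this (the filename is a placeholder, and `OPENAI_API_KEY` is assumed to be set):

```shell
# One-off transcription request; response_format=srt returns a subtitle file.
curl https://api.openai.com/v1/audio/transcriptions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -F file=@episode-audio.mp3 \
  -F model=whisper-1 \
  -F response_format=srt
```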

Subgen is a separate self-hosted service for Whisper transcription with a queue. Recommended when you have a GPU and want to keep transcription off Sublarr’s process.

| Setting | Effect |
| --- | --- |
| Enabled | Master switch. |
| Subgen URL | Base URL of the Subgen service. |
| API key | Auth token (when configured on Subgen). |
| Model | Override Subgen's default; usually leave empty. |

Sublarr offers transcription in three places:

| Trigger | When |
| --- | --- |
| Manual: Library → Sidecar → Transcribe | One-off; you click. |
| Auto: when no provider returns a match | Configurable per profile (Settings → Subtitles → Languages → Transcribe on miss). |
| Batch: Wanted → Transcribe selected | Same as manual but for many items. |
| Audio quality | Whisper medium accuracy | large-v3 accuracy |
| --- | --- | --- |
| Studio dialogue, single speaker | ~95% | ~98% |
| TV / movie dialogue, mixed | ~85% | ~92% |
| Anime with overlapping dialogue, music | ~65–80% | ~80–88% |
| Noisy field recording | ~50% | ~65% |

Numbers are approximate accuracies (1 − word error rate, WER); treat them as ballparks, not guarantees.

For OpenAI’s Whisper API at $0.006 / min:

| Content | Approx. cost |
| --- | --- |
| 24-min anime episode | ~$0.14 |
| 50-min TV episode | ~$0.30 |
| 110-min movie | ~$0.66 |
| 12-episode anime season (24 min each) | ~$1.73 |

Self-hosted Whisper or Subgen has no per-minute cost — only your hardware time.
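The table above is plain per-minute arithmetic; a back-of-envelope helper (assuming the $0.006/min rate, which you should verify against current pricing):

```python
PRICE_PER_MIN = 0.006  # USD; assumed rate, check OpenAI's current pricing

def api_cost(minutes: float) -> float:
    """Cost in USD for transcribing `minutes` of audio."""
    return round(minutes * PRICE_PER_MIN, 2)

for label, mins in [("24-min anime episode", 24),
                    ("50-min TV episode", 50),
                    ("110-min movie", 110),
                    ("12-episode anime season", 12 * 24)]:
    print(f"{label}: ${api_cost(mins):.2f}")
```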

The raw Whisper output is rough — long monologue lines, no cue breaks, [BLANK_AUDIO] markers. Sublarr applies a post-processing pass:

| Step | Purpose |
| --- | --- |
| Cue splitting | Insert breaks at natural pause boundaries. |
| Punctuation | Re-capitalisation and final punctuation. |
| Min cue duration | Merge cues shorter than 80 ms with the next. |
| Strip Whisper noise tags | Remove [BLANK_AUDIO], [Music], etc. |
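An illustrative sketch (not Sublarr's actual code) of two of these steps: stripping noise tags, then merging cues shorter than the 80 ms minimum into their successor.

```python
import re
from dataclasses import dataclass

# Tags Whisper emits for non-speech audio; the real list is longer.
NOISE_TAGS = re.compile(r"\[(?:BLANK_AUDIO|Music|Applause)\]")
MIN_CUE_SECONDS = 0.08  # 80 ms minimum cue duration

@dataclass
class Cue:
    start: float  # seconds
    end: float
    text: str

def postprocess(cues: list[Cue]) -> list[Cue]:
    # Strip noise tags and drop cues that end up empty.
    cleaned = []
    for c in cues:
        text = NOISE_TAGS.sub("", c.text).strip()
        if text:
            cleaned.append(Cue(c.start, c.end, text))
    # Merge too-short cues into the following cue.
    merged, i = [], 0
    while i < len(cleaned):
        c = cleaned[i]
        if (c.end - c.start) < MIN_CUE_SECONDS and i + 1 < len(cleaned):
            nxt = cleaned[i + 1]
            merged.append(Cue(c.start, nxt.end, f"{c.text} {nxt.text}"))
            i += 2
        else:
            merged.append(c)
            i += 1
    return merged
```

For example, a `[BLANK_AUDIO]` cue is dropped entirely, and a 50 ms cue reading "Hey" merges with the following "there." into one cue spanning both.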

The post-processing is shared with the regular subtitle pipeline — see Post-Processing for the full set of fixes.

In the same place as provider-downloaded subtitles: a sidecar next to the source video, named per Settings → Subtitles → Format & Naming. The provider field in Activity → History shows whisper:built-in, whisper:api, or subgen so you can audit which backend produced each subtitle.
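A minimal sketch of sidecar naming, assuming the common `video-stem.language.ext` pattern (the actual pattern is whatever you configure under Settings → Subtitles → Format & Naming):

```python
from pathlib import Path

def sidecar_path(video: str, lang: str, ext: str = "srt") -> Path:
    """Place the subtitle next to the source video,
    e.g. S01E01.mkv + 'en' -> S01E01.en.srt."""
    v = Path(video)
    return v.with_name(f"{v.stem}.{lang}.{ext}")

print(sidecar_path("/media/Show/S01E01.mkv", "en"))
```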