# Settings — Transcription
Transcription is the fallback when none of the configured subtitle providers has a match: Sublarr extracts the audio track, runs it through a speech-to-text model, and writes the result as a subtitle file. It’s compute-heavy and accuracy varies by language and audio quality, but it’s the only way to subtitle obscure content where no community release exists.
## Backends

| Backend | Where it runs | When to pick it |
|---|---|---|
| Whisper (built-in) | Inside Sublarr’s container, CPU or GPU | Self-contained, no extra service. Slow on CPU. |
| WhisperAI / OpenAI Whisper API | OpenAI’s hosted endpoint | Fast, paid per minute. No GPU needed locally. |
| Subgen | Self-hosted Subgen container | Best for batch jobs and GPU users; keeps transcription load off Sublarr. |
## Whisper (built-in)

The bundled Whisper backend runs faster-whisper against a model file you configure. Models are auto-downloaded on first use to /config/whisper/models/.
| Setting | Default | Values | Effect |
|---|---|---|---|
| Enabled | off | toggle | Master switch for the built-in backend. |
| Model | base | tiny / base / small / medium / large-v3 | Larger = more accurate but slower and more memory. |
| Device | auto | auto / cpu / cuda | auto picks GPU when available. |
| Compute type | int8 | int8 / float16 / float32 | Quantisation. int8 on CPU is fast; float16 on GPU is the quality sweet spot. |
| Language hint | empty | ISO 639-1 | Bias the model toward a known source language. Empty = auto-detect. |
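As a sketch of how these settings might map onto a faster-whisper call — the helper name and settings dict are illustrative, not Sublarr's actual internals, though `model_size_or_path`, `device`, and `compute_type` are faster-whisper's real parameter names:

```python
def build_whisper_args(settings: dict) -> tuple[dict, dict]:
    """Map the settings table onto faster-whisper's WhisperModel(...)
    constructor kwargs and .transcribe(...) call kwargs."""
    model_kwargs = {
        "model_size_or_path": settings.get("model", "base"),
        "device": settings.get("device", "auto"),
        "compute_type": settings.get("compute_type", "int8"),
    }
    transcribe_kwargs = {}
    hint = settings.get("language_hint", "")
    if hint:  # empty hint -> let the model auto-detect the language
        transcribe_kwargs["language"] = hint
    return model_kwargs, transcribe_kwargs

mk, tk = build_whisper_args(
    {"model": "small", "device": "cuda", "compute_type": "float16", "language_hint": "ja"}
)
print(mk["compute_type"], tk)  # float16 {'language': 'ja'}
```

The split matters in practice: model, device, and compute type are fixed when the model loads, while the language hint is per-file and can vary between transcriptions.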
## WhisperAI (OpenAI hosted)

| Setting | Effect |
|---|---|
| Enabled | Master switch. |
| API key | OpenAI API key with Whisper access. |
| Model | Default whisper-1. |
| Price (per min) | Default $0.006. Adjust this if OpenAI changes pricing, and verify against your invoice. |
OpenAI’s Whisper API runs the large-v2 model on their infrastructure. Quality is excellent; billing is per minute of audio.
## Subgen

Subgen is a separate self-hosted service for Whisper transcription with a queue. Recommended when you have a GPU and want to keep transcription off Sublarr’s process.
| Setting | Effect |
|---|---|
| Enabled | Master switch. |
| Subgen URL | Base URL of the Subgen service. |
| API key | Auth token (when configured on Subgen). |
| Model | Override Subgen’s default; usually leave empty. |
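Subgen's API surface isn't documented on this page, so purely as an illustration of how the URL and API-key settings combine — the endpoint path and header name below are assumptions:

```python
from urllib.parse import urljoin

def subgen_request(base_url: str, api_key: str = "", path: str = "/transcribe"):
    """Join the configured Subgen base URL with an endpoint path and
    attach the auth token as a bearer header when one is set."""
    url = urljoin(base_url.rstrip("/") + "/", path.lstrip("/"))
    headers = {"Accept": "application/json"}
    if api_key:  # API key is optional; Subgen may run without auth
        headers["Authorization"] = f"Bearer {api_key}"
    return url, headers

print(subgen_request("http://subgen:9000", "s3cret")[0])  # http://subgen:9000/transcribe
```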
## When does transcription run?

Sublarr offers transcription in three places:
| Trigger | When |
|---|---|
| Manual: Library → Sidecar → Transcribe | One-off; you click. |
| Auto: when no provider returns a match | Configurable per profile (Settings → Subtitles → Languages → Transcribe on miss). |
| Batch: Wanted → Transcribe selected | Same as manual but for many items. |
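All three triggers reduce to the same decision: transcribe when the user explicitly asks, or when every provider came back empty and the profile opts in. A simplified sketch of the auto path, with invented names for the flag and results:

```python
def should_auto_transcribe(provider_results: list, transcribe_on_miss: bool) -> bool:
    """Auto trigger: fire only when no configured provider returned a match
    AND the profile's 'Transcribe on miss' toggle is enabled."""
    return transcribe_on_miss and not provider_results

print(should_auto_transcribe([], True))           # True  -> queue transcription
print(should_auto_transcribe(["sub.srt"], True))  # False -> provider match wins
```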
## Quality expectations

| Audio quality | Whisper medium accuracy | large-v3 accuracy |
|---|---|---|
| Studio dialogue, single speaker | ~ 95% | ~ 98% |
| TV / movie dialogue, mixed | ~ 85% | ~ 92% |
| Anime with overlapping dialogue, music | ~ 65–80% | ~ 80–88% |
| Noisy field recording | ~ 50% | ~ 65% |
Figures are approximate word accuracy, i.e. the complement of word error rate (1 − WER); treat them as ballparks, not guarantees.
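To make the WER relationship concrete:

```python
def accuracy_from_wer(wer: float) -> float:
    """Word accuracy as the complement of word error rate."""
    return 1.0 - wer

# A WER of 0.15 corresponds to the ~85% mixed-dialogue ballpark above.
print(f"{accuracy_from_wer(0.15):.0%}")  # 85%
```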
## Cost (Whisper API)

For OpenAI’s Whisper API at $0.006 / min:
| Content | Approx cost |
|---|---|
| 24-min anime episode | ~ $0.14 |
| 50-min TV episode | ~ $0.30 |
| 110-min movie | ~ $0.66 |
| 12-episode anime season (~288 min) | ~ $1.73 |
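These values follow directly from minutes × rate; a quick estimator:

```python
def whisper_api_cost(minutes: float, rate_per_min: float = 0.006) -> float:
    """Estimated OpenAI Whisper API cost in USD for a given audio runtime."""
    return minutes * rate_per_min

print(f"${whisper_api_cost(110):.2f}")      # $0.66 (110-min movie)
print(f"${whisper_api_cost(12 * 24):.2f}")  # $1.73 (12 x 24-min episodes)
```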
Self-hosted Whisper or Subgen has no per-minute cost — only your hardware time.
## Output post-processing

The raw Whisper output is rough — long monologue lines, no cue breaks, [BLANK_AUDIO] markers. Sublarr applies a post-processing pass:
| Step | Purpose |
|---|---|
| Cue splitting | Insert breaks at natural pause boundaries. |
| Punctuation | Re-capitalisation and final punctuation. |
| Min cue duration | Merge cues shorter than 80 ms with the next. |
| Strip Whisper noise tags | Remove [BLANK_AUDIO], [Music], etc. |
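A minimal sketch of the last two steps (tag stripping, then short-cue merging) over a simple cue list. The 80 ms threshold comes from the table; the cue representation and the exact tag list are illustrative:

```python
import re

# Example Whisper noise tags; the real pipeline may strip a wider set.
NOISE_TAGS = re.compile(r"\[(?:BLANK_AUDIO|Music|Applause|Laughter)\]")

def postprocess(cues: list, min_ms: int = 80) -> list:
    """cues: [{'start': ms, 'end': ms, 'text': str}, ...]
    1. Strip noise tags, dropping cues left empty.
    2. Merge cues shorter than min_ms into the following cue."""
    cleaned = []
    for cue in cues:
        text = NOISE_TAGS.sub("", cue["text"]).strip()
        if text:
            cleaned.append({**cue, "text": text})
    merged = []
    for cue in reversed(cleaned):  # walk backwards so the "next" cue already exists
        if cue["end"] - cue["start"] < min_ms and merged:
            nxt = merged[-1]
            merged[-1] = {"start": cue["start"], "end": nxt["end"],
                          "text": cue["text"] + " " + nxt["text"]}
        else:
            merged.append(cue)
    return list(reversed(merged))
```

For example, a 50 ms cue `"Hey"` followed by `"there."` becomes one cue reading `"Hey there."`, and a trailing `[BLANK_AUDIO]` cue disappears entirely.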
The post-processing is shared with the regular subtitle pipeline — see Post-Processing for the full set of fixes.
## Where transcribed subtitles land

In the same place as provider-downloaded subtitles: a sidecar next to the source video, named per Settings → Subtitles → Format & Naming. The provider field in Activity → History shows whisper:built-in, whisper:api, or subgen so you can audit which backend produced each subtitle.