Architecture
Architecture Overview
Sublarr is a standalone subtitle manager and translator built with a modern web stack. This document describes the system architecture, data flow, and key components.
High-Level Architecture

    Frontend (React 19 + TypeScript + Vite)
        |
        | HTTP / WebSocket
        v
    Backend (Flask + Flask-SocketIO + APScheduler)
        |
        +-- SQLAlchemy ORM
        |     +-- SQLite (default, WAL) or PostgreSQL (SUBLARR_DATABASE_URL)
        +-- Provider System (see registry.py for the canonical list)
        |     +-- Native adapters + embedded-track extractor + alternate-adapter fallbacks
        +-- Translation Pipeline (Ollama, Claude, Gemini, DeepL, ...)
        +-- Sync Engines (ffsubsync / alass with fallback orchestrator)
        +-- Remux Engine (mkvmerge / ffmpeg stream removal)
        +-- Scheduler (APScheduler, default jobs persisted in DB)
        |
        v
    External Services
        +-- Sonarr / Radarr (webhooks + API, multi-instance)
        +-- Jellyfin / Emby / Plex / Kodi (library refresh)
        +-- Ollama / OpenAI / Claude / Gemini / DeepSeek / ... (translation)
        +-- AniDB (offline xml.gz dump + online API for ID resolution)
        +-- mkvmerge / ffmpeg (stream remux, video tools)

Backend Architecture
Core Components
Flask Application (app.py)
- 29 Blueprint modules registered under `/api/v1/` via `routes/__init__.py`
- Flask-SocketIO for real-time WebSocket updates (auth-gated when API key is set)
- Gunicorn WSGI server (`gthread`, 1 worker, 4 threads, 300 s timeout, 15 s graceful)
- Optional API key authentication via `auth.py`; `/api/v1/health`, the Sonarr/Radarr webhook endpoints and the OpenAPI discovery surface (`GET /api/docs`, `GET /api/v1/openapi.json`) are exempt
- Live OpenAPI 3.0.3 spec generated from route docstrings via `apispec`
Configuration System (config.py)
- Pydantic Settings for type-safe configuration
- Environment variables with `SUBLARR_` prefix
- Runtime config overrides stored in `config_entries` table
- Validation on startup and config updates
- Secrets never exposed in API responses
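The layering described above (environment variables, then runtime overrides, with secrets masked on the way out) can be sketched as follows. This is a stdlib-only illustration and not the Pydantic Settings model used by `config.py`; the field set, `load_settings`, `public_view`, and the masking placeholder are assumptions made for the example.

```python
from dataclasses import asdict, dataclass, fields

PREFIX = "SUBLARR_"
SECRET_FIELDS = {"api_key"}  # assumption: fields that must be masked in responses


@dataclass
class Settings:
    # Hypothetical subset of fields; the real model defines many more.
    port: int = 5765
    log_level: str = "INFO"
    api_key: str = ""


def load_settings(env, overrides=None):
    """Read SUBLARR_* variables, then let runtime overrides win."""
    merged = {}
    for f in fields(Settings):
        raw = env.get(PREFIX + f.name.upper())
        if overrides and f.name in overrides:
            raw = overrides[f.name]  # stands in for a config_entries row
        if raw is not None:
            # Coerce to the type of the default; bad input raises ValueError.
            merged[f.name] = type(f.default)(raw)
    return Settings(**merged)


def public_view(settings):
    """Return a dict that is safe to serialize: secrets are masked."""
    return {
        key: ("***" if key in SECRET_FIELDS and value else value)
        for key, value in asdict(settings).items()
    }
```

Passing `os.environ` as `env` and a dict of stored overrides gives the effective configuration; `public_view` is what an API response would serialize.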
Database Layer (db/)
- SQLite (default): WAL mode, zero-config. Thread safety is provided by SQLAlchemy session scoping; there is no application-level lock. (Earlier versions exposed a `_db_lock` shim; it is now a no-op kept only for source compatibility with old plugins.)
- PostgreSQL: first-class support via `SUBLARR_DATABASE_URL`; connection pooling via SQLAlchemy pool; Alembic migrations run automatically on startup; dialect-aware health endpoint (`GET /api/v1/database/health`)
- Repository pattern (`db/repositories/`) wraps SQLAlchemy ORM; `BaseRepository.batch()` context manager batches multiple writes into a single commit (used by wanted scanner for per-series commits)
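The idea behind a batching context manager can be illustrated with the standard `sqlite3` module. This is a sketch only: the real `BaseRepository` wraps a SQLAlchemy session, and the `add_wanted` method and the two-column table used here are hypothetical.

```python
import sqlite3
from contextlib import contextmanager


class BaseRepository:
    """Illustrative stand-in built on sqlite3 rather than SQLAlchemy."""

    def __init__(self, conn):
        self.conn = conn
        self._in_batch = False

    @contextmanager
    def batch(self):
        """Defer commits until the block exits; roll back if it raises."""
        self._in_batch = True
        try:
            yield self
            self.conn.commit()
        except Exception:
            self.conn.rollback()
            raise
        finally:
            self._in_batch = False

    def add_wanted(self, path, language):
        # Hypothetical write method; commits immediately outside a batch.
        self.conn.execute(
            "INSERT INTO wanted_items (path, language) VALUES (?, ?)",
            (path, language),
        )
        if not self._in_batch:
            self.conn.commit()
```

A scanner would wrap all writes for one series in `with repo.batch():` so that the series costs a single commit instead of one per episode.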
Database Tables
More than 30 tables; the full schema is in Database Schema. Key tables:
| Table | Purpose |
|---|---|
| `jobs` | Translation job tracking, status, statistics, error messages |
| `daily_stats` | Aggregated daily statistics for dashboard |
| `config_entries` | Runtime configuration overrides (key-value) |
| `provider_cache` | Cached subtitle search results (TTL-based expiration) |
| `subtitle_downloads` | Download history per provider (for stats) |
| `language_profiles` | Multi-language profile definitions (name, source, targets) |
| `series_language_profiles` | Profile assignment per Sonarr series |
| `movie_language_profiles` | Profile assignment per Radarr movie |
| `wanted_items` | Missing subtitles queue (episode/movie, language, status) |
| `blacklist_entries` | Rejected subtitle downloads (provider, hash, reason) |
| `download_history` | Subtitle download audit log |
| `upgrade_history` | Records every subtitle upgrade performed |
| `glossary_entries` | User-defined translation glossary (term pairs) |
| `prompt_presets` | Saved Ollama prompt templates |
| `filter_presets` | Saved search filter configurations |
| `ffprobe_cache` | Cached ffprobe/mediainfo metadata (invalidated by mtime) |
| `anidb_absolute_mappings` | TVDB→AniDB absolute episode number mappings |
| `series_settings` | Per-series config overrides (glossary, forced sub preference) |
Provider System (providers/)
Architecture Pattern
- Singleton `ProviderManager` orchestrates all providers
- Abstract base class `SubtitleProvider` defines the interface
- Shared `RetryingSession` for HTTP requests with rate-limit handling
- Priority-based search with configurable provider order
Provider Registry
The canonical built-in provider list is _BUILTIN_PROVIDERS in backend/providers/registry.py. It contains native adapters per source service, an embedded-track extractor (reads tracks already inside .mkv / .mp4), and a small set of alternate adapter implementations used as fallbacks. Additional providers can be loaded as plugins from /config/plugins/. See Settings → Providers for the full list and per-provider configuration.
Search Flow
- User/System requests subtitles for a video file
- `ProviderManager.search()` queries all enabled providers in parallel
- Results scored using a weighted algorithm (see PROVIDERS.md)
- Best match selected (ASS format gets +50 bonus)
- Subtitle downloaded and cached
- Downloaded file processed (extracted from ZIP/RAR/XZ if needed)
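The parallel query and best-match selection steps above might look roughly like this. It is a simplified sketch: `Result`, `base_score`, and `search_all` are hypothetical names, providers are modelled as plain callables, and the weighted scoring is reduced to a single number plus the +50 ASS bonus.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

ASS_BONUS = 50  # format bonus stated in the search flow above


@dataclass
class Result:
    provider: str
    fmt: str         # "ass" or "srt"
    base_score: int  # hypothetical stand-in for the weighted score


def final_score(result):
    """Add the ASS bonus on top of the provider-independent score."""
    return result.base_score + (ASS_BONUS if result.fmt == "ass" else 0)


def search_all(providers, query):
    """Query every provider concurrently and return the best result, or None."""

    def safe_search(search_fn):
        try:
            return search_fn(query)
        except Exception:
            return []  # one failing provider must not abort the whole search

    with ThreadPoolExecutor(max_workers=max(1, len(providers))) as pool:
        batches = list(pool.map(safe_search, providers))
    results = [r for batch in batches for r in batch]
    return max(results, key=final_score, default=None)
```

With this scoring, an ASS result at 270 beats an SRT result at 300, which is the behaviour the bonus is meant to produce.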
Provider Interface
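
    from abc import ABC, abstractmethod
    from pathlib import Path
    from typing import List

    # VideoQuery and SubtitleResult are defined elsewhere in the codebase.
    class SubtitleProvider(ABC):
        @abstractmethod
        def search(self, query: VideoQuery) -> List[SubtitleResult]:
            pass

        @abstractmethod
        def download(self, result: SubtitleResult, dest: Path) -> Path:
            pass

        @abstractmethod
        def health_check(self) -> bool:
            pass

Cache Invalidation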
- Config changes call `invalidate_manager()` to recreate provider instances
- Provider cache entries have a TTL (default 1 hour)
- Manual cache clear endpoint available
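TTL-based expiry with a manual clear can be sketched in a few lines. This in-memory version is only an illustration; the actual cache lives in the `provider_cache` table, and the `TTLCache` class and its injectable clock are assumptions for the example.

```python
import time


class TTLCache:
    """In-memory illustration of TTL expiry; not the DB-backed cache."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}

    def put(self, key, value):
        self._entries[key] = (self.clock(), value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at >= self.ttl:
            del self._entries[key]  # expired: drop and report a miss
            return None
        return value

    def clear(self):
        """Equivalent of the manual cache clear endpoint."""
        self._entries.clear()
```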
Translation Pipeline (translator.py)
Three-Stage Priority Chain
The translation system uses a cascading priority system to minimize unnecessary work and prefer high-quality formats:
Case A: Target Subtitle Exists
- Check if target language ASS or SRT already exists
- Action: Skip translation
- Reason: Already have what we need
Case B: Target SRT Exists, Upgrade Attempt
- Have target SRT, want to upgrade to ASS
- B1: Search providers for target language ASS
- B2: If source ASS embedded, translate to target ASS
- B3: No upgrade possible, keep existing SRT
- Reason: ASS format preferred for anime (styling, positioning)
Case C: No Target Subtitle, Full Pipeline
- C1: Source ASS embedded -> translate to target ASS
- C2: Source SRT (embedded/external) -> translate to target SRT
- C3: Search providers for source subtitle -> download -> translate
- C4: Nothing found -> fail with detailed error
- Reason: Build target subtitle from any available source
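The cases above can be read as a single decision function. The sketch below takes precomputed booleans for clarity, whereas a real pipeline would probably evaluate provider searches lazily. The `State` fields and the `upgrade_enabled` switch are assumptions introduced to separate Case A from Case B; they are not taken from `translator.py`.

```python
from dataclasses import dataclass


@dataclass
class State:
    target_ass: bool = False
    target_srt: bool = False
    provider_has_target_ass: bool = False
    source_ass_embedded: bool = False
    source_srt_available: bool = False
    provider_has_source: bool = False
    upgrade_enabled: bool = True  # assumption: upgrades can be switched off


def plan(s):
    """Return the case label that applies to the given state."""
    if s.target_ass or (s.target_srt and not s.upgrade_enabled):
        return "A: skip"
    if s.target_srt:
        if s.provider_has_target_ass:
            return "B1: download target ASS"
        if s.source_ass_embedded:
            return "B2: translate embedded ASS"
        return "B3: keep existing SRT"
    if s.source_ass_embedded:
        return "C1: translate embedded ASS"
    if s.source_srt_available:
        return "C2: translate source SRT"
    if s.provider_has_source:
        return "C3: download source, then translate"
    return "C4: fail"
```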
ASS Translation Process (ass_utils.py)
- Parse ASS file structure (Styles, Events sections)
- Classify styles: Dialog vs Signs/Songs
- Dialog styles: >80% plain dialogue events -> translate
- Signs/Songs styles: >80% positioned/complex events -> preserve
- Extract dialogue lines, preserve formatting codes
- Send to Ollama with style-aware prompt
- Inject translated text back into ASS structure
- Update Language field in Script Info section
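The 80% classification rule could be implemented along these lines. The sketch assumes an event counts as positioned/complex when it carries a `\pos`, `\move`, `\org`, or `\clip` override tag; the actual heuristic in `ass_utils.py` may differ, and the `"mixed"` bucket for styles that meet neither threshold is hypothetical.

```python
import re
from collections import defaultdict

# Assumption: these override tags mark an event as positioned/complex.
COMPLEX_TAG = re.compile(r"\\(?:pos|move|org|i?clip)\s*\(")
THRESHOLD = 0.8


def classify_styles(events):
    """Classify each style from (style_name, text) pairs of the Events section."""
    totals = defaultdict(int)
    complex_counts = defaultdict(int)
    for style, text in events:
        totals[style] += 1
        if COMPLEX_TAG.search(text):
            complex_counts[style] += 1

    verdicts = {}
    for style, total in totals.items():
        complex_ratio = complex_counts[style] / total
        if complex_ratio > THRESHOLD:
            verdicts[style] = "signs"   # preserve untranslated
        elif 1 - complex_ratio > THRESHOLD:
            verdicts[style] = "dialog"  # translate
        else:
            verdicts[style] = "mixed"   # hypothetical: neither threshold met
    return verdicts
```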
SRT Translation Process
- Parse SRT format (sequence number, timestamps, text)
- Extract text blocks, preserve timing
- Send to Ollama with context prompt
- Reconstruct SRT with translated text
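A minimal round trip for the SRT steps above is sketched below. It assumes well-formed input (no byte-order mark, numeric sequence lines); `parse_srt` and `rebuild_srt` are illustrative names, and the translation call itself is left out.

```python
import re


def parse_srt(text):
    """Return a list of (index, timing_line, text) cues."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) >= 2 and "-->" in lines[1]:
            cues.append((int(lines[0]), lines[1], "\n".join(lines[2:])))
    return cues


def rebuild_srt(cues, translated):
    """Pair each cue's original index and timing with its translated text."""
    if len(cues) != len(translated):
        raise ValueError("translation count does not match cue count")
    blocks = [
        f"{index}\n{timing}\n{text}"
        for (index, timing, _), text in zip(cues, translated)
    ]
    return "\n\n".join(blocks) + "\n"
```

Because timing lines are carried through untouched, only the text blocks ever reach the translation backend.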
Glossary Integration
- User-defined term pairs loaded from database
- Injected into Ollama prompt template
- Applied during translation for consistency
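Glossary injection amounts to rendering the term pairs into the prompt before the request is sent. The template placeholders (`{source_lang}`, `{target_lang}`, `{glossary}`) and the `build_prompt` helper below are assumptions for illustration, not the format of the shipped prompt template.

```python
def build_prompt(template, glossary, source_lang, target_lang):
    """Render glossary term pairs into a prompt template."""
    lines = [f"- {src} -> {dst}" for src, dst in sorted(glossary.items())]
    glossary_block = "\n".join(lines) if lines else "(none)"
    return template.format(
        source_lang=source_lang,
        target_lang=target_lang,
        glossary=glossary_block,
    )
```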
Wanted System (wanted_scanner.py, wanted_search.py)
Scanner Architecture
- Periodic task (default: every 6 hours)
- Queries Sonarr/Radarr for all series/movies with Sublarr tag
- For each item:
  - Check language profile assignment
  - Iterate through target languages
  - Scan media directory for existing subtitles
  - Create wanted_items entry if missing
- Tracks scan history, monitors for new episodes
Search & Process Flow
- Wanted item created by scanner
- Background task or manual trigger calls `wanted_search.py`
- Build `VideoQuery` from item metadata
- Search providers for target AND source languages
- If target found: download -> save
- If only source found: download -> translate -> save
- Update wanted_item status (COMPLETED/FAILED)
- Notify Jellyfin to refresh library
- Record download in history
Batch Operations
- WebSocket progress updates during batch search
- Parallel processing with configurable concurrency
- Dry-run mode for testing queries
Remux Engine (remux/)
Purpose: Safely remove embedded subtitle streams from video containers without re-encoding.
Workflow
- Probe container with ffprobe to confirm stream exists
- Select backend: mkvmerge for MKV/MK3D, ffmpeg for MP4/AVI and other containers
- Remux to a temp file in the same directory (same filesystem for atomic swap)
- Verify: duration ±2s, video/audio stream counts unchanged, subtitle count exactly -1, file size ≥50% of original
- Move original to trash directory: `<trash_dir>/trash/<YYYY-MM-DD>/<file>.<ts>.bak`
- Atomic replace: `os.replace(tmp, original)`
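The verification step can be expressed as a pure function over two probe results. The `Probe` dataclass and `verify_remux` below are hypothetical; they only encode the four checks listed in the workflow and leave the ffprobe call itself out.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    duration: float        # seconds
    video_streams: int
    audio_streams: int
    subtitle_streams: int
    size_bytes: int


def verify_remux(original, remuxed):
    """Return the names of failed checks; an empty list means safe to swap."""
    problems = []
    if abs(original.duration - remuxed.duration) > 2.0:
        problems.append("duration")
    if remuxed.video_streams != original.video_streams:
        problems.append("video count")
    if remuxed.audio_streams != original.audio_streams:
        problems.append("audio count")
    if remuxed.subtitle_streams != original.subtitle_streams - 1:
        problems.append("subtitle count")
    if remuxed.size_bytes < 0.5 * original.size_bytes:
        problems.append("file size")
    return problems
```

Only when the list is empty should the original be moved to trash and the temp file swapped in with `os.replace`.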
Backend selection
- mkvmerge (preferred for MKV): uses the `--subtitle-tracks !<global_TID>` flag; the global Track ID matches the ffprobe `stream_index`
- ffmpeg fallback: used when mkvmerge is unavailable or for non-MKV containers; uses `-map 0 -map -0:<stream_index> -c copy`
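Backend selection and command construction might be sketched as below. `build_remux_command` and its `mkvmerge_available` parameter are hypothetical; the flags are the ones named above. Returning an argument list (for use with `subprocess.run` without a shell) avoids having to quote the `!` in the mkvmerge track selector.

```python
from pathlib import Path

MKV_SUFFIXES = {".mkv", ".mk3d"}


def build_remux_command(src, tmp, stream_index, mkvmerge_available=True):
    """Return an argv list that copies src to tmp without one subtitle stream."""
    src, tmp = Path(src), Path(tmp)
    if src.suffix.lower() in MKV_SUFFIXES and mkvmerge_available:
        return [
            "mkvmerge", "-o", str(tmp),
            "--subtitle-tracks", f"!{stream_index}",  # keep all but this track
            str(src),
        ]
    return [
        "ffmpeg", "-i", str(src),
        "-map", "0",                      # start from every stream
        "-map", f"-0:{stream_index}",     # then drop the selected one
        "-c", "copy",                     # no re-encoding
        str(tmp),
    ]
```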
Backup strategy
- Default trash path: `<media_root>/.sublarr/trash/<date>/` (relative to media root)
- Absolute paths supported directly
- CoW reflink attempted first on Btrfs/XFS (near-instant, zero-cost); falls back to `shutil.copy2`
- Falls back to sibling `.bak` if trash dir cannot be created (permission error)
Async job system
- `POST /api/v1/library/episodes/<ep_id>/tracks/<index>/remove-from-container` → background job
- Jobs tracked in-memory with `ThreadPoolExecutor`
- Real-time status via Socket.IO `remux_job_update` events
API Endpoints
All endpoints are prefixed with `/api/v1/`.
Request/Response Format
- JSON payloads
- Standard HTTP status codes
- Error responses: `{"error": "message", "details": {...}}`
- Success responses: `{"success": true, "data": {...}}`
Authentication
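- Optional API key via `SUBLARR_API_KEY` env var
- Header: `X-Api-Key: <key>`
- Query param: `?apikey=<key>`
- Exempt endpoints: `/api/v1/health`, the Sonarr/Radarr webhook endpoints, and the OpenAPI discovery surface (`GET /api/docs`, `GET /api/v1/openapi.json`)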
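An API key check along these lines would cover the rules above. It is a framework-free sketch, not the code in `auth.py`: `is_authorized` and `EXEMPT_PATHS` are hypothetical, headers and query parameters are plain dicts, and webhook routes are left out of the exempt set because their paths are not listed in this document.

```python
import hmac

# Assumption: only the health endpoint is modelled here.
EXEMPT_PATHS = {"/api/v1/health"}


def is_authorized(path, headers, query, configured_key):
    """Decide whether a request may proceed under optional API key auth."""
    if not configured_key:
        return True  # authentication is optional and currently disabled
    if path in EXEMPT_PATHS:
        return True
    supplied = headers.get("X-Api-Key") or query.get("apikey") or ""
    # Constant-time comparison avoids leaking key contents through timing.
    return hmac.compare_digest(supplied.encode(), configured_key.encode())
```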
Real-Time Updates (WebSocket)
- Namespace: `/socket.io/`
- Events:
  - `job_update`: Translation job status changes
  - `batch_progress`: Batch operation progress
  - `wanted_batch_progress`: Wanted search batch progress
  - `scan_progress`: Library scan progress
Integration Clients
Sonarr Client (sonarr_client.py)
- API v3 calls with `X-Api-Key` header
- Fetch series list, episode details, file paths
- Tag-based filtering (only series with Sublarr tag)
- Parse video file metadata (episode number, season, release group)
Radarr Client (radarr_client.py)
- API v3 calls with `X-Api-Key` header
- Fetch movie list, file paths, quality profiles
- Tag-based filtering
- Parse video file metadata (year, resolution, source)
Jellyfin Client (jellyfin_client.py)
- Library scan trigger after new subtitles added
- Token-based auth: `X-MediaBrowser-Token` header
- Graceful fallback if Jellyfin is not configured
Ollama Client (ollama_client.py)
- HTTP API calls to Ollama server
- Dynamic prompt template from config
- Streaming response support (for future UI feature)
- Model selection via Settings → Translation → Backends → Ollama
- Glossary injection into system prompt
Scheduler & Background Tasks
APScheduler Integration
- Background scheduler runs in main process
- Jobs configured via environment variables
Scheduled Jobs
- Wanted Scanner: Every N hours (configurable)
  - Scans Sonarr/Radarr for missing subtitles
  - Creates wanted_items entries
- Upgrade Scanner: Daily (optional)
  - Scans for SRT subtitles within upgrade window
  - Attempts upgrade to ASS format
- Cache Cleanup: Daily
  - Removes expired provider cache entries
  - Prunes old job history (configurable retention)
Frontend Architecture
Technology Stack
- React 19 (functional components, hooks)
- TypeScript (strict mode)
- Tailwind CSS v4 (utility-first styling)
- React Router v6 (client-side routing)
- TanStack Query (server state management)
- Socket.IO Client (real-time updates)
- Vite (development server, HMR, bundler)
Directory Structure

    frontend/src/
    ├── App.tsx                    # Router, QueryClient provider
    ├── main.tsx                   # Entry point
    ├── index.css                  # Tailwind imports, arr-style CSS variables
    ├── api/
    │   └── client.ts              # Axios instance, type-safe API calls
    ├── hooks/
    │   ├── useApi.ts              # React Query hooks for all endpoints
    │   └── useWebSocket.ts        # Socket.IO integration, event handlers
    ├── components/
    │   ├── layout/
    │   │   └── Sidebar.tsx        # Navigation, teal arr-style theme
    │   └── shared/
    │       ├── StatusBadge.tsx    # Color-coded status indicators
    │       ├── ProgressBar.tsx    # Visual progress component
    │       ├── Toast.tsx          # Notification system
    │       └── ErrorBoundary.tsx  # React error handling
    ├── pages/
    │   ├── Dashboard.tsx          # Stats overview, recent activity
    │   ├── Activity.tsx           # Live job monitoring, WebSocket updates
    │   ├── Library.tsx            # Series/movie list, subtitle status
    │   ├── SeriesDetail.tsx       # Per-series subtitle management
    │   ├── Wanted.tsx             # Missing subtitles queue
    │   ├── Queue.tsx              # Active job queue view
    │   ├── Tasks.tsx              # Background scheduler task status + controls
    │   ├── Statistics.tsx         # Download/translation statistics charts
    │   ├── History.tsx            # Download/translation history
    │   ├── Blacklist.tsx          # Rejected subtitles
    │   ├── Plugins.tsx            # Plugin management
    │   ├── Settings/              # Settings pages (tabbed by category)
    │   ├── Logs.tsx               # Paginated log viewer
    │   ├── Onboarding.tsx         # First-run setup wizard
    │   └── NotFound.tsx           # 404 page

State Management
Server State (React Query)
- All API calls wrapped in `useQuery` or `useMutation` hooks
- Automatic caching, refetching, error handling
- Optimistic updates for instant UI feedback
- Query invalidation on mutations (e.g., config update -> refetch config)
UI State (React Hooks)
- Local component state with `useState`
- Form state with controlled inputs
- Modal/dialog state with `useReducer` for complex flows
Real-Time State (Socket.IO)
- `useWebSocket` hook connects to backend
- Listens for events: job_update, batch_progress, etc.
- Invalidates React Query cache on relevant events
- Automatic reconnection on disconnect
Routing
Routes
- `/` - Dashboard
- `/activity` - Active jobs
- `/queue` - Job queue
- `/tasks` - Background scheduler tasks
- `/library` - Series/movie list
- `/library/:id` - Series detail view
- `/wanted` - Missing subtitles
- `/statistics` - Statistics
- `/history` - Download history
- `/blacklist` - Rejected subtitles
- `/plugins` - Plugin management
- `/settings` - Configuration (tabbed)
- `/logs` - System logs
- `*` - 404 Not Found
Navigation
- Sidebar with active route highlighting
- Breadcrumbs for deep navigation (future)
- Back button for detail views
Styling
Tailwind Configuration
- Custom color palette (teal primary: #1DB8D4)
- Dark mode support (prefers-color-scheme)
- Responsive breakpoints (sm, md, lg, xl)
- Custom animations for loading states
CSS Variables (arr-style)

    :root {
      --color-primary: #1DB8D4;
      --color-success: #27AE60;
      --color-warning: #F39C12;
      --color-danger: #E74C3C;
      --sidebar-width: 200px;
    }

Data Flow
Download Event Flow (Webhooks)

    Sonarr/Radarr downloads episode
        |
        | OnDownload webhook
        v
    Sublarr receives webhook
        |
        | Delay N minutes (configurable)
        v
    Wanted scanner triggered
        |
        | Scan episode directory
        v
    Check language profile
        |
        | Missing target subtitle?
        v
    Create wanted_item entry
        |
        | Auto-search enabled?
        v
    Provider search (target + source languages)
        |
        | Best result selected
        v
    Download subtitle
        |
        | Target found? -> Save
        | Source found? -> Translate -> Save
        v
    Update wanted_item status
        |
        | Notify Jellyfin
        v
    WebSocket update to frontend

Manual Translation Flow

    User uploads file via frontend
        |
        | POST /api/v1/translate
        v
    Create job entry in database
        |
        | Return job_id immediately
        v
    Background worker starts
        |
        | Parse video file
        v
    Extract/search for source subtitle
        |
        | Found?
        v
    Translate via Ollama
        |
        | Apply glossary
        v
    Save target subtitle
        |
        | Update job status
        v
    WebSocket job_update event
        |
        v
    Frontend receives update, displays result

Batch Processing Flow

    User clicks "Search All" in Wanted page
        |
        | POST /api/v1/wanted/batch-search
        v
    Queue all wanted_items
        |
        | Return batch_id
        v
    Background worker processes queue
        |
        | For each item:
        |   - Search providers
        |   - Download best match
        |   - Translate if needed
        |   - Update status
        |   - Emit WebSocket progress
        v
    Batch complete
        |
        | Final WebSocket event
        v
    Frontend updates UI, shows summary

Docker Deployment
Multi-Stage Build
Stage 1: Frontend Build
- Base: `node:22-alpine`
- Copy `frontend/package*.json`, install dependencies
- Copy `frontend/src`, build with Vite
- Output: Optimized static files in `dist/`
Stage 2: Backend Runtime
- Base: `python:3.12-slim`
- Install system dependencies: ffmpeg, mkvtoolnix, curl, unrar-free, postgresql-client, tesseract-ocr, hunspell
- Copy `backend/requirements.txt`, install Python packages
- Copy backend source code
- Copy frontend build output from Stage 1
- Set up entrypoint: gunicorn with config
Volumes
- `/config` - SQLite database, logs, cache
- `/media` - Sonarr/Radarr media files (read/write)
Environment Variables
- All config via `SUBLARR_`-prefixed env vars
- Secrets via `.env` file, never in docker-compose.yml
- Config validation on container start
Networking
- Expose port 5765
- Optional custom port via `SUBLARR_PORT` env var
Health Check
- Endpoint: `http://localhost:5765/api/v1/health`
- Interval: 30s
- Timeout: 10s
- Retries: 3
Security Considerations
- API Authentication: Optional but recommended API key
- Input Validation: Pydantic models for all API inputs
- Path Traversal: All file operations validated against media directory
- SQL Injection: Parameterized queries, no string interpolation
- XSS Prevention: React auto-escapes, no dangerouslySetInnerHTML
- Secrets Management: Never log or return API keys/tokens in responses
- Rate Limiting: Provider sessions handle rate limits gracefully
- Docker Isolation: Container runs as non-root user (future improvement)
Performance Optimizations
- Database: WAL mode for concurrent reads, indexed foreign keys
- Provider Cache: 1-hour TTL reduces redundant searches
- HTTP Session Pooling: Reuse connections to providers
- WebSocket: Batch progress updates (max 1/second)
- Frontend Code Splitting: Lazy-loaded routes
- React Query Cache: Minimize API calls, stale-while-revalidate
- Gunicorn Workers: 1 worker × 4 threads = 4 concurrent requests (the worker count stays at 1 because of SocketIO session state)
- Static File Serving: Nginx recommended for production (future)
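The "max 1/second" WebSocket batching mentioned above is a simple rate limit on emits. A minimal sketch, assuming a hypothetical `ProgressThrottle` wrapper around whatever emit function the backend uses, with an injectable clock for testing:

```python
import time


class ProgressThrottle:
    """Forward at most one progress update per interval, plus the final one."""

    def __init__(self, emit, min_interval=1.0, clock=time.monotonic):
        self.emit = emit
        self.min_interval = min_interval
        self.clock = clock
        self._last = None

    def update(self, payload, final=False):
        """Emit payload if allowed; return True when it was actually sent."""
        now = self.clock()
        due = self._last is None or now - self._last >= self.min_interval
        if final or due:
            self.emit(payload)
            self._last = now
            return True
        return False  # dropped: a newer update will supersede it
```

Dropping intermediate updates is acceptable for progress events because each payload carries the full current state; the `final` flag guarantees the last update is never lost.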
Monitoring & Observability
Logging
- Python logging module, configurable level via `SUBLARR_LOG_LEVEL`
- File handler: `/config/sublarr.log` (rotated daily)
- Console handler: stdout (Docker logs)
- Format: `[timestamp] [level] [module] message`
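A logging setup matching the list above could look like the following. This is a sketch using the standard library: the `configure_logging` helper, the `"sublarr"` logger name, and `backupCount=7` are assumptions, and the project's own setup may differ in detail.

```python
import logging
import sys
from logging.handlers import TimedRotatingFileHandler


def configure_logging(log_path, level="INFO"):
    """Attach a daily-rotating file handler and a stdout handler."""
    formatter = logging.Formatter(
        "[%(asctime)s] [%(levelname)s] [%(module)s] %(message)s"
    )
    handlers = [
        # Rotate at midnight; keeping 7 old files is an assumed retention.
        TimedRotatingFileHandler(log_path, when="midnight", backupCount=7),
        logging.StreamHandler(sys.stdout),
    ]
    logger = logging.getLogger("sublarr")
    logger.setLevel(level.upper())
    logger.handlers.clear()  # avoid duplicate handlers on reconfiguration
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

In a container this would be called as `configure_logging("/config/sublarr.log", os.environ.get("SUBLARR_LOG_LEVEL", "INFO"))`.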
Metrics (via /stats endpoint)
- Total jobs: completed, failed, pending
- Provider statistics: searches, downloads, success rate
- Translation statistics: ASS vs SRT, average duration
- Daily aggregates: jobs per day, episodes processed
Health Checks
- `/api/v1/health` returns status of all services
- Checks: Database connectivity, Ollama reachability, provider health
- Used by Docker healthcheck and monitoring tools
Scalability Considerations
| Concern | Solution |
|---|---|
| High-concurrency DB writes | PostgreSQL via SUBLARR_DATABASE_URL |
| Persistent background jobs | Redis + RQ via SUBLARR_REDIS_URL + backend/worker.py |
| Parallel translation | Settings → Translation → Concurrent translations (MemoryQueue) or --scale rq-worker=N (RQ) |
| Scanner DB contention | Batch commits per series; UI yield knob under Settings → Automation → Search & Scan |
| Repeated ffprobe overhead | Incremental mtime-based cache (ffprobe_cache table) |
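The mtime-based ffprobe cache in the last row works by re-probing a file only when its modification time changes. A minimal in-memory sketch, assuming a hypothetical `MtimeCache` class and a `probe` callable (the real store is the `ffprobe_cache` table):

```python
import os


class MtimeCache:
    """Cache probe results per path, invalidated by modification time."""

    def __init__(self, probe):
        self.probe = probe  # hypothetical callable: path -> metadata
        self._rows = {}     # path -> (mtime_ns, metadata)

    def get(self, path):
        mtime = os.stat(path).st_mtime_ns
        row = self._rows.get(path)
        if row is not None and row[0] == mtime:
            return row[1]  # file unchanged: reuse cached metadata
        metadata = self.probe(path)
        self._rows[path] = (mtime, metadata)
        return metadata
```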
RQ Worker Architecture (when Redis enabled)

    Flask (gunicorn) -> enqueue() -> Redis -> rq-worker (backend/worker.py)
                                                └── AppContextWorker
                                                      └── app.app_context() per job

Scale with: `docker compose -f docker-compose.redis.yml up -d --scale rq-worker=N`
Remaining Single-Instance Constraints
- One gunicorn worker (SocketIO session state); do not increase `--workers`
- Horizontal scaling with load balancer not yet supported
- S3/object storage for subtitle files (future)
- Prometheus metrics export (Grafana dashboards in `monitoring/grafana/`)
Development Workflow
Local Development
- Backend: `npm run dev:backend` (Flask dev server on port 5765)
- Frontend: `cd frontend && npm run dev` (Vite HMR on port 5173)
- Frontend proxies API calls to backend (port 5765)
Production Build

    docker build -t sublarr:latest .
    docker-compose up -d

Code Organization
- Backend: Feature-based modules (translator, providers, clients)
- Frontend: Component-based structure (pages, hooks, components)
- Shared types: TypeScript interfaces mirror Pydantic models