# Multilingual Subtitle Translator

> Translate subtitle files while preserving timestamps and formatting

Translate SRT/VTT subtitle files to any language while preserving timestamps and formatting.

## Problem Statement

**Who:** Video localization teams, content creators, streaming platforms\
**Why:** Manual subtitle translation is slow and expensive. Automated translation with timestamp preservation speeds up localization.

## What You'll Build

A recipe that reads subtitle files, translates the text while preserving timing, and outputs properly formatted subtitle files.

```mermaid theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
graph LR
    Input[📄 SRT/VTT File] --> Parse[Parse Subtitles]
    Parse --> Translate[Translate Text]
    Translate --> Format[Preserve Timing]
    Format --> Output[📄 Translated File]

    classDef input fill:#8B0000,stroke:#7C90A0,color:#fff
    classDef process fill:#189AB4,stroke:#7C90A0,color:#fff

    class Input,Output input
    class Parse,Translate,Format process
```
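
The flow in the diagram can be sketched in a few lines of plain Python. This is an illustration rather than the recipe itself: `translate_cues` and its `translate` callback are hypothetical names, and `str.upper` stands in for the real model call so the timing behaviour is easy to see.

```python
import re
from typing import Callable


def translate_cues(srt_text: str, translate: Callable[[str], str]) -> str:
    """Translate cue text only; index and timestamp lines pass through untouched."""
    out_blocks = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.split("\n")
        # Locate the timing line; everything after it is cue text.
        ts = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if ts is None:
            out_blocks.append(block)  # not a cue, keep as-is
            continue
        head = lines[: ts + 1]
        body = translate("\n".join(lines[ts + 1:]))
        out_blocks.append("\n".join(head + [body]))
    return "\n\n".join(out_blocks) + "\n"


sample = (
    "1\n00:00:01,000 --> 00:00:04,000\nHello world\n\n"
    "2\n00:00:05,000 --> 00:00:08,000\nHow are you?\n"
)
result = translate_cues(sample, str.upper)  # str.upper is a stand-in translator
print(result)
```

Because only the text after the `-->` line is handed to the callback, the timing lines in the output are byte-for-byte identical to the input.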

### Input/Output Contract

| Input                 | Type    | Required | Description                                   |
| --------------------- | ------- | -------- | --------------------------------------------- |
| `subtitles_file`      | string  | Yes      | Path to SRT or VTT file                       |
| `target_language`     | string  | Yes      | Target language code (e.g., `es`, `fr`, `de`) |
| `preserve_timestamps` | boolean | No       | Keep original timing (default: true)          |

| Output                      | Type    | Description                      |
| --------------------------- | ------- | -------------------------------- |
| `translated_subtitles_file` | string  | Path to translated subtitle file |
| `ok`                        | boolean | Success indicator                |
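
As a rough sketch of how a caller might apply this contract before invoking the recipe, the helper below checks the required fields and fills in the documented default. `normalize_input` is a hypothetical name, not part of PraisonAI.

```python
REQUIRED_INPUTS = ("subtitles_file", "target_language")


def normalize_input(input_data: dict) -> dict:
    """Validate required fields and apply the default for preserve_timestamps."""
    missing = [name for name in REQUIRED_INPUTS if not input_data.get(name)]
    if missing:
        raise ValueError(f"missing required input(s): {', '.join(missing)}")
    return {
        "subtitles_file": str(input_data["subtitles_file"]),
        "target_language": str(input_data["target_language"]),
        # Optional field; the contract lists true as the default.
        "preserve_timestamps": bool(input_data.get("preserve_timestamps", True)),
    }


payload = normalize_input({"subtitles_file": "video.srt", "target_language": "es"})
print(payload)
```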

## Prerequisites

```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
export OPENAI_API_KEY=your_key_here
pip install praisonaiagents
```
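
If you want to confirm both prerequisites from Python before running anything, a small preflight check along these lines may help. `missing_prerequisites` is a hypothetical helper; it only looks for the environment variable and the installed package named above.

```python
import importlib.util
import os
from typing import List, Mapping, Optional


def missing_prerequisites(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    # find_spec returns None when a top-level package cannot be found.
    if importlib.util.find_spec("praisonaiagents") is None:
        problems.append("praisonaiagents is not installed")
    return problems


for problem in missing_prerequisites():
    print(f"Prerequisite missing: {problem}")
```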

## Step-by-Step Build

<Steps>
  <Step title="Create Recipe Directory">
    ```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
    mkdir -p ~/.praisonai/templates/multilingual-subtitle-translator
    cd ~/.praisonai/templates/multilingual-subtitle-translator
    ```
  </Step>

  <Step title="Create TEMPLATE.yaml">
    ```yaml theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
    name: multilingual-subtitle-translator
    version: "1.0.0"
    description: "Translate subtitle files while preserving timestamps"
    author: "PraisonAI"
    license: "MIT"

    tags:
      - translation
      - subtitles
      - localization
      - video

    requires:
      env:
        - OPENAI_API_KEY
      packages:
        - praisonaiagents

    inputs:
      subtitles_file:
        type: string
        description: "Path to the SRT or VTT subtitle file"
        required: true
      target_language:
        type: string
        description: "Target language code (e.g., es, fr, de, ja, zh)"
        required: true
      preserve_timestamps:
        type: boolean
        description: "Keep original timing"
        required: false
        default: true

    outputs:
      translated_subtitles_file:
        type: string
        description: "Path to the translated subtitle file"
      ok:
        type: boolean
        description: "Success indicator"

    cli:
      command: "praison recipes run multilingual-subtitle-translator"
      examples:
        - 'praison recipes run multilingual-subtitle-translator --input ''{"subtitles_file": "video.srt", "target_language": "es"}'''

    safety:
      dry_run_default: false
      requires_consent: false
      overwrites_files: true
      network_access: true
      pii_handling: false
    ```
  </Step>

  <Step title="Create recipe.py">
    ```python theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
    # recipe.py
    import os
    import re
    from pathlib import Path
    from praisonaiagents import Agent, Task, AgentTeam

    def run(input_data: dict, config: dict = None) -> dict:
        """
        Translate subtitle files while preserving timestamps.
        """
        subtitles_file = input_data.get("subtitles_file")
        target_language = input_data.get("target_language")
        # Accepted for contract compatibility; timestamp lines are always copied verbatim below
        preserve_timestamps = input_data.get("preserve_timestamps", True)
        
        if not subtitles_file:
            return {"ok": False, "error": {"code": "MISSING_INPUT", "message": "subtitles_file is required"}}
        
        if not target_language:
            return {"ok": False, "error": {"code": "MISSING_INPUT", "message": "target_language is required"}}
        
        if not os.path.exists(subtitles_file):
            return {"ok": False, "error": {"code": "FILE_NOT_FOUND", "message": f"File not found: {subtitles_file}"}}
        
        try:
            # Read subtitle file (utf-8-sig also strips a leading BOM, if present)
            with open(subtitles_file, "r", encoding="utf-8-sig") as f:
                content = f.read()
            
            # Detect format
            is_vtt = subtitles_file.lower().endswith(".vtt") or content.startswith("WEBVTT")
            
            # Parse subtitles
            subtitles = parse_subtitles(content, is_vtt)
            
            # Create translation agent
            translator = Agent(
                name="Subtitle Translator",
                role="Professional Translator",
                goal=f"Translate subtitles to {target_language} naturally",
                instructions=f"""
                You are a professional subtitle translator.
                - Translate to {target_language} naturally
                - Keep translations concise (subtitles have character limits)
                - Preserve meaning and tone
                - Handle idioms appropriately
                - Keep proper nouns unchanged unless they have standard translations
                """,
            )
            
            # Translate in batches
            batch_size = 20
            translated_subtitles = []
            
            for i in range(0, len(subtitles), batch_size):
                batch = subtitles[i:i + batch_size]
                texts = [s["text"] for s in batch]
                
                task = Task(
                    name=f"translate_batch_{i}",
                    description=f"""
                    Translate these subtitle lines to {target_language}:
                    
                    {chr(10).join(f'{j+1}. {t}' for j, t in enumerate(texts))}
                    
                    Return translations in the same numbered format.
                    """,
                    expected_output="Numbered translations matching input order",
                    agent=translator,
                )
                
                agents = AgentTeam(agents=[translator], tasks=[task])
                result = agents.start()
                
                # Parse translations
                translations = parse_translations(result.get(f"translate_batch_{i}", ""), len(texts))
                
                for j, sub in enumerate(batch):
                    translated_subtitles.append({
                        "index": sub["index"],
                        "timestamp": sub["timestamp"],
                        # Fall back to the source text when a translation is missing or empty
                        "text": (translations[j] if j < len(translations) else "") or sub["text"]
                    })
            
            # Format output
            output_content = format_subtitles(translated_subtitles, is_vtt)
            
            # Save translated file
            file_path = Path(subtitles_file)
            ext = file_path.suffix
            # Write the translated file next to the source file
            output_file = str(file_path.with_name(f"{file_path.stem}_{target_language}{ext}"))
            
            with open(output_file, "w", encoding="utf-8") as f:
                f.write(output_content)
            
            return {
                "ok": True,
                "translated_subtitles_file": output_file,
                "artifacts": [{"path": output_file, "type": "text", "size_bytes": os.path.getsize(output_file)}],
                "warnings": [],
            }
            
        except Exception as e:
            return {"ok": False, "error": {"code": "PROCESSING_ERROR", "message": str(e)}}


    def parse_subtitles(content: str, is_vtt: bool) -> list:
        """Parse SRT or VTT content into structured data."""
        subtitles = []
        
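        # Normalize CRLF/CR line endings so the blank-line split below finds every cue
        content = content.replace('\r\n', '\n').replace('\r', '\n')
        
        if is_vtt:
            # Remove WEBVTT header
            content = re.sub(r'^WEBVTT.*?\n\n', '', content, flags=re.DOTALL)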
        
        # Split into blocks
        blocks = re.split(r'\n\n+', content.strip())
        
        for block in blocks:
            lines = block.strip().split('\n')
            if len(lines) >= 2:
                # Find timestamp line
                for i, line in enumerate(lines):
                    if '-->' in line:
                        index = lines[i-1] if i > 0 and lines[i-1].isdigit() else str(len(subtitles) + 1)
                        timestamp = line
                        text = '\n'.join(lines[i+1:])
                        subtitles.append({"index": index, "timestamp": timestamp, "text": text})
                        break
        
        return subtitles


    def parse_translations(result: str, expected_count: int) -> list:
        """Parse numbered translations from agent output."""
        translations = []
        lines = result.strip().split('\n')
        
        current_text = []
        for line in lines:
            match = re.match(r'^\d+\.\s*(.+)$', line)
            if match:
                if current_text:
                    translations.append(' '.join(current_text))
                current_text = [match.group(1)]
            elif current_text:
                current_text.append(line)
        
        if current_text:
            translations.append(' '.join(current_text))
        
        # Pad if needed
        while len(translations) < expected_count:
            translations.append("")
        
        return translations


    def format_subtitles(subtitles: list, is_vtt: bool) -> str:
        """Format subtitles back to SRT or VTT format."""
        lines = []
        
        if is_vtt:
            lines.append("WEBVTT\n")
        
        for sub in subtitles:
            lines.append(sub["index"])
            lines.append(sub["timestamp"])
            lines.append(sub["text"])
            lines.append("")
        
        return '\n'.join(lines)
    ```
  </Step>

  <Step title="Create test_recipe.py">
    ```python theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
    # test_recipe.py
    import pytest
    import tempfile
    import os
    from recipe import run, parse_subtitles, format_subtitles

    def test_missing_subtitles_file():
        result = run({"target_language": "es"})
        assert result["ok"] is False
        assert result["error"]["code"] == "MISSING_INPUT"

    def test_missing_target_language():
        result = run({"subtitles_file": "test.srt"})
        assert result["ok"] is False
        assert result["error"]["code"] == "MISSING_INPUT"

    def test_file_not_found():
        result = run({"subtitles_file": "/nonexistent.srt", "target_language": "es"})
        assert result["ok"] is False
        assert result["error"]["code"] == "FILE_NOT_FOUND"

    def test_parse_srt():
        srt_content = """1
    00:00:01,000 --> 00:00:04,000
    Hello world

    2
    00:00:05,000 --> 00:00:08,000
    How are you?
    """
        subtitles = parse_subtitles(srt_content, is_vtt=False)
        assert len(subtitles) == 2
        assert subtitles[0]["text"] == "Hello world"

    def test_parse_vtt():
        vtt_content = """WEBVTT

    00:00:01.000 --> 00:00:04.000
    Hello world

    00:00:05.000 --> 00:00:08.000
    How are you?
    """
        subtitles = parse_subtitles(vtt_content, is_vtt=True)
        assert len(subtitles) == 2

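    @pytest.mark.integration
    def test_end_to_end():
        # Requires OPENAI_API_KEY and network access
        # Create temp SRT file
        with tempfile.NamedTemporaryFile(mode='w', suffix='.srt', delete=False, encoding='utf-8') as f:
            f.write("1\n00:00:01,000 --> 00:00:04,000\nHello world\n")
            temp_file = f.name
        
        result = {}
        try:
            result = run({"subtitles_file": temp_file, "target_language": "es"})
            assert result["ok"] is True
        finally:
            os.unlink(temp_file)
            # Also remove the translated file the recipe wrote
            output_file = result.get("translated_subtitles_file")
            if output_file and os.path.exists(output_file):
                os.unlink(output_file)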
    ```
  </Step>
</Steps>
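
The batching loop in `recipe.py` sends each batch as a numbered list and expects a numbered reply. The sketch below is a standalone variant of that exchange, not the recipe's own code: `build_prompt` and `parse_numbered` are hypothetical helpers, and the parser matches replies by their number (rather than by position) so that a skipped line falls back to the source text instead of shifting every later cue.

```python
import re
from typing import Dict, List, Optional


def build_prompt(texts: List[str]) -> str:
    """Number each subtitle line, starting at 1."""
    return "\n".join(f"{n}. {text}" for n, text in enumerate(texts, start=1))


def parse_numbered(reply: str, originals: List[str]) -> List[str]:
    """Map a numbered reply back to cue order, falling back to the original text."""
    found: Dict[int, str] = {}
    current: Optional[int] = None
    for line in reply.splitlines():
        match = re.match(r"^\s*(\d+)\.\s*(.*)$", line)
        if match:
            current = int(match.group(1))
            found[current] = match.group(2).strip()
        elif current is not None and line.strip():
            # Continuation of a multi-line translation
            found[current] += "\n" + line.strip()
    return [found.get(n) or text for n, text in enumerate(originals, start=1)]


originals = ["Hello", "How are you?", "Bye"]
prompt = build_prompt(originals)
# Simulated model reply that skipped item 2
translated = parse_numbered("1. Hola\n3. Adiós", originals)
print(prompt)
print(translated)
```

One limitation to keep in mind: a continuation line that itself begins with a number and a period would be read as a new item, so subtitles containing numbered text may need a different delimiter.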

## Run Locally

```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
# Translate to Spanish
praison recipes run multilingual-subtitle-translator \
  --input '{"subtitles_file": "movie.srt", "target_language": "es"}'

# Translate to Japanese
praison recipes run multilingual-subtitle-translator \
  --input '{"subtitles_file": "video.vtt", "target_language": "ja"}'
```
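
When the file name contains spaces or quotes, hand-writing the JSON for `--input` gets error-prone. One option, sketched here with a hypothetical `build_command` helper, is to let `json.dumps` produce the payload and pass the command as an argument list so no shell escaping is needed.

```python
import json
import shlex
from typing import List


def build_command(subtitles_file: str, target_language: str) -> List[str]:
    """Build the CLI invocation shown above as an argument list."""
    payload = json.dumps(
        {"subtitles_file": subtitles_file, "target_language": target_language}
    )
    return [
        "praison", "recipes", "run", "multilingual-subtitle-translator",
        "--input", payload,
    ]


cmd = build_command("my movie.srt", "es")
# shlex.join shows the shell-quoted form for copy and paste
print(shlex.join(cmd))
```

To execute it, the list can be passed to `subprocess.run(cmd, check=True)`.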

## Deploy & Integrate: 6 Integration Models

<Tabs>
  <Tab title="Model 1: Embedded SDK">
    ```python theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
    from praisonai import recipe

    result = recipe.run(
        "multilingual-subtitle-translator",
        input={
            "subtitles_file": "video.srt",
            "target_language": "es"
        }
    )

    if result.ok:
        print(f"Translated: {result.output['translated_subtitles_file']}")
    ```

    **Deployment note:** Best for Python video processing pipelines.
  </Tab>

  <Tab title="Model 2: CLI Invocation">
    ```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
    # Batch translate to multiple languages
    for lang in es fr de ja; do
      praison recipes run multilingual-subtitle-translator \
        --input "{\"subtitles_file\": \"video.srt\", \"target_language\": \"$lang\"}"
    done
    ```

    **Deployment note:** Great for localization pipelines.
  </Tab>

  <Tab title="Model 3: Plugin Mode">
    ```python theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
    class SubtitleTranslatorPlugin:
        def translate(self, file_path, target_lang):
            from praisonai import recipe
            return recipe.run(
                "multilingual-subtitle-translator",
                input={"subtitles_file": file_path, "target_language": target_lang}
            )
    ```
  </Tab>

  <Tab title="Model 4: Local HTTP Sidecar">
    ```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
    praison recipes serve --port 8765
    ```

    ```javascript theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
    const response = await fetch('http://localhost:8765/recipes/multilingual-subtitle-translator/run', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        subtitles_file: '/path/to/video.srt',
        target_language: 'es'
      })
    });
    ```
  </Tab>

  <Tab title="Model 5: Remote Managed Runner">
    ```python theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
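    import os
    import requests

    # LOCALIZATION_API_KEY is a placeholder name; use whatever your runner expects
    api_key = os.environ["LOCALIZATION_API_KEY"]

    response = requests.post(
        "https://api.localization-service.com/translate",
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "subtitles_url": "https://cdn.example.com/video.srt",
            "target_language": "es"
        },
        timeout=30,
    )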
    ```
  </Tab>

  <Tab title="Model 6: Event-Driven">
    ```python theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
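    import queue

    # Stand-in for your real job queue (e.g., a message broker client)
    job_queue = queue.Queue()

    # Trigger on video upload
    def handle_video_upload(event):
        video_id = event['video_id']
        languages = ['es', 'fr', 'de', 'ja']
        
        for lang in languages:
            job_queue.put({
                "recipe": "multilingual-subtitle-translator",
                "input": {
                    # recipe.py checks os.path.exists(), so the worker must
                    # download s3:// objects to a local path before running it
                    "subtitles_file": f"s3://bucket/{video_id}.srt",
                    "target_language": lang
                }
            })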
    ```
  </Tab>
</Tabs>
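
For the event-driven model, something still has to consume the queued jobs. The sketch below shows one possible worker loop under stated assumptions: `drain_jobs` is a hypothetical name, the queue is an in-process `queue.Queue`, and `run_recipe` is whatever callable you use to execute a recipe (injected here so the loop can be tried without any API key).

```python
import queue
from typing import Callable, List


def drain_jobs(jobs: queue.Queue, run_recipe: Callable[[str, dict], dict]) -> List[dict]:
    """Run every queued job once and collect the result payloads."""
    results = []
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            break
        try:
            results.append(run_recipe(job["recipe"], job["input"]))
        except Exception as exc:
            # Record the failure and keep processing the remaining jobs
            results.append(
                {"ok": False, "error": {"code": "PROCESSING_ERROR", "message": str(exc)}}
            )
        finally:
            jobs.task_done()
    return results


def fake_runner(name: str, input_data: dict) -> dict:
    # Stand-in for a real recipe call; echoes the requested language
    return {"ok": True, "recipe": name, "language": input_data["target_language"]}


job_queue = queue.Queue()
for lang in ("es", "fr"):
    job_queue.put({
        "recipe": "multilingual-subtitle-translator",
        "input": {"subtitles_file": "video.srt", "target_language": lang},
    })

outcomes = drain_jobs(job_queue, fake_runner)
print(outcomes)
```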

## Troubleshooting

<AccordionGroup>
  <Accordion title="Timestamps are misaligned">
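    The recipe copies each timestamp line from the source file unchanged, so timing matches the original regardless of `preserve_timestamps` (default: `true`). If translated text runs longer and needs more screen time, adjust cue durations in a post-processing step.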
  </Accordion>

  <Accordion title="Special characters corrupted">
    The recipe reads and writes files as UTF-8. If your source file uses another encoding (for example, Latin-1 or UTF-16), convert it to UTF-8 before running the recipe.
  </Accordion>
</AccordionGroup>
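
As a starting point for the timing post-processing mentioned above, the sketch below estimates reading speed in characters per second for a cue, which is one way to spot translations that are too long for their time slot. The helper names are hypothetical and the 17 characters-per-second default is only an illustrative threshold; pick the limit your style guide requires.

```python
import re

# Matches SRT (00:00:01,000) and VTT (00:00:01.000 or 00:01.000) timestamps
TIME_RE = re.compile(r"(?:(\d+):)?(\d{2}):(\d{2})[,.](\d{3})")


def to_seconds(stamp: str) -> float:
    """Convert the first timestamp found in `stamp` to seconds."""
    match = TIME_RE.search(stamp)
    if match is None:
        raise ValueError(f"no timestamp found in {stamp!r}")
    hours, minutes, seconds, millis = match.groups()
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def reading_speed(timestamp_line: str, text: str) -> float:
    """Characters per second for one cue (line breaks are not counted)."""
    start, end = timestamp_line.split("-->")
    duration = to_seconds(end) - to_seconds(start)
    chars = len(text.replace("\n", ""))
    return chars / duration if duration > 0 else float("inf")


def is_too_fast(timestamp_line: str, text: str, max_cps: float = 17.0) -> bool:
    # max_cps is an assumed, illustrative limit
    return reading_speed(timestamp_line, text) > max_cps


speed = reading_speed("00:00:01,000 --> 00:00:04,000", "Hello world")
print(f"{speed:.2f} characters per second")
```

Cues flagged by `is_too_fast` are candidates for a shorter translation or a longer display time.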

## Next Steps

* **[Video Caption Generator](/docs/examples/recipe-examples/video-caption-generator)** - Generate captions first
* **[Voice-to-Voice Translator Lite](/docs/examples/recipe-examples/voice-to-voice-translator-lite)** - Translate audio directly
