# Video Caption Generator

> Generate captions from video files with language detection and multiple output formats

Generate captions from video files with automatic language detection and support for SRT/VTT output formats.

## Problem Statement

**Who:** Content creators, video editors, accessibility teams\
**Why:** Manual captioning is time-consuming and expensive. Automated captions improve accessibility and SEO.

## What You'll Build

A recipe that extracts audio from video, transcribes it, and generates properly formatted caption files.

```mermaid theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
graph LR
    Input[🎬 Video File] --> Extract[Extract Audio]
    Extract --> Transcribe[Transcribe]
    Transcribe --> Format[Format Captions]
    Format --> Output[📄 SRT/VTT File]

    classDef input fill:#8B0000,stroke:#7C90A0,color:#fff
    classDef process fill:#189AB4,stroke:#7C90A0,color:#fff

    class Input,Output input
    class Extract,Transcribe,Format process
```

### Input/Output Contract

| Input           | Type   | Required | Description                            |
| --------------- | ------ | -------- | -------------------------------------- |
| `video_path`    | string | Yes      | Path to the video file                 |
| `language`      | string | No       | Language code (auto-detect if omitted) |
| `output_format` | string | No       | `srt` or `vtt` (default: `srt`)        |

| Output          | Type    | Description                        |
| --------------- | ------- | ---------------------------------- |
| `captions_file` | string  | Path to generated caption file     |
| `summary`       | string  | Brief summary of the video content |
| `ok`            | boolean | Success indicator                  |
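
The contract above can be enforced before any model call is made. The following is a minimal sketch, assuming a hypothetical helper named `normalize_inputs` (it is not part of the PraisonAI API) that applies the documented defaults and rejects any `output_format` outside `srt` and `vtt`:

```python
from typing import Any

ALLOWED_FORMATS = ("srt", "vtt")  # taken from the input table above


def normalize_inputs(input_data: dict[str, Any]) -> dict[str, str]:
    """Apply the contract's defaults and reject values outside it.

    Hypothetical helper: the name and the ValueError behaviour are
    illustrative assumptions, not part of the PraisonAI API.
    """
    video_path = input_data.get("video_path")
    if not isinstance(video_path, str) or not video_path.strip():
        raise ValueError("video_path is required")

    # Optional fields fall back to the documented defaults.
    language = input_data.get("language") or "auto"
    output_format = str(input_data.get("output_format") or "srt").lower()
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"output_format must be one of {ALLOWED_FORMATS}")

    return {
        "video_path": video_path,
        "language": language,
        "output_format": output_format,
    }
```

Rejecting unknown formats early also keeps unexpected strings out of the caption file name, which the recipe builds from `output_format`.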

## Prerequisites

<Warning>
  **Required:** `OPENAI_API_KEY` environment variable must be set.
</Warning>

```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
# Set your API key
export OPENAI_API_KEY=your_key_here

# Install required packages
pip install praisonaiagents

# Optional: Install ffmpeg for audio extraction
# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt install ffmpeg
```

## Step-by-Step Build

<Steps>
  <Step title="Create Recipe Directory">
    ```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
    mkdir -p ~/.praisonai/templates/video-caption-generator
    cd ~/.praisonai/templates/video-caption-generator
    ```
  </Step>

  <Step title="Create TEMPLATE.yaml">
    Create the recipe metadata file:

    ```yaml theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
    # TEMPLATE.yaml
    name: video-caption-generator
    version: "1.0.0"
    description: "Generate captions from video files with language detection"
    author: "PraisonAI"
    license: "MIT"

    tags:
      - video
      - captions
      - accessibility
      - transcription

    requires:
      env:
        - OPENAI_API_KEY
      packages:
        - praisonaiagents
      optional_env:
        - ANTHROPIC_API_KEY
      external:
        - ffmpeg

    inputs:
      video_path:
        type: string
        description: "Path to the video file to caption"
        required: true
      language:
        type: string
        description: "Language code (e.g., 'en', 'es', 'fr'). Auto-detect if omitted."
        required: false
        default: "auto"
      output_format:
        type: string
        description: "Caption output format"
        required: false
        default: "srt"
        enum:
          - srt
          - vtt

    outputs:
      captions_file:
        type: string
        description: "Path to the generated caption file"
      summary:
        type: string
        description: "Brief summary of the video content"
      ok:
        type: boolean
        description: "Whether the operation succeeded"

    cli:
      command: "praison recipes run video-caption-generator"
      examples:
        - 'praison recipes run video-caption-generator --input ''{"video_path": "video.mp4"}'''
        - 'praison recipes run video-caption-generator --input ''{"video_path": "video.mp4", "language": "en", "output_format": "vtt"}'''

    safety:
      dry_run_default: false
      requires_consent: false
      overwrites_files: true
      network_access: true
      pii_handling: false
    ```
  </Step>

  <Step title="Create recipe.py">
    Implement the main recipe logic:

    ```python theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
    # recipe.py
    import os
    import subprocess
    import tempfile
    from pathlib import Path
    from praisonaiagents import Agent, Task, AgentTeam

    def run(input_data: dict, config: dict = None) -> dict:
        """
        Generate captions from a video file.
        
        Args:
            input_data: Contains video_path, language, output_format
            config: Optional configuration overrides
            
        Returns:
            Dict with captions_file, summary, and ok status
        """
        # Validate required inputs
        video_path = input_data.get("video_path")
        if not video_path:
            return {
                "ok": False,
                "error": {"code": "MISSING_INPUT", "message": "video_path is required"},
                "captions_file": None,
                "summary": None,
            }
        
        if not os.path.exists(video_path):
            return {
                "ok": False,
                "error": {"code": "FILE_NOT_FOUND", "message": f"Video file not found: {video_path}"},
                "captions_file": None,
                "summary": None,
            }
        
        language = input_data.get("language", "auto")
        output_format = input_data.get("output_format", "srt")
        
        try:
            # Extract audio from video
            audio_path = extract_audio(video_path)
            
            # Create transcription agent
            transcriber = Agent(
                name="Transcription Specialist",
                role="Audio Transcription Expert",
                goal="Accurately transcribe audio content with timestamps",
                instructions="""
                You are an expert transcriptionist.
                - Transcribe audio accurately with proper punctuation
                - Include timestamps for each segment
                - Identify speaker changes when possible
                - Handle multiple languages if detected
                """,
            )
            
            # Create caption formatting agent
            formatter = Agent(
                name="Caption Formatter",
                role="Caption File Specialist",
                goal=f"Format transcription into {output_format.upper()} format",
                instructions=f"""
                You are a caption formatting expert.
                - Convert transcriptions to {output_format.upper()} format
                - Ensure proper timestamp formatting
                - Keep caption lines under 42 characters
                - Split long sentences appropriately
                """,
            )
            
            # Create summarizer agent
            summarizer = Agent(
                name="Content Summarizer",
                role="Video Content Analyst",
                goal="Provide a brief summary of the video content",
                instructions="""
                You are a content analyst.
                - Summarize the main topics discussed
                - Keep summary under 100 words
                - Highlight key points
                """,
            )
            
            # Define tasks
            transcribe_task = Task(
                name="transcribe_audio",
                description=f"""
                Transcribe the audio from: {audio_path}
                Language: {language if language != 'auto' else 'auto-detect'}
                
                Provide timestamped transcription segments.
                """,
                expected_output="Timestamped transcription with segments",
                agent=transcriber,
            )
            
            format_task = Task(
                name="format_captions",
                description=f"""
                Format the transcription into {output_format.upper()} caption format.
                
                For SRT format:
                1
                00:00:00,000 --> 00:00:02,500
                Caption text here
                
                For VTT format:
                WEBVTT
                
                00:00:00.000 --> 00:00:02.500
                Caption text here
                """,
                expected_output=f"Properly formatted {output_format.upper()} captions",
                agent=formatter,
                context=[transcribe_task],
            )
            
            summarize_task = Task(
                name="summarize_content",
                description="Summarize the video content based on the transcription.",
                expected_output="Brief summary of video content",
                agent=summarizer,
                context=[transcribe_task],
            )
            
            # Execute agents
            agents = AgentTeam(
                agents=[transcriber, formatter, summarizer],
                tasks=[transcribe_task, format_task, summarize_task],
            )
            
            result = agents.start()
            
            # Save captions to file
            video_name = Path(video_path).stem
            captions_file = f"{video_name}.{output_format}"
            
            with open(captions_file, "w", encoding="utf-8") as f:
                f.write(result.get("format_captions", ""))
            
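            # Cleanup temp audio file. extract_audio() returns the original
            # video path when ffmpeg is missing, so never delete that.
            if audio_path != video_path and os.path.exists(audio_path):
                os.remove(audio_path)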
            
            return {
                "ok": True,
                "captions_file": captions_file,
                "summary": result.get("summarize_content", ""),
                "artifacts": [
                    {"path": captions_file, "type": "text", "size_bytes": os.path.getsize(captions_file)}
                ],
                "warnings": [],
            }
            
        except Exception as e:
            return {
                "ok": False,
                "error": {"code": "PROCESSING_ERROR", "message": str(e)},
                "captions_file": None,
                "summary": None,
            }


    def extract_audio(video_path: str) -> str:
        """Extract audio from video file using ffmpeg."""
        # Create temp file for audio
        temp_audio = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
        temp_audio.close()
        
        # Extract 16 kHz mono PCM audio using ffmpeg
        cmd = [
            "ffmpeg", "-i", video_path,
            "-vn", "-acodec", "pcm_s16le",
            "-ar", "16000", "-ac", "1",
            "-y", temp_audio.name
        ]
        
        try:
            subprocess.run(cmd, check=True, capture_output=True)
            return temp_audio.name
        except FileNotFoundError:
            # ffmpeg not installed - drop the unused temp file and
            # return video path for direct processing
            os.remove(temp_audio.name)
            return video_path
        except subprocess.CalledProcessError as e:
            os.remove(temp_audio.name)
            raise RuntimeError(
                f"Audio extraction failed: {e.stderr.decode(errors='replace')}"
            ) from e
    ```
  </Step>

  <Step title="Create test_recipe.py">
    Write tests for the recipe:

    ```python theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
    # test_recipe.py
    import pytest
    import os
    import tempfile
    from recipe import run

    def test_missing_video_path():
        """Test error handling for missing video_path."""
        result = run({})
        assert result["ok"] is False
        assert result["error"]["code"] == "MISSING_INPUT"

    def test_file_not_found():
        """Test error handling for non-existent file."""
        result = run({"video_path": "/nonexistent/video.mp4"})
        assert result["ok"] is False
        assert result["error"]["code"] == "FILE_NOT_FOUND"

    def test_output_format_default():
        """Test that default output format is SRT."""
        # This test would need a mock video file
        pass

    def test_valid_output_formats():
        """Test that both SRT and VTT formats are accepted."""
        # Validation only - actual processing requires video file
        valid_formats = ["srt", "vtt"]
        for fmt in valid_formats:
            input_data = {
                "video_path": "test.mp4",
                "output_format": fmt
            }
            # Would need mock file for full test
            assert fmt in valid_formats

    @pytest.mark.integration
    def test_end_to_end():
        """Full integration test with real video file."""
        # Skip if no test video available
        test_video = os.environ.get("TEST_VIDEO_PATH")
        if not test_video or not os.path.exists(test_video):
            pytest.skip("No test video available")
        
        result = run({
            "video_path": test_video,
            "language": "en",
            "output_format": "srt"
        })
        
        assert result["ok"] is True
        assert result["captions_file"] is not None
        assert os.path.exists(result["captions_file"])
        
        # Cleanup
        if os.path.exists(result["captions_file"]):
            os.remove(result["captions_file"])

    @pytest.mark.integration
    def test_vtt_format():
        """Test VTT output format."""
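        test_video = os.environ.get("TEST_VIDEO_PATH")
        if not test_video or not os.path.exists(test_video):
            pytest.skip("No test video available")
        
        result = run({
            "video_path": test_video,
            "output_format": "vtt"
        })
        
        # Fail loudly instead of passing silently when the run fails
        assert result["ok"] is True
        assert result["captions_file"].endswith(".vtt")
        
        # Cleanup
        if os.path.exists(result["captions_file"]):
            os.remove(result["captions_file"])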
    ```
  </Step>

  <Step title="Create README.md">
    Document the recipe:

    ````markdown theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
    # Video Caption Generator

    Generate captions from video files with automatic language detection.

    ## Quick Start

    ```bash
    praison recipes run video-caption-generator --input '{"video_path": "my-video.mp4"}'
    ```

    ## Inputs

    | Field          | Type   | Required | Default | Description                      |
    | -------------- | ------ | -------- | ------- | -------------------------------- |
    | video_path     | string | Yes      | -       | Path to video file               |
    | language       | string | No       | auto    | Language code (en, es, fr, etc.) |
    | output_format  | string | No       | srt     | Output format: srt or vtt        |

    ## Outputs

    | Field          | Type    | Description                    |
    | -------------- | ------- | ------------------------------ |
    | captions_file  | string  | Path to generated caption file |
    | summary        | string  | Brief content summary          |
    | ok             | boolean | Success indicator              |

    ## Requirements

    * `OPENAI_API_KEY` environment variable
    * `ffmpeg` (optional, for audio extraction)
    * `praisonaiagents` package

    ## Examples

    ### Basic Usage

    ```bash
    praison recipes run video-caption-generator \
      --input '{"video_path": "presentation.mp4"}'
    ```

    ### Specify Language and Format

    ```bash
    praison recipes run video-caption-generator \
      --input '{"video_path": "video.mp4", "language": "es", "output_format": "vtt"}'
    ```

    ## Troubleshooting

    | Issue              | Solution                                                      |
    | ------------------ | ------------------------------------------------------------- |
    | "ffmpeg not found" | Install ffmpeg: `brew install ffmpeg` or `apt install ffmpeg` |
    | "API key missing"  | Set `export OPENAI_API_KEY=your_key`                          |
    | Poor transcription | Try specifying the language explicitly                        |
    ````
  </Step>

  <Step title="Verify Recipe Structure">
    ```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
    # Check directory structure
    ls -la ~/.praisonai/templates/video-caption-generator/

    # Expected output:
    # TEMPLATE.yaml
    # recipe.py
    # test_recipe.py
    # README.md

    # Verify recipe is discovered
    praison recipes list | grep video-caption
    ```
  </Step>
</Steps>
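
The SRT and VTT layouts spelled out in the `format_captions` task are simple enough to render deterministically once timestamped segments are available. The sketch below is not what the recipe does (the recipe asks an agent to produce the text). It assumes segments arrive as `(start_seconds, end_seconds, text)` tuples, and the names `format_timestamp` and `render_captions` are hypothetical:

```python
from typing import Iterable, Tuple

Segment = Tuple[float, float, str]  # (start_seconds, end_seconds, text)


def format_timestamp(seconds: float, fmt: str) -> str:
    """Render seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (VTT)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    sep = "," if fmt == "srt" else "."
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{millis:03d}"


def render_captions(segments: Iterable[Segment], fmt: str = "srt") -> str:
    """Join segments into one SRT or VTT document."""
    if fmt not in ("srt", "vtt"):
        raise ValueError("fmt must be 'srt' or 'vtt'")
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        timing = f"{format_timestamp(start, fmt)} --> {format_timestamp(end, fmt)}"
        if fmt == "srt":
            # SRT cues carry a 1-based sequence number above the timing line.
            blocks.append(f"{index}\n{timing}\n{text}")
        else:
            blocks.append(f"{timing}\n{text}")
    body = "\n\n".join(blocks) + "\n"
    # VTT files start with a WEBVTT header followed by a blank line.
    return ("WEBVTT\n\n" + body) if fmt == "vtt" else body
```

The only differences between the two formats here are the millisecond separator, the cue numbers in SRT, and the `WEBVTT` header.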

## Run Locally

### Using CLI

```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
# Basic run
praison recipes run video-caption-generator \
  --input '{"video_path": "my-video.mp4"}'

# With all options
praison recipes run video-caption-generator \
  --input '{"video_path": "video.mp4", "language": "en", "output_format": "vtt"}'

# Dry run (see what would happen)
praison recipes run video-caption-generator \
  --input '{"video_path": "video.mp4"}' \
  --dry-run
```

### Using Python SDK

```python theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
from praisonai import recipe

result = recipe.run(
    "video-caption-generator",
    input={
        "video_path": "my-video.mp4",
        "language": "en",
        "output_format": "srt"
    }
)

if result.ok:
    print(f"Captions saved to: {result.output['captions_file']}")
    print(f"Summary: {result.output['summary']}")
else:
    print(f"Error: {result.error}")
```
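
After a successful run, the caption file can be sanity-checked before publishing. The formatter agent is instructed to keep caption lines under 42 characters, but nothing in the recipe verifies that. A minimal sketch of such a check follows. `find_long_lines` is a hypothetical helper, and treating 42 as the maximum allowed length is an assumption:

```python
def find_long_lines(caption_text: str, limit: int = 42) -> list[tuple[int, str]]:
    """Return (line_number, text) for caption text lines longer than `limit`.

    Cue numbers, timing lines, the WEBVTT header, and blank lines are
    skipped. A caption consisting only of digits is also skipped, which
    is acceptable for a rough check like this one.
    """
    offenders = []
    for number, line in enumerate(caption_text.splitlines(), start=1):
        stripped = line.strip()
        if (
            not stripped
            or stripped == "WEBVTT"
            or stripped.isdigit()
            or "-->" in stripped
        ):
            continue
        if len(stripped) > limit:
            offenders.append((number, stripped))
    return offenders
```

To use it, read the file at `result.output["captions_file"]` as UTF-8 text and pass the contents to `find_long_lines`. An empty list means every caption line fits.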

## Deploy & Integrate: 6 Integration Models

<Tabs>
  <Tab title="Model 1: Embedded SDK">
    **When to use:** Python applications, Jupyter notebooks, direct integration

    ```python theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
    from praisonai import recipe

    # Synchronous execution
    result = recipe.run(
        "video-caption-generator",
        input={"video_path": "video.mp4", "output_format": "srt"}
    )

    # Access results
    if result.ok:
        captions_path = result.output["captions_file"]
        summary = result.output["summary"]
    ```

    **Deployment note:** Runs in-process, lowest latency, requires Python environment.

    <Warning>
      **Safety:** Ensure video files are from trusted sources. Recipe writes to local filesystem.
    </Warning>
  </Tab>

  <Tab title="Model 2: CLI Invocation">
    **When to use:** Shell scripts, CI/CD pipelines, language-agnostic integration

    ```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
    # From any language via subprocess
    praison recipes run video-caption-generator \
      --input '{"video_path": "video.mp4"}' \
      --json
    ```

    **Node.js example:**

    ```javascript theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
    const { execFileSync } = require('child_process');

    // Pass arguments as an array so the JSON input is never parsed by a shell
    const input = JSON.stringify({ video_path: 'video.mp4' });
    const result = execFileSync('praison', [
      'recipes', 'run', 'video-caption-generator',
      '--input', input,
      '--json',
    ]);
    const output = JSON.parse(result.toString());

    **Deployment note:** Requires `praisonai` CLI installed on the system.

    <Warning>
      **Safety:** Validate file paths before passing to CLI to prevent path traversal.
    </Warning>
  </Tab>

  <Tab title="Model 3: Plugin Mode">
    **When to use:** IDE extensions, CMS plugins, chat applications

    ```python theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
    # VS Code extension example
    class CaptionGeneratorPlugin:
        def __init__(self):
            from praisonai import recipe
            self.recipe = recipe
        
        def generate_captions(self, video_path: str):
            return self.recipe.run(
                "video-caption-generator",
                input={"video_path": video_path}
            )

    # Register with IDE
    plugin = CaptionGeneratorPlugin()
    ```

    **Deployment note:** Embed in plugin architecture, handle UI callbacks.

    <Warning>
      **Safety:** Respect IDE sandbox permissions for file access.
    </Warning>
  </Tab>

  <Tab title="Model 4: Local HTTP Sidecar">
    **When to use:** Microservices, polyglot environments, local API

    ```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
    # Start the sidecar server
    praison recipes serve --port 8765
    ```

    ```python theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
    # Call from any HTTP client
    import requests

    response = requests.post(
        "http://localhost:8765/recipes/video-caption-generator/run",
        json={"video_path": "video.mp4", "output_format": "srt"}
    )
    result = response.json()
    ```

    **Deployment note:** Run as a local service, configure port and auth as needed.

    <Warning>
      **Safety:** Bind to localhost only. Use authentication for non-local access.
    </Warning>
  </Tab>

  <Tab title="Model 5: Remote Managed Runner">
    **When to use:** Multi-tenant SaaS, cloud deployments, authenticated access

    ```python theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
    import requests

    # Call remote runner with auth
    response = requests.post(
        "https://api.your-service.com/recipes/video-caption-generator/run",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        json={
            "video_path": "s3://bucket/video.mp4",
            "output_format": "srt"
        }
    )
    result = response.json()
    ```

    **Deployment note:** Deploy behind API gateway, implement rate limiting and auth.

    <Warning>
      **Safety:** Use signed URLs for file access. Implement tenant isolation.
    </Warning>
  </Tab>

  <Tab title="Model 6: Event-Driven">
    **When to use:** Batch processing, async workflows, queue-based systems

    ```python theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
    # Producer: Submit job to queue
    import json
    import queue  # or use: from your_queue_lib import queue

    import requests  # used by the consumer to call the callback URL

    job_queue = queue.Queue()  # Replace with SQS/RabbitMQ client in production
    job = {
        "recipe": "video-caption-generator",
        "input": {"video_path": "s3://bucket/video.mp4"},
        "callback_url": "https://your-app.com/webhook"
    }
    job_queue.put(json.dumps(job))

    # Consumer: Process from queue
    def process_job(message):
        from praisonai import recipe
        job = json.loads(message)
        result = recipe.run(job["recipe"], input=job["input"])
        
        # Send result to callback
        requests.post(job["callback_url"], json=result.to_dict())
    ```

    **Deployment note:** Use SQS, RabbitMQ, or Redis for queue. Handle retries.

    <Warning>
      **Safety:** Validate callback URLs. Implement job timeout and dead-letter queues.
    </Warning>
  </Tab>
</Tabs>
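
Two of the safety notes above, validating file paths and validating callback URLs, can be sketched with the standard library alone. Both helper names are hypothetical, and the allow-list approach for callbacks is one possible policy rather than something the recipe prescribes. `Path.is_relative_to` requires Python 3.9 or later.

```python
from pathlib import Path
from urllib.parse import urlparse


def resolve_inside(base_dir: str, user_path: str) -> Path:
    """Resolve user_path and require that it stays inside base_dir."""
    base = Path(base_dir).resolve()
    # Joining with an absolute user_path discards base, so the check
    # below rejects absolute paths outside base as well as "../" tricks.
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError(f"path escapes {base}: {user_path}")
    return candidate


def is_allowed_callback(url: str, allowed_hosts: set[str]) -> bool:
    """Accept only https URLs whose host is on an explicit allow-list."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname in allowed_hosts
```

A caller would pass the resolved path, not the raw user string, as `video_path`, and would drop any job whose `callback_url` fails the check.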

## Troubleshooting

<AccordionGroup>
  <Accordion title="ffmpeg not found">
    **Symptom:** Error message about ffmpeg not being installed.

    **Solution:**

    ```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
    # macOS
    brew install ffmpeg

    # Ubuntu/Debian
    sudo apt install ffmpeg

    # Windows
    choco install ffmpeg
    ```

    Without ffmpeg the recipe skips audio extraction and passes the original video path to the transcription step, so it still runs but quality may be reduced.
  </Accordion>

  <Accordion title="API key not set">
    **Symptom:** Authentication error from OpenAI.

    **Solution:**

    ```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
    export OPENAI_API_KEY=your_key_here

    # Verify it's set
    echo $OPENAI_API_KEY
    ```
  </Accordion>

  <Accordion title="Poor transcription quality">
    **Symptom:** Captions contain errors or miss words.

    **Solutions:**

    * Specify the language explicitly instead of auto-detect
    * Ensure audio quality is good (reduce background noise)
    * Try a different model by setting `OPENAI_MODEL=gpt-4o`
  </Accordion>

  <Accordion title="Large file processing timeout">
    **Symptom:** Recipe times out on long videos.

    **Solution:**

    * Split video into smaller segments
    * Use async/event-driven integration model
    * Increase timeout in config
  </Accordion>
</AccordionGroup>
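
For the timeout case, splitting a long video can be planned in a few lines. The sketch below only computes segment windows and builds the matching ffmpeg commands without running them. The helper names and the 600-second default are assumptions. It uses ffmpeg's `-ss` (start offset) and `-t` (duration) options with stream copy, which cuts on keyframes, so the boundaries are approximate.

```python
def plan_segments(duration: float, chunk: float = 600.0) -> list[tuple[float, float]]:
    """Split [0, duration) into (start, length) windows of at most `chunk` seconds."""
    if duration <= 0 or chunk <= 0:
        raise ValueError("duration and chunk must be positive")
    windows = []
    start = 0.0
    while start < duration:
        length = min(chunk, duration - start)
        windows.append((start, length))
        start += chunk
    return windows


def ffmpeg_split_command(
    video_path: str, start: float, length: float, out_path: str
) -> list[str]:
    """Build (but do not run) an ffmpeg command that copies one window."""
    return [
        "ffmpeg", "-ss", str(start), "-t", str(length),
        "-i", video_path, "-c", "copy", "-y", out_path,
    ]
```

Each segment can then be captioned separately. When merging the results, every caption timestamp has to be shifted by its segment's `start` offset so the cues line up with the original video.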

## Next Steps

* **[Podcast Transcription Cleaner](/docs/examples/recipe-examples/podcast-transcription-cleaner)** - Similar recipe for audio files
* **[Multilingual Subtitle Translator](/docs/examples/recipe-examples/multilingual-subtitle-translator)** - Translate your generated captions
* **[Integration Models](/docs/guides/recipes/integration-models)** - Deep dive into deployment options
