Video Caption Generator
Generate captions from video files with automatic language detection and support for SRT/VTT output formats.
Problem Statement
Who: Content creators, video editors, accessibility teams
Why: Manual captioning is time-consuming and expensive. Automated captions improve accessibility and SEO.
What You’ll Build
A recipe that extracts audio from video, transcribes it, and generates properly formatted caption files.
Input/Output Contract
| Input | Type | Required | Description |
|---|---|---|---|
| video_path | string | Yes | Path to the video file |
| language | string | No | Language code (auto-detect if omitted) |
| output_format | string | No | srt or vtt (default: srt) |
| Output | Type | Description |
|---|---|---|
| captions_file | string | Path to generated caption file |
| summary | string | Brief summary of the video content |
| ok | boolean | Success indicator |
Prerequisites
- OPENAI_API_KEY environment variable
- ffmpeg (optional, for audio extraction)
- praisonaiagents package
Step-by-Step Build
1. Create Recipe Directory
Create a directory for the recipe files (TEMPLATE.yaml, recipe.py, test_recipe.py, README.md).
2. Create TEMPLATE.yaml
Create the recipe metadata file:
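A sketch of the metadata file; the field names below are assumptions rather than a confirmed recipe schema, so compare against an existing recipe's TEMPLATE.yaml:

```yaml
# TEMPLATE.yaml -- illustrative only; field names are assumptions,
# not the confirmed recipe schema.
name: video-caption-generator
description: Generate SRT/VTT captions from video files with automatic language detection.
inputs:
  video_path: {type: string, required: true}
  language: {type: string, required: false}
  output_format: {type: string, required: false, default: srt}
outputs:
  captions_file: string
  summary: string
  ok: boolean
```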
3. Create recipe.py
Implement the main recipe logic:
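The snippet below is a minimal sketch of the recipe logic, not the canonical implementation: it assumes a run(inputs) -> dict entry point, uses ffmpeg (when available) to extract audio, the OpenAI Whisper transcription endpoint to produce SRT/VTT text, and a praisonaiagents Agent for the summary. Adapt the entry point to whatever interface the recipe framework actually expects.

```python
"""recipe.py -- illustrative sketch only.

Assumptions (not confirmed by the recipe framework): the entry point is
run(inputs: dict) -> dict, audio extraction uses ffmpeg when present, and
transcription uses the OpenAI Whisper endpoint.
"""
import os
import subprocess
import tempfile

from openai import OpenAI
from praisonaiagents import Agent


def extract_audio(video_path: str) -> str:
    """Extract a mono 16 kHz WAV track with ffmpeg; fall back to the
    original file if ffmpeg is missing or fails."""
    audio_path = tempfile.NamedTemporaryFile(suffix=".wav", delete=False).name
    try:
        subprocess.run(
            ["ffmpeg", "-y", "-i", video_path, "-vn", "-ac", "1",
             "-ar", "16000", audio_path],
            check=True,
            capture_output=True,
        )
        return audio_path
    except (FileNotFoundError, subprocess.CalledProcessError):
        return video_path  # Whisper also accepts common video containers


def run(inputs: dict) -> dict:
    """Hypothetical recipe entry point matching the Input/Output Contract."""
    video_path = inputs["video_path"]
    language = inputs.get("language")            # None -> auto-detect
    output_format = inputs.get("output_format", "srt")
    if output_format not in ("srt", "vtt"):
        return {"captions_file": "", "summary": "", "ok": False}

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    audio_path = extract_audio(video_path)

    # Whisper can emit caption-ready SRT or VTT directly via response_format.
    with open(audio_path, "rb") as audio_file:
        kwargs = {"model": "whisper-1", "file": audio_file,
                  "response_format": output_format}
        if language:
            kwargs["language"] = language
        transcript = client.audio.transcriptions.create(**kwargs)
    captions = (transcript if isinstance(transcript, str)
                else getattr(transcript, "text", str(transcript)))

    captions_file = os.path.splitext(video_path)[0] + f".{output_format}"
    with open(captions_file, "w", encoding="utf-8") as f:
        f.write(captions)

    # Summarise the transcript with a praisonaiagents Agent (quickstart-style usage).
    summarizer = Agent(instructions="Summarise transcripts in 2-3 sentences.")
    summary = summarizer.start(f"Summarise this transcript:\n{captions[:4000]}")

    return {"captions_file": captions_file, "summary": str(summary), "ok": True}
```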
4. Create test_recipe.py
Write tests for the recipe:
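An illustrative pytest module; it targets the sketch above (so recipe.OpenAI and recipe.Agent are the names it stubs), which keeps the tests offline and independent of ffmpeg.

```python
"""test_recipe.py -- illustrative tests against the recipe.py sketch above."""
from unittest.mock import patch

import pytest

import recipe


@patch("recipe.Agent")
@patch("recipe.OpenAI")
def test_run_returns_contract(mock_openai, mock_agent, tmp_path):
    # Stub the network calls so the test runs offline.
    mock_openai.return_value.audio.transcriptions.create.return_value = (
        "1\n00:00:00,000 --> 00:00:01,000\nHello\n"
    )
    mock_agent.return_value.start.return_value = "A short greeting."

    video = tmp_path / "sample.mp4"
    video.write_bytes(b"\x00")  # content is irrelevant once transcription is stubbed

    result = recipe.run({"video_path": str(video)})

    assert result["ok"] is True
    assert result["captions_file"].endswith(".srt")
    assert "greeting" in result["summary"].lower()


def test_missing_video_path():
    # video_path is the only required input, so omitting it should fail fast.
    with pytest.raises(KeyError):
        recipe.run({})
```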
5. Create README.md
Document the recipe:
Inputs
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| video_path | string | Yes | - | Path to video file |
| language | string | No | auto | Language code (en, es, fr, etc.) |
| output_format | string | No | srt | Output format: srt or vtt |
Outputs
| Field | Type | Description |
|---|---|---|
| captions_file | string | Path to generated caption file |
| summary | string | Brief content summary |
| ok | boolean | Success indicator |
Requirements
- OPENAI_API_KEY environment variable
- ffmpeg (optional, for audio extraction)
- praisonaiagents package
Examples
Basic Usage
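A minimal example, assuming the run(inputs) entry point sketched in step 3 (the file name is a placeholder):

```python
from recipe import run

result = run({"video_path": "talk.mp4"})
print(result["captions_file"])   # e.g. talk.srt
print(result["summary"])
```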
Specify Language and Format
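The same hypothetical entry point with an explicit language and WebVTT output:

```python
from recipe import run

result = run({
    "video_path": "entrevista.mp4",
    "language": "es",          # skip auto-detection
    "output_format": "vtt",    # WebVTT instead of SRT
})
```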
Troubleshooting
| Issue | Solution |
|---|---|
| "ffmpeg not found" | Install ffmpeg: brew install ffmpeg or apt install ffmpeg |
| "API key missing" | Set export OPENAI_API_KEY=your_key |
| Poor transcription | Try specifying the language explicitly |
Run Locally
Using CLI
Using Python SDK
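The exact praisonaiagents SDK call for running a recipe is not shown here; as a minimal in-process sketch, you can import the recipe module directly and check the ok flag:

```python
from recipe import run

result = run({"video_path": "webinar.mp4", "output_format": "srt"})
if result["ok"]:
    print(f"Captions written to {result['captions_file']}")
    print(f"Summary: {result['summary']}")
else:
    print("Caption generation failed")
```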
Deploy & Integrate: 6 Integration Models
- Model 1: Embedded SDK
- Model 2: CLI Invocation
- Model 3: Plugin Mode
- Model 4: Local HTTP Sidecar
- Model 5: Remote Managed Runner
- Model 6: Event-Driven
For Model 1 (Embedded SDK):
When to use: Python applications, Jupyter notebooks, direct integration.
Deployment note: runs in-process, lowest latency, requires a Python environment.
Troubleshooting
ffmpeg not found
Symptom: Error message about ffmpeg not being installed.
Solution: Install ffmpeg (brew install ffmpeg or apt install ffmpeg). The recipe will still work without ffmpeg but may have reduced quality.
API key not set
Symptom: Authentication error from OpenAI.
Solution: Set the key in your environment: export OPENAI_API_KEY=your_key
Poor transcription quality
Symptom: Captions contain errors or miss words.
Solutions:
- Specify the language explicitly instead of auto-detect
- Ensure audio quality is good (reduce background noise)
- Try a different model by setting OPENAI_MODEL=gpt-4o
Large file processing timeout
Symptom: Recipe times out on long videos.
Solutions:
- Split video into smaller segments
- Use async/event-driven integration model
- Increase timeout in config
Next Steps
- Podcast Transcription Cleaner - Similar recipe for audio files
- Multilingual Subtitle Translator - Translate your generated captions
- Integration Models - Deep dive into deployment options

