# Multi-Agent Media Pipelines
Create powerful media processing workflows by chaining specialized agents (AudioAgent, VideoAgent, ImageAgent, OCRAgent) together with standard agents. Context passes seamlessly between agents using `{{previous_output}}`.
## Overview
Multi-agent pipelines allow you to:

- Chain different agent types in sequence
- Pass context between agents automatically
- Process media through multiple transformation stages
- Combine AI capabilities (transcription → research → generation)
## Example: 5-Agent Media Pipeline
This example demonstrates a complete pipeline: STT → Research → Image → Video → TTS.
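The recipe file itself is not reproduced here, so the following is a minimal sketch of what such a recipe could look like. The YAML keys (`name`, `steps`, `agent`, `task`, `input`, `output`) are illustrative assumptions, not the framework's confirmed schema; only the agent names, the `{{previous_output}}` placeholder, and the step order come from this page.

```yaml
# Hypothetical recipe sketch: a 5-agent STT → Research → Image → Video → TTS pipeline.
# Keys and structure are assumptions for illustration; consult the YAML Workflows
# documentation for the actual schema.
name: ai-media-pipeline
steps:
  - agent: AudioAgent            # 1. STT: transcribe the input audio
    task: "Transcribe the audio file"
    input: ./input/briefing.mp3
  - agent: Agent                 # 2. Research: expand on the transcript
    task: "Research the topics mentioned in: {{previous_output}}"
  - agent: ImageAgent            # 3. Image: illustrate the research summary
    task: "Generate an illustration for: {{previous_output}}"
    output: ./output/illustration.png
  - agent: VideoAgent            # 4. Video: turn the summary into a short clip
    task: "Create a short video based on: {{previous_output}}"
    output: ./output/clip.mp4
  - agent: AudioAgent            # 5. TTS: narrate the final result
    task: "Convert this narration to speech: {{previous_output}}"
    output: ./output/narration.mp3
```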
## Context Passing

Use `{{previous_output}}` to pass the output from one agent to the next:
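A minimal two-step sketch, again with an assumed YAML schema; only the `{{previous_output}}` placeholder is confirmed by this page:

```yaml
steps:
  - agent: AudioAgent
    task: "Transcribe the audio file"                        # produces a transcript
  - agent: Agent
    task: "Summarize this transcript: {{previous_output}}"   # receives the transcript as context
```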
## Mixed Agent Types
Combine specialized agents with standard agents:
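A hypothetical sketch pairing a specialized OCRAgent with a standard Agent (YAML keys assumed, as above):

```yaml
steps:
  - agent: OCRAgent               # specialized agent: extract text from a scanned document
    task: "Extract all text from the document"
    input: ./input/contract.pdf
  - agent: Agent                  # standard agent: reason over the extracted text
    task: "Summarize the key obligations in: {{previous_output}}"
```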
## CLI Usage

Run the multi-agent pipeline recipe:
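The executable name and flags below are placeholders rather than confirmed CLI syntax; substitute your framework's actual command. Only the recipe name `ai-media-pipeline` comes from this page:

```bash
# Placeholder invocation: <cli> and the flags are assumptions, not documented syntax.
<cli> run ai-media-pipeline --input ./input/briefing.mp3 --output-dir ./output
```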
## Python API

Create multi-agent pipelines programmatically:
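The agent class names below come from this page, but the import path, constructor arguments, and the `Pipeline` runner are assumptions; treat this as a sketch of the programmatic pattern, not the library's confirmed API:

```python
# Sketch only: the module name, constructor arguments, and Pipeline runner are
# hypothetical; the agent class names are the ones documented on this page.
from my_framework import Agent, AudioAgent, ImageAgent, VideoAgent, Pipeline

stt = AudioAgent(task="Transcribe the audio file")
researcher = Agent(task="Research the topics mentioned in: {{previous_output}}")
illustrator = ImageAgent(task="Generate an illustration for: {{previous_output}}")
editor = VideoAgent(task="Create a short video based on: {{previous_output}}")
narrator = AudioAgent(task="Convert this narration to speech: {{previous_output}}")

# Agents run in order; each step's output fills {{previous_output}} for the next step.
pipeline = Pipeline(agents=[stt, researcher, illustrator, editor, narrator])
result = pipeline.run(input="./input/briefing.mp3")
print(result)
```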
## Available Recipes

| Recipe | Description | Agents |
|---|---|---|
| `ai-text-to-speech` | Convert text to speech | AudioAgent |
| `ai-speech-to-text` | Transcribe audio | AudioAgent |
| `ai-generate-image` | Generate images | ImageAgent |
| `ai-generate-video` | Generate videos | VideoAgent |
| `ai-document-ocr` | Extract text from documents | OCRAgent |
| `ai-media-pipeline` | Complete 5-agent pipeline | AudioAgent, Agent, ImageAgent, VideoAgent |
## Best Practices
- Order matters - Place agents in logical sequence (input → processing → output)
- Use appropriate models - Match model capabilities to task requirements
- Handle file outputs - Ensure output paths are specified for media files
- Test incrementally - Test each agent individually before combining
- Monitor context size - Large outputs may need summarization between steps
## Error Handling
Add error handling with guardrails:
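The guardrail syntax is not shown on this page; the YAML sketch below illustrates one plausible shape, with the `guardrails` keys and values assumed for illustration only:

```yaml
steps:
  - agent: VideoAgent
    task: "Create a short video based on: {{previous_output}}"
    output: ./output/clip.mp4
    guardrails:                     # hypothetical keys; check the framework's guardrail docs
      max_retries: 2                # retry the step if generation fails
      validate: "output file exists and is non-empty"
      on_failure: skip              # continue the pipeline instead of aborting
```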
## Related

- Specialized Agents - Individual agent type documentation
- Workflow Patterns - General workflow patterns
- YAML Workflows - YAML workflow syntax
- Context Passing - How context works between agents

