# Multi-Agent Media Pipelines

Create powerful media processing workflows by chaining specialized agents (AudioAgent, VideoAgent, ImageAgent, OCRAgent) together with standard agents. Context passes seamlessly between agents using `{{previous_output}}`.
## Overview
Multi-agent pipelines allow you to:
- Chain different agent types in sequence
- Pass context between agents automatically
- Process media through multiple transformation stages
- Combine AI capabilities (transcription → research → generation)
This example demonstrates a complete pipeline: STT → Research → Image → Video → TTS
```yaml
name: Media Pipeline
description: Complete media pipeline from audio to video
process: sequential

agents:
  # Agent 1: Speech-to-Text
  transcriber:
    agent: AudioAgent
    llm: openai/whisper-1
    role: Audio Transcriber
    goal: Convert audio to text

  # Agent 2: Research (standard Agent with tools)
  researcher:
    role: Research Specialist
    goal: Research the topic
    tools:
      - tavily_search

  # Agent 3: Image Generation
  image_creator:
    agent: ImageAgent
    llm: openai/dall-e-3
    role: Visual Artist
    goal: Create images

  # Agent 4: Video Generation
  video_creator:
    agent: VideoAgent
    llm: openai/sora-2
    role: Video Producer
    goal: Create videos

  # Agent 5: Text-to-Speech (Voiceover)
  narrator:
    agent: AudioAgent
    llm: openai/tts-1-hd
    role: Voice Narrator
    goal: Create voiceovers

steps:
  - agent: transcriber
    action: transcribe
    input: "{{audio_file}}"

  - agent: researcher
    action: "Research based on: {{previous_output}}"

  - agent: image_creator
    action: generate
    prompt: "{{previous_output}}"

  - agent: video_creator
    action: generate
    prompt: "{{previous_output}}"

  - agent: narrator
    action: speech
    text: "{{previous_output}}"
    output: "voiceover.mp3"

variables:
  audio_file: input.mp3
```
## Context Passing

Use `{{previous_output}}` to pass the output from one agent to the next:
```yaml
steps:
  - agent: transcriber
    action: transcribe
    input: "audio.mp3"

  # The transcription text is available as {{previous_output}}
  - agent: researcher
    action: "Research this topic: {{previous_output}}"

  # The research summary is now {{previous_output}}
  - agent: artist
    action: generate
    prompt: "Create an image for: {{previous_output}}"
```
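Conceptually, this kind of placeholder substitution can be done with a small regex helper. The sketch below is illustrative only; `substitute` is a hypothetical function, not part of the PraisonAI API, and the actual parser may resolve templates differently.

```python
import re

def substitute(template: str, context: dict) -> str:
    """Replace {{name}} placeholders with values from the context.

    Unknown placeholders are left intact rather than raising an error.
    """
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: str(context.get(m.group(1), m.group(0))),
        template,
    )

context = {"previous_output": "a short history of jazz"}
print(substitute("Research this topic: {{previous_output}}", context))
# Research this topic: a short history of jazz
```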
## Mixed Agent Types

Combine specialized agents with standard agents:
```yaml
agents:
  # Specialized agent for transcription
  transcriber:
    agent: AudioAgent
    llm: openai/whisper-1
    role: Transcriber
    goal: Transcribe audio

  # Standard agent for analysis
  analyzer:
    role: Content Analyst
    goal: Analyze and summarize content
    instructions: You analyze content and provide insights.

  # Specialized agent for image generation
  visualizer:
    agent: ImageAgent
    llm: openai/dall-e-3
    role: Visualizer
    goal: Create visual representations

steps:
  - agent: transcriber
    action: transcribe
    input: "meeting.mp3"

  - agent: analyzer
    action: "Analyze this transcript and identify key themes: {{previous_output}}"

  - agent: visualizer
    action: generate
    prompt: "Create an infographic showing: {{previous_output}}"
```
## CLI Usage

Run the multi-agent pipeline recipe:
```bash
# Run the complete media pipeline
praisonai recipe run ai-media-pipeline --var audio_file=input.mp3

# With custom output directory
praisonai recipe run ai-media-pipeline --var audio_file=podcast.mp3 --var output_dir=./output
```
## Python API

Create multi-agent pipelines programmatically:
```python
from praisonaiagents.workflows.yaml_parser import YAMLWorkflowParser

yaml_content = """
name: Custom Pipeline
process: sequential

agents:
  transcriber:
    agent: AudioAgent
    llm: openai/whisper-1
    role: Transcriber
    goal: Transcribe audio

  summarizer:
    role: Summarizer
    goal: Summarize content

steps:
  - agent: transcriber
    action: transcribe
    input: "{{audio_file}}"

  - agent: summarizer
    action: "Summarize: {{previous_output}}"

variables:
  audio_file: recording.mp3
"""

parser = YAMLWorkflowParser()
workflow = parser.parse_string(yaml_content)

# Check agent types
for name, agent in parser._agents.items():
    print(f"{name}: {agent.__class__.__name__}")

# Run the workflow
result = workflow.start()
```
## Available Recipes

| Recipe | Description | Agents |
|---|---|---|
| `ai-text-to-speech` | Convert text to speech | AudioAgent |
| `ai-speech-to-text` | Transcribe audio | AudioAgent |
| `ai-generate-image` | Generate images | ImageAgent |
| `ai-generate-video` | Generate videos | VideoAgent |
| `ai-document-ocr` | Extract text from documents | OCRAgent |
| `ai-media-pipeline` | Complete 5-agent pipeline | AudioAgent, Agent, ImageAgent, VideoAgent |
## Best Practices

- **Order matters** - Place agents in a logical sequence (input → processing → output)
- **Use appropriate models** - Match model capabilities to task requirements
- **Handle file outputs** - Ensure output paths are specified for media files
- **Test incrementally** - Test each agent individually before combining
- **Monitor context size** - Large outputs may need summarization between steps
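One way to keep context manageable between steps is to clamp long outputs before they are passed on. This is a minimal sketch of the idea; `clamp_context` is a hypothetical helper and not part of the PraisonAI API.

```python
def clamp_context(text: str, max_chars: int = 4000) -> str:
    """Truncate text to max_chars, marking the omission explicitly."""
    if len(text) <= max_chars:
        return text
    return text[:max_chars] + "\n[... truncated for context size ...]"

# A very long transcript gets cut down before the next step sees it
long_output = "x" * 10_000
short_output = clamp_context(long_output)
print(len(short_output) < len(long_output))  # True
```

In practice a summarizer agent inserted between steps (as in the Mixed Agent Types example) is often a better choice than blind truncation, since it preserves meaning rather than just the first N characters.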
## Error Handling

Add error handling with retries and guardrails:
```yaml
steps:
  - agent: transcriber
    action: transcribe
    input: "{{audio_file}}"
    max_retries: 3

  - agent: researcher
    action: "Research: {{previous_output}}"
    guardrail: validate_research_output
```
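A guardrail such as `validate_research_output` is typically just a function that inspects a step's output and decides whether to accept it. The sketch below assumes a simple `(passed, message)` return shape; the actual signature expected by PraisonAI guardrails may differ.

```python
def validate_research_output(output: str) -> tuple[bool, str]:
    """Reject empty or suspiciously short research summaries."""
    if not output or len(output.strip()) < 50:
        return False, "Research output too short; step should be retried."
    return True, "ok"

passed, message = validate_research_output("")
print(passed)  # False
```

When a guardrail rejects an output, combining it with `max_retries` on the same step gives the agent additional attempts to produce an acceptable result before the workflow fails.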