Multi-Agent Media Pipelines

Create powerful media processing workflows by chaining specialized agents (AudioAgent, VideoAgent, ImageAgent, OCRAgent) together with standard agents. Context passes seamlessly between agents using {{previous_output}}.

Overview

Multi-agent pipelines allow you to:
  • Chain different agent types in sequence
  • Pass context between agents automatically
  • Process media through multiple transformation stages
  • Combine AI capabilities (transcription → research → generation)

Example: 5-Agent Media Pipeline

This example demonstrates a complete pipeline: STT → Research → Image → Video → TTS

name: Media Pipeline
description: Complete media pipeline from audio to video
process: sequential

agents:
  # Agent 1: Speech-to-Text
  transcriber:
    agent: AudioAgent
    llm: openai/whisper-1
    role: Audio Transcriber
    goal: Convert audio to text

  # Agent 2: Research (standard Agent with tools)
  researcher:
    role: Research Specialist
    goal: Research the topic
    tools:
      - tavily_search

  # Agent 3: Image Generation
  image_creator:
    agent: ImageAgent
    llm: openai/dall-e-3
    role: Visual Artist
    goal: Create images

  # Agent 4: Video Generation
  video_creator:
    agent: VideoAgent
    llm: openai/sora-2
    role: Video Producer
    goal: Create videos

  # Agent 5: Text-to-Speech (Voiceover)
  narrator:
    agent: AudioAgent
    llm: openai/tts-1-hd
    role: Voice Narrator
    goal: Create voiceovers

steps:
  - agent: transcriber
    action: transcribe
    input: "{{audio_file}}"

  - agent: researcher
    action: "Research based on: {{previous_output}}"

  - agent: image_creator
    action: generate
    prompt: "{{previous_output}}"

  - agent: video_creator
    action: generate
    prompt: "{{previous_output}}"

  - agent: narrator
    action: speech
    text: "{{previous_output}}"
    output: "voiceover.mp3"

variables:
  audio_file: input.mp3
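
Each step receives the previous step's output as {{previous_output}}: the transcript feeds the research step, each generation step builds on the one before it, and the final narration is saved to voiceover.mp3.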

Context Passing

Use {{previous_output}} to pass the output from one agent to the next:

steps:
  - agent: transcriber
    action: transcribe
    input: "audio.mp3"
  
  # The transcription text is available as {{previous_output}}
  - agent: researcher
    action: "Research this topic: {{previous_output}}"
  
  # The research summary is now {{previous_output}}
  - agent: artist
    action: generate
    prompt: "Create an image for: {{previous_output}}"

Mixed Agent Types

Combine specialized agents with standard agents:

agents:
  # Specialized agent for transcription
  transcriber:
    agent: AudioAgent
    llm: openai/whisper-1
    role: Transcriber
    goal: Transcribe audio

  # Standard agent for analysis
  analyzer:
    role: Content Analyst
    goal: Analyze and summarize content
    instructions: You analyze content and provide insights.

  # Specialized agent for image generation
  visualizer:
    agent: ImageAgent
    llm: openai/dall-e-3
    role: Visualizer
    goal: Create visual representations

steps:
  - agent: transcriber
    action: transcribe
    input: "meeting.mp3"
  
  - agent: analyzer
    action: "Analyze this transcript and identify key themes: {{previous_output}}"
  
  - agent: visualizer
    action: generate
    prompt: "Create an infographic showing: {{previous_output}}"

CLI Usage

Run the multi-agent pipeline recipe:

# Run the complete media pipeline
praisonai recipe run ai-media-pipeline --var audio_file=input.mp3

# With custom output directory
praisonai recipe run ai-media-pipeline --var audio_file=podcast.mp3 --var output_dir=./output
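
Here --var supplies values for the recipe's variables block, such as the audio_file that the first step's input references.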

Python API

Create multi-agent pipelines programmatically:

from praisonaiagents.workflows.yaml_parser import YAMLWorkflowParser

yaml_content = """
name: Custom Pipeline
process: sequential

agents:
  transcriber:
    agent: AudioAgent
    llm: openai/whisper-1
    role: Transcriber
    goal: Transcribe audio
  
  summarizer:
    role: Summarizer
    goal: Summarize content

steps:
  - agent: transcriber
    action: transcribe
    input: "{{audio_file}}"
  
  - agent: summarizer
    action: "Summarize: {{previous_output}}"

variables:
  audio_file: recording.mp3
"""

parser = YAMLWorkflowParser()
workflow = parser.parse_string(yaml_content)

# Check agent types
for name, agent in parser._agents.items():
    print(f"{name}: {agent.__class__.__name__}")

# Run the workflow
result = workflow.start()
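
# Inspect the final output (assuming start() returns the last step's result)
print(result)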

Available Recipes

Recipe               Description                    Agents
ai-text-to-speech    Convert text to speech         AudioAgent
ai-speech-to-text    Transcribe audio               AudioAgent
ai-generate-image    Generate images                ImageAgent
ai-generate-video    Generate videos                VideoAgent
ai-document-ocr      Extract text from documents    OCRAgent
ai-media-pipeline    Complete 5-agent pipeline      AudioAgent, Agent, ImageAgent, VideoAgent

Best Practices

  1. Order matters - Place agents in logical sequence (input → processing → output)
  2. Use appropriate models - Match model capabilities to task requirements
  3. Handle file outputs - Ensure output paths are specified for media files
  4. Test incrementally - Test each agent individually before combining
  5. Monitor context size - Large outputs may need summarization between steps (see the sketch after this list)
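
For point 5, one pragmatic option is to cap or summarize what flows into the next step. The helper below is an illustrative sketch, not a built-in PraisonAI API; the threshold is arbitrary.

# Illustrative guard against oversized context between steps (not a built-in API)
MAX_CONTEXT_CHARS = 8000  # arbitrary; tune to the downstream model's context window

def cap_context(previous_output: str, limit: int = MAX_CONTEXT_CHARS) -> str:
    """Truncate a step's output before it is passed on as {{previous_output}}."""
    if len(previous_output) <= limit:
        return previous_output
    # In practice, calling a summarizer agent here beats blunt truncation
    return previous_output[:limit] + "\n[truncated]"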

Error Handling

Add error handling with retries and guardrails:

steps:
  - agent: transcriber
    action: transcribe
    input: "{{audio_file}}"
    max_retries: 3
  
  - agent: researcher
    action: "Research: {{previous_output}}"
    guardrail: validate_research_output
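
The guardrail: key above references a validator by name. The exact contract is defined by PraisonAI; as an assumption for illustration, the sketch below treats a guardrail as a callable that inspects a step's output and reports whether it passed.

# Hypothetical shape of the referenced validator (the real guardrail contract may differ)
def validate_research_output(output: str) -> tuple[bool, str]:
    """Return (ok, reason); reject empty or suspiciously short research results."""
    if not output or len(output.strip()) < 50:
        return False, "research output is empty or too short"
    return True, "ok"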