Multi-Agent Media Pipelines

Create powerful media processing workflows by chaining specialized agents (AudioAgent, VideoAgent, ImageAgent, OCRAgent) together with standard agents. Context passes seamlessly between agents using {{previous_output}}.

Overview

Multi-agent pipelines allow you to:
  • Chain different agent types in sequence
  • Pass context between agents automatically
  • Process media through multiple transformation stages
  • Combine AI capabilities (transcription → research → generation)

Example: 5-Agent Media Pipeline

This example demonstrates a complete pipeline: STT → Research → Image → Video → TTS

name: Media Pipeline
description: Complete media pipeline from audio to video
process: sequential

agents:
  # Agent 1: Speech-to-Text
  transcriber:
    agent: AudioAgent
    llm: openai/whisper-1
    role: Audio Transcriber
    goal: Convert audio to text

  # Agent 2: Research (standard Agent with tools)
  researcher:
    role: Research Specialist
    goal: Research the topic
    tools:
      - tavily_search

  # Agent 3: Image Generation
  image_creator:
    agent: ImageAgent
    llm: openai/dall-e-3
    role: Visual Artist
    goal: Create images

  # Agent 4: Video Generation
  video_creator:
    agent: VideoAgent
    llm: openai/sora-2
    role: Video Producer
    goal: Create videos

  # Agent 5: Text-to-Speech (Voiceover)
  narrator:
    agent: AudioAgent
    llm: openai/tts-1-hd
    role: Voice Narrator
    goal: Create voiceovers

steps:
  - agent: transcriber
    action: transcribe
    input: "{{audio_file}}"

  - agent: researcher
    action: "Research based on: {{previous_output}}"

  - agent: image_creator
    action: generate
    prompt: "{{previous_output}}"

  - agent: video_creator
    action: generate
    prompt: "{{previous_output}}"

  - agent: narrator
    action: speech
    text: "{{previous_output}}"
    output: "voiceover.mp3"

variables:
  audio_file: input.mp3
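
Each step receives the previous step's output as {{previous_output}}: the transcript feeds the research step, each generation step builds on the one before it, and the final narration is saved to voiceover.mp3.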

Context Passing

Use {{previous_output}} to pass the output from one agent to the next:

steps:
  - agent: transcriber
    action: transcribe
    input: "audio.mp3"
  
  # The transcription text is available as {{previous_output}}
  - agent: researcher
    action: "Research this topic: {{previous_output}}"
  
  # The research summary is now {{previous_output}}
  - agent: artist
    action: generate
    prompt: "Create an image for: {{previous_output}}"

Mixed Agent Types

Combine specialized agents with standard agents:

agents:
  # Specialized agent for transcription
  transcriber:
    agent: AudioAgent
    llm: openai/whisper-1
    role: Transcriber
    goal: Transcribe audio

  # Standard agent for analysis
  analyzer:
    role: Content Analyst
    goal: Analyze and summarize content
    instructions: You analyze content and provide insights.

  # Specialized agent for image generation
  visualizer:
    agent: ImageAgent
    llm: openai/dall-e-3
    role: Visualizer
    goal: Create visual representations

steps:
  - agent: transcriber
    action: transcribe
    input: "meeting.mp3"
  
  - agent: analyzer
    action: "Analyze this transcript and identify key themes: {{previous_output}}"
  
  - agent: visualizer
    action: generate
    prompt: "Create an infographic showing: {{previous_output}}"

CLI Usage

Run the multi-agent pipeline recipe:

# Run the complete media pipeline
praisonai recipe run ai-media-pipeline --var audio_file=input.mp3

# With custom output directory
praisonai recipe run ai-media-pipeline --var audio_file=podcast.mp3 --var output_dir=./output
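
Here --var supplies values for the recipe's variables block, such as the audio_file that the first step's input references.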

Python API

Create multi-agent pipelines programmatically:

from praisonaiagents.workflows.yaml_parser import YAMLWorkflowParser

yaml_content = """
name: Custom Pipeline
process: sequential

agents:
  transcriber:
    agent: AudioAgent
    llm: openai/whisper-1
    role: Transcriber
    goal: Transcribe audio
  
  summarizer:
    role: Summarizer
    goal: Summarize content

steps:
  - agent: transcriber
    action: transcribe
    input: "{{audio_file}}"
  
  - agent: summarizer
    action: "Summarize: {{previous_output}}"

variables:
  audio_file: recording.mp3
"""

parser = YAMLWorkflowParser()
workflow = parser.parse_string(yaml_content)

# Check agent types
for name, agent in parser._agents.items():
    print(f"{name}: {agent.__class__.__name__}")

# Run the workflow
result = workflow.start()
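
# Inspect the final output (assuming start() returns the last step's result)
print(result)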

Available Recipes

Recipe               Description                    Agents
ai-text-to-speech    Convert text to speech         AudioAgent
ai-speech-to-text    Transcribe audio               AudioAgent
ai-generate-image    Generate images                ImageAgent
ai-generate-video    Generate videos                VideoAgent
ai-document-ocr      Extract text from documents    OCRAgent
ai-media-pipeline    Complete 5-agent pipeline      AudioAgent, Agent, ImageAgent, VideoAgent

Best Practices

  1. Order matters - Place agents in logical sequence (input → processing → output)
  2. Use appropriate models - Match model capabilities to task requirements
  3. Handle file outputs - Ensure output paths are specified for media files
  4. Test incrementally - Test each agent individually before combining
  5. Monitor context size - Large outputs may need summarization between steps (see the sketch after this list)
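
For point 5, one pragmatic option is to cap or summarize what flows into the next step. The helper below is an illustrative sketch, not a built-in PraisonAI API; the threshold is arbitrary.

# Illustrative guard against oversized context between steps (not a built-in API)
MAX_CONTEXT_CHARS = 8000  # arbitrary; tune to the downstream model's context window

def cap_context(previous_output: str, limit: int = MAX_CONTEXT_CHARS) -> str:
    """Truncate a step's output before it is passed on as {{previous_output}}."""
    if len(previous_output) <= limit:
        return previous_output
    # In practice, calling a summarizer agent here beats blunt truncation
    return previous_output[:limit] + "\n[truncated]"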

Error Handling

Add error handling with retries and guardrails:

steps:
  - agent: transcriber
    action: transcribe
    input: "{{audio_file}}"
    max_retries: 3
  
  - agent: researcher
    action: "Research: {{previous_output}}"
    guardrail: validate_research_output
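
The guardrail: key above references a validator by name. The exact contract is defined by PraisonAI; as an assumption for illustration, the sketch below treats a guardrail as a callable that inspects a step's output and reports whether it passed.

# Hypothetical shape of the referenced validator (the real guardrail contract may differ)
def validate_research_output(output: str) -> tuple[bool, str]:
    """Return (ok, reason); reject empty or suspiciously short research results."""
    if not output or len(output.strip()) < 50:
        return False, "research output is empty or too short"
    return True, "ok"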