Specialized Agents
PraisonAI supports specialized agent types that provide domain-specific capabilities for media processing, document handling, and research. These agents can be used in YAML workflows through the simple `agent:` field.
Supported Agent Types
| Agent Type | Purpose | Key Methods |
|---|---|---|
| AudioAgent | Text-to-Speech (TTS) and Speech-to-Text (STT) | speech(), transcribe() |
| VideoAgent | Video generation | generate() |
| ImageAgent | Image generation, editing, and variations | generate(), edit() |
| OCRAgent | Text extraction from documents and images | extract() |
| DeepResearchAgent | Automated research with citations | research() |
Quick Start
Text-to-Speech (TTS)
Speech-to-Text (STT)
Image Generation
Video Generation
Document OCR
Python API
You can also use specialized agents directly in Python.
Supported Providers
AudioAgent (TTS)
- openai/tts-1 - OpenAI TTS
- openai/tts-1-hd - OpenAI TTS HD
- elevenlabs/eleven_multilingual_v2 - ElevenLabs
- gemini/gemini-2.5-flash-preview-tts - Google Gemini
AudioAgent (STT)
- openai/whisper-1 - OpenAI Whisper
- groq/whisper-large-v3 - Groq Whisper
- deepgram/nova-2 - Deepgram
ImageAgent
- openai/dall-e-3 - DALL-E 3
- openai/dall-e-2 - DALL-E 2
- vertex_ai/imagen-3.0-generate-001 - Google Imagen
VideoAgent
- openai/sora-2 - OpenAI Sora
- gemini/veo-3.0-generate-preview - Google Veo
- runwayml/gen4_turbo - RunwayML
OCRAgent
- mistral/mistral-ocr-latest - Mistral OCR
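Every identifier in the lists above follows a `provider/model` pattern. The sketch below is a minimal illustration that assumes only that naming pattern: it copies the lists into a dictionary and shows how an identifier could be split and matched to an agent entry. `SUPPORTED_MODELS`, `split_model_id`, and `agents_for_model` are hypothetical names, not part of the PraisonAI API, and PraisonAI may accept models that are not listed here.

```python
# Hypothetical helpers, NOT part of PraisonAI. They only illustrate the
# "provider/model" naming used in the Supported Providers lists above.
SUPPORTED_MODELS = {
    "AudioAgent (TTS)": [
        "openai/tts-1",
        "openai/tts-1-hd",
        "elevenlabs/eleven_multilingual_v2",
        "gemini/gemini-2.5-flash-preview-tts",
    ],
    "AudioAgent (STT)": [
        "openai/whisper-1",
        "groq/whisper-large-v3",
        "deepgram/nova-2",
    ],
    "ImageAgent": [
        "openai/dall-e-3",
        "openai/dall-e-2",
        "vertex_ai/imagen-3.0-generate-001",
    ],
    "VideoAgent": [
        "openai/sora-2",
        "gemini/veo-3.0-generate-preview",
        "runwayml/gen4_turbo",
    ],
    "OCRAgent": ["mistral/mistral-ocr-latest"],
}


def split_model_id(model_id: str) -> tuple[str, str]:
    """Split a 'provider/model' identifier at the first slash."""
    provider, sep, model = model_id.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {model_id!r}")
    return provider, model


def agents_for_model(model_id: str) -> list[str]:
    """Return the agent entries whose provider list contains model_id."""
    split_model_id(model_id)  # validate the format before looking it up
    return [agent for agent, models in SUPPORTED_MODELS.items() if model_id in models]


if __name__ == "__main__":
    print(split_model_id("vertex_ai/imagen-3.0-generate-001"))
    # ('vertex_ai', 'imagen-3.0-generate-001')
    print(agents_for_model("openai/whisper-1"))
    # ['AudioAgent (STT)']
```

Splitting at the first slash with `str.partition` keeps everything after it, including dots and underscores, as the model name.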
CLI Usage
Use specialized agents via recipes.
Best Practices
- Use appropriate models - Choose the right model for your use case (e.g., tts-1-hd for higher-quality audio)
- Handle file outputs - Specialized agents often produce files; make sure output paths are set correctly
- Chain with standard agents - Combine specialized agents with the standard Agent for complex workflows
- Use context passing - Use {{previous_output}} to pass results between agents
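The file-output and context-passing practices can be illustrated without any PraisonAI dependency. The sketch below assumes `{{previous_output}}` behaves as a plain text placeholder: a first step writes its result to an explicit output path, and that path is substituted into the next step's instruction. `prepare_output_path`, `render_step`, and `run_pipeline` are hypothetical helpers, and step 1 writes a text file as a stand-in for a real specialized agent such as AudioAgent; PraisonAI's own substitution logic may differ.

```python
import tempfile
from pathlib import Path

# Hypothetical stand-ins, NOT part of PraisonAI. They only illustrate
# writing outputs to an explicit path and passing one step's result to
# the next through a {{previous_output}} placeholder.
PLACEHOLDER = "{{previous_output}}"


def prepare_output_path(directory: Path, filename: str) -> Path:
    """Create the output directory if needed and return the full file path."""
    directory.mkdir(parents=True, exist_ok=True)
    return directory / filename


def render_step(template: str, previous_output: str) -> str:
    """Replace every {{previous_output}} placeholder with the prior result."""
    return template.replace(PLACEHOLDER, previous_output)


def run_pipeline(output_dir: Path) -> str:
    # Step 1: stand-in for a specialized agent that produces a file
    # (e.g. TTS audio); it writes plain text so the sketch runs anywhere.
    out_path = prepare_output_path(output_dir, "speech.txt")
    out_path.write_text("Hello from step one")

    # Step 2: a standard agent's instruction, rendered with step 1's result.
    template = "Summarize the file at {{previous_output}}."
    return render_step(template, str(out_path))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_pipeline(Path(tmp) / "outputs"))
```

Using `str.replace` rather than a regex substitution avoids surprises when the previous output contains backslashes, as Windows paths do.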
Related
- Multi-Agent Pipelines - Chain specialized agents together
- Audio Agents - Detailed audio agent documentation
- Video Agents - Detailed video agent documentation
- Image Agents - Detailed image agent documentation
- OCR - Detailed OCR documentation

