
AudioAgent

Defined in the audio_agent module.
AudioAgent is a specialized agent for audio processing with AI models. It provides:
  • Text-to-Speech (TTS): Convert text to spoken audio
  • Speech-to-Text (STT): Transcribe audio to text
TTS Providers:
  • OpenAI: openai/tts-1, openai/tts-1-hd
  • Azure: azure/tts-1
  • Gemini: gemini/gemini-2.5-flash-preview-tts
  • Vertex AI: vertex_ai/gemini-2.5-flash-preview-tts
  • ElevenLabs: elevenlabs/eleven_multilingual_v2
  • MiniMax: minimax/speech-01
STT Providers:
  • OpenAI: openai/whisper-1
  • Azure: azure/whisper
  • Groq: groq/whisper-large-v3
  • Deepgram: deepgram/nova-2
  • Gemini: gemini/gemini-2.0-flash
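
Every identifier in the two lists above follows a provider/model pattern. The sketch below shows one way to split such an identifier and to check whether it appears in the TTS or the STT list. The helper names split_model_id and task_for are hypothetical and are not part of praisonaiagents; the sets only copy the identifiers listed above and say nothing about how AudioAgent itself resolves a model.

```python
# Model identifiers copied verbatim from the provider lists above.
TTS_MODELS = {
    "openai/tts-1",
    "openai/tts-1-hd",
    "azure/tts-1",
    "gemini/gemini-2.5-flash-preview-tts",
    "vertex_ai/gemini-2.5-flash-preview-tts",
    "elevenlabs/eleven_multilingual_v2",
    "minimax/speech-01",
}
STT_MODELS = {
    "openai/whisper-1",
    "azure/whisper",
    "groq/whisper-large-v3",
    "deepgram/nova-2",
    "gemini/gemini-2.0-flash",
}


def split_model_id(model_id: str) -> tuple[str, str]:
    """Split "provider/model" into (provider, model) at the first slash.

    Hypothetical helper for illustration; not a praisonaiagents function.
    """
    provider, sep, model = model_id.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {model_id!r}")
    return provider, model


def task_for(model_id: str) -> str:
    """Return "tts" or "stt" for an identifier from the lists above.

    Hypothetical helper for illustration; not a praisonaiagents function.
    """
    if model_id in TTS_MODELS:
        return "tts"
    if model_id in STT_MODELS:
        return "stt"
    raise ValueError(f"{model_id!r} is not in the documented lists")
```

A lookup like this can catch a mismatch early, for example passing a Whisper identifier to code that expects a TTS model.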

Constructor

  • name (Optional): No description available.
  • instructions (Optional): No description available.
  • llm (Optional): No description available.
  • model (Optional): No description available.
  • base_url (Optional): No description available.
  • api_key (Optional): No description available.
  • audio (Optional): No description available.
  • verbose (Union, default: True): No description available.

Methods

Usage

from praisonaiagents import AudioAgent

# Text-to-Speech
agent = AudioAgent(llm="openai/tts-1")
agent.speech("Hello world!", output="hello.mp3")

# Speech-to-Text
agent = AudioAgent(llm="openai/whisper-1")
text = agent.transcribe("audio.mp3")
print(text)
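
The two calls above can be chained into a round trip: synthesize text to a file, then transcribe that file. The following is a minimal sketch that relies only on the speech(text, output=...) and transcribe(path) calls shown in the usage example. The function name round_trip and the default file name are hypothetical, and the return type of transcribe is assumed to be whatever the usage example prints.

```python
def round_trip(tts_agent, stt_agent, text, path="round_trip.mp3"):
    """Synthesize `text` to `path`, then transcribe that same file.

    Hypothetical helper; both agents are duck-typed and only need the
    speech() and transcribe() methods shown in the usage example.
    """
    # Write the spoken audio to disk with the TTS agent.
    tts_agent.speech(text, output=path)
    # Read the same file back with the STT agent and return its result.
    return stt_agent.transcribe(path)


# Assumed usage with real agents (requires praisonaiagents and provider
# credentials, so it is left as a comment):
#
#   from praisonaiagents import AudioAgent
#   heard = round_trip(
#       AudioAgent(llm="openai/tts-1"),
#       AudioAgent(llm="openai/whisper-1"),
#       "Hello world!",
#   )
#   print(heard)
```

Keeping the agents as parameters lets the helper be exercised with stand-in objects, without network access or API keys.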