Quick Start
- Bot CLI
- Python
TTS Tool
Convert text to speech and get an audio file.
Usage
Options
| Parameter | Type | Default | Description |
|---|---|---|---|
| text | str | required | Text to convert |
| voice | str | "alloy" | Voice: alloy, echo, fable, onyx, nova, shimmer |
| model | str | "openai/tts-1" | TTS model |
| output_format | str | "mp3" | Format: mp3, opus, aac, flac, wav |
| output_dir | str | temp dir | Directory to save audio |
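
The defaults and allowed values in the options table can be checked before any audio request is made. The following is a minimal sketch, not the tool's actual implementation: `TTSOptions` is a hypothetical container introduced here for illustration, and restricting the voice check to `openai/` models is an assumption, since other providers may accept different voice names.

```python
from dataclasses import dataclass
from typing import Optional

# Values copied from the options table above.
OPENAI_VOICES = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")
OUTPUT_FORMATS = ("mp3", "opus", "aac", "flac", "wav")


@dataclass
class TTSOptions:
    """Hypothetical container mirroring the TTS options table."""

    text: str
    voice: str = "alloy"
    model: str = "openai/tts-1"
    output_format: str = "mp3"
    output_dir: Optional[str] = None  # None stands in for "temp dir"

    def __post_init__(self) -> None:
        if not self.text:
            raise ValueError("text is required")
        if self.output_format not in OUTPUT_FORMATS:
            raise ValueError(f"unsupported output_format: {self.output_format}")
        # Assumption: the listed voices only apply to OpenAI models.
        if self.model.startswith("openai/") and self.voice not in OPENAI_VOICES:
            raise ValueError(f"unsupported voice for OpenAI TTS: {self.voice}")


options = TTSOptions(text="Hello from the TTS tool", voice="nova")
print(options)
```

Validating early like this surfaces a mistyped voice or format as a clear `ValueError` rather than as a provider-side failure.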
STT Tool
Transcribe audio files to text.
Usage
Options
| Parameter | Type | Default | Description |
|---|---|---|---|
| audio_path | str | required | Path to audio file |
| language | str | auto | Language code (en, es, fr, etc.) |
| model | str | "openai/whisper-1" | STT model |
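
The STT options can be prepared and checked in the same way. This is a sketch under stated assumptions: `prepare_stt_request` is a hypothetical helper, not part of the tool, and treating `"auto"` as "send no language hint" is a guess at how automatic detection is requested.

```python
import os
from typing import Dict, Optional


def prepare_stt_request(
    audio_path: str,
    language: str = "auto",
    model: str = "openai/whisper-1",
) -> Dict[str, Optional[str]]:
    """Hypothetical helper: validate STT options before transcription."""
    if not os.path.isfile(audio_path):
        raise FileNotFoundError(f"Audio file not found: {audio_path}")
    # Assumption: "auto" means no explicit language code is passed on.
    language_code = None if language == "auto" else language.lower()
    return {"audio_path": audio_path, "language": language_code, "model": model}
```

Checking the path up front keeps a missing file from turning into a less obvious error further down the call chain.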
Bot CLI Options
Enable audio tools when starting bots:
| Option | Description |
|---|---|
| --tts | Enable TTS tool |
| --tts-voice VOICE | Voice (alloy, echo, fable, onyx, nova, shimmer) |
| --tts-model MODEL | TTS model (default: openai/tts-1) |
| --auto-tts | Auto-convert all responses to speech |
| --stt | Enable STT tool |
| --stt-model MODEL | STT model (default: openai/whisper-1) |
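
When a bot is launched from a script, the audio flags above can be assembled programmatically. The sketch below builds only the flag list from the table; `build_audio_flags` is a hypothetical helper, and the bot command that these flags are appended to is deliberately left out because it is not shown in this section.

```python
from typing import List, Optional


def build_audio_flags(
    tts: bool = False,
    tts_voice: Optional[str] = None,
    tts_model: Optional[str] = None,
    auto_tts: bool = False,
    stt: bool = False,
    stt_model: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: turn audio settings into CLI flags."""
    flags: List[str] = []
    if tts:
        flags.append("--tts")
    if tts_voice:
        flags += ["--tts-voice", tts_voice]
    if tts_model:
        flags += ["--tts-model", tts_model]
    if auto_tts:
        flags.append("--auto-tts")
    if stt:
        flags.append("--stt")
    if stt_model:
        flags += ["--stt-model", stt_model]
    return flags


print(build_audio_flags(tts=True, tts_voice="nova", stt=True))
```

Returning a list rather than a single string keeps the flags safe to pass to `subprocess.run` without shell quoting.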
Examples
Supported Providers
Audio tools use the core AudioAgent, which supports multiple providers via LiteLLM:
TTS Providers
| Provider | Model | Notes |
|---|---|---|
| OpenAI | openai/tts-1, openai/tts-1-hd | Default, high quality |
| Azure | azure/tts-1 | Enterprise |
| ElevenLabs | elevenlabs/eleven_multilingual_v2 | Premium voices |
| Gemini | gemini/gemini-2.5-flash-preview-tts | |
STT Providers
| Provider | Model | Notes |
|---|---|---|
| OpenAI | openai/whisper-1 | Default, accurate |
| Azure | azure/whisper | Enterprise |
| Groq | groq/whisper-large-v3 | Fast |
| Deepgram | deepgram/nova-2 | Real-time |
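
Every model in the provider tables is written as a `provider/model` identifier, which is how a single `model` option can select among providers. The sketch below shows one way to split such an identifier; `split_model_id` is a hypothetical helper for illustration and does not reflect how AudioAgent or LiteLLM parse the string internally.

```python
from typing import Tuple


def split_model_id(model_id: str) -> Tuple[str, str]:
    """Hypothetical helper: split 'provider/model' at the first slash."""
    provider, separator, name = model_id.partition("/")
    if not separator or not provider or not name:
        raise ValueError(f"expected 'provider/model', got: {model_id!r}")
    return provider, name


for model_id in ("openai/tts-1-hd", "groq/whisper-large-v3", "deepgram/nova-2"):
    provider, name = split_model_id(model_id)
    print(f"{provider}: {name}")
```

Splitting at the first slash only means a model name that itself contains a slash stays intact.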
Voice Options
Available voices for OpenAI TTS:
| Voice | Description |
|---|---|
| alloy | Neutral, balanced (default) |
| echo | Warm, conversational |
| fable | Expressive, storytelling |
| onyx | Deep, authoritative |
| nova | Friendly, upbeat |
| shimmer | Clear, professional |
Architecture
Audio tools are in the wrapper layer (praisonai), not the core SDK. They wrap the core AudioAgent for easy use with agents and bots.
Related
- Bot CLI: Full bot CLI reference
- AudioAgent: Core AudioAgent class

