Audio - PraisonAI

Agents can speak and listen: they convert text to speech and transcribe audio to text.

Quick Start

Text to Speech

import { Agent } from 'praisonai';

const agent = new Agent({
  instructions: 'You are a helpful assistant',
  audio: true
});

// Agent response as audio
const audio = await agent.speak('Hello! How can I help you today?');
// Returns audio buffer

Speech to Text

// Transcribe audio file
const text = await agent.transcribe('./recording.mp3');
console.log(text);
// "This is what was said in the recording..."

Configuration Levels

import { Agent, AudioAgent } from 'praisonai';

// Level 1: Bool - Enable with defaults
const basicAgent = new Agent({
  audio: true
});

// Level 2: String - Specify voice
const voiceAgent = new Agent({
  audio: 'alloy'  // OpenAI voice name
});

// Level 3: Dict - Full options
const customAgent = new Agent({
  audio: {
    voice: 'nova',
    model: 'tts-1-hd',
    speed: 1.0,
    format: 'mp3'
  }
});

// Level 4: Instance - AudioAgent
const audio = new AudioAgent({
  provider: 'elevenlabs',
  voice: 'rachel'
});
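
One way to read the first three levels is as shorthands that resolve to the same options object. The sketch below illustrates that idea only; it is not PraisonAI's implementation. `normalizeAudio`, `AudioShorthandOptions`, and the default values are hypothetical names and assumptions introduced for this example.

```typescript
// Hypothetical sketch: resolving bool / string / object shorthands to one
// options object. Names and defaults are assumptions, not PraisonAI internals.
interface AudioShorthandOptions {
  voice?: string;
  model?: string;
  speed?: number;
  format?: string;
}

type AudioSetting = boolean | string | AudioShorthandOptions;

// Assumed defaults, chosen for illustration only.
const ASSUMED_DEFAULTS: Required<AudioShorthandOptions> = {
  voice: 'alloy',
  model: 'tts-1',
  speed: 1.0,
  format: 'mp3',
};

function normalizeAudio(
  setting: AudioSetting
): Required<AudioShorthandOptions> | null {
  if (setting === false) return null;                   // audio disabled
  if (setting === true) return { ...ASSUMED_DEFAULTS }; // Level 1: defaults
  if (typeof setting === 'string') {
    return { ...ASSUMED_DEFAULTS, voice: setting };     // Level 2: voice only
  }
  return { ...ASSUMED_DEFAULTS, ...setting };           // Level 3: explicit options win
}
```

Under these assumptions, `normalizeAudio('nova')` and `normalizeAudio({ voice: 'nova' })` resolve to the same object, which is why the string form is a convenient shortcut when only the voice changes.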

Audio Options

Option	Description
`voice`	Voice name (alloy, echo, nova, etc.)
`model`	TTS model (tts-1, tts-1-hd)
`speed`	Playback speed (0.25 to 4.0)
`format`	Output format (mp3, wav, opus)
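
The ranges in the table can be checked before an agent is constructed. The following is a minimal sketch based only on the values listed above; `validateAudioOptions` and `AudioOptionsInput` are hypothetical helpers, not part of the PraisonAI API, and `voice` is left unchecked because the table's list of voices is open-ended.

```typescript
// Illustrative validator built from the options table; not a PraisonAI API.
interface AudioOptionsInput {
  voice?: string;
  model?: string;
  speed?: number;
  format?: string;
}

const LISTED_MODELS = ['tts-1', 'tts-1-hd'];
const LISTED_FORMATS = ['mp3', 'wav', 'opus'];

// Returns a list of human-readable problems; an empty list means the
// options are consistent with the table.
function validateAudioOptions(opts: AudioOptionsInput): string[] {
  const errors: string[] = [];
  if (opts.model !== undefined && LISTED_MODELS.indexOf(opts.model) === -1) {
    errors.push(`model must be one of: ${LISTED_MODELS.join(', ')}`);
  }
  if (opts.format !== undefined && LISTED_FORMATS.indexOf(opts.format) === -1) {
    errors.push(`format must be one of: ${LISTED_FORMATS.join(', ')}`);
  }
  // The negated range check also rejects NaN.
  if (opts.speed !== undefined && !(opts.speed >= 0.25 && opts.speed <= 4.0)) {
    errors.push('speed must be between 0.25 and 4.0');
  }
  return errors;
}
```

Returning a list instead of throwing on the first problem lets a caller report every invalid option at once.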

API Reference

AudioConfig

Complete configuration options

Best Practices

Use HD for quality

The `tts-1-hd` model sounds more natural than `tts-1` but costs more.

Match voice to content

Choose voices that match your content’s tone and audience.

Handle long text

Break long text into chunks for better audio quality.
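
Chunking on sentence boundaries keeps each request short without cutting words or phrases in half. The helper below is a minimal sketch, not something shipped with PraisonAI: `chunkText` is a hypothetical name, and the 4096-character default is an assumption that should be adjusted to whatever limit your TTS provider documents.

```typescript
// Hypothetical helper: pack whole sentences into chunks of at most maxChars.
function chunkText(text: string, maxChars: number = 4096): string[] {
  if (!(maxChars > 0)) throw new RangeError('maxChars must be positive');

  // A sentence ends at ., ! or ? followed by whitespace or end of text,
  // so decimals such as "3.5" are not split.
  const sentences = text.match(/\S[\s\S]*?(?:[.!?]+(?=\s|$)|$)/g) ?? [];

  const chunks: string[] = [];
  let current = '';
  for (const raw of sentences) {
    let sentence = raw.trim();

    // Hard-split any single sentence that exceeds the limit on its own.
    while (sentence.length > maxChars) {
      if (current) {
        chunks.push(current);
        current = '';
      }
      chunks.push(sentence.slice(0, maxChars));
      sentence = sentence.slice(maxChars);
    }
    if (!sentence) continue;

    const candidate = current ? current + ' ' + sentence : sentence;
    if (candidate.length <= maxChars) {
      current = candidate;     // still fits: keep packing
    } else {
      chunks.push(current);    // flush and start a new chunk
      current = sentence;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Each chunk can then be passed to `agent.speak` in order and the resulting audio played or concatenated in sequence.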

