Agents can speak and listen: they convert text to speech and transcribe audio into text.

Quick Start

1. Text to Speech

import { Agent } from 'praisonai';

const agent = new Agent({
  instructions: 'You are a helpful assistant',
  audio: true
});

// Agent response as audio
const audio = await agent.speak('Hello! How can I help you today?');
// Returns audio buffer
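
If you want to keep the spoken reply, you can write the returned buffer to disk. The sketch below assumes `speak()` resolves to a Node.js `Buffer`, as the comment above suggests. `saveSpeech` and the `Speaker` interface are hypothetical helpers, not part of praisonai. Typing the parameter as a small interface instead of `Agent` keeps the helper testable with a fake speaker.

```typescript
import { writeFile } from 'node:fs/promises';

// Assumed shape: anything with a speak() method that resolves to a Node.js Buffer.
// The exact return type of praisonai's speak() is an assumption here.
interface Speaker {
  speak(text: string): Promise<Buffer>;
}

// Hypothetical helper: synthesize `text`, write the audio bytes to `path`,
// and return the number of bytes written as a quick sanity check.
async function saveSpeech(speaker: Speaker, text: string, path: string): Promise<number> {
  const audio = await speaker.speak(text);
  if (audio.length === 0) {
    // Refuse to create an empty, unplayable audio file.
    throw new Error('speak() returned an empty buffer');
  }
  await writeFile(path, audio);
  return audio.length;
}
```

For example, `await saveSpeech(agent, 'Hello!', './reply.mp3')`. Keep the file extension consistent with the configured output format.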

2. Speech to Text

// Transcribe audio file
const text = await agent.transcribe('./recording.mp3');
console.log(text);
// "This is what was said in the recording..."

User Interaction Flow
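
One plausible reading of this flow is a round trip: the user's recording is transcribed, the agent answers in text, and the answer is spoken back. The sketch below models a single turn. It assumes `transcribe()` and `speak()` behave as in the Quick Start; `respond()` is a hypothetical placeholder for whichever method your agent uses to produce a text reply, and `VoiceCapableAgent`, `VoiceTurn` and `handleVoiceTurn` are illustrative names, not praisonai exports.

```typescript
// Assumed capabilities. transcribe() and speak() mirror the Quick Start calls;
// respond() is a hypothetical stand-in for generating a text reply.
interface VoiceCapableAgent {
  transcribe(path: string): Promise<string>;
  respond(prompt: string): Promise<string>;
  speak(text: string): Promise<Buffer>;
}

interface VoiceTurn {
  heard: string; // transcript of the user's recording
  reply: string; // the agent's text reply
  audio: Buffer; // the reply rendered as speech
}

// One round trip: user audio -> transcript -> text reply -> reply audio.
async function handleVoiceTurn(agent: VoiceCapableAgent, recordingPath: string): Promise<VoiceTurn> {
  const heard = (await agent.transcribe(recordingPath)).trim();
  if (heard.length === 0) {
    // Nothing was said, so skip the model call and the speech synthesis.
    throw new Error(`No speech detected in ${recordingPath}`);
  }
  const reply = await agent.respond(heard);
  const audio = await agent.speak(reply);
  return { heard, reply, audio };
}
```

Rejecting an empty transcript before calling the model avoids spending a request, and a synthesized reply, on silence.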


Configuration Levels

import { Agent, AudioAgent } from 'praisonai';

// Each level below is an alternative; use whichever fits your needs.

// Level 1: Boolean - enable audio with default settings
const defaultAgent = new Agent({
  audio: true
});

// Level 2: String - specify a voice by name
const voiceAgent = new Agent({
  audio: 'alloy'  // OpenAI voice name
});

// Level 3: Object - full control over the TTS options
const customAgent = new Agent({
  audio: {
    voice: 'nova',
    model: 'tts-1-hd',
    speed: 1.0,
    format: 'mp3'
  }
});

// Level 4: Instance - a dedicated AudioAgent with its own provider
const audioAgent = new AudioAgent({
  provider: 'elevenlabs',
  voice: 'rachel'
});
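
To see how the first three levels relate, it helps to read them as shorthand for one options object. The sketch below expands `true`, a voice name, or a partial object into full options. It is an illustration, not praisonai's internal logic: the defaults are assumptions (the library's actual defaults are not documented here), treating `false` as "audio off" is also an assumption, and Level 4 is left out because it passes a preconfigured instance rather than options.

```typescript
// Hypothetical option shape mirroring the Level 3 object.
interface AudioOptions {
  voice: string;
  model: string;
  speed: number;
  format: string;
}

// Levels 1-3 as a single union type.
type AudioSetting = boolean | string | Partial<AudioOptions>;

// Assumed defaults for illustration only; the library's real defaults may differ.
const ASSUMED_DEFAULTS: AudioOptions = {
  voice: 'alloy',
  model: 'tts-1',
  speed: 1.0,
  format: 'mp3',
};

// Expand a Level 1-3 setting into full options, or null when audio is off.
function resolveAudioSetting(setting: AudioSetting): AudioOptions | null {
  if (setting === false) return null;                   // audio disabled (assumption)
  if (setting === true) return { ...ASSUMED_DEFAULTS }; // Level 1: all defaults
  if (typeof setting === 'string') {
    return { ...ASSUMED_DEFAULTS, voice: setting };     // Level 2: voice only
  }
  return { ...ASSUMED_DEFAULTS, ...setting };           // Level 3: override any field
}
```

Copying the defaults with a spread on every call means callers can modify the returned object without changing the shared defaults.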

Audio Options

Option   Description
voice    Voice name (alloy, echo, nova, etc.)
model    TTS model (tts-1, tts-1-hd)
speed    Playback speed (0.25 to 4.0)
format   Output format (mp3, wav, opus)
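
Because the table gives concrete allowed values, a pre-flight check can catch typos before any request is made. The hypothetical `validateAudioOptions` helper below encodes only what the table lists. It leaves `voice` open, since the table's list ends in "etc.", and the provider may accept formats beyond the three shown, so treat the format check as conservative.

```typescript
// Allowed values transcribed from the Audio Options table.
const TTS_MODELS = ['tts-1', 'tts-1-hd'] as const;
const OUTPUT_FORMATS = ['mp3', 'wav', 'opus'] as const;
const MIN_SPEED = 0.25;
const MAX_SPEED = 4.0;

interface AudioOptionsInput {
  voice?: string;
  model?: string;
  speed?: number;
  format?: string;
}

// Returns human-readable problems; an empty array means every provided option passed.
function validateAudioOptions(opts: AudioOptionsInput): string[] {
  const problems: string[] = [];
  if (opts.voice !== undefined && opts.voice.trim() === '') {
    problems.push('voice must be a non-empty name');
  }
  if (opts.model !== undefined && !(TTS_MODELS as readonly string[]).includes(opts.model)) {
    problems.push(`model must be one of ${TTS_MODELS.join(', ')}`);
  }
  // Bounds are inclusive; NaN and Infinity are rejected.
  if (
    opts.speed !== undefined &&
    (!Number.isFinite(opts.speed) || opts.speed < MIN_SPEED || opts.speed > MAX_SPEED)
  ) {
    problems.push(`speed must be between ${MIN_SPEED} and ${MAX_SPEED}`);
  }
  if (opts.format !== undefined && !(OUTPUT_FORMATS as readonly string[]).includes(opts.format)) {
    problems.push(`format must be one of ${OUTPUT_FORMATS.join(', ')}`);
  }
  return problems;
}
```

Returning a list instead of throwing on the first error lets a caller report every problem with a configuration at once.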

API Reference

AudioConfig: complete configuration options.

Best Practices

tts-1-hd sounds more natural but costs more; tts-1 is faster and cheaper, which suits real-time responses.
Choose voices that match your content’s tone and audience.
Break long text into chunks, ideally at sentence boundaries, for better audio quality and to stay within provider input limits.
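
A minimal sketch of sentence-aware chunking, assuming plain text where `.`, `!` and `?` end sentences (abbreviations such as "e.g." will cause extra splits). `chunkText` is a hypothetical helper, not a praisonai function. Its 4,096-character default mirrors OpenAI's per-request limit for text-to-speech input, so adjust it for other providers. A single sentence longer than the limit is cut at the character limit as a fallback.

```typescript
// Hypothetical helper: split `text` into chunks of at most `maxChars`
// characters, breaking at sentence ends (., !, ?) where possible.
function chunkText(text: string, maxChars = 4096): string[] {
  if (!Number.isInteger(maxChars) || maxChars <= 0) {
    throw new RangeError('maxChars must be a positive integer');
  }
  // Each match is one "sentence": non-terminal characters followed by a run of
  // terminal punctuation plus trailing whitespace, or by the end of the text.
  // The matches tile the input, so no characters are dropped.
  const sentences = text.match(/[^.!?]*(?:[.!?]+\s*|$)/g) ?? [];
  const chunks: string[] = [];
  let current = '';

  // Trim chunk edges and skip chunks that are only whitespace.
  const pushTrimmed = (piece: string) => {
    const trimmed = piece.trim();
    if (trimmed.length > 0) chunks.push(trimmed);
  };

  for (const sentence of sentences) {
    if (current.length + sentence.length <= maxChars) {
      current += sentence;
      continue;
    }
    pushTrimmed(current);
    // Fallback: a single sentence longer than the limit is cut at the limit.
    let rest = sentence;
    while (rest.length > maxChars) {
      pushTrimmed(rest.slice(0, maxChars));
      rest = rest.slice(maxChars);
    }
    current = rest;
  }
  pushTrimmed(current);
  return chunks;
}
```

Each chunk can then be passed to `agent.speak()` in order, and the resulting buffers played or saved in sequence.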