Text-to-speech (TTS) and speech-to-text (STT) with Google’s Gemini models, using AudioAgent from praisonaiagents.

Setup

export GOOGLE_API_KEY=your-key
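
It can help to fail fast when the key is missing, before any agent is created. The following is a minimal sketch, assuming the key is read from the GOOGLE_API_KEY environment variable as in the export above; `require_api_key` is a hypothetical helper and not part of praisonaiagents.

```python
import os


def require_api_key(name: str = "GOOGLE_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is unset or blank."""
    key = os.environ.get(name, "").strip()
    if not key:
        # Hypothetical error message; adjust to your own setup instructions.
        raise RuntimeError(f"{name} is not set; run: export {name}=your-key")
    return key
```

Calling `require_api_key()` once at startup turns a missing key into a clear error instead of a failure deep inside a later request.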

Text-to-Speech

from praisonaiagents import AudioAgent

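# Create an agent backed by the Gemini TTS model.
agent = AudioAgent(llm="gemini/gemini-2.5-flash-preview-tts")
# Synthesize the text and save the audio to hello.mp3.
agent.speech("Hello world!", output="hello.mp3")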
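
A thin wrapper can confirm that audio was actually written. This is a sketch under two assumptions: that `speech` accepts the `output` keyword exactly as in the call above, and that it writes the audio to that path. `speak_to_file` is a hypothetical helper; it takes the agent as a parameter, so it works with any object exposing a compatible `speech` method.

```python
from pathlib import Path


def speak_to_file(agent, text: str, output: str) -> Path:
    """Call agent.speech and check that a non-empty file exists afterwards."""
    if not text.strip():
        raise ValueError("text must not be empty")
    # Assumption: speech() writes the synthesized audio to `output`.
    agent.speech(text, output=output)
    path = Path(output)
    if not path.is_file() or path.stat().st_size == 0:
        raise RuntimeError(f"no audio was written to {path}")
    return path
```

With the agent above, `speak_to_file(agent, "Hello world!", "hello.mp3")` would return the path to the written file, or raise if nothing was produced.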

Speech-to-Text

from praisonaiagents import AudioAgent

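# Create an agent backed by a Gemini model used here for transcription.
agent = AudioAgent(llm="gemini/gemini-2.0-flash")
# Transcribe the audio file and print the resulting text.
text = agent.transcribe("audio.mp3")
print(text)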
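
To transcribe several recordings, the same call can be looped over a folder. This is a sketch that assumes `transcribe` takes a file path and returns the transcript as a string, as the example above suggests. `transcribe_folder` and its `pattern` parameter are hypothetical names, not part of praisonaiagents.

```python
from pathlib import Path


def transcribe_folder(agent, folder: str, pattern: str = "*.mp3") -> dict[str, str]:
    """Transcribe every file in `folder` matching `pattern`, keyed by file name."""
    results: dict[str, str] = {}
    # Sort so the output order is stable across runs.
    for path in sorted(Path(folder).glob(pattern)):
        # Assumption: transcribe() accepts a path string and returns text.
        results[path.name] = agent.transcribe(str(path))
    return results
```

For example, `transcribe_folder(agent, "recordings")` would map each `.mp3` file name in a `recordings` directory to its transcript.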

Models

Model                                  Type
gemini/gemini-2.5-flash-preview-tts    TTS
gemini/gemini-2.0-flash                STT
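
The model table can be kept in one place in code so each task always gets the matching model string. A minimal sketch: the two model names come from the table above, while `MODELS` and `model_for` are hypothetical names introduced for illustration.

```python
# Model identifiers copied from the table above, keyed by task type.
MODELS = {
    "tts": "gemini/gemini-2.5-flash-preview-tts",
    "stt": "gemini/gemini-2.0-flash",
}


def model_for(task: str) -> str:
    """Return the model string for 'tts' or 'stt' (case-insensitive)."""
    key = task.strip().lower()
    if key not in MODELS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(MODELS)}")
    return MODELS[key]
```

An agent could then be created with `AudioAgent(llm=model_for("tts"))`, which avoids passing the speech-to-text model to a text-to-speech call by mistake.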