Voice-to-Voice Translator Lite
Translate spoken audio to another language, with optional text-to-speech output.Problem Statement
Who: Travelers, international teams, content localizationWhy: Real-time voice translation enables cross-language communication without manual transcription and translation steps.
What You’ll Build
A recipe that transcribes audio, translates the text, and optionally generates speech in the target language.Input/Output Contract
| Input | Type | Required | Description |
|---|---|---|---|
audio_path | string | Yes | Path to source audio file |
target_language | string | Yes | Target language code |
voice_style | string | No | TTS voice style (default: neutral) |
| Output | Type | Description |
|---|---|---|
translated_audio_file | string | Path to translated audio (if TTS available) |
transcript_file | string | Path to translated transcript |
ok | boolean | Success indicator |
Prerequisites
Copy
export OPENAI_API_KEY=your_key_here
pip install praisonaiagents
TTS is optional. If TTS is not configured, the recipe will output a transcript file only. This is a safe fallback for environments without audio synthesis capabilities.
Step-by-Step Build
1
Create Recipe Directory
Copy
mkdir -p ~/.praison/templates/voice-to-voice-translator-lite
cd ~/.praison/templates/voice-to-voice-translator-lite
2
Create TEMPLATE.yaml
Copy
name: voice-to-voice-translator-lite
version: "1.0.0"
description: "Translate spoken audio with optional TTS output"
author: "PraisonAI"
license: "MIT"
tags:
- audio
- translation
- voice
- tts
requires:
env:
- OPENAI_API_KEY
packages:
- praisonaiagents
optional_env:
- ELEVENLABS_API_KEY
inputs:
audio_path:
type: string
description: "Path to the source audio file"
required: true
target_language:
type: string
description: "Target language code (e.g., es, fr, de)"
required: true
voice_style:
type: string
description: "TTS voice style"
required: false
default: "neutral"
enum:
- neutral
- friendly
- professional
- energetic
outputs:
translated_audio_file:
type: string
description: "Path to translated audio file (null if TTS unavailable)"
transcript_file:
type: string
description: "Path to translated transcript"
ok:
type: boolean
description: "Success indicator"
cli:
command: "praison recipes run voice-to-voice-translator-lite"
examples:
- 'praison recipes run voice-to-voice-translator-lite --input ''{"audio_path": "speech.mp3", "target_language": "es"}'''
safety:
dry_run_default: false
requires_consent: false
overwrites_files: true
network_access: true
pii_handling: true
3
Create recipe.py
Copy
# recipe.py
import os
from pathlib import Path
from praisonaiagents import Agent, Task, PraisonAIAgents
def run(input_data: dict, config: dict = None) -> dict:
"""
Translate spoken audio to another language.
"""
audio_path = input_data.get("audio_path")
target_language = input_data.get("target_language")
voice_style = input_data.get("voice_style", "neutral")
if not audio_path:
return {"ok": False, "error": {"code": "MISSING_INPUT", "message": "audio_path is required"}}
if not target_language:
return {"ok": False, "error": {"code": "MISSING_INPUT", "message": "target_language is required"}}
if not os.path.exists(audio_path):
return {"ok": False, "error": {"code": "FILE_NOT_FOUND", "message": f"Audio file not found: {audio_path}"}}
try:
# Create transcription agent
transcriber = Agent(
name="Speech Transcriber",
role="Speech Recognition Specialist",
goal="Accurately transcribe spoken audio",
instructions="""
You are a speech recognition expert.
- Transcribe speech accurately
- Detect the source language
- Handle accents and dialects
- Note unclear sections
""",
)
# Create translation agent
translator = Agent(
name="Voice Translator",
role="Professional Translator",
goal=f"Translate speech naturally to {target_language}",
instructions=f"""
You are a professional translator specializing in spoken language.
- Translate to {target_language} naturally
- Preserve tone and intent
- Adapt idioms appropriately
- Keep the conversational style
""",
)
# Define tasks
transcribe_task = Task(
name="transcribe_speech",
description=f"Transcribe the audio from: {audio_path}",
expected_output="Transcribed text with detected language",
agent=transcriber,
)
translate_task = Task(
name="translate_speech",
description=f"Translate the transcription to {target_language}",
expected_output=f"Natural {target_language} translation",
agent=translator,
context=[transcribe_task],
)
# Execute
agents = PraisonAIAgents(
agents=[transcriber, translator],
tasks=[transcribe_task, translate_task],
)
result = agents.start()
# Save transcript
audio_name = Path(audio_path).stem
transcript_file = f"{audio_name}_{target_language}_transcript.txt"
translated_text = result.get("translate_speech", "")
with open(transcript_file, "w", encoding="utf-8") as f:
f.write(translated_text)
# Attempt TTS if available
translated_audio_file = None
warnings = []
tts_available = check_tts_availability()
if tts_available:
try:
translated_audio_file = generate_tts(
translated_text,
target_language,
voice_style,
f"{audio_name}_{target_language}.mp3"
)
except Exception as e:
warnings.append(f"TTS generation failed: {str(e)}. Transcript saved.")
else:
warnings.append("TTS not configured. Set ELEVENLABS_API_KEY for audio output.")
artifacts = [{"path": transcript_file, "type": "text", "size_bytes": os.path.getsize(transcript_file)}]
if translated_audio_file and os.path.exists(translated_audio_file):
artifacts.append({"path": translated_audio_file, "type": "audio", "size_bytes": os.path.getsize(translated_audio_file)})
return {
"ok": True,
"translated_audio_file": translated_audio_file,
"transcript_file": transcript_file,
"artifacts": artifacts,
"warnings": warnings,
}
except Exception as e:
return {"ok": False, "error": {"code": "PROCESSING_ERROR", "message": str(e)}}
def check_tts_availability() -> bool:
"""Check if TTS service is available."""
return bool(os.environ.get("ELEVENLABS_API_KEY"))
def generate_tts(text: str, language: str, style: str, output_path: str) -> str:
"""Generate TTS audio. Returns output path or raises exception."""
# Placeholder for TTS implementation
# In production, integrate with ElevenLabs, Google TTS, or similar
api_key = os.environ.get("ELEVENLABS_API_KEY")
if not api_key:
raise RuntimeError("TTS API key not configured")
# Example ElevenLabs integration (simplified)
# import requests
# response = requests.post(
# "https://api.elevenlabs.io/v1/text-to-speech/...",
# headers={"xi-api-key": api_key},
# json={"text": text, "voice_settings": {...}}
# )
# with open(output_path, "wb") as f:
# f.write(response.content)
# For now, create placeholder
with open(output_path, "wb") as f:
f.write(b"") # Placeholder
return output_path
4
Create test_recipe.py
Copy
# test_recipe.py
import pytest
from recipe import run, check_tts_availability
def test_missing_audio_path():
result = run({"target_language": "es"})
assert result["ok"] is False
assert result["error"]["code"] == "MISSING_INPUT"
def test_missing_target_language():
result = run({"audio_path": "test.mp3"})
assert result["ok"] is False
assert result["error"]["code"] == "MISSING_INPUT"
def test_file_not_found():
result = run({"audio_path": "/nonexistent.mp3", "target_language": "es"})
assert result["ok"] is False
def test_tts_availability_check():
# Should return False without API key
import os
original = os.environ.get("ELEVENLABS_API_KEY")
if "ELEVENLABS_API_KEY" in os.environ:
del os.environ["ELEVENLABS_API_KEY"]
assert check_tts_availability() is False
if original:
os.environ["ELEVENLABS_API_KEY"] = original
@pytest.mark.integration
def test_transcript_only():
"""Test that transcript is generated even without TTS."""
import os
import tempfile
test_audio = os.environ.get("TEST_AUDIO_PATH")
if not test_audio:
pytest.skip("No test audio available")
result = run({
"audio_path": test_audio,
"target_language": "es"
})
assert result["ok"] is True
assert result["transcript_file"] is not None
Run Locally
Copy
# Basic translation (transcript only without TTS key)
praison recipes run voice-to-voice-translator-lite \
--input '{"audio_path": "meeting.mp3", "target_language": "fr"}'
# With voice style
praison recipes run voice-to-voice-translator-lite \
--input '{"audio_path": "speech.wav", "target_language": "de", "voice_style": "professional"}'
Deploy & Integrate: 6 Integration Models
- Model 1: Embedded SDK
- Model 2: CLI Invocation
- Model 3: Plugin Mode
- Model 4: Local HTTP Sidecar
- Model 5: Remote Managed Runner
- Model 6: Event-Driven
Copy
from praisonai import recipe
result = recipe.run(
"voice-to-voice-translator-lite",
input={
"audio_path": "speech.mp3",
"target_language": "es",
"voice_style": "friendly"
}
)
if result.ok:
print(f"Transcript: {result.output['transcript_file']}")
if result.output['translated_audio_file']:
print(f"Audio: {result.output['translated_audio_file']}")
for warning in result.warnings:
print(f"Warning: {warning}")
Safety: Audio may contain PII. Handle files securely.
Copy
praison recipes run voice-to-voice-translator-lite \
--input '{"audio_path": "speech.mp3", "target_language": "ja"}' \
--json
Copy
class VoiceTranslatorPlugin:
def translate(self, audio_path, target_lang, style="neutral"):
from praisonai import recipe
return recipe.run(
"voice-to-voice-translator-lite",
input={
"audio_path": audio_path,
"target_language": target_lang,
"voice_style": style
}
)
Copy
praison recipes serve --port 8765
Copy
import requests
response = requests.post(
"http://localhost:8765/recipes/voice-to-voice-translator-lite/run",
json={
"audio_path": "/uploads/speech.mp3",
"target_language": "es"
}
)
Copy
response = requests.post(
"https://api.translation-service.com/voice-translate",
headers={"Authorization": f"Bearer {api_key}"},
json={
"audio_url": "https://cdn.example.com/speech.mp3",
"target_language": "fr"
}
)
Copy
def handle_audio_upload(event):
queue.send({
"recipe": "voice-to-voice-translator-lite",
"input": {
"audio_path": event['audio_path'],
"target_language": event['target_language']
},
"callback_url": event['callback_url']
})
Troubleshooting
No audio output generated
No audio output generated
Cause: TTS API key not configured.Solution: Set
ELEVENLABS_API_KEY environment variable. The recipe will still produce a transcript without it.Poor translation quality
Poor translation quality
Solutions:
- Ensure clear audio input
- Try specifying source language explicitly
- Use higher quality audio files
Next Steps
- Multilingual Subtitle Translator - Translate text subtitles
- Podcast Transcription Cleaner - Clean up transcripts

