Podcast Transcription Cleaner
Transcribe podcast audio with speaker diarization, filler word removal, and intelligent cleanup.
Problem Statement
Who: Podcasters, content creators, transcription services
Why: Raw transcriptions are messy with filler words, overlapping speech, and no speaker identification.
What You’ll Build
A recipe that transcribes audio, identifies speakers, removes filler words, and produces clean, readable transcripts.
Input/Output Contract
| Input | Type | Required | Description |
|---|---|---|---|
| audio_path | string | Yes | Path to the audio file |
| speaker_labels | boolean | No | Enable speaker diarization (default: true) |
| cleanup_level | string | No | light, medium, heavy (default: medium) |

| Output | Type | Description |
|---|---|---|
| transcript_file | string | Path to cleaned transcript |
| highlights | array | Key moments and quotes |
| ok | boolean | Success indicator |
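The input contract above can be checked before any agent runs. The following is a minimal sketch, not part of the recipe itself: `normalize_input` and `VALID_CLEANUP_LEVELS` are hypothetical names introduced here, and the defaults are taken from the table.

```python
# Hypothetical helper: applies the contract's defaults and reports violations.
VALID_CLEANUP_LEVELS = ("light", "medium", "heavy")


def normalize_input(input_data: dict) -> tuple[dict, list[str]]:
    """Return (normalized_input, errors) for the documented input contract."""
    errors = []

    audio_path = input_data.get("audio_path")
    if not isinstance(audio_path, str) or not audio_path:
        errors.append("audio_path is required and must be a non-empty string")

    # Defaults mirror the contract: speaker_labels=true, cleanup_level=medium
    speaker_labels = input_data.get("speaker_labels", True)
    if not isinstance(speaker_labels, bool):
        errors.append("speaker_labels must be a boolean")

    cleanup_level = input_data.get("cleanup_level", "medium")
    if cleanup_level not in VALID_CLEANUP_LEVELS:
        errors.append("cleanup_level must be one of: " + ", ".join(VALID_CLEANUP_LEVELS))

    normalized = {
        "audio_path": audio_path,
        "speaker_labels": speaker_labels,
        "cleanup_level": cleanup_level,
    }
    return normalized, errors
```

Calling `normalize_input({"audio_path": "episode.mp3"})` yields the input with both defaults filled in and an empty error list, so a caller can reject bad requests before spending API calls.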
Prerequisites
export OPENAI_API_KEY=your_key_here
pip install praisonaiagents
Step-by-Step Build
Step 1: Create Recipe Directory
mkdir -p ~/.praison/templates/podcast-transcription-cleaner
cd ~/.praison/templates/podcast-transcription-cleaner
Step 2: Create TEMPLATE.yaml
name: podcast-transcription-cleaner
version: "1.0.0"
description: "Transcribe and clean podcast audio with speaker labels"
author: "PraisonAI"
license: "MIT"

tags:
  - audio
  - podcast
  - transcription
  - cleanup

requires:
  env:
    - OPENAI_API_KEY
  packages:
    - praisonaiagents

inputs:
  audio_path:
    type: string
    description: "Path to the podcast audio file"
    required: true
  speaker_labels:
    type: boolean
    description: "Enable speaker diarization"
    required: false
    default: true
  cleanup_level:
    type: string
    description: "Level of text cleanup"
    required: false
    default: "medium"
    enum:
      - light
      - medium
      - heavy

outputs:
  transcript_file:
    type: string
    description: "Path to the cleaned transcript"
  highlights:
    type: array
    description: "Key moments and notable quotes"
  ok:
    type: boolean
    description: "Success indicator"

cli:
  command: "praison recipes run podcast-transcription-cleaner"
  examples:
    - 'praison recipes run podcast-transcription-cleaner --input ''{"audio_path": "episode.mp3"}'''
    - 'praison recipes run podcast-transcription-cleaner --input ''{"audio_path": "podcast.wav", "cleanup_level": "heavy"}'''

safety:
  dry_run_default: false
  requires_consent: false
  overwrites_files: true
  network_access: true
  pii_handling: true
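The `requires` section lists one environment variable and one package. A quick preflight check along these lines can catch a missing key or package before the recipe runs. This is only a sketch: `missing_requirements` is a hypothetical helper, not something the template runner is known to provide.

```python
import importlib.util
import os


def missing_requirements(env_vars, packages, environ=None):
    """Report which required env vars are unset and which packages are not importable."""
    environ = os.environ if environ is None else environ
    missing_env = [name for name in env_vars if not environ.get(name)]
    # find_spec returns None when a top-level package cannot be found
    missing_pkgs = [name for name in packages if importlib.util.find_spec(name) is None]
    return {"env": missing_env, "packages": missing_pkgs}


if __name__ == "__main__":
    # Values copied from the `requires` section of TEMPLATE.yaml
    print(missing_requirements(["OPENAI_API_KEY"], ["praisonaiagents"]))
```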
Step 3: Create recipe.py
# recipe.py
import os
from pathlib import Path

from praisonaiagents import Agent, Task, PraisonAIAgents


def run(input_data: dict, config: dict = None) -> dict:
    """
    Transcribe and clean podcast audio.

    Args:
        input_data: Contains audio_path, speaker_labels, cleanup_level
        config: Optional configuration overrides

    Returns:
        Dict with transcript_file, highlights, and ok status
    """
    audio_path = input_data.get("audio_path")
    if not audio_path:
        return {
            "ok": False,
            "error": {"code": "MISSING_INPUT", "message": "audio_path is required"},
            "transcript_file": None,
            "highlights": [],
        }

    if not os.path.exists(audio_path):
        return {
            "ok": False,
            "error": {"code": "FILE_NOT_FOUND", "message": f"Audio file not found: {audio_path}"},
            "transcript_file": None,
            "highlights": [],
        }

    speaker_labels = input_data.get("speaker_labels", True)
    cleanup_level = input_data.get("cleanup_level", "medium")

    try:
        # Define cleanup instructions based on level
        cleanup_instructions = {
            "light": "Remove only obvious filler words (um, uh). Keep natural speech patterns.",
            "medium": "Remove filler words, fix grammar, improve readability while preserving voice.",
            "heavy": "Full editorial cleanup. Remove all filler, fix grammar, restructure for clarity."
        }

        # Create transcription agent
        transcriber = Agent(
            name="Podcast Transcriber",
            role="Audio Transcription Specialist",
            goal="Accurately transcribe podcast audio with timestamps",
            instructions="""
            You are an expert podcast transcriptionist.
            - Transcribe speech accurately
            - Note speaker changes
            - Include timestamps at natural breaks
            - Capture tone and emphasis where notable
            """,
        )

        # Create speaker identification agent
        diarizer = Agent(
            name="Speaker Identifier",
            role="Speaker Diarization Expert",
            goal="Identify and label different speakers",
            instructions="""
            You are a speaker identification expert.
            - Identify distinct speakers by voice characteristics
            - Assign consistent labels (Speaker 1, Speaker 2, or names if mentioned)
            - Note speaker transitions
            - Handle overlapping speech
            """,
        )

        # Create cleanup agent
        cleaner = Agent(
            name="Transcript Editor",
            role="Editorial Specialist",
            goal="Clean and polish transcripts",
            instructions=f"""
            You are a transcript editor.
            Cleanup level: {cleanup_level}
            {cleanup_instructions[cleanup_level]}
            - Preserve the speaker's authentic voice
            - Maintain meaning and context
            - Format for readability
            """,
        )

        # Create highlights extractor
        highlighter = Agent(
            name="Content Highlighter",
            role="Content Analyst",
            goal="Extract key moments and quotes",
            instructions="""
            You are a content analyst.
            - Identify quotable moments
            - Find key insights and takeaways
            - Note interesting stories or anecdotes
            - Highlight actionable advice
            """,
        )

        # Define tasks
        transcribe_task = Task(
            name="transcribe_audio",
            description=f"Transcribe the podcast audio from: {audio_path}",
            expected_output="Raw timestamped transcription",
            agent=transcriber,
        )

        tasks = [transcribe_task]
        agents = [transcriber]

        if speaker_labels:
            diarize_task = Task(
                name="identify_speakers",
                description="Identify and label speakers in the transcription",
                expected_output="Transcription with speaker labels",
                agent=diarizer,
                context=[transcribe_task],
            )
            tasks.append(diarize_task)
            agents.append(diarizer)
            cleanup_context = [diarize_task]
        else:
            cleanup_context = [transcribe_task]

        cleanup_task = Task(
            name="cleanup_transcript",
            description=f"Clean the transcript at {cleanup_level} level",
            expected_output="Cleaned, readable transcript",
            agent=cleaner,
            context=cleanup_context,
        )
        tasks.append(cleanup_task)
        agents.append(cleaner)

        highlight_task = Task(
            name="extract_highlights",
            description="Extract key moments and notable quotes",
            expected_output="List of highlights with timestamps",
            agent=highlighter,
            context=[cleanup_task],
        )
        tasks.append(highlight_task)
        agents.append(highlighter)

        # Execute
        workflow = PraisonAIAgents(agents=agents, tasks=tasks)
        result = workflow.start()

        # Save transcript
        audio_name = Path(audio_path).stem
        transcript_file = f"{audio_name}_transcript.txt"
        with open(transcript_file, "w", encoding="utf-8") as f:
            f.write(result.get("cleanup_transcript", ""))

        # Parse highlights
        highlights_text = result.get("extract_highlights", "")
        highlights = [h.strip() for h in highlights_text.split("\n") if h.strip()]

        return {
            "ok": True,
            "transcript_file": transcript_file,
            "highlights": highlights[:10],  # Top 10 highlights
            "artifacts": [
                {"path": transcript_file, "type": "text", "size_bytes": os.path.getsize(transcript_file)}
            ],
            "warnings": [],
        }

    except Exception as e:
        return {
            "ok": False,
            "error": {"code": "PROCESSING_ERROR", "message": str(e)},
            "transcript_file": None,
            "highlights": [],
        }
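The recipe turns the highlighter's output into a list by splitting on newlines, which leaves any bullet markers or numbering the model emits inside each highlight. A slightly stricter parser could look like the sketch below. `parse_highlights` is a hypothetical helper, and the assumption that the model returns one highlight per line (possibly bulleted or numbered) comes from how the recipe already splits the text.

```python
import re

# Leading list markers such as "- ", "* ", "• ", "1. " or "2) "
_BULLET = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+")


def parse_highlights(text: str, limit: int = 10) -> list[str]:
    """Split highlighter output into unique, marker-free lines (first `limit` only)."""
    highlights = []
    for line in text.splitlines():
        cleaned = _BULLET.sub("", line).strip()
        # Skip blank lines and exact duplicates while preserving order
        if cleaned and cleaned not in highlights:
            highlights.append(cleaned)
    return highlights[:limit]
```

Timestamps such as `[00:12:05]` at the start of a line are left untouched, since only list markers followed by whitespace are stripped.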
Step 4: Create test_recipe.py
# test_recipe.py
import pytest

from recipe import run


def test_missing_audio_path():
    """Test error handling for missing audio_path."""
    result = run({})
    assert result["ok"] is False
    assert result["error"]["code"] == "MISSING_INPUT"


def test_file_not_found():
    """Test error handling for non-existent file."""
    result = run({"audio_path": "/nonexistent/audio.mp3"})
    assert result["ok"] is False
    assert result["error"]["code"] == "FILE_NOT_FOUND"


def test_cleanup_levels():
    """Test that all cleanup levels are valid."""
    valid_levels = ["light", "medium", "heavy"]
    for level in valid_levels:
        # Validation test only
        assert level in valid_levels


def test_default_values():
    """Test default parameter values."""
    # Would need mock for full test
    defaults = {
        "speaker_labels": True,
        "cleanup_level": "medium"
    }
    assert defaults["speaker_labels"] is True
    assert defaults["cleanup_level"] == "medium"


@pytest.mark.integration
def test_end_to_end():
    """Full integration test."""
    import os

    test_audio = os.environ.get("TEST_AUDIO_PATH")
    if not test_audio:
        pytest.skip("No test audio available")

    result = run({
        "audio_path": test_audio,
        "speaker_labels": True,
        "cleanup_level": "medium"
    })
    assert result["ok"] is True
    assert result["transcript_file"] is not None
Run Locally
# Basic transcription
praison recipes run podcast-transcription-cleaner \
  --input '{"audio_path": "episode.mp3"}'

# Heavy cleanup without speaker labels
praison recipes run podcast-transcription-cleaner \
  --input '{"audio_path": "interview.wav", "speaker_labels": false, "cleanup_level": "heavy"}'
Deploy & Integrate: 6 Integration Models
- Model 1: Embedded SDK
- Model 2: CLI Invocation
- Model 3: Plugin Mode
- Model 4: Local HTTP Sidecar
- Model 5: Remote Managed Runner
- Model 6: Event-Driven
Model 1: Embedded SDK
When to use: Python applications, podcast platforms
Deployment note: Runs in-process with lowest latency.
from praisonai import recipe

result = recipe.run(
    "podcast-transcription-cleaner",
    input={
        "audio_path": "episode.mp3",
        "speaker_labels": True,
        "cleanup_level": "medium"
    }
)

if result.ok:
    print(f"Transcript: {result.output['transcript_file']}")
    print(f"Highlights: {result.output['highlights']}")
Safety: May process PII in conversations. Handle transcripts securely.
Model 2: CLI Invocation
When to use: Batch processing, automation scripts
Deployment note: Great for CI/CD and batch workflows.
# Process multiple episodes
for file in episodes/*.mp3; do
  praison recipes run podcast-transcription-cleaner \
    --input "{\"audio_path\": \"$file\"}" \
    --json >> results.jsonl
done
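Once the loop has appended to `results.jsonl`, the file can be summarized in a few lines of Python. The sketch below assumes each line is a single JSON object carrying the recipe's `ok` and `transcript_file` fields. The exact shape of the `--json` output is not specified here, so unparsable lines are counted rather than treated as fatal. `summarize_results` is a hypothetical helper.

```python
import json


def summarize_results(lines):
    """Group JSON-lines records into succeeded transcripts, failures, and unparsable lines."""
    succeeded, failed, unparsable = [], [], 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            unparsable += 1
            continue
        # Assumption: a successful run reports ok=true alongside transcript_file
        if isinstance(record, dict) and record.get("ok"):
            succeeded.append(record.get("transcript_file"))
        else:
            failed.append(record)
    return {"succeeded": succeeded, "failed": failed, "unparsable": unparsable}
```

Usage would be along the lines of `with open("results.jsonl") as f: summary = summarize_results(f)`.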
Safety: Validate file paths in scripts.
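One way to validate paths before handing them to the recipe is to resolve each candidate against a fixed base directory and reject anything that escapes it. This is a minimal sketch: `safe_audio_path` and `ALLOWED_EXTENSIONS` are hypothetical names, and the extension list is an assumption based on the `.mp3` and `.wav` files used in the examples.

```python
from pathlib import Path

# Assumption: only the formats shown in this recipe's examples are accepted
ALLOWED_EXTENSIONS = {".mp3", ".wav"}


def safe_audio_path(candidate: str, base_dir: str) -> Path:
    """Resolve `candidate` inside `base_dir`, raising ValueError if it is unsafe."""
    base = Path(base_dir).resolve()
    path = (base / candidate).resolve()
    # Rejects "../" traversal and absolute paths outside base (Python 3.9+)
    if not path.is_relative_to(base):
        raise ValueError(f"Path escapes base directory: {candidate}")
    if path.suffix.lower() not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported audio extension: {path.suffix}")
    if not path.is_file():
        raise ValueError(f"Not a file: {path}")
    return path
```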
Model 3: Plugin Mode
When to use: Podcast editing software, DAWs
Deployment note: Integrate with audio editing workflows.
class PodcastTranscriptPlugin:
    def transcribe(self, audio_path, cleanup="medium"):
        from praisonai import recipe
        return recipe.run(
            "podcast-transcription-cleaner",
            input={"audio_path": audio_path, "cleanup_level": cleanup}
        )
Model 4: Local HTTP Sidecar
When to use: Web-based podcast platforms
Deployment note: Run alongside your web application.
praison recipes serve --port 8765
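// Frontend integration
const response = await fetch('http://localhost:8765/recipes/podcast-transcription-cleaner/run', {
  method: 'POST',
  // Declare the body as JSON; without this header fetch sends it as text/plain
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ audio_path: '/uploads/episode.mp3' })
});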
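A backend service can call the same sidecar endpoint from Python using only the standard library. This sketch assumes the sidecar accepts the JSON body shown in the frontend snippet and answers with JSON. `build_run_request` and `run_recipe` are hypothetical helper names.

```python
import json
import urllib.request


def build_run_request(base_url: str, recipe_name: str, payload: dict) -> urllib.request.Request:
    """Build a POST request for the sidecar's /recipes/<name>/run endpoint."""
    url = f"{base_url.rstrip('/')}/recipes/{recipe_name}/run"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_recipe(base_url: str, recipe_name: str, payload: dict, timeout: float = 300.0) -> dict:
    """Send the request and decode the response (assumed to be JSON)."""
    request = build_run_request(base_url, recipe_name, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the sidecar running on port 8765, `run_recipe("http://localhost:8765", "podcast-transcription-cleaner", {"audio_path": "/uploads/episode.mp3"})` would issue the same call as the frontend example. The generous default timeout reflects that transcription can take a while.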
Model 5: Remote Managed Runner
When to use: SaaS podcast platforms, multi-tenant
Deployment note: Implement per-tenant rate limiting.
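import os
import requests

# PODCAST_API_KEY is a placeholder name; use whichever variable holds your runner's token
api_key = os.environ["PODCAST_API_KEY"]

response = requests.post(
    "https://api.podcast-service.com/transcribe",
    headers={"Authorization": f"Bearer {api_key}"},
    json={"audio_url": "https://cdn.example.com/episode.mp3"}
)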
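The deployment note for this model calls for per-tenant rate limiting. One simple approach is a sliding-window counter keyed by tenant, sketched below. `TenantRateLimiter` is a hypothetical class, it keeps state in memory for a single process only, and a multi-instance service would presumably need a shared store instead.

```python
import time
from collections import defaultdict, deque


class TenantRateLimiter:
    """Allow at most `max_requests` per `window_seconds` for each tenant."""

    def __init__(self, max_requests: int, window_seconds: float, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so the limiter can be tested without sleeping
        self._history = defaultdict(deque)  # tenant_id -> timestamps of accepted requests

    def allow(self, tenant_id: str) -> bool:
        now = self._clock()
        history = self._history[tenant_id]
        # Drop timestamps that have aged out of the window
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        if len(history) >= self.max_requests:
            return False
        history.append(now)
        return True
```

A request handler would call `limiter.allow(tenant_id)` before queuing a transcription and return an HTTP 429 response when it comes back `False`.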
Model 6: Event-Driven
When to use: Automatic transcription on upload
Deployment note: Process uploads asynchronously.
# Trigger on S3 upload
# `queue` stands in for your message-queue client; it is not defined here
def handle_upload(event):
    audio_key = event['Records'][0]['s3']['object']['key']
    queue.send({
        "recipe": "podcast-transcription-cleaner",
        "input": {"audio_path": f"s3://bucket/{audio_key}"}
    })
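S3 event notifications URL-encode object keys (spaces arrive as `+`), and a single event can carry several records. The sketch below builds one job per record with the key decoded and the bucket name read from the event instead of hard-coded. `jobs_from_s3_event` is a hypothetical helper, and sending the jobs to a queue is left to the caller.

```python
from urllib.parse import unquote_plus

RECIPE_NAME = "podcast-transcription-cleaner"


def jobs_from_s3_event(event: dict) -> list[dict]:
    """Turn every record in an S3 notification into a recipe job payload."""
    jobs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # Keys are URL-encoded in event notifications, e.g. "my+show.mp3"
        key = unquote_plus(s3["object"]["key"])
        jobs.append({
            "recipe": RECIPE_NAME,
            "input": {"audio_path": f"s3://{bucket}/{key}"},
        })
    return jobs
```

Note that `recipe.py` as written checks `os.path.exists(audio_path)`, so a worker consuming these jobs would likely need to download the object to a local file and pass that path instead.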
Troubleshooting
Speaker labels are inconsistent
Solution: The diarization works best with clear audio and distinct voices. Try:
- Using higher quality audio
- Reducing background noise
- Setting cleanup_level: "light" to preserve more context
Transcript missing sections
Solution: Long silences or music may cause gaps. Try:
- Checking the audio file's integrity
- Processing the audio in smaller segments
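For WAV input, splitting into smaller segments needs nothing beyond the standard library. The following is a sketch under that assumption: `split_wav` is a hypothetical helper, the 10-minute default is arbitrary, and compressed formats such as MP3 would need an external tool instead.

```python
import wave
from pathlib import Path


def split_wav(source: str, out_dir: str, segment_seconds: int = 600) -> list[Path]:
    """Write consecutive `segment_seconds`-long chunks of a WAV file to `out_dir`."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    segments = []
    with wave.open(source, "rb") as reader:
        params = reader.getparams()
        frames_per_segment = reader.getframerate() * segment_seconds
        index = 0
        while True:
            frames = reader.readframes(frames_per_segment)
            if not frames:
                break
            segment_path = out / f"{Path(source).stem}_part{index:03d}.wav"
            with wave.open(str(segment_path), "wb") as writer:
                # Copy channels, sample width and rate; the frame count in the
                # header is corrected automatically when the file is closed
                writer.setparams(params)
                writer.writeframes(frames)
            segments.append(segment_path)
            index += 1
    return segments
```

Each segment can then be passed to the recipe as its own `audio_path`. Cuts fall at fixed intervals, so a sentence may be split across two segments.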
Next Steps
- Video Caption Generator - Caption video content
- Meeting Minutes Action Items - Extract action items from recordings

