Podcast Transcription Cleaner

Transcribe podcast audio with speaker diarization, filler word removal, and intelligent cleanup.

Problem Statement

Who: Podcasters, content creators, transcription services
Why: Raw transcripts are messy: they contain filler words and overlapping speech, and they lack speaker identification.

What You’ll Build

A recipe that transcribes audio, identifies speakers, removes filler words, and produces clean, readable transcripts.

Input/Output Contract

Input           Type     Required  Description
audio_path      string   Yes       Path to the audio file
speaker_labels  boolean  No        Enable speaker diarization (default: true)
cleanup_level   string   No        light, medium, heavy (default: medium)

Output           Type     Description
transcript_file  string   Path to cleaned transcript
highlights       array    Key moments and quotes
ok               boolean  Success indicator
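
The input contract can be enforced before any agent runs. A minimal sketch, assuming a hypothetical helper `normalize_input` (not part of PraisonAI) that applies the documented defaults and rejects values outside the table:

```python
# Hypothetical helper, not part of PraisonAI: applies the defaults from the
# input table and rejects values the contract does not allow.
ALLOWED_LEVELS = ("light", "medium", "heavy")


def normalize_input(input_data: dict) -> dict:
    """Return a copy of input_data with defaults applied, or raise ValueError."""
    audio_path = input_data.get("audio_path")
    if not isinstance(audio_path, str) or not audio_path:
        raise ValueError("audio_path is required and must be a non-empty string")

    speaker_labels = input_data.get("speaker_labels", True)
    if not isinstance(speaker_labels, bool):
        raise ValueError("speaker_labels must be a boolean")

    cleanup_level = input_data.get("cleanup_level", "medium")
    if cleanup_level not in ALLOWED_LEVELS:
        raise ValueError(f"cleanup_level must be one of {ALLOWED_LEVELS}")

    return {
        "audio_path": audio_path,
        "speaker_labels": speaker_labels,
        "cleanup_level": cleanup_level,
    }


print(normalize_input({"audio_path": "episode.mp3"}))
```

Raising early keeps malformed requests from reaching the transcription step, where failures are slower and harder to diagnose.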

Prerequisites

export OPENAI_API_KEY=your_key_here
pip install praisonaiagents

Step-by-Step Build

1. Create Recipe Directory

mkdir -p ~/.praison/templates/podcast-transcription-cleaner
cd ~/.praison/templates/podcast-transcription-cleaner

2. Create TEMPLATE.yaml

name: podcast-transcription-cleaner
version: "1.0.0"
description: "Transcribe and clean podcast audio with speaker labels"
author: "PraisonAI"
license: "MIT"

tags:
  - audio
  - podcast
  - transcription
  - cleanup

requires:
  env:
    - OPENAI_API_KEY
  packages:
    - praisonaiagents

inputs:
  audio_path:
    type: string
    description: "Path to the podcast audio file"
    required: true
  speaker_labels:
    type: boolean
    description: "Enable speaker diarization"
    required: false
    default: true
  cleanup_level:
    type: string
    description: "Level of text cleanup"
    required: false
    default: "medium"
    enum:
      - light
      - medium
      - heavy

outputs:
  transcript_file:
    type: string
    description: "Path to the cleaned transcript"
  highlights:
    type: array
    description: "Key moments and notable quotes"
  ok:
    type: boolean
    description: "Success indicator"

cli:
  command: "praison recipes run podcast-transcription-cleaner"
  examples:
    - 'praison recipes run podcast-transcription-cleaner --input ''{"audio_path": "episode.mp3"}'''
    - 'praison recipes run podcast-transcription-cleaner --input ''{"audio_path": "podcast.wav", "cleanup_level": "heavy"}'''

safety:
  dry_run_default: false
  requires_consent: false
  overwrites_files: true
  network_access: true
  pii_handling: true

3. Create recipe.py

# recipe.py
import os
from pathlib import Path
from typing import Optional

from praisonaiagents import Agent, Task, PraisonAIAgents

def run(input_data: dict, config: Optional[dict] = None) -> dict:
    """
    Transcribe and clean podcast audio.
    
    Args:
        input_data: Contains audio_path, speaker_labels, cleanup_level
        config: Optional configuration overrides
        
    Returns:
        Dict with transcript_file, highlights, and ok status
    """
    audio_path = input_data.get("audio_path")
    if not audio_path:
        return {
            "ok": False,
            "error": {"code": "MISSING_INPUT", "message": "audio_path is required"},
            "transcript_file": None,
            "highlights": [],
        }
    
    if not os.path.exists(audio_path):
        return {
            "ok": False,
            "error": {"code": "FILE_NOT_FOUND", "message": f"Audio file not found: {audio_path}"},
            "transcript_file": None,
            "highlights": [],
        }
    
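    speaker_labels = input_data.get("speaker_labels", True)
    cleanup_level = input_data.get("cleanup_level", "medium")

    # Reject unknown levels early; otherwise the instruction lookup below
    # raises a bare KeyError that surfaces as an opaque PROCESSING_ERROR.
    if cleanup_level not in ("light", "medium", "heavy"):
        return {
            "ok": False,
            "error": {
                "code": "INVALID_INPUT",
                "message": f"cleanup_level must be light, medium, or heavy (got {cleanup_level!r})",
            },
            "transcript_file": None,
            "highlights": [],
        }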
    
    try:
        # Define cleanup instructions based on level
        cleanup_instructions = {
            "light": "Remove only obvious filler words (um, uh). Keep natural speech patterns.",
            "medium": "Remove filler words, fix grammar, improve readability while preserving voice.",
            "heavy": "Full editorial cleanup. Remove all filler, fix grammar, restructure for clarity."
        }
        
        # Create transcription agent
        transcriber = Agent(
            name="Podcast Transcriber",
            role="Audio Transcription Specialist",
            goal="Accurately transcribe podcast audio with timestamps",
            instructions="""
            You are an expert podcast transcriptionist.
            - Transcribe speech accurately
            - Note speaker changes
            - Include timestamps at natural breaks
            - Capture tone and emphasis where notable
            """,
        )
        
        # Create speaker identification agent
        diarizer = Agent(
            name="Speaker Identifier",
            role="Speaker Diarization Expert",
            goal="Identify and label different speakers",
            instructions="""
            You are a speaker identification expert.
            - Identify distinct speakers by voice characteristics
            - Assign consistent labels (Speaker 1, Speaker 2, or names if mentioned)
            - Note speaker transitions
            - Handle overlapping speech
            """,
        )
        
        # Create cleanup agent
        cleaner = Agent(
            name="Transcript Editor",
            role="Editorial Specialist",
            goal="Clean and polish transcripts",
            instructions=f"""
            You are a transcript editor.
            Cleanup level: {cleanup_level}
            {cleanup_instructions[cleanup_level]}
            
            - Preserve the speaker's authentic voice
            - Maintain meaning and context
            - Format for readability
            """,
        )
        
        # Create highlights extractor
        highlighter = Agent(
            name="Content Highlighter",
            role="Content Analyst",
            goal="Extract key moments and quotes",
            instructions="""
            You are a content analyst.
            - Identify quotable moments
            - Find key insights and takeaways
            - Note interesting stories or anecdotes
            - Highlight actionable advice
            """,
        )
        
        # Define tasks
        transcribe_task = Task(
            name="transcribe_audio",
            description=f"Transcribe the podcast audio from: {audio_path}",
            expected_output="Raw timestamped transcription",
            agent=transcriber,
        )
        
        tasks = [transcribe_task]
        agents = [transcriber]
        
        if speaker_labels:
            diarize_task = Task(
                name="identify_speakers",
                description="Identify and label speakers in the transcription",
                expected_output="Transcription with speaker labels",
                agent=diarizer,
                context=[transcribe_task],
            )
            tasks.append(diarize_task)
            agents.append(diarizer)
            cleanup_context = [diarize_task]
        else:
            cleanup_context = [transcribe_task]
        
        cleanup_task = Task(
            name="cleanup_transcript",
            description=f"Clean the transcript at {cleanup_level} level",
            expected_output="Cleaned, readable transcript",
            agent=cleaner,
            context=cleanup_context,
        )
        tasks.append(cleanup_task)
        agents.append(cleaner)
        
        highlight_task = Task(
            name="extract_highlights",
            description="Extract key moments and notable quotes",
            expected_output="List of highlights with timestamps",
            agent=highlighter,
            context=[cleanup_task],
        )
        tasks.append(highlight_task)
        agents.append(highlighter)
        
        # Execute
        workflow = PraisonAIAgents(agents=agents, tasks=tasks)
        result = workflow.start()
        
        # Save transcript
        audio_name = Path(audio_path).stem
        transcript_file = f"{audio_name}_transcript.txt"
        
        with open(transcript_file, "w", encoding="utf-8") as f:
            f.write(result.get("cleanup_transcript", ""))
        
        # Parse highlights
        highlights_text = result.get("extract_highlights", "")
        highlights = [h.strip() for h in highlights_text.split("\n") if h.strip()]
        
        return {
            "ok": True,
            "transcript_file": transcript_file,
            "highlights": highlights[:10],  # Top 10 highlights
            "artifacts": [
                {"path": transcript_file, "type": "text", "size_bytes": os.path.getsize(transcript_file)}
            ],
            "warnings": [],
        }
        
    except Exception as e:
        return {
            "ok": False,
            "error": {"code": "PROCESSING_ERROR", "message": str(e)},
            "transcript_file": None,
            "highlights": [],
        }
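
The recipe delegates all cleanup to an LLM agent through the prompts above. For comparison, a deterministic pre-pass for the "light" level (um, uh) could be sketched with a regular expression. The `strip_fillers` helper and its filler list are assumptions for illustration and are not part of the recipe:

```python
import re

# Assumed filler list, mirroring the "light" level's examples (um, uh).
FILLERS = ("um", "uh")

# Match a filler as a whole word, plus an optional trailing comma and spaces.
_FILLER_RE = re.compile(
    r"\b(?:" + "|".join(FILLERS) + r")\b,?\s*",
    flags=re.IGNORECASE,
)


def strip_fillers(text: str) -> str:
    """Remove standalone filler words and collapse leftover double spaces."""
    cleaned = _FILLER_RE.sub("", text)
    cleaned = re.sub(r"\s{2,}", " ", cleaned)
    return cleaned.strip()


print(strip_fillers("Um, so we uh launched the show in 2019."))
```

The word boundaries keep words such as "umbrella" intact. A pass like this cannot fix grammar or restore capitalization, which is why the medium and heavy levels rely on the editor agent.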

4. Create test_recipe.py

# test_recipe.py
import pytest
from recipe import run

def test_missing_audio_path():
    """Test error handling for missing audio_path."""
    result = run({})
    assert result["ok"] is False
    assert result["error"]["code"] == "MISSING_INPUT"

def test_file_not_found():
    """Test error handling for non-existent file."""
    result = run({"audio_path": "/nonexistent/audio.mp3"})
    assert result["ok"] is False
    assert result["error"]["code"] == "FILE_NOT_FOUND"

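def test_invalid_cleanup_level(tmp_path):
    """An unknown cleanup_level must fail without producing a transcript."""
    audio = tmp_path / "episode.mp3"
    audio.write_bytes(b"")  # Only the file's existence is checked up front
    result = run({"audio_path": str(audio), "cleanup_level": "extreme"})
    assert result["ok"] is False
    assert result["transcript_file"] is None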

def test_default_values():
    """Test default parameter values."""
    # Would need mock for full test
    defaults = {
        "speaker_labels": True,
        "cleanup_level": "medium"
    }
    assert defaults["speaker_labels"] is True
    assert defaults["cleanup_level"] == "medium"

@pytest.mark.integration
def test_end_to_end():
    """Full integration test."""
    import os
    test_audio = os.environ.get("TEST_AUDIO_PATH")
    if not test_audio:
        pytest.skip("No test audio available")
    
    result = run({
        "audio_path": test_audio,
        "speaker_labels": True,
        "cleanup_level": "medium"
    })
    
    assert result["ok"] is True
    assert result["transcript_file"] is not None

Run Locally

# Basic transcription
praison recipes run podcast-transcription-cleaner \
  --input '{"audio_path": "episode.mp3"}'

# Heavy cleanup without speaker labels
praison recipes run podcast-transcription-cleaner \
  --input '{"audio_path": "interview.wav", "speaker_labels": false, "cleanup_level": "heavy"}'
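
Hand-writing JSON inside shell quotes is easy to get wrong. A minimal sketch that builds the same command line from a Python dict using only the standard library; the `build_command` helper is hypothetical, and the CLI invocation is the one shown above:

```python
import json
import shlex


def build_command(recipe_name: str, payload: dict) -> str:
    """Return a shell-safe `praison recipes run` command line for the payload."""
    args = ["praison", "recipes", "run", recipe_name, "--input", json.dumps(payload)]
    # shlex.join quotes each argument so the JSON survives the shell intact.
    return shlex.join(args)


print(build_command(
    "podcast-transcription-cleaner",
    {"audio_path": "interview.wav", "speaker_labels": False, "cleanup_level": "heavy"},
))
```

Because `json.dumps` serializes Python's `False` as `false`, the boolean reaches the recipe in valid JSON form.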

Deploy & Integrate: 6 Integration Models

When to use: Python applications, podcast platforms
from praisonai import recipe

result = recipe.run(
    "podcast-transcription-cleaner",
    input={
        "audio_path": "episode.mp3",
        "speaker_labels": True,
        "cleanup_level": "medium"
    }
)

if result.ok:
    print(f"Transcript: {result.output['transcript_file']}")
    print(f"Highlights: {result.output['highlights']}")

Deployment note: Runs in-process, which gives the lowest latency of the integration models.
Safety: Conversations may contain PII. Store and share transcripts securely.
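
One way to reduce exposure before sharing a transcript is to mask obvious identifiers. A minimal sketch, assuming simple regular expressions for email addresses and phone-like digit runs. The patterns and the `redact_pii` helper are illustrative assumptions, are not part of the recipe, will miss many kinds of PII (names, addresses), and may mask other long digit runs such as dates:

```python
import re

# Illustrative patterns only; real PII detection needs a dedicated tool.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_pii(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


print(redact_pii("Email me at jane.doe@example.com or call 555-123-4567."))
```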

Troubleshooting

Solution: Diarization works best with clear audio and distinct voices. Try:
  • Using higher-quality audio
  • Reducing background noise
  • Setting cleanup_level: "light" to preserve more context

Solution: Long silences or music may cause gaps in the transcript. Try:
  • Checking audio file integrity
  • Processing the audio in smaller segments
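
Splitting into smaller segments can be done before the recipe runs. A minimal sketch for uncompressed WAV input using Python's standard `wave` module; the `split_wav` helper and the segment naming scheme are assumptions, and compressed formats such as MP3 would need a different tool:

```python
import wave
from pathlib import Path


def split_wav(audio_path: str, segment_seconds: int, out_dir: str) -> list[str]:
    """Split a WAV file into consecutive segments of at most segment_seconds."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    stem = Path(audio_path).stem
    segment_paths = []

    with wave.open(audio_path, "rb") as src:
        params = src.getparams()
        frames_per_segment = src.getframerate() * segment_seconds
        index = 0
        while True:
            frames = src.readframes(frames_per_segment)
            if not frames:
                break  # End of the source file
            segment_path = out / f"{stem}_part{index:03d}.wav"
            with wave.open(str(segment_path), "wb") as dst:
                dst.setparams(params)  # Frame count in the header is corrected on close
                dst.writeframes(frames)
            segment_paths.append(str(segment_path))
            index += 1

    return segment_paths
```

Each returned path can then be passed to the recipe as `audio_path`. Note that cutting at fixed intervals can split a word or sentence across two segments.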

Next Steps