# Podcast Transcription Cleaner

> Transcribe audio with speaker labels and intelligent cleanup for podcast content

Transcribe podcast audio with speaker diarization, filler word removal, and intelligent cleanup.

## Problem Statement

**Who:** Podcasters, content creators, transcription services\
**Why:** Raw transcripts are cluttered with filler words and overlapping speech, and they do not identify who is speaking.

## What You'll Build

A recipe that transcribes audio, identifies speakers, removes filler words, and produces clean, readable transcripts.

```mermaid theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
graph LR
    Input[🎙️ Audio File] --> Transcribe[Transcribe]
    Transcribe --> Diarize[Speaker Labels]
    Diarize --> Cleanup[Cleanup Text]
    Cleanup --> Output[📄 Clean Transcript]

    classDef input fill:#8B0000,stroke:#7C90A0,color:#fff
    classDef process fill:#189AB4,stroke:#7C90A0,color:#fff

    class Input,Output input
    class Transcribe,Diarize,Cleanup process
```

### Input/Output Contract

| Input            | Type    | Required | Description                                    |
| ---------------- | ------- | -------- | ---------------------------------------------- |
| `audio_path`     | string  | Yes      | Path to the audio file                         |
| `speaker_labels` | boolean | No       | Enable speaker diarization (default: true)     |
| `cleanup_level`  | string  | No       | `light`, `medium`, `heavy` (default: `medium`) |

| Output            | Type    | Description                |
| ----------------- | ------- | -------------------------- |
| `transcript_file` | string  | Path to cleaned transcript |
| `highlights`      | array   | Key moments and quotes     |
| `ok`              | boolean | Success indicator          |
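
The input side of this contract can be checked before any agent runs. The sketch below is one possible way to apply the documented defaults and reject values outside the table; `normalize_input` and `VALID_CLEANUP_LEVELS` are hypothetical names used for illustration and are not part of PraisonAI.

```python
from typing import Any

# Allowed values taken from the contract table above
VALID_CLEANUP_LEVELS = ("light", "medium", "heavy")


def normalize_input(input_data: dict[str, Any]) -> dict[str, Any]:
    """Apply the contract's defaults and reject values outside it (hypothetical helper)."""
    audio_path = input_data.get("audio_path")
    if not isinstance(audio_path, str) or not audio_path:
        raise ValueError("audio_path is required and must be a non-empty string")

    speaker_labels = input_data.get("speaker_labels", True)
    if not isinstance(speaker_labels, bool):
        raise ValueError("speaker_labels must be a boolean")

    cleanup_level = input_data.get("cleanup_level", "medium")
    if cleanup_level not in VALID_CLEANUP_LEVELS:
        raise ValueError(f"cleanup_level must be one of {VALID_CLEANUP_LEVELS}")

    return {
        "audio_path": audio_path,
        "speaker_labels": speaker_labels,
        "cleanup_level": cleanup_level,
    }
```

Calling `normalize_input({"audio_path": "episode.mp3"})` fills in `speaker_labels=True` and `cleanup_level="medium"`, matching the defaults in the table.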

## Prerequisites

```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
export OPENAI_API_KEY=your_key_here
pip install praisonaiagents
```

## Step-by-Step Build

<Steps>
  <Step title="Create Recipe Directory">
    ```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
    mkdir -p ~/.praisonai/templates/podcast-transcription-cleaner
    cd ~/.praisonai/templates/podcast-transcription-cleaner
    ```
  </Step>

  <Step title="Create TEMPLATE.yaml">
    ```yaml theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
    name: podcast-transcription-cleaner
    version: "1.0.0"
    description: "Transcribe and clean podcast audio with speaker labels"
    author: "PraisonAI"
    license: "MIT"

    tags:
      - audio
      - podcast
      - transcription
      - cleanup

    requires:
      env:
        - OPENAI_API_KEY
      packages:
        - praisonaiagents

    inputs:
      audio_path:
        type: string
        description: "Path to the podcast audio file"
        required: true
      speaker_labels:
        type: boolean
        description: "Enable speaker diarization"
        required: false
        default: true
      cleanup_level:
        type: string
        description: "Level of text cleanup"
        required: false
        default: "medium"
        enum:
          - light
          - medium
          - heavy

    outputs:
      transcript_file:
        type: string
        description: "Path to the cleaned transcript"
      highlights:
        type: array
        description: "Key moments and notable quotes"
      ok:
        type: boolean
        description: "Success indicator"

    cli:
      command: "praison recipes run podcast-transcription-cleaner"
      examples:
        - 'praison recipes run podcast-transcription-cleaner --input ''{"audio_path": "episode.mp3"}'''
        - 'praison recipes run podcast-transcription-cleaner --input ''{"audio_path": "podcast.wav", "cleanup_level": "heavy"}'''

    safety:
      dry_run_default: false
      requires_consent: false
      overwrites_files: true
      network_access: true
      pii_handling: true
    ```
  </Step>

  <Step title="Create recipe.py">
    ```python theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
    # recipe.py
    import os
    from pathlib import Path
    from praisonaiagents import Agent, Task, AgentTeam

    def run(input_data: dict, config: dict = None) -> dict:
        """
        Transcribe and clean podcast audio.
        
        Args:
            input_data: Contains audio_path, speaker_labels, cleanup_level
            config: Optional configuration overrides
            
        Returns:
            Dict with transcript_file, highlights, and ok status
        """
        audio_path = input_data.get("audio_path")
        if not audio_path:
            return {
                "ok": False,
                "error": {"code": "MISSING_INPUT", "message": "audio_path is required"},
                "transcript_file": None,
                "highlights": [],
            }
        
        if not os.path.exists(audio_path):
            return {
                "ok": False,
                "error": {"code": "FILE_NOT_FOUND", "message": f"Audio file not found: {audio_path}"},
                "transcript_file": None,
                "highlights": [],
            }
        
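        speaker_labels = input_data.get("speaker_labels", True)
        cleanup_level = input_data.get("cleanup_level", "medium")

        # Reject unknown levels up front; the lookup below would otherwise raise a bare KeyError
        if cleanup_level not in ("light", "medium", "heavy"):
            return {
                "ok": False,
                "error": {"code": "INVALID_INPUT", "message": f"Unsupported cleanup_level: {cleanup_level}"},
                "transcript_file": None,
                "highlights": [],
            }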
        
        try:
            # Define cleanup instructions based on level
            cleanup_instructions = {
                "light": "Remove only obvious filler words (um, uh). Keep natural speech patterns.",
                "medium": "Remove filler words, fix grammar, improve readability while preserving voice.",
                "heavy": "Full editorial cleanup. Remove all filler, fix grammar, restructure for clarity."
            }
            
            # Create transcription agent
            transcriber = Agent(
                name="Podcast Transcriber",
                role="Audio Transcription Specialist",
                goal="Accurately transcribe podcast audio with timestamps",
                instructions="""
                You are an expert podcast transcriptionist.
                - Transcribe speech accurately
                - Note speaker changes
                - Include timestamps at natural breaks
                - Capture tone and emphasis where notable
                """,
            )
            
            # Create speaker identification agent
            diarizer = Agent(
                name="Speaker Identifier",
                role="Speaker Diarization Expert",
                goal="Identify and label different speakers",
                instructions="""
                You are a speaker identification expert.
                - Identify distinct speakers by voice characteristics
                - Assign consistent labels (Speaker 1, Speaker 2, or names if mentioned)
                - Note speaker transitions
                - Handle overlapping speech
                """,
            )
            
            # Create cleanup agent
            cleaner = Agent(
                name="Transcript Editor",
                role="Editorial Specialist",
                goal="Clean and polish transcripts",
                instructions=f"""
                You are a transcript editor.
                Cleanup level: {cleanup_level}
                {cleanup_instructions[cleanup_level]}
                
                - Preserve the speaker's authentic voice
                - Maintain meaning and context
                - Format for readability
                """,
            )
            
            # Create highlights extractor
            highlighter = Agent(
                name="Content Highlighter",
                role="Content Analyst",
                goal="Extract key moments and quotes",
                instructions="""
                You are a content analyst.
                - Identify quotable moments
                - Find key insights and takeaways
                - Note interesting stories or anecdotes
                - Highlight actionable advice
                """,
            )
            
            # Define tasks
            transcribe_task = Task(
                name="transcribe_audio",
                description=f"Transcribe the podcast audio from: {audio_path}",
                expected_output="Raw timestamped transcription",
                agent=transcriber,
            )
            
            tasks = [transcribe_task]
            agents = [transcriber]
            
            if speaker_labels:
                diarize_task = Task(
                    name="identify_speakers",
                    description="Identify and label speakers in the transcription",
                    expected_output="Transcription with speaker labels",
                    agent=diarizer,
                    context=[transcribe_task],
                )
                tasks.append(diarize_task)
                agents.append(diarizer)
                cleanup_context = [diarize_task]
            else:
                cleanup_context = [transcribe_task]
            
            cleanup_task = Task(
                name="cleanup_transcript",
                description=f"Clean the transcript at {cleanup_level} level",
                expected_output="Cleaned, readable transcript",
                agent=cleaner,
                context=cleanup_context,
            )
            tasks.append(cleanup_task)
            agents.append(cleaner)
            
            highlight_task = Task(
                name="extract_highlights",
                description="Extract key moments and notable quotes",
                expected_output="List of highlights with timestamps",
                agent=highlighter,
                context=[cleanup_task],
            )
            tasks.append(highlight_task)
            agents.append(highlighter)
            
            # Execute
            workflow = AgentTeam(agents=agents, tasks=tasks)
            result = workflow.start()
            
            # Save transcript
            audio_name = Path(audio_path).stem
            transcript_file = f"{audio_name}_transcript.txt"
            
            with open(transcript_file, "w", encoding="utf-8") as f:
                f.write(result.get("cleanup_transcript", ""))
            
            # Parse highlights
            highlights_text = result.get("extract_highlights", "")
            highlights = [h.strip() for h in highlights_text.split("\n") if h.strip()]
            
            return {
                "ok": True,
                "transcript_file": transcript_file,
                "highlights": highlights[:10],  # Top 10 highlights
                "artifacts": [
                    {"path": transcript_file, "type": "text", "size_bytes": os.path.getsize(transcript_file)}
                ],
                "warnings": [],
            }
            
        except Exception as e:
            return {
                "ok": False,
                "error": {"code": "PROCESSING_ERROR", "message": str(e)},
                "transcript_file": None,
                "highlights": [],
            }
    ```
  </Step>

  <Step title="Create test_recipe.py">
    ```python theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
    # test_recipe.py
    import pytest
    from recipe import run

    def test_missing_audio_path():
        """Test error handling for missing audio_path."""
        result = run({})
        assert result["ok"] is False
        assert result["error"]["code"] == "MISSING_INPUT"

    def test_file_not_found():
        """Test error handling for non-existent file."""
        result = run({"audio_path": "/nonexistent/audio.mp3"})
        assert result["ok"] is False
        assert result["error"]["code"] == "FILE_NOT_FOUND"

    def test_cleanup_levels():
        """Test that all cleanup levels are valid."""
        valid_levels = ["light", "medium", "heavy"]
        for level in valid_levels:
            # Validation test only
            assert level in valid_levels

    def test_default_values():
        """Test default parameter values."""
        # Would need mock for full test
        defaults = {
            "speaker_labels": True,
            "cleanup_level": "medium"
        }
        assert defaults["speaker_labels"] is True
        assert defaults["cleanup_level"] == "medium"

    @pytest.mark.integration
    def test_end_to_end():
        """Full integration test."""
        import os
        test_audio = os.environ.get("TEST_AUDIO_PATH")
        if not test_audio:
            pytest.skip("No test audio available")
        
        result = run({
            "audio_path": test_audio,
            "speaker_labels": True,
            "cleanup_level": "medium"
        })
        
        assert result["ok"] is True
        assert result["transcript_file"] is not None
    ```
  </Step>
</Steps>
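
The recipe above leaves filler removal entirely to the `Transcript Editor` agent. To make the `light` level concrete, here is a minimal, deterministic sketch of what removing "um" and "uh" could look like as a plain regex pass. It is an illustration only, not what the recipe does internally: `FILLER_PATTERN` and `strip_fillers` are hypothetical names, and the pattern is deliberately naive (it does not handle cases such as "uh-huh").

```python
import re

# Hypothetical filler list for illustration: "um"/"uh" with repeated letters,
# plus an optional trailing comma and any spaces or tabs that follow.
FILLER_PATTERN = re.compile(r"\b(?:um+|uh+)\b,?[ \t]*", re.IGNORECASE)


def strip_fillers(text: str) -> str:
    """Remove simple filler words while keeping line breaks (and speaker lines) intact."""
    cleaned = FILLER_PATTERN.sub("", text)
    # Collapse runs of spaces left behind, without touching newlines
    cleaned = re.sub(r"[ \t]{2,}", " ", cleaned)
    return cleaned.strip()
```

A pass like this could run before the LLM cleanup to reduce token usage, but it cannot fix grammar or restructure sentences, which is why the `medium` and `heavy` levels rely on the agent.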

## Run Locally

```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
# Basic transcription
praison recipes run podcast-transcription-cleaner \
  --input '{"audio_path": "episode.mp3"}'

# Heavy cleanup without speaker labels
praison recipes run podcast-transcription-cleaner \
  --input '{"audio_path": "interview.wav", "speaker_labels": false, "cleanup_level": "heavy"}'
```

## Deploy & Integrate: 6 Integration Models

<Tabs>
  <Tab title="Model 1: Embedded SDK">
    **When to use:** Python applications, podcast platforms

    ```python theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
    from praisonai import recipe

    result = recipe.run(
        "podcast-transcription-cleaner",
        input={
            "audio_path": "episode.mp3",
            "speaker_labels": True,
            "cleanup_level": "medium"
        }
    )

    if result.ok:
        print(f"Transcript: {result.output['transcript_file']}")
        print(f"Highlights: {result.output['highlights']}")
    ```

    **Deployment note:** Runs in-process with lowest latency.

    <Warning>**Safety:** May process PII in conversations. Handle transcripts securely.</Warning>
  </Tab>

  <Tab title="Model 2: CLI Invocation">
    **When to use:** Batch processing, automation scripts

    ```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
    # Process multiple episodes
    for file in episodes/*.mp3; do
      praison recipes run podcast-transcription-cleaner \
        --input "{\"audio_path\": \"$file\"}" \
        --json >> results.jsonl
    done
    ```

    **Deployment note:** Great for CI/CD and batch workflows.

    <Warning>**Safety:** Validate file paths in scripts.</Warning>
  </Tab>

  <Tab title="Model 3: Plugin Mode">
    **When to use:** Podcast editing software, DAWs

    ```python theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
    class PodcastTranscriptPlugin:
        def transcribe(self, audio_path, cleanup="medium"):
            from praisonai import recipe
            return recipe.run(
                "podcast-transcription-cleaner",
                input={"audio_path": audio_path, "cleanup_level": cleanup}
            )
    ```

    **Deployment note:** Integrate with audio editing workflows.
  </Tab>

  <Tab title="Model 4: Local HTTP Sidecar">
    **When to use:** Web-based podcast platforms

    ```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
    praison recipes serve --port 8765
    ```

    ```javascript theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
    // Frontend integration
    const response = await fetch('http://localhost:8765/recipes/podcast-transcription-cleaner/run', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ audio_path: '/uploads/episode.mp3' })
    });
    ```

    **Deployment note:** Run alongside your web application.
  </Tab>

  <Tab title="Model 5: Remote Managed Runner">
    **When to use:** SaaS podcast platforms, multi-tenant

    ```python theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
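    import os
    import requests

    api_key = os.environ["PODCAST_SERVICE_API_KEY"]  # placeholder variable name
    response = requests.post(
        "https://api.podcast-service.com/transcribe",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"audio_url": "https://cdn.example.com/episode.mp3"},
        timeout=30,
    )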
    ```

    **Deployment note:** Implement per-tenant rate limiting.
  </Tab>

  <Tab title="Model 6: Event-Driven">
    **When to use:** Automatic transcription on upload

    ```python theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
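    # Trigger on S3 upload
    import queue
    from urllib.parse import unquote_plus

    # Shared queue so queued jobs outlive a single handler call
    job_queue = queue.Queue()  # Replace with SQS/RabbitMQ in production

    def handle_upload(event):
        s3_info = event['Records'][0]['s3']
        bucket = s3_info['bucket']['name']
        # Object keys arrive URL-encoded in S3 event notifications
        audio_key = unquote_plus(s3_info['object']['key'])

        # A worker must download the object first: the recipe expects a local path
        job_queue.put({
            "recipe": "podcast-transcription-cleaner",
            "input": {"audio_path": f"s3://{bucket}/{audio_key}"}
        })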
    ```

    **Deployment note:** Process uploads asynchronously.
  </Tab>
</Tabs>
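
For the batch scenario in Model 2, interpolating file names into a JSON string in the shell breaks as soon as a name contains a quote or backslash. The sketch below shows one way to build the same `praison recipes run ... --input ... --json` command from Python, letting `json.dumps` handle escaping and checking the file extension first. `build_command`, `run_batch`, and `AUDIO_SUFFIXES` are hypothetical names, and the extension list is an assumption based on the examples on this page.

```python
import json
import subprocess
from pathlib import Path

# Assumption: only the extensions used in this page's examples are accepted
AUDIO_SUFFIXES = {".mp3", ".wav"}


def build_command(audio_file: Path, cleanup_level: str = "medium") -> list[str]:
    """Return the CLI argument list for one episode, with the input safely JSON-encoded."""
    if audio_file.suffix.lower() not in AUDIO_SUFFIXES:
        raise ValueError(f"Unsupported audio file: {audio_file}")
    payload = json.dumps({"audio_path": str(audio_file), "cleanup_level": cleanup_level})
    return [
        "praison", "recipes", "run", "podcast-transcription-cleaner",
        "--input", payload,
        "--json",
    ]


def run_batch(directory: str, results_file: str = "results.jsonl") -> None:
    """Run the recipe for every MP3 in a directory and append each result line to a file."""
    with open(results_file, "a", encoding="utf-8") as out:
        for audio_file in sorted(Path(directory).glob("*.mp3")):
            # An argument list (no shell=True) avoids shell quoting issues entirely
            completed = subprocess.run(
                build_command(audio_file), capture_output=True, text=True, check=False
            )
            if completed.returncode == 0:
                out.write(completed.stdout.strip() + "\n")
```

Calling `run_batch("episodes")` mirrors the shell loop in Model 2 but skips failed runs instead of appending partial output.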

## Troubleshooting

<AccordionGroup>
  <Accordion title="Speaker labels are inconsistent">
    **Solution:** Diarization works best with clear audio and distinct voices. Try:

    * Using higher quality audio
    * Reducing background noise
    * Setting `cleanup_level: "light"` to preserve more context
  </Accordion>

  <Accordion title="Transcript missing sections">
    **Solution:** Long silences or music may cause gaps. Try:

    * Verifying the audio file's integrity
    * Processing the audio in smaller segments
  </Accordion>
</AccordionGroup>
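
Processing in smaller segments can be done before the recipe runs. The following is a minimal sketch that splits an uncompressed WAV file into fixed-length parts using only Python's standard `wave` module; `split_wav` and the `_partNNN` naming scheme are hypothetical, and compressed formats such as MP3 would need a different tool.

```python
import wave
from pathlib import Path


def split_wav(source: str, segment_seconds: int = 600) -> list[str]:
    """Write consecutive segments of a WAV file next to the source and return their paths."""
    src_path = Path(source)
    segment_paths: list[str] = []

    with wave.open(str(src_path), "rb") as src:
        params = src.getparams()
        frames_per_segment = src.getframerate() * segment_seconds
        index = 0
        while True:
            frames = src.readframes(frames_per_segment)
            if not frames:
                break  # end of file reached
            out_path = src_path.with_name(f"{src_path.stem}_part{index:03d}.wav")
            with wave.open(str(out_path), "wb") as dst:
                # Copy channel count, sample width, and rate; the frame count
                # in the header is corrected when the file is closed
                dst.setparams(params)
                dst.writeframes(frames)
            segment_paths.append(str(out_path))
            index += 1

    return segment_paths
```

Each returned path can then be passed to the recipe as its own `audio_path`. Note that a fixed-length cut may split a sentence, so speaker labels are not guaranteed to stay consistent across parts.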

## Next Steps

* **[Video Caption Generator](/docs/examples/recipe-examples/video-caption-generator)** - Caption video content
* **[Meeting Minutes Action Items](/docs/examples/recipe-examples/meeting-minutes-action-items)** - Extract action items from recordings
