# AI News Deduper

> Deduplicate and cluster news articles by topic

Deduplicate news articles using semantic similarity and cluster them by topic.

```mermaid theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
flowchart LR
    A[Raw Articles] --> B[AI News Deduper]
    B --> C[Semantic Analysis]
    C --> D[Deduplicated & Clustered]
    
    style A fill:#e1f5fe
    style D fill:#c8e6c9
```

## CLI Quickstart

```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
# Run deduplication
praisonai recipe run ai-news-deduper \
  --input '{"articles": [...], "similarity_threshold": 0.85}' \
  --json
```
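The `[...]` above is a placeholder for your own article objects. Hand-quoting JSON inside a shell string is error-prone, so a minimal sketch is to build the payload in Python with `json.dumps` and let `shlex.join` produce a safely quoted command. It uses only the flags shown above. The sample articles and their `title`/`url` fields are hypothetical and simply mirror the Output Schema.

```python
import json
import shlex

# Hypothetical sample articles; in practice these come from your crawler or feed.
articles = [
    {"title": "OpenAI announces GPT-5", "url": "https://example.com/a"},
    {"title": "GPT-5 announced by OpenAI", "url": "https://example.com/b"},
]

payload = {"articles": articles, "similarity_threshold": 0.85}

# An argv list needs no shell quoting when passed to subprocess.run(argv).
argv = [
    "praisonai", "recipe", "run", "ai-news-deduper",
    "--input", json.dumps(payload),
    "--json",
]

# shlex.join quotes each argument so the printed command is safe to paste into a shell.
print(shlex.join(argv))
```

To execute it from Python instead of printing it, pass the same list to `subprocess.run(argv, capture_output=True, text=True)`.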

## Use in Your App (SDK)

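```python theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
import sys

from praisonai.recipe import run

# Sample input; fields mirror the title/url entries in the Output Schema
articles = [
    {"title": "OpenAI unveils GPT-5", "url": "https://example.com/gpt5-launch"},
    {"title": "GPT-5 unveiled by OpenAI", "url": "https://example.com/gpt5-news"},
]

# Run the full recipe
result = run(
    "ai-news-deduper",
    input={
        "articles": articles,
        "similarity_threshold": 0.85,
        "use_semantic": True,
    },
)

# Or import the template's tools directly (adjust the path to where the template lives)
sys.path.insert(0, "agent_recipes/templates/ai-news-deduper")
from tools import deduplicate_articles, cluster_by_topic

# Deduplicate
deduped = deduplicate_articles(articles, similarity_threshold=0.85)

# Cluster the remaining articles by topic
clusters = cluster_by_topic(deduped["deduplicated"], num_clusters=5)
```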
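The implementation of `cluster_by_topic` is not shown on this page. One common way to split articles into a fixed `num_clusters` groups is k-means over their embedding vectors. The sketch below is a plain-Python illustration of that technique, not the recipe's code. The helper names are hypothetical, and the 2-D vectors stand in for real embeddings. Unlike the recipe's output, it does not name a `topic` for each cluster.

```python
import math

def kmeans_labels(vectors, k, iterations=20):
    """Plain k-means. Centroids start at the first k vectors, so results are deterministic."""
    if not vectors:
        return []
    k = min(k, len(vectors))  # cannot have more clusters than points
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid (Euclidean distance).
        labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels

def group_by_cluster(articles, labels):
    """Group articles like the 'clusters' entries in the Output Schema, minus 'topic'."""
    groups = {}
    for article, label in zip(articles, labels):
        groups.setdefault(label, []).append(article)
    return [{"articles": groups[label]} for label in sorted(groups)]

# Hypothetical titles and 2-D vectors standing in for real embeddings.
titles = ["GPT-5 launch", "Chip export rules", "GPT-5 benchmarks", "New GPU export limits"]
vectors = [[0.0, 0.1], [0.9, 1.0], [0.1, 0.0], [1.0, 0.9]]

labels = kmeans_labels(vectors, k=2)
clusters = group_by_cluster(titles, labels)
print(labels)  # [0, 1, 0, 1]
```

Naming each cluster (the `topic` field) would need an extra step, for example summarizing the titles in each group.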

## Input Schema

```json theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
{
  "type": "object",
  "properties": {
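    "articles": {
      "type": "array",
      "items": {"type": "object"},
      "description": "List of article objects to deduplicate"
    },
    "similarity_threshold": {
      "type": "number",
      "default": 0.85,
      "description": "Similarity threshold for treating two articles as duplicates"
    },
    "use_semantic": {
      "type": "boolean",
      "default": true,
      "description": "Use semantic (embedding) similarity; requires OPENAI_API_KEY"
    }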
  }
}
```
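If you assemble the input programmatically, a small helper can fill in the schema defaults and catch type mistakes before the recipe runs. This is a hypothetical helper (`build_input` is not part of the recipe). It checks only the three properties in the schema above and passes any extra keys, such as `num_clusters`, through unchanged.

```python
# Defaults copied from the Input Schema above.
DEFAULTS = {"similarity_threshold": 0.85, "use_semantic": True}

def build_input(articles, **overrides):
    """Hypothetical helper: apply schema defaults, then check the schema's types."""
    if not isinstance(articles, list):
        raise TypeError("articles must be an array (Python list)")
    payload = {"articles": articles, **DEFAULTS, **overrides}
    threshold = payload["similarity_threshold"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(threshold, bool) or not isinstance(threshold, (int, float)):
        raise TypeError("similarity_threshold must be a number")
    if not isinstance(payload["use_semantic"], bool):
        raise TypeError("use_semantic must be a boolean")
    return payload

print(build_input([{"title": "GPT-5 launch", "url": "https://example.com/a"}], use_semantic=False))
```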

## Output Schema

```json theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
{
  "deduplicated": [{"title": "...", "url": "..."}],
  "removed_count": 5,
  "clusters": [
    {"topic": "GPT-5", "articles": [...]}
  ]
}
```
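As an illustration of consuming this shape, the sketch below turns a result into a short text report. It assumes you already hold a Python dict shaped like the schema above. How the SDK's `run()` wraps its return value is not covered here, and the sample data is hypothetical. `clusters` is treated as optional in case a result contains only deduplication output.

```python
def summarize(result):
    """Render a short text report from a dict shaped like the Output Schema."""
    lines = [
        f"Kept {len(result['deduplicated'])} article(s), "
        f"removed {result['removed_count']} duplicate(s)"
    ]
    # Assumption: 'clusters' may be missing, so default to an empty list.
    for cluster in result.get("clusters", []):
        lines.append(f"- {cluster['topic']}: {len(cluster['articles'])} article(s)")
    return "\n".join(lines)

# Hypothetical result for illustration only.
sample = {
    "deduplicated": [
        {"title": "GPT-5 launch", "url": "https://example.com/a"},
        {"title": "Chip export rules", "url": "https://example.com/b"},
    ],
    "removed_count": 3,
    "clusters": [
        {"topic": "GPT-5", "articles": [{"title": "GPT-5 launch", "url": "https://example.com/a"}]},
    ],
}
print(summarize(sample))
```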

## Configuration

| Option                | Type  | Default | Description                                                   |
| --------------------- | ----- | ------- | ------------------------------------------------------------- |
| similarity\_threshold | float | 0.85    | Minimum similarity for two articles to count as duplicates    |
| use\_semantic         | bool  | true    | Use embedding-based semantic similarity (requires OpenAI)     |
| num\_clusters         | int   | 5       | Number of topic clusters to produce                           |
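To show how `similarity_threshold` typically behaves, here is a minimal sketch of threshold-based deduplication. It assumes each article has an embedding vector, compares vectors with cosine similarity, and uses a first-seen-wins greedy pass. The recipe's actual comparison may differ. The function names, titles, and 2-D vectors are hypothetical, and plain titles stand in for full article objects to keep it short.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def greedy_dedup(articles, vectors, similarity_threshold=0.85):
    """First-seen wins: drop an article whose vector is at least
    `similarity_threshold` similar to any article already kept."""
    kept, kept_vectors = [], []
    for article, vector in zip(articles, vectors):
        if any(cosine_similarity(vector, seen) >= similarity_threshold for seen in kept_vectors):
            continue  # near-duplicate of an earlier article
        kept.append(article)
        kept_vectors.append(vector)
    return {"deduplicated": kept, "removed_count": len(articles) - len(kept)}

# Hypothetical titles and 2-D "embeddings" for illustration.
titles = ["OpenAI announces GPT-5", "GPT-5 announced by OpenAI", "New GPU export limits"]
vectors = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]

print(greedy_dedup(titles, vectors))         # second title is dropped
print(greedy_dedup(titles, vectors, 0.999))  # stricter threshold keeps all three
```

In this scheme, raising the threshold removes fewer articles (only near-identical ones), while lowering it merges more loosely related stories.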

## Dependencies

```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
pip install openai numpy
```

## Environment Variables

| Variable         | Required | Description             |
| ---------------- | -------- | ----------------------- |
| OPENAI\_API\_KEY | Yes      | For semantic embeddings |
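The Configuration table ties the OpenAI requirement to `use_semantic`. If you want a script to degrade gracefully when the key is missing, one option is to derive the flag from the environment. This is a minimal sketch that assumes the non-semantic mode runs without `OPENAI_API_KEY`, which this page does not confirm. `semantic_enabled` is a hypothetical helper.

```python
import os

def semantic_enabled(env=None):
    """True when a non-empty OpenAI key is present; `env` defaults to the process environment."""
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY"))

recipe_input = {
    "articles": [],  # your articles here
    "similarity_threshold": 0.85,
    # Assumption: non-semantic mode does not need OPENAI_API_KEY.
    "use_semantic": semantic_enabled(),
}
print(recipe_input["use_semantic"])
```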

## Related Tools

* [AI News Crawler](/docs/examples/agent-recipes/creator-suite/ai-news-crawler)
* [AI Signal Ranker](/docs/examples/agent-recipes/creator-suite/ai-signal-ranker)
