AI News Deduper

Deduplicate news articles using semantic similarity and cluster them by topic.

CLI Quickstart

# Run deduplication
praisonai recipe run ai-news-deduper \
  --input '{"articles": [...], "similarity_threshold": 0.85}' \
  --json
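
The `--input` value is a JSON string, so it can be easier to build it programmatically than to type it by hand. The sketch below assembles the same command from Python. The sample articles and their `title`/`url` fields are assumptions borrowed from the output schema, not a documented input format.

```python
import json
import shlex

# Hypothetical sample articles; the "title"/"url" fields mirror the output schema.
articles = [
    {"title": "OpenAI announces new model", "url": "https://example.com/a"},
    {"title": "OpenAI unveils a new model", "url": "https://example.com/b"},
]

# Serialize the recipe input exactly as the quickstart passes it.
payload = json.dumps({"articles": articles, "similarity_threshold": 0.85})

# Same command and flags as the quickstart, with the payload filled in.
command = [
    "praisonai", "recipe", "run", "ai-news-deduper",
    "--input", payload,
    "--json",
]

# shlex.join quotes the JSON so the printed line can be pasted into a POSIX shell.
print(shlex.join(command))
```

The resulting `command` list can also be handed to `subprocess.run` directly, which avoids shell quoting altogether.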

Use in Your App (SDK)

from praisonai.recipes import run_recipe

# Deduplicate articles (articles_list is your own list of article objects)
result = run_recipe(
    "ai-news-deduper",
    input={
        "articles": articles_list,
        "similarity_threshold": 0.85,
        "use_semantic": True
    }
)

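# Direct tool usage: import the recipe's tools module from its template directory
# (the path is relative to the current working directory)
import sys
sys.path.insert(0, 'agent_recipes/templates/ai-news-deduper')
from tools import deduplicate_articles, cluster_by_topic

# Deduplicate
deduped = deduplicate_articles(articles_list, similarity_threshold=0.85)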

# Cluster by topic
clusters = cluster_by_topic(deduped["deduplicated"], num_clusters=5)

Input Schema

{
  "type": "object",
  "properties": {
    "articles": {
      "type": "array",
      "description": "List of article objects"
    },
    "similarity_threshold": {
      "type": "number",
      "default": 0.85
    },
    "use_semantic": {
      "type": "boolean",
      "default": true
    }
  }
}
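
As a rough illustration of the schema above, the following sketch builds an input payload with the documented defaults and checks the three field types before the payload is handed to the recipe. `build_input` is a hypothetical helper written for this example; it is not part of `praisonai`.

```python
def build_input(articles, similarity_threshold=0.85, use_semantic=True):
    """Hypothetical helper: type-check fields against the input schema."""
    if not isinstance(articles, list):
        raise TypeError("articles must be a list")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(similarity_threshold, bool) or not isinstance(
        similarity_threshold, (int, float)
    ):
        raise TypeError("similarity_threshold must be a number")
    if not isinstance(use_semantic, bool):
        raise TypeError("use_semantic must be a boolean")
    return {
        "articles": articles,
        "similarity_threshold": similarity_threshold,
        "use_semantic": use_semantic,
    }


# Omitted options fall back to the schema defaults (0.85 and true).
payload = build_input([{"title": "Example headline", "url": "https://example.com/a"}])
print(payload)
```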

Output Schema

{
  "deduplicated": [{"title": "...", "url": "..."}],
  "removed_count": 5,
  "clusters": [
    {"topic": "GPT-5", "articles": [...]}
  ]
}
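
A result in this shape can be summarized with a few lines of Python. The sketch below uses a hand-written sample dictionary rather than real recipe output, and it assumes that the articles inside each cluster carry the same `title` and `url` fields as the entries in `deduplicated`.

```python
# Sample result shaped like the output schema (values are illustrative only).
result = {
    "deduplicated": [{"title": "Example headline", "url": "https://example.com/a"}],
    "removed_count": 5,
    "clusters": [
        {
            "topic": "GPT-5",
            "articles": [{"title": "Example headline", "url": "https://example.com/a"}],
        }
    ],
}


def summarize(result):
    """Return one summary line for the totals and one line per topic cluster."""
    lines = [f"kept {len(result['deduplicated'])}, removed {result['removed_count']}"]
    for cluster in result.get("clusters", []):
        lines.append(f"{cluster['topic']}: {len(cluster['articles'])} article(s)")
    return lines


for line in summarize(result):
    print(line)
```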

Configuration

| Option | Type | Default | Description |
|---|---|---|---|
| similarity_threshold | float | 0.85 | Similarity threshold for deduplication |
| use_semantic | bool | true | Use semantic similarity (requires OpenAI) |
| num_clusters | int | 5 | Number of topic clusters |
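
To give a feel for what `similarity_threshold` controls, here is a minimal sketch of threshold-based deduplication over embedding vectors. It is not the recipe's actual implementation: the greedy keep-first strategy, the function names, and the toy two-dimensional vectors are all assumptions made for illustration, and it treats a pair as duplicates when their cosine similarity is at or above the threshold.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dedupe_by_threshold(items, vectors, similarity_threshold=0.85):
    """Greedy sketch: keep an item only if it is below the threshold
    against every item already kept."""
    kept, kept_vectors = [], []
    for item, vec in zip(items, vectors):
        if all(cosine_similarity(vec, kv) < similarity_threshold for kv in kept_vectors):
            kept.append(item)
            kept_vectors.append(vec)
    return kept


# Toy vectors standing in for real embeddings (illustrative values only).
titles = ["Story A", "Story A (reworded)", "Story B"]
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]

print(dedupe_by_threshold(titles, vectors, similarity_threshold=0.85))
```

In this sketch, raising the threshold makes deduplication stricter about what counts as a duplicate, so fewer articles are removed; lowering it merges more loosely related articles.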

Dependencies

pip install openai numpy

Environment Variables

| Variable | Required | Description |
|---|---|---|
| OPENAI_API_KEY | Yes | For semantic embeddings |
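
Because the key is required, it can help to check for it before starting a run. The sketch below is one way to do that; `missing_env` is a hypothetical helper for this example and is not part of the recipe.

```python
import os


def missing_env(required=("OPENAI_API_KEY",), environ=None):
    """Return the required variable names that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in required if not environ.get(name)]


missing = missing_env()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("Environment looks ready for semantic embeddings.")
```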