> ## Documentation Index
> Fetch the complete documentation index at: https://docs.praison.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# AI News Crawler

> Crawl AI news from HackerNews, Reddit, arXiv, and GitHub trending

# AI News Crawler

Crawl AI news from multiple sources including HackerNews, Reddit, arXiv, and GitHub trending repositories.

```mermaid theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
flowchart LR
    A[Sources] --> B[AI News Crawler]
    B --> C[Filter & Extract]
    C --> D[Aggregated Articles]
    
    style A fill:#e1f5fe
    style D fill:#c8e6c9
```

## CLI Quickstart

```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
# Install
pip install praisonai praisonai-tools

# Run the crawler
praisonai recipe run ai-news-crawler \
  --input '{"sources": ["hackernews", "reddit", "arxiv"], "max_articles": 20}' \
  --json

# With output directory
praisonai recipe run ai-news-crawler \
  --input-file config.json \
  --out-dir ./output
```

**Output:**

```json theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
{
  "ok": true,
  "run_id": "run_abc123",
  "recipe": "ai-news-crawler",
  "output": {
    "articles": [
      {"title": "...", "url": "...", "source": "hackernews", "score": 100}
    ],
    "total": 20
  }
}
```

## Use in Your App (SDK)

```python theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
from praisonai.recipe import run, run_stream

# Basic usage
result = run(
    "ai-news-crawler",
    input={
        "sources": ["hackernews", "reddit", "arxiv"],
        "max_articles": 20,
        "time_window_hours": 24
    }
)

print(f"Crawled {len(result.output['articles'])} articles")

# Direct tool usage
import sys
sys.path.insert(0, 'agent_recipes/templates/ai-news-crawler')
from tools import crawl_hackernews, crawl_reddit_ai, crawl_arxiv

# Crawl HackerNews
hn_articles = crawl_hackernews(max_articles=10, time_window_hours=24)

# Crawl Reddit
reddit_articles = crawl_reddit_ai(subreddits=["MachineLearning", "artificial"])

# Crawl arXiv
arxiv_articles = crawl_arxiv(categories=["cs.AI", "cs.LG"], max_results=10)
```

## Use as HTTP Server

### Start Server

```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
praisonai serve recipe --port 8080
```

### Invoke via curl

```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
curl -X POST http://localhost:8080/v1/recipes/run \
  -H "Content-Type: application/json" \
  -d '{
    "recipe": "ai-news-crawler",
    "input": {
      "sources": ["hackernews", "reddit"],
      "max_articles": 10
    }
  }'
```

## Input Schema

```json theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
{
  "type": "object",
  "properties": {
    "sources": {
      "type": "array",
      "items": {"type": "string"},
      "description": "News sources: hackernews, reddit, arxiv, github, web"
    },
    "max_articles": {
      "type": "integer",
      "default": 50
    },
    "time_window_hours": {
      "type": "integer",
      "default": 24
    },
    "keywords": {
      "type": "array",
      "description": "Filter keywords"
    }
  }
}
```

## Output Schema

```json theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
{
  "articles": [
    {
      "title": "string",
      "url": "string",
      "source": "string",
      "score": "number",
      "timestamp": "string",
      "content": "string"
    }
  ],
  "total": "number",
  "sources_crawled": ["string"]
}
```

## Configuration

| Option              | Type  | Default         | Description                 |
| ------------------- | ----- | --------------- | --------------------------- |
| sources             | array | \["hackernews"] | News sources to crawl       |
| max\_articles       | int   | 50              | Maximum articles per source |
| time\_window\_hours | int   | 24              | Time window for filtering   |
| keywords            | array | \["AI", "ML"]   | Filter keywords             |

## Dependencies

```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
pip install requests feedparser beautifulsoup4
```

## Environment Variables

| Variable         | Required | Description           |
| ---------------- | -------- | --------------------- |
| TAVILY\_API\_KEY | Optional | For web search source |

## Related Tools

* [AI News Deduper](/docs/examples/agent-recipes/creator-suite/ai-news-deduper)
* [AI Signal Ranker](/docs/examples/agent-recipes/creator-suite/ai-signal-ranker)
* [AI Brief Generator](/docs/examples/agent-recipes/creator-suite/ai-brief-generator)
