Prerequisites
- Python 3.10 or higher
- PraisonAI Agents package installed
- crawl4ai package installed and set up
Installation
Setup
Built-in Crawl4AI Tool
PraisonAI provides built-in crawl4ai functions that you can use directly:
Available Functions
| Function | Description |
|---|---|
| crawl4ai | Async crawl a URL and get markdown |
| crawl4ai_many | Crawl multiple URLs concurrently |
| crawl4ai_extract | Extract data using CSS selectors |
| crawl4ai_llm_extract | Extract data using LLM |
| crawl4ai_sync | Synchronous version of crawl4ai |
| crawl4ai_extract_sync | Synchronous CSS extraction |
Basic Usage
Simple Crawl
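A minimal sketch of a single crawl, assuming the function is importable from praisonaiagents.tools (the import path is an assumption, only the function name crawl4ai comes from the table above). The shape of the returned value is not specified here, so the sketch simply hands it back to the caller.

```python
import asyncio


async def simple_crawl(url: str):
    """Crawl one URL and return whatever the built-in function returns."""
    # Assumption: this import path is illustrative and may differ in your
    # installed version. It is imported lazily so the module loads without it.
    from praisonaiagents.tools import crawl4ai

    # crawl4ai is async according to the function table, so it must be awaited.
    return await crawl4ai(url)
```

From a script, this would be driven with asyncio.run(simple_crawl("https://example.com")) and the result printed or inspected.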
Crawl with Options
Crawl Multiple URLs
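The built-in crawl4ai_many covers this case. To illustrate what concurrent crawling means in practice, here is a generic asyncio pattern, not the library's implementation. The helper crawl_all, its limit parameter, and the stand-in fake_crawl are all hypothetical names introduced for this sketch; any async crawl function can be passed in.

```python
import asyncio
from typing import Any, Awaitable, Callable, List


async def crawl_all(
    urls: List[str],
    crawl_fn: Callable[[str], Awaitable[Any]],
    limit: int = 5,
) -> List[Any]:
    """Run crawl_fn over all URLs concurrently, at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def crawl_one(url: str) -> Any:
        # The semaphore caps how many crawls are in flight at once.
        async with semaphore:
            return await crawl_fn(url)

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(crawl_one(u) for u in urls))


async def fake_crawl(url: str) -> str:
    """Hypothetical stand-in for a real crawl function (no network access)."""
    await asyncio.sleep(0)
    return f"# markdown for {url}"
```

Passing a real async crawl function in place of fake_crawl keeps the same structure while bounding how many pages are fetched in parallel.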
Extract with CSS Selectors
Extract with LLM
Using Crawl4AITools Class
For more control, use the Crawl4AITools class directly:
Synchronous Usage
For non-async code, use the sync versions:
Schema Reference
CSS Extraction Schema
Field Types
| Type | Description |
|---|---|
| text | Extract text content |
| attribute | Extract HTML attribute (specify attribute key) |
| html | Extract raw HTML |
| nested | Single nested object |
| list | List of simple items |
| nested_list | List of complex objects |
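The field types above can be combined into a schema like the following sketch. The key names (name, baseSelector, fields, selector, attribute) follow the convention used by upstream Crawl4AI's CSS extraction strategy and are assumed to carry over to the PraisonAI functions. The selectors, the "Products" example, and the validate_fields helper are hypothetical.

```python
# Field types listed in the table above.
ALLOWED_TYPES = {"text", "attribute", "html", "nested", "list", "nested_list"}

# Hypothetical schema for a product listing page.
schema = {
    "name": "Products",
    "baseSelector": "div.product",  # one extracted item per matching element
    "fields": [
        {"name": "title", "selector": "h2", "type": "text"},
        # "attribute" fields must say which HTML attribute to read.
        {"name": "link", "selector": "a", "type": "attribute", "attribute": "href"},
        {
            "name": "reviews",
            "selector": "div.review",
            "type": "nested_list",  # a list of complex objects
            "fields": [
                {"name": "author", "selector": ".author", "type": "text"},
                {"name": "body", "selector": ".body", "type": "html"},
            ],
        },
    ],
}


def validate_fields(fields, path="fields"):
    """Return a list of problems found in a schema's field definitions."""
    errors = []
    for i, field in enumerate(fields):
        where = f"{path}[{i}]"
        ftype = field.get("type")
        if ftype not in ALLOWED_TYPES:
            errors.append(f"{where}: unknown type {ftype!r}")
        if ftype == "attribute" and "attribute" not in field:
            errors.append(f"{where}: missing 'attribute' key")
        if ftype in {"nested", "list", "nested_list"}:
            # Recurse into sub-fields of nested structures.
            errors.extend(validate_fields(field.get("fields", []), f"{where}.fields"))
    return errors
```

Running validate_fields(schema["fields"]) before a crawl catches misspelled types or a missing attribute key without launching a browser.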
JavaScript Execution
Execute JavaScript before crawling:
Wait Conditions
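A wait condition usually accompanies a JavaScript step, so that the page is read only after the script's effect is visible. In upstream Crawl4AI, wait conditions are written as strings prefixed with css: (wait for a selector to appear) or js: (wait for an expression to become true), and scripts are passed as js_code. Whether the PraisonAI functions accept the same option names is an assumption; the helpers and selectors below are hypothetical.

```python
def css_wait(selector: str) -> str:
    """Build a wait condition that waits for a CSS selector to appear."""
    return f"css:{selector}"


def js_wait(expression: str) -> str:
    """Build a wait condition that waits for a JS expression to be truthy."""
    return f"js:{expression}"


# Assumed option names (js_code, wait_for) mirror upstream Crawl4AI;
# the selectors are made up for illustration.
crawl_options = {
    # Click a "load more" button if it exists before content is captured.
    "js_code": ["document.querySelector('button.load-more')?.click();"],
    # Then wait until at least one result item is present in the DOM.
    "wait_for": css_wait("div.results .item"),
}
```

Keeping the prefixes in small helpers avoids hand-typed condition strings drifting between css: and js: forms.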
Key Points
- Async by default: Use await for all crawl functions (or the _sync variants in non-async code)
- JavaScript rendering: Full browser support for dynamic content
- CSS extraction: Fast, no-LLM structured data extraction
- LLM extraction: AI-powered extraction for complex content
- Multi-URL: Efficient concurrent crawling

