Firecrawl Tools

Firecrawl provides web scraping, search, site mapping, crawling, and structured data extraction for AI applications.

Installation

npm install firecrawl-aisdk

Environment Variables

FIRECRAWL_API_KEY=fc-your-api-key
Get your API key from Firecrawl.
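
A missing or mistyped key usually only shows up when the first tool call fails, so it can help to check it at startup. The sketch below is one way to do that. `requireFirecrawlKey` is a hypothetical helper, not part of `praisonai` or `firecrawl-aisdk`, and the `fc-` prefix check is an assumption based only on the placeholder value shown above.

```typescript
// Hypothetical helper (not part of any Firecrawl or PraisonAI package).
type Env = Record<string, string | undefined>;

function requireFirecrawlKey(env: Env): string {
  const key = env.FIRECRAWL_API_KEY?.trim();
  if (!key) {
    throw new Error('FIRECRAWL_API_KEY is not set');
  }
  // Assumption: real keys share the "fc-" prefix of the placeholder above.
  if (!key.startsWith('fc-')) {
    throw new Error('FIRECRAWL_API_KEY does not start with "fc-"');
  }
  return key;
}
```

In Node.js this would be called once at startup as `requireFirecrawlKey(process.env)`.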

Available Tools

Tool              Description
scrapeTool        Scrape a single URL
searchTool        Search the web
mapTool           Discover URLs on a site
crawlTool         Crawl multiple pages
batchScrapeTool   Scrape multiple URLs
extractTool       Extract structured data

Quick Start

import { Agent } from 'praisonai';
import { firecrawlScrape, firecrawlCrawl } from 'praisonai/tools';

const agent = new Agent({
  name: 'WebScraper',
  instructions: 'You scrape and analyze web content.',
  tools: [firecrawlScrape(), firecrawlCrawl()],
});

const result = await agent.run('Scrape https://example.com and summarize the content');
console.log(result.text);

Scrape Tool

Scrape a single URL and get clean markdown content.

import { Agent } from 'praisonai';
import { firecrawlScrape } from 'praisonai/tools';

const scrapeTool = firecrawlScrape({
  // Output format
  formats: ['markdown', 'html'],
  
  // Wait for page to load
  waitFor: 1000,
  
  // Include/exclude tags
  includeTags: ['article', 'main'],
  excludeTags: ['nav', 'footer'],
  
  // Screenshot options
  screenshot: true,
});

const agent = new Agent({
  name: 'Scraper',
  tools: [scrapeTool],
});

Crawl Tool

Crawl multiple pages starting from a URL.

import { Agent } from 'praisonai';
import { firecrawlCrawl } from 'praisonai/tools';

const crawlTool = firecrawlCrawl({
  // Maximum pages to crawl
  limit: 10,
  
  // Crawl depth
  maxDepth: 2,
  
  // URL patterns to include/exclude
  includePaths: ['/docs/*', '/blog/*'],
  excludePaths: ['/admin/*'],
  
  // Allow external links
  allowExternalLinks: false,
});

const agent = new Agent({
  name: 'Crawler',
  tools: [crawlTool],
});
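
To preview which URLs a given `includePaths` / `excludePaths` combination is meant to keep, a small local filter can be useful. The sketch below treats `*` as a plain wildcard and lets exclusions win over inclusions. Both rules are assumptions made for illustration: this page does not specify how Firecrawl itself interprets the patterns, and `globToRegExp` and `pathAllowed` are hypothetical helpers.

```typescript
interface PathFilter {
  includePaths?: string[];
  excludePaths?: string[];
}

// Turn a pattern such as '/docs/*' into an anchored RegExp where '*' matches any characters.
function globToRegExp(pattern: string): RegExp {
  const escaped = pattern
    .split('*')
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, '\\$&'))
    .join('.*');
  return new RegExp(`^${escaped}$`);
}

// Assumed rule: an exclude match always rejects; otherwise the path must match
// an include pattern when any include patterns are given.
function pathAllowed(url: string, filter: PathFilter): boolean {
  const path = new URL(url).pathname;
  const matches = (patterns: string[]) =>
    patterns.some((p) => globToRegExp(p).test(path));
  if (filter.excludePaths && matches(filter.excludePaths)) return false;
  if (filter.includePaths && filter.includePaths.length > 0) {
    return matches(filter.includePaths);
  }
  return true;
}
```

With the filter from the example above, `pathAllowed('https://example.com/docs/intro', filter)` passes, while `/admin/users` and `/pricing` are rejected.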

Using with AI SDK Directly

import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';
import { scrapeTool, searchTool, mapTool, crawlTool } from 'firecrawl-aisdk';

// Scrape a page
const { text } = await generateText({
  model: openai('gpt-4o'),
  prompt: 'Scrape https://firecrawl.dev and summarize what it does',
  tools: { scrape: scrapeTool },
});

// Search the web
const { text: searchResult } = await generateText({
  model: openai('gpt-4o'),
  prompt: 'Search for Firecrawl and summarize what you find',
  tools: { search: searchTool },
});

// Map a site
const { text: mapResult } = await generateText({
  model: openai('gpt-4o'),
  prompt: 'Map https://docs.firecrawl.dev and list the main sections',
  tools: { map: mapTool },
});

Batch Scraping

Scrape several URLs in one batch job. Batch jobs are asynchronous, so pass pollTool alongside batchScrapeTool.

import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';
import { batchScrapeTool, pollTool } from 'firecrawl-aisdk';

const { text } = await generateText({
  model: openai('gpt-4o'),
  prompt: 'Scrape https://firecrawl.dev and https://docs.firecrawl.dev, then compare',
  tools: { 
    batchScrape: batchScrapeTool, 
    poll: pollTool 
  },
});

Extract Structured Data

Extract specific data from web pages.
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';
import { extractTool, pollTool } from 'firecrawl-aisdk';

const { text } = await generateText({
  model: openai('gpt-4o'),
  prompt: 'Extract the main features from https://firecrawl.dev',
  tools: { 
    extract: extractTool, 
    poll: pollTool 
  },
});

Response Format

Scrape Result

interface FirecrawlScrapeResult {
  url: string;
  markdown: string;
  html?: string;
  metadata?: {
    title?: string;
    description?: string;
    language?: string;
  };
  screenshot?: string;
}
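
Because most fields on the scrape result are optional, code that consumes it needs fallbacks. The sketch below repeats the interface so that it is self-contained and adds `summarizeScrape`, a hypothetical helper that falls back to the URL when no title is present.

```typescript
interface FirecrawlScrapeResult {
  url: string;
  markdown: string;
  html?: string;
  metadata?: {
    title?: string;
    description?: string;
    language?: string;
  };
  screenshot?: string;
}

// Hypothetical helper: build a one-line description of a scrape result.
function summarizeScrape(result: FirecrawlScrapeResult): string {
  // Fall back to the URL when the page has no (non-empty) title.
  const title = result.metadata?.title?.trim() || result.url;
  const words = result.markdown.split(/\s+/).filter(Boolean).length;
  // List which optional payloads came back with the result.
  const extras = [
    result.html ? 'html' : '',
    result.screenshot ? 'screenshot' : '',
  ].filter(Boolean);
  const suffix = extras.length > 0 ? ` (+${extras.join(', ')})` : '';
  return `${title}: ${words} words${suffix}`;
}
```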

Crawl Result

interface FirecrawlCrawlResult {
  status: string;
  total: number;
  completed: number;
  data: Array<{
    url: string;
    markdown: string;
    metadata?: object;
  }>;
}
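
The `total` and `completed` counters make it straightforward to report progress on a crawl. The sketch below repeats the interface and adds `crawlProgress`, a hypothetical helper. It deliberately does not inspect `status`, since the possible status values are not listed here.

```typescript
interface FirecrawlCrawlResult {
  status: string;
  total: number;
  completed: number;
  data: Array<{
    url: string;
    markdown: string;
    metadata?: object;
  }>;
}

// Hypothetical helper: derive a progress percentage and the list of crawled URLs.
function crawlProgress(result: FirecrawlCrawlResult): { percent: number; urls: string[] } {
  // Guard against division by zero before any pages are discovered.
  const percent =
    result.total > 0 ? Math.round((result.completed / result.total) * 100) : 0;
  return { percent, urls: result.data.map((page) => page.url) };
}
```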

Advanced Example

import { Agent } from 'praisonai';
import { firecrawlScrape, firecrawlCrawl } from 'praisonai/tools';

const agent = new Agent({
  name: 'ContentAnalyzer',
  instructions: `You are a content analyst. 
    1. Scrape the provided URL
    2. Extract key information
    3. Provide a structured summary`,
  tools: [
    firecrawlScrape({ formats: ['markdown'] }),
    firecrawlCrawl({ limit: 5, maxDepth: 1 }),
  ],
});

const result = await agent.run(
  'Analyze the documentation structure of https://docs.firecrawl.dev'
);
console.log(result.text);

Error Handling

import { firecrawlScrape } from 'praisonai/tools';

const tool = firecrawlScrape();

try {
  const result = await tool.execute({ url: 'https://example.com' });
  console.log(result);
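} catch (error) {
  // `error` is `unknown` under strict TypeScript, so narrow it before reading `.message`
  const message = error instanceof Error ? error.message : String(error);
  if (message.includes('FIRECRAWL_API_KEY')) {
    console.error('Missing API key');
  } else if (message.includes('rate limit')) {
    console.error('Rate limited - try again later');
  } else {
    console.error('Scrape failed:', message);
  }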
}
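
Instead of only logging a rate-limit error, the call can be retried after a delay. The following is a minimal sketch of exponential backoff. It assumes, as the example above does, that rate-limit failures can be recognized by the text `rate limit` in the error message. `isRateLimitError`, `backoffDelayMs`, and `withRateLimitRetry` are hypothetical helpers, and the delay values are arbitrary.

```typescript
// Assumption: rate-limit errors contain "rate limit" in their message.
function isRateLimitError(error: unknown): boolean {
  const message = error instanceof Error ? error.message : String(error);
  return message.toLowerCase().includes('rate limit');
}

// Exponential backoff: baseMs, 2*baseMs, 4*baseMs, ... capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Retry `task` on rate-limit errors only; any other error is rethrown immediately.
async function withRateLimitRetry<T>(
  task: () => Promise<T>,
  maxRetries = 3,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (error) {
      if (!isRateLimitError(error) || attempt >= maxRetries) throw error;
      await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
}
```

With the tool from the example above, usage would look like `await withRateLimitRetry(() => tool.execute({ url: 'https://example.com' }))`.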

Best Practices

  1. Use the appropriate tool - Scrape for single pages, crawl for multiple pages
  2. Set limits - Always set crawl limits to avoid excessive API usage
  3. Filter content - Use includeTags/excludeTags to get relevant content
  4. Handle async jobs - Use pollTool for crawl and batch operations
  5. Cache results - Store scraped content to avoid repeated requests
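
As an illustration of the caching advice, here is a minimal in-memory cache keyed by URL with a time-to-live. `ScrapeCache` is a hypothetical class, not something provided by `praisonai` or `firecrawl-aisdk`. A real deployment might prefer a persistent store, and the right TTL depends on how often the target pages change.

```typescript
// Hypothetical in-memory cache for scraped content, keyed by URL.
class ScrapeCache<T> {
  private entries = new Map<string, { value: T; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  // `now` is injectable so expiry can be tested without waiting.
  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(url: string): T | undefined {
    const entry = this.entries.get(url);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      // Expired: drop the entry so the caller scrapes again.
      this.entries.delete(url);
      return undefined;
    }
    return entry.value;
  }

  set(url: string, value: T): void {
    this.entries.set(url, { value, expiresAt: this.now() + this.ttlMs });
  }
}
```

A caller would check `cache.get(url)` first and only invoke the scrape tool, followed by `cache.set(url, result)`, on a miss.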

Related Tools

  • Tavily - Web search and extraction
  • Exa - Semantic web search
  • Parallel - Token-efficient web search