Firecrawl Tools

Firecrawl provides powerful web scraping, search, crawling, and data extraction capabilities for AI applications.

Installation

npm install firecrawl-aisdk

Environment Variables

FIRECRAWL_API_KEY=fc-your-api-key
Get your API key from Firecrawl.

Available Tools

Tool            | Description
----------------|----------------------------
scrapeTool      | Scrape a single URL
searchTool      | Search the web
mapTool         | Discover URLs on a site
crawlTool       | Crawl multiple pages
batchScrapeTool | Scrape multiple URLs
extractTool     | Extract structured data

Quick Start

import { Agent } from 'praisonai';
import { firecrawlScrape, firecrawlCrawl } from 'praisonai/tools';

const agent = new Agent({
  name: 'WebScraper',
  instructions: 'You scrape and analyze web content.',
  tools: [firecrawlScrape(), firecrawlCrawl()],
});

const result = await agent.run('Scrape https://example.com and summarize the content');
console.log(result.text);

Scrape Tool

Scrape a single URL and get clean markdown content.

import { firecrawlScrape } from 'praisonai/tools';

const scrapeTool = firecrawlScrape({
  // Output format
  formats: ['markdown', 'html'],
  
  // Wait for page to load
  waitFor: 1000,
  
  // Include/exclude tags
  includeTags: ['article', 'main'],
  excludeTags: ['nav', 'footer'],
  
  // Screenshot options
  screenshot: true,
});

const agent = new Agent({
  name: 'Scraper',
  tools: [scrapeTool],
});

Crawl Tool

Crawl multiple pages starting from a URL.

import { firecrawlCrawl } from 'praisonai/tools';

const crawlTool = firecrawlCrawl({
  // Maximum pages to crawl
  limit: 10,
  
  // Crawl depth
  maxDepth: 2,
  
  // URL patterns to include/exclude
  includePaths: ['/docs/*', '/blog/*'],
  excludePaths: ['/admin/*'],
  
  // Allow external links
  allowExternalLinks: false,
});

const agent = new Agent({
  name: 'Crawler',
  tools: [crawlTool],
});

Using with AI SDK Directly

import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';
import { scrapeTool, searchTool, mapTool, crawlTool } from 'firecrawl-aisdk';

// Scrape a page
const { text } = await generateText({
  model: openai('gpt-4o'),
  prompt: 'Scrape https://firecrawl.dev and summarize what it does',
  tools: { scrape: scrapeTool },
});

// Search the web
const { text: searchResult } = await generateText({
  model: openai('gpt-4o'),
  prompt: 'Search for Firecrawl and summarize what you find',
  tools: { search: searchTool },
});

// Map a site
const { text: mapResult } = await generateText({
  model: openai('gpt-4o'),
  prompt: 'Map https://docs.firecrawl.dev and list the main sections',
  tools: { map: mapTool },
});

Batch Scraping

Scrape multiple URLs efficiently.

import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';
import { batchScrapeTool, pollTool } from 'firecrawl-aisdk';

const { text } = await generateText({
  model: openai('gpt-4o'),
  prompt: 'Scrape https://firecrawl.dev and https://docs.firecrawl.dev, then compare',
  tools: { 
    batchScrape: batchScrapeTool, 
    poll: pollTool 
  },
});

Extract Structured Data

Extract specific data from web pages.

import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';
import { extractTool, pollTool } from 'firecrawl-aisdk';

const { text } = await generateText({
  model: openai('gpt-4o'),
  prompt: 'Extract the main features from https://firecrawl.dev',
  tools: { 
    extract: extractTool, 
    poll: pollTool 
  },
});

Response Format

Scrape Result

interface FirecrawlScrapeResult {
  url: string;
  markdown: string;
  html?: string;
  metadata?: {
    title?: string;
    description?: string;
    language?: string;
  };
  screenshot?: string;
}
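As a minimal sketch of consuming this shape, the helper below (a hypothetical `displayTitle`, not part of the library) prefers the page title from metadata and falls back to the URL when metadata is missing:

```typescript
// Mirrors the FirecrawlScrapeResult shape documented above.
interface FirecrawlScrapeResult {
  url: string;
  markdown: string;
  html?: string;
  metadata?: { title?: string; description?: string; language?: string };
  screenshot?: string;
}

// Hypothetical helper: prefer the metadata title, fall back to the URL.
function displayTitle(result: FirecrawlScrapeResult): string {
  return result.metadata?.title ?? result.url;
}

const sample: FirecrawlScrapeResult = {
  url: 'https://example.com',
  markdown: '# Example Domain',
  metadata: { title: 'Example Domain' },
};

console.log(displayTitle(sample)); // "Example Domain"
```

Since every field except `url` and `markdown` is optional, guard with optional chaining rather than assuming `metadata` is present.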

Crawl Result

interface FirecrawlCrawlResult {
  status: string;
  total: number;
  completed: number;
  data: Array<{
    url: string;
    markdown: string;
    metadata?: object;
  }>;
}
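Because `status`, `total`, and `completed` describe an in-progress job, a common pattern is to report progress while polling. The `crawlProgress` helper below is a sketch (not a library function) that guards against a zero total:

```typescript
// Mirrors the FirecrawlCrawlResult shape documented above.
interface FirecrawlCrawlResult {
  status: string;
  total: number;
  completed: number;
  data: Array<{ url: string; markdown: string; metadata?: object }>;
}

// Hypothetical helper: percentage of pages crawled so far.
function crawlProgress(result: FirecrawlCrawlResult): number {
  return result.total === 0 ? 0 : Math.round((result.completed / result.total) * 100);
}

const job: FirecrawlCrawlResult = {
  status: 'scraping',
  total: 10,
  completed: 4,
  data: [],
};

console.log(crawlProgress(job)); // 40
```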

Advanced Example

import { Agent } from 'praisonai';
import { firecrawlScrape, firecrawlCrawl } from 'praisonai/tools';

const agent = new Agent({
  name: 'ContentAnalyzer',
  instructions: `You are a content analyst. 
    1. Scrape the provided URL
    2. Extract key information
    3. Provide a structured summary`,
  tools: [
    firecrawlScrape({ formats: ['markdown'] }),
    firecrawlCrawl({ limit: 5, maxDepth: 1 }),
  ],
});

const result = await agent.run(
  'Analyze the documentation structure of https://docs.firecrawl.dev'
);
console.log(result.text);

Error Handling

import { firecrawlScrape } from 'praisonai/tools';

const tool = firecrawlScrape();

try {
  const result = await tool.execute({ url: 'https://example.com' });
  console.log(result);
} catch (error) {
  // In TypeScript, a caught value is `unknown`; narrow it before reading .message
  const message = error instanceof Error ? error.message : String(error);
  if (message.includes('FIRECRAWL_API_KEY')) {
    console.error('Missing API key');
  } else if (message.includes('rate limit')) {
    console.error('Rate limited - try again later');
  } else {
    console.error('Scrape failed:', message);
  }
}
}

Best Practices

  1. Use the appropriate tool - scrape for single pages, crawl for multiple
  2. Set limits - always cap crawls to avoid excessive API usage
  3. Filter content - use includeTags/excludeTags to keep only relevant content
  4. Handle async jobs - use pollTool for crawl and batch operations
  5. Cache results - store scraped content to avoid repeated requests
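The caching practice above can be sketched as a small in-memory store keyed by URL with a TTL. This is an illustration only; the `fetchPage` callback stands in for whatever scrape call you use, and `ScrapeCache` is not part of any library:

```typescript
type CacheEntry = { markdown: string; fetchedAt: number };

// Hypothetical in-memory cache: serves repeat lookups within the TTL
// from memory instead of re-scraping.
class ScrapeCache {
  private store = new Map<string, CacheEntry>();
  constructor(private ttlMs: number) {}

  async get(url: string, fetchPage: (url: string) => Promise<string>): Promise<string> {
    const hit = this.store.get(url);
    if (hit && Date.now() - hit.fetchedAt < this.ttlMs) {
      return hit.markdown; // still fresh: skip the API call
    }
    const markdown = await fetchPage(url);
    this.store.set(url, { markdown, fetchedAt: Date.now() });
    return markdown;
  }
}

// Usage: the second lookup within the TTL never calls fetchPage.
const cache = new ScrapeCache(60_000);
let calls = 0;
const fake = async (_url: string) => { calls++; return '# Page'; };
const a = await cache.get('https://example.com', fake);
const b = await cache.get('https://example.com', fake);
console.log(a === b, calls); // true 1
```

For production use you would likely persist entries and respect page-level cache headers, but even this sketch avoids re-billing the same URL inside a single agent run.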

Related Tools

  • Tavily - Web search and extraction
  • Exa - Semantic web search
  • Parallel - Token-efficient web search