Firecrawl efficiently crawls websites to extract comprehensive data while handling complex web infrastructure. The process:
  1. URL Analysis: Scans sitemap and crawls website to identify links
  2. Traversal: Recursively follows links to find all subpages
  3. Scraping: Extracts content from each page, handling JS and rate limits
  4. Output: Converts data to clean markdown or structured format
This ensures thorough data collection from any starting URL.

Crawling

/crawl endpoint

Used to crawl a URL and all accessible subpages. This submits a crawl job and returns a job ID to check the status of the crawl.
By default, Crawl ignores sublinks of a page if they aren’t children of the URL you provide. So website.com/other-parent/blog-1 wouldn’t be returned if you crawled website.com/blogs/. To include website.com/other-parent/blog-1, use the crawlEntireDomain parameter. To crawl subdomains like blog.website.com when crawling website.com, use the allowSubdomains parameter.
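For example, a minimal Python sketch of these options (assuming the SDK exposes them as the snake_case parameters crawl_entire_domain and allow_subdomains):

from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

# Include pages outside the starting path (e.g. website.com/other-parent/blog-1)
# and subdomains such as blog.website.com. Parameter names are assumed to be
# snake_case equivalents of crawlEntireDomain / allowSubdomains.
docs = firecrawl.crawl(
    url="https://website.com/blogs/",
    crawl_entire_domain=True,
    allow_subdomains=True,
    limit=25,
)
print(docs)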

Installation

# pip install firecrawl-py

from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

Usage

from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

docs = firecrawl.crawl(url="https://docs.firecrawl.dev", limit=10)
print(docs)

Scrape options in crawl

All options from the Scrape endpoint are available in Crawl via scrapeOptions (JS) / scrape_options (Python). These apply to every page the crawler scrapes: formats, proxy, caching, actions, location, tags, etc. See the full list in the Scrape API Reference.
import Firecrawl from '@mendable/firecrawl-js';

const firecrawl = new Firecrawl({ apiKey: 'fc-YOUR_API_KEY' });

// Crawl with scrape options
const crawlResponse = await firecrawl.crawl('https://example.com', {
  limit: 100,
  scrapeOptions: {
    formats: [
      'markdown',
      {
        type: 'json',
        schema: { type: 'object', properties: { title: { type: 'string' } } },
      },
    ],
    proxy: 'auto',
    maxAge: 600000,
    onlyMainContent: true,
  },
});
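A rough Python equivalent, as a sketch (the snake_case field names on ScrapeOptions, such as max_age and only_main_content, are assumed to mirror the JS options above):

from firecrawl import Firecrawl
from firecrawl.types import ScrapeOptions

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

# Crawl with scrape options applied to every page the crawler scrapes.
# Field names are assumed snake_case counterparts of the JS example.
crawl_response = firecrawl.crawl(
    "https://example.com",
    limit=100,
    scrape_options=ScrapeOptions(
        formats=["markdown"],
        proxy="auto",
        max_age=600000,
        only_main_content=True,
    ),
)
print(crawl_response)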

API Response

If you’re using cURL or the starter method, this will return an ID to check the status of the crawl.
If you’re using the SDK, see methods below for waiter vs starter behavior.
{
  "success": true,
  "id": "123-456-789",
  "url": "https://api.firecrawl.dev/v2/crawl/123-456-789"
}

Check Crawl Job

Used to check the status of a crawl job and get its result.
Job results are available via the API for 24 hours after completion. After this period, you can still view your crawl history and results in the activity logs.
status = firecrawl.get_crawl_status("<crawl-id>")
print(status)

Response Handling

The response varies based on the crawl’s status. For crawls that are not yet completed, or for responses larger than 10MB, a next URL parameter is provided; you must request this URL to retrieve the next 10MB of data. If the next parameter is absent, it indicates the end of the crawl data. The skip parameter sets the maximum number of results returned in each chunk.
The skip and next parameters are only relevant when hitting the API directly. If you’re using the SDK, we handle this for you and return all the results at once.
{
  "status": "scraping",
  "total": 36,
  "completed": 10,
  "creditsUsed": 10,
  "expiresAt": "2024-00-00T00:00:00.000Z",
  "next": "https://api.firecrawl.dev/v2/crawl/123-456-789?skip=10",
  "data": [
    {
      "markdown": "[Firecrawl Docs home page![light logo](https://mintlify.s3-us-west-1.amazonaws.com/firecrawl/logo/light.svg)!...",
      "html": "<!DOCTYPE html><html lang=\"en\" class=\"js-focus-visible lg:[--scroll-mt:9.5rem]\" data-js-focus-visible=\"\">...",
      "metadata": {
        "title": "Build a 'Chat with website' using Groq Llama 3 | Firecrawl",
        "language": "en",
        "sourceURL": "https://docs.firecrawl.dev/learn/rag-llama3",
        "description": "Learn how to use Firecrawl, Groq Llama 3, and Langchain to build a 'Chat with your website' bot.",
        "ogLocaleAlternate": [],
        "statusCode": 200
      }
    },
    ...
  ]
}
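When hitting the API directly, pagination can be handled by following next until it is absent. A minimal sketch using the requests library (assuming a crawl that has already completed; error handling omitted):

import requests

API_KEY = "fc-YOUR-API-KEY"
crawl_id = "123-456-789"  # ID returned when the crawl was started

url = f"https://api.firecrawl.dev/v2/crawl/{crawl_id}"
headers = {"Authorization": f"Bearer {API_KEY}"}

documents = []
while url:
    chunk = requests.get(url, headers=headers).json()
    documents.extend(chunk.get("data", []))
    # "next" is only present while more data remains; absent means the end
    url = chunk.get("next")

print(len(documents), "documents collected")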

SDK methods

There are two ways to use the SDK:
  1. Crawl then wait (crawl):
    • Waits for the crawl to complete and returns the full response
    • Handles pagination automatically
    • Recommended for most use cases
from firecrawl import Firecrawl
from firecrawl.types import ScrapeOptions

firecrawl = Firecrawl(api_key="fc-YOUR_API_KEY")

# Crawl a website:
crawl_status = firecrawl.crawl(
  'https://firecrawl.dev', 
  limit=100, 
  scrape_options=ScrapeOptions(formats=['markdown', 'html']),
  poll_interval=30
)
print(crawl_status)
The response includes the crawl status and all scraped data:
success=True
status='completed'
completed=100
total=100
creditsUsed=100
expiresAt=datetime.datetime(2025, 4, 23, 19, 21, 17, tzinfo=TzInfo(UTC))
next=None
data=[
  Document(
    markdown='[Day 7 - Launch Week III.Integrations DayApril 14th to 20th](...',
    metadata={
      'title': '15 Python Web Scraping Projects: From Beginner to Advanced',
      ...
      'scrapeId': '97dcf796-c09b-43c9-b4f7-868a7a5af722',
      'sourceURL': 'https://www.firecrawl.dev/blog/python-web-scraping-projects',
      'url': 'https://www.firecrawl.dev/blog/python-web-scraping-projects',
      'statusCode': 200
    }
  ),
  ...
]
  2. Start then check status (startCrawl/start_crawl):
    • Returns immediately with a crawl ID
    • Allows manual status checking
    • Useful for long-running crawls or custom polling logic
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

job = firecrawl.start_crawl(url="https://docs.firecrawl.dev", limit=10)
print(job)

# Check the status of the crawl
status = firecrawl.get_crawl_status(job.id)
print(status)

Crawl WebSocket

Firecrawl’s WebSocket-based method, Crawl URL and Watch, enables real-time data extraction and monitoring. Start a crawl with a URL and customize it with options like page limits, allowed domains, and output formats; it’s ideal for immediate data processing needs.
import asyncio
from firecrawl import AsyncFirecrawl

async def main():
    firecrawl = AsyncFirecrawl(api_key="fc-YOUR-API-KEY")

    # Start a crawl first
    started = await firecrawl.start_crawl("https://firecrawl.dev", limit=5)

    # Watch updates (snapshots) until terminal status
    async for snapshot in firecrawl.watcher(started.id, kind="crawl", poll_interval=2, timeout=120):
        if snapshot.status == "completed":
            print("DONE", snapshot.status)
            for doc in snapshot.data:
                print("DOC", doc.metadata.source_url if doc.metadata else None)
        elif snapshot.status == "failed":
            print("ERR", snapshot.status)
        else:
            print("STATUS", snapshot.status, snapshot.completed, "/", snapshot.total)

asyncio.run(main())

Crawl Webhook

You can configure webhooks to receive real-time notifications as your crawl progresses. This allows you to process pages as they’re scraped instead of waiting for the entire crawl to complete.
cURL
curl -X POST https://api.firecrawl.dev/v2/crawl \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer YOUR_API_KEY' \
    -d '{
      "url": "https://docs.firecrawl.dev",
      "limit": 100,
      "webhook": {
        "url": "https://your-domain.com/webhook",
        "metadata": {
          "any_key": "any_value"
        },
        "events": ["started", "page", "completed"]
      }
    }'

Quick Reference

Event Types:
  • crawl.started - When the crawl begins
  • crawl.page - For each page successfully scraped
  • crawl.completed - When the crawl finishes
  • crawl.failed - If the crawl encounters an error
Basic Payload:
{
  "success": true,
  "type": "crawl.page",
  "id": "crawl-job-id",
  "data": [...], // Page data for 'page' events
  "metadata": {}, // Your custom metadata
  "error": null
}
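For instance, a minimal receiver sketch (assuming a Flask app; the event names and payload fields follow the reference above, and the signature verification from the next section should be added before processing in production):

from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/webhook", methods=["POST"])
def firecrawl_webhook():
    event = request.get_json()
    event_type = event.get("type")

    if event_type == "crawl.page":
        # Process pages as they are scraped, without waiting for the full crawl
        for doc in event.get("data", []):
            print("scraped:", doc.get("metadata", {}).get("sourceURL"))
    elif event_type == "crawl.completed":
        print("crawl finished:", event.get("id"))
    elif event_type == "crawl.failed":
        print("crawl failed:", event.get("error"))

    return jsonify({"received": True})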

Security: Verifying Webhook Signatures

Every webhook request from Firecrawl includes an X-Firecrawl-Signature header containing an HMAC-SHA256 signature. Always verify this signature to ensure the webhook is authentic and hasn’t been tampered with.
How it works:
  1. Get your webhook secret from the Advanced tab of your account settings
  2. Extract the signature from the X-Firecrawl-Signature header
  3. Compute HMAC-SHA256 of the raw request body using your secret
  4. Compare with the signature header using a timing-safe function
Never process a webhook without verifying its signature first. The X-Firecrawl-Signature header contains the signature in the format: sha256=abc123def456...
For complete implementation examples in JavaScript and Python, see the Webhook Security documentation.
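As a sketch of those four steps in Python (assuming the raw request body is available and the secret is stored in an environment variable; see the Webhook Security documentation for the official examples):

import hmac
import hashlib
import os

def verify_firecrawl_signature(raw_body: bytes, signature_header: str) -> bool:
    # Step 1: webhook secret from the Advanced tab of your account settings
    secret = os.environ["FIRECRAWL_WEBHOOK_SECRET"]

    # Step 2: the header looks like "sha256=abc123def456..."
    prefix, _, received_signature = signature_header.partition("=")
    if prefix != "sha256":
        return False

    # Step 3: HMAC-SHA256 of the raw request body using your secret
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()

    # Step 4: timing-safe comparison
    return hmac.compare_digest(expected, received_signature)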

Full Documentation

For comprehensive webhook documentation including detailed event payloads, payload structure, advanced configuration, and troubleshooting, see the Webhooks documentation.