- URL Analysis: Scans sitemap and crawls website to identify links
- Traversal: Recursively follows links to find all subpages
- Scraping: Extracts content from each page, handling JavaScript rendering and rate limits
- Output: Converts data to clean markdown or structured format
Try it in the Playground
Test crawling in the interactive playground — no code required.
Crawling
/crawl endpoint
Used to crawl a URL and all accessible subpages. This submits a crawl job and returns a job ID to check the status of the crawl.

By default, the crawler includes the website’s sitemap to discover URLs (sitemap: "include"). If you set sitemap: "skip", the crawler will only find pages reachable through HTML links starting from the root URL. Assets like PDFs or deeply nested pages that are listed in the sitemap but not directly linked from any HTML page will be missed. For maximum coverage, keep the default sitemap: "include" setting.

Installation
Usage
Each page crawled consumes 1 credit. The default crawl limit is 10,000 pages; set a lower limit to control credit usage (e.g. limit: 100). Additional credits apply for certain options: JSON mode costs 4 additional credits per page, enhanced proxy costs 4 additional credits per page, and PDF parsing costs 1 credit per PDF page.

Scrape options in crawl
All options from the Scrape endpoint are available in Crawl via scrapeOptions (JS) / scrape_options (Python). These apply to every page the crawler scrapes: formats, proxy, caching, actions, location, tags, etc. See the full list in the Scrape API Reference.
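As a sketch, a crawl job with per-page scrape options could be submitted like this. The v1 endpoint path, payload field names, and response shape follow Firecrawl's public HTTP API, but treat them as assumptions and confirm against the API Reference:

```python
# Sketch: submit a crawl job via the HTTP API (v1 path assumed).
import json
import urllib.request

API_KEY = "fc-YOUR_API_KEY"  # placeholder

def build_crawl_request(url: str, limit: int = 100) -> dict:
    """Build a crawl payload; scrapeOptions applies to every crawled page."""
    return {
        "url": url,
        "limit": limit,               # cap pages to control credit usage
        "sitemap": "include",         # default; "skip" follows HTML links only
        "scrapeOptions": {"formats": ["markdown"]},
    }

def submit_crawl(payload: dict) -> dict:
    """POST the job; the response is expected to carry a job ID."""
    req = urllib.request.Request(
        "https://api.firecrawl.dev/v1/crawl",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_crawl_request("https://example.com", limit=100)
```

Setting limit explicitly, as above, is the main lever for keeping credit consumption predictable.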
API Response
If you’re using cURL or the starter method, this will return an ID to check the status of the crawl.
If you’re using the SDK, see methods below for waiter vs starter behavior.
Check Crawl Job
Used to check the status of a crawl job and get its result.

Job results are available via the API for 24 hours after completion. After this period, you can still view your crawl history and results in the activity logs.

Pages in the crawl results data array are pages that Firecrawl successfully scraped, even if the target site returned an HTTP error like 404. The metadata.statusCode field shows the HTTP status code from the target site. To retrieve pages that Firecrawl itself failed to scrape (e.g. network errors, timeouts, or robots.txt blocks), use the dedicated Get Crawl Errors endpoint (GET /crawl/{id}/errors).

Response Handling
The response varies based on the crawl’s status. For crawls that are not completed, or responses larger than 10MB, a next URL parameter is provided. You must request this URL to retrieve the next 10MB of data. If the next parameter is absent, it indicates the end of the crawl data.

The skip parameter sets the maximum number of results returned in each chunk.

The skip and next parameters are only relevant when hitting the API directly. If you’re using the SDK, we handle this for you and will return all the results at once.
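When calling the API directly, the pagination described above reduces to a small loop. Here fetch_page is a hypothetical stand-in for your authenticated GET against the crawl status URL; the data and next field names match the response format described above:

```python
# Sketch: follow "next" URLs until the crawl data is exhausted.
from typing import Callable

def collect_all_pages(first_url: str, fetch_page: Callable[[str], dict]) -> list:
    """Accumulate `data` entries across chunks until no `next` URL remains."""
    results, url = [], first_url
    while url:
        chunk = fetch_page(url)           # each chunk is at most ~10MB
        results.extend(chunk.get("data", []))
        url = chunk.get("next")           # absent/None marks the end
    return results
```

This is essentially what the SDK does for you when it returns all results at once.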
SDK methods
There are two ways to use the SDK:
- Crawl then wait (crawl):
  - Waits for the crawl to complete and returns the full response
  - Handles pagination automatically
  - Recommended for most use cases
- Start then check status (startCrawl/start_crawl):
  - Returns immediately with a crawl ID
  - Allows manual status checking
  - Useful for long-running crawls or custom polling logic
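The start-then-check pattern amounts to a polling loop. In this sketch, check_status is a hypothetical callable wrapping either the SDK's status method or a direct GET /crawl/{id}; the "completed"/"failed" status values are assumptions to confirm against the API Reference:

```python
# Sketch: custom polling after startCrawl / start_crawl returns a job ID.
import time
from typing import Callable

def wait_for_crawl(crawl_id: str,
                   check_status: Callable[[str], dict],
                   poll_interval: float = 2.0,
                   max_polls: int = 100) -> dict:
    """Poll until the job reaches a terminal state, then return its status."""
    for _ in range(max_polls):
        status = check_status(crawl_id)
        if status.get("status") in ("completed", "failed"):
            return status
        time.sleep(poll_interval)   # back off between status checks
    raise TimeoutError(f"crawl {crawl_id} did not finish after {max_polls} polls")
```

For most use cases the SDK's blocking crawl method makes this loop unnecessary.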
Crawl WebSocket
Firecrawl’s WebSocket-based method, Crawl URL and Watch, enables real-time data extraction and monitoring. Start a crawl with a URL and customize it with options like page limits, allowed domains, and output formats; this is ideal for immediate data processing needs.
Crawl Webhook
You can configure webhooks to receive real-time notifications as your crawl progresses. This allows you to process pages as they’re scraped instead of waiting for the entire crawl to complete.
Quick Reference
Event Types:
- crawl.started: When the crawl begins
- crawl.page: For each page successfully scraped
- crawl.completed: When the crawl finishes
- crawl.failed: If the crawl encounters an error
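A minimal handler might dispatch on the event type to process pages incrementally. The payload fields other than type (e.g. a data array on crawl.page events) are assumptions; inspect a real delivery to confirm the shape:

```python
# Sketch: dispatch webhook events by type, collecting pages as they arrive.
def handle_event(event: dict, pages: list) -> str:
    """Route one webhook payload; returns a short status string."""
    etype = event.get("type", "")
    if etype == "crawl.page":
        pages.extend(event.get("data", []))   # process each page incrementally
    elif etype == "crawl.completed":
        return "done"
    elif etype == "crawl.failed":
        return "failed"
    return "ok"
```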
Security: Verifying Webhook Signatures
Every webhook request from Firecrawl includes an X-Firecrawl-Signature header containing an HMAC-SHA256 signature. Always verify this signature to ensure the webhook is authentic and hasn’t been tampered with.
How it works:
- Get your webhook secret from the Advanced tab of your account settings
- Extract the signature from the X-Firecrawl-Signature header
- Compute HMAC-SHA256 of the raw request body using your secret
- Compare with the signature header using a timing-safe function
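The steps above can be sketched in a few lines with the standard library. The exact header format (e.g. whether the hex digest carries a sha256= prefix) is an assumption here; verify it against your actual webhook deliveries:

```python
# Sketch: verify an HMAC-SHA256 webhook signature against the raw body.
import hashlib
import hmac

def verify_signature(raw_body: bytes, signature_header: str, secret: str) -> bool:
    """Recompute the HMAC of the raw body and compare timing-safely."""
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    received = signature_header.removeprefix("sha256=")  # prefix is assumed
    return hmac.compare_digest(expected, received)
```

Note that the HMAC must be computed over the raw request bytes, before any JSON parsing, or the comparison will fail on re-serialized bodies.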

