Batch scrape lets you scrape multiple URLs in a single job. Pass a list of URLs and optional parameters, and Firecrawl processes them concurrently and returns all results together.
- Works like /crawl but for an explicit list of URLs
- Synchronous and asynchronous modes
- Supports all scrape options including structured extraction
- Configurable concurrency per job
## How it works
You can run a batch scrape in two ways:
| Mode | SDK method (JS / Python) | Behavior |
|---|---|---|
| Synchronous | `batchScrape` / `batch_scrape` | Starts the batch and waits for completion, returning all results |
| Asynchronous | `startBatchScrape` / `start_batch_scrape` | Starts the batch and returns a job ID for polling or webhooks |
## Basic usage
```python
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

# Asynchronous: start the batch and get a job ID back immediately
start = firecrawl.start_batch_scrape([
    "https://firecrawl.dev",
    "https://docs.firecrawl.dev",
], formats=["markdown"])  # returns id

# Synchronous: start the batch and wait for all results
job = firecrawl.batch_scrape([
    "https://firecrawl.dev",
    "https://docs.firecrawl.dev",
], formats=["markdown"], poll_interval=2, wait_timeout=120)

print(job.status, job.completed, job.total)
```
### Response
Calling `batchScrape` / `batch_scrape` returns the full results when the batch completes.
```json
{
  "status": "completed",
  "total": 36,
  "completed": 36,
  "creditsUsed": 36,
  "expiresAt": "2024-00-00T00:00:00.000Z",
  "next": "https://api.firecrawl.dev/v2/batch/scrape/123-456-789?skip=26",
  "data": [
    {
      "markdown": "[Firecrawl Docs home page!...",
      "html": "<!DOCTYPE html><html lang=\"en\" class=\"js-focus-visible lg:[--scroll-mt:9.5rem]\" data-js-focus-visible=\"\">...",
      "metadata": {
        "title": "Build a 'Chat with website' using Groq Llama 3 | Firecrawl",
        "language": "en",
        "sourceURL": "https://docs.firecrawl.dev/learn/rag-llama3",
        "description": "Learn how to use Firecrawl, Groq Llama 3, and Langchain to build a 'Chat with your website' bot.",
        "ogLocaleAlternate": [],
        "statusCode": 200
      }
    },
    ...
  ]
}
```
Calling `startBatchScrape` / `start_batch_scrape` returns a job ID you can track via `getBatchScrapeStatus` / `get_batch_scrape_status`, the API endpoint `/batch/scrape/{id}`, or webhooks. Job results are available via the API for 24 hours after completion. After this period, you can still view your batch scrape history and results in the activity logs.
```json
{
  "success": true,
  "id": "123-456-789",
  "url": "https://api.firecrawl.dev/v2/batch/scrape/123-456-789"
}
```
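The asynchronous flow amounts to a simple poll loop around the status endpoint. Below is a minimal sketch; `fetch_status` is a hypothetical stand-in for a zero-argument callable such as `lambda: firecrawl.get_batch_scrape_status(job_id)`, kept abstract so the loop itself is easy to test:

```python
import time


def poll_batch(fetch_status, poll_interval=2, wait_timeout=120):
    """Poll a batch scrape job until it finishes or the timeout elapses.

    fetch_status: zero-argument callable returning an object with a
    `status` attribute (e.g. "scraping", "completed", "failed").
    """
    deadline = time.monotonic() + wait_timeout
    while time.monotonic() < deadline:
        status = fetch_status()
        if status.status in ("completed", "failed"):
            return status
        time.sleep(poll_interval)
    raise TimeoutError("batch scrape did not finish within wait_timeout")
```

In practice you would pass the SDK call directly; webhooks (below) avoid polling entirely.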
## Concurrency
By default, a batch scrape job uses your team's full concurrent browser limit (see Rate Limits). You can lower this per job with the `maxConcurrency` parameter.
For example, `maxConcurrency: 50` limits that job to 50 simultaneous scrapes. Setting this value too low on large batches will significantly slow down processing, so only reduce it if you need to leave capacity for other concurrent jobs.
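Over the raw API, the limit goes in the request body alongside the URLs (a sketch of the `/v2/batch/scrape` request body; the example URLs are placeholders):

```json
{
  "urls": ["https://example.com/page1", "https://example.com/page2"],
  "formats": ["markdown"],
  "maxConcurrency": 50
}
```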
## Structured extraction
You can use batch scrape to extract structured data from every page in the batch. This is useful when you want the same schema applied to a list of URLs.
```python
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR_API_KEY")

# Scrape multiple websites:
batch_scrape_result = firecrawl.batch_scrape(
    ['https://docs.firecrawl.dev', 'https://docs.firecrawl.dev/sdks/overview'],
    formats=[{
        'type': 'json',
        'prompt': 'Extract the title and description from the page.',
        'schema': {
            'type': 'object',
            'properties': {
                'title': {'type': 'string'},
                'description': {'type': 'string'}
            },
            'required': ['title', 'description']
        }
    }]
)
print(batch_scrape_result)

# Or, you can use the start method:
batch_scrape_job = firecrawl.start_batch_scrape(
    ['https://docs.firecrawl.dev', 'https://docs.firecrawl.dev/sdks/overview'],
    formats=[{
        'type': 'json',
        'prompt': 'Extract the title and description from the page.',
        'schema': {
            'type': 'object',
            'properties': {
                'title': {'type': 'string'},
                'description': {'type': 'string'}
            },
            'required': ['title', 'description']
        }
    }]
)
print(batch_scrape_job)

# You can then use the job ID to check the status of the batch scrape:
batch_scrape_status = firecrawl.get_batch_scrape_status(batch_scrape_job.id)
print(batch_scrape_status)
```
### Response
`batchScrape` / `batch_scrape` returns full results:
```json
{
  "status": "completed",
  "total": 36,
  "completed": 36,
  "creditsUsed": 36,
  "expiresAt": "2024-00-00T00:00:00.000Z",
  "next": "https://api.firecrawl.dev/v2/batch/scrape/123-456-789?skip=26",
  "data": [
    {
      "json": {
        "title": "Build a 'Chat with website' using Groq Llama 3 | Firecrawl",
        "description": "Learn how to use Firecrawl, Groq Llama 3, and Langchain to build a 'Chat with your website' bot."
      }
    },
    ...
  ]
}
```
`startBatchScrape` / `start_batch_scrape` returns a job ID:
```json
{
  "success": true,
  "id": "123-456-789",
  "url": "https://api.firecrawl.dev/v2/batch/scrape/123-456-789"
}
```
## Webhooks
You can configure webhooks to receive real-time notifications as each URL in your batch is scraped. This lets you process results immediately instead of waiting for the entire batch to complete.
```bash
curl -X POST https://api.firecrawl.dev/v2/batch/scrape \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -d '{
    "urls": [
      "https://example.com/page1",
      "https://example.com/page2",
      "https://example.com/page3"
    ],
    "webhook": {
      "url": "https://your-domain.com/webhook",
      "metadata": {
        "any_key": "any_value"
      },
      "events": ["started", "page", "completed"]
    }
  }'
```
### Event types
| Event | Description |
|---|---|
| `batch_scrape.started` | The batch scrape job has begun |
| `batch_scrape.page` | A single URL was successfully scraped |
| `batch_scrape.completed` | All URLs have been processed |
| `batch_scrape.failed` | The batch scrape encountered an error |
### Payload
Each webhook delivery includes a JSON body with the following structure:
```json
{
  "success": true,
  "type": "batch_scrape.page",
  "id": "batch-job-id",
  "data": [...],
  "metadata": {},
  "error": null
}
```
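A receiver can branch on the `type` field to process pages as they arrive. A minimal sketch (the event names follow the table above; the return strings are purely illustrative):

```python
import json


def handle_webhook(raw_body: bytes) -> str:
    """Dispatch a Firecrawl webhook payload by its event type."""
    event = json.loads(raw_body)
    etype = event["type"]
    if etype == "batch_scrape.page":
        # One URL finished; its documents can be processed incrementally.
        return f"page: {len(event.get('data', []))} document(s)"
    if etype == "batch_scrape.completed":
        return "completed"
    if etype == "batch_scrape.failed":
        return f"failed: {event.get('error')}"
    return etype  # e.g. batch_scrape.started
```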
### Verifying webhook signatures
Every webhook request from Firecrawl includes an `X-Firecrawl-Signature` header containing an HMAC-SHA256 signature. Always verify this signature to ensure the webhook is authentic and has not been tampered with.
- Get your webhook secret from the Advanced tab of your account settings
- Extract the signature from the `X-Firecrawl-Signature` header
- Compute the HMAC-SHA256 of the raw request body using your secret
- Compare it with the signature header using a timing-safe function
Never process a webhook without verifying its signature first. The `X-Firecrawl-Signature` header contains the signature in the format: `sha256=abc123def456...`
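The verification steps can be sketched with only the Python standard library (a sketch, not the official example; it assumes you have access to the raw, unparsed request body, and requires Python 3.9+ for `str.removeprefix`):

```python
import hashlib
import hmac


def verify_signature(raw_body: bytes, signature_header: str, secret: str) -> bool:
    """Check an X-Firecrawl-Signature header ("sha256=<hex>") against the raw body."""
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    # Strip the "sha256=" prefix, then compare with a timing-safe function
    # to avoid leaking information through comparison timing.
    provided = signature_header.removeprefix("sha256=")
    return hmac.compare_digest(expected, provided)
```

Compute the HMAC over the exact bytes received; re-serializing parsed JSON will generally produce a different byte sequence and a failed check.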
For complete implementation examples in JavaScript and Python, see the Webhook Security documentation.
For comprehensive webhook documentation including detailed event payloads, advanced configuration, and troubleshooting, see the Webhooks documentation.