Batch Scrape
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Body
The URL to send the webhook to. This will trigger for batch scrape started (batch_scrape.started), every page scraped (batch_scrape.page), and when the batch scrape is completed (batch_scrape.completed or batch_scrape.failed). The response will be the same as the /scrape endpoint.
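A receiving endpoint might branch on the event type as sketched below. Only the event names come from this reference; the payload field names (type, data) are assumptions.

```typescript
// Hypothetical webhook receiver; field names other than the event types are assumed.
import { createServer } from "node:http";

createServer((req, res) => {
  let body = "";
  req.on("data", (chunk) => (body += chunk));
  req.on("end", () => {
    const event = JSON.parse(body); // assumed shape: { type: string, data?: unknown }
    switch (event.type) {
      case "batch_scrape.started":
        console.log("batch started");
        break;
      case "batch_scrape.page":
        console.log("page scraped", event.data); // same shape as a /scrape response, per the description above
        break;
      case "batch_scrape.completed":
      case "batch_scrape.failed":
        console.log("batch finished:", event.type);
        break;
    }
    res.writeHead(200).end();
  });
}).listen(3000);
```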
Formats to include in the output. Available options: markdown, html, rawHtml, links, screenshot, extract, screenshot@fullPage.
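For example, a request body asking for several formats might look like the sketch below. The format names come from the list above; the formats field name is assumed from the description, and the URLs are placeholders.

```typescript
// Illustrative request body; only the format names are taken from the list above.
const body = {
  urls: ["https://example.com", "https://example.com/pricing"],
  formats: ["markdown", "links", "screenshot@fullPage"],
};
```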
Only return the main content of the page, excluding headers, navs, footers, etc.
Tags to include in the output.
Tags to exclude from the output.
Headers to send with the request. Can be used to send cookies, user-agent, etc.
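A sketch combining these content-selection and header options is shown below. The exact parameter names (onlyMainContent, includeTags, excludeTags, headers) are assumptions inferred from the descriptions above; the values only illustrate intent.

```typescript
// Assumed parameter names; the values show the behavior described above.
const scrapeOptions = {
  onlyMainContent: true,                 // drop headers, navs, footers, etc.
  includeTags: ["article", "main"],      // tags to include in the output
  excludeTags: ["aside", ".ad-banner"],  // tags to exclude from the output
  headers: {
    "User-Agent": "MyCrawler/1.0",       // headers can carry user-agent, cookies, etc.
    Cookie: "session=abc123",
  },
};
```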
Specify a delay in milliseconds before fetching the content, allowing the page sufficient time to load.
Set to true if you want to emulate scraping from a mobile device. Useful for testing responsive pages and taking mobile screenshots.
Skip TLS certificate verification when making requests.
Timeout in milliseconds for the request.
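These page-loading controls could be combined as in the sketch below; the field names are assumptions inferred from the descriptions, not a definitive schema.

```typescript
// Assumed field names for the loading-related options described above.
const loadingOptions = {
  waitFor: 2000,               // delay in milliseconds before fetching the content
  mobile: true,                // emulate a mobile device for responsive pages and screenshots
  skipTlsVerification: false,  // skip TLS certificate verification when true
  timeout: 30000,              // request timeout in milliseconds
};
```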
Extract object
Actions to perform on the page before grabbing the content.
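As a hedged sketch, an actions list might look like the following; the action types and their fields are assumptions for illustration, not a definitive schema.

```typescript
// Hypothetical action objects; consult the full actions reference for the real schema.
const actions = [
  { type: "wait", milliseconds: 1000 },       // let dynamic content settle
  { type: "click", selector: "#load-more" },  // interact with the page
  { type: "scroll", direction: "down" },      // bring lazy-loaded content into view
  { type: "screenshot" },                     // capture the page state before scraping
];
```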
Location settings for the request. When specified, this will use an appropriate proxy if available and emulate the corresponding language and timezone settings. Defaults to 'US' if not specified.
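A location object might be shaped like this sketch; the country and languages field names are assumptions based on the description above.

```typescript
// Assumed shape for the location settings; defaults to the US when omitted.
const location = {
  country: "DE",         // country used to pick an appropriate proxy if available
  languages: ["de-DE"],  // emulated language/locale settings
};
```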
Removes all base64 images from the output, as they can be overwhelmingly long. The image's alt text remains in the output, but the URL is replaced with a placeholder.
If invalid URLs are specified in the urls array, they will be ignored. Instead of failing the entire request, a batch scrape is created using the remaining valid URLs, and the invalid URLs are returned in the invalidURLs field of the response.
Response
If ignoreInvalidURLs is true, this is an array containing the invalid URLs that were specified in the request. If there were no invalid URLs, this will be an empty array. If ignoreInvalidURLs is false, this field will be undefined.
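The relevant part of the response could be typed as in the sketch below. Only invalidURLs comes from the description above; the other fields are assumptions for illustration.

```typescript
// Sketch of the response shape; only invalidURLs is taken from this reference.
interface BatchScrapeResponse {
  success?: boolean;       // assumed
  id?: string;             // assumed job identifier for the batch scrape
  invalidURLs?: string[];  // present (possibly empty) when ignoreInvalidURLs is true; undefined otherwise
}
```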