Extract
curl --request POST \
  --url https://api.firecrawl.dev/v1/extract \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "urls": [
    "<string>"
  ],
  "prompt": "<string>",
  "schema": {
    "property1": "<string>",
    "property2": 123
  },
  "enableWebSearch": false,
  "ignoreSitemap": false,
  "includeSubdomains": true,
  "showSources": false,
  "scrapeOptions": {
    "formats": [
      "markdown"
    ],
    "onlyMainContent": true,
    "includeTags": [
      "<string>"
    ],
    "excludeTags": [
      "<string>"
    ],
    "headers": {},
    "waitFor": 0,
    "mobile": false,
    "skipTlsVerification": false,
    "timeout": 30000,
    "jsonOptions": {
      "schema": {},
      "systemPrompt": "<string>",
      "prompt": "<string>"
    },
    "actions": [
      {
        "type": "wait",
        "milliseconds": 2,
        "selector": "#my-element"
      }
    ],
    "location": {
      "country": "US",
      "languages": [
        "en-US"
      ]
    },
    "removeBase64Images": true,
    "blockAds": true,
    "proxy": "basic",
    "changeTrackingOptions": {
      "mode": "git-diff",
      "schema": {},
      "prompt": "<string>"
    }
  }
}'
{
  "success": true,
  "id": "<string>"
}
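The example response above carries only a success flag and an id. A minimal sketch of reading that body, assuming the response is exactly the JSON shape shown; the helper name parse_extract_response is hypothetical and not part of any Firecrawl SDK:

```python
import json


def parse_extract_response(body: str) -> str:
    """Return the "id" field from a response shaped like the example above."""
    data = json.loads(body)
    # Treat anything other than "success": true as a failed request.
    if not data.get("success"):
        raise RuntimeError(f"extract request was not accepted: {data!r}")
    return data["id"]
```

Raising on a missing or false success flag keeps callers from continuing with an id that was never issued.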
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
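As an illustration of the header format described above, the following sketch builds the two headers used in the curl example. The function name build_headers is hypothetical, and the token value is whatever your account provides:

```python
def build_headers(token: str) -> dict[str, str]:
    """Build the Authorization and Content-Type headers from the curl example."""
    if not token:
        raise ValueError("token must be a non-empty string")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
```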
Body
urls: The URLs to extract data from. URLs should be in glob format.
prompt: Prompt to guide the extraction process.
enableWebSearch: When true, the extraction will use web search to find additional data.
ignoreSitemap: When true, sitemap.xml files will be ignored during website scanning.
includeSubdomains: When true, subdomains of the provided URLs will also be scanned.
showSources: When true, the sources used to extract the data will be included in the response under the sources key.
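The top-level body fields can be assembled programmatically. The sketch below builds, but does not send, a POST request equivalent to the curl example, using only the Python standard library. The helper name build_extract_request, the placeholder token, and the example URL and prompt are assumptions made for illustration:

```python
import json
import urllib.request

EXTRACT_URL = "https://api.firecrawl.dev/v1/extract"


def build_extract_request(token, urls, **fields):
    """Build a POST request for the extract endpoint without sending it.

    Extra keyword arguments (prompt, schema, showSources, ...) are copied
    into the JSON body unchanged.
    """
    payload = {"urls": list(urls), **fields}
    return urllib.request.Request(
        EXTRACT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; replace with a real token and target URLs.
req = build_extract_request(
    "YOUR_TOKEN",
    ["https://example.com/*"],
    prompt="Extract the page title",
    showSources=True,
)
# urllib.request.urlopen(req) would actually send the request.
```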
scrapeOptions.formats: Formats to include in the output. Available options: markdown, html, rawHtml, links, screenshot, screenshot@fullPage, json, changeTracking.
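A client can reject unknown format names before sending a request. This is a sketch under the assumption that the values listed above are the complete set; check_formats is a hypothetical helper:

```python
ALLOWED_FORMATS = {
    "markdown", "html", "rawHtml", "links",
    "screenshot", "screenshot@fullPage", "json", "changeTracking",
}


def check_formats(formats):
    """Return the formats as a list, raising if any name is not listed above."""
    unknown = [f for f in formats if f not in ALLOWED_FORMATS]
    if unknown:
        raise ValueError(f"unsupported formats: {unknown}")
    return list(formats)
```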
scrapeOptions.onlyMainContent: Only return the main content of the page, excluding headers, navs, footers, etc.
scrapeOptions.includeTags: Tags to include in the output.
scrapeOptions.excludeTags: Tags to exclude from the output.
scrapeOptions.headers: Headers to send with the request. Can be used to send cookies, user-agent, etc.
scrapeOptions.waitFor: Specify a delay in milliseconds before fetching the content, allowing the page sufficient time to load.
scrapeOptions.mobile: Set to true if you want to emulate scraping from a mobile device. Useful for testing responsive pages and taking mobile screenshots.
scrapeOptions.skipTlsVerification: Skip TLS certificate verification when making requests.
scrapeOptions.timeout: Timeout in milliseconds for the request.
scrapeOptions.jsonOptions: Extract object.
scrapeOptions.actions: Actions to perform on the page before grabbing the content.
Wait action: Wait for a specified number of milliseconds.
type: wait
milliseconds: Number of milliseconds to wait. Required range: x >= 1.
selector: Query selector to find the element by. Example: "#my-element"
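The wait action's constraint (milliseconds must be at least 1) can be checked on the client side. A minimal sketch, with wait_action as a hypothetical helper; it treats both milliseconds and selector as optional, which is an assumption, since the text above does not say which fields are required:

```python
def wait_action(milliseconds=None, selector=None):
    """Build a "wait" action dict, enforcing milliseconds >= 1 when given."""
    action = {"type": "wait"}
    if milliseconds is not None:
        if not isinstance(milliseconds, int) or milliseconds < 1:
            raise ValueError("milliseconds must be an integer >= 1")
        action["milliseconds"] = milliseconds
    if selector is not None:
        action["selector"] = selector
    return action
```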
scrapeOptions.location: Location settings for the request. When specified, this will use an appropriate proxy if available and emulate the corresponding language and timezone settings. Defaults to 'US' if not specified.
scrapeOptions.location.country: ISO 3166-1 alpha-2 country code (e.g., 'US', 'AU', 'DE', 'JP').
scrapeOptions.location.languages: Preferred languages and locales for the request in order of priority. Defaults to the language of the specified location. See https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Language
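A location object matching the description above might be built as follows. The helper location_settings is hypothetical, and the check only verifies the two-uppercase-letter shape of the country code, not that the code is actually assigned in ISO 3166-1:

```python
import re


def location_settings(country="US", languages=None):
    """Build a location dict; languages are kept in priority order."""
    # Shape check only: two uppercase ASCII letters.
    if not re.fullmatch(r"[A-Z]{2}", country):
        raise ValueError("country must look like an ISO 3166-1 alpha-2 code")
    location = {"country": country}
    if languages:
        location["languages"] = list(languages)
    return location
```

Omitting languages leaves the default described above (the language of the specified location) to the service.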
scrapeOptions.removeBase64Images: Removes all Base64 images from the output, which may be overwhelmingly long. The image's alt text remains in the output, but the URL is replaced with a placeholder.
scrapeOptions.blockAds: Enables ad-blocking and cookie popup blocking.
scrapeOptions.proxy: Specifies the type of proxy to use.
- basic: Proxies for scraping sites with none to basic anti-bot solutions. Fast and usually works.
- stealth: Stealth proxies for scraping sites with advanced anti-bot solutions. Slower, but more reliable on certain sites.
If you do not specify a proxy, Firecrawl will automatically attempt to determine which one you need based on the target site.
Available options: basic, stealth.
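Because leaving the proxy unspecified lets Firecrawl choose one, a client can model "automatic" by omitting the key entirely. A sketch, with with_proxy as a hypothetical helper operating on a scrapeOptions dict:

```python
def with_proxy(scrape_options, proxy=None):
    """Return a copy of scrape_options with the proxy set, or omitted for auto."""
    options = dict(scrape_options)
    if proxy is None:
        # No explicit choice: drop the key so the service picks a proxy.
        options.pop("proxy", None)
    elif proxy in ("basic", "stealth"):
        options["proxy"] = proxy
    else:
        raise ValueError("proxy must be 'basic', 'stealth', or None")
    return options
```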
scrapeOptions.changeTrackingOptions: Options for change tracking (Beta). Only applicable when 'changeTracking' is included in formats. The 'markdown' format must also be specified when using change tracking.
scrapeOptions.changeTrackingOptions.mode: The mode to use for change tracking. 'default' performs a basic comparison, 'git-diff' provides a detailed diff, and 'json' compares extracted JSON data. Available options: git-diff, json.
scrapeOptions.changeTrackingOptions.schema: Schema for JSON extraction when using 'json' mode. Defines the structure of data to extract and compare.
scrapeOptions.changeTrackingOptions.prompt: Prompt to use for change tracking when using 'json' mode. If not provided, the default prompt will be used.
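The format requirement for change tracking (both 'changeTracking' and 'markdown' must appear in formats) is easy to miss. The following sketch checks it before a request is sent; check_change_tracking is a hypothetical helper that takes a scrapeOptions dict:

```python
def check_change_tracking(scrape_options):
    """Raise if changeTrackingOptions is set without the required formats."""
    if "changeTrackingOptions" not in scrape_options:
        return
    formats = scrape_options.get("formats", [])
    missing = [f for f in ("changeTracking", "markdown") if f not in formats]
    if missing:
        raise ValueError(
            f"changeTrackingOptions requires these formats as well: {missing}"
        )
```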