POST /extract
Extract structured data from pages using LLMs
curl --request POST \
  --url https://api.firecrawl.dev/v2/extract \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "urls": [
    "<string>"
  ],
  "prompt": "<string>",
  "schema": {},
  "enableWebSearch": false,
  "ignoreSitemap": false,
  "includeSubdomains": true,
  "showSources": false,
  "scrapeOptions": {
    "formats": [
      "markdown"
    ],
    "onlyMainContent": true,
    "includeTags": [
      "<string>"
    ],
    "excludeTags": [
      "<string>"
    ],
    "maxAge": 172800000,
    "headers": {},
    "waitFor": 0,
    "mobile": false,
    "skipTlsVerification": true,
    "timeout": 123,
    "parsers": [
      "pdf"
    ],
    "actions": [
      {
        "type": "wait",
        "milliseconds": 2,
        "selector": "#my-element"
      }
    ],
    "location": {
      "country": "US",
      "languages": [
        "en-US"
      ]
    },
    "removeBase64Images": true,
    "blockAds": true,
    "proxy": "auto",
    "storeInCache": true
  },
  "ignoreInvalidURLs": true
}'
{
  "success": true,
  "id": "<string>",
  "invalidURLs": [
    "<string>"
  ]
}

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json
urls
string<uri>[]
required

The URLs to extract data from. URLs may be given in glob format (e.g. https://example.com/blog/* to cover every page under /blog/).

prompt
string

Prompt to guide the extraction process

schema
object

Schema to define the structure of the extracted data. Must conform to JSON Schema.
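As a sketch, a request body combining urls, prompt, and schema can be assembled as plain JSON; the URL pattern, prompt text, and schema below are illustrative values, not part of the API:

```python
import json

# Illustrative request body for POST /v2/extract.
# The glob URL, prompt, and schema are example values chosen for this sketch.
payload = {
    "urls": ["https://example.com/blog/*"],  # glob format: every page under /blog/
    "prompt": "Extract the title and author of each post",
    "schema": {  # ordinary JSON Schema describing the desired output shape
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "author": {"type": "string"},
        },
        "required": ["title"],
    },
}

# This string is what would be sent as the request body
# (equivalent to the --data argument in the curl example above).
body = json.dumps(payload)
```

The schema object is passed through as-is, so any structure valid under JSON Schema (nested objects, arrays, enums) can be used.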

enableWebSearch
boolean
default:false

When true, the extraction will use web search to find additional data

ignoreSitemap
boolean
default:false

When true, sitemap.xml files will be ignored during website scanning

includeSubdomains
boolean
default:true

When true, subdomains of the provided URLs will also be scanned

showSources
boolean
default:false

When true, the sources used to extract the data will be included in the response under the sources key

scrapeOptions
object
ignoreInvalidURLs
boolean
default:true

When true, invalid URLs in the urls array are ignored: rather than failing the entire request, the extract runs against the remaining valid URLs, and the invalid ones are returned in the invalidURLs field of the response.

Response

Successful extraction

success
boolean
id
string
invalidURLs
string[] | null

If ignoreInvalidURLs is true, this is an array containing the invalid URLs that were specified in the request. If there were no invalid URLs, this will be an empty array. If ignoreInvalidURLs is false, this field will be undefined.
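A minimal sketch of handling this field, assuming the response JSON has already been parsed into a dict (the sample values are illustrative; when ignoreInvalidURLs was false, the key is simply absent):

```python
# Illustrative parsed response from POST /v2/extract.
response = {"success": True, "id": "job-123", "invalidURLs": ["not a url"]}

# invalidURLs is present (possibly empty) only when ignoreInvalidURLs was true;
# when it was false the field is undefined, i.e. the key is absent entirely.
invalid = response.get("invalidURLs")
if invalid is None:
    print("ignoreInvalidURLs was false; invalid URLs are not reported")
elif invalid:
    print(f"skipped {len(invalid)} invalid URL(s): {invalid}")
else:
    print("all requested URLs were valid")
```

Using `.get()` rather than direct indexing keeps the same code working across both settings of ignoreInvalidURLs.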