POST /scrape
Scrape a single URL and optionally extract information using an LLM
curl --request POST \
  --url https://api.firecrawl.dev/v2/scrape \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "url": "<string>",
  "formats": [
    "markdown"
  ],
  "onlyMainContent": true,
  "includeTags": [
    "<string>"
  ],
  "excludeTags": [
    "<string>"
  ],
  "maxAge": 172800000,
  "headers": {},
  "waitFor": 0,
  "mobile": false,
  "skipTlsVerification": true,
  "timeout": 123,
  "parsers": [
    "pdf"
  ],
  "actions": [
    {
      "type": "wait",
      "milliseconds": 2,
      "selector": "#my-element"
    }
  ],
  "location": {
    "country": "US",
    "languages": [
      "en-US"
    ]
  },
  "removeBase64Images": true,
  "blockAds": true,
  "proxy": "auto",
  "storeInCache": true,
  "zeroDataRetention": false
}'
{
  "success": true,
  "data": {
    "markdown": "<string>",
    "summary": "<string>",
    "html": "<string>",
    "rawHtml": "<string>",
    "screenshot": "<string>",
    "links": [
      "<string>"
    ],
    "actions": {
      "screenshots": [
        "<string>"
      ],
      "scrapes": [
        {
          "url": "<string>",
          "html": "<string>"
        }
      ],
      "javascriptReturns": [
        {
          "type": "<string>",
          "value": "<any>"
        }
      ],
      "pdfs": [
        "<string>"
      ]
    },
    "metadata": {
      "title": "<string>",
      "description": "<string>",
      "language": "<string>",
      "sourceURL": "<string>",
      "<any other metadata>": "<string>",
      "statusCode": 123,
      "error": "<string>"
    },
    "warning": "<string>",
    "changeTracking": {
      "previousScrapeAt": "2023-11-07T05:31:56Z",
      "changeStatus": "new",
      "visibility": "visible",
      "diff": "<string>",
      "json": {}
    }
  }
}
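The curl call above can also be reproduced with Python's standard library. This is a minimal sketch rather than an official client: the helper name `build_scrape_request` is hypothetical, and it uses only the endpoint, headers, and body fields that appear in the example request.

```python
import json
import urllib.request

API_URL = "https://api.firecrawl.dev/v2/scrape"  # endpoint from the curl example


def build_scrape_request(token: str, url: str, formats=None) -> urllib.request.Request:
    """Build (but do not send) a POST request for the scrape endpoint.

    Hypothetical helper for illustration; not part of any official SDK.
    """
    payload = {"url": url, "formats": formats or ["markdown"]}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_scrape_request("<token>", "https://example.com")
    print(req.get_method(), req.full_url)
    # To actually send it (needs a valid token and network access):
    # with urllib.request.urlopen(req) as resp:
    #     body = json.load(resp)
```

Building the request separately from sending it keeps the payload easy to inspect or log before any credits are spent.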

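What's new in v2

New formats

  • "summary" - Get a concise summary of the page content
  • JSON extraction now uses an object format: { type: "json", prompt, schema }
  • The screenshot format now uses an object format: { type: "screenshot", fullPage, quality, viewport }

Key improvements

  • Faster by default: requests are cached, and maxAge defaults to 2 days
  • More sensible defaults: blockAds, skipTlsVerification, and removeBase64Images are enabled by default
  • More powerful screenshot options: full control over screenshot parameters through the object format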

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json
url
string<uri>
required

The URL to scrape

formats
(Markdown · object | Summary · object | HTML · object | Raw HTML · object | Links · object | Screenshot · object | JSON · object | Change Tracking · object)[]

Output formats to include in the response. You can specify one or more formats, either as strings (e.g., 'markdown') or as objects with additional options (e.g., { type: 'json', schema: {...} }). Some formats require specific options to be set. Example: ['markdown', { type: 'json', schema: {...} }].
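To illustrate how string and object entries can be mixed in `formats`, here is a small sketch. The helper `build_formats` and its parameter names are hypothetical; the only field names used are `type`, `prompt`, and `schema`, as given for the JSON format on this page.

```python
def build_formats(json_prompt=None, json_schema=None, include_summary=False):
    """Return a `formats` list mixing plain strings and option objects.

    Hypothetical helper for illustration only.
    """
    formats = ["markdown"]  # simple formats are plain strings
    if include_summary:
        formats.append("summary")
    if json_prompt is not None or json_schema is not None:
        # Formats that take options are passed as objects with a "type" key.
        json_format = {"type": "json"}
        if json_prompt is not None:
            json_format["prompt"] = json_prompt
        if json_schema is not None:
            json_format["schema"] = json_schema
        formats.append(json_format)
    return formats


if __name__ == "__main__":
    schema = {"type": "object", "properties": {"title": {"type": "string"}}}
    print(build_formats(json_schema=schema, include_summary=True))
```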

onlyMainContent
boolean
default:true

Only return the main content of the page, excluding headers, navs, footers, etc.

includeTags
string[]

Tags to include in the output.

excludeTags
string[]

Tags to exclude from the output.

maxAge
integer
default:172800000

Returns a cached version of the page if it is younger than this age in milliseconds. If a cached version of the page is older than this value, the page will be scraped. If you do not need extremely fresh data, enabling this can speed up your scrapes by 500%. Defaults to 2 days.
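The default value is simply 2 days expressed in milliseconds. The sketch below shows that arithmetic and models the freshness rule described above on the client side. The function names are hypothetical, and treating a copy whose age is exactly equal to `maxAge` as stale is an assumption, since the text only covers the "younger" and "older" cases.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000       # 86,400,000 ms in one day
DEFAULT_MAX_AGE_MS = 2 * MS_PER_DAY    # 172,800,000 ms, the documented default


def days_to_ms(days: float) -> int:
    """Convert a number of days to a maxAge value in milliseconds."""
    return int(days * MS_PER_DAY)


def served_from_cache(cached_age_ms, max_age_ms=DEFAULT_MAX_AGE_MS) -> bool:
    """Model the documented rule: a cached copy is used only if it is younger
    than max_age_ms. Pass None when no cached copy exists.

    Assumption: a copy exactly max_age_ms old counts as stale.
    """
    if cached_age_ms is None:
        return False
    return cached_age_ms < max_age_ms


if __name__ == "__main__":
    print(days_to_ms(2))                        # 172800000
    print(served_from_cache(60 * 60 * 1000))    # True: a one-hour-old copy
    print(served_from_cache(3 * MS_PER_DAY))    # False: older than 2 days
```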

headers
object

Headers to send with the request. Can be used to send cookies, user-agent, etc.

waitFor
integer
default:0

Specify a delay in milliseconds before fetching the content, allowing the page sufficient time to load.

mobile
boolean
default:false

Set to true if you want to emulate scraping from a mobile device. Useful for testing responsive pages and taking mobile screenshots.

skipTlsVerification
boolean
default:true

Skip TLS certificate verification when making requests.

timeout
integer

Timeout in milliseconds for the request.

parsers
array

Controls how files are processed during scraping. When "pdf" is included (default), the PDF content is extracted and converted to markdown format, with billing based on the number of pages (1 credit per page). When an empty array is passed, the PDF file is returned in base64 encoding with a flat rate of 1 credit total.
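The two billing cases described above can be written as a small estimate. This is a sketch under stated assumptions: `estimate_pdf_credits` is a hypothetical helper, it covers only the two cases named in the text, and the page count has to be known by the caller.

```python
def estimate_pdf_credits(page_count: int, parsers=("pdf",)) -> int:
    """Estimate credits for scraping a PDF, per the two documented cases.

    - "pdf" in parsers (the default): 1 credit per page
    - empty parsers list: flat 1 credit, file returned as base64
    """
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    if "pdf" in parsers:
        return page_count
    return 1


if __name__ == "__main__":
    print(estimate_pdf_credits(12))              # 12
    print(estimate_pdf_credits(12, parsers=[]))  # 1
```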

actions
(Wait · object | Screenshot · object | Click · object | Write text · object | Press a key · object | Scroll · object | Scrape · object | Execute JavaScript · object | Generate PDF · object)[]

Actions to perform on the page before grabbing the content.
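As an illustration, the sketch below builds `wait` actions using only the fields shown in the example request (`type`, `milliseconds`, `selector`). The helper `wait_action` is hypothetical, and the other action types listed above are left out because their fields are not described in this section.

```python
def wait_action(milliseconds=None, selector=None) -> dict:
    """Build a "wait" action from the fields shown in the example request.

    Hypothetical helper; requires at least one of the two fields.
    """
    if milliseconds is None and selector is None:
        raise ValueError("provide milliseconds, selector, or both")
    action = {"type": "wait"}
    if milliseconds is not None:
        action["milliseconds"] = milliseconds
    if selector is not None:
        action["selector"] = selector
    return action


# Request body fragment: pause for two seconds, then wait for an element.
payload = {
    "url": "https://example.com",
    "actions": [
        wait_action(milliseconds=2000),
        wait_action(selector="#my-element"),
    ],
}

if __name__ == "__main__":
    print(payload["actions"])
```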

location
object

Location settings for the request. When specified, this will use an appropriate proxy if available and emulate the corresponding language and timezone settings. Defaults to 'US' if not specified.

removeBase64Images
boolean
default:true

Removes all base64 images from the output, which may be overwhelmingly long. The image's alt text remains in the output, but the URL is replaced with a placeholder.

blockAds
boolean
default:true

Enables ad-blocking and cookie popup blocking.

proxy
enum<string>
default:auto

Specifies the type of proxy to use.

  • basic: Proxies for scraping sites with none to basic anti-bot solutions. Fast and usually works.
  • stealth: Stealth proxies for scraping sites with advanced anti-bot solutions. Slower, but more reliable on certain sites. Costs up to 5 credits per request.
  • auto: Firecrawl will automatically retry scraping with stealth proxies if the basic proxy fails. If the retry with stealth is successful, 5 credits will be billed for the scrape. If the first attempt with basic is successful, only the regular cost will be billed.

If you do not specify a proxy, Firecrawl will default to auto.

Available options: basic, stealth, auto

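The billing rules for the three proxy modes can be summarised in a short estimate. This is a sketch under several assumptions: `estimate_proxy_credits` is a hypothetical helper, `regular_cost=1` is a placeholder because the text only says "the regular cost", the stealth figure is an upper bound ("up to 5 credits"), and the case where an `auto` retry with stealth also fails is not covered by the text.

```python
STEALTH_CREDITS = 5  # "up to 5 credits per request", used here as an upper bound


def estimate_proxy_credits(proxy="auto", basic_succeeded=True, regular_cost=1) -> int:
    """Estimate credits billed for one scrape under each proxy mode.

    basic_succeeded only matters for "auto": it says whether the first
    attempt with the basic proxy worked. When it did not, this assumes the
    stealth retry succeeded.
    """
    if proxy == "basic":
        return regular_cost
    if proxy == "stealth":
        return STEALTH_CREDITS
    if proxy == "auto":
        return regular_cost if basic_succeeded else STEALTH_CREDITS
    raise ValueError(f"unknown proxy mode: {proxy!r}")


if __name__ == "__main__":
    print(estimate_proxy_credits("auto", basic_succeeded=True))   # 1
    print(estimate_proxy_credits("auto", basic_succeeded=False))  # 5
```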
storeInCache
boolean
default:true

If true, the page will be stored in the Firecrawl index and cache. Setting this to false is useful if your scraping activity may have data protection concerns. Using some parameters associated with sensitive scraping (actions, headers) will force this parameter to be false.
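The override described above can be modelled on the client side to predict whether a request will be cached. This is only a sketch: `effective_store_in_cache` is a hypothetical helper, the list of sensitive parameters contains just the two named in the text and may be incomplete, and treating an empty `headers` object or `actions` list as "not used" is an assumption.

```python
# The two parameters named in the text; the real list may be longer.
SENSITIVE_PARAMS = ("actions", "headers")


def effective_store_in_cache(payload: dict) -> bool:
    """Predict whether a request body would end up stored in the cache.

    Assumption: empty values ({} or []) do not count as using the parameter.
    """
    if any(payload.get(name) for name in SENSITIVE_PARAMS):
        return False  # forced to false regardless of what was requested
    return payload.get("storeInCache", True)  # documented default is true


if __name__ == "__main__":
    print(effective_store_in_cache({"url": "https://example.com"}))
    print(effective_store_in_cache({"url": "https://example.com",
                                    "headers": {"Cookie": "session=abc"}}))
```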

zeroDataRetention
boolean
default:false

If true, this will enable zero data retention for this scrape. To enable this feature, please contact help@firecrawl.dev.

Response

Successful response

success
boolean
data
object
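A response shaped like the sample near the top of this page can be handled as follows. This is a minimal sketch: `summarize_response` is a hypothetical helper, it reads only fields that appear in the sample (`success`, `data.markdown`, `data.metadata.statusCode`, `data.warning`), and the shape of an unsuccessful response is not documented here, so failure is reported with a generic error.

```python
import json


def summarize_response(response_text: str) -> dict:
    """Pull a few commonly used fields out of a scrape response body."""
    body = json.loads(response_text)
    if not body.get("success"):
        raise RuntimeError("scrape did not report success")
    data = body.get("data", {})
    return {
        "markdown": data.get("markdown", ""),  # absent if not requested in formats
        "status_code": data.get("metadata", {}).get("statusCode"),
        "warning": data.get("warning"),
    }


if __name__ == "__main__":
    sample = json.dumps({
        "success": True,
        "data": {"markdown": "# Title", "metadata": {"statusCode": 200}},
    })
    print(summarize_response(sample))
```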