Scrape
curl --request POST \
--url https://api.firecrawl.dev/v0/scrape \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: application/json' \
--data '{
"url": "<string>",
"pageOptions": {
"headers": {},
"includeHtml": false,
"includeRawHtml": false,
"onlyIncludeTags": [
"<string>"
],
"onlyMainContent": false,
"removeTags": [
"<string>"
],
"replaceAllPathsWithAbsolutePaths": false,
"screenshot": false,
"fullPageScreenshot": false,
"waitFor": 0
},
"extractorOptions": {},
"timeout": 30000
}'
{
"success": true,
"data": {
"markdown": "<string>",
"content": "<string>",
"html": "<string>",
"rawHtml": "<string>",
"metadata": {
"title": "<string>",
"description": "<string>",
"language": "<string>",
"sourceURL": "<string>",
"<any other metadata> ": "<string>",
"pageStatusCode": 123,
"pageError": "<string>"
},
"llm_extraction": {},
"warning": "<string>"
}
}
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
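As a rough illustration of the header described above, the following Python sketch builds (but does not send) a POST request carrying the bearer token. The helper name build_scrape_request and the placeholder token are hypothetical; only the endpoint URL and the two headers come from the request example on this page.

```python
import json
import urllib.request

API_URL = "https://api.firecrawl.dev/v0/scrape"


def build_scrape_request(token: str, url: str) -> urllib.request.Request:
    """Build (without sending) a POST request for the scrape endpoint.

    Hypothetical helper: the name and signature are illustrative only.
    """
    body = json.dumps({"url": url}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            # Bearer authentication: "Bearer " followed by the auth token.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


req = build_scrape_request("YOUR_TOKEN", "https://example.com")
print(req.get_method(), req.full_url)
print(req.get_header("Authorization"))
```

Passing the returned object to urllib.request.urlopen would actually send the request; that step is left out here so the sketch runs without network access or a real token.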
Body
url: The URL to scrape.
pageOptions.headers: Headers to send with the request. Can be used to send cookies, user-agent, etc.
pageOptions.includeHtml: Include the HTML version of the content on the page. Will output an html key in the response.
pageOptions.includeRawHtml: Include the raw HTML content of the page. Will output a rawHtml key in the response.
pageOptions.onlyIncludeTags: Only include tags, classes and ids from the page in the final output. Use comma separated values. Example: 'script, .ad, #footer'
pageOptions.onlyMainContent: Only return the main content of the page, excluding headers, navs, footers, etc.
pageOptions.removeTags: Tags, classes and ids to remove from the page. Use comma separated values. Example: 'script, .ad, #footer'
pageOptions.replaceAllPathsWithAbsolutePaths: Replace all relative paths with absolute paths for images and links.
pageOptions.screenshot: Include a screenshot of the top of the page that you are scraping.
pageOptions.fullPageScreenshot: Include a full page screenshot of the page that you are scraping.
pageOptions.waitFor: Time in milliseconds to wait for the page to load before fetching content.
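The page options above can be combined in a single request body. The sketch below assembles such a body in Python; the function build_scrape_payload and its keyword names are hypothetical, while the JSON keys are taken from the request example on this page. It passes removeTags as an array, following the request example, which is an assumption given that the description mentions comma separated values.

```python
import json


def build_scrape_payload(url, *, only_main_content=False, include_html=False,
                         remove_tags=None, wait_for_ms=0, headers=None):
    """Assemble a request body for the scrape endpoint (illustrative helper)."""
    page_options = {
        "onlyMainContent": only_main_content,
        "includeHtml": include_html,
        "waitFor": wait_for_ms,  # milliseconds
    }
    # Optional keys are added only when the caller supplies a value.
    if remove_tags:
        page_options["removeTags"] = list(remove_tags)
    if headers:
        page_options["headers"] = dict(headers)
    return {"url": url, "pageOptions": page_options}


payload = build_scrape_payload(
    "https://example.com/article",
    only_main_content=True,
    remove_tags=["script", ".ad", "#footer"],
    wait_for_ms=2000,
    headers={"User-Agent": "my-scraper/1.0"},
)
print(json.dumps(payload, indent=2))
```

The printed JSON can be supplied as the --data argument of the curl command shown at the top of this page.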
extractorOptions: Options for extracting structured information from the page content. LLM-based extraction is not performed by default and occurs only when explicitly configured. The default mode, 'markdown', simply returns the scraped markdown.
The extraction mode to use. Available options:
'markdown': Returns the scraped markdown content; does not perform LLM extraction.
'llm-extraction': Extracts information from the cleaned and parsed content using an LLM.
'llm-extraction-from-raw-html': Extracts information directly from the raw HTML using an LLM.
'llm-extraction-from-markdown': Extracts information from the markdown content using an LLM.
A prompt describing what information to extract from the page, applicable for LLM extraction modes.
The schema for the data to be extracted, required only for LLM extraction modes.
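To make the relationship between the mode, prompt, and schema concrete, here is a minimal Python sketch that validates the mode and requires a schema only for the LLM modes. The four mode strings come from this page. The key names mode, extractionPrompt, and extractionSchema are assumptions, since this page does not show the keys inside extractorOptions, and the helper build_extractor_options is hypothetical.

```python
ALLOWED_MODES = {
    "markdown",
    "llm-extraction",
    "llm-extraction-from-raw-html",
    "llm-extraction-from-markdown",
}
LLM_MODES = ALLOWED_MODES - {"markdown"}


def build_extractor_options(mode="markdown", prompt=None, schema=None):
    """Build an extractorOptions object (key names are assumed, see lead-in)."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unknown extraction mode: {mode!r}")
    options = {"mode": mode}
    if mode in LLM_MODES:
        # The schema is described as required only for LLM extraction modes.
        if schema is None:
            raise ValueError(f"mode {mode!r} requires an extraction schema")
        options["extractionSchema"] = schema
        if prompt is not None:
            options["extractionPrompt"] = prompt
    return options


# Example schema in JSON Schema style; the fields are made up for illustration.
article_schema = {
    "type": "object",
    "properties": {"title": {"type": "string"}, "author": {"type": "string"}},
    "required": ["title"],
}

llm_options = build_extractor_options(
    "llm-extraction",
    prompt="Extract the article title and author.",
    schema=article_schema,
)
print(llm_options["mode"])
print(build_extractor_options())
```

In the default 'markdown' mode the prompt and schema are simply left out, which matches the note that LLM extraction happens only when explicitly configured.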
timeout: Timeout in milliseconds for the request.
Response
data.html: HTML version of the content on the page. Returned if includeHtml is true.
data.rawHtml: Raw HTML content of the page. Returned if includeRawHtml is true.
data.llm_extraction: Returned when using LLM extraction. Contains the data extracted from the page, following the schema you defined.
data.warning: May be returned when using LLM extraction. The warning message describes any issues with the extraction.
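A client usually needs to check the success flag and then read the optional keys defensively, since html, rawHtml, llm_extraction, and warning appear only under the conditions described above. The following Python sketch does this for a response shaped like the example on this page. The function summarize_scrape_response and the sample values are hypothetical, and treating any response without success set to true as a failure is an assumption, because this page does not document the error response.

```python
def summarize_scrape_response(response: dict) -> dict:
    """Pull the commonly used fields out of a scrape response (illustrative)."""
    if response.get("success") is not True:
        # Assumption: error responses are not documented on this page.
        raise RuntimeError("scrape request did not succeed")
    data = response.get("data", {})
    metadata = data.get("metadata", {})
    return {
        "title": metadata.get("title"),
        "status": metadata.get("pageStatusCode"),
        "markdown": data.get("markdown"),
        # Present only when the matching pageOptions flag was true.
        "html": data.get("html"),
        "raw_html": data.get("rawHtml"),
        # Present only when LLM extraction was configured.
        "extracted": data.get("llm_extraction"),
        "warning": data.get("warning"),
    }


# Sample response with made-up values, following the documented shape.
sample = {
    "success": True,
    "data": {
        "markdown": "# Example Domain",
        "content": "Example Domain",
        "metadata": {
            "title": "Example Domain",
            "sourceURL": "https://example.com",
            "pageStatusCode": 200,
        },
    },
}

summary = summarize_scrape_response(sample)
print(summary["title"], summary["status"])
```

Keys that were not requested come back as None in the summary rather than raising a KeyError, so the same function works for plain markdown scrapes and LLM extraction runs.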