Migrating from V0
Firecrawl allows you to turn entire websites into LLM-ready markdown
/scrape endpoint
The updated /scrape endpoint has been redesigned for enhanced reliability and ease of use. The structure of the new /scrape request body is as follows:
{
"url": "<string>",
"formats": ["markdown", "html", "rawHtml", "links", "screenshot"],
"includeTags": ["<string>"],
"excludeTags": ["<string>"],
"headers": { "<key>": "<value>" },
"waitFor": 123,
"timeout": 123
}
Formats
You can now choose which formats your output is returned in, and you can request several at once. Supported formats are:
- Markdown (markdown)
- HTML (html)
- Raw HTML (rawHtml) (with no modifications)
- Screenshot (screenshot or screenshot@fullPage)
- Links (links)
By default, the output includes only the markdown format.
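As a rough illustration of how the formats option fits into the request body, here is a minimal Python sketch. The helper name `build_scrape_body` and the `SUPPORTED_FORMATS` set are hypothetical, not part of any Firecrawl SDK. The set mirrors only the formats listed above, so the sketch rejects anything else. It builds the body as a plain dictionary and does not send a request.

```python
# Hypothetical helper: not part of the Firecrawl SDK.
# The set mirrors only the formats listed in this section.
SUPPORTED_FORMATS = {
    "markdown",
    "html",
    "rawHtml",
    "links",
    "screenshot",
    "screenshot@fullPage",
}


def build_scrape_body(url, formats=None, **options):
    """Build a V1 /scrape request body as a plain dict.

    When no formats are given, fall back to ["markdown"], which
    matches the documented default output.
    """
    chosen = list(formats) if formats else ["markdown"]
    unknown = [fmt for fmt in chosen if fmt not in SUPPORTED_FORMATS]
    if unknown:
        raise ValueError(f"Unsupported formats: {unknown}")
    # Remaining keyword arguments (waitFor, timeout, ...) sit at root level.
    return {"url": url, "formats": chosen, **options}


if __name__ == "__main__":
    print(build_scrape_body("https://example.com", ["markdown", "links"], waitFor=123))
```

Validating formats on the client side is a design choice for this sketch only; the API itself decides what is accepted.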
Details on the new request body
The table below outlines the changes to the request body parameters for the /scrape endpoint in V1.
Parameter | Change | Description |
---|---|---|
onlyIncludeTags | Moved and Renamed | Moved to root level and renamed to includeTags. |
removeTags | Moved and Renamed | Moved to root level and renamed to excludeTags. |
onlyMainContent | Moved | Moved to root level. true by default. |
waitFor | Moved | Moved to root level. |
headers | Moved | Moved to root level. |
parsePDF | Moved | Moved to root level. |
extractorOptions | No Change | |
timeout | No Change | |
pageOptions | Removed | The pageOptions parameter is no longer needed. The scrape options were moved to root level. |
replaceAllPathsWithAbsolutePaths | Removed | No longer needed. Every path is now absolute by default. |
includeHtml | Removed | Add "html" to formats instead. |
includeRawHtml | Removed | Add "rawHtml" to formats instead. |
screenshot | Removed | Add "screenshot" to formats instead. |
fullPageScreenshot | Removed | Add "screenshot@fullPage" to formats instead. |
extractorOptions | Removed | Use "extract" format instead with extract object. |
The new extract format is described in the llm-extract section.
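The /scrape changes in the table above can be sketched as a small conversion function. This is an illustration under stated assumptions, not an official migration tool. It assumes the V0 body kept its scrape settings under a `pageOptions` object, with `url` and `timeout` at root level. It also keeps "markdown" in `formats` on the assumption that V0 callers expect markdown output. The function name `migrate_scrape_body` is hypothetical. `extractorOptions` is deliberately left out, since moving to the extract format needs manual work.

```python
# Hypothetical helper: a sketch of the /scrape table above, not official tooling.
# Assumes a V0 body shaped like {"url": ..., "pageOptions": {...}, "timeout": ...}.

RENAMED = {"onlyIncludeTags": "includeTags", "removeTags": "excludeTags"}
MOVED = ("onlyMainContent", "waitFor", "headers", "parsePDF")


def migrate_scrape_body(v0):
    """Convert an assumed V0 /scrape body into the V1 shape."""
    page = v0.get("pageOptions", {})
    v1 = {"url": v0["url"]}

    # Boolean flags become entries in the formats list.
    # Assumption: keep markdown so existing callers still receive it.
    formats = ["markdown"]
    if page.get("includeHtml"):
        formats.append("html")
    if page.get("includeRawHtml"):
        formats.append("rawHtml")
    # Assumption: a full-page screenshot takes precedence over a plain one.
    if page.get("fullPageScreenshot"):
        formats.append("screenshot@fullPage")
    elif page.get("screenshot"):
        formats.append("screenshot")
    v1["formats"] = formats

    # Options that moved to root level under a new name.
    for old, new in RENAMED.items():
        if old in page:
            v1[new] = page[old]

    # Options that moved to root level unchanged.
    for key in MOVED:
        if key in page:
            v1[key] = page[key]

    # timeout is unchanged; replaceAllPathsWithAbsolutePaths is simply dropped.
    if "timeout" in v0:
        v1["timeout"] = v0["timeout"]
    return v1


if __name__ == "__main__":
    old_body = {
        "url": "https://example.com",
        "pageOptions": {"onlyIncludeTags": ["article"], "includeHtml": True},
    }
    print(migrate_scrape_body(old_body))
```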
/crawl endpoint
We’ve also updated the /crawl endpoint in V1. The new request body is shown below:
{
"url": "<string>",
"excludePaths": ["<string>"],
"includePaths": ["<string>"],
"maxDepth": 2,
"ignoreSitemap": true,
"limit": 10,
"allowBackwardLinks": true,
"allowExternalLinks": true,
"scrapeOptions": {
// same options as in /scrape
"formats": ["markdown", "html", "rawHtml", "screenshot", "links"],
"headers": { "<key>": "<value>" },
"includeTags": ["<string>"],
"excludeTags": ["<string>"],
"onlyMainContent": true,
"waitFor": 123
}
}
Details on the new request body
The table below outlines the changes to the request body parameters for the /crawl endpoint in V1.
Parameter | Change | Description |
---|---|---|
pageOptions | Renamed | Renamed to scrapeOptions. |
includes | Moved and Renamed | Moved to root level. Renamed to includePaths. |
excludes | Moved and Renamed | Moved to root level. Renamed to excludePaths. |
allowBackwardCrawling | Moved and Renamed | Moved to root level. Renamed to allowBackwardLinks. |
allowExternalLinks | Moved | Moved to root level. |
maxDepth | Moved | Moved to root level. |
ignoreSitemap | Moved | Moved to root level. |
limit | Moved | Moved to root level. |
crawlerOptions | Removed | The crawlerOptions parameter is no longer needed. The crawl options were moved to root level. |
timeout | Removed | Use timeout in scrapeOptions instead. |
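The /crawl changes above can be sketched the same way. This is a hedged illustration, not an official tool. It assumes the V0 body kept crawl settings under `crawlerOptions` and scrape settings under `pageOptions`, and that `timeout` sat at root level. The function name `migrate_crawl_body` and the `convert_page_options` hook are hypothetical. By default the hook only copies `pageOptions`, so any inner keys that changed for /scrape still need converting separately.

```python
# Hypothetical helper: a sketch of the /crawl table above, not official tooling.
# Assumes a V0 body shaped like:
#   {"url": ..., "crawlerOptions": {...}, "pageOptions": {...}, "timeout": ...}

RENAMED = {
    "includes": "includePaths",
    "excludes": "excludePaths",
    "allowBackwardCrawling": "allowBackwardLinks",
}
MOVED = ("allowExternalLinks", "maxDepth", "ignoreSitemap", "limit")


def migrate_crawl_body(v0, convert_page_options=dict):
    """Convert an assumed V0 /crawl body into the V1 shape.

    convert_page_options is a hook for rewriting the old pageOptions
    contents; the default makes a shallow copy and changes nothing.
    """
    crawler = v0.get("crawlerOptions", {})
    v1 = {"url": v0["url"]}

    # Crawl options that moved to root level under a new name.
    for old, new in RENAMED.items():
        if old in crawler:
            v1[new] = crawler[old]

    # Crawl options that moved to root level unchanged.
    for key in MOVED:
        if key in crawler:
            v1[key] = crawler[key]

    # pageOptions is renamed to scrapeOptions.
    scrape_options = convert_page_options(v0.get("pageOptions", {}))

    # Assumption: the V0 timeout was at root level; it now goes in scrapeOptions.
    if "timeout" in v0:
        scrape_options["timeout"] = v0["timeout"]

    if scrape_options:
        v1["scrapeOptions"] = scrape_options
    return v1


if __name__ == "__main__":
    old_body = {
        "url": "https://example.com",
        "crawlerOptions": {"includes": ["/blog/*"], "limit": 10},
        "pageOptions": {"onlyMainContent": True},
    }
    print(migrate_crawl_body(old_body))
```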