Firecrawl allows you to turn entire websites into LLM-ready markdown.

New in V1:
- /scrape: choose what formats you want your output in.
- /map endpoint for getting most of the URLs of a webpage.
- /crawl/{id} status.
- onlyMainContent now defaults to true.
- /crawl webhooks and WebSocket support.
- Crawl URL and Watch method.
Structured extraction uses the json format. To extract structured data from a page, you can pass a schema to the endpoint or just provide a prompt.
To extract without a schema, pass only a prompt to the endpoint. The LLM chooses the structure of the data.
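The two extraction modes described above (schema or prompt only) can be sketched as a small request-body builder. This is a minimal illustration, not the official client: the helper name `build_json_scrape_body`, the `url` field, and the `schema` and `prompt` keys inside `jsonOptions` are assumptions made for the example.

```python
import json
from typing import Any, Optional


def build_json_scrape_body(
    url: str,
    prompt: Optional[str] = None,
    schema: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Build a /scrape request body that asks for the json format.

    Assumption: jsonOptions accepts "schema" and "prompt" keys.
    """
    if prompt is None and schema is None:
        raise ValueError("provide a schema, a prompt, or both")
    json_options: dict[str, Any] = {}
    if schema is not None:
        json_options["schema"] = schema
    if prompt is not None:
        json_options["prompt"] = prompt
    return {"url": url, "formats": ["json"], "jsonOptions": json_options}


# Schema-driven extraction: the caller fixes the output structure.
with_schema = build_json_scrape_body(
    "https://example.com",
    schema={"type": "object", "properties": {"title": {"type": "string"}}},
)

# Prompt-only extraction: the LLM chooses the structure of the data.
prompt_only = build_json_scrape_body(
    "https://example.com", prompt="Extract the page title."
)

print(json.dumps(prompt_only, indent=2))
```

Keeping the body construction separate from the HTTP call makes it easy to inspect exactly what would be sent before wiring it to a real request.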
You can pass a webhook parameter to the /crawl endpoint. This sends a POST request to the URL you specify when the crawl is started, updated, and completed.
The webhook now triggers for every page crawled, not just once with the whole result at the end.
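As a rough sketch of how the webhook parameter might be supplied, the snippet below prepares (but does not send) a POST request to /crawl. The base URL, the bearer-token header, the `url` field name, and the helper name `build_crawl_request` are assumptions for illustration; only the `webhook` parameter comes from the text above.

```python
import json
import urllib.request

# Assumption: the V1 crawl endpoint lives at this URL.
CRAWL_ENDPOINT = "https://api.firecrawl.dev/v1/crawl"


def build_crawl_request(
    api_key: str, target_url: str, webhook_url: str
) -> urllib.request.Request:
    """Prepare a POST to /crawl that registers a webhook (nothing is sent)."""
    body = {"url": target_url, "webhook": webhook_url}
    return urllib.request.Request(
        CRAWL_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Assumption: the API key is passed as a bearer token.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_crawl_request(
    "YOUR_API_KEY", "https://example.com", "https://example.com/hooks/firecrawl"
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the prepared object to `urllib.request.urlopen` would actually start the crawl; it is left out here so the example runs without network access or a real key.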
There are four event types:
- crawl.started - Triggered when the crawl is started.
- crawl.page - Triggered for every page crawled.
- crawl.completed - Triggered when the crawl is completed to let you know it’s done.
- crawl.failed - Triggered when the crawl fails.

The webhook payload contains:
- success - Whether the webhook was successful in crawling the page correctly.
- type - The type of event that occurred.
- id - The ID of the crawl.
- data - The data that was scraped (Array). This is only non-empty on crawl.page and contains 1 item if the page was scraped successfully. The response is the same as the /scrape endpoint.
- error - If the webhook failed, this contains the error message.

⚠️ Deprecation Notice: V0 endpoints will be deprecated on April 1st, 2025. Please migrate to V1 endpoints before then to ensure uninterrupted service.
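The webhook events and payload fields (success, type, id, data, error) could be handled on the receiving side roughly as follows. This is a sketch under assumptions: the function name `summarize_webhook_event` and the sample payloads are hypothetical, and the payload is taken to be the already-parsed JSON body of the POST request.

```python
from typing import Any

KNOWN_EVENTS = {"crawl.started", "crawl.page", "crawl.completed", "crawl.failed"}


def summarize_webhook_event(payload: dict[str, Any]) -> str:
    """Turn one webhook payload into a short log line."""
    event_type = payload.get("type")
    crawl_id = payload.get("id")
    if event_type not in KNOWN_EVENTS:
        return f"ignored unknown event {event_type!r}"
    if not payload.get("success", False):
        # On failure, the error field carries the message.
        return f"{event_type} for crawl {crawl_id} failed: {payload.get('error')}"
    if event_type == "crawl.page":
        # data is only non-empty on crawl.page.
        pages = payload.get("data") or []
        return f"crawl {crawl_id}: received {len(pages)} page(s)"
    return f"crawl {crawl_id}: {event_type}"


# Hypothetical payloads for illustration only.
page_event = {
    "success": True,
    "type": "crawl.page",
    "id": "abc123",
    "data": [{"markdown": "# Example"}],
}
failed_event = {
    "success": False,
    "type": "crawl.failed",
    "id": "abc123",
    "data": [],
    "error": "timeout",
}

print(summarize_webhook_event(page_event))
print(summarize_webhook_event(failed_event))
```

Checking the event type before reading data keeps the handler from assuming scraped content is present on crawl.started or crawl.completed.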
The /scrape endpoint has been redesigned for enhanced reliability and ease of use. The table below lists the changes to the request body parameters for the /scrape endpoint in V1.
Parameter | Change | Description |
---|---|---|
onlyIncludeTags | Moved and Renamed | Moved to root level and renamed to includeTags. |
removeTags | Moved and Renamed | Moved to root level and renamed to excludeTags. |
onlyMainContent | Moved | Moved to root level. true by default. |
waitFor | Moved | Moved to root level. |
headers | Moved | Moved to root level. |
parsePDF | Moved | Moved to root level. |
extractorOptions | No Change | |
timeout | No Change | |
pageOptions | Removed | No need for pageOptions parameter. The scrape options were moved to root level. |
replaceAllPathsWithAbsolutePaths | Removed | No longer needed. Every path is now absolute by default. |
includeHtml | Removed | Add "html" to formats instead. |
includeRawHtml | Removed | Add "rawHtml" to formats instead. |
screenshot | Removed | Add "screenshot" to formats instead. |
fullPageScreenshot | Removed | Add "screenshot@fullPage" to formats instead. |
extractorOptions | Removed | Use "json" format instead with jsonOptions object. |
The json format is described in the llm-extract section.
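The /scrape parameter changes in the table can be expressed as a small conversion helper. This is a sketch only, with several assumptions: the function name `migrate_scrape_body` is hypothetical, "markdown" is assumed to be a valid baseline entry in formats, and extractorOptions is copied through untouched because the table lists it as both unchanged and removed; replacing it with the json format and jsonOptions is left as a manual step.

```python
from typing import Any

# V0 pageOptions keys that move to the root level, mapped to their V1 names.
MOVED_TO_ROOT = {
    "onlyIncludeTags": "includeTags",
    "removeTags": "excludeTags",
    "onlyMainContent": "onlyMainContent",
    "waitFor": "waitFor",
    "headers": "headers",
    "parsePDF": "parsePDF",
}


def migrate_scrape_body(v0: dict[str, Any]) -> dict[str, Any]:
    """Convert a V0 /scrape request body into the V1 shape."""
    page_options = v0.get("pageOptions", {})
    # Root-level keys (url, timeout, extractorOptions, ...) pass through;
    # pageOptions itself is removed in V1.
    v1 = {key: value for key, value in v0.items() if key != "pageOptions"}

    for old_name, new_name in MOVED_TO_ROOT.items():
        if old_name in page_options:
            v1[new_name] = page_options[old_name]

    # Boolean flags become entries in formats.
    formats = ["markdown"]  # assumption: markdown stays the baseline output
    if page_options.get("includeHtml"):
        formats.append("html")
    if page_options.get("includeRawHtml"):
        formats.append("rawHtml")
    # Design choice for this sketch: a full-page screenshot takes precedence.
    if page_options.get("fullPageScreenshot"):
        formats.append("screenshot@fullPage")
    elif page_options.get("screenshot"):
        formats.append("screenshot")
    v1["formats"] = formats

    # replaceAllPathsWithAbsolutePaths is dropped: paths are absolute in V1.
    return v1


v0_body = {
    "url": "https://example.com",
    "timeout": 30000,
    "pageOptions": {
        "onlyIncludeTags": ["article"],
        "removeTags": ["nav"],
        "includeHtml": True,
        "fullPageScreenshot": True,
        "replaceAllPathsWithAbsolutePaths": True,
    },
}
print(migrate_scrape_body(v0_body))
```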
The /crawl endpoint has also been updated in V1. The table below lists the changes to its request body parameters.
Parameter | Change | Description |
---|---|---|
pageOptions | Renamed | Renamed to scrapeOptions. |
includes | Moved and Renamed | Moved to root level. Renamed to includePaths. |
excludes | Moved and Renamed | Moved to root level. Renamed to excludePaths. |
allowBackwardCrawling | Moved and Renamed | Moved to root level. Renamed to allowBackwardLinks. |
allowExternalLinks | Moved | Moved to root level. |
maxDepth | Moved | Moved to root level. |
ignoreSitemap | Moved | Moved to root level. |
limit | Moved | Moved to root level. |
crawlerOptions | Removed | No need for crawlerOptions parameter. The crawl options were moved to root level. |
timeout | Removed | Use timeout in scrapeOptions instead. |
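
The /crawl changes in the table can be sketched the same way. The helper name `migrate_crawl_body` is hypothetical, and the sketch assumes the V0 body carried timeout at the root level. For brevity it copies the contents of pageOptions into scrapeOptions unchanged; in practice those nested options may also need the /scrape renames.

```python
from typing import Any

# V0 crawlerOptions keys that move to the root level, mapped to their V1 names.
CRAWLER_OPTIONS_TO_ROOT = {
    "includes": "includePaths",
    "excludes": "excludePaths",
    "allowBackwardCrawling": "allowBackwardLinks",
    "allowExternalLinks": "allowExternalLinks",
    "maxDepth": "maxDepth",
    "ignoreSitemap": "ignoreSitemap",
    "limit": "limit",
}


def migrate_crawl_body(v0: dict[str, Any]) -> dict[str, Any]:
    """Convert a V0 /crawl request body into the V1 shape."""
    removed = ("crawlerOptions", "pageOptions", "timeout")
    v1 = {key: value for key, value in v0.items() if key not in removed}

    crawler_options = v0.get("crawlerOptions", {})
    for old_name, new_name in CRAWLER_OPTIONS_TO_ROOT.items():
        if old_name in crawler_options:
            v1[new_name] = crawler_options[old_name]

    # pageOptions is renamed to scrapeOptions; timeout moves inside it.
    scrape_options = dict(v0.get("pageOptions", {}))
    if "timeout" in v0:
        scrape_options["timeout"] = v0["timeout"]
    if scrape_options:
        v1["scrapeOptions"] = scrape_options
    return v1


v0_crawl = {
    "url": "https://example.com",
    "timeout": 15000,
    "crawlerOptions": {
        "includes": ["/blog/*"],
        "allowBackwardCrawling": True,
        "limit": 50,
    },
    "pageOptions": {"onlyMainContent": True},
}
print(migrate_crawl_body(v0_crawl))
```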