- Output Formats for `/scrape`. Choose what formats you want your output in.
- New `/map` endpoint for getting most of the URLs of a webpage.
- Developer friendly API for `/crawl/{id}` status.
- 2x Rate Limits for all plans.
- Go SDK and Rust SDK
- Teams support
- API Key Management in the dashboard.
- `onlyMainContent` now defaults to `true`.
- `/crawl` webhooks and WebSocket support.
Scrape Formats
You can now choose what formats you want your output in. You can specify multiple output formats. Supported formats are:

- Markdown (`markdown`)
- HTML (`html`)
- Raw HTML (`rawHtml`, with no modifications)
- Screenshot (`screenshot` or `screenshot@fullPage`)
- Links (`links`)
- JSON (`json`) - structured output
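As a sketch of how multiple formats combine in one request, here is a hypothetical helper that builds a v1 `/scrape` request body; the format names are the ones listed above, and the validation logic is illustrative, not part of the API:

```python
# Format names listed in the changelog above.
SUPPORTED_FORMATS = {"markdown", "html", "rawHtml", "screenshot",
                     "screenshot@fullPage", "links", "json"}

def build_scrape_request(url, formats):
    """Return a /scrape request body asking for one or more output formats."""
    unknown = set(formats) - SUPPORTED_FORMATS
    if unknown:
        raise ValueError(f"unsupported formats: {sorted(unknown)}")
    return {"url": url, "formats": list(formats)}

body = build_scrape_request("https://example.com", ["markdown", "links"])
```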
Response
SDKs will return the data object directly. cURL will return the payload exactly as shown below.

Introducing /map (Alpha)
The easiest way to go from a single URL to a map of the entire website.

Usage
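As a sketch, a `/map` call is a single POST with the site URL, and the response carries the discovered URLs. The parsing below assumes the v1 response shape (a top-level `success` flag and a `links` array); the URLs are placeholders:

```python
def parse_map_response(payload):
    """Pull the discovered URLs out of a /map response body.

    Assumes the v1 shape: a top-level "success" flag and a "links" array.
    """
    if not payload.get("success", False):
        raise RuntimeError("map request failed")
    return payload["links"]

links = parse_map_response({
    "success": True,
    "links": ["https://example.com/", "https://example.com/blog"],
})
```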
Response
SDKs will return the data object directly. cURL will return the payload exactly as shown below.

WebSockets
To crawl a website with WebSockets, use the Crawl URL and Watch method.
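A minimal sketch of consuming the watched event stream: the event types match the webhook events documented on this page, but the dispatch wiring here is illustrative, not the SDK's actual interface:

```python
def on_crawl_event(event, handlers):
    """Dispatch one streamed crawl event to a handler keyed by its type."""
    handler = handlers.get(event.get("type"))
    return handler(event) if handler else None

pages = []
handlers = {
    # Collect the markdown of each page as it is crawled.
    "crawl.page": lambda e: pages.append(e["data"][0]["markdown"]),
    "crawl.completed": lambda e: print("crawl done"),
}
on_crawl_event({"type": "crawl.page", "data": [{"markdown": "# Hello"}]},
               handlers)
```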
JSON format
LLM extraction is now available in v1 under the `json` format. To extract structured data from a page, you can pass a schema to the endpoint or just provide a prompt.
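For example, a `/scrape` body requesting structured output might look like this. The `jsonOptions` field name follows the migration notes at the end of this page, and the schema itself is an illustrative JSON Schema:

```python
# Sketch of a /scrape body using the json format with an explicit schema.
payload = {
    "url": "https://example.com",
    "formats": ["json"],
    "jsonOptions": {
        "schema": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "price": {"type": "number"},
            },
            "required": ["title"],
        }
    },
}
```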
Extracting without schema (New)
You can now extract without a schema by just passing a `prompt` to the endpoint. The LLM chooses the structure of the data.
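A schema-less request might look like the sketch below; nesting the prompt under `jsonOptions` is an assumption based on the schema example pattern:

```python
# Schema-less extraction: only a prompt is given and the LLM chooses the
# structure of the returned data.
payload = {
    "url": "https://example.com",
    "formats": ["json"],
    "jsonOptions": {"prompt": "Extract the product name and its price."},
}
```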
New Crawl Webhook
You can now pass a `webhook` parameter to the `/crawl` endpoint. This will send a POST request to the URL you specify when the crawl is started, updated, and completed.
The webhook will now trigger for every page crawled and not just the whole result at the end.
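A sketch of a `/crawl` body registering a webhook; both URLs are placeholders:

```python
# Illustrative /crawl request body with a webhook receiver URL.
payload = {
    "url": "https://example.com",
    "limit": 100,
    "webhook": "https://example.com/hooks/firecrawl",
}
```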
Webhook Events
There are now 4 types of events:

- `crawl.started` - Triggered when the crawl is started.
- `crawl.page` - Triggered for every page crawled.
- `crawl.completed` - Triggered when the crawl is completed to let you know it’s done.
- `crawl.failed` - Triggered when the crawl fails.
Webhook Response
- `success` - If the webhook was successful in crawling the page correctly.
- `type` - The type of event that occurred.
- `id` - The ID of the crawl.
- `data` - The data that was scraped (Array). This will only be non-empty on `crawl.page` and will contain 1 item if the page was scraped successfully. The response is the same as the `/scrape` endpoint.
- `error` - If the webhook failed, this will contain the error message.
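A minimal receiver sketch using those fields: it routes on the event type and returns the scraped documents for `crawl.page` events. The handler shape is illustrative; only the field names come from the list above:

```python
def handle_webhook(body):
    """Route an incoming webhook body on its event type."""
    if body.get("error"):
        return ("failed", body["error"])
    if body["type"] == "crawl.page":
        # data holds at most one scraped document per event.
        return ("page", body.get("data", []))
    return (body["type"], None)

kind, docs = handle_webhook({
    "success": True,
    "type": "crawl.page",
    "id": "abc-123",
    "data": [{"markdown": "# Page"}],
})
```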
Migrating from V0
⚠️ Deprecation Notice: V0 endpoints will be deprecated on April 1st, 2025. Please migrate to V1 endpoints before then to ensure uninterrupted service.
/scrape endpoint
The updated `/scrape` endpoint has been redesigned for enhanced reliability and ease of use. The structure of the new `/scrape` request body is as follows:
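As an illustrative sketch of that structure (placeholder values; the options sit at root level rather than under `pageOptions`, per the table below):

```python
# Illustrative v1 /scrape request body: scrape options live at root level.
payload = {
    "url": "https://example.com",
    "formats": ["markdown"],
    "onlyMainContent": True,
    "includeTags": ["article"],
    "excludeTags": ["nav"],
    "waitFor": 1000,
}
```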
Formats
You can now choose what formats you want your output in. You can specify multiple output formats. Supported formats are:

- Markdown (`markdown`)
- HTML (`html`)
- Raw HTML (`rawHtml`, with no modifications)
- Screenshot (`screenshot` or `screenshot@fullPage`)
- Links (`links`)
- JSON (`json`)
Details on the new request body
The table below outlines the changes to the request body parameters for the `/scrape` endpoint in V1.
| Parameter | Change | Description |
|---|---|---|
| `onlyIncludeTags` | Moved and Renamed | Moved to root level and renamed to `includeTags`. |
| `removeTags` | Moved and Renamed | Moved to root level and renamed to `excludeTags`. |
| `onlyMainContent` | Moved | Moved to root level. `true` by default. |
| `waitFor` | Moved | Moved to root level. |
| `headers` | Moved | Moved to root level. |
| `parsePDF` | Moved | Moved to root level. |
| `extractorOptions` | No Change | |
| `timeout` | No Change | |
| `pageOptions` | Removed | No need for the `pageOptions` parameter. The scrape options were moved to root level. |
| `replaceAllPathsWithAbsolutePaths` | Removed | No longer needed; every path now defaults to an absolute path. |
| `includeHtml` | Removed | Add `"html"` to `formats` instead. |
| `includeRawHtml` | Removed | Add `"rawHtml"` to `formats` instead. |
| `screenshot` | Removed | Add `"screenshot"` to `formats` instead. |
| `fullPageScreenshot` | Removed | Add `"screenshot@fullPage"` to `formats` instead. |
| `extractorOptions` | Removed | Use the `"json"` format with the `jsonOptions` object instead. |
The `json` format is described in the llm-extract section.
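The `/scrape` changes above can be sketched as a small migration helper. This is hypothetical code, not an official tool, and it only covers the moved and renamed fields from the table:

```python
def migrate_scrape_body(v0_body):
    """Sketch of the v0 -> v1 /scrape mapping: pageOptions fields move to
    root level, and two of them are renamed."""
    page = v0_body.get("pageOptions", {})
    v1 = {"url": v0_body["url"], "formats": ["markdown"]}
    if "onlyIncludeTags" in page:
        v1["includeTags"] = page["onlyIncludeTags"]
    if "removeTags" in page:
        v1["excludeTags"] = page["removeTags"]
    if "onlyMainContent" in page:
        v1["onlyMainContent"] = page["onlyMainContent"]
    return v1

v1 = migrate_scrape_body({
    "url": "https://example.com",
    "pageOptions": {
        "onlyIncludeTags": ["article"],
        "removeTags": ["nav"],
        "onlyMainContent": True,
    },
})
```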
/crawl endpoint
We’ve also updated the `/crawl` endpoint on v1. Check out the improved request body below:
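As an illustrative sketch of the new structure (placeholder values; field names follow the table below): crawl options move to root level, and per-page options sit under `scrapeOptions`:

```python
# Illustrative v1 /crawl request body.
payload = {
    "url": "https://example.com",
    "limit": 100,
    "includePaths": ["/blog/.*"],
    "maxDepth": 3,
    "scrapeOptions": {"formats": ["markdown"], "onlyMainContent": True},
}
```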
Details on the new request body
The table below outlines the changes to the request body parameters for the `/crawl` endpoint in V1.
| Parameter | Change | Description |
|---|---|---|
| `pageOptions` | Renamed | Renamed to `scrapeOptions`. |
| `includes` | Moved and Renamed | Moved to root level and renamed to `includePaths`. |
| `excludes` | Moved and Renamed | Moved to root level and renamed to `excludePaths`. |
| `allowBackwardCrawling` | Moved and Renamed | Moved to root level and renamed to `allowBackwardLinks`. |
| `allowExternalLinks` | Moved | Moved to root level. |
| `maxDepth` | Moved | Moved to root level. |
| `ignoreSitemap` | Moved | Moved to root level. |
| `limit` | Moved | Moved to root level. |
| `crawlerOptions` | Removed | No need for the `crawlerOptions` parameter. The crawl options were moved to root level. |
| `timeout` | Removed | Use `timeout` in `scrapeOptions` instead. |
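Like the `/scrape` changes, the `/crawl` mapping can be sketched as a small migration helper. This is hypothetical code covering only the moves and renames in the table above:

```python
def migrate_crawl_body(v0_body):
    """Sketch of the v0 -> v1 /crawl mapping: crawlerOptions fields move to
    root level (with two renames) and pageOptions becomes scrapeOptions."""
    crawler = v0_body.get("crawlerOptions", {})
    renames = {
        "includes": "includePaths",
        "excludes": "excludePaths",
        "allowBackwardCrawling": "allowBackwardLinks",
    }
    v1 = {"url": v0_body["url"]}
    for key, value in crawler.items():
        v1[renames.get(key, key)] = value
    if "pageOptions" in v0_body:
        v1["scrapeOptions"] = v0_body["pageOptions"]
    return v1

v1 = migrate_crawl_body({
    "url": "https://example.com",
    "crawlerOptions": {"includes": ["/blog/.*"], "maxDepth": 2},
})
```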

