Batch Scrape
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Body
The URL to send the webhook to. This will trigger for batch scrape started (batch_scrape.started), every page scraped (batch_scrape.page), and when the batch scrape is completed (batch_scrape.completed or batch_scrape.failed). The response will be the same as the /scrape endpoint.
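A receiving endpoint might branch on the event type as sketched below. Only the event names come from this reference; the payload field names (type, data) are assumptions.

```typescript
// Hypothetical webhook receiver; field names other than the event types are assumed.
import { createServer } from "node:http";

createServer((req, res) => {
  let body = "";
  req.on("data", (chunk) => (body += chunk));
  req.on("end", () => {
    const event = JSON.parse(body); // assumed shape: { type: string, data?: unknown }
    switch (event.type) {
      case "batch_scrape.started":
        console.log("batch started");
        break;
      case "batch_scrape.page":
        console.log("page scraped", event.data); // same shape as a /scrape response, per the description above
        break;
      case "batch_scrape.completed":
      case "batch_scrape.failed":
        console.log("batch finished:", event.type);
        break;
    }
    res.writeHead(200).end();
  });
}).listen(3000);
```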
Formats to include in the output. Available options: markdown, html, rawHtml, links, screenshot, extract, screenshot@fullPage.
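For example, a request body asking for several formats might look like the sketch below. The format names come from the list above; the formats field name is assumed from the description, and the URLs are placeholders.

```typescript
// Illustrative request body; only the format names are taken from the list above.
const body = {
  urls: ["https://example.com", "https://example.com/pricing"],
  formats: ["markdown", "links", "screenshot@fullPage"],
};
```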
Only return the main content of the page, excluding headers, navs, footers, etc.
Tags to include in the output.
Tags to exclude from the output.
Headers to send with the request. Can be used to send cookies, user-agent, etc.
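A sketch combining these content-selection and header options is shown below. The exact parameter names (onlyMainContent, includeTags, excludeTags, headers) are assumptions inferred from the descriptions above; the values only illustrate intent.

```typescript
// Assumed parameter names; the values show the behavior described above.
const scrapeOptions = {
  onlyMainContent: true,                 // drop headers, navs, footers, etc.
  includeTags: ["article", "main"],      // tags to include in the output
  excludeTags: ["aside", ".ad-banner"],  // tags to exclude from the output
  headers: {
    "User-Agent": "MyCrawler/1.0",       // headers can carry user-agent, cookies, etc.
    Cookie: "session=abc123",
  },
};
```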
Specify a delay in milliseconds before fetching the content, allowing the page sufficient time to load.
Set to true if you want to emulate scraping from a mobile device. Useful for testing responsive pages and taking mobile screenshots.
Skip TLS certificate verification when making requests.
Timeout in milliseconds for the request.
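These page-loading controls could be combined as in the sketch below; the field names are assumptions inferred from the descriptions, not a definitive schema.

```typescript
// Assumed field names for the loading-related options described above.
const loadingOptions = {
  waitFor: 2000,               // delay in milliseconds before fetching the content
  mobile: true,                // emulate a mobile device for responsive pages and screenshots
  skipTlsVerification: false,  // skip TLS certificate verification when true
  timeout: 30000,              // request timeout in milliseconds
};
```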
Extract object
Actions to perform on the page before grabbing the content.
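As a hedged sketch, an actions list might look like the following; the action types and their fields are assumptions for illustration, not a definitive schema.

```typescript
// Hypothetical action objects; consult the full actions reference for the real schema.
const actions = [
  { type: "wait", milliseconds: 1000 },       // let dynamic content settle
  { type: "click", selector: "#load-more" },  // interact with the page
  { type: "scroll", direction: "down" },      // bring lazy-loaded content into view
  { type: "screenshot" },                     // capture the page state before scraping
];
```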
Location settings for the request. When specified, this will use an appropriate proxy if available and emulate the corresponding language and timezone settings. Defaults to 'US' if not specified.
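A location object might be shaped like this sketch; the country and languages field names are assumptions based on the description above.

```typescript
// Assumed shape for the location settings; defaults to the US when omitted.
const location = {
  country: "DE",         // country used to pick an appropriate proxy if available
  languages: ["de-DE"],  // emulated language/locale settings
};
```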
Removes all base64 images from the output, as they can be overwhelmingly long. The image's alt text remains in the output, but the URL is replaced with a placeholder.
If invalid URLs are specified in the urls array, they will be ignored. Instead of failing the entire request, a batch scrape is created using the remaining valid URLs, and the invalid URLs are returned in the invalidURLs field of the response.
Response
If ignoreInvalidURLs is true, this is an array containing the invalid URLs that were specified in the request. If there were no invalid URLs, this will be an empty array. If ignoreInvalidURLs is false, this field will be undefined.
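The relevant part of the response could be typed as in the sketch below. Only invalidURLs comes from the description above; the other fields are assumptions for illustration.

```typescript
// Sketch of the response shape; only invalidURLs is taken from this reference.
interface BatchScrapeResponse {
  success?: boolean;       // assumed
  id?: string;             // assumed job identifier for the batch scrape
  invalidURLs?: string[];  // present (possibly empty) when ignoreInvalidURLs is true; undefined otherwise
}
```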