Crawl
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
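As a minimal sketch of the header described above, the helper below builds it in Python. The function name `bearer_headers` and the `Content-Type` entry are illustrative assumptions and are not taken from this reference.

```python
def bearer_headers(token: str) -> dict[str, str]:
    """Build request headers that carry a Bearer token."""
    if not token:
        raise ValueError("an auth token is required")
    return {
        # Form described in this section: "Bearer <token>".
        "Authorization": f"Bearer {token}",
        # Assumption: the request body is sent as JSON.
        "Content-Type": "application/json",
    }
```

Calling `bearer_headers("my-token")` yields an `Authorization` value of `Bearer my-token`.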
Body
The base URL to start crawling from
Enables the crawler to navigate from a specific URL to previously linked pages. For instance, from 'example.com/product/123' back to 'example.com/product'
Allows the crawler to follow links to external websites.
URL patterns to exclude
Generate alt text for images using LLMs (requires a paid plan)
Ignore the website sitemap when crawling
URL patterns to include
Maximum number of pages to crawl
Maximum depth to crawl relative to the entered URL. A maxDepth of 0 scrapes only the entered URL. A maxDepth of 1 scrapes the entered URL and all pages one level deep. A maxDepth of 2 scrapes the entered URL and all pages up to two levels deep. Higher values follow the same pattern.
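The maxDepth rule above can be illustrated with a small filter. This is only a sketch: it assumes each page already has a known level relative to the entered URL (how the service assigns levels is not specified here), and the helper name `pages_in_scope` and the sample URLs are hypothetical.

```python
def pages_in_scope(levels: dict[str, int], max_depth: int) -> list[str]:
    """Return the URLs whose level is at most max_depth, in insertion order."""
    if max_depth < 0:
        raise ValueError("max_depth must be 0 or greater")
    return [url for url, level in levels.items() if level <= max_depth]


# Hypothetical pages, keyed by URL, with their level relative to the entered URL.
levels = {
    "https://example.com/docs": 0,  # the entered URL itself
    "https://example.com/docs/intro": 1,
    "https://example.com/docs/intro/setup": 2,
    "https://example.com/docs/intro/setup/linux": 3,
}

only_entered = pages_in_scope(levels, 0)  # just the entered URL
two_levels = pages_in_scope(levels, 2)    # entered URL plus levels 1 and 2
```

With a maxDepth of 0 only the entered URL remains; with 2, the level-3 page is the only one left out.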
The crawling mode to use. Fast mode crawls websites without a sitemap 4x faster, but it may be less accurate and should not be used on heavily JavaScript-rendered websites.
Available options: default, fast
If true, the crawl status returns only the URLs as a list. Note: the data in the response will then be a list of URLs, not a list of documents.
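Because the shape of the data changes with this option, client code may want to handle both cases. The sketch below assumes URLs arrive as plain strings and documents as JSON objects; the helper name `split_crawl_data` and the sample document fields are hypothetical.

```python
def split_crawl_data(data: list) -> tuple[list[str], list[dict]]:
    """Separate plain URL strings from document objects in a crawl result."""
    urls = [item for item in data if isinstance(item, str)]
    documents = [item for item in data if isinstance(item, dict)]
    return urls, documents


# Assumed shape when only URLs are requested: a list of strings.
urls, documents = split_crawl_data(
    ["https://example.com", "https://example.com/product"]
)

# Assumed shape otherwise: a list of document objects (fields are made up).
no_urls, docs = split_crawl_data([{"content": "Hello"}])
```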
Include a full page screenshot of the page that you are scraping.
Headers to send with the request. Can be used to send cookies, user-agent, etc.
Include the HTML version of the page content. Will output an html key in the response.
Include the raw HTML content of the page. Will output a rawHtml key in the response.
Include only the specified tags, classes and ids from the page in the final output. Use comma-separated values. Example: 'script, .ad, #footer'
Only return the main content of the page excluding headers, navs, footers, etc.
Tags, classes and ids to remove from the page. Use comma-separated values. Example: 'script, .ad, #footer'
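To make the comma-separated format concrete, the sketch below splits a value such as 'script, .ad, #footer' into tags, classes and ids. It assumes the usual CSS convention that a leading `.` marks a class and a leading `#` marks an id; the helper name `classify_selectors` is hypothetical and says nothing about how the service parses the value internally.

```python
def classify_selectors(value: str) -> dict[str, list[str]]:
    """Split a comma-separated selector string into tags, classes and ids."""
    groups: dict[str, list[str]] = {"tags": [], "classes": [], "ids": []}
    for raw in value.split(","):
        selector = raw.strip()
        if not selector:
            continue  # skip empty entries such as a trailing comma
        if selector.startswith("."):
            groups["classes"].append(selector[1:])
        elif selector.startswith("#"):
            groups["ids"].append(selector[1:])
        else:
            groups["tags"].append(selector)
    return groups


parsed = classify_selectors("script, .ad, #footer")
```

Here `parsed` holds one tag (`script`), one class (`ad`) and one id (`footer`).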
Replace all relative paths with absolute paths for images and links
Include a screenshot of the top of the page that you are scraping.
Wait the specified number of milliseconds for the page to load before fetching content