Crawl
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
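As a minimal sketch of the header format described above, the helper below builds the value in Python. The function name `bearer_header` and the placeholder token are illustrative assumptions; the header name `Authorization` is the standard one for Bearer authentication.

```python
def bearer_header(token: str) -> dict:
    """Return an Authorization header of the form 'Bearer <token>'."""
    token = token.strip()
    if not token:
        raise ValueError("auth token must not be empty")
    return {"Authorization": f"Bearer {token}"}


# "YOUR_TOKEN" is a placeholder, not a real credential.
headers = bearer_header("YOUR_TOKEN")
```

The resulting dictionary can be passed as the headers of whatever HTTP client is used to call the endpoint.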
The base URL to start crawling from
URL patterns to include
URL patterns to exclude
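The text does not say which pattern syntax the include and exclude options accept. The sketch below assumes glob-style patterns matched against the URL path, and assumes that exclusions take precedence over inclusions; both are assumptions for illustration, and `is_allowed` is a hypothetical helper, not part of the API.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def is_allowed(url, includes=(), excludes=()):
    """Decide whether a URL passes the include/exclude filters.

    Assumption: patterns are globs matched against the URL path
    (without the leading slash), and excludes win over includes.
    """
    path = urlparse(url).path.lstrip("/")
    if any(fnmatchcase(path, pattern) for pattern in excludes):
        return False
    if includes:
        return any(fnmatchcase(path, pattern) for pattern in includes)
    # With no include patterns, everything not excluded is allowed.
    return True
```

Under these assumptions, `is_allowed("https://example.com/blog/post-1", includes=["blog/*"])` is true, while the same call for `https://example.com/about` is false.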
Generate alt text for images using LLMs (requires a paid plan)
If true, the crawl status returns only the URLs as a list. Note: the data field of the response will then contain a list of URLs, not a list of documents.
Maximum depth to crawl relative to the entered URL. A maxDepth of 0 scrapes only the entered URL. A maxDepth of 1 scrapes the entered URL and all pages one level deep. A maxDepth of 2 scrapes the entered URL and all pages up to two levels deep. Higher values follow the same pattern.
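The depth rule above can be illustrated with a breadth-first traversal over a small in-memory link graph. This is a sketch only: it interprets a "level" as one link hop from the entered URL, which is an assumption (the service may measure depth differently, for example by URL path segments). The names `crawl_order` and `SITE` are hypothetical.

```python
from collections import deque


def crawl_order(links, start, max_depth):
    """Return the pages reached from `start` within `max_depth` link hops."""
    seen = {start}
    queue = deque([(start, 0)])
    visited = []
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return visited


# Toy site: each key lists the pages it links to.
SITE = {
    "/": ["/docs", "/blog"],
    "/docs": ["/docs/api"],
    "/blog": ["/blog/post-1"],
}
```

With this toy graph, `crawl_order(SITE, "/", 0)` returns `["/"]`, a depth of 1 returns `["/", "/docs", "/blog"]`, and a depth of 2 also adds `"/docs/api"` and `"/blog/post-1"`.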
The crawling mode to use. Fast mode crawls websites without a sitemap up to 4x faster, but may be less accurate and shouldn't be used on heavily JavaScript-rendered websites.
Ignore the website sitemap when crawling
Maximum number of pages to crawl
Enables the crawler to navigate from a specific URL to previously linked pages. For instance, from 'example.com/product/123' back to 'example.com/product'
Allows the crawler to follow links to external websites.
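To make the distinction between the two link options above concrete, the sketch below labels a discovered link relative to the start URL. The three labels and the helper `classify_link` are assumptions made for illustration; the text does not describe how the crawler itself decides. Full URLs with a scheme are used so that `urlparse` separates host and path.

```python
from urllib.parse import urlparse


def classify_link(start_url, link):
    """Label a link as 'external', 'forward', or 'backward' (hypothetical labels)."""
    start, target = urlparse(start_url), urlparse(link)
    if target.netloc != start.netloc:
        return "external"  # different host: needs external links enabled
    base = start.path.rstrip("/")
    if target.path == base or target.path.startswith(base + "/"):
        return "forward"  # at or below the start path
    return "backward"  # same host, but outside the start path
```

For the example in the text, `classify_link("https://example.com/product/123", "https://example.com/product")` returns `"backward"`.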
Headers to send with the request. Can be used to send cookies, user-agent, etc.
Include the HTML version of the page content. Will output an html key in the response.
Include the raw HTML content of the page. Will output a rawHtml key in the response.
Only include the listed tags, classes and ids from the page in the final output. Use comma-separated values. Example: 'script, .ad, #footer'
Only return the main content of the page excluding headers, navs, footers, etc.
Tags, classes and ids to remove from the page. Use comma-separated values. Example: 'script, .ad, #footer'
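The comma-separated format used by the include and remove options can be read as a mix of tag names, `.class` entries, and `#id` entries. The sketch below splits such a string into those three groups; `parse_selectors` is a hypothetical helper, and how the service itself parses the value is not stated in the text.

```python
def parse_selectors(value):
    """Split a string like 'script, .ad, #footer' into tags, classes and ids."""
    groups = {"tags": [], "classes": [], "ids": []}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # skip empty entries such as a trailing comma
        if item.startswith("."):
            groups["classes"].append(item[1:])
        elif item.startswith("#"):
            groups["ids"].append(item[1:])
        else:
            groups["tags"].append(item)
    return groups
```

For the example value, `parse_selectors("script, .ad, #footer")` returns `{"tags": ["script"], "classes": ["ad"], "ids": ["footer"]}`.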
Replace all relative paths with absolute paths for images and links
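What replacing relative paths with absolute ones means can be sketched with the standard library's `urljoin`. This is an illustration of the transformation, not the service's implementation; `absolutize` and the example URLs are assumptions.

```python
from urllib.parse import urljoin


def absolutize(page_url, paths):
    """Resolve each image or link path against the URL of the page it came from."""
    return [urljoin(page_url, path) for path in paths]


resolved = absolutize(
    "https://example.com/docs/guide/",
    ["../img/logo.png", "/about", "intro.html", "https://cdn.example.net/a.js"],
)
```

Here `resolved` is `["https://example.com/docs/img/logo.png", "https://example.com/about", "https://example.com/docs/guide/intro.html", "https://cdn.example.net/a.js"]`; paths that are already absolute are left unchanged.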
Include a screenshot of the top of the page that you are scraping.
Include a full page screenshot of the page that you are scraping.
Wait the specified number of milliseconds for the page to load before fetching its content
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Body
The base URL to start crawling from