Firecrawl can recursively search through a URL's subdomains and gather the content. By default, the crawler only follows links deeper within the starting URL's path. To also crawl parent and sibling pages of the starting URL, use the `crawlEntireDomain` parameter. To crawl subdomains like blog.website.com when crawling website.com, use the `allowSubdomains` parameter.

You can customize how each page is scraped via `scrapeOptions` (JS) / `scrape_options` (Python). These apply to every page the crawler scrapes: formats, proxy, caching, actions, location, tags, etc. See the full list in the Scrape API Reference.
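To show how these parameters fit together, here is a sketch of a crawl request body. The `url` and `limit` values are illustrative; field names follow the JS-style casing used by the REST API.

```python
# Sketch of a crawl request body combining the parameters above.
crawl_request = {
    "url": "https://website.com",
    "limit": 100,                # cap the number of pages crawled
    "crawlEntireDomain": True,   # also follow parent/sibling pages
    "allowSubdomains": True,     # include e.g. blog.website.com
    "scrapeOptions": {           # applied to every page scraped
        "formats": ["markdown"],
    },
}
```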
Starting a crawl returns a job `ID` that you can use to check the status of the crawl.
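A minimal polling sketch using that job ID, assuming a v1-style `GET /v1/crawl/{id}` status endpoint. The HTTP call is injected as a callable so the loop itself can be exercised without the network:

```python
import time

def wait_for_crawl(crawl_id, get_status, interval=0.0, max_polls=100):
    """Poll a crawl job until it reaches a terminal status.

    get_status: callable taking a status URL and returning the parsed
    JSON response as a dict with at least a "status" field.
    """
    url = f"https://api.firecrawl.dev/v1/crawl/{crawl_id}"
    for _ in range(max_polls):
        job = get_status(url)
        if job["status"] in ("completed", "failed"):
            return job
        time.sleep(interval)
    raise TimeoutError("crawl did not finish within max_polls")

# Example with a fake status function standing in for the real HTTP call:
responses = iter([{"status": "scraping"}, {"status": "completed", "data": []}])
result = wait_for_crawl("abc-123", lambda url: next(responses))
# result["status"] == "completed"
```

In real use, `get_status` would wrap an authenticated HTTP GET and parse the JSON body.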
If the response exceeds 10MB, a `next` URL parameter is provided. You must request this URL to retrieve the next 10MB of data. If the `next` parameter is absent, it indicates the end of the crawl data. The `skip` parameter sets the maximum number of results returned for each chunk.
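The chunked responses can be drained with a simple loop that follows `next` until it is absent. This is a sketch, with the HTTP fetch injected as a callable so the pagination logic is testable; the URLs below are illustrative:

```python
def fetch_all_chunks(first_url, get_json):
    """Follow `next` links until absent, collecting each chunk's data.

    get_json: callable that fetches a URL and returns the parsed JSON
    response (injected so this runs without the network).
    """
    documents = []
    url = first_url
    while url:
        page = get_json(url)
        documents.extend(page.get("data", []))
        # A missing `next` field marks the end of the crawl data.
        url = page.get("next")
    return documents

# Fake two-chunk crawl result to illustrate the loop:
pages = {
    "https://api.firecrawl.dev/v1/crawl/abc": {
        "data": [1, 2],
        "next": "https://api.firecrawl.dev/v1/crawl/abc?skip=2",
    },
    "https://api.firecrawl.dev/v1/crawl/abc?skip=2": {"data": [3]},
}
docs = fetch_all_chunks("https://api.firecrawl.dev/v1/crawl/abc", pages.__getitem__)
# docs == [1, 2, 3]
```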
The SDKs offer two ways to run a crawl: a waiter method (`crawl`), which blocks until the crawl completes and returns the data, and a starter method (`startCrawl` in JS / `start_crawl` in Python), which returns immediately with a job ID you can poll.
Firecrawl's WebSocket-based method, Crawl URL and Watch, enables real-time data extraction and monitoring. Start a crawl with a URL and customize it with options like page limits, allowed domains, and output formats, making it ideal for immediate data processing needs.
A crawl emits the following webhook events:

- `crawl.started` - When the crawl begins
- `crawl.page` - For each page successfully scraped
- `crawl.completed` - When the crawl finishes
- `crawl.failed` - If the crawl encounters an error
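A minimal webhook handler might dispatch on these event types. This sketch assumes the payload carries the event name in a `type` field and failure details in an `error` field; check the webhook documentation for the exact payload shape:

```python
def handle_webhook(event):
    """Dispatch a crawl webhook payload by its event type (assumed
    to arrive in a `type` field)."""
    etype = event.get("type")
    if etype == "crawl.started":
        return "crawl began"
    if etype == "crawl.page":
        # One event per page successfully scraped.
        return "page scraped"
    if etype == "crawl.completed":
        return "crawl finished"
    if etype == "crawl.failed":
        return f"crawl error: {event.get('error')}"
    return "ignored"
```

In a real service this function would sit behind the HTTP endpoint you registered as the webhook URL, with each branch writing to storage or a queue instead of returning a string.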