What's new in v2
Direct what to crawl
Improved sitemap control
The sitemap option lets you choose between:
"include" (default): uses the sitemap while also discovering other pages.
"skip": ignores the sitemap entirely.
New crawl options
crawlEntireDomain - crawl the whole domain, not just child pages
maxDiscoveryDepth - control crawl depth (replaces maxDepth)
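A minimal sketch of a v2 crawl request that combines these options. Only sitemap, crawlEntireDomain, and maxDiscoveryDepth come from this page; the endpoint path and the FIRECRAWL_API_KEY environment variable are assumptions.

```python
import os

import requests

payload = {
    "url": "https://firecrawl.dev",   # base URL to start crawling from
    "sitemap": "include",             # or "skip" to ignore the sitemap entirely
    "crawlEntireDomain": True,        # follow parent/sibling links, not just children
    "maxDiscoveryDepth": 2,           # replaces the old maxDepth option
}

response = requests.post(
    "https://api.firecrawl.dev/v2/crawl",  # assumed endpoint
    headers={"Authorization": f"Bearer {os.environ['FIRECRAWL_API_KEY']}"},
    json=payload,
    timeout=30,
)
print(response.json())
```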
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
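A hedged sketch of how that header is sent. Only the Bearer <token> format comes from this section; the endpoint URL and the environment variable name are assumptions.

```python
import os

import requests

token = os.environ["FIRECRAWL_API_KEY"]         # your auth token (assumed env var)
headers = {"Authorization": f"Bearer {token}"}  # Bearer authentication header

resp = requests.post(
    "https://api.firecrawl.dev/v2/crawl",  # assumed endpoint
    headers=headers,
    json={"url": "https://firecrawl.dev"},
    timeout=30,
)
print(resp.status_code)
```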
Body
The base URL to start crawling from
A natural-language prompt used to generate the crawler options (all of the parameters below). Explicitly set parameters override the generated equivalents.
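A sketch of a request body that relies on the prompt while overriding one generated value. The field names prompt and limit are assumptions based on the descriptions on this page.

```python
import json

payload = {
    "url": "https://firecrawl.dev",
    # Natural-language prompt used to generate the crawler options (assumed field name).
    "prompt": "Crawl only the blog posts and the docs, skip marketing pages",
    # Explicitly set parameters override the generated equivalents.
    "limit": 100,
}
print(json.dumps(payload, indent=2))
```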
URL pathname regex patterns that exclude matching URLs from the crawl. For example, if you set "excludePaths": ["blog/.*"] for the base URL firecrawl.dev, any results matching that pattern will be excluded, such as https://www.firecrawl.dev/blog/firecrawl-launch-week-1-recap.
URL pathname regex patterns that include matching URLs in the crawl. Only the paths that match the specified patterns will be included in the response. For example, if you set "includePaths": ["blog/.*"] for the base URL firecrawl.dev, only results matching that pattern will be included, such as https://www.firecrawl.dev/blog/firecrawl-launch-week-1-recap.
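A sketch combining both pattern lists for the firecrawl.dev example above: blog posts are included, while a hypothetical blog/tag/ index is excluded.

```python
import json

payload = {
    "url": "https://firecrawl.dev",
    "includePaths": ["blog/.*"],      # only URLs matching these pathname regexes are crawled
    "excludePaths": ["blog/tag/.*"],  # matching URLs are dropped even if included above
}
print(json.dumps(payload, indent=2))
```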
Maximum depth to crawl based on discovery order. The root site and sitemapped pages have a discovery depth of 0. For example, if you set it to 1 and you set sitemap: 'skip', you will only crawl the entered URL and all URLs that are linked on that page.
Sitemap mode when crawling. If you set it to 'skip', the crawler will ignore the website sitemap and only crawl the entered URL and discover pages from there onwards.
Available options: skip, include
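A sketch of the example described above: with sitemap set to 'skip' and a discovery depth of 1, only the entered URL (depth 0) and pages linked directly from it (depth 1) are crawled.

```python
import json

payload = {
    "url": "https://firecrawl.dev",
    "sitemap": "skip",        # ignore the website sitemap
    "maxDiscoveryDepth": 1,   # entered URL (0) plus directly linked pages (1)
}
print(json.dumps(payload, indent=2))
```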
Do not re-scrape the same path with different (or no) query parameters.
Maximum number of pages to crawl. Default limit is 10000.
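A sketch of a capped crawl that skips query-string variants of the same path. The field names ignoreQueryParameters and limit are assumptions; only the behaviour is described on this page.

```python
import json

payload = {
    "url": "https://firecrawl.dev",
    "ignoreQueryParameters": True,  # assumed name: skip ?utm_source=... duplicates
    "limit": 500,                   # assumed name: maximum pages to crawl (default 10000)
}
print(json.dumps(payload, indent=2))
```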
Allows the crawler to follow internal links to sibling or parent URLs, not just child paths.
false: Only crawls deeper (child) URLs. e.g. /features/feature-1 → /features/feature-1/tips ✅; won't follow /pricing or / ❌
true: Crawls any internal links, including siblings and parents. e.g. /features/feature-1 → /pricing, /, etc. ✅
Use true for broader internal coverage beyond nested paths.
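A sketch of the broader mode: starting from a nested page, crawlEntireDomain set to true lets the crawler climb to /pricing and /, while false would keep it under the starting path.

```python
import json

payload = {
    "url": "https://firecrawl.dev/features/feature-1",
    "crawlEntireDomain": True,  # false would restrict the crawl to child paths only
}
print(json.dumps(payload, indent=2))
```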
Allows the crawler to follow links to external websites.
Allows the crawler to follow links to subdomains of the main domain.
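A sketch of the two link-following switches; the field names allowExternalLinks and allowSubdomains are assumptions for the options described above.

```python
import json

payload = {
    "url": "https://firecrawl.dev",
    "allowExternalLinks": False,  # assumed name: do not leave the site
    "allowSubdomains": True,      # assumed name: do follow e.g. docs.firecrawl.dev
}
print(json.dumps(payload, indent=2))
```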
Delay in seconds between scrapes. This helps respect website rate limits.
Maximum number of concurrent scrapes. This parameter allows you to set a concurrency limit for this crawl. If not specified, the crawl adheres to your team's concurrency limit.
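A sketch of the throttling options; the field names delay and maxConcurrency are assumptions for the two settings described above.

```python
import json

payload = {
    "url": "https://firecrawl.dev",
    "delay": 2,           # assumed name: seconds between scrapes, to respect rate limits
    "maxConcurrency": 5,  # assumed name: cap concurrent scrapes for this crawl only
}
print(json.dumps(payload, indent=2))
```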
A webhook specification object.
If true, this will enable zero data retention for this crawl. To enable this feature, please contact help@firecrawl.dev.
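A sketch covering the last two fields. The webhook sub-fields and the names webhook and zeroDataRetention are assumptions; this page only says the webhook is a specification object and that zero data retention must first be enabled via help@firecrawl.dev.

```python
import json

payload = {
    "url": "https://firecrawl.dev",
    "webhook": {  # assumed shape of the webhook specification object
        "url": "https://example.com/firecrawl-events",
        "headers": {"X-Signature": "shared-secret"},
    },
    "zeroDataRetention": True,  # assumed name; feature must be enabled for your account
}
print(json.dumps(payload, indent=2))
```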