Crawl
Firecrawl can recursively search through a URL's subdomains and gather the content.
Firecrawl thoroughly crawls websites, ensuring comprehensive data extraction while bypassing any web blocker mechanisms. Here’s how it works:
- URL Analysis: Begins with a specified URL, identifying links by looking at the sitemap and then crawling the website. If no sitemap is found, it crawls the website by following links.
- Recursive Traversal: Recursively follows each link to uncover all subpages.
- Content Scraping: Gathers content from every visited page while handling complexities like JavaScript rendering or rate limits.
- Result Compilation: Converts the collected data into clean markdown or structured output, ready for LLM processing or any other task (an illustrative result item is sketched after this list).
This method guarantees an exhaustive crawl and data collection from any starting URL.
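As a rough sketch of what the compiled output looks like, each crawled page is returned as a record pairing the cleaned markdown with page metadata. The field names below are assumptions based on the v0 response format and may vary by version:

```python
# Hypothetical shape of one compiled result item (field names assumed
# from the v0 response format; actual fields may vary by version).
page = {
    "markdown": "# Example Domain\n\nThis domain is for use in ...",
    "metadata": {
        "title": "Example Domain",
        "sourceURL": "https://example.com",
    },
}
```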
Crawling
/crawl endpoint
Used to crawl a URL and all accessible subpages. This submits a crawl job and returns a job ID to check the status of the crawl.
Installation
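For example, the Python SDK can be installed from PyPI (the package name firecrawl-py assumes the Python SDK; other SDKs and the raw HTTP API are also available):

```bash
pip install firecrawl-py
```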
Usage
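A minimal sketch using the Python SDK, assuming the v0 client where crawl_url accepts a params dict and a wait_until_done flag; option names like crawlerOptions.limit may differ by version:

```python
from firecrawl import FirecrawlApp

# Initialize the client with your Firecrawl API key.
app = FirecrawlApp(api_key="YOUR_API_KEY")

# Submit a crawl job and block until it finishes.
# crawlerOptions.limit caps how many pages are crawled.
crawl_result = app.crawl_url(
    "https://example.com",
    params={"crawlerOptions": {"limit": 50}},
    wait_until_done=True,
)

# Each entry is one crawled page, with its content as markdown.
for page in crawl_result:
    print(page["metadata"]["sourceURL"])
```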
Job ID Response
If you are not using the SDK, or prefer to use a webhook or a different polling method, you can set wait_until_done to false. This will return a jobId.
For cURL, /crawl will always return a jobId that you can use to check the status of the crawl.
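With the Python SDK, this non-blocking mode looks roughly like the sketch below (assuming the v0 client, where the response is a dict carrying the job ID):

```python
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="YOUR_API_KEY")

# With wait_until_done=False the call returns immediately with a job ID
# instead of the crawled pages.
job = app.crawl_url("https://example.com", wait_until_done=False)
print(job["jobId"])
```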
Check Crawl Job
Used to check the status of a crawl job and get its result.
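A minimal sketch with the Python SDK, assuming the v0 client's check_crawl_status method and a response dict with status and data fields:

```python
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="YOUR_API_KEY")

# Look up the crawl job by the jobId returned from /crawl.
status = app.check_crawl_status("JOB_ID")
print(status["status"])           # e.g. "active" or "completed"
if status["status"] == "completed":
    pages = status["data"]        # the crawled pages, as in the blocking call
```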