Basic scraping with Firecrawl (/scrape)
To scrape a single page and get clean markdown content, you can use the /scrape endpoint.
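A minimal sketch of such a request, assuming the hosted v0 REST API at https://api.firecrawl.dev and an API key exported as FIRECRAWL_API_KEY (both assumptions; adjust to your own deployment or SDK):

```python
import os
import requests

# Assumed: hosted v0 endpoint and an API key in the FIRECRAWL_API_KEY env var.
response = requests.post(
    "https://api.firecrawl.dev/v0/scrape",
    headers={"Authorization": f"Bearer {os.environ['FIRECRAWL_API_KEY']}"},
    json={"url": "https://docs.firecrawl.dev"},
)
response.raise_for_status()

# The scraped page is returned under "data"; "markdown" holds the cleaned content.
print(response.json()["data"]["markdown"])
```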
Scraping PDFs
Firecrawl supports scraping PDFs by default. You can use the /scrape endpoint to scrape a PDF link and get the text content of the PDF. You can disable this by setting pageOptions.parsePDF to false.
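For example, pointing the same endpoint at a PDF link returns the extracted text by default; the sketch below uses a hypothetical PDF URL, and the commented-out option shows where parsing could be disabled:

```python
import os
import requests

response = requests.post(
    "https://api.firecrawl.dev/v0/scrape",
    headers={"Authorization": f"Bearer {os.environ['FIRECRAWL_API_KEY']}"},
    json={
        "url": "https://example.com/whitepaper.pdf",  # hypothetical PDF link
        # "pageOptions": {"parsePDF": False},         # uncomment to skip PDF parsing
    },
)
# With parsePDF left at its default (true), the PDF's text content is returned.
print(response.json()["data"]["content"])
```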
Page Options
When using the /scrape endpoint, you can customize the scraping behavior with the pageOptions parameter. Here are the available options:
Getting cleaner content with onlyMainContent
- Type: boolean
- Description: Only return the main content of the page, excluding headers, navigation bars, footers, etc.
- Default: false
Getting the HTML with includeHtml
- Type: boolean
- Description: Include the HTML version of the content of the page. This will add an html key in the response.
- Default: false
Getting the raw HTML with includeRawHtml
- Type: boolean
- Description: Include the raw HTML content of the page. This will add a rawHtml key in the response.
- Default: false
Getting a screenshot of the page with screenshot
- Type: boolean
- Description: Include a screenshot of the top of the page that you are scraping.
- Default: false
Waiting for the page to load with waitFor
- Type: integer
- Description: To be used only as a last resort. Wait for a specified number of milliseconds for the page to load before fetching content.
- Default: 0
Example Usage
- Return only the main content of the page.
- Include the HTML version of the content in the response in the html key.
- Wait for 5000 milliseconds (5 seconds) for the page to load before fetching the content.
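A request sketch matching the bullets above (again assuming the hosted v0 endpoint and a FIRECRAWL_API_KEY environment variable):

```python
import os
import requests

payload = {
    "url": "https://docs.firecrawl.dev",
    "pageOptions": {
        "onlyMainContent": True,  # strip headers, navigation bars, footers, etc.
        "includeHtml": True,      # adds an "html" key to the returned data
        "waitFor": 5000,          # wait 5 seconds before fetching the content
    },
}
response = requests.post(
    "https://api.firecrawl.dev/v0/scrape",
    headers={"Authorization": f"Bearer {os.environ['FIRECRAWL_API_KEY']}"},
    json=payload,
)
data = response.json()["data"]
print(data["markdown"][:300])
print(data["html"][:300])
```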
Extractor Options
When using the /scrape endpoint, you can specify options for extracting structured information from the page content using the extractorOptions parameter. Here are the available options:
mode
- Type: string
- Enum: ["llm-extraction", "llm-extraction-from-raw-html"]
- Description: The extraction mode to use. llm-extraction extracts information from the cleaned and parsed content; llm-extraction-from-raw-html extracts information directly from the raw HTML.
extractionPrompt
- Type: string
- Description: A prompt describing what information to extract from the page.
extractionSchema
- Type: object
- Description: The schema for the data to be extracted. This defines the structure of the extracted data.
Example Usage
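As a sketch, the request below asks for LLM extraction with a prompt and a JSON Schema. The target URL, prompt, schema fields, and the assumed llm_extraction key in the response are illustrative only:

```python
import os
import requests

payload = {
    "url": "https://news.ycombinator.com",  # hypothetical target page
    "extractorOptions": {
        "mode": "llm-extraction",
        "extractionPrompt": "Extract the title of the top story on the page.",
        "extractionSchema": {
            "type": "object",
            "properties": {"top_story_title": {"type": "string"}},
            "required": ["top_story_title"],
        },
    },
}
response = requests.post(
    "https://api.firecrawl.dev/v0/scrape",
    headers={"Authorization": f"Bearer {os.environ['FIRECRAWL_API_KEY']}"},
    json=payload,
)
# The structured result is assumed to come back alongside the page content.
print(response.json()["data"].get("llm_extraction"))
```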
Adjusting Timeout
You can adjust the timeout for the scraping process using the timeout parameter in milliseconds.
Example Usage
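For example, the sketch below allows up to 50 seconds for the scrape before the request fails (endpoint and key handling as in the earlier examples):

```python
import os
import requests

response = requests.post(
    "https://api.firecrawl.dev/v0/scrape",
    headers={"Authorization": f"Bearer {os.environ['FIRECRAWL_API_KEY']}"},
    json={
        "url": "https://docs.firecrawl.dev",
        "timeout": 50000,  # in milliseconds: give the scrape up to 50 seconds
    },
)
print(response.json()["data"]["markdown"])
```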
Crawling Multiple Pages
To crawl multiple pages, you can use the /crawl endpoint. This endpoint allows you to specify a base URL you want to crawl, and all accessible subpages will be crawled.
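Crawling runs as a background job: the request returns a job id that you use to check for results. A minimal sketch, assuming the hosted v0 endpoint:

```python
import os
import requests

response = requests.post(
    "https://api.firecrawl.dev/v0/crawl",
    headers={"Authorization": f"Bearer {os.environ['FIRECRAWL_API_KEY']}"},
    json={"url": "https://docs.firecrawl.dev"},
)
# The crawl runs in the background; keep the job id to check on it later.
print(response.json()["jobId"])
```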
Check Crawl Job
Used to check the status of a crawl job and get its result.
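A polling sketch, assuming a status endpoint of the form /v0/crawl/status/{jobId} and a data array of crawled pages in the completed response:

```python
import os
import time
import requests

headers = {"Authorization": f"Bearer {os.environ['FIRECRAWL_API_KEY']}"}
job_id = "YOUR_JOB_ID"  # the id returned by the /crawl request above

while True:
    status = requests.get(
        f"https://api.firecrawl.dev/v0/crawl/status/{job_id}", headers=headers
    ).json()
    if status.get("status") == "completed":
        # Each entry in "data" is assumed to be one crawled page.
        print(f"Crawled {len(status['data'])} pages")
        break
    time.sleep(5)  # wait before polling again
```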
Crawler Options
When using the /crawl endpoint, you can customize the crawling behavior with the crawlerOptions parameter. Here are the available options:
includes
- Type: array
- Description: URL patterns to include in the crawl. Only URLs matching these patterns will be crawled.
- Example: ["/blog/*", "/products/*"]
excludes
- Type: array
- Description: URL patterns to exclude from the crawl. URLs matching these patterns will be skipped.
- Example: ["/admin/*", "/login/*"]
returnOnlyUrls
- Type: boolean
- Description: If set to true, the response will only include a list of URLs instead of the full document data.
- Default: false
maxDepth
- Type: integer
- Description: Maximum depth to crawl relative to the entered URL. A maxDepth of 0 scrapes only the entered URL. A maxDepth of 1 scrapes the entered URL and all pages one level deep. A maxDepth of 2 scrapes the entered URL and all pages up to two levels deep. Higher values follow the same pattern.
- Example: 2
mode
- Type: string
- Enum: ["default", "fast"]
- Description: The crawling mode to use. fast mode crawls websites without a sitemap 4x faster, but may be less accurate and is not recommended for heavily JavaScript-rendered websites.
- Default: default
limit
- Type: integer
- Description: Maximum number of pages to crawl.
- Default: 10000
Example Usage
- Only crawl URLs that match the patterns /blog/* and /products/*.
- Skip URLs that match the patterns /admin/* and /login/*.
- Return the full document data for each page.
- Crawl up to a maximum depth of 2.
- Use the fast crawling mode.
- Crawl a maximum of 1000 pages.
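A request sketch matching the bullets above (hosted v0 endpoint and FIRECRAWL_API_KEY assumed; the base URL is a placeholder):

```python
import os
import requests

payload = {
    "url": "https://www.example.com",  # placeholder base URL to crawl
    "crawlerOptions": {
        "includes": ["/blog/*", "/products/*"],
        "excludes": ["/admin/*", "/login/*"],
        "returnOnlyUrls": False,  # return full document data, not just URLs
        "maxDepth": 2,
        "mode": "fast",
        "limit": 1000,
    },
}
response = requests.post(
    "https://api.firecrawl.dev/v0/crawl",
    headers={"Authorization": f"Bearer {os.environ['FIRECRAWL_API_KEY']}"},
    json=payload,
)
print(response.json()["jobId"])
```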
Page Options + Crawler Options
You can combine the pageOptions and crawlerOptions parameters to customize both the crawling behavior and how each crawled page is scraped.
Example Usage
- Return only the main content for each page.
- Include the raw HTML content for each page.
- Wait for 5000 milliseconds for each page to load before fetching its content.
- Only crawl URLs that match the patterns /blog/* and /products/*.
- Crawl up to a maximum depth of 2.
- Use the fast crawling mode.
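Putting both parameter groups in one request, the sketch below mirrors the bullets above, with the same assumptions as the earlier crawl examples:

```python
import os
import requests

payload = {
    "url": "https://www.example.com",  # placeholder base URL to crawl
    "pageOptions": {
        "onlyMainContent": True,   # main content only for each page
        "includeRawHtml": True,    # raw HTML for each page
        "waitFor": 5000,           # wait 5 seconds per page before fetching
    },
    "crawlerOptions": {
        "includes": ["/blog/*", "/products/*"],
        "maxDepth": 2,
        "mode": "fast",
    },
}
response = requests.post(
    "https://api.firecrawl.dev/v0/crawl",
    headers={"Authorization": f"Bearer {os.environ['FIRECRAWL_API_KEY']}"},
    json=payload,
)
print(response.json()["jobId"])
```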

