Basic scraping with Firecrawl
To scrape a single page and get clean markdown content, you can use the /scrape endpoint.
Scraping PDFs
Firecrawl supports PDFs. Use the parsers option (e.g., parsers: ["pdf"]) when you want to ensure PDF parsing.
Scrape options
When using the /scrape endpoint, you can customize scraping with the options below.
Formats (formats)
- Type: array
- Strings: ["markdown", "links", "html", "rawHtml", "summary", "images"]
- Object formats:
  - JSON: { type: "json", prompt, schema }
  - Screenshot: { type: "screenshot", fullPage?, quality?, viewport? }
  - Change tracking: { type: "changeTracking", modes?, prompt?, schema?, tag? } (requires markdown)
- Default: ["markdown"]
Full page content vs main content (onlyMainContent)
- Type: boolean
- Description: By default the scraper returns only the main content. Set to false to return full page content.
- Default: true
Include tags (includeTags)
- Type: array
- Description: HTML tags/classes/ids to include in the scrape.
Exclude tags (excludeTags)
- Type: array
- Description: HTML tags/classes/ids to exclude from the scrape.
Wait for page readiness (waitFor)
- Type: integer
- Description: Milliseconds to wait before scraping (use sparingly).
- Default: 0
Freshness and cache (maxAge)
- Type: integer (milliseconds)
- Description: If a cached version of the page is newer than maxAge, Firecrawl returns it instantly; otherwise it scrapes fresh and updates the cache. Set 0 to always fetch fresh.
- Default: 172800000 (2 days)
Request timeout (timeout)
- Type: integer
- Description: Max duration in milliseconds before aborting.
- Default: 30000 (30 seconds)
PDF parsing (parsers)
- Type: array
- Description: Control parsing behavior. To parse PDFs, set parsers: ["pdf"].
Actions (actions)
When using the /scrape endpoint, Firecrawl allows you to perform various actions on a web page before scraping its content. This is particularly useful for interacting with dynamic content, navigating through pages, or accessing content that requires user interaction.
- Type: array
- Description: Sequence of browser steps to run before scraping.
- Supported actions:
  - wait { milliseconds }
  - click { selector }
  - write { selector, text }
  - press { key }
  - scroll { direction: "up" | "down" }
  - scrape { selector } (scrape a sub-element)
  - executeJavascript { script }
  - pdf (trigger PDF render in some flows)
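As a sketch, a scrape request that waits for the page to settle, clicks a control, scrolls, and then scrapes a sub-element might look like the following. The base URL, API key variable, target URL, and selectors are illustrative assumptions, and each action is assumed to be an object with a type field matching the names above:

```bash
# Hypothetical scrape with browser actions; adjust URL, key, and selectors.
curl -X POST https://api.firecrawl.dev/v2/scrape \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/dynamic-page",
    "formats": ["markdown"],
    "actions": [
      { "type": "wait", "milliseconds": 1000 },
      { "type": "click", "selector": "#load-more" },
      { "type": "scroll", "direction": "down" },
      { "type": "scrape", "selector": ".article-body" }
    ]
  }'
```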
Example Usage
cURL
- Return the full page content as markdown.
- Include the markdown, raw HTML, HTML, links, and a screenshot in the response.
- Include only the HTML tags <h1>, <p>, <a>, and elements with the class .main-content, while excluding any elements with the IDs #ad and #footer.
- Wait for 1000 milliseconds (1 second) before scraping to allow the page to load.
- Set the maximum duration of the scrape request to 15000 milliseconds (15 seconds).
- Parse PDFs explicitly via parsers: ["pdf"].
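A request combining those options might look like this sketch; it assumes the hosted API at api.firecrawl.dev, a /v2/scrape route, and an API key in the FIRECRAWL_API_KEY environment variable:

```bash
# Hypothetical request combining the scrape options described above.
curl -X POST https://api.firecrawl.dev/v2/scrape \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "formats": ["markdown", "html", "rawHtml", "links", { "type": "screenshot" }],
    "onlyMainContent": false,
    "includeTags": ["h1", "p", "a", ".main-content"],
    "excludeTags": ["#ad", "#footer"],
    "waitFor": 1000,
    "timeout": 15000,
    "parsers": ["pdf"]
  }'
```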
JSON extraction via formats
Use the JSON format object in formats to extract structured data in one pass:
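For example, the sketch below asks for a product name and price; the prompt, schema, and target URL are illustrative, and the endpoint and key follow the same assumptions as above:

```bash
# Hypothetical single-pass JSON extraction via the formats option.
curl -X POST https://api.firecrawl.dev/v2/scrape \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/pricing",
    "formats": [{
      "type": "json",
      "prompt": "Extract the product name and monthly price.",
      "schema": {
        "type": "object",
        "properties": {
          "product": { "type": "string" },
          "monthly_price": { "type": "number" }
        },
        "required": ["product", "monthly_price"]
      }
    }]
  }'
```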
Extract endpoint
Use the dedicated extract job API when you want asynchronous extraction with status polling.
Crawling multiple pages
To crawl multiple pages, use the /v2/crawl endpoint.
cURL
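A minimal crawl request might look like the following sketch (same base URL and API key assumptions as above):

```bash
# Hypothetical minimal crawl request.
curl -X POST https://api.firecrawl.dev/v2/crawl \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "limit": 100,
    "scrapeOptions": { "formats": ["markdown"] }
  }'
```

The response should include a crawl job id to use with the status check below.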
Check Crawl Job
Used to check the status of a crawl job and get its result.
cURL
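Assuming the crawl id comes from the request above, a status check is a GET against the crawl job (route and key are assumptions):

```bash
# Hypothetical status check; replace <crawl-id> with the id from the crawl response.
curl -X GET https://api.firecrawl.dev/v2/crawl/<crawl-id> \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY"
```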
Pagination/Next URL
If the content is larger than 10MB or if the crawl job is still running, the response may include a next parameter, a URL to the next page of results.
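In that case, request the next URL with the same authorization header until no next field is returned (a sketch; the placeholder below is not a real URL):

```bash
# Hypothetical follow-up request for the next page of crawl results.
curl -X GET "<next-url-from-previous-response>" \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY"
```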
Crawl prompt and params preview
You can provide a natural-language prompt to let Firecrawl derive crawl settings. Preview them first:
cURL
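A sketch of such a preview request follows; the params-preview route shown here is an assumption, as are the URL, prompt, and key:

```bash
# Hypothetical preview of crawl settings derived from a prompt.
curl -X POST https://api.firecrawl.dev/v2/crawl/params-preview \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "prompt": "Crawl only the blog and docs sections, up to 200 pages"
  }'
```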
Crawler options
When using the /v2/crawl endpoint, you can customize the crawling behavior with:
includePaths
- Type: array
- Description: Regex patterns to include.
- Example: ["^/blog/.*$", "^/docs/.*$"]
excludePaths
- Type: array
- Description: Regex patterns to exclude.
- Example: ["^/admin/.*$", "^/private/.*$"]
maxDiscoveryDepth
- Type: integer
- Description: Max discovery depth for finding new URLs.
limit
- Type: integer
- Description: Max number of pages to crawl.
- Default: 10000
crawlEntireDomain
- Type: boolean
- Description: Explore across siblings/parents to cover the entire domain.
- Default: false
allowExternalLinks
- Type: boolean
- Description: Follow links to external domains.
- Default: false
allowSubdomains
- Type: boolean
- Description: Follow subdomains of the main domain.
- Default: false
delay
- Type: number
- Description: Delay in seconds between scrapes.
- Default: undefined
scrapeOptions
- Type: object
- Description: Options for the scraper (see Formats above).
- Example: { "formats": ["markdown", "links", {"type": "screenshot", "fullPage": true}], "includeTags": ["h1", "p", "a", ".main-content"], "excludeTags": ["#ad", "#footer"], "onlyMainContent": false, "waitFor": 1000, "timeout": 15000 }
- Defaults: formats: ["markdown"], caching enabled by default (maxAge ~ 2 days)
Example Usage
cURL
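Combining the crawler options above with scrapeOptions, a fuller request might look like this sketch (endpoint, key, and the target site are assumptions; the option values mirror the examples listed above):

```bash
# Hypothetical crawl request combining crawler options and scrapeOptions.
curl -X POST https://api.firecrawl.dev/v2/crawl \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "includePaths": ["^/blog/.*$", "^/docs/.*$"],
    "excludePaths": ["^/admin/.*$", "^/private/.*$"],
    "maxDiscoveryDepth": 3,
    "limit": 500,
    "crawlEntireDomain": false,
    "allowSubdomains": false,
    "delay": 1,
    "scrapeOptions": {
      "formats": ["markdown", "links", { "type": "screenshot", "fullPage": true }],
      "includeTags": ["h1", "p", "a", ".main-content"],
      "excludeTags": ["#ad", "#footer"],
      "onlyMainContent": false,
      "waitFor": 1000,
      "timeout": 15000
    }
  }'
```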
Mapping website links
The /v2/map endpoint identifies URLs related to a given website.
Usage
cURL
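A basic map request might look like the following sketch (same base URL and API key assumptions):

```bash
# Hypothetical minimal map request.
curl -X POST https://api.firecrawl.dev/v2/map \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "url": "https://example.com" }'
```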
Map Options
search
- Type: string
- Description: Filter links containing text.
limit
- Type: integer
- Description: Maximum number of links to return.
- Default: 100
sitemap
- Type: "only" | "include" | "skip"
- Description: Control sitemap usage during mapping.
- Default: "include"
includeSubdomains
- Type: boolean
- Description: Include subdomains of the website.
- Default: true
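A map request that uses these options might look like the following sketch (the option values and target site are illustrative):

```bash
# Hypothetical map request with search, limit, sitemap, and subdomain options.
curl -X POST https://api.firecrawl.dev/v2/map \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "search": "docs",
    "limit": 50,
    "sitemap": "include",
    "includeSubdomains": true
  }'
```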

