Basic scraping with Firecrawl
To scrape a single page and get clean markdown content, you can use the `/scrape` endpoint.
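A minimal cURL sketch, assuming the v2 API base `https://api.firecrawl.dev/v2` and an API key exported as `FIRECRAWL_API_KEY` (both assumptions; adjust to your setup):

```bash
# Scrape a single page and request markdown output.
curl -X POST https://api.firecrawl.dev/v2/scrape \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "formats": ["markdown"]
  }'
```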
Scraping PDFs
Firecrawl supports PDFs. Use the `parsers` option (e.g., `parsers: ["pdf"]`) when you want to ensure PDF parsing.
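A hedged sketch of a PDF scrape (same assumed endpoint and key as above; the PDF URL is a placeholder):

```bash
# Explicitly enable the PDF parser for a PDF URL.
curl -X POST https://api.firecrawl.dev/v2/scrape \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/whitepaper.pdf",
    "formats": ["markdown"],
    "parsers": ["pdf"]
  }'
```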
Scrape options
When using the `/scrape` endpoint, you can customize scraping with the options below.
Formats (`formats`)
- Type: `array`
- Strings: `["markdown", "links", "html", "rawHtml", "summary"]`
- Object formats:
  - JSON: `{ type: "json", prompt, schema }`
  - Screenshot: `{ type: "screenshot", fullPage?, quality?, viewport? }`
  - Change tracking: `{ type: "changeTracking", modes?, prompt?, schema?, tag? }` (requires `markdown`)
- Default: `["markdown"]`
Full page content vs main content (`onlyMainContent`)
- Type: `boolean`
- Description: By default the scraper returns only the main content. Set to `false` to return the full page content.
- Default: `true`
Include tags (`includeTags`)
- Type: `array`
- Description: HTML tags/classes/ids to include in the scrape.
Exclude tags (`excludeTags`)
- Type: `array`
- Description: HTML tags/classes/ids to exclude from the scrape.
Wait for page readiness (`waitFor`)
- Type: `integer`
- Description: Milliseconds to wait before scraping (use sparingly).
- Default: `0`
Freshness and cache (`maxAge`)
- Type: `integer` (milliseconds)
- Description: If a cached version of the page is newer than `maxAge`, Firecrawl returns it instantly; otherwise it scrapes fresh and updates the cache. Set `0` to always fetch fresh.
- Default: `172800000` (2 days)
Request timeout (`timeout`)
- Type: `integer`
- Description: Maximum duration in milliseconds before the request is aborted.
- Default: `30000` (30 seconds)
PDF parsing (`parsers`)
- Type: `array`
- Description: Control parsing behavior. To parse PDFs, set `parsers: ["pdf"]`.
Actions (`actions`)
When using the `/scrape` endpoint, Firecrawl allows you to perform various actions on a web page before scraping its content. This is particularly useful for interacting with dynamic content, navigating through pages, or accessing content that requires user interaction.
- Type: `array`
- Description: Sequence of browser steps to run before scraping.
- Supported actions:
  - `wait` `{ milliseconds }`
  - `click` `{ selector }`
  - `write` `{ selector, text }`
  - `press` `{ key }`
  - `scroll` `{ direction: "up" | "down" }`
  - `scrape` `{ selector }` (scrape a sub-element)
  - `executeJavascript` `{ script }`
  - `pdf` (trigger PDF render in some flows)
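A hedged sketch of an `actions` request (the page URL and selectors are placeholders, and the exact action object shape may differ slightly from the current API reference):

```bash
# Interact with the page (wait, click, scroll) before scraping a sub-element.
curl -X POST https://api.firecrawl.dev/v2/scrape \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/dynamic-page",
    "formats": ["markdown"],
    "actions": [
      { "type": "wait", "milliseconds": 1000 },
      { "type": "click", "selector": "#load-more" },
      { "type": "scroll", "direction": "down" },
      { "type": "scrape", "selector": ".results" }
    ]
  }'
```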
Example Usage
The cURL example below will:
- Return the full page content as markdown.
- Include the markdown, raw HTML, HTML, links, and a screenshot in the response.
- Include only the HTML tags `<h1>`, `<p>`, and `<a>`, and elements with the class `.main-content`, while excluding any elements with the IDs `#ad` and `#footer`.
- Wait for 1000 milliseconds (1 second) before scraping to allow the page to load.
- Set the maximum duration of the scrape request to 15000 milliseconds (15 seconds).
- Parse PDFs explicitly via `parsers: ["pdf"]`.
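A hedged reconstruction of that request (the endpoint URL and exact format object shapes are assumptions based on the options above):

```bash
curl -X POST https://api.firecrawl.dev/v2/scrape \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "formats": ["markdown", "rawHtml", "html", "links", {"type": "screenshot"}],
    "includeTags": ["h1", "p", "a", ".main-content"],
    "excludeTags": ["#ad", "#footer"],
    "onlyMainContent": false,
    "waitFor": 1000,
    "timeout": 15000,
    "parsers": ["pdf"]
  }'
```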
JSON extraction via formats
Use the JSON format object in `formats` to extract structured data in one pass:
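A hedged sketch (the prompt and schema here are illustrative placeholders):

```bash
# Ask for structured JSON guided by a prompt and a JSON Schema.
curl -X POST https://api.firecrawl.dev/v2/scrape \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/pricing",
    "formats": [{
      "type": "json",
      "prompt": "Extract the product name and monthly price.",
      "schema": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "price": { "type": "string" }
        },
        "required": ["name", "price"]
      }
    }]
  }'
```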
Extract endpoint
Use the dedicated extract job API when you want asynchronous extraction with status polling.
Crawling multiple pages
To crawl multiple pages, use the `/v2/crawl` endpoint.
cURL
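A hedged sketch of starting a crawl job (same assumed base URL and key as above):

```bash
# Start a crawl job; the response includes a job id to poll.
curl -X POST https://api.firecrawl.dev/v2/crawl \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "limit": 100,
    "scrapeOptions": { "formats": ["markdown"] }
  }'
```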
Check Crawl Job
Used to check the status of a crawl job and get its result.
cURL
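A hedged sketch, assuming the crawl id returned when the job was started is passed in the path:

```bash
# Replace <crawl-job-id> with the id returned when the crawl was started.
curl https://api.firecrawl.dev/v2/crawl/<crawl-job-id> \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY"
```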
Pagination/Next URL
If the content is larger than 10MB or if the crawl job is still running, the response may include a `next` parameter, a URL to the next page of results.
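A hedged sketch of following `next` until all pages are retrieved (assumes `jq` is installed; apart from `next`, the response field names such as `data` are assumptions):

```bash
URL="https://api.firecrawl.dev/v2/crawl/<crawl-job-id>"
while [ -n "$URL" ] && [ "$URL" != "null" ]; do
  RESPONSE=$(curl -s "$URL" -H "Authorization: Bearer $FIRECRAWL_API_KEY")
  echo "$RESPONSE" | jq '.data | length'    # process this page of results
  URL=$(echo "$RESPONSE" | jq -r '.next')   # null/empty when there are no more pages
done
```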
Crawl prompt and params preview
You can provide a natural-language `prompt` to let Firecrawl derive crawl settings. Preview them first:
cURL
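A hedged sketch; the `/v2/crawl/params-preview` path is an assumption about the preview endpoint's name:

```bash
curl -X POST https://api.firecrawl.dev/v2/crawl/params-preview \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "prompt": "Crawl only the blog posts from the last year"
  }'
```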
Crawler options
When using the `/v2/crawl` endpoint, you can customize the crawling behavior with:
`includePaths`
- Type: `array`
- Description: Regex patterns to include.
- Example: `["^/blog/.*$", "^/docs/.*$"]`
`excludePaths`
- Type: `array`
- Description: Regex patterns to exclude.
- Example: `["^/admin/.*$", "^/private/.*$"]`
`maxDiscoveryDepth`
- Type: `integer`
- Description: Maximum discovery depth for finding new URLs.
`limit`
- Type: `integer`
- Description: Maximum number of pages to crawl.
- Default: `10000`
`crawlEntireDomain`
- Type: `boolean`
- Description: Explore across siblings/parents to cover the entire domain.
- Default: `false`
`allowExternalLinks`
- Type: `boolean`
- Description: Follow links to external domains.
- Default: `false`
`allowSubdomains`
- Type: `boolean`
- Description: Follow subdomains of the main domain.
- Default: `false`
`delay`
- Type: `number`
- Description: Delay in seconds between scrapes.
- Default: `undefined`
`scrapeOptions`
- Type: `object`
- Description: Options for the scraper (see Formats above).
- Example: `{ "formats": ["markdown", "links", {"type": "screenshot", "fullPage": true}], "includeTags": ["h1", "p", "a", ".main-content"], "excludeTags": ["#ad", "#footer"], "onlyMainContent": false, "waitFor": 1000, "timeout": 15000 }`
- Defaults: `formats: ["markdown"]`; caching enabled by default (`maxAge` ~ 2 days)
Example Usage
cURL
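A hedged sketch combining a few of the crawler options above:

```bash
curl -X POST https://api.firecrawl.dev/v2/crawl \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "includePaths": ["^/blog/.*$", "^/docs/.*$"],
    "excludePaths": ["^/admin/.*$"],
    "limit": 1000,
    "allowSubdomains": false,
    "scrapeOptions": {
      "formats": ["markdown"],
      "onlyMainContent": true,
      "waitFor": 1000
    }
  }'
```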
Mapping website links
The `/v2/map` endpoint identifies URLs related to a given website.
Usage
cURL
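A hedged sketch of a map request using the options below:

```bash
curl -X POST https://api.firecrawl.dev/v2/map \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "search": "docs",
    "limit": 50,
    "sitemap": "include",
    "includeSubdomains": true
  }'
```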
Map Options
`search`
- Type: `string`
- Description: Filter links containing the given text.
`limit`
- Type: `integer`
- Description: Maximum number of links to return.
- Default: `100`
`sitemap`
- Type: `"only" | "include" | "skip"`
- Description: Control sitemap usage during mapping.
- Default: `"include"`
`includeSubdomains`
- Type: `boolean`
- Description: Include subdomains of the website.
- Default: `true`