Reference for every option across Firecrawl’s scrape, crawl, map, and agent endpoints.

Basic scraping

To scrape a single page and get clean markdown content, use the /scrape endpoint.
# pip install firecrawl-py

from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

doc = firecrawl.scrape("https://firecrawl.dev")

print(doc.markdown)

Scraping PDFs

Firecrawl supports PDFs. Use the parsers option (e.g., parsers: ["pdf"]) when you want to ensure PDF parsing. You can control the parsing strategy with the mode option:
  • auto (default) — attempts fast text-based extraction first, then falls back to OCR if needed.
  • fast — text-based parsing only (embedded text). Fastest, but skips scanned/image-heavy pages.
  • ocr — forces OCR parsing on every page. Use for scanned documents or when auto misclassifies a page.
{ type: "pdf" } and "pdf" both default to mode: "auto".
"parsers": [{ "type": "pdf", "mode": "fast", "maxPages": 50 }]

Scrape options

When using the /scrape endpoint, you can customize the request with the following options.

Formats (formats)

The formats array controls which output types the scraper returns. Default: ["markdown"]. String formats: pass the name directly (e.g. "markdown").
  • markdown — Page content converted to clean Markdown.
  • html — Processed HTML with unnecessary elements removed.
  • rawHtml — Original HTML exactly as returned by the server.
  • links — All links found on the page.
  • images — All images found on the page.
  • summary — An LLM-generated summary of the page content.
  • branding — Extracts brand identity (colors, fonts, typography, spacing, UI components).
Object formats: pass an object with type and additional options.
  • json (prompt?: string, schema?: object) — Extract structured data using an LLM. Provide a JSON schema and/or a natural-language prompt (max 10,000 characters).
  • screenshot (fullPage?: boolean, quality?: number, viewport?: { width, height }) — Capture a screenshot. Max one per request. Viewport max resolution is 7680×4320. Screenshot URLs expire after 24 hours.
  • changeTracking (modes?: ("json" | "git-diff")[], tag?: string, schema?: object, prompt?: string) — Track changes between scrapes. Requires "markdown" to also be in the formats array.
  • attributes (selectors: [{ selector: string, attribute: string }]) — Extract specific HTML attributes from elements matching CSS selectors.
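String and object formats can be mixed in a single formats array. A minimal sketch (the selector and attribute values are illustrative, not from a real page):

```python
# Sketch: one formats array mixing a string format with two object formats.
# Selector/attribute values are illustrative placeholders.
formats = [
    "markdown",  # string format: just the name
    {"type": "screenshot", "fullPage": True, "quality": 80},
    {
        "type": "attributes",
        "selectors": [{"selector": "a.product", "attribute": "data-sku"}],
    },
]
```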

Content filtering

These parameters control which parts of the page appear in the output. onlyMainContent runs first to strip boilerplate (nav, footer, etc.), then includeTags and excludeTags further narrow the result. If you set onlyMainContent: false, the full page HTML is used as the starting point for tag filtering.
  • onlyMainContent (boolean; default: true) — Return only the main content. Set false for the full page.
  • includeTags (array) — HTML tags, classes, or IDs to include (e.g. ["h1", "p", ".main-content"]).
  • excludeTags (array) — HTML tags, classes, or IDs to exclude (e.g. ["#ad", "#footer"]).

Timing and cache

  • waitFor (integer, ms; default: 0) — Extra wait time before scraping, on top of smart-wait. Use sparingly.
  • maxAge (integer, ms; default: 172800000) — Return a cached version if fresher than this value (default is 2 days). Set 0 to always fetch fresh.
  • timeout (integer, ms; default: 30000) — Max request duration before aborting (default is 30 seconds).
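The maxAge default is exactly two days expressed in milliseconds, which is easy to verify:

```python
# The default maxAge (172800000 ms) is 2 days expressed in milliseconds.
two_days_ms = 2 * 24 * 60 * 60 * 1000
print(two_days_ms)  # 172800000
```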

PDF parsing

  • parsers (array; default: ["pdf"]) — Controls PDF processing. Pass [] to skip parsing and return base64 (1 credit flat).
Each parser can also be passed as an object:
{ "type": "pdf", "mode": "fast" | "auto" | "ocr", "maxPages": 10 }
  • type ("pdf"; required) — Parser type.
  • mode ("fast" | "auto" | "ocr"; default: "auto") — fast: text-based extraction only. auto: fast with OCR fallback. ocr: force OCR.
  • maxPages (integer) — Cap the number of pages to parse.

Actions

Run browser actions before scraping. This is useful for dynamic content, navigation, or user-gated pages. You can include up to 50 actions per request, and the combined wait time across all wait actions and waitFor must not exceed 60 seconds.
  • wait (milliseconds?: number, selector?: string) — Wait for a fixed duration or until an element is visible (provide one, not both). When using selector, times out after 30 seconds.
  • click (selector: string, all?: boolean) — Click an element matching the CSS selector. Set all: true to click every match.
  • write (text: string) — Type text into the currently focused field. You must focus the element with a click action first.
  • press (key: string) — Press a keyboard key (e.g. "Enter", "Tab", "Escape").
  • scroll (direction?: "up" | "down", selector?: string) — Scroll the page or a specific element. Direction defaults to "down".
  • screenshot (fullPage?: boolean, quality?: number, viewport?: { width, height }) — Capture a screenshot. Max viewport resolution is 7680×4320.
  • scrape (no parameters) — Capture the current page HTML at this point in the action sequence.
  • executeJavascript (script: string) — Run JavaScript code in the page. Returns { type, value }.
  • pdf (format?: string, landscape?: boolean, scale?: number) — Generate a PDF. Supported formats: "A0" through "A6", "Letter", "Legal", "Tabloid", "Ledger". Defaults to "Letter".
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key='fc-YOUR-API-KEY')

doc = firecrawl.scrape(
    'https://example.com',
    actions=[
        {'type': 'wait', 'milliseconds': 1000},
        {'type': 'click', 'selector': '#accept'},
        {'type': 'scroll', 'direction': 'down'},
        {'type': 'click', 'selector': '#q'},
        {'type': 'write', 'text': 'firecrawl'},
        {'type': 'press', 'key': 'Enter'},
        {'type': 'wait', 'milliseconds': 2000},
    ],
    formats=['markdown'],
)

print(doc.markdown)

Action execution notes

  • Write requires a preceding click to focus the target element.
  • Scroll accepts an optional selector to scroll a specific element instead of the page.
  • Wait accepts either milliseconds (fixed delay) or selector (wait until visible).
  • Actions run sequentially: each step completes before the next begins.
  • Actions are not supported for PDFs. If the URL resolves to a PDF the request will fail.

Advanced action examples

Taking a screenshot:
cURL
curl -X POST https://api.firecrawl.dev/v2/scrape \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer fc-YOUR-API-KEY' \
  -d '{
    "url": "https://example.com",
    "actions": [
      { "type": "click", "selector": "#load-more" },
      { "type": "wait", "milliseconds": 1000 },
      { "type": "screenshot", "fullPage": true, "quality": 80 }
    ]
  }'
Clicking multiple elements:
cURL
curl -X POST https://api.firecrawl.dev/v2/scrape \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer fc-YOUR-API-KEY' \
  -d '{
    "url": "https://example.com",
    "actions": [
      { "type": "click", "selector": ".expand-button", "all": true },
      { "type": "wait", "milliseconds": 500 }
    ],
    "formats": ["markdown"]
  }'
Generating a PDF:
cURL
curl -X POST https://api.firecrawl.dev/v2/scrape \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer fc-YOUR-API-KEY' \
  -d '{
    "url": "https://example.com",
    "actions": [
      { "type": "pdf", "format": "A4", "landscape": false }
    ]
  }'

Full scrape example

The following request combines multiple scrape options:
cURL
curl -X POST https://api.firecrawl.dev/v2/scrape \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer fc-YOUR-API-KEY' \
    -d '{
      "url": "https://docs.firecrawl.dev",
      "formats": [
        "markdown",
        "links",
        "html",
        "rawHtml",
        { "type": "screenshot", "fullPage": true, "quality": 80 }
      ],
      "includeTags": ["h1", "p", "a", ".main-content"],
      "excludeTags": ["#ad", "#footer"],
      "onlyMainContent": false,
      "waitFor": 1000,
      "timeout": 15000,
      "parsers": ["pdf"]
    }'
This request returns markdown, HTML, raw HTML, links, and a full-page screenshot. It scopes content to <h1>, <p>, <a>, and .main-content while excluding #ad and #footer, waits 1 second before scraping, sets a 15 second timeout, and enables PDF parsing. See the full Scrape API reference for details.

JSON extraction via formats

Use the JSON format object in formats to extract structured data in one pass:
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key='fc-YOUR-API-KEY')

doc = firecrawl.scrape(
    'https://firecrawl.dev',
    formats=[{
        'type': 'json',
        'prompt': 'Extract the features of the product',
        'schema': {
            'type': 'object',
            'properties': {'features': {'type': 'object'}},
            'required': ['features'],
        },
    }],
)

print(doc.json)

Agent endpoint

Use the /v2/agent endpoint for autonomous, multi-page data extraction. The agent runs asynchronously: you start a job, then poll for results.

Agent options

  • prompt (string; required) — Natural-language instructions describing what data to extract (max 10,000 characters).
  • urls (array) — URLs to constrain the agent to.
  • schema (object) — JSON schema to structure the extracted data.
  • maxCredits (number; default: 2500) — Maximum credits the agent can spend. The dashboard supports up to 2,500; for higher limits, set this via the API (values above 2,500 are always billed as paid requests).
  • strictConstrainToURLs (boolean; default: false) — When true, the agent only visits the provided URLs.
  • model (string; default: "spark-1-mini") — AI model to use. "spark-1-mini" (default, 60% cheaper) or "spark-1-pro" (higher accuracy).
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key='fc-YOUR-API-KEY')

# Start agent job
started = firecrawl.start_agent(
    prompt="Extract the title and description",
    urls=["https://docs.firecrawl.dev"],
    schema={"type": "object", "properties": {"title": {"type": "string"}, "description": {"type": "string"}}, "required": ["title"]}
)

# Poll status
status = firecrawl.get_agent_status(started["id"])
print(status.get("status"), status.get("data"))

Check agent status

Poll GET /v2/agent/{jobId} to check progress. The response status field will be "processing", "completed", or "failed".
cURL
curl -X GET https://api.firecrawl.dev/v2/agent/YOUR-JOB-ID \
  -H 'Authorization: Bearer fc-YOUR-API-KEY'
The Python and Node SDKs also provide a convenience method (firecrawl.agent()) that starts the job and polls automatically until completion.
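If you poll manually instead, the loop is the usual start-then-poll pattern. A minimal sketch with an injected status function — get_status stands in for a real GET /v2/agent/{jobId} call and is an assumption, not an SDK method:

```python
import time

# Generic start-then-poll loop. get_status stands in for a real
# GET /v2/agent/{jobId} call and must return a dict with a "status" key
# that ends up as "completed" or "failed".
def wait_for_agent(get_status, job_id, interval_s=2.0, timeout_s=300.0):
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        job = get_status(job_id)
        if job["status"] in ("completed", "failed"):
            return job
        time.sleep(interval_s)
    raise TimeoutError(f"agent job {job_id} still processing after {timeout_s}s")
```

The injected function makes the polling logic easy to test without network access, and a real implementation would pass a closure over an authenticated HTTP client.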

Crawling multiple pages

To crawl multiple pages, use the /v2/crawl endpoint. The crawl runs asynchronously and returns a job ID.
cURL
curl -X POST https://api.firecrawl.dev/v2/crawl \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer fc-YOUR-API-KEY' \
    -d '{
      "url": "https://docs.firecrawl.dev"
    }'

Response

{ "id": "1234-5678-9101" }

Check crawl job

Use the job ID to check the status of a crawl and retrieve its results.
cURL
curl -X GET https://api.firecrawl.dev/v2/crawl/1234-5678-9101 \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer fc-YOUR-API-KEY'
If the content is larger than 10MB or the crawl job is still running, the response may include a next field containing the URL of the next page of results.
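Following next is a simple cursor loop. A sketch with an injected fetch function — fetch_page stands in for an authenticated GET that returns the parsed JSON body:

```python
# Collect all crawl documents by following "next" URLs until exhausted.
# fetch_page stands in for an authenticated GET returning the parsed JSON body.
def collect_crawl_results(fetch_page, first_url):
    docs = []
    url = first_url
    while url:
        page = fetch_page(url)
        docs.extend(page.get("data", []))
        url = page.get("next")  # absent on the last page
    return docs
```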

Crawl prompt and params preview

You can provide a natural-language prompt to let Firecrawl derive crawl settings. Preview them first:
cURL
curl -X POST https://api.firecrawl.dev/v2/crawl/params-preview \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer fc-YOUR-API-KEY' \
  -d '{
    "url": "https://docs.firecrawl.dev",
    "prompt": "Extract docs and blog"
  }'

Crawler options

When using the /v2/crawl endpoint, you can customize crawling behavior with the following options.

Path filtering

  • includePaths (array) — Regex patterns for URLs to include (pathname only by default).
  • excludePaths (array) — Regex patterns for URLs to exclude (pathname only by default).
  • regexOnFullURL (boolean; default: false) — Match patterns against the full URL instead of just the pathname.
The starting URL is also checked against includePaths. If it does not match any of the patterns, the crawl may return 0 pages.
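The pathname-only matching can be mimicked with Python's re module. This is an illustration of the idea, not Firecrawl's internal implementation; note how a bare starting URL fails to match either pattern:

```python
import re
from urllib.parse import urlparse

# Mimic pathname-only includePaths matching (illustrative, not
# Firecrawl's internal implementation).
include_paths = [r"^/blog/.*$", r"^/docs/.*$"]

def path_included(url):
    path = urlparse(url).path
    return any(re.search(p, path) for p in include_paths)

print(path_included("https://docs.firecrawl.dev/docs/intro"))  # True
# A bare starting URL has path "/", which matches neither pattern,
# so a crawl started there could return 0 pages.
print(path_included("https://docs.firecrawl.dev/"))  # False
```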

Crawl scope

  • maxDiscoveryDepth (integer) — Max link-depth for discovering new URLs.
  • limit (integer; default: 10000) — Max pages to crawl.
  • crawlEntireDomain (boolean; default: false) — Explore siblings and parents to cover the entire domain.
  • allowExternalLinks (boolean; default: false) — Follow links to external domains.
  • allowSubdomains (boolean; default: false) — Follow subdomains of the main domain.
  • delay (number, seconds) — Delay between scrapes.

Sitemap and deduplication

  • sitemap (string; default: "include") — "include": use sitemap + link discovery. "skip": ignore the sitemap. "only": crawl only sitemap URLs.
  • deduplicateSimilarURLs (boolean; default: true) — Treat normalized URL variants (www., https, trailing slashes, index.html) as duplicates.
  • ignoreQueryParameters (boolean; default: false) — Strip query strings before deduplication (e.g. /page?a=1 and /page?a=2 become one URL).
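The kind of normalization deduplicateSimilarURLs performs can be sketched like this. It is an illustration of the concept only, not Firecrawl's exact algorithm:

```python
from urllib.parse import urlparse

# Illustrative URL normalization: collapse scheme, www., trailing slashes,
# and index.html so common variants map to one key. Not Firecrawl's
# exact algorithm.
def normalize(url, ignore_query=False):
    p = urlparse(url)
    host = p.netloc.lower().removeprefix("www.")
    path = p.path
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]
    path = path.rstrip("/") or "/"
    key = f"{host}{path}"
    if not ignore_query and p.query:
        key += f"?{p.query}"
    return key

print(normalize("https://www.example.com/docs/"))       # example.com/docs
print(normalize("http://example.com/docs/index.html"))  # example.com/docs
```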

Scrape options for crawl

  • scrapeOptions (object; default: { formats: ["markdown"] }) — Per-page scrape config. Accepts all scrape options above.
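A sketch of a crawl request body that overrides per-page scrape behavior (the values are illustrative):

```python
# Sketch: /v2/crawl request body with per-page scrapeOptions.
# Values are illustrative.
crawl_body = {
    "url": "https://docs.firecrawl.dev",
    "limit": 100,
    "scrapeOptions": {
        "formats": ["markdown", "links"],
        "onlyMainContent": True,
        "maxAge": 0,  # always fetch fresh during this crawl
    },
}
```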

Crawl example

cURL
curl -X POST https://api.firecrawl.dev/v2/crawl \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer fc-YOUR-API-KEY' \
    -d '{
      "url": "https://docs.firecrawl.dev",
      "includePaths": ["^/blog/.*$", "^/docs/.*$"],
      "excludePaths": ["^/admin/.*$", "^/private/.*$"],
      "maxDiscoveryDepth": 2,
      "limit": 1000
    }'

Mapping a website

The /v2/map endpoint identifies URLs related to a given website.
cURL
curl -X POST https://api.firecrawl.dev/v2/map \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer fc-YOUR-API-KEY' \
    -d '{
      "url": "https://docs.firecrawl.dev"
    }'

Map options

  • search (string) — Filter links by text match.
  • limit (integer; default: 100) — Max links to return.
  • sitemap (string; default: "include") — "include", "skip", or "only".
  • includeSubdomains (boolean; default: true) — Include subdomains.
See the Map endpoint API reference for full details.

Whitelisting Firecrawl

Allowing Firecrawl to scrape your website

  • User Agent: Allow FirecrawlAgent in your firewall or security rules.
  • IP addresses: Firecrawl does not use a fixed set of outbound IPs.

Allowing your application to call the Firecrawl API

If your firewall blocks outbound requests from your application to external services, you need to whitelist Firecrawl’s API server IP address so your application can reach the Firecrawl API (api.firecrawl.dev):
  • IP Address: 35.245.250.27
Add this IP to your firewall’s outbound allowlist so your backend can send scrape, crawl, map, and agent requests to Firecrawl.