Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.firecrawl.dev/llms.txt

Use this file to discover all available pages before exploring further.

Canonical Firecrawl Python source of truth for agents. Generated from SDK source (firecrawl-py / firecrawl 4.22.1) and the v2 OpenAPI spec. Method names, parameters, and return types match the v2 client in firecrawl/v2/client.py unless noted.

Install

pip install firecrawl-py

Authenticate

import os
from firecrawl import Firecrawl

client = Firecrawl(api_key=os.environ.get("FIRECRAWL_API_KEY"))
# client = Firecrawl(api_key="fc-...", api_url="https://api.firecrawl.dev")

When To Use What

  • search: use when you start with a query and need discovery.
  • scrape: use when you already have a URL and want page content.
  • interact: use when the page needs clicks, forms, or post-scrape browser actions.
  • support/ask: use when a Firecrawl API call fails or returns unexpected results and you need a diagnosis.
  • support/docs-search: use when you need to look up Firecrawl documentation.

Why use it

Use search to discover relevant pages from a query, then pick URLs to scrape or interact with. You can constrain results to a site with site:, for example site:docs.firecrawl.dev crawl webhooks.

Preferred SDK method

client.search(query, **options)SearchData (firecrawl.v2.types)

Return value

Returns a SearchData model with optional lists:
  • web — web hits (SearchResultWeb or full Document when scrape_options hydrates content)
  • news — news hits (SearchResultNews or Document)
  • images — image hits (SearchResultImages or Document)
Omitted buckets are None when the API did not return that key. Wrong turn to avoid: search() does not return { data: [...] }. Do not access result.data. Web results are in result.web, news in result.news, images in result.images.

Simple Example

results = client.search("site:docs.firecrawl.dev webhook retries")
for item in results.web or []:
    print(getattr(item, "url", None), getattr(item, "title", None))

Complex Example

from firecrawl.v2.types import ScrapeOptions

results = client.search(
    "site:docs.firecrawl.dev crawl webhooks",
    sources=["web", "news"],
    categories=["research"],
    limit=10,
    tbs="qdr:m",
    location="San Francisco,California,United States",
    ignore_invalid_urls=True,
    timeout=300000,
    scrape_options=ScrapeOptions(
        formats=[
            "markdown",
            "links",
            {"type": "json", "prompt": "Extract title and key endpoints."},
        ],
        only_main_content=True,
        include_tags=["main", "article"],
        exclude_tags=["nav", "footer"],
        wait_for=1000,
        actions=[
            {"type": "click", "selector": "button#accept"},
            {"type": "wait", "milliseconds": 750},
            {"type": "scrape"},
        ],
    ),
)

Parameters

  • query
    • Type: str
    • Use when: you need a search query.
    • Notes: use site:example.com to limit results to a domain.
  • sources
    • Type: list of source names or Source objects
    • Use when: you want to control which sources are searched.
    • Confirmed values:
      • "web": web index results
      • "news": news results
      • "images": image results
      • Source(type="web" | "news" | "images"): typed source object form
  • categories
    • Type: list of category names or Category objects
    • Use when: you want to filter results by category.
    • Confirmed values:
      • "github": GitHub-focused results
      • "research": research and academic results
      • "pdf": PDF-focused results
      • Category(type="github" | "research" | "pdf"): typed category object form
  • limit
    • Type: int
    • Use when: you want to cap results.
    • Notes: defaults to 5 in the SDK model.
  • tbs
    • Type: str
    • Use when: you need a time-based filter (for example qdr:d, qdr:w, sbd:1,qdr:m).
  • location
    • Type: str
    • Use when: you want localized results.
  • ignore_invalid_urls
    • Type: bool
    • Use when: you want to drop URLs that cannot be scraped by other endpoints.
  • timeout
    • Type: int
    • Use when: you need a request timeout in milliseconds.
    • Notes: defaults to 300000 in the SDK model.
  • scrape_options
    • Type: ScrapeOptions
    • Use when: you want to scrape each search result (see Scrape parameters for fields).

Scrape

Why use it

Use scrape when you already have a URL and want structured content in one or more formats.

Preferred SDK method

client.scrape(url, **options)Document (firecrawl.v2.types)

Simple Example

doc = client.scrape("https://docs.firecrawl.dev", formats=["markdown"])

Complex Example

doc = client.scrape(
    "https://example.com/pricing",
    formats=[
        "markdown",
        "links",
        {"type": "json", "prompt": "Extract plan names and prices."},
        {"type": "screenshot", "full_page": True, "quality": 80, "viewport": {"width": 1280, "height": 720}},
        {"type": "changeTracking", "modes": ["git-diff"], "tag": "pricing"},
        {"type": "attributes", "selectors": [{"selector": "a", "attribute": "href"}]},
    ],
    headers={"User-Agent": "FirecrawlDocsBot/1.0"},
    only_main_content=True,
    wait_for=1000,
    parsers=[{"type": "pdf", "mode": "auto", "max_pages": 5}],
    actions=[
        {"type": "click", "selector": "#accept"},
        {"type": "wait", "milliseconds": 750},
        {"type": "scrape"},
    ],
    location={"country": "US", "languages": ["en-US"]},
    remove_base64_images=True,
    fast_mode=True,
    block_ads=True,
    proxy="auto",
    max_age=86400000,
    store_in_cache=True,
    profile={"name": "docs-session", "save_changes": True},
)

Parameters

  • url
    • Type: str
    • Use when: you want to scrape a specific page.
  • formats
    • Type: list of format strings or format objects
    • Use when: you want multiple output formats.
    • Confirmed format strings:
      • "markdown": markdown content
      • "html": cleaned HTML
      • "rawHtml" or "raw_html": raw HTML
      • "links": page links
      • "images": image URLs
      • "screenshot": screenshot output
      • "summary": summary output
      • "changeTracking" or "change_tracking": change tracking output
      • "attributes": attribute extraction
      • "branding": branding profile output
      • "audio": audio extraction
      • "video": video extraction
    • Object-only format types:
      • {"type": "json", ...}: JSON extraction. Use an object, not the plain string "json".
      • {"type": "question", "question": "..."}: question-answer output.
      • {"type": "highlights", "query": "..."}: relevant source-text output.
    • Format object fields:
      • type: one of the format strings above, or "json", "question", or "highlights" for object-only formats
      • question: for type: "question"
      • query: for type: "highlights"
      • prompt: optional for type: "json"
      • schema: JSON schema for type: "json" or for change tracking JSON mode
      • modes: array of "git-diff" or "json" for type: "changeTracking"
      • tag: change tracking tag for type: "changeTracking"
      • full_page, quality, viewport: screenshot options for type: "screenshot"
      • selectors: array of {selector, attribute} for type: "attributes"
  • headers
    • Type: dict of str to str
    • Use when: you need custom request headers.
  • include_tags
    • Type: list of str
    • Use when: you want to include only specific HTML tags.
  • exclude_tags
    • Type: list of str
    • Use when: you want to exclude specific HTML tags.
  • only_main_content
    • Type: bool
    • Use when: you want to strip nav, footer, and other boilerplate.
  • timeout
    • Type: int
    • Use when: you need a timeout in milliseconds.
  • wait_for
    • Type: int
    • Use when: you need to wait for the page to render (milliseconds).
  • mobile
    • Type: bool
    • Use when: you want a mobile viewport.
  • parsers
    • Type: list of parser names or parser objects
    • Use when: you need file parsing controls (for example PDF parsing).
    • Confirmed values:
      • "pdf": enable PDF parsing
      • { "type": "pdf", "mode": "fast" | "auto" | "ocr", "max_pages": int }: PDF parser options
  • actions
    • Type: list of action objects
    • Use when: you need lightweight pre-scrape actions.
    • Confirmed action types:
      • wait: milliseconds or selector required
      • screenshot: full_page, quality, viewport optional
      • click: selector required
      • write: text required (click to focus the input first)
      • press: key required
      • scroll: direction (up or down) required, selector optional
      • scrape: no additional fields
      • executeJavascript: script required
      • pdf: format (A0, A1, A2, A3, A4, A5, A6, Letter, Legal, Tabloid, Ledger), landscape, scale optional
  • location
    • Type: dict with country and languages
    • Use when: you need geo or language-aware scraping.
  • skip_tls_verification
    • Type: bool
    • Use when: you need to skip TLS verification.
  • remove_base64_images
    • Type: bool
    • Use when: you want to drop base64 images from markdown output.
  • fast_mode
    • Type: bool
    • Use when: you want faster scrapes with reduced fidelity.
  • block_ads
    • Type: bool
    • Use when: you want ad and cookie popup blocking.
  • proxy
    • Type: str
    • Use when: you need proxy control.
    • Confirmed values: "basic", "stealth", "enhanced", "auto"
  • max_age
    • Type: int
    • Use when: you want cached data up to a maximum age (milliseconds).
  • min_age
    • Type: int
    • Use when: you want cached data only if it is at least this old (milliseconds).
    • Notes: only on ScrapeOptions passed as scrape_options to client.search. The client.scrape method does not accept min_age.
  • store_in_cache
    • Type: bool
    • Use when: you want Firecrawl to cache the result.
  • profile
    • Type: dict with name and optional save_changes or saveChanges
    • Use when: you want a persistent browser profile shared across scrapes and interactions.

Interact

Why use it

Use interact when a page requires browser actions or code execution after a scrape starts.

Preferred SDK method

client.interact(job_id, code=None, *, prompt=None, language="node", timeout=None) prompt is keyword-only: call client.interact(job_id, prompt="...") or client.interact(job_id, code="..."). At least one of code or prompt must be non-empty (validated in firecrawl/v2/methods/scrape.py).

Simple Example

doc = client.scrape("https://example.com", formats=["markdown"])
job_id = doc.metadata.scrape_id if doc.metadata else None
if not job_id:
    raise RuntimeError("Missing scrape_id from scrape response")

result = client.interact(job_id, prompt="Click the pricing tab and summarize the plans.")

Complex Example

result = client.interact(
    "<scrapeJobId>",
    code="print(await page.title())",
    language="python",
    timeout=60,
)

Stop session

client.stop_interaction(job_id) ends the scrape-bound browser session. Returns BrowserDeleteResponse (success, optional session_duration_ms, credits_billed, error).

Parameters

  • job_id
    • Type: str
    • Use when: you have a scrape job ID from document.metadata.scrape_id.
  • code
    • Type: str
    • Use when: you want to run code in the browser session (optional if prompt is set).
  • prompt
    • Type: str
    • Use when: you want the browser agent to follow a natural-language instruction (optional if code is set).
  • language
    • Type: str
    • Use when: you need a specific runtime.
    • Confirmed values: "python", "node", "bash"
    • Notes: defaults to "node" in the SDK.
  • timeout
    • Type: int
    • Use when: you need an execution timeout in seconds.

Return value

Returns BrowserExecuteResponse: success, optional live_view_url, interactive_live_view_url, output, stdout, result, stderr, exit_code, killed, error (API camelCase is normalized to snake_case on the model).

Ask (Agentic Debugging)

Why use it

Use ask when a Firecrawl API call fails or returns unexpected results. The AI support agent diagnoses the issue, proposes fix parameters, and optionally validates the fix against the live API. Typical latency: 15-30 seconds.

Preferred SDK method

Not yet available in the Python SDK. Use requests or httpx directly.

Simple Example

import requests

response = requests.post(
    "https://api.firecrawl.dev/v2/support/ask",
    headers={
        "Authorization": f"Bearer {os.environ['FIRECRAWL_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={"question": "my scrape returned empty markdown for https://example.com"},
)
result = response.json()
print(result["answer"])
print(result["fixParameters"])

Complex Example

import requests

response = requests.post(
    "https://api.firecrawl.dev/v2/support/ask",
    headers={
        "Authorization": f"Bearer {os.environ['FIRECRAWL_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "question": "crawl returned 3 pages but I expected 50",
        "rationale": "user is on their third failed crawl attempt today",
        "context": {"userPlan": "standard", "retryCount": 3},
    },
)
result = response.json()

Agent retry pattern

from firecrawl import Firecrawl
import requests, os

client = Firecrawl(api_key=os.environ["FIRECRAWL_API_KEY"])

doc = client.scrape("https://example.com/pricing", formats=["markdown"])

if not doc.markdown or len(doc.markdown) < 100:
    diagnosis = requests.post(
        "https://api.firecrawl.dev/v2/support/ask",
        headers={
            "Authorization": f"Bearer {os.environ['FIRECRAWL_API_KEY']}",
            "Content-Type": "application/json",
        },
        json={
            "question": f"scrape returned only {len(doc.markdown or '')} chars for https://example.com/pricing",
        },
    ).json()

    if diagnosis.get("fixParameters"):
        doc = client.scrape(
            "https://example.com/pricing",
            formats=["markdown"],
            **diagnosis["fixParameters"],
        )

Parameters

  • question
    • Type: str (required, 1-8000 chars)
    • Use when: you need to describe the issue.
  • rationale
    • Type: str (1-2000 chars)
    • Use when: you are an AI agent calling on behalf of a user.
  • context
    • Type: dict (free-form)
    • Use when: you want to pass metadata into the debugging prompt.

Why use it

Use docs-search to look up Firecrawl documentation.

Simple Example

import requests

FIRECRAWL_API_KEY = "fc-YOUR_API_KEY"

response = requests.post(
    "https://api.firecrawl.dev/v2/support/docs-search",
    headers={"Authorization": f"Bearer {FIRECRAWL_API_KEY}"},
    json={"question": "how do I verify webhook signatures?"},
)
print(response.json()["answer"])

Parameters

  • question
    • Type: str (required, 1-8000 chars)
    • Use when: you need a docs-grounded answer.

Notes

  • Deprecated aliases: scrape_executeinteract; stop_interactive_browser and delete_scrape_browserstop_interaction.
  • The top-level Firecrawl client exposes v2 methods directly; v1 remains under client.v1.
  • The bundled v2 OpenAPI snippet for POST /v2/scrape/{jobId}/interact may only document code; the Python SDK and server accept either code or prompt for this endpoint.

Source Of Truth

  • firecrawl/apps/python-sdk/pyproject.toml
  • firecrawl/apps/python-sdk/firecrawl/__init__.py
  • firecrawl/apps/python-sdk/firecrawl/client.py
  • firecrawl/apps/python-sdk/firecrawl/v2/client.py
  • firecrawl/apps/python-sdk/firecrawl/v2/types.py
  • firecrawl/apps/python-sdk/firecrawl/v2/methods/search.py
  • firecrawl/apps/python-sdk/firecrawl/v2/methods/scrape.py
  • firecrawl/apps/python-sdk/firecrawl/v2/utils/validation.py
  • firecrawl-docs/api-reference/v2-openapi.json