Canonical Firecrawl Python source of truth for agents. Generated from SDK source (firecrawl-py / firecrawl 4.22.1) and the v2 OpenAPI spec. Method names, parameters, and return types match the v2 client in firecrawl/v2/client.py unless noted.

Install

pip install firecrawl-py

Authenticate

import os
from firecrawl import Firecrawl

client = Firecrawl(api_key=os.environ.get("FIRECRAWL_API_KEY"))
# client = Firecrawl(api_key="fc-...", api_url="https://api.firecrawl.dev")
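A missing key surfaces as a late HTTP error on the first request; a small guard makes the failure explicit at construction time. This is an illustrative sketch, not part of the SDK (the helper name `require_api_key` is made up):

```python
import os

def require_api_key(env="FIRECRAWL_API_KEY"):
    """Fail fast with a clear message instead of a late HTTP 401."""
    key = os.environ.get(env)
    if not key:
        raise RuntimeError(f"Set {env} before constructing the client")
    return key
```

Call it once before `Firecrawl(api_key=require_api_key())` so misconfiguration is caught immediately.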

When To Use What

  • search: use when you start with a query and need discovery.
  • scrape: use when you already have a URL and want page content.
  • interact: use when the page needs clicks, forms, or post-scrape browser actions.

Search

Why use it

Use search to discover relevant pages from a query, then pick URLs to scrape or interact with. You can constrain results to a site with site:, for example site:docs.firecrawl.dev crawl webhooks.

Preferred SDK method

client.search(query, **options) → SearchData (firecrawl.v2.types)

Return value

Returns a SearchData model with optional lists:
  • web — web hits (SearchResultWeb or full Document when scrape_options hydrates content)
  • news — news hits (SearchResultNews or Document)
  • images — image hits (SearchResultImages or Document)
Omitted buckets are None when the API did not return that key.

Simple Example

results = client.search("site:docs.firecrawl.dev webhook retries")
for item in results.web or []:
    print(getattr(item, "url", None), getattr(item, "title", None))
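Because each bucket can mix lightweight hits with hydrated Document objects (when scrape_options is set), the getattr pattern above generalizes into a small duck-typed helper. This is an illustrative sketch using stand-in objects, not SDK code (hydrated Documents may expose url/title via metadata instead; adjust attribute access accordingly):

```python
from types import SimpleNamespace

def summarize_hits(items):
    """Return (url, title) pairs from a results bucket that may be None
    and may mix hit objects with hydrated documents."""
    pairs = []
    for item in items or []:  # buckets are None when the API omits them
        pairs.append((getattr(item, "url", None), getattr(item, "title", None)))
    return pairs

# Stand-in objects for demonstration; real code would pass results.web.
hits = [SimpleNamespace(url="https://docs.firecrawl.dev", title="Docs")]
print(summarize_hits(hits))  # [('https://docs.firecrawl.dev', 'Docs')]
```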

Complex Example

from firecrawl.v2.types import ScrapeOptions

results = client.search(
    "site:docs.firecrawl.dev crawl webhooks",
    sources=["web", "news"],
    categories=["research"],
    limit=10,
    tbs="qdr:m",
    location="San Francisco,California,United States",
    ignore_invalid_urls=True,
    timeout=300000,
    scrape_options=ScrapeOptions(
        formats=[
            "markdown",
            "links",
            {"type": "json", "prompt": "Extract title and key endpoints."},
        ],
        only_main_content=True,
        include_tags=["main", "article"],
        exclude_tags=["nav", "footer"],
        wait_for=1000,
        actions=[
            {"type": "click", "selector": "button#accept"},
            {"type": "wait", "milliseconds": 750},
            {"type": "scrape"},
        ],
    ),
)

Parameters

  • query
    • Type: str
    • Use when: you need a search query.
    • Notes: use site:example.com to limit results to a domain.
  • sources
    • Type: list of source names or Source objects
    • Use when: you want to control which sources are searched.
    • Confirmed values:
      • "web": web index results
      • "news": news results
      • "images": image results
      • Source(type="web" | "news" | "images"): typed source object form
  • categories
    • Type: list of category names or Category objects
    • Use when: you want to filter results by category.
    • Confirmed values:
      • "github": GitHub-focused results
      • "research": research and academic results
      • "pdf": PDF-focused results
      • Category(type="github" | "research" | "pdf"): typed category object form
  • limit
    • Type: int
    • Use when: you want to cap results.
    • Notes: defaults to 5 in the SDK model.
  • tbs
    • Type: str
    • Use when: you need a time-based filter (for example qdr:d, qdr:w, or qdr:m; combine sort-by-date with a recency window as sbd:1,qdr:m).
  • location
    • Type: str
    • Use when: you want localized results.
  • ignore_invalid_urls
    • Type: bool
    • Use when: you want to drop URLs that cannot be scraped by other endpoints.
  • timeout
    • Type: int
    • Use when: you need a request timeout in milliseconds.
    • Notes: defaults to 300000 in the SDK model.
  • scrape_options
    • Type: ScrapeOptions
    • Use when: you want to scrape each search result (see Scrape parameters for fields).
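The tbs values follow Google-style time filters (qdr:h/d/w/m/y for recency, sbd:1 for sort-by-date, comma-combined). A tiny helper can compose them; this is an illustrative sketch of the string convention, not part of the SDK:

```python
# Google-style tbs recency windows; sbd:1 sorts by date and combines with a comma.
TBS_RECENCY = {"hour": "qdr:h", "day": "qdr:d", "week": "qdr:w",
               "month": "qdr:m", "year": "qdr:y"}

def make_tbs(recency, sort_by_date=False):
    """Compose a tbs string such as "qdr:m" or "sbd:1,qdr:d"."""
    parts = ["sbd:1"] if sort_by_date else []
    parts.append(TBS_RECENCY[recency])
    return ",".join(parts)

print(make_tbs("month"))                   # qdr:m
print(make_tbs("day", sort_by_date=True))  # sbd:1,qdr:d
```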

Scrape

Why use it

Use scrape when you already have a URL and want structured content in one or more formats.

Preferred SDK method

client.scrape(url, **options) → Document (firecrawl.v2.types)

Simple Example

doc = client.scrape("https://docs.firecrawl.dev", formats=["markdown"])

Complex Example

doc = client.scrape(
    "https://example.com/pricing",
    formats=[
        "markdown",
        "links",
        {"type": "json", "prompt": "Extract plan names and prices."},
        {"type": "screenshot", "full_page": True, "quality": 80, "viewport": {"width": 1280, "height": 720}},
        {"type": "changeTracking", "modes": ["git-diff"], "tag": "pricing"},
        {"type": "attributes", "selectors": [{"selector": "a", "attribute": "href"}]},
    ],
    headers={"User-Agent": "FirecrawlDocsBot/1.0"},
    only_main_content=True,
    wait_for=1000,
    parsers=[{"type": "pdf", "mode": "auto", "max_pages": 5}],
    actions=[
        {"type": "click", "selector": "#accept"},
        {"type": "wait", "milliseconds": 750},
        {"type": "scrape"},
    ],
    location={"country": "US", "languages": ["en-US"]},
    remove_base64_images=True,
    fast_mode=True,
    block_ads=True,
    proxy="auto",
    max_age=86400000,
    store_in_cache=True,
    profile={"name": "docs-session", "save_changes": True},
)
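The returned Document carries one attribute per requested format (for example markdown, links, json); reading them defensively avoids surprises when a format was not requested or returned nothing. An illustrative sketch with a stand-in object, not SDK code:

```python
from types import SimpleNamespace

def collect_outputs(doc, fields=("markdown", "links", "json")):
    """Gather whichever format outputs are present on a Document-like object."""
    return {f: getattr(doc, f, None) for f in fields
            if getattr(doc, f, None) is not None}

# Stand-in for a scraped Document; real code would pass client.scrape(...).
doc = SimpleNamespace(markdown="# Pricing", links=["https://example.com/signup"])
print(collect_outputs(doc))  # {'markdown': '# Pricing', 'links': ['https://example.com/signup']}
```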

Parameters

  • url
    • Type: str
    • Use when: you want to scrape a specific page.
  • formats
    • Type: list of format strings or format objects
    • Use when: you want multiple output formats.
    • Confirmed format strings:
      • "markdown": markdown content
      • "html": cleaned HTML
      • "rawHtml" or "raw_html": raw HTML
      • "links": page links
      • "images": image URLs
      • "screenshot": screenshot output
      • "summary": summary output
      • "changeTracking" or "change_tracking": change tracking output
      • "attributes": attribute extraction
      • "branding": branding profile output
      • "audio": audio extraction
    • Object-only format types:
      • {"type": "json", ...}: JSON extraction. Use an object, not the plain string "json".
      • {"type": "query", "prompt": "..."}: question-answer output. Use an object, not the plain string "query".
    • Format object fields:
      • type: one of the format strings above, or "json" / "query" for object-only formats
      • prompt: for type: "query" and optional for type: "json"
      • schema: JSON schema for type: "json" or for change tracking JSON mode
      • modes: array of "git-diff" or "json" for type: "changeTracking"
      • tag: change tracking tag for type: "changeTracking"
      • full_page, quality, viewport: screenshot options for type: "screenshot"
      • selectors: array of {selector, attribute} for type: "attributes"
  • headers
    • Type: dict of str to str
    • Use when: you need custom request headers.
  • include_tags
    • Type: list of str
    • Use when: you want to include only specific HTML tags.
  • exclude_tags
    • Type: list of str
    • Use when: you want to exclude specific HTML tags.
  • only_main_content
    • Type: bool
    • Use when: you want to strip nav, footer, and other boilerplate.
  • timeout
    • Type: int
    • Use when: you need a timeout in milliseconds.
  • wait_for
    • Type: int
    • Use when: you need to wait for the page to render (milliseconds).
  • mobile
    • Type: bool
    • Use when: you want a mobile viewport.
  • parsers
    • Type: list of parser names or parser objects
    • Use when: you need file parsing controls (for example PDF parsing).
    • Confirmed values:
      • "pdf": enable PDF parsing
      • { "type": "pdf", "mode": "fast" | "auto" | "ocr", "max_pages": int }: PDF parser options
  • actions
    • Type: list of action objects
    • Use when: you need lightweight pre-scrape actions.
    • Confirmed action types:
      • wait: milliseconds or selector required
      • screenshot: full_page, quality, viewport optional
      • click: selector required
      • write: text required (click to focus the input first)
      • press: key required
      • scroll: direction (up or down) required, selector optional
      • scrape: no additional fields
      • executeJavascript: script required
      • pdf: format (A0, A1, A2, A3, A4, A5, A6, Letter, Legal, Tabloid, Ledger), landscape, scale optional
  • location
    • Type: dict with country and languages
    • Use when: you need geo or language-aware scraping.
  • skip_tls_verification
    • Type: bool
    • Use when: you need to skip TLS verification.
  • remove_base64_images
    • Type: bool
    • Use when: you want to drop base64 images from markdown output.
  • fast_mode
    • Type: bool
    • Use when: you want faster scrapes with reduced fidelity.
  • block_ads
    • Type: bool
    • Use when: you want ad and cookie popup blocking.
  • proxy
    • Type: str
    • Use when: you need proxy control.
    • Confirmed values: "basic", "stealth", "enhanced", "auto"
  • max_age
    • Type: int
    • Use when: you want cached data up to a maximum age (milliseconds).
  • min_age
    • Type: int
    • Use when: you want cached data only if it is at least this old (milliseconds).
    • Notes: only on ScrapeOptions passed as scrape_options to client.search. The client.scrape method does not accept min_age.
  • store_in_cache
    • Type: bool
    • Use when: you want Firecrawl to cache the result.
  • profile
    • Type: dict with name and optional save_changes or saveChanges
    • Use when: you want a persistent browser profile shared across scrapes and interactions.
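Because the object-form formats are plain dicts, a formats list can be assembled programmatically instead of inlined. A sketch; the schema below is a made-up example for a pricing page:

```python
def json_format(schema=None, prompt=None):
    """Build a JSON-extraction format object (the object form is required
    for "json"; the plain string "json" is not accepted)."""
    fmt = {"type": "json"}
    if prompt:
        fmt["prompt"] = prompt
    if schema:
        fmt["schema"] = schema
    return fmt

plan_schema = {  # hypothetical schema, for illustration only
    "type": "object",
    "properties": {"plans": {"type": "array", "items": {"type": "string"}}},
}
formats = ["markdown", "links", json_format(schema=plan_schema)]
print(formats)
```

The resulting list drops straight into `client.scrape(url, formats=formats)`.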

Interact

Why use it

Use interact when a page requires browser actions or code execution after a scrape starts.

Preferred SDK method

client.interact(job_id, code=None, *, prompt=None, language="node", timeout=None) → BrowserExecuteResponse (firecrawl.v2.types). prompt is keyword-only: call client.interact(job_id, prompt="...") or client.interact(job_id, code="..."). At least one of code or prompt must be non-empty (validated in firecrawl/v2/methods/scrape.py).

Simple Example

doc = client.scrape("https://example.com", formats=["markdown"])
job_id = doc.metadata.scrape_id if doc.metadata else None
if not job_id:
    raise RuntimeError("Missing scrape_id from scrape response")

result = client.interact(job_id, prompt="Click the pricing tab and summarize the plans.")

Complex Example

result = client.interact(
    "<scrapeJobId>",
    code="print(await page.title())",
    language="python",
    timeout=60,
)

Stop session

client.stop_interaction(job_id) ends the scrape-bound browser session. Returns BrowserDeleteResponse (success, optional session_duration_ms, credits_billed, error).
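Because the session holds browser resources, pairing interact with stop_interaction in a finally block is a reasonable pattern. The sketch below uses a stand-in client so the shape is clear; real code would use the authenticated Firecrawl client and a real scrape job ID:

```python
class FakeClient:
    """Stand-in with the same method names as the real client."""
    def interact(self, job_id, **kwargs):
        return {"success": True, "output": "done"}
    def stop_interaction(self, job_id):
        return {"success": True}

client = FakeClient()
job_id = "<scrapeJobId>"
try:
    result = client.interact(job_id, prompt="Summarize the page.")
finally:
    cleanup = client.stop_interaction(job_id)  # always release the session
print(result, cleanup)
```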

Parameters

  • job_id
    • Type: str
    • Use when: you have a scrape job ID from document.metadata.scrape_id.
  • code
    • Type: str
    • Use when: you want to run code in the browser session (optional if prompt is set).
  • prompt
    • Type: str
    • Use when: you want the browser agent to follow a natural-language instruction (optional if code is set).
  • language
    • Type: str
    • Use when: you need a specific runtime.
    • Confirmed values: "python", "node", "bash"
    • Notes: defaults to "node" in the SDK.
  • timeout
    • Type: int
    • Use when: you need an execution timeout in seconds.

Return value

Returns BrowserExecuteResponse: success, optional live_view_url, interactive_live_view_url, output, stdout, result, stderr, exit_code, killed, error (API camelCase is normalized to snake_case on the model).
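The camelCase-to-snake_case normalization can be pictured with a small helper. This illustrates the mapping only; it is not the SDK's actual implementation (which relies on model field aliases):

```python
import re

def camel_to_snake(name):
    """liveViewUrl -> live_view_url, exitCode -> exit_code."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()

api_payload = {"liveViewUrl": "https://...", "exitCode": 0, "killed": False}
normalized = {camel_to_snake(k): v for k, v in api_payload.items()}
print(normalized)  # {'live_view_url': 'https://...', 'exit_code': 0, 'killed': False}
```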

Notes

  • Deprecated aliases: scrape_execute → interact; stop_interactive_browser and delete_scrape_browser → stop_interaction.
  • The top-level Firecrawl client exposes v2 methods directly; v1 remains under client.v1.
  • The bundled v2 OpenAPI snippet for POST /v2/scrape/{jobId}/interact may only document code; the Python SDK and server accept either code or prompt for this endpoint.

Source Of Truth

  • firecrawl/apps/python-sdk/pyproject.toml
  • firecrawl/apps/python-sdk/firecrawl/__init__.py
  • firecrawl/apps/python-sdk/firecrawl/client.py
  • firecrawl/apps/python-sdk/firecrawl/v2/client.py
  • firecrawl/apps/python-sdk/firecrawl/v2/types.py
  • firecrawl/apps/python-sdk/firecrawl/v2/methods/search.py
  • firecrawl/apps/python-sdk/firecrawl/v2/methods/scrape.py
  • firecrawl/apps/python-sdk/firecrawl/v2/utils/validation.py
  • firecrawl-docs/api-reference/v2-openapi.json