# Advanced Scraping Guide Source: https://docs.firecrawl.dev/advanced-scraping-guide Configure scrape options, browser actions, crawl, map, and the agent endpoint with Firecrawl's full API surface. Reference for every option across Firecrawl's scrape, crawl, map, and agent endpoints. ## Basic scraping To scrape a single page and get clean markdown content, use the `/scrape` endpoint. ```python Python theme={null} # pip install firecrawl-py from firecrawl import Firecrawl firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY") doc = firecrawl.scrape("https://firecrawl.dev") print(doc.markdown) ``` ```js Node theme={null} // npm install @mendable/firecrawl-js import { Firecrawl } from 'firecrawl-js'; const firecrawl = new Firecrawl({ apiKey: 'fc-YOUR-API-KEY' }); const doc = await firecrawl.scrape('https://firecrawl.dev'); console.log(doc.markdown); ``` ```bash cURL theme={null} curl -X POST https://api.firecrawl.dev/v2/scrape \ -H 'Content-Type: application/json' \ -H 'Authorization: Bearer fc-YOUR-API-KEY' \ -d '{ "url": "https://docs.firecrawl.dev" }' ``` ## Scraping PDFs Firecrawl supports PDFs. Use the `parsers` option (e.g., `parsers: ["pdf"]`) when you want to ensure PDF parsing. You can control the parsing strategy with the `mode` option: * **`auto`** (default) — attempts fast text-based extraction first, then falls back to OCR if needed. * **`fast`** — text-based parsing only (embedded text). Fastest, but skips scanned/image-heavy pages. * **`ocr`** — forces OCR parsing on every page. Use for scanned documents or when `auto` misclassifies a page. `{ type: "pdf" }` and `"pdf"` both default to `mode: "auto"`. ```json theme={null} "parsers": [{ "type": "pdf", "mode": "fast", "maxPages": 50 }] ``` ## Scrape options When using the `/scrape` endpoint, you can customize the request with the following options. ### Formats (`formats`) The `formats` array controls which output types the scraper returns. Default: `["markdown"]`. 
**String formats**: pass the name directly (e.g. `"markdown"`). | Format | Description | | ---------- | ---------------------------------------------------------------------------- | | `markdown` | Page content converted to clean Markdown. | | `html` | Processed HTML with unnecessary elements removed. | | `rawHtml` | Original HTML exactly as returned by the server. | | `links` | All links found on the page. | | `images` | All images found on the page. | | `summary` | An LLM-generated summary of the page content. | | `branding` | Extracts brand identity (colors, fonts, typography, spacing, UI components). | **Object formats**: pass an object with `type` and additional options. | Format | Options | Description | | ---------------- | ---------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------- | | `json` | `prompt?: string`, `schema?: object` | Extract structured data using an LLM. Provide a JSON schema and/or a natural-language prompt (max 10,000 characters). | | `screenshot` | `fullPage?: boolean`, `quality?: number`, `viewport?: { width, height }` | Capture a screenshot. Max one per request. Viewport max resolution is 7680×4320. Screenshot URLs expire after 24 hours. | | `changeTracking` | `modes?: ("json" \| "git-diff")[]`, `tag?: string`, `schema?: object`, `prompt?: string` | Track changes between scrapes. Requires `"markdown"` to also be in the formats array. | | `attributes` | `selectors: [{ selector: string, attribute: string }]` | Extract specific HTML attributes from elements matching CSS selectors. | ### Content filtering These parameters control which parts of the page appear in the output. When `onlyMainContent` is `true` (the default), boilerplate (nav, footer, etc.) is stripped. 
`includeTags` and `excludeTags` are applied against the original page DOM, not the post-filtered result, so your selectors should target elements as they appear in the source HTML. Set `onlyMainContent: false` to use the full page as the starting point for tag filtering. | Parameter | Type | Default | Description | | ----------------- | --------- | ------- | ---------------------------------------------------------------------------- | | `onlyMainContent` | `boolean` | `true` | Return only the main content. Set `false` for the full page. | | `includeTags` | `array` | — | HTML tags, classes, or IDs to include (e.g. `["h1", "p", ".main-content"]`). | | `excludeTags` | `array` | — | HTML tags, classes, or IDs to exclude (e.g. `["#ad", "#footer"]`). | ### Timing and cache | Parameter | Type | Default | Description | | --------- | -------------- | ----------- | ------------------------------------------------------------------------------------------------------ | | `waitFor` | `integer` (ms) | `0` | Extra wait time before scraping, on top of smart-wait. Use sparingly. | | `maxAge` | `integer` (ms) | `172800000` | Return a cached version if fresher than this value (default is 2 days). Set `0` to always fetch fresh. | | `timeout` | `integer` (ms) | `60000` | Max request duration before aborting (default is 60 seconds). Minimum is 1000 (1 second). | ### PDF parsing | Parameter | Type | Default | Description | | --------- | ------- | --------- | -------------------------------------------------------------------------------- | | `parsers` | `array` | `["pdf"]` | Controls PDF processing. `[]` to skip parsing and return base64 (1 credit flat). | ```json theme={null} { "type": "pdf", "mode": "fast" | "auto" | "ocr", "maxPages": 10 } ``` | Property | Type | Default | Description | | ---------- | --------------------------- | ------------ | ------------------------------------------------------------------------------------- | | `type` | `"pdf"` | *(required)* | Parser type. 
| | `mode` | `"fast" \| "auto" \| "ocr"` | `"auto"` | `fast`: text-based extraction only. `auto`: fast with OCR fallback. `ocr`: force OCR. | | `maxPages` | `integer` | — | Cap the number of pages to parse. | ### Actions Run browser actions before scraping. This is useful for dynamic content, navigation, or user-gated pages. You can include up to 50 actions per request, and the combined wait time across all `wait` actions and `waitFor` must not exceed 60 seconds. | Action | Parameters | Description | | ------------------- | ------------------------------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------- | | `wait` | `milliseconds?: number`, `selector?: string` | Wait for a fixed duration **or** until an element is visible (provide one, not both). When using `selector`, times out after 30 seconds. | | `click` | `selector: string`, `all?: boolean` | Click an element matching the CSS selector. Set `all: true` to click every match. | | `write` | `text: string` | Type text into the currently focused field. You must focus the element with a `click` action first. | | `press` | `key: string` | Press a keyboard key (e.g. `"Enter"`, `"Tab"`, `"Escape"`). | | `scroll` | `direction?: "up" \| "down"`, `selector?: string` | Scroll the page or a specific element. Direction defaults to `"down"`. | | `screenshot` | `fullPage?: boolean`, `quality?: number`, `viewport?: { width, height }` | Capture a screenshot. Max viewport resolution is 7680×4320. | | `scrape` | *(none)* | Capture the current page HTML at this point in the action sequence. | | `executeJavascript` | `script: string` | Run JavaScript code in the page. Return values are available in the `actions.javascriptReturns` array of the response. | | `pdf` | `format?: string`, `landscape?: boolean`, `scale?: number` | Generate a PDF. 
Supported formats: `"A0"` through `"A6"`, `"Letter"`, `"Legal"`, `"Tabloid"`, `"Ledger"`. Defaults to `"Letter"`. | ```python Python theme={null} from firecrawl import Firecrawl firecrawl = Firecrawl(api_key='fc-YOUR-API-KEY') doc = firecrawl.scrape('https://example.com', { 'actions': [ { 'type': 'wait', 'milliseconds': 1000 }, { 'type': 'click', 'selector': '#accept' }, { 'type': 'scroll', 'direction': 'down' }, { 'type': 'click', 'selector': '#q' }, { 'type': 'write', 'text': 'firecrawl' }, { 'type': 'press', 'key': 'Enter' }, { 'type': 'wait', 'milliseconds': 2000 } ], 'formats': ['markdown'] }) print(doc.markdown) ``` ```js Node theme={null} import { Firecrawl } from 'firecrawl-js'; const firecrawl = new Firecrawl({ apiKey: 'fc-YOUR-API-KEY' }); const doc = await firecrawl.scrape('https://example.com', { actions: [ { type: 'wait', milliseconds: 1000 }, { type: 'click', selector: '#accept' }, { type: 'scroll', direction: 'down' }, { type: 'click', selector: '#q' }, { type: 'write', text: 'firecrawl' }, { type: 'press', key: 'Enter' }, { type: 'wait', milliseconds: 2000 } ], formats: ['markdown'] }); console.log(doc.markdown); ``` ```bash cURL theme={null} curl -X POST https://api.firecrawl.dev/v2/scrape \ -H 'Content-Type: application/json' \ -H 'Authorization: Bearer fc-YOUR-API-KEY' \ -d '{ "url": "https://example.com", "actions": [ { "type": "wait", "milliseconds": 1000 }, { "type": "click", "selector": "#accept" }, { "type": "scroll", "direction": "down" }, { "type": "click", "selector": "#q" }, { "type": "write", "text": "firecrawl" }, { "type": "press", "key": "Enter" }, { "type": "wait", "milliseconds": 2000 } ], "formats": ["markdown"] }' ``` #### Action execution notes * **Write** requires a preceding `click` to focus the target element. * **Scroll** accepts an optional `selector` to scroll a specific element instead of the page. * **Wait** accepts either `milliseconds` (fixed delay) or `selector` (wait until visible). 
* Actions run **sequentially**: each step completes before the next begins. * Actions are **not supported for PDFs**. If the URL resolves to a PDF the request will fail. #### Advanced action examples **Taking a screenshot:** ```bash cURL theme={null} curl -X POST https://api.firecrawl.dev/v2/scrape \ -H 'Content-Type: application/json' \ -H 'Authorization: Bearer fc-YOUR-API-KEY' \ -d '{ "url": "https://example.com", "actions": [ { "type": "click", "selector": "#load-more" }, { "type": "wait", "milliseconds": 1000 }, { "type": "screenshot", "fullPage": true, "quality": 80 } ] }' ``` **Clicking multiple elements:** ```bash cURL theme={null} curl -X POST https://api.firecrawl.dev/v2/scrape \ -H 'Content-Type: application/json' \ -H 'Authorization: Bearer fc-YOUR-API-KEY' \ -d '{ "url": "https://example.com", "actions": [ { "type": "click", "selector": ".expand-button", "all": true }, { "type": "wait", "milliseconds": 500 } ], "formats": ["markdown"] }' ``` **Generating a PDF:** ```bash cURL theme={null} curl -X POST https://api.firecrawl.dev/v2/scrape \ -H 'Content-Type: application/json' \ -H 'Authorization: Bearer fc-YOUR-API-KEY' \ -d '{ "url": "https://example.com", "actions": [ { "type": "pdf", "format": "A4", "landscape": false } ] }' ``` **Executing JavaScript (e.g. extracting embedded page data):** ```bash cURL theme={null} curl -X POST https://api.firecrawl.dev/v2/scrape \ -H 'Content-Type: application/json' \ -H 'Authorization: Bearer fc-YOUR-API-KEY' \ -d '{ "url": "https://example.com", "actions": [ { "type": "executeJavascript", "script": "document.querySelector(\"#__NEXT_DATA__\").textContent" } ], "formats": ["markdown"] }' ``` The return value of each `executeJavascript` action is captured in the `actions.javascriptReturns` array of the response. 
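As a minimal sketch of consuming those return values client-side — the exact response shape (`actions.javascriptReturns` entries carrying a `value` field) is an assumption based on the field name above — extracting and decoding embedded page data might look like:

```python
import json

def javascript_returns(response: dict) -> list:
    """Collect executeJavascript return values from a scrape response.

    Assumes the response nests them under actions.javascriptReturns and
    that each entry exposes the script's return value under "value".
    """
    returns = response.get("actions", {}).get("javascriptReturns", [])
    return [entry.get("value") for entry in returns]

# Stubbed response: the script returned the raw __NEXT_DATA__ JSON
# string, which the client then decodes itself.
stub = {
    "actions": {
        "javascriptReturns": [
            {"type": "string", "value": '{"props": {"page": "home"}}'}
        ]
    }
}

data = json.loads(javascript_returns(stub)[0])
print(data["props"]["page"])  # → home
```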
### Full scrape example

The following request combines multiple scrape options:

```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/scrape \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer fc-YOUR-API-KEY' \
  -d '{
    "url": "https://docs.firecrawl.dev",
    "formats": [
      "markdown",
      "links",
      "html",
      "rawHtml",
      { "type": "screenshot", "fullPage": true, "quality": 80 }
    ],
    "includeTags": ["h1", "p", "a", ".main-content"],
    "excludeTags": ["#ad", "#footer"],
    "onlyMainContent": false,
    "waitFor": 1000,
    "timeout": 15000,
    "parsers": ["pdf"]
  }'
```

This request returns markdown, HTML, raw HTML, links, and a full-page screenshot. It scopes content to `h1`, `p`, `a`, and `.main-content` while excluding `#ad` and `#footer`, waits 1 second before scraping, sets a 15-second timeout, and enables PDF parsing. See the full [Scrape API reference](https://docs.firecrawl.dev/api-reference/endpoint/scrape) for details.

## JSON extraction via formats

Use the JSON format object in `formats` to extract structured data in one pass:

```python Python theme={null}
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key='fc-YOUR-API-KEY')

doc = firecrawl.scrape('https://firecrawl.dev', {
    'formats': [{
        'type': 'json',
        'prompt': 'Extract the features of the product',
        'schema': {
            'type': 'object',
            'properties': {'features': {'type': 'object'}},
            'required': ['features']
        }
    }]
})

print(doc.json)
```

```js Node theme={null}
import { Firecrawl } from 'firecrawl-js';

const firecrawl = new Firecrawl({ apiKey: 'fc-YOUR-API-KEY' });

const doc = await firecrawl.scrape('https://firecrawl.dev', {
  formats: [{
    type: 'json',
    prompt: 'Extract the features of the product',
    schema: {
      type: 'object',
      properties: { features: { type: 'object' } },
      required: ['features']
    }
  }]
});

console.log(doc.json);
```

```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/scrape \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer fc-YOUR-API-KEY' \
  -d '{
    "url": "https://firecrawl.dev",
    "formats": [{
      "type": "json",
      "prompt": "Extract the features of the product",
      "schema": {"type": "object", "properties": {"features": {"type": "object"}}, "required": ["features"]}
    }]
  }'
```

## Agent endpoint

Use the `/v2/agent` endpoint for autonomous, multi-page data extraction. The agent runs asynchronously: you start a job, then poll for results.
### Agent options | Parameter | Type | Default | Description | | ----------------------- | --------- | ---------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `prompt` | `string` | *(required)* | Natural-language instructions describing what data to extract (max 10,000 characters). | | `urls` | `array` | — | URLs to constrain the agent to. | | `schema` | `object` | — | JSON schema to structure the extracted data. | | `maxCredits` | `number` | `2500` | Maximum credits the agent can spend. The dashboard supports up to 2,500; for higher limits, set this via the API (values above 2,500 are always billed as paid requests). | | `strictConstrainToURLs` | `boolean` | `false` | When `true`, the agent only visits the provided URLs. | | `model` | `string` | `"spark-1-mini"` | AI model to use. `"spark-1-mini"` (default, 60% cheaper) or `"spark-1-pro"` (higher accuracy). 
| ```python Python theme={null} from firecrawl import Firecrawl firecrawl = Firecrawl(api_key='fc-YOUR-API-KEY') # Start agent job started = firecrawl.start_agent( prompt="Extract the title and description", urls=["https://docs.firecrawl.dev"], schema={"type": "object", "properties": {"title": {"type": "string"}, "description": {"type": "string"}}, "required": ["title"]} ) # Poll status status = firecrawl.get_agent_status(started["id"]) print(status.get("status"), status.get("data")) ``` ```js Node theme={null} import Firecrawl from '@mendable/firecrawl-js'; const firecrawl = new Firecrawl({ apiKey: 'fc-YOUR-API-KEY' }); // Start agent job const started = await firecrawl.startAgent({ prompt: 'Extract the title and description', urls: ['https://docs.firecrawl.dev'], schema: { type: 'object', properties: { title: { type: 'string' }, description: { type: 'string' } }, required: ['title'] } }); // Poll status const status = await firecrawl.getAgentStatus(started.id); console.log(status.status, status.data); ``` ```bash cURL theme={null} curl -X POST https://api.firecrawl.dev/v2/agent \ -H 'Content-Type: application/json' \ -H 'Authorization: Bearer fc-YOUR-API-KEY' \ -d '{ "prompt": "Extract the title and description", "urls": ["https://docs.firecrawl.dev"], "schema": {"type": "object", "properties": {"title": {"type": "string"}, "description": {"type": "string"}}, "required": ["title"]} }' ``` ### Check agent status Poll `GET /v2/agent/{jobId}` to check progress. The response `status` field will be `"processing"`, `"completed"`, or `"failed"`. ```bash cURL theme={null} curl -X GET https://api.firecrawl.dev/v2/agent/YOUR-JOB-ID \ -H 'Authorization: Bearer fc-YOUR-API-KEY' ``` The Python and Node SDKs also provide a convenience method (`firecrawl.agent()`) that starts the job and polls automatically until completion. ## Crawling multiple pages To crawl multiple pages, use the `/v2/crawl` endpoint. The crawl runs asynchronously and returns a job ID. 
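Because the job is asynchronous, a client typically submits it and then polls until it finishes. A minimal polling helper — the status values are assumptions based on this guide, and the fetcher is injected so the pattern can be exercised without the network:

```python
import time

def poll_until_done(get_status, interval_s=2.0, max_wait_s=120.0):
    """Poll a status-fetching callable until the job leaves its
    in-progress states or the deadline passes.

    `get_status` stands in for a GET /v2/crawl/{id} (or agent) call;
    "processing" and "scraping" as in-progress states are assumptions.
    """
    deadline = time.monotonic() + max_wait_s
    while time.monotonic() < deadline:
        status = get_status()
        if status.get("status") not in ("processing", "scraping"):
            return status
        time.sleep(interval_s)
    raise TimeoutError("job did not finish in time")

# Simulated run: the job reports 'scraping' twice, then completes.
states = iter([
    {"status": "scraping"},
    {"status": "scraping"},
    {"status": "completed", "data": []},
])
result = poll_until_done(lambda: next(states), interval_s=0.0)
print(result["status"])  # → completed
```

In real use, replace the lambda with a function that issues the authenticated status request and parses the JSON body.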
Use the `limit` parameter to control how many pages are crawled. If omitted, the crawl will process up to 10,000 pages. ```bash cURL theme={null} curl -X POST https://api.firecrawl.dev/v2/crawl \ -H 'Content-Type: application/json' \ -H 'Authorization: Bearer fc-YOUR-API-KEY' \ -d '{ "url": "https://docs.firecrawl.dev", "limit": 10 }' ``` ### Response ```json theme={null} { "id": "1234-5678-9101" } ``` ### Check crawl job Use the job ID to check the status of a crawl and retrieve its results. ```bash cURL theme={null} curl -X GET https://api.firecrawl.dev/v2/crawl/1234-5678-9101 \ -H 'Content-Type: application/json' \ -H 'Authorization: Bearer fc-YOUR-API-KEY' ``` If the content is larger than 10MB or the crawl job is still running, the response may include a `next` parameter, a URL to the next page of results. ### Crawl prompt and params preview You can provide a natural-language `prompt` to let Firecrawl derive crawl settings. Preview them first: ```bash cURL theme={null} curl -X POST https://api.firecrawl.dev/v2/crawl/params-preview \ -H 'Content-Type: application/json' \ -H 'Authorization: Bearer fc-YOUR-API-KEY' \ -d '{ "url": "https://docs.firecrawl.dev", "prompt": "Extract docs and blog" }' ``` ### Crawler options When using the `/v2/crawl` endpoint, you can customize crawling behavior with the following options. #### Path filtering | Parameter | Type | Default | Description | | ---------------- | --------- | ------- | ----------------------------------------------------------------- | | `includePaths` | `array` | — | Regex patterns for URLs to include (pathname only by default). | | `excludePaths` | `array` | — | Regex patterns for URLs to exclude (pathname only by default). | | `regexOnFullURL` | `boolean` | `false` | Match patterns against the full URL instead of just the pathname. | The starting URL is also checked against `includePaths`. If it does not match any of the patterns, the crawl may return 0 pages. 
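To sanity-check your patterns before launching a crawl, you can replicate the pathname matching locally. A sketch of that behavior — the server-side regex semantics may differ in detail:

```python
import re
from urllib.parse import urlparse

def passes_path_filters(url, include_paths=(), exclude_paths=(),
                        regex_on_full_url=False):
    """Mimic includePaths/excludePaths: match regex patterns against
    the pathname, or the full URL when regexOnFullURL is set."""
    target = url if regex_on_full_url else urlparse(url).path
    if include_paths and not any(re.search(p, target) for p in include_paths):
        return False
    if any(re.search(p, target) for p in exclude_paths):
        return False
    return True

print(passes_path_filters("https://docs.firecrawl.dev/blog/post-1",
                          include_paths=[r"^/blog/.*$"]))   # → True
print(passes_path_filters("https://docs.firecrawl.dev/admin/x",
                          exclude_paths=[r"^/admin/.*$"]))  # → False
```

Running your starting URL through the same check is a quick way to catch the zero-page case described above.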
#### Crawl scope | Parameter | Type | Default | Description | | -------------------- | ------------ | ------- | ------------------------------------------------------------ | | `maxDiscoveryDepth` | `integer` | — | Max link-depth for discovering new URLs. | | `limit` | `integer` | `10000` | Max pages to crawl. | | `crawlEntireDomain` | `boolean` | `false` | Explore siblings and parents to cover the entire domain. | | `allowExternalLinks` | `boolean` | `false` | Follow links to external domains. | | `allowSubdomains` | `boolean` | `false` | Follow subdomains of the main domain. | | `delay` | `number` (s) | — | Delay between scrapes. Setting this forces concurrency to 1. | #### Sitemap and deduplication | Parameter | Type | Default | Description | | ------------------------ | --------- | ----------- | ------------------------------------------------------------------------------------------------------- | | `sitemap` | `string` | `"include"` | `"include"`: use sitemap + link discovery. `"skip"`: ignore sitemap. `"only"`: crawl only sitemap URLs. | | `deduplicateSimilarURLs` | `boolean` | `true` | Normalize URL variants (`www.`, `https`, trailing slashes, `index.html`) as duplicates. | | `ignoreQueryParameters` | `boolean` | `false` | Strip query strings before deduplication (e.g. `/page?a=1` and `/page?a=2` become one URL). | #### Scrape options for crawl | Parameter | Type | Default | Description | | --------------- | -------- | --------------------------- | ---------------------------------------------------------------------------- | | `scrapeOptions` | `object` | `{ formats: ["markdown"] }` | Per-page scrape config. Accepts all [scrape options](#scrape-options) above. 
| ### Crawl example ```bash cURL theme={null} curl -X POST https://api.firecrawl.dev/v2/crawl \ -H 'Content-Type: application/json' \ -H 'Authorization: Bearer fc-YOUR-API-KEY' \ -d '{ "url": "https://docs.firecrawl.dev", "includePaths": ["^/blog/.*$", "^/docs/.*$"], "excludePaths": ["^/admin/.*$", "^/private/.*$"], "maxDiscoveryDepth": 2, "limit": 1000 }' ``` ## Mapping website links The `/v2/map` endpoint identifies URLs related to a given website. ```bash cURL theme={null} curl -X POST https://api.firecrawl.dev/v2/map \ -H 'Content-Type: application/json' \ -H 'Authorization: Bearer fc-YOUR-API-KEY' \ -d '{ "url": "https://docs.firecrawl.dev" }' ``` ### Map options | Parameter | Type | Default | Description | | ------------------- | --------- | ----------- | ----------------------------------- | | `search` | `string` | — | Filter links by text match. | | `limit` | `integer` | `100` | Max links to return. | | `sitemap` | `string` | `"include"` | `"include"`, `"skip"`, or `"only"`. | | `includeSubdomains` | `boolean` | `true` | Include subdomains. | Here is the API Reference for it: [Map Endpoint Documentation](https://docs.firecrawl.dev/api-reference/endpoint/map) ## Whitelisting Firecrawl ### Allowing Firecrawl to scrape your website * **User Agent**: Allow `FirecrawlAgent` in your firewall or security rules. * **IP addresses**: Firecrawl does not use a fixed set of outbound IPs. ### Allowing your application to call the Firecrawl API If your firewall blocks outbound requests from your application to external services, you need to whitelist Firecrawl's API server IP address so your application can reach the Firecrawl API (`api.firecrawl.dev`): * **IP Address**: `35.245.250.27` Add this IP to your firewall's outbound allowlist so your backend can send scrape, crawl, map, and agent requests to Firecrawl. # Billing Source: https://docs.firecrawl.dev/billing How Firecrawl billing, credits, and plans work ## Overview Firecrawl uses a **credit-based billing system**. 
Every API call you make consumes credits, and the number of credits consumed depends on the endpoint and options you use. You get a monthly credit allotment based on your plan, and can purchase additional credits via auto-recharge packs if you need more. For current plan pricing, visit the [Firecrawl pricing page](https://www.firecrawl.dev/pricing). ## Credits Credits are the unit of usage in Firecrawl. Each plan includes a monthly credit allotment that resets at the start of each billing cycle. Different API endpoints consume different amounts of credits. ### Credit costs per endpoint | Endpoint | Credit Cost | Notes | | ----------- | -------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | **Scrape** | 1 credit / page | Convert a single URL into clean markdown, HTML, or structured data. Additional credits apply when using scrape options (see below). | | **Crawl** | 1 credit / page | Scrape an entire website by following links from a starting URL. The same per-page scrape option costs apply to each page crawled. | | **Map** | 1 credit / call | Discover all URLs on a website without scraping their content. | | **Search** | 2 credits / 10 results | Search the web and optionally scrape the results. Rounded up per 10 results (e.g., 11 results = 4 credits). Additional per-page scrape costs apply to each result that is scraped. Enterprise ZDR search costs 10 credits / 10 results (see [ZDR Search](/features/search#zero-data-retention-zdr)). | | **Browser** | 2 credits / browser minute | Interactive browser sandbox session, billed per minute. | | **Agent** | Dynamic | Autonomous web research agent. 5 daily runs free; usage-based pricing beyond that. 
| ### Additional credit costs for scrape options Certain scrape options add credits on top of the base cost per page: | Option | Additional Cost | Description | | ---------------------------- | -------------------- | ------------------------------------------------------------------------------------------------------------ | | PDF parsing | +1 credit / PDF page | Extract content from PDF documents | | JSON format (LLM extraction) | +4 credits / page | Use an LLM to extract structured JSON data from the page | | Enhanced Mode | +4 credits / page | Improved scraping for pages that are difficult to access | | Zero Data Retention (ZDR) | +1 credit / page | Ensures no data is persisted beyond the request (see [Scrape ZDR](/features/scrape#zero-data-retention-zdr)) | These modifiers stack. For example, scraping a page with both JSON format and Enhanced Mode costs **1 + 4 + 4 = 9 credits** per page. These same modifiers apply to the Crawl and Search endpoints since they use scrape internally for each page. ### When credits are charged For **batch scrape** and **crawl** jobs, credits are billed asynchronously as each page completes processing — not when the job is submitted. This means there can be a delay between submitting a job and seeing the full credit cost reflected on your account. If a batch contains many URLs or pages are queued during high-traffic periods, credits may continue to appear minutes or hours after submission. Polling or checking batch status does not consume credits. ### Tracking your usage You can monitor your credit usage in two ways: * **Dashboard**: View your current and historical usage at [firecrawl.dev/app](https://www.firecrawl.dev/app) * **API**: Use the [Credit Usage](/api-reference/endpoint/credit-usage) and [Credit Usage Historical](/api-reference/endpoint/credit-usage-historical) endpoints to programmatically check your usage We are actively working on improvements to make credit usage easier to understand. Stay tuned for updates. 
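The cost rules above can be captured in a small estimator. This is just a sketch of the published numbers for planning purposes, not an official calculator:

```python
import math

def scrape_page_credits(json_format=False, enhanced_mode=False,
                        zdr=False, pdf_pages=0):
    """1 base credit per page, plus the stacking modifiers listed above."""
    credits = 1
    credits += 4 if json_format else 0     # JSON format (LLM extraction)
    credits += 4 if enhanced_mode else 0   # Enhanced Mode
    credits += 1 if zdr else 0             # Zero Data Retention
    credits += pdf_pages                   # +1 credit per PDF page parsed
    return credits

def search_credits(num_results):
    """2 credits per 10 results, rounded up to the next block of 10."""
    return 2 * math.ceil(num_results / 10)

print(scrape_page_credits(json_format=True, enhanced_mode=True))  # → 9
print(search_credits(11))  # → 4
```

The same per-page function applies to each page of a crawl or to each search result that gets scraped.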
## Plans Firecrawl offers subscription-based monthly plans. There is no pure pay-as-you-go option, but auto-recharge (described below) provides flexible scaling on top of your base plan. ### Available plans | Plan | Monthly Credits | Concurrent Browsers | | ------------ | --------------- | ------------------: | | **Free** | 500 (one-time) | 2 | | **Hobby** | 3,000 | 5 | | **Standard** | 100,000 | 50 | | **Growth** | 500,000 | 100 | For high-volume needs beyond the Growth plan, Firecrawl offers **Scale** and **Enterprise** plans with 1M+ credits, 150+ concurrent browsers, dedicated support, SLAs, bulk discounts, zero-data retention, and SSO. Visit the [Enterprise page](https://www.firecrawl.dev/enterprise) for details. All paid plans are available with **monthly** or **yearly** billing. Yearly billing offers a discount compared to paying month-to-month. For current pricing on each plan, visit the [pricing page](https://www.firecrawl.dev/pricing). ### Concurrent browsers Concurrent browsers represent how many web pages Firecrawl can process for you simultaneously. Your plan determines this limit. If you exceed it, additional jobs wait in a queue until a slot opens. See [Rate Limits](/rate-limits) for full details on concurrency and API rate limits. ## Auto-Recharge If you occasionally need more credits than your plan includes, you can enable **auto-recharge** from the dashboard. When your remaining credits drop below a threshold, Firecrawl automatically purchases an additional credit pack and adds it to your balance. By default the threshold is **0 credits**, but you can customize it in your [billing settings](https://www.firecrawl.dev/app/settings?tab=billing) to trigger recharges earlier — for example, setting a higher threshold ensures you always have a buffer of credits available, which is useful if you run large crawls. Note that this check occurs when you make an API request — your balance will not be topped up until your next request triggers the recharge. 
* Auto-recharge packs are available on all paid plans * Pack sizes and prices vary by plan (visible on the [pricing page](https://www.firecrawl.dev/pricing)) * You can toggle auto-recharge on or off at any time * Auto-recharge is limited to **4 packs per month** * Credits from auto-recharge packs **do not reset monthly** — they persist across billing cycles and expire after **1 year** from purchase, unlike your monthly plan credits which reset each period. Auto-recharge is best for handling occasional spikes in usage. If you find yourself consistently exceeding your plan's allotment, upgrading to a higher plan will give you better value per credit. ## Coupons Firecrawl supports two types of coupons: * **Subscription coupons** apply a discount to your plan subscription (e.g. a percentage off your monthly or yearly price). These can **only** be applied during the Stripe checkout flow when you first subscribe to a paid plan or change plans. You cannot apply a subscription coupon after checkout has completed. * **Credit coupons** add bonus credits to your account. These can be redeemed from the **Billing** section of your dashboard at [firecrawl.dev/app/billing](https://www.firecrawl.dev/app/billing). Look for the coupon input field on the billing page to apply your code. Bonus credits from credit coupons are separate from your plan's monthly allotment and persist even if you upgrade or downgrade your plan. If you have a coupon code and are unsure which type it is, try applying it in the billing section of your dashboard first. If it is a subscription coupon, you will need to use it at the Stripe checkout page instead. ## Billing Cycle * **Monthly plans**: Credits reset on your monthly renewal date * **Yearly plans**: You are billed annually, but credits still reset each month on your virtual monthly renewal date * **Unused plan credits do not roll over** — your monthly allotment resets to the plan amount at the start of each billing period. 
Credits from auto-recharge packs are not tied to your billing cycle — they persist and expire **1 year** from the date of purchase. ## Upgrading and Downgrading * **Upgrades** take effect immediately. You are billed a prorated amount for the remainder of the current billing period, and your credit allotment and limits update right away. * **Downgrades** are scheduled to take effect at your next renewal date. You keep your current plan's credits and limits until then. ## What Happens When You Run Out of Credits If you exhaust your credit allotment and do not have auto-recharge enabled, API requests that consume credits will return an **HTTP 402 (Payment Required)** error. If you have auto-recharge enabled, usage will continue while recharge packs are purchased automatically — but if the monthly pack limit is reached or a recharge fails, your balance may go negative until the next billing cycle. To resume usage after a hard stop, you can: 1. Enable auto-recharge to automatically purchase more credits 2. Upgrade to a higher plan 3. Wait for your credits to reset at the next billing cycle ## Free Plan The Free plan provides a **one-time allotment of 500 credits** with no credit card required. These credits do not renew — once they are used up, you will need to upgrade to a paid plan to continue using Firecrawl. The Free plan also has lower rate limits and concurrency compared to paid plans (see [Rate Limits](/rate-limits)). ## FAQs **Plan credits** do not roll over — your monthly allotment resets at the start of each billing period. However, credits purchased through **auto-recharge packs** are not tied to your billing cycle and carry forward. Auto-recharge credits expire **1 year** from the date of purchase. Check the dashboard at [firecrawl.dev/app](https://www.firecrawl.dev/app), or call the [Credit Usage API endpoint](/api-reference/endpoint/credit-usage) programmatically. Credits are the billing unit for Firecrawl API calls. 
Tokens refer to LLM tokens used internally by endpoints like Extract and Agent. Each credit is equivalent to 15 tokens. For most users, you only need to think in terms of credits.

**Where do I apply my coupon code?**

It depends on the coupon type. **Credit coupons** are applied in the Billing section of your dashboard. **Subscription coupons** (discounts on your plan price) can only be applied at the Stripe checkout page when subscribing or changing plans.

**How do I get a custom or enterprise plan?**

Reach out to [help@firecrawl.dev](mailto:help@firecrawl.dev), or visit the [Enterprise page](https://www.firecrawl.dev/enterprise) to learn more about custom plans.

# Running Locally

Source: https://docs.firecrawl.dev/contributing/guide

Set up Firecrawl on your local machine for development and contribution.

This guide walks you through running the Firecrawl API server on your local machine. Follow these steps to set up the development environment, start the services, and send your first request.

If you're contributing, the process follows standard open-source conventions: fork the repo, make changes, run tests, and open a pull request. For questions or help getting started, reach out to [help@firecrawl.dev](mailto:help@firecrawl.dev) or [submit an issue](https://github.com/firecrawl/firecrawl/issues).

## Prerequisites

Install the following before proceeding:

| Dependency | Required | Install guide |
| ---------- | -------- | ------------------------------------------------------------------------------------- |
| Node.js | Yes | [nodejs.org](https://nodejs.org/en/learn/getting-started/how-to-install-nodejs) |
| pnpm (v9+) | Yes | [pnpm.io](https://pnpm.io/installation) |
| Redis | Yes | [redis.io](https://redis.io/docs/latest/operate/oss_and_stack/install/install-redis/) |
| PostgreSQL | Yes | Via Docker (see below) or installed directly |
| Docker | Optional | Required for the PostgreSQL container setup |

## Set up the database

You need a PostgreSQL database initialized with the schema at `apps/nuq-postgres/nuq.sql`.
The easiest approach is to use the Docker image inside `apps/nuq-postgres`. With Docker running, build and start the container: ```bash theme={null} docker build -t nuq-postgres apps/nuq-postgres ``` ```bash theme={null} docker run --name nuqdb \ -e POSTGRES_PASSWORD=postgres \ -p 5433:5432 \ -v nuq-data:/var/lib/postgresql/data \ -d nuq-postgres ``` ## Configure environment variables Copy the template to create your `.env` file in the `apps/api/` directory: ```bash theme={null} cp apps/api/.env.example apps/api/.env ``` For a minimal local setup without authentication or optional sub-services (PDF parsing, JS blocking, AI features), use the following configuration: ```bash apps/api/.env theme={null} # ===== Required ===== NUM_WORKERS_PER_QUEUE=8 PORT=3002 HOST=0.0.0.0 REDIS_URL=redis://localhost:6379 REDIS_RATE_LIMIT_URL=redis://localhost:6379 ## To turn on DB authentication, you need to set up supabase. USE_DB_AUTHENTICATION=false ## PostgreSQL connection for queuing — change if credentials, host, or DB differ NUQ_DATABASE_URL=postgres://postgres:postgres@localhost:5433/postgres # ===== Optional ===== # SUPABASE_ANON_TOKEN= # SUPABASE_URL= # SUPABASE_SERVICE_TOKEN= # TEST_API_KEY= # Set if you've configured authentication and want to test with a real API key # OPENAI_API_KEY= # Required for LLM-dependent features (image alt generation, etc.) # BULL_AUTH_KEY=@ # PLAYWRIGHT_MICROSERVICE_URL= # Set to run a Playwright fallback # LLAMAPARSE_API_KEY= # Set to parse PDFs with LlamaParse # SLACK_WEBHOOK_URL= # Set to send Slack server health status messages # POSTHOG_API_KEY= # Set to send PostHog events like job logs # POSTHOG_HOST= # Set to send PostHog events like job logs ``` ## Install dependencies From the `apps/api/` directory, install packages with pnpm: ```bash theme={null} cd apps/api pnpm install ``` ## Start the services You need three terminal sessions running simultaneously: Redis, the API server, and a terminal for sending requests. 
### Terminal 1 — Redis Start the Redis server from anywhere in the project: ```bash theme={null} redis-server ``` ### Terminal 2 — API server Navigate to `apps/api/` and start the service: ```bash theme={null} pnpm start ``` This starts the API server and the workers responsible for processing crawl jobs. If you plan to use the [LLM extract feature](https://github.com/firecrawl/firecrawl/pull/586/), export your OpenAI key first: `export OPENAI_API_KEY=sk-...` ### Terminal 3 — Send a test request Verify the server is running with a health check: ```bash theme={null} curl -X GET http://localhost:3002/test ``` This should return `Hello, world!`. To test the crawl endpoint: ```bash theme={null} curl -X POST http://localhost:3002/v1/crawl \ -H 'Content-Type: application/json' \ -d '{ "url": "https://mendable.ai" }' ``` ## Alternative: Docker Compose For a simpler setup, Docker Compose runs all services (Redis, API server, and workers) in a single command. 1. Make sure Docker and Docker Compose are installed. 2. Copy `.env.example` to `.env` in the `apps/api/` directory and configure as needed. 3. From the project root, run: ```bash theme={null} docker compose up ``` This starts all services automatically in the correct configuration. ## Running tests Run the test suite with: ```bash theme={null} npm run test:snips ``` # Open Source vs Cloud Source: https://docs.firecrawl.dev/contributing/open-source-or-cloud Understand the differences between Firecrawl's open-source and cloud offerings Firecrawl is open source under the [AGPL-3.0 license](https://github.com/firecrawl/firecrawl/blob/main/LICENSE). To deliver the best possible product, we also offer a hosted cloud version at [firecrawl.dev](https://firecrawl.dev) with additional features and managed infrastructure. ## Feature comparison The open-source version includes the core scraping and crawling engine. 
Firecrawl Cloud adds infrastructure-level capabilities, a management dashboard, and enterprise features on top of everything in the open-source edition. Firecrawl Cloud vs Open Source | Feature | Open Source | Cloud | | ------------------- | :---------: | :---: | | Scrape | ✔ | ✔ | | Crawl | ✔ | ✔ | | Map | ✔ | ✔ | | Search | ✔ | ✔ | | Batch scrape | ✔ | ✔ | | Extract | ✔ | ✔ | | JSON mode | ✔ | ✔ | | LLM-ready formats | ✔ | ✔ | | Change tracking | ✔ | ✔ | | SDKs | ✔ | ✔ | | Agent | ✘ | ✔ | | Browser sandbox | ✘ | ✔ | | Actions | ✘ | ✔ | | Enhanced proxies | ✘ | ✔ | | Proxy rotations | ✘ | ✔ | | Dashboard | ✘ | ✔ | | Enterprise features | ✘ | ✔ | ## Why both? The cloud solution allows us to continuously innovate and maintain a high-quality, sustainable service for all users. Revenue from Firecrawl Cloud funds ongoing development of the open-source project. * **Open source** — Run Firecrawl on your own infrastructure with full control over the core scraping engine. See [Running locally](/contributing/guide) and [Self-hosting](/contributing/self-host). * **Cloud** — Get started immediately with a managed service that includes enhanced proxies, a usage dashboard, and enterprise features. Sign up at [firecrawl.dev](https://firecrawl.dev). # Self-hosting Source: https://docs.firecrawl.dev/contributing/self-host Learn how to self-host Firecrawl to run on your own and contribute to the project. #### Contributor? Welcome to [Firecrawl](https://firecrawl.dev) 🔥! Here are some instructions on how to get the project locally so you can run it on your own and contribute. If you're contributing, note that the process is similar to other open-source repos, i.e., fork Firecrawl, make changes, run tests, PR. If you have any questions or would like help getting on board, join our Discord community [here](https://discord.gg/firecrawl) for more information or submit an issue on Github [here](https://github.com/firecrawl/firecrawl/issues/new/choose)! 
## Self-hosting Firecrawl

Refer to [SELF\_HOST.md](https://github.com/firecrawl/firecrawl/blob/main/SELF_HOST.md) for instructions on how to run it locally.

## Why?

Self-hosting Firecrawl is particularly beneficial for organizations with stringent security policies that require data to remain within controlled environments. Here are some key reasons to consider self-hosting:

* **Enhanced Security and Compliance:** By self-hosting, you ensure that all data handling and processing complies with internal and external regulations, keeping sensitive information within your secure infrastructure. Note that Firecrawl is a Mendable product and holds SOC 2 Type 2 certification, which means the platform adheres to high industry standards for managing data security.
* **Customizable Services:** Self-hosting allows you to tailor the services, such as the Playwright service, to meet specific needs or handle particular use cases that may not be supported by the standard cloud offering.
* **Learning and Community Contribution:** By setting up and maintaining your own instance, you gain a deeper understanding of how Firecrawl works, which can also lead to more meaningful contributions to the project.

### Considerations

However, there are some limitations and additional responsibilities to be aware of:

1. **Limited Access to Fire-engine:** Currently, self-hosted instances of Firecrawl do not have access to Fire-engine, which includes advanced features for handling IP blocks, robot detection mechanisms, and more. This means that while you can manage basic scraping tasks, more complex scenarios might require additional configuration or might not be supported.
2. **Manual Configuration Required:** If you need to use scraping methods beyond the basic fetch and Playwright options, you will need to manually configure these in the `.env` file. This requires a deeper understanding of the technologies and might involve more setup time.
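Because the `/agent` and `/browser` endpoints are cloud-only, client code that must run against both cloud and self-hosted deployments may want a simple guard before dispatching requests. This is an illustrative sketch, not part of any SDK:

```python
# Illustrative only -- mirrors the cloud vs. self-hosting capability split.
# /agent and /browser are cloud-only; everything else works self-hosted.
CLOUD_ONLY_ENDPOINTS = {"/agent", "/browser"}

def supported_on_self_host(endpoint: str) -> bool:
    """Return True if the endpoint works on a self-hosted instance."""
    return endpoint not in CLOUD_ONLY_ENDPOINTS
```

A client could call this before routing a request to a self-hosted deployment and fall back to the cloud API (or raise) for the cloud-only endpoints.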
| Capability | Cloud | Self-hosting |
| --- | --- | --- |
| All API endpoints supported | Yes | Not always; `/agent` and `/browser` are not supported in self-hosting |
| Screenshot support | Yes | Yes, when the Playwright service is running |
| Local LLMs (e.g., Ollama) | Not supported | Supported via `OLLAMA_BASE_URL` (experimental) |

Self-hosting Firecrawl is ideal for those who need full control over their scraping and data processing environments, but it comes with the trade-off of additional maintenance and configuration effort.

## Steps

1. First, install the dependencies

* Docker [instructions](https://docs.docker.com/get-docker/)

2. Set environment variables

Create a `.env` in the root directory. You can copy the template from `apps/api/.env.example`.

To start, we won't set up authentication or any optional sub-services (PDF parsing, JS blocking support, AI features).

```bash .env theme={null}
# .env

# ===== Required ENVS ======
PORT=3002
HOST=0.0.0.0

# Note: PORT is used by both the main API server and worker liveness check endpoint

# To turn on DB authentication, you need to set up Supabase.
USE_DB_AUTHENTICATION=false

# ===== Optional ENVS ======

## === AI features (JSON format on scrape, /extract API) ===
# Provide your OpenAI API key here to enable AI features
# OPENAI_API_KEY=

# Experimental: Use Ollama
# OLLAMA_BASE_URL=http://localhost:11434/api
# MODEL_NAME=deepseek-r1:7b
# MODEL_EMBEDDING_NAME=nomic-embed-text

# Experimental: Use any OpenAI-compatible API
# OPENAI_BASE_URL=https://example.com/v1
# OPENAI_API_KEY=

## === Proxy ===
# PROXY_SERVER can be a full URL (e.g. http://0.1.2.3:1234) or just an IP and port combo (e.g. 0.1.2.3:1234)
# Do not uncomment PROXY_USERNAME and PROXY_PASSWORD if your proxy is unauthenticated
# PROXY_SERVER=
# PROXY_USERNAME=
# PROXY_PASSWORD=

## === /search API ===
# You can specify a SearXNG server with the JSON format enabled, if you'd like to use that instead of direct Google.
# You can also customize the engines and categories parameters, but the defaults should also work just fine. # SEARXNG_ENDPOINT=http://your.searxng.server # SEARXNG_ENGINES= # SEARXNG_CATEGORIES= ## === Other === # Supabase Setup (used to support DB authentication, advanced logging, etc.) # SUPABASE_ANON_TOKEN= # SUPABASE_URL= # SUPABASE_SERVICE_TOKEN= # Use if you've set up authentication and want to test with a real API key # TEST_API_KEY= # This key lets you access the queue admin panel. Change this if your deployment is publicly accessible. BULL_AUTH_KEY=CHANGEME # This is now autoconfigured by the docker-compose.yaml. You shouldn't need to set it. # PLAYWRIGHT_MICROSERVICE_URL=http://playwright-service:3000/scrape # REDIS_URL=redis://redis:6379 # REDIS_RATE_LIMIT_URL=redis://redis:6379 # Set if you have a llamaparse key you'd like to use to parse pdfs # LLAMAPARSE_API_KEY= # Set if you'd like to send server health status messages to Slack # SLACK_WEBHOOK_URL= # Set if you'd like to send posthog events like job logs # POSTHOG_API_KEY= # POSTHOG_HOST= ## === System Resource Configuration === # Maximum CPU usage threshold (0.0-1.0). Worker will reject new jobs when CPU usage exceeds this value. # Default: 0.8 (80%) # MAX_CPU=0.8 # Maximum RAM usage threshold (0.0-1.0). Worker will reject new jobs when memory usage exceeds this value. # Default: 0.8 (80%) # MAX_RAM=0.8 # Set if you'd like to allow local webhooks to be sent to your self-hosted instance # ALLOW_LOCAL_WEBHOOKS=true ``` The following AI features require an LLM provider configured (e.g., `OPENAI_API_KEY` or alternatives in the AI features section above): * JSON format on scrape * /extract API * Summary format * Branding format * Change tracking format 3. 
*(Optional) Running with TypeScript Playwright Service* * Update the `docker-compose.yml` file to change the Playwright service: ```plaintext theme={null} build: apps/playwright-service ``` TO ```plaintext theme={null} build: apps/playwright-service-ts ``` * Set the `PLAYWRIGHT_MICROSERVICE_URL` in your `.env` file: ```plaintext theme={null} PLAYWRIGHT_MICROSERVICE_URL=http://localhost:3000/scrape ``` * Don't forget to set the proxy server in your `.env` file as needed. 4. Build and run the Docker containers: ```bash theme={null} docker compose build docker compose up ``` This will run a local instance of Firecrawl which can be accessed at `http://localhost:3002`. You should be able to see the Bull Queue Manager UI on `http://localhost:3002/admin/{BULL_AUTH_KEY}/queues`. 5. *(Optional)* Test the API If you’d like to test the crawl endpoint, you can run this: ```bash theme={null} curl -X POST http://localhost:3002/v2/crawl \ -H 'Content-Type: application/json' \ -d '{ "url": "https://docs.firecrawl.dev" }' ``` ## Troubleshooting This section provides solutions to common issues you might encounter while setting up or running your self-hosted instance of Firecrawl. ### Supabase client is not configured **Symptom:** ```bash theme={null} [YYYY-MM-DDTHH:MM:SS.SSSz]ERROR - Attempted to access Supabase client when it's not configured. [YYYY-MM-DDTHH:MM:SS.SSSz]ERROR - Error inserting scrape event: Error: Supabase client is not configured. ``` **Explanation:** This error occurs because the Supabase client setup is not completed. You should be able to scrape and crawl with no problems. Right now it's not possible to configure Supabase in self-hosted instances. ### You're bypassing authentication **Symptom:** ```bash theme={null} [YYYY-MM-DDTHH:MM:SS.SSSz]WARN - You're bypassing authentication ``` **Explanation:** This error occurs because the Supabase client setup is not completed. You should be able to scrape and crawl with no problems. 
Right now it's not possible to configure Supabase in self-hosted instances. ### Docker containers fail to start **Symptom:** Docker containers exit unexpectedly or fail to start. **Solution:** Check the Docker logs for any error messages using the command: ```bash theme={null} docker logs [container_name] ``` * Ensure all required environment variables are set correctly in the .env file. * Verify that all Docker services defined in docker-compose.yml are correctly configured and the necessary images are available. ### Connection issues with Redis **Symptom:** Errors related to connecting to Redis, such as timeouts or "Connection refused". **Solution:** * Ensure that the Redis service is up and running in your Docker environment. * Verify that the REDIS\_URL and REDIS\_RATE\_LIMIT\_URL in your .env file point to the correct Redis instance. * Check network settings and firewall rules that may block the connection to the Redis port. ### API endpoint does not respond **Symptom:** API requests to the Firecrawl instance timeout or return no response. **Solution:** * Ensure that the Firecrawl service is running by checking the Docker container status. * Verify that the PORT and HOST settings in your .env file are correct and that no other service is using the same port. * Check the network configuration to ensure that the host is accessible from the client making the API request. By addressing these common issues, you can ensure a smoother setup and operation of your self-hosted Firecrawl instance. ## Install Firecrawl on a Kubernetes Cluster (Simple Version) Read the [examples/kubernetes-cluster-install/README.md](https://github.com/firecrawl/firecrawl/tree/main/examples/kubernetes/cluster-install#readme) for instructions on how to install Firecrawl on a Kubernetes Cluster. 
# Overview Source: https://docs.firecrawl.dev/dashboard Overview of the Firecrawl dashboard and its key features The [Firecrawl dashboard](https://www.firecrawl.dev/app) is where you manage your account, test endpoints, and monitor usage. Below is a quick tour of each section. ## Playground The playground lets you try Firecrawl endpoints directly in the browser before integrating them into your code. * **[Scrape](https://www.firecrawl.dev/app/playground?endpoint=scrape)** — Extract content from a single page. * **[Search](https://www.firecrawl.dev/app/playground?endpoint=search)** — Search the web and get scraped results. * **[Crawl](https://www.firecrawl.dev/app/playground?endpoint=crawl)** — Crawl an entire website and extract content from every page. * **[Map](https://www.firecrawl.dev/app/playground?endpoint=map)** — Discover all URLs on a website. ## Browser [Interact with the web](https://www.firecrawl.dev/app/browser) through a live browser session. You can create persistent profiles, run actions, and take screenshots — useful for pages that require authentication or complex interaction. ## Agent The [Agent](https://www.firecrawl.dev/app/agent) is an AI-powered research tool that can autonomously browse the web, follow links, and extract structured data based on a prompt. ## Activity Logs [Activity Logs](https://www.firecrawl.dev/app/logs) show a history of your recent API requests, including status, duration, and credits consumed. ## Usage The [Usage](https://www.firecrawl.dev/app/usage) page shows your credit consumption over time and current billing-cycle totals. ## API Keys From the [API Keys](https://www.firecrawl.dev/app/api-keys) page you can create, view, and revoke API keys for your team. Your account must always have at least one active API key, so to revoke a key you will need to create a new one first. 
## Settings The [Settings](https://www.firecrawl.dev/app/settings) page has three tabs: * **Team** — Invite members, assign roles, and manage your team. See [Team management & roles](#team-management--roles) below. * **Billing** — View your current plan, invoices, auto-recharge settings, and apply coupons. See also [Billing](/billing). * **Advanced** — Webhook signing secret and team deletion. *** ## Team management & roles Firecrawl lets you invite teammates to collaborate under a shared account. From the **Team** tab in Settings you can invite members, assign roles, and manage your team. ### Roles Every team member is assigned one of two roles: **Admin** or **Member**. You choose the role when sending an invitation. | Capability | Admin | Member | | -------------------------------------------- | :---: | :----: | | **General** | | | | Use the team's API keys and shared resources | ✓ | ✓ | | **Team management** | | | | View the team member list | ✓ | ✓ | | Leave the team | ✓ | ✓ | | Invite new team members | ✓ | ✗ | | Remove team members | ✓ | ✗ | | Change a member's role | ✓ | ✗ | | Revoke pending invitations | ✓ | ✗ | | Edit the team name | ✓ | ✗ | | **Billing** | | | | View invoices and usage | ✓ | ✓ | | Apply credit coupons | ✓ | ✓ | | Manage subscription and billing portal | ✓ | ✗ | | **Settings** | | | | View the webhook signing secret | ✓ | ✓ | | Regenerate the webhook signing secret | ✓ | ✗ | | Delete the team | ✓ | ✗ | In short, **Admins** have full control over team management, billing, and settings, while **Members** can use the team's resources, view usage, and apply coupons but cannot modify the team or subscription. # Scraping Amazon Source: https://docs.firecrawl.dev/developer-guides/common-sites/amazon Extract product data, prices, and reviews from Amazon using Firecrawl Amazon is one of the most scraped e-commerce sites. 
This guide shows you how to effectively extract product data, pricing, reviews, and search results using Firecrawl's powerful features. ## Setup ```bash theme={null} npm install @mendable/firecrawl-js zod ``` ## Overview When scraping Amazon, you'll typically want to: * Extract product information (title, price, availability) * Get customer reviews and ratings * Monitor price changes * Search for products programmatically * Track competitor listings ## Scrape with JSON Mode Extract structured product data using Zod schemas. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; import { z } from 'zod'; // Define Zod schema const ProductSchema = z.object({ title: z.string(), price: z.string(), rating: z.number(), availability: z.string(), features: z.array(z.string()) }); const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const result = await firecrawl.scrape('https://www.amazon.com/dp/B0DZZWMB2L', { formats: [{ type: 'json', schema: z.toJSONSchema(ProductSchema) }], }); // Parse and validate with Zod const jsonData = typeof result.json === 'string' ? JSON.parse(result.json) : result.json; const validated = ProductSchema.parse(jsonData); console.log('✅ Validated product data:'); console.log(validated); ``` ## Search Find products on Amazon. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const searchResult = await firecrawl.search('gaming laptop site:amazon.com', { limit: 10, sources: [{ type: 'web' }], // { type: 'news' }, { type: 'images' } scrapeOptions: { formats: ['markdown'] } }); console.log(searchResult); ``` ## Scrape Scrape a single Amazon product page. 
```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const result = await firecrawl.scrape('https://www.amazon.com/ASUS-ROG-Strix-Gaming-Laptop/dp/B0DZZWMB2L', { formats: ['markdown'], // i.e. html, links, etc. onlyMainContent: true }); console.log(result); ``` ## Map Discover all available URLs on Amazon product or category pages. Note: Map returns URLs only, without content. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const mapResult = await firecrawl.map('https://www.amazon.com/Best-Sellers-Electronics/zgbs/electronics'); console.log(mapResult.links); // Returns array of URLs without content ``` ## Crawl Crawl multiple pages from Amazon category or search results. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const crawlResult = await firecrawl.crawl('https://www.amazon.com/s?k=mechanical+keyboards', { limit: 10, scrapeOptions: { formats: ['markdown'] } }); console.log(crawlResult.data); ``` ## Batch Scrape Scrape multiple Amazon product URLs simultaneously. 
```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); // Wait for completion const job = await firecrawl.batchScrape([ 'https://www.amazon.com/ASUS-ROG-Strix-Gaming-Laptop/dp/B0DZZWMB2L', 'https://www.amazon.com/Razer-Blade-Gaming-Laptop-Lightweight/dp/B0FP47DNFQ', 'https://www.amazon.com/HP-2025-Omen-Gaming-Laptop/dp/B0FL4RMGSH'], { options: { formats: ['markdown'] }, pollInterval: 2, timeout: 120 } ); console.log(job.status, job.completed, job.total); console.log(job); ``` # Scraping Etsy Source: https://docs.firecrawl.dev/developer-guides/common-sites/etsy Extract handmade products, shop data, and pricing from Etsy marketplace Etsy is a global marketplace for unique and creative goods. This guide shows you how to extract product listings, shop information, reviews, and trending items using Firecrawl. ## Setup ```bash theme={null} npm install @mendable/firecrawl-js zod ``` ## Overview When scraping Etsy, you'll typically want to: * Extract product listings and variations * Get shop information and ratings * Monitor trending items and categories * Track pricing and sales data * Extract customer reviews ## Scrape with JSON Mode Extract structured listing data using Zod schemas. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; import { z } from 'zod'; // Define Zod schema const ListingSchema = z.object({ title: z.string(), price: z.string(), shopName: z.string(), rating: z.number() }); const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const result = await firecrawl.scrape('https://www.etsy.com/listing/1844315896/handmade-925-sterling-silver-jewelry-set', { formats: [{ type: 'json', schema: z.toJSONSchema(ListingSchema) }], }); // Parse and validate with Zod const jsonData = typeof result.json === 'string' ? 
JSON.parse(result.json) : result.json; const validated = ListingSchema.parse(jsonData); console.log('✅ Validated listing data:'); console.log(validated); ``` ## Search Find products on Etsy marketplace. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const searchResult = await firecrawl.search('handmade jewelry site:etsy.com', { limit: 10, sources: [{ type: 'web' }], // { type: 'news' }, { type: 'images' } scrapeOptions: { formats: ['markdown'] } }); console.log(searchResult); ``` ## Scrape Scrape a single Etsy product listing. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const result = await firecrawl.scrape('https://www.etsy.com/listing/1844315896/handmade-925-sterling-silver-jewelry-set', { formats: ['markdown'], // i.e. html, links, etc. onlyMainContent: true }); console.log(result); ``` ## Map Discover all available URLs in an Etsy shop or category. Note: Map returns URLs only, without content. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const mapResult = await firecrawl.map('https://www.etsy.com/shop/YourShopName'); console.log(mapResult.links); // Returns array of URLs without content ``` ## Crawl Crawl multiple pages from Etsy shop or category. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const crawlResult = await firecrawl.crawl('https://www.etsy.com/c/jewelry', { limit: 10, scrapeOptions: { formats: ['markdown'] } }); console.log(crawlResult.data); ``` ## Batch Scrape Scrape multiple Etsy listing URLs simultaneously. 
```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); // Wait for completion const job = await firecrawl.batchScrape([ 'https://www.etsy.com/listing/1844315896/handmade-925-sterling-silver-jewelry-set', 'https://www.etsy.com/market/handmade_jewelry', 'https://www.etsy.com/market/jewelry_handmade'], { options: { formats: ['markdown'] }, pollInterval: 2, timeout: 120 } ); console.log(job.status, job.completed, job.total); console.log(job); ``` # Scraping GitHub Source: https://docs.firecrawl.dev/developer-guides/common-sites/github Learn how to scrape GitHub using Firecrawl's core features Learn how to use Firecrawl's core features to scrape GitHub repositories, issues, and documentation. ## Setup ```bash theme={null} npm install @mendable/firecrawl-js zod ``` ## Scrape with JSON Mode Extract structured data from repositories using Zod schemas. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; import { z } from 'zod'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const result = await firecrawl.scrape('https://github.com/firecrawl/firecrawl', { formats: [{ type: 'json', schema: z.object({ name: z.string(), description: z.string(), stars: z.number(), forks: z.number(), language: z.string(), topics: z.array(z.string()) }) }] }); console.log(result.json); ``` ## Search Find repositories, issues, or documentation on GitHub. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const searchResult = await firecrawl.search('machine learning site:github.com', { limit: 10, sources: [{ type: 'web' }], // { type: 'news' }, { type: 'images' } scrapeOptions: { formats: ['markdown'] } }); console.log(searchResult); ``` ## Scrape Scrape a single GitHub page - repository, issue, or file. 
```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const result = await firecrawl.scrape('https://github.com/firecrawl/firecrawl', { formats: ['markdown'] // i.e. html, links, etc. }); console.log(result); ``` ## Map Discover all available URLs in a repository or documentation site. Note: Map returns URLs only, without content. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const mapResult = await firecrawl.map('https://github.com/vercel/next.js/tree/canary/docs'); console.log(mapResult.links); // Returns array of URLs without content ``` ## Crawl Crawl multiple pages from a repository or documentation. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const crawlResult = await firecrawl.crawl('https://github.com/facebook/react/wiki', { limit: 10, scrapeOptions: { formats: ['markdown'] } }); console.log(crawlResult.data); ``` ## Batch Scrape Scrape multiple GitHub URLs simultaneously. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); // Wait for completion const job = await firecrawl.batchScrape([ 'https://github.com/vercel/next.js', 'https://github.com/facebook/react', 'https://github.com/microsoft/typescript'], { options: { formats: ['markdown'] }, pollInterval: 2, timeout: 120 } ); console.log(job.status, job.completed, job.total); console.log(job); ``` ## Batch Scrape with JSON Mode Extract structured data from multiple repositories at once. 
```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; import { z } from 'zod'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); // Wait for completion const job = await firecrawl.batchScrape([ 'https://github.com/vercel/next.js', 'https://github.com/facebook/react'], { options: { formats: [{ type: 'json', schema: z.object({ name: z.string(), description: z.string(), stars: z.number(), language: z.string() }) }] }, pollInterval: 2, timeout: 120 } ); console.log(job.status, job.completed, job.total); console.log(job); ``` # Scraping Wikipedia Source: https://docs.firecrawl.dev/developer-guides/common-sites/wikipedia Extract articles, infoboxes, and build knowledge graphs from Wikipedia Learn how to effectively scrape Wikipedia for research, knowledge extraction, and building AI applications. ## Setup ```bash theme={null} npm install @mendable/firecrawl-js zod ``` ## Use Cases * Research automation and fact-checking * Building knowledge graphs * Multi-language content extraction * Educational content aggregation * Entity information extraction ## Scrape with JSON Mode Extract structured data from Wikipedia articles using Zod schemas. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; import { z } from 'zod'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const result = await firecrawl.scrape('https://en.wikipedia.org/wiki/JavaScript', { formats: [{ type: 'json', schema: z.object({ name: z.string(), creator: z.string(), firstAppeared: z.string(), typingDiscipline: z.string(), website: z.string() }) }] }); console.log(result.json); ``` ## Search Find articles on Wikipedia. 
```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const searchResult = await firecrawl.search('quantum computing site:en.wikipedia.org', { limit: 10, sources: [{ type: 'web' }], // { type: 'news' }, { type: 'images' } scrapeOptions: { formats: ['markdown'] } }); console.log(searchResult); ``` ## Scrape Scrape a single Wikipedia article. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const result = await firecrawl.scrape('https://en.wikipedia.org/wiki/Artificial_intelligence', { formats: ['markdown'], // e.g. html, links, etc. onlyMainContent: true }); console.log(result); ``` ## Map Discover all available URLs in a Wikipedia portal or category. Note: Map returns URLs only, without content. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const mapResult = await firecrawl.map('https://en.wikipedia.org/wiki/Portal:Computer_science'); console.log(mapResult.links); // Returns array of URLs without content ``` ## Crawl Crawl multiple pages from Wikipedia portals or categories. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const crawlResult = await firecrawl.crawl('https://en.wikipedia.org/wiki/Portal:Artificial_intelligence', { limit: 10, scrapeOptions: { formats: ['markdown'] } }); console.log(crawlResult.data); ``` ## Batch Scrape Scrape multiple Wikipedia URLs simultaneously. 
```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); // Wait for completion const job = await firecrawl.batchScrape([ 'https://en.wikipedia.org/wiki/Machine_learning', 'https://en.wikipedia.org/wiki/Artificial_intelligence', 'https://en.wikipedia.org/wiki/Deep_learning'], { options: { formats: ['markdown'] }, pollInterval: 2, timeout: 120 } ); console.log(job.status, job.completed, job.total); console.log(job); ``` # Building an AI Research Assistant with Firecrawl and AI SDK Source: https://docs.firecrawl.dev/developer-guides/cookbooks/ai-research-assistant-cookbook Build a complete AI-powered research assistant with web scraping and search capabilities Build a complete AI-powered research assistant that can scrape websites and search the web to answer questions. The assistant automatically decides when to use web scraping or search tools to gather information, then provides comprehensive answers based on collected data. AI research assistant chatbot interface showing real-time web scraping with Firecrawl and conversational responses powered by OpenAI ## What You'll Build An AI chat interface where users can ask questions about any topic. The AI assistant automatically decides when to use web scraping or search tools to gather information, then provides comprehensive answers based on the data it collects. 
## Prerequisites * Node.js 18 or later installed * An OpenAI API key from [platform.openai.com](https://platform.openai.com) * A Firecrawl API key from [firecrawl.dev](https://firecrawl.dev) * Basic knowledge of React and Next.js Start by creating a fresh Next.js application and navigating into the project directory: ```bash theme={null} npx create-next-app@latest ai-sdk-firecrawl && cd ai-sdk-firecrawl ``` When prompted, select the following options: * TypeScript: Yes * ESLint: Yes * Tailwind CSS: Yes * App Router: Yes * Use `src/` directory: No * Import alias: Yes (@/\*) ### Install AI SDK Packages The AI SDK is a TypeScript toolkit that provides a unified API for working with different LLM providers: ```bash theme={null} npm i ai @ai-sdk/react zod ``` These packages provide: * `ai`: Core SDK with streaming, tool calling, and response handling * `@ai-sdk/react`: React hooks like `useChat` for building chat interfaces * `zod`: Schema validation for tool inputs Learn more at [ai-sdk.dev/docs](https://ai-sdk.dev/docs). ### Install AI Elements AI Elements provides pre-built UI components for AI applications. Run the following command to scaffold all the necessary components: ```bash theme={null} npx ai-elements@latest ``` This sets up AI Elements in your project, including conversation components, message displays, prompt inputs, and tool call visualizations. Documentation: [ai-sdk.dev/elements/overview](https://ai-sdk.dev/elements/overview). ### Install OpenAI Provider Install the OpenAI provider to connect with OpenAI's models: ```bash theme={null} npm install @ai-sdk/openai ``` Create the main page at `app/page.tsx` and copy the code from the Code tab below. This will be the chat interface where users interact with the AI assistant. 
```typescript app/page.tsx theme={null} "use client"; import { Conversation, ConversationContent, ConversationScrollButton, } from "@/components/ai-elements/conversation"; import { PromptInput, PromptInputActionAddAttachments, PromptInputActionMenu, PromptInputActionMenuContent, PromptInputActionMenuTrigger, PromptInputAttachment, PromptInputAttachments, PromptInputBody, PromptInputButton, PromptInputHeader, type PromptInputMessage, PromptInputSelect, PromptInputSelectContent, PromptInputSelectItem, PromptInputSelectTrigger, PromptInputSelectValue, PromptInputSubmit, PromptInputTextarea, PromptInputFooter, PromptInputTools, } from "@/components/ai-elements/prompt-input"; import { MessageResponse, Message, MessageContent, MessageActions, MessageAction, } from "@/components/ai-elements/message"; import { Fragment, useState } from "react"; import { useChat } from "@ai-sdk/react"; import type { ToolUIPart } from "ai"; import { Tool, ToolContent, ToolHeader, ToolInput, ToolOutput, } from "@/components/ai-elements/tool"; import { CopyIcon, GlobeIcon, RefreshCcwIcon } from "lucide-react"; import { Source, Sources, SourcesContent, SourcesTrigger, } from "@/components/ai-elements/sources"; import { Reasoning, ReasoningContent, ReasoningTrigger, } from "@/components/ai-elements/reasoning"; import { Loader } from "@/components/ai-elements/loader"; const models = [ { name: "GPT 5 Mini (Thinking)", value: "gpt-5-mini", }, { name: "GPT 4o Mini", value: "gpt-4o-mini", }, ]; const ChatBotDemo = () => { const [input, setInput] = useState(""); const [model, setModel] = useState(models[0].value); const [webSearch, setWebSearch] = useState(false); const { messages, sendMessage, status, regenerate } = useChat(); const handleSubmit = (message: PromptInputMessage) => { const hasText = Boolean(message.text); const hasAttachments = 
Boolean(message.files?.length); if (!(hasText || hasAttachments)) { return; } sendMessage( { text: message.text || "Sent with attachments", files: message.files, }, { body: { model: model, webSearch: webSearch, }, } ); setInput(""); }; return (

{messages.map((message) => (
{message.role === "assistant" && message.parts.filter((part) => part.type === "source-url") .length > 0 && ( part.type === "source-url" ).length } /> {message.parts .filter((part) => part.type === "source-url") .map((part, i) => ( ))} )} {message.parts.map((part, i) => { switch (part.type) { case "text": return ( {part.text} {message.role === "assistant" && i === messages.length - 1 && ( regenerate()} label="Retry" > navigator.clipboard.writeText(part.text) } label="Copy" > )} ); case "reasoning": return ( {part.text} ); default: { if (part.type.startsWith("tool-")) { const toolPart = part as ToolUIPart; return ( ); } return null; } } })}
))} {status === "submitted" && }
{(attachment) => } setInput(e.target.value)} value={input} /> setWebSearch(!webSearch)} > Search { setModel(value); }} value={model} > {models.map((model) => ( {model.name} ))}
); }; export default ChatBotDemo; ``` ### Understanding the Frontend The frontend uses AI Elements components to provide a complete chat interface: **Key Features:** * **Conversation Display**: The `Conversation` component automatically handles message scrolling and display * **Message Rendering**: Each message part is rendered based on its type (text, reasoning, tool calls) * **Tool Visualization**: Tool calls are displayed with collapsible sections showing inputs and outputs * **Interactive Controls**: Users can toggle web search, select models, and attach files * **Message Actions**: Copy and retry actions for assistant messages To ensure the markdown from the LLM is correctly rendered, add the following import to your `app/globals.css` file: ```css theme={null} @source "../node_modules/streamdown/dist/index.js"; ``` This imports the necessary styles for rendering markdown content in the message responses. Create the chat API endpoint at `app/api/chat/route.ts`. This route will handle incoming messages and stream responses from the AI. 
```typescript theme={null} import { streamText, UIMessage, convertToModelMessages } from "ai"; import { createOpenAI } from "@ai-sdk/openai"; const openai = createOpenAI({ apiKey: process.env.OPENAI_API_KEY!, }); // Allow streaming responses up to 5 minutes export const maxDuration = 300; export async function POST(req: Request) { const { messages, model, webSearch, }: { messages: UIMessage[]; model: string; webSearch: boolean; } = await req.json(); const result = streamText({ model: openai(model), messages: convertToModelMessages(messages), system: "You are a helpful assistant that can answer questions and help with tasks.", }); // send sources and reasoning back to the client return result.toUIMessageStreamResponse({ sendSources: true, sendReasoning: true, }); } ``` This basic route: * Receives messages from the frontend * Uses the OpenAI model selected by the user * Streams responses back to the client * Doesn't include tools yet - we'll add those next Create a `.env.local` file in your project root: ```bash theme={null} touch .env.local ``` Add your OpenAI API key: ```env theme={null} OPENAI_API_KEY=sk-your-openai-api-key ``` The `OPENAI_API_KEY` is required for the AI model to function. Now you can test the AI SDK chatbot without Firecrawl integration. Start the development server: ```bash theme={null} npm run dev ``` Open [localhost:3000](http://localhost:3000) in your browser and test the basic chat functionality. The assistant should respond to messages, but won't have web scraping or search capabilities yet. Basic AI chatbot without web scraping capabilities Now let's enhance the assistant with web scraping and search capabilities using Firecrawl. 
### Install Firecrawl SDK Firecrawl converts websites into LLM-ready formats with scraping and search capabilities: ```bash theme={null} npm i @mendable/firecrawl-js ``` ### Create the Tools File Create a `lib` folder and add a `tools.ts` file inside it: ```bash theme={null} mkdir lib && touch lib/tools.ts ``` Add the following code to define the web scraping and search tools: ```typescript lib/tools.ts theme={null} import FirecrawlApp from "@mendable/firecrawl-js"; import { tool } from "ai"; import { z } from "zod"; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); export const scrapeWebsiteTool = tool({ description: 'Scrape content from any website URL', inputSchema: z.object({ url: z.string().url().describe('The URL to scrape') }), execute: async ({ url }) => { console.log('Scraping:', url); const result = await firecrawl.scrape(url, { formats: ['markdown'], onlyMainContent: true, timeout: 30000 }); console.log('Scraped content preview:', result.markdown?.slice(0, 200) + '...'); return { content: result.markdown }; } }); export const searchWebTool = tool({ description: 'Search the web using Firecrawl', inputSchema: z.object({ query: z.string().describe('The search query'), limit: z.number().optional().describe('Number of results'), location: z.string().optional().describe('Location for localized results'), tbs: z.string().optional().describe('Time filter (qdr:h, qdr:d, qdr:w, qdr:m, qdr:y)'), sources: z.array(z.enum(['web', 'news', 'images'])).optional().describe('Result types'), categories: z.array(z.enum(['github', 'research', 'pdf'])).optional().describe('Filter categories'), }), execute: async ({ query, limit, location, tbs, sources, categories }) => { console.log('Searching:', query); const response = await firecrawl.search(query, { ...(limit && { limit }), ...(location && { location }), ...(tbs && { tbs }), ...(sources && { sources }), ...(categories && { categories }), }) as { web?: Array<{ title?: string; url?: string; 
description?: string }> }; const results = (response.web || []).map((item) => ({ title: item.title || item.url || 'Untitled', url: item.url || '', description: item.description || '', })); console.log('Search results:', results.length); return { results }; }, }); ``` ### Understanding the Tools **Scrape Website Tool:** * Accepts a URL as input (validated by Zod schema) * Uses Firecrawl's `scrape` method to fetch the page as markdown * Extracts only the main content to reduce token usage * Returns the scraped content for the AI to analyze **Search Web Tool:** * Accepts a search query with optional filters * Uses Firecrawl's `search` method to find relevant web pages * Supports advanced filters like location, time range, and content categories * Returns structured results with titles, URLs, and descriptions Learn more about tools: [ai-sdk.dev/docs/foundations/tools](https://ai-sdk.dev/docs/foundations/tools). Now update your `app/api/chat/route.ts` to include the Firecrawl tools we just created. ```typescript theme={null} import { streamText, UIMessage, stepCountIs, convertToModelMessages } from "ai"; import { createOpenAI } from "@ai-sdk/openai"; import { scrapeWebsiteTool, searchWebTool } from "@/lib/tools"; const openai = createOpenAI({ apiKey: process.env.OPENAI_API_KEY!, }); export const maxDuration = 300; export async function POST(req: Request) { const { messages, model, webSearch, }: { messages: UIMessage[]; model: string; webSearch: boolean; } = await req.json(); const result = streamText({ model: openai(model), messages: convertToModelMessages(messages), system: "You are a helpful assistant that can answer questions and help with tasks.", // Add the Firecrawl tools here tools: { scrapeWebsite: scrapeWebsiteTool, searchWeb: searchWebTool, }, stopWhen: stepCountIs(5), toolChoice: webSearch ? 
"auto" : "none", }); return result.toUIMessageStreamResponse({ sendSources: true, sendReasoning: true, }); } ``` The key changes from the basic route: * Import `stepCountIs` from the AI SDK * Import the Firecrawl tools from `@/lib/tools` * Add the `tools` object with both `scrapeWebsite` and `searchWeb` tools * Add `stopWhen: stepCountIs(5)` to limit execution steps * Set `toolChoice` to "auto" when web search is enabled, "none" otherwise Learn more about `streamText`: [ai-sdk.dev/docs/reference/ai-sdk-core/stream-text](https://ai-sdk.dev/docs/reference/ai-sdk-core/stream-text). Update your `.env.local` file to include your Firecrawl API key: ```env theme={null} OPENAI_API_KEY=sk-your-openai-api-key FIRECRAWL_API_KEY=fc-your-firecrawl-api-key ``` Get your Firecrawl API key from [firecrawl.dev](https://firecrawl.dev). Restart your development server: ```bash theme={null} npm run dev ``` AI chatbot with active Firecrawl tools Open [localhost:3000](http://localhost:3000) and test the enhanced assistant: 1. Toggle the "Search" button to enable web search 2. Ask: "What are the latest features from firecrawl.dev?" 3. Watch as the AI calls the `searchWeb` or `scrapeWebsite` tool 4. See the tool execution in the UI with inputs and outputs 5. Read the AI's analysis based on the scraped data ## How It Works ### Message Flow 1. **User sends a message**: The user types a question and clicks submit 2. **Frontend sends request**: `useChat` sends the message to `/api/chat` with the selected model and web search setting 3. **Backend processes message**: The API route receives the message and calls `streamText` 4. **AI decides on tools**: The model analyzes the question and decides whether to use `scrapeWebsite` or `searchWeb` (only if web search is enabled) 5. **Tools execute**: If tools are called, Firecrawl scrapes or searches the web 6. **AI generates response**: The model analyzes tool results and generates a natural language response 7. 
**Frontend displays results**: The UI shows tool calls and the final response in real-time ### Tool Calling Process The AI SDK's tool calling system ([ai-sdk.dev/docs/foundations/tools](https://ai-sdk.dev/docs/foundations/tools)) works as follows: 1. The model receives the user's message and available tool descriptions 2. If the model determines a tool is needed, it generates a tool call with parameters 3. The SDK executes the tool function with those parameters 4. The tool result is sent back to the model 5. The model uses the result to generate its final response This all happens automatically within a single `streamText` call, with results streaming to the frontend in real-time. ## Key Features ### Model Selection The application supports multiple OpenAI models: * **GPT-5 Mini (Thinking)**: Recent OpenAI model with advanced reasoning capabilities * **GPT-4o Mini**: Fast and cost-effective model Users can switch between models using the dropdown selector. ### Web Search Toggle The Search button controls whether the AI can use Firecrawl tools: * **Enabled**: AI can call `scrapeWebsite` and `searchWeb` tools as needed * **Disabled**: AI responds only with its training knowledge This gives users control over when to use web data versus the model's built-in knowledge. ## Customization Ideas ### Add More Tools Extend the assistant with additional tools: * Database lookups for internal company data * CRM integration to fetch customer information * Email sending capabilities * Document generation Each tool follows the same pattern: define a schema with Zod, implement the execute function, and register it in the `tools` object. ### Change the AI Model Swap OpenAI for another provider: ```typescript theme={null} import { anthropic } from "@ai-sdk/anthropic"; const result = streamText({ model: anthropic("claude-sonnet-4-5"), // ... rest of config }); ``` The AI SDK supports 20+ providers with the same API. 
Learn more: [ai-sdk.dev/docs/foundations/providers-and-models](https://ai-sdk.dev/docs/foundations/providers-and-models). ### Customize the UI AI Elements components are built on shadcn/ui, so you can: * Modify component styles in the component files * Add new variants to existing components * Create custom components that match the design system ## Best Practices 1. **Use appropriate tools**: Choose `searchWeb` to find relevant pages first, `scrapeWebsite` for single pages, or let the AI decide 2. **Monitor API usage**: Track your Firecrawl and OpenAI API usage to avoid unexpected costs 3. **Handle errors gracefully**: The tools include error handling, but consider adding user-facing error messages 4. **Optimize performance**: Use streaming to provide immediate feedback and consider caching frequently accessed content 5. **Set reasonable limits**: The `stopWhen: stepCountIs(5)` prevents excessive tool calls and runaway costs *** ## Related Resources Explore the AI SDK for building AI-powered applications with streaming, tool calling, and multi-provider support. Pre-built UI components for AI applications built on shadcn/ui. # Building a Brand Style Guide Generator with Firecrawl Source: https://docs.firecrawl.dev/developer-guides/cookbooks/brand-style-guide-generator-cookbook Generate professional PDF brand style guides by extracting design systems from any website using Firecrawl's branding format Build a brand style guide generator that automatically extracts colors, typography, spacing, and visual identity from any website and compiles it into a professional PDF document. 
Brand style guide PDF generator using Firecrawl branding format to extract design system ## What You'll Build A Node.js application that takes any website URL, extracts its complete brand identity using Firecrawl's branding format, and generates a polished PDF style guide with: * Color palette with hex values * Typography system (fonts, sizes, weights) * Spacing and layout specifications * Logo and brand imagery * Theme information (light/dark mode) Example of generated brand style guide PDF with colors typography and spacing ## Prerequisites * Node.js 18 or later installed * A Firecrawl API key from [firecrawl.dev](https://firecrawl.dev) * Basic knowledge of TypeScript and Node.js Start by creating a new directory for your project and initializing it: ```bash theme={null} mkdir brand-style-guide-generator && cd brand-style-guide-generator npm init -y ``` Update your `package.json` to use ES modules: ```json package.json theme={null} { "name": "brand-style-guide-generator", "version": "1.0.0", "type": "module", "scripts": { "start": "npx tsx index.ts" } } ``` Install the required packages for web scraping and PDF generation: ```bash theme={null} npm i @mendable/firecrawl-js pdfkit npm i -D typescript tsx @types/node @types/pdfkit ``` These packages provide: * `@mendable/firecrawl-js`: Firecrawl SDK for extracting brand identity from websites * `pdfkit`: PDF document generation library * `tsx`: TypeScript execution for Node.js Create the main application file at `index.ts`. This script extracts brand identity from a URL and generates a professional PDF style guide. 
```typescript index.ts theme={null} import Firecrawl from "@mendable/firecrawl-js"; import PDFDocument from "pdfkit"; import fs from "fs"; const API_KEY = "fc-YOUR-API-KEY"; const URL = "https://firecrawl.dev"; async function main() { const fc = new Firecrawl({ apiKey: API_KEY }); const { branding: b } = (await fc.scrape(URL, { formats: ["branding"] })) as any; const doc = new PDFDocument({ size: "A4", margin: 50 }); doc.pipe(fs.createWriteStream("brand-style-guide.pdf")); // Fetch logo (PNG/JPG only) let logoImg: Buffer | null = null; try { const logoUrl = b.images?.favicon || b.images?.ogImage; if (logoUrl?.match(/\.(png|jpg|jpeg)$/i)) { const res = await fetch(logoUrl); logoImg = Buffer.from(await res.arrayBuffer()); } } catch {} // Header with logo doc.rect(0, 0, 595, 120).fill(b.colors?.primary || "#333"); const titleX = logoImg ? 130 : 50; if (logoImg) doc.image(logoImg, 50, 30, { height: 60 }); doc.fontSize(36).fillColor("#fff").text("Brand Style Guide", titleX, 38); doc.fontSize(14).text(URL, titleX, 80); // Colors doc.fontSize(18).fillColor("#333").text("Colors", 50, 160); const colors = Object.entries(b.colors || {}).filter(([, v]) => typeof v === "string" && (v as string).startsWith("#")); colors.forEach(([k, v], i) => { const x = 50 + i * 100; doc.rect(x, 195, 80, 80).fill(v as string); doc.fontSize(10).fillColor("#333").text(k, x, 282, { width: 80, align: "center" }); doc.fontSize(9).fillColor("#888").text(v as string, x, 296, { width: 80, align: "center" }); }); // Typography doc.fontSize(18).fillColor("#333").text("Typography", 50, 340); doc.fontSize(13).fillColor("#444"); doc.text(`Primary Font: ${b.typography?.fontFamilies?.primary || "—"}`, 50, 370); doc.text(`Heading Font: ${b.typography?.fontFamilies?.heading || "—"}`, 50, 392); doc.fontSize(12).fillColor("#666").text("Font Sizes:", 50, 422); Object.entries(b.typography?.fontSizes || {}).forEach(([k, v], i) => { doc.text(`${k.toUpperCase()}: ${v}`, 70, 445 + i * 22); }); // Spacing & Theme 
doc.fontSize(18).fillColor("#333").text("Spacing & Theme", 320, 340); doc.fontSize(13).fillColor("#444"); doc.text(`Base Unit: ${b.spacing?.baseUnit}px`, 320, 370); doc.text(`Border Radius: ${b.spacing?.borderRadius}`, 320, 392); doc.text(`Color Scheme: ${b.colorScheme}`, 320, 414); doc.end(); console.log("Generated: brand-style-guide.pdf"); } main(); ``` For this simple project, the API key is placed directly in the code. If you plan to push this to GitHub or share it with others, move the key to a `.env` file and use `process.env.FIRECRAWL_API_KEY` instead. Replace `fc-YOUR-API-KEY` with your Firecrawl API key from [firecrawl.dev](https://firecrawl.dev). ### Understanding the Code **Key Components:** * **Firecrawl Branding Format**: The `branding` format extracts comprehensive brand identity including colors, typography, spacing, and images * **PDFKit Document**: Creates a professional A4 PDF with proper margins and sections * **Color Swatches**: Renders visual color blocks with hex values and semantic names * **Typography Display**: Shows font families and sizes in an organized layout * **Spacing & Theme**: Documents the design system's spacing units and color scheme Run the script to generate a brand style guide: ```bash theme={null} npm start ``` The script will: 1. Extract the brand identity from the target URL using Firecrawl's branding format 2. Generate a PDF named `brand-style-guide.pdf` 3. Save it in your project directory To generate a style guide for a different website, simply change the `URL` constant in `index.ts`. ## How It Works ### Extraction Process 1. **URL Input**: The generator receives a target website URL 2. **Firecrawl Scrape**: Calls the `/scrape` endpoint with the `branding` format 3. **Brand Analysis**: Firecrawl analyzes the page's CSS, fonts, and visual elements 4. **Data Return**: Returns a structured `BrandingProfile` object with all design tokens ### PDF Generation Process 1. 
**Header Creation**: Generates a colored header using the primary brand color 2. **Logo Fetch**: Downloads and embeds the logo or favicon if available 3. **Color Palette**: Renders each color as a visual swatch with metadata 4. **Typography Section**: Documents font families, sizes, and weights 5. **Spacing Info**: Includes base units, border radius, and theme mode ### Branding Profile Structure The [branding format](https://docs.firecrawl.dev/features/scrape#%2Fscrape-with-branding-endpoint) returns detailed brand information: ```typescript theme={null} { colorScheme: "dark" | "light", logo: "https://example.com/logo.svg", colors: { primary: "#FF6B35", secondary: "#004E89", accent: "#F77F00", background: "#1A1A1A", textPrimary: "#FFFFFF", textSecondary: "#B0B0B0" }, typography: { fontFamilies: { primary: "Inter", heading: "Inter", code: "Roboto Mono" }, fontSizes: { h1: "48px", h2: "36px", body: "16px" }, fontWeights: { regular: 400, medium: 500, bold: 700 } }, spacing: { baseUnit: 8, borderRadius: "8px" }, images: { logo: "https://example.com/logo.svg", favicon: "https://example.com/favicon.ico" } } ``` ## Key Features ### Automatic Color Extraction The generator identifies and categorizes all brand colors: * **Primary & Secondary**: Main brand colors * **Accent**: Highlight and CTA colors * **Background & Text**: UI foundation colors * **Semantic Colors**: Success, warning, error states ### Typography Documentation Captures the complete type system: * **Font Families**: Primary, heading, and monospace fonts * **Size Scale**: All heading and body text sizes * **Font Weights**: Available weight variations ### Visual Output The PDF includes: * Color-coded header matching the brand * Embedded logo when available * Professional layout with clear hierarchy * Metadata footer with generation date Side-by-side comparison of original website and generated brand style guide PDF ## Customization Ideas ### Add Component Documentation Extend the generator to include UI 
component styles: ```typescript theme={null} // Add after the Spacing & Theme section if (b.components) { doc.addPage(); doc.fontSize(20).fillColor("#333").text("UI Components", 50, 50); // Document button styles if (b.components.buttonPrimary) { const btn = b.components.buttonPrimary; doc.fontSize(14).text("Primary Button", 50, 90); doc.rect(50, 110, 120, 40).fill(btn.background); doc.fontSize(12).fillColor(btn.textColor).text("Button", 50, 120, { width: 120, align: "center" }); } } ``` ### Export Multiple Formats Add JSON export alongside the PDF: ```typescript theme={null} // Add before doc.end() fs.writeFileSync("brand-data.json", JSON.stringify(b, null, 2)); ``` ### Batch Processing Generate guides for multiple websites: ```typescript theme={null} const websites = [ "https://stripe.com", "https://linear.app", "https://vercel.com" ]; for (const site of websites) { const { branding } = await fc.scrape(site, { formats: ["branding"] }) as any; // Generate PDF for each site... } ``` ### Custom PDF Themes Create different PDF styles based on the extracted theme: ```typescript theme={null} const isDarkMode = b.colorScheme === "dark"; const headerBg = isDarkMode ? b.colors?.background : b.colors?.primary; const textColor = isDarkMode ? "#fff" : "#333"; ``` ## Best Practices 1. **Handle Missing Data**: Not all websites expose complete branding information. Always provide fallback values for missing properties. 2. **Respect Rate Limits**: When batch processing multiple sites, add delays between requests to respect Firecrawl's rate limits. 3. **Cache Results**: Store extracted branding data to avoid repeated API calls for the same site. 4. **Image Format Handling**: Some logos may be in formats PDFKit doesn't support (like SVG). Consider adding format conversion or graceful fallbacks. 5. **Error Handling**: Wrap the generation process in try-catch blocks and provide meaningful error messages. 
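Best practice 5 can be sketched with a small wrapper around each generation step. The `withErrorReport` helper below is a hypothetical illustration, not part of the Firecrawl SDK or PDFKit:

```typescript
// Hypothetical helper: run one async generation step and report a readable
// error instead of crashing with an unhandled rejection.
async function withErrorReport<T>(
  label: string,
  task: () => Promise<T>
): Promise<T | null> {
  try {
    return await task();
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    console.error(`${label} failed: ${message}`);
    return null; // caller decides whether to retry, skip, or abort
  }
}

// Usage sketch with the script above (fc and URL as defined in index.ts):
// const doc = await withErrorReport("scrape", () =>
//   fc.scrape(URL, { formats: ["branding"] })
// );
// if (!doc) process.exit(1);
```

Returning `null` instead of rethrowing lets a batch run continue when a single site fails, while still logging which step broke.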
*** ## Related Resources Learn more about the branding format and all available properties you can extract. Complete API reference for the scrape endpoint with all format options. Learn more about PDFKit for advanced PDF customization options. Process multiple URLs efficiently with batch scraping. # Full-Stack Templates Source: https://docs.firecrawl.dev/developer-guides/examples Explore real-world examples and tutorials for Firecrawl ## Official Examples Open Lovable **Use Case**: Build a RAG-powered chatbot with live web data * [GitHub Repo](https://github.com/firecrawl/open-lovable) Open Agent Builder **Use Case**: Build and deploy AI agents with web scraping capabilities * [GitHub Repo](https://github.com/firecrawl/open-agent-builder) Fireplexity **Use Case**: AI search engine with real-time citations, streaming responses, and live data * [GitHub Repo](https://github.com/firecrawl/fireplexity) FireGEO **Use Case**: SaaS starter with brand monitoring, authentication, and billing * [GitHub Repo](https://github.com/firecrawl/firegeo) Fire Enrich **Use Case**: AI-powered data enrichment tool that transforms emails into rich datasets * [GitHub Repo](https://github.com/firecrawl/fire-enrich) Firesearch **Use Case**: AI-powered deep research tool with validated answers and citations * [GitHub Repo](https://github.com/firecrawl/firesearch) Firestarter **Use Case**: Instantly create AI chatbots for any website with RAG-powered search * [GitHub Repo](https://github.com/firecrawl/firestarter) AI Ready Website **Use Case**: Transform any website into AI-ready structured data * [GitHub Repo](https://github.com/firecrawl/ai-ready-website) Open Researcher **Use Case**: AI-powered research assistant that gathers and analyzes web data * [GitHub Repo](https://github.com/firecrawl/open-researcher) # Anthropic Source: https://docs.firecrawl.dev/developer-guides/llm-sdks-and-frameworks/anthropic Use Firecrawl with Claude for web scraping + AI workflows Integrate Firecrawl with Claude to 
build AI applications powered by web data. ## Setup ```bash theme={null} npm install @mendable/firecrawl-js @anthropic-ai/sdk zod zod-to-json-schema ``` Create `.env` file: ```bash theme={null} FIRECRAWL_API_KEY=your_firecrawl_key ANTHROPIC_API_KEY=your_anthropic_key ``` > **Note:** If using Node \< 20, install `dotenv` and add `import 'dotenv/config'` to your code. ## Scrape + Summarize This example demonstrates a simple workflow: scrape a website and summarize the content using Claude. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; import Anthropic from '@anthropic-ai/sdk'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY }); const scrapeResult = await firecrawl.scrape('https://firecrawl.dev', { formats: ['markdown'] }); console.log('Scraped content length:', scrapeResult.markdown?.length); const message = await anthropic.messages.create({ model: 'claude-haiku-4-5', max_tokens: 1024, messages: [ { role: 'user', content: `Summarize in 100 words: ${scrapeResult.markdown}` } ] }); console.log('Response:', message); ``` ## Tool Use This example shows how to use Claude's tool use feature to let the model decide when to scrape websites based on user requests. 
```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; import { Anthropic } from '@anthropic-ai/sdk'; import { z } from 'zod'; import { zodToJsonSchema } from 'zod-to-json-schema'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY }); const ScrapeArgsSchema = z.object({ url: z.string() }); console.log("Sending user message to Claude and requesting tool use if necessary..."); const response = await anthropic.messages.create({ model: 'claude-haiku-4-5', max_tokens: 1024, tools: [{ name: 'scrape_website', description: 'Scrape and extract markdown content from a website URL', input_schema: zodToJsonSchema(ScrapeArgsSchema, 'ScrapeArgsSchema') as any }], messages: [{ role: 'user', content: 'What is Firecrawl? Check firecrawl.dev' }] }); const toolUse = response.content.find(block => block.type === 'tool_use'); if (toolUse && toolUse.type === 'tool_use') { const input = toolUse.input as { url: string }; console.log(`Calling tool: ${toolUse.name} | URL: ${input.url}`); const result = await firecrawl.scrape(input.url, { formats: ['markdown'] }); console.log(`Scraped content preview: ${result.markdown?.substring(0, 300)}...`); // Continue with the conversation or process the scraped content as needed } ``` ## Structured Extraction This example demonstrates how to use Claude to extract structured data from scraped website content. 
```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; import Anthropic from '@anthropic-ai/sdk'; import { z } from 'zod'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY }); const CompanyInfoSchema = z.object({ name: z.string(), industry: z.string().optional(), description: z.string().optional() }); const scrapeResult = await firecrawl.scrape('https://stripe.com', { formats: ['markdown'], onlyMainContent: true }); const prompt = `Extract company information from this website content. Output ONLY valid JSON in this exact format (no markdown, no explanation): { "name": "Company Name", "industry": "Industry", "description": "One sentence description" } Website content: ${scrapeResult.markdown}`; const message = await anthropic.messages.create({ model: 'claude-haiku-4-5', max_tokens: 1024, messages: [ { role: 'user', content: prompt }, { role: 'assistant', content: '{' } ] }); const textBlock = message.content.find(block => block.type === 'text'); if (textBlock && textBlock.type === 'text') { const jsonText = '{' + textBlock.text; const companyInfo = CompanyInfoSchema.parse(JSON.parse(jsonText)); console.log(companyInfo); } ``` For more examples, check the [Claude documentation](https://docs.anthropic.com/claude/docs). # ElevenAgents Source: https://docs.firecrawl.dev/developer-guides/llm-sdks-and-frameworks/elevenagents Give ElevenLabs voice and chat agents real-time web access with Firecrawl Give your [ElevenAgents](https://elevenlabs.io/agents) voice and chat agents the ability to scrape, search, and crawl the web in real time using Firecrawl. This guide covers two integration paths: 1. **MCP server** — connect the hosted Firecrawl MCP server for zero-code setup. 2. **Server webhook tool** — point a custom tool at Firecrawl's REST API for full control over requests. 
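Whichever path you choose, each tool call ultimately resolves to a single HTTP request against Firecrawl's REST API. As a rough sketch of the request the webhook tool issues (the endpoint and headers match the scrape tool configured later in this guide; the `buildScrapeRequest` helper and the test values are illustrative, not part of either API):

```typescript
// Illustrative helper: assembles the same request the ElevenAgents webhook
// tool sends to Firecrawl's /v2/scrape endpoint. Helper name and shape are
// examples only.
interface ScrapeRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildScrapeRequest(targetUrl: string, apiKey: string): ScrapeRequest {
  return {
    url: 'https://api.firecrawl.dev/v2/scrape',
    init: {
      method: 'POST', // the scrape endpoint expects POST, not the webhook default of GET
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ url: targetUrl }),
    },
  };
}

// Usage (network call commented out; requires a real API key):
// const req = buildScrapeRequest('https://firecrawl.dev', process.env.FIRECRAWL_API_KEY!);
// const res = await fetch(req.url, req.init);
// const { data } = await res.json(); // data.markdown holds the page content
```

Keeping the request this small is the point of the webhook path: the agent only has to fill in `url`, and everything else is fixed configuration.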
## Prerequisites * An [ElevenLabs](https://elevenlabs.io) account with access to ElevenAgents * A Firecrawl API key from your [firecrawl.dev dashboard](https://firecrawl.dev) ## Option 1: Firecrawl MCP Server The fastest way to give an agent web access. ElevenAgents supports remote MCP servers, and Firecrawl provides a hosted MCP endpoint. ### Add the MCP server 1. Open the [Integrations page](https://elevenlabs.io/app/agents/integrations) in ElevenLabs and click **+ Add integration**. 2. Select **Custom MCP Server** from the integration library. 3. Fill in the following fields: | Field | Value | | --------------- | ------------------------------------------------------------ | | **Name** | Firecrawl | | **Description** | Search, scrape, crawl, and extract content from any website. | | **Server type** | Streamable HTTP | | **Server URL** | `https://mcp.firecrawl.dev/YOUR_FIRECRAWL_API_KEY/v2/mcp` | Replace `YOUR_FIRECRAWL_API_KEY` with your actual key. Leave the **Type** dropdown set to **Value**. Treat this URL as a secret — it contains your API key. You must select **Streamable HTTP** as the server type. The default SSE option does not work with the Firecrawl MCP endpoint. 4. Under **Tool Approval Mode**, choose an approval level: * **No Approval** — the agent uses tools freely. Fine for read-only scraping. * **Fine-Grained Tool Approval** — lets you pre-select which tools can run automatically and which require approval. Good for controlling expensive crawl operations. * **Always Ask** (default) — the agent requests permission before every tool call. 5. Check **I trust this server**, then click **Add Server**. ElevenLabs will connect to the server and list the available tools (scrape, search, crawl, map, and more). ### Attach it to an agent 1. Create or open an agent in the [ElevenAgents dashboard](https://elevenlabs.io/app/agents/agents). 2. Go to the **Tools** tab, then select the **MCP** sub-tab. 3. 
Click **Add server** and select the **Firecrawl** integration from the dropdown. ### Update the system prompt In the **Agent** tab, add instructions to the **System prompt** so the agent knows when to use Firecrawl. For example: ```text theme={null} You are a helpful research assistant. When the user asks about a website, a company, or any topic that requires up-to-date information, use the Firecrawl tools to search the web or scrape the relevant page, then summarize the results. ``` ### Test it Click **Preview** in the top navigation bar. You can test using the text chat input or by starting a voice call. Try a prompt like: > "What does firecrawl.dev do? Go to the site and summarize it for me." The agent will call the Firecrawl MCP `scrape` tool, receive the page markdown, and respond with a summary. *** ## Option 2: Server Webhook Tool Use this approach when you need precise control over request parameters (formats, headers, timeouts, etc.) or want to call a specific Firecrawl endpoint without exposing the full MCP tool set. ### Scrape tool Create a tool that scrapes a single URL and returns its content as markdown. 1. Open your agent and go to the **Tools** tab. 2. Click **Add tool** and select **Webhook**. 3. Configure the tool: | Field | Value | | --------------- | ---------------------------------------------------------- | | **Name** | scrape\_website | | **Description** | Scrape content from a URL and return it as clean markdown. | | **Method** | POST | | **URL** | `https://api.firecrawl.dev/v2/scrape` | The **Method** field defaults to GET — make sure to change it to **POST**. 4. Scroll to the **Headers** section and click **Add header** for authentication: | Header | Value | | --------------- | ------------------------------- | | `Authorization` | `Bearer YOUR_FIRECRAWL_API_KEY` | Alternatively, if you have workspace auth connections configured, you can use the **Authentication** dropdown instead. 5. 
Add a **body parameter**: | Parameter | Type | Description | Required | | --------- | ------ | ----------------- | -------- | | `url` | string | The URL to scrape | Yes | 6. Click **Add tool**. The Firecrawl API returns the page content as markdown by default. The agent receives the JSON response and can use the `markdown` field to answer questions. ### Search tool Create a tool that searches the web and returns results with scraped content. 1. Click **Add tool** → **Webhook** again and configure: | Field | Value | | --------------- | ------------------------------------------------------------------------- | | **Name** | search\_web | | **Description** | Search the web for a query and return relevant results with page content. | | **Method** | POST | | **URL** | `https://api.firecrawl.dev/v2/search` | 2. Add the same `Authorization` header as above. 3. Add **body parameters**: | Parameter | Type | Description | Required | | --------- | ------ | ----------------------------------------------- | -------- | | `query` | string | The search query | Yes | | `limit` | number | Maximum number of results to return (default 5) | No | 4. Click **Add tool**. ### Update the system prompt In the **Agent** tab, update the **System prompt**: ```text theme={null} You are a knowledgeable assistant with access to web tools. - Use `scrape_website` when the user gives you a specific URL to read. - Use `search_web` when the user asks a general question that requires finding information online. Always summarize the information concisely and cite the source URL. ``` ### Test it Click **Preview** and try asking: > "Search for the latest Next.js features and give me a summary." The agent will call `search_web`, receive results from Firecrawl, and respond with a summary of the findings. *** ## Tips * **Model selection** — For reliable tool calling, use a high-intelligence model such as GPT-4o, Claude Sonnet 4.5 or later, or Gemini 2.5 Flash. 
Smaller models may struggle to extract the correct parameters. * **Keep prompts specific** — Tell the agent exactly when to use each tool. Vague instructions lead to missed or incorrect tool calls. * **Limit response size** — For voice agents, long scraped pages can overwhelm the LLM context. Use `onlyMainContent: true` in scrape options (or instruct the agent to summarize aggressively) to keep responses concise. * **Tool call sounds** — In the webhook or MCP tool settings, you can configure a **Tool call sound** to play ambient audio while a tool runs. This signals to the user that the agent is working. ## Resources * [ElevenAgents documentation](https://elevenlabs.io/docs/eleven-agents/overview) * [ElevenAgents tools overview](https://elevenlabs.io/docs/eleven-agents/customization/tools) * [ElevenAgents MCP integration](https://elevenlabs.io/docs/eleven-agents/customization/tools/mcp) * [Firecrawl API reference](https://docs.firecrawl.dev/api-reference/v2-introduction) * [Firecrawl MCP server](https://docs.firecrawl.dev/mcp-server) # Gemini Source: https://docs.firecrawl.dev/developer-guides/llm-sdks-and-frameworks/gemini Use Firecrawl with Google's Gemini AI for web scraping + AI workflows Integrate Firecrawl with Google's Gemini for AI applications powered by web data. ## Setup ```bash theme={null} npm install @mendable/firecrawl-js @google/genai ``` Create `.env` file: ```bash theme={null} FIRECRAWL_API_KEY=your_firecrawl_key GEMINI_API_KEY=your_gemini_key ``` > **Note:** If using Node \< 20, install `dotenv` and add `import 'dotenv/config'` to your code. ## Scrape + Summarize This example demonstrates a simple workflow: scrape a website and summarize the content using Gemini. 
```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; import { GoogleGenAI } from '@google/genai'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY }); const scrapeResult = await firecrawl.scrape('https://firecrawl.dev', { formats: ['markdown'] }); console.log('Scraped content length:', scrapeResult.markdown?.length); const response = await ai.models.generateContent({ model: 'gemini-2.5-flash', contents: `Summarize: ${scrapeResult.markdown}`, }); console.log('Summary:', response.text); ``` ## Content Analysis This example shows how to analyze website content using Gemini's multi-turn conversation capabilities. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; import { GoogleGenAI } from '@google/genai'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY }); const scrapeResult = await firecrawl.scrape('https://news.ycombinator.com/', { formats: ['markdown'] }); console.log('Scraped content length:', scrapeResult.markdown?.length); const chat = ai.chats.create({ model: 'gemini-2.5-flash' }); // Ask for the top 3 stories on Hacker News const result1 = await chat.sendMessage({ message: `Based on this website content from Hacker News, what are the top 3 stories right now?\n\n${scrapeResult.markdown}` }); console.log('Top 3 Stories:', result1.text); // Ask for the 4th and 5th stories on Hacker News const result2 = await chat.sendMessage({ message: `Now, what are the 4th and 5th top stories on Hacker News from the same content?` }); console.log('4th and 5th Stories:', result2.text); ``` ## Structured Extraction This example demonstrates how to extract structured data using Gemini's JSON mode from scraped website content. 
```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; import { GoogleGenAI, Type } from '@google/genai'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY }); const scrapeResult = await firecrawl.scrape('https://stripe.com', { formats: ['markdown'] }); console.log('Scraped content length:', scrapeResult.markdown?.length); const response = await ai.models.generateContent({ model: 'gemini-2.5-flash', contents: `Extract company information: ${scrapeResult.markdown}`, config: { responseMimeType: 'application/json', responseSchema: { type: Type.OBJECT, properties: { name: { type: Type.STRING }, industry: { type: Type.STRING }, description: { type: Type.STRING }, products: { type: Type.ARRAY, items: { type: Type.STRING } } }, propertyOrdering: ['name', 'industry', 'description', 'products'] } } }); console.log('Extracted company info:', response?.text); ``` For more examples, check the [Gemini documentation](https://ai.google.dev/docs). # Agent Development Kit (ADK) Source: https://docs.firecrawl.dev/developer-guides/llm-sdks-and-frameworks/google-adk Integrate Firecrawl with Google's ADK using MCP for advanced agent workflows Integrate Firecrawl with Google's Agent Development Kit (ADK) to build powerful AI agents with web scraping capabilities through the Model Context Protocol (MCP). ## Overview Firecrawl provides an MCP server that seamlessly integrates with Google's ADK, enabling your agents to efficiently scrape, crawl, and extract structured data from any website. The integration supports both cloud-based and self-hosted Firecrawl instances with streamable HTTP for optimal performance. 
## Features * Efficient web scraping, crawling, and content discovery from any website * Advanced search capabilities and intelligent content extraction * Deep research and high-volume batch scraping * Flexible deployment (cloud-based or self-hosted) * Optimized for modern web environments with streamable HTTP support ## Prerequisites * Obtain an API key for Firecrawl from [firecrawl.dev](https://firecrawl.dev) * Install Google ADK ## Setup ```python Remote MCP Server theme={null} from google.adk.agents.llm_agent import Agent from google.adk.tools.mcp_tool.mcp_session_manager import StreamableHTTPServerParams from google.adk.tools.mcp_tool.mcp_toolset import MCPToolset FIRECRAWL_API_KEY = "YOUR-API-KEY" root_agent = Agent( model="gemini-2.5-pro", name="firecrawl_agent", description='A helpful assistant for scraping websites with Firecrawl', instruction='Help the user search for website content', tools=[ MCPToolset( connection_params=StreamableHTTPServerParams( url=f"https://mcp.firecrawl.dev/{FIRECRAWL_API_KEY}/v2/mcp", ), ) ], ) ``` ```python Local MCP Server theme={null} from google.adk.agents.llm_agent import Agent from google.adk.tools.mcp_tool.mcp_session_manager import StdioConnectionParams from google.adk.tools.mcp_tool.mcp_toolset import MCPToolset from mcp import StdioServerParameters root_agent = Agent( model='gemini-2.5-pro', name='firecrawl_agent', description='A helpful assistant for scraping websites with Firecrawl', instruction='Help the user search for website content', tools=[ MCPToolset( connection_params=StdioConnectionParams( server_params = StdioServerParameters( command='npx', args=[ "-y", "firecrawl-mcp", ], env={ "FIRECRAWL_API_KEY": "YOUR-API-KEY", } ), timeout=30, ), ) ], ) ``` ## Available Tools | Tool | Name | Description | | ------------------ | ------------------------------ | ------------------------------------------------------------------------------------ | | Scrape Tool | `firecrawl_scrape` | Scrape content from a single URL with 
advanced options | | Batch Scrape Tool | `firecrawl_batch_scrape` | Scrape multiple URLs efficiently with built-in rate limiting and parallel processing | | Check Batch Status | `firecrawl_check_batch_status` | Check the status of a batch operation | | Map Tool | `firecrawl_map` | Map a website to discover all indexed URLs on the site | | Search Tool | `firecrawl_search` | Search the web and optionally extract content from search results | | Crawl Tool | `firecrawl_crawl` | Start an asynchronous crawl with advanced options | | Check Crawl Status | `firecrawl_check_crawl_status` | Check the status of a crawl job | | Extract Tool | `firecrawl_extract` | Extract structured information from web pages using LLM capabilities | ## Configuration ### Required Configuration **FIRECRAWL\_API\_KEY**: Your Firecrawl API key * Required when using cloud API (default) * Optional when using self-hosted instance with FIRECRAWL\_API\_URL ### Optional Configuration **Firecrawl API URL (for self-hosted instances)**: * `FIRECRAWL_API_URL`: Custom API endpoint * Example: `https://firecrawl.your-domain.com` * If not provided, the cloud API will be used **Retry configuration**: * `FIRECRAWL_RETRY_MAX_ATTEMPTS`: Maximum retry attempts (default: 3) * `FIRECRAWL_RETRY_INITIAL_DELAY`: Initial delay in milliseconds (default: 1000) * `FIRECRAWL_RETRY_MAX_DELAY`: Maximum delay in milliseconds (default: 10000) * `FIRECRAWL_RETRY_BACKOFF_FACTOR`: Exponential backoff multiplier (default: 2) **Credit usage monitoring**: * `FIRECRAWL_CREDIT_WARNING_THRESHOLD`: Warning threshold (default: 1000) * `FIRECRAWL_CREDIT_CRITICAL_THRESHOLD`: Critical threshold (default: 100) ## Example: Web Research Agent ```python theme={null} from google.adk.agents.llm_agent import Agent from google.adk.tools.mcp_tool.mcp_session_manager import StreamableHTTPServerParams from google.adk.tools.mcp_tool.mcp_toolset import MCPToolset FIRECRAWL_API_KEY = "YOUR-API-KEY" # Create a research agent research_agent = Agent( 
model="gemini-2.5-pro",
    name="research_agent",
    description='An AI agent that researches topics by scraping and analyzing web content',
    instruction='''You are a research assistant. When given a topic or question:
1. Use the search tool to find relevant websites
2. Scrape the most relevant pages for detailed information
3. Extract structured data when needed
4. Provide comprehensive, well-sourced answers''',
    tools=[
        MCPToolset(
            connection_params=StreamableHTTPServerParams(
                url=f"https://mcp.firecrawl.dev/{FIRECRAWL_API_KEY}/v2/mcp",
            ),
        )
    ],
)

# Use the agent
response = research_agent.run("What are the latest features in Python 3.13?")
print(response)
```

## Best Practices

1. **Use the right tool for the job**:
   * `firecrawl_search` when you need to find relevant pages first
   * `firecrawl_scrape` for single pages
   * `firecrawl_batch_scrape` for multiple known URLs
   * `firecrawl_crawl` for discovering and scraping entire sites
2. **Monitor your usage**: Configure credit thresholds to avoid unexpected charges
3. **Handle errors gracefully**: Configure retry settings based on your use case
4. **Optimize performance**: Use batch operations when scraping multiple URLs

***

## Related Resources

* Learn how to build powerful multi-agent AI systems using Google's ADK framework with Firecrawl for web scraping capabilities.
* Learn more about Firecrawl's Model Context Protocol (MCP) server integration and capabilities.
* Explore the official Google Agent Development Kit documentation for comprehensive guides and API references.

# LangChain

Source: https://docs.firecrawl.dev/developer-guides/llm-sdks-and-frameworks/langchain

Use Firecrawl with LangChain for web scraping + AI workflows

Integrate Firecrawl with LangChain to build AI applications powered by web data.
## Setup ```bash theme={null} npm install @langchain/openai @mendable/firecrawl-js ``` Create `.env` file: ```bash theme={null} FIRECRAWL_API_KEY=your_firecrawl_key OPENAI_API_KEY=your_openai_key ``` > **Note:** If using Node \< 20, install `dotenv` and add `import 'dotenv/config'` to your code. ## Scrape + Chat This example demonstrates a simple workflow: scrape a website and process the content using LangChain. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; import { ChatOpenAI } from '@langchain/openai'; import { HumanMessage } from '@langchain/core/messages'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const chat = new ChatOpenAI({ model: 'gpt-5-nano', apiKey: process.env.OPENAI_API_KEY }); const scrapeResult = await firecrawl.scrape('https://firecrawl.dev', { formats: ['markdown'] }); console.log('Scraped content length:', scrapeResult.markdown?.length); const response = await chat.invoke([ new HumanMessage(`Summarize: ${scrapeResult.markdown}`) ]); console.log('Summary:', response.content); ``` ## Chains This example shows how to build a LangChain chain to process and analyze scraped content. 
```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; import { ChatOpenAI } from '@langchain/openai'; import { ChatPromptTemplate } from '@langchain/core/prompts'; import { StringOutputParser } from '@langchain/core/output_parsers'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const model = new ChatOpenAI({ model: 'gpt-5-nano', apiKey: process.env.OPENAI_API_KEY }); const scrapeResult = await firecrawl.scrape('https://stripe.com', { formats: ['markdown'] }); console.log('Scraped content length:', scrapeResult.markdown?.length); // Create processing chain const prompt = ChatPromptTemplate.fromMessages([ ['system', 'You are an expert at analyzing company websites.'], ['user', 'Extract the company name and main products from: {content}'] ]); const chain = prompt.pipe(model).pipe(new StringOutputParser()); // Execute the chain const result = await chain.invoke({ content: scrapeResult.markdown }); console.log('Chain result:', result); ``` ## Tool Calling This example demonstrates how to use LangChain's tool calling feature to let the model decide when to scrape websites. 
```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; import { ChatOpenAI } from '@langchain/openai'; import { DynamicStructuredTool } from '@langchain/core/tools'; import { z } from 'zod'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); // Create the scraping tool const scrapeWebsiteTool = new DynamicStructuredTool({ name: 'scrape_website', description: 'Scrape content from any website URL', schema: z.object({ url: z.string().url().describe('The URL to scrape') }), func: async ({ url }) => { console.log('Scraping:', url); const result = await firecrawl.scrape(url, { formats: ['markdown'] }); console.log('Scraped content preview:', result.markdown?.substring(0, 200) + '...'); return result.markdown || 'No content scraped'; } }); const model = new ChatOpenAI({ model: 'gpt-5-nano', apiKey: process.env.OPENAI_API_KEY }).bindTools([scrapeWebsiteTool]); const response = await model.invoke('What is Firecrawl? Visit firecrawl.dev and tell me about it.'); console.log('Response:', response.content); console.log('Tool calls:', response.tool_calls); ``` ## Structured Data Extraction This example shows how to extract structured data using LangChain's structured output feature. 
```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; import { ChatOpenAI } from '@langchain/openai'; import { z } from 'zod'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const scrapeResult = await firecrawl.scrape('https://stripe.com', { formats: ['markdown'] }); console.log('Scraped content length:', scrapeResult.markdown?.length); const CompanyInfoSchema = z.object({ name: z.string(), industry: z.string(), description: z.string(), products: z.array(z.string()) }); const model = new ChatOpenAI({ model: 'gpt-5-nano', apiKey: process.env.OPENAI_API_KEY }).withStructuredOutput(CompanyInfoSchema); const companyInfo = await model.invoke([ { role: 'system', content: 'Extract company information from website content.' }, { role: 'user', content: `Extract data: ${scrapeResult.markdown}` } ]); console.log('Extracted company info:', companyInfo); ``` For more examples, check the [LangChain documentation](https://js.langchain.com/docs). # LangGraph Source: https://docs.firecrawl.dev/developer-guides/llm-sdks-and-frameworks/langgraph Integrate Firecrawl with LangGraph for building agent workflows This guide shows how to integrate Firecrawl with LangGraph to build AI agent workflows that can scrape and process web content. ## Setup ```bash theme={null} npm install @langchain/langgraph @langchain/openai @mendable/firecrawl-js ``` Create `.env` file: ```bash theme={null} FIRECRAWL_API_KEY=your_firecrawl_key OPENAI_API_KEY=your_openai_key ``` > **Note:** If using Node \< 20, install `dotenv` and add `import 'dotenv/config'` to your code. ## Basic Workflow This example demonstrates a basic LangGraph workflow that scrapes a website and analyzes the content. 
```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; import { ChatOpenAI } from '@langchain/openai'; import { StateGraph, MessagesAnnotation, START, END } from '@langchain/langgraph'; // Initialize Firecrawl const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); // Initialize LLM const llm = new ChatOpenAI({ model: "gpt-5-nano", apiKey: process.env.OPENAI_API_KEY }); // Define the scrape node async function scrapeNode(state: typeof MessagesAnnotation.State) { console.log('Scraping...'); const result = await firecrawl.scrape('https://firecrawl.dev', { formats: ['markdown'] }); return { messages: [{ role: "system", content: `Scraped content: ${result.markdown}` }] }; } // Define the analyze node async function analyzeNode(state: typeof MessagesAnnotation.State) { console.log('Analyzing...'); const response = await llm.invoke(state.messages); return { messages: [response] }; } // Build the graph const graph = new StateGraph(MessagesAnnotation) .addNode("scrape", scrapeNode) .addNode("analyze", analyzeNode) .addEdge(START, "scrape") .addEdge("scrape", "analyze") .addEdge("analyze", END); // Compile the graph const app = graph.compile(); // Run the workflow const result = await app.invoke({ messages: [{ role: "user", content: "Summarize the website" }] }); console.log(JSON.stringify(result, null, 2)); ``` ## Multi-Step Workflow This example demonstrates a more complex workflow that scrapes multiple URLs and processes them. 
```typescript theme={null}
import FirecrawlApp from '@mendable/firecrawl-js';
import { ChatOpenAI } from '@langchain/openai';
import { StateGraph, Annotation, START, END } from '@langchain/langgraph';

const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
const llm = new ChatOpenAI({ model: "gpt-5-nano", apiKey: process.env.OPENAI_API_KEY });

// Define custom state
const WorkflowState = Annotation.Root({
  urls: Annotation<string[]>(),
  scrapedData: Annotation<Array<{ url: string; content: string }>>(),
  summary: Annotation<string>()
});

// Scrape multiple URLs
async function scrapeMultiple(state: typeof WorkflowState.State) {
  const scrapedData: Array<{ url: string; content: string }> = [];
  for (const url of state.urls) {
    const result = await firecrawl.scrape(url, { formats: ['markdown'] });
    scrapedData.push({ url, content: result.markdown || '' });
  }
  return { scrapedData };
}

// Summarize all scraped content
async function summarizeAll(state: typeof WorkflowState.State) {
  const combinedContent = state.scrapedData
    .map(item => `Content from ${item.url}:\n${item.content}`)
    .join('\n\n');
  const response = await llm.invoke([
    { role: "user", content: `Summarize these websites:\n${combinedContent}` }
  ]);
  return { summary: response.content as string };
}

// Build the workflow graph
const workflow = new StateGraph(WorkflowState)
  .addNode("scrape", scrapeMultiple)
  .addNode("summarize", summarizeAll)
  .addEdge(START, "scrape")
  .addEdge("scrape", "summarize")
  .addEdge("summarize", END);

const app = workflow.compile();

// Execute workflow
const result = await app.invoke({
  urls: ["https://firecrawl.dev", "https://firecrawl.dev/pricing"]
});
console.log(result.summary);
```

For more examples, check the [LangGraph documentation](https://langchain-ai.github.io/langgraphjs/).

# LlamaIndex

Source: https://docs.firecrawl.dev/developer-guides/llm-sdks-and-frameworks/llamaindex

Use Firecrawl with LlamaIndex for RAG applications

Integrate Firecrawl with LlamaIndex to build AI applications with vector search and embeddings powered by web content.
## Setup ```bash theme={null} npm install llamaindex @llamaindex/openai @mendable/firecrawl-js ``` Create `.env` file: ```bash theme={null} FIRECRAWL_API_KEY=your_firecrawl_key OPENAI_API_KEY=your_openai_key ``` > **Note:** If using Node \< 20, install `dotenv` and add `import 'dotenv/config'` to your code. ## RAG with Vector Search This example demonstrates how to use LlamaIndex with Firecrawl to crawl a website, create embeddings, and query the content using RAG. ```typescript theme={null} import Firecrawl from '@mendable/firecrawl-js'; import { Document, VectorStoreIndex, Settings } from 'llamaindex'; import { OpenAI, OpenAIEmbedding } from '@llamaindex/openai'; Settings.llm = new OpenAI({ model: "gpt-4o" }); Settings.embedModel = new OpenAIEmbedding({ model: "text-embedding-3-small" }); const firecrawl = new Firecrawl({ apiKey: process.env.FIRECRAWL_API_KEY }); const crawlResult = await firecrawl.crawl('https://firecrawl.dev', { limit: 10, scrapeOptions: { formats: ['markdown'] } }); console.log(`Crawled ${crawlResult.data.length } pages`); const documents = crawlResult.data.map((page: any, i: number) => new Document({ text: page.markdown, id_: `page-${i}`, metadata: { url: page.metadata?.sourceURL } }) ); const index = await VectorStoreIndex.fromDocuments(documents); console.log('Vector index created with embeddings'); const queryEngine = index.asQueryEngine(); const response = await queryEngine.query({ query: 'What is Firecrawl and how does it work?' }); console.log('\nAnswer:', response.toString()); ``` For more examples, check the [LlamaIndex documentation](https://ts.llamaindex.ai/). # Mastra Source: https://docs.firecrawl.dev/developer-guides/llm-sdks-and-frameworks/mastra Use Firecrawl with Mastra for building AI workflows Integrate Firecrawl with Mastra, the TypeScript framework for building AI agents and workflows. 
## Setup ```bash theme={null} npm install @mastra/core @mendable/firecrawl-js zod ``` Create `.env` file: ```bash theme={null} FIRECRAWL_API_KEY=your_firecrawl_key OPENAI_API_KEY=your_openai_key ``` > **Note:** If using Node \< 20, install `dotenv` and add `import 'dotenv/config'` to your code. ## Multi-Step Workflow This example demonstrates a complete workflow that searches, scrapes, and summarizes documentation using Firecrawl and Mastra. ```typescript theme={null} import { createWorkflow, createStep } from "@mastra/core/workflows"; import { z } from "zod"; import Firecrawl from "@mendable/firecrawl-js"; import { Agent } from "@mastra/core/agent"; const firecrawl = new Firecrawl({ apiKey: process.env.FIRECRAWL_API_KEY || "fc-YOUR_API_KEY" }); const agent = new Agent({ name: "summarizer", instructions: "You are a helpful assistant that creates concise summaries of documentation.", model: "openai/gpt-5-nano", }); // Step 1: Search with Firecrawl SDK const searchStep = createStep({ id: "search", inputSchema: z.object({ query: z.string(), }), outputSchema: z.object({ url: z.string(), title: z.string(), }), execute: async ({ inputData }: { inputData: { query: string } }) => { console.log(`Searching: ${inputData.query}`); const searchResults = await firecrawl.search(inputData.query, { limit: 1 }); const webResults = (searchResults as any)?.web; if (!webResults || !Array.isArray(webResults) || webResults.length === 0) { throw new Error("No search results found"); } const firstResult = webResults[0]; console.log(`Found: ${firstResult.title}`); return { url: firstResult.url, title: firstResult.title, }; }, }); // Step 2: Scrape the URL with Firecrawl SDK const scrapeStep = createStep({ id: "scrape", inputSchema: z.object({ url: z.string(), title: z.string(), }), outputSchema: z.object({ markdown: z.string(), title: z.string(), }), execute: async ({ inputData }: { inputData: { url: string; title: string } }) => { console.log(`Scraping: ${inputData.url}`); const 
scrapeResult = await firecrawl.scrape(inputData.url, { formats: ["markdown"], }); console.log(`Scraped: ${scrapeResult.markdown?.length || 0} characters`); return { markdown: scrapeResult.markdown || "", title: inputData.title, }; }, }); // Step 3: Summarize with the agent const summarizeStep = createStep({ id: "summarize", inputSchema: z.object({ markdown: z.string(), title: z.string(), }), outputSchema: z.object({ summary: z.string(), }), execute: async ({ inputData }: { inputData: { markdown: string; title: string } }) => { console.log(`Summarizing: ${inputData.title}`); const prompt = `Summarize the following documentation in 2-3 sentences:\n\nTitle: ${inputData.title}\n\n${inputData.markdown}`; const result = await agent.generate(prompt); console.log(`Summary generated`); return { summary: result.text }; }, }); // Create workflow export const workflow = createWorkflow({ id: "firecrawl-workflow", inputSchema: z.object({ query: z.string(), }), outputSchema: z.object({ summary: z.string(), }), steps: [searchStep, scrapeStep, summarizeStep], }) .then(searchStep) .then(scrapeStep) .then(summarizeStep) .commit(); async function testWorkflow() { const run = await workflow.createRunAsync(); const result = await run.start({ inputData: { query: "Firecrawl documentation" } }); if (result.status === "success") { const { summarize } = result.steps; if (summarize.status === "success") { console.log(`\n${summarize.output.summary}`); } } else { console.error("Workflow failed:", result.status); } } testWorkflow().catch(console.error); ``` For more examples, check the [Mastra documentation](https://mastra.ai/docs). # OpenAI Source: https://docs.firecrawl.dev/developer-guides/llm-sdks-and-frameworks/openai Use Firecrawl with OpenAI for web scraping + AI workflows Integrate Firecrawl with OpenAI to build AI applications powered by web data. 
## Setup ```bash theme={null} npm install @mendable/firecrawl-js openai zod ``` Create `.env` file: ```bash theme={null} FIRECRAWL_API_KEY=your_firecrawl_key OPENAI_API_KEY=your_openai_key ``` > **Note:** If using Node \< 20, install `dotenv` and add `import 'dotenv/config'` to your code. ## Scrape + Summarize This example demonstrates a simple workflow: scrape a website and summarize the content using an OpenAI model. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; import OpenAI from 'openai'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY }); // Scrape the website content const scrapeResult = await firecrawl.scrape('https://firecrawl.dev', { formats: ['markdown'] }); console.log('Scraped content length:', scrapeResult.markdown?.length); // Summarize with OpenAI model const completion = await openai.chat.completions.create({ model: 'gpt-5-nano', messages: [ { role: 'user', content: `Summarize: ${scrapeResult.markdown}` } ] }); console.log('Summary:', completion.choices[0]?.message.content); ``` ## Function Calling This example shows how to use OpenAI's function calling feature to let the model decide when to scrape websites based on user requests. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; import OpenAI from 'openai'; import { z } from 'zod'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY }); const ScrapeArgsSchema = z.object({ url: z.string().describe('The URL of the website to scrape') }); const tools = [{ type: 'function' as const, function: { name: 'scrape_website', description: 'Scrape content from any website URL', parameters: z.toJSONSchema(ScrapeArgsSchema) } }]; const response = await openai.chat.completions.create({ model: 'gpt-5-nano', messages: [{ role: 'user', content: 'What is Firecrawl? 
Visit firecrawl.dev and tell me about it.' }], tools }); const message = response.choices[0]?.message; if (message?.tool_calls && message.tool_calls.length > 0) { for (const toolCall of message.tool_calls) { if (toolCall.type === 'function') { console.log('Tool called:', toolCall.function.name); const args = ScrapeArgsSchema.parse(JSON.parse(toolCall.function.arguments)); const result = await firecrawl.scrape(args.url, { formats: ['markdown'] // Other formats: html, links, etc. }); console.log('Scraped content:', result.markdown?.substring(0, 200) + '...'); // Send the scraped content back to the model for final response const finalResponse = await openai.chat.completions.create({ model: 'gpt-5-nano', messages: [ { role: 'user', content: 'What is Firecrawl? Visit firecrawl.dev and tell me about it.' }, message, { role: 'tool', tool_call_id: toolCall.id, content: result.markdown || 'No content scraped' } ], tools }); console.log('Final response:', finalResponse.choices[0]?.message?.content); } } } else { console.log('Direct response:', message?.content); } ``` ## Structured Data Extraction This example demonstrates how to use OpenAI models with structured outputs to extract specific data from scraped content. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; import OpenAI from 'openai'; import { z } from 'zod'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY }); const scrapeResult = await firecrawl.scrape('https://stripe.com', { formats: ['markdown'] }); console.log('Scraped content length:', scrapeResult.markdown?.length); const CompanyInfoSchema = z.object({ name: z.string(), industry: z.string(), description: z.string(), products: z.array(z.string()) }); const response = await openai.chat.completions.create({ model: 'gpt-5-nano', messages: [ { role: 'system', content: 'Extract company information from website content.' 
}, { role: 'user', content: `Extract data: ${scrapeResult.markdown}` } ], response_format: { type: 'json_schema', json_schema: { name: 'company_info', schema: z.toJSONSchema(CompanyInfoSchema), strict: true } } }); const content = response.choices[0]?.message?.content; const companyInfo = content ? CompanyInfoSchema.parse(JSON.parse(content)) : null; console.log('Validated company info:', companyInfo); ``` ## Search + Analyze This example combines Firecrawl's search capabilities with OpenAI model analysis to find and summarize information from multiple sources. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; import OpenAI from 'openai'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY }); // Search for relevant information const searchResult = await firecrawl.search('Next.js 16 new features', { limit: 3, sources: [{ type: 'web' }], // Other sources: { type: 'news' }, { type: 'images' } scrapeOptions: { formats: ['markdown'] } }); console.log('Search results:', searchResult.web?.length, 'pages found'); // Analyze and summarize the key features const analysis = await openai.chat.completions.create({ model: 'gpt-5-nano', messages: [{ role: 'user', content: `Summarize the key features: ${JSON.stringify(searchResult)}` }] }); console.log('Analysis:', analysis.choices[0]?.message?.content); ``` ## Responses API with MCP This example shows how to use OpenAI's Responses API with Firecrawl configured as an MCP (Model Context Protocol) server. 
```typescript theme={null} import OpenAI from 'openai'; const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY }); const response = await openai.responses.create({ model: 'gpt-5-nano', tools: [ { type: 'mcp', server_label: 'firecrawl', server_description: 'A web search and scraping MCP server to scrape and extract content from websites.', server_url: `https://mcp.firecrawl.dev/${process.env.FIRECRAWL_API_KEY}/v2/mcp`, require_approval: 'never' } ], input: 'Find out what the top stories on Hacker News are and the latest blog post on OpenAI and summarize them in a bullet point format' }); console.log('Response:', JSON.stringify(response.output, null, 2)); ``` For more examples, check the [OpenAI documentation](https://platform.openai.com/docs). # Vercel AI SDK Source: https://docs.firecrawl.dev/developer-guides/llm-sdks-and-frameworks/vercel-ai-sdk Firecrawl tools for Vercel AI SDK. Web scraping, search, interact, and crawling for AI applications. Firecrawl tools for the Vercel AI SDK. Search, scrape, interact with pages, and crawl the web in your AI applications. ## Install ```bash theme={null} npm install firecrawl-aisdk ai ``` Set environment variables: ```bash theme={null} FIRECRAWL_API_KEY=fc-your-key # https://firecrawl.dev AI_GATEWAY_API_KEY=your-key # https://vercel.com/ai-gateway ``` These examples use the [Vercel AI Gateway](https://vercel.com/ai-gateway) string model format, but Firecrawl tools work with any AI SDK provider. You can also use provider imports like `anthropic('claude-sonnet-4-5-20250514')` from `@ai-sdk/anthropic`. ## Quick Start `FirecrawlTools()` bundles `search`, `scrape`, and `interact` by default. ```typescript theme={null} import { generateText, stepCountIs } from 'ai'; import { FirecrawlTools } from 'firecrawl-aisdk'; const { text } = await generateText({ model: 'anthropic/claude-sonnet-4-5', tools: FirecrawlTools(), stopWhen: stepCountIs(30), prompt: ` 1. Use interact on Hacker News to identify the top story 2. 
Search for other perspectives on the same topic 3. Scrape the most relevant pages you found 4. Summarize everything you found `, }); ``` ## FirecrawlTools `FirecrawlTools()` gives you the default tools plus an auto-generated `systemPrompt` you can pass to `generateText`. ```typescript theme={null} import { generateText, stepCountIs } from 'ai'; import { FirecrawlTools } from 'firecrawl-aisdk'; const tools = FirecrawlTools(); const { text } = await generateText({ model: 'anthropic/claude-sonnet-4-5', system: `${tools.systemPrompt}\n\nAnswer with citations when possible.`, tools, stopWhen: stepCountIs(20), prompt: 'Find the current Firecrawl pricing page and explain the available plans.', }); ``` You can customize defaults, opt into async tools, or disable individual tools: ```typescript theme={null} const tools = FirecrawlTools({ search: { limit: 5 }, scrape: { formats: ['markdown'], onlyMainContent: true }, interact: { profile: { name: 'my-session', saveChanges: true } }, crawl: true, agent: true, }); ``` ```typescript theme={null} // Disable interact, keep search + scrape FirecrawlTools({ interact: false }); // Opt into deprecated browser compatibility FirecrawlTools({ browser: {} }); // Include every available tool FirecrawlTools({ all: true }); ``` When scraping to answer a question about a page, prefer query format: ```typescript theme={null} formats: [{ type: 'query', prompt: 'What does this page say about pricing and rate limits?' }] ``` Use `formats: ['markdown']` only when you need the full page content. 
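As a sketch (assuming the bundled `scrape` tool's options accept the same format objects shown above), you can combine the two ideas and make query format the default for every scrape the model performs; the prompt string here is a hypothetical example:

```typescript
// Sketch, not verified against the SDK: give the bundled scrape tool a
// query-format default so tool results stay small and focused.
const tools = FirecrawlTools({
  scrape: {
    formats: [{ type: 'query', prompt: 'What does this page say about pricing?' }],
    onlyMainContent: true,
  },
});
```

Individual calls can still fall back to `['markdown']` when the model needs full page content.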
## Individual Tools Every tool can be used directly or called with options: ```typescript theme={null} import { generateText } from 'ai'; import { scrape, search } from 'firecrawl-aisdk'; const { text } = await generateText({ model: 'anthropic/claude-sonnet-4-5', prompt: 'Search for Firecrawl, then scrape the most relevant result.', tools: { search, scrape }, }); const customScrape = scrape({ apiKey: 'fc-custom-key', apiUrl: 'https://api.firecrawl.dev' }); ``` ### Search + Scrape ```typescript theme={null} import { generateText } from 'ai'; import { search, scrape } from 'firecrawl-aisdk'; const { text } = await generateText({ model: 'anthropic/claude-sonnet-4-5', prompt: 'Search for Firecrawl, scrape the top official result, and explain what it does.', tools: { search, scrape }, }); ``` ### Map ```typescript theme={null} import { generateText } from 'ai'; import { map } from 'firecrawl-aisdk'; const { text } = await generateText({ model: 'anthropic/claude-sonnet-4-5', prompt: 'Map https://docs.firecrawl.dev and list the main sections.', tools: { map }, }); ``` ### Stream ```typescript theme={null} import { streamText, stepCountIs } from 'ai'; import { scrape } from 'firecrawl-aisdk'; const result = streamText({ model: 'anthropic/claude-sonnet-4-5', prompt: 'What is the first 100 words of firecrawl.dev?', tools: { scrape }, stopWhen: stepCountIs(3), }); for await (const chunk of result.textStream) { process.stdout.write(chunk); } await result.fullStream; ``` ## Interact `interact()` creates a scrape-backed interactive session. Call `start(url)` to bootstrap a session and get a live view URL, then let the model reuse that session through the `interact` tool. 
```typescript theme={null} import { generateText, stepCountIs } from 'ai'; import { interact, search } from 'firecrawl-aisdk'; const interactTool = interact(); console.log('Live view:', await interactTool.start('https://news.ycombinator.com')); const { text } = await generateText({ model: 'anthropic/claude-sonnet-4-5', tools: { interact: interactTool, search }, stopWhen: stepCountIs(25), prompt: 'Use interact on the current Hacker News session, find the top story, then search for more context.', }); await interactTool.close(); ``` If you need the explicit live view URL after startup, use `interactTool.interactiveLiveViewUrl`. Reuse browser state across sessions with profiles: ```typescript theme={null} const interactTool = interact({ profile: { name: 'my-session', saveChanges: true }, }); ``` `browser()` is deprecated. Prefer `interact()`. ## Async Tools Crawl, batch scrape, and agent return a job ID. Pair them with `poll`. ### Crawl ```typescript theme={null} import { generateText } from 'ai'; import { crawl, poll } from 'firecrawl-aisdk'; const { text } = await generateText({ model: 'anthropic/claude-sonnet-4-5', prompt: 'Crawl https://docs.firecrawl.dev (limit 3 pages) and summarize.', tools: { crawl, poll }, }); ``` ### Batch Scrape ```typescript theme={null} import { generateText } from 'ai'; import { batchScrape, poll } from 'firecrawl-aisdk'; const { text } = await generateText({ model: 'anthropic/claude-sonnet-4-5', prompt: 'Scrape https://firecrawl.dev and https://docs.firecrawl.dev, then compare them.', tools: { batchScrape, poll }, }); ``` ### Agent Autonomous web data gathering that searches, navigates, and extracts on its own. 
```typescript theme={null} import { generateText, stepCountIs } from 'ai'; import { agent, poll } from 'firecrawl-aisdk'; const { text } = await generateText({ model: 'anthropic/claude-sonnet-4-5', prompt: 'Find the founders of Firecrawl, their roles, and their backgrounds.', tools: { agent, poll }, stopWhen: stepCountIs(10), }); ``` ## Logging ```typescript theme={null} import { generateText } from 'ai'; import { logStep, scrape, stepLogger } from 'firecrawl-aisdk'; const logger = stepLogger(); const { text, usage } = await generateText({ model: 'anthropic/claude-sonnet-4-5', prompt: 'Scrape https://firecrawl.dev and summarize it.', tools: { scrape }, onStepFinish: logger.onStep, experimental_onToolCallFinish: logger.onToolCallFinish, }); logger.close(); logger.summary(usage); await generateText({ model: 'anthropic/claude-sonnet-4-5', prompt: 'Scrape https://firecrawl.dev and summarize it again.', tools: { scrape }, onStepFinish: logStep, }); ``` ## All Exports ```typescript theme={null} import { // Core tools search, // Search the web scrape, // Scrape a single URL map, // Discover URLs on a site crawl, // Crawl multiple pages (async, use poll) batchScrape, // Scrape multiple URLs (async, use poll) agent, // Autonomous web research (async, use poll) // Job management poll, // Poll async jobs for results status, // Check job status cancel, // Cancel running jobs // Browser/session tools interact, // interact({ profile: { name: '...' } }) browser, // deprecated compatibility export // All-in-one bundle FirecrawlTools, // FirecrawlTools({ search, scrape, interact, crawl, agent }) // Helpers stepLogger, // Token stats per tool call logStep, // Simple one-liner logging } from 'firecrawl-aisdk'; ``` # MCP Web Search & Scrape in ChatGPT Source: https://docs.firecrawl.dev/developer-guides/mcp-setup-guides/chatgpt Add web scraping and search to ChatGPT in 2 minutes MCP support in ChatGPT is currently a beta feature. 
The interface and availability may change as OpenAI continues development. **Availability:** Developer mode with MCP connectors is not available on the Free tier and requires a paid ChatGPT subscription. Availability and features vary by plan and region. See [OpenAI's documentation on Developer Mode](https://help.openai.com/en/articles/12584461-developer-mode-apps-and-full-mcp-connectors-in-chatgpt-beta) for current availability and setup instructions. Add web scraping and search capabilities to ChatGPT with Firecrawl MCP. ## Quick Setup ### 1. Get Your API Key Sign up at [firecrawl.dev/app/api-keys](https://www.firecrawl.dev/app/api-keys) and copy your API key. ### 2. Enable Developer Mode Open ChatGPT settings by clicking your username in the bottom left corner, or navigate directly to [chatgpt.com/#settings](https://chatgpt.com/#settings). In the settings modal, scroll to the bottom and select **Advanced Settings**. Toggle **Developer mode** to ON. *[Screenshot: ChatGPT settings showing Advanced Settings with Developer mode toggle]* ### 3. Create the Connector With Developer mode enabled, go to the **Apps & Connectors** tab in the same settings modal. Click the **Create** button in the top right corner. *[Screenshot: Apps & Connectors tab with Create button highlighted]* Fill in the connector details: * **Name:** `Firecrawl MCP` * **Description:** `Web scraping, crawling, search, and content extraction` (optional) * **MCP Server URL:** `https://mcp.firecrawl.dev/YOUR_API_KEY_HERE/v2/mcp` * **Authentication:** `None` Replace `YOUR_API_KEY_HERE` in the URL with your actual [Firecrawl API key](https://www.firecrawl.dev/app/api-keys). *[Screenshot: New Connector form filled out with Firecrawl MCP details]* Check the **"I understand and want to continue"** checkbox, then click **Create**. ### 4. Verify Setup Go back to the main ChatGPT interface. You should see **Developer mode** displayed, indicating that MCP connectors are active. If you do not see Developer mode, reload the page. 
If it still does not appear, open settings again and verify that Developer mode is toggled ON under Advanced Settings. ### 5. Access Firecrawl Tools To use Firecrawl in a conversation, click the **+** button in the chat input, then select **More** and choose **Firecrawl MCP**. *[Screenshot: ChatGPT chat input showing + menu expanded with More submenu and Firecrawl MCP option]* ## Quick Demo With Firecrawl MCP selected, try these prompts: **Search:** ``` Search for the latest React Server Components updates ``` **Scrape:** ``` Scrape firecrawl.dev and tell me what it does ``` **Get docs:** ``` Scrape the Vercel documentation for edge functions and summarize it ``` ## Tool Confirmation When ChatGPT uses the Firecrawl MCP tools, you may see a confirmation prompt asking for your approval. Some ChatGPT Desktop versions auto-approve tool calls without showing this dialog. If no prompt appears, you can verify the tool was invoked by checking for a "Called tool" section in the response or reviewing your usage at [firecrawl.dev/app/usage](https://www.firecrawl.dev/app/usage). **No confirmation prompt appearing?** If ChatGPT answers your question without showing a confirmation dialog, the Firecrawl MCP connector is most likely not attached to the conversation. Go back to [Step 5](#5-access-firecrawl-tools) and make sure you click the **+** button, select **More**, and choose **Firecrawl MCP** before sending your prompt. The connector must be attached to each new conversation. *[Screenshot: ChatGPT tool confirmation dialog showing Firecrawl MCP request]* You can check **"Remember for this conversation"** to avoid repeated confirmations during the same chat session. This security measure is implemented by OpenAI to ensure MCP tools do not perform unintended actions. Once confirmed, ChatGPT will execute the request and return the results. 
*[Screenshot: Example of Firecrawl search results displayed in ChatGPT]* # MCP Web Search & Scrape in Claude.ai Source: https://docs.firecrawl.dev/developer-guides/mcp-setup-guides/claude-ai Add web scraping and search to Claude.ai (Co-work) in 2 minutes Add web scraping and search capabilities to Claude.ai with Firecrawl MCP using custom connectors. Looking for Claude Code setup? See the [Claude Code guide](/developer-guides/mcp-setup-guides/claude-code) instead. ## Quick Setup ### 1. Get Your API Key Sign up at [firecrawl.dev/app/api-keys](https://www.firecrawl.dev/app/api-keys) and copy your API key. ### 2. Add Custom Connector Go to [Settings > Connectors](https://claude.ai/settings/connectors) in Claude.ai and click **Add custom connector**. Fill in the connector details: * **URL:** `https://mcp.firecrawl.dev/YOUR_API_KEY/v2/mcp` * **OAuth Client ID:** Leave blank * **OAuth Client Secret:** Leave blank Replace `YOUR_API_KEY` in the URL with your actual [Firecrawl API key](https://www.firecrawl.dev/app/api-keys). Your API key is embedded directly in the URL, so no additional authentication fields are needed. Click **Add** to save the connector. ### 3. Enable in Conversation In any Claude.ai conversation, click the **+** button at the bottom left, go to **Connectors**, and enable the Firecrawl connector. ## Quick Demo With the Firecrawl connector enabled, try these prompts: **Search the web:** ``` Search for the latest Next.js 15 features ``` **Scrape a page:** ``` Scrape firecrawl.dev and tell me what it does ``` **Get documentation:** ``` Find and scrape the Stripe API docs for payment intents ``` Claude will automatically use Firecrawl's search and scrape tools to get the information. # MCP Web Search & Scrape in Claude Code Source: https://docs.firecrawl.dev/developer-guides/mcp-setup-guides/claude-code Add web scraping and search to Claude Code in 2 minutes Add web scraping and search capabilities to Claude Code with Firecrawl MCP. ## Quick Setup ### 1. 
Get Your API Key Sign up at [firecrawl.dev/app](https://firecrawl.dev/app) and copy your API key. ### 2. Add Firecrawl MCP Server **Option A: Remote hosted URL (recommended)** ```bash theme={null} claude mcp add firecrawl --url https://mcp.firecrawl.dev/your-api-key/v2/mcp ``` **Option B: Local (npx)** ```bash theme={null} claude mcp add firecrawl -e FIRECRAWL_API_KEY=your-api-key -- npx -y firecrawl-mcp ``` Replace `your-api-key` with your actual Firecrawl API key. Done! You can now search and scrape the web from Claude Code. ## Quick Demo Try these in Claude Code: **Search the web:** ``` Search for the latest Next.js 15 features ``` **Scrape a page:** ``` Scrape firecrawl.dev and tell me what it does ``` **Get documentation:** ``` Find and scrape the Stripe API docs for payment intents ``` Claude will automatically use Firecrawl's search and scrape tools to get the information. # MCP Web Search & Scrape in Cursor Source: https://docs.firecrawl.dev/developer-guides/mcp-setup-guides/cursor Add web scraping and search to Cursor in 2 minutes Add web scraping and search capabilities to Cursor with Firecrawl MCP. ## Quick Setup ### 1. Get Your API Key Sign up at [firecrawl.dev/app](https://firecrawl.dev/app) and copy your API key. ### 2. Add to Cursor Open Settings (`Cmd+,`), search for "MCP", and add: ```json theme={null} { "mcpServers": { "firecrawl": { "command": "npx", "args": ["-y", "firecrawl-mcp"], "env": { "FIRECRAWL_API_KEY": "your_api_key_here" } } } } ``` Replace `your_api_key_here` with your actual Firecrawl API key. ### 3. Restart Cursor Done! You can now search and scrape the web from Cursor. ## Quick Demo Try these in Cursor Chat (`Cmd+K`): **Search:** ``` Search for TypeScript best practices 2025 ``` **Scrape:** ``` Scrape firecrawl.dev and tell me what it does ``` **Get docs:** ``` Scrape the React hooks documentation and explain useEffect ``` Cursor will automatically use Firecrawl tools. 
## Windows Troubleshooting If you see a `spawn npx ENOENT` or "No server info found" error on Windows, Cursor cannot find `npx` in your PATH. Try one of these fixes: **Option A: Use the full path to `npx.cmd`** Run `where npx` in Command Prompt to get the full path, then update your config: ```json theme={null} { "mcpServers": { "firecrawl": { "command": "C:\\Program Files\\nodejs\\npx.cmd", "args": ["-y", "firecrawl-mcp"], "env": { "FIRECRAWL_API_KEY": "your_api_key_here" } } } } ``` Replace the `command` path with the output from `where npx`. **Option B: Use the remote hosted URL (no Node.js required)** ```json theme={null} { "mcpServers": { "firecrawl": { "url": "https://mcp.firecrawl.dev/YOUR-API-KEY/v2/mcp" } } } ``` Replace `YOUR-API-KEY` with your Firecrawl API key. # MCP Web Search & Scrape in Factory AI Source: https://docs.firecrawl.dev/developer-guides/mcp-setup-guides/factory-ai Add web scraping and search to Factory AI in 2 minutes Add web scraping and search capabilities to Factory AI with Firecrawl MCP. ## Quick Setup ### 1. Get Your API Key Sign up at [firecrawl.dev/app](https://firecrawl.dev/app) and copy your API key. ### 2. Install Factory AI CLI Install the [Factory AI CLI](https://docs.factory.ai/cli/getting-started/quickstart) if you haven't already: **macOS/Linux:** ```bash theme={null} curl -fsSL https://app.factory.ai/cli | sh ``` **Windows:** ```powershell theme={null} iwr https://app.factory.ai/cli/install.ps1 -useb | iex ``` ### 3. Add Firecrawl MCP Server In the Factory droid CLI, add Firecrawl using the `/mcp add` command: ```bash theme={null} /mcp add firecrawl "npx -y firecrawl-mcp" -e FIRECRAWL_API_KEY=your-api-key-here ``` Replace `your-api-key-here` with your actual Firecrawl API key. ### 4. Done! The Firecrawl tools are now available in your Factory AI session! 
## Quick Demo Try these in Factory AI: **Search the web:** ``` Search for the latest Next.js 15 features ``` **Scrape a page:** ``` Scrape firecrawl.dev and tell me what it does ``` **Get documentation:** ``` Find and scrape the Stripe API docs for payment intents ``` Factory will automatically use Firecrawl's search and scrape tools to get the information. # MCP Web Search & Scrape in Windsurf Source: https://docs.firecrawl.dev/developer-guides/mcp-setup-guides/windsurf Add web scraping and search to Windsurf in 2 minutes Add web scraping and search capabilities to Windsurf with Firecrawl MCP. ## Quick Setup ### 1. Get Your API Key Sign up at [firecrawl.dev/app](https://firecrawl.dev/app) and copy your API key. ### 2. Add to Windsurf Add this to your `~/.codeium/windsurf/mcp_config.json`: ```json theme={null} { "mcpServers": { "firecrawl": { "command": "npx", "args": ["-y", "firecrawl-mcp"], "env": { "FIRECRAWL_API_KEY": "YOUR_API_KEY" } } } } ``` Replace `YOUR_API_KEY` with your actual Firecrawl API key. ### 3. Restart Windsurf Done! Windsurf can now search and scrape the web. ## Quick Demo Try these in Windsurf: **Search:** ``` Search for the latest Tailwind CSS features ``` **Scrape:** ``` Scrape firecrawl.dev and explain what it does ``` **Get docs:** ``` Find and scrape the Supabase authentication documentation ``` Windsurf's AI agents will automatically use Firecrawl tools. # OpenClaw Source: https://docs.firecrawl.dev/developer-guides/openclaw Use Firecrawl with OpenClaw to give your agents web scraping, search, and browser automation capabilities. Integrate Firecrawl with OpenClaw to give your agents the ability to scrape, search, crawl, extract, and interact with the web — all through the [Firecrawl CLI](/sdks/cli). ## Why Firecrawl + OpenClaw * **No local browser needed** — every session runs in a remote, isolated sandbox. No Chromium installs, no driver conflicts, no RAM pressure on your machine. 
* **Real parallelism** — run many browser sessions at once without local resource fights. Agents can browse in batches across multiple sites simultaneously. * **Secure by default** — navigation, DOM evaluation, and extraction all happen inside disposable sandboxes, not on your workstation. * **Better token economics** — agents get back clean artifacts (snapshots, extracted fields) instead of hauling giant DOMs and driver logs into the context window. * **Full web toolkit** — scrape, search, and browser automation all through a single CLI that your agent already knows how to use. ## Setup Tell your agent to install the Firecrawl CLI, authenticate, and initialize the skill with this command: ```bash theme={null} npx -y firecrawl-cli init --browser --all ``` * `--all` installs the Firecrawl skill to every detected AI coding agent * `--browser` opens the browser for Firecrawl authentication automatically Or install everything separately: ```bash theme={null} npm install -g firecrawl-cli firecrawl init skills firecrawl login --browser # Or, skip the browser and provide your API key directly: export FIRECRAWL_API_KEY="fc-YOUR-API-KEY" ``` Verify everything is set up correctly: ```bash theme={null} firecrawl --status ``` Once the skill is installed, your OpenClaw agent automatically discovers and uses Firecrawl commands — no extra configuration needed. ## Scrape Scrape a single page and extract its content: ```bash theme={null} firecrawl https://example.com --only-main-content ``` Get specific formats: ```bash theme={null} firecrawl https://example.com --format markdown,links --pretty ``` ## Search Search the web and optionally scrape the results: ```bash theme={null} firecrawl search "latest AI funding rounds 2025" --limit 10 # Search and scrape the results firecrawl search "OpenClaw documentation" --scrape --scrape-formats markdown ``` ## Browser Launch a remote browser session for interactive automation. 
Each session runs in an isolated sandbox — no local Chromium install required. `agent-browser` is pre-installed with 40+ commands. ```bash theme={null} # Shorthand: auto-launches a session if none is active firecrawl browser "open https://news.ycombinator.com" firecrawl browser "snapshot" firecrawl browser "scrape" firecrawl browser close ``` Interact with page elements using refs from the snapshot: ```bash theme={null} firecrawl browser "open https://example.com" firecrawl browser "snapshot" # snapshot returns @ref IDs — use them to interact firecrawl browser "click @e5" firecrawl browser "fill @e3 'search query'" firecrawl browser "scrape" firecrawl browser close ``` The shorthand form (`firecrawl browser "..."`) sends commands to `agent-browser` automatically and auto-launches a sandbox session if there isn't one active. Your agent issues intent-level actions (`open`, `click`, `fill`, `snapshot`, `scrape`) instead of writing Playwright code. ## Example: tell your agent Here are some prompts you can give your OpenClaw agent: * *"Use Firecrawl to scrape [https://example.com](https://example.com) and summarize the main content."* * *"Search for the latest OpenAI news and give me a summary of the top 5 results."* * *"Use Firecrawl Browser to open Hacker News, get the top 5 stories, and the first 10 comments on each."* * *"Crawl the docs at [https://docs.firecrawl.dev](https://docs.firecrawl.dev) and save them to a file."* ## Further reading * [Firecrawl CLI reference](/sdks/cli) * [Browser Sandbox docs](/features/browser) * [Agent docs](/features/agent) # Choosing the Data Extractor Source: https://docs.firecrawl.dev/developer-guides/usage-guides/choosing-the-data-extractor Compare /agent, /extract, and /scrape (JSON mode) to pick the right tool for structured data extraction Firecrawl offers three approaches for extracting structured data from web pages. Each serves different use cases with varying levels of automation and control. 
## Quick Comparison | Feature | `/agent` | `/extract` | `/scrape` (JSON mode) | | ------------------- | -------------------------------------- | ------------------------------------------ | ---------------------------- | | **Status** | Active | Use `/agent` instead | Active | | **URL Required** | No (optional) | Yes (wildcards supported) | Yes (single URL) | | **Scope** | Web-wide discovery | Multiple pages/domains | Single page | | **URL Discovery** | Autonomous web search | Crawls from given URLs | None | | **Processing** | Asynchronous | Asynchronous | Synchronous | | **Schema Required** | No (prompt or schema) | No (prompt or schema) | No (prompt or schema) | | **Pricing** | Dynamic (5 free runs/day) | Token-based (1 credit = 15 tokens) | 1 credit/page | | **Best For** | Research, discovery, complex gathering | Multi-page extraction (when you know URLs) | Known single-page extraction | ## 1. `/agent` Endpoint The `/agent` endpoint is Firecrawl's most advanced offering—the successor to `/extract`. It uses AI agents to autonomously search, navigate, and gather data from across the web. 
### Key Characteristics * **URLs Optional**: Just describe what you need via `prompt`; URLs are completely optional * **Autonomous Navigation**: The agent searches and navigates deep into sites to find your data * **Deep Web Search**: Autonomously discovers information across multiple domains and pages * **Parallel Processing**: Processes multiple sources simultaneously for faster results * **Models Available**: `spark-1-mini` (default, 60% cheaper) and `spark-1-pro` (higher accuracy) ### Example ```python Python theme={null} from firecrawl import Firecrawl from pydantic import BaseModel, Field from typing import List, Optional app = Firecrawl(api_key="fc-YOUR_API_KEY") class Founder(BaseModel): name: str = Field(description="Full name of the founder") role: Optional[str] = Field(None, description="Role or position") background: Optional[str] = Field(None, description="Professional background") class FoundersSchema(BaseModel): founders: List[Founder] = Field(description="List of founders") result = app.agent( prompt="Find the founders of Firecrawl", schema=FoundersSchema, model="spark-1-mini", max_credits=100 ) print(result.data) ``` ```js Node theme={null} import Firecrawl from '@mendable/firecrawl-js'; import { z } from 'zod'; const firecrawl = new Firecrawl({ apiKey: "fc-YOUR_API_KEY" }); const result = await firecrawl.agent({ prompt: "Find the founders of Firecrawl", schema: z.object({ founders: z.array(z.object({ name: z.string().describe("Full name of the founder"), role: z.string().describe("Role or position").optional(), background: z.string().describe("Professional background").optional() })).describe("List of founders") }), model: "spark-1-mini", maxCredits: 100 }); console.log(result.data); ``` ```bash cURL theme={null} curl -X POST "https://api.firecrawl.dev/v2/agent" \ -H "Authorization: Bearer $FIRECRAWL_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "prompt": "Find the founders of Firecrawl", "model": "spark-1-mini", "maxCredits": 100, 
"schema": { "type": "object", "properties": { "founders": { "type": "array", "description": "List of founders", "items": { "type": "object", "properties": { "name": { "type": "string", "description": "Full name" }, "role": { "type": "string", "description": "Role or position" }, "background": { "type": "string", "description": "Professional background" } }, "required": ["name"] } } }, "required": ["founders"] } }' ``` ### Best Use Case: Autonomous Research & Discovery **Scenario**: You need to find information about AI startups that raised Series A funding, including their founders and funding amounts. **Why `/agent`**: You don't know which websites contain this information. The agent will autonomously search the web, navigate to relevant sources (Crunchbase, news sites, company pages), and compile the structured data for you. For more details, see the [Agent documentation](/features/agent). *** ## 2. `/extract` Endpoint **Use `/agent` instead**: We recommend migrating to [`/agent`](/features/agent)—it's faster, more reliable, doesn't require URLs, and handles all `/extract` use cases plus more. The `/extract` endpoint collects structured data from specified URLs or entire domains using LLM-powered extraction. ### Key Characteristics * **URLs Typically Required**: Provide at least one URL (supports wildcards like `example.com/*`) * **Domain Crawling**: Can crawl and parse all URLs discovered in a domain * **Web Search Enhancement**: Optional `enableWebSearch` to follow links outside specified domains * **Schema Optional**: Supports strict JSON schema OR natural language prompts * **Async Processing**: Returns job ID for status checking ### The URL Limitation The fundamental challenge with `/extract` is that you typically need to know URLs upfront: 1. **Discovery gap**: For tasks like "find YC W24 companies," you don't know which URLs contain the data. You'd need a separate search step before calling `/extract`. 2. 
**Awkward web search**: While `enableWebSearch` exists, it's constrained to start from URLs you provide—an awkward workflow for discovery tasks. 3. **Why `/agent` was created**: `/extract` is good at extracting from known locations, but less effective at discovering where data lives. ### Example ```python Python theme={null} from firecrawl import Firecrawl firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY") schema = { "type": "object", "properties": {"description": {"type": "string"}}, "required": ["description"], } res = firecrawl.extract( urls=["https://docs.firecrawl.dev"], prompt="Extract the page description", schema=schema, ) print(res.data["description"]) ``` ```js Node theme={null} import Firecrawl from '@mendable/firecrawl-js'; const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" }); const schema = { type: 'object', properties: { title: { type: 'string' } }, required: ['title'] }; const res = await firecrawl.extract({ urls: ['https://docs.firecrawl.dev'], prompt: 'Extract the page title', schema, scrapeOptions: { formats: [{ type: 'json', prompt: 'Extract', schema }] } }); console.log(res.status || res.success, res.data); ``` ```bash cURL theme={null} curl -s -X POST "https://api.firecrawl.dev/v2/extract" \ -H "Authorization: Bearer $FIRECRAWL_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "urls": ["https://docs.firecrawl.dev"], "prompt": "Extract the page title", "schema": { "type": "object", "properties": {"title": {"type": "string"}}, "required": ["title"] }, "scrapeOptions": { "formats": [{"type": "json", "prompt": "Extract", "schema": {"type": "object"}}] } }' ``` ### Best Use Case: Targeted Multi-Page Extraction **Scenario**: You have your competitor's documentation URL and want to extract all their API endpoints from `docs.competitor.com/*`. **Why `/extract` worked here**: You knew the exact domain. But even then, `/agent` with URLs provided would typically give better results today. 
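That `/agent`-with-URLs alternative can be sketched as a plain request body. This is a minimal, hedged sketch: the field names follow the v2 agent cURL example earlier in this section, while the prompt wording and `maxCredits` value are illustrative, not prescribed.

```python
# Build an /agent request for a known domain, mirroring the /extract example
# above. Payload shape follows the v2 /agent cURL example; the prompt and
# maxCredits value are illustrative. Send with any HTTP client and a real
# fc- API key.
import json

payload = {
    "prompt": "Extract the page title from docs.firecrawl.dev",
    "urls": ["https://docs.firecrawl.dev"],  # optional for /agent
    "model": "spark-1-mini",
    "maxCredits": 50,
    "schema": {
        "type": "object",
        "properties": {"title": {"type": "string"}},
        "required": ["title"],
    },
}

# e.g. requests.post("https://api.firecrawl.dev/v2/agent",
#                    headers={"Authorization": "Bearer fc-YOUR-API-KEY"},
#                    json=payload)
print(json.dumps(payload, indent=2))
```

Because the URL is supplied, the agent skips discovery and spends its credit budget on extraction.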
For more details, see the [Extract documentation](/features/extract). *** ## 3. `/scrape` Endpoint with JSON Mode The `/scrape` endpoint with JSON mode is the most controlled approach—it extracts structured data from a single known URL using an LLM to parse the page content into your specified schema. ### Key Characteristics * **Single URL Only**: Designed for extracting data from one specific page at a time * **Exact URL Required**: You must know the precise URL containing the data * **Schema Optional**: Can use JSON schema OR just a prompt (LLM chooses structure) * **Synchronous**: Returns data immediately (no job polling needed) * **Additional Formats**: Can combine JSON extraction with markdown, HTML, screenshots in one request ### Example ```python Python theme={null} from firecrawl import Firecrawl from pydantic import BaseModel app = Firecrawl(api_key="fc-YOUR-API-KEY") class CompanyInfo(BaseModel): company_mission: str supports_sso: bool is_open_source: bool is_in_yc: bool result = app.scrape( 'https://firecrawl.dev', formats=[{ "type": "json", "schema": CompanyInfo.model_json_schema() }], only_main_content=False, timeout=120000 ) print(result) ``` ```js Node theme={null} import Firecrawl from "@mendable/firecrawl-js"; import { z } from "zod"; const app = new Firecrawl({ apiKey: "fc-YOUR_API_KEY" }); // Define schema to extract contents into const schema = z.object({ company_mission: z.string(), supports_sso: z.boolean(), is_open_source: z.boolean(), is_in_yc: z.boolean() }); const result = await app.scrape("https://firecrawl.dev", { formats: [{ type: "json", schema: schema }], }); console.log(result); ``` ```bash cURL theme={null} curl -X POST https://api.firecrawl.dev/v2/scrape \ -H 'Content-Type: application/json' \ -H 'Authorization: Bearer YOUR_API_KEY' \ -d '{ "url": "https://firecrawl.dev", "formats": [ { "type": "json", "schema": { "type": "object", "properties": { "company_mission": { "type": "string" }, "supports_sso": { "type": "boolean" }, 
"is_open_source": { "type": "boolean" }, "is_in_yc": { "type": "boolean" } }, "required": [ "company_mission", "supports_sso", "is_open_source", "is_in_yc" ] } } ] }' ``` ### Best Use Case: Single-Page Precision Extraction **Scenario**: You're building a price monitoring tool and need to extract the price, stock status, and product details from a specific product page you already have the URL for. **Why `/scrape` with JSON mode**: You know exactly which page contains the data, need precise single-page extraction, and want synchronous results without job management overhead. For more details, see the [JSON mode documentation](/features/llm-extract). *** ## Decision Guide **Do you know the exact URL(s) containing your data?** * **NO** → Use `/agent` (autonomous web discovery) * **YES** * **Single page?** → Use `/scrape` with JSON mode * **Multiple pages?** → Use `/agent` with URLs (or batch `/scrape`) ### Recommendations by Scenario | Scenario | Recommended Endpoint | | -------------------------------------------------- | ------------------------------- | | "Find all AI startups and their funding" | `/agent` | | "Extract data from this specific product page" | `/scrape` (JSON mode) | | "Get all blog posts from competitor.com" | `/agent` with URL | | "Monitor prices across multiple known URLs" | `/scrape` with batch processing | | "Research companies in a specific industry" | `/agent` | | "Extract contact info from 50 known company pages" | `/scrape` with batch processing | *** ## Pricing | Endpoint | Cost | Notes | | --------------------- | ---------------------------------- | ------------------------------------- | | `/scrape` (JSON mode) | 1 credit/page | Fixed, predictable | | `/extract` | Token-based (1 credit = 15 tokens) | Variable based on content | | `/agent` | Dynamic | 5 free runs/day; varies by complexity | ### Example: "Find the founders of Firecrawl" | Endpoint | How It Works | Credits Used | | ---------- | ----------------------------------------------- 
| ---------------------- | | `/scrape` | You find the URL manually, then scrape 1 page | \~1 credit | | `/extract` | You provide URL(s), it extracts structured data | Variable (token-based) | | `/agent` | Just send the prompt—agent finds and extracts | \~100–500 credits | **Tradeoff**: `/scrape` is cheapest but requires you to know the URL. `/agent` costs more but handles discovery automatically. For detailed pricing, see [Firecrawl Pricing](https://firecrawl.dev/pricing). *** ## Migration: `/extract` → `/agent` If you're currently using `/extract`, migration is straightforward: **Before (extract):** ```python theme={null} result = app.extract( urls=["https://example.com/*"], prompt="Extract product information", schema=schema ) ``` **After (agent):** ```python theme={null} result = app.agent( urls=["https://example.com"], # Optional - can omit entirely prompt="Extract product information from example.com", schema=schema, model="spark-1-mini" # or "spark-1-pro" for higher accuracy ) ``` The key advantage: with `/agent`, you can drop the URLs entirely and just describe what you need. *** ## Key Takeaways 1. **Know the exact URL?** Use `/scrape` with JSON mode—it's the cheapest (1 credit/page), fastest (synchronous), and most predictable option. 2. **Need autonomous research?** Use `/agent`—it handles discovery automatically with 5 free runs/day, then dynamic pricing based on complexity. 3. **Migrate from `/extract`** to `/agent` for new projects—`/agent` is the successor with better capabilities. 4. **Cost vs. convenience tradeoff**: `/scrape` is most cost-effective when you know your URLs; `/agent` costs more but eliminates manual URL discovery. 
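The "monitor prices across multiple known URLs" pattern from the decision guide pairs a list of URLs with the same JSON-mode format object as single-page `/scrape`. A hedged sketch of the request body, assuming Firecrawl's batch scrape endpoint (see the batch-scraping docs); the shop URLs and price fields here are hypothetical examples:

```python
# Sketch of a batch /scrape request in JSON mode. Assumes the
# /v2/batch/scrape endpoint from the batch-scraping docs; the shop URLs
# and the price/in_stock fields are hypothetical.
import json

price_schema = {
    "type": "object",
    "properties": {
        "price": {"type": "number"},
        "in_stock": {"type": "boolean"},
    },
    "required": ["price"],
}

payload = {
    "urls": [
        "https://shop.example.com/product-a",
        "https://shop.example.com/product-b",
    ],
    # Same format object as single-page /scrape JSON mode
    "formats": [{"type": "json", "schema": price_schema}],
}

# POST this to the batch scrape endpoint with your API key, then poll the
# returned job ID; each page costs 1 credit, as with single /scrape.
print(json.dumps(payload, indent=2))
```

This keeps the fixed 1-credit-per-page pricing while covering many pages in one job.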
***

## Further Reading

* [Agent documentation](/features/agent)
* [Agent models](/features/models)
* [JSON mode documentation](/features/llm-extract)
* [Extract documentation](/features/extract)
* [Batch scraping](/features/batch-scrape)

# Firecrawl + Dify

Source: https://docs.firecrawl.dev/developer-guides/workflow-automation/dify

Official plugin for Firecrawl + Dify AI workflow automation

**Official Dify Plugin:** [marketplace.dify.ai/plugins/langgenius/firecrawl](https://marketplace.dify.ai/plugins/langgenius/firecrawl)

Official plugin by the Dify team • 44,000+ installs • Chatflow & Agent apps • Free to use

## Dify Integration Overview

Dify is an open-source LLM app development platform. The official Firecrawl plugin enables web crawling and scraping directly in your AI workflows:

* Build visual pipelines with Firecrawl nodes for data extraction
* Give AI agents the power to scrape live web data on demand

## Firecrawl Tools in Dify

**Scrape**: Convert any URL into clean, structured data. Transform raw HTML into actionable insights. **Use Cases:** Extract product data, scrape article content, get structured data with JSON mode.

**Crawl**: Perform recursive crawls of websites and subdomains to gather extensive content. **Use Cases:** Full site content extraction, documentation scraping, multi-page data collection.

**Map**: Generate a complete map of all URLs present on a website. **Use Cases:** Site structure analysis, SEO auditing, URL discovery for batch scraping.

**Crawl Job**: Retrieve scraping results based on a Job ID or cancel ongoing tasks. **Use Cases:** Monitor long-running crawls, manage async scraping workflows, cancel operations when needed.
## Getting Started

1. Access the [Dify Plugin Marketplace](https://marketplace.dify.ai/plugins/langgenius/firecrawl) and install the Firecrawl tool
2. Visit [Firecrawl API Keys](https://www.firecrawl.dev/app/api-keys) and create a new API key
3. Navigate to **Plugins > Firecrawl > To Authorize** and input your API key
4. Drag Firecrawl tools into your Chatflow, Workflow, or Agent application
5. Set up parameters and test your workflow

## Usage Patterns

**Visual Pipeline Integration**

1. Add Firecrawl node to your pipeline
2. Select action (Map, Crawl, Scrape)
3. Define input variables
4. Execute pipeline sequentially

**Example Flow:**

```
User Input → Firecrawl (Scrape) → LLM Processing → Response
```

**Automated Data Processing**

Build multi-step workflows with:

* Scheduled scraping
* Data transformation
* Database storage
* Notifications

**Example Flow:**

```
Schedule Trigger → Firecrawl (Crawl) → Data Processing → Storage
```

**AI-Powered Web Access**

Give agents real-time web scraping capabilities:

1. Add Firecrawl tool to Agent
2. Agent autonomously decides when to scrape
3. LLM analyzes extracted content
4.
Agent provides informed responses

**Use Case:** Customer support agents that reference live documentation

## Common Use Cases

* Build RAG-powered chatbots that scrape and reference live website content
* Agents that research topics by scraping and analyzing multiple sources
* Automated workflows that track competitor websites and alert on changes
* Extract and enrich data from websites into structured databases

## Firecrawl Actions

| Tool | Description | Best For |
| ------------- | ------------------------------ | ----------------------- |
| **Scrape** | Single-page data extraction | Quick content capture |
| **Crawl** | Multi-page recursive crawling | Full site extraction |
| **Map** | URL discovery and site mapping | SEO analysis, URL lists |
| **Crawl Job** | Async job management | Long-running operations |

## Best Practices

* Let agents decide when to scrape
* Use natural language instructions
* Enable tool calling in LLM settings
* Monitor token usage with large scrapes
* Use Map before Crawl for large sites
* Set appropriate crawl limits
* Add error handling nodes
* Test with small datasets first

## Dify vs Other Platforms

| Feature | Dify | Make | Zapier | n8n |
| --------------- | -------------------- | ------------------- | ------------------- | ------------------- |
| **Type** | LLM app platform | Workflow automation | Workflow automation | Workflow automation |
| **Best For** | AI agents & chatbots | Visual workflows | Quick automation | Developer control |
| **Pricing** | Open-source + Cloud | Operations-based | Per-task | Flat monthly |
| **AI-Native** | Yes | Partial | Partial | Partial |
| **Self-Hosted** | Yes | No | No | Yes |

**Pro Tip:** Dify excels at building AI-native applications where agents need dynamic web access. Perfect for chatbots, research assistants, and AI tools that need live data.
# Firecrawl + Make

Source: https://docs.firecrawl.dev/developer-guides/workflow-automation/make

Official integration and workflow automation for Firecrawl + Make

**Official Make Integration:** [make.com/en/integrations/firecrawl](https://www.make.com/en/integrations/firecrawl)

Connect with 3,000+ apps • Visual workflow builder • Enterprise-grade automation • AI-powered scenarios

## Make Integration Overview

Make (formerly Integromat) provides a verified, officially supported Firecrawl integration maintained by Mendable:

* Design complex automations with Make's intuitive visual interface
* Scale securely with enterprise-grade automation and controls

## Firecrawl Modules in Make

### Crawl a Website

Crawl a URL and get its content from multiple pages

***

### Extract a Website

Extract structured data from pages using LLMs

***

### Scrape a Website

Scrape a URL and get its content from a single page

***

### Map a Website

Map multiple URLs based on options

***

### Search a Website

Web search with Firecrawl's scraping capabilities, retrieving full-page content for any search query

***

### Get Crawl Status

Get the status of a given crawl event ID

***

### Get Extract Status

Get the status of a given extraction event ID

***

### Make an API Call

Perform arbitrary authorized API calls for custom use cases

## Popular App Integrations

**Google Sheets** - Track and log scraped data in spreadsheets

**Airtable** - Build structured databases with scraped content

**Google Drive** - Store scraped files and reports

**Notion** - Organize research and web data

**Slack** - Get alerts for website changes and updates

**Telegram Bot** - Instant notifications for monitoring

**Gmail** - Email reports and digests

**Microsoft 365 Email** - Enterprise email automation

**HubSpot CRM** - Enrich leads with web data

**monday.com** - Track competitor intelligence

**ClickUp** - Manage research tasks

**OpenAI (ChatGPT, DALL-E)** - Analyze and summarize scraped content

**Google Gemini AI** -
Process and extract insights

**Perplexity AI** - Enhanced research workflows

**Make AI Agents** - Build adaptive AI-powered automations

## Common Workflow Patterns

* **Schedule** → Firecrawl (Scrape) → Google Sheets (Log) → Slack (Alert): track competitor websites and get instant notifications
* **Google Forms** → Firecrawl (Scrape company site) → HubSpot CRM (Update): automatically enrich leads with company data
* **Schedule** → Firecrawl (Crawl blog) → OpenAI (Summarize) → Gmail (Send digest): automated content curation and distribution
* **Schedule (Hourly)** → Firecrawl (Scrape) → Filter → Telegram (Alert): real-time price tracking and alerts

## Getting Started

1. Get your API key at [firecrawl.dev](https://firecrawl.dev)
2. Log into [Make](https://make.com) and click "Create a new scenario"
3. Search for "Firecrawl" and select your desired action
4. Add your Firecrawl API key to authenticate
5. Set up your workflow parameters and run a test

## Firecrawl Actions Overview

| Module | Use Case | Best For |
| --------------------- | -------------------------------- | ------------------- |
| **Scrape a Website** | Single-page data extraction | Quick data capture |
| **Crawl a Website** | Multi-page content collection | Full site scraping |
| **Extract a Website** | AI-powered structured extraction | Complex data needs |
| **Search a Website** | Search + full content | Research automation |
| **Map a Website** | URL discovery | SEO analysis |

## Best Practices

* Use **Scrape** for single pages (fastest)
* Use **Crawl** with limits for large sites
* Schedule appropriately to avoid rate limits
* Add error handling modules
* Schedule strategically (hourly/daily/weekly)
* Use filters to prevent unnecessary runs
* Set crawl limits to control API usage
* Test in Firecrawl playground first

## Industry Use Cases

* Competitor price monitoring
* Product availability tracking
* Review aggregation and analysis
* Inventory level monitoring
* Listing aggregation from multiple sources
* Market trend analysis
* Property data enrichment
* Competitive pricing intelligence
* Competitor content monitoring
* SEO performance tracking
* Backlink analysis
* Social media mention tracking
* Market data collection
* News and sentiment aggregation
* Regulatory filing monitoring
* Stock price tracking
* Job posting aggregation
* Company research automation
* Candidate background enrichment
* Salary benchmarking

## Make vs Zapier vs n8n

| Feature | Make | Zapier | n8n |
| ------------------ | --------------------------------- | ---------------- | -------------------- |
| **Setup** | Visual builder, cloud | No-code, cloud | Self-hosted or cloud |
| **Pricing** | Operations-based | Per-task pricing | Flat monthly |
| **Integrations** | 3,000+ apps | 8,000+ apps | 400+ integrations |
| **Complexity** | Advanced workflows | Simple workflows | Complex workflows |
| **Best For** | Visual automation, mid-complexity | Quick automation | Developer control |
| **Learning Curve** | Moderate | Easy | Moderate-Advanced |

**Pro Tip:** Make excels at visual workflow design and complex automations. Perfect for teams that need more control than Zapier but prefer visual building over n8n's code-first approach.

# Firecrawl + n8n

Source: https://docs.firecrawl.dev/developer-guides/workflow-automation/n8n

Learn how to use Firecrawl with n8n for web scraping automation, a complete step-by-step guide.

## Introduction to Firecrawl and n8n

Web scraping automation has become essential for modern businesses. Whether you need to monitor competitor prices, aggregate content, generate leads, or power AI applications with real-time data, the combination of Firecrawl and n8n provides a powerful solution without requiring programming knowledge.

*[Image: Firecrawl and n8n integration]*

**What is n8n?**

n8n is an open-source workflow automation platform that connects different tools and services together.
Think of it as a visual programming environment where you drag and drop nodes onto a canvas, connect them, and create automated workflows. With over 400 integrations, n8n lets you build complex automations without writing code.

## Why Use Firecrawl with n8n?

Traditional web scraping presents several challenges. Custom scripts break when websites update their structure. Anti-bot systems block automated requests. JavaScript-heavy sites don't render properly. Infrastructure requires constant maintenance.

Firecrawl handles these technical complexities on the scraping side, while n8n provides the automation framework. Together, they let you build production-ready workflows that:

* Extract data from any website reliably
* Connect scraped data to other business tools
* Run on schedules or triggered by events
* Scale from simple tasks to complex pipelines

This guide will walk you through setting up both platforms and building your first scraping workflow from scratch.

## Step 1: Create Your Firecrawl Account

Firecrawl provides the web scraping capabilities for your workflows. Let's set up your account and get your API credentials.

### Sign Up for Firecrawl

1. Navigate to [firecrawl.dev](https://firecrawl.dev) in your web browser
2. Click the "Get Started" or "Sign Up" button
3. Create an account using your email address or GitHub login
4. Verify your email if prompted

### Get Your API Key

After signing in, you need an API key to connect Firecrawl to n8n:

1. Go to your Firecrawl dashboard
2. Navigate to the [API Keys page](https://www.firecrawl.dev/app/api-keys)
3. Click "Create New API Key"
4. Give your key a descriptive name (e.g., "n8n Integration")
5. Copy the generated API key and save it somewhere secure

*[Screenshot: Firecrawl API key section]*

Your API key is like a password. Keep it secure and never share it publicly. You'll need this key in the next section.

Firecrawl provides free credits when you sign up, which is enough to test your workflows and complete this tutorial.
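Before wiring the key into n8n, you can optionally sanity-check it with a direct call to the scrape API. This sketch mirrors the v2 `/scrape` cURL call from the Firecrawl docs; the actual request is left commented out so nothing runs until you paste in a real key.

```python
# Sanity-check a Firecrawl API key with a direct /v2/scrape request.
# Mirrors the cURL example in the Firecrawl docs; replace the placeholder
# key with your real fc- key before uncommenting the call.
import json
import urllib.request

API_KEY = "fc-YOUR-API-KEY"  # placeholder - paste your real key

req = urllib.request.Request(
    "https://api.firecrawl.dev/v2/scrape",
    data=json.dumps({"url": "https://firecrawl.dev"}).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    },
)

# Uncomment to actually call the API (uses ~1 credit):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["data"]["markdown"][:200])
print(req.full_url, req.get_method())
```

If the uncommented call returns markdown instead of a 401 error, the same key will work in the n8n credential below.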
## Step 2: Set Up n8n

n8n offers two deployment options: cloud-hosted or self-hosted. For beginners, the cloud version is the fastest way to get started.

### Choose Your n8n Version

**n8n Cloud (Recommended for beginners):**

* No installation required
* Free tier available
* Managed infrastructure
* Automatic updates

**Self-Hosted:**

* Complete data control
* Run on your own servers
* Requires Docker installation
* Good for advanced users with specific security requirements

Choose the option that fits your needs. Both paths will lead you to the same workflow editor interface.

### Option A: n8n Cloud (Recommended for Beginners)

1. Visit [n8n.cloud](https://n8n.cloud)
2. Click "Start Free" or "Sign Up"
3. Register using your email address or GitHub
4. Complete the verification process
5. You'll be directed to your n8n dashboard

*[Screenshot: n8n Cloud homepage showing the signup options]*

The free tier provides everything you need to build and test workflows. You can upgrade later if you need more execution time or advanced features.

### Option B: Self-Hosted with Docker

If you prefer to run n8n on your own infrastructure, you can set it up quickly using Docker.

**Prerequisites:**

* [Docker Desktop](https://www.docker.com/products/docker-desktop/) installed on your computer
* Basic familiarity with command line/terminal

**Installation Steps:**

1. Open your terminal or command prompt
2. Create a Docker volume to persist your workflow data:

```bash theme={null}
docker volume create n8n_data
```

This volume stores your workflows, credentials, and execution history so they persist even if you restart the container.

3. Run the n8n Docker container:

```bash theme={null}
docker run -it --rm --name n8n -p 5678:5678 -v n8n_data:/home/node/.n8n docker.n8n.io/n8nio/n8n
```

*[Screenshot: Terminal showing the docker commands being executed with n8n starting up]*

4. Wait for n8n to start. You'll see output indicating the server is running
5. Open your web browser and navigate to `http://localhost:5678`
6.
Create your n8n account by registering with an email

*[Screenshot: n8n self-hosted welcome screen at localhost:5678]*

Your self-hosted n8n instance is now running locally. The interface is identical to n8n Cloud, so you can follow the rest of this guide regardless of which option you chose.

The `--rm` flag automatically removes the container when you stop it, but your data remains safe in the `n8n_data` volume. For production deployments, see the [n8n self-hosting documentation](https://docs.n8n.io/hosting/) for more advanced configuration options.

### Understanding the n8n Interface

When you first log in to n8n, you'll see the main dashboard:

*[Screenshot: n8n dashboard showing the workflow list view with "Create new workflow" button]*

Key interface elements:

* **Workflows**: Your saved automations appear here
* **Executions**: History of workflow runs
* **Credentials**: Stored API keys and authentication tokens
* **Settings**: Account and workspace configuration

Click "Create New Workflow" to open the workflow editor.

### The Workflow Canvas

The workflow editor is where you'll build your automations:

*[Screenshot: Empty n8n workflow canvas with the "+" button visible in the center]*

Important elements:

* **Canvas**: The main area where you place and connect nodes
* **Add Node Button (+)**: Click this to add new nodes to your workflow
* **Node Panel**: Opens when you click "+" showing all available nodes
* **Execute Workflow**: Runs your workflow manually for testing
* **Save**: Saves your workflow configuration

Let's build your first workflow by adding the Firecrawl node.

## Step 3: Install and Configure the Firecrawl Node

n8n includes native support for Firecrawl. You'll install the node and connect it to your Firecrawl account using the API key you created earlier.

### Add the Firecrawl Node to Your Workflow

1. In your new workflow canvas, click the "**+**" button in the center
2. The node selection panel opens on the right side
3. In the search box at the top, type "**Firecrawl**"
4.
You'll see the Firecrawl node appear in the search results

*[Screenshot: Clicking the + button, typing "Firecrawl" in the search, and the Firecrawl node appearing]*

5. Click "**Install**" next to the Firecrawl node
6. Wait for the installation to complete (this takes a few seconds)
7. Once installed, click on the Firecrawl node to add it to your canvas

*[Screenshot: Firecrawl node added to the canvas, showing as a box with the Firecrawl logo]*

The Firecrawl node will appear on your canvas as a box with the Firecrawl logo. This node represents a single Firecrawl operation in your workflow.

### Connect Your Firecrawl API Key

**n8n Cloud users:** Instead of manually entering an API key, you can use the one-click **"Connect to Firecrawl"** OAuth button when adding the Firecrawl node. This automatically creates a new Firecrawl team linked to your n8n account and grants **100,000 free credits**. To view these credits on the [Firecrawl dashboard](https://www.firecrawl.dev/app/usage), make sure you switch to your n8n-linked team using the team selector in the top-left corner.

Before you can use the Firecrawl node, you need to authenticate it with your API key:

1. Click on the Firecrawl node box to open its configuration panel on the right
2. At the top, you'll see a "Credential to connect with" dropdown
3. Since this is your first time, click "**Create New Credential**"

*[Screenshot: Firecrawl node configuration panel showing the credentials dropdown with "Create New Credential" option]*

4. A credential configuration window opens
5. Enter a name for this credential (e.g., "My Firecrawl Account")
6. Paste your Firecrawl API key in the "API Key" field
7. Click "**Save**" at the bottom

The credential is now saved in n8n. You won't need to enter your API key again for future Firecrawl nodes.

### Test Your Connection

Let's verify that your Firecrawl node is properly connected:

1. With the Firecrawl node still selected, look at the configuration panel
2.
In the "Resource" dropdown, select "**Scrape a url and get its content**"

3. In the "URL" field, enter: `https://firecrawl.dev`
4. Leave other settings at their defaults for now
5. Click the "**Test step**" button at the bottom right of the node

*[Screenshot: Selecting the Scrape operation, entering the URL, and clicking the Test step button]*

If everything is configured correctly, you'll see the scraped content from firecrawl.dev appear in the output panel below the node. Congratulations! Your Firecrawl node is now connected and working.

## Step 4: Create Your Telegram Bot

Before building your first workflow, you'll need a Telegram bot to receive notifications. Telegram bots are free and easy to create through Telegram's BotFather.

### Create a Bot with BotFather

1. Open Telegram on your phone or desktop
2. Search for "**@BotFather**" (the official bot from Telegram)
3. Start a conversation with BotFather by clicking "**Start**"
4. Send the command `/newbot` to create a new bot
5. BotFather will ask you to choose a name for your bot (this is the display name users will see)
6. Enter a name like "**My Firecrawl Bot**"
7. Next, choose a username for your bot. It must end with "bot" (e.g., "**my\_firecrawl\_updates\_bot**")
8. If the username is available, BotFather will create your bot and send you a message with your bot token

*[Screenshot: Creating a bot with BotFather, showing the full conversation flow]*

Save your bot token securely. This token is like a password that allows n8n to send messages as your bot. Never share it publicly.

### Get Your Chat ID

To send messages to yourself, you need your Telegram chat ID:

1. Open your web browser and visit this URL (replace `YOUR_BOT_TOKEN` with your actual bot token):

```
https://api.telegram.org/botYOUR_BOT_TOKEN/getUpdates
```

2. Keep this browser tab open
3. Now, search for your bot's username in Telegram (the one you just created)
4. Start a conversation with your bot by clicking "**Start**"
5. Send any message to your bot (e.g., "hello")
6.
Go back to the browser tab and refresh the page

7. Look for the `"chat":{"id":` field in the JSON response
8. The number next to `"id":` is your chat ID (e.g., `123456789`)
9. Save this chat ID for later

*[Screenshot: Browser showing the Telegram API getUpdates response with the chat ID highlighted]*

Your chat ID is the unique identifier for your conversation with the bot. You'll use this to tell n8n where to send messages.

You now have everything needed to integrate Telegram with your n8n workflows.

## Step 5: Build Practical Workflows with Telegram

Now let's build three real-world workflows that send information directly to your Telegram. These examples demonstrate different Firecrawl operations and how to integrate them with Telegram notifications.

### Example 1: Daily Firecrawl Product Updates Summary

Get a daily summary of Firecrawl product updates delivered to your Telegram every morning.

**What you'll build:**

* Scrapes Firecrawl's product updates blog at 9 AM daily
* Uses AI to generate a summary of the content
* Sends the summary to your Telegram

**Step-by-step:**

1. Create a new workflow in n8n
2. Add a **Schedule Trigger** node:
   * Click the "**+**" button on canvas
   * Search for "**Schedule Trigger**"
   * Configure: Every day at 9:00 AM

   *[Screenshot: Schedule Trigger configured for daily 9 AM execution]*
3. Add the **Firecrawl** node:
   * Click "**+**" next to Schedule Trigger
   * Search for and add "**Firecrawl**"
   * Select your Firecrawl credential
   * Configure:
     * **Resource**: Scrape a url and get its content
     * **URL**: `https://www.firecrawl.dev/blog/category/product-updates`
     * **Formats**: Select "Summary"

   *[Screenshot: Adding and configuring the Firecrawl node with the blog URL and Summary format selected]*
4. Add the **Telegram** node:
   * Click "**+**" next to Firecrawl
   * Search for "**Telegram**"
   * Click "**Send a text message**" to add it to the canvas
5.
Set up Telegram credentials:
   * Click on the Telegram node to open its configuration
   * In the "Credential to connect with" dropdown, click "**Create New Credential**"
   * Paste your bot token from BotFather
   * Click "**Save**"

Telegram credential configuration with bot token field

6. Configure the Telegram message:
   * **Operation**: Send Message
   * **Chat ID**: Enter your chat ID
   * **Text**: Leave this as a "hello" message for now
   * Click **Execute step** to send a test message and confirm the node can reach your Telegram chat

Configuring Telegram node and mapping the summary field to the message text

   * Once the test succeeds and you can see the structure of Firecrawl's output, map the summary into the message by dragging the `summary` field from the Firecrawl node's output into the **Text** field
7. Test the workflow:
   * Click "**Execute Workflow**"
   * Check your Telegram for the summary message

Complete workflow showing Schedule Trigger → Firecrawl → Telegram with all nodes connected

8. Activate the workflow by toggling the "**Active**" switch

Your Telegram bot will now send you a daily summary of Firecrawl product updates every morning at 9 AM.

### Example 2: AI News Search to Telegram

This workflow uses Firecrawl's Search operation to find AI news and send formatted results to Telegram.

**Key differences from Example 1:**

* Uses a **Manual Trigger** instead of Schedule (run on demand)
* Uses **Search** operation instead of Scrape
* Includes a **Code** node to format multiple results

**Build the workflow:**

1. Create a new workflow and add a **Manual Trigger** node
2. Add **Firecrawl** node with these settings:
   * **Resource**: Search and optionally scrape search results
   * **Query**: `ai news`
   * **Limit**: 5

Firecrawl Search node configuration with "ai news" query

3.
Add a **Code** node to format the search results: * Select "Run Once for All Items" * Paste this code: ```javascript theme={null} const results = $input.all(); let message = "Latest AI News:\n\n"; results.forEach((item) => { const webData = item.json.data.web; webData.forEach((article, index) => { message += `${index + 1}. ${article.title}\n`; message += `${article.description}\n`; message += `${article.url}\n\n`; }); }); return [{ json: { message } }]; ``` Adding Code node and pasting the formatting script 4. Update **Telegram** node (using your saved credential): * **Text**: Drag the `message` field from Code node Complete workflow execution with AI news sent to Telegram Replace the Manual Trigger with a Schedule Trigger to get automatic AI news updates at set intervals. ### Example 3: AI-Powered News Summary This workflow adds AI to Example 2, using OpenAI to generate intelligent summaries of the latest AI news before sending to Telegram. **Key changes from Example 2:** * Add **OpenAI credentials** setup * Add **AI Agent** node between Code and Telegram * AI Agent analyzes and summarizes all the news articles intelligently * Telegram receives the AI-generated summary instead of raw news list **Modify the workflow:** 1. **Get your OpenAI API key**: * Go to [platform.openai.com/api-keys](https://platform.openai.com/api-keys) * Sign in or create an account * Click "**Create new secret key**" * Give it a name (e.g., "n8n Integration") * Copy the API key immediately (you won't see it again) OpenAI dashboard showing API key creation 2. **Add and connect the AI Agent node**: * Click "**+**" after the Code node * Search for "**Basic LLM Chain**" or "**AI Agent**" * Drag the `message` field from the Code node to the AI Agent's input prompt field * Select **OpenAI** as the LLM provider Adding AI Agent node, dragging input from Code node, and connecting OpenAI as LLM 3. 
**Add your OpenAI credentials**: * Click "**Create New Credential**" for OpenAI * Paste your OpenAI API key * Select model: **gpt-5-mini** (cost-effective) or **gpt-5** (more capable) * Click "**Save**" Adding OpenAI credentials to the AI Agent node 4. **Add the system prompt to the AI Agent**: * In the AI Agent node, add this system prompt: ``` You are an AI news analyst. Analyze the provided AI news articles and create a concise, insightful summary highlighting the most important developments and trends. Group related topics together and provide context about why these developments matter. Keep the summary conversational and engaging, around 3-4 paragraphs. ``` Adding the system prompt to the AI Agent node 5. **Update the Telegram node and test**: * Update the Telegram node: * **Text**: Drag the AI Agent's output (the generated summary) * Remove the old mapping to the Code node's message * Click "**Execute Workflow**" to test * The AI will analyze all news articles and create a summary * Check your Telegram for the AI-generated summary Complete workflow execution with AI-generated summary sent to Telegram The AI Agent receives all the formatted news articles and creates an intelligent summary, making it easier to understand trends and important developments at a glance. ## Understanding Firecrawl Operations Now that you've built some workflows, let's explore the different Firecrawl operations available in n8n. Each operation is designed for specific web scraping use cases. ### Scrape a url and get its content Extracts content from a single web page and returns it in various formats. 
**What it does:** * Scrapes a single URL * Returns clean markdown, HTML, or AI-generated summaries * Can capture screenshots and extract links **Best for:** * Article extraction * Product page monitoring * Blog post scraping * Generating page summaries **Example use case:** Daily blog summaries (like Example 1 above) ### Search and optionally scrape search results Performs web searches and returns results with optional content scraping. **What it does:** * Searches the web, news, or images * Returns titles, descriptions, and URLs * Optionally scrapes the full content of results **Best for:** * Research automation * News monitoring * Trend discovery * Finding relevant content **Example use case:** AI news aggregation (like Example 2 above) ### Crawl a website Recursively discovers and scrapes multiple pages from a website. **What it does:** * Follows links automatically * Scrapes multiple pages in one operation * Can filter URLs by patterns **Best for:** * Full documentation extraction * Site archiving * Multi-page data collection ### Map a website and get urls Returns all URLs found on a website without scraping content. **What it does:** * Discovers all links on a site * Returns clean URL list * Fast and lightweight **Best for:** * URL discovery * Sitemap generation * Planning larger crawls ### Extract Data Uses AI to extract structured information based on custom prompts or schemas. **What it does:** * AI-powered data extraction * Returns data in your specified format * Works across multiple pages **Best for:** * Custom data extraction * Building databases * Structured information gathering ### Batch Scrape Scrapes multiple URLs in parallel efficiently. 
**What it does:** * Processes multiple URLs at once * More efficient than loops * Returns all results together **Best for:** * Processing URL lists * Bulk data collection * Large-scale scraping projects ### Agent Uses an AI agent to autonomously browse and extract data from websites based on a natural language prompt. **What it does:** * Accepts a prompt describing what data you need * AI agent navigates and extracts information autonomously * Available in **Sync** mode (waits for results) and **Async** mode (returns a job ID immediately) * Use **Get Agent Status** to poll for results when using Async mode **Best for:** * Complex, multi-page data gathering guided by a prompt * Extracting information when you don't know the exact page structure * Research tasks that require navigating multiple pages **Sync vs. Async:** * **Agent (Sync)** starts the job and waits for the result in one step — simplest for most use cases. The **Max Wait Time** parameter controls how long the node polls before timing out (default: 300 seconds, maximum: 600 seconds). If the agent job takes longer than this, the node returns a timeout status even though the job may still complete on the Firecrawl side. For jobs that may exceed 10 minutes, use the async mode instead. * **Agent (Async)** returns a job ID immediately. Add a second Firecrawl node with the **Get Agent Status** operation to retrieve results once the job completes. For details on the agent feature, see the [Agent documentation](/features/agent). ### Browser Sandbox Provides a persistent browser session that you can control with code, allowing multi-step browser automation within a single session. 
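The Browser Sandbox operations reduce to a create, execute, delete lifecycle, with the `sessionId` from the Create step threaded through every later call. Here is a minimal Python sketch of that flow; the endpoint paths, payload fields, and the injected `post` helper are illustrative assumptions for this guide, not Firecrawl's documented API surface:

```python
from typing import Callable

def run_in_browser_session(post: Callable[[str, dict], dict], snippets: list[str]) -> list[dict]:
    """Create a session, execute each code snippet in it, then destroy it.

    NOTE: the paths and payload fields below are hypothetical, chosen only
    to illustrate how the sessionId is reused across steps.
    """
    session = post("/v2/browser/sessions", {})  # hypothetical "Create Browser Session"
    session_id = session["sessionId"]
    results = []
    try:
        for code in snippets:
            # Every execute call reuses the same sessionId, so cookies and
            # page state persist between steps.
            results.append(post(f"/v2/browser/sessions/{session_id}/execute",
                                {"code": code, "language": "javascript"}))
    finally:
        # Hypothetical "Delete Browser Session": always clean up when done.
        post(f"/v2/browser/sessions/{session_id}/delete", {})
    return results
```

The same shape maps onto n8n: the Create node's `sessionId` output feeds each Execute node, and a Delete node runs last.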
**Operations:** * **Create Browser Session** — starts a new browser session and returns a `sessionId` * **Execute Browser Code** — runs JavaScript, Python, or bash code in the session (using the `sessionId` from the Create step) * **List Browser Sessions** — lists active or destroyed sessions * **Delete Browser Session** — destroys a session when you are done **Best for:** * Multi-step browser workflows that require maintaining state across pages * Dynamic page navigation where the number of steps is not known in advance * Workflows that use persistent browser profiles to preserve cookies and localStorage across runs In n8n, select the **Browser** resource on the Firecrawl node to access these operations. Pass the `sessionId` from the Create step into each subsequent Execute or Delete step. Use n8n's **Loop Over Items** node to iterate through a dynamic list of pages, calling Execute for each one within the same session. For details on the Browser Sandbox feature, see the [Browser Sandbox documentation](/features/browser-sandbox). ## Workflow Templates and Examples Instead of building from scratch, you can start with pre-built templates. The n8n community has created numerous Firecrawl workflows you can copy and customize. ### Featured Templates Build an AI chatbot with web access using Firecrawl and n8n Ready-to-use templates for lead generation, price monitoring, and more Scrape pages into embeddings and store in Supabase pgvector for RAG Scrape company websites and extract structured business signals Scrape pages into embeddings and store in Pinecone for RAG Browse hundreds of workflows using Firecrawl View official integration documentation ### How to Import Templates To use a template from the n8n community: 1. Click on a workflow template link 2. Click "**Import template to localhost:5678 self-hosted instance**" button on the template page 3. The workflow opens in your n8n instance 4. Configure credentials for each node 5. 
Customize settings for your use case 6. Activate the workflow Importing a template from n8n.io, showing the import button and workflow appearing in n8n ## Best Practices Follow these guidelines to build reliable, efficient workflows: ### Testing and Debugging * Always test workflows manually before activating schedules * Use the "**Execute Workflow**" button to test the entire flow * Check output data at each node to verify correctness * Use the "**Executions**" tab to review past runs and debug issues Executions tab showing workflow run history with timestamps and status ### Error Handling * Add Error Trigger nodes to catch and handle failures * Set up notifications when workflows fail * Use the "**Continue On Fail**" setting for non-critical nodes * Monitor your workflow executions regularly ### Performance Optimization * Use Batch Scrape for multiple URLs instead of loops * Set appropriate rate limits to avoid overwhelming target sites * Cache data when possible to reduce unnecessary requests * Schedule intensive workflows during off-peak hours ### Security * Never expose API keys in workflow configurations * Use n8n's credential system to securely store authentication * Be careful when sharing workflows publicly * Follow target websites' terms of service and robots.txt ## Next Steps You now have the fundamentals to build web scraping automations with Firecrawl and n8n. Here's how to continue learning: ### Explore Advanced Features * Study webhook configurations for real-time data processing * Experiment with AI-powered extraction using prompts and schemas * Build complex multi-step workflows with branching logic ### Join the Community * [Firecrawl Discord](https://discord.gg/firecrawl) - Get help with Firecrawl and discuss web scraping * [n8n Community Forum](https://community.n8n.io/) - Ask questions about workflow automation * Share your workflows and learn from others ### Recommended Learning Path 1. Complete the example workflows in this guide 2. 
Modify templates from the community library 3. Build a workflow to solve a real problem in your work 4. Explore advanced Firecrawl operations 5. Contribute your own templates to help others **Need help?** If you're stuck or have questions, the Firecrawl and n8n communities are active and helpful. Don't hesitate to ask for guidance as you build your automations. ## Additional Resources * [Firecrawl API Documentation](/api-reference/v2-introduction) * [n8n Documentation](https://docs.n8n.io/) * [Web Scraping Best Practices](https://www.firecrawl.dev/blog) # Firecrawl + Zapier Source: https://docs.firecrawl.dev/developer-guides/workflow-automation/zapier Official tutorials and Zapier integration templates for Firecrawl + Zapier automation **Official Zapier Integration:** [zapier.com/apps/firecrawl/integrations](https://zapier.com/apps/firecrawl/integrations) Connect with 8,000+ apps • No-code automation • Pre-built Zap templates • Cloud-based ## Official Blog Post Real-world case study: How Zapier integrated Firecrawl into Zapier Chatbots in a single afternoon. ## Popular Integrations ### Google Sheets → [View Integration](https://zapier.com/apps/google-sheets/integrations/firecrawl) Track competitor data, centralize marketing insights, and automate data collection. **Best For:** Business owners, marketing teams *** ### Airtable → [View Integration](https://zapier.com/apps/airtable/integrations/firecrawl) Build lead generation databases and content aggregation systems with structured storage. **Best For:** Sales teams, project managers *** ### Zapier Tables → [View Integration](https://zapier.com/apps/zapier-tables/integrations/firecrawl) No-code database automation for employee onboarding and centralized lead management. **Best For:** HR teams, operations ### Slack → [View Integration](https://zapier.com/apps/slack/integrations/firecrawl) Get website change notifications, competitor monitoring alerts, and market intelligence updates. 
**Best For:** Marketing teams, product managers *** ### Telegram → [View Integration](https://zapier.com/apps/telegram/integrations/firecrawl) Instant price alerts, breaking news notifications, and real-time monitoring. **Best For:** Traders, news enthusiasts ### HubSpot → [View Integration](https://zapier.com/apps/hubspot/integrations/firecrawl) Contact enrichment, lead scoring with web data, and marketing automation. **Best For:** Marketing ops, sales ops *** ### Pipedrive → [View Integration](https://zapier.com/apps/pipedrive/integrations/firecrawl) Lead enrichment from websites and competitor intelligence tracking. **Best For:** Sales teams, account executives *** ### Attio → [View Integration](https://zapier.com/apps/attio/integrations/firecrawl) Modern CRM data enrichment and relationship intelligence. **Best For:** Modern sales teams ### Google Docs → [View Integration](https://zapier.com/apps/google-docs/integrations/firecrawl) Automated report generation, research documentation, and content aggregation. **Best For:** Researchers, content creators *** ### Notion → [View Integration](https://zapier.com/apps/notion/integrations/firecrawl) Knowledge base updates, research library building, and content curation. **Best For:** Product teams, researchers ### Schedule by Zapier → [View Integration](https://zapier.com/apps/schedule/integrations/firecrawl) Run hourly, daily, weekly, or monthly scraping automatically. *** ### Zapier Interfaces → [View Integration](https://zapier.com/apps/interfaces/integrations/firecrawl) Build custom internal tools with form-based scraping and team dashboards. **Best For:** Operations teams *** ### Zapier Chatbots → [View Integration](https://zapier.com/apps/zapier-chatbots/integrations/firecrawl) AI chatbots with live web knowledge for customer support and lead generation. 
Official Zapier product uses Firecrawl internally

## Firecrawl Actions

| Action | Use Case |
| --------------------------- | ----------------------------------------- |
| **Scrape URL** | Quick single-page data capture |
| **Crawl Website** | Full site scraping with multiple pages |
| **Extract Structured Data** | AI-powered extraction with custom schemas |
| **Search Web** | Research automation with search + scrape |
| **Map Website** | SEO analysis and site structure mapping |

## Quick Reference

1. Sign up at [firecrawl.dev](https://firecrawl.dev)
2. Get your API key
3. Create a Zap in Zapier
4. Connect Firecrawl with your API key
5. Choose your workflow and activate

* Use the **Scrape URL** action for single pages (faster)
* Schedule strategically (hourly/daily/weekly)
* Test in Firecrawl playground first
* Add error handling for failed scrapes
* Use filters to prevent unnecessary runs

## Industry Use Cases

* Price monitoring across competitors
* Product availability tracking
* Review aggregation
* Listing aggregation
* Market trend analysis
* Property data collection
* Competitor content tracking
* SEO monitoring
* Backlink analysis
* Market data collection
* News aggregation
* Regulatory filing monitoring
* Job posting aggregation
* Company research automation
* Candidate information enrichment

## Zapier vs n8n

| Feature | Zapier | n8n |
| ---------------- | ------------------------------------- | ------------------------ |
| **Setup** | No-code, cloud-based | Self-hosted or cloud |
| **Pricing** | Per-task pricing | Flat monthly |
| **Integrations** | 8,000+ apps | 400+ integrations |
| **Best For** | Quick automation, non-technical users | Custom logic, developers |

**Pro Tip:** Start with Zapier's pre-built templates and customize as needed. Perfect for quick, no-code automation!

# Agent

Source: https://docs.firecrawl.dev/features/agent

Gather data wherever it lives on the web.
Firecrawl `/agent` is a magic API that searches, navigates, and gathers data from the widest range of websites, finding data in hard-to-reach places and uncovering data in ways no other API can. It accomplishes in a few minutes what would take a human many hours — end-to-end data collection, without scripts or manual work. Whether you need one data point or entire datasets at scale, Firecrawl `/agent` works to get your data. **Think of `/agent` as deep research for data, wherever it is!** **Research Preview**: Agent is in early access. Expect rough edges. It will get significantly better over time. [Share feedback →](mailto:product@firecrawl.com) Agent builds on everything great about `/extract` and takes it further: * **No URLs Required**: Just describe what you need via `prompt` parameter. URLs are optional * **Deep Web Search**: Autonomously searches and navigates deep into sites to find your data * **Reliable and Accurate**: Works with a wide variety of queries and use cases * **Faster**: Processes multiple sources in parallel for quicker results Test the agent in the interactive playground — no code required. ## Using `/agent` The only required parameter is `prompt`. Simply describe what data you want to extract. For structured output, provide a JSON schema. 
The SDKs support Pydantic (Python) and Zod (Node) for type-safe schema definitions: ```python Python theme={null} from firecrawl import Firecrawl from pydantic import BaseModel, Field from typing import List, Optional app = Firecrawl(api_key="fc-YOUR_API_KEY") class Founder(BaseModel): name: str = Field(description="Full name of the founder") role: Optional[str] = Field(None, description="Role or position") background: Optional[str] = Field(None, description="Professional background") class FoundersSchema(BaseModel): founders: List[Founder] = Field(description="List of founders") result = app.agent( prompt="Find the founders of Firecrawl", schema=FoundersSchema, model="spark-1-mini", max_credits=100 ) print(result.data) ``` ```js Node theme={null} import Firecrawl from '@mendable/firecrawl-js'; import { z } from 'zod'; const firecrawl = new Firecrawl({ apiKey: "fc-YOUR_API_KEY" }); const result = await firecrawl.agent({ prompt: "Find the founders of Firecrawl", schema: z.object({ founders: z.array(z.object({ name: z.string().describe("Full name of the founder"), role: z.string().describe("Role or position").optional(), background: z.string().describe("Professional background").optional() })).describe("List of founders") }), model: "spark-1-mini", maxCredits: 100 }); console.log(result.data); ``` ```bash cURL theme={null} curl -X POST "https://api.firecrawl.dev/v2/agent" \ -H "Authorization: Bearer $FIRECRAWL_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "prompt": "Find the founders of Firecrawl", "model": "spark-1-mini", "maxCredits": 100, "schema": { "type": "object", "properties": { "founders": { "type": "array", "description": "List of founders", "items": { "type": "object", "properties": { "name": { "type": "string", "description": "Full name" }, "role": { "type": "string", "description": "Role or position" }, "background": { "type": "string", "description": "Professional background" } }, "required": ["name"] } } }, "required": ["founders"] } }' ``` 
### Response ```json JSON theme={null} { "success": true, "status": "completed", "data": { "founders": [ { "name": "Eric Ciarla", "role": "Co-founder", "background": "Previously at Mendable" }, { "name": "Nicolas Camara", "role": "Co-founder", "background": "Previously at Mendable" }, { "name": "Caleb Peffer", "role": "Co-founder", "background": "Previously at Mendable" } ] }, "expiresAt": "2024-12-15T00:00:00.000Z", "creditsUsed": 15 } ``` ## Providing URLs (Optional) You can optionally provide URLs to focus the agent on specific pages: ```python Python theme={null} from firecrawl import Firecrawl app = Firecrawl(api_key="fc-YOUR_API_KEY") result = app.agent( urls=["https://docs.firecrawl.dev", "https://firecrawl.dev/pricing"], prompt="Compare the features and pricing information from these pages" ) print(result.data) ``` ```js Node theme={null} import Firecrawl from '@mendable/firecrawl-js'; const firecrawl = new Firecrawl({ apiKey: "fc-YOUR_API_KEY" }); const result = await firecrawl.agent({ urls: ["https://docs.firecrawl.dev", "https://firecrawl.dev/pricing"], prompt: "Compare the features and pricing information from these pages" }); console.log(result.data); ``` ```bash cURL theme={null} curl -X POST "https://api.firecrawl.dev/v2/agent" \ -H "Authorization: Bearer $FIRECRAWL_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "urls": [ "https://docs.firecrawl.dev", "https://firecrawl.dev/pricing" ], "prompt": "Compare the features and pricing information from these pages" }' ``` ## Job Status and Completion Agent jobs run asynchronously. When you submit a job, you'll receive a Job ID that you can use to check status: * **Default method**: `agent()` waits and returns final results * **Start then poll**: Use `start_agent` (Python) or `startAgent` (Node) to get a Job ID immediately, then poll with `get_agent_status` / `getAgentStatus` Job results are available via the API for 24 hours after completion. 
After this period, you can still view your agent history and results in the [activity logs](https://www.firecrawl.dev/app/logs).

```python Python theme={null}
from firecrawl import Firecrawl

app = Firecrawl(api_key="fc-YOUR_API_KEY")

# Start an agent job
agent_job = app.start_agent(
    prompt="Find the founders of Firecrawl"
)

# Check the status
status = app.get_agent_status(agent_job.id)
print(status)

# Example output:
# status='completed'
# success=True
# data={ ... }
# expires_at=datetime.datetime(...)
# credits_used=15
```

```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';

const firecrawl = new Firecrawl({ apiKey: "fc-YOUR_API_KEY" });

// Start an agent job
const started = await firecrawl.startAgent({
  prompt: "Find the founders of Firecrawl"
});

// Check the status
if (started.id) {
  const status = await firecrawl.getAgentStatus(started.id);
  console.log(status.status, status.data);
}
```

```bash cURL theme={null}
curl -X GET "https://api.firecrawl.dev/v2/agent/YOUR_JOB_ID" \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY"
```

### Possible States

| Status | Description |
| ------------ | ------------------------------------------ |
| `processing` | The agent is still working on your request |
| `completed` | Extraction finished successfully |
| `failed` | An error occurred during extraction |
| `cancelled` | The job was cancelled by the user |

#### Pending Example

```json JSON theme={null}
{
  "success": true,
  "status": "processing",
  "expiresAt": "2024-12-15T00:00:00.000Z"
}
```

#### Completed Example

```json JSON theme={null}
{
  "success": true,
  "status": "completed",
  "data": {
    "founders": [
      { "name": "Eric Ciarla", "role": "Co-founder" },
      { "name": "Nicolas Camara", "role": "Co-founder" },
      { "name": "Caleb Peffer", "role": "Co-founder" }
    ]
  },
  "expiresAt": "2024-12-15T00:00:00.000Z",
  "creditsUsed": 15
}
```

## Share agent runs

You can share agent runs directly from the Agent playground.
Shared links are public — anyone with the link can view the run output and activity — and you can revoke access at any time to disable the link. Shared pages are not indexed by search engines. ## Model Selection Firecrawl Agent offers two models. **Spark 1 Mini is 60% cheaper** and is the default — perfect for most use cases. Upgrade to Spark 1 Pro when you need maximum accuracy on complex tasks. | Model | Cost | Accuracy | Best For | | -------------- | --------------- | -------- | ------------------------------------- | | `spark-1-mini` | **60% cheaper** | Standard | Most tasks (default) | | `spark-1-pro` | Standard | Higher | Complex research, critical extraction | **Start with Spark 1 Mini** (default) — it handles most extraction tasks well at 60% lower cost. Switch to Pro only for complex multi-domain research or when accuracy is critical. ### Spark 1 Mini (Default) `spark-1-mini` is our efficient model, ideal for straightforward data extraction tasks. **Use Mini when:** * Extracting simple data points (contact info, pricing, etc.) * Working with well-structured websites * Cost efficiency is a priority * Running high-volume extraction jobs ### Spark 1 Pro `spark-1-pro` is our flagship model, designed for maximum accuracy on complex extraction tasks. 
**Use Pro when:** * Performing complex competitive analysis * Extracting data that requires deep reasoning * Accuracy is critical for your use case * Dealing with ambiguous or hard-to-find data ### Specifying a Model Pass the `model` parameter to select which model to use: ```python Python theme={null} from firecrawl import Firecrawl app = Firecrawl(api_key="fc-YOUR_API_KEY") # Using Spark 1 Mini (default - can be omitted) result = app.agent( prompt="Find the pricing of Firecrawl", model="spark-1-mini" ) # Using Spark 1 Pro for complex tasks result = app.agent( prompt="Compare all enterprise features and pricing across Firecrawl, Apify, and ScrapingBee", model="spark-1-pro" ) print(result.data) ``` ```js Node theme={null} import Firecrawl from '@mendable/firecrawl-js'; const firecrawl = new Firecrawl({ apiKey: "fc-YOUR_API_KEY" }); // Using Spark 1 Mini (default - can be omitted) const result = await firecrawl.agent({ prompt: "Find the pricing of Firecrawl", model: "spark-1-mini" }); // Using Spark 1 Pro for complex tasks const resultPro = await firecrawl.agent({ prompt: "Compare all enterprise features and pricing across Firecrawl, Apify, and ScrapingBee", model: "spark-1-pro" }); console.log(result.data); ``` ```bash cURL theme={null} # Using Spark 1 Mini (default) curl -X POST "https://api.firecrawl.dev/v2/agent" \ -H "Authorization: Bearer $FIRECRAWL_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "prompt": "Find the pricing of Firecrawl", "model": "spark-1-mini" }' # Using Spark 1 Pro for complex tasks curl -X POST "https://api.firecrawl.dev/v2/agent" \ -H "Authorization: Bearer $FIRECRAWL_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "prompt": "Compare all enterprise features and pricing across Firecrawl, Apify, and ScrapingBee", "model": "spark-1-pro" }' ``` ## Parameters | Parameter | Type | Required | Description | | ------------ | ------ | -------- | 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `prompt` | string | **Yes** | Natural language description of the data you want to extract (max 10,000 characters) | | `model` | string | No | Model to use: `spark-1-mini` (default) or `spark-1-pro` | | `urls` | array | No | Optional list of URLs to focus the extraction | | `schema` | object | No | Optional JSON schema for structured output | | `maxCredits` | number | No | Maximum number of credits to spend on this agent task. Defaults to **2,500** if not set. The dashboard supports values up to **2,500**; for higher limits, set `maxCredits` via the API (values above 2,500 are always treated as paid requests). If the limit is reached, the job fails and **no data is returned**, though credits consumed for work performed are still charged. | ## Agent vs Extract: What's Improved | Feature | Agent (New) | Extract | | ----------------- | ----------- | -------- | | URLs Required | No | Yes | | Speed | Faster | Standard | | Cost | Lower | Standard | | Reliability | Higher | Standard | | Query Flexibility | High | Moderate | ## Example Use Cases * **Research**: "Find the top 5 AI startups and their funding amounts" * **Competitive Analysis**: "Compare pricing plans between Slack and Microsoft Teams" * **Data Gathering**: "Extract contact information from company websites" * **Content Summarization**: "Summarize the latest blog posts about web scraping" ## CSV Upload in Agent Playground The [Agent Playground](https://www.firecrawl.dev/app/agent) supports CSV upload for batch processing. Your CSV can contain one or more columns of input data. 
For example, a single column of company names, or multiple columns such as company name, product, and website URL. Each row represents one item for the agent to process.

Upload your CSV, then add output columns using the "+" button in the grid header. Each column has its own prompt — click a column header to describe what the agent should find for that field (e.g., "CEO or founder name", "Total funding raised"). Hit Run, and the agent processes each row in parallel, filling in the results.

## API Reference

Check out the [Agent API Reference](/api-reference/endpoint/agent) for more details.

Have feedback or need help? Email [help@firecrawl.com](mailto:help@firecrawl.com).

## Pricing

Firecrawl Agent uses **dynamic billing** that scales with the complexity of your data extraction request. You pay based on the actual work Agent performs, ensuring fair pricing whether you're extracting simple data points or complex structured information from multiple sources.

### How Agent pricing works

Agent pricing is **dynamic and credit-based** during Research Preview:

* **Simple extractions** (like contact info from a single page) typically use fewer credits and cost less
* **Complex research tasks** (like competitive analysis across multiple domains) use more credits but reflect the total effort involved
* **Transparent usage** shows you exactly how many credits each request consumed
* **Credit conversion** automatically converts agent usage to credits for easy billing

Credit usage varies based on the complexity of your prompt, the amount of data processed, and the structure of the output requested. As a rough guide, most agent runs consume **a few hundred credits**, though simpler single-page tasks may use less and complex multi-domain research may use more.

### Parallel Agents Pricing

If you are running multiple agents in parallel with Spark-1 Fast, pricing is much more predictable at a flat 10 credits per cell.
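At a flat per-cell rate, estimating the cost of a parallel CSV batch is simple arithmetic: rows times prompted output columns times 10. A quick sketch (the 10-credits-per-cell rate comes from the Parallel Agents Pricing note above; the example CSV dimensions are invented for illustration):

```python
CREDITS_PER_CELL = 10  # flat parallel-agent rate quoted above

def parallel_agent_credits(rows: int, output_columns: int) -> int:
    """Each output cell (one row filled in for one prompted column) bills once."""
    return rows * output_columns * CREDITS_PER_CELL

# e.g. a hypothetical 50-row CSV with 3 prompted output columns
print(parallel_agent_credits(50, 3))  # 1500 credits
```

This also makes it easy to sanity-check a batch against a `maxCredits` budget before hitting Run.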
### Getting started **All users** receive **5 free daily runs**, which can be used from either the playground or the API, to explore Agent's capabilities without any cost. Additional usage is billed based on credit consumption. ### Managing costs Agent can be expensive, but there are some ways to decrease the cost: * **Start with free runs**: Use your 5 daily free requests to understand pricing * **Set a `maxCredits` parameter**: Limit your spending by setting a maximum number of credits you're willing to spend. The dashboard caps this at 2,500 credits; to set a higher limit, use the `maxCredits` parameter directly via the API (note: values above 2,500 are always billed as paid requests) * **Optimize prompts**: More specific prompts often use fewer credits * **Break large tasks into smaller runs**: A single agent run has an output ceiling based on the underlying model's generation capacity (\~150-200 rows of structured data). For large extraction jobs, split by category, region, or URL batch (3-5 URLs per run) and merge the results. This also keeps each run well under the `maxCredits` limit. * **Monitor usage**: Track your consumption through the dashboard * **Set expectations**: Complex multi-domain research will use more credits than simple single-page extractions Try Agent now at [firecrawl.dev/app/agent](https://www.firecrawl.dev/app/agent) to see how credit usage scales with your specific use cases. Pricing is subject to change as we move from Research Preview to general availability. Current users will receive advance notice of any pricing updates. > Are you an AI agent that needs a Firecrawl API key? See [firecrawl.dev/agent-onboarding/SKILL.md](https://www.firecrawl.dev/agent-onboarding/SKILL.md) for automated onboarding instructions. # Batch Scrape Source: https://docs.firecrawl.dev/features/batch-scrape Scrape multiple URLs in a single batch job Batch scrape lets you scrape multiple URLs in a single job.
Pass a list of URLs and optional parameters, and Firecrawl processes them concurrently and returns all results together. * Works like `/crawl` but for an explicit list of URLs * Synchronous and asynchronous modes * Supports all scrape options including structured extraction * Configurable concurrency per job ## How it works You can run a batch scrape in two ways: | Mode | SDK method (JS / Python) | Behavior | | ------------ | ----------------------------------------- | ---------------------------------------------------------------- | | Synchronous | `batchScrape` / `batch_scrape` | Starts the batch and waits for completion, returning all results | | Asynchronous | `startBatchScrape` / `start_batch_scrape` | Starts the batch and returns a job ID for polling or webhooks | ## Basic usage ```python Python theme={null} from firecrawl import Firecrawl firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY") start = firecrawl.start_batch_scrape([ "https://firecrawl.dev", "https://docs.firecrawl.dev", ], formats=["markdown"]) # returns id job = firecrawl.batch_scrape([ "https://firecrawl.dev", "https://docs.firecrawl.dev", ], formats=["markdown"], poll_interval=2, wait_timeout=120) print(job.status, job.completed, job.total) ``` ```js Node theme={null} import Firecrawl from '@mendable/firecrawl-js'; const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" }); // Start a batch scrape job const { id } = await firecrawl.startBatchScrape([ 'https://firecrawl.dev', 'https://docs.firecrawl.dev' ], { options: { formats: ['markdown'] }, }); // Wait for completion const job = await firecrawl.batchScrape([ 'https://firecrawl.dev', 'https://docs.firecrawl.dev' ], { options: { formats: ['markdown'] }, pollInterval: 2, timeout: 120 }); console.log(job.status, job.completed, job.total); ``` ```bash cURL theme={null} curl -s -X POST "https://api.firecrawl.dev/v2/batch/scrape" \ -H "Authorization: Bearer $FIRECRAWL_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "urls": 
["https://firecrawl.dev", "https://docs.firecrawl.dev"], "formats": ["markdown"] }' ``` ### Response Calling `batchScrape` / `batch_scrape` returns the full results when the batch completes. ```json Completed theme={null} { "status": "completed", "total": 36, "completed": 36, "creditsUsed": 36, "expiresAt": "2024-00-00T00:00:00.000Z", "next": "https://api.firecrawl.dev/v2/batch/scrape/123-456-789?skip=26", "data": [ { "markdown": "[Firecrawl Docs home page![light logo](https://mintlify.s3-us-west-1.amazonaws.com/firecrawl/logo/light.svg)!...", "html": "...", "metadata": { "title": "Build a 'Chat with website' using Groq Llama 3 | Firecrawl", "language": "en", "sourceURL": "https://docs.firecrawl.dev/learn/rag-llama3", "description": "Learn how to use Firecrawl, Groq Llama 3, and Langchain to build a 'Chat with your website' bot.", "ogLocaleAlternate": [], "statusCode": 200 } }, ... ] } ``` Calling `startBatchScrape` / `start_batch_scrape` returns a job ID you can track via `getBatchScrapeStatus` / `get_batch_scrape_status`, the API endpoint `/batch/scrape/{id}`, or webhooks. Job results are available via the API for 24 hours after completion. After this period, you can still view your batch scrape history and results in the [activity logs](https://www.firecrawl.dev/app/logs). ```json theme={null} { "success": true, "id": "123-456-789", "url": "https://api.firecrawl.dev/v2/batch/scrape/123-456-789" } ``` ## Concurrency By default, a batch scrape job uses your team's full concurrent browser limit (see [Rate Limits](/rate-limits)). You can lower this per job with the `maxConcurrency` parameter. For example, `maxConcurrency: 50` limits that job to 50 simultaneous scrapes. Setting this value too low on large batches will significantly slow down processing, so only reduce it if you need to leave capacity for other concurrent jobs. ## Structured extraction You can use batch scrape to extract structured data from every page in the batch. 
This is useful when you want the same schema applied to a list of URLs. ```python Python theme={null} from firecrawl import Firecrawl firecrawl = Firecrawl(api_key="fc-YOUR_API_KEY") # Scrape multiple websites: batch_scrape_result = firecrawl.batch_scrape( ['https://docs.firecrawl.dev', 'https://docs.firecrawl.dev/sdks/overview'], formats=[{ 'type': 'json', 'prompt': 'Extract the title and description from the page.', 'schema': { 'type': 'object', 'properties': { 'title': {'type': 'string'}, 'description': {'type': 'string'} }, 'required': ['title', 'description'] } }] ) print(batch_scrape_result) # Or, you can use the start method: batch_scrape_job = firecrawl.start_batch_scrape( ['https://docs.firecrawl.dev', 'https://docs.firecrawl.dev/sdks/overview'], formats=[{ 'type': 'json', 'prompt': 'Extract the title and description from the page.', 'schema': { 'type': 'object', 'properties': { 'title': {'type': 'string'}, 'description': {'type': 'string'} }, 'required': ['title', 'description'] } }] ) print(batch_scrape_job) # You can then use the job ID to check the status of the batch scrape: batch_scrape_status = firecrawl.get_batch_scrape_status(batch_scrape_job.id) print(batch_scrape_status) ``` ```js Node theme={null} import Firecrawl, { ScrapeResponse } from '@mendable/firecrawl-js'; const firecrawl = new Firecrawl({apiKey: "fc-YOUR_API_KEY"}); // Define schema to extract contents into const schema = { type: "object", properties: { title: { type: "string" }, description: { type: "string" } }, required: ["title", "description"] }; // Scrape multiple websites (synchronous): const batchScrapeResult = await firecrawl.batchScrape(['https://docs.firecrawl.dev', 'https://docs.firecrawl.dev/sdks/overview'], { formats: [ { type: "json", prompt: "Extract the title and description from the page.", schema: schema } ] }); // Output all the results of the batch scrape: console.log(batchScrapeResult) // Or, you can use the start method: const batchScrapeJob = await 
firecrawl.startBatchScrape(['https://docs.firecrawl.dev', 'https://docs.firecrawl.dev/sdks/overview'], { formats: [ { type: "json", prompt: "Extract the title and description from the page.", schema: schema } ] }); console.log(batchScrapeJob) // You can then use the job ID to check the status of the batch scrape: const batchScrapeStatus = await firecrawl.getBatchScrapeStatus(batchScrapeJob.id); console.log(batchScrapeStatus) ``` ```bash cURL theme={null} curl -X POST https://api.firecrawl.dev/v2/batch/scrape \ -H 'Content-Type: application/json' \ -H 'Authorization: Bearer YOUR_API_KEY' \ -d '{ "urls": ["https://docs.firecrawl.dev", "https://docs.firecrawl.dev/sdks/overview"], "formats" : [{ "type": "json", "prompt": "Extract the title and description from the page.", "schema": { "type": "object", "properties": { "title": { "type": "string" }, "description": { "type": "string" } }, "required": [ "title", "description" ] } }] }' ``` ### Response `batchScrape` / `batch_scrape` returns full results: ```json Completed theme={null} { "status": "completed", "total": 36, "completed": 36, "creditsUsed": 36, "expiresAt": "2024-00-00T00:00:00.000Z", "next": "https://api.firecrawl.dev/v2/batch/scrape/123-456-789?skip=26", "data": [ { "json": { "title": "Build a 'Chat with website' using Groq Llama 3 | Firecrawl", "description": "Learn how to use Firecrawl, Groq Llama 3, and Langchain to build a 'Chat with your website' bot." } }, ... ] } ``` `startBatchScrape` / `start_batch_scrape` returns a job ID: ```json theme={null} { "success": true, "id": "123-456-789", "url": "https://api.firecrawl.dev/v2/batch/scrape/123-456-789" } ``` ## Webhooks You can configure webhooks to receive real-time notifications as each URL in your batch is scraped. This lets you process results immediately instead of waiting for the entire batch to complete. 
```bash cURL theme={null} curl -X POST https://api.firecrawl.dev/v2/batch/scrape \ -H 'Content-Type: application/json' \ -H 'Authorization: Bearer YOUR_API_KEY' \ -d '{ "urls": [ "https://example.com/page1", "https://example.com/page2", "https://example.com/page3" ], "webhook": { "url": "https://your-domain.com/webhook", "metadata": { "any_key": "any_value" }, "events": ["started", "page", "completed"] } }' ``` ### Event types | Event | Description | | ------------------------ | ------------------------------------- | | `batch_scrape.started` | The batch scrape job has begun | | `batch_scrape.page` | A single URL was successfully scraped | | `batch_scrape.completed` | All URLs have been processed | | `batch_scrape.failed` | The batch scrape encountered an error | ### Payload Each webhook delivery includes a JSON body with the following structure: ```json theme={null} { "success": true, "type": "batch_scrape.page", "id": "batch-job-id", "data": [...], "metadata": {}, "error": null } ``` ### Verifying webhook signatures Every webhook request from Firecrawl includes an `X-Firecrawl-Signature` header containing an HMAC-SHA256 signature. Always verify this signature to ensure the webhook is authentic and has not been tampered with. 1. Get your webhook secret from the [Advanced tab](https://www.firecrawl.dev/app/settings?tab=advanced) of your account settings 2. Extract the signature from the `X-Firecrawl-Signature` header 3. Compute HMAC-SHA256 of the raw request body using your secret 4. Compare with the signature header using a timing-safe function Never process a webhook without verifying its signature first. The `X-Firecrawl-Signature` header contains the signature in the format: `sha256=abc123def456...` For complete implementation examples in JavaScript and Python, see the [Webhook Security documentation](/webhooks/security). 
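The four verification steps above can be sketched with Python's standard library (the secret and body below are placeholders):

```python theme={null}
import hashlib
import hmac

def verify_signature(raw_body: bytes, signature_header: str, secret: str) -> bool:
    """Verify an X-Firecrawl-Signature header against the raw request body."""
    # Header format: "sha256=<hex digest>"
    expected = "sha256=" + hmac.new(
        secret.encode(), raw_body, hashlib.sha256
    ).hexdigest()
    # Timing-safe comparison, per step 4 above
    return hmac.compare_digest(expected, signature_header)

# Placeholder values for illustration:
secret = "your-webhook-secret"
body = b'{"success": true, "type": "batch_scrape.page"}'
header = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
print(verify_signature(body, header, secret))  # True
```

Compute the HMAC over the raw bytes of the request body, before any JSON parsing or re-serialization, or the digests will not match.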
For comprehensive webhook documentation including detailed event payloads, advanced configuration, and troubleshooting, see the [Webhooks documentation](/webhooks/overview). > Are you an AI agent that needs a Firecrawl API key? See [firecrawl.dev/agent-onboarding/SKILL.md](https://www.firecrawl.dev/agent-onboarding/SKILL.md) for automated onboarding instructions. # Change Tracking Source: https://docs.firecrawl.dev/features/change-tracking Detect and monitor changes in web content between scrapes Change tracking compares the current content of a page against the last time you scraped it. Add `changeTracking` to your `formats` array to detect whether a page is new, unchanged, or modified, and optionally get a structured diff of what changed. * Works with `/scrape`, `/crawl`, and `/batch/scrape` * Two diff modes: `git-diff` for line-level changes, `json` for field-level comparison * Scoped to your team, and optionally scoped to a tag that you pass in ## How it works Every scrape with `changeTracking` enabled stores a snapshot and compares it against the previous snapshot for that URL. Snapshots are stored persistently and do not expire, so comparisons remain accurate regardless of how much time has passed between scrapes.
| Scrape | Result | | ----------------- | -------------------------------------------------- | | First time | `changeStatus: "new"` (no previous version exists) | | Content unchanged | `changeStatus: "same"` | | Content modified | `changeStatus: "changed"` (diff data available) | | Page removed | `changeStatus: "removed"` | The response includes these fields in the `changeTracking` object: | Field | Type | Description | | ------------------ | --------------------- | ---------------------------------------------------------------------------------------------- | | `previousScrapeAt` | `string \| null` | Timestamp of the previous scrape (`null` on first scrape) | | `changeStatus` | `string` | `"new"`, `"same"`, `"changed"`, or `"removed"` | | `visibility` | `string` | `"visible"` (discoverable via links/sitemap) or `"hidden"` (URL works but is no longer linked) | | `diff` | `object \| undefined` | Line-level diff (only present in `git-diff` mode when status is `"changed"`) | | `json` | `object \| undefined` | Field-level comparison (only present in `json` mode when status is `"changed"`) | ## Basic usage Include both `markdown` and `changeTracking` in the `formats` array. The `markdown` format is required because change tracking compares pages via their markdown content. 
```python Python theme={null} from firecrawl import Firecrawl firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY") result = firecrawl.scrape( "https://example.com/pricing", formats=["markdown", "changeTracking"] ) print(result.changeTracking) ``` ```js Node theme={null} import Firecrawl from '@mendable/firecrawl-js'; const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" }); const result = await firecrawl.scrape('https://example.com/pricing', { formats: ['markdown', 'changeTracking'] }); console.log(result.changeTracking); ``` ```bash cURL theme={null} curl -s -X POST "https://api.firecrawl.dev/v2/scrape" \ -H "Authorization: Bearer $FIRECRAWL_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "url": "https://example.com/pricing", "formats": ["markdown", "changeTracking"] }' ``` ### Response On the first scrape, `changeStatus` is `"new"` and `previousScrapeAt` is `null`: ```json theme={null} { "success": true, "data": { "markdown": "# Pricing\n\nStarter: $9/mo\nPro: $29/mo...", "changeTracking": { "previousScrapeAt": null, "changeStatus": "new", "visibility": "visible" } } } ``` On subsequent scrapes, `changeStatus` reflects whether content changed: ```json theme={null} { "success": true, "data": { "markdown": "# Pricing\n\nStarter: $12/mo\nPro: $39/mo...", "changeTracking": { "previousScrapeAt": "2025-06-01T10:00:00.000+00:00", "changeStatus": "changed", "visibility": "visible" } } } ``` ## Git-diff mode The `git-diff` mode returns line-by-line changes in a format similar to `git diff`. 
Pass an object in the `formats` array with `modes: ["git-diff"]`: ```python Python theme={null} result = firecrawl.scrape( "https://example.com/pricing", formats=[ "markdown", { "type": "changeTracking", "modes": ["git-diff"] } ] ) if result.changeTracking.changeStatus == "changed": print(result.changeTracking.diff.text) ``` ```js Node theme={null} const result = await firecrawl.scrape('https://example.com/pricing', { formats: [ 'markdown', { type: 'changeTracking', modes: ['git-diff'] } ] }); if (result.changeTracking.changeStatus === 'changed') { console.log(result.changeTracking.diff.text); } ``` ```bash cURL theme={null} curl -s -X POST "https://api.firecrawl.dev/v2/scrape" \ -H "Authorization: Bearer $FIRECRAWL_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "url": "https://example.com/pricing", "formats": [ "markdown", { "type": "changeTracking", "modes": ["git-diff"] } ] }' ``` ### Response The `diff` object contains both a plain-text diff and a structured JSON representation: ```json theme={null} { "changeTracking": { "previousScrapeAt": "2025-06-01T10:00:00.000+00:00", "changeStatus": "changed", "visibility": "visible", "diff": { "text": "@@ -1,3 +1,3 @@\n # Pricing\n-Starter: $9/mo\n-Pro: $29/mo\n+Starter: $12/mo\n+Pro: $39/mo", "json": { "files": [{ "chunks": [{ "content": "@@ -1,3 +1,3 @@", "changes": [ { "type": "normal", "content": "# Pricing" }, { "type": "del", "ln": 2, "content": "Starter: $9/mo" }, { "type": "del", "ln": 3, "content": "Pro: $29/mo" }, { "type": "add", "ln": 2, "content": "Starter: $12/mo" }, { "type": "add", "ln": 3, "content": "Pro: $39/mo" } ] }] }] } } } } ``` The structured `diff.json` object contains: * `files`: array of changed files (typically one for web pages) * `chunks`: sections of changes within a file * `changes`: individual line changes with `type` (`"add"`, `"del"`, or `"normal"`), line number (`ln`), and `content` ## JSON mode The `json` mode extracts specific fields from both the current and previous 
version of the page using a schema you define. This is useful for tracking changes in structured data like prices, stock levels, or metadata without parsing a full diff. Pass `modes: ["json"]` with a `schema` defining the fields to extract: ```python Python theme={null} result = firecrawl.scrape( "https://example.com/product/widget", formats=[ "markdown", { "type": "changeTracking", "modes": ["json"], "schema": { "type": "object", "properties": { "price": { "type": "string" }, "availability": { "type": "string" } } } } ] ) if result.changeTracking.changeStatus == "changed": changes = result.changeTracking.json print(f"Price: {changes['price']['previous']} → {changes['price']['current']}") ``` ```js Node theme={null} const result = await firecrawl.scrape('https://example.com/product/widget', { formats: [ 'markdown', { type: 'changeTracking', modes: ['json'], schema: { type: 'object', properties: { price: { type: 'string' }, availability: { type: 'string' } } } } ] }); if (result.changeTracking.changeStatus === 'changed') { const changes = result.changeTracking.json; console.log(`Price: ${changes.price.previous} → ${changes.price.current}`); } ``` ```bash cURL theme={null} curl -s -X POST "https://api.firecrawl.dev/v2/scrape" \ -H "Authorization: Bearer $FIRECRAWL_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "url": "https://example.com/product/widget", "formats": [ "markdown", { "type": "changeTracking", "modes": ["json"], "schema": { "type": "object", "properties": { "price": { "type": "string" }, "availability": { "type": "string" } } } } ] }' ``` ### Response Each field in the schema is returned with `previous` and `current` values: ```json theme={null} { "changeTracking": { "previousScrapeAt": "2025-06-05T08:00:00.000+00:00", "changeStatus": "changed", "visibility": "visible", "json": { "price": { "previous": "$19.99", "current": "$24.99" }, "availability": { "previous": "In Stock", "current": "In Stock" } } } } ``` You can also pass an optional 
`prompt` to guide the LLM extraction alongside the schema. JSON mode uses LLM extraction and costs **5 credits per page**. Basic change tracking and `git-diff` mode have no additional cost. ## Using tags By default, change tracking compares against the most recent scrape of the same URL scraped by your team. Tags let you maintain **separate tracking histories** for the same URL, which is useful when you monitor the same page at different intervals or in different contexts. ```python Python theme={null} # Hourly monitoring (compared against last "hourly" scrape) result = firecrawl.scrape( "https://example.com/pricing", formats=[ "markdown", { "type": "changeTracking", "tag": "hourly" } ] ) # Daily summary (compared against last "daily" scrape) result = firecrawl.scrape( "https://example.com/pricing", formats=[ "markdown", { "type": "changeTracking", "tag": "daily" } ] ) ``` ```js Node theme={null} // Hourly monitoring (compared against last "hourly" scrape) const result = await firecrawl.scrape('https://example.com/pricing', { formats: [ 'markdown', { type: 'changeTracking', tag: 'hourly' } ] }); // Daily summary (compared against last "daily" scrape) const result2 = await firecrawl.scrape('https://example.com/pricing', { formats: [ 'markdown', { type: 'changeTracking', tag: 'daily' } ] }); ``` ```bash cURL theme={null} curl -s -X POST "https://api.firecrawl.dev/v2/scrape" \ -H "Authorization: Bearer $FIRECRAWL_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "url": "https://example.com/pricing", "formats": [ "markdown", { "type": "changeTracking", "tag": "hourly" } ] }' ``` ## Crawl with change tracking Add change tracking to crawl operations to monitor an entire site for changes. 
Pass the `changeTracking` format inside `scrapeOptions`: ```python Python theme={null} result = firecrawl.crawl( "https://example.com", limit=50, scrape_options={ "formats": ["markdown", "changeTracking"] } ) for page in result.data: status = page.changeTracking.changeStatus url = page.metadata.url print(f"{url}: {status}") ``` ```js Node theme={null} const result = await firecrawl.crawl('https://example.com', { limit: 50, scrapeOptions: { formats: ['markdown', 'changeTracking'] } }); for (const page of result.data) { console.log(`${page.metadata.url}: ${page.changeTracking.changeStatus}`); } ``` ```bash cURL theme={null} curl -s -X POST "https://api.firecrawl.dev/v2/crawl" \ -H "Authorization: Bearer $FIRECRAWL_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "url": "https://example.com", "limit": 50, "scrapeOptions": { "formats": ["markdown", "changeTracking"] } }' ``` ## Batch scrape with change tracking Use [batch scrape](/features/batch-scrape) to monitor a specific set of URLs: ```python Python theme={null} result = firecrawl.batch_scrape( [ "https://example.com/pricing", "https://example.com/product/widget", "https://example.com/blog/latest" ], formats=["markdown", {"type": "changeTracking", "modes": ["git-diff"]}] ) ``` ```js Node theme={null} const result = await firecrawl.batchScrape([ 'https://example.com/pricing', 'https://example.com/product/widget', 'https://example.com/blog/latest' ], { options: { formats: ['markdown', { type: 'changeTracking', modes: ['git-diff'] }] } }); ``` ```bash cURL theme={null} curl -s -X POST "https://api.firecrawl.dev/v2/batch/scrape" \ -H "Authorization: Bearer $FIRECRAWL_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "urls": [ "https://example.com/pricing", "https://example.com/product/widget", "https://example.com/blog/latest" ], "formats": [ "markdown", { "type": "changeTracking", "modes": ["git-diff"] } ] }' ``` ## Scheduling change tracking Change tracking is most useful when you scrape on a regular 
schedule. You can automate this with cron, cloud schedulers, or workflow tools. ### Cron job Create a script that scrapes a URL and alerts on changes: ```bash check-pricing.sh theme={null} #!/bin/bash RESPONSE=$(curl -s -X POST "https://api.firecrawl.dev/v2/scrape" \ -H "Authorization: Bearer $FIRECRAWL_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "url": "https://competitor.com/pricing", "formats": [ "markdown", { "type": "changeTracking", "modes": ["json"], "schema": { "type": "object", "properties": { "starter_price": { "type": "string" }, "pro_price": { "type": "string" } } } } ] }') STATUS=$(echo "$RESPONSE" | jq -r '.data.changeTracking.changeStatus') if [ "$STATUS" = "changed" ]; then echo "$RESPONSE" | jq '.data.changeTracking.json' # Send alert via email, Slack, etc. fi ``` Schedule it with `crontab -e`: ```bash theme={null} 0 */6 * * * /path/to/check-pricing.sh >> /var/log/price-monitor.log 2>&1 ``` | Schedule | Expression | | ------------------------ | ------------- | | Every hour | `0 * * * *` | | Every 6 hours | `0 */6 * * *` | | Daily at 9 AM | `0 9 * * *` | | Weekly on Monday at 8 AM | `0 8 * * 1` | ### Cloud and serverless schedulers * **AWS**: EventBridge rule triggering a Lambda function * **GCP**: Cloud Scheduler triggering a Cloud Function * **Vercel / Netlify**: Cron-triggered serverless functions * **GitHub Actions**: Scheduled workflows with `schedule` and `cron` trigger ### Workflow automation No-code platforms like **n8n**, **Zapier**, and **Make** can call the Firecrawl API on a schedule and route results to Slack, email, or databases. See the [workflow automation guides](/developer-guides/workflow-automation/n8n). ## Webhooks For async operations like crawl and batch scrape, use [webhooks](/webhooks/overview) to receive change tracking results as they arrive instead of polling. 
```python Python theme={null} job = firecrawl.start_crawl( "https://example.com", limit=50, scrape_options={ "formats": [ "markdown", {"type": "changeTracking", "modes": ["git-diff"]} ] }, webhook={ "url": "https://your-server.com/firecrawl-webhook", "headers": {"Authorization": "Bearer your-webhook-secret"}, "events": ["crawl.page", "crawl.completed"] } ) ``` ```js Node theme={null} const { id } = await firecrawl.startCrawl('https://example.com', { limit: 50, scrapeOptions: { formats: [ 'markdown', { type: 'changeTracking', modes: ['git-diff'] } ] }, webhook: { url: 'https://your-server.com/firecrawl-webhook', headers: { Authorization: 'Bearer your-webhook-secret' }, events: ['crawl.page', 'crawl.completed'] } }); ``` ```bash cURL theme={null} curl -s -X POST "https://api.firecrawl.dev/v2/crawl" \ -H "Authorization: Bearer $FIRECRAWL_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "url": "https://example.com", "limit": 50, "scrapeOptions": { "formats": [ "markdown", { "type": "changeTracking", "modes": ["git-diff"] } ] }, "webhook": { "url": "https://your-server.com/firecrawl-webhook", "headers": { "Authorization": "Bearer your-webhook-secret" }, "events": ["crawl.page", "crawl.completed"] } }' ``` The `crawl.page` event payload includes the `changeTracking` object for each page: ```json theme={null} { "success": true, "type": "crawl.page", "id": "550e8400-e29b-41d4-a716-446655440000", "data": [{ "markdown": "# Pricing\n\nStarter: $12/mo...", "metadata": { "title": "Pricing", "url": "https://example.com/pricing", "statusCode": 200 }, "changeTracking": { "previousScrapeAt": "2025-06-05T12:00:00.000+00:00", "changeStatus": "changed", "visibility": "visible", "diff": { "text": "@@ -2,1 +2,1 @@\n-Starter: $9/mo\n+Starter: $12/mo" } } }] } ``` For webhook configuration details (headers, metadata, events, retries, signature verification), see the [Webhooks documentation](/webhooks/overview). 
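As a sketch of consuming that event on your server, the helper below (illustrative, not part of the SDK) pulls out the pages whose content actually changed, tolerating a missing `changeTracking` object as noted in the change-tracking docs:

```python theme={null}
def changed_pages(payload):
    """Return url/diff info for pages marked "changed" in a crawl.page
    webhook payload. Illustrative helper, not part of the SDK."""
    results = []
    for page in payload.get("data", []):
        tracking = page.get("changeTracking")
        # changeTracking may be absent (e.g. previous-scrape lookup timed out)
        if tracking and tracking.get("changeStatus") == "changed":
            results.append({
                "url": page["metadata"]["url"],
                "diff": tracking.get("diff", {}).get("text"),
            })
    return results

payload = {
    "type": "crawl.page",
    "data": [{
        "metadata": {"url": "https://example.com/pricing"},
        "changeTracking": {
            "changeStatus": "changed",
            "diff": {"text": "@@ -2,1 +2,1 @@\n-Starter: $9/mo\n+Starter: $12/mo"},
        },
    }],
}
print(changed_pages(payload)[0]["url"])  # https://example.com/pricing
```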
## Configuration reference The full set of options available when passing a `changeTracking` format object: | Parameter | Type | Default | Description | | --------- | ---------- | ---------- | ----------------------------------------------------------------- | | `type` | `string` | (required) | Must be `"changeTracking"` | | `modes` | `string[]` | `[]` | Diff modes to enable: `"git-diff"`, `"json"`, or both | | `schema` | `object` | (none) | JSON Schema for field-level comparison (required for `json` mode) | | `prompt` | `string` | (none) | Custom prompt to guide LLM extraction (used with `json` mode) | | `tag` | `string` | `null` | Separate tracking history identifier | ### Data models ```ts TypeScript theme={null} interface ChangeTrackingResult { previousScrapeAt: string | null; changeStatus: "new" | "same" | "changed" | "removed"; visibility: "visible" | "hidden"; diff?: { text: string; json: { files: Array<{ from: string | null; to: string | null; chunks: Array<{ content: string; changes: Array<{ type: "add" | "del" | "normal"; ln?: number; ln1?: number; ln2?: number; content: string; }>; }>; }>; }; }; json?: Record; } ``` ```python Python theme={null} class ChangeTrackingData(BaseModel): previous_scrape_at: Optional[str] = None change_status: str # "new" | "same" | "changed" | "removed" visibility: str # "visible" | "hidden" diff: Optional[Dict[str, Any]] = None json: Optional[Dict[str, Any]] = None ``` ## Important details The `markdown` format must always be included alongside `changeTracking`. Change tracking compares pages via their markdown content. * **Snapshot retention**: Snapshots are stored persistently and do not expire. A scrape performed months after the previous one will still compare correctly against the earlier snapshot. * **Scoping**: Comparisons are scoped to your team. Your first scrape of any URL returns `"new"`, even if other users have scraped it. 
* **URL matching**: Previous scrapes are matched on exact source URL, team ID, `markdown` format, and `tag`. Keep URLs consistent between scrapes. * **Parameter consistency**: Using different `includeTags`, `excludeTags`, or `onlyMainContent` settings across scrapes of the same URL produces unreliable comparisons. * **Comparison algorithm**: The algorithm is resistant to whitespace and content order changes. Iframe source URLs are ignored to handle captcha/antibot randomization. * **Caching**: Requests with `changeTracking` bypass the index cache. The `maxAge` parameter is ignored. * **Error handling**: Monitor the `warning` field in responses and handle the `changeTracking` object potentially being absent (this can occur if the database lookup for the previous scrape times out). ## Billing | Mode | Cost | | --------------------- | --------------------------------------- | | Basic change tracking | No extra cost (standard scrape credits) | | `git-diff` mode | No extra cost | | `json` mode | 5 credits per page | > Are you an AI agent that needs a Firecrawl API key? See [firecrawl.dev/agent-onboarding/SKILL.md](https://www.firecrawl.dev/agent-onboarding/SKILL.md) for automated onboarding instructions. # Crawl Source: https://docs.firecrawl.dev/features/crawl Recursively crawl a website and get content from every page Crawl submits a URL to Firecrawl and recursively discovers and scrapes every reachable subpage. It handles sitemaps, JavaScript rendering, and rate limits automatically, returning clean markdown or structured data for each page. * Discovers pages via sitemap and recursive link traversal * Supports path filtering, depth limits, and subdomain/external link control * Returns results via polling, WebSocket, or webhook Test crawling in the interactive playground — no code required. 
## Installation ```python Python theme={null} # pip install firecrawl-py from firecrawl import Firecrawl firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY") ``` ```js Node theme={null} // npm install @mendable/firecrawl-js import Firecrawl from '@mendable/firecrawl-js'; const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" }); ``` ```bash CLI theme={null} # Install globally with npm npm install -g firecrawl # Authenticate (one-time setup) firecrawl login ``` ## Basic usage Submit a crawl job by calling `POST /v2/crawl` with a starting URL. The endpoint returns a job ID that you use to poll for results. ```python Python theme={null} from firecrawl import Firecrawl firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY") docs = firecrawl.crawl(url="https://docs.firecrawl.dev", limit=10) print(docs) ``` ```js Node theme={null} import Firecrawl from '@mendable/firecrawl-js'; const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" }); const docs = await firecrawl.crawl('https://docs.firecrawl.dev', { limit: 10 }); console.log(docs); ``` ```bash cURL theme={null} curl -s -X POST "https://api.firecrawl.dev/v2/crawl" \ -H "Authorization: Bearer $FIRECRAWL_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "url": "https://docs.firecrawl.dev", "limit": 10 }' ``` ```bash CLI theme={null} # Start a crawl job (returns job ID) firecrawl crawl https://firecrawl.dev # Wait for completion with progress firecrawl crawl https://firecrawl.dev --wait --progress --limit 100 ``` Each page crawled consumes 1 credit. The default crawl `limit` is 10,000 pages. Before starting, the crawl endpoint checks that your remaining credits can cover the `limit` — if not, it returns a **402 (Payment Required)** error. Set a lower `limit` to match your intended crawl size (e.g. `limit: 100`) to avoid this. Additional credits apply for certain options: JSON mode costs 4 additional credits per page, enhanced proxy costs 4 additional credits per page, and PDF parsing costs 1 credit per PDF page.
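As a quick sanity check on those rates, here is a back-of-the-envelope estimator. It is an illustrative helper based on the numbers above, not an official billing formula:

```python theme={null}
def estimate_crawl_credits(pages, json_mode=False, enhanced_proxy=False,
                           pdf_pages=0):
    """Rough credit estimate per the rates above: 1 credit per page,
    +4/page for JSON mode, +4/page for enhanced proxy, +1 per PDF page.
    Illustrative only; actual billing is determined by the API."""
    per_page = 1 + (4 if json_mode else 0) + (4 if enhanced_proxy else 0)
    return pages * per_page + pdf_pages

print(estimate_crawl_credits(100))                  # 100
print(estimate_crawl_credits(100, json_mode=True))  # 500
```

Running an estimate like this before setting `limit` helps avoid the 402 error described above when credits are tight.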
### Scrape options

All options from the [Scrape endpoint](/api-reference/endpoint/scrape) are available in crawl via `scrapeOptions` (JS) / `scrape_options` (Python). These apply to every page the crawler scrapes, including formats, proxy, caching, actions, location, and tags.

```python Python theme={null}
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key='fc-YOUR_API_KEY')

# Crawl with scrape options
response = firecrawl.crawl('https://example.com',
    limit=100,
    scrape_options={
        'formats': [
            'markdown',
            {
                'type': 'json',
                'schema': {
                    'type': 'object',
                    'properties': {
                        'title': {'type': 'string'}
                    }
                }
            }
        ],
        'proxy': 'auto',
        'max_age': 600000,
        'only_main_content': True
    }
)
```

```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';

const firecrawl = new Firecrawl({ apiKey: 'fc-YOUR_API_KEY' });

// Crawl with scrape options
const crawlResponse = await firecrawl.crawl('https://example.com', {
  limit: 100,
  scrapeOptions: {
    formats: [
      'markdown',
      {
        type: 'json',
        schema: { type: 'object', properties: { title: { type: 'string' } } },
      },
    ],
    proxy: 'auto',
    maxAge: 600000,
    onlyMainContent: true,
  },
});
```

## Checking crawl status

Use the job ID to poll for the crawl status and retrieve results.

```python Python theme={null}
status = firecrawl.get_crawl_status("<crawl-id>")
print(status)
```

```js Node theme={null}
const status = await firecrawl.getCrawlStatus("<crawl-id>");
console.log(status);
```

```bash cURL theme={null}
# After starting a crawl, poll status by jobId
curl -s -X GET "https://api.firecrawl.dev/v2/crawl/<crawl-id>" \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY"
```

```bash CLI theme={null}
# Check crawl status using job ID
firecrawl crawl <crawl-id>
```

Job results are available via the API for 24 hours after completion. After this period, you can still view your crawl history and results in the [activity logs](https://www.firecrawl.dev/app/logs).
Pages in the crawl results `data` array are pages that Firecrawl successfully scraped, even if the target site returned an HTTP error like 404. The `metadata.statusCode` field shows the HTTP status code from the target site. To retrieve pages that Firecrawl itself failed to scrape (e.g. network errors, timeouts, or robots.txt blocks), use the dedicated [Get Crawl Errors](/api-reference/endpoint/crawl-get-errors) endpoint (`GET /crawl/{id}/errors`). ### Response handling The response varies based on the crawl's status. For incomplete or large responses exceeding 10MB, a `next` URL parameter is provided. You must request this URL to retrieve the next 10MB of data. If the `next` parameter is absent, it indicates the end of the crawl data. The `skip` and `next` parameters are only relevant when hitting the API directly. If you're using the SDK, pagination is handled automatically and all results are returned at once. ```json Scraping theme={null} { "status": "scraping", "total": 36, "completed": 10, "creditsUsed": 10, "expiresAt": "2024-00-00T00:00:00.000Z", "next": "https://api.firecrawl.dev/v2/crawl/123-456-789?skip=10", "data": [ { "markdown": "[Firecrawl Docs home page![light logo](https://mintlify.s3-us-west-1.amazonaws.com/firecrawl/logo/light.svg)!...", "html": "...", "metadata": { "title": "Build a 'Chat with website' using Groq Llama 3 | Firecrawl", "language": "en", "sourceURL": "https://docs.firecrawl.dev/learn/rag-llama3", "description": "Learn how to use Firecrawl, Groq Llama 3, and Langchain to build a 'Chat with your website' bot.", "ogLocaleAlternate": [], "statusCode": 200 } }, ... 
] } ``` ```json Completed theme={null} { "status": "completed", "total": 36, "completed": 36, "creditsUsed": 36, "expiresAt": "2024-00-00T00:00:00.000Z", "next": "https://api.firecrawl.dev/v2/crawl/123-456-789?skip=26", "data": [ { "markdown": "[Firecrawl Docs home page![light logo](https://mintlify.s3-us-west-1.amazonaws.com/firecrawl/logo/light.svg)!...", "html": "...", "metadata": { "title": "Build a 'Chat with website' using Groq Llama 3 | Firecrawl", "language": "en", "sourceURL": "https://docs.firecrawl.dev/learn/rag-llama3", "description": "Learn how to use Firecrawl, Groq Llama 3, and Langchain to build a 'Chat with your website' bot.", "ogLocaleAlternate": [], "statusCode": 200 } }, ... ] } ``` ## SDK methods There are two ways to use crawl with the SDK. ### Crawl and wait The `crawl` method waits for the crawl to complete and returns the full response. It handles pagination automatically. This is recommended for most use cases. ```python Python theme={null} from firecrawl import Firecrawl from firecrawl.types import ScrapeOptions firecrawl = Firecrawl(api_key="fc-YOUR_API_KEY") # Crawl a website: crawl_status = firecrawl.crawl( 'https://firecrawl.dev', limit=100, scrape_options=ScrapeOptions(formats=['markdown', 'html']), poll_interval=30 ) print(crawl_status) ``` ```js Node theme={null} import Firecrawl from '@mendable/firecrawl-js'; const firecrawl = new Firecrawl({apiKey: "fc-YOUR_API_KEY"}); const crawlResponse = await firecrawl.crawl('https://firecrawl.dev', { limit: 100, scrapeOptions: { formats: ['markdown', 'html'], } }) console.log(crawlResponse) ``` The response includes the crawl status and all scraped data: ```bash Python theme={null} success=True status='completed' completed=100 total=100 creditsUsed=100 expiresAt=datetime.datetime(2025, 4, 23, 19, 21, 17, tzinfo=TzInfo(UTC)) next=None data=[ Document( markdown='[Day 7 - Launch Week III.Integrations DayApril 14th to 20th](...', metadata={ 'title': '15 Python Web Scraping Projects: From 
Beginner to Advanced',
      ...
      'scrapeId': '97dcf796-c09b-43c9-b4f7-868a7a5af722',
      'sourceURL': 'https://www.firecrawl.dev/blog/python-web-scraping-projects',
      'url': 'https://www.firecrawl.dev/blog/python-web-scraping-projects',
      'statusCode': 200
    }
  ),
  ...
]
```

```json Node theme={null}
{
  success: true,
  status: "completed",
  completed: 100,
  total: 100,
  creditsUsed: 100,
  expiresAt: "2025-04-23T19:28:45.000Z",
  data: [
    {
      markdown: "[Day 7 - Launch Week III.Integrations DayApril ...",
      html: `
```

### Start then check status

The `start_crawl` (Python) / `startCrawl` (Node) method submits the job and returns immediately instead of waiting for completion. The initial response returns the job ID:

```json theme={null}
{
  "success": true,
  "id": "123-456-789",
  "url": "https://api.firecrawl.dev/v2/crawl/123-456-789"
}
```

## Real-time results with WebSocket

The watcher method provides real-time updates as pages are crawled. Start a crawl, then subscribe to events for immediate data processing.

```python Python theme={null}
import asyncio
from firecrawl import AsyncFirecrawl

async def main():
    firecrawl = AsyncFirecrawl(api_key="fc-YOUR-API-KEY")

    # Start a crawl first
    started = await firecrawl.start_crawl("https://firecrawl.dev", limit=5)

    # Watch updates (snapshots) until terminal status
    async for snapshot in firecrawl.watcher(started.id, kind="crawl", poll_interval=2, timeout=120):
        if snapshot.status == "completed":
            print("DONE", snapshot.status)
            for doc in snapshot.data:
                print("DOC", doc.metadata.source_url if doc.metadata else None)
        elif snapshot.status == "failed":
            print("ERR", snapshot.status)
        else:
            print("STATUS", snapshot.status, snapshot.completed, "/", snapshot.total)

asyncio.run(main())
```

```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';

const firecrawl = new Firecrawl({ apiKey: 'fc-YOUR-API-KEY' });

// Start a crawl and then watch it
const { id } = await firecrawl.startCrawl('https://mendable.ai', {
  excludePaths: ['blog/*'],
  limit: 5,
});

const watcher = firecrawl.watcher(id, { kind: 'crawl', pollInterval: 2, timeout: 120 });

watcher.on('document', (doc) => {
  console.log('DOC', doc);
});
watcher.on('error', (err) => { console.error('ERR', err?.error || err); }); watcher.on('done', (state) => { console.log('DONE', state.status); }); // Begin watching (WS with HTTP fallback) await watcher.start(); ``` ## Webhooks You can configure webhooks to receive real-time notifications as your crawl progresses. This allows you to process pages as they are scraped instead of waiting for the entire crawl to complete. ```bash cURL theme={null} curl -X POST https://api.firecrawl.dev/v2/crawl \ -H 'Content-Type: application/json' \ -H 'Authorization: Bearer YOUR_API_KEY' \ -d '{ "url": "https://docs.firecrawl.dev", "limit": 100, "webhook": { "url": "https://your-domain.com/webhook", "metadata": { "any_key": "any_value" }, "events": ["started", "page", "completed"] } }' ``` ### Event types | Event | Description | | ----------------- | ---------------------------------------- | | `crawl.started` | Fires when the crawl begins | | `crawl.page` | Fires for each page successfully scraped | | `crawl.completed` | Fires when the crawl finishes | | `crawl.failed` | Fires if the crawl encounters an error | ### Payload ```json theme={null} { "success": true, "type": "crawl.page", "id": "crawl-job-id", "data": [...], // Page data for 'page' events "metadata": {}, // Your custom metadata "error": null } ``` ### Verifying webhook signatures Every webhook request from Firecrawl includes an `X-Firecrawl-Signature` header containing an HMAC-SHA256 signature. Always verify this signature to ensure the webhook is authentic and has not been tampered with. 1. Get your webhook secret from the [Advanced tab](https://www.firecrawl.dev/app/settings?tab=advanced) of your account settings 2. Extract the signature from the `X-Firecrawl-Signature` header 3. Compute HMAC-SHA256 of the raw request body using your secret 4. Compare with the signature header using a timing-safe function Never process a webhook without verifying its signature first. 
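The four verification steps can be sketched in Python with the standard library's `hmac` module. This is a minimal illustration (the function name, secret, and body are placeholders); the header value has the form `sha256=<hex digest>`:

```python
import hashlib
import hmac

def verify_firecrawl_signature(raw_body: bytes, signature_header: str, secret: str) -> bool:
    # Compute HMAC-SHA256 of the raw request body using your webhook secret
    digest = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    # Compare with the header using a timing-safe function
    return hmac.compare_digest(f"sha256={digest}", signature_header)

# Simulated webhook delivery (secret and body are placeholders)
secret = "whsec_example"
body = b'{"success": true, "type": "crawl.page"}'
header = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()

print(verify_firecrawl_signature(body, header, secret))        # True
print(verify_firecrawl_signature(b"tampered", header, secret)) # False
```

Note that verification must run against the raw request body exactly as received; re-serializing parsed JSON can change the bytes and invalidate the signature.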
The `X-Firecrawl-Signature` header contains the signature in the format: `sha256=abc123def456...` For complete implementation examples in JavaScript and Python, see the [Webhook Security documentation](/webhooks/security). For comprehensive webhook documentation including detailed event payloads, payload structure, advanced configuration, and troubleshooting, see the [Webhooks documentation](/webhooks/overview). ## Configuration reference The full set of parameters available when submitting a crawl job: | Parameter | Type | Default | Description | | ----------------------- | ---------- | ----------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `url` | `string` | (required) | The starting URL to crawl from | | `limit` | `integer` | `10000` | Maximum number of pages to crawl | | `maxDiscoveryDepth` | `integer` | (none) | Maximum depth from the root URL based on link-discovery hops, not the number of `/` segments in the URL. Each time a new URL is found on a page, it is assigned a depth one higher than the page it was discovered on. The root site and sitemapped pages have a discovery depth of 0. Pages at the max depth are still scraped, but links on them are not followed. | | `includePaths` | `string[]` | (none) | URL pathname regex patterns to include. Only matching paths are crawled. 
| | `excludePaths` | `string[]` | (none) | URL pathname regex patterns to exclude from the crawl | | `regexOnFullURL` | `boolean` | `false` | Match `includePaths`/`excludePaths` against the full URL (including query parameters) instead of just the pathname | | `crawlEntireDomain` | `boolean` | `false` | Follow internal links to sibling or parent URLs, not just child paths | | `allowSubdomains` | `boolean` | `false` | Follow links to subdomains of the main domain | | `allowExternalLinks` | `boolean` | `false` | Follow links to external websites | | `sitemap` | `string` | `"include"` | Sitemap handling: `"include"` (default), `"skip"`, or `"only"` | | `ignoreQueryParameters` | `boolean` | `false` | Avoid re-scraping the same path with different query parameters | | `ignoreRobotsTxt` | `boolean` | `false` | Ignore the website's robots.txt rules. **Enterprise only** — contact [support@firecrawl.com](mailto:support@firecrawl.com) to enable. | | `robotsUserAgent` | `string` | (none) | Custom User-Agent string for robots.txt evaluation. When set, robots.txt is fetched with this User-Agent and rules are matched against it instead of the default. **Enterprise only** — contact [support@firecrawl.com](mailto:support@firecrawl.com) to enable. | | `delay` | `number` | (none) | Delay in seconds between scrapes to respect rate limits. Setting this forces concurrency to 1. | | `maxConcurrency` | `integer` | (none) | Maximum concurrent scrapes. Defaults to your team's concurrency limit. | | `scrapeOptions` | `object` | (none) | Options applied to every scraped page (formats, proxy, caching, actions, etc.) | | `webhook` | `object` | (none) | Webhook configuration for real-time notifications | | `prompt` | `string` | (none) | Natural language prompt to generate crawl options. Explicitly set parameters override generated equivalents. | ## Important details By default, crawl ignores sublinks that are not children of the URL you provide. 
For example, `website.com/other-parent/blog-1` would not be returned if you crawled `website.com/blogs/`. Use the `crawlEntireDomain` parameter to include sibling and parent paths. To crawl subdomains like `blog.website.com` when crawling `website.com`, use the `allowSubdomains` parameter. * **Sitemap discovery**: By default, the crawler includes the website's sitemap to discover URLs (`sitemap: "include"`). If you set `sitemap: "skip"`, only pages reachable through HTML links from the root URL are found. Assets like PDFs or deeply nested pages listed in the sitemap but not directly linked from HTML will be missed. For maximum coverage, keep the default setting. * **Credit usage**: Each page crawled costs 1 credit. JSON mode adds 4 credits per page, enhanced proxy adds 4 credits per page, and PDF parsing costs 1 credit per PDF page. * **Result expiration**: Job results are available via the API for 24 hours after completion. After that, view results in the [activity logs](https://www.firecrawl.dev/app/logs). * **Crawl errors**: The `data` array contains pages Firecrawl successfully scraped. Use the [Get Crawl Errors](/api-reference/endpoint/crawl-get-errors) endpoint to retrieve pages that failed due to network errors, timeouts, or robots.txt blocks. * **Non-deterministic results**: Crawl results may vary between runs of the same configuration. Pages are scraped concurrently, so the order in which links are discovered depends on network timing and which pages finish loading first. This means different branches of a site may be explored to different extents near the depth boundary, especially at higher `maxDiscoveryDepth` values. To get more deterministic results, set `maxConcurrency` to `1` or use `sitemap: "only"` if the site has a comprehensive sitemap. > Are you an AI agent that needs a Firecrawl API key? See [firecrawl.dev/agent-onboarding/SKILL.md](https://www.firecrawl.dev/agent-onboarding/SKILL.md) for automated onboarding instructions. 
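To make the `maxDiscoveryDepth` semantics concrete, here is a small breadth-first sketch of the depth-assignment rule from the configuration table. The link graph is hypothetical and this is not Firecrawl's actual crawler code:

```python
from collections import deque

def discovery_depths(root, links, max_depth):
    """Assign each reachable page a discovery depth. Pages at max_depth
    are still visited (scraped), but their links are not followed."""
    depths = {root: 0}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        if depths[page] >= max_depth:
            continue  # scraped, but links on this page are not followed
        for nxt in links.get(page, []):
            if nxt not in depths:
                depths[nxt] = depths[page] + 1
                queue.append(nxt)
    return depths

links = {"/": ["/a", "/b"], "/a": ["/a/x"], "/a/x": ["/deep"]}
print(discovery_depths("/", links, max_depth=2))
# {'/': 0, '/a': 1, '/b': 1, '/a/x': 2}  -- '/deep' is never reached
```

Because `/a/x` sits at the maximum depth, it is scraped but `/deep` is never discovered, which mirrors the "pages at the max depth are still scraped, but links on them are not followed" rule.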
# Document Parsing Source: https://docs.firecrawl.dev/features/document-parsing Learn about document parsing capabilities. Firecrawl provides powerful document parsing capabilities, allowing you to extract structured content from various document formats. This feature is particularly useful for processing files like spreadsheets, Word documents, and more. ## Supported Document Formats Firecrawl currently supports the following document formats: * **Excel Spreadsheets** (`.xlsx`, `.xls`) * Each worksheet is converted to an HTML table * Worksheets are separated by H2 headings with the sheet name * Preserves cell formatting and data types * **Word Documents** (`.docx`, `.doc`, `.odt`, `.rtf`) * Extracts text content while preserving document structure * Maintains headings, paragraphs, lists, and tables * Preserves basic formatting and styling * **PDF Documents** (`.pdf`) * Extracts text content with layout information * Preserves document structure including sections and paragraphs * Handles both text-based and scanned PDFs (with OCR support) * Supports `mode` option to control parsing strategy: `fast` (text-only), `auto` (text with OCR fallback, default), or `ocr` (force OCR) * Priced at 1 credit per-page. See [Pricing](https://firecrawl.dev/pricing) for details. ### PDF Parsing Modes Use the `parsers` option to control how PDFs are processed: | Mode | Description | | ------ | --------------------------------------------------------------------------------------------------------------------- | | `auto` | Attempts fast text-based extraction first, falls back to OCR if needed. This is the default. | | `fast` | Text-based parsing only (embedded text). Fastest option, but will not extract text from scanned or image-heavy pages. | | `ocr` | Forces OCR parsing on every page. Use for scanned documents or when `auto` misclassifies a page. 
| ```js theme={null} // Object syntax with mode parsers: [{ type: "pdf", mode: "ocr", maxPages: 20 }] // Default (auto mode) parsers: [{ type: "pdf" }] ``` ## How to Use Document Parsing Document parsing in Firecrawl works automatically when you provide a URL that points to a supported document type. The system will detect the file type based on the URL extension or content-type header and process it accordingly. ### Example: Scraping an Excel File ```js Node theme={null} import Firecrawl from '@mendable/firecrawl-js'; const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" }); const doc = await firecrawl.scrape('https://example.com/data.xlsx'); console.log(doc.markdown); ``` ### Example: Scraping a Word Document ```js Node theme={null} import Firecrawl from '@mendable/firecrawl-js'; const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" }); const doc = await firecrawl.scrape('https://example.com/data.docx'); console.log(doc.markdown); ``` ## Output Format All supported document types are converted to clean, structured markdown. For example, an Excel file with multiple sheets might be converted to: ```markdown theme={null} ## Sheet1 | Name | Value | |-------|-------| | Item 1 | 100 | | Item 2 | 200 | ## Sheet2 | Date | Description | |------------|--------------| | 2023-01-01 | First quarter| ``` > Are you an AI agent that needs a Firecrawl API key? See [firecrawl.dev/agent-onboarding/SKILL.md](https://www.firecrawl.dev/agent-onboarding/SKILL.md) for automated onboarding instructions. # Enhanced Mode Source: https://docs.firecrawl.dev/features/enhanced-mode Use enhanced proxies for reliable scraping on complex sites Firecrawl provides different proxy types to help you scrape websites with varying levels of complexity. Set the `proxy` parameter to control which proxy strategy is used for a request. 
## Proxy types

Firecrawl supports three proxy types:

| Type       | Description                                                  | Speed  | Cost                                                        |
| ---------- | ------------------------------------------------------------ | ------ | ----------------------------------------------------------- |
| `basic`    | Standard proxies suitable for most sites                     | Fast   | 1 credit                                                    |
| `enhanced` | Enhanced proxies for complex sites                           | Slower | 5 credits per request                                       |
| `auto`     | Tries `basic` first, then retries with `enhanced` on failure | Varies | 1 credit if basic succeeds, 5 credits if enhanced is needed |

If you do not specify a proxy, Firecrawl defaults to `auto`.

## Basic usage

Set the `proxy` parameter to choose a proxy strategy. The following example uses `auto`, which lets Firecrawl decide when to escalate to enhanced proxies.

```python Python theme={null}
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key='fc-YOUR-API-KEY')

# Choose proxy strategy: 'basic' | 'enhanced' | 'auto'
doc = firecrawl.scrape('https://example.com', formats=['markdown'], proxy='auto')
print(doc.warning or 'ok')
```

```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';

const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });

// Choose proxy strategy: 'basic' | 'enhanced' | 'auto'
const doc = await firecrawl.scrape('https://example.com', { formats: ['markdown'], proxy: 'auto' });
console.log(doc.warning || 'ok');
```

```bash cURL theme={null}
# Choose proxy strategy: 'basic' | 'enhanced' | 'auto'
curl -X POST https://api.firecrawl.dev/v2/scrape \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer fc-YOUR-API-KEY' \
  -d '{
    "url": "https://example.com",
    "proxy": "auto"
  }'
```

Enhanced proxy requests cost **5 credits per request**. When using `auto`, the 5-credit cost only applies if the basic proxy fails and the enhanced retry succeeds.

> Are you an AI agent that needs a Firecrawl API key?
See [firecrawl.dev/agent-onboarding/SKILL.md](https://www.firecrawl.dev/agent-onboarding/SKILL.md) for automated onboarding instructions.

# Faster Scraping

Source: https://docs.firecrawl.dev/features/fast-scraping

Speed up your scrapes by 500% with the maxAge parameter

## How It Works

Firecrawl caches previously scraped pages and, by default, returns a recent copy when available.

* **Default freshness**: `maxAge = 172800000` ms (2 days). If the cached copy is newer than this, it’s returned instantly; otherwise, Firecrawl scrapes fresh and updates the cache.
* **Force fresh**: Set `maxAge: 0` to always scrape. This bypasses the cache entirely, so every request goes through the full scraping pipeline, takes longer to complete, and is more likely to fail. Use a non-zero `maxAge` if you don't need real-time content on every request.
* **Skip caching**: Set `storeInCache: false` if you don’t want to store results for a request.

Get your results **up to 500% faster** when you don’t need the absolute freshest data. Control freshness via `maxAge`:

1. **Return instantly** if we have a recent version of the page
2. **Scrape fresh** only if our version is older than your specified age
3. **Save you time** - results come back in milliseconds instead of seconds

## When to Use This

**Great for:**

* Documentation, articles, product pages
* Bulk processing jobs
* Development and testing
* Building knowledge bases

**Skip for:**

* Real-time data (stock prices, live scores, breaking news)
* Frequently updated content
* Time-sensitive applications

## Usage

Add `maxAge` to your scrape request. Values are in milliseconds (e.g., `3600000` = 1 hour).

```python Python theme={null}
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR_API_KEY")

# Use cached data if it's less than 1 hour old (3600000 ms)
# This can be 500% faster than a fresh scrape!
scrape_result = firecrawl.scrape( 'https://firecrawl.dev', formats=['markdown'], max_age=3600000 # 1 hour in milliseconds ) print(scrape_result.markdown) ``` ```javascript JavaScript theme={null} import Firecrawl from '@mendable/firecrawl-js'; const firecrawl = new Firecrawl({ apiKey: "fc-YOUR_API_KEY" }); // Use cached data if it's less than 1 hour old (3600000 ms) // This can be 500% faster than a fresh scrape! const scrapeResult = await firecrawl.scrape('https://firecrawl.dev', { formats: ['markdown'], maxAge: 3600000 // 1 hour in milliseconds }); console.log(scrapeResult.markdown); ``` ## Common maxAge values Here are some helpful reference values: * **5 minutes**: `300000` - For semi-dynamic content * **1 hour**: `3600000` - For content that updates hourly * **1 day**: `86400000` - For daily-updated content * **1 week**: `604800000` - For relatively static content ## Performance impact With `maxAge` enabled: * **500% faster response times** for recent content * **Instant results** instead of waiting for fresh scrapes ## Important notes * **Default**: `maxAge` is `172800000` (2 days) * **Fresh when needed**: If our data is older than `maxAge`, we scrape fresh automatically * **No stale data**: You'll never get data older than your specified `maxAge` * **Credits**: Cached results still cost 1 credit per page. Caching improves speed and latency, not credit usage. ### When caching is bypassed Caching is automatically skipped when your request includes any of the following: * Custom `headers` * `actions` (browser automation steps) * A browser `profile` * `changeTracking` format * Custom `screenshot` viewport or quality settings ### Cache hit matching For a cache hit, these parameters must match exactly between the original and subsequent requests: `url`, `mobile`, `location`, `waitFor`, `blockAds`, `screenshot` (enabled/disabled and full-page), and stealth proxy mode. 
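The matching rule amounts to an exact comparison of those fields. A minimal sketch (illustrative only, not Firecrawl's implementation; the stealth proxy setting is represented here as a `proxy` key for the sake of the example):

```python
# Fields that must match exactly for a cached copy to be eligible,
# per the cache-hit matching rule above (illustrative field names).
MATCH_FIELDS = ("url", "mobile", "location", "waitFor", "blockAds", "screenshot", "proxy")

def cache_hit_eligible(prev_request: dict, new_request: dict) -> bool:
    """A cached copy is only eligible when all matched fields are identical."""
    return all(prev_request.get(f) == new_request.get(f) for f in MATCH_FIELDS)

a = {"url": "https://firecrawl.dev", "mobile": False, "waitFor": 0}
b = {"url": "https://firecrawl.dev", "mobile": True, "waitFor": 0}

print(cache_hit_eligible(a, dict(a)))  # True
print(cache_hit_eligible(a, b))        # False (mobile differs)
```

Even when the fields match, the cached copy must also be newer than your `maxAge` for it to be returned.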
You can verify cache behavior by checking `metadata.cacheState` in the response — it will be `"hit"` or `"miss"`. ## Faster crawling The same speed benefits apply when crawling multiple pages. Use `maxAge` within `scrapeOptions` to get cached results for pages we’ve seen recently. ```python Python theme={null} from firecrawl import Firecrawl from firecrawl.v2.types import ScrapeOptions firecrawl = Firecrawl(api_key="fc-YOUR_API_KEY") # Crawl with cached scraping - 500% faster for pages we've seen recently crawl_result = firecrawl.crawl( 'https://firecrawl.dev', limit=100, scrape_options=ScrapeOptions( formats=['markdown'], max_age=3600000 # Use cached data if less than 1 hour old ) ) for page in crawl_result.data: print(f"URL: {page.metadata.source_url}") print(f"Content: {page.markdown[:200]}...") ``` ```javascript JavaScript theme={null} import Firecrawl from '@mendable/firecrawl-js'; const firecrawl = new Firecrawl({ apiKey: "fc-YOUR_API_KEY" }); // Crawl with cached scraping - 500% faster for pages we've seen recently const crawlResult = await firecrawl.crawl('https://firecrawl.dev', { limit: 100, scrapeOptions: { formats: ['markdown'], maxAge: 3600000 // Use cached data if less than 1 hour old } }); crawlResult.data.forEach(page => { console.log(`URL: ${page.metadata.sourceURL}`); console.log(`Content: ${page.markdown.substring(0, 200)}...`); }); ``` ```bash cURL theme={null} curl -X POST https://api.firecrawl.dev/v2/crawl \ -H 'Content-Type: application/json' \ -H 'Authorization: Bearer fc-YOUR_API_KEY' \ -d '{ "url": "https://firecrawl.dev", "limit": 100, "scrapeOptions": { "formats": ["markdown"], "maxAge": 3600000 } }' ``` When crawling with `maxAge`, each page in your crawl will benefit from the 500% speed improvement if we have recent cached data for that page. Start using `maxAge` today for dramatically faster scrapes and crawls! > Are you an AI agent that needs a Firecrawl API key? 
See [firecrawl.dev/agent-onboarding/SKILL.md](https://www.firecrawl.dev/agent-onboarding/SKILL.md) for automated onboarding instructions.

# Interact after scraping

Source: https://docs.firecrawl.dev/features/interact

Interact with a page you fetched by prompting or running code.

Scrape a page to get clean data, then call `/interact` to start taking actions in that page - click buttons, fill forms, extract dynamic content, or navigate deeper. Just describe what you want, or write code if you need full control.

* **Prompt**: describe the action you want to take in the page
* **Code**: interact via secure code execution with Playwright or agent-browser
* **Live view**: watch or interact with the browser in real time via an embeddable stream

## How It Works

1. **Scrape** a URL with `POST /v2/scrape`. The response includes a `scrapeId` in `data.metadata.scrapeId`. Optionally pass a `profile` to persist browser state across sessions.
2. **Interact** by calling `POST /v2/scrape/{scrapeId}/interact` with a `prompt` or with Playwright `code`. On the first call, the scraped session is resumed and you can start interacting with the page.
3. **Stop** the session with `DELETE /v2/scrape/{scrapeId}/interact` when you're done.

## Quick Start

Scrape a page, interact with it, and stop the session:

```python Python theme={null}
from firecrawl import Firecrawl

app = Firecrawl(api_key="fc-YOUR-API-KEY")

# 1. Scrape Amazon's homepage
result = app.scrape("https://www.amazon.com", formats=["markdown"])
scrape_id = result.metadata.scrape_id

# 2. Interact — search for a product and get its price
app.interact(scrape_id, prompt="Search for iPhone 16 Pro Max")
response = app.interact(scrape_id, prompt="Click on the first result and tell me the price")
print(response.output)

# 3. Stop the session
app.stop_interaction(scrape_id)
```

```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';

const app = new Firecrawl({ apiKey: 'fc-YOUR-API-KEY' });

// 1.
Scrape Amazon's homepage const result = await app.scrape('https://www.amazon.com', { formats: ['markdown'] }); const scrapeId = result.metadata?.scrapeId; // 2. Interact — search for a product and get its price await app.interact(scrapeId, { prompt: 'Search for iPhone 16 Pro Max' }); const response = await app.interact(scrapeId, { prompt: 'Click on the first result and tell me the price' }); console.log(response.output); // 3. Stop the session await app.stopInteraction(scrapeId); ``` ```bash cURL theme={null} # 1. Scrape Amazon's homepage RESPONSE=$(curl -s -X POST "https://api.firecrawl.dev/v2/scrape" \ -H "Authorization: Bearer $FIRECRAWL_API_KEY" \ -H "Content-Type: application/json" \ -d '{"url": "https://www.amazon.com", "formats": ["markdown"]}') SCRAPE_ID=$(echo $RESPONSE | jq -r '.data.metadata.scrapeId') # 2. Interact — search for a product and get its price curl -s -X POST "https://api.firecrawl.dev/v2/scrape/$SCRAPE_ID/interact" \ -H "Authorization: Bearer $FIRECRAWL_API_KEY" \ -H "Content-Type: application/json" \ -d '{"prompt": "Search for iPhone 16 Pro Max"}' curl -s -X POST "https://api.firecrawl.dev/v2/scrape/$SCRAPE_ID/interact" \ -H "Authorization: Bearer $FIRECRAWL_API_KEY" \ -H "Content-Type: application/json" \ -d '{"prompt": "Click on the first result and tell me the price"}' # 3. Stop the session curl -s -X DELETE "https://api.firecrawl.dev/v2/scrape/$SCRAPE_ID/interact" \ -H "Authorization: Bearer $FIRECRAWL_API_KEY" ``` ```bash CLI theme={null} # 1. Scrape Amazon's homepage (scrape ID is saved automatically) firecrawl scrape https://www.amazon.com # 2. Interact — search for a product and get its price firecrawl interact "Search for iPhone 16 Pro Max" firecrawl interact "Click on the first result and tell me the price" # 3. 
Stop the session firecrawl interact stop ``` ```json Response theme={null} { "success": true, "liveViewUrl": "https://liveview.firecrawl.dev/...", "interactiveLiveViewUrl": "https://liveview.firecrawl.dev/...", "output": "The iPhone 16 Pro Max (256GB) is priced at $1,199.00.", "exitCode": 0, "killed": false } ``` ## Interact via prompting The simplest way to interact with a page. Describe what you want in natural language and it will click, type, scroll, and extract data automatically. ```python Python theme={null} response = app.interact(scrape_id, prompt="What are the customer reviews saying about battery life?") print(response.output) ``` ```js Node theme={null} const response = await app.interact(scrapeId, { prompt: 'What are the customer reviews saying about battery life?', }); console.log(response.output); ``` ```bash cURL theme={null} curl -s -X POST "https://api.firecrawl.dev/v2/scrape/$SCRAPE_ID/interact" \ -H "Authorization: Bearer $FIRECRAWL_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "prompt": "What are the customer reviews saying about battery life?" }' ``` ```bash CLI theme={null} firecrawl interact "What are the customer reviews saying about battery life?" ``` The response includes an `output` field with the agent's answer: ```json Response theme={null} { "success": true, "liveViewUrl": "https://liveview.firecrawl.dev/...", "interactiveLiveViewUrl": "https://liveview.firecrawl.dev/...", "output": "Customers are generally positive about battery life. Most reviewers report 8-10 hours of use on a single charge. A few noted it drains faster with heavy multitasking.", "stdout": "...", "result": "...", "stderr": "", "exitCode": 0, "killed": false } ``` ### Keep Prompts Small and Focused Prompts work best when each one is a **single, clear task**. Instead of asking the agent to do a complex multi-step workflow in one shot, break it into separate interact calls. Each call reuses the same browser session, so state carries over between them. 
## Running code

For full control, you can execute code directly in the browser sandbox. The `page` variable (a Playwright `Page` object) is available in Node.js and Python. Bash mode has [agent-browser](https://github.com/vercel-labs/agent-browser) pre-installed.

You can also take screenshots within the session: use `(await page.screenshot()).toString("base64")` in Node.js, `await page.screenshot(path="/tmp/screenshot.png")` in Python, or `agent-browser screenshot` in Bash.

### Node.js (Playwright)

The default language. Write Playwright code directly; `page` is already connected to the browser.

```python Python theme={null}
response = app.interact(scrape_id, code="""
// Click a button and wait for navigation
await page.click('#next-page');
await page.waitForLoadState('networkidle');

// Extract content from the new page
const title = await page.title();
const content = await page.$eval('.article-body', el => el.textContent);
JSON.stringify({ title, content });
""")
print(response.result)
```

```js Node theme={null}
const response = await app.interact(scrapeId, {
  code: `
    // Click a button and wait for navigation
    await page.click('#next-page');
    await page.waitForLoadState('networkidle');

    // Extract content from the new page
    const title = await page.title();
    const content = await page.$eval('.article-body', el => el.textContent);
    JSON.stringify({ title, content });
  `,
});
console.log(response.result);
```

```bash cURL theme={null}
curl -s -X POST "https://api.firecrawl.dev/v2/scrape/$SCRAPE_ID/interact" \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "code": "await page.click(\"#next-page\"); await page.waitForLoadState(\"networkidle\"); const title = await page.title(); JSON.stringify({ title });",
    "language": "node",
    "timeout": 30
  }'
```

```bash CLI theme={null}
# Uses the last scrape automatically
firecrawl interact -c "
await page.click('#next-page');
await page.waitForLoadState('networkidle');
const title = await page.title();
const content = await page.\$eval('.article-body', el => el.textContent);
JSON.stringify({ title, content });
"

# Or pass a scrape ID explicitly
# firecrawl interact -c "await page.title()"
```

### Python

Set `language` to `"python"` to use Playwright's Python API.

```python Python theme={null}
response = app.interact(
    scrape_id,
    code="""
import json

await page.click('#load-more')
await page.wait_for_load_state('networkidle')

items = await page.query_selector_all('.item')
data = []
for item in items:
    text = await item.text_content()
    data.append(text.strip())
print(json.dumps(data))
""",
    language="python",
)
print(response.stdout)
```

```js Node theme={null}
const response = await app.interact(scrapeId, {
  code: `
import json

await page.click('#load-more')
await page.wait_for_load_state('networkidle')

items = await page.query_selector_all('.item')
data = []
for item in items:
    text = await item.text_content()
    data.append(text.strip())
print(json.dumps(data))
`,
  language: 'python',
});
console.log(response.stdout);
```

```bash cURL theme={null}
curl -s -X POST "https://api.firecrawl.dev/v2/scrape/$SCRAPE_ID/interact" \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "code": "import json\nawait page.click(\"#load-more\")\nawait page.wait_for_load_state(\"networkidle\")\nitems = await page.query_selector_all(\".item\")\ndata = [await i.text_content() for i in items]\nprint(json.dumps(data))",
    "language": "python"
  }'
```

```bash CLI theme={null}
firecrawl interact --python -c "
import json
await page.click('#load-more')
await page.wait_for_load_state('networkidle')
items = await page.query_selector_all('.item')
data = [await i.text_content() for i in items]
print(json.dumps(data))
"
```

### Bash (agent-browser)

[agent-browser](https://github.com/vercel-labs/agent-browser) is a CLI pre-installed in the sandbox with 60+ commands. It provides an accessibility tree with element refs (`@e1`, `@e2`, ...), which makes it ideal for LLM-driven automation.

```python Python theme={null}
# Take a snapshot to see interactive elements
snapshot = app.interact(
    scrape_id,
    code="agent-browser snapshot -i",
    language="bash",
)
print(snapshot.stdout)
# Output:
# [document]
# @e1 [input type="text"] "Search..."
# @e2 [button] "Search"
# @e3 [link] "About"

# Interact with elements using @refs
app.interact(
    scrape_id,
    code='agent-browser fill @e1 "firecrawl" && agent-browser click @e2',
    language="bash",
)
```

```js Node theme={null}
// Take a snapshot to see interactive elements
const snapshot = await app.interact(scrapeId, {
  code: 'agent-browser snapshot -i',
  language: 'bash',
});
console.log(snapshot.stdout);
// Output:
// [document]
// @e1 [input type="text"] "Search..."
// @e2 [button] "Search"
// @e3 [link] "About"

// Interact with elements using @refs
await app.interact(scrapeId, {
  code: 'agent-browser fill @e1 "firecrawl" && agent-browser click @e2',
  language: 'bash',
});
```

```bash cURL theme={null}
# Take a snapshot to see interactive elements
curl -s -X POST "https://api.firecrawl.dev/v2/scrape/$SCRAPE_ID/interact" \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"code": "agent-browser snapshot -i", "language": "bash"}'

# Interact with elements using @refs
curl -s -X POST "https://api.firecrawl.dev/v2/scrape/$SCRAPE_ID/interact" \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"code": "agent-browser fill @e1 \"firecrawl\" && agent-browser click @e2", "language": "bash"}'
```

```bash CLI theme={null}
# Take a snapshot to see interactive elements
firecrawl interact --bash -c "agent-browser snapshot -i"

# Interact with elements using @refs
firecrawl interact --bash -c 'agent-browser fill @e1 "firecrawl" && agent-browser click @e2'
```

Common agent-browser commands:

| Command                   | Description                               |
| ------------------------- | ----------------------------------------- |
| `snapshot`                | Full accessibility tree with element refs |
| `snapshot -i`             | Interactive elements only                 |
| `click @e1`               | Click element by ref                      |
| `fill @e1 "text"`         | Clear field and type text                 |
| `type @e1 "text"`         | Type without clearing                     |
| `press Enter`             | Press a keyboard key                      |
| `scroll down 500`         | Scroll down by pixels                     |
| `get text @e1`            | Get text content                          |
| `get url`                 | Get current URL                           |
| `wait @e1`                | Wait for element                          |
| `wait --load networkidle` | Wait for network idle                     |
| `find text "X" click`     | Find element by text and click            |
| `screenshot`              | Take a screenshot of the current page     |
| `eval "js code"`          | Run JavaScript in page                    |

## Live view

Every interact response returns a `liveViewUrl` that you can embed to watch the browser in real time. This is useful for debugging, demos, or building browser-powered UIs.

```json Response theme={null}
{
  "success": true,
  "liveViewUrl": "https://liveview.firecrawl.dev/...",
  "interactiveLiveViewUrl": "https://liveview.firecrawl.dev/...",
  "stdout": "",
  "result": "...",
  "exitCode": 0
}
```

```html theme={null}