# Advanced Scraping Guide Source: https://docs.firecrawl.dev/advanced-scraping-guide Configure scrape options, browser actions, crawl, map, and the agent endpoint with Firecrawl's full API surface. Reference for every option across Firecrawl's scrape, crawl, map, and agent endpoints. ## Basic scraping To scrape a single page and get clean markdown content, use the `/scrape` endpoint. ```python Python theme={null} # pip install firecrawl-py from firecrawl import Firecrawl firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY") doc = firecrawl.scrape("https://firecrawl.dev") print(doc.markdown) ``` ```js Node theme={null} // npm install @mendable/firecrawl-js import { Firecrawl } from 'firecrawl-js'; const firecrawl = new Firecrawl({ apiKey: 'fc-YOUR-API-KEY' }); const doc = await firecrawl.scrape('https://firecrawl.dev'); console.log(doc.markdown); ``` ```bash cURL theme={null} curl -X POST https://api.firecrawl.dev/v2/scrape \ -H 'Content-Type: application/json' \ -H 'Authorization: Bearer fc-YOUR-API-KEY' \ -d '{ "url": "https://docs.firecrawl.dev" }' ``` ## Scraping PDFs Firecrawl supports PDFs. Use the `parsers` option (e.g., `parsers: ["pdf"]`) when you want to ensure PDF parsing. You can control the parsing strategy with the `mode` option: * **`auto`** (default) — attempts fast text-based extraction first, then falls back to OCR if needed. * **`fast`** — text-based parsing only (embedded text). Fastest, but skips scanned/image-heavy pages. * **`ocr`** — forces OCR parsing on every page. Use for scanned documents or when `auto` misclassifies a page. `{ type: "pdf" }` and `"pdf"` both default to `mode: "auto"`. ```json theme={null} "parsers": [{ "type": "pdf", "mode": "fast", "maxPages": 50 }] ``` ## Scrape options When using the `/scrape` endpoint, you can customize the request with the following options. ### Formats (`formats`) The `formats` array controls which output types the scraper returns. Default: `["markdown"]`. 
**String formats**: pass the name directly (e.g. `"markdown"`). | Format | Description | | ---------- | ---------------------------------------------------------------------------- | | `markdown` | Page content converted to clean Markdown. | | `html` | Processed HTML with unnecessary elements removed. | | `rawHtml` | Original HTML exactly as returned by the server. | | `links` | All links found on the page. | | `images` | All images found on the page. | | `summary` | An LLM-generated summary of the page content. | | `branding` | Extracts brand identity (colors, fonts, typography, spacing, UI components). | **Object formats**: pass an object with `type` and additional options. | Format | Options | Description | | ---------------- | ---------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------- | | `json` | `prompt?: string`, `schema?: object` | Extract structured data using an LLM. Provide a JSON schema and/or a natural-language prompt (max 10,000 characters). | | `screenshot` | `fullPage?: boolean`, `quality?: number`, `viewport?: { width, height }` | Capture a screenshot. Max one per request. Viewport max resolution is 7680×4320. Screenshot URLs expire after 24 hours. | | `changeTracking` | `modes?: ("json" \| "git-diff")[]`, `tag?: string`, `schema?: object`, `prompt?: string` | Track changes between scrapes. Requires `"markdown"` to also be in the formats array. | | `attributes` | `selectors: [{ selector: string, attribute: string }]` | Extract specific HTML attributes from elements matching CSS selectors. | ### Content filtering These parameters control which parts of the page appear in the output. When `onlyMainContent` is `true` (the default), boilerplate (nav, footer, etc.) is stripped. 
`includeTags` and `excludeTags` are applied against the original page DOM, not the post-filtered result, so your selectors should target elements as they appear in the source HTML. Set `onlyMainContent: false` to use the full page as the starting point for tag filtering. | Parameter | Type | Default | Description | | ----------------- | --------- | ------- | ---------------------------------------------------------------------------- | | `onlyMainContent` | `boolean` | `true` | Return only the main content. Set `false` for the full page. | | `includeTags` | `array` | — | HTML tags, classes, or IDs to include (e.g. `["h1", "p", ".main-content"]`). | | `excludeTags` | `array` | — | HTML tags, classes, or IDs to exclude (e.g. `["#ad", "#footer"]`). | ### Timing and cache | Parameter | Type | Default | Description | | --------- | -------------- | ----------- | ------------------------------------------------------------------------------------------------------ | | `waitFor` | `integer` (ms) | `0` | Extra wait time before scraping, on top of smart-wait. Use sparingly. | | `maxAge` | `integer` (ms) | `172800000` | Return a cached version if fresher than this value (default is 2 days). Set `0` to always fetch fresh. | | `timeout` | `integer` (ms) | `60000` | Max request duration before aborting (default is 60 seconds). Minimum is 1000 (1 second). | ### PDF parsing | Parameter | Type | Default | Description | | --------- | ------- | --------- | -------------------------------------------------------------------------------- | | `parsers` | `array` | `["pdf"]` | Controls PDF processing. `[]` to skip parsing and return base64 (1 credit flat). | ```json theme={null} { "type": "pdf", "mode": "fast" | "auto" | "ocr", "maxPages": 10 } ``` | Property | Type | Default | Description | | ---------- | --------------------------- | ------------ | ------------------------------------------------------------------------------------- | | `type` | `"pdf"` | *(required)* | Parser type. 
| | `mode` | `"fast" \| "auto" \| "ocr"` | `"auto"` | `fast`: text-based extraction only. `auto`: fast with OCR fallback. `ocr`: force OCR. | | `maxPages` | `integer` | — | Cap the number of pages to parse. | ### Actions Run browser actions before scraping. This is useful for dynamic content, navigation, or user-gated pages. You can include up to 50 actions per request, and the combined wait time across all `wait` actions and `waitFor` must not exceed 60 seconds. | Action | Parameters | Description | | ------------------- | ------------------------------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------- | | `wait` | `milliseconds?: number`, `selector?: string` | Wait for a fixed duration **or** until an element is visible (provide one, not both). When using `selector`, times out after 30 seconds. | | `click` | `selector: string`, `all?: boolean` | Click an element matching the CSS selector. Set `all: true` to click every match. | | `write` | `text: string` | Type text into the currently focused field. You must focus the element with a `click` action first. | | `press` | `key: string` | Press a keyboard key (e.g. `"Enter"`, `"Tab"`, `"Escape"`). | | `scroll` | `direction?: "up" \| "down"`, `selector?: string` | Scroll the page or a specific element. Direction defaults to `"down"`. | | `screenshot` | `fullPage?: boolean`, `quality?: number`, `viewport?: { width, height }` | Capture a screenshot. Max viewport resolution is 7680×4320. | | `scrape` | *(none)* | Capture the current page HTML at this point in the action sequence. | | `executeJavascript` | `script: string` | Run JavaScript code in the page. Return values are available in the `actions.javascriptReturns` array of the response. | | `pdf` | `format?: string`, `landscape?: boolean`, `scale?: number` | Generate a PDF. 
Supported formats: `"A0"` through `"A6"`, `"Letter"`, `"Legal"`, `"Tabloid"`, `"Ledger"`. Defaults to `"Letter"`. | ```python Python theme={null} from firecrawl import Firecrawl firecrawl = Firecrawl(api_key='fc-YOUR-API-KEY') doc = firecrawl.scrape('https://example.com', { 'actions': [ { 'type': 'wait', 'milliseconds': 1000 }, { 'type': 'click', 'selector': '#accept' }, { 'type': 'scroll', 'direction': 'down' }, { 'type': 'click', 'selector': '#q' }, { 'type': 'write', 'text': 'firecrawl' }, { 'type': 'press', 'key': 'Enter' }, { 'type': 'wait', 'milliseconds': 2000 } ], 'formats': ['markdown'] }) print(doc.markdown) ``` ```js Node theme={null} import { Firecrawl } from 'firecrawl-js'; const firecrawl = new Firecrawl({ apiKey: 'fc-YOUR-API-KEY' }); const doc = await firecrawl.scrape('https://example.com', { actions: [ { type: 'wait', milliseconds: 1000 }, { type: 'click', selector: '#accept' }, { type: 'scroll', direction: 'down' }, { type: 'click', selector: '#q' }, { type: 'write', text: 'firecrawl' }, { type: 'press', key: 'Enter' }, { type: 'wait', milliseconds: 2000 } ], formats: ['markdown'] }); console.log(doc.markdown); ``` ```bash cURL theme={null} curl -X POST https://api.firecrawl.dev/v2/scrape \ -H 'Content-Type: application/json' \ -H 'Authorization: Bearer fc-YOUR-API-KEY' \ -d '{ "url": "https://example.com", "actions": [ { "type": "wait", "milliseconds": 1000 }, { "type": "click", "selector": "#accept" }, { "type": "scroll", "direction": "down" }, { "type": "click", "selector": "#q" }, { "type": "write", "text": "firecrawl" }, { "type": "press", "key": "Enter" }, { "type": "wait", "milliseconds": 2000 } ], "formats": ["markdown"] }' ``` #### Action execution notes * **Write** requires a preceding `click` to focus the target element. * **Scroll** accepts an optional `selector` to scroll a specific element instead of the page. * **Wait** accepts either `milliseconds` (fixed delay) or `selector` (wait until visible). 
* Actions run **sequentially**: each step completes before the next begins. * Actions are **not supported for PDFs**. If the URL resolves to a PDF the request will fail. #### Advanced action examples **Taking a screenshot:** ```bash cURL theme={null} curl -X POST https://api.firecrawl.dev/v2/scrape \ -H 'Content-Type: application/json' \ -H 'Authorization: Bearer fc-YOUR-API-KEY' \ -d '{ "url": "https://example.com", "actions": [ { "type": "click", "selector": "#load-more" }, { "type": "wait", "milliseconds": 1000 }, { "type": "screenshot", "fullPage": true, "quality": 80 } ] }' ``` **Clicking multiple elements:** ```bash cURL theme={null} curl -X POST https://api.firecrawl.dev/v2/scrape \ -H 'Content-Type: application/json' \ -H 'Authorization: Bearer fc-YOUR-API-KEY' \ -d '{ "url": "https://example.com", "actions": [ { "type": "click", "selector": ".expand-button", "all": true }, { "type": "wait", "milliseconds": 500 } ], "formats": ["markdown"] }' ``` **Generating a PDF:** ```bash cURL theme={null} curl -X POST https://api.firecrawl.dev/v2/scrape \ -H 'Content-Type: application/json' \ -H 'Authorization: Bearer fc-YOUR-API-KEY' \ -d '{ "url": "https://example.com", "actions": [ { "type": "pdf", "format": "A4", "landscape": false } ] }' ``` **Executing JavaScript (e.g. extracting embedded page data):** ```bash cURL theme={null} curl -X POST https://api.firecrawl.dev/v2/scrape \ -H 'Content-Type: application/json' \ -H 'Authorization: Bearer fc-YOUR-API-KEY' \ -d '{ "url": "https://example.com", "actions": [ { "type": "executeJavascript", "script": "document.querySelector(\"#__NEXT_DATA__\").textContent" } ], "formats": ["markdown"] }' ``` The return value of each `executeJavascript` action is captured in the `actions.javascriptReturns` array of the response. 
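As a minimal sketch of consuming those return values client-side — the exact response shape (`actions.javascriptReturns` entries carrying a `value` field) is an assumption based on the field name above — extracting and decoding embedded page data might look like:

```python
import json

def javascript_returns(response: dict) -> list:
    """Collect executeJavascript return values from a scrape response.

    Assumes the response nests them under actions.javascriptReturns and
    that each entry exposes the script's return value under "value".
    """
    returns = response.get("actions", {}).get("javascriptReturns", [])
    return [entry.get("value") for entry in returns]

# Stubbed response: the script returned the raw __NEXT_DATA__ JSON
# string, which the client then decodes itself.
stub = {
    "actions": {
        "javascriptReturns": [
            {"type": "string", "value": '{"props": {"page": "home"}}'}
        ]
    }
}

data = json.loads(javascript_returns(stub)[0])
print(data["props"]["page"])  # → home
```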
### Full scrape example

The following request combines multiple scrape options:

```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/scrape \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer fc-YOUR-API-KEY' \
  -d '{
    "url": "https://docs.firecrawl.dev",
    "formats": [
      "markdown",
      "links",
      "html",
      "rawHtml",
      { "type": "screenshot", "fullPage": true, "quality": 80 }
    ],
    "includeTags": ["h1", "p", "a", ".main-content"],
    "excludeTags": ["#ad", "#footer"],
    "onlyMainContent": false,
    "waitFor": 1000,
    "timeout": 15000,
    "parsers": ["pdf"]
  }'
```

This request returns markdown, HTML, raw HTML, links, and a full-page screenshot. It scopes content to `h1`, `p`, `a`, and `.main-content` while excluding `#ad` and `#footer`, waits 1 second before scraping, sets a 15-second timeout, and enables PDF parsing. See the full [Scrape API reference](https://docs.firecrawl.dev/api-reference/endpoint/scrape) for details.

## JSON extraction via formats

Use the JSON format object in `formats` to extract structured data in one pass:

```python Python theme={null}
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key='fc-YOUR-API-KEY')

doc = firecrawl.scrape('https://firecrawl.dev', {
    'formats': [{
        'type': 'json',
        'prompt': 'Extract the features of the product',
        'schema': {
            'type': 'object',
            'properties': {'features': {'type': 'object'}},
            'required': ['features']
        }
    }]
})

print(doc.json)
```

```js Node theme={null}
import { Firecrawl } from 'firecrawl-js';

const firecrawl = new Firecrawl({ apiKey: 'fc-YOUR-API-KEY' });

const doc = await firecrawl.scrape('https://firecrawl.dev', {
  formats: [{
    type: 'json',
    prompt: 'Extract the features of the product',
    schema: {
      type: 'object',
      properties: { features: { type: 'object' } },
      required: ['features']
    }
  }]
});

console.log(doc.json);
```

```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/scrape \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer fc-YOUR-API-KEY' \
  -d '{
    "url": "https://firecrawl.dev",
    "formats": [{
      "type": "json",
      "prompt": "Extract the features of the product",
      "schema": {"type": "object", "properties": {"features": {"type": "object"}}, "required": ["features"]}
    }]
  }'
```

## Agent endpoint

Use the `/v2/agent` endpoint for autonomous, multi-page data extraction. The agent runs asynchronously: you start a job, then poll for results.
### Agent options | Parameter | Type | Default | Description | | ----------------------- | --------- | ---------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `prompt` | `string` | *(required)* | Natural-language instructions describing what data to extract (max 10,000 characters). | | `urls` | `array` | — | URLs to constrain the agent to. | | `schema` | `object` | — | JSON schema to structure the extracted data. | | `maxCredits` | `number` | `2500` | Maximum credits the agent can spend. The dashboard supports up to 2,500; for higher limits, set this via the API (values above 2,500 are always billed as paid requests). | | `strictConstrainToURLs` | `boolean` | `false` | When `true`, the agent only visits the provided URLs. | | `model` | `string` | `"spark-1-mini"` | AI model to use. `"spark-1-mini"` (default, 60% cheaper) or `"spark-1-pro"` (higher accuracy). 
| ```python Python theme={null} from firecrawl import Firecrawl firecrawl = Firecrawl(api_key='fc-YOUR-API-KEY') # Start agent job started = firecrawl.start_agent( prompt="Extract the title and description", urls=["https://docs.firecrawl.dev"], schema={"type": "object", "properties": {"title": {"type": "string"}, "description": {"type": "string"}}, "required": ["title"]} ) # Poll status status = firecrawl.get_agent_status(started["id"]) print(status.get("status"), status.get("data")) ``` ```js Node theme={null} import Firecrawl from '@mendable/firecrawl-js'; const firecrawl = new Firecrawl({ apiKey: 'fc-YOUR-API-KEY' }); // Start agent job const started = await firecrawl.startAgent({ prompt: 'Extract the title and description', urls: ['https://docs.firecrawl.dev'], schema: { type: 'object', properties: { title: { type: 'string' }, description: { type: 'string' } }, required: ['title'] } }); // Poll status const status = await firecrawl.getAgentStatus(started.id); console.log(status.status, status.data); ``` ```bash cURL theme={null} curl -X POST https://api.firecrawl.dev/v2/agent \ -H 'Content-Type: application/json' \ -H 'Authorization: Bearer fc-YOUR-API-KEY' \ -d '{ "prompt": "Extract the title and description", "urls": ["https://docs.firecrawl.dev"], "schema": {"type": "object", "properties": {"title": {"type": "string"}, "description": {"type": "string"}}, "required": ["title"]} }' ``` ### Check agent status Poll `GET /v2/agent/{jobId}` to check progress. The response `status` field will be `"processing"`, `"completed"`, or `"failed"`. ```bash cURL theme={null} curl -X GET https://api.firecrawl.dev/v2/agent/YOUR-JOB-ID \ -H 'Authorization: Bearer fc-YOUR-API-KEY' ``` The Python and Node SDKs also provide a convenience method (`firecrawl.agent()`) that starts the job and polls automatically until completion. ## Crawling multiple pages To crawl multiple pages, use the `/v2/crawl` endpoint. The crawl runs asynchronously and returns a job ID. 
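Because the job is asynchronous, a client typically submits it and then polls until it finishes. A minimal polling helper — the status values are assumptions based on this guide, and the fetcher is injected so the pattern can be exercised without the network:

```python
import time

def poll_until_done(get_status, interval_s=2.0, max_wait_s=120.0):
    """Poll a status-fetching callable until the job leaves its
    in-progress states or the deadline passes.

    `get_status` stands in for a GET /v2/crawl/{id} (or agent) call;
    "processing" and "scraping" as in-progress states are assumptions.
    """
    deadline = time.monotonic() + max_wait_s
    while time.monotonic() < deadline:
        status = get_status()
        if status.get("status") not in ("processing", "scraping"):
            return status
        time.sleep(interval_s)
    raise TimeoutError("job did not finish in time")

# Simulated run: the job reports 'scraping' twice, then completes.
states = iter([
    {"status": "scraping"},
    {"status": "scraping"},
    {"status": "completed", "data": []},
])
result = poll_until_done(lambda: next(states), interval_s=0.0)
print(result["status"])  # → completed
```

In real use, replace the lambda with a function that issues the authenticated status request and parses the JSON body.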
Use the `limit` parameter to control how many pages are crawled. If omitted, the crawl will process up to 10,000 pages. ```bash cURL theme={null} curl -X POST https://api.firecrawl.dev/v2/crawl \ -H 'Content-Type: application/json' \ -H 'Authorization: Bearer fc-YOUR-API-KEY' \ -d '{ "url": "https://docs.firecrawl.dev", "limit": 10 }' ``` ### Response ```json theme={null} { "id": "1234-5678-9101" } ``` ### Check crawl job Use the job ID to check the status of a crawl and retrieve its results. ```bash cURL theme={null} curl -X GET https://api.firecrawl.dev/v2/crawl/1234-5678-9101 \ -H 'Content-Type: application/json' \ -H 'Authorization: Bearer fc-YOUR-API-KEY' ``` If the content is larger than 10MB or the crawl job is still running, the response may include a `next` parameter, a URL to the next page of results. ### Crawl prompt and params preview You can provide a natural-language `prompt` to let Firecrawl derive crawl settings. Preview them first: ```bash cURL theme={null} curl -X POST https://api.firecrawl.dev/v2/crawl/params-preview \ -H 'Content-Type: application/json' \ -H 'Authorization: Bearer fc-YOUR-API-KEY' \ -d '{ "url": "https://docs.firecrawl.dev", "prompt": "Extract docs and blog" }' ``` ### Crawler options When using the `/v2/crawl` endpoint, you can customize crawling behavior with the following options. #### Path filtering | Parameter | Type | Default | Description | | ---------------- | --------- | ------- | ----------------------------------------------------------------- | | `includePaths` | `array` | — | Regex patterns for URLs to include (pathname only by default). | | `excludePaths` | `array` | — | Regex patterns for URLs to exclude (pathname only by default). | | `regexOnFullURL` | `boolean` | `false` | Match patterns against the full URL instead of just the pathname. | The starting URL is also checked against `includePaths`. If it does not match any of the patterns, the crawl may return 0 pages. 
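To sanity-check your patterns before launching a crawl, you can replicate the pathname matching locally. A sketch of that behavior — the server-side regex semantics may differ in detail:

```python
import re
from urllib.parse import urlparse

def passes_path_filters(url, include_paths=(), exclude_paths=(),
                        regex_on_full_url=False):
    """Mimic includePaths/excludePaths: match regex patterns against
    the pathname, or the full URL when regexOnFullURL is set."""
    target = url if regex_on_full_url else urlparse(url).path
    if include_paths and not any(re.search(p, target) for p in include_paths):
        return False
    if any(re.search(p, target) for p in exclude_paths):
        return False
    return True

print(passes_path_filters("https://docs.firecrawl.dev/blog/post-1",
                          include_paths=[r"^/blog/.*$"]))   # → True
print(passes_path_filters("https://docs.firecrawl.dev/admin/x",
                          exclude_paths=[r"^/admin/.*$"]))  # → False
```

Running your starting URL through the same check is a quick way to catch the zero-page case described above.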
#### Crawl scope | Parameter | Type | Default | Description | | -------------------- | ------------ | ------- | ------------------------------------------------------------ | | `maxDiscoveryDepth` | `integer` | — | Max link-depth for discovering new URLs. | | `limit` | `integer` | `10000` | Max pages to crawl. | | `crawlEntireDomain` | `boolean` | `false` | Explore siblings and parents to cover the entire domain. | | `allowExternalLinks` | `boolean` | `false` | Follow links to external domains. | | `allowSubdomains` | `boolean` | `false` | Follow subdomains of the main domain. | | `delay` | `number` (s) | — | Delay between scrapes. Setting this forces concurrency to 1. | #### Sitemap and deduplication | Parameter | Type | Default | Description | | ------------------------ | --------- | ----------- | ------------------------------------------------------------------------------------------------------- | | `sitemap` | `string` | `"include"` | `"include"`: use sitemap + link discovery. `"skip"`: ignore sitemap. `"only"`: crawl only sitemap URLs. | | `deduplicateSimilarURLs` | `boolean` | `true` | Normalize URL variants (`www.`, `https`, trailing slashes, `index.html`) as duplicates. | | `ignoreQueryParameters` | `boolean` | `false` | Strip query strings before deduplication (e.g. `/page?a=1` and `/page?a=2` become one URL). | #### Scrape options for crawl | Parameter | Type | Default | Description | | --------------- | -------- | --------------------------- | ---------------------------------------------------------------------------- | | `scrapeOptions` | `object` | `{ formats: ["markdown"] }` | Per-page scrape config. Accepts all [scrape options](#scrape-options) above. 
| ### Crawl example ```bash cURL theme={null} curl -X POST https://api.firecrawl.dev/v2/crawl \ -H 'Content-Type: application/json' \ -H 'Authorization: Bearer fc-YOUR-API-KEY' \ -d '{ "url": "https://docs.firecrawl.dev", "includePaths": ["^/blog/.*$", "^/docs/.*$"], "excludePaths": ["^/admin/.*$", "^/private/.*$"], "maxDiscoveryDepth": 2, "limit": 1000 }' ``` ## Mapping website links The `/v2/map` endpoint identifies URLs related to a given website. ```bash cURL theme={null} curl -X POST https://api.firecrawl.dev/v2/map \ -H 'Content-Type: application/json' \ -H 'Authorization: Bearer fc-YOUR-API-KEY' \ -d '{ "url": "https://docs.firecrawl.dev" }' ``` ### Map options | Parameter | Type | Default | Description | | ------------------- | --------- | ----------- | ----------------------------------- | | `search` | `string` | — | Filter links by text match. | | `limit` | `integer` | `100` | Max links to return. | | `sitemap` | `string` | `"include"` | `"include"`, `"skip"`, or `"only"`. | | `includeSubdomains` | `boolean` | `true` | Include subdomains. | Here is the API Reference for it: [Map Endpoint Documentation](https://docs.firecrawl.dev/api-reference/endpoint/map) ## Whitelisting Firecrawl ### Allowing Firecrawl to scrape your website * **User Agent**: Allow `FirecrawlAgent` in your firewall or security rules. * **IP addresses**: Firecrawl does not use a fixed set of outbound IPs. ### Allowing your application to call the Firecrawl API If your firewall blocks outbound requests from your application to external services, you need to whitelist Firecrawl's API server IP address so your application can reach the Firecrawl API (`api.firecrawl.dev`): * **IP Address**: `35.245.250.27` Add this IP to your firewall's outbound allowlist so your backend can send scrape, crawl, map, and agent requests to Firecrawl. # Billing Source: https://docs.firecrawl.dev/billing How Firecrawl billing, credits, and plans work ## Overview Firecrawl uses a **credit-based billing system**. 
Every API call you make consumes credits, and the number of credits consumed depends on the endpoint and options you use. You get a monthly credit allotment based on your plan, and can purchase additional credits via auto-recharge packs if you need more. For current plan pricing, visit the [Firecrawl pricing page](https://www.firecrawl.dev/pricing). ## Credits Credits are the unit of usage in Firecrawl. Each plan includes a monthly credit allotment that resets at the start of each billing cycle. Different API endpoints consume different amounts of credits. ### Credit costs per endpoint | Endpoint | Credit Cost | Notes | | ----------- | -------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | **Scrape** | 1 credit / page | Convert a single URL into clean markdown, HTML, or structured data. Additional credits apply when using scrape options (see below). | | **Crawl** | 1 credit / page | Scrape an entire website by following links from a starting URL. The same per-page scrape option costs apply to each page crawled. | | **Map** | 1 credit / call | Discover all URLs on a website without scraping their content. | | **Search** | 2 credits / 10 results | Search the web and optionally scrape the results. Rounded up per 10 results (e.g., 11 results = 4 credits). Additional per-page scrape costs apply to each result that is scraped. Enterprise ZDR search costs 10 credits / 10 results (see [ZDR Search](/features/search#zero-data-retention-zdr)). | | **Browser** | 2 credits / browser minute | Interactive browser sandbox session, billed per minute. | | **Agent** | Dynamic | Autonomous web research agent. 5 daily runs free; usage-based pricing beyond that. 
| ### Additional credit costs for scrape options Certain scrape options add credits on top of the base cost per page: | Option | Additional Cost | Description | | ---------------------------- | -------------------- | ------------------------------------------------------------------------------------------------------------ | | PDF parsing | +1 credit / PDF page | Extract content from PDF documents | | JSON format (LLM extraction) | +4 credits / page | Use an LLM to extract structured JSON data from the page | | Enhanced Mode | +4 credits / page | Improved scraping for pages that are difficult to access | | Zero Data Retention (ZDR) | +1 credit / page | Ensures no data is persisted beyond the request (see [Scrape ZDR](/features/scrape#zero-data-retention-zdr)) | These modifiers stack. For example, scraping a page with both JSON format and Enhanced Mode costs **1 + 4 + 4 = 9 credits** per page. These same modifiers apply to the Crawl and Search endpoints since they use scrape internally for each page. ### When credits are charged For **batch scrape** and **crawl** jobs, credits are billed asynchronously as each page completes processing — not when the job is submitted. This means there can be a delay between submitting a job and seeing the full credit cost reflected on your account. If a batch contains many URLs or pages are queued during high-traffic periods, credits may continue to appear minutes or hours after submission. Polling or checking batch status does not consume credits. ### Tracking your usage You can monitor your credit usage in two ways: * **Dashboard**: View your current and historical usage at [firecrawl.dev/app](https://www.firecrawl.dev/app) * **API**: Use the [Credit Usage](/api-reference/endpoint/credit-usage) and [Credit Usage Historical](/api-reference/endpoint/credit-usage-historical) endpoints to programmatically check your usage We are actively working on improvements to make credit usage easier to understand. Stay tuned for updates. 
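The cost rules above can be captured in a small estimator. This is just a sketch of the published numbers for planning purposes, not an official calculator:

```python
import math

def scrape_page_credits(json_format=False, enhanced_mode=False,
                        zdr=False, pdf_pages=0):
    """1 base credit per page, plus the stacking modifiers listed above."""
    credits = 1
    credits += 4 if json_format else 0     # JSON format (LLM extraction)
    credits += 4 if enhanced_mode else 0   # Enhanced Mode
    credits += 1 if zdr else 0             # Zero Data Retention
    credits += pdf_pages                   # +1 credit per PDF page parsed
    return credits

def search_credits(num_results):
    """2 credits per 10 results, rounded up to the next block of 10."""
    return 2 * math.ceil(num_results / 10)

print(scrape_page_credits(json_format=True, enhanced_mode=True))  # → 9
print(search_credits(11))  # → 4
```

The same per-page function applies to each page of a crawl or to each search result that gets scraped.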
## Plans Firecrawl offers subscription-based monthly plans. There is no pure pay-as-you-go option, but auto-recharge (described below) provides flexible scaling on top of your base plan. ### Available plans | Plan | Monthly Credits | Concurrent Browsers | | ------------ | --------------- | ------------------: | | **Free** | 500 (one-time) | 2 | | **Hobby** | 3,000 | 5 | | **Standard** | 100,000 | 50 | | **Growth** | 500,000 | 100 | For high-volume needs beyond the Growth plan, Firecrawl offers **Scale** and **Enterprise** plans with 1M+ credits, 150+ concurrent browsers, dedicated support, SLAs, bulk discounts, zero-data retention, and SSO. Visit the [Enterprise page](https://www.firecrawl.dev/enterprise) for details. All paid plans are available with **monthly** or **yearly** billing. Yearly billing offers a discount compared to paying month-to-month. For current pricing on each plan, visit the [pricing page](https://www.firecrawl.dev/pricing). ### Concurrent browsers Concurrent browsers represent how many web pages Firecrawl can process for you simultaneously. Your plan determines this limit. If you exceed it, additional jobs wait in a queue until a slot opens. See [Rate Limits](/rate-limits) for full details on concurrency and API rate limits. ## Auto-Recharge If you occasionally need more credits than your plan includes, you can enable **auto-recharge** from the dashboard. When your remaining credits drop below a threshold, Firecrawl automatically purchases an additional credit pack and adds it to your balance. By default the threshold is **0 credits**, but you can customize it in your [billing settings](https://www.firecrawl.dev/app/settings?tab=billing) to trigger recharges earlier — for example, setting a higher threshold ensures you always have a buffer of credits available, which is useful if you run large crawls. Note that this check occurs when you make an API request — your balance will not be topped up until your next request triggers the recharge. 
* Auto-recharge packs are available on all paid plans * Pack sizes and prices vary by plan (visible on the [pricing page](https://www.firecrawl.dev/pricing)) * You can toggle auto-recharge on or off at any time * Auto-recharge is limited to **4 packs per month** * Credits from auto-recharge packs **do not reset monthly** — they persist across billing cycles and expire after **1 year** from purchase, unlike your monthly plan credits which reset each period. Auto-recharge is best for handling occasional spikes in usage. If you find yourself consistently exceeding your plan's allotment, upgrading to a higher plan will give you better value per credit. ## Coupons Firecrawl supports two types of coupons: * **Subscription coupons** apply a discount to your plan subscription (e.g. a percentage off your monthly or yearly price). These can **only** be applied during the Stripe checkout flow when you first subscribe to a paid plan or change plans. You cannot apply a subscription coupon after checkout has completed. * **Credit coupons** add bonus credits to your account. These can be redeemed from the **Billing** section of your dashboard at [firecrawl.dev/app/billing](https://www.firecrawl.dev/app/billing). Look for the coupon input field on the billing page to apply your code. Bonus credits from credit coupons are separate from your plan's monthly allotment and persist even if you upgrade or downgrade your plan. If you have a coupon code and are unsure which type it is, try applying it in the billing section of your dashboard first. If it is a subscription coupon, you will need to use it at the Stripe checkout page instead. ## Billing Cycle * **Monthly plans**: Credits reset on your monthly renewal date * **Yearly plans**: You are billed annually, but credits still reset each month on your virtual monthly renewal date * **Unused plan credits do not roll over** — your monthly allotment resets to the plan amount at the start of each billing period. 
Credits from auto-recharge packs are not tied to your billing cycle — they persist and expire **1 year** from the date of purchase. ## Upgrading and Downgrading * **Upgrades** take effect immediately. You are billed a prorated amount for the remainder of the current billing period, and your credit allotment and limits update right away. * **Downgrades** are scheduled to take effect at your next renewal date. You keep your current plan's credits and limits until then. ## What Happens When You Run Out of Credits If you exhaust your credit allotment and do not have auto-recharge enabled, API requests that consume credits will return an **HTTP 402 (Payment Required)** error. If you have auto-recharge enabled, usage will continue while recharge packs are purchased automatically — but if the monthly pack limit is reached or a recharge fails, your balance may go negative until the next billing cycle. To resume usage after a hard stop, you can: 1. Enable auto-recharge to automatically purchase more credits 2. Upgrade to a higher plan 3. Wait for your credits to reset at the next billing cycle ## Free Plan The Free plan provides a **one-time allotment of 500 credits** with no credit card required. These credits do not renew — once they are used up, you will need to upgrade to a paid plan to continue using Firecrawl. The Free plan also has lower rate limits and concurrency compared to paid plans (see [Rate Limits](/rate-limits)). ## FAQs **Plan credits** do not roll over — your monthly allotment resets at the start of each billing period. However, credits purchased through **auto-recharge packs** are not tied to your billing cycle and carry forward. Auto-recharge credits expire **1 year** from the date of purchase. Check the dashboard at [firecrawl.dev/app](https://www.firecrawl.dev/app), or call the [Credit Usage API endpoint](/api-reference/endpoint/credit-usage) programmatically. Credits are the billing unit for Firecrawl API calls. 
Tokens refer to LLM tokens used internally by endpoints like Extract and Agent. Each credit is equivalent to 15 tokens. For most users, you only need to think in terms of credits.

**Where do I apply my coupon code?**

It depends on the coupon type. **Credit coupons** are applied in the Billing section of your dashboard. **Subscription coupons** (discounts on your plan price) can only be applied at the Stripe checkout page when subscribing or changing plans.

**How do I get a custom or enterprise plan?**

Reach out to [help@firecrawl.dev](mailto:help@firecrawl.dev), or visit the [Enterprise page](https://www.firecrawl.dev/enterprise) to learn more about custom plans.

# Running Locally

Source: https://docs.firecrawl.dev/contributing/guide

Set up Firecrawl on your local machine for development and contribution.

This guide walks you through running the Firecrawl API server on your local machine. Follow these steps to set up the development environment, start the services, and send your first request.

If you're contributing, the process follows standard open-source conventions: fork the repo, make changes, run tests, and open a pull request. For questions or help getting started, reach out to [help@firecrawl.dev](mailto:help@firecrawl.dev) or [submit an issue](https://github.com/firecrawl/firecrawl/issues).

## Prerequisites

Install the following before proceeding:

| Dependency | Required | Install guide |
| ---------- | -------- | ------------------------------------------------------------------------------------- |
| Node.js | Yes | [nodejs.org](https://nodejs.org/en/learn/getting-started/how-to-install-nodejs) |
| pnpm (v9+) | Yes | [pnpm.io](https://pnpm.io/installation) |
| Redis | Yes | [redis.io](https://redis.io/docs/latest/operate/oss_and_stack/install/install-redis/) |
| PostgreSQL | Yes | Via Docker (see below) or installed directly |
| Docker | Optional | Required for the PostgreSQL container setup |

## Set up the database

You need a PostgreSQL database initialized with the schema at `apps/nuq-postgres/nuq.sql`.
The easiest approach is to use the Docker image inside `apps/nuq-postgres`. With Docker running, build and start the container: ```bash theme={null} docker build -t nuq-postgres apps/nuq-postgres ``` ```bash theme={null} docker run --name nuqdb \ -e POSTGRES_PASSWORD=postgres \ -p 5433:5432 \ -v nuq-data:/var/lib/postgresql/data \ -d nuq-postgres ``` ## Configure environment variables Copy the template to create your `.env` file in the `apps/api/` directory: ```bash theme={null} cp apps/api/.env.example apps/api/.env ``` For a minimal local setup without authentication or optional sub-services (PDF parsing, JS blocking, AI features), use the following configuration: ```bash apps/api/.env theme={null} # ===== Required ===== NUM_WORKERS_PER_QUEUE=8 PORT=3002 HOST=0.0.0.0 REDIS_URL=redis://localhost:6379 REDIS_RATE_LIMIT_URL=redis://localhost:6379 ## To turn on DB authentication, you need to set up supabase. USE_DB_AUTHENTICATION=false ## PostgreSQL connection for queuing — change if credentials, host, or DB differ NUQ_DATABASE_URL=postgres://postgres:postgres@localhost:5433/postgres # ===== Optional ===== # SUPABASE_ANON_TOKEN= # SUPABASE_URL= # SUPABASE_SERVICE_TOKEN= # TEST_API_KEY= # Set if you've configured authentication and want to test with a real API key # OPENAI_API_KEY= # Required for LLM-dependent features (image alt generation, etc.) # BULL_AUTH_KEY=@ # PLAYWRIGHT_MICROSERVICE_URL= # Set to run a Playwright fallback # LLAMAPARSE_API_KEY= # Set to parse PDFs with LlamaParse # SLACK_WEBHOOK_URL= # Set to send Slack server health status messages # POSTHOG_API_KEY= # Set to send PostHog events like job logs # POSTHOG_HOST= # Set to send PostHog events like job logs ``` ## Install dependencies From the `apps/api/` directory, install packages with pnpm: ```bash theme={null} cd apps/api pnpm install ``` ## Start the services You need three terminal sessions running simultaneously: Redis, the API server, and a terminal for sending requests. 
### Terminal 1 — Redis Start the Redis server from anywhere in the project: ```bash theme={null} redis-server ``` ### Terminal 2 — API server Navigate to `apps/api/` and start the service: ```bash theme={null} pnpm start ``` This starts the API server and the workers responsible for processing crawl jobs. If you plan to use the [LLM extract feature](https://github.com/firecrawl/firecrawl/pull/586/), export your OpenAI key first: `export OPENAI_API_KEY=sk-...` ### Terminal 3 — Send a test request Verify the server is running with a health check: ```bash theme={null} curl -X GET http://localhost:3002/test ``` This should return `Hello, world!`. To test the crawl endpoint: ```bash theme={null} curl -X POST http://localhost:3002/v1/crawl \ -H 'Content-Type: application/json' \ -d '{ "url": "https://mendable.ai" }' ``` ## Alternative: Docker Compose For a simpler setup, Docker Compose runs all services (Redis, API server, and workers) in a single command. 1. Make sure Docker and Docker Compose are installed. 2. Copy `.env.example` to `.env` in the `apps/api/` directory and configure as needed. 3. From the project root, run: ```bash theme={null} docker compose up ``` This starts all services automatically in the correct configuration. ## Running tests Run the test suite with: ```bash theme={null} npm run test:snips ``` # Open Source vs Cloud Source: https://docs.firecrawl.dev/contributing/open-source-or-cloud Understand the differences between Firecrawl's open-source and cloud offerings Firecrawl is open source under the [AGPL-3.0 license](https://github.com/firecrawl/firecrawl/blob/main/LICENSE). To deliver the best possible product, we also offer a hosted cloud version at [firecrawl.dev](https://firecrawl.dev) with additional features and managed infrastructure. ## Feature comparison The open-source version includes the core scraping and crawling engine. 
Firecrawl Cloud adds infrastructure-level capabilities, a management dashboard, and enterprise features on top of everything in the open-source edition. Firecrawl Cloud vs Open Source | Feature | Open Source | Cloud | | ------------------- | :---------: | :---: | | Scrape | ✔ | ✔ | | Crawl | ✔ | ✔ | | Map | ✔ | ✔ | | Search | ✔ | ✔ | | Batch scrape | ✔ | ✔ | | Extract | ✔ | ✔ | | JSON mode | ✔ | ✔ | | LLM-ready formats | ✔ | ✔ | | Change tracking | ✔ | ✔ | | SDKs | ✔ | ✔ | | Agent | ✘ | ✔ | | Browser sandbox | ✘ | ✔ | | Actions | ✘ | ✔ | | Enhanced proxies | ✘ | ✔ | | Proxy rotations | ✘ | ✔ | | Dashboard | ✘ | ✔ | | Enterprise features | ✘ | ✔ | ## Why both? The cloud solution allows us to continuously innovate and maintain a high-quality, sustainable service for all users. Revenue from Firecrawl Cloud funds ongoing development of the open-source project. * **Open source** — Run Firecrawl on your own infrastructure with full control over the core scraping engine. See [Running locally](/contributing/guide) and [Self-hosting](/contributing/self-host). * **Cloud** — Get started immediately with a managed service that includes enhanced proxies, a usage dashboard, and enterprise features. Sign up at [firecrawl.dev](https://firecrawl.dev). # Self-hosting Source: https://docs.firecrawl.dev/contributing/self-host Learn how to self-host Firecrawl to run on your own and contribute to the project. #### Contributor? Welcome to [Firecrawl](https://firecrawl.dev) 🔥! Here are some instructions on how to get the project locally so you can run it on your own and contribute. If you're contributing, note that the process is similar to other open-source repos, i.e., fork Firecrawl, make changes, run tests, PR. If you have any questions or would like help getting on board, join our Discord community [here](https://discord.gg/firecrawl) for more information or submit an issue on Github [here](https://github.com/firecrawl/firecrawl/issues/new/choose)! 
## Self-hosting Firecrawl

Refer to [SELF\_HOST.md](https://github.com/firecrawl/firecrawl/blob/main/SELF_HOST.md) for instructions on how to run it locally.

## Why?

Self-hosting Firecrawl is particularly beneficial for organizations with stringent security policies that require data to remain within controlled environments. Here are some key reasons to consider self-hosting:

* **Enhanced Security and Compliance:** By self-hosting, you ensure that all data handling and processing complies with internal and external regulations, keeping sensitive information within your secure infrastructure. Note that Firecrawl is a Mendable product and holds SOC 2 Type 2 certification, which means the platform adheres to high industry standards for managing data security.
* **Customizable Services:** Self-hosting allows you to tailor the services, such as the Playwright service, to meet specific needs or handle particular use cases that may not be supported by the standard cloud offering.
* **Learning and Community Contribution:** By setting up and maintaining your own instance, you gain a deeper understanding of how Firecrawl works, which can also lead to more meaningful contributions to the project.

### Considerations

However, there are some limitations and additional responsibilities to be aware of:

1. **Limited Access to Fire-engine:** Currently, self-hosted instances of Firecrawl do not have access to Fire-engine, which includes advanced features for handling IP blocks, robot detection mechanisms, and more. This means that while you can manage basic scraping tasks, more complex scenarios might require additional configuration or might not be supported.
2. **Manual Configuration Required:** If you need to use scraping methods beyond the basic fetch and Playwright options, you will need to manually configure these in the `.env` file. This requires a deeper understanding of the technologies and might involve more setup time.
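Because the `/agent` and `/browser` endpoints are cloud-only, client code that must run against both cloud and self-hosted deployments may want a simple guard before dispatching requests. This is an illustrative sketch, not part of any SDK:

```python
# Illustrative only -- mirrors the cloud vs. self-hosting capability split.
# /agent and /browser are cloud-only; everything else works self-hosted.
CLOUD_ONLY_ENDPOINTS = {"/agent", "/browser"}

def supported_on_self_host(endpoint: str) -> bool:
    """Return True if the endpoint works on a self-hosted instance."""
    return endpoint not in CLOUD_ONLY_ENDPOINTS
```

A client could call this before routing a request to a self-hosted deployment and fall back to the cloud API (or raise) for the cloud-only endpoints.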
| Capability | Cloud | Self-hosting |
| --- | --- | --- |
| All API endpoints supported | Yes | Not always; `/agent` and `/browser` are not supported in self-hosting |
| Screenshot support | Yes | Yes, when the Playwright service is running |
| Local LLMs (e.g., Ollama) | Not supported | Supported via `OLLAMA_BASE_URL` (experimental) |

Self-hosting Firecrawl is ideal for those who need full control over their scraping and data processing environments, but it comes with the trade-off of additional maintenance and configuration effort.

## Steps

1. First, install the dependencies

* Docker [instructions](https://docs.docker.com/get-docker/)

2. Set environment variables

Create a `.env` in the root directory. You can copy the template from `apps/api/.env.example`.

To start, we won't set up authentication or any optional sub-services (PDF parsing, JS blocking support, AI features).

```bash .env theme={null}
# .env

# ===== Required ENVS ======
PORT=3002
HOST=0.0.0.0

# Note: PORT is used by both the main API server and worker liveness check endpoint

# To turn on DB authentication, you need to set up Supabase.
USE_DB_AUTHENTICATION=false

# ===== Optional ENVS ======

## === AI features (JSON format on scrape, /extract API) ===
# Provide your OpenAI API key here to enable AI features
# OPENAI_API_KEY=

# Experimental: Use Ollama
# OLLAMA_BASE_URL=http://localhost:11434/api
# MODEL_NAME=deepseek-r1:7b
# MODEL_EMBEDDING_NAME=nomic-embed-text

# Experimental: Use any OpenAI-compatible API
# OPENAI_BASE_URL=https://example.com/v1
# OPENAI_API_KEY=

## === Proxy ===
# PROXY_SERVER can be a full URL (e.g. http://0.1.2.3:1234) or just an IP and port combo (e.g. 0.1.2.3:1234)
# Do not uncomment PROXY_USERNAME and PROXY_PASSWORD if your proxy is unauthenticated
# PROXY_SERVER=
# PROXY_USERNAME=
# PROXY_PASSWORD=

## === /search API ===
# You can specify a SearXNG server with the JSON format enabled, if you'd like to use that instead of direct Google.
# You can also customize the engines and categories parameters, but the defaults should also work just fine. # SEARXNG_ENDPOINT=http://your.searxng.server # SEARXNG_ENGINES= # SEARXNG_CATEGORIES= ## === Other === # Supabase Setup (used to support DB authentication, advanced logging, etc.) # SUPABASE_ANON_TOKEN= # SUPABASE_URL= # SUPABASE_SERVICE_TOKEN= # Use if you've set up authentication and want to test with a real API key # TEST_API_KEY= # This key lets you access the queue admin panel. Change this if your deployment is publicly accessible. BULL_AUTH_KEY=CHANGEME # This is now autoconfigured by the docker-compose.yaml. You shouldn't need to set it. # PLAYWRIGHT_MICROSERVICE_URL=http://playwright-service:3000/scrape # REDIS_URL=redis://redis:6379 # REDIS_RATE_LIMIT_URL=redis://redis:6379 # Set if you have a llamaparse key you'd like to use to parse pdfs # LLAMAPARSE_API_KEY= # Set if you'd like to send server health status messages to Slack # SLACK_WEBHOOK_URL= # Set if you'd like to send posthog events like job logs # POSTHOG_API_KEY= # POSTHOG_HOST= ## === System Resource Configuration === # Maximum CPU usage threshold (0.0-1.0). Worker will reject new jobs when CPU usage exceeds this value. # Default: 0.8 (80%) # MAX_CPU=0.8 # Maximum RAM usage threshold (0.0-1.0). Worker will reject new jobs when memory usage exceeds this value. # Default: 0.8 (80%) # MAX_RAM=0.8 # Set if you'd like to allow local webhooks to be sent to your self-hosted instance # ALLOW_LOCAL_WEBHOOKS=true ``` The following AI features require an LLM provider configured (e.g., `OPENAI_API_KEY` or alternatives in the AI features section above): * JSON format on scrape * /extract API * Summary format * Branding format * Change tracking format 3. 
*(Optional) Running with TypeScript Playwright Service* * Update the `docker-compose.yml` file to change the Playwright service: ```plaintext theme={null} build: apps/playwright-service ``` TO ```plaintext theme={null} build: apps/playwright-service-ts ``` * Set the `PLAYWRIGHT_MICROSERVICE_URL` in your `.env` file: ```plaintext theme={null} PLAYWRIGHT_MICROSERVICE_URL=http://localhost:3000/scrape ``` * Don't forget to set the proxy server in your `.env` file as needed. 4. Build and run the Docker containers: ```bash theme={null} docker compose build docker compose up ``` This will run a local instance of Firecrawl which can be accessed at `http://localhost:3002`. You should be able to see the Bull Queue Manager UI on `http://localhost:3002/admin/{BULL_AUTH_KEY}/queues`. 5. *(Optional)* Test the API If you’d like to test the crawl endpoint, you can run this: ```bash theme={null} curl -X POST http://localhost:3002/v2/crawl \ -H 'Content-Type: application/json' \ -d '{ "url": "https://docs.firecrawl.dev" }' ``` ## Troubleshooting This section provides solutions to common issues you might encounter while setting up or running your self-hosted instance of Firecrawl. ### Supabase client is not configured **Symptom:** ```bash theme={null} [YYYY-MM-DDTHH:MM:SS.SSSz]ERROR - Attempted to access Supabase client when it's not configured. [YYYY-MM-DDTHH:MM:SS.SSSz]ERROR - Error inserting scrape event: Error: Supabase client is not configured. ``` **Explanation:** This error occurs because the Supabase client setup is not completed. You should be able to scrape and crawl with no problems. Right now it's not possible to configure Supabase in self-hosted instances. ### You're bypassing authentication **Symptom:** ```bash theme={null} [YYYY-MM-DDTHH:MM:SS.SSSz]WARN - You're bypassing authentication ``` **Explanation:** This error occurs because the Supabase client setup is not completed. You should be able to scrape and crawl with no problems. 
Right now it's not possible to configure Supabase in self-hosted instances. ### Docker containers fail to start **Symptom:** Docker containers exit unexpectedly or fail to start. **Solution:** Check the Docker logs for any error messages using the command: ```bash theme={null} docker logs [container_name] ``` * Ensure all required environment variables are set correctly in the .env file. * Verify that all Docker services defined in docker-compose.yml are correctly configured and the necessary images are available. ### Connection issues with Redis **Symptom:** Errors related to connecting to Redis, such as timeouts or "Connection refused". **Solution:** * Ensure that the Redis service is up and running in your Docker environment. * Verify that the REDIS\_URL and REDIS\_RATE\_LIMIT\_URL in your .env file point to the correct Redis instance. * Check network settings and firewall rules that may block the connection to the Redis port. ### API endpoint does not respond **Symptom:** API requests to the Firecrawl instance timeout or return no response. **Solution:** * Ensure that the Firecrawl service is running by checking the Docker container status. * Verify that the PORT and HOST settings in your .env file are correct and that no other service is using the same port. * Check the network configuration to ensure that the host is accessible from the client making the API request. By addressing these common issues, you can ensure a smoother setup and operation of your self-hosted Firecrawl instance. ## Install Firecrawl on a Kubernetes Cluster (Simple Version) Read the [examples/kubernetes-cluster-install/README.md](https://github.com/firecrawl/firecrawl/tree/main/examples/kubernetes/cluster-install#readme) for instructions on how to install Firecrawl on a Kubernetes Cluster. 
# Overview Source: https://docs.firecrawl.dev/dashboard Overview of the Firecrawl dashboard and its key features The [Firecrawl dashboard](https://www.firecrawl.dev/app) is where you manage your account, test endpoints, and monitor usage. Below is a quick tour of each section. ## Playground The playground lets you try Firecrawl endpoints directly in the browser before integrating them into your code. * **[Scrape](https://www.firecrawl.dev/app/playground?endpoint=scrape)** — Extract content from a single page. * **[Search](https://www.firecrawl.dev/app/playground?endpoint=search)** — Search the web and get scraped results. * **[Crawl](https://www.firecrawl.dev/app/playground?endpoint=crawl)** — Crawl an entire website and extract content from every page. * **[Map](https://www.firecrawl.dev/app/playground?endpoint=map)** — Discover all URLs on a website. ## Browser [Interact with the web](https://www.firecrawl.dev/app/browser) through a live browser session. You can create persistent profiles, run actions, and take screenshots — useful for pages that require authentication or complex interaction. ## Agent The [Agent](https://www.firecrawl.dev/app/agent) is an AI-powered research tool that can autonomously browse the web, follow links, and extract structured data based on a prompt. ## Activity Logs [Activity Logs](https://www.firecrawl.dev/app/logs) show a history of your recent API requests, including status, duration, and credits consumed. ## Usage The [Usage](https://www.firecrawl.dev/app/usage) page shows your credit consumption over time and current billing-cycle totals. ## API Keys From the [API Keys](https://www.firecrawl.dev/app/api-keys) page you can create, view, and revoke API keys for your team. Your account must always have at least one active API key, so to revoke a key you will need to create a new one first. 
## Settings The [Settings](https://www.firecrawl.dev/app/settings) page has three tabs: * **Team** — Invite members, assign roles, and manage your team. See [Team management & roles](#team-management--roles) below. * **Billing** — View your current plan, invoices, auto-recharge settings, and apply coupons. See also [Billing](/billing). * **Advanced** — Webhook signing secret and team deletion. *** ## Team management & roles Firecrawl lets you invite teammates to collaborate under a shared account. From the **Team** tab in Settings you can invite members, assign roles, and manage your team. ### Roles Every team member is assigned one of two roles: **Admin** or **Member**. You choose the role when sending an invitation. | Capability | Admin | Member | | -------------------------------------------- | :---: | :----: | | **General** | | | | Use the team's API keys and shared resources | ✓ | ✓ | | **Team management** | | | | View the team member list | ✓ | ✓ | | Leave the team | ✓ | ✓ | | Invite new team members | ✓ | ✗ | | Remove team members | ✓ | ✗ | | Change a member's role | ✓ | ✗ | | Revoke pending invitations | ✓ | ✗ | | Edit the team name | ✓ | ✗ | | **Billing** | | | | View invoices and usage | ✓ | ✓ | | Apply credit coupons | ✓ | ✓ | | Manage subscription and billing portal | ✓ | ✗ | | **Settings** | | | | View the webhook signing secret | ✓ | ✓ | | Regenerate the webhook signing secret | ✓ | ✗ | | Delete the team | ✓ | ✗ | In short, **Admins** have full control over team management, billing, and settings, while **Members** can use the team's resources, view usage, and apply coupons but cannot modify the team or subscription. # Scraping Amazon Source: https://docs.firecrawl.dev/developer-guides/common-sites/amazon Extract product data, prices, and reviews from Amazon using Firecrawl Amazon is one of the most scraped e-commerce sites. 
This guide shows you how to effectively extract product data, pricing, reviews, and search results using Firecrawl's powerful features. ## Setup ```bash theme={null} npm install @mendable/firecrawl-js zod ``` ## Overview When scraping Amazon, you'll typically want to: * Extract product information (title, price, availability) * Get customer reviews and ratings * Monitor price changes * Search for products programmatically * Track competitor listings ## Scrape with JSON Mode Extract structured product data using Zod schemas. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; import { z } from 'zod'; // Define Zod schema const ProductSchema = z.object({ title: z.string(), price: z.string(), rating: z.number(), availability: z.string(), features: z.array(z.string()) }); const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const result = await firecrawl.scrape('https://www.amazon.com/dp/B0DZZWMB2L', { formats: [{ type: 'json', schema: z.toJSONSchema(ProductSchema) }], }); // Parse and validate with Zod const jsonData = typeof result.json === 'string' ? JSON.parse(result.json) : result.json; const validated = ProductSchema.parse(jsonData); console.log('✅ Validated product data:'); console.log(validated); ``` ## Search Find products on Amazon. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const searchResult = await firecrawl.search('gaming laptop site:amazon.com', { limit: 10, sources: [{ type: 'web' }], // { type: 'news' }, { type: 'images' } scrapeOptions: { formats: ['markdown'] } }); console.log(searchResult); ``` ## Scrape Scrape a single Amazon product page. 
```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const result = await firecrawl.scrape('https://www.amazon.com/ASUS-ROG-Strix-Gaming-Laptop/dp/B0DZZWMB2L', { formats: ['markdown'], // i.e. html, links, etc. onlyMainContent: true }); console.log(result); ``` ## Map Discover all available URLs on Amazon product or category pages. Note: Map returns URLs only, without content. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const mapResult = await firecrawl.map('https://www.amazon.com/Best-Sellers-Electronics/zgbs/electronics'); console.log(mapResult.links); // Returns array of URLs without content ``` ## Crawl Crawl multiple pages from Amazon category or search results. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const crawlResult = await firecrawl.crawl('https://www.amazon.com/s?k=mechanical+keyboards', { limit: 10, scrapeOptions: { formats: ['markdown'] } }); console.log(crawlResult.data); ``` ## Batch Scrape Scrape multiple Amazon product URLs simultaneously. 
```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); // Wait for completion const job = await firecrawl.batchScrape([ 'https://www.amazon.com/ASUS-ROG-Strix-Gaming-Laptop/dp/B0DZZWMB2L', 'https://www.amazon.com/Razer-Blade-Gaming-Laptop-Lightweight/dp/B0FP47DNFQ', 'https://www.amazon.com/HP-2025-Omen-Gaming-Laptop/dp/B0FL4RMGSH'], { options: { formats: ['markdown'] }, pollInterval: 2, timeout: 120 } ); console.log(job.status, job.completed, job.total); console.log(job); ``` # Scraping Etsy Source: https://docs.firecrawl.dev/developer-guides/common-sites/etsy Extract handmade products, shop data, and pricing from Etsy marketplace Etsy is a global marketplace for unique and creative goods. This guide shows you how to extract product listings, shop information, reviews, and trending items using Firecrawl. ## Setup ```bash theme={null} npm install @mendable/firecrawl-js zod ``` ## Overview When scraping Etsy, you'll typically want to: * Extract product listings and variations * Get shop information and ratings * Monitor trending items and categories * Track pricing and sales data * Extract customer reviews ## Scrape with JSON Mode Extract structured listing data using Zod schemas. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; import { z } from 'zod'; // Define Zod schema const ListingSchema = z.object({ title: z.string(), price: z.string(), shopName: z.string(), rating: z.number() }); const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const result = await firecrawl.scrape('https://www.etsy.com/listing/1844315896/handmade-925-sterling-silver-jewelry-set', { formats: [{ type: 'json', schema: z.toJSONSchema(ListingSchema) }], }); // Parse and validate with Zod const jsonData = typeof result.json === 'string' ? 
JSON.parse(result.json) : result.json; const validated = ListingSchema.parse(jsonData); console.log('✅ Validated listing data:'); console.log(validated); ``` ## Search Find products on Etsy marketplace. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const searchResult = await firecrawl.search('handmade jewelry site:etsy.com', { limit: 10, sources: [{ type: 'web' }], // { type: 'news' }, { type: 'images' } scrapeOptions: { formats: ['markdown'] } }); console.log(searchResult); ``` ## Scrape Scrape a single Etsy product listing. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const result = await firecrawl.scrape('https://www.etsy.com/listing/1844315896/handmade-925-sterling-silver-jewelry-set', { formats: ['markdown'], // i.e. html, links, etc. onlyMainContent: true }); console.log(result); ``` ## Map Discover all available URLs in an Etsy shop or category. Note: Map returns URLs only, without content. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const mapResult = await firecrawl.map('https://www.etsy.com/shop/YourShopName'); console.log(mapResult.links); // Returns array of URLs without content ``` ## Crawl Crawl multiple pages from Etsy shop or category. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const crawlResult = await firecrawl.crawl('https://www.etsy.com/c/jewelry', { limit: 10, scrapeOptions: { formats: ['markdown'] } }); console.log(crawlResult.data); ``` ## Batch Scrape Scrape multiple Etsy listing URLs simultaneously. 
```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); // Wait for completion const job = await firecrawl.batchScrape([ 'https://www.etsy.com/listing/1844315896/handmade-925-sterling-silver-jewelry-set', 'https://www.etsy.com/market/handmade_jewelry', 'https://www.etsy.com/market/jewelry_handmade'], { options: { formats: ['markdown'] }, pollInterval: 2, timeout: 120 } ); console.log(job.status, job.completed, job.total); console.log(job); ``` # Scraping GitHub Source: https://docs.firecrawl.dev/developer-guides/common-sites/github Learn how to scrape GitHub using Firecrawl's core features Learn how to use Firecrawl's core features to scrape GitHub repositories, issues, and documentation. ## Setup ```bash theme={null} npm install @mendable/firecrawl-js zod ``` ## Scrape with JSON Mode Extract structured data from repositories using Zod schemas. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; import { z } from 'zod'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const result = await firecrawl.scrape('https://github.com/firecrawl/firecrawl', { formats: [{ type: 'json', schema: z.object({ name: z.string(), description: z.string(), stars: z.number(), forks: z.number(), language: z.string(), topics: z.array(z.string()) }) }] }); console.log(result.json); ``` ## Search Find repositories, issues, or documentation on GitHub. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const searchResult = await firecrawl.search('machine learning site:github.com', { limit: 10, sources: [{ type: 'web' }], // { type: 'news' }, { type: 'images' } scrapeOptions: { formats: ['markdown'] } }); console.log(searchResult); ``` ## Scrape Scrape a single GitHub page - repository, issue, or file. 
```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const result = await firecrawl.scrape('https://github.com/firecrawl/firecrawl', { formats: ['markdown'] // i.e. html, links, etc. }); console.log(result); ``` ## Map Discover all available URLs in a repository or documentation site. Note: Map returns URLs only, without content. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const mapResult = await firecrawl.map('https://github.com/vercel/next.js/tree/canary/docs'); console.log(mapResult.links); // Returns array of URLs without content ``` ## Crawl Crawl multiple pages from a repository or documentation. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const crawlResult = await firecrawl.crawl('https://github.com/facebook/react/wiki', { limit: 10, scrapeOptions: { formats: ['markdown'] } }); console.log(crawlResult.data); ``` ## Batch Scrape Scrape multiple GitHub URLs simultaneously. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); // Wait for completion const job = await firecrawl.batchScrape([ 'https://github.com/vercel/next.js', 'https://github.com/facebook/react', 'https://github.com/microsoft/typescript'], { options: { formats: ['markdown'] }, pollInterval: 2, timeout: 120 } ); console.log(job.status, job.completed, job.total); console.log(job); ``` ## Batch Scrape with JSON Mode Extract structured data from multiple repositories at once. 
```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; import { z } from 'zod'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); // Wait for completion const job = await firecrawl.batchScrape([ 'https://github.com/vercel/next.js', 'https://github.com/facebook/react'], { options: { formats: [{ type: 'json', schema: z.object({ name: z.string(), description: z.string(), stars: z.number(), language: z.string() }) }] }, pollInterval: 2, timeout: 120 } ); console.log(job.status, job.completed, job.total); console.log(job); ``` # Scraping Wikipedia Source: https://docs.firecrawl.dev/developer-guides/common-sites/wikipedia Extract articles, infoboxes, and build knowledge graphs from Wikipedia Learn how to effectively scrape Wikipedia for research, knowledge extraction, and building AI applications. ## Setup ```bash theme={null} npm install @mendable/firecrawl-js zod ``` ## Use Cases * Research automation and fact-checking * Building knowledge graphs * Multi-language content extraction * Educational content aggregation * Entity information extraction ## Scrape with JSON Mode Extract structured data from Wikipedia articles using Zod schemas. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; import { z } from 'zod'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const result = await firecrawl.scrape('https://en.wikipedia.org/wiki/JavaScript', { formats: [{ type: 'json', schema: z.object({ name: z.string(), creator: z.string(), firstAppeared: z.string(), typingDiscipline: z.string(), website: z.string() }) }] }); console.log(result.json); ``` ## Search Find articles on Wikipedia. 
```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const searchResult = await firecrawl.search('quantum computing site:en.wikipedia.org', { limit: 10, sources: [{ type: 'web' }], // { type: 'news' }, { type: 'images' } scrapeOptions: { formats: ['markdown'] } }); console.log(searchResult); ``` ## Scrape Scrape a single Wikipedia article. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const result = await firecrawl.scrape('https://en.wikipedia.org/wiki/Artificial_intelligence', { formats: ['markdown'], // e.g. html, links, etc. onlyMainContent: true }); console.log(result); ``` ## Map Discover all available URLs in a Wikipedia portal or category. Note: Map returns URLs only, without content. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const mapResult = await firecrawl.map('https://en.wikipedia.org/wiki/Portal:Computer_science'); console.log(mapResult.links); // Returns array of URLs without content ``` ## Crawl Crawl multiple pages from Wikipedia portals or categories. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const crawlResult = await firecrawl.crawl('https://en.wikipedia.org/wiki/Portal:Artificial_intelligence', { limit: 10, scrapeOptions: { formats: ['markdown'] } }); console.log(crawlResult.data); ``` ## Batch Scrape Scrape multiple Wikipedia URLs simultaneously. 
```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); // Wait for completion const job = await firecrawl.batchScrape([ 'https://en.wikipedia.org/wiki/Machine_learning', 'https://en.wikipedia.org/wiki/Artificial_intelligence', 'https://en.wikipedia.org/wiki/Deep_learning'], { options: { formats: ['markdown'] }, pollInterval: 2, timeout: 120 } ); console.log(job.status, job.completed, job.total); console.log(job); ``` # Building an AI Research Assistant with Firecrawl and AI SDK Source: https://docs.firecrawl.dev/developer-guides/cookbooks/ai-research-assistant-cookbook Build a complete AI-powered research assistant with web scraping and search capabilities Build a complete AI-powered research assistant that can scrape websites and search the web to answer questions. The assistant automatically decides when to use web scraping or search tools to gather information, then provides comprehensive answers based on collected data. AI research assistant chatbot interface showing real-time web scraping with Firecrawl and conversational responses powered by OpenAI ## What You'll Build An AI chat interface where users can ask questions about any topic. The AI assistant automatically decides when to use web scraping or search tools to gather information, then provides comprehensive answers based on the data it collects. 
## Prerequisites * Node.js 18 or later installed * An OpenAI API key from [platform.openai.com](https://platform.openai.com) * A Firecrawl API key from [firecrawl.dev](https://firecrawl.dev) * Basic knowledge of React and Next.js Start by creating a fresh Next.js application and navigating into the project directory: ```bash theme={null} npx create-next-app@latest ai-sdk-firecrawl && cd ai-sdk-firecrawl ``` When prompted, select the following options: * TypeScript: Yes * ESLint: Yes * Tailwind CSS: Yes * App Router: Yes * Use `src/` directory: No * Import alias: Yes (@/\*) ### Install AI SDK Packages The AI SDK is a TypeScript toolkit that provides a unified API for working with different LLM providers: ```bash theme={null} npm i ai @ai-sdk/react zod ``` These packages provide: * `ai`: Core SDK with streaming, tool calling, and response handling * `@ai-sdk/react`: React hooks like `useChat` for building chat interfaces * `zod`: Schema validation for tool inputs Learn more at [ai-sdk.dev/docs](https://ai-sdk.dev/docs). ### Install AI Elements AI Elements provides pre-built UI components for AI applications. Run the following command to scaffold all the necessary components: ```bash theme={null} npx ai-elements@latest ``` This sets up AI Elements in your project, including conversation components, message displays, prompt inputs, and tool call visualizations. Documentation: [ai-sdk.dev/elements/overview](https://ai-sdk.dev/elements/overview). ### Install OpenAI Provider Install the OpenAI provider to connect with OpenAI's models: ```bash theme={null} npm install @ai-sdk/openai ``` Create the main page at `app/page.tsx` and copy the code from the Code tab below. This will be the chat interface where users interact with the AI assistant. 
```typescript app/page.tsx theme={null} "use client"; import { Conversation, ConversationContent, ConversationScrollButton, } from "@/components/ai-elements/conversation"; import { PromptInput, PromptInputActionAddAttachments, PromptInputActionMenu, PromptInputActionMenuContent, PromptInputActionMenuTrigger, PromptInputAttachment, PromptInputAttachments, PromptInputBody, PromptInputButton, PromptInputHeader, type PromptInputMessage, PromptInputSelect, PromptInputSelectContent, PromptInputSelectItem, PromptInputSelectTrigger, PromptInputSelectValue, PromptInputSubmit, PromptInputTextarea, PromptInputFooter, PromptInputTools, } from "@/components/ai-elements/prompt-input"; import { MessageResponse, Message, MessageContent, MessageActions, MessageAction, } from "@/components/ai-elements/message"; import { Fragment, useState } from "react"; import { useChat } from "@ai-sdk/react"; import type { ToolUIPart } from "ai"; import { Tool, ToolContent, ToolHeader, ToolInput, ToolOutput, } from "@/components/ai-elements/tool"; import { CopyIcon, GlobeIcon, RefreshCcwIcon } from "lucide-react"; import { Source, Sources, SourcesContent, SourcesTrigger, } from "@/components/ai-elements/sources"; import { Reasoning, ReasoningContent, ReasoningTrigger, } from "@/components/ai-elements/reasoning"; import { Loader } from "@/components/ai-elements/loader"; const models = [ { name: "GPT 5 Mini (Thinking)", value: "gpt-5-mini", }, { name: "GPT 4o Mini", value: "gpt-4o-mini", }, ]; const ChatBotDemo = () => { const [input, setInput] = useState(""); const [model, setModel] = useState(models[0].value); const [webSearch, setWebSearch] = useState(false); const { messages, sendMessage, status, regenerate } = useChat(); const handleSubmit = (message: PromptInputMessage) => { const hasText = Boolean(message.text); const hasAttachments = 
Boolean(message.files?.length); if (!(hasText || hasAttachments)) { return; } sendMessage( { text: message.text || "Sent with attachments", files: message.files, }, { body: { model: model, webSearch: webSearch, }, } ); setInput(""); }; return (

{messages.map((message) => (
{message.role === "assistant" && message.parts.filter((part) => part.type === "source-url") .length > 0 && ( part.type === "source-url" ).length } /> {message.parts .filter((part) => part.type === "source-url") .map((part, i) => ( ))} )} {message.parts.map((part, i) => { switch (part.type) { case "text": return ( {part.text} {message.role === "assistant" && i === messages.length - 1 && ( regenerate()} label="Retry" > navigator.clipboard.writeText(part.text) } label="Copy" > )} ); case "reasoning": return ( {part.text} ); default: { if (part.type.startsWith("tool-")) { const toolPart = part as ToolUIPart; return ( ); } return null; } } })}
))} {status === "submitted" && }
{(attachment) => } setInput(e.target.value)} value={input} /> setWebSearch(!webSearch)} > Search { setModel(value); }} value={model} > {models.map((model) => ( {model.name} ))}
); }; export default ChatBotDemo; ``` ### Understanding the Frontend The frontend uses AI Elements components to provide a complete chat interface: **Key Features:** * **Conversation Display**: The `Conversation` component automatically handles message scrolling and display * **Message Rendering**: Each message part is rendered based on its type (text, reasoning, tool calls) * **Tool Visualization**: Tool calls are displayed with collapsible sections showing inputs and outputs * **Interactive Controls**: Users can toggle web search, select models, and attach files * **Message Actions**: Copy and retry actions for assistant messages To ensure the markdown from the LLM is correctly rendered, add the following import to your `app/globals.css` file: ```css theme={null} @source "../node_modules/streamdown/dist/index.js"; ``` This imports the necessary styles for rendering markdown content in the message responses. Create the chat API endpoint at `app/api/chat/route.ts`. This route will handle incoming messages and stream responses from the AI. 
```typescript theme={null} import { streamText, UIMessage, convertToModelMessages } from "ai"; import { createOpenAI } from "@ai-sdk/openai"; const openai = createOpenAI({ apiKey: process.env.OPENAI_API_KEY!, }); // Allow streaming responses up to 5 minutes export const maxDuration = 300; export async function POST(req: Request) { const { messages, model, webSearch, }: { messages: UIMessage[]; model: string; webSearch: boolean; } = await req.json(); const result = streamText({ model: openai(model), messages: convertToModelMessages(messages), system: "You are a helpful assistant that can answer questions and help with tasks.", }); // send sources and reasoning back to the client return result.toUIMessageStreamResponse({ sendSources: true, sendReasoning: true, }); } ``` This basic route: * Receives messages from the frontend * Uses the OpenAI model selected by the user * Streams responses back to the client * Doesn't include tools yet - we'll add those next Create a `.env.local` file in your project root: ```bash theme={null} touch .env.local ``` Add your OpenAI API key: ```env theme={null} OPENAI_API_KEY=sk-your-openai-api-key ``` The `OPENAI_API_KEY` is required for the AI model to function. Now you can test the AI SDK chatbot without Firecrawl integration. Start the development server: ```bash theme={null} npm run dev ``` Open [localhost:3000](http://localhost:3000) in your browser and test the basic chat functionality. The assistant should respond to messages, but won't have web scraping or search capabilities yet. Basic AI chatbot without web scraping capabilities Now let's enhance the assistant with web scraping and search capabilities using Firecrawl. 
### Install Firecrawl SDK Firecrawl converts websites into LLM-ready formats with scraping and search capabilities: ```bash theme={null} npm i @mendable/firecrawl-js ``` ### Create the Tools File Create a `lib` folder and add a `tools.ts` file inside it: ```bash theme={null} mkdir lib && touch lib/tools.ts ``` Add the following code to define the web scraping and search tools: ```typescript lib/tools.ts theme={null} import FirecrawlApp from "@mendable/firecrawl-js"; import { tool } from "ai"; import { z } from "zod"; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); export const scrapeWebsiteTool = tool({ description: 'Scrape content from any website URL', inputSchema: z.object({ url: z.string().url().describe('The URL to scrape') }), execute: async ({ url }) => { console.log('Scraping:', url); const result = await firecrawl.scrape(url, { formats: ['markdown'], onlyMainContent: true, timeout: 30000 }); console.log('Scraped content preview:', result.markdown?.slice(0, 200) + '...'); return { content: result.markdown }; } }); export const searchWebTool = tool({ description: 'Search the web using Firecrawl', inputSchema: z.object({ query: z.string().describe('The search query'), limit: z.number().optional().describe('Number of results'), location: z.string().optional().describe('Location for localized results'), tbs: z.string().optional().describe('Time filter (qdr:h, qdr:d, qdr:w, qdr:m, qdr:y)'), sources: z.array(z.enum(['web', 'news', 'images'])).optional().describe('Result types'), categories: z.array(z.enum(['github', 'research', 'pdf'])).optional().describe('Filter categories'), }), execute: async ({ query, limit, location, tbs, sources, categories }) => { console.log('Searching:', query); const response = await firecrawl.search(query, { ...(limit && { limit }), ...(location && { location }), ...(tbs && { tbs }), ...(sources && { sources }), ...(categories && { categories }), }) as { web?: Array<{ title?: string; url?: string; 
description?: string }> }; const results = (response.web || []).map((item) => ({ title: item.title || item.url || 'Untitled', url: item.url || '', description: item.description || '', })); console.log('Search results:', results.length); return { results }; }, }); ``` ### Understanding the Tools **Scrape Website Tool:** * Accepts a URL as input (validated by Zod schema) * Uses Firecrawl's `scrape` method to fetch the page as markdown * Extracts only the main content to reduce token usage * Returns the scraped content for the AI to analyze **Search Web Tool:** * Accepts a search query with optional filters * Uses Firecrawl's `search` method to find relevant web pages * Supports advanced filters like location, time range, and content categories * Returns structured results with titles, URLs, and descriptions Learn more about tools: [ai-sdk.dev/docs/foundations/tools](https://ai-sdk.dev/docs/foundations/tools). Now update your `app/api/chat/route.ts` to include the Firecrawl tools we just created. ```typescript theme={null} import { streamText, UIMessage, stepCountIs, convertToModelMessages } from "ai"; import { createOpenAI } from "@ai-sdk/openai"; import { scrapeWebsiteTool, searchWebTool } from "@/lib/tools"; const openai = createOpenAI({ apiKey: process.env.OPENAI_API_KEY!, }); export const maxDuration = 300; export async function POST(req: Request) { const { messages, model, webSearch, }: { messages: UIMessage[]; model: string; webSearch: boolean; } = await req.json(); const result = streamText({ model: openai(model), messages: convertToModelMessages(messages), system: "You are a helpful assistant that can answer questions and help with tasks.", // Add the Firecrawl tools here tools: { scrapeWebsite: scrapeWebsiteTool, searchWeb: searchWebTool, }, stopWhen: stepCountIs(5), toolChoice: webSearch ? 
"auto" : "none", }); return result.toUIMessageStreamResponse({ sendSources: true, sendReasoning: true, }); } ``` The key changes from the basic route: * Import `stepCountIs` from the AI SDK * Import the Firecrawl tools from `@/lib/tools` * Add the `tools` object with both `scrapeWebsite` and `searchWeb` tools * Add `stopWhen: stepCountIs(5)` to limit execution steps * Set `toolChoice` to "auto" when web search is enabled, "none" otherwise Learn more about `streamText`: [ai-sdk.dev/docs/reference/ai-sdk-core/stream-text](https://ai-sdk.dev/docs/reference/ai-sdk-core/stream-text). Update your `.env.local` file to include your Firecrawl API key: ```env theme={null} OPENAI_API_KEY=sk-your-openai-api-key FIRECRAWL_API_KEY=fc-your-firecrawl-api-key ``` Get your Firecrawl API key from [firecrawl.dev](https://firecrawl.dev). Restart your development server: ```bash theme={null} npm run dev ``` AI chatbot with active Firecrawl tools Open [localhost:3000](http://localhost:3000) and test the enhanced assistant: 1. Toggle the "Search" button to enable web search 2. Ask: "What are the latest features from firecrawl.dev?" 3. Watch as the AI calls the `searchWeb` or `scrapeWebsite` tool 4. See the tool execution in the UI with inputs and outputs 5. Read the AI's analysis based on the scraped data ## How It Works ### Message Flow 1. **User sends a message**: The user types a question and clicks submit 2. **Frontend sends request**: `useChat` sends the message to `/api/chat` with the selected model and web search setting 3. **Backend processes message**: The API route receives the message and calls `streamText` 4. **AI decides on tools**: The model analyzes the question and decides whether to use `scrapeWebsite` or `searchWeb` (only if web search is enabled) 5. **Tools execute**: If tools are called, Firecrawl scrapes or searches the web 6. **AI generates response**: The model analyzes tool results and generates a natural language response 7. 
**Frontend displays results**: The UI shows tool calls and the final response in real-time ### Tool Calling Process The AI SDK's tool calling system ([ai-sdk.dev/docs/foundations/tools](https://ai-sdk.dev/docs/foundations/tools)) works as follows: 1. The model receives the user's message and available tool descriptions 2. If the model determines a tool is needed, it generates a tool call with parameters 3. The SDK executes the tool function with those parameters 4. The tool result is sent back to the model 5. The model uses the result to generate its final response This all happens automatically within a single `streamText` call, with results streaming to the frontend in real-time. ## Key Features ### Model Selection The application supports multiple OpenAI models: * **GPT-5 Mini (Thinking)**: Recent OpenAI model with advanced reasoning capabilities * **GPT-4o Mini**: Fast and cost-effective model Users can switch between models using the dropdown selector. ### Web Search Toggle The Search button controls whether the AI can use Firecrawl tools: * **Enabled**: AI can call `scrapeWebsite` and `searchWeb` tools as needed * **Disabled**: AI responds only with its training knowledge This gives users control over when to use web data versus the model's built-in knowledge. ## Customization Ideas ### Add More Tools Extend the assistant with additional tools: * Database lookups for internal company data * CRM integration to fetch customer information * Email sending capabilities * Document generation Each tool follows the same pattern: define a schema with Zod, implement the execute function, and register it in the `tools` object. ### Change the AI Model Swap OpenAI for another provider: ```typescript theme={null} import { anthropic } from "@ai-sdk/anthropic"; const result = streamText({ model: anthropic("claude-sonnet-4-5"), // ... rest of config }); ``` The AI SDK supports 20+ providers with the same API. 
Learn more: [ai-sdk.dev/docs/foundations/providers-and-models](https://ai-sdk.dev/docs/foundations/providers-and-models). ### Customize the UI AI Elements components are built on shadcn/ui, so you can: * Modify component styles in the component files * Add new variants to existing components * Create custom components that match the design system ## Best Practices 1. **Use appropriate tools**: Choose `searchWeb` to find relevant pages first, `scrapeWebsite` for single pages, or let the AI decide 2. **Monitor API usage**: Track your Firecrawl and OpenAI API usage to avoid unexpected costs 3. **Handle errors gracefully**: The tools include error handling, but consider adding user-facing error messages 4. **Optimize performance**: Use streaming to provide immediate feedback and consider caching frequently accessed content 5. **Set reasonable limits**: The `stopWhen: stepCountIs(5)` prevents excessive tool calls and runaway costs *** ## Related Resources Explore the AI SDK for building AI-powered applications with streaming, tool calling, and multi-provider support. Pre-built UI components for AI applications built on shadcn/ui. # Building a Brand Style Guide Generator with Firecrawl Source: https://docs.firecrawl.dev/developer-guides/cookbooks/brand-style-guide-generator-cookbook Generate professional PDF brand style guides by extracting design systems from any website using Firecrawl's branding format Build a brand style guide generator that automatically extracts colors, typography, spacing, and visual identity from any website and compiles it into a professional PDF document. 
Brand style guide PDF generator using Firecrawl branding format to extract design system ## What You'll Build A Node.js application that takes any website URL, extracts its complete brand identity using Firecrawl's branding format, and generates a polished PDF style guide with: * Color palette with hex values * Typography system (fonts, sizes, weights) * Spacing and layout specifications * Logo and brand imagery * Theme information (light/dark mode) Example of generated brand style guide PDF with colors typography and spacing ## Prerequisites * Node.js 18 or later installed * A Firecrawl API key from [firecrawl.dev](https://firecrawl.dev) * Basic knowledge of TypeScript and Node.js Start by creating a new directory for your project and initializing it: ```bash theme={null} mkdir brand-style-guide-generator && cd brand-style-guide-generator npm init -y ``` Update your `package.json` to use ES modules: ```json package.json theme={null} { "name": "brand-style-guide-generator", "version": "1.0.0", "type": "module", "scripts": { "start": "npx tsx index.ts" } } ``` Install the required packages for web scraping and PDF generation: ```bash theme={null} npm i @mendable/firecrawl-js pdfkit npm i -D typescript tsx @types/node @types/pdfkit ``` These packages provide: * `@mendable/firecrawl-js`: Firecrawl SDK for extracting brand identity from websites * `pdfkit`: PDF document generation library * `tsx`: TypeScript execution for Node.js Create the main application file at `index.ts`. This script extracts brand identity from a URL and generates a professional PDF style guide. 
```typescript index.ts theme={null} import Firecrawl from "@mendable/firecrawl-js"; import PDFDocument from "pdfkit"; import fs from "fs"; const API_KEY = "fc-YOUR-API-KEY"; const URL = "https://firecrawl.dev"; async function main() { const fc = new Firecrawl({ apiKey: API_KEY }); const { branding: b } = (await fc.scrape(URL, { formats: ["branding"] })) as any; const doc = new PDFDocument({ size: "A4", margin: 50 }); doc.pipe(fs.createWriteStream("brand-style-guide.pdf")); // Fetch logo (PNG/JPG only) let logoImg: Buffer | null = null; try { const logoUrl = b.images?.favicon || b.images?.ogImage; if (logoUrl?.match(/\.(png|jpg|jpeg)$/i)) { const res = await fetch(logoUrl); logoImg = Buffer.from(await res.arrayBuffer()); } } catch {} // Header with logo doc.rect(0, 0, 595, 120).fill(b.colors?.primary || "#333"); const titleX = logoImg ? 130 : 50; if (logoImg) doc.image(logoImg, 50, 30, { height: 60 }); doc.fontSize(36).fillColor("#fff").text("Brand Style Guide", titleX, 38); doc.fontSize(14).text(URL, titleX, 80); // Colors doc.fontSize(18).fillColor("#333").text("Colors", 50, 160); const colors = Object.entries(b.colors || {}).filter(([, v]) => typeof v === "string" && (v as string).startsWith("#")); colors.forEach(([k, v], i) => { const x = 50 + i * 100; doc.rect(x, 195, 80, 80).fill(v as string); doc.fontSize(10).fillColor("#333").text(k, x, 282, { width: 80, align: "center" }); doc.fontSize(9).fillColor("#888").text(v as string, x, 296, { width: 80, align: "center" }); }); // Typography doc.fontSize(18).fillColor("#333").text("Typography", 50, 340); doc.fontSize(13).fillColor("#444"); doc.text(`Primary Font: ${b.typography?.fontFamilies?.primary || "—"}`, 50, 370); doc.text(`Heading Font: ${b.typography?.fontFamilies?.heading || "—"}`, 50, 392); doc.fontSize(12).fillColor("#666").text("Font Sizes:", 50, 422); Object.entries(b.typography?.fontSizes || {}).forEach(([k, v], i) => { doc.text(`${k.toUpperCase()}: ${v}`, 70, 445 + i * 22); }); // Spacing & Theme 
doc.fontSize(18).fillColor("#333").text("Spacing & Theme", 320, 340); doc.fontSize(13).fillColor("#444"); doc.text(`Base Unit: ${b.spacing?.baseUnit}px`, 320, 370); doc.text(`Border Radius: ${b.spacing?.borderRadius}`, 320, 392); doc.text(`Color Scheme: ${b.colorScheme}`, 320, 414); doc.end(); console.log("Generated: brand-style-guide.pdf"); } main(); ``` For this simple project, the API key is placed directly in the code. If you plan to push this to GitHub or share it with others, move the key to a `.env` file and use `process.env.FIRECRAWL_API_KEY` instead. Replace `fc-YOUR-API-KEY` with your Firecrawl API key from [firecrawl.dev](https://firecrawl.dev). ### Understanding the Code **Key Components:** * **Firecrawl Branding Format**: The `branding` format extracts comprehensive brand identity including colors, typography, spacing, and images * **PDFKit Document**: Creates a professional A4 PDF with proper margins and sections * **Color Swatches**: Renders visual color blocks with hex values and semantic names * **Typography Display**: Shows font families and sizes in an organized layout * **Spacing & Theme**: Documents the design system's spacing units and color scheme Run the script to generate a brand style guide: ```bash theme={null} npm start ``` The script will: 1. Extract the brand identity from the target URL using Firecrawl's branding format 2. Generate a PDF named `brand-style-guide.pdf` 3. Save it in your project directory To generate a style guide for a different website, simply change the `URL` constant in `index.ts`. ## How It Works ### Extraction Process 1. **URL Input**: The generator receives a target website URL 2. **Firecrawl Scrape**: Calls the `/scrape` endpoint with the `branding` format 3. **Brand Analysis**: Firecrawl analyzes the page's CSS, fonts, and visual elements 4. **Data Return**: Returns a structured `BrandingProfile` object with all design tokens ### PDF Generation Process 1. 
**Header Creation**: Generates a colored header using the primary brand color 2. **Logo Fetch**: Downloads and embeds the logo or favicon if available 3. **Color Palette**: Renders each color as a visual swatch with metadata 4. **Typography Section**: Documents font families, sizes, and weights 5. **Spacing Info**: Includes base units, border radius, and theme mode ### Branding Profile Structure The [branding format](https://docs.firecrawl.dev/features/scrape#%2Fscrape-with-branding-endpoint) returns detailed brand information: ```typescript theme={null} { colorScheme: "dark" | "light", logo: "https://example.com/logo.svg", colors: { primary: "#FF6B35", secondary: "#004E89", accent: "#F77F00", background: "#1A1A1A", textPrimary: "#FFFFFF", textSecondary: "#B0B0B0" }, typography: { fontFamilies: { primary: "Inter", heading: "Inter", code: "Roboto Mono" }, fontSizes: { h1: "48px", h2: "36px", body: "16px" }, fontWeights: { regular: 400, medium: 500, bold: 700 } }, spacing: { baseUnit: 8, borderRadius: "8px" }, images: { logo: "https://example.com/logo.svg", favicon: "https://example.com/favicon.ico" } } ``` ## Key Features ### Automatic Color Extraction The generator identifies and categorizes all brand colors: * **Primary & Secondary**: Main brand colors * **Accent**: Highlight and CTA colors * **Background & Text**: UI foundation colors * **Semantic Colors**: Success, warning, error states ### Typography Documentation Captures the complete type system: * **Font Families**: Primary, heading, and monospace fonts * **Size Scale**: All heading and body text sizes * **Font Weights**: Available weight variations ### Visual Output The PDF includes: * Color-coded header matching the brand * Embedded logo when available * Professional layout with clear hierarchy * Metadata footer with generation date Side-by-side comparison of original website and generated brand style guide PDF ## Customization Ideas ### Add Component Documentation Extend the generator to include UI 
component styles: ```typescript theme={null} // Add after the Spacing & Theme section if (b.components) { doc.addPage(); doc.fontSize(20).fillColor("#333").text("UI Components", 50, 50); // Document button styles if (b.components.buttonPrimary) { const btn = b.components.buttonPrimary; doc.fontSize(14).text("Primary Button", 50, 90); doc.rect(50, 110, 120, 40).fill(btn.background); doc.fontSize(12).fillColor(btn.textColor).text("Button", 50, 120, { width: 120, align: "center" }); } } ``` ### Export Multiple Formats Add JSON export alongside the PDF: ```typescript theme={null} // Add before doc.end() fs.writeFileSync("brand-data.json", JSON.stringify(b, null, 2)); ``` ### Batch Processing Generate guides for multiple websites: ```typescript theme={null} const websites = [ "https://stripe.com", "https://linear.app", "https://vercel.com" ]; for (const site of websites) { const { branding } = await fc.scrape(site, { formats: ["branding"] }) as any; // Generate PDF for each site... } ``` ### Custom PDF Themes Create different PDF styles based on the extracted theme: ```typescript theme={null} const isDarkMode = b.colorScheme === "dark"; const headerBg = isDarkMode ? b.colors?.background : b.colors?.primary; const textColor = isDarkMode ? "#fff" : "#333"; ``` ## Best Practices 1. **Handle Missing Data**: Not all websites expose complete branding information. Always provide fallback values for missing properties. 2. **Respect Rate Limits**: When batch processing multiple sites, add delays between requests to respect Firecrawl's rate limits. 3. **Cache Results**: Store extracted branding data to avoid repeated API calls for the same site. 4. **Image Format Handling**: Some logos may be in formats PDFKit doesn't support (like SVG). Consider adding format conversion or graceful fallbacks. 5. **Error Handling**: Wrap the generation process in try-catch blocks and provide meaningful error messages. 
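Best practice 5 can be sketched with a small wrapper around each generation step. The `withErrorReport` helper below is a hypothetical illustration, not part of the Firecrawl SDK or PDFKit:

```typescript
// Hypothetical helper: run one async generation step and report a readable
// error instead of crashing with an unhandled rejection.
async function withErrorReport<T>(
  label: string,
  task: () => Promise<T>
): Promise<T | null> {
  try {
    return await task();
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    console.error(`${label} failed: ${message}`);
    return null; // caller decides whether to retry, skip, or abort
  }
}

// Usage sketch with the script above (fc and URL as defined in index.ts):
// const doc = await withErrorReport("scrape", () =>
//   fc.scrape(URL, { formats: ["branding"] })
// );
// if (!doc) process.exit(1);
```

Returning `null` instead of rethrowing lets a batch run continue when a single site fails, while still logging which step broke.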
*** ## Related Resources Learn more about the branding format and all available properties you can extract. Complete API reference for the scrape endpoint with all format options. Learn more about PDFKit for advanced PDF customization options. Process multiple URLs efficiently with batch scraping. # Full-Stack Templates Source: https://docs.firecrawl.dev/developer-guides/examples Explore real-world examples and tutorials for Firecrawl ## Official Examples Open Lovable **Use Case**: Build a RAG-powered chatbot with live web data * [GitHub Repo](https://github.com/firecrawl/open-lovable) Open Agent Builder **Use Case**: Build and deploy AI agents with web scraping capabilities * [GitHub Repo](https://github.com/firecrawl/open-agent-builder) Fireplexity **Use Case**: AI search engine with real-time citations, streaming responses, and live data * [GitHub Repo](https://github.com/firecrawl/fireplexity) FireGEO **Use Case**: SaaS starter with brand monitoring, authentication, and billing * [GitHub Repo](https://github.com/firecrawl/firegeo) Fire Enrich **Use Case**: AI-powered data enrichment tool that transforms emails into rich datasets * [GitHub Repo](https://github.com/firecrawl/fire-enrich) Firesearch **Use Case**: AI-powered deep research tool with validated answers and citations * [GitHub Repo](https://github.com/firecrawl/firesearch) Firestarter **Use Case**: Instantly create AI chatbots for any website with RAG-powered search * [GitHub Repo](https://github.com/firecrawl/firestarter) AI Ready Website **Use Case**: Transform any website into AI-ready structured data * [GitHub Repo](https://github.com/firecrawl/ai-ready-website) Open Researcher **Use Case**: AI-powered research assistant that gathers and analyzes web data * [GitHub Repo](https://github.com/firecrawl/open-researcher) # Anthropic Source: https://docs.firecrawl.dev/developer-guides/llm-sdks-and-frameworks/anthropic Use Firecrawl with Claude for web scraping + AI workflows Integrate Firecrawl with Claude to 
build AI applications powered by web data. ## Setup ```bash theme={null} npm install @mendable/firecrawl-js @anthropic-ai/sdk zod zod-to-json-schema ``` Create `.env` file: ```bash theme={null} FIRECRAWL_API_KEY=your_firecrawl_key ANTHROPIC_API_KEY=your_anthropic_key ``` > **Note:** If using Node \< 20, install `dotenv` and add `import 'dotenv/config'` to your code. ## Scrape + Summarize This example demonstrates a simple workflow: scrape a website and summarize the content using Claude. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; import Anthropic from '@anthropic-ai/sdk'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY }); const scrapeResult = await firecrawl.scrape('https://firecrawl.dev', { formats: ['markdown'] }); console.log('Scraped content length:', scrapeResult.markdown?.length); const message = await anthropic.messages.create({ model: 'claude-haiku-4-5', max_tokens: 1024, messages: [ { role: 'user', content: `Summarize in 100 words: ${scrapeResult.markdown}` } ] }); console.log('Response:', message); ``` ## Tool Use This example shows how to use Claude's tool use feature to let the model decide when to scrape websites based on user requests. 
```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; import { Anthropic } from '@anthropic-ai/sdk'; import { z } from 'zod'; import { zodToJsonSchema } from 'zod-to-json-schema'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY }); const ScrapeArgsSchema = z.object({ url: z.string() }); console.log("Sending user message to Claude and requesting tool use if necessary..."); const response = await anthropic.messages.create({ model: 'claude-haiku-4-5', max_tokens: 1024, tools: [{ name: 'scrape_website', description: 'Scrape and extract markdown content from a website URL', input_schema: zodToJsonSchema(ScrapeArgsSchema, 'ScrapeArgsSchema') as any }], messages: [{ role: 'user', content: 'What is Firecrawl? Check firecrawl.dev' }] }); const toolUse = response.content.find(block => block.type === 'tool_use'); if (toolUse && toolUse.type === 'tool_use') { const input = toolUse.input as { url: string }; console.log(`Calling tool: ${toolUse.name} | URL: ${input.url}`); const result = await firecrawl.scrape(input.url, { formats: ['markdown'] }); console.log(`Scraped content preview: ${result.markdown?.substring(0, 300)}...`); // Continue with the conversation or process the scraped content as needed } ``` ## Structured Extraction This example demonstrates how to use Claude to extract structured data from scraped website content. 
```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; import Anthropic from '@anthropic-ai/sdk'; import { z } from 'zod'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY }); const CompanyInfoSchema = z.object({ name: z.string(), industry: z.string().optional(), description: z.string().optional() }); const scrapeResult = await firecrawl.scrape('https://stripe.com', { formats: ['markdown'], onlyMainContent: true }); const prompt = `Extract company information from this website content. Output ONLY valid JSON in this exact format (no markdown, no explanation): { "name": "Company Name", "industry": "Industry", "description": "One sentence description" } Website content: ${scrapeResult.markdown}`; const message = await anthropic.messages.create({ model: 'claude-haiku-4-5', max_tokens: 1024, messages: [ { role: 'user', content: prompt }, { role: 'assistant', content: '{' } ] }); const textBlock = message.content.find(block => block.type === 'text'); if (textBlock && textBlock.type === 'text') { const jsonText = '{' + textBlock.text; const companyInfo = CompanyInfoSchema.parse(JSON.parse(jsonText)); console.log(companyInfo); } ``` For more examples, check the [Claude documentation](https://docs.anthropic.com/claude/docs). # ElevenAgents Source: https://docs.firecrawl.dev/developer-guides/llm-sdks-and-frameworks/elevenagents Give ElevenLabs voice and chat agents real-time web access with Firecrawl Give your [ElevenAgents](https://elevenlabs.io/agents) voice and chat agents the ability to scrape, search, and crawl the web in real time using Firecrawl. This guide covers two integration paths: 1. **MCP server** — connect the hosted Firecrawl MCP server for zero-code setup. 2. **Server webhook tool** — point a custom tool at Firecrawl's REST API for full control over requests. 
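Whichever path you choose, each tool call ultimately resolves to a single HTTP request against Firecrawl's REST API. As a rough sketch of the request the webhook tool issues (the endpoint and headers match the scrape tool configured later in this guide; the `buildScrapeRequest` helper and the test values are illustrative, not part of either API):

```typescript
// Illustrative helper: assembles the same request the ElevenAgents webhook
// tool sends to Firecrawl's /v2/scrape endpoint. Helper name and shape are
// examples only.
interface ScrapeRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildScrapeRequest(targetUrl: string, apiKey: string): ScrapeRequest {
  return {
    url: 'https://api.firecrawl.dev/v2/scrape',
    init: {
      method: 'POST', // the scrape endpoint expects POST, not the webhook default of GET
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ url: targetUrl }),
    },
  };
}

// Usage (network call commented out; requires a real API key):
// const req = buildScrapeRequest('https://firecrawl.dev', process.env.FIRECRAWL_API_KEY!);
// const res = await fetch(req.url, req.init);
// const { data } = await res.json(); // data.markdown holds the page content
```

Keeping the request this small is the point of the webhook path: the agent only has to fill in `url`, and everything else is fixed configuration.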
## Prerequisites * An [ElevenLabs](https://elevenlabs.io) account with access to ElevenAgents * A Firecrawl API key from your [firecrawl.dev dashboard](https://firecrawl.dev) ## Option 1: Firecrawl MCP Server The fastest way to give an agent web access. ElevenAgents supports remote MCP servers, and Firecrawl provides a hosted MCP endpoint. ### Add the MCP server 1. Open the [Integrations page](https://elevenlabs.io/app/agents/integrations) in ElevenLabs and click **+ Add integration**. 2. Select **Custom MCP Server** from the integration library. 3. Fill in the following fields: | Field | Value | | --------------- | ------------------------------------------------------------ | | **Name** | Firecrawl | | **Description** | Search, scrape, crawl, and extract content from any website. | | **Server type** | Streamable HTTP | | **Server URL** | `https://mcp.firecrawl.dev/YOUR_FIRECRAWL_API_KEY/v2/mcp` | Replace `YOUR_FIRECRAWL_API_KEY` with your actual key. Leave the **Type** dropdown set to **Value**. Treat this URL as a secret — it contains your API key. You must select **Streamable HTTP** as the server type. The default SSE option does not work with the Firecrawl MCP endpoint. 4. Under **Tool Approval Mode**, choose an approval level: * **No Approval** — the agent uses tools freely. Fine for read-only scraping. * **Fine-Grained Tool Approval** — lets you pre-select which tools can run automatically and which require approval. Good for controlling expensive crawl operations. * **Always Ask** (default) — the agent requests permission before every tool call. 5. Check **I trust this server**, then click **Add Server**. ElevenLabs will connect to the server and list the available tools (scrape, search, crawl, map, and more). ### Attach it to an agent 1. Create or open an agent in the [ElevenAgents dashboard](https://elevenlabs.io/app/agents/agents). 2. Go to the **Tools** tab, then select the **MCP** sub-tab. 3. 
Click **Add server** and select the **Firecrawl** integration from the dropdown. ### Update the system prompt In the **Agent** tab, add instructions to the **System prompt** so the agent knows when to use Firecrawl. For example: ```text theme={null} You are a helpful research assistant. When the user asks about a website, a company, or any topic that requires up-to-date information, use the Firecrawl tools to search the web or scrape the relevant page, then summarize the results. ``` ### Test it Click **Preview** in the top navigation bar. You can test using the text chat input or by starting a voice call. Try a prompt like: > "What does firecrawl.dev do? Go to the site and summarize it for me." The agent will call the Firecrawl MCP `scrape` tool, receive the page markdown, and respond with a summary. *** ## Option 2: Server Webhook Tool Use this approach when you need precise control over request parameters (formats, headers, timeouts, etc.) or want to call a specific Firecrawl endpoint without exposing the full MCP tool set. ### Scrape tool Create a tool that scrapes a single URL and returns its content as markdown. 1. Open your agent and go to the **Tools** tab. 2. Click **Add tool** and select **Webhook**. 3. Configure the tool: | Field | Value | | --------------- | ---------------------------------------------------------- | | **Name** | scrape\_website | | **Description** | Scrape content from a URL and return it as clean markdown. | | **Method** | POST | | **URL** | `https://api.firecrawl.dev/v2/scrape` | The **Method** field defaults to GET — make sure to change it to **POST**. 4. Scroll to the **Headers** section and click **Add header** for authentication: | Header | Value | | --------------- | ------------------------------- | | `Authorization` | `Bearer YOUR_FIRECRAWL_API_KEY` | Alternatively, if you have workspace auth connections configured, you can use the **Authentication** dropdown instead. 5. 
Add a **body parameter**: | Parameter | Type | Description | Required | | --------- | ------ | ----------------- | -------- | | `url` | string | The URL to scrape | Yes | 6. Click **Add tool**. The Firecrawl API returns the page content as markdown by default. The agent receives the JSON response and can use the `markdown` field to answer questions. ### Search tool Create a tool that searches the web and returns results with scraped content. 1. Click **Add tool** → **Webhook** again and configure: | Field | Value | | --------------- | ------------------------------------------------------------------------- | | **Name** | search\_web | | **Description** | Search the web for a query and return relevant results with page content. | | **Method** | POST | | **URL** | `https://api.firecrawl.dev/v2/search` | 2. Add the same `Authorization` header as above. 3. Add **body parameters**: | Parameter | Type | Description | Required | | --------- | ------ | ----------------------------------------------- | -------- | | `query` | string | The search query | Yes | | `limit` | number | Maximum number of results to return (default 5) | No | 4. Click **Add tool**. ### Update the system prompt In the **Agent** tab, update the **System prompt**: ```text theme={null} You are a knowledgeable assistant with access to web tools. - Use `scrape_website` when the user gives you a specific URL to read. - Use `search_web` when the user asks a general question that requires finding information online. Always summarize the information concisely and cite the source URL. ``` ### Test it Click **Preview** and try asking: > "Search for the latest Next.js features and give me a summary." The agent will call `search_web`, receive results from Firecrawl, and respond with a summary of the findings. *** ## Tips * **Model selection** — For reliable tool calling, use a high-intelligence model such as GPT-4o, Claude Sonnet 4.5 or later, or Gemini 2.5 Flash. 
Smaller models may struggle to extract the correct parameters. * **Keep prompts specific** — Tell the agent exactly when to use each tool. Vague instructions lead to missed or incorrect tool calls. * **Limit response size** — For voice agents, long scraped pages can overwhelm the LLM context. Use `onlyMainContent: true` in scrape options (or instruct the agent to summarize aggressively) to keep responses concise. * **Tool call sounds** — In the webhook or MCP tool settings, you can configure a **Tool call sound** to play ambient audio while a tool runs. This signals to the user that the agent is working. ## Resources * [ElevenAgents documentation](https://elevenlabs.io/docs/eleven-agents/overview) * [ElevenAgents tools overview](https://elevenlabs.io/docs/eleven-agents/customization/tools) * [ElevenAgents MCP integration](https://elevenlabs.io/docs/eleven-agents/customization/tools/mcp) * [Firecrawl API reference](https://docs.firecrawl.dev/api-reference/v2-introduction) * [Firecrawl MCP server](https://docs.firecrawl.dev/mcp-server) # Gemini Source: https://docs.firecrawl.dev/developer-guides/llm-sdks-and-frameworks/gemini Use Firecrawl with Google's Gemini AI for web scraping + AI workflows Integrate Firecrawl with Google's Gemini for AI applications powered by web data. ## Setup ```bash theme={null} npm install @mendable/firecrawl-js @google/genai ``` Create `.env` file: ```bash theme={null} FIRECRAWL_API_KEY=your_firecrawl_key GEMINI_API_KEY=your_gemini_key ``` > **Note:** If using Node \< 20, install `dotenv` and add `import 'dotenv/config'` to your code. ## Scrape + Summarize This example demonstrates a simple workflow: scrape a website and summarize the content using Gemini. 
```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; import { GoogleGenAI } from '@google/genai'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY }); const scrapeResult = await firecrawl.scrape('https://firecrawl.dev', { formats: ['markdown'] }); console.log('Scraped content length:', scrapeResult.markdown?.length); const response = await ai.models.generateContent({ model: 'gemini-2.5-flash', contents: `Summarize: ${scrapeResult.markdown}`, }); console.log('Summary:', response.text); ``` ## Content Analysis This example shows how to analyze website content using Gemini's multi-turn conversation capabilities. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; import { GoogleGenAI } from '@google/genai'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY }); const scrapeResult = await firecrawl.scrape('https://news.ycombinator.com/', { formats: ['markdown'] }); console.log('Scraped content length:', scrapeResult.markdown?.length); const chat = ai.chats.create({ model: 'gemini-2.5-flash' }); // Ask for the top 3 stories on Hacker News const result1 = await chat.sendMessage({ message: `Based on this website content from Hacker News, what are the top 3 stories right now?\n\n${scrapeResult.markdown}` }); console.log('Top 3 Stories:', result1.text); // Ask for the 4th and 5th stories on Hacker News const result2 = await chat.sendMessage({ message: `Now, what are the 4th and 5th top stories on Hacker News from the same content?` }); console.log('4th and 5th Stories:', result2.text); ``` ## Structured Extraction This example demonstrates how to extract structured data using Gemini's JSON mode from scraped website content. 
```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; import { GoogleGenAI, Type } from '@google/genai'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY }); const scrapeResult = await firecrawl.scrape('https://stripe.com', { formats: ['markdown'] }); console.log('Scraped content length:', scrapeResult.markdown?.length); const response = await ai.models.generateContent({ model: 'gemini-2.5-flash', contents: `Extract company information: ${scrapeResult.markdown}`, config: { responseMimeType: 'application/json', responseSchema: { type: Type.OBJECT, properties: { name: { type: Type.STRING }, industry: { type: Type.STRING }, description: { type: Type.STRING }, products: { type: Type.ARRAY, items: { type: Type.STRING } } }, propertyOrdering: ['name', 'industry', 'description', 'products'] } } }); console.log('Extracted company info:', response?.text); ``` For more examples, check the [Gemini documentation](https://ai.google.dev/docs). # Agent Development Kit (ADK) Source: https://docs.firecrawl.dev/developer-guides/llm-sdks-and-frameworks/google-adk Integrate Firecrawl with Google's ADK using MCP for advanced agent workflows Integrate Firecrawl with Google's Agent Development Kit (ADK) to build powerful AI agents with web scraping capabilities through the Model Context Protocol (MCP). ## Overview Firecrawl provides an MCP server that seamlessly integrates with Google's ADK, enabling your agents to efficiently scrape, crawl, and extract structured data from any website. The integration supports both cloud-based and self-hosted Firecrawl instances with streamable HTTP for optimal performance. 
## Features * Efficient web scraping, crawling, and content discovery from any website * Advanced search capabilities and intelligent content extraction * Deep research and high-volume batch scraping * Flexible deployment (cloud-based or self-hosted) * Optimized for modern web environments with streamable HTTP support ## Prerequisites * Obtain an API key for Firecrawl from [firecrawl.dev](https://firecrawl.dev) * Install Google ADK ## Setup ```python Remote MCP Server theme={null} from google.adk.agents.llm_agent import Agent from google.adk.tools.mcp_tool.mcp_session_manager import StreamableHTTPServerParams from google.adk.tools.mcp_tool.mcp_toolset import MCPToolset FIRECRAWL_API_KEY = "YOUR-API-KEY" root_agent = Agent( model="gemini-2.5-pro", name="firecrawl_agent", description='A helpful assistant for scraping websites with Firecrawl', instruction='Help the user search for website content', tools=[ MCPToolset( connection_params=StreamableHTTPServerParams( url=f"https://mcp.firecrawl.dev/{FIRECRAWL_API_KEY}/v2/mcp", ), ) ], ) ``` ```python Local MCP Server theme={null} from google.adk.agents.llm_agent import Agent from google.adk.tools.mcp_tool.mcp_session_manager import StdioConnectionParams from google.adk.tools.mcp_tool.mcp_toolset import MCPToolset from mcp import StdioServerParameters root_agent = Agent( model='gemini-2.5-pro', name='firecrawl_agent', description='A helpful assistant for scraping websites with Firecrawl', instruction='Help the user search for website content', tools=[ MCPToolset( connection_params=StdioConnectionParams( server_params = StdioServerParameters( command='npx', args=[ "-y", "firecrawl-mcp", ], env={ "FIRECRAWL_API_KEY": "YOUR-API-KEY", } ), timeout=30, ), ) ], ) ``` ## Available Tools | Tool | Name | Description | | ------------------ | ------------------------------ | ------------------------------------------------------------------------------------ | | Scrape Tool | `firecrawl_scrape` | Scrape content from a single URL with 
advanced options | | Batch Scrape Tool | `firecrawl_batch_scrape` | Scrape multiple URLs efficiently with built-in rate limiting and parallel processing | | Check Batch Status | `firecrawl_check_batch_status` | Check the status of a batch operation | | Map Tool | `firecrawl_map` | Map a website to discover all indexed URLs on the site | | Search Tool | `firecrawl_search` | Search the web and optionally extract content from search results | | Crawl Tool | `firecrawl_crawl` | Start an asynchronous crawl with advanced options | | Check Crawl Status | `firecrawl_check_crawl_status` | Check the status of a crawl job | | Extract Tool | `firecrawl_extract` | Extract structured information from web pages using LLM capabilities | ## Configuration ### Required Configuration **FIRECRAWL\_API\_KEY**: Your Firecrawl API key * Required when using cloud API (default) * Optional when using self-hosted instance with FIRECRAWL\_API\_URL ### Optional Configuration **Firecrawl API URL (for self-hosted instances)**: * `FIRECRAWL_API_URL`: Custom API endpoint * Example: `https://firecrawl.your-domain.com` * If not provided, the cloud API will be used **Retry configuration**: * `FIRECRAWL_RETRY_MAX_ATTEMPTS`: Maximum retry attempts (default: 3) * `FIRECRAWL_RETRY_INITIAL_DELAY`: Initial delay in milliseconds (default: 1000) * `FIRECRAWL_RETRY_MAX_DELAY`: Maximum delay in milliseconds (default: 10000) * `FIRECRAWL_RETRY_BACKOFF_FACTOR`: Exponential backoff multiplier (default: 2) **Credit usage monitoring**: * `FIRECRAWL_CREDIT_WARNING_THRESHOLD`: Warning threshold (default: 1000) * `FIRECRAWL_CREDIT_CRITICAL_THRESHOLD`: Critical threshold (default: 100) ## Example: Web Research Agent ```python theme={null} from google.adk.agents.llm_agent import Agent from google.adk.tools.mcp_tool.mcp_session_manager import StreamableHTTPServerParams from google.adk.tools.mcp_tool.mcp_toolset import MCPToolset FIRECRAWL_API_KEY = "YOUR-API-KEY" # Create a research agent research_agent = Agent( 
model="gemini-2.5-pro",
    name="research_agent",
    description='An AI agent that researches topics by scraping and analyzing web content',
    instruction='''You are a research assistant. When given a topic or question:
1. Use the search tool to find relevant websites
2. Scrape the most relevant pages for detailed information
3. Extract structured data when needed
4. Provide comprehensive, well-sourced answers''',
    tools=[
        MCPToolset(
            connection_params=StreamableHTTPServerParams(
                url=f"https://mcp.firecrawl.dev/{FIRECRAWL_API_KEY}/v2/mcp",
            ),
        )
    ],
)

# Use the agent
response = research_agent.run("What are the latest features in Python 3.13?")
print(response)
```

## Best Practices

1. **Use the right tool for the job**:
   * `firecrawl_search` when you need to find relevant pages first
   * `firecrawl_scrape` for single pages
   * `firecrawl_batch_scrape` for multiple known URLs
   * `firecrawl_crawl` for discovering and scraping entire sites
2. **Monitor your usage**: Configure credit thresholds to avoid unexpected charges
3. **Handle errors gracefully**: Configure retry settings based on your use case
4. **Optimize performance**: Use batch operations when scraping multiple URLs

***

## Related Resources

* Learn how to build powerful multi-agent AI systems using Google's ADK framework with Firecrawl for web scraping capabilities.
* Learn more about Firecrawl's Model Context Protocol (MCP) server integration and capabilities.
* Explore the official Google Agent Development Kit documentation for comprehensive guides and API references.

# LangChain

Source: https://docs.firecrawl.dev/developer-guides/llm-sdks-and-frameworks/langchain

Use Firecrawl with LangChain for web scraping + AI workflows

Integrate Firecrawl with LangChain to build AI applications powered by web data.
## Setup ```bash theme={null} npm install @langchain/openai @mendable/firecrawl-js ``` Create `.env` file: ```bash theme={null} FIRECRAWL_API_KEY=your_firecrawl_key OPENAI_API_KEY=your_openai_key ``` > **Note:** If using Node \< 20, install `dotenv` and add `import 'dotenv/config'` to your code. ## Scrape + Chat This example demonstrates a simple workflow: scrape a website and process the content using LangChain. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; import { ChatOpenAI } from '@langchain/openai'; import { HumanMessage } from '@langchain/core/messages'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const chat = new ChatOpenAI({ model: 'gpt-5-nano', apiKey: process.env.OPENAI_API_KEY }); const scrapeResult = await firecrawl.scrape('https://firecrawl.dev', { formats: ['markdown'] }); console.log('Scraped content length:', scrapeResult.markdown?.length); const response = await chat.invoke([ new HumanMessage(`Summarize: ${scrapeResult.markdown}`) ]); console.log('Summary:', response.content); ``` ## Chains This example shows how to build a LangChain chain to process and analyze scraped content. 
```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; import { ChatOpenAI } from '@langchain/openai'; import { ChatPromptTemplate } from '@langchain/core/prompts'; import { StringOutputParser } from '@langchain/core/output_parsers'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const model = new ChatOpenAI({ model: 'gpt-5-nano', apiKey: process.env.OPENAI_API_KEY }); const scrapeResult = await firecrawl.scrape('https://stripe.com', { formats: ['markdown'] }); console.log('Scraped content length:', scrapeResult.markdown?.length); // Create processing chain const prompt = ChatPromptTemplate.fromMessages([ ['system', 'You are an expert at analyzing company websites.'], ['user', 'Extract the company name and main products from: {content}'] ]); const chain = prompt.pipe(model).pipe(new StringOutputParser()); // Execute the chain const result = await chain.invoke({ content: scrapeResult.markdown }); console.log('Chain result:', result); ``` ## Tool Calling This example demonstrates how to use LangChain's tool calling feature to let the model decide when to scrape websites. 
```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; import { ChatOpenAI } from '@langchain/openai'; import { DynamicStructuredTool } from '@langchain/core/tools'; import { z } from 'zod'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); // Create the scraping tool const scrapeWebsiteTool = new DynamicStructuredTool({ name: 'scrape_website', description: 'Scrape content from any website URL', schema: z.object({ url: z.string().url().describe('The URL to scrape') }), func: async ({ url }) => { console.log('Scraping:', url); const result = await firecrawl.scrape(url, { formats: ['markdown'] }); console.log('Scraped content preview:', result.markdown?.substring(0, 200) + '...'); return result.markdown || 'No content scraped'; } }); const model = new ChatOpenAI({ model: 'gpt-5-nano', apiKey: process.env.OPENAI_API_KEY }).bindTools([scrapeWebsiteTool]); const response = await model.invoke('What is Firecrawl? Visit firecrawl.dev and tell me about it.'); console.log('Response:', response.content); console.log('Tool calls:', response.tool_calls); ``` ## Structured Data Extraction This example shows how to extract structured data using LangChain's structured output feature. 
```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; import { ChatOpenAI } from '@langchain/openai'; import { z } from 'zod'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const scrapeResult = await firecrawl.scrape('https://stripe.com', { formats: ['markdown'] }); console.log('Scraped content length:', scrapeResult.markdown?.length); const CompanyInfoSchema = z.object({ name: z.string(), industry: z.string(), description: z.string(), products: z.array(z.string()) }); const model = new ChatOpenAI({ model: 'gpt-5-nano', apiKey: process.env.OPENAI_API_KEY }).withStructuredOutput(CompanyInfoSchema); const companyInfo = await model.invoke([ { role: 'system', content: 'Extract company information from website content.' }, { role: 'user', content: `Extract data: ${scrapeResult.markdown}` } ]); console.log('Extracted company info:', companyInfo); ``` For more examples, check the [LangChain documentation](https://js.langchain.com/docs). # LangGraph Source: https://docs.firecrawl.dev/developer-guides/llm-sdks-and-frameworks/langgraph Integrate Firecrawl with LangGraph for building agent workflows This guide shows how to integrate Firecrawl with LangGraph to build AI agent workflows that can scrape and process web content. ## Setup ```bash theme={null} npm install @langchain/langgraph @langchain/openai @mendable/firecrawl-js ``` Create `.env` file: ```bash theme={null} FIRECRAWL_API_KEY=your_firecrawl_key OPENAI_API_KEY=your_openai_key ``` > **Note:** If using Node \< 20, install `dotenv` and add `import 'dotenv/config'` to your code. ## Basic Workflow This example demonstrates a basic LangGraph workflow that scrapes a website and analyzes the content. 
```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; import { ChatOpenAI } from '@langchain/openai'; import { StateGraph, MessagesAnnotation, START, END } from '@langchain/langgraph'; // Initialize Firecrawl const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); // Initialize LLM const llm = new ChatOpenAI({ model: "gpt-5-nano", apiKey: process.env.OPENAI_API_KEY }); // Define the scrape node async function scrapeNode(state: typeof MessagesAnnotation.State) { console.log('Scraping...'); const result = await firecrawl.scrape('https://firecrawl.dev', { formats: ['markdown'] }); return { messages: [{ role: "system", content: `Scraped content: ${result.markdown}` }] }; } // Define the analyze node async function analyzeNode(state: typeof MessagesAnnotation.State) { console.log('Analyzing...'); const response = await llm.invoke(state.messages); return { messages: [response] }; } // Build the graph const graph = new StateGraph(MessagesAnnotation) .addNode("scrape", scrapeNode) .addNode("analyze", analyzeNode) .addEdge(START, "scrape") .addEdge("scrape", "analyze") .addEdge("analyze", END); // Compile the graph const app = graph.compile(); // Run the workflow const result = await app.invoke({ messages: [{ role: "user", content: "Summarize the website" }] }); console.log(JSON.stringify(result, null, 2)); ``` ## Multi-Step Workflow This example demonstrates a more complex workflow that scrapes multiple URLs and processes them. 
```typescript theme={null}
import FirecrawlApp from '@mendable/firecrawl-js';
import { ChatOpenAI } from '@langchain/openai';
import { StateGraph, Annotation, START, END } from '@langchain/langgraph';

const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
const llm = new ChatOpenAI({ model: "gpt-5-nano", apiKey: process.env.OPENAI_API_KEY });

// Define custom state
const WorkflowState = Annotation.Root({
  urls: Annotation<string[]>(),
  scrapedData: Annotation<Array<{ url: string; content: string }>>(),
  summary: Annotation<string>()
});

// Scrape multiple URLs
async function scrapeMultiple(state: typeof WorkflowState.State) {
  const scrapedData: Array<{ url: string; content: string }> = [];
  for (const url of state.urls) {
    const result = await firecrawl.scrape(url, { formats: ['markdown'] });
    scrapedData.push({ url, content: result.markdown || '' });
  }
  return { scrapedData };
}

// Summarize all scraped content
async function summarizeAll(state: typeof WorkflowState.State) {
  const combinedContent = state.scrapedData
    .map(item => `Content from ${item.url}:\n${item.content}`)
    .join('\n\n');
  const response = await llm.invoke([
    { role: "user", content: `Summarize these websites:\n${combinedContent}` }
  ]);
  return { summary: response.content as string };
}

// Build the workflow graph
const workflow = new StateGraph(WorkflowState)
  .addNode("scrape", scrapeMultiple)
  .addNode("summarize", summarizeAll)
  .addEdge(START, "scrape")
  .addEdge("scrape", "summarize")
  .addEdge("summarize", END);

const app = workflow.compile();

// Execute workflow
const result = await app.invoke({
  urls: ["https://firecrawl.dev", "https://firecrawl.dev/pricing"]
});
console.log(result.summary);
```

For more examples, check the [LangGraph documentation](https://langchain-ai.github.io/langgraphjs/).

# LlamaIndex

Source: https://docs.firecrawl.dev/developer-guides/llm-sdks-and-frameworks/llamaindex

Use Firecrawl with LlamaIndex for RAG applications

Integrate Firecrawl with LlamaIndex to build AI applications with vector search and embeddings powered by web content.
## Setup ```bash theme={null} npm install llamaindex @llamaindex/openai @mendable/firecrawl-js ``` Create `.env` file: ```bash theme={null} FIRECRAWL_API_KEY=your_firecrawl_key OPENAI_API_KEY=your_openai_key ``` > **Note:** If using Node \< 20, install `dotenv` and add `import 'dotenv/config'` to your code. ## RAG with Vector Search This example demonstrates how to use LlamaIndex with Firecrawl to crawl a website, create embeddings, and query the content using RAG. ```typescript theme={null} import Firecrawl from '@mendable/firecrawl-js'; import { Document, VectorStoreIndex, Settings } from 'llamaindex'; import { OpenAI, OpenAIEmbedding } from '@llamaindex/openai'; Settings.llm = new OpenAI({ model: "gpt-4o" }); Settings.embedModel = new OpenAIEmbedding({ model: "text-embedding-3-small" }); const firecrawl = new Firecrawl({ apiKey: process.env.FIRECRAWL_API_KEY }); const crawlResult = await firecrawl.crawl('https://firecrawl.dev', { limit: 10, scrapeOptions: { formats: ['markdown'] } }); console.log(`Crawled ${crawlResult.data.length } pages`); const documents = crawlResult.data.map((page: any, i: number) => new Document({ text: page.markdown, id_: `page-${i}`, metadata: { url: page.metadata?.sourceURL } }) ); const index = await VectorStoreIndex.fromDocuments(documents); console.log('Vector index created with embeddings'); const queryEngine = index.asQueryEngine(); const response = await queryEngine.query({ query: 'What is Firecrawl and how does it work?' }); console.log('\nAnswer:', response.toString()); ``` For more examples, check the [LlamaIndex documentation](https://ts.llamaindex.ai/). # Mastra Source: https://docs.firecrawl.dev/developer-guides/llm-sdks-and-frameworks/mastra Use Firecrawl with Mastra for building AI workflows Integrate Firecrawl with Mastra, the TypeScript framework for building AI agents and workflows. 
## Setup ```bash theme={null} npm install @mastra/core @mendable/firecrawl-js zod ``` Create `.env` file: ```bash theme={null} FIRECRAWL_API_KEY=your_firecrawl_key OPENAI_API_KEY=your_openai_key ``` > **Note:** If using Node \< 20, install `dotenv` and add `import 'dotenv/config'` to your code. ## Multi-Step Workflow This example demonstrates a complete workflow that searches, scrapes, and summarizes documentation using Firecrawl and Mastra. ```typescript theme={null} import { createWorkflow, createStep } from "@mastra/core/workflows"; import { z } from "zod"; import Firecrawl from "@mendable/firecrawl-js"; import { Agent } from "@mastra/core/agent"; const firecrawl = new Firecrawl({ apiKey: process.env.FIRECRAWL_API_KEY || "fc-YOUR_API_KEY" }); const agent = new Agent({ name: "summarizer", instructions: "You are a helpful assistant that creates concise summaries of documentation.", model: "openai/gpt-5-nano", }); // Step 1: Search with Firecrawl SDK const searchStep = createStep({ id: "search", inputSchema: z.object({ query: z.string(), }), outputSchema: z.object({ url: z.string(), title: z.string(), }), execute: async ({ inputData }: { inputData: { query: string } }) => { console.log(`Searching: ${inputData.query}`); const searchResults = await firecrawl.search(inputData.query, { limit: 1 }); const webResults = (searchResults as any)?.web; if (!webResults || !Array.isArray(webResults) || webResults.length === 0) { throw new Error("No search results found"); } const firstResult = webResults[0]; console.log(`Found: ${firstResult.title}`); return { url: firstResult.url, title: firstResult.title, }; }, }); // Step 2: Scrape the URL with Firecrawl SDK const scrapeStep = createStep({ id: "scrape", inputSchema: z.object({ url: z.string(), title: z.string(), }), outputSchema: z.object({ markdown: z.string(), title: z.string(), }), execute: async ({ inputData }: { inputData: { url: string; title: string } }) => { console.log(`Scraping: ${inputData.url}`); const 
scrapeResult = await firecrawl.scrape(inputData.url, { formats: ["markdown"], }); console.log(`Scraped: ${scrapeResult.markdown?.length || 0} characters`); return { markdown: scrapeResult.markdown || "", title: inputData.title, }; }, }); // Step 3: Summarize with the agent const summarizeStep = createStep({ id: "summarize", inputSchema: z.object({ markdown: z.string(), title: z.string(), }), outputSchema: z.object({ summary: z.string(), }), execute: async ({ inputData }: { inputData: { markdown: string; title: string } }) => { console.log(`Summarizing: ${inputData.title}`); const prompt = `Summarize the following documentation in 2-3 sentences:\n\nTitle: ${inputData.title}\n\n${inputData.markdown}`; const result = await agent.generate(prompt); console.log(`Summary generated`); return { summary: result.text }; }, }); // Create workflow export const workflow = createWorkflow({ id: "firecrawl-workflow", inputSchema: z.object({ query: z.string(), }), outputSchema: z.object({ summary: z.string(), }), steps: [searchStep, scrapeStep, summarizeStep], }) .then(searchStep) .then(scrapeStep) .then(summarizeStep) .commit(); async function testWorkflow() { const run = await workflow.createRunAsync(); const result = await run.start({ inputData: { query: "Firecrawl documentation" } }); if (result.status === "success") { const { summarize } = result.steps; if (summarize.status === "success") { console.log(`\n${summarize.output.summary}`); } } else { console.error("Workflow failed:", result.status); } } testWorkflow().catch(console.error); ``` For more examples, check the [Mastra documentation](https://mastra.ai/docs). # OpenAI Source: https://docs.firecrawl.dev/developer-guides/llm-sdks-and-frameworks/openai Use Firecrawl with OpenAI for web scraping + AI workflows Integrate Firecrawl with OpenAI to build AI applications powered by web data. 
## Setup ```bash theme={null} npm install @mendable/firecrawl-js openai zod ``` Create `.env` file: ```bash theme={null} FIRECRAWL_API_KEY=your_firecrawl_key OPENAI_API_KEY=your_openai_key ``` > **Note:** If using Node \< 20, install `dotenv` and add `import 'dotenv/config'` to your code. ## Scrape + Summarize This example demonstrates a simple workflow: scrape a website and summarize the content using an OpenAI model. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; import OpenAI from 'openai'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY }); // Scrape the website content const scrapeResult = await firecrawl.scrape('https://firecrawl.dev', { formats: ['markdown'] }); console.log('Scraped content length:', scrapeResult.markdown?.length); // Summarize with OpenAI model const completion = await openai.chat.completions.create({ model: 'gpt-5-nano', messages: [ { role: 'user', content: `Summarize: ${scrapeResult.markdown}` } ] }); console.log('Summary:', completion.choices[0]?.message.content); ``` ## Function Calling This example shows how to use OpenAI's function calling feature to let the model decide when to scrape websites based on user requests. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; import OpenAI from 'openai'; import { z } from 'zod'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY }); const ScrapeArgsSchema = z.object({ url: z.string().describe('The URL of the website to scrape') }); const tools = [{ type: 'function' as const, function: { name: 'scrape_website', description: 'Scrape content from any website URL', parameters: z.toJSONSchema(ScrapeArgsSchema) } }]; const response = await openai.chat.completions.create({ model: 'gpt-5-nano', messages: [{ role: 'user', content: 'What is Firecrawl? 
Visit firecrawl.dev and tell me about it.' }], tools }); const message = response.choices[0]?.message; if (message?.tool_calls && message.tool_calls.length > 0) { for (const toolCall of message.tool_calls) { if (toolCall.type === 'function') { console.log('Tool called:', toolCall.function.name); const args = ScrapeArgsSchema.parse(JSON.parse(toolCall.function.arguments)); const result = await firecrawl.scrape(args.url, { formats: ['markdown'] // Other formats: html, links, etc. }); console.log('Scraped content:', result.markdown?.substring(0, 200) + '...'); // Send the scraped content back to the model for final response const finalResponse = await openai.chat.completions.create({ model: 'gpt-5-nano', messages: [ { role: 'user', content: 'What is Firecrawl? Visit firecrawl.dev and tell me about it.' }, message, { role: 'tool', tool_call_id: toolCall.id, content: result.markdown || 'No content scraped' } ], tools }); console.log('Final response:', finalResponse.choices[0]?.message?.content); } } } else { console.log('Direct response:', message?.content); } ``` ## Structured Data Extraction This example demonstrates how to use OpenAI models with structured outputs to extract specific data from scraped content. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; import OpenAI from 'openai'; import { z } from 'zod'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY }); const scrapeResult = await firecrawl.scrape('https://stripe.com', { formats: ['markdown'] }); console.log('Scraped content length:', scrapeResult.markdown?.length); const CompanyInfoSchema = z.object({ name: z.string(), industry: z.string(), description: z.string(), products: z.array(z.string()) }); const response = await openai.chat.completions.create({ model: 'gpt-5-nano', messages: [ { role: 'system', content: 'Extract company information from website content.' 
}, { role: 'user', content: `Extract data: ${scrapeResult.markdown}` } ], response_format: { type: 'json_schema', json_schema: { name: 'company_info', schema: z.toJSONSchema(CompanyInfoSchema), strict: true } } }); const content = response.choices[0]?.message?.content; const companyInfo = content ? CompanyInfoSchema.parse(JSON.parse(content)) : null; console.log('Validated company info:', companyInfo); ``` ## Search + Analyze This example combines Firecrawl's search capabilities with OpenAI model analysis to find and summarize information from multiple sources. ```typescript theme={null} import FirecrawlApp from '@mendable/firecrawl-js'; import OpenAI from 'openai'; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY }); const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY }); // Search for relevant information const searchResult = await firecrawl.search('Next.js 16 new features', { limit: 3, sources: [{ type: 'web' }], // Other sources: { type: 'news' }, { type: 'images' } scrapeOptions: { formats: ['markdown'] } }); console.log('Search results:', searchResult.web?.length, 'pages found'); // Analyze and summarize the key features const analysis = await openai.chat.completions.create({ model: 'gpt-5-nano', messages: [{ role: 'user', content: `Summarize the key features: ${JSON.stringify(searchResult)}` }] }); console.log('Analysis:', analysis.choices[0]?.message?.content); ``` ## Responses API with MCP This example shows how to use OpenAI's Responses API with Firecrawl configured as an MCP (Model Context Protocol) server. 
```typescript theme={null} import OpenAI from 'openai'; const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY }); const response = await openai.responses.create({ model: 'gpt-5-nano', tools: [ { type: 'mcp', server_label: 'firecrawl', server_description: 'A web search and scraping MCP server to scrape and extract content from websites.', server_url: `https://mcp.firecrawl.dev/${process.env.FIRECRAWL_API_KEY}/v2/mcp`, require_approval: 'never' } ], input: 'Find out what the top stories on Hacker News are and the latest blog post on OpenAI and summarize them in a bullet point format' }); console.log('Response:', JSON.stringify(response.output, null, 2)); ``` For more examples, check the [OpenAI documentation](https://platform.openai.com/docs). # Vercel AI SDK Source: https://docs.firecrawl.dev/developer-guides/llm-sdks-and-frameworks/vercel-ai-sdk Firecrawl tools for Vercel AI SDK. Web scraping, search, interact, and crawling for AI applications. Firecrawl tools for the Vercel AI SDK. Search, scrape, interact with pages, and crawl the web in your AI applications. ## Install ```bash theme={null} npm install firecrawl-aisdk ai ``` Set environment variables: ```bash theme={null} FIRECRAWL_API_KEY=fc-your-key # https://firecrawl.dev AI_GATEWAY_API_KEY=your-key # https://vercel.com/ai-gateway ``` These examples use the [Vercel AI Gateway](https://vercel.com/ai-gateway) string model format, but Firecrawl tools work with any AI SDK provider. You can also use provider imports like `anthropic('claude-sonnet-4-5-20250514')` from `@ai-sdk/anthropic`. ## Quick Start `FirecrawlTools()` bundles `search`, `scrape`, and `interact` by default. ```typescript theme={null} import { generateText, stepCountIs } from 'ai'; import { FirecrawlTools } from 'firecrawl-aisdk'; const { text } = await generateText({ model: 'anthropic/claude-sonnet-4-5', tools: FirecrawlTools(), stopWhen: stepCountIs(30), prompt: ` 1. Use interact on Hacker News to identify the top story 2. 
Search for other perspectives on the same topic 3. Scrape the most relevant pages you found 4. Summarize everything you found `, }); ``` ## FirecrawlTools `FirecrawlTools()` gives you the default tools plus an auto-generated `systemPrompt` you can pass to `generateText`. ```typescript theme={null} import { generateText, stepCountIs } from 'ai'; import { FirecrawlTools } from 'firecrawl-aisdk'; const tools = FirecrawlTools(); const { text } = await generateText({ model: 'anthropic/claude-sonnet-4-5', system: `${tools.systemPrompt}\n\nAnswer with citations when possible.`, tools, stopWhen: stepCountIs(20), prompt: 'Find the current Firecrawl pricing page and explain the available plans.', }); ``` You can customize defaults, opt into async tools, or disable individual tools: ```typescript theme={null} const tools = FirecrawlTools({ search: { limit: 5 }, scrape: { formats: ['markdown'], onlyMainContent: true }, interact: { profile: { name: 'my-session', saveChanges: true } }, crawl: true, agent: true, }); ``` ```typescript theme={null} // Disable interact, keep search + scrape FirecrawlTools({ interact: false }); // Opt into deprecated browser compatibility FirecrawlTools({ browser: {} }); // Include every available tool FirecrawlTools({ all: true }); ``` When scraping to answer a question about a page, prefer query format: ```typescript theme={null} formats: [{ type: 'query', prompt: 'What does this page say about pricing and rate limits?' }] ``` Use `formats: ['markdown']` only when you need the full page content. 
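As a sketch (assuming the bundled `scrape` tool's options accept the same format objects shown above), you can combine the two ideas and make query format the default for every scrape the model performs; the prompt string here is a hypothetical example:

```typescript
// Sketch, not verified against the SDK: give the bundled scrape tool a
// query-format default so tool results stay small and focused.
const tools = FirecrawlTools({
  scrape: {
    formats: [{ type: 'query', prompt: 'What does this page say about pricing?' }],
    onlyMainContent: true,
  },
});
```

Individual calls can still fall back to `['markdown']` when the model needs full page content.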
## Individual Tools Every tool can be used directly or called with options: ```typescript theme={null} import { generateText } from 'ai'; import { scrape, search } from 'firecrawl-aisdk'; const { text } = await generateText({ model: 'anthropic/claude-sonnet-4-5', prompt: 'Search for Firecrawl, then scrape the most relevant result.', tools: { search, scrape }, }); const customScrape = scrape({ apiKey: 'fc-custom-key', apiUrl: 'https://api.firecrawl.dev' }); ``` ### Search + Scrape ```typescript theme={null} import { generateText } from 'ai'; import { search, scrape } from 'firecrawl-aisdk'; const { text } = await generateText({ model: 'anthropic/claude-sonnet-4-5', prompt: 'Search for Firecrawl, scrape the top official result, and explain what it does.', tools: { search, scrape }, }); ``` ### Map ```typescript theme={null} import { generateText } from 'ai'; import { map } from 'firecrawl-aisdk'; const { text } = await generateText({ model: 'anthropic/claude-sonnet-4-5', prompt: 'Map https://docs.firecrawl.dev and list the main sections.', tools: { map }, }); ``` ### Stream ```typescript theme={null} import { streamText, stepCountIs } from 'ai'; import { scrape } from 'firecrawl-aisdk'; const result = streamText({ model: 'anthropic/claude-sonnet-4-5', prompt: 'What is the first 100 words of firecrawl.dev?', tools: { scrape }, stopWhen: stepCountIs(3), }); for await (const chunk of result.textStream) { process.stdout.write(chunk); } await result.fullStream; ``` ## Interact `interact()` creates a scrape-backed interactive session. Call `start(url)` to bootstrap a session and get a live view URL, then let the model reuse that session through the `interact` tool. 
```typescript theme={null} import { generateText, stepCountIs } from 'ai'; import { interact, search } from 'firecrawl-aisdk'; const interactTool = interact(); console.log('Live view:', await interactTool.start('https://news.ycombinator.com')); const { text } = await generateText({ model: 'anthropic/claude-sonnet-4-5', tools: { interact: interactTool, search }, stopWhen: stepCountIs(25), prompt: 'Use interact on the current Hacker News session, find the top story, then search for more context.', }); await interactTool.close(); ``` If you need the explicit live view URL after startup, use `interactTool.interactiveLiveViewUrl`. Reuse browser state across sessions with profiles: ```typescript theme={null} const interactTool = interact({ profile: { name: 'my-session', saveChanges: true }, }); ``` `browser()` is deprecated. Prefer `interact()`. ## Async Tools Crawl, batch scrape, and agent return a job ID. Pair them with `poll`. ### Crawl ```typescript theme={null} import { generateText } from 'ai'; import { crawl, poll } from 'firecrawl-aisdk'; const { text } = await generateText({ model: 'anthropic/claude-sonnet-4-5', prompt: 'Crawl https://docs.firecrawl.dev (limit 3 pages) and summarize.', tools: { crawl, poll }, }); ``` ### Batch Scrape ```typescript theme={null} import { generateText } from 'ai'; import { batchScrape, poll } from 'firecrawl-aisdk'; const { text } = await generateText({ model: 'anthropic/claude-sonnet-4-5', prompt: 'Scrape https://firecrawl.dev and https://docs.firecrawl.dev, then compare them.', tools: { batchScrape, poll }, }); ``` ### Agent Autonomous web data gathering that searches, navigates, and extracts on its own. 
```typescript theme={null} import { generateText, stepCountIs } from 'ai'; import { agent, poll } from 'firecrawl-aisdk'; const { text } = await generateText({ model: 'anthropic/claude-sonnet-4-5', prompt: 'Find the founders of Firecrawl, their roles, and their backgrounds.', tools: { agent, poll }, stopWhen: stepCountIs(10), }); ``` ## Logging ```typescript theme={null} import { generateText } from 'ai'; import { logStep, scrape, stepLogger } from 'firecrawl-aisdk'; const logger = stepLogger(); const { text, usage } = await generateText({ model: 'anthropic/claude-sonnet-4-5', prompt: 'Scrape https://firecrawl.dev and summarize it.', tools: { scrape }, onStepFinish: logger.onStep, experimental_onToolCallFinish: logger.onToolCallFinish, }); logger.close(); logger.summary(usage); await generateText({ model: 'anthropic/claude-sonnet-4-5', prompt: 'Scrape https://firecrawl.dev and summarize it again.', tools: { scrape }, onStepFinish: logStep, }); ``` ## All Exports ```typescript theme={null} import { // Core tools search, // Search the web scrape, // Scrape a single URL map, // Discover URLs on a site crawl, // Crawl multiple pages (async, use poll) batchScrape, // Scrape multiple URLs (async, use poll) agent, // Autonomous web research (async, use poll) // Job management poll, // Poll async jobs for results status, // Check job status cancel, // Cancel running jobs // Browser/session tools interact, // interact({ profile: { name: '...' } }) browser, // deprecated compatibility export // All-in-one bundle FirecrawlTools, // FirecrawlTools({ search, scrape, interact, crawl, agent }) // Helpers stepLogger, // Token stats per tool call logStep, // Simple one-liner logging } from 'firecrawl-aisdk'; ``` # MCP Web Search & Scrape in ChatGPT Source: https://docs.firecrawl.dev/developer-guides/mcp-setup-guides/chatgpt Add web scraping and search to ChatGPT in 2 minutes MCP support in ChatGPT is currently a beta feature. 
The interface and availability may change as OpenAI continues development. **Availability:** Developer mode with MCP connectors is not available on the Free tier and requires a paid ChatGPT subscription. Availability and features vary by plan and region. See [OpenAI's documentation on Developer Mode](https://help.openai.com/en/articles/12584461-developer-mode-apps-and-full-mcp-connectors-in-chatgpt-beta) for current availability and setup instructions. Add web scraping and search capabilities to ChatGPT with Firecrawl MCP. ## Quick Setup ### 1. Get Your API Key Sign up at [firecrawl.dev/app/api-keys](https://www.firecrawl.dev/app/api-keys) and copy your API key. ### 2. Enable Developer Mode Open ChatGPT settings by clicking your username in the bottom left corner, or navigate directly to [chatgpt.com/#settings](https://chatgpt.com/#settings). In the settings modal, scroll to the bottom and select **Advanced Settings**. Toggle **Developer mode** to ON. *[Screenshot: ChatGPT settings showing Advanced Settings with Developer mode toggle]* ### 3. Create the Connector With Developer mode enabled, go to the **Apps & Connectors** tab in the same settings modal. Click the **Create** button in the top right corner. *[Screenshot: Apps & Connectors tab with Create button highlighted]* Fill in the connector details: * **Name:** `Firecrawl MCP` * **Description:** `Web scraping, crawling, search, and content extraction` (optional) * **MCP Server URL:** `https://mcp.firecrawl.dev/YOUR_API_KEY_HERE/v2/mcp` * **Authentication:** `None` Replace `YOUR_API_KEY_HERE` in the URL with your actual [Firecrawl API key](https://www.firecrawl.dev/app/api-keys). *[Screenshot: New Connector form filled out with Firecrawl MCP details]* Check the **"I understand and want to continue"** checkbox, then click **Create**. ### 4. Verify Setup Go back to the main ChatGPT interface. You should see **Developer mode** displayed, indicating that MCP connectors are active. If you do not see Developer mode, reload the page. 
If it still does not appear, open settings again and verify that Developer mode is toggled ON under Advanced Settings. ### 5. Access Firecrawl Tools To use Firecrawl in a conversation, click the **+** button in the chat input, then select **More** and choose **Firecrawl MCP**. *[Screenshot: ChatGPT chat input showing + menu expanded with More submenu and Firecrawl MCP option]* ## Quick Demo With Firecrawl MCP selected, try these prompts: **Search:** ``` Search for the latest React Server Components updates ``` **Scrape:** ``` Scrape firecrawl.dev and tell me what it does ``` **Get docs:** ``` Scrape the Vercel documentation for edge functions and summarize it ``` ## Tool Confirmation When ChatGPT uses the Firecrawl MCP tools, you may see a confirmation prompt asking for your approval. Some ChatGPT Desktop versions auto-approve tool calls without showing this dialog. If no prompt appears, you can verify the tool was invoked by checking for a "Called tool" section in the response or reviewing your usage at [firecrawl.dev/app/usage](https://www.firecrawl.dev/app/usage). **No confirmation prompt appearing?** If ChatGPT answers your question without showing a confirmation dialog, the Firecrawl MCP connector is most likely not attached to the conversation. Go back to [Step 5](#5-access-firecrawl-tools) and make sure you click the **+** button, select **More**, and choose **Firecrawl MCP** before sending your prompt. The connector must be attached to each new conversation. *[Screenshot: ChatGPT tool confirmation dialog showing Firecrawl MCP request]* You can check **"Remember for this conversation"** to avoid repeated confirmations during the same chat session. This security measure is implemented by OpenAI to ensure MCP tools do not perform unintended actions. Once confirmed, ChatGPT will execute the request and return the results. 
*[Screenshot: Example of Firecrawl search results displayed in ChatGPT]* # MCP Web Search & Scrape in Claude.ai Source: https://docs.firecrawl.dev/developer-guides/mcp-setup-guides/claude-ai Add web scraping and search to Claude.ai (Co-work) in 2 minutes Add web scraping and search capabilities to Claude.ai with Firecrawl MCP using custom connectors. Looking for Claude Code setup? See the [Claude Code guide](/developer-guides/mcp-setup-guides/claude-code) instead. ## Quick Setup ### 1. Get Your API Key Sign up at [firecrawl.dev/app/api-keys](https://www.firecrawl.dev/app/api-keys) and copy your API key. ### 2. Add Custom Connector Go to [Settings > Connectors](https://claude.ai/settings/connectors) in Claude.ai and click **Add custom connector**. Fill in the connector details: * **URL:** `https://mcp.firecrawl.dev/YOUR_API_KEY/v2/mcp` * **OAuth Client ID:** Leave blank * **OAuth Client Secret:** Leave blank Replace `YOUR_API_KEY` in the URL with your actual [Firecrawl API key](https://www.firecrawl.dev/app/api-keys). Your API key is embedded directly in the URL, so no additional authentication fields are needed. Click **Add** to save the connector. ### 3. Enable in Conversation In any Claude.ai conversation, click the **+** button at the bottom left, go to **Connectors**, and enable the Firecrawl connector. ## Quick Demo With the Firecrawl connector enabled, try these prompts: **Search the web:** ``` Search for the latest Next.js 15 features ``` **Scrape a page:** ``` Scrape firecrawl.dev and tell me what it does ``` **Get documentation:** ``` Find and scrape the Stripe API docs for payment intents ``` Claude will automatically use Firecrawl's search and scrape tools to get the information. # MCP Web Search & Scrape in Claude Code Source: https://docs.firecrawl.dev/developer-guides/mcp-setup-guides/claude-code Add web scraping and search to Claude Code in 2 minutes Add web scraping and search capabilities to Claude Code with Firecrawl MCP. ## Quick Setup ### 1. 
Get Your API Key Sign up at [firecrawl.dev/app](https://firecrawl.dev/app) and copy your API key. ### 2. Add Firecrawl MCP Server **Option A: Remote hosted URL (recommended)** ```bash theme={null} claude mcp add firecrawl --url https://mcp.firecrawl.dev/your-api-key/v2/mcp ``` **Option B: Local (npx)** ```bash theme={null} claude mcp add firecrawl -e FIRECRAWL_API_KEY=your-api-key -- npx -y firecrawl-mcp ``` Replace `your-api-key` with your actual Firecrawl API key. Done! You can now search and scrape the web from Claude Code. ## Quick Demo Try these in Claude Code: **Search the web:** ``` Search for the latest Next.js 15 features ``` **Scrape a page:** ``` Scrape firecrawl.dev and tell me what it does ``` **Get documentation:** ``` Find and scrape the Stripe API docs for payment intents ``` Claude will automatically use Firecrawl's search and scrape tools to get the information. # MCP Web Search & Scrape in Cursor Source: https://docs.firecrawl.dev/developer-guides/mcp-setup-guides/cursor Add web scraping and search to Cursor in 2 minutes Add web scraping and search capabilities to Cursor with Firecrawl MCP. ## Quick Setup ### 1. Get Your API Key Sign up at [firecrawl.dev/app](https://firecrawl.dev/app) and copy your API key. ### 2. Add to Cursor Open Settings (`Cmd+,`), search for "MCP", and add: ```json theme={null} { "mcpServers": { "firecrawl": { "command": "npx", "args": ["-y", "firecrawl-mcp"], "env": { "FIRECRAWL_API_KEY": "your_api_key_here" } } } } ``` Replace `your_api_key_here` with your actual Firecrawl API key. ### 3. Restart Cursor Done! You can now search and scrape the web from Cursor. ## Quick Demo Try these in Cursor Chat (`Cmd+K`): **Search:** ``` Search for TypeScript best practices 2025 ``` **Scrape:** ``` Scrape firecrawl.dev and tell me what it does ``` **Get docs:** ``` Scrape the React hooks documentation and explain useEffect ``` Cursor will automatically use Firecrawl tools. 
## Windows Troubleshooting If you see a `spawn npx ENOENT` or "No server info found" error on Windows, Cursor cannot find `npx` in your PATH. Try one of these fixes: **Option A: Use the full path to `npx.cmd`** Run `where npx` in Command Prompt to get the full path, then update your config: ```json theme={null} { "mcpServers": { "firecrawl": { "command": "C:\\Program Files\\nodejs\\npx.cmd", "args": ["-y", "firecrawl-mcp"], "env": { "FIRECRAWL_API_KEY": "your_api_key_here" } } } } ``` Replace the `command` path with the output from `where npx`. **Option B: Use the remote hosted URL (no Node.js required)** ```json theme={null} { "mcpServers": { "firecrawl": { "url": "https://mcp.firecrawl.dev/YOUR-API-KEY/v2/mcp" } } } ``` Replace `YOUR-API-KEY` with your Firecrawl API key. # MCP Web Search & Scrape in Factory AI Source: https://docs.firecrawl.dev/developer-guides/mcp-setup-guides/factory-ai Add web scraping and search to Factory AI in 2 minutes Add web scraping and search capabilities to Factory AI with Firecrawl MCP. ## Quick Setup ### 1. Get Your API Key Sign up at [firecrawl.dev/app](https://firecrawl.dev/app) and copy your API key. ### 2. Install Factory AI CLI Install the [Factory AI CLI](https://docs.factory.ai/cli/getting-started/quickstart) if you haven't already: **macOS/Linux:** ```bash theme={null} curl -fsSL https://app.factory.ai/cli | sh ``` **Windows:** ```powershell theme={null} iwr https://app.factory.ai/cli/install.ps1 -useb | iex ``` ### 3. Add Firecrawl MCP Server In the Factory droid CLI, add Firecrawl using the `/mcp add` command: ```bash theme={null} /mcp add firecrawl "npx -y firecrawl-mcp" -e FIRECRAWL_API_KEY=your-api-key-here ``` Replace `your-api-key-here` with your actual Firecrawl API key. ### 4. Done! The Firecrawl tools are now available in your Factory AI session! 
## Quick Demo Try these in Factory AI: **Search the web:** ``` Search for the latest Next.js 15 features ``` **Scrape a page:** ``` Scrape firecrawl.dev and tell me what it does ``` **Get documentation:** ``` Find and scrape the Stripe API docs for payment intents ``` Factory will automatically use Firecrawl's search and scrape tools to get the information. # MCP Web Search & Scrape in Windsurf Source: https://docs.firecrawl.dev/developer-guides/mcp-setup-guides/windsurf Add web scraping and search to Windsurf in 2 minutes Add web scraping and search capabilities to Windsurf with Firecrawl MCP. ## Quick Setup ### 1. Get Your API Key Sign up at [firecrawl.dev/app](https://firecrawl.dev/app) and copy your API key. ### 2. Add to Windsurf Add this to your `~/.codeium/windsurf/mcp_config.json`: ```json theme={null} { "mcpServers": { "firecrawl": { "command": "npx", "args": ["-y", "firecrawl-mcp"], "env": { "FIRECRAWL_API_KEY": "YOUR_API_KEY" } } } } ``` Replace `YOUR_API_KEY` with your actual Firecrawl API key. ### 3. Restart Windsurf Done! Windsurf can now search and scrape the web. ## Quick Demo Try these in Windsurf: **Search:** ``` Search for the latest Tailwind CSS features ``` **Scrape:** ``` Scrape firecrawl.dev and explain what it does ``` **Get docs:** ``` Find and scrape the Supabase authentication documentation ``` Windsurf's AI agents will automatically use Firecrawl tools. # OpenClaw Source: https://docs.firecrawl.dev/developer-guides/openclaw Use Firecrawl with OpenClaw to give your agents web scraping, search, and browser automation capabilities. Integrate Firecrawl with OpenClaw to give your agents the ability to scrape, search, crawl, extract, and interact with the web — all through the [Firecrawl CLI](/sdks/cli). ## Why Firecrawl + OpenClaw * **No local browser needed** — every session runs in a remote, isolated sandbox. No Chromium installs, no driver conflicts, no RAM pressure on your machine. 
* **Real parallelism** — run many browser sessions at once without local resource fights. Agents can browse in batches across multiple sites simultaneously. * **Secure by default** — navigation, DOM evaluation, and extraction all happen inside disposable sandboxes, not on your workstation. * **Better token economics** — agents get back clean artifacts (snapshots, extracted fields) instead of hauling giant DOMs and driver logs into the context window. * **Full web toolkit** — scrape, search, and browser automation all through a single CLI that your agent already knows how to use. ## Setup Tell your agent to install the Firecrawl CLI, authenticate, and initialize the skill with this command: ```bash theme={null} npx -y firecrawl-cli init --browser --all ``` * `--all` installs the Firecrawl skill to every detected AI coding agent * `--browser` opens the browser for Firecrawl authentication automatically Or install everything separately: ```bash theme={null} npm install -g firecrawl-cli firecrawl init skills firecrawl login --browser # Or, skip the browser and provide your API key directly: export FIRECRAWL_API_KEY="fc-YOUR-API-KEY" ``` Verify everything is set up correctly: ```bash theme={null} firecrawl --status ``` Once the skill is installed, your OpenClaw agent automatically discovers and uses Firecrawl commands — no extra configuration needed. ## Scrape Scrape a single page and extract its content: ```bash theme={null} firecrawl https://example.com --only-main-content ``` Get specific formats: ```bash theme={null} firecrawl https://example.com --format markdown,links --pretty ``` ## Search Search the web and optionally scrape the results: ```bash theme={null} firecrawl search "latest AI funding rounds 2025" --limit 10 # Search and scrape the results firecrawl search "OpenClaw documentation" --scrape --scrape-formats markdown ``` ## Browser Launch a remote browser session for interactive automation. 
Each session runs in an isolated sandbox — no local Chromium install required. `agent-browser` is pre-installed with 40+ commands. ```bash theme={null} # Shorthand: auto-launches a session if none is active firecrawl browser "open https://news.ycombinator.com" firecrawl browser "snapshot" firecrawl browser "scrape" firecrawl browser close ``` Interact with page elements using refs from the snapshot: ```bash theme={null} firecrawl browser "open https://example.com" firecrawl browser "snapshot" # snapshot returns @ref IDs — use them to interact firecrawl browser "click @e5" firecrawl browser "fill @e3 'search query'" firecrawl browser "scrape" firecrawl browser close ``` The shorthand form (`firecrawl browser "..."`) sends commands to `agent-browser` automatically and auto-launches a sandbox session if there isn't one active. Your agent issues intent-level actions (`open`, `click`, `fill`, `snapshot`, `scrape`) instead of writing Playwright code. ## Example: tell your agent Here are some prompts you can give your OpenClaw agent: * *"Use Firecrawl to scrape [https://example.com](https://example.com) and summarize the main content."* * *"Search for the latest OpenAI news and give me a summary of the top 5 results."* * *"Use Firecrawl Browser to open Hacker News, get the top 5 stories, and the first 10 comments on each."* * *"Crawl the docs at [https://docs.firecrawl.dev](https://docs.firecrawl.dev) and save them to a file."* ## Further reading * [Firecrawl CLI reference](/sdks/cli) * [Browser Sandbox docs](/features/browser) * [Agent docs](/features/agent) # Choosing the Data Extractor Source: https://docs.firecrawl.dev/developer-guides/usage-guides/choosing-the-data-extractor Compare /agent, /extract, and /scrape (JSON mode) to pick the right tool for structured data extraction Firecrawl offers three approaches for extracting structured data from web pages. Each serves different use cases with varying levels of automation and control. 
## Quick Comparison | Feature | `/agent` | `/extract` | `/scrape` (JSON mode) | | ------------------- | -------------------------------------- | ------------------------------------------ | ---------------------------- | | **Status** | Active | Use `/agent` instead | Active | | **URL Required** | No (optional) | Yes (wildcards supported) | Yes (single URL) | | **Scope** | Web-wide discovery | Multiple pages/domains | Single page | | **URL Discovery** | Autonomous web search | Crawls from given URLs | None | | **Processing** | Asynchronous | Asynchronous | Synchronous | | **Schema Required** | No (prompt or schema) | No (prompt or schema) | No (prompt or schema) | | **Pricing** | Dynamic (5 free runs/day) | Token-based (1 credit = 15 tokens) | 1 credit/page | | **Best For** | Research, discovery, complex gathering | Multi-page extraction (when you know URLs) | Known single-page extraction | ## 1. `/agent` Endpoint The `/agent` endpoint is Firecrawl's most advanced offering—the successor to `/extract`. It uses AI agents to autonomously search, navigate, and gather data from across the web. 
### Key Characteristics * **URLs Optional**: Just describe what you need via `prompt`; URLs are completely optional * **Autonomous Navigation**: The agent searches and navigates deep into sites to find your data * **Deep Web Search**: Autonomously discovers information across multiple domains and pages * **Parallel Processing**: Processes multiple sources simultaneously for faster results * **Models Available**: `spark-1-mini` (default, 60% cheaper) and `spark-1-pro` (higher accuracy) ### Example ```python Python theme={null} from firecrawl import Firecrawl from pydantic import BaseModel, Field from typing import List, Optional app = Firecrawl(api_key="fc-YOUR_API_KEY") class Founder(BaseModel): name: str = Field(description="Full name of the founder") role: Optional[str] = Field(None, description="Role or position") background: Optional[str] = Field(None, description="Professional background") class FoundersSchema(BaseModel): founders: List[Founder] = Field(description="List of founders") result = app.agent( prompt="Find the founders of Firecrawl", schema=FoundersSchema, model="spark-1-mini", max_credits=100 ) print(result.data) ``` ```js Node theme={null} import Firecrawl from '@mendable/firecrawl-js'; import { z } from 'zod'; const firecrawl = new Firecrawl({ apiKey: "fc-YOUR_API_KEY" }); const result = await firecrawl.agent({ prompt: "Find the founders of Firecrawl", schema: z.object({ founders: z.array(z.object({ name: z.string().describe("Full name of the founder"), role: z.string().describe("Role or position").optional(), background: z.string().describe("Professional background").optional() })).describe("List of founders") }), model: "spark-1-mini", maxCredits: 100 }); console.log(result.data); ``` ```bash cURL theme={null} curl -X POST "https://api.firecrawl.dev/v2/agent" \ -H "Authorization: Bearer $FIRECRAWL_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "prompt": "Find the founders of Firecrawl", "model": "spark-1-mini", "maxCredits": 100, 
"schema": { "type": "object", "properties": { "founders": { "type": "array", "description": "List of founders", "items": { "type": "object", "properties": { "name": { "type": "string", "description": "Full name" }, "role": { "type": "string", "description": "Role or position" }, "background": { "type": "string", "description": "Professional background" } }, "required": ["name"] } } }, "required": ["founders"] } }' ``` ### Best Use Case: Autonomous Research & Discovery **Scenario**: You need to find information about AI startups that raised Series A funding, including their founders and funding amounts. **Why `/agent`**: You don't know which websites contain this information. The agent will autonomously search the web, navigate to relevant sources (Crunchbase, news sites, company pages), and compile the structured data for you. For more details, see the [Agent documentation](/features/agent). *** ## 2. `/extract` Endpoint **Use `/agent` instead**: We recommend migrating to [`/agent`](/features/agent)—it's faster, more reliable, doesn't require URLs, and handles all `/extract` use cases plus more. The `/extract` endpoint collects structured data from specified URLs or entire domains using LLM-powered extraction. ### Key Characteristics * **URLs Typically Required**: Provide at least one URL (supports wildcards like `example.com/*`) * **Domain Crawling**: Can crawl and parse all URLs discovered in a domain * **Web Search Enhancement**: Optional `enableWebSearch` to follow links outside specified domains * **Schema Optional**: Supports strict JSON schema OR natural language prompts * **Async Processing**: Returns job ID for status checking ### The URL Limitation The fundamental challenge with `/extract` is that you typically need to know URLs upfront: 1. **Discovery gap**: For tasks like "find YC W24 companies," you don't know which URLs contain the data. You'd need a separate search step before calling `/extract`. 2. 
**Awkward web search**: While `enableWebSearch` exists, it's constrained to start from URLs you provide—an awkward workflow for discovery tasks. 3. **Why `/agent` was created**: `/extract` is good at extracting from known locations, but less effective at discovering where data lives. ### Example ```python Python theme={null} from firecrawl import Firecrawl firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY") schema = { "type": "object", "properties": {"description": {"type": "string"}}, "required": ["description"], } res = firecrawl.extract( urls=["https://docs.firecrawl.dev"], prompt="Extract the page description", schema=schema, ) print(res.data["description"]) ``` ```js Node theme={null} import Firecrawl from '@mendable/firecrawl-js'; const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" }); const schema = { type: 'object', properties: { title: { type: 'string' } }, required: ['title'] }; const res = await firecrawl.extract({ urls: ['https://docs.firecrawl.dev'], prompt: 'Extract the page title', schema, scrapeOptions: { formats: [{ type: 'json', prompt: 'Extract', schema }] } }); console.log(res.status || res.success, res.data); ``` ```bash cURL theme={null} curl -s -X POST "https://api.firecrawl.dev/v2/extract" \ -H "Authorization: Bearer $FIRECRAWL_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "urls": ["https://docs.firecrawl.dev"], "prompt": "Extract the page title", "schema": { "type": "object", "properties": {"title": {"type": "string"}}, "required": ["title"] }, "scrapeOptions": { "formats": [{"type": "json", "prompt": "Extract", "schema": {"type": "object"}}] } }' ``` ### Best Use Case: Targeted Multi-Page Extraction **Scenario**: You have your competitor's documentation URL and want to extract all their API endpoints from `docs.competitor.com/*`. **Why `/extract` worked here**: You knew the exact domain. But even then, `/agent` with URLs provided would typically give better results today. 
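That `/agent`-with-URLs alternative can be sketched as a plain request body. This is a minimal, hedged sketch: the field names follow the v2 agent cURL example earlier in this section, while the prompt wording and `maxCredits` value are illustrative, not prescribed.

```python
# Build an /agent request for a known domain, mirroring the /extract example
# above. Payload shape follows the v2 /agent cURL example; the prompt and
# maxCredits value are illustrative. Send with any HTTP client and a real
# fc- API key.
import json

payload = {
    "prompt": "Extract the page title from docs.firecrawl.dev",
    "urls": ["https://docs.firecrawl.dev"],  # optional for /agent
    "model": "spark-1-mini",
    "maxCredits": 50,
    "schema": {
        "type": "object",
        "properties": {"title": {"type": "string"}},
        "required": ["title"],
    },
}

# e.g. requests.post("https://api.firecrawl.dev/v2/agent",
#                    headers={"Authorization": "Bearer fc-YOUR-API-KEY"},
#                    json=payload)
print(json.dumps(payload, indent=2))
```

Because the URL is supplied, the agent skips discovery and spends its credit budget on extraction.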
For more details, see the [Extract documentation](/features/extract). *** ## 3. `/scrape` Endpoint with JSON Mode The `/scrape` endpoint with JSON mode is the most controlled approach—it extracts structured data from a single known URL using an LLM to parse the page content into your specified schema. ### Key Characteristics * **Single URL Only**: Designed for extracting data from one specific page at a time * **Exact URL Required**: You must know the precise URL containing the data * **Schema Optional**: Can use JSON schema OR just a prompt (LLM chooses structure) * **Synchronous**: Returns data immediately (no job polling needed) * **Additional Formats**: Can combine JSON extraction with markdown, HTML, screenshots in one request ### Example ```python Python theme={null} from firecrawl import Firecrawl from pydantic import BaseModel app = Firecrawl(api_key="fc-YOUR-API-KEY") class CompanyInfo(BaseModel): company_mission: str supports_sso: bool is_open_source: bool is_in_yc: bool result = app.scrape( 'https://firecrawl.dev', formats=[{ "type": "json", "schema": CompanyInfo.model_json_schema() }], only_main_content=False, timeout=120000 ) print(result) ``` ```js Node theme={null} import Firecrawl from "@mendable/firecrawl-js"; import { z } from "zod"; const app = new Firecrawl({ apiKey: "fc-YOUR_API_KEY" }); // Define schema to extract contents into const schema = z.object({ company_mission: z.string(), supports_sso: z.boolean(), is_open_source: z.boolean(), is_in_yc: z.boolean() }); const result = await app.scrape("https://firecrawl.dev", { formats: [{ type: "json", schema: schema }], }); console.log(result); ``` ```bash cURL theme={null} curl -X POST https://api.firecrawl.dev/v2/scrape \ -H 'Content-Type: application/json' \ -H 'Authorization: Bearer YOUR_API_KEY' \ -d '{ "url": "https://firecrawl.dev", "formats": [ { "type": "json", "schema": { "type": "object", "properties": { "company_mission": { "type": "string" }, "supports_sso": { "type": "boolean" }, 
"is_open_source": { "type": "boolean" }, "is_in_yc": { "type": "boolean" } }, "required": [ "company_mission", "supports_sso", "is_open_source", "is_in_yc" ] } } ] }' ``` ### Best Use Case: Single-Page Precision Extraction **Scenario**: You're building a price monitoring tool and need to extract the price, stock status, and product details from a specific product page you already have the URL for. **Why `/scrape` with JSON mode**: You know exactly which page contains the data, need precise single-page extraction, and want synchronous results without job management overhead. For more details, see the [JSON mode documentation](/features/llm-extract). *** ## Decision Guide **Do you know the exact URL(s) containing your data?** * **NO** → Use `/agent` (autonomous web discovery) * **YES** * **Single page?** → Use `/scrape` with JSON mode * **Multiple pages?** → Use `/agent` with URLs (or batch `/scrape`) ### Recommendations by Scenario | Scenario | Recommended Endpoint | | -------------------------------------------------- | ------------------------------- | | "Find all AI startups and their funding" | `/agent` | | "Extract data from this specific product page" | `/scrape` (JSON mode) | | "Get all blog posts from competitor.com" | `/agent` with URL | | "Monitor prices across multiple known URLs" | `/scrape` with batch processing | | "Research companies in a specific industry" | `/agent` | | "Extract contact info from 50 known company pages" | `/scrape` with batch processing | *** ## Pricing | Endpoint | Cost | Notes | | --------------------- | ---------------------------------- | ------------------------------------- | | `/scrape` (JSON mode) | 1 credit/page | Fixed, predictable | | `/extract` | Token-based (1 credit = 15 tokens) | Variable based on content | | `/agent` | Dynamic | 5 free runs/day; varies by complexity | ### Example: "Find the founders of Firecrawl" | Endpoint | How It Works | Credits Used | | ---------- | ----------------------------------------------- 
| ---------------------- | | `/scrape` | You find the URL manually, then scrape 1 page | \~1 credit | | `/extract` | You provide URL(s), it extracts structured data | Variable (token-based) | | `/agent` | Just send the prompt—agent finds and extracts | \~100–500 credits | **Tradeoff**: `/scrape` is cheapest but requires you to know the URL. `/agent` costs more but handles discovery automatically. For detailed pricing, see [Firecrawl Pricing](https://firecrawl.dev/pricing). *** ## Migration: `/extract` → `/agent` If you're currently using `/extract`, migration is straightforward: **Before (extract):** ```python theme={null} result = app.extract( urls=["https://example.com/*"], prompt="Extract product information", schema=schema ) ``` **After (agent):** ```python theme={null} result = app.agent( urls=["https://example.com"], # Optional - can omit entirely prompt="Extract product information from example.com", schema=schema, model="spark-1-mini" # or "spark-1-pro" for higher accuracy ) ``` The key advantage: with `/agent`, you can drop the URLs entirely and just describe what you need. *** ## Key Takeaways 1. **Know the exact URL?** Use `/scrape` with JSON mode—it's the cheapest (1 credit/page), fastest (synchronous), and most predictable option. 2. **Need autonomous research?** Use `/agent`—it handles discovery automatically with 5 free runs/day, then dynamic pricing based on complexity. 3. **Migrate from `/extract`** to `/agent` for new projects—`/agent` is the successor with better capabilities. 4. **Cost vs. convenience tradeoff**: `/scrape` is most cost-effective when you know your URLs; `/agent` costs more but eliminates manual URL discovery. 
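The "monitor prices across multiple known URLs" pattern from the decision guide pairs a list of URLs with the same JSON-mode format object as single-page `/scrape`. A hedged sketch of the request body, assuming Firecrawl's batch scrape endpoint (see the batch-scraping docs); the shop URLs and price fields here are hypothetical examples:

```python
# Sketch of a batch /scrape request in JSON mode. Assumes the
# /v2/batch/scrape endpoint from the batch-scraping docs; the shop URLs
# and the price/in_stock fields are hypothetical.
import json

price_schema = {
    "type": "object",
    "properties": {
        "price": {"type": "number"},
        "in_stock": {"type": "boolean"},
    },
    "required": ["price"],
}

payload = {
    "urls": [
        "https://shop.example.com/product-a",
        "https://shop.example.com/product-b",
    ],
    # Same format object as single-page /scrape JSON mode
    "formats": [{"type": "json", "schema": price_schema}],
}

# POST this to the batch scrape endpoint with your API key, then poll the
# returned job ID; each page costs 1 credit, as with single /scrape.
print(json.dumps(payload, indent=2))
```

This keeps the fixed 1-credit-per-page pricing while covering many pages in one job.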
***

## Further Reading

* [Agent documentation](/features/agent)
* [Agent models](/features/models)
* [JSON mode documentation](/features/llm-extract)
* [Extract documentation](/features/extract)
* [Batch scraping](/features/batch-scrape)

# Firecrawl + Dify

Source: https://docs.firecrawl.dev/developer-guides/workflow-automation/dify

Official plugin for Firecrawl + Dify AI workflow automation

**Official Dify Plugin:** [marketplace.dify.ai/plugins/langgenius/firecrawl](https://marketplace.dify.ai/plugins/langgenius/firecrawl)

Official plugin by the Dify team • 44,000+ installs • Chatflow & Agent apps • Free to use

## Dify Integration Overview

Dify is an open-source LLM app development platform. The official Firecrawl plugin enables web crawling and scraping directly in your AI workflows:

* Build visual pipelines with Firecrawl nodes for data extraction
* Give AI agents the power to scrape live web data on demand

## Firecrawl Tools in Dify

**Scrape**: Convert any URL into clean, structured data. Transform raw HTML into actionable insights. **Use Cases:** Extract product data, scrape article content, get structured data with JSON mode.

**Crawl**: Perform recursive crawls of websites and subdomains to gather extensive content. **Use Cases:** Full site content extraction, documentation scraping, multi-page data collection.

**Map**: Generate a complete map of all URLs present on a website. **Use Cases:** Site structure analysis, SEO auditing, URL discovery for batch scraping.

**Crawl Job**: Retrieve scraping results based on a Job ID or cancel ongoing tasks. **Use Cases:** Monitor long-running crawls, manage async scraping workflows, cancel operations when needed.
## Getting Started

1. Access the [Dify Plugin Marketplace](https://marketplace.dify.ai/plugins/langgenius/firecrawl) and install the Firecrawl tool
2. Visit [Firecrawl API Keys](https://www.firecrawl.dev/app/api-keys) and create a new API key
3. Navigate to **Plugins > Firecrawl > To Authorize** and input your API key
4. Drag Firecrawl tools into your Chatflow, Workflow, or Agent application
5. Set up parameters and test your workflow

## Usage Patterns

**Visual Pipeline Integration**

1. Add Firecrawl node to your pipeline
2. Select action (Map, Crawl, Scrape)
3. Define input variables
4. Execute pipeline sequentially

**Example Flow:**

```
User Input → Firecrawl (Scrape) → LLM Processing → Response
```

**Automated Data Processing**

Build multi-step workflows with:

* Scheduled scraping
* Data transformation
* Database storage
* Notifications

**Example Flow:**

```
Schedule Trigger → Firecrawl (Crawl) → Data Processing → Storage
```

**AI-Powered Web Access**

Give agents real-time web scraping capabilities:

1. Add Firecrawl tool to Agent
2. Agent autonomously decides when to scrape
3. LLM analyzes extracted content
4.
Agent provides informed responses

**Use Case:** Customer support agents that reference live documentation

## Common Use Cases

* Build RAG-powered chatbots that scrape and reference live website content
* Agents that research topics by scraping and analyzing multiple sources
* Automated workflows that track competitor websites and alert on changes
* Extract and enrich data from websites into structured databases

## Firecrawl Actions

| Tool | Description | Best For |
| ------------- | ------------------------------ | ----------------------- |
| **Scrape** | Single-page data extraction | Quick content capture |
| **Crawl** | Multi-page recursive crawling | Full site extraction |
| **Map** | URL discovery and site mapping | SEO analysis, URL lists |
| **Crawl Job** | Async job management | Long-running operations |

## Best Practices

* Let agents decide when to scrape
* Use natural language instructions
* Enable tool calling in LLM settings
* Monitor token usage with large scrapes
* Use Map before Crawl for large sites
* Set appropriate crawl limits
* Add error handling nodes
* Test with small datasets first

## Dify vs Other Platforms

| Feature | Dify | Make | Zapier | n8n |
| --------------- | -------------------- | ------------------- | ------------------- | ------------------- |
| **Type** | LLM app platform | Workflow automation | Workflow automation | Workflow automation |
| **Best For** | AI agents & chatbots | Visual workflows | Quick automation | Developer control |
| **Pricing** | Open-source + Cloud | Operations-based | Per-task | Flat monthly |
| **AI-Native** | Yes | Partial | Partial | Partial |
| **Self-Hosted** | Yes | No | No | Yes |

**Pro Tip:** Dify excels at building AI-native applications where agents need dynamic web access. Perfect for chatbots, research assistants, and AI tools that need live data.
# Firecrawl + Make

Source: https://docs.firecrawl.dev/developer-guides/workflow-automation/make

Official integration and workflow automation for Firecrawl + Make

**Official Make Integration:** [make.com/en/integrations/firecrawl](https://www.make.com/en/integrations/firecrawl)

Connect with 3,000+ apps • Visual workflow builder • Enterprise-grade automation • AI-powered scenarios

## Make Integration Overview

Make (formerly Integromat) provides a verified, officially supported Firecrawl integration maintained by Mendable:

* Design complex automations with Make's intuitive visual interface
* Scale securely with enterprise-grade automation and controls

## Firecrawl Modules in Make

### Crawl a Website

Crawl a URL and get its content from multiple pages

***

### Extract a Website

Extract structured data from pages using LLMs

***

### Scrape a Website

Scrape a URL and get its content from a single page

***

### Map a Website

Map multiple URLs based on options

***

### Search a Website

Web search with Firecrawl's scraping capabilities, retrieving full-page content for any search query

***

### Get Crawl Status

Get the status of a given crawl event ID

***

### Get Extract Status

Get the status of a given extraction event ID

***

### Make an API Call

Perform arbitrary authorized API calls for custom use cases

## Popular App Integrations

**Google Sheets** - Track and log scraped data in spreadsheets

**Airtable** - Build structured databases with scraped content

**Google Drive** - Store scraped files and reports

**Notion** - Organize research and web data

**Slack** - Get alerts for website changes and updates

**Telegram Bot** - Instant notifications for monitoring

**Gmail** - Email reports and digests

**Microsoft 365 Email** - Enterprise email automation

**HubSpot CRM** - Enrich leads with web data

**monday.com** - Track competitor intelligence

**ClickUp** - Manage research tasks

**OpenAI (ChatGPT, DALL-E)** - Analyze and summarize scraped content

**Google Gemini AI** -
Process and extract insights

**Perplexity AI** - Enhanced research workflows

**Make AI Agents** - Build adaptive AI-powered automations

## Common Workflow Patterns

* **Schedule** → Firecrawl (Scrape) → Google Sheets (Log) → Slack (Alert): track competitor websites and get instant notifications
* **Google Forms** → Firecrawl (Scrape company site) → HubSpot CRM (Update): automatically enrich leads with company data
* **Schedule** → Firecrawl (Crawl blog) → OpenAI (Summarize) → Gmail (Send digest): automated content curation and distribution
* **Schedule (Hourly)** → Firecrawl (Scrape) → Filter → Telegram (Alert): real-time price tracking and alerts

## Getting Started

1. Get your API key at [firecrawl.dev](https://firecrawl.dev)
2. Log into [Make](https://make.com) and click "Create a new scenario"
3. Search for "Firecrawl" and select your desired action
4. Add your Firecrawl API key to authenticate
5. Set up your workflow parameters and run a test

## Firecrawl Actions Overview

| Module | Use Case | Best For |
| --------------------- | -------------------------------- | ------------------- |
| **Scrape a Website** | Single-page data extraction | Quick data capture |
| **Crawl a Website** | Multi-page content collection | Full site scraping |
| **Extract a Website** | AI-powered structured extraction | Complex data needs |
| **Search a Website** | Search + full content | Research automation |
| **Map a Website** | URL discovery | SEO analysis |

## Best Practices

* Use **Scrape** for single pages (fastest)
* Use **Crawl** with limits for large sites
* Schedule appropriately to avoid rate limits
* Add error handling modules
* Schedule strategically (hourly/daily/weekly)
* Use filters to prevent unnecessary runs
* Set crawl limits to control API usage
* Test in Firecrawl playground first

## Industry Use Cases

* Competitor price monitoring
* Product availability tracking
* Review aggregation and analysis
* Inventory level monitoring
* Listing aggregation from multiple sources
* Market trend analysis
* Property data enrichment
* Competitive pricing intelligence
* Competitor content monitoring
* SEO performance tracking
* Backlink analysis
* Social media mention tracking
* Market data collection
* News and sentiment aggregation
* Regulatory filing monitoring
* Stock price tracking
* Job posting aggregation
* Company research automation
* Candidate background enrichment
* Salary benchmarking

## Make vs Zapier vs n8n

| Feature | Make | Zapier | n8n |
| ------------------ | --------------------------------- | ---------------- | -------------------- |
| **Setup** | Visual builder, cloud | No-code, cloud | Self-hosted or cloud |
| **Pricing** | Operations-based | Per-task pricing | Flat monthly |
| **Integrations** | 3,000+ apps | 8,000+ apps | 400+ integrations |
| **Complexity** | Advanced workflows | Simple workflows | Complex workflows |
| **Best For** | Visual automation, mid-complexity | Quick automation | Developer control |
| **Learning Curve** | Moderate | Easy | Moderate-Advanced |

**Pro Tip:** Make excels at visual workflow design and complex automations. Perfect for teams that need more control than Zapier but prefer visual building over n8n's code-first approach.

# Firecrawl + n8n

Source: https://docs.firecrawl.dev/developer-guides/workflow-automation/n8n

Learn how to use Firecrawl with n8n for web scraping automation, a complete step-by-step guide.

## Introduction to Firecrawl and n8n

Web scraping automation has become essential for modern businesses. Whether you need to monitor competitor prices, aggregate content, generate leads, or power AI applications with real-time data, the combination of Firecrawl and n8n provides a powerful solution without requiring programming knowledge.

*[Image: Firecrawl and n8n integration]*

**What is n8n?**

n8n is an open-source workflow automation platform that connects different tools and services together.
Think of it as a visual programming environment where you drag and drop nodes onto a canvas, connect them, and create automated workflows. With over 400 integrations, n8n lets you build complex automations without writing code.

## Why Use Firecrawl with n8n?

Traditional web scraping presents several challenges. Custom scripts break when websites update their structure. Anti-bot systems block automated requests. JavaScript-heavy sites don't render properly. Infrastructure requires constant maintenance.

Firecrawl handles these technical complexities on the scraping side, while n8n provides the automation framework. Together, they let you build production-ready workflows that:

* Extract data from any website reliably
* Connect scraped data to other business tools
* Run on schedules or triggered by events
* Scale from simple tasks to complex pipelines

This guide will walk you through setting up both platforms and building your first scraping workflow from scratch.

## Step 1: Create Your Firecrawl Account

Firecrawl provides the web scraping capabilities for your workflows. Let's set up your account and get your API credentials.

### Sign Up for Firecrawl

1. Navigate to [firecrawl.dev](https://firecrawl.dev) in your web browser
2. Click the "Get Started" or "Sign Up" button
3. Create an account using your email address or GitHub login
4. Verify your email if prompted

### Get Your API Key

After signing in, you need an API key to connect Firecrawl to n8n:

1. Go to your Firecrawl dashboard
2. Navigate to the [API Keys page](https://www.firecrawl.dev/app/api-keys)
3. Click "Create New API Key"
4. Give your key a descriptive name (e.g., "n8n Integration")
5. Copy the generated API key and save it somewhere secure

*[Screenshot: Firecrawl API key section]*

Your API key is like a password. Keep it secure and never share it publicly. You'll need this key in the next section.

Firecrawl provides free credits when you sign up, which is enough to test your workflows and complete this tutorial.
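Before wiring the key into n8n, you can optionally sanity-check it with a direct call to the scrape API. This sketch mirrors the v2 `/scrape` cURL call from the Firecrawl docs; the actual request is left commented out so nothing runs until you paste in a real key.

```python
# Sanity-check a Firecrawl API key with a direct /v2/scrape request.
# Mirrors the cURL example in the Firecrawl docs; replace the placeholder
# key with your real fc- key before uncommenting the call.
import json
import urllib.request

API_KEY = "fc-YOUR-API-KEY"  # placeholder - paste your real key

req = urllib.request.Request(
    "https://api.firecrawl.dev/v2/scrape",
    data=json.dumps({"url": "https://firecrawl.dev"}).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    },
)

# Uncomment to actually call the API (uses ~1 credit):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["data"]["markdown"][:200])
print(req.full_url, req.get_method())
```

If the uncommented call returns markdown instead of a 401 error, the same key will work in the n8n credential below.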
## Step 2: Set Up n8n

n8n offers two deployment options: cloud-hosted or self-hosted. For beginners, the cloud version is the fastest way to get started.

### Choose Your n8n Version

**n8n Cloud (Recommended for beginners):**

* No installation required
* Free tier available
* Managed infrastructure
* Automatic updates

**Self-Hosted:**

* Complete data control
* Run on your own servers
* Requires Docker installation
* Good for advanced users with specific security requirements

Choose the option that fits your needs. Both paths will lead you to the same workflow editor interface.

### Option A: n8n Cloud (Recommended for Beginners)

1. Visit [n8n.cloud](https://n8n.cloud)
2. Click "Start Free" or "Sign Up"
3. Register using your email address or GitHub
4. Complete the verification process
5. You'll be directed to your n8n dashboard

*[Screenshot: n8n Cloud homepage showing the signup options]*

The free tier provides everything you need to build and test workflows. You can upgrade later if you need more execution time or advanced features.

### Option B: Self-Hosted with Docker

If you prefer to run n8n on your own infrastructure, you can set it up quickly using Docker.

**Prerequisites:**

* [Docker Desktop](https://www.docker.com/products/docker-desktop/) installed on your computer
* Basic familiarity with command line/terminal

**Installation Steps:**

1. Open your terminal or command prompt
2. Create a Docker volume to persist your workflow data:

```bash theme={null}
docker volume create n8n_data
```

This volume stores your workflows, credentials, and execution history so they persist even if you restart the container.

3. Run the n8n Docker container:

```bash theme={null}
docker run -it --rm --name n8n -p 5678:5678 -v n8n_data:/home/node/.n8n docker.n8n.io/n8nio/n8n
```

*[Screenshot: Terminal showing the docker commands being executed with n8n starting up]*

4. Wait for n8n to start. You'll see output indicating the server is running
5. Open your web browser and navigate to `http://localhost:5678`
6.
Create your n8n account by registering with an email

*[Screenshot: n8n self-hosted welcome screen at localhost:5678]*

Your self-hosted n8n instance is now running locally. The interface is identical to n8n Cloud, so you can follow the rest of this guide regardless of which option you chose.

The `--rm` flag automatically removes the container when you stop it, but your data remains safe in the `n8n_data` volume. For production deployments, see the [n8n self-hosting documentation](https://docs.n8n.io/hosting/) for more advanced configuration options.

### Understanding the n8n Interface

When you first log in to n8n, you'll see the main dashboard:

*[Screenshot: n8n dashboard showing the workflow list view with "Create new workflow" button]*

Key interface elements:

* **Workflows**: Your saved automations appear here
* **Executions**: History of workflow runs
* **Credentials**: Stored API keys and authentication tokens
* **Settings**: Account and workspace configuration

Click "Create New Workflow" to open the workflow editor.

### The Workflow Canvas

The workflow editor is where you'll build your automations:

*[Screenshot: Empty n8n workflow canvas with the "+" button visible in the center]*

Important elements:

* **Canvas**: The main area where you place and connect nodes
* **Add Node Button (+)**: Click this to add new nodes to your workflow
* **Node Panel**: Opens when you click "+" showing all available nodes
* **Execute Workflow**: Runs your workflow manually for testing
* **Save**: Saves your workflow configuration

Let's build your first workflow by adding the Firecrawl node.

## Step 3: Install and Configure the Firecrawl Node

n8n includes native support for Firecrawl. You'll install the node and connect it to your Firecrawl account using the API key you created earlier.

### Add the Firecrawl Node to Your Workflow

1. In your new workflow canvas, click the "**+**" button in the center
2. The node selection panel opens on the right side
3. In the search box at the top, type "**Firecrawl**"
4.
You'll see the Firecrawl node appear in the search results

*[Screenshot: Clicking the + button, typing "Firecrawl" in the search, and the Firecrawl node appearing]*

5. Click "**Install**" next to the Firecrawl node
6. Wait for the installation to complete (this takes a few seconds)
7. Once installed, click on the Firecrawl node to add it to your canvas

*[Screenshot: Firecrawl node added to the canvas, showing as a box with the Firecrawl logo]*

The Firecrawl node will appear on your canvas as a box with the Firecrawl logo. This node represents a single Firecrawl operation in your workflow.

### Connect Your Firecrawl API Key

**n8n Cloud users:** Instead of manually entering an API key, you can use the one-click **"Connect to Firecrawl"** OAuth button when adding the Firecrawl node. This automatically creates a new Firecrawl team linked to your n8n account and grants **100,000 free credits**. To view these credits on the [Firecrawl dashboard](https://www.firecrawl.dev/app/usage), make sure you switch to your n8n-linked team using the team selector in the top-left corner.

Before you can use the Firecrawl node, you need to authenticate it with your API key:

1. Click on the Firecrawl node box to open its configuration panel on the right
2. At the top, you'll see a "Credential to connect with" dropdown
3. Since this is your first time, click "**Create New Credential**"

*[Screenshot: Firecrawl node configuration panel showing the credentials dropdown with "Create New Credential" option]*

4. A credential configuration window opens
5. Enter a name for this credential (e.g., "My Firecrawl Account")
6. Paste your Firecrawl API key in the "API Key" field
7. Click "**Save**" at the bottom

The credential is now saved in n8n. You won't need to enter your API key again for future Firecrawl nodes.

### Test Your Connection

Let's verify that your Firecrawl node is properly connected:

1. With the Firecrawl node still selected, look at the configuration panel
2.
In the "Resource" dropdown, select "**Scrape a url and get its content**"

3. In the "URL" field, enter: `https://firecrawl.dev`
4. Leave other settings at their defaults for now
5. Click the "**Test step**" button at the bottom right of the node

*[Screenshot: Selecting the Scrape operation, entering the URL, and clicking the Test step button]*

If everything is configured correctly, you'll see the scraped content from firecrawl.dev appear in the output panel below the node. Congratulations! Your Firecrawl node is now connected and working.

## Step 4: Create Your Telegram Bot

Before building your first workflow, you'll need a Telegram bot to receive notifications. Telegram bots are free and easy to create through Telegram's BotFather.

### Create a Bot with BotFather

1. Open Telegram on your phone or desktop
2. Search for "**@BotFather**" (the official bot from Telegram)
3. Start a conversation with BotFather by clicking "**Start**"
4. Send the command `/newbot` to create a new bot
5. BotFather will ask you to choose a name for your bot (this is the display name users will see)
6. Enter a name like "**My Firecrawl Bot**"
7. Next, choose a username for your bot. It must end with "bot" (e.g., "**my\_firecrawl\_updates\_bot**")
8. If the username is available, BotFather will create your bot and send you a message with your bot token

*[Screenshot: Creating a bot with BotFather, showing the full conversation flow]*

Save your bot token securely. This token is like a password that allows n8n to send messages as your bot. Never share it publicly.

### Get Your Chat ID

To send messages to yourself, you need your Telegram chat ID:

1. Open your web browser and visit this URL (replace `YOUR_BOT_TOKEN` with your actual bot token):

```
https://api.telegram.org/botYOUR_BOT_TOKEN/getUpdates
```

2. Keep this browser tab open
3. Now, search for your bot's username in Telegram (the one you just created)
4. Start a conversation with your bot by clicking "**Start**"
5. Send any message to your bot (e.g., "hello")
6.
Go back to the browser tab and refresh the page

7. Look for the `"chat":{"id":` field in the JSON response
8. The number next to `"id":` is your chat ID (e.g., `123456789`)
9. Save this chat ID for later

*[Screenshot: Browser showing the Telegram API getUpdates response with the chat ID highlighted]*

Your chat ID is the unique identifier for your conversation with the bot. You'll use this to tell n8n where to send messages.

You now have everything needed to integrate Telegram with your n8n workflows.

## Step 5: Build Practical Workflows with Telegram

Now let's build three real-world workflows that send information directly to your Telegram. These examples demonstrate different Firecrawl operations and how to integrate them with Telegram notifications.

### Example 1: Daily Firecrawl Product Updates Summary

Get a daily summary of Firecrawl product updates delivered to your Telegram every morning.

**What you'll build:**

* Scrapes Firecrawl's product updates blog at 9 AM daily
* Uses AI to generate a summary of the content
* Sends the summary to your Telegram

**Step-by-step:**

1. Create a new workflow in n8n
2. Add a **Schedule Trigger** node:
   * Click the "**+**" button on canvas
   * Search for "**Schedule Trigger**"
   * Configure: Every day at 9:00 AM

   *[Screenshot: Schedule Trigger configured for daily 9 AM execution]*
3. Add the **Firecrawl** node:
   * Click "**+**" next to Schedule Trigger
   * Search for and add "**Firecrawl**"
   * Select your Firecrawl credential
   * Configure:
     * **Resource**: Scrape a url and get its content
     * **URL**: `https://www.firecrawl.dev/blog/category/product-updates`
     * **Formats**: Select "Summary"

   *[Screenshot: Adding and configuring the Firecrawl node with the blog URL and Summary format selected]*
4. Add the **Telegram** node:
   * Click "**+**" next to Firecrawl
   * Search for "**Telegram**"
   * Click "**Send a text message**" to add it to the canvas
5.
Set up Telegram credentials:
   * Click on the Telegram node to open its configuration
   * In the "Credential to connect with" dropdown, click "**Create New Credential**"
   * Paste your bot token from BotFather
   * Click "**Save**"

Telegram credential configuration with bot token field

6. Configure the Telegram message:
   * **Operation**: Send Message
   * **Chat ID**: Enter your chat ID
   * **Text**: Leave this as a "hello" message for now
   * Click **Execute step** to send a test message and confirm the node can reach your Telegram chat

Configuring Telegram node and mapping the summary field to the message text

   * Once the test succeeds and you can see the structure of Firecrawl's output, map the summary into the message by dragging the `summary` field from the Firecrawl node's output into the **Text** field
7. Test the workflow:
   * Click "**Execute Workflow**"
   * Check your Telegram for the summary message

Complete workflow showing Schedule Trigger → Firecrawl → Telegram with all nodes connected

8. Activate the workflow by toggling the "**Active**" switch

Your Telegram bot will now send you a daily summary of Firecrawl product updates every morning at 9 AM.

### Example 2: AI News Search to Telegram

This workflow uses Firecrawl's Search operation to find AI news and send formatted results to Telegram.

**Key differences from Example 1:**

* Uses a **Manual Trigger** instead of Schedule (run on demand)
* Uses **Search** operation instead of Scrape
* Includes a **Code** node to format multiple results

**Build the workflow:**

1. Create a new workflow and add a **Manual Trigger** node
2. Add **Firecrawl** node with these settings:
   * **Resource**: Search and optionally scrape search results
   * **Query**: `ai news`
   * **Limit**: 5

Firecrawl Search node configuration with "ai news" query

3.
Add a **Code** node to format the search results: * Select "Run Once for All Items" * Paste this code: ```javascript theme={null} const results = $input.all(); let message = "Latest AI News:\n\n"; results.forEach((item) => { const webData = item.json.data.web; webData.forEach((article, index) => { message += `${index + 1}. ${article.title}\n`; message += `${article.description}\n`; message += `${article.url}\n\n`; }); }); return [{ json: { message } }]; ``` Adding Code node and pasting the formatting script 4. Update **Telegram** node (using your saved credential): * **Text**: Drag the `message` field from Code node Complete workflow execution with AI news sent to Telegram Replace the Manual Trigger with a Schedule Trigger to get automatic AI news updates at set intervals. ### Example 3: AI-Powered News Summary This workflow adds AI to Example 2, using OpenAI to generate intelligent summaries of the latest AI news before sending to Telegram. **Key changes from Example 2:** * Add **OpenAI credentials** setup * Add **AI Agent** node between Code and Telegram * AI Agent analyzes and summarizes all the news articles intelligently * Telegram receives the AI-generated summary instead of raw news list **Modify the workflow:** 1. **Get your OpenAI API key**: * Go to [platform.openai.com/api-keys](https://platform.openai.com/api-keys) * Sign in or create an account * Click "**Create new secret key**" * Give it a name (e.g., "n8n Integration") * Copy the API key immediately (you won't see it again) OpenAI dashboard showing API key creation 2. **Add and connect the AI Agent node**: * Click "**+**" after the Code node * Search for "**Basic LLM Chain**" or "**AI Agent**" * Drag the `message` field from the Code node to the AI Agent's input prompt field * Select **OpenAI** as the LLM provider Adding AI Agent node, dragging input from Code node, and connecting OpenAI as LLM 3. 
**Add your OpenAI credentials**: * Click "**Create New Credential**" for OpenAI * Paste your OpenAI API key * Select model: **gpt-5-mini** (cost-effective) or **gpt-5** (more capable) * Click "**Save**" Adding OpenAI credentials to the AI Agent node 4. **Add the system prompt to the AI Agent**: * In the AI Agent node, add this system prompt: ``` You are an AI news analyst. Analyze the provided AI news articles and create a concise, insightful summary highlighting the most important developments and trends. Group related topics together and provide context about why these developments matter. Keep the summary conversational and engaging, around 3-4 paragraphs. ``` Adding the system prompt to the AI Agent node 5. **Update the Telegram node and test**: * Update the Telegram node: * **Text**: Drag the AI Agent's output (the generated summary) * Remove the old mapping to the Code node's message * Click "**Execute Workflow**" to test * The AI will analyze all news articles and create a summary * Check your Telegram for the AI-generated summary Complete workflow execution with AI-generated summary sent to Telegram The AI Agent receives all the formatted news articles and creates an intelligent summary, making it easier to understand trends and important developments at a glance. ## Understanding Firecrawl Operations Now that you've built some workflows, let's explore the different Firecrawl operations available in n8n. Each operation is designed for specific web scraping use cases. ### Scrape a url and get its content Extracts content from a single web page and returns it in various formats. 
**What it does:** * Scrapes a single URL * Returns clean markdown, HTML, or AI-generated summaries * Can capture screenshots and extract links **Best for:** * Article extraction * Product page monitoring * Blog post scraping * Generating page summaries **Example use case:** Daily blog summaries (like Example 1 above) ### Search and optionally scrape search results Performs web searches and returns results with optional content scraping. **What it does:** * Searches the web, news, or images * Returns titles, descriptions, and URLs * Optionally scrapes the full content of results **Best for:** * Research automation * News monitoring * Trend discovery * Finding relevant content **Example use case:** AI news aggregation (like Example 2 above) ### Crawl a website Recursively discovers and scrapes multiple pages from a website. **What it does:** * Follows links automatically * Scrapes multiple pages in one operation * Can filter URLs by patterns **Best for:** * Full documentation extraction * Site archiving * Multi-page data collection ### Map a website and get urls Returns all URLs found on a website without scraping content. **What it does:** * Discovers all links on a site * Returns clean URL list * Fast and lightweight **Best for:** * URL discovery * Sitemap generation * Planning larger crawls ### Extract Data Uses AI to extract structured information based on custom prompts or schemas. **What it does:** * AI-powered data extraction * Returns data in your specified format * Works across multiple pages **Best for:** * Custom data extraction * Building databases * Structured information gathering ### Batch Scrape Scrapes multiple URLs in parallel efficiently. 
**What it does:** * Processes multiple URLs at once * More efficient than loops * Returns all results together **Best for:** * Processing URL lists * Bulk data collection * Large-scale scraping projects ### Agent Uses an AI agent to autonomously browse and extract data from websites based on a natural language prompt. **What it does:** * Accepts a prompt describing what data you need * AI agent navigates and extracts information autonomously * Available in **Sync** mode (waits for results) and **Async** mode (returns a job ID immediately) * Use **Get Agent Status** to poll for results when using Async mode **Best for:** * Complex, multi-page data gathering guided by a prompt * Extracting information when you don't know the exact page structure * Research tasks that require navigating multiple pages **Sync vs. Async:** * **Agent (Sync)** starts the job and waits for the result in one step — simplest for most use cases. The **Max Wait Time** parameter controls how long the node polls before timing out (default: 300 seconds, maximum: 600 seconds). If the agent job takes longer than this, the node returns a timeout status even though the job may still complete on the Firecrawl side. For jobs that may exceed 10 minutes, use the async mode instead. * **Agent (Async)** returns a job ID immediately. Add a second Firecrawl node with the **Get Agent Status** operation to retrieve results once the job completes. For details on the agent feature, see the [Agent documentation](/features/agent). ### Browser Sandbox Provides a persistent browser session that you can control with code, allowing multi-step browser automation within a single session. 
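The Browser Sandbox operations reduce to a create, execute, delete lifecycle, with the `sessionId` from the Create step threaded through every later call. Here is a minimal Python sketch of that flow; the endpoint paths, payload fields, and the injected `post` helper are illustrative assumptions for this guide, not Firecrawl's documented API surface:

```python
from typing import Callable

def run_in_browser_session(post: Callable[[str, dict], dict], snippets: list[str]) -> list[dict]:
    """Create a session, execute each code snippet in it, then destroy it.

    NOTE: the paths and payload fields below are hypothetical, chosen only
    to illustrate how the sessionId is reused across steps.
    """
    session = post("/v2/browser/sessions", {})  # hypothetical "Create Browser Session"
    session_id = session["sessionId"]
    results = []
    try:
        for code in snippets:
            # Every execute call reuses the same sessionId, so cookies and
            # page state persist between steps.
            results.append(post(f"/v2/browser/sessions/{session_id}/execute",
                                {"code": code, "language": "javascript"}))
    finally:
        # Hypothetical "Delete Browser Session": always clean up when done.
        post(f"/v2/browser/sessions/{session_id}/delete", {})
    return results
```

The same shape maps onto n8n: the Create node's `sessionId` output feeds each Execute node, and a Delete node runs last.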
**Operations:** * **Create Browser Session** — starts a new browser session and returns a `sessionId` * **Execute Browser Code** — runs JavaScript, Python, or bash code in the session (using the `sessionId` from the Create step) * **List Browser Sessions** — lists active or destroyed sessions * **Delete Browser Session** — destroys a session when you are done **Best for:** * Multi-step browser workflows that require maintaining state across pages * Dynamic page navigation where the number of steps is not known in advance * Workflows that use persistent browser profiles to preserve cookies and localStorage across runs In n8n, select the **Browser** resource on the Firecrawl node to access these operations. Pass the `sessionId` from the Create step into each subsequent Execute or Delete step. Use n8n's **Loop Over Items** node to iterate through a dynamic list of pages, calling Execute for each one within the same session. For details on the Browser Sandbox feature, see the [Browser Sandbox documentation](/features/browser-sandbox). ## Workflow Templates and Examples Instead of building from scratch, you can start with pre-built templates. The n8n community has created numerous Firecrawl workflows you can copy and customize. ### Featured Templates Build an AI chatbot with web access using Firecrawl and n8n Ready-to-use templates for lead generation, price monitoring, and more Scrape pages into embeddings and store in Supabase pgvector for RAG Scrape company websites and extract structured business signals Scrape pages into embeddings and store in Pinecone for RAG Browse hundreds of workflows using Firecrawl View official integration documentation ### How to Import Templates To use a template from the n8n community: 1. Click on a workflow template link 2. Click "**Import template to localhost:5678 self-hosted instance**" button on the template page 3. The workflow opens in your n8n instance 4. Configure credentials for each node 5. 
Customize settings for your use case 6. Activate the workflow Importing a template from n8n.io, showing the import button and workflow appearing in n8n ## Best Practices Follow these guidelines to build reliable, efficient workflows: ### Testing and Debugging * Always test workflows manually before activating schedules * Use the "**Execute Workflow**" button to test the entire flow * Check output data at each node to verify correctness * Use the "**Executions**" tab to review past runs and debug issues Executions tab showing workflow run history with timestamps and status ### Error Handling * Add Error Trigger nodes to catch and handle failures * Set up notifications when workflows fail * Use the "**Continue On Fail**" setting for non-critical nodes * Monitor your workflow executions regularly ### Performance Optimization * Use Batch Scrape for multiple URLs instead of loops * Set appropriate rate limits to avoid overwhelming target sites * Cache data when possible to reduce unnecessary requests * Schedule intensive workflows during off-peak hours ### Security * Never expose API keys in workflow configurations * Use n8n's credential system to securely store authentication * Be careful when sharing workflows publicly * Follow target websites' terms of service and robots.txt ## Next Steps You now have the fundamentals to build web scraping automations with Firecrawl and n8n. Here's how to continue learning: ### Explore Advanced Features * Study webhook configurations for real-time data processing * Experiment with AI-powered extraction using prompts and schemas * Build complex multi-step workflows with branching logic ### Join the Community * [Firecrawl Discord](https://discord.gg/firecrawl) - Get help with Firecrawl and discuss web scraping * [n8n Community Forum](https://community.n8n.io/) - Ask questions about workflow automation * Share your workflows and learn from others ### Recommended Learning Path 1. Complete the example workflows in this guide 2. 
Modify templates from the community library 3. Build a workflow to solve a real problem in your work 4. Explore advanced Firecrawl operations 5. Contribute your own templates to help others **Need help?** If you're stuck or have questions, the Firecrawl and n8n communities are active and helpful. Don't hesitate to ask for guidance as you build your automations. ## Additional Resources * [Firecrawl API Documentation](/api-reference/v2-introduction) * [n8n Documentation](https://docs.n8n.io/) * [Web Scraping Best Practices](https://www.firecrawl.dev/blog) # Firecrawl + Zapier Source: https://docs.firecrawl.dev/developer-guides/workflow-automation/zapier Official tutorials and Zapier integration templates for Firecrawl + Zapier automation **Official Zapier Integration:** [zapier.com/apps/firecrawl/integrations](https://zapier.com/apps/firecrawl/integrations) Connect with 8,000+ apps • No-code automation • Pre-built Zap templates • Cloud-based ## Official Blog Post Real-world case study: How Zapier integrated Firecrawl into Zapier Chatbots in a single afternoon. ## Popular Integrations ### Google Sheets → [View Integration](https://zapier.com/apps/google-sheets/integrations/firecrawl) Track competitor data, centralize marketing insights, and automate data collection. **Best For:** Business owners, marketing teams *** ### Airtable → [View Integration](https://zapier.com/apps/airtable/integrations/firecrawl) Build lead generation databases and content aggregation systems with structured storage. **Best For:** Sales teams, project managers *** ### Zapier Tables → [View Integration](https://zapier.com/apps/zapier-tables/integrations/firecrawl) No-code database automation for employee onboarding and centralized lead management. **Best For:** HR teams, operations ### Slack → [View Integration](https://zapier.com/apps/slack/integrations/firecrawl) Get website change notifications, competitor monitoring alerts, and market intelligence updates. 
**Best For:** Marketing teams, product managers *** ### Telegram → [View Integration](https://zapier.com/apps/telegram/integrations/firecrawl) Instant price alerts, breaking news notifications, and real-time monitoring. **Best For:** Traders, news enthusiasts ### HubSpot → [View Integration](https://zapier.com/apps/hubspot/integrations/firecrawl) Contact enrichment, lead scoring with web data, and marketing automation. **Best For:** Marketing ops, sales ops *** ### Pipedrive → [View Integration](https://zapier.com/apps/pipedrive/integrations/firecrawl) Lead enrichment from websites and competitor intelligence tracking. **Best For:** Sales teams, account executives *** ### Attio → [View Integration](https://zapier.com/apps/attio/integrations/firecrawl) Modern CRM data enrichment and relationship intelligence. **Best For:** Modern sales teams ### Google Docs → [View Integration](https://zapier.com/apps/google-docs/integrations/firecrawl) Automated report generation, research documentation, and content aggregation. **Best For:** Researchers, content creators *** ### Notion → [View Integration](https://zapier.com/apps/notion/integrations/firecrawl) Knowledge base updates, research library building, and content curation. **Best For:** Product teams, researchers ### Schedule by Zapier → [View Integration](https://zapier.com/apps/schedule/integrations/firecrawl) Run hourly, daily, weekly, or monthly scraping automatically. *** ### Zapier Interfaces → [View Integration](https://zapier.com/apps/interfaces/integrations/firecrawl) Build custom internal tools with form-based scraping and team dashboards. **Best For:** Operations teams *** ### Zapier Chatbots → [View Integration](https://zapier.com/apps/zapier-chatbots/integrations/firecrawl) AI chatbots with live web knowledge for customer support and lead generation. 
Official Zapier product uses Firecrawl internally

## Firecrawl Actions

| Action | Use Case |
| --------------------------- | ----------------------------------------- |
| **Scrape URL** | Quick single-page data capture |
| **Crawl Website** | Full site scraping with multiple pages |
| **Extract Structured Data** | AI-powered extraction with custom schemas |
| **Search Web** | Research automation with search + scrape |
| **Map Website** | SEO analysis and site structure mapping |

## Quick Reference

1. Sign up at [firecrawl.dev](https://firecrawl.dev)
2. Get your API key
3. Create a Zap in Zapier
4. Connect Firecrawl with your API key
5. Choose your workflow and activate

* Use the **Scrape URL** action for single pages (faster)
* Schedule strategically (hourly/daily/weekly)
* Test in Firecrawl playground first
* Add error handling for failed scrapes
* Use filters to prevent unnecessary runs

## Industry Use Cases

* Price monitoring across competitors
* Product availability tracking
* Review aggregation
* Listing aggregation
* Market trend analysis
* Property data collection
* Competitor content tracking
* SEO monitoring
* Backlink analysis
* Market data collection
* News aggregation
* Regulatory filing monitoring
* Job posting aggregation
* Company research automation
* Candidate information enrichment

## Zapier vs n8n

| Feature | Zapier | n8n |
| ---------------- | ------------------------------------- | ------------------------ |
| **Setup** | No-code, cloud-based | Self-hosted or cloud |
| **Pricing** | Per-task pricing | Flat monthly |
| **Integrations** | 8,000+ apps | 400+ integrations |
| **Best For** | Quick automation, non-technical users | Custom logic, developers |

**Pro Tip:** Start with Zapier's pre-built templates and customize as needed. Perfect for quick, no-code automation!

# Agent

Source: https://docs.firecrawl.dev/features/agent

Gather data wherever it lives on the web.
Firecrawl `/agent` is a magic API that searches, navigates, and gathers data from the widest range of websites, finding data in hard-to-reach places and uncovering data in ways no other API can. It accomplishes in a few minutes what would take a human many hours — end-to-end data collection, without scripts or manual work. Whether you need one data point or entire datasets at scale, Firecrawl `/agent` works to get your data. **Think of `/agent` as deep research for data, wherever it is!** **Research Preview**: Agent is in early access. Expect rough edges. It will get significantly better over time. [Share feedback →](mailto:product@firecrawl.com) Agent builds on everything great about `/extract` and takes it further: * **No URLs Required**: Just describe what you need via `prompt` parameter. URLs are optional * **Deep Web Search**: Autonomously searches and navigates deep into sites to find your data * **Reliable and Accurate**: Works with a wide variety of queries and use cases * **Faster**: Processes multiple sources in parallel for quicker results Test the agent in the interactive playground — no code required. ## Using `/agent` The only required parameter is `prompt`. Simply describe what data you want to extract. For structured output, provide a JSON schema. 
The SDKs support Pydantic (Python) and Zod (Node) for type-safe schema definitions: ```python Python theme={null} from firecrawl import Firecrawl from pydantic import BaseModel, Field from typing import List, Optional app = Firecrawl(api_key="fc-YOUR_API_KEY") class Founder(BaseModel): name: str = Field(description="Full name of the founder") role: Optional[str] = Field(None, description="Role or position") background: Optional[str] = Field(None, description="Professional background") class FoundersSchema(BaseModel): founders: List[Founder] = Field(description="List of founders") result = app.agent( prompt="Find the founders of Firecrawl", schema=FoundersSchema, model="spark-1-mini", max_credits=100 ) print(result.data) ``` ```js Node theme={null} import Firecrawl from '@mendable/firecrawl-js'; import { z } from 'zod'; const firecrawl = new Firecrawl({ apiKey: "fc-YOUR_API_KEY" }); const result = await firecrawl.agent({ prompt: "Find the founders of Firecrawl", schema: z.object({ founders: z.array(z.object({ name: z.string().describe("Full name of the founder"), role: z.string().describe("Role or position").optional(), background: z.string().describe("Professional background").optional() })).describe("List of founders") }), model: "spark-1-mini", maxCredits: 100 }); console.log(result.data); ``` ```bash cURL theme={null} curl -X POST "https://api.firecrawl.dev/v2/agent" \ -H "Authorization: Bearer $FIRECRAWL_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "prompt": "Find the founders of Firecrawl", "model": "spark-1-mini", "maxCredits": 100, "schema": { "type": "object", "properties": { "founders": { "type": "array", "description": "List of founders", "items": { "type": "object", "properties": { "name": { "type": "string", "description": "Full name" }, "role": { "type": "string", "description": "Role or position" }, "background": { "type": "string", "description": "Professional background" } }, "required": ["name"] } } }, "required": ["founders"] } }' ``` 
### Response ```json JSON theme={null} { "success": true, "status": "completed", "data": { "founders": [ { "name": "Eric Ciarla", "role": "Co-founder", "background": "Previously at Mendable" }, { "name": "Nicolas Camara", "role": "Co-founder", "background": "Previously at Mendable" }, { "name": "Caleb Peffer", "role": "Co-founder", "background": "Previously at Mendable" } ] }, "expiresAt": "2024-12-15T00:00:00.000Z", "creditsUsed": 15 } ``` ## Providing URLs (Optional) You can optionally provide URLs to focus the agent on specific pages: ```python Python theme={null} from firecrawl import Firecrawl app = Firecrawl(api_key="fc-YOUR_API_KEY") result = app.agent( urls=["https://docs.firecrawl.dev", "https://firecrawl.dev/pricing"], prompt="Compare the features and pricing information from these pages" ) print(result.data) ``` ```js Node theme={null} import Firecrawl from '@mendable/firecrawl-js'; const firecrawl = new Firecrawl({ apiKey: "fc-YOUR_API_KEY" }); const result = await firecrawl.agent({ urls: ["https://docs.firecrawl.dev", "https://firecrawl.dev/pricing"], prompt: "Compare the features and pricing information from these pages" }); console.log(result.data); ``` ```bash cURL theme={null} curl -X POST "https://api.firecrawl.dev/v2/agent" \ -H "Authorization: Bearer $FIRECRAWL_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "urls": [ "https://docs.firecrawl.dev", "https://firecrawl.dev/pricing" ], "prompt": "Compare the features and pricing information from these pages" }' ``` ## Job Status and Completion Agent jobs run asynchronously. When you submit a job, you'll receive a Job ID that you can use to check status: * **Default method**: `agent()` waits and returns final results * **Start then poll**: Use `start_agent` (Python) or `startAgent` (Node) to get a Job ID immediately, then poll with `get_agent_status` / `getAgentStatus` Job results are available via the API for 24 hours after completion. 
After this period, you can still view your agent history and results in the [activity logs](https://www.firecrawl.dev/app/logs).

```python Python theme={null}
from firecrawl import Firecrawl

app = Firecrawl(api_key="fc-YOUR_API_KEY")

# Start an agent job
agent_job = app.start_agent(
    prompt="Find the founders of Firecrawl"
)

# Check the status
status = app.get_agent_status(agent_job.id)
print(status)

# Example output:
# status='completed'
# success=True
# data={ ... }
# expires_at=datetime.datetime(...)
# credits_used=15
```

```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';

const firecrawl = new Firecrawl({ apiKey: "fc-YOUR_API_KEY" });

// Start an agent job
const started = await firecrawl.startAgent({
  prompt: "Find the founders of Firecrawl"
});

// Check the status
if (started.id) {
  const status = await firecrawl.getAgentStatus(started.id);
  console.log(status.status, status.data);
}
```

```bash cURL theme={null}
curl -X GET "https://api.firecrawl.dev/v2/agent/YOUR_JOB_ID" \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY"
```

### Possible States

| Status | Description |
| ------------ | ------------------------------------------ |
| `processing` | The agent is still working on your request |
| `completed` | Extraction finished successfully |
| `failed` | An error occurred during extraction |
| `cancelled` | The job was cancelled by the user |

#### Pending Example

```json JSON theme={null}
{
  "success": true,
  "status": "processing",
  "expiresAt": "2024-12-15T00:00:00.000Z"
}
```

#### Completed Example

```json JSON theme={null}
{
  "success": true,
  "status": "completed",
  "data": {
    "founders": [
      { "name": "Eric Ciarla", "role": "Co-founder" },
      { "name": "Nicolas Camara", "role": "Co-founder" },
      { "name": "Caleb Peffer", "role": "Co-founder" }
    ]
  },
  "expiresAt": "2024-12-15T00:00:00.000Z",
  "creditsUsed": 15
}
```

## Share agent runs

You can share agent runs directly from the Agent playground.
Shared links are public — anyone with the link can view the run output and activity — and you can revoke access at any time to disable the link. Shared pages are not indexed by search engines. ## Model Selection Firecrawl Agent offers two models. **Spark 1 Mini is 60% cheaper** and is the default — perfect for most use cases. Upgrade to Spark 1 Pro when you need maximum accuracy on complex tasks. | Model | Cost | Accuracy | Best For | | -------------- | --------------- | -------- | ------------------------------------- | | `spark-1-mini` | **60% cheaper** | Standard | Most tasks (default) | | `spark-1-pro` | Standard | Higher | Complex research, critical extraction | **Start with Spark 1 Mini** (default) — it handles most extraction tasks well at 60% lower cost. Switch to Pro only for complex multi-domain research or when accuracy is critical. ### Spark 1 Mini (Default) `spark-1-mini` is our efficient model, ideal for straightforward data extraction tasks. **Use Mini when:** * Extracting simple data points (contact info, pricing, etc.) * Working with well-structured websites * Cost efficiency is a priority * Running high-volume extraction jobs ### Spark 1 Pro `spark-1-pro` is our flagship model, designed for maximum accuracy on complex extraction tasks. 
**Use Pro when:** * Performing complex competitive analysis * Extracting data that requires deep reasoning * Accuracy is critical for your use case * Dealing with ambiguous or hard-to-find data ### Specifying a Model Pass the `model` parameter to select which model to use: ```python Python theme={null} from firecrawl import Firecrawl app = Firecrawl(api_key="fc-YOUR_API_KEY") # Using Spark 1 Mini (default - can be omitted) result = app.agent( prompt="Find the pricing of Firecrawl", model="spark-1-mini" ) # Using Spark 1 Pro for complex tasks result = app.agent( prompt="Compare all enterprise features and pricing across Firecrawl, Apify, and ScrapingBee", model="spark-1-pro" ) print(result.data) ``` ```js Node theme={null} import Firecrawl from '@mendable/firecrawl-js'; const firecrawl = new Firecrawl({ apiKey: "fc-YOUR_API_KEY" }); // Using Spark 1 Mini (default - can be omitted) const result = await firecrawl.agent({ prompt: "Find the pricing of Firecrawl", model: "spark-1-mini" }); // Using Spark 1 Pro for complex tasks const resultPro = await firecrawl.agent({ prompt: "Compare all enterprise features and pricing across Firecrawl, Apify, and ScrapingBee", model: "spark-1-pro" }); console.log(result.data); ``` ```bash cURL theme={null} # Using Spark 1 Mini (default) curl -X POST "https://api.firecrawl.dev/v2/agent" \ -H "Authorization: Bearer $FIRECRAWL_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "prompt": "Find the pricing of Firecrawl", "model": "spark-1-mini" }' # Using Spark 1 Pro for complex tasks curl -X POST "https://api.firecrawl.dev/v2/agent" \ -H "Authorization: Bearer $FIRECRAWL_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "prompt": "Compare all enterprise features and pricing across Firecrawl, Apify, and ScrapingBee", "model": "spark-1-pro" }' ``` ## Parameters | Parameter | Type | Required | Description | | ------------ | ------ | -------- | 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `prompt` | string | **Yes** | Natural language description of the data you want to extract (max 10,000 characters) | | `model` | string | No | Model to use: `spark-1-mini` (default) or `spark-1-pro` | | `urls` | array | No | Optional list of URLs to focus the extraction | | `schema` | object | No | Optional JSON schema for structured output | | `maxCredits` | number | No | Maximum number of credits to spend on this agent task. Defaults to **2,500** if not set. The dashboard supports values up to **2,500**; for higher limits, set `maxCredits` via the API (values above 2,500 are always treated as paid requests). If the limit is reached, the job fails and **no data is returned**, though credits consumed for work performed are still charged. | ## Agent vs Extract: What's Improved | Feature | Agent (New) | Extract | | ----------------- | ----------- | -------- | | URLs Required | No | Yes | | Speed | Faster | Standard | | Cost | Lower | Standard | | Reliability | Higher | Standard | | Query Flexibility | High | Moderate | ## Example Use Cases * **Research**: "Find the top 5 AI startups and their funding amounts" * **Competitive Analysis**: "Compare pricing plans between Slack and Microsoft Teams" * **Data Gathering**: "Extract contact information from company websites" * **Content Summarization**: "Summarize the latest blog posts about web scraping" ## CSV Upload in Agent Playground The [Agent Playground](https://www.firecrawl.dev/app/agent) supports CSV upload for batch processing. Your CSV can contain one or more columns of input data. 
For example, a single column of company names, or multiple columns such as company name, product, and website URL. Each row represents one item for the agent to process.

Upload your CSV, then add output columns using the "+" button in the grid header. Each column has its own prompt — click a column header to describe what the agent should find for that field (e.g., "CEO or founder name", "Total funding raised"). Hit Run, and the agent processes each row in parallel, filling in the results.

## API Reference

Check out the [Agent API Reference](/api-reference/endpoint/agent) for more details.

Have feedback or need help? Email [help@firecrawl.com](mailto:help@firecrawl.com).

## Pricing

Firecrawl Agent uses **dynamic billing** that scales with the complexity of your data extraction request. You pay based on the actual work Agent performs, ensuring fair pricing whether you're extracting simple data points or complex structured information from multiple sources.

### How Agent pricing works

Agent pricing is **dynamic and credit-based** during Research Preview:

* **Simple extractions** (like contact info from a single page) typically use fewer credits and cost less
* **Complex research tasks** (like competitive analysis across multiple domains) use more credits but reflect the total effort involved
* **Transparent usage** shows you exactly how many credits each request consumed
* **Credit conversion** automatically converts agent usage to credits for easy billing

Credit usage varies based on the complexity of your prompt, the amount of data processed, and the structure of the output requested. As a rough guide, most agent runs consume **a few hundred credits**, though simpler single-page tasks may use less and complex multi-domain research may use more.

### Parallel Agents Pricing

If you are running multiple agents in parallel with Spark-1 Fast, pricing is much more predictable at a flat 10 credits per cell.
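At a flat per-cell rate, estimating the cost of a parallel CSV batch is simple arithmetic: rows times prompted output columns times 10. A quick sketch (the 10-credits-per-cell rate comes from the Parallel Agents Pricing note above; the example CSV dimensions are invented for illustration):

```python
CREDITS_PER_CELL = 10  # flat parallel-agent rate quoted above

def parallel_agent_credits(rows: int, output_columns: int) -> int:
    """Each output cell (one row filled in for one prompted column) bills once."""
    return rows * output_columns * CREDITS_PER_CELL

# e.g. a hypothetical 50-row CSV with 3 prompted output columns
print(parallel_agent_credits(50, 3))  # 1500 credits
```

This also makes it easy to sanity-check a batch against a `maxCredits` budget before hitting Run.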
### Getting started **All users** receive **5 free daily runs**, which can be used from either the playground or the API, to explore Agent's capabilities without any cost. Additional usage is billed based on credit consumption. ### Managing costs Agent can be expensive, but there are some ways to decrease the cost: * **Start with free runs**: Use your 5 daily free requests to understand pricing * **Set a `maxCredits` parameter**: Limit your spending by setting a maximum number of credits you're willing to spend. The dashboard caps this at 2,500 credits; to set a higher limit, use the `maxCredits` parameter directly via the API (note: values above 2,500 are always billed as paid requests) * **Optimize prompts**: More specific prompts often use fewer credits * **Break large tasks into smaller runs**: A single agent run has an output ceiling based on the underlying model's generation capacity (\~150-200 rows of structured data). For large extraction jobs, split by category, region, or URL batch (3-5 URLs per run) and merge the results. This also keeps each run well under the `maxCredits` limit. * **Monitor usage**: Track your consumption through the dashboard * **Set expectations**: Complex multi-domain research will use more credits than simple single-page extractions Try Agent now at [firecrawl.dev/app/agent](https://www.firecrawl.dev/app/agent) to see how credit usage scales with your specific use cases. Pricing is subject to change as we move from Research Preview to general availability. Current users will receive advance notice of any pricing updates. > Are you an AI agent that needs a Firecrawl API key? See [firecrawl.dev/agent-onboarding/SKILL.md](https://www.firecrawl.dev/agent-onboarding/SKILL.md) for automated onboarding instructions. # Batch Scrape Source: https://docs.firecrawl.dev/features/batch-scrape Scrape multiple URLs in a single batch job Batch scrape lets you scrape multiple URLs in a single job.
Pass a list of URLs and optional parameters, and Firecrawl processes them concurrently and returns all results together. * Works like `/crawl` but for an explicit list of URLs * Synchronous and asynchronous modes * Supports all scrape options including structured extraction * Configurable concurrency per job ## How it works You can run a batch scrape in two ways: | Mode | SDK method (JS / Python) | Behavior | | ------------ | ----------------------------------------- | ---------------------------------------------------------------- | | Synchronous | `batchScrape` / `batch_scrape` | Starts the batch and waits for completion, returning all results | | Asynchronous | `startBatchScrape` / `start_batch_scrape` | Starts the batch and returns a job ID for polling or webhooks | ## Basic usage ```python Python theme={null} from firecrawl import Firecrawl firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY") start = firecrawl.start_batch_scrape([ "https://firecrawl.dev", "https://docs.firecrawl.dev", ], formats=["markdown"]) # returns id job = firecrawl.batch_scrape([ "https://firecrawl.dev", "https://docs.firecrawl.dev", ], formats=["markdown"], poll_interval=2, wait_timeout=120) print(job.status, job.completed, job.total) ``` ```js Node theme={null} import Firecrawl from '@mendable/firecrawl-js'; const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" }); // Start a batch scrape job const { id } = await firecrawl.startBatchScrape([ 'https://firecrawl.dev', 'https://docs.firecrawl.dev' ], { options: { formats: ['markdown'] }, }); // Wait for completion const job = await firecrawl.batchScrape([ 'https://firecrawl.dev', 'https://docs.firecrawl.dev' ], { options: { formats: ['markdown'] }, pollInterval: 2, timeout: 120 }); console.log(job.status, job.completed, job.total); ``` ```bash cURL theme={null} curl -s -X POST "https://api.firecrawl.dev/v2/batch/scrape" \ -H "Authorization: Bearer $FIRECRAWL_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "urls": 
["https://firecrawl.dev", "https://docs.firecrawl.dev"], "formats": ["markdown"] }' ``` ### Response Calling `batchScrape` / `batch_scrape` returns the full results when the batch completes. ```json Completed theme={null} { "status": "completed", "total": 36, "completed": 36, "creditsUsed": 36, "expiresAt": "2024-00-00T00:00:00.000Z", "next": "https://api.firecrawl.dev/v2/batch/scrape/123-456-789?skip=26", "data": [ { "markdown": "[Firecrawl Docs home page![light logo](https://mintlify.s3-us-west-1.amazonaws.com/firecrawl/logo/light.svg)!...", "html": "...", "metadata": { "title": "Build a 'Chat with website' using Groq Llama 3 | Firecrawl", "language": "en", "sourceURL": "https://docs.firecrawl.dev/learn/rag-llama3", "description": "Learn how to use Firecrawl, Groq Llama 3, and Langchain to build a 'Chat with your website' bot.", "ogLocaleAlternate": [], "statusCode": 200 } }, ... ] } ``` Calling `startBatchScrape` / `start_batch_scrape` returns a job ID you can track via `getBatchScrapeStatus` / `get_batch_scrape_status`, the API endpoint `/batch/scrape/{id}`, or webhooks. Job results are available via the API for 24 hours after completion. After this period, you can still view your batch scrape history and results in the [activity logs](https://www.firecrawl.dev/app/logs). ```json theme={null} { "success": true, "id": "123-456-789", "url": "https://api.firecrawl.dev/v2/batch/scrape/123-456-789" } ``` ## Concurrency By default, a batch scrape job uses your team's full concurrent browser limit (see [Rate Limits](/rate-limits)). You can lower this per job with the `maxConcurrency` parameter. For example, `maxConcurrency: 50` limits that job to 50 simultaneous scrapes. Setting this value too low on large batches will significantly slow down processing, so only reduce it if you need to leave capacity for other concurrent jobs. ## Structured extraction You can use batch scrape to extract structured data from every page in the batch. 
This is useful when you want the same schema applied to a list of URLs. ```python Python theme={null} from firecrawl import Firecrawl firecrawl = Firecrawl(api_key="fc-YOUR_API_KEY") # Scrape multiple websites: batch_scrape_result = firecrawl.batch_scrape( ['https://docs.firecrawl.dev', 'https://docs.firecrawl.dev/sdks/overview'], formats=[{ 'type': 'json', 'prompt': 'Extract the title and description from the page.', 'schema': { 'type': 'object', 'properties': { 'title': {'type': 'string'}, 'description': {'type': 'string'} }, 'required': ['title', 'description'] } }] ) print(batch_scrape_result) # Or, you can use the start method: batch_scrape_job = firecrawl.start_batch_scrape( ['https://docs.firecrawl.dev', 'https://docs.firecrawl.dev/sdks/overview'], formats=[{ 'type': 'json', 'prompt': 'Extract the title and description from the page.', 'schema': { 'type': 'object', 'properties': { 'title': {'type': 'string'}, 'description': {'type': 'string'} }, 'required': ['title', 'description'] } }] ) print(batch_scrape_job) # You can then use the job ID to check the status of the batch scrape: batch_scrape_status = firecrawl.get_batch_scrape_status(batch_scrape_job.id) print(batch_scrape_status) ``` ```js Node theme={null} import Firecrawl, { ScrapeResponse } from '@mendable/firecrawl-js'; const firecrawl = new Firecrawl({apiKey: "fc-YOUR_API_KEY"}); // Define schema to extract contents into const schema = { type: "object", properties: { title: { type: "string" }, description: { type: "string" } }, required: ["title", "description"] }; // Scrape multiple websites (synchronous): const batchScrapeResult = await firecrawl.batchScrape(['https://docs.firecrawl.dev', 'https://docs.firecrawl.dev/sdks/overview'], { formats: [ { type: "json", prompt: "Extract the title and description from the page.", schema: schema } ] }); // Output all the results of the batch scrape: console.log(batchScrapeResult) // Or, you can use the start method: const batchScrapeJob = await 
firecrawl.startBatchScrape(['https://docs.firecrawl.dev', 'https://docs.firecrawl.dev/sdks/overview'], { formats: [ { type: "json", prompt: "Extract the title and description from the page.", schema: schema } ] }); console.log(batchScrapeJob) // You can then use the job ID to check the status of the batch scrape: const batchScrapeStatus = await firecrawl.getBatchScrapeStatus(batchScrapeJob.id); console.log(batchScrapeStatus) ``` ```bash cURL theme={null} curl -X POST https://api.firecrawl.dev/v2/batch/scrape \ -H 'Content-Type: application/json' \ -H 'Authorization: Bearer YOUR_API_KEY' \ -d '{ "urls": ["https://docs.firecrawl.dev", "https://docs.firecrawl.dev/sdks/overview"], "formats" : [{ "type": "json", "prompt": "Extract the title and description from the page.", "schema": { "type": "object", "properties": { "title": { "type": "string" }, "description": { "type": "string" } }, "required": [ "title", "description" ] } }] }' ``` ### Response `batchScrape` / `batch_scrape` returns full results: ```json Completed theme={null} { "status": "completed", "total": 36, "completed": 36, "creditsUsed": 36, "expiresAt": "2024-00-00T00:00:00.000Z", "next": "https://api.firecrawl.dev/v2/batch/scrape/123-456-789?skip=26", "data": [ { "json": { "title": "Build a 'Chat with website' using Groq Llama 3 | Firecrawl", "description": "Learn how to use Firecrawl, Groq Llama 3, and Langchain to build a 'Chat with your website' bot." } }, ... ] } ``` `startBatchScrape` / `start_batch_scrape` returns a job ID: ```json theme={null} { "success": true, "id": "123-456-789", "url": "https://api.firecrawl.dev/v2/batch/scrape/123-456-789" } ``` ## Webhooks You can configure webhooks to receive real-time notifications as each URL in your batch is scraped. This lets you process results immediately instead of waiting for the entire batch to complete. 
```bash cURL theme={null} curl -X POST https://api.firecrawl.dev/v2/batch/scrape \ -H 'Content-Type: application/json' \ -H 'Authorization: Bearer YOUR_API_KEY' \ -d '{ "urls": [ "https://example.com/page1", "https://example.com/page2", "https://example.com/page3" ], "webhook": { "url": "https://your-domain.com/webhook", "metadata": { "any_key": "any_value" }, "events": ["started", "page", "completed"] } }' ``` ### Event types | Event | Description | | ------------------------ | ------------------------------------- | | `batch_scrape.started` | The batch scrape job has begun | | `batch_scrape.page` | A single URL was successfully scraped | | `batch_scrape.completed` | All URLs have been processed | | `batch_scrape.failed` | The batch scrape encountered an error | ### Payload Each webhook delivery includes a JSON body with the following structure: ```json theme={null} { "success": true, "type": "batch_scrape.page", "id": "batch-job-id", "data": [...], "metadata": {}, "error": null } ``` ### Verifying webhook signatures Every webhook request from Firecrawl includes an `X-Firecrawl-Signature` header containing an HMAC-SHA256 signature. Always verify this signature to ensure the webhook is authentic and has not been tampered with. 1. Get your webhook secret from the [Advanced tab](https://www.firecrawl.dev/app/settings?tab=advanced) of your account settings 2. Extract the signature from the `X-Firecrawl-Signature` header 3. Compute HMAC-SHA256 of the raw request body using your secret 4. Compare with the signature header using a timing-safe function Never process a webhook without verifying its signature first. The `X-Firecrawl-Signature` header contains the signature in the format: `sha256=abc123def456...` For complete implementation examples in JavaScript and Python, see the [Webhook Security documentation](/webhooks/security). 
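The four verification steps above can be sketched with Python's standard library (the secret and body below are placeholders):

```python theme={null}
import hashlib
import hmac

def verify_signature(raw_body: bytes, signature_header: str, secret: str) -> bool:
    """Verify an X-Firecrawl-Signature header against the raw request body."""
    # Header format: "sha256=<hex digest>"
    expected = "sha256=" + hmac.new(
        secret.encode(), raw_body, hashlib.sha256
    ).hexdigest()
    # Timing-safe comparison, per step 4 above
    return hmac.compare_digest(expected, signature_header)

# Placeholder values for illustration:
secret = "your-webhook-secret"
body = b'{"success": true, "type": "batch_scrape.page"}'
header = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
print(verify_signature(body, header, secret))  # True
```

Compute the HMAC over the raw bytes of the request body, before any JSON parsing or re-serialization, or the digests will not match.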
For comprehensive webhook documentation including detailed event payloads, advanced configuration, and troubleshooting, see the [Webhooks documentation](/webhooks/overview). > Are you an AI agent that needs a Firecrawl API key? See [firecrawl.dev/agent-onboarding/SKILL.md](https://www.firecrawl.dev/agent-onboarding/SKILL.md) for automated onboarding instructions. # Change Tracking Source: https://docs.firecrawl.dev/features/change-tracking Detect and monitor changes in web content between scrapes Change tracking compares the current content of a page against the last time you scraped it. Add `changeTracking` to your `formats` array to detect whether a page is new, unchanged, or modified, and optionally get a structured diff of what changed. * Works with `/scrape`, `/crawl`, and `/batch/scrape` * Two diff modes: `git-diff` for line-level changes, `json` for field-level comparison * Scoped to your team, and optionally scoped to a tag that you pass in ## How it works Every scrape with `changeTracking` enabled stores a snapshot and compares it against the previous snapshot for that URL. Snapshots are stored persistently and do not expire, so comparisons remain accurate regardless of how much time has passed between scrapes.
| Scrape | Result | | ----------------- | -------------------------------------------------- | | First time | `changeStatus: "new"` (no previous version exists) | | Content unchanged | `changeStatus: "same"` | | Content modified | `changeStatus: "changed"` (diff data available) | | Page removed | `changeStatus: "removed"` | The response includes these fields in the `changeTracking` object: | Field | Type | Description | | ------------------ | --------------------- | ---------------------------------------------------------------------------------------------- | | `previousScrapeAt` | `string \| null` | Timestamp of the previous scrape (`null` on first scrape) | | `changeStatus` | `string` | `"new"`, `"same"`, `"changed"`, or `"removed"` | | `visibility` | `string` | `"visible"` (discoverable via links/sitemap) or `"hidden"` (URL works but is no longer linked) | | `diff` | `object \| undefined` | Line-level diff (only present in `git-diff` mode when status is `"changed"`) | | `json` | `object \| undefined` | Field-level comparison (only present in `json` mode when status is `"changed"`) | ## Basic usage Include both `markdown` and `changeTracking` in the `formats` array. The `markdown` format is required because change tracking compares pages via their markdown content. 
```python Python theme={null} from firecrawl import Firecrawl firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY") result = firecrawl.scrape( "https://example.com/pricing", formats=["markdown", "changeTracking"] ) print(result.changeTracking) ``` ```js Node theme={null} import Firecrawl from '@mendable/firecrawl-js'; const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" }); const result = await firecrawl.scrape('https://example.com/pricing', { formats: ['markdown', 'changeTracking'] }); console.log(result.changeTracking); ``` ```bash cURL theme={null} curl -s -X POST "https://api.firecrawl.dev/v2/scrape" \ -H "Authorization: Bearer $FIRECRAWL_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "url": "https://example.com/pricing", "formats": ["markdown", "changeTracking"] }' ``` ### Response On the first scrape, `changeStatus` is `"new"` and `previousScrapeAt` is `null`: ```json theme={null} { "success": true, "data": { "markdown": "# Pricing\n\nStarter: $9/mo\nPro: $29/mo...", "changeTracking": { "previousScrapeAt": null, "changeStatus": "new", "visibility": "visible" } } } ``` On subsequent scrapes, `changeStatus` reflects whether content changed: ```json theme={null} { "success": true, "data": { "markdown": "# Pricing\n\nStarter: $12/mo\nPro: $39/mo...", "changeTracking": { "previousScrapeAt": "2025-06-01T10:00:00.000+00:00", "changeStatus": "changed", "visibility": "visible" } } } ``` ## Git-diff mode The `git-diff` mode returns line-by-line changes in a format similar to `git diff`. 
Pass an object in the `formats` array with `modes: ["git-diff"]`: ```python Python theme={null} result = firecrawl.scrape( "https://example.com/pricing", formats=[ "markdown", { "type": "changeTracking", "modes": ["git-diff"] } ] ) if result.changeTracking.changeStatus == "changed": print(result.changeTracking.diff.text) ``` ```js Node theme={null} const result = await firecrawl.scrape('https://example.com/pricing', { formats: [ 'markdown', { type: 'changeTracking', modes: ['git-diff'] } ] }); if (result.changeTracking.changeStatus === 'changed') { console.log(result.changeTracking.diff.text); } ``` ```bash cURL theme={null} curl -s -X POST "https://api.firecrawl.dev/v2/scrape" \ -H "Authorization: Bearer $FIRECRAWL_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "url": "https://example.com/pricing", "formats": [ "markdown", { "type": "changeTracking", "modes": ["git-diff"] } ] }' ``` ### Response The `diff` object contains both a plain-text diff and a structured JSON representation: ```json theme={null} { "changeTracking": { "previousScrapeAt": "2025-06-01T10:00:00.000+00:00", "changeStatus": "changed", "visibility": "visible", "diff": { "text": "@@ -1,3 +1,3 @@\n # Pricing\n-Starter: $9/mo\n-Pro: $29/mo\n+Starter: $12/mo\n+Pro: $39/mo", "json": { "files": [{ "chunks": [{ "content": "@@ -1,3 +1,3 @@", "changes": [ { "type": "normal", "content": "# Pricing" }, { "type": "del", "ln": 2, "content": "Starter: $9/mo" }, { "type": "del", "ln": 3, "content": "Pro: $29/mo" }, { "type": "add", "ln": 2, "content": "Starter: $12/mo" }, { "type": "add", "ln": 3, "content": "Pro: $39/mo" } ] }] }] } } } } ``` The structured `diff.json` object contains: * `files`: array of changed files (typically one for web pages) * `chunks`: sections of changes within a file * `changes`: individual line changes with `type` (`"add"`, `"del"`, or `"normal"`), line number (`ln`), and `content` ## JSON mode The `json` mode extracts specific fields from both the current and previous 
version of the page using a schema you define. This is useful for tracking changes in structured data like prices, stock levels, or metadata without parsing a full diff. Pass `modes: ["json"]` with a `schema` defining the fields to extract: ```python Python theme={null} result = firecrawl.scrape( "https://example.com/product/widget", formats=[ "markdown", { "type": "changeTracking", "modes": ["json"], "schema": { "type": "object", "properties": { "price": { "type": "string" }, "availability": { "type": "string" } } } } ] ) if result.changeTracking.changeStatus == "changed": changes = result.changeTracking.json print(f"Price: {changes['price']['previous']} → {changes['price']['current']}") ``` ```js Node theme={null} const result = await firecrawl.scrape('https://example.com/product/widget', { formats: [ 'markdown', { type: 'changeTracking', modes: ['json'], schema: { type: 'object', properties: { price: { type: 'string' }, availability: { type: 'string' } } } } ] }); if (result.changeTracking.changeStatus === 'changed') { const changes = result.changeTracking.json; console.log(`Price: ${changes.price.previous} → ${changes.price.current}`); } ``` ```bash cURL theme={null} curl -s -X POST "https://api.firecrawl.dev/v2/scrape" \ -H "Authorization: Bearer $FIRECRAWL_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "url": "https://example.com/product/widget", "formats": [ "markdown", { "type": "changeTracking", "modes": ["json"], "schema": { "type": "object", "properties": { "price": { "type": "string" }, "availability": { "type": "string" } } } } ] }' ``` ### Response Each field in the schema is returned with `previous` and `current` values: ```json theme={null} { "changeTracking": { "previousScrapeAt": "2025-06-05T08:00:00.000+00:00", "changeStatus": "changed", "visibility": "visible", "json": { "price": { "previous": "$19.99", "current": "$24.99" }, "availability": { "previous": "In Stock", "current": "In Stock" } } } } ``` You can also pass an optional 
`prompt` to guide the LLM extraction alongside the schema. JSON mode uses LLM extraction and costs **5 credits per page**. Basic change tracking and `git-diff` mode have no additional cost. ## Using tags By default, change tracking compares against the most recent scrape of the same URL scraped by your team. Tags let you maintain **separate tracking histories** for the same URL, which is useful when you monitor the same page at different intervals or in different contexts. ```python Python theme={null} # Hourly monitoring (compared against last "hourly" scrape) result = firecrawl.scrape( "https://example.com/pricing", formats=[ "markdown", { "type": "changeTracking", "tag": "hourly" } ] ) # Daily summary (compared against last "daily" scrape) result = firecrawl.scrape( "https://example.com/pricing", formats=[ "markdown", { "type": "changeTracking", "tag": "daily" } ] ) ``` ```js Node theme={null} // Hourly monitoring (compared against last "hourly" scrape) const result = await firecrawl.scrape('https://example.com/pricing', { formats: [ 'markdown', { type: 'changeTracking', tag: 'hourly' } ] }); // Daily summary (compared against last "daily" scrape) const result2 = await firecrawl.scrape('https://example.com/pricing', { formats: [ 'markdown', { type: 'changeTracking', tag: 'daily' } ] }); ``` ```bash cURL theme={null} curl -s -X POST "https://api.firecrawl.dev/v2/scrape" \ -H "Authorization: Bearer $FIRECRAWL_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "url": "https://example.com/pricing", "formats": [ "markdown", { "type": "changeTracking", "tag": "hourly" } ] }' ``` ## Crawl with change tracking Add change tracking to crawl operations to monitor an entire site for changes. 
Pass the `changeTracking` format inside `scrapeOptions`: ```python Python theme={null} result = firecrawl.crawl( "https://example.com", limit=50, scrape_options={ "formats": ["markdown", "changeTracking"] } ) for page in result.data: status = page.changeTracking.changeStatus url = page.metadata.url print(f"{url}: {status}") ``` ```js Node theme={null} const result = await firecrawl.crawl('https://example.com', { limit: 50, scrapeOptions: { formats: ['markdown', 'changeTracking'] } }); for (const page of result.data) { console.log(`${page.metadata.url}: ${page.changeTracking.changeStatus}`); } ``` ```bash cURL theme={null} curl -s -X POST "https://api.firecrawl.dev/v2/crawl" \ -H "Authorization: Bearer $FIRECRAWL_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "url": "https://example.com", "limit": 50, "scrapeOptions": { "formats": ["markdown", "changeTracking"] } }' ``` ## Batch scrape with change tracking Use [batch scrape](/features/batch-scrape) to monitor a specific set of URLs: ```python Python theme={null} result = firecrawl.batch_scrape( [ "https://example.com/pricing", "https://example.com/product/widget", "https://example.com/blog/latest" ], formats=["markdown", {"type": "changeTracking", "modes": ["git-diff"]}] ) ``` ```js Node theme={null} const result = await firecrawl.batchScrape([ 'https://example.com/pricing', 'https://example.com/product/widget', 'https://example.com/blog/latest' ], { options: { formats: ['markdown', { type: 'changeTracking', modes: ['git-diff'] }] } }); ``` ```bash cURL theme={null} curl -s -X POST "https://api.firecrawl.dev/v2/batch/scrape" \ -H "Authorization: Bearer $FIRECRAWL_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "urls": [ "https://example.com/pricing", "https://example.com/product/widget", "https://example.com/blog/latest" ], "formats": [ "markdown", { "type": "changeTracking", "modes": ["git-diff"] } ] }' ``` ## Scheduling change tracking Change tracking is most useful when you scrape on a regular 
schedule. You can automate this with cron, cloud schedulers, or workflow tools. ### Cron job Create a script that scrapes a URL and alerts on changes: ```bash check-pricing.sh theme={null} #!/bin/bash RESPONSE=$(curl -s -X POST "https://api.firecrawl.dev/v2/scrape" \ -H "Authorization: Bearer $FIRECRAWL_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "url": "https://competitor.com/pricing", "formats": [ "markdown", { "type": "changeTracking", "modes": ["json"], "schema": { "type": "object", "properties": { "starter_price": { "type": "string" }, "pro_price": { "type": "string" } } } } ] }') STATUS=$(echo "$RESPONSE" | jq -r '.data.changeTracking.changeStatus') if [ "$STATUS" = "changed" ]; then echo "$RESPONSE" | jq '.data.changeTracking.json' # Send alert via email, Slack, etc. fi ``` Schedule it with `crontab -e`: ```bash theme={null} 0 */6 * * * /path/to/check-pricing.sh >> /var/log/price-monitor.log 2>&1 ``` | Schedule | Expression | | ------------------------ | ------------- | | Every hour | `0 * * * *` | | Every 6 hours | `0 */6 * * *` | | Daily at 9 AM | `0 9 * * *` | | Weekly on Monday at 8 AM | `0 8 * * 1` | ### Cloud and serverless schedulers * **AWS**: EventBridge rule triggering a Lambda function * **GCP**: Cloud Scheduler triggering a Cloud Function * **Vercel / Netlify**: Cron-triggered serverless functions * **GitHub Actions**: Scheduled workflows with `schedule` and `cron` trigger ### Workflow automation No-code platforms like **n8n**, **Zapier**, and **Make** can call the Firecrawl API on a schedule and route results to Slack, email, or databases. See the [workflow automation guides](/developer-guides/workflow-automation/n8n). ## Webhooks For async operations like crawl and batch scrape, use [webhooks](/webhooks/overview) to receive change tracking results as they arrive instead of polling. 
```python Python theme={null} job = firecrawl.start_crawl( "https://example.com", limit=50, scrape_options={ "formats": [ "markdown", {"type": "changeTracking", "modes": ["git-diff"]} ] }, webhook={ "url": "https://your-server.com/firecrawl-webhook", "headers": {"Authorization": "Bearer your-webhook-secret"}, "events": ["crawl.page", "crawl.completed"] } ) ``` ```js Node theme={null} const { id } = await firecrawl.startCrawl('https://example.com', { limit: 50, scrapeOptions: { formats: [ 'markdown', { type: 'changeTracking', modes: ['git-diff'] } ] }, webhook: { url: 'https://your-server.com/firecrawl-webhook', headers: { Authorization: 'Bearer your-webhook-secret' }, events: ['crawl.page', 'crawl.completed'] } }); ``` ```bash cURL theme={null} curl -s -X POST "https://api.firecrawl.dev/v2/crawl" \ -H "Authorization: Bearer $FIRECRAWL_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "url": "https://example.com", "limit": 50, "scrapeOptions": { "formats": [ "markdown", { "type": "changeTracking", "modes": ["git-diff"] } ] }, "webhook": { "url": "https://your-server.com/firecrawl-webhook", "headers": { "Authorization": "Bearer your-webhook-secret" }, "events": ["crawl.page", "crawl.completed"] } }' ``` The `crawl.page` event payload includes the `changeTracking` object for each page: ```json theme={null} { "success": true, "type": "crawl.page", "id": "550e8400-e29b-41d4-a716-446655440000", "data": [{ "markdown": "# Pricing\n\nStarter: $12/mo...", "metadata": { "title": "Pricing", "url": "https://example.com/pricing", "statusCode": 200 }, "changeTracking": { "previousScrapeAt": "2025-06-05T12:00:00.000+00:00", "changeStatus": "changed", "visibility": "visible", "diff": { "text": "@@ -2,1 +2,1 @@\n-Starter: $9/mo\n+Starter: $12/mo" } } }] } ``` For webhook configuration details (headers, metadata, events, retries, signature verification), see the [Webhooks documentation](/webhooks/overview). 
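As a sketch of consuming that event on your server, the helper below (illustrative, not part of the SDK) pulls out the pages whose content actually changed, tolerating a missing `changeTracking` object as noted in the change-tracking docs:

```python theme={null}
def changed_pages(payload):
    """Return url/diff info for pages marked "changed" in a crawl.page
    webhook payload. Illustrative helper, not part of the SDK."""
    results = []
    for page in payload.get("data", []):
        tracking = page.get("changeTracking")
        # changeTracking may be absent (e.g. previous-scrape lookup timed out)
        if tracking and tracking.get("changeStatus") == "changed":
            results.append({
                "url": page["metadata"]["url"],
                "diff": tracking.get("diff", {}).get("text"),
            })
    return results

payload = {
    "type": "crawl.page",
    "data": [{
        "metadata": {"url": "https://example.com/pricing"},
        "changeTracking": {
            "changeStatus": "changed",
            "diff": {"text": "@@ -2,1 +2,1 @@\n-Starter: $9/mo\n+Starter: $12/mo"},
        },
    }],
}
print(changed_pages(payload)[0]["url"])  # https://example.com/pricing
```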
## Configuration reference The full set of options available when passing a `changeTracking` format object: | Parameter | Type | Default | Description | | --------- | ---------- | ---------- | ----------------------------------------------------------------- | | `type` | `string` | (required) | Must be `"changeTracking"` | | `modes` | `string[]` | `[]` | Diff modes to enable: `"git-diff"`, `"json"`, or both | | `schema` | `object` | (none) | JSON Schema for field-level comparison (required for `json` mode) | | `prompt` | `string` | (none) | Custom prompt to guide LLM extraction (used with `json` mode) | | `tag` | `string` | `null` | Separate tracking history identifier | ### Data models ```ts TypeScript theme={null} interface ChangeTrackingResult { previousScrapeAt: string | null; changeStatus: "new" | "same" | "changed" | "removed"; visibility: "visible" | "hidden"; diff?: { text: string; json: { files: Array<{ from: string | null; to: string | null; chunks: Array<{ content: string; changes: Array<{ type: "add" | "del" | "normal"; ln?: number; ln1?: number; ln2?: number; content: string; }>; }>; }>; }; }; json?: Record; } ``` ```python Python theme={null} class ChangeTrackingData(BaseModel): previous_scrape_at: Optional[str] = None change_status: str # "new" | "same" | "changed" | "removed" visibility: str # "visible" | "hidden" diff: Optional[Dict[str, Any]] = None json: Optional[Dict[str, Any]] = None ``` ## Important details The `markdown` format must always be included alongside `changeTracking`. Change tracking compares pages via their markdown content. * **Snapshot retention**: Snapshots are stored persistently and do not expire. A scrape performed months after the previous one will still compare correctly against the earlier snapshot. * **Scoping**: Comparisons are scoped to your team. Your first scrape of any URL returns `"new"`, even if other users have scraped it. 
* **URL matching**: Previous scrapes are matched on exact source URL, team ID, `markdown` format, and `tag`. Keep URLs consistent between scrapes. * **Parameter consistency**: Using different `includeTags`, `excludeTags`, or `onlyMainContent` settings across scrapes of the same URL produces unreliable comparisons. * **Comparison algorithm**: The algorithm is resistant to whitespace and content order changes. Iframe source URLs are ignored to handle captcha/antibot randomization. * **Caching**: Requests with `changeTracking` bypass the index cache. The `maxAge` parameter is ignored. * **Error handling**: Monitor the `warning` field in responses and handle the `changeTracking` object potentially being absent (this can occur if the database lookup for the previous scrape times out). ## Billing | Mode | Cost | | --------------------- | --------------------------------------- | | Basic change tracking | No extra cost (standard scrape credits) | | `git-diff` mode | No extra cost | | `json` mode | 5 credits per page | > Are you an AI agent that needs a Firecrawl API key? See [firecrawl.dev/agent-onboarding/SKILL.md](https://www.firecrawl.dev/agent-onboarding/SKILL.md) for automated onboarding instructions. # Crawl Source: https://docs.firecrawl.dev/features/crawl Recursively crawl a website and get content from every page Crawl submits a URL to Firecrawl and recursively discovers and scrapes every reachable subpage. It handles sitemaps, JavaScript rendering, and rate limits automatically, returning clean markdown or structured data for each page. * Discovers pages via sitemap and recursive link traversal * Supports path filtering, depth limits, and subdomain/external link control * Returns results via polling, WebSocket, or webhook Test crawling in the interactive playground — no code required. 
## Installation ```python Python theme={null} # pip install firecrawl-py from firecrawl import Firecrawl firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY") ``` ```js Node theme={null} // npm install @mendable/firecrawl-js import Firecrawl from '@mendable/firecrawl-js'; const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" }); ``` ```bash CLI theme={null} # Install globally with npm npm install -g firecrawl # Authenticate (one-time setup) firecrawl login ``` ## Basic usage Submit a crawl job by calling `POST /v2/crawl` with a starting URL. The endpoint returns a job ID that you use to poll for results. ```python Python theme={null} from firecrawl import Firecrawl firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY") docs = firecrawl.crawl(url="https://docs.firecrawl.dev", limit=10) print(docs) ``` ```js Node theme={null} import Firecrawl from '@mendable/firecrawl-js'; const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" }); const docs = await firecrawl.crawl('https://docs.firecrawl.dev', { limit: 10 }); console.log(docs); ``` ```bash cURL theme={null} curl -s -X POST "https://api.firecrawl.dev/v2/crawl" \ -H "Authorization: Bearer $FIRECRAWL_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "url": "https://docs.firecrawl.dev", "limit": 10 }' ``` ```bash CLI theme={null} # Start a crawl job (returns job ID) firecrawl crawl https://firecrawl.dev # Wait for completion with progress firecrawl crawl https://firecrawl.dev --wait --progress --limit 100 ``` Each page crawled consumes 1 credit. The default crawl `limit` is 10,000 pages. Before starting, the crawl endpoint checks that your remaining credits can cover the `limit` — if not, it returns a **402 (Payment Required)** error. Set a lower `limit` to match your intended crawl size (e.g. `limit: 100`) to avoid this. Additional credits apply for certain options: JSON mode costs 4 additional credits per page, enhanced proxy costs 4 additional credits per page, and PDF parsing costs 1 credit per PDF page.
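As a quick sanity check on those rates, here is a back-of-the-envelope estimator. It is an illustrative helper based on the numbers above, not an official billing formula:

```python theme={null}
def estimate_crawl_credits(pages, json_mode=False, enhanced_proxy=False,
                           pdf_pages=0):
    """Rough credit estimate per the rates above: 1 credit per page,
    +4/page for JSON mode, +4/page for enhanced proxy, +1 per PDF page.
    Illustrative only; actual billing is determined by the API."""
    per_page = 1 + (4 if json_mode else 0) + (4 if enhanced_proxy else 0)
    return pages * per_page + pdf_pages

print(estimate_crawl_credits(100))                  # 100
print(estimate_crawl_credits(100, json_mode=True))  # 500
```

Running an estimate like this before setting `limit` helps avoid the 402 error described above when credits are tight.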
### Scrape options

All options from the [Scrape endpoint](/api-reference/endpoint/scrape) are available in crawl via `scrapeOptions` (JS) / `scrape_options` (Python). These apply to every page the crawler scrapes, including formats, proxy, caching, actions, location, and tags.

```python Python theme={null}
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key='fc-YOUR_API_KEY')

# Crawl with scrape options
response = firecrawl.crawl('https://example.com',
    limit=100,
    scrape_options={
        'formats': [
            'markdown',
            {
                'type': 'json',
                'schema': {
                    'type': 'object',
                    'properties': {
                        'title': {'type': 'string'}
                    }
                }
            }
        ],
        'proxy': 'auto',
        'max_age': 600000,
        'only_main_content': True
    }
)
```

```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';

const firecrawl = new Firecrawl({ apiKey: 'fc-YOUR_API_KEY' });

// Crawl with scrape options
const crawlResponse = await firecrawl.crawl('https://example.com', {
  limit: 100,
  scrapeOptions: {
    formats: [
      'markdown',
      {
        type: 'json',
        schema: { type: 'object', properties: { title: { type: 'string' } } },
      },
    ],
    proxy: 'auto',
    maxAge: 600000,
    onlyMainContent: true,
  },
});
```

## Checking crawl status

Use the job ID to poll for the crawl status and retrieve results.

```python Python theme={null}
status = firecrawl.get_crawl_status("<crawl-id>")
print(status)
```

```js Node theme={null}
const status = await firecrawl.getCrawlStatus("<crawl-id>");
console.log(status);
```

```bash cURL theme={null}
# After starting a crawl, poll status by jobId
curl -s -X GET "https://api.firecrawl.dev/v2/crawl/<crawl-id>" \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY"
```

```bash CLI theme={null}
# Check crawl status using job ID
firecrawl crawl <crawl-id>
```

Job results are available via the API for 24 hours after completion. After this period, you can still view your crawl history and results in the [activity logs](https://www.firecrawl.dev/app/logs).
Pages in the crawl results `data` array are pages that Firecrawl successfully scraped, even if the target site returned an HTTP error like 404. The `metadata.statusCode` field shows the HTTP status code from the target site. To retrieve pages that Firecrawl itself failed to scrape (e.g. network errors, timeouts, or robots.txt blocks), use the dedicated [Get Crawl Errors](/api-reference/endpoint/crawl-get-errors) endpoint (`GET /crawl/{id}/errors`). ### Response handling The response varies based on the crawl's status. For incomplete or large responses exceeding 10MB, a `next` URL parameter is provided. You must request this URL to retrieve the next 10MB of data. If the `next` parameter is absent, it indicates the end of the crawl data. The `skip` and `next` parameters are only relevant when hitting the API directly. If you're using the SDK, pagination is handled automatically and all results are returned at once. ```json Scraping theme={null} { "status": "scraping", "total": 36, "completed": 10, "creditsUsed": 10, "expiresAt": "2024-00-00T00:00:00.000Z", "next": "https://api.firecrawl.dev/v2/crawl/123-456-789?skip=10", "data": [ { "markdown": "[Firecrawl Docs home page![light logo](https://mintlify.s3-us-west-1.amazonaws.com/firecrawl/logo/light.svg)!...", "html": "...", "metadata": { "title": "Build a 'Chat with website' using Groq Llama 3 | Firecrawl", "language": "en", "sourceURL": "https://docs.firecrawl.dev/learn/rag-llama3", "description": "Learn how to use Firecrawl, Groq Llama 3, and Langchain to build a 'Chat with your website' bot.", "ogLocaleAlternate": [], "statusCode": 200 } }, ... 
] } ``` ```json Completed theme={null} { "status": "completed", "total": 36, "completed": 36, "creditsUsed": 36, "expiresAt": "2024-00-00T00:00:00.000Z", "next": "https://api.firecrawl.dev/v2/crawl/123-456-789?skip=26", "data": [ { "markdown": "[Firecrawl Docs home page![light logo](https://mintlify.s3-us-west-1.amazonaws.com/firecrawl/logo/light.svg)!...", "html": "...", "metadata": { "title": "Build a 'Chat with website' using Groq Llama 3 | Firecrawl", "language": "en", "sourceURL": "https://docs.firecrawl.dev/learn/rag-llama3", "description": "Learn how to use Firecrawl, Groq Llama 3, and Langchain to build a 'Chat with your website' bot.", "ogLocaleAlternate": [], "statusCode": 200 } }, ... ] } ``` ## SDK methods There are two ways to use crawl with the SDK. ### Crawl and wait The `crawl` method waits for the crawl to complete and returns the full response. It handles pagination automatically. This is recommended for most use cases. ```python Python theme={null} from firecrawl import Firecrawl from firecrawl.types import ScrapeOptions firecrawl = Firecrawl(api_key="fc-YOUR_API_KEY") # Crawl a website: crawl_status = firecrawl.crawl( 'https://firecrawl.dev', limit=100, scrape_options=ScrapeOptions(formats=['markdown', 'html']), poll_interval=30 ) print(crawl_status) ``` ```js Node theme={null} import Firecrawl from '@mendable/firecrawl-js'; const firecrawl = new Firecrawl({apiKey: "fc-YOUR_API_KEY"}); const crawlResponse = await firecrawl.crawl('https://firecrawl.dev', { limit: 100, scrapeOptions: { formats: ['markdown', 'html'], } }) console.log(crawlResponse) ``` The response includes the crawl status and all scraped data: ```bash Python theme={null} success=True status='completed' completed=100 total=100 creditsUsed=100 expiresAt=datetime.datetime(2025, 4, 23, 19, 21, 17, tzinfo=TzInfo(UTC)) next=None data=[ Document( markdown='[Day 7 - Launch Week III.Integrations DayApril 14th to 20th](...', metadata={ 'title': '15 Python Web Scraping Projects: From 
Beginner to Advanced',
      ...
      'scrapeId': '97dcf796-c09b-43c9-b4f7-868a7a5af722',
      'sourceURL': 'https://www.firecrawl.dev/blog/python-web-scraping-projects',
      'url': 'https://www.firecrawl.dev/blog/python-web-scraping-projects',
      'statusCode': 200
    }
  ),
  ...
]
```

```json Node theme={null}
{
  success: true,
  status: "completed",
  completed: 100,
  total: 100,
  creditsUsed: 100,
  expiresAt: "2025-04-23T19:28:45.000Z",
  data: [
    {
      markdown: "[Day 7 - Launch Week III.Integrations DayApril ...",
      html: `
```

### Start then check status

The `start_crawl` (Python) / `startCrawl` (Node) method submits the job and returns immediately instead of waiting for completion. The initial response returns the job ID:

```json theme={null}
{
  "success": true,
  "id": "123-456-789",
  "url": "https://api.firecrawl.dev/v2/crawl/123-456-789"
}
```

## Real-time results with WebSocket

The watcher method provides real-time updates as pages are crawled. Start a crawl, then subscribe to events for immediate data processing.

```python Python theme={null}
import asyncio
from firecrawl import AsyncFirecrawl

async def main():
    firecrawl = AsyncFirecrawl(api_key="fc-YOUR-API-KEY")

    # Start a crawl first
    started = await firecrawl.start_crawl("https://firecrawl.dev", limit=5)

    # Watch updates (snapshots) until terminal status
    async for snapshot in firecrawl.watcher(started.id, kind="crawl", poll_interval=2, timeout=120):
        if snapshot.status == "completed":
            print("DONE", snapshot.status)
            for doc in snapshot.data:
                print("DOC", doc.metadata.source_url if doc.metadata else None)
        elif snapshot.status == "failed":
            print("ERR", snapshot.status)
        else:
            print("STATUS", snapshot.status, snapshot.completed, "/", snapshot.total)

asyncio.run(main())
```

```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';

const firecrawl = new Firecrawl({ apiKey: 'fc-YOUR-API-KEY' });

// Start a crawl and then watch it
const { id } = await firecrawl.startCrawl('https://mendable.ai', {
  excludePaths: ['blog/*'],
  limit: 5,
});

const watcher = firecrawl.watcher(id, { kind: 'crawl', pollInterval: 2, timeout: 120 });

watcher.on('document', (doc) => {
  console.log('DOC', doc);
});
watcher.on('error', (err) => { console.error('ERR', err?.error || err); }); watcher.on('done', (state) => { console.log('DONE', state.status); }); // Begin watching (WS with HTTP fallback) await watcher.start(); ``` ## Webhooks You can configure webhooks to receive real-time notifications as your crawl progresses. This allows you to process pages as they are scraped instead of waiting for the entire crawl to complete. ```bash cURL theme={null} curl -X POST https://api.firecrawl.dev/v2/crawl \ -H 'Content-Type: application/json' \ -H 'Authorization: Bearer YOUR_API_KEY' \ -d '{ "url": "https://docs.firecrawl.dev", "limit": 100, "webhook": { "url": "https://your-domain.com/webhook", "metadata": { "any_key": "any_value" }, "events": ["started", "page", "completed"] } }' ``` ### Event types | Event | Description | | ----------------- | ---------------------------------------- | | `crawl.started` | Fires when the crawl begins | | `crawl.page` | Fires for each page successfully scraped | | `crawl.completed` | Fires when the crawl finishes | | `crawl.failed` | Fires if the crawl encounters an error | ### Payload ```json theme={null} { "success": true, "type": "crawl.page", "id": "crawl-job-id", "data": [...], // Page data for 'page' events "metadata": {}, // Your custom metadata "error": null } ``` ### Verifying webhook signatures Every webhook request from Firecrawl includes an `X-Firecrawl-Signature` header containing an HMAC-SHA256 signature. Always verify this signature to ensure the webhook is authentic and has not been tampered with. 1. Get your webhook secret from the [Advanced tab](https://www.firecrawl.dev/app/settings?tab=advanced) of your account settings 2. Extract the signature from the `X-Firecrawl-Signature` header 3. Compute HMAC-SHA256 of the raw request body using your secret 4. Compare with the signature header using a timing-safe function Never process a webhook without verifying its signature first. 
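The four verification steps can be sketched in Python with the standard library's `hmac` module. This is a minimal illustration (the function name, secret, and body are placeholders); the header value has the form `sha256=<hex digest>`:

```python
import hashlib
import hmac

def verify_firecrawl_signature(raw_body: bytes, signature_header: str, secret: str) -> bool:
    # Compute HMAC-SHA256 of the raw request body using your webhook secret
    digest = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    # Compare with the header using a timing-safe function
    return hmac.compare_digest(f"sha256={digest}", signature_header)

# Simulated webhook delivery (secret and body are placeholders)
secret = "whsec_example"
body = b'{"success": true, "type": "crawl.page"}'
header = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()

print(verify_firecrawl_signature(body, header, secret))        # True
print(verify_firecrawl_signature(b"tampered", header, secret)) # False
```

Note that verification must run against the raw request body exactly as received; re-serializing parsed JSON can change the bytes and invalidate the signature.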
The `X-Firecrawl-Signature` header contains the signature in the format: `sha256=abc123def456...` For complete implementation examples in JavaScript and Python, see the [Webhook Security documentation](/webhooks/security). For comprehensive webhook documentation including detailed event payloads, payload structure, advanced configuration, and troubleshooting, see the [Webhooks documentation](/webhooks/overview). ## Configuration reference The full set of parameters available when submitting a crawl job: | Parameter | Type | Default | Description | | ----------------------- | ---------- | ----------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `url` | `string` | (required) | The starting URL to crawl from | | `limit` | `integer` | `10000` | Maximum number of pages to crawl | | `maxDiscoveryDepth` | `integer` | (none) | Maximum depth from the root URL based on link-discovery hops, not the number of `/` segments in the URL. Each time a new URL is found on a page, it is assigned a depth one higher than the page it was discovered on. The root site and sitemapped pages have a discovery depth of 0. Pages at the max depth are still scraped, but links on them are not followed. | | `includePaths` | `string[]` | (none) | URL pathname regex patterns to include. Only matching paths are crawled. 
| | `excludePaths` | `string[]` | (none) | URL pathname regex patterns to exclude from the crawl | | `regexOnFullURL` | `boolean` | `false` | Match `includePaths`/`excludePaths` against the full URL (including query parameters) instead of just the pathname | | `crawlEntireDomain` | `boolean` | `false` | Follow internal links to sibling or parent URLs, not just child paths | | `allowSubdomains` | `boolean` | `false` | Follow links to subdomains of the main domain | | `allowExternalLinks` | `boolean` | `false` | Follow links to external websites | | `sitemap` | `string` | `"include"` | Sitemap handling: `"include"` (default), `"skip"`, or `"only"` | | `ignoreQueryParameters` | `boolean` | `false` | Avoid re-scraping the same path with different query parameters | | `ignoreRobotsTxt` | `boolean` | `false` | Ignore the website's robots.txt rules. **Enterprise only** — contact [support@firecrawl.com](mailto:support@firecrawl.com) to enable. | | `robotsUserAgent` | `string` | (none) | Custom User-Agent string for robots.txt evaluation. When set, robots.txt is fetched with this User-Agent and rules are matched against it instead of the default. **Enterprise only** — contact [support@firecrawl.com](mailto:support@firecrawl.com) to enable. | | `delay` | `number` | (none) | Delay in seconds between scrapes to respect rate limits. Setting this forces concurrency to 1. | | `maxConcurrency` | `integer` | (none) | Maximum concurrent scrapes. Defaults to your team's concurrency limit. | | `scrapeOptions` | `object` | (none) | Options applied to every scraped page (formats, proxy, caching, actions, etc.) | | `webhook` | `object` | (none) | Webhook configuration for real-time notifications | | `prompt` | `string` | (none) | Natural language prompt to generate crawl options. Explicitly set parameters override generated equivalents. | ## Important details By default, crawl ignores sublinks that are not children of the URL you provide. 
For example, `website.com/other-parent/blog-1` would not be returned if you crawled `website.com/blogs/`. Use the `crawlEntireDomain` parameter to include sibling and parent paths. To crawl subdomains like `blog.website.com` when crawling `website.com`, use the `allowSubdomains` parameter. * **Sitemap discovery**: By default, the crawler includes the website's sitemap to discover URLs (`sitemap: "include"`). If you set `sitemap: "skip"`, only pages reachable through HTML links from the root URL are found. Assets like PDFs or deeply nested pages listed in the sitemap but not directly linked from HTML will be missed. For maximum coverage, keep the default setting. * **Credit usage**: Each page crawled costs 1 credit. JSON mode adds 4 credits per page, enhanced proxy adds 4 credits per page, and PDF parsing costs 1 credit per PDF page. * **Result expiration**: Job results are available via the API for 24 hours after completion. After that, view results in the [activity logs](https://www.firecrawl.dev/app/logs). * **Crawl errors**: The `data` array contains pages Firecrawl successfully scraped. Use the [Get Crawl Errors](/api-reference/endpoint/crawl-get-errors) endpoint to retrieve pages that failed due to network errors, timeouts, or robots.txt blocks. * **Non-deterministic results**: Crawl results may vary between runs of the same configuration. Pages are scraped concurrently, so the order in which links are discovered depends on network timing and which pages finish loading first. This means different branches of a site may be explored to different extents near the depth boundary, especially at higher `maxDiscoveryDepth` values. To get more deterministic results, set `maxConcurrency` to `1` or use `sitemap: "only"` if the site has a comprehensive sitemap. > Are you an AI agent that needs a Firecrawl API key? See [firecrawl.dev/agent-onboarding/SKILL.md](https://www.firecrawl.dev/agent-onboarding/SKILL.md) for automated onboarding instructions. 
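To make the `maxDiscoveryDepth` semantics concrete, here is a small breadth-first sketch of the depth-assignment rule from the configuration table. The link graph is hypothetical and this is not Firecrawl's actual crawler code:

```python
from collections import deque

def discovery_depths(root, links, max_depth):
    """Assign each reachable page a discovery depth. Pages at max_depth
    are still visited (scraped), but their links are not followed."""
    depths = {root: 0}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        if depths[page] >= max_depth:
            continue  # scraped, but links on this page are not followed
        for nxt in links.get(page, []):
            if nxt not in depths:
                depths[nxt] = depths[page] + 1
                queue.append(nxt)
    return depths

links = {"/": ["/a", "/b"], "/a": ["/a/x"], "/a/x": ["/deep"]}
print(discovery_depths("/", links, max_depth=2))
# {'/': 0, '/a': 1, '/b': 1, '/a/x': 2}  -- '/deep' is never reached
```

Because `/a/x` sits at the maximum depth, it is scraped but `/deep` is never discovered, which mirrors the "pages at the max depth are still scraped, but links on them are not followed" rule.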
# Document Parsing Source: https://docs.firecrawl.dev/features/document-parsing Learn about document parsing capabilities. Firecrawl provides powerful document parsing capabilities, allowing you to extract structured content from various document formats. This feature is particularly useful for processing files like spreadsheets, Word documents, and more. ## Supported Document Formats Firecrawl currently supports the following document formats: * **Excel Spreadsheets** (`.xlsx`, `.xls`) * Each worksheet is converted to an HTML table * Worksheets are separated by H2 headings with the sheet name * Preserves cell formatting and data types * **Word Documents** (`.docx`, `.doc`, `.odt`, `.rtf`) * Extracts text content while preserving document structure * Maintains headings, paragraphs, lists, and tables * Preserves basic formatting and styling * **PDF Documents** (`.pdf`) * Extracts text content with layout information * Preserves document structure including sections and paragraphs * Handles both text-based and scanned PDFs (with OCR support) * Supports `mode` option to control parsing strategy: `fast` (text-only), `auto` (text with OCR fallback, default), or `ocr` (force OCR) * Priced at 1 credit per-page. See [Pricing](https://firecrawl.dev/pricing) for details. ### PDF Parsing Modes Use the `parsers` option to control how PDFs are processed: | Mode | Description | | ------ | --------------------------------------------------------------------------------------------------------------------- | | `auto` | Attempts fast text-based extraction first, falls back to OCR if needed. This is the default. | | `fast` | Text-based parsing only (embedded text). Fastest option, but will not extract text from scanned or image-heavy pages. | | `ocr` | Forces OCR parsing on every page. Use for scanned documents or when `auto` misclassifies a page. 
| ```js theme={null} // Object syntax with mode parsers: [{ type: "pdf", mode: "ocr", maxPages: 20 }] // Default (auto mode) parsers: [{ type: "pdf" }] ``` ## How to Use Document Parsing Document parsing in Firecrawl works automatically when you provide a URL that points to a supported document type. The system will detect the file type based on the URL extension or content-type header and process it accordingly. ### Example: Scraping an Excel File ```js Node theme={null} import Firecrawl from '@mendable/firecrawl-js'; const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" }); const doc = await firecrawl.scrape('https://example.com/data.xlsx'); console.log(doc.markdown); ``` ### Example: Scraping a Word Document ```js Node theme={null} import Firecrawl from '@mendable/firecrawl-js'; const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" }); const doc = await firecrawl.scrape('https://example.com/data.docx'); console.log(doc.markdown); ``` ## Output Format All supported document types are converted to clean, structured markdown. For example, an Excel file with multiple sheets might be converted to: ```markdown theme={null} ## Sheet1 | Name | Value | |-------|-------| | Item 1 | 100 | | Item 2 | 200 | ## Sheet2 | Date | Description | |------------|--------------| | 2023-01-01 | First quarter| ``` > Are you an AI agent that needs a Firecrawl API key? See [firecrawl.dev/agent-onboarding/SKILL.md](https://www.firecrawl.dev/agent-onboarding/SKILL.md) for automated onboarding instructions. # Enhanced Mode Source: https://docs.firecrawl.dev/features/enhanced-mode Use enhanced proxies for reliable scraping on complex sites Firecrawl provides different proxy types to help you scrape websites with varying levels of complexity. Set the `proxy` parameter to control which proxy strategy is used for a request. 
## Proxy types

Firecrawl supports three proxy types:

| Type       | Description                                                  | Speed  | Cost                                                        |
| ---------- | ------------------------------------------------------------ | ------ | ----------------------------------------------------------- |
| `basic`    | Standard proxies suitable for most sites                     | Fast   | 1 credit                                                    |
| `enhanced` | Enhanced proxies for complex sites                           | Slower | 5 credits per request                                       |
| `auto`     | Tries `basic` first, then retries with `enhanced` on failure | Varies | 1 credit if basic succeeds, 5 credits if enhanced is needed |

If you do not specify a proxy, Firecrawl defaults to `auto`.

## Basic usage

Set the `proxy` parameter to choose a proxy strategy. The following example uses `auto`, which lets Firecrawl decide when to escalate to enhanced proxies.

```python Python theme={null}
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key='fc-YOUR-API-KEY')

# Choose proxy strategy: 'basic' | 'enhanced' | 'auto'
doc = firecrawl.scrape('https://example.com', formats=['markdown'], proxy='auto')
print(doc.warning or 'ok')
```

```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';

const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });

// Choose proxy strategy: 'basic' | 'enhanced' | 'auto'
const doc = await firecrawl.scrape('https://example.com', { formats: ['markdown'], proxy: 'auto' });
console.log(doc.warning || 'ok');
```

```bash cURL theme={null}
# Choose proxy strategy: 'basic' | 'enhanced' | 'auto'
curl -X POST https://api.firecrawl.dev/v2/scrape \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer fc-YOUR-API-KEY' \
  -d '{
    "url": "https://example.com",
    "proxy": "auto"
  }'
```

Enhanced proxy requests cost **5 credits per request**. When using `auto`, the 5-credit cost only applies if the basic proxy fails and the enhanced retry succeeds.

> Are you an AI agent that needs a Firecrawl API key?
See [firecrawl.dev/agent-onboarding/SKILL.md](https://www.firecrawl.dev/agent-onboarding/SKILL.md) for automated onboarding instructions.

# Faster Scraping

Source: https://docs.firecrawl.dev/features/fast-scraping

Speed up your scrapes by 500% with the maxAge parameter

## How It Works

Firecrawl caches previously scraped pages and, by default, returns a recent copy when available.

* **Default freshness**: `maxAge = 172800000` ms (2 days). If the cached copy is newer than this, it’s returned instantly; otherwise, Firecrawl scrapes fresh and updates the cache.
* **Force fresh**: Set `maxAge: 0` to always scrape. This bypasses the cache entirely, so every request goes through the full scraping pipeline, takes longer to complete, and is more likely to fail. Use a non-zero `maxAge` if you don't need real-time content on every request.
* **Skip caching**: Set `storeInCache: false` if you don’t want to store results for a request.

Get your results **up to 500% faster** when you don’t need the absolute freshest data. Control freshness via `maxAge`:

1. **Return instantly** if we have a recent version of the page
2. **Scrape fresh** only if our version is older than your specified age
3. **Save you time** - results come back in milliseconds instead of seconds

## When to Use This

**Great for:**

* Documentation, articles, product pages
* Bulk processing jobs
* Development and testing
* Building knowledge bases

**Skip for:**

* Real-time data (stock prices, live scores, breaking news)
* Frequently updated content
* Time-sensitive applications

## Usage

Add `maxAge` to your scrape request. Values are in milliseconds (e.g., `3600000` = 1 hour).

```python Python theme={null}
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR_API_KEY")

# Use cached data if it's less than 1 hour old (3600000 ms)
# This can be 500% faster than a fresh scrape!
scrape_result = firecrawl.scrape( 'https://firecrawl.dev', formats=['markdown'], max_age=3600000 # 1 hour in milliseconds ) print(scrape_result.markdown) ``` ```javascript JavaScript theme={null} import Firecrawl from '@mendable/firecrawl-js'; const firecrawl = new Firecrawl({ apiKey: "fc-YOUR_API_KEY" }); // Use cached data if it's less than 1 hour old (3600000 ms) // This can be 500% faster than a fresh scrape! const scrapeResult = await firecrawl.scrape('https://firecrawl.dev', { formats: ['markdown'], maxAge: 3600000 // 1 hour in milliseconds }); console.log(scrapeResult.markdown); ``` ## Common maxAge values Here are some helpful reference values: * **5 minutes**: `300000` - For semi-dynamic content * **1 hour**: `3600000` - For content that updates hourly * **1 day**: `86400000` - For daily-updated content * **1 week**: `604800000` - For relatively static content ## Performance impact With `maxAge` enabled: * **500% faster response times** for recent content * **Instant results** instead of waiting for fresh scrapes ## Important notes * **Default**: `maxAge` is `172800000` (2 days) * **Fresh when needed**: If our data is older than `maxAge`, we scrape fresh automatically * **No stale data**: You'll never get data older than your specified `maxAge` * **Credits**: Cached results still cost 1 credit per page. Caching improves speed and latency, not credit usage. ### When caching is bypassed Caching is automatically skipped when your request includes any of the following: * Custom `headers` * `actions` (browser automation steps) * A browser `profile` * `changeTracking` format * Custom `screenshot` viewport or quality settings ### Cache hit matching For a cache hit, these parameters must match exactly between the original and subsequent requests: `url`, `mobile`, `location`, `waitFor`, `blockAds`, `screenshot` (enabled/disabled and full-page), and stealth proxy mode. 
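The matching rule amounts to an exact comparison of those fields. A minimal sketch (illustrative only, not Firecrawl's implementation; the stealth proxy setting is represented here as a `proxy` key for the sake of the example):

```python
# Fields that must match exactly for a cached copy to be eligible,
# per the cache-hit matching rule above (illustrative field names).
MATCH_FIELDS = ("url", "mobile", "location", "waitFor", "blockAds", "screenshot", "proxy")

def cache_hit_eligible(prev_request: dict, new_request: dict) -> bool:
    """A cached copy is only eligible when all matched fields are identical."""
    return all(prev_request.get(f) == new_request.get(f) for f in MATCH_FIELDS)

a = {"url": "https://firecrawl.dev", "mobile": False, "waitFor": 0}
b = {"url": "https://firecrawl.dev", "mobile": True, "waitFor": 0}

print(cache_hit_eligible(a, dict(a)))  # True
print(cache_hit_eligible(a, b))        # False (mobile differs)
```

Even when the fields match, the cached copy must also be newer than your `maxAge` for it to be returned.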
You can verify cache behavior by checking `metadata.cacheState` in the response — it will be `"hit"` or `"miss"`. ## Faster crawling The same speed benefits apply when crawling multiple pages. Use `maxAge` within `scrapeOptions` to get cached results for pages we’ve seen recently. ```python Python theme={null} from firecrawl import Firecrawl from firecrawl.v2.types import ScrapeOptions firecrawl = Firecrawl(api_key="fc-YOUR_API_KEY") # Crawl with cached scraping - 500% faster for pages we've seen recently crawl_result = firecrawl.crawl( 'https://firecrawl.dev', limit=100, scrape_options=ScrapeOptions( formats=['markdown'], max_age=3600000 # Use cached data if less than 1 hour old ) ) for page in crawl_result.data: print(f"URL: {page.metadata.source_url}") print(f"Content: {page.markdown[:200]}...") ``` ```javascript JavaScript theme={null} import Firecrawl from '@mendable/firecrawl-js'; const firecrawl = new Firecrawl({ apiKey: "fc-YOUR_API_KEY" }); // Crawl with cached scraping - 500% faster for pages we've seen recently const crawlResult = await firecrawl.crawl('https://firecrawl.dev', { limit: 100, scrapeOptions: { formats: ['markdown'], maxAge: 3600000 // Use cached data if less than 1 hour old } }); crawlResult.data.forEach(page => { console.log(`URL: ${page.metadata.sourceURL}`); console.log(`Content: ${page.markdown.substring(0, 200)}...`); }); ``` ```bash cURL theme={null} curl -X POST https://api.firecrawl.dev/v2/crawl \ -H 'Content-Type: application/json' \ -H 'Authorization: Bearer fc-YOUR_API_KEY' \ -d '{ "url": "https://firecrawl.dev", "limit": 100, "scrapeOptions": { "formats": ["markdown"], "maxAge": 3600000 } }' ``` When crawling with `maxAge`, each page in your crawl will benefit from the 500% speed improvement if we have recent cached data for that page. Start using `maxAge` today for dramatically faster scrapes and crawls! > Are you an AI agent that needs a Firecrawl API key? 
See [firecrawl.dev/agent-onboarding/SKILL.md](https://www.firecrawl.dev/agent-onboarding/SKILL.md) for automated onboarding instructions.

# Interact after scraping

Source: https://docs.firecrawl.dev/features/interact

Interact with a page you fetched by prompting or running code.

Scrape a page to get clean data, then call `/interact` to start taking actions in that page - click buttons, fill forms, extract dynamic content, or navigate deeper. Just describe what you want, or write code if you need full control.

* **Prompt**: describe the action you want to take in the page
* **Code**: interact via secure code execution with Playwright or agent-browser
* **Live view**: watch or interact with the browser in real time via an embeddable stream

## How It Works

1. **Scrape** a URL with `POST /v2/scrape`. The response includes a `scrapeId` in `data.metadata.scrapeId`. Optionally pass a `profile` to persist browser state across sessions.
2. **Interact** by calling `POST /v2/scrape/{scrapeId}/interact` with a `prompt` or with Playwright `code`. On the first call, the scraped session is resumed and you can start interacting with the page.
3. **Stop** the session with `DELETE /v2/scrape/{scrapeId}/interact` when you're done.

## Quick Start

Scrape a page, interact with it, and stop the session:

```python Python theme={null}
from firecrawl import Firecrawl

app = Firecrawl(api_key="fc-YOUR-API-KEY")

# 1. Scrape Amazon's homepage
result = app.scrape("https://www.amazon.com", formats=["markdown"])
scrape_id = result.metadata.scrape_id

# 2. Interact — search for a product and get its price
app.interact(scrape_id, prompt="Search for iPhone 16 Pro Max")
response = app.interact(scrape_id, prompt="Click on the first result and tell me the price")
print(response.output)

# 3. Stop the session
app.stop_interaction(scrape_id)
```

```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';

const app = new Firecrawl({ apiKey: 'fc-YOUR-API-KEY' });

// 1.
Scrape Amazon's homepage const result = await app.scrape('https://www.amazon.com', { formats: ['markdown'] }); const scrapeId = result.metadata?.scrapeId; // 2. Interact — search for a product and get its price await app.interact(scrapeId, { prompt: 'Search for iPhone 16 Pro Max' }); const response = await app.interact(scrapeId, { prompt: 'Click on the first result and tell me the price' }); console.log(response.output); // 3. Stop the session await app.stopInteraction(scrapeId); ``` ```bash cURL theme={null} # 1. Scrape Amazon's homepage RESPONSE=$(curl -s -X POST "https://api.firecrawl.dev/v2/scrape" \ -H "Authorization: Bearer $FIRECRAWL_API_KEY" \ -H "Content-Type: application/json" \ -d '{"url": "https://www.amazon.com", "formats": ["markdown"]}') SCRAPE_ID=$(echo $RESPONSE | jq -r '.data.metadata.scrapeId') # 2. Interact — search for a product and get its price curl -s -X POST "https://api.firecrawl.dev/v2/scrape/$SCRAPE_ID/interact" \ -H "Authorization: Bearer $FIRECRAWL_API_KEY" \ -H "Content-Type: application/json" \ -d '{"prompt": "Search for iPhone 16 Pro Max"}' curl -s -X POST "https://api.firecrawl.dev/v2/scrape/$SCRAPE_ID/interact" \ -H "Authorization: Bearer $FIRECRAWL_API_KEY" \ -H "Content-Type: application/json" \ -d '{"prompt": "Click on the first result and tell me the price"}' # 3. Stop the session curl -s -X DELETE "https://api.firecrawl.dev/v2/scrape/$SCRAPE_ID/interact" \ -H "Authorization: Bearer $FIRECRAWL_API_KEY" ``` ```bash CLI theme={null} # 1. Scrape Amazon's homepage (scrape ID is saved automatically) firecrawl scrape https://www.amazon.com # 2. Interact — search for a product and get its price firecrawl interact "Search for iPhone 16 Pro Max" firecrawl interact "Click on the first result and tell me the price" # 3. 
Stop the session firecrawl interact stop ``` ```json Response theme={null} { "success": true, "liveViewUrl": "https://liveview.firecrawl.dev/...", "interactiveLiveViewUrl": "https://liveview.firecrawl.dev/...", "output": "The iPhone 16 Pro Max (256GB) is priced at $1,199.00.", "exitCode": 0, "killed": false } ``` ## Interact via prompting The simplest way to interact with a page. Describe what you want in natural language and it will click, type, scroll, and extract data automatically. ```python Python theme={null} response = app.interact(scrape_id, prompt="What are the customer reviews saying about battery life?") print(response.output) ``` ```js Node theme={null} const response = await app.interact(scrapeId, { prompt: 'What are the customer reviews saying about battery life?', }); console.log(response.output); ``` ```bash cURL theme={null} curl -s -X POST "https://api.firecrawl.dev/v2/scrape/$SCRAPE_ID/interact" \ -H "Authorization: Bearer $FIRECRAWL_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "prompt": "What are the customer reviews saying about battery life?" }' ``` ```bash CLI theme={null} firecrawl interact "What are the customer reviews saying about battery life?" ``` The response includes an `output` field with the agent's answer: ```json Response theme={null} { "success": true, "liveViewUrl": "https://liveview.firecrawl.dev/...", "interactiveLiveViewUrl": "https://liveview.firecrawl.dev/...", "output": "Customers are generally positive about battery life. Most reviewers report 8-10 hours of use on a single charge. A few noted it drains faster with heavy multitasking.", "stdout": "...", "result": "...", "stderr": "", "exitCode": 0, "killed": false } ``` ### Keep Prompts Small and Focused Prompts work best when each one is a **single, clear task**. Instead of asking the agent to do a complex multi-step workflow in one shot, break it into separate interact calls. Each call reuses the same browser session, so state carries over between them. 
## Running code

For full control, you can execute code directly in the browser sandbox. The `page` variable (a Playwright `Page` object) is available in Node.js and Python. Bash mode has [agent-browser](https://github.com/vercel-labs/agent-browser) pre-installed.

You can also take screenshots within the session: use `(await page.screenshot()).toString("base64")` in Node.js, `await page.screenshot(path="/tmp/screenshot.png")` in Python, or `agent-browser screenshot` in Bash.

### Node.js (Playwright)

The default language. Write Playwright code directly; `page` is already connected to the browser.

```python Python theme={null}
response = app.interact(scrape_id, code="""
// Click a button and wait for navigation
await page.click('#next-page');
await page.waitForLoadState('networkidle');

// Extract content from the new page
const title = await page.title();
const content = await page.$eval('.article-body', el => el.textContent);
JSON.stringify({ title, content });
""")
print(response.result)
```

```js Node theme={null}
const response = await app.interact(scrapeId, {
  code: `
    // Click a button and wait for navigation
    await page.click('#next-page');
    await page.waitForLoadState('networkidle');

    // Extract content from the new page
    const title = await page.title();
    const content = await page.$eval('.article-body', el => el.textContent);
    JSON.stringify({ title, content });
  `,
});
console.log(response.result);
```

```bash cURL theme={null}
curl -s -X POST "https://api.firecrawl.dev/v2/scrape/$SCRAPE_ID/interact" \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "code": "await page.click(\"#next-page\"); await page.waitForLoadState(\"networkidle\"); const title = await page.title(); JSON.stringify({ title });",
    "language": "node",
    "timeout": 30
  }'
```

```bash CLI theme={null}
# Uses the last scrape automatically
firecrawl interact -c "
await page.click('#next-page');
await page.waitForLoadState('networkidle');
const title = await page.title();
const content = await page.\$eval('.article-body', el => el.textContent);
JSON.stringify({ title, content });
"

# Or pass a scrape ID explicitly
# firecrawl interact -c "await page.title()"
```

### Python

Set `language` to `"python"` to use Playwright's Python API.

```python Python theme={null}
response = app.interact(
    scrape_id,
    code="""
import json

await page.click('#load-more')
await page.wait_for_load_state('networkidle')

items = await page.query_selector_all('.item')
data = []
for item in items:
    text = await item.text_content()
    data.append(text.strip())
print(json.dumps(data))
""",
    language="python",
)
print(response.stdout)
```

```js Node theme={null}
const response = await app.interact(scrapeId, {
  code: `
import json

await page.click('#load-more')
await page.wait_for_load_state('networkidle')

items = await page.query_selector_all('.item')
data = []
for item in items:
    text = await item.text_content()
    data.append(text.strip())
print(json.dumps(data))
`,
  language: 'python',
});
console.log(response.stdout);
```

```bash cURL theme={null}
curl -s -X POST "https://api.firecrawl.dev/v2/scrape/$SCRAPE_ID/interact" \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "code": "import json\nawait page.click(\"#load-more\")\nawait page.wait_for_load_state(\"networkidle\")\nitems = await page.query_selector_all(\".item\")\ndata = [await i.text_content() for i in items]\nprint(json.dumps(data))",
    "language": "python"
  }'
```

```bash CLI theme={null}
firecrawl interact --python -c "
import json
await page.click('#load-more')
await page.wait_for_load_state('networkidle')
items = await page.query_selector_all('.item')
data = [await i.text_content() for i in items]
print(json.dumps(data))
"
```

### Bash (agent-browser)

[agent-browser](https://github.com/vercel-labs/agent-browser) is a CLI pre-installed in the sandbox with 60+ commands. It provides an accessibility tree with element refs (`@e1`, `@e2`, ...), which makes it ideal for LLM-driven automation.

```python Python theme={null}
# Take a snapshot to see interactive elements
snapshot = app.interact(
    scrape_id,
    code="agent-browser snapshot -i",
    language="bash",
)
print(snapshot.stdout)
# Output:
# [document]
# @e1 [input type="text"] "Search..."
# @e2 [button] "Search"
# @e3 [link] "About"

# Interact with elements using @refs
app.interact(
    scrape_id,
    code='agent-browser fill @e1 "firecrawl" && agent-browser click @e2',
    language="bash",
)
```

```js Node theme={null}
// Take a snapshot to see interactive elements
const snapshot = await app.interact(scrapeId, {
  code: 'agent-browser snapshot -i',
  language: 'bash',
});
console.log(snapshot.stdout);
// Output:
// [document]
// @e1 [input type="text"] "Search..."
// @e2 [button] "Search"
// @e3 [link] "About"

// Interact with elements using @refs
await app.interact(scrapeId, {
  code: 'agent-browser fill @e1 "firecrawl" && agent-browser click @e2',
  language: 'bash',
});
```

```bash cURL theme={null}
# Take a snapshot to see interactive elements
curl -s -X POST "https://api.firecrawl.dev/v2/scrape/$SCRAPE_ID/interact" \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"code": "agent-browser snapshot -i", "language": "bash"}'

# Interact with elements using @refs
curl -s -X POST "https://api.firecrawl.dev/v2/scrape/$SCRAPE_ID/interact" \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"code": "agent-browser fill @e1 \"firecrawl\" && agent-browser click @e2", "language": "bash"}'
```

```bash CLI theme={null}
# Take a snapshot to see interactive elements
firecrawl interact --bash -c "agent-browser snapshot -i"

# Interact with elements using @refs
firecrawl interact --bash -c 'agent-browser fill @e1 "firecrawl" && agent-browser click @e2'
```

Common agent-browser commands:

| Command                   | Description                               |
| ------------------------- | ----------------------------------------- |
| `snapshot`                | Full accessibility tree with element refs |
| `snapshot -i`             | Interactive elements only                 |
| `click @e1`               | Click element by ref                      |
| `fill @e1 "text"`         | Clear field and type text                 |
| `type @e1 "text"`         | Type without clearing                     |
| `press Enter`             | Press a keyboard key                      |
| `scroll down 500`         | Scroll down by pixels                     |
| `get text @e1`            | Get text content                          |
| `get url`                 | Get current URL                           |
| `wait @e1`                | Wait for element                          |
| `wait --load networkidle` | Wait for network idle                     |
| `find text "X" click`     | Find element by text and click            |
| `screenshot`              | Take a screenshot of the current page     |
| `eval "js code"`          | Run JavaScript in page                    |

## Live view

Every interact response returns a `liveViewUrl` that you can embed to watch the browser in real time. This is useful for debugging, demos, or building browser-powered UIs.

```json Response theme={null}
{
  "success": true,
  "liveViewUrl": "https://liveview.firecrawl.dev/...",
  "interactiveLiveViewUrl": "https://liveview.firecrawl.dev/...",
  "stdout": "",
  "result": "...",
  "exitCode": 0
}
```

```html theme={null}