> ## Documentation Index
> Fetch the complete documentation index at: https://docs.firecrawl.dev/llms.txt
> Use this file to discover all available pages before exploring further.

# Advanced Scraping Guide

> Configure scrape options, browser actions, crawl, map, and the agent endpoint with Firecrawl's full API surface.

Reference for every option across Firecrawl's scrape, crawl, map, and agent endpoints.

## Basic scraping

To scrape a single page and get clean markdown content, use the `/scrape` endpoint.

<CodeGroup>
  ```python Python theme={null}
  # pip install firecrawl-py

  from firecrawl import Firecrawl

  firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

  doc = firecrawl.scrape("https://firecrawl.dev")

  print(doc.markdown)
  ```

  ```js Node theme={null}
  // npm install firecrawl

  import { Firecrawl } from 'firecrawl-js';

  const firecrawl = new Firecrawl({ apiKey: 'fc-YOUR-API-KEY' });

  const doc = await firecrawl.scrape('https://firecrawl.dev');

  console.log(doc.markdown);
  ```

  ```bash cURL theme={null}
  curl -X POST https://api.firecrawl.dev/v2/scrape \
      -H 'Content-Type: application/json' \
      -H 'Authorization: Bearer fc-YOUR-API-KEY' \
      -d '{
        "url": "https://docs.firecrawl.dev"
      }'
  ```
</CodeGroup>

## Scraping PDFs

Firecrawl supports PDFs. Use the `parsers` option (e.g., `parsers: ["pdf"]`) when you want to ensure PDF parsing. You can control the parsing strategy with the `mode` option:

* **`auto`** (default) — attempts fast text-based extraction first, then falls back to OCR if needed.
* **`fast`** — text-based parsing only (embedded text). Fastest, but skips scanned/image-heavy pages.
* **`ocr`** — forces OCR parsing on every page. Use for scanned documents or when `auto` misclassifies a page.

`{ type: "pdf" }` and `"pdf"` both default to `mode: "auto"`.

```json theme={null}
"parsers": [{ "type": "pdf", "mode": "fast", "maxPages": 50 }]
```

## Scrape options

When using the `/scrape` endpoint, you can customize the request with the following options.

### Formats (`formats`)

The `formats` array controls which output types the scraper returns. Default: `["markdown"]`.

**String formats**: pass the name directly (e.g. `"markdown"`).

| Format     | Description                                                                                                                       |
| ---------- | --------------------------------------------------------------------------------------------------------------------------------- |
| `markdown` | Page content converted to clean Markdown.                                                                                         |
| `html`     | Processed HTML with unnecessary elements removed.                                                                                 |
| `rawHtml`  | Original HTML exactly as returned by the server.                                                                                  |
| `links`    | All links found on the page.                                                                                                      |
| `images`   | All images found on the page.                                                                                                     |
| `summary`  | An LLM-generated summary of the page content.                                                                                     |
| `branding` | Extracts brand identity (colors, fonts, typography, spacing, UI components).                                                      |
| `product`  | Extracts a structured product (title, price, availability, images, variants) from product pages via multi-source structured data. |

**Object formats**: pass an object with `type` and additional options.

| Format           | Options                                                                                  | Description                                                                                                             |
| ---------------- | ---------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------- |
| `json`           | `prompt?: string`, `schema?: object`                                                     | Extract structured data using an LLM. Provide a JSON schema and/or a natural-language prompt (max 10,000 characters).   |
| `screenshot`     | `fullPage?: boolean`, `quality?: number`, `viewport?: { width, height }`                 | Capture a screenshot. Max one per request. Viewport max resolution is 7680×4320. Screenshot URLs expire after 24 hours. |
| `changeTracking` | `modes?: ("json" \| "git-diff")[]`, `tag?: string`, `schema?: object`, `prompt?: string` | Track changes between scrapes. Requires `"markdown"` to also be in the formats array.                                   |
| `attributes`     | `selectors: [{ selector: string, attribute: string }]`                                   | Extract specific HTML attributes from elements matching CSS selectors.                                                  |

### Mobile scraping

Set `mobile: true` to emulate a mobile device. This is useful when a responsive site hides content on desktop or serves a different layout to mobile browsers.

For region-specific sites, combine with `location` and a mobile screenshot to verify the rendered layout:

<CodeGroup>
  ```python Python theme={null}
  from firecrawl import Firecrawl

  firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

  doc = firecrawl.scrape(
      "https://example.com",
      mobile=True,
      location={"country": "GB", "languages": ["en-GB"]},
      formats=[
          "markdown",
          {"type": "screenshot", "fullPage": True, "viewport": {"width": 390, "height": 844}},
      ],
      only_main_content=False,
      wait_for=2000,
  )

  print(doc.markdown)
  ```

  ```js Node theme={null}
  import { Firecrawl } from 'firecrawl-js';

  const firecrawl = new Firecrawl({ apiKey: 'fc-YOUR-API-KEY' });

  const doc = await firecrawl.scrape('https://example.com', {
    mobile: true,
    location: { country: 'GB', languages: ['en-GB'] },
    formats: [
      'markdown',
      { type: 'screenshot', fullPage: true, viewport: { width: 390, height: 844 } },
    ],
    onlyMainContent: false,
    waitFor: 2000,
  });

  console.log(doc.markdown);
  ```

  ```bash cURL theme={null}
  curl -X POST https://api.firecrawl.dev/v2/scrape \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer fc-YOUR-API-KEY' \
    -d '{
      "url": "https://example.com",
      "mobile": true,
      "location": {
        "country": "GB",
        "languages": ["en-GB"]
      },
      "formats": [
        "markdown",
        { "type": "screenshot", "fullPage": true, "viewport": { "width": 390, "height": 844 } }
      ],
      "onlyMainContent": false,
      "waitFor": 2000
    }'
  ```
</CodeGroup>

If the site still serves a desktop layout despite `mobile: true`, add a mobile User-Agent via `headers`:

```json theme={null}
{
  "headers": {
    "User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Mobile/15E148 Safari/604.1"
  }
}
```

### Content filtering

These parameters control which parts of the page appear in the output. When `onlyMainContent` is `true` (the default), boilerplate (nav, footer, etc.) is stripped. `includeTags` and `excludeTags` are applied against the original page DOM, not the post-filtered result, so your selectors should target elements as they appear in the source HTML. Set `onlyMainContent: false` to use the full page as the starting point for tag filtering.

| Parameter         | Type      | Default | Description                                                                                                                            |
| ----------------- | --------- | ------- | -------------------------------------------------------------------------------------------------------------------------------------- |
| `onlyMainContent` | `boolean` | `true`  | Return only the main content. Set `false` for the full page.                                                                           |
| `includeTags`     | `array`   | —       | CSS selectors to include — tags, classes, IDs, or attribute selectors (e.g. `["h1", "p", ".main-content", "[data-testid=\"main\"]"]`). |
| `excludeTags`     | `array`   | —       | CSS selectors to exclude — tags, classes, IDs, or attribute selectors (e.g. `["#ad", "#footer", "[role=\"banner\"]"]`).                |

### Timing and cache

| Parameter | Type           | Default     | Description                                                                                            |
| --------- | -------------- | ----------- | ------------------------------------------------------------------------------------------------------ |
| `waitFor` | `integer` (ms) | `0`         | Extra wait time before scraping, on top of smart-wait. Use sparingly.                                  |
| `maxAge`  | `integer` (ms) | `172800000` | Return a cached version if fresher than this value (default is 2 days). Set `0` to always fetch fresh. |
| `timeout` | `integer` (ms) | `60000`     | Max request duration before aborting (default is 60 seconds). Minimum is 1000 (1 second).              |

### PDF parsing

| Parameter | Type    | Default   | Description                                                                      |
| --------- | ------- | --------- | -------------------------------------------------------------------------------- |
| `parsers` | `array` | `["pdf"]` | Controls PDF processing. `[]` to skip parsing and return base64 (1 credit flat). |

```json theme={null}
{ "type": "pdf", "mode": "fast" | "auto" | "ocr", "maxPages": 10 }
```

| Property   | Type                        | Default      | Description                                                                           |
| ---------- | --------------------------- | ------------ | ------------------------------------------------------------------------------------- |
| `type`     | `"pdf"`                     | *(required)* | Parser type.                                                                          |
| `mode`     | `"fast" \| "auto" \| "ocr"` | `"auto"`     | `fast`: text-based extraction only. `auto`: fast with OCR fallback. `ocr`: force OCR. |
| `maxPages` | `integer`                   | —            | Cap the number of pages to parse.                                                     |

### Actions

Run browser actions before scraping. This is useful for dynamic content, navigation, or user-gated pages. You can include up to 50 actions per request, and the combined wait time across all `wait` actions and `waitFor` must not exceed 60 seconds.

| Action              | Parameters                                                               | Description                                                                                                                              |
| ------------------- | ------------------------------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------- |
| `wait`              | `milliseconds?: number`, `selector?: string`                             | Wait for a fixed duration **or** until an element is visible (provide one, not both). When using `selector`, times out after 30 seconds. |
| `click`             | `selector: string`, `all?: boolean`                                      | Click an element matching the CSS selector. Set `all: true` to click every match.                                                        |
| `write`             | `text: string`                                                           | Type text into the currently focused field. You must focus the element with a `click` action first.                                      |
| `press`             | `key: string`                                                            | Press a keyboard key (e.g. `"Enter"`, `"Tab"`, `"Escape"`).                                                                              |
| `scroll`            | `direction?: "up" \| "down"`, `selector?: string`                        | Scroll the page or a specific element. Direction defaults to `"down"`.                                                                   |
| `screenshot`        | `fullPage?: boolean`, `quality?: number`, `viewport?: { width, height }` | Capture a screenshot. Max viewport resolution is 7680×4320.                                                                              |
| `scrape`            | *(none)*                                                                 | Capture the current page HTML at this point in the action sequence.                                                                      |
| `executeJavascript` | `script: string`                                                         | Run JavaScript code in the page. Return values are available in the `actions.javascriptReturns` array of the response.                   |
| `pdf`               | `format?: string`, `landscape?: boolean`, `scale?: number`               | Generate a PDF. Supported formats: `"A0"` through `"A6"`, `"Letter"`, `"Legal"`, `"Tabloid"`, `"Ledger"`. Defaults to `"Letter"`.        |

<CodeGroup>
  ```python Python theme={null}
  from firecrawl import Firecrawl

  firecrawl = Firecrawl(api_key='fc-YOUR-API-KEY')

  doc = firecrawl.scrape('https://example.com', {
    'actions': [
      { 'type': 'wait', 'milliseconds': 1000 },
      { 'type': 'click', 'selector': '#accept' },
      { 'type': 'scroll', 'direction': 'down' },
      { 'type': 'click', 'selector': '#q' },
      { 'type': 'write', 'text': 'firecrawl' },
      { 'type': 'press', 'key': 'Enter' },
      { 'type': 'wait', 'milliseconds': 2000 }
    ],
    'formats': ['markdown']
  })

  print(doc.markdown)
  ```

  ```js Node theme={null}
  import { Firecrawl } from 'firecrawl-js';

  const firecrawl = new Firecrawl({ apiKey: 'fc-YOUR-API-KEY' });

  const doc = await firecrawl.scrape('https://example.com', {
    actions: [
      { type: 'wait', milliseconds: 1000 },
      { type: 'click', selector: '#accept' },
      { type: 'scroll', direction: 'down' },
      { type: 'click', selector: '#q' },
      { type: 'write', text: 'firecrawl' },
      { type: 'press', key: 'Enter' },
      { type: 'wait', milliseconds: 2000 }
    ],
    formats: ['markdown']
  });

  console.log(doc.markdown);
  ```

  ```bash cURL theme={null}
  curl -X POST https://api.firecrawl.dev/v2/scrape \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer fc-YOUR-API-KEY' \
    -d '{
      "url": "https://example.com",
      "actions": [
        { "type": "wait", "milliseconds": 1000 },
        { "type": "click", "selector": "#accept" },
        { "type": "scroll", "direction": "down" },
        { "type": "click", "selector": "#q" },
        { "type": "write", "text": "firecrawl" },
        { "type": "press", "key": "Enter" },
        { "type": "wait", "milliseconds": 2000 }
      ],
      "formats": ["markdown"]
    }'
  ```
</CodeGroup>

#### Action execution notes

* **Write** requires a preceding `click` to focus the target element.
* **Scroll** accepts an optional `selector` to scroll a specific element instead of the page.
* **Wait** accepts either `milliseconds` (fixed delay) or `selector` (wait until visible).
* Actions run **sequentially**: each step completes before the next begins.
* Actions are **not supported for PDFs**. If the URL resolves to a PDF the request will fail.

#### Advanced action examples

**Taking a screenshot:**

```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/scrape \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer fc-YOUR-API-KEY' \
  -d '{
    "url": "https://example.com",
    "actions": [
      { "type": "click", "selector": "#load-more" },
      { "type": "wait", "milliseconds": 1000 },
      { "type": "screenshot", "fullPage": true, "quality": 80 }
    ]
  }'
```

**Clicking multiple elements:**

```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/scrape \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer fc-YOUR-API-KEY' \
  -d '{
    "url": "https://example.com",
    "actions": [
      { "type": "click", "selector": ".expand-button", "all": true },
      { "type": "wait", "milliseconds": 500 }
    ],
    "formats": ["markdown"]
  }'
```

**Generating a PDF:**

```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/scrape \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer fc-YOUR-API-KEY' \
  -d '{
    "url": "https://example.com",
    "actions": [
      { "type": "pdf", "format": "A4", "landscape": false }
    ]
  }'
```

**Executing JavaScript (e.g. extracting embedded page data):**

```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/scrape \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer fc-YOUR-API-KEY' \
  -d '{
    "url": "https://example.com",
    "actions": [
      { "type": "executeJavascript", "script": "document.querySelector(\"#__NEXT_DATA__\").textContent" }
    ],
    "formats": ["markdown"]
  }'
```

The return value of each `executeJavascript` action is captured in the `actions.javascriptReturns` array of the response.

### Full scrape example

The following request combines multiple scrape options:

```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/scrape \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer fc-YOUR-API-KEY' \
    -d '{
      "url": "https://docs.firecrawl.dev",
      "formats": [
        "markdown",
        "links",
        "html",
        "rawHtml",
        { "type": "screenshot", "fullPage": true, "quality": 80 }
      ],
      "includeTags": ["h1", "p", "a", ".main-content"],
      "excludeTags": ["#ad", "#footer"],
      "onlyMainContent": false,
      "waitFor": 1000,
      "timeout": 15000,
      "parsers": ["pdf"]
    }'
```

This request returns markdown, HTML, raw HTML, links, and a full-page screenshot. It scopes content to `<h1>`, `<p>`, `<a>`, and `.main-content` while excluding `#ad` and `#footer`, waits 1 second before scraping, sets a 15 second timeout, and enables PDF parsing.

See the full [Scrape API reference](https://docs.firecrawl.dev/api-reference/endpoint/scrape) for details.

## JSON extraction via formats

Use the JSON format object in `formats` to extract structured data in one pass:

<CodeGroup>
  ```python Python theme={null}
  from firecrawl import Firecrawl

  firecrawl = Firecrawl(api_key='fc-YOUR-API-KEY')

  doc = firecrawl.scrape('https://firecrawl.dev', {
    'formats': [{
      'type': 'json',
      'prompt': 'Extract the features of the product',
      'schema': {
        'type': 'object',
        'properties': { 'features': { 'type': 'object' } },
        'required': ['features']
      }
    }]
  })

  print(doc.json)
  ```

  ```js Node theme={null}
  import { Firecrawl } from 'firecrawl-js';

  const firecrawl = new Firecrawl({ apiKey: 'fc-YOUR-API-KEY' });

  const doc = await firecrawl.scrape('https://firecrawl.dev', {
    formats: [{
      type: 'json',
      prompt: 'Extract the features of the product',
      schema: {
        type: 'object',
        properties: { features: { type: 'object' } },
        required: ['features']
      }
    }]
  });

  console.log(doc.json);
  ```

  ```bash cURL theme={null}
  curl -X POST https://api.firecrawl.dev/v2/scrape \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer fc-YOUR-API-KEY' \
    -d '{
      "url": "https://firecrawl.dev",
      "formats": [{
        "type": "json",
        "prompt": "Extract the features of the product",
        "schema": {"type": "object", "properties": {"features": {"type": "object"}}, "required": ["features"]}
      }]
    }'
  ```
</CodeGroup>

## Agent endpoint

Use the `/v2/agent` endpoint for autonomous, multi-page data extraction. The agent runs asynchronously: you start a job, then poll for results.

### Agent options

| Parameter               | Type      | Default          | Description                                                                                                                                                               |
| ----------------------- | --------- | ---------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `prompt`                | `string`  | *(required)*     | Natural-language instructions describing what data to extract (max 10,000 characters).                                                                                    |
| `urls`                  | `array`   | —                | URLs to constrain the agent to.                                                                                                                                           |
| `schema`                | `object`  | —                | JSON schema to structure the extracted data.                                                                                                                              |
| `maxCredits`            | `number`  | `2500`           | Maximum credits the agent can spend. The dashboard supports up to 2,500; for higher limits, set this via the API (values above 2,500 are always billed as paid requests). |
| `strictConstrainToURLs` | `boolean` | `false`          | When `true`, the agent only visits the provided URLs.                                                                                                                     |
| `model`                 | `string`  | `"spark-1-mini"` | AI model to use. `"spark-1-mini"` (default, 60% cheaper) or `"spark-1-pro"` (higher accuracy).                                                                            |

<CodeGroup>
  ```python Python theme={null}
  from firecrawl import Firecrawl

  firecrawl = Firecrawl(api_key='fc-YOUR-API-KEY')

  # Start agent job
  started = firecrawl.start_agent(
      prompt="Extract the title and description",
      urls=["https://docs.firecrawl.dev"],
      schema={"type": "object", "properties": {"title": {"type": "string"}, "description": {"type": "string"}}, "required": ["title"]}
  )

  # Poll status
  status = firecrawl.get_agent_status(started["id"])
  print(status.get("status"), status.get("data"))
  ```

  ```js Node theme={null}
  import { Firecrawl } from 'firecrawl';

  const firecrawl = new Firecrawl({ apiKey: 'fc-YOUR-API-KEY' });

  // Start agent job
  const started = await firecrawl.startAgent({
    prompt: 'Extract the title and description',
    urls: ['https://docs.firecrawl.dev'],
    schema: { type: 'object', properties: { title: { type: 'string' }, description: { type: 'string' } }, required: ['title'] }
  });

  // Poll status
  const status = await firecrawl.getAgentStatus(started.id);
  console.log(status.status, status.data);
  ```

  ```bash cURL theme={null}
  curl -X POST https://api.firecrawl.dev/v2/agent \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer fc-YOUR-API-KEY' \
    -d '{
      "prompt": "Extract the title and description",
      "urls": ["https://docs.firecrawl.dev"],
      "schema": {"type": "object", "properties": {"title": {"type": "string"}, "description": {"type": "string"}}, "required": ["title"]}
    }'
  ```
</CodeGroup>

### Check agent status

Poll `GET /v2/agent/{jobId}` to check progress. The response `status` field will be `"processing"`, `"completed"`, or `"failed"`.

```bash cURL theme={null}
curl -X GET https://api.firecrawl.dev/v2/agent/YOUR-JOB-ID \
  -H 'Authorization: Bearer fc-YOUR-API-KEY'
```

The Python and Node SDKs also provide a convenience method (`firecrawl.agent()`) that starts the job and polls automatically until completion.

## Crawling multiple pages

To crawl multiple pages, use the `/v2/crawl` endpoint. The crawl runs asynchronously and returns a job ID. Use the `limit` parameter to control how many pages are crawled. If omitted, the crawl will process up to 10,000 pages.

```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/crawl \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer fc-YOUR-API-KEY' \
    -d '{
      "url": "https://docs.firecrawl.dev",
      "limit": 10
    }'
```

### Response

```json theme={null}
{ "id": "1234-5678-9101" }
```

### Check crawl job

Use the job ID to check the status of a crawl and retrieve its results.

```bash cURL theme={null}
curl -X GET https://api.firecrawl.dev/v2/crawl/1234-5678-9101 \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer fc-YOUR-API-KEY'
```

If the content is larger than 10MB or the crawl job is still running, the response may include a `next` parameter, a URL to the next page of results.

### Crawl prompt and params preview

You can provide a natural-language `prompt` to let Firecrawl derive crawl settings. Preview them first:

```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/crawl/params-preview \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer fc-YOUR-API-KEY' \
  -d '{
    "url": "https://docs.firecrawl.dev",
    "prompt": "Extract docs and blog"
  }'
```

### Crawler options

When using the `/v2/crawl` endpoint, you can customize crawling behavior with the following options.

#### Path filtering

| Parameter        | Type      | Default | Description                                                       |
| ---------------- | --------- | ------- | ----------------------------------------------------------------- |
| `includePaths`   | `array`   | —       | Regex patterns for URLs to include (pathname only by default).    |
| `excludePaths`   | `array`   | —       | Regex patterns for URLs to exclude (pathname only by default).    |
| `regexOnFullURL` | `boolean` | `false` | Match patterns against the full URL instead of just the pathname. |

<Warning>
  The starting URL is also checked against `includePaths`. If it does not match any of the patterns, the crawl may return 0 pages.
</Warning>

#### Crawl scope

| Parameter            | Type         | Default | Description                                                  |
| -------------------- | ------------ | ------- | ------------------------------------------------------------ |
| `maxDiscoveryDepth`  | `integer`    | —       | Max link-depth for discovering new URLs.                     |
| `limit`              | `integer`    | `10000` | Max pages to crawl.                                          |
| `crawlEntireDomain`  | `boolean`    | `false` | Explore siblings and parents to cover the entire domain.     |
| `allowExternalLinks` | `boolean`    | `false` | Follow links to external domains.                            |
| `allowSubdomains`    | `boolean`    | `false` | Follow subdomains of the main domain.                        |
| `delay`              | `number` (s) | —       | Delay between scrapes. Setting this forces concurrency to 1. |

#### Sitemap and deduplication

| Parameter                | Type      | Default     | Description                                                                                             |
| ------------------------ | --------- | ----------- | ------------------------------------------------------------------------------------------------------- |
| `sitemap`                | `string`  | `"include"` | `"include"`: use sitemap + link discovery. `"skip"`: ignore sitemap. `"only"`: crawl only sitemap URLs. |
| `deduplicateSimilarURLs` | `boolean` | `true`      | Normalize URL variants (`www.`, `https`, trailing slashes, `index.html`) as duplicates.                 |
| `ignoreQueryParameters`  | `boolean` | `false`     | Strip query strings before deduplication (e.g. `/page?a=1` and `/page?a=2` become one URL).             |

#### Scrape options for crawl

| Parameter       | Type     | Default                     | Description                                                                  |
| --------------- | -------- | --------------------------- | ---------------------------------------------------------------------------- |
| `scrapeOptions` | `object` | `{ formats: ["markdown"] }` | Per-page scrape config. Accepts all [scrape options](#scrape-options) above. |

### Crawl example

```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/crawl \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer fc-YOUR-API-KEY' \
    -d '{
      "url": "https://docs.firecrawl.dev",
      "includePaths": ["^/blog/.*$", "^/docs/.*$"],
      "excludePaths": ["^/admin/.*$", "^/private/.*$"],
      "maxDiscoveryDepth": 2,
      "limit": 1000
    }'
```

## Mapping website links

The `/v2/map` endpoint identifies URLs related to a given website.

```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/map \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer fc-YOUR-API-KEY' \
    -d '{
      "url": "https://docs.firecrawl.dev"
    }'
```

### Map options

| Parameter           | Type      | Default     | Description                         |
| ------------------- | --------- | ----------- | ----------------------------------- |
| `search`            | `string`  | —           | Filter links by text match.         |
| `limit`             | `integer` | `100`       | Max links to return.                |
| `sitemap`           | `string`  | `"include"` | `"include"`, `"skip"`, or `"only"`. |
| `includeSubdomains` | `boolean` | `true`      | Include subdomains.                 |

Here is the API Reference for it: [Map Endpoint Documentation](https://docs.firecrawl.dev/api-reference/endpoint/map)

## Whitelisting Firecrawl

### Allowing Firecrawl to scrape your website

* **User Agent**: Allow `FirecrawlAgent` in your firewall or security rules.
* **IP addresses**: Firecrawl does not use a fixed set of outbound IPs.

### Allowing your application to call the Firecrawl API

If your firewall blocks outbound requests from your application to external services, you need to whitelist Firecrawl's API server IP address so your application can reach the Firecrawl API (`api.firecrawl.dev`):

* **IP Address**: `35.245.250.27`

Add this IP to your firewall's outbound allowlist so your backend can send scrape, crawl, map, and agent requests to Firecrawl.