Welcome to V1

Firecrawl V1 is here! With that we introduce a more reliable and developer friendly API. Here is what’s new:

Output Formats for /scrape. Choose what formats you want your output in.
New /map endpoint for getting most of the URLs of a webpage.
Developer friendly API for /crawl/{id} status.
2x Rate Limits for all plans.
Go SDK and Rust SDK
Teams support
API Key Management in the dashboard.
onlyMainContent is now default to true.
/crawl webhooks and websocket support.

Scrape Formats

You can now choose what formats you want your output in. You can specify multiple output formats. Supported formats are:

Markdown (markdown)
HTML (html)
Raw HTML (rawHtml) (with no modifications)
Screenshot (screenshot or screenshot@fullPage)
Links (links)
JSON (json) - structured output

Output keys will match the format you choose.

from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="fc-YOUR_API_KEY")

# Scrape a website:
scrape_result = app.scrape_url('firecrawl.dev', formats=['markdown', 'html'])
print(scrape_result)

Response

SDKs will return the data object directly. cURL will return the payload exactly as shown below.

{
  "success": true,
  "data" : {
    "markdown": "Launch Week I is here! [See our Day 2 Release 🚀](https://www.firecrawl.dev/blog/launch-week-i-day-2-doubled-rate-limits)[💥 Get 2 months free...",
    "html": "<!DOCTYPE html><html lang=\"en\" class=\"light\" style=\"color-scheme: light;\"><body class=\"__variable_36bd41 __variable_d7dc5d font-inter ...",
    "metadata": {
      "title": "Home - Firecrawl",
      "description": "Firecrawl crawls and converts any website into clean markdown.",
      "language": "en",
      "keywords": "Firecrawl,Markdown,Data,Mendable,Langchain",
      "robots": "follow, index",
      "ogTitle": "Firecrawl",
      "ogDescription": "Turn any website into LLM-ready data.",
      "ogUrl": "https://www.firecrawl.dev/",
      "ogImage": "https://www.firecrawl.dev/og.png?123",
      "ogLocaleAlternate": [],
      "ogSiteName": "Firecrawl",
      "sourceURL": "https://firecrawl.dev",
      "statusCode": 200
    }
  }
}

Introducing /map (Alpha)

The easiest way to go from a single url to a map of the entire website.

Usage

from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="fc-YOUR_API_KEY")

# Map a website:
map_result = app.map_url('https://firecrawl.dev')
print(map_result)

Response

SDKs will return the data object directly. cURL will return the payload exactly as shown below.

{
  "status": "success",
  "links": [
    "https://firecrawl.dev",
    "https://www.firecrawl.dev/pricing",
    "https://www.firecrawl.dev/blog",
    "https://www.firecrawl.dev/playground",
    "https://www.firecrawl.dev/smart-crawl",
    ...
  ]
}

WebSockets

To crawl a website with WebSockets, use the Crawl URL and Watch method.

# inside an async function...
nest_asyncio.apply()

# Define event handlers
def on_document(detail):
    print("DOC", detail)

def on_error(detail):
    print("ERR", detail['error'])

def on_done(detail):
    print("DONE", detail['status'])

    # Function to start the crawl and watch process
async def start_crawl_and_watch():
    # Initiate the crawl job and get the watcher
    watcher = app.crawl_url_and_watch('firecrawl.dev', limit=5)

    # Add event listeners
    watcher.add_event_listener("document", on_document)
    watcher.add_event_listener("error", on_error)
    watcher.add_event_listener("done", on_done)

    # Start the watcher
    await watcher.connect()

# Run the event loop
await start_crawl_and_watch()

JSON format

LLM extraction is now available in v1 under the json format. To extract structured from a page, you can pass a schema to the endpoint or just provide a prompt.

from firecrawl import JsonConfig, FirecrawlApp
from pydantic import BaseModel
app = FirecrawlApp(api_key="<YOUR_API_KEY>")

class ExtractSchema(BaseModel):
    company_mission: str
    supports_sso: bool
    is_open_source: bool
    is_in_yc: bool

json_config = JsonConfig(
    schema=ExtractSchema
)

llm_extraction_result = app.scrape_url(
    'https://firecrawl.dev',
    formats=["json"],
    json_options=json_config,
    only_main_content=False,
    timeout=120000
)

print(llm_extraction_result.json)

Output:

JSON

{
    "success": true,
    "data": {
      "json": {
        "company_mission": "AI-powered web scraping and data extraction",
        "supports_sso": true,
        "is_open_source": true,
        "is_in_yc": true
      },
      "metadata": {
        "title": "Firecrawl",
        "description": "AI-powered web scraping and data extraction",
        "robots": "follow, index",
        "ogTitle": "Firecrawl",
        "ogDescription": "AI-powered web scraping and data extraction",
        "ogUrl": "https://firecrawl.dev/",
        "ogImage": "https://firecrawl.dev/og.png",
        "ogLocaleAlternate": [],
        "ogSiteName": "Firecrawl",
        "sourceURL": "https://firecrawl.dev/"
      },
    }
}

Extracting without schema (New)

You can now extract without a schema by just passing a prompt to the endpoint. The llm chooses the structure of the data.

curl -X POST https://api.firecrawl.dev/v1/scrape \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer YOUR_API_KEY' \
    -d '{
      "url": "https://docs.firecrawl.dev/",
      "formats": ["json"],
      "jsonOptions": {
        "prompt": "Extract the company mission from the page."
      }
    }'

Output:

JSON

{
    "success": true,
    "data": {
      "json": {
        "company_mission": "AI-powered web scraping and data extraction",
      },
      "metadata": {
        "title": "Firecrawl",
        "description": "AI-powered web scraping and data extraction",
        "robots": "follow, index",
        "ogTitle": "Firecrawl",
        "ogDescription": "AI-powered web scraping and data extraction",
        "ogUrl": "https://firecrawl.dev/",
        "ogImage": "https://firecrawl.dev/og.png",
        "ogLocaleAlternate": [],
        "ogSiteName": "Firecrawl",
        "sourceURL": "https://firecrawl.dev/"
      },
    }
}

New Crawl Webhook

You can now pass a webhook parameter to the /crawl endpoint. This will send a POST request to the URL you specify when the crawl is started, updated and completed. The webhook will now trigger for every page crawled and not just the whole result at the end.

cURL

curl -X POST https://api.firecrawl.dev/v2/crawl \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer YOUR_API_KEY' \
    -d '{
      "url": "https://docs.firecrawl.dev",
      "limit": 100,
      "webhook": {
        "url": "https://your-domain.com/webhook",
        "metadata": {
          "any_key": "any_value"
        },
        "events": ["started", "page", "completed"]
      }
    }'

Webhook Events

There are now 4 types of events:

crawl.started - Triggered when the crawl is started.
crawl.page - Triggered for every page crawled.
crawl.completed - Triggered when the crawl is completed to let you know it’s done.
crawl.failed - Triggered when the crawl fails.

Webhook Response

success - If the webhook was successful in crawling the page correctly.
type - The type of event that occurred.
id - The ID of the crawl.
data - The data that was scraped (Array). This will only be non empty on crawl.page and will contain 1 item if the page was scraped successfully. The response is the same as the /scrape endpoint.
error - If the webhook failed, this will contain the error message.

Migrating from V0

⚠️ Deprecation Notice: V0 endpoints will be deprecated on April 1st, 2025. Please migrate to V1 endpoints before then to ensure uninterrupted service.

/scrape endpoint

The updated /scrape endpoint has been redesigned for enhanced reliability and ease of use. The structure of the new /scrape request body is as follows:

{
  "url": "<string>",
  "formats": ["markdown", "html", "rawHtml", "links", "screenshot", "json"],
  "includeTags": ["<string>"],
  "excludeTags": ["<string>"],
  "headers": { "<key>": "<value>" },
  "waitFor": 123,
  "timeout": 123
}

Formats

You can now choose what formats you want your output in. You can specify multiple output formats. Supported formats are:

Markdown (markdown)
HTML (html)
Raw HTML (rawHtml) (with no modifications)
Screenshot (screenshot or screenshot@fullPage)
Links (links)
JSON (json)

By default, the output will be include only the markdown format.

Details on the new request body

The table below outlines the changes to the request body parameters for the /scrape endpoint in V1.

Parameter	Change	Description
`onlyIncludeTags`	Moved and Renamed	Moved to root level. And renamed to `includeTags`.
`removeTags`	Moved and Renamed	Moved to root level. And renamed to `excludeTags`.
`onlyMainContent`	Moved	Moved to root level. `true` by default.
`waitFor`	Moved	Moved to root level.
`headers`	Moved	Moved to root level.
`parsePDF`	Moved	Moved to root level.
`extractorOptions`	No Change
`timeout`	No Change
`pageOptions`	Removed	No need for `pageOptions` parameter. The scrape options were moved to root level.
`replaceAllPathsWithAbsolutePaths`	Removed	`replaceAllPathsWithAbsolutePaths` is not needed anymore. Every path is now default to absolute path.
`includeHtml`	Removed	add `"html"` to `formats` instead.
`includeRawHtml`	Removed	add `"rawHtml"` to `formats` instead.
`screenshot`	Removed	add `"screenshot"` to `formats` instead.
`fullPageScreenshot`	Removed	add `"screenshot@fullPage"` to `formats` instead.
`extractorOptions`	Removed	Use `"json"` format instead with `jsonOptions` object.

The new json format is described in the llm-extract section.

/crawl endpoint

We’ve also updated the /crawl endpoint on v1. Check out the improved body request below:

{
  "url": "<string>",
  "excludePaths": ["<string>"],
  "includePaths": ["<string>"],
  "maxDepth": 2,
  "ignoreSitemap": true,
  "limit": 10,
  "allowBackwardLinks": true,
  "allowExternalLinks": true,
  "scrapeOptions": {
    // same options as in /scrape
    "formats": ["markdown", "html", "rawHtml", "screenshot", "links"],
    "headers": { "<key>": "<value>" },
    "includeTags": ["<string>"],
    "excludeTags": ["<string>"],
    "onlyMainContent": true,
    "waitFor": 123
  }
}

Details on the new request body

The table below outlines the changes to the request body parameters for the /crawl endpoint in V1.

Parameter	Change	Description
`pageOptions`	Renamed	Renamed to `scrapeOptions`.
`includes`	Moved and Renamed	Moved to root level. Renamed to `includePaths`.
`excludes`	Moved and Renamed	Moved to root level. Renamed to `excludePaths`.
`allowBackwardCrawling`	Moved and Renamed	Moved to root level. Renamed to `allowBackwardLinks`.
`allowExternalLinks`	Moved	Moved to root level.
`maxDepth`	Moved	Moved to root level.
`ignoreSitemap`	Moved	Moved to root level.
`limit`	Moved	Moved to root level.
`crawlerOptions`	Removed	No need for `crawlerOptions` parameter. The crawl options were moved to root level.
`timeout`	Removed	Use `timeout` in `scrapeOptions` instead.

Get Started

Standard Features

Agentic Features

Webhooks

Developer Guides

Use Cases

Contributing

Scrape Formats

Response

Introducing /map (Alpha)

Usage

Response

WebSockets

JSON format

Extracting without schema (New)

New Crawl Webhook

Webhook Events

Webhook Response

Migrating from V0

/scrape endpoint

Formats

Details on the new request body

/crawl endpoint

Details on the new request body

Get Started

Standard Features

Agentic Features

Webhooks

Developer Guides

Use Cases

Contributing

​Scrape Formats

​Response

​Introducing /map (Alpha)

​Usage

​Response

​WebSockets

​JSON format

​Extracting without schema (New)

​New Crawl Webhook

​Webhook Events

​Webhook Response

​Migrating from V0

​/scrape endpoint

​Formats

​Details on the new request body

​/crawl endpoint

​Details on the new request body

Scrape Formats

Response

Introducing /map (Alpha)

Usage

Response

WebSockets

JSON format

Extracting without schema (New)

New Crawl Webhook

Webhook Events

Webhook Response

Migrating from V0

/scrape endpoint

Formats

Details on the new request body

/crawl endpoint

Details on the new request body