Introducing /extract (Beta)

Extract structured data from a single URL, multiple URLs, or entire websites using Large Language Models (LLMs). Our new /extract endpoint allows you to:

  • Extract structured data from full websites at once
  • Connect or build data enrichment applications that need structured data from websites
  • Develop AI applications that need clean data from multiple websites

Considerations

The /extract endpoint provides flexible data extraction with customizable schemas. Results can be improved through prompt tuning. It is currently in beta and we welcome your feedback.

Extracting Data

/extract endpoint

Used to extract structured data from entire websites.

When specifying URLs, you can append /* to the URL to extract information from the entire website path rather than just a single page.

For example, https://firecrawl.dev/* will attempt to extract data from all pages on the firecrawl.dev domain. The /* wildcard is still under testing, so please let us know about any issues by emailing help@firecrawl.dev.

Usage

For more details about the parameters, refer to the API Reference.
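
Below is a minimal request sketch, assuming the v1 REST endpoint at https://api.firecrawl.dev/v1/extract and the urls, prompt, and schema body fields; check the API Reference for the authoritative parameter names.

Python
import requests

# Assumed v1 REST endpoint; see the API Reference for the authoritative URL and fields.
API_URL = "https://api.firecrawl.dev/v1/extract"
API_KEY = "fc-YOUR_API_KEY"  # placeholder

payload = {
    # The /* wildcard asks /extract to consider pages across the whole domain.
    "urls": ["https://firecrawl.dev/*"],
    "prompt": "Extract the company mission and whether it supports SSO, is open source, and is in YC.",
    # JSON Schema describing the structure you want back.
    "schema": {
        "type": "object",
        "properties": {
            "company_mission": {"type": "string"},
            "supports_sso": {"type": "boolean"},
            "is_open_source": {"type": "boolean"},
            "is_in_yc": {"type": "boolean"},
        },
        "required": ["company_mission"],
    },
}

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    json=payload,
    timeout=120,
)
print(response.json())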

Response (SDKs)

JSON
{
  "success": true,
  "data": {
    "company_mission": "Firecrawl is the easiest way to extract data from the web. Developers use us to reliably convert URLs into LLM-ready markdown or structured data with a single API call.",
    "supports_sso": false,
    "is_open_source": true,
    "is_in_yc": true
  }
}

Response (async or not using SDKs)

JSON
{
  "success": true,
  "id": "850eb555-db9c-42b9-9d96-bac1fca8bb23",
  "urlTrace": []
}

Checking extract status

You can use the /extract/ID endpoint, with the job id returned in the async response, to check the status of an extract job.

This endpoint only works for extract jobs that are in progress or extract jobs that have completed recently (within the last 24 hours).
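
A minimal polling sketch is shown below, assuming a GET on https://api.firecrawl.dev/v1/extract/{id} with the job id returned by the async response (the path and field names are assumptions; see the API Reference).

Python
import time
import requests

API_KEY = "fc-YOUR_API_KEY"  # placeholder
job_id = "850eb555-db9c-42b9-9d96-bac1fca8bb23"  # id returned by the async response

# Poll until the job leaves the in-progress state.
while True:
    status = requests.get(
        f"https://api.firecrawl.dev/v1/extract/{job_id}",
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=30,
    ).json()

    if status.get("status") == "completed":
        print(status["data"])
        break
    if status.get("status") in ("failed", "cancelled"):
        raise RuntimeError(f"Extract job ended with status: {status.get('status')}")

    time.sleep(2)  # still processing; wait before polling again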

Pending Response

Extract jobs can have one of the following states:

  • completed: The extract job has finished successfully.
  • processing: The extract job is still in progress.
  • failed: The extract job encountered an error and did not complete.
  • cancelled: The extract job was cancelled by the user.

JSON
{
  "success": true,
  "data": [],
  "status": "processing",
  "expiresAt": "2025-01-08T20:58:12.000Z"
}

Completed Response

JSON
{
  "success": true,
  "data": {
    "company_mission": "Firecrawl is the easiest way to extract data from the web. Developers use us to reliably convert URLs into LLM-ready markdown or structured data with a single API call.",
    "supports_sso": false,
    "is_open_source": true,
    "is_in_yc": true
  },
  "status": "completed",
  "expiresAt": "2025-01-08T20:58:12.000Z"
}

Extracting without schema

You can now extract without a schema by just passing a prompt to the endpoint. The LLM chooses the structure of the data.

If you want to improve the extraction results, you can set the enableWebSearch parameter to true. This allows the extraction to follow links outside the provided URLs in an attempt to find the requested data.
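
A sketch of a prompt-only request with web search enabled, using the same assumed endpoint and field names as the Usage example above:

Python
import requests

payload = {
    "urls": ["https://firecrawl.dev/*"],
    # No schema: the LLM decides the shape of the returned data.
    "prompt": "Summarize what Firecrawl offers and list its main products.",
    # Allow the extraction to look beyond the provided URLs for supporting data.
    "enableWebSearch": True,
}

response = requests.post(
    "https://api.firecrawl.dev/v1/extract",
    headers={"Authorization": "Bearer fc-YOUR_API_KEY", "Content-Type": "application/json"},
    json=payload,
    timeout=120,
)
print(response.json())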

Billing

While /extract is in beta, we charge 5 credits for each URL scraped to form the final response. This is to prevent abuse and will change in the future.