Extract
Extract structured data from pages using LLMs
Introducing /extract (Open Beta)
The /extract endpoint simplifies collecting structured data from any number of URLs or entire domains. Provide a list of URLs, optionally with wildcards (e.g., example.com/*), and a prompt or schema describing the information you want. Firecrawl handles the details of crawling, parsing, and collating large or small datasets.
Using /extract
You can extract structured data from one or multiple URLs, including wildcards:
- Single Page
Example: https://firecrawl.dev/some-page
- Multiple Pages / Full Domain
Example: https://firecrawl.dev/*
When you use /*, Firecrawl will automatically crawl and parse all URLs it can discover in that domain, then extract the requested data. This feature is experimental; email help@firecrawl.dev if you have issues.
Example Usage
Key Parameters:
- urls: An array of one or more URLs. Supports wildcards (/*) for broader crawling.
- prompt (Optional unless no schema): A natural language prompt describing the data you want or specifying how you want that data structured.
- schema (Optional unless no prompt): A more rigid structure if you already know the JSON layout.
- enableWebSearch (Optional): When true, extraction can follow links outside the specified domain.
See API Reference for more details.
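As a rough illustration of how the parameters above fit together, the sketch below assembles a request body and enforces the rule that at least one of prompt or schema must be present. The helper name build_extract_request, the product_schema fields, and the choice to express the schema as a JSON Schema object are assumptions made for this example; the function only builds a dictionary and sends nothing to Firecrawl.

```python
import json
from typing import Any, Optional


def build_extract_request(
    urls: list[str],
    prompt: Optional[str] = None,
    schema: Optional[dict[str, Any]] = None,
    enable_web_search: bool = False,
) -> dict[str, Any]:
    """Assemble a request body from the key parameters (hypothetical helper)."""
    if not urls:
        raise ValueError("urls must contain at least one URL")
    # prompt and schema are each optional, but one of them must be given.
    if prompt is None and schema is None:
        raise ValueError("provide a prompt, a schema, or both")

    body: dict[str, Any] = {"urls": list(urls)}
    if prompt is not None:
        body["prompt"] = prompt
    if schema is not None:
        body["schema"] = schema
    if enable_web_search:
        body["enableWebSearch"] = True
    return body


# Hypothetical schema, assumed here to be written as a JSON Schema object.
product_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "string"},
    },
    "required": ["name"],
}

request_body = build_extract_request(
    ["https://firecrawl.dev/*"],
    prompt="Extract the product name and price from each page.",
    schema=product_schema,
)
print(json.dumps(request_body, indent=2))
```

Validating before sending keeps a request that has neither prompt nor schema from ever reaching the API.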
Response (sdks)
Asynchronous Extraction & Status Checking
When you submit an extraction job—either directly via the API or through the SDK’s asynchronous methods—you’ll receive a Job ID. You can use this ID to:
- Check Job Status: Send a request to the /extract/ endpoint to see if the job is still running or has finished.
- Automatically Poll (Default SDK Behavior): If you use the default extract method (Python/Node), the SDK automatically polls this endpoint for you and returns the final results once the job completes.
- Manually Poll (Async SDK Methods): If you use the asynchronous methods—async_extract (Python) or asyncExtract (Node)—the SDK immediately returns a Job ID that you can track. Use get_extract_status (Python) or getExtractStatus (Node) to check the job’s progress on your own schedule.
This endpoint only works for jobs in progress or recently completed (within 24 hours).
Below are code examples for checking an extraction job’s status using Python, Node.js, and cURL:
Possible States
- completed: The extraction finished successfully.
- pending: Firecrawl is still processing your request.
- failed: An error occurred; data was not fully extracted.
- cancelled: The job was cancelled by the user.
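The manual polling flow and the states above can be sketched as a small loop. This is a minimal illustration, not the SDK's implementation: wait_for_extract and fake_fetch_status are hypothetical names, and the response is assumed to be a dictionary with a "status" key. In real use, fetch_status would wrap get_extract_status (Python) or a direct request to the status endpoint.

```python
import time
from typing import Any, Callable

# States from the list above that end the wait.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def wait_for_extract(
    fetch_status: Callable[[str], dict[str, Any]],
    job_id: str,
    interval_seconds: float = 2.0,
    timeout_seconds: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll until the job reaches a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = fetch_status(job_id)
        # Assumption: the status response carries a "status" field.
        if response.get("status") in TERMINAL_STATES:
            return response
        # Any other state (for example "pending") means the job is still running.
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish in time")
        sleep(interval_seconds)


# Stand-in for a real status call: two pending responses, then completed.
_responses = iter(
    [
        {"status": "pending"},
        {"status": "pending"},
        {"status": "completed", "data": {"title": "Example"}},
    ]
)


def fake_fetch_status(job_id: str) -> dict[str, Any]:
    return next(_responses)


result = wait_for_extract(
    fake_fetch_status, "hypothetical-job-id", sleep=lambda _seconds: None
)
print(result["status"])
```

The caller still has to inspect the returned status, since failed and cancelled end the loop just as completed does.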
Pending Example
Completed Example
Extracting without a Schema
If you prefer not to define a strict structure, you can simply provide a prompt. The underlying model will choose a structure for you, which can be useful for more exploratory or flexible requests.
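Because the model picks the structure in a prompt-only request, it can help to inspect the shape of the returned data before relying on specific fields. The sketch below is one possible way to do that; describe_shape is a hypothetical helper, and the sample dictionary is invented for illustration rather than taken from a real response.

```python
from typing import Any


def describe_shape(value: Any, path: str = "") -> list[str]:
    """Return 'path: type' lines for every leaf in a JSON-like value."""
    if isinstance(value, dict):
        lines: list[str] = []
        for key, child in value.items():
            child_path = f"{path}.{key}" if path else key
            lines.extend(describe_shape(child, child_path))
        return lines
    if isinstance(value, list):
        if not value:
            return [f"{path}[]: empty"]
        # Only the first element is inspected; lists are assumed homogeneous.
        return describe_shape(value[0], f"{path}[]")
    return [f"{path}: {type(value).__name__}"]


# Hypothetical result of a prompt-only extraction.
sample = {
    "company": "Example Inc.",
    "products": [{"name": "Widget", "price": 9.99}],
    "open_source": True,
}

for line in describe_shape(sample):
    print(line)
```

If the shape changes between runs in ways that matter, supplying a schema is the more predictable option.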
Improving Results with Web Search
Setting enableWebSearch = true in your request will expand the crawl beyond the provided URL set. This can capture supporting or related information from linked pages.
Here’s an example that extracts information about dash cams, enriching the results with data from related pages:
Example Response with Web Search
The response includes additional context gathered from related pages, providing more comprehensive and accurate information.
Known Limitations (Beta)
- Large-Scale Site Coverage: Full coverage of massive sites (e.g., “all products on Amazon”) in a single request is not yet supported.
- Complex Logical Queries: Requests like “find every post from 2025” may not reliably return all expected data. More advanced query capabilities are in progress.
- Occasional Inconsistencies: Results might differ across runs, particularly for very large or dynamic sites. Usually it captures core details, but some variation is possible.
- Beta State: Since /extract is still in Beta, features and performance will continue to evolve. We welcome bug reports and feedback to help us improve.
Billing and Usage Tracking
You can check the pricing for /extract on the Extract landing page's pricing page and monitor usage via the Extract page on the dashboard.
Have feedback or need help? Email help@firecrawl.dev.