Scrape
Turn any URL into clean data
Firecrawl converts web pages into markdown, ideal for LLM applications.
- Manages complexities: proxies, caching, rate limits, JS-blocked content
- Handles dynamic content: JS-rendered sites, PDFs, images
- Outputs clean markdown, structured data, screenshots, or HTML.
For details, see the Scrape Endpoint API Reference.
Scraping a URL with Firecrawl
/scrape endpoint
Used to scrape a URL and get its content.
Installation
Usage
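A minimal sketch using the Python SDK (firecrawl-py); the API key and URL are placeholders, and exact keyword arguments can vary between SDK versions:

```python
# pip install firecrawl-py
from firecrawl import FirecrawlApp

# Placeholder API key; replace with your own
app = FirecrawlApp(api_key="fc-YOUR_API_KEY")

# Scrape a single URL and request markdown and HTML output
result = app.scrape_url(
    "https://firecrawl.dev",
    params={"formats": ["markdown", "html"]},
)

print(result["markdown"])
```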
For more details about the parameters, refer to the API Reference.
Response
SDKs will return the data object directly. cURL will return the full payload, along the lines of the sketch below.
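An illustrative sketch of the response shape; values are placeholders, not actual output:

```json
{
  "success": true,
  "data": {
    "markdown": "...",
    "html": "...",
    "metadata": {
      "title": "...",
      "sourceURL": "https://firecrawl.dev",
      "statusCode": 200
    }
  }
}
```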
Extract structured data
/scrape (with extract) endpoint
Used to extract structured data from scraped pages.
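A sketch of schema-based extraction with the Python SDK; the schema fields (company_mission, supports_sso, is_open_source) are invented for illustration, and exact parameter names can vary between SDK versions:

```python
from pydantic import BaseModel
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="fc-YOUR_API_KEY")

# Illustrative schema; define whatever fields you want extracted
class ExtractSchema(BaseModel):
    company_mission: str
    supports_sso: bool
    is_open_source: bool

data = app.scrape_url(
    "https://docs.firecrawl.dev/",
    params={
        "formats": ["extract"],
        "extract": {"schema": ExtractSchema.model_json_schema()},
    },
)

print(data["extract"])
```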
Output:
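With the illustrative schema above, the printed extract object would look something like (placeholder values):

```json
{
  "company_mission": "...",
  "supports_sso": true,
  "is_open_source": true
}
```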
Extracting without schema (New)
You can now extract without a schema by just passing a prompt to the endpoint. The LLM chooses the structure of the data.
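A sketch of the same call with only a prompt; the prompt text is an example:

```python
data = app.scrape_url(
    "https://docs.firecrawl.dev/",
    params={
        "formats": ["extract"],
        "extract": {"prompt": "Extract the company mission from the page."},
    },
)

print(data["extract"])
```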
Output:
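Since the LLM chooses the keys, the shape is not fixed; for the prompt above it might return something like (hypothetical):

```json
{
  "company_mission": "..."
}
```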
Extract object
The extract object accepts the following parameters:
- schema: The schema to use for the extraction.
- systemPrompt: The system prompt to use for the extraction.
- prompt: The prompt to use for the extraction without a schema.
Interacting with the page with Actions
Firecrawl allows you to perform various actions on a web page before scraping its content. This is particularly useful for interacting with dynamic content, navigating through pages, or accessing content that requires user interaction.
Here is an example of how to use actions to navigate to google.com, search for Firecrawl, click on the first result, and take a screenshot.
It is important to almost always use the wait action before/after executing other actions to give enough time for the page to load.
Example
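A sketch of the Google flow described above, reusing the Python client from the first snippet; the selectors and wait times are illustrative and may need adjusting:

```python
result = app.scrape_url(
    "https://google.com",
    params={
        "formats": ["markdown"],
        "actions": [
            {"type": "wait", "milliseconds": 2000},
            {"type": "click", "selector": "textarea[title=Search]"},  # illustrative selector
            {"type": "wait", "milliseconds": 2000},
            {"type": "write", "text": "firecrawl"},
            {"type": "wait", "milliseconds": 2000},
            {"type": "press", "key": "ENTER"},
            {"type": "wait", "milliseconds": 3000},
            {"type": "click", "selector": "h3"},  # first result heading (illustrative selector)
            {"type": "wait", "milliseconds": 3000},
            {"type": "screenshot"},
        ],
    },
)

print(result)
```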
Output
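An illustrative sketch of the response shape when actions are used; field values are placeholders:

```json
{
  "success": true,
  "data": {
    "markdown": "...",
    "actions": {
      "screenshots": ["https://..."]
    },
    "metadata": {
      "sourceURL": "https://google.com",
      "statusCode": 200
    }
  }
}
```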
For more details about the actions parameters, refer to the API Reference.
Batch scraping multiple URLs
You can now batch scrape multiple URLs at the same time. It takes the starting URLs and optional parameters as arguments. The params argument allows you to specify additional options for the batch scrape job, such as the output formats.
How it works
It is very similar to how the /crawl endpoint works. It submits a batch scrape job and returns a job ID to check the status of the batch scrape.
The SDK provides two methods, synchronous and asynchronous. The synchronous method returns the results of the batch scrape job, while the asynchronous method returns a job ID that you can use to check the status of the batch scrape.
Usage
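A sketch of both approaches, assuming the batch_scrape_urls and async_batch_scrape_urls methods from firecrawl-py and reusing the client from the first snippet; the URLs are placeholders:

```python
urls = ["https://firecrawl.dev", "https://docs.firecrawl.dev"]

# Synchronous: waits for the job to finish and returns the scraped documents
batch_result = app.batch_scrape_urls(urls, params={"formats": ["markdown", "html"]})

# Asynchronous: submits the job and returns immediately with a job ID
batch_job = app.async_batch_scrape_urls(urls, params={"formats": ["markdown", "html"]})
print(batch_job["id"])
```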
Response
If you’re using the sync methods from the SDKs, they will return the results of the batch scrape job. Otherwise, they will return a job ID that you can use to check the status of the batch scrape.
Synchronous
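An illustrative sketch of the completed-job payload; counts and values are placeholders:

```json
{
  "status": "completed",
  "total": 2,
  "completed": 2,
  "creditsUsed": 2,
  "expiresAt": "2024-01-01T00:00:00.000Z",
  "data": [
    {
      "markdown": "...",
      "metadata": {
        "sourceURL": "https://firecrawl.dev",
        "statusCode": 200
      }
    }
  ]
}
```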
Asynchronous
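An illustrative sketch of the job-submission payload; the ID is a placeholder:

```json
{
  "success": true,
  "id": "123-456-789",
  "url": "https://api.firecrawl.dev/v1/batch/scrape/123-456-789"
}
```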
You can then use the job ID to check the status of the batch scrape by calling the /batch/scrape/{id} endpoint. This endpoint is meant to be used while the job is still running or right after it has completed, as batch scrape jobs expire after 24 hours.
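A sketch of checking the job, assuming the check_batch_scrape_status helper from firecrawl-py and the batch_job returned by the async call above:

```python
status = app.check_batch_scrape_status(batch_job["id"])

print(status["status"])  # e.g. "scraping" or "completed"
print(status["completed"], "/", status["total"])
```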
Location and Language
Specify country and preferred languages to get relevant content based on your target location and language preferences.
How it works
When you specify the location settings, Firecrawl will use an appropriate proxy if available and emulate the corresponding language and timezone settings. By default, the location is set to ‘US’ if not specified.
Usage
To use the location and language settings, include the location object in your request body with the following properties (a request sketch follows the list):
- country: ISO 3166-1 alpha-2 country code (e.g., ‘US’, ‘AU’, ‘DE’, ‘JP’). Defaults to ‘US’.
- languages: An array of preferred languages and locales for the request, in order of priority. Defaults to the language of the specified location.
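A sketch of a scrape targeting Germany with German as the preferred language, reusing the client from the first snippet; the URL and values are examples:

```python
result = app.scrape_url(
    "https://example.com",
    params={
        "formats": ["markdown"],
        "location": {
            "country": "DE",
            "languages": ["de"],
        },
    },
)
```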