LLM Extract
Extract structured data from pages via LLMs
Scrape and extract structured data with Firecrawl
Firecrawl leverages Large Language Models (LLMs) to efficiently extract structured data from web pages. Here’s how:
- Schema Definition: Define the URL to scrape and the desired data schema using JSON Schema (following OpenAI tool schema). This schema specifies the data structure you expect to extract from the page.
- Scrape Endpoint: Pass the URL and the schema to the scrape endpoint. Documentation for this endpoint can be found here: Scrape Endpoint Documentation
- Structured Data Retrieval: Receive the scraped data in the structured format defined by your schema. You can then use this data as needed in your application or for further processing.
This method streamlines data extraction, reducing manual handling and enhancing efficiency.
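To make the first and third steps concrete, here is a minimal sketch of a JSON Schema written as a Python dictionary, plus a shallow check that extracted data matches it. The field names (company_mission, supports_sso, is_open_source) and the helper missing_or_mistyped are hypothetical and chosen only for illustration; the check is not a full JSON Schema validator.

```python
# Illustrative JSON Schema for the data we want back from a page.
# The field names are made up for this example.
COMPANY_SCHEMA = {
    "type": "object",
    "properties": {
        "company_mission": {"type": "string"},
        "supports_sso": {"type": "boolean"},
        "is_open_source": {"type": "boolean"},
    },
    "required": ["company_mission", "supports_sso", "is_open_source"],
}

# Only these two primitive types are checked; other types are skipped.
_PY_TYPES = {"string": str, "boolean": bool}


def missing_or_mistyped(data: dict, schema: dict) -> list[str]:
    """Return required field names that are absent or have the wrong type.

    This is a shallow, hand-rolled check for illustration only.
    """
    problems = []
    for name in schema.get("required", []):
        expected = schema.get("properties", {}).get(name, {}).get("type")
        py_type = _PY_TYPES.get(expected)
        if name not in data:
            problems.append(name)
        elif py_type is not None and not isinstance(data[name], py_type):
            problems.append(name)
    return problems
```

Running a check like this on the extracted data before using it is one way to catch fields the model omitted or returned in an unexpected type.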
Extract structured data
/scrape (with extract) endpoint
Used to extract structured data from scraped pages.
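A request to this endpoint might be assembled as in the sketch below, which uses only the Python standard library. The endpoint URL, the Bearer authorization header, and the top-level url and formats fields are assumptions not stated on this page, so check them against the Scrape Endpoint Documentation. The function build_scrape_request is a hypothetical helper, not part of any SDK.

```python
import json
import urllib.request

# Assumed endpoint URL; confirm it in the Scrape Endpoint Documentation.
SCRAPE_URL = "https://api.firecrawl.dev/v1/scrape"


def build_scrape_request(api_key: str, page_url: str, schema: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request asking for schema-based extraction."""
    body = {
        "url": page_url,
        "formats": ["extract"],         # assumed way to request extraction
        "extract": {"schema": schema},  # the extract object with its schema
    }
    return urllib.request.Request(
        SCRAPE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The request can then be sent with urllib.request.urlopen(request) and the response body parsed with json.load. Building the request separately from sending it keeps the payload easy to inspect and test.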
Extracting without schema (New)
You can now extract without a schema by passing just a prompt to the endpoint. The LLM chooses the structure of the data.
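A prompt-only request body could look like the following sketch. As before, the top-level url and formats fields are assumptions; only the prompt parameter of the extract object comes from this page. The helper name build_prompt_only_body is hypothetical.

```python
def build_prompt_only_body(page_url: str, prompt: str) -> dict:
    """Build a request body that relies on a prompt instead of a schema."""
    if not prompt.strip():
        raise ValueError("prompt must be non-empty when no schema is given")
    return {
        "url": page_url,
        "formats": ["extract"],        # assumed way to request extraction
        "extract": {"prompt": prompt}, # no schema: the LLM picks the structure
    }
```

Because the model decides the shape of the result, code that consumes it should not assume particular keys are present.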
Extract object
The extract object accepts the following parameters:
- schema: The schema to use for the extraction.
- systemPrompt: The system prompt to use for the extraction.
- prompt: The prompt to use for the extraction without a schema.
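The three parameters can be gathered with a small helper such as the sketch below. The helper make_extract_options is hypothetical, and the rule that at least one of schema or prompt must be supplied is an assumption made here for safety, not a documented API constraint.

```python
from typing import Optional


def make_extract_options(
    schema: Optional[dict] = None,
    system_prompt: Optional[str] = None,
    prompt: Optional[str] = None,
) -> dict:
    """Assemble the extract object, omitting parameters that were not set."""
    # Assumption: an extraction needs a schema, a prompt, or both.
    if schema is None and prompt is None:
        raise ValueError("provide a schema, a prompt, or both")
    options = {
        "schema": schema,
        "systemPrompt": system_prompt,  # parameter name as listed above
        "prompt": prompt,
    }
    # Drop unset parameters so the request carries only what was chosen.
    return {key: value for key, value in options.items() if value is not None}
```

For example, make_extract_options(prompt="List the pricing tiers") yields an object containing only the prompt key.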