v2 API Change: JSON schema extraction is fully supported in v2, but the API format has changed. In v2, the schema is embedded directly inside the format object as formats: [{type: "json", schema: {...}}]. The v1 jsonOptions parameter no longer exists in v2.
Scrape and extract structured data with Firecrawl
Firecrawl uses AI to get structured data from web pages in 3 steps:
- Set the Schema (optional): Define a JSON schema (using OpenAI’s format) to specify the data you want, or just provide a prompt if you don’t need a strict schema, along with the webpage URL.
- Make the Request: Send your URL and schema to our scrape endpoint using JSON mode. See how here: Scrape Endpoint Documentation
- Get Your Data: Get back clean, structured data matching your schema that you can use right away.
Extract structured data
JSON mode via /scrape
Use JSON mode on the /scrape endpoint to extract structured data from scraped pages.
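As a minimal sketch of a schema-based JSON mode request, the Python below builds the request body with the schema embedded in the format object. The endpoint URL, the Bearer-token header, and the example schema fields are assumptions made for illustration; confirm them against the Scrape Endpoint Documentation. The `scrape` helper is hypothetical and is defined but not called, so the snippet runs without network access.

```python
import json
import urllib.request

# Assumed endpoint; verify against the Scrape Endpoint Documentation.
SCRAPE_URL = "https://api.firecrawl.dev/v2/scrape"


def build_scrape_body(url: str, schema: dict) -> dict:
    """Return a /scrape request body that asks for JSON mode extraction."""
    return {
        "url": url,
        # v2: the schema sits inside the format object itself.
        "formats": [{"type": "json", "schema": schema}],
    }


def scrape(body: dict, api_key: str) -> dict:
    """POST the body and return the parsed response (makes a network call).

    The Authorization header format is an assumption.
    """
    request = urllib.request.Request(
        SCRAPE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Hypothetical schema, for illustration only.
page_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "summary": {"type": "string"},
    },
    "required": ["title"],
}

body = build_scrape_body("https://example.com", page_schema)
print(json.dumps(body, indent=2))
```

Keeping the body builder separate from the network call makes it easy to inspect or log the exact payload before sending it.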
Structured data without schema
You can also extract without a schema by passing just a prompt to the endpoint. The LLM then chooses the structure of the data.
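A prompt-only request might look like the sketch below. It builds the body without sending it; the helper name `build_prompt_only_body` and the example prompt are hypothetical, and the overall body shape follows the v2 format object described in this document.

```python
import json


def build_prompt_only_body(url: str, prompt: str) -> dict:
    """Return a /scrape request body for JSON mode without a schema."""
    if not prompt.strip():
        raise ValueError("A non-empty prompt is needed when no schema is given.")
    return {
        "url": url,
        # No "schema" key: the LLM decides the structure of the output.
        "formats": [{"type": "json", "prompt": prompt}],
    }


body = build_prompt_only_body(
    "https://example.com",
    "Extract the main topic of the page and any listed contact details.",
)
print(json.dumps(body, indent=2))
```

Because the structure is chosen by the model, the keys in the result can differ between runs; a schema is the better choice when downstream code expects fixed field names.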
Real-world example: Extracting company information
A typical real-world use is extracting structured company information from a website.
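As a sketch of the company-information use case, the snippet below defines a schema, embeds it in a request body, and checks extracted data for missing required fields. All field names, the prompt text, and the `missing_required` helper are hypothetical choices for illustration. The `placeholder` dictionary is hand-written to exercise the helper and is not real API output.

```python
import json

# Hypothetical field names; adapt them to the site you are scraping.
company_schema = {
    "type": "object",
    "properties": {
        "company_name": {
            "type": "string",
            "description": "Official company name.",
        },
        "description": {
            "type": "string",
            "description": "One-sentence summary of what the company does.",
        },
        "industry": {"type": "string"},
        "contact_email": {
            "type": ["string", "null"],
            "description": "Public contact email. Return null if not found on the page.",
        },
    },
    "required": ["company_name", "description"],
}

request_body = {
    "url": "https://example.com",
    "formats": [
        {
            "type": "json",
            "schema": company_schema,
            "prompt": "Extract the company information from this page.",
        }
    ],
}


def missing_required(data: dict, schema: dict) -> list:
    """List required schema fields that are absent from the extracted data."""
    return [name for name in schema.get("required", []) if name not in data]


# Hand-written placeholder standing in for extracted data, not real API output.
placeholder = {"company_name": "Example Corp"}

print(json.dumps(request_body, indent=2))
print(missing_required(placeholder, company_schema))  # prints ['description']
```

A check like `missing_required` is a cheap guard to run on every response before passing the data to other systems.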
JSON format options
When using JSON mode in v2, include an object in formats with the schema embedded directly:
formats: [{ type: 'json', schema: { ... }, prompt: '...' }]
Parameters:
- schema: JSON Schema describing the structured output you want (required for schema-based extraction).
- prompt: Optional prompt to guide extraction (also used for no-schema extraction).
Unlike v1, there is no separate jsonOptions parameter in v2. The schema must be included directly inside the format object in the formats array.
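For code that still builds v1-style bodies, a small converter can move the options into the format object. The sketch below assumes the v1 body used the string "json" in formats plus a top-level jsonOptions object holding schema and prompt; that v1 shape is an assumption here, and `migrate_to_v2` is a hypothetical helper, not part of any Firecrawl SDK.

```python
import copy


def migrate_to_v2(v1_body: dict) -> dict:
    """Return a copy of a v1-style body with jsonOptions folded into formats."""
    v2_body = copy.deepcopy(v1_body)
    json_options = v2_body.pop("jsonOptions", None) or {}

    new_formats = []
    for fmt in v2_body.get("formats", []):
        if fmt == "json":
            # Replace the bare "json" string with a v2 format object.
            json_format = {"type": "json"}
            for key in ("schema", "prompt"):
                if key in json_options:
                    json_format[key] = json_options[key]
            new_formats.append(json_format)
        else:
            # Other format entries are passed through unchanged (assumption).
            new_formats.append(fmt)

    v2_body["formats"] = new_formats
    return v2_body


# Assumed v1-style body, for illustration only.
v1_body = {
    "url": "https://example.com",
    "formats": ["rawHtml", "json"],
    "jsonOptions": {
        "schema": {"type": "object", "properties": {"title": {"type": "string"}}},
        "prompt": "Extract the page title.",
    },
}

v2_body = migrate_to_v2(v1_body)
print(v2_body["formats"])
```

The function works on a deep copy, so the original v1 body is left untouched and can be kept for comparison during a migration.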
HTML attributes are not available in JSON extraction. JSON extraction works on the markdown conversion of the page, which only preserves visible text content. HTML attributes (e.g., data-id, custom attributes on elements) are stripped during conversion and the LLM cannot see them. If you need to extract HTML attribute values, use rawHtml format and parse attributes client-side, or use an executeJavascript action to inject attribute values into visible text before extraction.
Tips for consistent extraction
If you are seeing inconsistent or incomplete results from JSON extraction, these practices can help:
- Keep prompts short and focused. Long prompts with many rules increase variability. Move specific constraints (like allowed values) into the schema instead.
- Use concise property names. Avoid embedding instructions or enum lists in property names. Use a short key like "installation_type" and put allowed values in an enum array.
- Add enum arrays for constrained fields. When a field has a fixed set of values, list them in enum and make sure they match the exact text shown on the page.
- Include null-handling in field descriptions. Add "Return null if not found on the page." to each field’s description so the model does not guess missing values.
- Add location hints. Tell the model where to find data on the page, e.g. "Flow rate in GPM from the Specifications table."
- Split large schemas into smaller requests. Schemas with many fields (e.g. 30+) produce less consistent results. Split them into 2–3 requests of 10–15 fields each.

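Several of the tips above can be combined in one schema, and the last tip can be automated. The sketch below shows a short property name with an enum, null-handling and location hints in descriptions, and a hypothetical `split_schema` helper that breaks a large schema into smaller ones. The enum values and the nullable type arrays are assumptions for illustration; the enum values should be replaced with the exact text shown on the target page.

```python
from __future__ import annotations

# Short keys, constraints in the schema, hints in the descriptions.
product_schema = {
    "type": "object",
    "properties": {
        "installation_type": {
            "type": ["string", "null"],
            # Hypothetical values; use the exact text shown on the page.
            "enum": ["Wall-mounted", "Freestanding", None],
            "description": "Installation type. Return null if not found on the page.",
        },
        "flow_rate_gpm": {
            "type": ["number", "null"],
            "description": (
                "Flow rate in GPM from the Specifications table. "
                "Return null if not found on the page."
            ),
        },
    },
}


def split_schema(schema: dict, max_fields: int = 15) -> list[dict]:
    """Split an object schema into smaller schemas of at most max_fields fields."""
    properties = schema.get("properties", {})
    required = set(schema.get("required", []))
    names = list(properties)

    parts = []
    for start in range(0, len(names), max_fields):
        chunk = names[start:start + max_fields]
        part = {"type": "object", "properties": {n: properties[n] for n in chunk}}
        chunk_required = [n for n in chunk if n in required]
        if chunk_required:
            part["required"] = chunk_required
        parts.append(part)
    return parts


# A 30-field schema with placeholder field names, to demonstrate splitting.
large_schema = {
    "type": "object",
    "properties": {f"field_{i}": {"type": "string"} for i in range(1, 31)},
    "required": ["field_1", "field_20"],
}

parts = split_schema(large_schema, max_fields=15)
print([len(part["properties"]) for part in parts])  # prints [15, 15]
```

Each resulting schema can be sent as its own request, and the partial results merged by key afterwards.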
