Parse

The /parse endpoint converts local or non-public documents into clean, LLM-ready data. Upload file bytes via multipart/form-data and get back Markdown, JSON, HTML, links, images, or a summary — with reading order and tables preserved.

Turn PDF, DOCX, XLSX, HTML, and more into Markdown or structured JSON
Up to 5x faster parsing via a Rust-based engine
Files up to 50 MB per request
Zero Data Retention support

When to use `/parse`

Use /parse when the source document is a local file or not publicly accessible by URL. If you have a public URL that points to a document, prefer /scrape — it auto-detects the file type from the extension or content type and parses it the same way.

Source	Endpoint
Public URL to a document (e.g. `https://example.com/report.pdf`)	`POST /scrape`
Local file or non-public bytes (PDF, DOCX, XLSX, HTML, …)	`POST /parse`

Parsing

/parse endpoint

Used to upload a file and receive parsed content. The request is multipart/form-data with a required file part and an optional options JSON part. Supported extensions: .html, .htm, .pdf, .docx, .doc, .odt, .rtf, .xlsx, .xls.

Usage

from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

doc = firecrawl.parse("./report.pdf")

print(doc.markdown)

Response

SDKs return the document object directly. cURL returns the JSON payload.

{
  "success": true,
  "data": {
    "markdown": "# Annual Report\n\n...",
    "metadata": {
      "title": "Annual Report",
      "numPages": 42,
      "sourceFile": "report.pdf"
    }
  }
}

Options

/parse accepts a subset of scrape options under the options field. Common settings:

formats: Array of output formats. Defaults to ["markdown"]. Supported: markdown, html, rawHtml, links, images, summary, and json (with a schema or prompt).
onlyMainContent: Only return the main content of the document. Defaults to true.
includeTags / excludeTags: Tag-level inclusion or exclusion (HTML inputs).
timeout: Request timeout in milliseconds. Defaults to 30000, max 300000.
parsers: File-parser controls. For PDFs, set { "type": "pdf", "mode": "fast" | "auto" | "ocr", "maxPages": <int> }.

/parse does not support browser-only options like actions, waitFor, location, mobile, or change tracking.

PDF parser modes

cURL

curl -X POST https://api.firecrawl.dev/v2/parse \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -F 'file=@./scan.pdf' \
  -F 'options={"parsers":[{"type":"pdf","mode":"ocr","maxPages":50}]};type=application/json'

fast: text-only extraction, fastest path.
auto (default): text-first with OCR fallback for image-only pages.
ocr: OCR every page — use for scanned documents.

Structured JSON output

Pass a JSON schema or prompt to extract structured data directly from the document:

cURL

curl -X POST https://api.firecrawl.dev/v2/parse \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -F 'file=@./invoice.pdf' \
  -F 'options={"formats":[{"type":"json","schema":{"type":"object","properties":{"total":{"type":"number"},"vendor":{"type":"string"}}}}]};type=application/json'

Considerations

Maximum file size is 50 MB per request.
Parsing very large or scanned PDFs in ocr mode may take longer — increase timeout or use maxPages to bound the work.
For batches of files, call /parse per file in parallel; there is no batch upload variant.

Are you an AI agent that needs a Firecrawl API key? See firecrawl.dev/agent-onboarding/SKILL.md for automated onboarding instructions.

Get Started

Core Endpoints

More

Quickstarts

Developer Guides

Webhooks

Use Cases

Other

Contributing

When to use `/parse`

Parsing

/parse endpoint

Usage

Response

Options

PDF parser modes

Structured JSON output

Considerations

Get Started

Core Endpoints

More

Quickstarts

Developer Guides

Webhooks

Use Cases

Other

Contributing

Documentation Index

​When to use /parse

​Parsing

​/parse endpoint

​Usage

​Response

​Options

​PDF parser modes

​Structured JSON output

​Considerations

When to use `/parse`

Parsing

/parse endpoint

Usage

Response

Options

PDF parser modes

Structured JSON output

Considerations