> ## Documentation Index
> Fetch the complete documentation index at: https://docs.firecrawl.dev/llms.txt
> Use this file to discover all available pages before exploring further.

# Parse

> Upload a local or non-public document and convert it into clean, LLM-ready data

The `/parse` endpoint converts local or non-public documents into clean, LLM-ready data. Upload file bytes via `multipart/form-data` and get back Markdown, JSON, HTML, links, images, or a summary — with reading order and tables preserved.

* Turn PDF, DOCX, XLSX, HTML, and more into Markdown or structured JSON
* Up to **5x faster** parsing via a Rust-based engine
* Files up to **50 MB** per request
* Zero Data Retention support

## When to use `/parse`

Use `/parse` when the source document is **a local file** or **not publicly accessible by URL**. If you have a public URL that points to a document, prefer [`/scrape`](/features/scrape) — it auto-detects the file type from the extension or content type and parses it the same way.

| Source                                                           | Endpoint                                         |
| ---------------------------------------------------------------- | ------------------------------------------------ |
| Public URL to a document (e.g. `https://example.com/report.pdf`) | [`POST /scrape`](/api-reference/endpoint/scrape) |
| Local file or non-public bytes (PDF, DOCX, XLSX, HTML, ...)      | [`POST /parse`](/api-reference/endpoint/parse)   |

<Tip>
  **Using Firecrawl through MCP?** Use `firecrawl_parse` for local files. Local MCP can read the file directly when configured with `FIRECRAWL_API_URL`. Remote hosted MCP returns a short-lived upload command first, then parses the returned `uploadRef`. Public document URLs should still use `/scrape`.
</Tip>

<Tip>
  **Have a public PDF URL?** Use `/scrape` instead -- it auto-detects document types and returns clean markdown. No need for a separate PDF library.

  ```python Python theme={null}
  doc = firecrawl.scrape("https://example.com/report.pdf", formats=["markdown"])
  ```
</Tip>

## Parsing

### /parse endpoint

Used to upload a file and receive parsed content. The request is `multipart/form-data` with a required `file` part and an optional `options` JSON part.

**Supported extensions:** `.html`, `.htm`, `.pdf`, `.docx`, `.doc`, `.odt`, `.rtf`, `.xlsx`, `.xls`.

### Usage

<CodeGroup>
  ```python Python theme={null}
  from firecrawl import Firecrawl

  firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

  doc = firecrawl.parse("./report.pdf")

  print(doc.markdown)
  ```

  ```javascript Node theme={null}
  import { Firecrawl } from "firecrawl";
  import fs from "node:fs";

  const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });

  const doc = await firecrawl.parse({
    data: fs.readFileSync("./report.pdf"),
    filename: "report.pdf",
  });

  console.log(doc.markdown);
  ```

  ```bash cURL theme={null}
  curl -X POST https://api.firecrawl.dev/v2/parse \
    -H 'Authorization: Bearer YOUR_API_KEY' \
    -F 'file=@./report.pdf' \
    -F 'options={"formats":["markdown"]};type=application/json'
  ```
</CodeGroup>

### Response

SDKs return the document object directly. cURL returns the JSON payload.

```json theme={null}
{
  "success": true,
  "data": {
    "markdown": "# Annual Report\n\n...",
    "metadata": {
      "title": "Annual Report",
      "numPages": 42,
      "sourceFile": "report.pdf"
    }
  }
}
```

## Options

`/parse` accepts a subset of scrape options under the `options` field. Common settings:

* `formats`: Array of output formats. Defaults to `["markdown"]`. Supported: `markdown`, `html`, `rawHtml`, `links`, `images`, `summary`, and `json` (with a schema or prompt).
* `onlyMainContent`: Only return the main content of the document. Defaults to `true`.
* `includeTags` / `excludeTags`: Tag-level inclusion or exclusion (HTML inputs).
* `redactPII`: Redact personally identifiable information from returned markdown.
* `timeout`: Request timeout in milliseconds. Defaults to `30000`, max `300000`.
* `parsers`: File-parser controls. For PDFs, set `{ "type": "pdf", "mode": "fast" | "auto" | "ocr", "maxPages": <int> }`.

<Note>
  `/parse` does not support browser-only options like `actions`, `waitFor`, `location`, `mobile`, or change tracking.
</Note>

### PDF parser modes

```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/parse \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -F 'file=@./scan.pdf' \
  -F 'options={"parsers":[{"type":"pdf","mode":"ocr","maxPages":50}]};type=application/json'
```

* `fast`: text-only extraction, fastest path.
* `auto` (default): text-first with OCR fallback for image-only pages.
* `ocr`: OCR every page — use for scanned documents.

### Structured JSON output

Pass a JSON schema or prompt to extract structured data directly from the document:

```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/parse \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -F 'file=@./invoice.pdf' \
  -F 'options={"formats":[{"type":"json","schema":{"type":"object","properties":{"total":{"type":"number"},"vendor":{"type":"string"}}}}]};type=application/json'
```

## Considerations

* Maximum file size is **50 MB** per request.
* Parsing very large or scanned PDFs in `ocr` mode may take longer — increase `timeout` or use `maxPages` to bound the work.
* For batches of files, call `/parse` per file in parallel; there is no batch upload variant.

> Are you an AI agent that needs a Firecrawl API key? See [firecrawl.dev/agent-onboarding/SKILL.md](https://www.firecrawl.dev/agent-onboarding/SKILL.md) for automated onboarding instructions.