Supported Document Formats
Firecrawl currently supports the following document formats:
- Excel Spreadsheets (.xlsx, .xls)
  - Each worksheet is converted to an HTML table
  - Worksheets are separated by H2 headings with the sheet name
  - Preserves cell formatting and data types
- Word Documents (.docx, .doc, .odt, .rtf)
  - Extracts text content while preserving document structure
  - Maintains headings, paragraphs, lists, and tables
  - Preserves basic formatting and styling
- PDF Documents (.pdf)
  - Extracts text content with layout information
  - Preserves document structure including sections and paragraphs
  - Handles both text-based and scanned PDFs (with OCR support)
  - Priced at 1 credit per page. See Pricing for details.
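Because each worksheet becomes an HTML table under an H2 heading carrying the sheet name, the converted output can be split back into per-sheet pieces. The sketch below is illustrative only: `splitSheets` and `sampleHtml` are hypothetical, and it assumes the HTML arrives as plain `<h2>` and `<table>` elements, which may not match the real output exactly.

```javascript
// Split converted spreadsheet HTML into one entry per worksheet.
// Assumption: each sheet appears as <h2>Sheet name</h2> followed by a <table>.
function splitSheets(html) {
  const sheets = [];
  // Match an H2 heading and the first table that follows it.
  const pattern = /<h2[^>]*>([\s\S]*?)<\/h2>\s*(<table[\s\S]*?<\/table>)/gi;
  for (const match of html.matchAll(pattern)) {
    sheets.push({ name: match[1].trim(), tableHtml: match[2] });
  }
  return sheets;
}

// Hypothetical converted output for a workbook with two sheets.
const sampleHtml = `
<h2>Sales</h2>
<table><tr><td>Q1</td><td>100</td></tr></table>
<h2>Costs</h2>
<table><tr><td>Q1</td><td>40</td></tr></table>
`;

const sheets = splitSheets(sampleHtml);
console.log(sheets.map((sheet) => sheet.name)); // [ 'Sales', 'Costs' ]
```

A regular expression is enough for a quick split like this; for nested tables or unusual markup, a real HTML parser would be the safer choice.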
How to Use Document Parsing
Document parsing in Firecrawl works automatically when you provide a URL that points to a supported document type. The system will detect the file type based on the URL extension or content-type header and process it accordingly.
Example: Scraping an Excel File
Node
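The original snippet for this example is not preserved here, so the following is a minimal sketch rather than the official one. It calls the scrape API directly with Node's built-in `fetch` (Node 18+). The endpoint path, the request body fields (`url`, `formats`), the `data.markdown` response field, and the example spreadsheet URL are all assumptions to verify against the current API reference; `buildScrapeRequest` and `scrapeDocument` are hypothetical helper names.

```javascript
// Assumed REST endpoint for the scrape API; confirm against the API reference.
const FIRECRAWL_SCRAPE_ENDPOINT = "https://api.firecrawl.dev/v2/scrape";

// Build the fetch() arguments for scraping a single document URL.
function buildScrapeRequest(documentUrl, apiKey) {
  return {
    endpoint: FIRECRAWL_SCRAPE_ENDPOINT,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      // Assumed body shape: the target URL plus the output formats wanted.
      body: JSON.stringify({ url: documentUrl, formats: ["markdown"] }),
    },
  };
}

// Send the request and return the parsed JSON response.
async function scrapeDocument(documentUrl, apiKey) {
  const { endpoint, init } = buildScrapeRequest(documentUrl, apiKey);
  const response = await fetch(endpoint, init);
  if (!response.ok) {
    throw new Error(`Scrape failed with HTTP ${response.status}`);
  }
  return response.json();
}

// Usage (placeholder URL, not executed here):
// const result = await scrapeDocument(
//   "https://example.com/data.xlsx",
//   process.env.FIRECRAWL_API_KEY
// );
// console.log(result.data.markdown); // assumed response field
```

No spreadsheet-specific option is passed: as described above, the file type is picked up from the `.xlsx` extension or the content-type header.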
Example: Scraping a Word Document
Node
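As with the Excel example, the original snippet is not preserved, so this is only a sketch under the same assumptions (endpoint path, `url` and `formats` body fields, `data.markdown` response field, placeholder URL). It adds a hypothetical client-side check, `isWordDocumentUrl`, against the Word extensions listed above. That check is a convenience for failing fast and is not how Firecrawl itself detects file types; drop it for URLs that have no extension and rely on the content-type header.

```javascript
// Word-processing extensions from the supported formats list.
const WORD_EXTENSIONS = [".docx", ".doc", ".odt", ".rtf"];

// Client-side convenience check on the URL path (query string is ignored).
function isWordDocumentUrl(documentUrl) {
  const pathname = new URL(documentUrl).pathname.toLowerCase();
  return WORD_EXTENSIONS.some((extension) => pathname.endsWith(extension));
}

// Scrape a Word document and return its content as markdown.
// Requires Node 18+ for the built-in fetch().
async function scrapeWordDocument(documentUrl, apiKey) {
  if (!isWordDocumentUrl(documentUrl)) {
    throw new Error(`Not a Word document URL: ${documentUrl}`);
  }
  // Assumed endpoint and body shape; confirm against the API reference.
  const response = await fetch("https://api.firecrawl.dev/v2/scrape", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ url: documentUrl, formats: ["markdown"] }),
  });
  if (!response.ok) {
    throw new Error(`Scrape failed with HTTP ${response.status}`);
  }
  const payload = await response.json();
  return payload.data.markdown; // assumed response field
}

// Usage (placeholder URL, not executed here):
// const markdown = await scrapeWordDocument(
//   "https://example.com/report.docx",
//   process.env.FIRECRAWL_API_KEY
// );
```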