Crawl
curl --request POST \
  --url https://api.firecrawl.dev/v0/crawl \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
    "url": "<string>",
    "crawlerOptions": {
      "includes": [
        "<string>"
      ],
      "excludes": [
        "<string>"
      ],
      "generateImgAltText": true,
      "returnOnlyUrls": true,
      "maxDepth": 123,
      "mode": "default",
      "limit": 123
    },
    "pageOptions": {
      "onlyMainContent": true,
      "includeHtml": true
    }
  }'
{
  "jobId": "<string>"
}
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Body
url: The base URL to start crawling from.
crawlerOptions.includes: URL patterns to include.
crawlerOptions.excludes: URL patterns to exclude.
crawlerOptions.generateImgAltText: Generate alt text for images using LLMs (requires a paid plan).
crawlerOptions.returnOnlyUrls: If true, the crawl status returns only the URLs. Note that the data field of the response will then contain a list of URLs, not a list of documents.
crawlerOptions.maxDepth: Maximum depth to crawl. Depth 1 is the base URL only, depth 2 is the base URL and its direct children, and so on.
crawlerOptions.mode: The crawling mode to use. Fast mode crawls websites without a sitemap about 4x faster, but it may be less accurate and shouldn't be used on heavily JS-rendered websites. Options: default, fast.
crawlerOptions.limit: Maximum number of pages to crawl.
pageOptions.onlyMainContent: Return only the main content of each page, excluding headers, navs, footers, etc.
pageOptions.includeHtml: Include the raw HTML content of the page. Adds an html key to the response.
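As a rough illustration of how the body fields above fit together, here is a minimal Python sketch that assembles the JSON body and the POST request using only the standard library. The helper names build_crawl_body and build_crawl_request are hypothetical, and the sketch assumes that optional fields may simply be left out of the body, which the reference above does not state.

```python
import json
import urllib.request

# Endpoint taken from the curl example in this reference.
API_URL = "https://api.firecrawl.dev/v0/crawl"


def build_crawl_body(url, includes=None, excludes=None, max_depth=None,
                     limit=None, mode=None, return_only_urls=None,
                     only_main_content=None, include_html=None):
    """Assemble the request body (hypothetical helper).

    Assumption: fields left as None can be omitted from the body.
    """
    if mode is not None and mode not in ("default", "fast"):
        raise ValueError("mode must be 'default' or 'fast'")

    crawler_options = {
        "includes": includes,
        "excludes": excludes,
        "maxDepth": max_depth,
        "limit": limit,
        "mode": mode,
        "returnOnlyUrls": return_only_urls,
    }
    page_options = {
        "onlyMainContent": only_main_content,
        "includeHtml": include_html,
    }
    # Drop every option the caller did not set.
    return {
        "url": url,
        "crawlerOptions": {k: v for k, v in crawler_options.items() if v is not None},
        "pageOptions": {k: v for k, v in page_options.items() if v is not None},
    }


def build_crawl_request(token, body):
    """Create, but do not send, the POST request with the Bearer header."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the request, pass the result to urllib.request.urlopen and read the response body; nothing is sent until that call is made.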
Response
{
  "jobId": "<string>"
}
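Since the response carries only a jobId, a caller will usually want to pull that value out and keep it for later use. The following is a small sketch under that assumption; extract_job_id is a hypothetical helper, not part of any official client.

```python
import json


def extract_job_id(response_text):
    """Return the jobId string from a crawl response body (hypothetical helper).

    Raises ValueError if the body is not a JSON object with a non-empty
    string under the "jobId" key.
    """
    payload = json.loads(response_text)
    job_id = payload.get("jobId") if isinstance(payload, dict) else None
    if not isinstance(job_id, str) or not job_id:
        raise ValueError("response does not contain a jobId")
    return job_id
```

Validating the shape up front means an error payload or an unexpected response is reported immediately instead of surfacing later as a missing key.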