Extract - Firecrawl Docs

curl --request POST \
  --url https://api.firecrawl.dev/v2/extract \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "urls": [
    "<string>"
  ],
  "enableWebSearch": false,
  "ignoreInvalidURLs": true,
  "ignoreSitemap": false,
  "includeSubdomains": true,
  "prompt": "<string>",
  "schema": {},
  "scrapeOptions": {
    "actions": [
      {
        "milliseconds": 2,
        "type": "wait"
      }
    ],
    "blockAds": true,
    "excludeTags": [
      "<string>"
    ],
    "formats": [
      "markdown"
    ],
    "headers": {},
    "includeTags": [
      "<string>"
    ],
    "location": {
      "country": "US",
      "languages": [
        "en-US"
      ]
    },
    "lockdown": false,
    "maxAge": 172800000,
    "minAge": 123,
    "mobile": false,
    "onlyCleanContent": false,
    "onlyMainContent": true,
    "parsers": [
      "pdf"
    ],
    "proxy": "auto",
    "redactPII": false,
    "removeBase64Images": true,
    "skipTlsVerification": true,
    "storeInCache": true,
    "timeout": 60000,
    "waitFor": 0
  },
  "showSources": false
}
'

Authorizations

Authorization

string

header

required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
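As a minimal sketch of the header format described above, the following Python helper builds the two headers used in the curl example. The helper name `build_headers` and the environment variable `FIRECRAWL_API_KEY` are assumptions made for this illustration, not names taken from the Firecrawl documentation.

```python
import os


def build_headers(token: str) -> dict[str, str]:
    """Return the two headers used by the curl example."""
    if not token:
        raise ValueError("an auth token is required")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }


# FIRECRAWL_API_KEY is a hypothetical variable name chosen for this sketch;
# the "<token>" placeholder is used when it is unset or empty.
headers = build_headers(os.environ.get("FIRECRAWL_API_KEY") or "<token>")
```

Reading the token from the environment keeps it out of source files; any other secret store would work equally well.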

Body

application/json

urls

string<uri>[]

required

The URLs to extract data from. URLs should be in glob format.
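The sketch below shows one way to assemble a request that carries only the required `urls` field, using Python's standard library. It constructs the request object without sending it. The helper name `build_extract_request` and the example pattern `https://example.com/blog/*` are assumptions; the exact glob syntax the API accepts is not spelled out here.

```python
import json
import urllib.request

# Endpoint taken from the curl example.
EXTRACT_URL = "https://api.firecrawl.dev/v2/extract"


def build_extract_request(token: str, urls: list[str]) -> urllib.request.Request:
    """Build (but do not send) a POST request with the required urls field."""
    if not urls:
        raise ValueError("urls is required and must not be empty")
    body = {"urls": list(urls)}
    return urllib.request.Request(
        EXTRACT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# The trailing /* is an assumed example of a glob-style URL.
req = build_extract_request("<token>", ["https://example.com/blog/*"])
```

To actually send the request, pass `req` to `urllib.request.urlopen` and decode the JSON body of the response.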

enableWebSearch

boolean

default: false

When true, the extraction uses web search to find additional data.

ignoreInvalidURLs

boolean

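default: true

If invalid URLs are specified in the urls array, they are ignored. Instead of failing the entire request, the extraction runs against the remaining valid URLs, and the invalid URLs are returned in the invalidURLs field of the response.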

ignoreSitemap

boolean

default: false

When true, sitemap.xml files are ignored during website scanning.

includeSubdomains

boolean

default: true

When true, subdomains of the provided URLs are also scanned.

prompt

string

Prompt to guide the extraction process.

schema

object

Schema that defines the structure of the extracted data. Must conform to JSON Schema.
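To make the `prompt` and `schema` fields concrete, here is a hedged sketch of a request body that combines them. The property names (`title`, `price`), the prompt text, and the URL are invented for illustration; only the top-level field names come from this reference.

```python
import json

# A small JSON Schema describing the desired output shape.
# The properties are hypothetical; choose ones that match your own data.
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "price": {"type": "number"},
    },
    "required": ["title"],
}

body = {
    "urls": ["https://example.com/products/*"],  # assumed example URL
    "prompt": "Extract the product title and price.",
    "schema": schema,
}

# Serialized form suitable for the --data argument or an HTTP client.
payload = json.dumps(body, indent=2)
```

Pairing a prompt with a schema gives the extractor both a natural-language description of the goal and a structural constraint on the result.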

scrapeOptions

object


showSources

boolean

default: false

When true, the sources used to extract the data are included in the response under the sources key.
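The boolean options above all have documented defaults, so a request only needs to name the ones it changes. The following sketch overlays a caller's body on those defaults to show the effective settings. It is purely a client-side illustration, since the defaults are applied by the server; the names `DOCUMENTED_DEFAULTS` and `effective_options` are hypothetical.

```python
# Defaults as listed for each boolean parameter in this reference.
DOCUMENTED_DEFAULTS = {
    "enableWebSearch": False,
    "ignoreInvalidURLs": True,
    "ignoreSitemap": False,
    "includeSubdomains": True,
    "showSources": False,
}


def effective_options(body: dict) -> dict:
    """Overlay caller-supplied fields on the documented defaults."""
    return {**DOCUMENTED_DEFAULTS, **body}


# Only showSources is overridden; the URL is an assumed example.
body = {"urls": ["https://example.com/*"], "showSources": True}
effective = effective_options(body)
```

Omitted fields keep their defaults, so `effective` still has `includeSubdomains` set to true and `enableWebSearch` set to false.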

Response

Successful extraction

string

invalidURLs

string[] | null

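If ignoreInvalidURLs is true, this field is an array containing the invalid URLs that were specified in the request. If there were no invalid URLs, the array is empty. If ignoreInvalidURLs is false, this field is undefined.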

success

boolean
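Because `invalidURLs` may be an array, null, or absent, client code may want to normalize it before use. The sketch below does that for an already-decoded response; the function name `summarize_extract_response` and the sample payload are made up for illustration and do not represent real API output.

```python
def summarize_extract_response(response: dict) -> dict:
    """Reduce a decoded response to a success flag and the skipped URLs."""
    # invalidURLs can be a list, null, or missing, so fall back to an empty list.
    skipped = response.get("invalidURLs") or []
    return {
        "ok": response.get("success") is True,
        "skipped_urls": list(skipped),
    }


# Made-up sample payload, not real API output.
sample = {"success": True, "invalidURLs": ["not a url"]}
summary = summarize_extract_response(sample)
```

Treating a missing field the same as an empty array lets the same code handle requests sent with ignoreInvalidURLs set to either true or false.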