Extract - Firecrawl Docs

Extract structured data from pages using LLMs

curl --request POST \
  --url https://api.firecrawl.dev/v1/extract \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "urls": [
    "<string>"
  ],
  "prompt": "<string>",
  "schema": {},
  "enableWebSearch": false,
  "ignoreSitemap": false,
  "includeSubdomains": true,
  "showSources": false,
  "scrapeOptions": {
    "onlyMainContent": true,
    "includeTags": [
      "<string>"
    ],
    "excludeTags": [
      "<string>"
    ],
    "maxAge": 0,
    "headers": {},
    "waitFor": 0,
    "mobile": false,
    "skipTlsVerification": false,
    "timeout": 30000,
    "parsePDF": true,
    "jsonOptions": {
      "schema": {},
      "systemPrompt": "<string>",
      "prompt": "<string>"
    },
    "actions": [
      {
        "type": "wait",
        "milliseconds": 2,
        "selector": "#my-element"
      }
    ],
    "location": {
      "country": "US",
      "languages": [
        "en-US"
      ]
    },
    "removeBase64Images": true,
    "blockAds": true,
    "proxy": "basic",
    "storeInCache": true,
    "formats": [
      "markdown"
    ],
    "changeTrackingOptions": {
      "modes": [
        "git-diff"
      ],
      "schema": {},
      "prompt": "<string>",
      "tag": null
    }
  },
  "ignoreInvalidURLs": false
}
'

{
  "success": true,
  "id": "<string>",
  "invalidURLs": [
    "<string>"
  ]
}
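
For readers who prefer Python to curl, the same request can be sketched with the standard library alone. This is a minimal illustration, not an official client: the helper name `build_extract_request`, the placeholder token, and the example URL are assumptions, and the request is only built, not sent.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://api.firecrawl.dev/v1/extract"


def build_extract_request(token: str, urls: list[str], prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the extract endpoint."""
    payload = {"urls": urls, "prompt": prompt}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder token and URL, for illustration only.
req = build_extract_request(
    "YOUR_TOKEN",
    ["https://example.com/*"],
    "Extract the page title.",
)
```

Passing `req` to `urllib.request.urlopen` would submit the job; the response body should then have the shape shown in the JSON sample above.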

POST /v1/extract

Note: A new v2 version of this API is available, with improved features and performance.

Authorizations

Authorization

string

header

required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json

urls

string<uri>[]

required

The URLs to extract data from. URLs should be given in glob pattern format.

prompt

string

Prompt to guide the extraction process.

schema

object

Schema that defines the structure of the extracted data. Must conform to JSON Schema.
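
As an illustration of how `prompt` and `schema` fit together, the sketch below builds a request body whose schema is a small JSON Schema object. The field names `company_name` and `founded_year`, the example URL, and the prompt text are hypothetical and chosen only for the example.

```python
import json

# Hypothetical schema: the field names are illustrative, not part of the API.
schema = {
    "type": "object",
    "properties": {
        "company_name": {"type": "string"},
        "founded_year": {"type": "integer"},
    },
    "required": ["company_name"],
}

# Request body combining the documented urls, prompt, and schema fields.
payload = {
    "urls": ["https://example.com/about"],
    "prompt": "Extract the company name and founding year.",
    "schema": schema,
}

# Serialized form, ready to be used as the --data argument or a request body.
body = json.dumps(payload, indent=2)
print(body)
```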

enableWebSearch

boolean

default: false

When true, the extraction uses web search to find additional data.

ignoreSitemap

boolean

default: false

When true, sitemap.xml files are ignored while scanning the website.

includeSubdomains

boolean

default: true

When true, subdomains of the provided URLs are also scanned.

showSources

boolean

default: false

When true, the sources used to extract the data are included in the response under the sources key.

scrapeOptions

object

ignoreInvalidURLs

boolean

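default: false

If the urls array contains invalid URLs, they are ignored. Instead of failing the entire request, the extraction runs using only the valid URLs, and the invalid URLs are returned in the invalidURLs field of the response.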
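
The boolean options above can be collected in one place so that their documented defaults are explicit. The following is a sketch under assumptions: the `ExtractFlags` dataclass and `build_body` helper are hypothetical names, not part of any Firecrawl SDK, and only the field names and defaults come from this page.

```python
from dataclasses import asdict, dataclass


@dataclass
class ExtractFlags:
    """Boolean request options; defaults mirror the ones documented above."""

    enableWebSearch: bool = False
    ignoreSitemap: bool = False
    includeSubdomains: bool = True
    showSources: bool = False
    ignoreInvalidURLs: bool = False


def build_body(urls: list[str], flags: ExtractFlags) -> dict:
    """Merge the target URLs with the boolean options into one request body."""
    return {"urls": urls, **asdict(flags)}


# Tolerate bad URLs and ask for sources; everything else keeps its default.
body = build_body(
    ["https://example.com/*", "not a url"],
    ExtractFlags(ignoreInvalidURLs=True, showSources=True),
)
```

Sending every flag explicitly is a design choice made here for readability; omitting the ones left at their defaults should be equivalent according to the defaults listed above.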

Response

Successful extraction

success

boolean

id

string

invalidURLs

string[] | null

If ignoreInvalidURLs is true, this is an array containing the invalid URLs that were specified in the request. If there were no invalid URLs, the array is empty. If ignoreInvalidURLs is false, this field is undefined.
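
Because invalidURLs may be an array, null, or absent depending on the request, client code has to handle all three cases. A minimal sketch of such handling follows; the function name `parse_extract_response` and the sample values (`job-123`, `not a url`) are made up for illustration.

```python
import json


def parse_extract_response(raw: str) -> tuple[str, list[str]]:
    """Return (job id, invalid URLs) from the JSON body of an extract response."""
    data = json.loads(raw)
    if not data.get("success"):
        raise RuntimeError(f"extract request was not accepted: {data}")
    # invalidURLs may be absent or null; treat both as "nothing rejected".
    invalid = data.get("invalidURLs") or []
    return data["id"], invalid


# Made-up sample response with the same shape as the example on this page.
sample = '{"success": true, "id": "job-123", "invalidURLs": ["not a url"]}'
job_id, invalid = parse_extract_response(sample)
print(job_id, invalid)
```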
