Webhook Event Types

Payload Structure

All webhook events share the following structure:

{
  "success": true,
  "type": "crawl.page",
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "data": [...],
  "metadata": {}
}

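Field	Type	Description
`success`	boolean	Whether the operation succeeded
`type`	string	Event type (for example, `crawl.page`)
`id`	string	Job ID
`data`	array	Event-specific data (see the examples below)
`metadata`	object	Custom metadata from your webhook configuration
`error`	string	Error message (present when `success` is `false`)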
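As an illustration of the fields in the table above, here is a minimal sketch of parsing a request body into a typed object. The names `WebhookEvent` and `parse_event` are hypothetical, and treating `success`, `type`, and `id` as required is an assumption made for this example, not something the payload description guarantees.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WebhookEvent:
    """Envelope fields as listed in the table above (hypothetical helper type)."""
    success: bool
    type: str
    id: str
    data: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)
    error: Optional[str] = None


def parse_event(raw: str) -> WebhookEvent:
    """Parse a raw JSON request body into a WebhookEvent.

    Raises ValueError if the body is not valid JSON or if one of the
    fields this sketch assumes to be required is missing.
    """
    body = json.loads(raw)
    for key in ("success", "type", "id"):
        if key not in body:
            raise ValueError(f"missing field: {key}")
    return WebhookEvent(
        success=bool(body["success"]),
        type=str(body["type"]),
        id=str(body["id"]),
        data=list(body.get("data") or []),
        metadata=dict(body.get("metadata") or {}),
        error=body.get("error"),  # only expected when success is false
    )
```

Calling `parse_event(request_body)` at the top of a handler keeps the rest of the code free of raw dictionary lookups.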

Crawl Events

`crawl.started`

Sent when a crawl job starts processing.

{
  "success": true,
  "type": "crawl.started",
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "data": [],
  "metadata": {}
}

`crawl.page`

Sent for each page scraped during the crawl. The `data` array contains the page content and metadata.

{
  "success": true,
  "type": "crawl.page",
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "data": [
    {
      "markdown": "# Page content...",
      "metadata": {
        "title": "Page Title",
        "description": "Page description",
        "url": "https://example.com/page",
        "statusCode": 200,
        "contentType": "text/html",
        "scrapeId": "550e8400-e29b-41d4-a716-446655440001",
        "sourceURL": "https://example.com/page",
        "proxyUsed": "basic",
        "cacheState": "hit",
        "cachedAt": "2025-09-03T21:11:25.636Z",
        "creditsUsed": 1
      }
    }
  ],
  "metadata": {}
}
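A minimal sketch of reading the page entries out of such an event might look like the following. The function name `iter_pages` is hypothetical, and falling back from `sourceURL` to `url` is a choice made for this example only.

```python
from typing import Iterator, Tuple


def iter_pages(event: dict) -> Iterator[Tuple[str, str]]:
    """Yield (source_url, markdown) for each entry in a page event's data array."""
    for entry in event.get("data", []):
        meta = entry.get("metadata", {})
        # Prefer "sourceURL"; fall back to "url" if it is absent (assumption).
        url = meta.get("sourceURL") or meta.get("url", "")
        yield url, entry.get("markdown", "")
```

Because `batch_scrape.page` events carry entries of the same shape in the examples on this page, the same helper should work for them as well.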

`crawl.completed`

Sent when the crawl job has finished and all pages have been processed.

{
  "success": true,
  "type": "crawl.completed",
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "data": [],
  "metadata": {}
}
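In the example above the completion event carries an empty `data` array, so a consumer that wants the full result set would likely need to collect pages as the `crawl.page` events arrive. The sketch below shows one way to do that in memory; `CrawlCollector` is a hypothetical helper, and a real service would probably persist pages rather than hold them in a dictionary.

```python
from collections import defaultdict
from typing import Optional


class CrawlCollector:
    """Collects page entries per job ID until the job completes."""

    def __init__(self) -> None:
        self._pages = defaultdict(list)

    def handle(self, event: dict) -> Optional[list]:
        """Return the collected pages when the job completes, otherwise None."""
        job_id = event["id"]
        if event["type"] == "crawl.page":
            # Append every page entry delivered with this event.
            self._pages[job_id].extend(event.get("data", []))
            return None
        if event["type"] == "crawl.completed":
            # Hand back everything seen for this job and forget it.
            return self._pages.pop(job_id, [])
        return None
```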

Batch Scrape Events

`batch_scrape.started`

Sent when a batch scrape job starts processing.

{
  "success": true,
  "type": "batch_scrape.started",
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "data": [],
  "metadata": {}
}

`batch_scrape.page`

Sent for each URL scraped in the batch. The `data` array contains the page content and metadata.

{
  "success": true,
  "type": "batch_scrape.page",
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "data": [
    {
      "markdown": "# Page content...",
      "metadata": {
        "title": "Page Title",
        "description": "Page description",
        "url": "https://example.com",
        "statusCode": 200,
        "contentType": "text/html",
        "scrapeId": "550e8400-e29b-41d4-a716-446655440001",
        "sourceURL": "https://example.com",
        "proxyUsed": "basic",
        "cacheState": "miss",
        "cachedAt": "2025-09-03T23:30:53.434Z",
        "creditsUsed": 1
      }
    }
  ],
  "metadata": {}
}

`batch_scrape.completed`

Sent when all URLs in the batch have been processed.

{
  "success": true,
  "type": "batch_scrape.completed",
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "data": [],
  "metadata": {}
}

Extract Events

`extract.started`

Sent when an extract job starts processing.

{
  "success": true,
  "type": "extract.started",
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "data": [],
  "metadata": {}
}

`extract.completed`

Sent when an extract operation completes successfully. The `data` array contains the extracted data and usage information.

{
  "success": true,
  "type": "extract.completed",
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "data": [
    {
      "success": true,
      "data": { "siteName": "Example Site", "category": "Technology" },
      "extractId": "550e8400-e29b-41d4-a716-446655440000",
      "llmUsage": 0.0020118,
      "totalUrlsScraped": 1,
      "sources": {
        "siteName": ["https://example.com"],
        "category": ["https://example.com"]
      }
    }
  ],
  "metadata": {}
}
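To show how the nested `data` and `sources` objects in this payload relate, here is a small sketch that pairs each extracted field with the URLs it came from. The function name `summarize_extraction` is hypothetical, and the code assumes `sources` is keyed by the same field names as the inner `data` object, as it is in the example above.

```python
def summarize_extraction(event: dict) -> list:
    """Return one {field: (value, source_urls)} mapping per entry in data."""
    summaries = []
    for entry in event.get("data", []):
        values = entry.get("data", {})
        sources = entry.get("sources", {})
        summaries.append(
            # Fields without a matching sources entry get an empty list.
            {name: (value, sources.get(name, [])) for name, value in values.items()}
        )
    return summaries
```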

`extract.failed`

Sent when an extract operation fails. The `error` field contains the reason for the failure.

{
  "success": false,
  "type": "extract.failed",
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "data": [],
  "error": "Failed to extract data: timeout exceeded",
  "metadata": {}
}
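One way to handle failures alongside the other events is to branch on `success` and read `error` only when it is `false`. The sketch below turns any event into a one-line log message; `describe_event` is a hypothetical helper, and splitting `type` on the dot into a job family and a stage is a convention inferred from the event names on this page.

```python
def describe_event(event: dict) -> str:
    """Return a one-line, log-friendly description of a webhook event."""
    # "extract.failed" -> family "extract", stage "failed"
    family, _, stage = event["type"].partition(".")
    if not event.get("success", False):
        reason = event.get("error", "unknown error")
        return f"{family} job {event['id']} failed: {reason}"
    return f"{family} job {event['id']}: {stage}"
```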

Event Filtering

By default, you receive all events. To subscribe to specific events only, specify an `events` array in your webhook configuration:

{
  "url": "https://your-app.com/webhook",
  "events": ["completed", "failed"]
}

This is useful when you only care whether a job has finished and do not need page-level updates.
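The configuration above can be built programmatically, and the same list can be used to double-check incoming events on the receiving side. This is a rough sketch: `build_webhook_config` and `is_subscribed` are hypothetical helpers, and matching the short names in `events` against the part of `type` after the dot is an assumption based on the event names on this page.

```python
from typing import Iterable, Optional


def build_webhook_config(url: str, events: Optional[Iterable[str]] = None) -> dict:
    """Build a webhook configuration; omit `events` to receive all events."""
    config = {"url": url}
    if events:
        config["events"] = list(events)
    return config


def is_subscribed(event_type: str, events: Optional[Iterable[str]] = None) -> bool:
    """Check a type such as "crawl.completed" against short names such as "completed"."""
    if not events:
        return True  # no filter configured: every event is wanted
    # Assumption: the filter name is the segment after the last dot.
    stage = event_type.rpartition(".")[2]
    return stage in set(events)
```

For example, `build_webhook_config("https://your-app.com/webhook", ["completed", "failed"])` produces the same object as the JSON shown above.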
