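Website monitoring

Website monitoring covers an entire site rather than a fixed set of URLs. Each check runs a crawl against the target url, scrapes every page it discovers, and compares the results with the most recent retained snapshot. This catches new, changed, and removed pages, not just edits to pages you have already specified. It is the better fit for documentation sites, blogs, changelogs, help centers, and competitor websites.

This page covers crawl targets. Scheduling, goals and judging, change tracking, notifications, and pricing are shared across all monitor types. See the monitoring overview.

Create a website monitor

Create a monitor with a crawl target so that every check diffs each page the crawl discovers: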

from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

monitor = firecrawl.create_monitor(
    name="Docs monitor",
    schedule={"cron": "7-59/15 * * * *", "timezone": "UTC"},
    goal="Notify me when docs pages add, remove, or materially change API behavior",
    targets=[
        {
            "type": "crawl",
            "url": "https://example.com/docs",
            "crawlOptions": {
                "limit": 100,
                "maxDiscoveryDepth": 3,
            },
        }
    ],
    webhook={
        "url": "https://example.com/webhooks/firecrawl",
        "events": ["monitor.page", "monitor.check.completed"],
    },
)

print(monitor.id)

import Firecrawl from "@mendable/firecrawl-js";

const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });

const monitor = await firecrawl.createMonitor({
  name: "Docs monitor",
  schedule: { cron: "7-59/15 * * * *", timezone: "UTC" },
  webhook: {
    url: "https://example.com/webhooks/firecrawl",
    events: ["monitor.page", "monitor.check.completed"],
  },
  goal: "Notify me when docs pages add, remove, or materially change API behavior",
  targets: [
    {
      type: "crawl",
      url: "https://example.com/docs",
      crawlOptions: {
        limit: 100,
        maxDiscoveryDepth: 3,
      },
    },
  ],
});

console.log(monitor.id);

curl -s -X POST "https://api.firecrawl.dev/v2/monitor" \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Docs monitor",
    "schedule": {
      "cron": "7-59/15 * * * *",
      "timezone": "UTC"
    },
    "webhook": {
      "url": "https://example.com/webhooks/firecrawl",
      "events": ["monitor.page", "monitor.check.completed"]
    },
    "goal": "Notify me when docs pages add, remove, or materially change API behavior",
    "targets": [
      {
        "type": "crawl",
        "url": "https://example.com/docs",
        "crawlOptions": {
          "limit": 100,
          "maxDiscoveryDepth": 3
        }
      }
    ]
  }'
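The cron expression in the examples above, 7-59/15 * * * *, uses a range with a step in the minute field. As a minimal sketch of how that field expands under standard cron semantics, the helper below (expand_minute_field is a hypothetical name, not part of the Firecrawl SDK) handles only the simple start-end/step form:

```python
def expand_minute_field(field: str) -> list[int]:
    """Expand a cron minute field of the form 'start-end/step' (simple cases only)."""
    range_part, _, step_part = field.partition("/")
    start_text, _, end_text = range_part.partition("-")
    start = int(start_text)
    # A bare number such as "7" is treated as a one-element range.
    end = int(end_text) if end_text else start
    step = int(step_part) if step_part else 1
    return list(range(start, end + 1, step))


minutes = expand_minute_field("7-59/15")
print(minutes)  # [7, 22, 37, 52]
```

The schedule therefore runs four checks per hour, offset by seven minutes from the top of the hour.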

Crawl targets

A crawl target requires a type and a single url. Use crawlOptions to configure crawl behavior, and scrapeOptions to configure how each discovered page is scraped:

Crawl target

{
  "type": "crawl",
  "url": "https://example.com/docs",
  "crawlOptions": {
    "limit": 100,
    "includePaths": ["/docs"]
  },
  "scrapeOptions": {
    "formats": ["markdown"]
  }
}

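Common crawlOptions fields:

limit: the maximum number of pages to crawl in a single check.
maxDiscoveryDepth: the maximum number of link levels to follow when discovering pages from the starting url.
maxDepth: the maximum crawl depth.
includePaths: monitor only URLs that match these path patterns (for example, /docs).
excludePaths: skip URLs that match these path patterns.

As with page monitors, scrapes triggered by a monitor set maxAge to 0 by default, so every check re-scrapes the discovered pages unless you set a different maxAge in scrapeOptions.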
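The fields above can be combined into a single target. The following is a sketch only: build_crawl_target is a hypothetical helper, the /docs/archive pattern is an illustrative value, and the maxAge unit is assumed to be milliseconds. The resulting dictionary has the same shape as the entries passed in targets when creating a monitor.

```python
def build_crawl_target(url, limit, include_paths=None, exclude_paths=None, max_age=None):
    """Assemble a crawl target dict using the field names described above."""
    crawl_options = {"limit": limit}
    if include_paths:
        crawl_options["includePaths"] = list(include_paths)
    if exclude_paths:
        crawl_options["excludePaths"] = list(exclude_paths)

    scrape_options = {"formats": ["markdown"]}
    # Only override maxAge when asked to; leaving it out keeps the monitor
    # default of 0, which re-scrapes every discovered page on each check.
    if max_age is not None:
        scrape_options["maxAge"] = max_age  # assumed to be in milliseconds

    return {
        "type": "crawl",
        "url": url,
        "crawlOptions": crawl_options,
        "scrapeOptions": scrape_options,
    }


target = build_crawl_target(
    "https://example.com/docs",
    limit=50,
    include_paths=["/docs"],
    exclude_paths=["/docs/archive"],  # illustrative pattern
)
print(target)
```

Omitting max_age here is deliberate: a monitor that tolerates cached pages could miss changes made since the cached copy was taken.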

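What each check reports

A crawl check compares every discovered page with the result of the previous check and records a status for each page:

same: the page was found again and has not changed.
changed: the page was found again and has changed.
new: the page was discovered for the first time.
removed: a page found in the previous check was not found this time.
error: the page could not be checked.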
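To illustrate how these statuses might be consumed, here is a small sketch that tallies them and picks out the pages worth reviewing. The list of dictionaries with url and status keys is an assumed shape for illustration; the actual check result returned by the API may be structured differently.

```python
from collections import Counter

# Hypothetical per-page results for one check; the real response shape may differ.
pages = [
    {"url": "https://example.com/docs", "status": "same"},
    {"url": "https://example.com/docs/auth", "status": "changed"},
    {"url": "https://example.com/docs/webhooks", "status": "new"},
    {"url": "https://example.com/docs/legacy", "status": "removed"},
    {"url": "https://example.com/docs/slow", "status": "error"},
]


def summarize(pages):
    """Count pages per status and list the URLs whose content set changed."""
    counts = Counter(page["status"] for page in pages)
    needs_attention = [
        page["url"]
        for page in pages
        if page["status"] in {"changed", "new", "removed"}
    ]
    return counts, needs_attention


counts, needs_attention = summarize(pages)
print(dict(counts))
print(needs_attention)
```

Pages with the error status are counted but kept out of needs_attention here, since a failed check says nothing about whether the page changed.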

To alert on specific structured fields within crawled pages, add the changeTracking format to scrapeOptions. See change tracking.

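Shared configuration

Schedule: cron or a natural-language interval, with a minimum of 5 minutes.
Goals and judging: alert only on meaningful changes.
Notifications: delivered by webhook and email.
Check results: review each check and the per-page diffs.
Billing: each check costs 1 credit per discovered page, plus optional judging.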
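As a rough worked example of the billing rule, the sketch below estimates an upper bound on page credits per day for the monitor created earlier (limit of 100 pages, checks every 15 minutes). max_daily_page_credits is a hypothetical helper, and the cost of optional judging is not included because this page does not state it.

```python
def max_daily_page_credits(page_limit: int, checks_per_day: int) -> int:
    """Upper bound on page credits per day: 1 credit per discovered page per check."""
    return page_limit * checks_per_day


# The cron "7-59/15 * * * *" fires 4 times per hour, every hour.
checks_per_day = 4 * 24

print(checks_per_day)  # 96
print(max_daily_page_credits(100, checks_per_day))  # 9600
```

The real figure is lower whenever a crawl discovers fewer pages than the limit allows.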
