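Firecrawl monitors run scrapes or crawls on a schedule and compare each result with the last retained snapshot. Use monitors to track product pages, docs, blogs, changelogs, competitor sites, or any page where a change matters.
Each check records page-level results as same, new, changed, removed, or error. You can receive a webhook as each monitored page finishes processing, a single webhook after each check completes, an email summary when changes or errors occur, or any combination of these.
Monitors rely on retained snapshots and diffs, so they are not available to teams with zero data retention enabled.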

Create a monitor

Create a scrape monitor for one or more specific URLs:
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

monitor = firecrawl.create_monitor(
    name="Hacker News AI monitor",
    schedule={"text": "every 30 minutes", "timezone": "UTC"},
    goal=(
        "Alert when a new Hacker News story related to AI enters the top 10. "
        "Ignore changes to stories that are not about AI. "
        "Do not alert on changes outside the top 10."
    ),
    targets=[
        {
            "type": "scrape",
            "urls": ["https://news.ycombinator.com"],
        }
    ],
    notification={
        "email": {
            "enabled": True,
            "recipients": ["alerts@example.com"],
            "includeDiffs": True,
        }
    },
)

print(monitor.id)
Create a crawl monitor to diff every page discovered during the crawl on each check:
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

monitor = firecrawl.create_monitor(
    name="Docs monitor",
    schedule={"cron": "7-59/15 * * * *", "timezone": "UTC"},
    goal="Notify me when docs pages add, remove, or materially change API behavior",
    targets=[
        {
            "type": "crawl",
            "url": "https://example.com/docs",
            "crawlOptions": {
                "limit": 100,
                "maxDiscoveryDepth": 3,
            },
        }
    ],
    webhook={
        "url": "https://example.com/webhooks/firecrawl",
        "events": ["monitor.page", "monitor.check.completed"],
    },
)

print(monitor.id)
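Each create call returns the newly created monitor, including the normalized cron, the computed nextRunAt, and estimatedCreditsPerMonth. When judging is enabled, estimatedCreditsPerMonth is an upper-bound estimate, because judge credits are billed only for pages that actually changed and were judged: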
Response
{
  "success": true,
  "data": {
    "id": "019df960-06e7-7383-9d89-82c0113dc31a",
    "name": "Hacker News AI monitor",
    "status": "active",
    "schedule": {
      "cron": "*/30 * * * *",
      "timezone": "UTC"
    },
    "nextRunAt": "2026-05-17T16:00:00.000Z",
    "lastRunAt": null,
    "currentCheckId": null,
    "goal": "Alert when a new Hacker News story related to AI enters the top 10. Ignore changes to stories that are not about AI. Do not alert on changes outside the top 10.",
    "judgeEnabled": true,
    "targets": [
      {
        "id": "019df960-09bb-7c11-8001-1f12f50ab1c2",
        "type": "scrape",
        "urls": ["https://news.ycombinator.com"]
      }
    ],
    "webhook": null,
    "notification": {
      "email": {
        "enabled": true,
        "recipients": ["alerts@example.com"],
        "includeDiffs": true
      }
    },
    "retentionDays": 30,
    "estimatedCreditsPerMonth": 2880,
    "lastCheckSummary": null,
    "createdAt": "2026-05-17T15:30:00.000Z",
    "updatedAt": "2026-05-17T15:30:00.000Z"
  }
}
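The estimatedCreditsPerMonth value in the response above is consistent with simple arithmetic. The sketch below is illustrative only: it assumes a 30-day month, 1 credit per scrape, and, as the worst case, 1 judge credit for every page on every check. The function name and its defaults are hypothetical and are not taken from Firecrawl's billing logic.

```python
def estimate_monthly_credits(interval_minutes, pages_per_check, judge_enabled,
                             credits_per_scrape=1, days_per_month=30):
    """Upper-bound monthly credit estimate for a monitor (illustrative only)."""
    checks_per_month = (24 * 60 // interval_minutes) * days_per_month
    # Worst case: every page changes on every check and is judged (1 credit each).
    per_page = credits_per_scrape + (1 if judge_enabled else 0)
    return checks_per_month * pages_per_check * per_page


# One URL, checked every 30 minutes, with judging enabled.
print(estimate_monthly_credits(30, 1, True))  # 2880
```

Under these assumptions, 48 checks per day over 30 days gives 1,440 checks, and 2 credits per check gives 2,880, which matches the sample response.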
You can also create a monitor with the Firecrawl CLI:
CLI
firecrawl monitor create --name "Hacker News AI" \
  --schedule "every 30 minutes" \
  --goal "Alert when a new Hacker News story related to AI enters the top 10. Ignore changes to stories that are not about AI. Do not alert on changes outside the top 10." \
  --page https://news.ycombinator.com

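Goals and judging

To be alerted only on meaningful changes, add a plain-language goal. If you set goal but omit judgeEnabled, Firecrawl enables judging automatically. The judge runs on changed pages and returns a judgment containing meaningful, confidence, reason, and meaningfulChanges. To save a goal without judging changes yet, set judgeEnabled: false. The judge runs only when the monitor has judgeEnabled set and a non-empty goal.
Every check is billed for the underlying scrape or crawl as usual. With judging enabled, the judge charges 1 additional credit for each changed page it evaluates. Checks with no changed pages consume no judge credits.
A good goal is short and explicit: state what should trigger an alert, restate any scope limits (such as top N items, price, role type, company, region, topic, status, or entity), and include exclusions only when they are part of the goal's intent. If the goal is broad, keep it broad; for example, "any change" should not be paired with noise filters that would hide changes.
For example, a monitor with this goal:
Alert when a new Hacker News story related to AI enters the top 10. Ignore changes to stories that are not about AI. Do not alert on changes outside the top 10.
may produce a monitor.page webhook like the following when a matching story enters the monitored range: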
monitor.page
{
  "success": true,
  "type": "monitor.page",
  "id": "019df960-5f2a-75fb-a98b-bd2d32ca67d4",
  "webhookId": "f1e2d3c4-0000-0000-0000-000000000000",
  "data": [
    {
      "monitorId": "019df960-06e7-7383-9d89-82c0113dc31a",
      "checkId": "019df960-5f2a-75fb-a98b-bd2d32ca67d4",
      "url": "https://news.ycombinator.com",
      "status": "changed",
      "previousScrapeId": "019df94f-82c3-7e41-81f0-00c72b2d9c52",
      "currentScrapeId": "019df960-73ee-7ac2-97a9-fb0e442c21f1",
      "error": null,
      "isMeaningful": true,
      "judgment": {
        "meaningful": true,
        "confidence": "high",
        "reason": "A new AI-related story entered the Hacker News top 10.",
        "meaningfulChanges": [
          {
            "type": "added",
            "after": "4. Show HN: Open-source AI coding assistant",
            "reason": "This is a new AI-related story inside the top 10."
          }
        ]
      },
      "diff": {
        "text": "--- previous\n+++ current\n@@ -1,5 +1,6 @@\n # Hacker News\n 1. Database internals for beginners\n 2. A new approach to CSS\n 3. Building reliable queues\n+4. Show HN: Open-source AI coding assistant\n"
      }
    }
  ],
  "metadata": {
    "environment": "production"
  }
}
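A receiver usually wants only the changes the judge marked as meaningful. The following minimal sketch pulls them out of a monitor.page payload shaped like the example above. The function name meaningful_changes is hypothetical, and the sample event is a trimmed copy of that payload rather than real output.

```python
def meaningful_changes(event):
    """Collect (url, change) pairs from a monitor.page webhook payload."""
    if event.get("type") != "monitor.page":
        return []
    found = []
    for page in event.get("data", []):
        if not page.get("isMeaningful"):
            continue  # unjudged, or judged as noise
        judgment = page.get("judgment") or {}
        for change in judgment.get("meaningfulChanges", []):
            found.append((page["url"], change))
    return found


# Trimmed-down payload shaped like the example above.
event = {
    "type": "monitor.page",
    "data": [
        {
            "url": "https://news.ycombinator.com",
            "status": "changed",
            "isMeaningful": True,
            "judgment": {
                "meaningful": True,
                "confidence": "high",
                "reason": "A new AI-related story entered the Hacker News top 10.",
                "meaningfulChanges": [
                    {"type": "added", "after": "4. Show HN: Open-source AI coding assistant"}
                ],
            },
        }
    ],
}

for url, change in meaningful_changes(event):
    print(url, change["type"], change.get("after"))
```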

Schedules

A schedule can be set with a cron expression or with simple natural-language text.
{
  "schedule": {
    "cron": "*/30 * * * *",
    "timezone": "UTC"
  }
}
Supported natural-language examples:
  • every 30 minutes
  • every 15 minutes starting at :07
  • hourly
  • every 2 hours
  • daily
  • daily at 9:00
  • daily at 9am
  • daily at 5:30 PM
  • weekly
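The minimum interval is 15 minutes. API responses always return the normalized cron expression. For text schedules, timezone controls when phrases such as daily at 9am run. Text schedules are staggered by monitor ID before being converted to cron, so multiple monitors do not all run at the same moment.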
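To see how a cron minute field relates to the 15-minute minimum, the sketch below expands fields such as */30 or 7-59/15 into the minutes at which a check would start and measures the smallest gap between runs. It is a simplified illustration, not Firecrawl's parser: it handles only the minute field and assumes the hour field is *. Both function names are hypothetical.

```python
def expand_minute_field(field):
    """Expand a cron minute field such as '*/30' or '7-59/15' into minutes."""
    minutes = set()
    for part in field.split(","):
        rng, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if rng == "*":
            start, end = 0, 59
        elif "-" in rng:
            low, high = rng.split("-")
            start, end = int(low), int(high)
        else:
            start = int(rng)
            # A bare value with a step runs to the end of the hour.
            end = 59 if step_text else start
        minutes.update(range(start, end + 1, step))
    return sorted(minutes)


def min_gap_minutes(minutes):
    """Smallest gap between consecutive runs, wrapping around the hour."""
    if len(minutes) == 1:
        return 60
    gaps = [later - earlier for earlier, later in zip(minutes, minutes[1:])]
    gaps.append(minutes[0] + 60 - minutes[-1])
    return min(gaps)


runs = expand_minute_field("7-59/15")
print(runs)                   # [7, 22, 37, 52]
print(min_gap_minutes(runs))  # 15
```

The 7-59/15 expression corresponds to the text schedule "every 15 minutes starting at :07": four runs per hour, each 15 minutes apart.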

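Targets

Monitors support two target types:
  • scrape: runs one scrape for each URL in urls.
  • crawl: runs a full crawl of url on each check, then diffs every discovered page.
Each monitor accepts 1 to 50 targets. retentionDays defaults to 30 and can be set as high as 365. A target's scrape options are passed through to the underlying scrape job. Monitor-triggered scrapes default maxAge to 0, so every check performs a fresh scrape unless you explicitly set a different maxAge.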
Scrape target
{
  "type": "scrape",
  "urls": ["https://example.com/pricing"],
  "scrapeOptions": {
    "formats": ["markdown"],
    "maxAge": 0
  }
}
For crawl targets, use crawlOptions to control crawl behavior and scrapeOptions to control how each page is scraped:
Crawl target
{
  "type": "crawl",
  "url": "https://example.com/docs",
  "crawlOptions": {
    "limit": 100,
    "includePaths": ["/docs"]
  },
  "scrapeOptions": {
    "formats": ["markdown"]
  }
}
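The limits described in this section (1 to 50 targets, retentionDays up to 365) can be checked on the client before calling the API. The helper below is a hypothetical pre-flight check, not part of the Firecrawl SDK. The lower bound of 1 for retentionDays is an assumption; the API remains the authority on what it accepts.

```python
MAX_TARGETS = 50
MAX_RETENTION_DAYS = 365


def validate_monitor_config(targets, retention_days=30):
    """Raise ValueError if the config breaks a limit documented above."""
    if not 1 <= len(targets) <= MAX_TARGETS:
        raise ValueError("a monitor accepts 1 to 50 targets")
    # Assumption: retentionDays must be at least 1.
    if not 1 <= retention_days <= MAX_RETENTION_DAYS:
        raise ValueError("retentionDays must be between 1 and 365")
    for target in targets:
        kind = target.get("type")
        if kind == "scrape":
            if not target.get("urls"):
                raise ValueError("scrape targets need a non-empty urls list")
        elif kind == "crawl":
            if not target.get("url"):
                raise ValueError("crawl targets need a url")
        else:
            raise ValueError(f"unknown target type: {kind!r}")
    return True


targets = [
    {"type": "scrape", "urls": ["https://example.com/pricing"]},
    {"type": "crawl", "url": "https://example.com/docs", "crawlOptions": {"limit": 100}},
]
print(validate_monitor_config(targets, retention_days=30))  # True
```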

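Change tracking

By default, Firecrawl compares the markdown of each page and reports same, changed, new, removed, or error. To detect changes in specific structured fields (such as prices, titles, stock-status flags, or items in a list), add the changeTracking format to the target's scrapeOptions with modes: ["json"] to enable JSON-mode change tracking.

Markdown mode (default)

When scrapeOptions.formats is just ["markdown"], every changed page in the check response carries a unified text diff plus a parseDiff-style AST: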
Markdown-mode diff
{
  "diff": {
    "text": "--- previous\n+++ current\n@@ -1,3 +1,3 @@\n # Pricing\n-Starter — $19/mo\n+Starter — $24/mo\n",
    "json": {
      "files": [
        {
          "from": "previous",
          "to": "current",
          "chunks": [
            {
              "content": "@@ -1,3 +1,3 @@",
              "changes": []
            }
          ]
        }
      ]
    }
  }
}
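If you only need the added and removed lines, the diff.text string can be read directly without the AST. This minimal sketch assumes the text follows the unified diff layout shown above (a ---/+++ header followed by @@ hunks). The function name changed_lines and the sample diff are illustrative.

```python
def changed_lines(diff_text):
    """Split a unified diff into (added, removed) line lists."""
    added, removed = [], []
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("@@"):
            in_hunk = True  # hunk header; content lines follow
            continue
        if not in_hunk:
            continue  # skip the ---/+++ file header
        if line.startswith("+"):
            added.append(line[1:])
        elif line.startswith("-"):
            removed.append(line[1:])
    return added, removed


diff_text = (
    "--- previous\n+++ current\n@@ -1,3 +1,3 @@\n"
    " # Pricing\n-Starter plan: $19/mo\n+Starter plan: $24/mo\n"
)
added, removed = changed_lines(diff_text)
print("added:", added)      # added: ['Starter plan: $24/mo']
print("removed:", removed)  # removed: ['Starter plan: $19/mo']
```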

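JSON mode

Pass the changeTracking format with modes: ["json"], along with a JSON schema (or a prompt describing the fields you care about). On each check, Firecrawl extracts that JSON and emits a per-field diff keyed by field path, plus a snapshot.json containing the full current extraction, so consumers do not need to refetch the underlying scrape.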
from firecrawl import Firecrawl
from pydantic import BaseModel
from typing import List

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")


class Plan(BaseModel):
    name: str
    price: str
    features: List[str]


class Pricing(BaseModel):
    plans: List[Plan]


monitor = firecrawl.create_monitor(
    name="Pricing monitor",
    schedule={"text": "hourly", "timezone": "UTC"},
    goal="Notify me when a pricing tier, price, or headline feature changes",
    targets=[
        {
            "type": "scrape",
            "urls": ["https://example.com/pricing"],
            "scrapeOptions": {
                "formats": [
                    {
                        "type": "changeTracking",
                        "modes": ["json"],
                        "prompt": "Extract pricing tiers and headline features for each plan.",
                        "schema": Pricing.model_json_schema(),
                    }
                ]
            },
        }
    ],
    notification={
        "email": {
            "enabled": True,
            "recipients": ["alerts@example.com"],
            "includeDiffs": True,
        }
    },
)

print(monitor.id)
The diff payload looks like this; keys are JSON paths into the extraction, and each value is a {previous, current} pair:
JSON-mode diff
{
  "diff": {
    "json": {
      "plans[0].price": {
        "previous": "$19/mo",
        "current": "$24/mo"
      },
      "plans[1].features[2]": {
        "previous": "10 GB storage",
        "current": "25 GB storage"
      }
    }
  },
  "snapshot": {
    "json": {
      "plans": [
        {
          "name": "Starter",
          "price": "$24/mo",
          "features": ["Up to 3 users", "Basic analytics", "Email support"]
        },
        {
          "name": "Pro",
          "price": "$49/mo",
          "features": ["Unlimited users", "Advanced analytics", "25 GB storage"]
        }
      ]
    }
  }
}
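A JSON-mode monitor reports same when none of the tracked fields changed, even if the surrounding markdown did, unless you also enable git-diff (see mixed mode below). The diff covers only the fields defined in the schema.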
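To make the path-keyed format concrete, the sketch below computes a diff of the same shape from two extracted snapshots. It illustrates the output format only and is not Firecrawl's diffing algorithm; the flatten and field_diff helpers are hypothetical.

```python
def flatten(value, path=""):
    """Flatten nested dicts and lists into {"a.b[0].c": leaf} pairs."""
    if isinstance(value, dict):
        items = {}
        for key, child in value.items():
            items.update(flatten(child, f"{path}.{key}" if path else key))
        return items
    if isinstance(value, list):
        items = {}
        for index, child in enumerate(value):
            items.update(flatten(child, f"{path}[{index}]"))
        return items
    return {path: value}


def field_diff(previous, current):
    """Return {path: {"previous": ..., "current": ...}} for changed leaves."""
    before, after = flatten(previous), flatten(current)
    diff = {}
    for path in sorted(before.keys() | after.keys()):
        if before.get(path) != after.get(path):
            diff[path] = {"previous": before.get(path), "current": after.get(path)}
    return diff


previous = {"plans": [{"name": "Starter", "price": "$19/mo"}]}
current = {"plans": [{"name": "Starter", "price": "$24/mo"}]}
print(field_diff(previous, current))
# {'plans[0].price': {'previous': '$19/mo', 'current': '$24/mo'}}
```

An empty result from field_diff corresponds to the same status described above: nothing in the tracked fields changed.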

Mixed mode (JSON + git-diff)

If you need both result forms, the structured per-field diff and the raw markdown unified diff, pass both modes:
Mixed target (JSON + git-diff)
{
  "type": "scrape",
  "urls": ["https://example.com/pricing"],
  "scrapeOptions": {
    "formats": [
      {
        "type": "changeTracking",
        "modes": ["json", "git-diff"],
        "prompt": "Extract pricing tiers and headline features for each plan.",
        "schema": {
          "type": "object",
          "properties": {
            "plans": {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {
                  "name": { "type": "string" },
                  "price": { "type": "string" }
                }
              }
            }
          }
        }
      }
    ]
  }
}
The check response then includes both diff.text (the markdown sidecar) and diff.json (the per-field diff), along with the extracted snapshot.json:
Mixed-mode diff (JSON + git-diff)
{
  "diff": {
    "text": "--- previous\n+++ current\n@@ -1,3 +1,3 @@\n # Pricing\n-Starter — $19/mo\n+Starter — $24/mo\n",
    "json": {
      "plans[0].price": {
        "previous": "$19/mo",
        "current": "$24/mo"
      }
    }
  },
  "snapshot": {
    "json": {
      "plans": [
        { "name": "Starter", "price": "$24/mo" },
        { "name": "Pro", "price": "$49/mo" }
      ]
    }
  }
}
A mixed-mode page reports changed when either result form changes.

Notifications

Webhooks

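When a monitor has a webhook configured, Firecrawl can send two monitor events:
  • monitor.page: sent after each monitored scrape finishes in the scrape worker.
  • monitor.check.completed: sent after the full check has been aggregated. It includes the check status and summary counts. For page-level results, use monitor.page events or the monitor check API.
If meaningful-change judging ran on a changed page, monitor.page includes isMeaningful and judgment.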
Webhook config
{
  "webhook": {
    "url": "https://example.com/webhooks/firecrawl",
    "headers": {
      "Authorization": "Bearer your-secret"
    },
    "metadata": {
      "environment": "production"
    },
    "events": ["monitor.page", "monitor.check.completed"]
  }
}
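A receiving endpoint can use the configured Authorization header to reject requests that do not carry the shared secret. The sketch below assumes the headers from the webhook config are sent with every delivery, which the config suggests but which you should confirm against real traffic. The is_authorized helper is hypothetical and framework-agnostic: it takes any mapping of header names to values.

```python
import hmac

# Must match the value configured under webhook.headers.Authorization.
EXPECTED_AUTH = "Bearer your-secret"


def is_authorized(headers):
    """Return True if the request carries the expected Authorization header."""
    supplied = ""
    for name, value in headers.items():
        if name.lower() == "authorization":  # header names are case-insensitive
            supplied = value
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(supplied.encode(), EXPECTED_AUTH.encode())


print(is_authorized({"Authorization": "Bearer your-secret"}))  # True
print(is_authorized({"Authorization": "Bearer wrong"}))        # False
```

In practice, load the secret from an environment variable or secret store rather than hard-coding it.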
monitor.page payload:
monitor.page
{
  "success": true,
  "type": "monitor.page",
  "id": "019df960-5f2a-75fb-a98b-bd2d32ca67d4",
  "webhookId": "f1e2d3c4-0000-0000-0000-000000000000",
  "data": [
    {
      "monitorId": "019df960-06e7-7383-9d89-82c0113dc31a",
      "checkId": "019df960-5f2a-75fb-a98b-bd2d32ca67d4",
      "url": "https://example.com/blog",
      "status": "changed",
      "previousScrapeId": "019df94f-82c3-7e41-81f0-00c72b2d9c52",
      "currentScrapeId": "019df960-73ee-7ac2-97a9-fb0e442c21f1",
      "error": null,
      "isMeaningful": true,
      "judgment": {
        "meaningful": true,
        "confidence": "high",
        "reason": "The page headline changed to announce a new release cadence.",
        "meaningfulChanges": [
          {
            "type": "changed",
            "before": "Welcome to our weekly update.",
            "after": "Welcome to our weekly update — now with daily releases!",
            "reason": "The headline changed in a way that matches the monitor goal."
          }
        ]
      },
      "diff": {
        "text": "--- previous\n+++ current\n@@ -1,3 +1,3 @@\n # Latest posts\n-Welcome to our weekly update.\n+Welcome to our weekly update — now with daily releases!\n"
      }
    }
  ],
  "metadata": {
    "environment": "production"
  }
}
monitor.check.completed payload:
monitor.check.completed
{
  "success": true,
  "type": "monitor.check.completed",
  "id": "019df960-5f2a-75fb-a98b-bd2d32ca67d4",
  "webhookId": "f1e2d3c4-0001-0000-0000-000000000000",
  "data": [
    {
      "monitorId": "019df960-06e7-7383-9d89-82c0113dc31a",
      "checkId": "019df960-5f2a-75fb-a98b-bd2d32ca67d4",
      "status": "completed",
      "summary": {
        "totalPages": 2,
        "same": 1,
        "changed": 1,
        "new": 0,
        "removed": 0,
        "error": 0
      }
    }
  ],
  "metadata": {
    "environment": "production"
  }
}
When a check completes with no page-level errors, success is true. For failed or partial checks it is false, and error contains the failure reason when available.
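A monitor.check.completed handler can use success and the summary counts to decide whether a check deserves follow-up, for example by fetching page-level results from the check API. The sketch below is one possible policy, not a prescribed one, and the needs_attention name is hypothetical.

```python
def needs_attention(event):
    """Return True if a completed check failed or saw any non-same pages."""
    if event.get("type") != "monitor.check.completed":
        return False
    if not event.get("success", False):
        return True  # failed or partial check
    for check in event.get("data", []):
        summary = check.get("summary", {})
        if any(summary.get(key, 0) > 0 for key in ("changed", "new", "removed", "error")):
            return True
    return False


# Trimmed-down payload shaped like the example above.
event = {
    "success": True,
    "type": "monitor.check.completed",
    "data": [
        {
            "status": "completed",
            "summary": {"totalPages": 2, "same": 1, "changed": 1,
                        "new": 0, "removed": 0, "error": 0},
        }
    ],
}
print(needs_attention(event))  # True
```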

Email

An email summary is sent only when a check has changed, new, removed, or errored pages.
Email config
{
  "notification": {
    "email": {
      "enabled": true,
      "recipients": ["alerts@example.com"],
      "includeDiffs": true
    }
  }
}
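When a monitor has a goal and judging enabled, the email summary lists meaningful changed pages first. If every changed page is judged to be noise and there are no new, removed, or errored pages, the email is suppressed.
If you omit recipients, Firecrawl sends the email to team members who are eligible to receive system alert emails.
You can configure up to 25 explicit recipients.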
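The send and suppress rules above can be restated as a small decision function. This sketch mirrors the documented behavior as read from this page; it is not Firecrawl's implementation. The should_send_email name is hypothetical, and treating a changed page with no judgment as worth sending is an assumption.

```python
NOTABLE = {"changed", "new", "removed", "error"}


def should_send_email(pages, judge_active):
    """Decide whether a check's page results warrant an email summary."""
    notable = [page for page in pages if page["status"] in NOTABLE]
    if not notable:
        return False  # nothing changed, appeared, disappeared, or failed
    if not judge_active:
        return True
    for page in notable:
        if page["status"] != "changed":
            return True  # new, removed, and errored pages always count
        # Assumption: only an explicit isMeaningful=False marks a page as noise.
        if page.get("isMeaningful") is not False:
            return True
    return False  # every changed page was judged to be noise


pages = [
    {"url": "https://example.com/a", "status": "changed", "isMeaningful": False},
    {"url": "https://example.com/b", "status": "same"},
]
print(should_send_email(pages, judge_active=True))   # False
print(should_send_email(pages, judge_active=False))  # True
```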

Viewing check results

Use GET /v2/monitor/{monitorId}/checks to list checks and GET /v2/monitor/{monitorId}/checks/{checkId} to view the details of one check. The SDKs handle pagination automatically by default.
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

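# IDs taken from the examples on this page; replace them with your own.
monitor_id = "019df960-06e7-7383-9d89-82c0113dc31a"
check_id = "019df960-5f2a-75fb-a98b-bd2d32ca67d4"

# Fetch one check, keeping only pages whose status is "changed".
check = firecrawl.get_monitor_check(monitor_id, check_id, limit=25, status="changed")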

for page in check.pages:
    print(page.url, page.status)

    if page.judgment:
        print(page.judgment.meaningful, page.judgment.reason)

    if page.diff and page.diff.text:
        print(page.diff.text)

    if page.snapshot and page.snapshot.json:
        print(page.snapshot.json)
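The check list can be filtered by check status: queued, running, completed, failed, partial, or skipped_overlap.
The check detail response includes estimatedCredits, actualCredits, summary counts, and a paginated pages array. estimatedCredits is the upper bound reserved for the check; actualCredits is what Firecrawl finally charges once it knows how many pages changed and needed judging. Use the top-level next URL to fetch the next page of results; pagination works the same way as for crawls. You can also filter by page status: same, new, changed, removed, or error. Every changed page includes inline diff data; pages from JSON-mode monitors also include a snapshot with the current extraction.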
Markdown-mode response
{
  "success": true,
  "next": "https://api.firecrawl.dev/v2/monitor/019df960-06e7-7383-9d89-82c0113dc31a/checks/019df960-5f2a-75fb-a98b-bd2d32ca67d4?skip=25&limit=25",
  "data": {
    "id": "019df960-5f2a-75fb-a98b-bd2d32ca67d4",
    "monitorId": "019df960-06e7-7383-9d89-82c0113dc31a",
    "status": "completed",
    "estimatedCredits": 2,
    "actualCredits": 2,
    "summary": {
      "totalPages": 1,
      "same": 0,
      "changed": 1,
      "new": 0,
      "removed": 0,
      "error": 0
    },
    "pages": [
      {
        "id": "019df960-7708-7c62-a5dc-6206f16ac122",
        "targetId": "019df960-09bb-7c11-8001-1f12f50ab1c2",
        "url": "https://example.com/blog",
        "status": "changed",
        "previousScrapeId": "019df94f-82c3-7e41-81f0-00c72b2d9c52",
        "currentScrapeId": "019df960-73ee-7ac2-97a9-fb0e442c21f1",
        "statusCode": 200,
        "error": null,
        "metadata": {
          "title": "Example Blog",
          "creditsUsed": 1
        },
        "judgment": {
          "meaningful": true,
          "confidence": "high",
          "reason": "The page headline changed to announce a new release cadence.",
          "meaningfulChanges": [
            {
              "type": "changed",
              "before": "Welcome to our weekly update.",
              "after": "Welcome to our weekly update — now with daily releases!",
              "reason": "The headline changed in a way that matches the monitor goal."
            }
          ]
        },
        "createdAt": "2026-05-17T15:35:00.000Z",
        "diff": {
          "text": "--- previous\n+++ current\n@@ -1,3 +1,3 @@\n # Latest posts\n-Welcome to our weekly update.\n+Welcome to our weekly update — now with daily releases!\n",
          "json": {
            "files": [
              {
                "from": "previous",
                "to": "current",
                "chunks": []
              }
            ]
          }
        }
      }
    ],
    "next": "https://api.firecrawl.dev/v2/monitor/019df960-06e7-7383-9d89-82c0113dc31a/checks/019df960-5f2a-75fb-a98b-bd2d32ca67d4?skip=25&limit=25"
  }
}
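If you call the REST endpoint directly instead of using an SDK, following the top-level next URL could look like the sketch below. The iter_check_pages helper and the injected fetch callable are hypothetical: fetch stands in for whatever HTTP client you use and must return the decoded JSON body. The fake responses and the api.example.test URLs exist only so the example runs without network access.

```python
def iter_check_pages(first_url, fetch):
    """Yield every page entry, following the top-level `next` URL."""
    url = first_url
    while url:
        body = fetch(url)
        yield from body["data"]["pages"]
        url = body.get("next")  # absent on the last page


# Stand-in for the API: maps a URL to the JSON body it would return.
FAKE_API = {
    "https://api.example.test/checks/1?limit=1": {
        "success": True,
        "next": "https://api.example.test/checks/1?skip=1&limit=1",
        "data": {"pages": [{"url": "https://example.com/a", "status": "changed"}]},
    },
    "https://api.example.test/checks/1?skip=1&limit=1": {
        "success": True,
        "data": {"pages": [{"url": "https://example.com/b", "status": "new"}]},
    },
}

pages = list(iter_check_pages("https://api.example.test/checks/1?limit=1",
                              FAKE_API.__getitem__))
for page in pages:
    print(page["url"], page["status"])
```

With a real client, fetch would issue an authenticated GET request and return the parsed response body.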

API reference