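Firecrawl monitors run a scrape or crawl on a schedule and compare each result with the previously retained snapshot. Use monitors to track product pages, documentation, blogs, changelogs, competitor sites, or any page whose changes matter to you.
Each check records a page-level result of same, new, changed, removed, or error. You can receive a webhook as each monitored page finishes processing, a single webhook after each check completes, an email summary when changes or errors occur, or any combination of these.
Monitoring depends on retained snapshots and diffing, so it is not available to teams with Zero Data Retention enabled.
Create a scrape monitor for one or more specific URLs: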
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")
monitor = firecrawl.create_monitor(
    name="Hacker News AI monitor",
    schedule={"text": "every 30 minutes", "timezone": "UTC"},
    goal=(
        "Alert when a new Hacker News story related to AI enters the top 10. "
        "Ignore changes to stories that are not about AI. "
        "Do not alert on changes outside the top 10."
    ),
    targets=[
        {
            "type": "scrape",
            "urls": ["https://news.ycombinator.com"],
        }
    ],
    notification={
        "email": {
            "enabled": True,
            "recipients": ["alerts@example.com"],
            "includeDiffs": True,
        }
    },
)
print(monitor.id)
Create a crawl monitor to diff every page discovered during the crawl on each check:
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")
monitor = firecrawl.create_monitor(
    name="Docs monitor",
    schedule={"cron": "7-59/15 * * * *", "timezone": "UTC"},
    goal="Notify me when docs pages add, remove, or materially change API behavior",
    targets=[
        {
            "type": "crawl",
            "url": "https://example.com/docs",
            "crawlOptions": {
                "limit": 100,
                "maxDiscoveryDepth": 3,
            },
        }
    ],
    webhook={
        "url": "https://example.com/webhooks/firecrawl",
        "events": ["monitor.page", "monitor.check.completed"],
    },
)
print(monitor.id)
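Every create call returns the newly created monitor, including the normalized cron, the computed nextRunAt, and estimatedCreditsPerMonth. When judging is enabled, estimatedCreditsPerMonth is an upper-bound estimate, because judge credits are only charged for pages that actually changed and were judged: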
{
"success": true,
"data": {
"id": "019df960-06e7-7383-9d89-82c0113dc31a",
"name": "Hacker News AI monitor",
"status": "active",
"schedule": {
"cron": "*/30 * * * *",
"timezone": "UTC"
},
"nextRunAt": "2026-05-17T16:00:00.000Z",
"lastRunAt": null,
"currentCheckId": null,
"goal": "Alert when a new Hacker News story related to AI enters the top 10. Ignore changes to stories that are not about AI. Do not alert on changes outside the top 10.",
"judgeEnabled": true,
"targets": [
{
"id": "019df960-09bb-7c11-8001-1f12f50ab1c2",
"type": "scrape",
"urls": ["https://news.ycombinator.com"]
}
],
"webhook": null,
"notification": {
"email": {
"enabled": true,
"recipients": ["alerts@example.com"],
"includeDiffs": true
}
},
"retentionDays": 30,
"estimatedCreditsPerMonth": 2880,
"lastCheckSummary": null,
"createdAt": "2026-05-17T15:30:00.000Z",
"updatedAt": "2026-05-17T15:30:00.000Z"
}
}
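The estimatedCreditsPerMonth value of 2880 in the response above is consistent with a simple upper-bound calculation. The sketch below reproduces that figure under assumptions that the documentation does not state: a 30-day month, 1 credit per scrape, and every page changing (and being judged) on every check. The function name and its parameters are hypothetical, not part of the SDK.

```python
def estimate_monthly_credits(
    interval_minutes: int,
    pages_per_check: int,
    judge_enabled: bool,
    credits_per_scrape: int = 1,  # assumption: a basic scrape costs 1 credit
    days_per_month: int = 30,     # assumption: a 30-day month
) -> int:
    """Upper-bound estimate: assumes every page changes on every check."""
    checks_per_month = (24 * 60 // interval_minutes) * days_per_month
    credits_per_page = credits_per_scrape + (1 if judge_enabled else 0)
    return checks_per_month * pages_per_check * credits_per_page


# One URL checked every 30 minutes with judging enabled.
print(estimate_monthly_credits(30, 1, judge_enabled=True))  # 2880
```

Because judge credits are only charged for pages that actually changed, real usage for a quiet page should land closer to the figure computed with judge_enabled=False.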
You can also create a monitor with the Firecrawl CLI:
firecrawl monitor create --name "Hacker News AI" \
--schedule "every 30 minutes" \
--goal "Alert when a new Hacker News story related to AI enters the top 10. Ignore changes to stories that are not about AI. Do not alert on changes outside the top 10." \
--page https://news.ycombinator.com
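If you only want alerts for meaningful changes, add a plain-language goal. If you set goal but omit judgeEnabled, Firecrawl enables judging automatically. The judge runs on changed pages and returns a judgment containing meaningful, confidence, reason, and meaningfulChanges.
To save a goal without judging changes yet, set judgeEnabled: false. The judge only runs when the monitor has judgeEnabled set and a non-empty goal.
Every check is billed for the underlying scrape or crawl as usual. With judging enabled, the judge charges 1 additional credit for each changed page it evaluates. Checks with no changed pages consume no judge credits.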
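The judging and billing rules above can be restated as two small functions. This is only a sketch of the documented behaviour, not Firecrawl's code: the function names are hypothetical, a whitespace-only goal is treated as empty (an assumption), and the judge is assumed to evaluate every changed page.

```python
from typing import Optional


def judge_will_run(goal: Optional[str], judge_enabled: Optional[bool] = None) -> bool:
    """Return True when the judge would run for a monitor with these settings."""
    has_goal = bool(goal and goal.strip())  # assumption: blank goals count as empty
    if judge_enabled is None:
        # judgeEnabled omitted: judging is enabled automatically when a goal is set.
        judge_enabled = has_goal
    return judge_enabled and has_goal


def check_credits(base_credits: int, changed_pages: int, judging: bool) -> int:
    """Base scrape or crawl cost plus 1 credit per changed page that is judged."""
    return base_credits + (changed_pages if judging else 0)


print(judge_will_run("Alert on price changes"))         # True
print(judge_will_run("Alert on price changes", False))  # False
print(check_credits(base_credits=5, changed_pages=2, judging=True))  # 7
```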
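A good goal is short and explicit: state what should trigger an alert, restate any scope limits (top N items, price, role type, company, region, topic, status, or entity), and include exclusions only when they are part of the intent. If the goal is broad, keep it broad; for example, "any change" should not be paired with noise filters that would hide changes.
For example, a monitor with this goal:
Alert when a new Hacker News story related to AI enters the top 10. Ignore changes to stories that are not about AI. Do not alert on changes outside the top 10.
might produce a monitor.page webhook like this when a matching story enters the monitored range: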
{
"success": true,
"type": "monitor.page",
"id": "019df960-5f2a-75fb-a98b-bd2d32ca67d4",
"webhookId": "f1e2d3c4-0000-0000-0000-000000000000",
"data": [
{
"monitorId": "019df960-06e7-7383-9d89-82c0113dc31a",
"checkId": "019df960-5f2a-75fb-a98b-bd2d32ca67d4",
"url": "https://news.ycombinator.com",
"status": "changed",
"previousScrapeId": "019df94f-82c3-7e41-81f0-00c72b2d9c52",
"currentScrapeId": "019df960-73ee-7ac2-97a9-fb0e442c21f1",
"error": null,
"isMeaningful": true,
"judgment": {
"meaningful": true,
"confidence": "high",
"reason": "A new AI-related story entered the Hacker News top 10.",
"meaningfulChanges": [
{
"type": "added",
"after": "4. Show HN: Open-source AI coding assistant",
"reason": "This is a new AI-related story inside the top 10."
}
]
},
"diff": {
"text": "--- previous\n+++ current\n@@ -1,5 +1,6 @@\n # Hacker News\n 1. Database internals for beginners\n 2. A new approach to CSS\n 3. Building reliable queues\n+4. Show HN: Open-source AI coding assistant\n"
}
}
],
"metadata": {
"environment": "production"
}
}
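A consumer of this event usually only cares about the changes the judge marked as meaningful. The sketch below pulls them out of a payload shaped like the one above; the function name is hypothetical and the sample dictionary is a trimmed copy of that payload.

```python
from typing import Any, Dict, List


def meaningful_changes(payload: Dict[str, Any]) -> List[str]:
    """Return one summary line per meaningful change in a monitor.page payload."""
    if payload.get("type") != "monitor.page":
        return []
    lines: List[str] = []
    for page in payload.get("data", []):
        if not page.get("isMeaningful"):
            continue  # page was unchanged, not judged, or judged as noise
        judgment = page.get("judgment") or {}
        for change in judgment.get("meaningfulChanges", []):
            lines.append(f"{page['url']}: {change['type']}: {change.get('after', '')}")
    return lines


sample = {
    "type": "monitor.page",
    "data": [
        {
            "url": "https://news.ycombinator.com",
            "status": "changed",
            "isMeaningful": True,
            "judgment": {
                "meaningful": True,
                "meaningfulChanges": [
                    {"type": "added", "after": "4. Show HN: Open-source AI coding assistant"}
                ],
            },
        }
    ],
}
print(meaningful_changes(sample))
```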
A schedule can be set with a cron expression or simple natural-language text.
{
"schedule": {
"cron": "*/30 * * * *",
"timezone": "UTC"
}
}
Supported natural-language examples:
every 30 minutes
every 15 minutes starting at :07
hourly
every 2 hours
daily
daily at 9:00
daily at 9am
daily at 5:30 PM
weekly
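The minimum interval is 15 minutes. API responses always return the normalized cron expression. For text schedules, timezone controls when phrases such as daily at 9am run. Text schedules are staggered by monitor ID before being converted to cron, so multiple monitors do not all run at the same moment.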
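If you build cron expressions yourself, a client-side pre-check against the 15-minute minimum can catch mistakes before the API call. The sketch below is an illustration, not how Firecrawl validates schedules: it inspects only the minute field, supports the `*`, `*/n`, `a-b`, `a-b/n`, single-value, and comma-list forms, and all function names are hypothetical. Because it ignores the other fields, the gap it reports is a lower bound on the real interval.

```python
from typing import List

MIN_INTERVAL_MINUTES = 15  # minimum interval stated in the documentation


def minute_values(field: str) -> List[int]:
    """Expand a cron minute field into the sorted minutes it matches."""
    values = set()
    for part in field.split(","):
        rng, _, step = part.partition("/")
        step_n = int(step) if step else 1
        if rng == "*":
            start, end = 0, 59
        elif "-" in rng:
            low, high = rng.split("-")
            start, end = int(low), int(high)
        else:
            start = end = int(rng)
        values.update(range(start, end + 1, step_n))
    return sorted(values)


def min_gap_minutes(cron: str) -> int:
    """Smallest gap between consecutive runs, assuming the job runs every hour."""
    minutes = minute_values(cron.split()[0])
    wrapped = minutes + [minutes[0] + 60]  # include the wrap into the next hour
    return min(b - a for a, b in zip(wrapped, wrapped[1:]))


def meets_minimum(cron: str) -> bool:
    return min_gap_minutes(cron) >= MIN_INTERVAL_MINUTES


print(meets_minimum("7-59/15 * * * *"))  # True
print(meets_minimum("*/10 * * * *"))     # False
```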
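Monitors support two target types:
scrape: runs one scrape for each URL in urls.
crawl: runs a full crawl of url on each check, then diffs every discovered page.
Each monitor accepts 1 to 50 targets. retentionDays defaults to 30 and can be set as high as 365.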
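The limits above are easy to check before sending a create request. The following is a minimal sketch of such a pre-flight check; the function name is hypothetical, the lower bound of 1 for retentionDays is an assumption, and the API remains the authority on what it accepts.

```python
from typing import Any, Dict, List

MAX_TARGETS = 50          # documented: 1 to 50 targets per monitor
MAX_RETENTION_DAYS = 365  # documented maximum for retentionDays


def validate_monitor(targets: List[Dict[str, Any]], retention_days: int = 30) -> List[str]:
    """Return a list of problems; an empty list means the basic limits are met."""
    errors: List[str] = []
    if not 1 <= len(targets) <= MAX_TARGETS:
        errors.append(f"expected 1 to {MAX_TARGETS} targets, got {len(targets)}")
    if not 1 <= retention_days <= MAX_RETENTION_DAYS:  # lower bound is an assumption
        errors.append(f"retentionDays must be between 1 and {MAX_RETENTION_DAYS}")
    for index, target in enumerate(targets):
        kind = target.get("type")
        if kind == "scrape" and not target.get("urls"):
            errors.append(f"target {index}: scrape targets need a non-empty urls list")
        elif kind == "crawl" and not target.get("url"):
            errors.append(f"target {index}: crawl targets need a url")
        elif kind not in ("scrape", "crawl"):
            errors.append(f"target {index}: unknown type {kind!r}")
    return errors


print(validate_monitor([{"type": "scrape", "urls": ["https://example.com/pricing"]}]))  # []
print(validate_monitor([{"type": "crawl"}], retention_days=400))
```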
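A target's scrape options are passed through to the underlying scrape job. Monitor-triggered scrapes default maxAge to 0, so every check performs a fresh scrape unless you explicitly set a different maxAge.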
{
"type": "scrape",
"urls": ["https://example.com/pricing"],
"scrapeOptions": {
"formats": ["markdown"],
"maxAge": 0
}
}
For crawl targets, use crawlOptions to control the crawl and scrapeOptions to control how each page is scraped:
{
"type": "crawl",
"url": "https://example.com/docs",
"crawlOptions": {
"limit": 100,
"includePaths": ["/docs"]
},
"scrapeOptions": {
"formats": ["markdown"]
}
}
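By default, Firecrawl diffs each page's markdown and reports same, changed, new, removed, or error. To detect changes in specific structured fields (such as prices, titles, stock-status flags, or items in a list), add the changeTracking format to the target's scrapeOptions with modes: ["json"] to enable JSON-mode change tracking.
When scrapeOptions.formats is just ["markdown"], every changed page in the check response carries a unified text diff plus a parseDiff-style AST: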
{
"diff": {
"text": "--- previous\n+++ current\n@@ -1,3 +1,3 @@\n # Pricing\n-Starter — $19/mo\n+Starter — $24/mo\n",
"json": {
"files": [
{
"from": "previous",
"to": "current",
"chunks": [
{
"content": "@@ -1,3 +1,3 @@",
"changes": []
}
]
}
]
}
}
}
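If you only need the added and removed lines, the unified diff in diff.text can be read directly without touching the AST. The sketch below assumes a single-file diff like the ones shown here, where the file headers come before the first hunk; the function name and the sample string are illustrative.

```python
from typing import List, Tuple


def changed_lines(diff_text: str) -> Tuple[List[str], List[str]]:
    """Split a single-file unified diff into (removed, added) line lists."""
    removed: List[str] = []
    added: List[str] = []
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("@@"):
            in_hunk = True  # hunk header; content lines follow
            continue
        if not in_hunk:
            continue  # skip the "--- previous" and "+++ current" file headers
        if line.startswith("-"):
            removed.append(line[1:])
        elif line.startswith("+"):
            added.append(line[1:])
    return removed, added


sample_diff = (
    "--- previous\n+++ current\n@@ -1,3 +1,3 @@\n"
    " # Pricing\n-Starter: $19/mo\n+Starter: $24/mo\n"
)
print(changed_lines(sample_diff))
```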
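Pass the changeTracking format with modes: ["json"], along with a JSON schema (or a prompt describing the fields you care about). Firecrawl extracts that JSON on every check and emits a per-field diff keyed by field path, plus a snapshot.json with the full current extraction, so consumers do not need to re-fetch the underlying scrape.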
from firecrawl import Firecrawl
from pydantic import BaseModel
from typing import List
firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")
# Schema describing the fields to extract and track on each check.
class Plan(BaseModel):
    name: str
    price: str
    features: List[str]
class Pricing(BaseModel):
    plans: List[Plan]
monitor = firecrawl.create_monitor(
    name="Pricing monitor",
    schedule={"text": "hourly", "timezone": "UTC"},
    goal="Notify me when a pricing tier, price, or headline feature changes",
    targets=[
        {
            "type": "scrape",
            "urls": ["https://example.com/pricing"],
            "scrapeOptions": {
                "formats": [
                    {
                        "type": "changeTracking",
                        "modes": ["json"],
                        "prompt": "Extract pricing tiers and headline features for each plan.",
                        "schema": Pricing.model_json_schema(),
                    }
                ]
            },
        }
    ],
    notification={
        "email": {
            "enabled": True,
            "recipients": ["alerts@example.com"],
            "includeDiffs": True,
        }
    },
)
print(monitor.id)
The diff payload looks like this. Keys are JSON paths into the extraction, and each value is a {previous, current} pair:
{
"diff": {
"json": {
"plans[0].price": {
"previous": "$19/mo",
"current": "$24/mo"
},
"plans[1].features[2]": {
"previous": "10 GB storage",
"current": "25 GB storage"
}
}
},
"snapshot": {
"json": {
"plans": [
{
"name": "Starter",
"price": "$24/mo",
"features": ["Up to 3 users", "Basic analytics", "Email support"]
},
{
"name": "Pro",
"price": "$49/mo",
"features": ["Unlimited users", "Advanced analytics", "25 GB storage"]
}
]
}
}
}
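To make the key format concrete, the sketch below shows one way a per-field diff with keys such as plans[0].price could be computed from two extractions. It is an illustration of the payload shape only, not Firecrawl's implementation, and the function names are hypothetical.

```python
from typing import Any, Dict, Iterator, Tuple


def flatten(value: Any, path: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (field path, leaf value) pairs, e.g. ("plans[0].price", "$24/mo")."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{path}.{key}" if path else key)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten(child, f"{path}[{index}]")
    else:
        yield path, value


def field_diff(previous: Any, current: Any) -> Dict[str, Dict[str, Any]]:
    """Return {path: {"previous": ..., "current": ...}} for every changed leaf."""
    before, after = dict(flatten(previous)), dict(flatten(current))
    return {
        path: {"previous": before.get(path), "current": after.get(path)}
        for path in sorted(before.keys() | after.keys())
        if before.get(path) != after.get(path)
    }


old = {"plans": [{"name": "Starter", "price": "$19/mo"}, {"name": "Pro", "price": "$49/mo"}]}
new = {"plans": [{"name": "Starter", "price": "$24/mo"}, {"name": "Pro", "price": "$49/mo"}]}
print(field_diff(old, new))
```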
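A JSON-mode monitor reports same when none of the tracked fields changed, even if the surrounding markdown did, unless you also enable git-diff (see mixed mode below). The diff only covers fields defined in the schema.
If you need both result shapes, a structured per-field diff and a unified diff of the raw markdown, pass both modes: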
Mixed target (JSON + git-diff)
{
"type": "scrape",
"urls": ["https://example.com/pricing"],
"scrapeOptions": {
"formats": [
{
"type": "changeTracking",
"modes": ["json", "git-diff"],
"prompt": "Extract pricing tiers and headline features for each plan.",
"schema": {
"type": "object",
"properties": {
"plans": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": { "type": "string" },
"price": { "type": "string" }
}
}
}
}
}
}
]
}
}
The check response then contains both diff.text (the markdown sidecar) and diff.json (the per-field diff), along with the extracted snapshot.json:
Mixed-mode diff (JSON + git-diff)
{
"diff": {
"text": "--- previous\n+++ current\n@@ -1,3 +1,3 @@\n # Pricing\n-Starter — $19/mo\n+Starter — $24/mo\n",
"json": {
"plans[0].price": {
"previous": "$19/mo",
"current": "$24/mo"
}
}
},
"snapshot": {
"json": {
"plans": [
{ "name": "Starter", "price": "$24/mo" },
{ "name": "Pro", "price": "$49/mo" }
]
}
}
}
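A mixed-mode page reports changed when either result shape changes.
When a monitor has a webhook configured, Firecrawl can send two monitor events:
monitor.page: sent after each monitored scrape finishes in the scrape worker.
monitor.check.completed: sent after the full check has been aggregated. It contains the check status and summary counts. For page-level results, use monitor.page events or the monitor checks API.
If the meaningful-change judge ran on a changed page, monitor.page includes isMeaningful and judgment.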
{
"webhook": {
"url": "https://example.com/webhooks/firecrawl",
"headers": {
"Authorization": "Bearer your-secret"
},
"metadata": {
"environment": "production"
},
"events": ["monitor.page", "monitor.check.completed"]
}
}
monitor.page payload:
{
"success": true,
"type": "monitor.page",
"id": "019df960-5f2a-75fb-a98b-bd2d32ca67d4",
"webhookId": "f1e2d3c4-0000-0000-0000-000000000000",
"data": [
{
"monitorId": "019df960-06e7-7383-9d89-82c0113dc31a",
"checkId": "019df960-5f2a-75fb-a98b-bd2d32ca67d4",
"url": "https://example.com/blog",
"status": "changed",
"previousScrapeId": "019df94f-82c3-7e41-81f0-00c72b2d9c52",
"currentScrapeId": "019df960-73ee-7ac2-97a9-fb0e442c21f1",
"error": null,
"isMeaningful": true,
"judgment": {
"meaningful": true,
"confidence": "high",
"reason": "The page headline changed to announce a new release cadence.",
"meaningfulChanges": [
{
"type": "changed",
"before": "Welcome to our weekly update.",
"after": "Welcome to our weekly update — now with daily releases!",
"reason": "The headline changed in a way that matches the monitor goal."
}
]
},
"diff": {
"text": "--- previous\n+++ current\n@@ -1,3 +1,3 @@\n # Latest posts\n-Welcome to our weekly update.\n+Welcome to our weekly update — now with daily releases!\n"
}
}
],
"metadata": {
"environment": "production"
}
}
monitor.check.completed payload:
{
"success": true,
"type": "monitor.check.completed",
"id": "019df960-5f2a-75fb-a98b-bd2d32ca67d4",
"webhookId": "f1e2d3c4-0001-0000-0000-000000000000",
"data": [
{
"monitorId": "019df960-06e7-7383-9d89-82c0113dc31a",
"checkId": "019df960-5f2a-75fb-a98b-bd2d32ca67d4",
"status": "completed",
"summary": {
"totalPages": 2,
"same": 1,
"changed": 1,
"new": 0,
"removed": 0,
"error": 0
}
}
],
"metadata": {
"environment": "production"
}
}
success is true when the check completes without page-level errors. It is false for failed or partial checks; error contains the failure reason when one is available.
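A webhook receiver typically checks the shared secret and then branches on the event type. The sketch below shows that routing logic without tying it to a web framework. It assumes the headers configured under webhook.headers are sent with every delivery and that your framework hands you headers as a mapping (real frameworks usually match header names case-insensitively); the function names and return strings are hypothetical.

```python
import hmac
from typing import Any, Dict, Mapping

EXPECTED_AUTH = "Bearer your-secret"  # same value as webhook.headers.Authorization


def is_authorized(headers: Mapping[str, str]) -> bool:
    """Compare the Authorization header to the secret in constant time."""
    supplied = headers.get("Authorization", "")
    return hmac.compare_digest(supplied.encode(), EXPECTED_AUTH.encode())


def route_event(headers: Mapping[str, str], payload: Dict[str, Any]) -> str:
    """Return a short description of how the event was handled."""
    if not is_authorized(headers):
        return "rejected"
    kind = payload.get("type")
    if kind == "monitor.page":
        return f"page event: {len(payload.get('data', []))} page(s)"
    if kind == "monitor.check.completed":
        summary = payload["data"][0]["summary"]
        return f"check completed: {summary['changed']} changed, {summary['error']} error(s)"
    return "ignored"


completed = {
    "type": "monitor.check.completed",
    "data": [{"status": "completed", "summary": {"totalPages": 2, "same": 1, "changed": 1,
                                                  "new": 0, "removed": 0, "error": 0}}],
}
print(route_event({"Authorization": "Bearer your-secret"}, completed))
```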
Email summaries are sent only when a check has changed, new, removed, or errored pages.
{
"notification": {
"email": {
"enabled": true,
"recipients": ["alerts@example.com"],
"includeDiffs": true
}
}
}
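When a monitor has a goal and judging enabled, the email summary leads with meaningfully changed pages. If every changed page is judged to be noise and there are no new, removed, or errored pages, the email is suppressed.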
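The send and suppress rules above can be written as a single predicate. This is a sketch of the documented behaviour only; the function name is hypothetical, and `judgments` is an assumed representation holding one boolean per judged changed page (or None when judging is not enabled).

```python
from typing import Dict, List, Optional


def should_send_email(summary: Dict[str, int], judgments: Optional[List[bool]] = None) -> bool:
    """Decide whether a check's email summary would be sent."""
    other_events = summary["new"] + summary["removed"] + summary["error"]
    if summary["changed"] + other_events == 0:
        return False  # nothing changed, appeared, disappeared, or failed
    if judgments is not None and other_events == 0 and not any(judgments):
        return False  # every changed page was judged to be noise
    return True


quiet = {"same": 2, "changed": 0, "new": 0, "removed": 0, "error": 0}
noisy = {"same": 1, "changed": 1, "new": 0, "removed": 0, "error": 0}
print(should_send_email(quiet))           # False
print(should_send_email(noisy))           # True
print(should_send_email(noisy, [False]))  # False
```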
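If you omit recipients, Firecrawl sends the email to team members who are eligible to receive system alert emails.
You can configure up to 25 explicit recipients.
Use GET /v2/monitor/{monitorId}/checks to list checks and GET /v2/monitor/{monitorId}/checks/{checkId} to inspect one check. The SDKs handle pagination automatically by default.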
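from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")
# Placeholder IDs copied from the example responses in this document; substitute your own.
monitor_id = "019df960-06e7-7383-9d89-82c0113dc31a"
check_id = "019df960-5f2a-75fb-a98b-bd2d32ca67d4"
# Fetch one check, keeping only pages whose status is "changed".
check = firecrawl.get_monitor_check(monitor_id, check_id, limit=25, status="changed")
for page in check.pages:
    print(page.url, page.status)
    if page.judgment:
        print(page.judgment.meaningful, page.judgment.reason)
    if page.diff and page.diff.text:
        print(page.diff.text)
    if page.snapshot and page.snapshot.json:
        print(page.snapshot.json)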
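The check list can be filtered by check status: queued, running, completed, failed, partial, or skipped_overlap.
The check detail response includes estimatedCredits, actualCredits, summary counts, and a paginated pages array. estimatedCredits is the upper bound reserved for the check; actualCredits is what Firecrawl finally charges once it knows how many pages changed and needed judging. Use the top-level next URL to fetch the next page of results; pagination works the same way as for crawls. You can also filter by page status: same, new, changed, removed, or error. Every changed page includes inline diff data, and pages from JSON-mode monitors also include a snapshot with the current extraction.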
{
"success": true,
"next": "https://api.firecrawl.dev/v2/monitor/019df960-06e7-7383-9d89-82c0113dc31a/checks/019df960-5f2a-75fb-a98b-bd2d32ca67d4?skip=25&limit=25",
"data": {
"id": "019df960-5f2a-75fb-a98b-bd2d32ca67d4",
"monitorId": "019df960-06e7-7383-9d89-82c0113dc31a",
"status": "completed",
"estimatedCredits": 2,
"actualCredits": 2,
"summary": {
"totalPages": 1,
"same": 0,
"changed": 1,
"new": 0,
"removed": 0,
"error": 0
},
"pages": [
{
"id": "019df960-7708-7c62-a5dc-6206f16ac122",
"targetId": "019df960-09bb-7c11-8001-1f12f50ab1c2",
"url": "https://example.com/blog",
"status": "changed",
"previousScrapeId": "019df94f-82c3-7e41-81f0-00c72b2d9c52",
"currentScrapeId": "019df960-73ee-7ac2-97a9-fb0e442c21f1",
"statusCode": 200,
"error": null,
"metadata": {
"title": "Example Blog",
"creditsUsed": 1
},
"judgment": {
"meaningful": true,
"confidence": "high",
"reason": "The page headline changed to announce a new release cadence.",
"meaningfulChanges": [
{
"type": "changed",
"before": "Welcome to our weekly update.",
"after": "Welcome to our weekly update — now with daily releases!",
"reason": "The headline changed in a way that matches the monitor goal."
}
]
},
"createdAt": "2026-05-17T15:35:00.000Z",
"diff": {
"text": "--- previous\n+++ current\n@@ -1,3 +1,3 @@\n # Latest posts\n-Welcome to our weekly update.\n+Welcome to our weekly update — now with daily releases!\n",
"json": {
"files": [
{
"from": "previous",
"to": "current",
"chunks": []
}
]
}
}
}
],
"next": "https://api.firecrawl.dev/v2/monitor/019df960-06e7-7383-9d89-82c0113dc31a/checks/019df960-5f2a-75fb-a98b-bd2d32ca67d4?skip=25&limit=25"
}
}
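When calling the REST endpoint directly instead of using an SDK, you follow the top-level next URL until it is absent. The sketch below shows that loop with the HTTP call injected as a function, so it stays independent of any HTTP library. `fetch` is a hypothetical callable that performs an authenticated GET and returns the decoded JSON body; the fake responses exist only to demonstrate the loop.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iter_check_pages(
    fetch: Callable[[str], Dict[str, Any]], first_url: str
) -> Iterator[Dict[str, Any]]:
    """Yield every page of a check by following the top-level next URL."""
    url: Optional[str] = first_url
    while url:
        body = fetch(url)
        yield from body["data"]["pages"]
        url = body.get("next")  # missing or null on the last page


# Fake two-page response set standing in for real API calls.
fake_responses = {
    "page-1": {"success": True, "next": "page-2", "data": {"pages": [{"url": "https://example.com/a"}]}},
    "page-2": {"success": True, "data": {"pages": [{"url": "https://example.com/b"}]}},
}
urls = [page["url"] for page in iter_check_pages(fake_responses.__getitem__, "page-1")]
print(urls)
```

In real use, `fetch` would wrap whatever HTTP client you already have and send your API key with each request.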
JSON-mode response
{
"success": true,
"data": {
"id": "019df960-5f2a-75fb-a98b-bd2d32ca67d4",
"monitorId": "019df960-06e7-7383-9d89-82c0113dc31a",
"status": "completed",
"estimatedCredits": 2,
"actualCredits": 2,
"summary": {
"totalPages": 1,
"same": 0,
"changed": 1,
"new": 0,
"removed": 0,
"error": 0
},
"pages": [
{
"id": "019df960-7708-7c62-a5dc-6206f16ac122",
"targetId": "019df960-09bb-7c11-8001-1f12f50ab1c2",
"url": "https://example.com/pricing",
"status": "changed",
"previousScrapeId": "019df94f-82c3-7e41-81f0-00c72b2d9c52",
"currentScrapeId": "019df960-73ee-7ac2-97a9-fb0e442c21f1",
"statusCode": 200,
"error": null,
"metadata": {
"title": "Pricing",
"creditsUsed": 1
},
"judgment": {
"meaningful": true,
"confidence": "high",
"reason": "The Starter plan price and Pro storage limit changed.",
"meaningfulChanges": [
{
"type": "changed",
"before": "$19/mo",
"after": "$24/mo",
"reason": "The Starter plan price changed."
},
{
"type": "changed",
"before": "10 GB storage",
"after": "25 GB storage",
"reason": "The Pro storage limit changed."
}
]
},
"createdAt": "2026-05-17T15:35:00.000Z",
"diff": {
"json": {
"plans[0].price": {
"previous": "$19/mo",
"current": "$24/mo"
},
"plans[1].features[2]": {
"previous": "10 GB storage",
"current": "25 GB storage"
}
}
},
"snapshot": {
"json": {
"plans": [
{
"name": "Starter",
"price": "$24/mo",
"features": ["Up to 3 users", "Basic analytics", "Email support"]
},
{
"name": "Pro",
"price": "$49/mo",
"features": ["Unlimited users", "Advanced analytics", "25 GB storage"]
}
]
}
}
}
]
}
}
Mixed-mode response (JSON + git-diff)
{
"success": true,
"data": {
"id": "019df960-5f2a-75fb-a98b-bd2d32ca67d4",
"monitorId": "019df960-06e7-7383-9d89-82c0113dc31a",
"status": "completed",
"estimatedCredits": 2,
"actualCredits": 2,
"summary": {
"totalPages": 1,
"same": 0,
"changed": 1,
"new": 0,
"removed": 0,
"error": 0
},
"pages": [
{
"id": "019df960-7708-7c62-a5dc-6206f16ac122",
"targetId": "019df960-09bb-7c11-8001-1f12f50ab1c2",
"url": "https://example.com/pricing",
"status": "changed",
"previousScrapeId": "019df94f-82c3-7e41-81f0-00c72b2d9c52",
"currentScrapeId": "019df960-73ee-7ac2-97a9-fb0e442c21f1",
"statusCode": 200,
"error": null,
"metadata": {
"title": "Pricing",
"creditsUsed": 1
},
"judgment": {
"meaningful": true,
"confidence": "high",
"reason": "The Starter plan price changed.",
"meaningfulChanges": [
{
"type": "changed",
"before": "Starter — $19/mo",
"after": "Starter — $24/mo",
"reason": "The Starter plan price changed."
}
]
},
"createdAt": "2026-05-17T15:35:00.000Z",
"diff": {
"text": "--- previous\n+++ current\n@@ -1,3 +1,3 @@\n # Pricing\n-Starter — $19/mo\n+Starter — $24/mo\n",
"json": {
"plans[0].price": {
"previous": "$19/mo",
"current": "$24/mo"
}
}
},
"snapshot": {
"json": {
"plans": [
{ "name": "Starter", "price": "$24/mo" },
{ "name": "Pro", "price": "$49/mo" }
]
}
}
}
]
}
}