Change tracking allows you to monitor and detect changes in web content over time. This feature is available in both the JavaScript and Python SDKs.
Overview
Change tracking enables you to:
- Detect if a webpage has changed since the last scrape
- View the specific changes between scrapes
- Get structured data about what has changed
- Control the visibility of changes
Using the changeTracking
format, you can monitor changes on a website and receive information about:
previousScrapeAt
: The timestamp of the previous scrape that the current page is being compared against (null
if no previous scrape)
changeStatus
: The result of the comparison between the two page versions
new
: This page did not exist or was not discovered before (usually has a null
previousScrapeAt
)
same
: This page’s content has not changed since the last scrape
changed
: This page’s content has changed since the last scrape
removed
: This page was removed since the last scrape
visibility
: The visibility of the current page/URL
visible
: This page is visible, meaning that its URL was discovered through an organic route (through links on other visible pages or the sitemap)
hidden
: This page is not visible, meaning it is still available on the web, but no longer discoverable via the sitemap or crawling the site. We can only identify invisible links if they had been visible, and captured, during a previous crawl or scrape
JavaScript SDK
Basic Usage
To use change tracking in the JavaScript SDK, include 'changeTracking'
in the formats array when scraping a URL:
const app = new FirecrawlApp({ apiKey: 'your-api-key' });
const result = await app.scrapeUrl('https://example.com', {
formats: ['markdown', 'changeTracking']
});
// Access change tracking data
console.log(result.changeTracking.changeStatus); // 'new', 'same', 'changed', or 'removed'
console.log(result.changeTracking.visibility); // 'visible' or 'hidden'
console.log(result.changeTracking.previousScrapeAt); // ISO timestamp of previous scrape
Example Response:
{
"url": "https://firecrawl.dev",
"markdown": "# AI Agents for great customer experiences\n\nChatbots that delight your users...",
"changeTracking": {
"previousScrapeAt": "2025-04-10T12:00:00Z",
"changeStatus": "changed",
"visibility": "visible"
}
}
Advanced Options
You can configure change tracking with additional options:
const result = await app.scrapeUrl('https://example.com', {
formats: ['markdown', 'changeTracking'],
changeTrackingOptions: {
modes: ['git-diff', 'json'], // Enable specific change tracking modes
schema: {
type: 'object',
properties: {
title: { type: 'string' },
content: { type: 'string' }
}
}, // Schema for structured JSON comparison
prompt: 'Custom prompt for extraction' // Optional custom prompt
}
});
// Access git-diff format changes
if (result.changeTracking.diff) {
console.log(result.changeTracking.diff.text); // Git-style diff text
console.log(result.changeTracking.diff.json); // Structured diff data
}
// Access JSON comparison changes
if (result.changeTracking.json) {
console.log(result.changeTracking.json.title.previous); // Previous title
console.log(result.changeTracking.json.title.current); // Current title
}
Git-Diff Results Example:
**April, 13 2025**
-**05:55:05 PM**
+**05:58:57 PM**
...
JSON Comparison Results Example:
{
"time": {
"previous": "2025-04-13T17:54:32Z",
"current": "2025-04-13T17:55:05Z"
}
}
TypeScript Interface
The change tracking feature includes the following TypeScript interfaces:
interface FirecrawlDocument {
// ... other properties
changeTracking?: {
previousScrapeAt: string | null;
changeStatus: "new" | "same" | "changed" | "removed";
visibility: "visible" | "hidden";
diff?: {
text: string;
json: {
files: Array<{
from: string | null;
to: string | null;
chunks: Array<{
content: string;
changes: Array<{
type: string;
normal?: boolean;
ln?: number;
ln1?: number;
ln2?: number;
content: string;
}>;
}>;
}>;
};
};
json?: any;
};
}
interface ScrapeParams {
// ... other properties
changeTrackingOptions?: {
prompt?: string;
schema?: any;
modes?: ("json" | "git-diff")[];
}
}
Python SDK
Basic Usage
To use change tracking in the Python SDK, include 'changeTracking'
in the formats list when scraping a URL:
from firecrawl import FirecrawlApp, ChangeTrackingOptions
app = FirecrawlApp(api_key='your-api-key')
result = app.scrape_url('https://example.com',
formats=['markdown', 'changeTracking']
)
# Access change tracking data
print("Change Status:", result.changeTracking.changeStatus) # 'new', 'same', 'changed', or 'removed'
print("Visibility:", result.changeTracking.visibility) # 'visible' or 'hidden'
print("Previous Scrape At:", result.changeTracking.previousScrapeAt) # ISO timestamp of previous scrape
Advanced Options
You can configure change tracking with additional options:
result = app.scrape_url('https://example.com',
formats=['markdown', 'changeTracking'],
change_tracking_options=ChangeTrackingOptions(
modes=['git-diff', 'json'], # Enable specific change tracking modes
schema={
'type': 'object',
'properties': {
'title': {'type': 'string'},
'content': {'type': 'string'}
}
}, # Schema for structured JSON comparison
prompt='Custom prompt for extraction' # Optional custom prompt
)
)
# Access git-diff format changes
if 'diff' in result.changeTracking:
print(result.changeTracking.diff.text) # Git-style diff text
print(result.changeTracking.diff.json) # Structured diff data
# Access JSON comparison changes
if 'json' in result.changeTracking:
print(result.changeTracking.json.title.previous) # Previous title
print(result.changeTracking.json.title.current) # Current title
Python Data Model
The change tracking feature includes the following Python data model:
class ChangeTrackingData(pydantic.BaseModel):
"""
Data for the change tracking format.
"""
previousScrapeAt: Optional[str] = None
changeStatus: str # "new" | "same" | "changed" | "removed"
visibility: str # "visible" | "hidden"
diff: Optional[Dict[str, Any]] = None
json: Optional[Any] = None
Change Tracking Modes
The change tracking feature supports two modes:
Git-Diff Mode
The git-diff
mode provides a traditional diff format similar to Git’s output. It shows line-by-line changes with additions and deletions marked.
Example output:
@@ -1,1 +1,1 @@
-old content
+new content
The structured JSON representation of the diff includes:
files
: Array of changed files (in web context, typically just one)
chunks
: Sections of changes within a file
changes
: Individual line changes with type (add, delete, normal)
JSON Mode
The json
mode provides a structured comparison of specific fields extracted from the content. This is useful for tracking changes in specific data points rather than the entire content.
Example output:
{
"title": {
"previous": "Old Title",
"current": "New Title"
},
"price": {
"previous": "$19.99",
"current": "$24.99"
}
}
To use JSON mode, you need to provide a schema that defines the fields to extract and compare.
Important Facts
Here are some important details to know when using the change tracking feature:
-
Comparison Method: Scrapes are always compared via their markdown response.
- The
markdown
format must also be specified when using the changeTracking
format. Other formats may also be specified in addition.
- The comparison algorithm is resistant to changes in whitespace and content order. iframe source URLs are currently ignored for resistance against captchas and antibots with randomized URLs.
-
Matching Previous Scrapes: Previous scrapes to compare against are currently only matched on the source URL, the team ID, and the markdown
format, and not on any other scrape or crawl options.
- For an effective comparison, the input URL should be exactly the same as the previous request for the same content.
- Crawling the same URLs with different
includePaths
/excludePaths
will have inconsistencies when using changeTracking
.
- Scraping the same URLs with different
includeTags
/excludeTags
/onlyMainContent
will have inconsistencies when using changeTracking
.
- Compared pages will also be compared against previous scrapes that only have the
markdown
format without the changeTracking
format.
- Comparisons are scoped to your team. If you scrape a URL for the first time with your API key, its
changeStatus
will always be new
, even if other Firecrawl users have scraped it before.
-
Beta Status: While in Beta, it is recommended to monitor the warning
field of the resulting document, and to handle the changeTracking
object potentially missing from the response.
- This may occur when the database lookup to find the previous scrape to compare against times out.
Examples
Basic Scrape Example
// Request
{
"url": "https://firecrawl.dev",
"formats": ["markdown", "changeTracking"]
}
// Response
{
"success": true,
"data": {
"markdown": "...",
"metadata": {...},
"changeTracking": {
"previousScrapeAt": "2025-03-30T15:07:17.543071+00:00",
"changeStatus": "same",
"visibility": "visible"
}
}
}
Crawl Example
// Request
{
"url": "https://firecrawl.dev",
"scrapeOptions": {
"formats": ["markdown", "changeTracking"]
}
}
Tracking Product Price Changes
// JavaScript
const result = await app.scrapeUrl('https://example.com/product', {
formats: ['markdown', 'changeTracking'],
changeTrackingOptions: {
modes: ['json'],
schema: {
type: 'object',
properties: {
price: { type: 'string' },
availability: { type: 'string' }
}
}
}
});
if (result.changeTracking.changeStatus === 'changed') {
console.log(`Price changed from ${result.changeTracking.json.price.previous} to ${result.changeTracking.json.price.current}`);
}
# Python
result = app.scrape_url('https://example.com/product',
formats=['markdown', 'changeTracking'],
changeTrackingOptions={
'modes': ['json'],
'schema': {
'type': 'object',
'properties': {
'price': {'type': 'string'},
'availability': {'type': 'string'}
}
}
}
)
if result.changeTracking.changeStatus == 'changed':
print(f"Price changed from {result.changeTracking.json.price.previous} to {result.changeTracking.json.price.current}")
Monitoring Content Changes with Git-Diff
// JavaScript
const result = await app.scrapeUrl('https://example.com/blog', {
formats: ['markdown', 'changeTracking'],
changeTrackingOptions: {
modes: ['git-diff']
}
});
if (result.changeTracking.changeStatus === 'changed') {
console.log('Content changes:');
console.log(result.changeTracking.diff.text);
}
# Python
result = app.scrape_url('https://example.com/blog',
formats=['markdown', 'changeTracking'],
changeTrackingOptions={
'modes': ['git-diff']
}
)
if result.changeTracking.changeStatus == 'changed':
print('Content changes:')
print(result.changeTracking.diff.text)
Billing
The change tracking feature is currently in beta. Using the basic change tracking functionality and git-diff
mode has no additional cost. However, if you use the json
mode for structured data comparison, the page scrape will cost 5 credits per page.