Change tracking allows you to monitor and detect changes in web content over time. This feature is available in both the JavaScript and Python SDKs.

Overview

Change tracking enables you to:

  • Detect if a webpage has changed since the last scrape
  • View the specific changes between scrapes
  • Get structured data about what has changed
  • Control the visibility of changes

Using the changeTracking format, you can monitor changes on a website and receive information about:

  • previousScrapeAt: The timestamp of the previous scrape that the current page is being compared against (null if no previous scrape)
  • changeStatus: The result of the comparison between the two page versions
    • new: This page did not exist or was not discovered before (usually has a null previousScrapeAt)
    • same: This page’s content has not changed since the last scrape
    • changed: This page’s content has changed since the last scrape
    • removed: This page was removed since the last scrape
  • visibility: The visibility of the current page/URL
    • visible: This page is visible, meaning that its URL was discovered through an organic route (through links on other visible pages or the sitemap)
    • hidden: This page is not visible, meaning it is still available on the web, but no longer discoverable via the sitemap or crawling the site. We can only identify invisible links if they had been visible, and captured, during a previous crawl or scrape

JavaScript SDK

Basic Usage

To use change tracking in the JavaScript SDK, include 'changeTracking' in the formats array when scraping a URL:

const app = new FirecrawlApp({ apiKey: 'your-api-key' });
const result = await app.scrapeUrl('https://example.com', {
  formats: ['markdown', 'changeTracking']
});

// Access change tracking data
console.log(result.changeTracking.changeStatus); // 'new', 'same', 'changed', or 'removed'
console.log(result.changeTracking.visibility); // 'visible' or 'hidden'
console.log(result.changeTracking.previousScrapeAt); // ISO timestamp of previous scrape

Example Response:

{
  "url": "https://firecrawl.dev",
  "markdown": "# AI Agents for great customer experiences\n\nChatbots that delight your users...",
  "changeTracking": {
    "previousScrapeAt": "2025-04-10T12:00:00Z",
    "changeStatus": "changed",
    "visibility": "visible"
  }
}

Advanced Options

You can configure change tracking with additional options:

const result = await app.scrapeUrl('https://example.com', {
  formats: ['markdown', 'changeTracking'],
  changeTrackingOptions: {
    modes: ['git-diff', 'json'], // Enable specific change tracking modes
    schema: { 
      type: 'object', 
      properties: { 
        title: { type: 'string' },
        content: { type: 'string' }
      } 
    }, // Schema for structured JSON comparison
    prompt: 'Custom prompt for extraction' // Optional custom prompt
  }
});

// Access git-diff format changes
if (result.changeTracking.diff) {
  console.log(result.changeTracking.diff.text); // Git-style diff text
  console.log(result.changeTracking.diff.json); // Structured diff data
}

// Access JSON comparison changes
if (result.changeTracking.json) {
  console.log(result.changeTracking.json.title.previous); // Previous title
  console.log(result.changeTracking.json.title.current); // Current title
}

Git-Diff Results Example:

 **April, 13 2025**
 
-**05:55:05 PM**
+**05:58:57 PM**

...

JSON Comparison Results Example:

{
  "time": { 
    "previous": "2025-04-13T17:54:32Z", 
    "current": "2025-04-13T17:55:05Z" 
  }
}

TypeScript Interface

The change tracking feature includes the following TypeScript interfaces:

interface FirecrawlDocument {
  // ... other properties
  changeTracking?: {
    previousScrapeAt: string | null;
    changeStatus: "new" | "same" | "changed" | "removed";
    visibility: "visible" | "hidden";
    diff?: {
      text: string;
      json: {
        files: Array<{
          from: string | null;
          to: string | null;
          chunks: Array<{
            content: string;
            changes: Array<{
              type: string;
              normal?: boolean;
              ln?: number;
              ln1?: number;
              ln2?: number;
              content: string;
            }>;
          }>;
        }>;
      };
    };
    json?: any;
  };
}

interface ScrapeParams {
  // ... other properties
  changeTrackingOptions?: {
    prompt?: string;
    schema?: any;
    modes?: ("json" | "git-diff")[];
  }
}

Python SDK

Basic Usage

To use change tracking in the Python SDK, include 'changeTracking' in the formats list when scraping a URL:

from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key='your-api-key')
result = app.scrape_url('https://example.com', {
    'formats': ['markdown', 'changeTracking']
})

# Access change tracking data
print(result['changeTracking']['changeStatus'])  # 'new', 'same', 'changed', or 'removed'
print(result['changeTracking']['visibility'])  # 'visible' or 'hidden'
print(result['changeTracking']['previousScrapeAt'])  # ISO timestamp of previous scrape

Advanced Options

You can configure change tracking with additional options:

result = app.scrape_url('https://example.com', {
    'formats': ['markdown', 'changeTracking'],
    'changeTrackingOptions': {
        'modes': ['git-diff', 'json'],  # Enable specific change tracking modes
        'schema': {
            'type': 'object',
            'properties': {
                'title': {'type': 'string'},
                'content': {'type': 'string'}
            }
        },  # Schema for structured JSON comparison
        'prompt': 'Custom prompt for extraction'  # Optional custom prompt
    }
})

# Access git-diff format changes
if 'diff' in result['changeTracking']:
    print(result['changeTracking']['diff']['text'])  # Git-style diff text
    print(result['changeTracking']['diff']['json'])  # Structured diff data

# Access JSON comparison changes
if 'json' in result['changeTracking']:
    print(result['changeTracking']['json']['title']['previous'])  # Previous title
    print(result['changeTracking']['json']['title']['current'])  # Current title

Python Data Model

The change tracking feature includes the following Python data model:

class ChangeTrackingData(pydantic.BaseModel):
    """
    Data for the change tracking format.
    """
    previousScrapeAt: Optional[str] = None
    changeStatus: str  # "new" | "same" | "changed" | "removed"
    visibility: str  # "visible" | "hidden"
    diff: Optional[Dict[str, Any]] = None
    json: Optional[Any] = None

Change Tracking Modes

The change tracking feature supports two modes:

Git-Diff Mode

The git-diff mode provides a traditional diff format similar to Git’s output. It shows line-by-line changes with additions and deletions marked.

Example output:

@@ -1,1 +1,1 @@
-old content
+new content

The structured JSON representation of the diff includes:

  • files: Array of changed files (in web context, typically just one)
  • chunks: Sections of changes within a file
  • changes: Individual line changes with type (add, delete, normal)

JSON Mode

The json mode provides a structured comparison of specific fields extracted from the content. This is useful for tracking changes in specific data points rather than the entire content.

Example output:

{
  "title": {
    "previous": "Old Title",
    "current": "New Title"
  },
  "price": {
    "previous": "$19.99",
    "current": "$24.99"
  }
}

To use JSON mode, you need to provide a schema that defines the fields to extract and compare.

Important Facts

Here are some important details to know when using the change tracking feature:

  • Comparison Method: Scrapes are always compared via their markdown response.

    • The markdown format must also be specified when using the changeTracking format. Other formats may also be specified in addition.
    • The comparison algorithm is resistant to changes in whitespace and content order. iframe source URLs are currently ignored for resistance against captchas and antibots with randomized URLs.
  • Matching Previous Scrapes: Previous scrapes to compare against are currently only matched on the source URL, the team ID, and the markdown format, and not on any other scrape or crawl options.

    • For an effective comparison, the input URL should be exactly the same as the previous request for the same content.
    • Crawling the same URLs with different includePaths/excludePaths will have inconsistencies when using changeTracking.
    • Scraping the same URLs with different includeTags/excludeTags/onlyMainContent will have inconsistencies when using changeTracking.
    • Compared pages will also be compared against previous scrapes that only have the markdown format without the changeTracking format.
    • Comparisons are scoped to your team. If you scrape a URL for the first time with your API key, its changeStatus will always be new, even if other Firecrawl users have scraped it before.
  • Beta Status: While in Beta, it is recommended to monitor the warning field of the resulting document, and to handle the changeTracking object potentially missing from the response.

    • This may occur when the database lookup to find the previous scrape to compare against times out.

Examples

Basic Scrape Example

// Request
{
    "url": "https://firecrawl.dev",
    "formats": ["markdown", "changeTracking"]
}

// Response
{
  "success": true,
  "data": {
    "markdown": "...",
    "metadata": {...},
    "changeTracking": {
      "previousScrapeAt": "2025-03-30T15:07:17.543071+00:00",
      "changeStatus": "same",
      "visibility": "visible"
    }
  }
}

Crawl Example

// Request
{
    "url": "https://firecrawl.dev",
    "scrapeOptions": {
        "formats": ["markdown", "changeTracking"]
    }
}

Tracking Product Price Changes

// JavaScript
const result = await app.scrapeUrl('https://example.com/product', {
  formats: ['markdown', 'changeTracking'],
  changeTrackingOptions: {
    modes: ['json'],
    schema: {
      type: 'object',
      properties: {
        price: { type: 'string' },
        availability: { type: 'string' }
      }
    }
  }
});

if (result.changeTracking.changeStatus === 'changed') {
  console.log(`Price changed from ${result.changeTracking.json.price.previous} to ${result.changeTracking.json.price.current}`);
}
# Python
result = app.scrape_url('https://example.com/product', {
    'formats': ['markdown', 'changeTracking'],
    'changeTrackingOptions': {
        'modes': ['json'],
        'schema': {
            'type': 'object',
            'properties': {
                'price': {'type': 'string'},
                'availability': {'type': 'string'}
            }
        }
    }
})

if result['changeTracking']['changeStatus'] == 'changed':
    print(f"Price changed from {result['changeTracking']['json']['price']['previous']} to {result['changeTracking']['json']['price']['current']}")

Monitoring Content Changes with Git-Diff

// JavaScript
const result = await app.scrapeUrl('https://example.com/blog', {
  formats: ['markdown', 'changeTracking'],
  changeTrackingOptions: {
    modes: ['git-diff']
  }
});

if (result.changeTracking.changeStatus === 'changed') {
  console.log('Content changes:');
  console.log(result.changeTracking.diff.text);
}
# Python
result = app.scrape_url('https://example.com/blog', {
    'formats': ['markdown', 'changeTracking'],
    'changeTrackingOptions': {
        'modes': ['git-diff']
    }
})

if result['changeTracking']['changeStatus'] == 'changed':
    print('Content changes:')
    print(result['changeTracking']['diff']['text'])

Billing

The change tracking feature is currently in beta. Using the basic change tracking functionality and git-diff mode has no additional cost. However, if you use the json mode for structured data comparison, the page scrape will cost 5 credits per page.