Change Tracking
Firecrawl can track changes between the current page and a previous version and tell you whether it was updated.
Change tracking allows you to monitor and detect changes in web content over time. This feature is available in both the JavaScript and Python SDKs.
Overview
Change tracking enables you to:
- Detect if a webpage has changed since the last scrape
- View the specific changes between scrapes
- Get structured data about what has changed
- Control the visibility of changes
Using the changeTracking format, you can monitor changes on a website and receive information about:
- previousScrapeAt: The timestamp of the previous scrape that the current page is being compared against (null if there is no previous scrape)
- changeStatus: The result of the comparison between the two page versions
  - new: This page did not exist or was not discovered before (it usually has a null previousScrapeAt)
  - same: This page’s content has not changed since the last scrape
  - changed: This page’s content has changed since the last scrape
  - removed: This page was removed since the last scrape
- visibility: The visibility of the current page/URL
  - visible: This page is visible, meaning that its URL was discovered through an organic route (through links on other visible pages or the sitemap)
  - hidden: This page is not visible, meaning it is still available on the web but is no longer discoverable via the sitemap or by crawling the site. We can only identify hidden pages if they were visible, and captured, during a previous crawl or scrape
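The fields above can be read as a plain dictionary. The sketch below turns one changeTracking object into a readable summary; describe_change and STATUS_MEANINGS are hypothetical names, and it assumes the object arrives as a dict keyed by the field names listed above.

```python
# Hypothetical helper: summarizes a changeTracking object given as a plain dict,
# using the field names and status values described above.
from typing import Any, Mapping

STATUS_MEANINGS = {
    "new": "did not exist or was not discovered before",
    "same": "has not changed since the last scrape",
    "changed": "has changed since the last scrape",
    "removed": "was removed since the last scrape",
}


def describe_change(change_tracking: Mapping[str, Any]) -> str:
    """Return a one-line, human-readable summary of a changeTracking object."""
    status = change_tracking.get("changeStatus")
    meaning = STATUS_MEANINGS.get(status, f"has an unrecognized status {status!r}")
    previous = change_tracking.get("previousScrapeAt")
    # previousScrapeAt is null (None) when there is no earlier scrape to compare with
    since = f"previous scrape at {previous}" if previous else "no previous scrape"
    visibility = change_tracking.get("visibility", "unknown")
    return f"Page {meaning} ({since}; visibility: {visibility})"
```

For a first-time scrape, describe_change({"changeStatus": "new", "previousScrapeAt": None, "visibility": "visible"}) reports that the page was not discovered before and that there is no previous scrape.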
JavaScript SDK
Basic Usage
To use change tracking in the JavaScript SDK, include 'changeTracking' in the formats array when scraping a URL, together with 'markdown', which change tracking requires:
Example Response:
Advanced Options
You can configure change tracking with additional options:
Git-Diff Results Example:
JSON Comparison Results Example:
TypeScript Interface
The change tracking feature includes the following TypeScript interfaces:
Python SDK
Basic Usage
To use change tracking in the Python SDK, include 'changeTracking' in the formats list when scraping a URL, together with 'markdown', which change tracking requires:
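As a minimal sketch, the helper below builds a formats list that always pairs 'changeTracking' with 'markdown'. The function names and the {"url": ..., "formats": [...]} request shape are illustrative assumptions; pass the resulting list as the formats argument of your SDK version's scrape method.

```python
# Minimal sketch: a formats list for a basic change-tracking scrape.
# Function names and the request dict shape are hypothetical.
from typing import Dict, List, Sequence


def change_tracking_formats(extra: Sequence[str] = ()) -> List[str]:
    """Build a formats list that always pairs changeTracking with markdown."""
    formats = ["markdown", "changeTracking"]
    # Other formats (for example "html") can be requested alongside; skip duplicates.
    for fmt in extra:
        if fmt not in formats:
            formats.append(fmt)
    return formats


def build_scrape_request(url: str) -> Dict[str, object]:
    # Assumed request shape: the URL plus the list of requested formats.
    return {"url": url, "formats": change_tracking_formats()}
```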
Advanced Options
You can configure change tracking with additional options:
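A hedged sketch of a request body with additional options, assuming the options go in a changeTrackingOptions object with modes and (for JSON mode) schema fields; verify these names against the API reference for your SDK version. build_advanced_request and SUPPORTED_MODES are hypothetical.

```python
# Hypothetical request builder. Assumption: options live in a
# "changeTrackingOptions" object with "modes" and an optional "schema".
from typing import Any, Dict, List, Optional

SUPPORTED_MODES = {"git-diff", "json"}  # the two modes described on this page


def build_advanced_request(
    url: str,
    modes: List[str],
    schema: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    unknown = set(modes) - SUPPORTED_MODES
    if unknown:
        raise ValueError(f"unsupported modes: {sorted(unknown)}")
    # json mode compares extracted fields, so it needs a schema defining them
    if "json" in modes and schema is None:
        raise ValueError("json mode requires a schema")
    options: Dict[str, Any] = {"modes": list(modes)}
    if schema is not None:
        options["schema"] = schema
    return {
        "url": url,
        "formats": ["markdown", "changeTracking"],  # markdown is always required
        "changeTrackingOptions": options,
    }
```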
Python Data Model
The change tracking feature includes the following Python data model:
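The class below is not the SDK's own model; it is a hypothetical stand-in that mirrors the fields described at the top of this page. It assumes the response's changeTracking object is a dict and that git-diff and JSON mode results appear under "diff" and "json" keys; check your SDK's actual types before relying on these names.

```python
# Hypothetical stand-in for the SDK's change tracking model.
# Assumption: mode results are stored under "diff" and "json" keys.
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ChangeTrackingResult:
    change_status: str  # "new", "same", "changed", or "removed"
    previous_scrape_at: Optional[str] = None  # None when there is no previous scrape
    visibility: Optional[str] = None  # "visible" or "hidden"
    diff: Optional[Dict[str, Any]] = None  # git-diff mode output, if requested
    json_comparison: Optional[Dict[str, Any]] = None  # json mode output, if requested

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ChangeTrackingResult":
        return cls(
            change_status=data["changeStatus"],
            previous_scrape_at=data.get("previousScrapeAt"),
            visibility=data.get("visibility"),
            diff=data.get("diff"),
            json_comparison=data.get("json"),
        )

    @property
    def has_changed(self) -> bool:
        return self.change_status == "changed"
```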
Change Tracking Modes
The change tracking feature supports two modes:
Git-Diff Mode
The git-diff mode provides a traditional diff format similar to Git’s output. It shows line-by-line changes with additions and deletions marked.
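Because the git-diff text follows Git's unified diff layout, added and removed lines can be counted by their leading + and - inside each @@ hunk. The sketch below is illustrative, assumes a standard unified diff string, and uses the hypothetical name count_diff_lines.

```python
# Illustrative only: counts added and removed lines in a git-style diff text.
from typing import Tuple


def count_diff_lines(diff_text: str) -> Tuple[int, int]:
    added = removed = 0
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("@@"):
            in_hunk = True  # hunk header; content lines follow
        elif line.startswith("diff "):
            in_hunk = False  # a new file section begins; its ---/+++ headers come next
        elif in_hunk and line.startswith("+"):
            added += 1
        elif in_hunk and line.startswith("-"):
            removed += 1
    return added, removed
```

Tracking whether the loop is inside a hunk, rather than skipping every line that starts with "---", keeps a removed markdown horizontal rule from being mistaken for a file header.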
Example output:
The structured JSON representation of the diff includes:
- files: Array of changed files (in a web context, typically just one)
- chunks: Sections of changes within a file
- changes: Individual line changes with a type (add, delete, or normal)
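The nested files, chunks, and changes structure can be walked to collect the changed lines. This sketch assumes each change carries its line text in a "content" field; that key and the function name collect_line_changes are assumptions. It accepts "del" as well as "delete", since some diff parsers abbreviate the type.

```python
# Illustrative walk over the structured diff: files -> chunks -> changes.
# Assumption: each change has "type" and "content" keys.
from typing import Any, Dict, List


def collect_line_changes(diff_json: Dict[str, Any]) -> Dict[str, List[str]]:
    result: Dict[str, List[str]] = {"added": [], "deleted": []}
    for file in diff_json.get("files", []):
        for chunk in file.get("chunks", []):
            for change in chunk.get("changes", []):
                kind = change.get("type")
                text = change.get("content", "")
                if kind == "add":
                    result["added"].append(text)
                elif kind in ("delete", "del"):
                    result["deleted"].append(text)
                # "normal" (unchanged context) lines are ignored
    return result
```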
JSON Mode
The json mode provides a structured comparison of specific fields extracted from the content. This is useful for tracking changes in specific data points rather than the entire content.
Example output:
To use JSON mode, you need to provide a schema that defines the fields to extract and compare.
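A sketch of both halves of JSON mode: a JSON Schema describing the fields to extract, and a filter that keeps only fields whose values differ. The field names title and author are illustrative, and the sketch assumes the comparison reports each field as {"previous": ..., "current": ...}; confirm that shape against a real response.

```python
# Illustrative JSON mode pieces; ARTICLE_SCHEMA's fields are hypothetical.
from typing import Any, Dict

ARTICLE_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "author": {"type": "string"},
    },
}


def changed_fields(comparison: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Keep only fields whose previous and current values differ.

    Assumes each field is reported as {"previous": ..., "current": ...}.
    """
    return {
        name: values
        for name, values in comparison.items()
        if values.get("previous") != values.get("current")
    }
```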
Important Facts
Here are some important details to know when using the change tracking feature:
- Comparison Method: Scrapes are always compared via their markdown response.
  - The markdown format must also be specified when using the changeTracking format. Other formats may be specified in addition.
  - The comparison algorithm is resistant to changes in whitespace and content order. iframe source URLs are currently ignored so that captchas and anti-bot widgets with randomized URLs do not register as changes.
- Matching Previous Scrapes: Previous scrapes to compare against are currently matched only on the source URL, the team ID, and the markdown format, and not on any other scrape or crawl options.
  - For an effective comparison, the input URL should be exactly the same as in the previous request for the same content.
  - Crawling the same URLs with different includePaths/excludePaths will produce inconsistent results when using changeTracking.
  - Scraping the same URLs with different includeTags/excludeTags/onlyMainContent will produce inconsistent results when using changeTracking.
  - Pages will also be compared against previous scrapes that requested only the markdown format, without the changeTracking format.
  - Comparisons are scoped to your team. If you scrape a URL for the first time with your API key, its changeStatus will always be new, even if other Firecrawl users have scraped it before.
- Beta Status: While change tracking is in beta, we recommend monitoring the warning field of the resulting document and handling the case where the changeTracking object is missing from the response.
  - This may occur when the database lookup to find the previous scrape to compare against times out.
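The facts above suggest two defensive checks: keep request options consistent between runs that are compared, and tolerate a missing changeTracking object. The sketch below assumes requests and documents are plain dicts using the option and field names mentioned above; comparison_warnings, get_change_status, and CONSISTENCY_KEYS are hypothetical names.

```python
# Hypothetical defensive helpers based on the notes above.
import logging
from typing import Any, Dict, List, Optional

logger = logging.getLogger(__name__)

# Options that, per the notes above, should match between compared runs.
CONSISTENCY_KEYS = ("includeTags", "excludeTags", "onlyMainContent", "includePaths", "excludePaths")


def comparison_warnings(previous: Dict[str, Any], current: Dict[str, Any]) -> List[str]:
    warnings: List[str] = []
    if previous.get("url") != current.get("url"):
        warnings.append("URL differs: previous scrapes are matched on the exact source URL")
    for key in CONSISTENCY_KEYS:
        if previous.get(key) != current.get(key):
            warnings.append(f"{key} differs between runs; results may be inconsistent")
    if "markdown" not in current.get("formats", []):
        warnings.append("markdown format is required alongside changeTracking")
    return warnings


def get_change_status(document: Dict[str, Any]) -> Optional[str]:
    """Return changeStatus, or None if the changeTracking object is missing."""
    warning = document.get("warning")
    if warning:
        logger.warning("Firecrawl warning: %s", warning)
    change_tracking = document.get("changeTracking")
    if not change_tracking:
        # For example, the previous-scrape lookup timed out: treat as unknown, not "new".
        return None
    return change_tracking.get("changeStatus")
```

Returning None instead of defaulting to "new" keeps a timed-out lookup from being mistaken for a first-time page.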
Examples
Basic Scrape Example
Crawl Example
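When crawling with change tracking, each page in the result carries its own changeTracking object. This sketch tallies the statuses across a crawl; it assumes the crawl results are a list of dicts, and summarize_crawl is a hypothetical name. Pages without a changeTracking object are counted as "unknown", following the beta note above.

```python
# Illustrative tally of changeStatus values across crawled pages.
from collections import Counter
from typing import Any, Dict, Iterable


def summarize_crawl(documents: Iterable[Dict[str, Any]]) -> Counter:
    counts: Counter = Counter()
    for doc in documents:
        # A missing changeTracking object is possible while the feature is in beta.
        tracking = doc.get("changeTracking") or {}
        counts[tracking.get("changeStatus", "unknown")] += 1
    return counts
```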
Tracking Product Price Changes
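With JSON mode and a schema that extracts a price field, the previous and current values can be turned into a percentage change. This sketch assumes the comparison reports the field as {"previous": ..., "current": ...} and that prices are strings such as "$1,299.00"; the field name "price" and both function names are hypothetical.

```python
# Illustrative price-change calculation on a JSON mode comparison.
from typing import Any, Dict, Optional


def parse_price(value: Any) -> Optional[float]:
    """Parse strings like "$1,299.00" into 1299.0 (simplified: assumes "." decimals)."""
    if value is None:
        return None
    cleaned = "".join(ch for ch in str(value) if ch.isdigit() or ch == ".")
    try:
        return float(cleaned) if cleaned else None
    except ValueError:
        return None


def price_change_percent(comparison: Dict[str, Dict[str, Any]], field: str = "price") -> Optional[float]:
    entry = comparison.get(field)
    if not entry:
        return None
    before = parse_price(entry.get("previous"))
    after = parse_price(entry.get("current"))
    if before is None or after is None or before == 0:
        return None
    return (after - before) * 100 / before
```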
Monitoring Content Changes with Git-Diff
Billing
The change tracking feature is currently in beta. Using the basic change tracking functionality and git-diff mode has no additional cost. However, if you use the json mode for structured data comparison, the page scrape will cost 5 credits per page.
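The billing rule reduces to simple arithmetic: with json mode each page costs 5 credits, otherwise change tracking adds nothing to your normal scrape cost. That normal per-page cost is passed in as a parameter rather than assumed here; estimate_credits is a hypothetical name.

```python
# Illustrative credit estimate based on the billing note above.
JSON_MODE_CREDITS_PER_PAGE = 5


def estimate_credits(pages: int, json_mode: bool, base_credits_per_page: int) -> int:
    """base_credits_per_page is your normal scrape cost per page (supplied by you)."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    # Basic change tracking and git-diff mode add no cost; json mode costs 5 per page.
    per_page = JSON_MODE_CREDITS_PER_PAGE if json_mode else base_credits_per_page
    return pages * per_page
```

For example, 10 pages in json mode come to 10 × 5 = 50 credits.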