
Overview
Change tracking enables you to:
- Detect if a webpage has changed since the last scrape
- View the specific changes between scrapes
- Get structured data about what has changed
- Control the visibility of changes
Using the `changeTracking` format, you can monitor changes on a website and receive information about:
- `previousScrapeAt`: The timestamp of the previous scrape that the current page is being compared against (`null` if no previous scrape)
- `changeStatus`: The result of the comparison between the two page versions
  - `new`: This page did not exist or was not discovered before (usually has a `null` `previousScrapeAt`)
  - `same`: This page’s content has not changed since the last scrape
  - `changed`: This page’s content has changed since the last scrape
  - `removed`: This page was removed since the last scrape
- `visibility`: The visibility of the current page/URL
  - `visible`: This page is visible, meaning that its URL was discovered through an organic route (through links on other visible pages or the sitemap)
  - `hidden`: This page is not visible, meaning it is still available on the web, but no longer discoverable via the sitemap or crawling the site. We can only identify hidden pages if they had been visible, and captured, during a previous crawl or scrape
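As a minimal sketch of how these fields might be consumed, the helper below turns a change tracking result into a one-line summary. It assumes the result has already been parsed into a Python dict keyed by the field names above; the function name `describe_change` and the sample values are hypothetical.

```python
from typing import Any, Mapping


def describe_change(change_tracking: Mapping[str, Any]) -> str:
    """Summarize a changeTracking result (assumed to be a parsed dict)."""
    status = change_tracking.get("changeStatus")
    visibility = change_tracking.get("visibility")
    # None corresponds to a null previousScrapeAt (no earlier scrape).
    previous = change_tracking.get("previousScrapeAt")

    if status == "new":
        summary = "first time this page was seen"
    elif status == "same":
        summary = f"unchanged since {previous}"
    elif status == "changed":
        summary = f"content changed since {previous}"
    elif status == "removed":
        summary = f"page removed since {previous}"
    else:
        summary = f"unrecognized status {status!r}"

    if visibility == "hidden":
        summary += " (no longer discoverable via links or sitemap)"
    return summary


# Hypothetical sample result; the values are made up for illustration.
sample_result = {
    "previousScrapeAt": "2025-01-01T00:00:00Z",
    "changeStatus": "changed",
    "visibility": "visible",
}
print(describe_change(sample_result))
```

Falling through to an "unrecognized status" branch rather than raising keeps the helper usable if additional status values appear later.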
SDKs
Basic Usage
To use change tracking, include `'changeTracking'` in the formats when scraping a URL:
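The following sketch only builds a request body; it does not call any SDK or endpoint. The `url` and `formats` key names and the helper `build_scrape_request` are assumptions made for illustration, and `html` stands in for any additional format you might request. It always includes `markdown`, since change tracking compares markdown output.

```python
import json
from typing import Any, Dict, Sequence


def build_scrape_request(url: str, extra_formats: Sequence[str] = ()) -> Dict[str, Any]:
    """Build a scrape request body with change tracking enabled.

    The body shape ({"url": ..., "formats": [...]}) is an assumption.
    """
    # markdown is listed first because changeTracking requires it.
    formats = ["markdown", "changeTracking"]
    for fmt in extra_formats:
        if fmt not in formats:
            formats.append(fmt)
    return {"url": url, "formats": formats}


request_body = build_scrape_request("https://example.com", ["html"])
print(json.dumps(request_body, indent=2))
```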
Advanced Options
You can configure change tracking by passing an object in the `formats` array:
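A hedged sketch of what such an object could look like is shown below. The key names `type` and `modes` are assumptions not confirmed by this page; `tag` and the mode names `git-diff` and `json` are the ones mentioned elsewhere in this document. The helper `change_tracking_format` is hypothetical and only validates and assembles a dict.

```python
from typing import Any, Dict, Optional, Sequence

ALLOWED_MODES = {"git-diff", "json"}


def change_tracking_format(modes: Sequence[str], tag: Optional[str] = None) -> Dict[str, Any]:
    """Assemble a change tracking entry for the formats array.

    Assumption: the entry is an object with "type", "modes", and an
    optional "tag"; check the API reference for the actual key names.
    """
    unknown = [mode for mode in modes if mode not in ALLOWED_MODES]
    if unknown:
        raise ValueError(f"unsupported change tracking modes: {unknown}")
    entry: Dict[str, Any] = {"type": "changeTracking", "modes": list(modes)}
    if tag is not None:
        entry["tag"] = tag
    return entry


# "pricing-page" is a made-up tag value.
formats = ["markdown", change_tracking_format(["git-diff"], tag="pricing-page")]
print(formats)
```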
Git-Diff Results Example:
JSON Comparison Results Example:
Data Models
The change tracking feature includes the following data models:
Change Tracking Modes
The change tracking feature supports two modes:
Git-Diff Mode
The `git-diff` mode provides a traditional diff format similar to Git’s output. It shows line-by-line changes with additions and deletions marked.
Example output:
- `files`: Array of changed files (in web context, typically just one)
- `chunks`: Sections of changes within a file
- `changes`: Individual line changes with type (add, delete, normal)
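To illustrate how the `files`, `chunks`, and `changes` nesting might be walked, the sketch below counts line changes by type. It assumes the structured diff has been parsed into Python lists and dicts using those names; the `content` key and the sample data are hypothetical.

```python
from typing import Any, Dict, List, Mapping


def count_line_changes(files: List[Mapping[str, Any]]) -> Dict[str, int]:
    """Count changes by type across all files and chunks."""
    counts = {"add": 0, "delete": 0, "normal": 0}
    for file in files:
        for chunk in file.get("chunks", []):
            for change in chunk.get("changes", []):
                change_type = change.get("type")
                # Ignore any type other than add, delete, or normal.
                if change_type in counts:
                    counts[change_type] += 1
    return counts


# Hypothetical parsed diff with a single file and a single chunk.
sample_files = [
    {
        "chunks": [
            {
                "changes": [
                    {"type": "normal", "content": " Pricing"},
                    {"type": "delete", "content": "-Basic plan: $10"},
                    {"type": "add", "content": "+Basic plan: $12"},
                ]
            }
        ]
    }
]
print(count_line_changes(sample_files))
```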
JSON Mode
The `json` mode provides a structured comparison of specific fields extracted from the content. This is useful for tracking changes in specific data points rather than the entire content.
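As a rough sketch of working with a per-field comparison, the helper below keeps only the fields whose value differs between scrapes. The shape used here, a dict mapping each field name to `previous` and `current` values, is an assumption for illustration and is not confirmed by this page; `changed_fields` and the sample data are hypothetical.

```python
from typing import Any, Dict, Mapping, Tuple


def changed_fields(comparison: Mapping[str, Mapping[str, Any]]) -> Dict[str, Tuple[Any, Any]]:
    """Return {field: (previous, current)} for fields that differ.

    Assumes each field maps to {"previous": ..., "current": ...}.
    """
    return {
        name: (values.get("previous"), values.get("current"))
        for name, values in comparison.items()
        if values.get("previous") != values.get("current")
    }


# Made-up comparison of two tracked fields.
sample_comparison = {
    "price": {"previous": "$10", "current": "$12"},
    "title": {"previous": "Basic plan", "current": "Basic plan"},
}
print(changed_fields(sample_comparison))
```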
Example output:
Important Facts
Here are some important details to know when using the change tracking feature:
- Comparison Method: Scrapes are always compared via their markdown response.
  - The `markdown` format must also be specified when using the `changeTracking` format. Other formats may also be specified in addition.
  - The comparison algorithm is resistant to changes in whitespace and content order. iframe source URLs are currently ignored for resistance against captchas and antibots with randomized URLs.
- Matching Previous Scrapes: Previous scrapes to compare against are currently matched on the source URL, the team ID, the `markdown` format, and the `tag` parameter.
  - For an effective comparison, the input URL should be exactly the same as the previous request for the same content.
  - Crawling the same URLs with different `includePaths`/`excludePaths` will have inconsistencies when using `changeTracking`.
  - Scraping the same URLs with different `includeTags`/`excludeTags`/`onlyMainContent` will have inconsistencies when using `changeTracking`.
  - Compared pages will also be compared against previous scrapes that only have the `markdown` format without the `changeTracking` format.
  - Comparisons are scoped to your team. If you scrape a URL for the first time with your API key, its `changeStatus` will always be `new`, even if other Firecrawl users have scraped it before.
- Beta Status: While in Beta, it is recommended to monitor the `warning` field of the resulting document, and to handle the `changeTracking` object potentially missing from the response.
  - This may occur when the database lookup to find the previous scrape to compare against times out.
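The beta guidance above can be sketched as a defensive accessor. This assumes the scraped document is available as a Python dict with `warning` and `changeTracking` as top-level keys, which is an assumption about the response layout; `read_change_tracking` and the sample document are hypothetical.

```python
from typing import Any, Dict, Mapping, Optional, Tuple


def read_change_tracking(
    document: Mapping[str, Any],
) -> Tuple[Optional[Dict[str, Any]], Optional[str]]:
    """Return (changeTracking, warning); either value may be None."""
    warning = document.get("warning")
    change_tracking = document.get("changeTracking")
    # Treat a missing or malformed object the same way: no change data.
    if not isinstance(change_tracking, dict):
        change_tracking = None
    return change_tracking, warning


# Made-up document where the changeTracking object is absent.
incomplete_document = {"markdown": "# Pricing", "warning": "hypothetical warning text"}
result, warning = read_change_tracking(incomplete_document)
if result is None:
    print(f"No change data for this scrape (warning: {warning})")
else:
    print(result["changeStatus"])
```

Returning `None` instead of raising lets a crawl loop skip the affected page and retry it later.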
Examples
Basic Scrape Example
Crawl Example
Tracking Product Price Changes
Monitoring Content Changes with Git-Diff
Billing
The change tracking feature is currently in beta. Using the basic change tracking functionality and `git-diff` mode has no additional cost. However, if you use the `json` mode for structured data comparison, the page scrape will cost 5 credits per page.
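For budgeting, the arithmetic can be sketched as follows. It assumes every page in the run is scraped with `json` mode at the flat rate stated above; the function name and the page count are made up for illustration.

```python
JSON_MODE_CREDITS_PER_PAGE = 5  # rate stated in this section


def json_mode_credits(pages: int) -> int:
    """Estimate credits for scraping `pages` pages with json mode."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages * JSON_MODE_CREDITS_PER_PAGE


# A hypothetical run of 200 pages.
print(json_mode_credits(200))
```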