Firecrawl can track changes between the current version of a page and a previous version, and tell you whether the page has been updated.
Change tracking allows you to monitor and detect changes in web content over time. This feature is available in both the JavaScript and Python SDKs.
Using the changeTracking format, you can monitor changes on a website and receive information about:
- previousScrapeAt: The timestamp of the previous scrape that the current page is being compared against (null if no previous scrape)
- changeStatus: The result of the comparison between the two page versions
  - new: This page did not exist or was not discovered before (usually has a null previousScrapeAt)
  - same: This page's content has not changed since the last scrape
  - changed: This page's content has changed since the last scrape
  - removed: This page was removed since the last scrape
- visibility: The visibility of the current page/URL
  - visible: This page is visible, meaning that its URL was discovered through an organic route (through links on other visible pages or the sitemap)
  - hidden: This page is not visible, meaning it is still available on the web, but no longer discoverable via the sitemap or by crawling the site. Hidden pages can only be identified if they were visible, and captured, during a previous crawl or scrape

To use change tracking, include 'changeTracking' in the formats when scraping a URL:
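As a rough sketch of how the fields above might be consumed, the snippet below lists the formats to request and summarises a changeTracking object. It works on a plain dictionary and makes no SDK or network calls; the dictionary shape, the sample values and the helper name `describe_change` are assumptions for illustration, not part of the Firecrawl SDKs.

```python
from typing import Any

# Formats to request: markdown must accompany changeTracking.
SCRAPE_FORMATS = ["markdown", "changeTracking"]


def describe_change(change_tracking: dict[str, Any]) -> str:
    """Summarise a changeTracking object in one line (hypothetical helper)."""
    status = change_tracking.get("changeStatus")
    visibility = change_tracking.get("visibility")
    previous = change_tracking.get("previousScrapeAt")  # None if no previous scrape

    if status == "new":
        summary = "first time this page has been seen"
    elif status == "same":
        summary = f"unchanged since {previous}"
    elif status == "changed":
        summary = f"content changed since {previous}"
    elif status == "removed":
        summary = f"page removed since {previous}"
    else:
        summary = f"unrecognised status {status!r}"
    return f"{summary} ({visibility})"


# Hypothetical sample values, not real Firecrawl output.
sample = {
    "previousScrapeAt": "2025-01-01T00:00:00Z",
    "changeStatus": "changed",
    "visibility": "visible",
}
print(describe_change(sample))
```

Keeping the interpretation in a small pure function like this makes it easy to unit-test separately from whichever SDK call produces the document.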
Change tracking can be configured through its entry in the formats array:
git-diff mode provides a traditional diff format similar to Git’s output. It shows line-by-line changes with additions and deletions marked.
Example output:
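As a hypothetical illustration rather than real Firecrawl output, the sketch below builds a small parsed diff and tallies its changes. The nesting of files, chunks and changes and the three change types (add, delete, normal) come from this section; the `content` key holding each line's text and the helper name `count_changes` are assumptions.

```python
from collections import Counter
from typing import Any

# Hypothetical parsed diff; the "content" key is an assumed field name.
sample_diff = {
    "files": [
        {
            "chunks": [
                {
                    "changes": [
                        {"type": "normal", "content": " # Pricing"},
                        {"type": "delete", "content": "-Starter: $10/month"},
                        {"type": "add", "content": "+Starter: $12/month"},
                    ]
                }
            ]
        }
    ]
}


def count_changes(parsed_diff: dict[str, Any]) -> Counter:
    """Count line changes by type across every file and chunk."""
    counts: Counter = Counter()
    for file in parsed_diff.get("files", []):
        for chunk in file.get("chunks", []):
            for change in chunk.get("changes", []):
                counts[change.get("type", "unknown")] += 1
    return counts


counts = count_changes(sample_diff)
print(f"+{counts['add']} -{counts['delete']} ({counts['normal']} unchanged)")
```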
The diff structure consists of:

- files: Array of changed files (in web context, typically just one)
- chunks: Sections of changes within a file
- changes: Individual line changes with type (add, delete, normal)

The json mode provides a structured comparison of specific fields extracted from the content. This is useful for tracking changes in specific data points rather than the entire content.
Example output:
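As a hypothetical illustration rather than real Firecrawl output, the sketch below assumes that a json-mode comparison maps each tracked field to a pair of previous and current values. That shape, the field names and the helper `changed_fields` are all assumptions; check them against an actual response before relying on them.

```python
from typing import Any

# Hypothetical json-mode comparison: field -> {"previous": ..., "current": ...}
sample_json_diff = {
    "title": {"previous": "Pricing", "current": "Pricing"},
    "price": {"previous": "$10/month", "current": "$12/month"},
}


def changed_fields(json_diff: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return only the fields whose previous and current values differ."""
    changed: dict[str, tuple[Any, Any]] = {}
    for field, values in json_diff.items():
        previous = values.get("previous")
        current = values.get("current")
        if previous != current:
            changed[field] = (previous, current)
    return changed


print(changed_fields(sample_json_diff))
```

Filtering down to the differing fields is what makes this mode convenient for tracking a handful of data points, such as a price, without reacting to unrelated edits elsewhere on the page.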
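- The markdown format must also be specified when using the changeTracking format. Other formats may also be specified in addition.
- Previous scrapes to compare against are matched on several attributes, including the markdown format and the tag parameter.
- Crawling the same URLs with different includePaths/excludePaths will produce inconsistencies when using changeTracking.
- Scraping the same URLs with different includeTags/excludeTags/onlyMainContent will produce inconsistencies when using changeTracking.
- Pages are also compared against previous scrapes that have only the markdown format without the changeTracking format.
- Comparisons do not draw on other users' scrapes: the first time you scrape a URL, its changeStatus will always be new, even if other Firecrawl users have scraped it before.
- It is recommended to monitor the warning field of the resulting document, and to handle the changeTracking object potentially missing from the response.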
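The advice about the warning field and a possibly missing changeTracking object could be handled defensively along these lines. This is a minimal sketch that assumes the scraped document is available as a plain dictionary (SDKs may return an object instead); the helper name `extract_change_tracking` and the sample documents are hypothetical.

```python
import logging
from typing import Any, Optional

logger = logging.getLogger("change_tracking")


def extract_change_tracking(document: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Log any warning and return the changeTracking object, or None if absent."""
    warning = document.get("warning")
    if warning:
        logger.warning("Scrape returned a warning: %s", warning)

    tracking = document.get("changeTracking")
    if not isinstance(tracking, dict):
        # Treat a missing or malformed object as "no comparison available".
        return None
    return tracking


# Hypothetical documents, not real Firecrawl output.
doc_without = {"markdown": "# Hello", "warning": "hypothetical warning text"}
doc_with = {"markdown": "# Hello", "changeTracking": {"changeStatus": "same"}}

for doc in (doc_without, doc_with):
    tracking = extract_change_tracking(doc)
    print("no change data" if tracking is None else tracking["changeStatus"])
```

Returning None instead of raising lets callers decide whether a missing comparison should be retried, skipped or reported.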
The git-diff mode has no additional cost. However, if you use the json mode for structured data comparison, the page scrape will cost 5 credits per page.
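The pricing rule can be turned into a quick budget estimate. In this sketch only the figure of 5 credits per page for json mode comes from this section; the base cost of an ordinary scrape is not stated here, so it is passed in by the caller, and the function name `estimate_credits` is hypothetical.

```python
JSON_MODE_CREDITS_PER_PAGE = 5  # stated in this section


def estimate_credits(pages: int, mode: str, base_credits_per_page: int) -> int:
    """Estimate credits for scraping `pages` pages with change tracking.

    base_credits_per_page is the normal scrape cost for your plan; it is an
    input here because this section does not state it.
    """
    if pages < 0:
        raise ValueError("pages must be non-negative")
    if mode == "json":
        return pages * JSON_MODE_CREDITS_PER_PAGE
    if mode == "git-diff":
        # No additional cost on top of the ordinary scrape.
        return pages * base_credits_per_page
    raise ValueError(f"unknown mode: {mode!r}")


# The base cost of 1 below is a placeholder value, not a documented price.
print(estimate_credits(200, "json", base_credits_per_page=1))
print(estimate_credits(200, "git-diff", base_credits_per_page=1))
```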