```python
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

# Scrape a website:
doc = firecrawl.scrape("https://firecrawl.dev", formats=["markdown", "html"])
print(doc)
```
For more details about the parameters, refer to the API Reference.
Each scrape consumes 1 credit. Additional credits apply for certain options: JSON mode costs 4 additional credits per page, enhanced proxy costs 4 additional credits per page, and PDF parsing costs 1 credit per PDF page.
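As a rough illustration of the arithmetic above, per-page cost can be tallied like this (a sketch only; the function name is illustrative, and your plan's pricing page is authoritative):

```python
def scrape_credits(pages=1, json_mode=False, enhanced_proxy=False, pdf_pages=0):
    """Estimate credits for a scrape based on the costs listed above.

    Base cost: 1 credit per page. JSON mode and enhanced proxy each add
    4 credits per page; PDF parsing adds 1 credit per PDF page.
    """
    per_page = 1
    if json_mode:
        per_page += 4
    if enhanced_proxy:
        per_page += 4
    return pages * per_page + pdf_pages

# A single page scraped with JSON mode: 1 base + 4 JSON = 5 credits
print(scrape_credits(pages=1, json_mode=True))  # 5
```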
You can now extract without a schema by just passing a prompt to the endpoint. The LLM chooses the structure of the data.
```python
from firecrawl import Firecrawl

app = Firecrawl(api_key="fc-YOUR-API-KEY")

result = app.scrape(
    'https://firecrawl.dev',
    formats=[{
        "type": "json",
        "prompt": "Extract the company mission from the page."
    }],
    only_main_content=False,
    timeout=120000
)
print(result)
```
Output:
```json
{
  "success": true,
  "data": {
    "json": {
      "company_mission": "AI-powered web scraping and data extraction"
    },
    "metadata": {
      "title": "Firecrawl",
      "description": "AI-powered web scraping and data extraction",
      "robots": "follow, index",
      "ogTitle": "Firecrawl",
      "ogDescription": "AI-powered web scraping and data extraction",
      "ogUrl": "https://firecrawl.dev/",
      "ogImage": "https://firecrawl.dev/og.png",
      "ogLocaleAlternate": [],
      "ogSiteName": "Firecrawl",
      "sourceURL": "https://firecrawl.dev/"
    }
  }
}
```
The branding format extracts comprehensive brand identity information from a webpage, including colors, fonts, typography, spacing, UI components, and more. This is useful for design system analysis, brand monitoring, or building tools that need to understand a website’s visual identity.
```python
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key='fc-YOUR_API_KEY')

result = firecrawl.scrape(
    url='https://firecrawl.dev',
    formats=['branding']
)
print(result['branding'])
```
Firecrawl allows you to perform various actions on a web page before scraping its content. This is particularly useful for interacting with dynamic content, navigating through pages, or accessing content that requires user interaction.

Here is an example of how to use actions to navigate to google.com, search for Firecrawl, click on the first result, and take a screenshot.

It is important to almost always use the wait action before/after executing other actions to give the page enough time to load.
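The flow described above can be sketched as an actions payload (the action types follow Firecrawl's documented actions format; the wait durations and the `"h3"` selector for the first result are illustrative assumptions):

```python
# Sketch of an actions sequence: load, search, click the first result, screenshot.
actions = [
    {"type": "wait", "milliseconds": 2000},   # let google.com load
    {"type": "write", "text": "firecrawl"},   # type into the search box
    {"type": "press", "key": "ENTER"},        # submit the search
    {"type": "wait", "milliseconds": 3000},   # wait for results to render
    {"type": "click", "selector": "h3"},      # click the first result (illustrative selector)
    {"type": "wait", "milliseconds": 3000},   # let the result page load
    {"type": "screenshot"},                   # capture the final page
]

# The list would then accompany the scrape, e.g.:
# doc = firecrawl.scrape("https://google.com", formats=["markdown"], actions=actions)
print(len(actions))  # 7
```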
When you specify the location settings, Firecrawl will use an appropriate proxy if available and emulate the corresponding language and timezone settings. By default, the location is set to ‘US’ if not specified.
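For instance, a location setting might look like the following (a payload sketch using the documented `location` option shape; the country and language values are illustrative):

```python
# Illustrative location settings: emulate a browser located in Germany.
location = {
    "country": "DE",            # ISO 3166-1 alpha-2 country code
    "languages": ["de", "en"],  # preferred languages, in order
}

# Passed alongside the scrape, e.g.:
# doc = firecrawl.scrape("https://example.com", formats=["markdown"], location=location)
print(location["country"])  # DE
```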
To make requests faster, Firecrawl serves results from cache by default when a recent copy is available.
- Default freshness window: maxAge = 172800000 ms (2 days). If a cached page is newer than this, it's returned instantly; otherwise, the page is scraped and then cached.
- Performance: This can speed up scrapes by up to 5x when data doesn't need to be ultra-fresh.
- Always fetch fresh: Set maxAge to 0.
- Avoid storing: Set storeInCache to false if you don't want Firecrawl to cache/store results for this request.
- Change tracking: Requests that include changeTracking bypass the cache, so maxAge is ignored.
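The caching rules above can be sketched as a simple decision function (a pure illustration of the client-visible behavior, not the server implementation; timestamps are in milliseconds):

```python
import time

TWO_DAYS_MS = 172_800_000  # default maxAge: 2 days

def serve_from_cache(cached_at_ms, max_age_ms=TWO_DAYS_MS,
                     change_tracking=False, now_ms=None):
    """Return True if a cached copy should be served.

    A cached page is used when its age is under max_age_ms; maxAge=0
    forces a fresh scrape, and changeTracking always bypasses the cache.
    """
    if change_tracking:        # changeTracking ignores maxAge entirely
        return False
    if max_age_ms <= 0:        # maxAge=0 means always fetch fresh
        return False
    if cached_at_ms is None:   # nothing cached yet
        return False
    now = now_ms if now_ms is not None else int(time.time() * 1000)
    return (now - cached_at_ms) < max_age_ms

# A copy cached one hour ago is well within the 2-day default window:
print(serve_from_cache(cached_at_ms=0, now_ms=3_600_000))  # True
```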
Example (force fresh content):
```python
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key='fc-YOUR_API_KEY')

doc = firecrawl.scrape(url='https://example.com', maxAge=0, formats=['markdown'])
print(doc)
```
You can now batch scrape multiple URLs at the same time. It takes the starting URLs and optional parameters as arguments. The params argument allows you to specify additional options for the batch scrape job, such as the output formats.
It works much like the /crawl endpoint: it submits a batch scrape job and returns a job ID you can use to check the status of the batch scrape.

The SDK provides two methods, synchronous and asynchronous. The synchronous method returns the results of the batch scrape job directly, while the asynchronous method returns a job ID that you can use to check the status of the batch scrape.
You can then use the job ID to check the status of the batch scrape by calling the /batch/scrape/{id} endpoint. This endpoint is meant to be used while the job is still running or right after it completes, as batch scrape jobs expire after 24 hours.
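The check-then-wait loop implied above can be sketched with a stand-in client (the client class, its method name, and the status strings here are illustrative placeholders, not the SDK's real API):

```python
import time

class FakeBatchClient:
    """Stand-in for an SDK client; reports 'completed' after a few checks."""
    def __init__(self):
        self._checks = 0

    def get_batch_scrape_status(self, job_id):
        self._checks += 1
        status = "completed" if self._checks >= 3 else "scraping"
        return {"id": job_id, "status": status}

def wait_for_batch(client, job_id, poll_seconds=0, max_checks=10):
    """Poll the job status until it completes or we give up."""
    for _ in range(max_checks):
        job = client.get_batch_scrape_status(job_id)
        if job["status"] == "completed":
            return job
        time.sleep(poll_seconds)
    raise TimeoutError(f"batch job {job_id} did not complete")

result = wait_for_batch(FakeBatchClient(), "job-123")
print(result["status"])  # completed
```

With the real SDK you would use the asynchronous method's returned job ID in place of `"job-123"`, and poll on a sensible interval rather than in a tight loop.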