Batch Scrape
Batch scrape multiple URLs
Batch scraping multiple URLs
You can batch scrape multiple URLs at the same time. The endpoint takes a list of URLs and optional parameters as arguments. The params argument lets you specify additional options for the batch scrape job, such as the output formats.
How it works
It is very similar to how the /crawl endpoint works. It submits a batch scrape job and returns a job ID that you can use to check the status of the batch scrape.
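As a minimal sketch, submitting a batch scrape job over HTTP could look like the example below. The base URL, the POST /v1/batch/scrape path, and the payload shape are assumptions based on the v1 REST API and may differ in your version; check the API reference for the exact request format.

```python
import requests

API_KEY = "fc-YOUR_API_KEY"  # placeholder key

# Sketch: submit a batch scrape job (endpoint path and payload shape assumed).
resp = requests.post(
    "https://api.firecrawl.dev/v1/batch/scrape",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "urls": ["https://example.com", "https://example.org"],
        "formats": ["markdown"],  # optional output formats
    },
    timeout=30,
)
resp.raise_for_status()
job = resp.json()
print(job["id"])  # job ID used later to check the batch scrape status
```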
The SDK provides two methods, synchronous and asynchronous. The synchronous method waits for the job to finish and returns the results of the batch scrape, while the asynchronous method returns a job ID that you can use to check the status of the batch scrape.
Usage
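The sketch below shows both styles with the Python SDK. The class and method names (FirecrawlApp, batch_scrape_urls, async_batch_scrape_urls) follow one SDK version and are an assumption; check your installed SDK for the exact names and signatures.

```python
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="fc-YOUR_API_KEY")

urls = ["https://example.com", "https://example.org"]

# Synchronous: waits for the job to finish and returns the scraped documents.
# (Method name assumed; it may differ in other SDK versions.)
results = app.batch_scrape_urls(urls, params={"formats": ["markdown"]})

# Asynchronous: returns immediately with a job ID you can poll later.
job = app.async_batch_scrape_urls(urls, params={"formats": ["markdown"]})
print(job["id"])
```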
Response
If you’re using the sync methods from the SDKs, they will return the results of the batch scrape job. Otherwise, you will receive a job ID that you can use to check the status of the batch scrape.
Synchronous
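For orientation, the synchronous result is roughly shaped like the sketch below. The field names are illustrative assumptions, not the authoritative schema; consult the API reference for the exact response.

```python
# Illustrative shape only -- field names are assumed, not authoritative.
sync_result = {
    "status": "completed",
    "total": 2,
    "completed": 2,
    "data": [
        {
            "markdown": "...",  # content in the requested formats
            "metadata": {"sourceURL": "https://example.com"},
        },
        # one entry per scraped URL
    ],
}
```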
Asynchronous
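The asynchronous call returns something closer to the sketch below: a job ID plus a URL for checking status. Again, the field names are assumptions for illustration.

```python
# Illustrative shape only -- field names are assumed, not authoritative.
async_result = {
    "success": True,
    "id": "batch-job-id",
    "url": "https://api.firecrawl.dev/v1/batch/scrape/batch-job-id",
}
```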
You can then use the job ID to check the status of the batch scrape by calling the /batch/scrape/{id} endpoint. This endpoint is meant to be used while the job is still running or right after it has completed, as batch scrape jobs expire after 24 hours.
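Polling that endpoint could look like the sketch below. The base URL and the response fields printed at the end are assumptions based on the v1 API.

```python
import requests

API_KEY = "fc-YOUR_API_KEY"
job_id = "batch-job-id"  # the ID returned when the job was submitted

# Sketch: poll the batch scrape status endpoint (path and fields assumed).
resp = requests.get(
    f"https://api.firecrawl.dev/v1/batch/scrape/{job_id}",
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
resp.raise_for_status()
status = resp.json()
print(status.get("status"), status.get("completed"), status.get("total"))
```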
Batch scrape with extraction
You can also use the batch scrape endpoint to extract structured data from the pages. This is useful when you want the same structured fields from every URL in the list.
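A minimal sketch of batch scraping with extraction follows. The request shape (an "extract" format paired with a JSON schema) is assumed from the v1 scrape options, and the schema fields here are hypothetical; adjust both to your API version and data.

```python
import requests

API_KEY = "fc-YOUR_API_KEY"

# Hypothetical JSON schema describing the fields to extract from each page.
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "price": {"type": "number"},
    },
    "required": ["title"],
}

# Sketch: submit a batch scrape job that also extracts structured data
# (payload shape assumed; check the API reference for the exact options).
resp = requests.post(
    "https://api.firecrawl.dev/v1/batch/scrape",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "urls": ["https://example.com/product-1", "https://example.com/product-2"],
        "formats": ["extract"],
        "extract": {"schema": schema},
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["id"])  # poll /batch/scrape/{id} for the extracted data
```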