Octoparse Cloud Service
Wednesday, March 01, 2017 4:05 AM
Octoparse has always been dedicated to providing users with a seamless web scraping experience. Octoparse Cloud Service offers greater scalability and data extraction efficiency through a number of advanced options, such as scheduled data extraction, high-volume extraction, and scraping from pages protected by anti-bot measures. Now, let us find out whether Octoparse Cloud Service is what you are looking for.
What is Octoparse Cloud Service?
Octoparse Cloud Service offers a cloud-based platform where users can run their extraction tasks 24/7. With the Cloud Service, users need not worry about hardware limitations and are freed from costly hardware maintenance.
How does Cloud Service work?
Octoparse Cloud employs a distributed system to enable multi-threaded processing. This means that the same set of data is crawled on 6 or more virtual machines simultaneously. The result is a much more efficient scraping experience compared with local extraction. Once an extraction task is configured and set to run in the Cloud, data is scraped automatically according to the preset schedule. Users can access the data anytime, anywhere after the extraction is completed.
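To make the idea of several machines sharing one task concrete, here is a minimal sketch in Python. It is not Octoparse's implementation: threads stand in for virtual machines, and `partition`, `scrape_slice` and `run_distributed` are hypothetical names. The sketch assumes that the URL list is divided into slices and that the results are merged at the end.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, workers):
    """Split items into `workers` near-equal slices (round-robin)."""
    return [items[i::workers] for i in range(workers)]


def scrape_slice(urls):
    """Hypothetical stand-in for one machine's work; makes no network calls."""
    return [f"data from {url}" for url in urls]


def run_distributed(urls, workers=6):
    """Process each slice in its own thread, then merge the results."""
    slices = partition(urls, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = list(pool.map(scrape_slice, slices))
    return [row for chunk in chunks for row in chunk]


urls = [f"https://example.com/page/{n}" for n in range(1, 13)]
rows = run_distributed(urls, workers=6)
print(len(rows))
```

Each worker receives a disjoint slice, so no page is fetched twice and the merged result covers the whole list.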
Benefits of the Cloud
Some websites are particularly sensitive to web scraping and take serious anti-bot measures to thwart it. Octoparse Cloud Service is supported by thousands of cloud servers, each with a unique IP address. When an extraction task is set to execute in the Cloud, 6 to 14 random cloud servers are assigned to run the task simultaneously. Requests reach the target website from various IPs, minimizing the chances of being traced and blocked by the target website.
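The assignment described above can be pictured with a short Python sketch. It is illustrative only. The pool addresses come from the reserved documentation range 203.0.113.0/24, `assign_servers` and `spread_requests` are hypothetical names, and the round-robin distribution of requests is an assumption, not a documented Octoparse behavior.

```python
import random
from itertools import cycle


def assign_servers(pool, rng, low=6, high=14):
    """Pick between `low` and `high` distinct servers at random from the pool."""
    count = rng.randint(low, min(high, len(pool)))
    return rng.sample(pool, count)


def spread_requests(urls, servers):
    """Pair each URL with a server in round-robin order."""
    return list(zip(urls, cycle(servers)))


# Hypothetical pool; a real service would hold thousands of addresses.
pool = [f"203.0.113.{n}" for n in range(1, 101)]
servers = assign_servers(pool, random.Random(42))
plan = spread_requests([f"https://example.com/item/{n}" for n in range(30)], servers)
print(len(servers), "servers;", plan[0])
```

Because consecutive requests leave from different addresses, no single IP accounts for all of the traffic seen by the target site.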
Extraction Speed Up
Cloud Extraction can be considerably faster than Octoparse Local Extraction. On average, with Octoparse Cloud Service, the same set of data can be scraped 6 to 14 times as fast as with local extraction, because 6 to 14 cloud servers scrape the data simultaneously. Users can prioritize different tasks by adjusting the number of cloud servers contributing to each extraction task in the Cloud.
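As a rough worked example of that speed-up, the Python sketch below assumes an idealized case in which the work divides evenly across servers with no coordination overhead. The function name and the one-hour local run are hypothetical, and real runs will vary.

```python
def estimated_cloud_seconds(local_seconds, servers):
    """Idealized estimate: work divides evenly, with no coordination overhead."""
    if servers < 1:
        raise ValueError("servers must be at least 1")
    return local_seconds / servers


local = 3600  # a hypothetical one-hour local run
for servers in (6, 14):
    minutes = estimated_cloud_seconds(local, servers) / 60
    print(f"{servers} servers: about {minutes:.1f} minutes")
# 6 servers: about 10.0 minutes
# 14 servers: about 4.3 minutes
```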
Scheduling Task Execution
An extraction task can be scheduled to execute at any time when running in the Cloud. Note that scheduling is only possible with the Cloud Service, not on a local machine. Once a task is configured and set to run in the Cloud, it is executed automatically at the scheduled time. Extracted data is then saved in the Cloud and is accessible to the user anytime, anywhere. Currently, Octoparse Cloud Service allows users to schedule extraction as frequently as once per minute, which supports extraction from websites that update often.
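A fixed-interval schedule of this kind can be sketched in a few lines of Python. This is only an illustration of the idea, not how Octoparse stores schedules. `next_runs` is a hypothetical helper, and the one-minute floor simply mirrors the minimum frequency mentioned above.

```python
from datetime import datetime, timedelta


def next_runs(start, interval, count):
    """Return the first `count` run times of a fixed-interval schedule."""
    if interval < timedelta(minutes=1):
        raise ValueError("the shortest interval mentioned here is one minute")
    return [start + interval * i for i in range(count)]


start = datetime(2017, 3, 1, 4, 5)
for run in next_runs(start, timedelta(minutes=1), 3):
    print(run.isoformat())
# 2017-03-01T04:05:00
# 2017-03-01T04:06:00
# 2017-03-01T04:07:00
```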
Octoparse provides an API that lets users access the extracted data directly, without opening the App. Once connected to the Octoparse API, data can be delivered automatically to users' own systems at any frequency, including in real time. The Octoparse API makes it possible for users to get data, export data, publish data, or use it in any creative way tailored to their business needs.
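The article does not describe the API's endpoints, so the Python sketch below deliberately avoids guessing at them. It shows only a generic pattern for pulling a data set page by page into your own system, under the assumption that the data can be requested in offset-based pages. `fetch_all`, `fake_fetch_page` and the in-memory `STORE` are hypothetical stand-ins; in practice, `fetch_page` would wrap whatever call the real API documentation specifies.

```python
from typing import Callable, Iterator, List


def fetch_all(fetch_page: Callable[[int, int], List[dict]], size: int = 100) -> Iterator[dict]:
    """Yield every row, requesting pages until a short or empty page arrives."""
    offset = 0
    while True:
        rows = fetch_page(offset, size)
        yield from rows
        if len(rows) < size:
            break
        offset += size


# Hypothetical in-memory stand-in for a remote data endpoint.
STORE = [{"id": n} for n in range(250)]


def fake_fetch_page(offset: int, size: int) -> List[dict]:
    """Return one page of rows from the stand-in store."""
    return STORE[offset:offset + size]


rows = list(fetch_all(fake_fetch_page, size=100))
print(len(rows))
```

Passing the page fetcher in as a function keeps the loop independent of any particular endpoint, so the same code can be pointed at a test double or a real client.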
When do you need the Cloud Service?
- If you have high-volume data extraction tasks that need to be completed quickly.
- If you need to schedule extraction tasks to run regularly.
- If you need the scraped data to be exported automatically.
Author: The Octoparse Team