Cloud Extraction Works 24/7 with Speed 3-10 Times Faster than Local Extraction

Monday, March 7, 2022

A few years ago, we wrote a web crawler to parse and extract data from websites. The most painful part of that process was that extraction tasks could be interrupted partway through. For example, a computer might shut down unexpectedly, or the target website might block our IP address because of frequent requests.

To resolve this problem, we developed Cloud Extraction.

 

#1 Cloud Extraction

Cloud Extraction means running data extraction tasks in the cloud. You configure a rule and upload it to our cloud platform. A central controller then assigns your task to one or several cloud servers, which extract data simultaneously. For example, suppose you have configured a rule to extract data across 99 pages. The task is automatically divided into three sections and evenly assigned to three cloud servers, so each server extracts 33 pages at the same time. In this way, extracting data from all 99 pages takes roughly one third of the original time.
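To make the 99-page example concrete, here is a minimal Python sketch of how a page range could be divided evenly among servers. It only illustrates the arithmetic; the function name `split_pages` is hypothetical and this is not Octoparse's actual scheduling code.

```python
def split_pages(total_pages: int, servers: int) -> list[range]:
    """Split pages 1..total_pages into contiguous, near-equal sections.

    Hypothetical helper for illustration only.
    """
    if total_pages < 0 or servers < 1:
        raise ValueError("total_pages must be >= 0 and servers must be >= 1")
    base, extra = divmod(total_pages, servers)
    sections = []
    start = 1
    for i in range(servers):
        # The first `extra` servers take one additional page each.
        size = base + (1 if i < extra else 0)
        sections.append(range(start, start + size))
        start += size
    return sections


if __name__ == "__main__":
    for number, section in enumerate(split_pages(99, 3), start=1):
        print(f"server {number}: pages {section.start}-{section.stop - 1}")
```

With 99 pages and three servers, each section holds 33 pages. When the page count does not divide evenly, the leftover pages go to the first servers, one each.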

 

#2 Avoid IP Being Blacklisted

Moreover, Cloud Extraction handles various errors, so we no longer have to worry about occasional network interruptions. When one occurs, the cloud servers resume their work as soon as the network connection is available again. We are also no longer worried about our IP being blacklisted. The Professional Edition of Cloud Extraction provides a large number of IP addresses, and Cloud Extraction addresses the blocking issue by assigning your tasks to several cloud servers, which also speeds up the extraction.
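The idea of resuming after a network interruption can be sketched as a simple retry loop. This is a generic illustration under our own assumptions, not the internal logic of the cloud servers: `extract_with_resume` and the `fetch_page` callable are hypothetical names, and a dropped connection is modeled as Python's built-in `ConnectionError`.

```python
import time
from typing import Callable, Iterable


def extract_with_resume(
    pages: Iterable[int],
    fetch_page: Callable[[int], str],   # hypothetical: returns the data for one page
    max_retries: int = 5,
    wait_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[int, str]:
    """Extract pages in order, retrying a page when the connection drops."""
    results: dict[int, str] = {}
    for page in pages:
        attempt = 0
        while True:
            try:
                results[page] = fetch_page(page)
                break  # page done, move on to the next one
            except ConnectionError:
                attempt += 1
                if attempt > max_retries:
                    raise  # give up after too many consecutive failures
                # Wait a little longer after each failed attempt.
                sleep(wait_seconds * attempt)
    return results
```

Pages that were already extracted are kept, so work continues from the page that failed instead of starting over.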

 

#3 API

If you need to extract data at a specified time or update your data once an hour, you can set up a scheduled task for Cloud Extraction.

If you find that some data has not been extracted, you can launch Octoparse to extract the missing data again.

The cloud service also provides an API that links your system closely with Octoparse and lets you export the extracted data directly into your database. So for those who want to keep their system data up to date in real time, Octoparse is your best choice. Just schedule a task to obtain the latest data, and then use the API to update your system automatically.
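As a rough sketch of the "export into your database" step, the Python snippet below stores a batch of extracted rows in SQLite and overwrites rows that were already saved, so repeated scheduled runs keep the table current. How the rows are obtained from the Octoparse API is deliberately left out. The function `save_rows`, the `extracted` table, and its `url`, `title`, and `price` columns are all assumptions made for illustration.

```python
import sqlite3


def save_rows(conn: sqlite3.Connection, rows: list[dict]) -> int:
    """Insert or update extracted rows and return the total row count.

    Hypothetical schema: one row per URL, with a title and a price.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS extracted ("
        "url TEXT PRIMARY KEY, title TEXT, price TEXT)"
    )
    # INSERT OR REPLACE overwrites an existing row that has the same url.
    conn.executemany(
        "INSERT OR REPLACE INTO extracted (url, title, price) "
        "VALUES (:url, :title, :price)",
        rows,
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM extracted").fetchone()[0]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    batch = [{"url": "https://example.com/a", "title": "Item A", "price": "10"}]
    print(save_rows(connection, batch))
```

Keying the table on the URL is a design choice for this sketch: it makes each scheduled run idempotent, so running the same export twice does not duplicate data.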

 

Octoparse API documents:

 

  


 

We are pleased to announce that we have released a new version of Octoparse, and we are very excited about its unique features. Octoparse is a free web scraper for collecting data from the web. Building on its popularity in the Chinese market, where Octoparse already has more than 180,000 users, we decided to enter the international market.

We are glad to help and make our product even better for you. If you find any missing feature, please feel free to contact us.

support@octoparse.com

 

 

Author: The Octoparse Team 

 


 

 

 
