Octoparse Cloud Service - Splitting Tasks to Speed Up Cloud Extraction

Monday, March 13, 2017 8:38 AM

 
If a task has not been split appropriately, cloud extraction will often show no obvious speed advantage over local extraction. In most cases, a task can be split by adjusting the loop in the workflow.

Here are some lists that can be split:
Fixed List
List of URLs
Text List

 

And these lists can’t be split:
Single Element
Variable List

Let’s see how to split an extraction task using Fixed List.

In the screenshot below, a variable list is created by default. With a variable list, the extraction can only run on a single cloud server, so cloud extraction is not going to outperform local extraction in this case.

 

 

To make the extraction faster, we can split the task by manually creating a fixed list. Here, we take the XPath //DIV[@id='mainResults']/UL[1]/LI and append a position index to it, giving //DIV[@id='mainResults']/UL[1]/LI[i] (i = 1, 2, 3, ..., n). Enter the first edited XPath as a fixed list item, then click ‘OK’ and ‘Save’. The first item in the loop is now detected accurately, which is just what we want.

 

 

In the same way, edit the XPath with each subsequent index and enter the results one by one, then click ‘OK’ and ‘Save’.
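Typing each indexed XPath by hand is error-prone, so the list can be generated and then pasted into the fixed list. The following is a minimal sketch in Python, not part of Octoparse itself; the function name `build_fixed_list` and the item count of 3 are assumptions chosen for illustration.

```python
def build_fixed_list(base_xpath: str, n: int) -> list[str]:
    """Return base_xpath[1] .. base_xpath[n].

    XPath positions are 1-based, so the index starts at 1, not 0.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    return [f"{base_xpath}[{i}]" for i in range(1, n + 1)]


# Base XPath taken from the article; the count (3) is a made-up example.
BASE = "//DIV[@id='mainResults']/UL[1]/LI"
xpaths = build_fixed_list(BASE, 3)

for xp in xpaths:
    print(xp)
```

Each printed line is one fixed list entry, ending in LI[1], LI[2], and LI[3] respectively.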

 

 

Now all loop items are detected correctly, just as they were with the variable list.
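One way to sanity-check that the indexed XPaths select the same elements as the unindexed one is to try them on a small sample document outside Octoparse. The sketch below uses Python's standard `xml.etree.ElementTree`, which supports only a limited XPath subset and expects a relative path, hence the leading `.`. The sample markup is invented for illustration and is not the page from the article.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup; the real page structure may differ.
SAMPLE = """
<HTML><BODY>
  <DIV id="mainResults">
    <UL>
      <LI>item A</LI>
      <LI>item B</LI>
      <LI>item C</LI>
    </UL>
  </DIV>
</BODY></HTML>
""".strip()

root = ET.fromstring(SAMPLE)

# Same XPath as in the article, made relative for ElementTree.
BASE = ".//DIV[@id='mainResults']/UL[1]/LI"

# "Variable list" style: one XPath that matches every item.
variable_items = [li.text for li in root.findall(BASE)]

# "Fixed list" style: one indexed XPath per item (positions are 1-based).
fixed_items = [
    root.find(f"{BASE}[{i}]").text
    for i in range(1, len(variable_items) + 1)
]

print(variable_items == fixed_items)
```

If the two lists differ on a real page, the usual cause is that the items do not all share one parent element, in which case a single positional index is not enough.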

 

 

If for some reason you do not want the task to be split (for example, when running it locally), check the appropriate box to prevent the task from splitting.

 

 

 

Octoparse Professional plan users can have up to 14 cloud servers working at any one time. A task that is split correctly will have multiple cloud servers extracting simultaneously, so the data is scraped much faster.
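To see why splitting matters, it helps to picture how the items of a fixed list could be shared among servers. The article does not say how Octoparse actually assigns loop items, so the round-robin scheme below is purely an illustrative assumption, as are the function name `split_round_robin` and the 30-item example; only the limit of 14 servers comes from the text.

```python
def split_round_robin(items: list, max_servers: int = 14) -> list[list]:
    """Share items among at most max_servers workers, round-robin.

    This is an illustrative model, not Octoparse's documented behavior.
    """
    workers = min(max_servers, len(items))
    return [items[k::workers] for k in range(workers)]


# A fixed list of 30 loop items (hypothetical count), identified by index.
fixed_list = list(range(1, 31))
chunks = split_round_robin(fixed_list)

# A task that cannot be split behaves like a single one-item job.
unsplit = split_round_robin(["whole task"])

print(len(chunks), "servers busy; largest share:", max(len(c) for c in chunks))
print(len(unsplit), "server busy for the unsplit task")
```

Under this model the split task keeps all 14 servers busy with two or three items each, while the unsplit task occupies a single server for the whole run.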

Users can adjust the maximum number of tasks that run at the same time and prioritize the more important ones. Higher-priority tasks run first, while the others wait in a queue until a cloud server becomes available.
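The queueing behavior described above can be modeled with a priority queue. This is a rough sketch only: the numeric priorities (lower number means higher priority), the task names, and the functions `run_order` and `schedule` are all assumptions for illustration and do not reflect Octoparse's actual settings or internals.

```python
import heapq
from itertools import count


def run_order(tasks: list[tuple[str, int]]) -> list[str]:
    """Return task names in the order they would be started.

    tasks holds (name, priority) pairs; a lower number means higher
    priority. Ties keep their submission order via a sequence counter.
    """
    seq = count()
    heap = [(priority, next(seq), name) for name, priority in tasks]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, name = heapq.heappop(heap)
        order.append(name)
    return order


def schedule(tasks: list[tuple[str, int]], max_running: int):
    """Split tasks into those started now and those left waiting."""
    order = run_order(tasks)
    return order[:max_running], order[max_running:]


# Hypothetical tasks and priorities.
tasks = [("daily-prices", 2), ("urgent-report", 1),
         ("archive-crawl", 3), ("news", 1)]
running, waiting = schedule(tasks, max_running=2)
print("running:", running)
print("waiting:", waiting)
```

With a limit of two concurrent tasks, the two priority-1 tasks start first and the rest wait their turn.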

 

Author: The Octoparse Team
