Run tasks on local machine

Sunday, April 08, 2018 4:43 AM

Web scraping tasks created in Octoparse can be run on your local machine (Local Extraction) or in the cloud (Cloud Extraction). Running tasks locally can help you:

1) Troubleshoot/debug any workflow issues
2) Extract data without using cloud resources

Tips! 

Local Extraction is available to both free and premium users. Free users are limited to 10,000 exported records per export and 2 concurrent local runs. Premium users (Standard & Professional) have no limit on exported records or concurrent local runs.

 

In this tutorial, we will go through the following features:

1) Run tasks on Local Extraction
2) Settings of Local Extraction

Run tasks on Local Extraction

In Wizard Mode, when Octoparse reaches the "complete" step, you can click "Local Extraction" to run the crawler on your local machine.

 

In Advanced Mode, after you finish configuring your task, click "Start Extraction" and then select "Local Extraction" to run the task locally.

 

You can then watch the task run and view the extracted data.

 

 

 

Settings of Local Extraction

While the task is running, you can modify the "Extraction settings" for your local tasks. By default, Octoparse disables the following three functions. You can enable them based on the requirements of your task.

Display error message: An error message is shown in the built-in browser when an error occurs, such as missing data.

Loading image: Disable image loading so that webpages open faster.

Memory release: Local extraction can consume a large amount of your computer's memory. Select "Memory release" to free it.

 

Tips!

1. Where does the local task run?

Local Extraction runs the crawler from your own IP address. Some websites limit how many times the same IP can visit them, so the crawler is likely to be blocked if it exceeds that limit.

2. What will affect Local Extraction?

Because the crawler runs on your local machine, its performance depends on your local network speed and hardware configuration.

 

 

Related articles:

Cloud Extraction 

Wizard Mode 

Advanced Mode 

Concurrent runs 

 

 
