Step-by-step tutorials for you to get started with web scraping
Lesson 7: Execute tasks
Thursday, August 16, 2018
Now that you know how to capture data from different kinds of web pages, you are all good to start getting some data by running your task via Local Extraction or Cloud Extraction.
1) Run tasks with "Local Extraction"
When running a task locally via "Local Extraction", you are utilizing local resources, including the operating system, hardware capacity, IP address, and network bandwidth. These are also the key factors that can influence the extraction process, such as how fast the extraction runs, whether a particular website loads, or whether access to a website is blocked. With local extraction, the extracted data is stored only on your own machine and will be overwritten by new data if the extraction is run a second time.
Local extraction is very useful for test-running a task to see if it works as expected. Once the task has been tested and works correctly, you can either wait for the local extraction to finish or set the task to run in the Cloud for better performance.
- Click "Save and run" on "Action Tips" or click the "Start Extraction" button to start running your task.
- Select "Local Extraction" to start a local job
As the extraction starts to run locally, you can see in the built-in browser how Octoparse is interacting with the webpage and whether the steps in the workflow are being executed as expected. The extracted data is added dynamically to the "Data extracted" pane right below the browser as more data is captured.
Metrics including the amount of data extracted, the total time spent, as well as the average extraction speed, are provided right below the "Data extracted" pane.
Alternatively, you can check the dashboard for the total number of lines extracted.
A few extra settings are available by clicking on the "Extraction settings" button right on top of the extraction window:
- Display error message during "Local Extraction" process
- Disable image loading in "Local Extraction"
- Automatically release memory
1. Where does the task extraction take place while using "Local Extraction"?
When you run your task with "Local Extraction", the task runs locally on your machine using your own local IP address.
2. What affects the speed of "Local Extraction"?
The speed of "Local Extraction" is affected by your computer performance, internet connection as well as the loading speed of the target website.
2) Run tasks with "Cloud Extraction" (for premium plans)
When you run a task with "Cloud Extraction", the task runs on the Octoparse cloud platform, which allows tasks to run 24/7 even when your computer is off or the app is closed. Advanced features such as automatic IP rotation, task scheduling, extraction speed-up, and the Octoparse API are all part of the Octoparse Cloud service (see all benefits of the Octoparse Cloud service).
1. What are the IPs of the cloud servers?
When you execute your tasks in the cloud, tasks will be run on our cloud servers, each with a unique IP. When a task is set to run with "Cloud Extraction", 6-20 servers will be assigned to run the task simultaneously, minimizing the chances of being blacklisted by the target website.
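To illustrate why running on multiple servers with distinct IPs helps, here is a minimal sketch (not Octoparse's actual implementation) of round-robin distribution of requests across a pool of IPs; the IP addresses and URLs are hypothetical placeholders:

```python
from itertools import cycle

def assign_proxies(urls, proxies):
    """Round-robin each request URL across a pool of IPs, so no single
    IP ends up sending every request to the target website."""
    pool = cycle(proxies)
    return [(url, next(pool)) for url in urls]

# Hypothetical pool standing in for the 6-20 cloud-server IPs.
proxies = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
urls = [f"https://example.com/page/{i}" for i in range(1, 7)]

for url, ip in assign_proxies(urls, proxies):
    print(f"{ip} -> {url}")
```

Because each IP only handles a fraction of the traffic, the request rate seen from any one address stays low, which is what reduces the chance of being blacklisted.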
2. How does "Cloud Extraction" speed up the extraction process?
When a task is configured as a splittable task, it is further broken down into numerous sub-tasks that can run simultaneously in the Cloud, thus speeding up the extraction (see what type of task is splittable).
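The speed-up from sub-tasks can be sketched in a few lines. This is a generic concurrency example, not Octoparse code: `scrape_page` is a hypothetical placeholder for the per-page extraction step, and a thread pool stands in for the cloud servers that each take a share of the URL list:

```python
from concurrent.futures import ThreadPoolExecutor

def scrape_page(url):
    # Placeholder for the real per-page extraction step.
    return f"data from {url}"

def run_split(urls, workers=4):
    """Split a list-of-URLs task into sub-tasks and run them in
    parallel, analogous to dividing a splittable Cloud task
    across several servers. Results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(scrape_page, urls))

urls = [f"https://example.com/item/{i}" for i in range(8)]
results = run_split(urls)
```

With 4 workers, the 8 pages are fetched roughly 4 at a time instead of one after another, which is the same principle behind the Cloud speed-up.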
- Click "Cloud Extraction" to start running a task in the Cloud.
If your task is configured correctly, data will be extracted and stored in the Cloud where it can be accessed from any machine.
Check the dashboard for the progress of the job, or filter the task list by task status.
The amount of data extracted and the extraction time spent are also available right below the task status on the dashboard.