Run/Schedule tasks in the cloud
Thursday, August 16, 2018
Octoparse offers a powerful cloud platform that lets premium users (Standard and Professional plans) run tasks 24/7.
When you run a task with "Cloud Extraction", it runs in the cloud on multiple servers using our IPs. You can shut down the app or your computer while the task is running, and there is no need to worry about hardware limitations. Extracted data is saved in the cloud and can be accessed at any time.
Task scheduling is also supported by Octoparse cloud extraction. To retrieve the most up-to-date information, you can schedule your task to run as frequently as you need.
Features covered in this tutorial:
To run your task with cloud extraction:
When you finish configuring your task, click "Start Extraction" and select "Cloud Extraction" to execute a run in the cloud.
Once a task is set to run in the cloud, its status changes to "Running in the cloud" on the dashboard. The amount of data extracted and the extraction time spent are shown under the task status. You can filter tasks by status by clicking the arrow next to "Status".
To batch run tasks with cloud extraction:
Select the tasks that need to be run, then select "Cloud Extraction".
Settings of cloud extraction:
Octoparse cloud extraction allows for executing multiple tasks simultaneously.
On the Standard Plan you can run 6 concurrent tasks in the cloud (6 cloud servers available), and on the Professional Plan you can run 20 concurrent tasks (20 cloud servers available). To set the maximum number of tasks running in parallel, select the desired number from the drop-down options.
1. How is the performance of cloud extraction?
Extracting data in the cloud can be a lot faster than running the task locally, provided the task is splittable (Learn about when a task is splittable). A splittable task can be broken down into multiple subtasks that run on multiple servers simultaneously, which makes the extraction faster.
2. Can I run more tasks than the maximum number allows?
Yes, you can. However, some of the tasks will be queued until cloud servers become available as earlier tasks complete.
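The queuing behavior described above can be pictured with a small local sketch. This is not Octoparse code or an Octoparse API: `run_task`, `run_all`, and `MAX_CONCURRENT` are hypothetical names, and a thread pool merely stands in for the cloud servers. Only the limit of 6 concurrent tasks (Standard Plan) comes from the text.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Assumption: 6 mirrors the Standard Plan limit mentioned above.
MAX_CONCURRENT = 6


def run_task(name: str) -> str:
    """Hypothetical stand-in for one cloud extraction run."""
    time.sleep(0.05)  # pretend the extraction takes a moment
    return f"{name}: done"


def run_all(task_names, max_concurrent=MAX_CONCURRENT):
    """Run every task, but never more than max_concurrent at once.

    Tasks beyond the limit wait in the pool's internal queue until a
    worker (a "server" in this analogy) finishes an earlier task.
    """
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # pool.map returns results in the order the tasks were submitted.
        return list(pool.map(run_task, task_names))


if __name__ == "__main__":
    # 8 tasks with 6 workers: the last 2 are queued until a worker frees up.
    for line in run_all([f"task-{i}" for i in range(8)]):
        print(line)
```

In this sketch all eight tasks eventually complete; the last two simply start later, which is the same trade-off as submitting more tasks than your plan's concurrent limit.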
To schedule a run in the cloud:
When you finish configuring your task, click "Start Extraction" and select "Set a Schedule".
Select how frequently you want to run it (Once, Weekly, Monthly, or Interval), and customize the time and date according to your data requirements. Click "Start" and the task will run as scheduled.
The time of the next execution can be found on the dashboard. If you wish to cancel a scheduled task, click "More Actions" and select "Do not run on schedule" in the "Cloud Extraction" tab.
Note that if you "Save" a schedule instead of starting it right away, you will need to click "More Actions" and select "Run on Schedule" in the "Cloud Extraction" tab to start it.
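To make the frequency options more concrete, here is a minimal sketch of how a next execution time could be worked out for an Interval schedule and a Weekly schedule. It only illustrates the idea and is not how Octoparse computes the value shown on the dashboard; the function names and parameters are hypothetical.

```python
from datetime import datetime, timedelta


def next_interval_run(last_run: datetime, every: timedelta) -> datetime:
    """Interval schedule: the next run is a fixed time span after the last one."""
    return last_run + every


def next_weekly_run(now: datetime, weekday: int, hour: int, minute: int = 0) -> datetime:
    """Weekly schedule: next occurrence of the given weekday and time.

    weekday follows datetime.weekday(): Monday is 0, Sunday is 6.
    """
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    if candidate <= now:
        # The slot for this week has already passed, so move to next week.
        candidate += timedelta(days=7)
    return candidate


if __name__ == "__main__":
    now = datetime(2018, 8, 16, 10, 0)  # a Thursday
    print(next_interval_run(now, timedelta(hours=6)))
    print(next_weekly_run(now, weekday=0, hour=9))  # next Monday at 09:00
```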
What's the default time zone for Octoparse Cloud platform?
The next execution time shown on the dashboard defaults to your local time zone (according to your operating system). Regardless of your location, however, times and dates extracted by cloud extraction use UTC (UTC±00:00). Currently, Octoparse does not support changing this time zone.
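Because extracted times are in UTC, you may want to convert them to your own time zone after exporting the data. The following is a minimal sketch using only the Python standard library. The timestamp format, the helper name `utc_to_local`, and the UTC+8 offset are assumptions made for illustration; adjust them to match your exported data and location.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def utc_to_local(extracted: str, local_tz: tzinfo,
                 fmt: str = "%Y-%m-%d %H:%M:%S") -> datetime:
    """Treat an extracted timestamp as UTC and convert it to local_tz.

    Assumption: the exported value looks like "2018-08-16 09:30:00".
    """
    naive = datetime.strptime(extracted, fmt)
    aware_utc = naive.replace(tzinfo=timezone.utc)  # label the value as UTC
    return aware_utc.astimezone(local_tz)


if __name__ == "__main__":
    # Hypothetical local zone: a fixed UTC+8 offset (no daylight saving).
    utc_plus_8 = timezone(timedelta(hours=8))
    print(utc_to_local("2018-08-16 09:30:00", utc_plus_8))
```

A fixed offset keeps the sketch dependency-free; for regions with daylight saving time, a named zone from the standard `zoneinfo` module would be the more robust choice.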
To set a schedule for a group of tasks, select a task group, then choose "Schedule Cloud Extraction for the group".
Other advantages of Cloud Extraction: