Octoparse 8.5: Empowering Local Scraping and More
Wednesday, February 16, 2022
Here is the exciting news: Octoparse 8.5 is now released, with game-changing new features and major improvements. We have always been able to count on cloud scraping for fast scraping at scale, but this time we want to make local scraping just as competitive.
What's new in Octoparse 8.5?
Scraping Speed, Ease of Use, and Secure Data Storage are essential to a web scraping tool and its users. These are what Octoparse 8.5 is designed to focus on.
For this update, most of the work went into Local Run/Local Scraping (as opposed to Cloud Scraping), dashboard task management, and some smaller optimizations such as Switch Cloud IP for a task and Time Zone Conversion.
- Though the main updates are covered in this article, there is more to be explored. Here is a comprehensive version of Octoparse 8.5 updates plus technical guides.
- Why do we focus on local scraping? Cloud scraping is powerful, but it is not the right fit for every job. Making local scraping just as flexible and powerful complements cloud scraping, and together they make Octoparse a much more capable web scraping tool and create a seamless scraping experience for Octoparse users like you.
So there's a new release, what's in it for me?
If any of the statements below resonate with you, you will find the Octoparse 8.5 updates extremely helpful.
- Cloud scraping is cool, but I rely more on local runs to get the data.
- I need the local scraping to go faster!
- I want the local run data to be sent to my database automatically just like the cloud run data.
- I need individual batches of data for all my runs.
- I get frustrated when I don’t know why my task doesn't work and I have no idea how to fix it.
- I'd like to pause the task for a while just to check on things and see if the data has been extracted accurately.
- I wish there was a way to manage my tasks more efficiently.
The rest of this article will walk you through the new 8.5 features and help you get the hang of them faster. Let’s dive right in!
Live Logs for troubleshooting local runs
With Octoparse 8.5, you can now
- Check real-time logs for local runs (for task inspection)
- Pause & resume a local run when needed
Whether you are new to Octoparse or have already played around with it for a while, it can be difficult to find out why your task is not working as expected. And without knowing the cause, fixing it can be a nightmare. With the new Octoparse 8.5, you now get an Error Log that tells you plainly what went wrong and where the task got stuck, so fixing the problem becomes much easier once it has been spotted. No more guesswork.
If your task fails, tick "show error logs only". The logs will tell you exactly why the scraper got stuck and what went wrong during the scraping process, and they point directly to how you should fix your scraper and make it work again.
Now that you know what the problem is, you can go ahead and fix it!
Here are a few errors you may encounter and some approaches to fixing them.
- A certain element not found - time to check your XPath!
- Fail to load the webpage - check whether anything is wrong with your network or IP
- AJAX timeout - increase your timeout limit
The logs will no longer be accessible if you close the local run window after the task is completed. If you need a second look at the logs or the errors, don’t forget to export the logs.
Boost mode for 3X faster local runs
Yes, cloud scraping is fast and efficient. Yet with the arrival of "Boost Mode" for local scraping, speed is no longer the privilege of cloud scraping alone! Octoparse 8.5 introduces "Boost Mode" for local extraction, which is up to 3X faster because the task splits itself into multiple subtasks that run concurrently. As a result, you'll get your data much faster.
There are a few things to note about "Boost Mode".
- Boost Mode is only applicable to tasks that are built with a "splittable" loop, such as a list of URLs, a list of text items, or a fixed list of page elements.
- The exact number of tasks that you can run on your desktop in Boost Mode depends heavily on the capacity of your device.
If local extraction is what you use, "Boost Mode" can take your web scraping experience to the next level. To some extent, it closes the gap between local runs and cloud runs by making a local run nearly as fast and scalable as a cloud run.
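To make the idea of a "splittable" loop concrete, here is a minimal Python sketch of the general technique: a list of URLs is divided into a few subtasks that run concurrently, and their results are merged back in order. This is only an illustration of the concept, not how Octoparse implements Boost Mode. The function names (split_into_subtasks, run_subtask, run_boosted) and the placeholder extraction step are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_subtasks(items, n_subtasks):
    """Split a list into contiguous, roughly equal chunks (hypothetical helper)."""
    n = max(1, min(n_subtasks, len(items)))
    size, extra = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        # The first `extra` chunks each take one additional item.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def run_subtask(urls):
    """Placeholder for real extraction: records each URL and its length."""
    return [{"url": u, "length": len(u)} for u in urls]


def run_boosted(urls, n_subtasks=3):
    """Run the subtasks concurrently and merge their rows in the original order."""
    chunks = split_into_subtasks(urls, n_subtasks)
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        # pool.map yields results in the same order as the input chunks.
        batches = list(pool.map(run_subtask, chunks))
    return [row for batch in batches for row in batch]


if __name__ == "__main__":
    # Hypothetical URLs, used only to show the call shape.
    pages = [f"https://example.com/page/{i}" for i in range(1, 8)]
    for row in run_boosted(pages):
        print(row)
```

The sketch also shows why only splittable loops qualify: the list can be cut into independent pieces up front, which is not possible when each step depends on the previous one, such as clicking through "next page" buttons.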
Read related tutorial: What's the difference between Standard Mode and Boost Mode?
Auto-backup local data to the Cloud
With Octoparse 8.5, you can now
- Access historical data for each run on your local device
- Auto back-up local run data to the cloud
(If you are interested in setting up your local scraping automation process with Octoparse 8.5, contact email@example.com for a free trial & more details.)
In previous versions, Octoparse kept only the last set of data for any local run. Now that Local Run History is live, you can access every batch of data you have scraped with the same task. For example, if you run task A four times a week, all four batches of data will be stored individually and accessible in your account.
Additionally, you can turn on Auto Backup so that Octoparse stores your data in the Cloud after each run is completed. This is extremely helpful if you are using the API to connect data to your database. In this way, you will be able to process not only cloud-run data but also local-run data on your side.
Switching on Auto Backup will not back up the data of any previous runs to the Cloud; it only applies to data extracted in later runs. If a run is completed and its batch of data hasn't been backed up to the Cloud yet, you can still back up the data to the Cloud manually.
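If you load scraped data into your own database, keeping each run's batch separate is mostly a matter of tagging every row with a run identifier. The Python sketch below illustrates that idea with the standard sqlite3 module. It does not call the Octoparse API: the table layout, the column names (title, price), the run identifiers, and the sample rows are all assumptions made for the example, and the rows stand in for whatever you have already retrieved or exported.

```python
import sqlite3


def store_batch(conn, task_name, run_id, rows):
    """Insert one run's rows, tagged with the task name and a run identifier."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scraped_rows ("
        "task_name TEXT, run_id TEXT, title TEXT, price TEXT)"
    )
    conn.executemany(
        "INSERT INTO scraped_rows VALUES (?, ?, ?, ?)",
        [(task_name, run_id, r["title"], r["price"]) for r in rows],
    )
    conn.commit()


def rows_for_run(conn, task_name, run_id):
    """Return only the rows that belong to one specific run of a task."""
    cur = conn.execute(
        "SELECT title, price FROM scraped_rows "
        "WHERE task_name = ? AND run_id = ? ORDER BY rowid",
        (task_name, run_id),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical batches from two runs of the same task.
    store_batch(conn, "task A", "run-1", [{"title": "Widget", "price": "9.99"}])
    store_batch(conn, "task A", "run-2", [{"title": "Widget", "price": "8.49"}])
    print(rows_for_run(conn, "task A", "run-2"))
```

Because every row carries its run identifier, later runs never overwrite earlier ones, and you can compare batches from the same task over time.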
Manage your task with batch actions
This Dashboard update aims to cut down on repetitive work and make task management easier, especially for those who have a long list of tasks to take care of.
With Octoparse 8.5, you can now
- Manage multiple tasks at once using batch actions, such as duplicating tasks, stopping cloud runs, scheduling local runs, and more.
- Sort/filter your tasks more efficiently using the new parameters included in the filters. You can even save the filter settings for later use.
While the main updates are covered in this article, there is more to be explored. Here is a comprehensive version of Octoparse 8.5 updates plus technical guides.
Summary and further help
Besides all of the above, there are more improvements to discover as you explore the brand-new 8.5 version yourself. If you have any problems with or feedback on Octoparse 8.5 and would like to talk to us, feel free to contact us at firstname.lastname@example.org.
More step-by-step tutorials (for Octoparse 8.5 updates) are coming up:
- What's new in Octoparse 8.5
- Brand new local extraction of Octoparse 8.5
- Add the original URL (before redirecting) along with the data scraped
- Switch Cloud IP for a task
- What's the difference between Standard Mode and Boost Mode?
- How to convert the time zone of the current time field
- Backup local data to the Cloud