Throughout its years in the data industry, the Octoparse team has never slowed its pace in making data more accessible to everyone. This is rooted in our belief that, in the era of big data, anyone should be able to collect data and harness its power.
Yet despite the improved usability of our program, the thorough step-by-step training resources, and the friendly support team we have at Octoparse, a number of people still hesitate to use it because they have limited time to invest. This November, we are extremely excited to introduce Version 7.1, which includes one of our most revolutionary moves in years: Template Mode Scraping.
What makes Template Mode Scraping so special?
Have you ever wondered what level of technical proficiency is required to build a web scraper? With the newly launched Template Mode Scraping, the answer is “none.” More specifically, the program now includes dozens of built-in templates, all ready to fetch data instantly, with a nearly zero learning curve!
Many popular sites, including Amazon, Indeed, Booking, TripAdvisor, Twitter, YouTube, Yellow Pages, Walmart, and Realtor, are covered at the moment. And the best part is that if you feel a website should be added, you can simply tell us and we’ll seriously consider creating a template for it.
Who is this for?
Anyone! Yes, anyone who wants to get data quickly and easily. If we already have the template you need, that’s great! If not, let us know.
Template Mode Scraping can be especially valuable to anyone who needs to extract data from some of the most popular websites, and to those who would prefer to skip the learning and do not require a high level of data customization.
How is it different from the old Wizard Mode Scraping?
If you are not new to Octoparse, you may have already tried our old Wizard Mode scrapers. The new Template Mode Scraping is completely different from Wizard Mode Scraping. The old Wizard Mode works for a few specific page structures, while Template Mode scrapers are pre-built scrapers that extract pre-defined data fields from specific websites. In Wizard Mode, users are required to correctly identify the webpage structure and tell Octoparse which data fields need to be captured. The template scrapers, by contrast, take over all the heavy lifting, so all you have to do is tell Octoparse your search criteria (e.g. “iPhone”) and click “start” to get data.
How to use it?
Step 1. Select “Task Templates” from the home screen
Step 2. Pick a template
Step 3. Check the pre-defined data fields and parameters
Step 4. Select “Use Template”
Step 5. Enter values for the parameters, such as “iPhone” for the search keyword
Step 6. Save the template and run
And there are more upgrades…
In keeping with Octoparse’s commitment to large-scale scraping of even the most complex and difficult websites, the new release also includes features focused on more efficient, effective, and powerful data scraping.
1. Million-level URLs Input
Did you hate being able to input only 20,000 URLs into a crawling task? We did, so we’ve changed it. Now you can add up to 1 million URLs to any task. Better yet, you can import the list of URLs from local files (txt, CSV, or xls) or directly from another task. You can even link two running tasks, having the first extract the URLs and the second fetch additional data from each extracted URL, without having to manually “transfer” the URLs from one task to the other.
Moreover, the new URL Generator feature lets you generate URL lists based on specific patterns. A straightforward example is a list of URLs in which only the page number changes.
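To make the page-number case concrete, here is a minimal Python sketch of pattern-based URL generation. It only illustrates the idea and is not how Octoparse implements the URL Generator; the `generate_urls` function, the `{page}` placeholder, and the `example.com` address are all assumptions made for this example.

```python
def generate_urls(pattern, start, end, step=1):
    """Build a list of URLs by substituting page numbers into `pattern`.

    `pattern` must contain a `{page}` placeholder. This placeholder is a
    convention chosen for this sketch, not Octoparse's own syntax.
    """
    if "{page}" not in pattern:
        raise ValueError("pattern must contain a {page} placeholder")
    # `end` is inclusive, so the range runs to end + 1.
    return [pattern.format(page=n) for n in range(start, end + 1, step)]


# Hypothetical listing pages where only the page number changes.
urls = generate_urls("https://example.com/products?page={page}", 1, 5)
for url in urls:
    print(url)
```

A list produced this way could be saved to a txt or CSV file and imported into a task like any other URL list.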
Possible use cases include:
– Scraping from a large list of URLs
– Scraping a massive number of products from e-commerce sites. Getting product URLs and product details separately can greatly improve the efficiency and consistency of the scrapes while also reducing the chance of getting blocked or missing data.
– Scraping sites that block easily. Tasks running on a list of URLs can be assigned to run on various servers and thus better leverage IP resources to avoid getting banned.
– Scraping from a large number of different pages from a particular website. Use the URL generator to quickly generate all the page URLs and scrape all the pages simultaneously. No need to go through the pages one by one.
2. Improved Dashboard
Compared to the Dashboard in version 7.0, the improved Dashboard layout is more informative, customizable, and efficient.
The new version offers two kinds of dashboard layouts to choose from based on your preference (arrange tasks by date created or by task groups). Also, choose what task information you would like to see in the dashboard, including scraping status, time used, number of runs, next run (if scheduled), and scraping completion time.
3. Upgraded Anti-blocking mechanism
- Auto switch browser (User-agent)
- Auto clear cookies
Two more anti-blocking options have been added to help reduce the chance of getting blocked by scraping-sensitive websites. In version 7.1, Octoparse can automatically switch the user agent (UA) and clear cookies for you.
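For readers curious about what these two options mean in practice, the following Python sketch shows the general technique using only the standard library. It is an assumption-laden illustration, not Octoparse’s internal mechanism: the `RotatingSession` class, the `clear_every` parameter, and the sample User-Agent strings are all hypothetical.

```python
import itertools
import urllib.request
from http.cookiejar import CookieJar

# Illustrative User-Agent strings; a real list would be longer and kept current.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


class RotatingSession:
    """Hypothetical helper: rotates the User-Agent and periodically clears cookies."""

    def __init__(self, user_agents, clear_every=10):
        self._agents = itertools.cycle(user_agents)
        self._clear_every = clear_every
        self._count = 0
        self.cookies = CookieJar()
        self._opener = urllib.request.build_opener(
            urllib.request.HTTPCookieProcessor(self.cookies)
        )

    def build_request(self, url):
        """Return a Request carrying the next User-Agent in the cycle."""
        # Clear stored cookies once every `clear_every` requests.
        if self._count and self._count % self._clear_every == 0:
            self.cookies.clear()
        self._count += 1
        return urllib.request.Request(
            url, headers={"User-Agent": next(self._agents)}
        )

    def fetch(self, url, timeout=10):
        """Fetch `url` through the cookie-aware opener and return the body bytes."""
        with self._opener.open(self.build_request(url), timeout=timeout) as resp:
            return resp.read()
```

Calling `RotatingSession(USER_AGENTS).fetch(url)` repeatedly would send each request with a different User-Agent header and start from an empty cookie jar every ten requests.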
The Next Step…
Octoparse is always working to bring you a more accessible scraping experience. There are two things we care about the most: ease of use and robustness. Please tell us what you think of the new features and which templates you need. We’d love to hear your feedback!