Octoparse vs. Content Grabber comparison: which is better for web scraping?
Sunday, January 17, 2021
With so many web scrapers available, one question comes up: which one best fits your specific needs and can scrape everything you want? Most off-the-shelf web scrapers are fairly generic and designed for common, simple tasks (see Top 5 Web Scraping Tools Review for more information). That is to say, they may not be as flexible and universal as you'd expect. In this post, I compare the web scrapers Octoparse and Content Grabber to give you some insight before you choose a web scraping service that will serve your data extraction needs for a long time.
Features Comparison
The table below compares the features of Octoparse and Content Grabber:
| Feature | Octoparse | Content Grabber |
| --- | --- | --- |
| General Rule | | |
| Authoring environment | Windows-based software application (available for Mac with a virtual machine) | Windows-based software application (available for Mac with a virtual machine) |
| Smart Mode | Yes, get extracted data just by entering the target URL | No |
| Cloud service | No | |
| Scraper logic | Variables, loops, conditionals | Variables, loops, conditionals |
| Speed | Fast parallel execution | Fast parallel execution |
| Hosting | Hosted on a cloud of Octoparse servers (if subscribed to Octoparse cloud) or on the local machine | Local machine |
| Selecting elements | Point-and-click, XPath | Point-and-click, XPath |
| Transforming data | Regular expressions, string operations | Regular expressions |
| Knowledge of HTML and HTTP | Not required | Required |
| Knowledge of regular expressions and XPath | Not necessary, but helpful for further exploration | Not necessary, but helpful for further exploration |
| Extraction Features | | |
| JavaScript, Ajax and dynamic content extraction | Yes | Yes |
| Pop-ups, infinite scroll, hover contents, tabs, logging in | Yes | Yes |
| Pagination | Yes | Yes |
| Entering into search boxes | Yes | Yes |
| Capture text, links, files, meta tags, HTML and much more | Yes | Yes |
| Copy and paste commands, drag and drop commands | Yes | Yes |
| Pre-configured crawlers for commonly scraped websites | Yes | No |
| PDF and Excel extraction | No | Yes, by using 3rd party document converters |
| Image and video extraction | No, only able to extract the image or file URLs | Yes |
| IP rotation | Included in paid plans, or manual IP proxy | Yes, by using the 3rd party proxy rotation service Nohodo |
| CAPTCHA | Yes, on the local machine | Yes, with a 3rd party CAPTCHA recognition service account |
| Website crawler function | Yes | Yes |
| Run-time configuration | With a premium Octoparse account | With a premium Content Grabber account |
| Remove duplicate data | Yes | Yes |
| Track changes on a website | Yes (incremental extraction) | Yes |
| RegEx tool and XPath tool | Yes | No |
| Command-line | No | Yes |
| Data Export | | |
| Data export | CSV, Excel, TXT, Databases | CSV, Excel, JSON, PDF, Databases |
| API | Yes | Yes |
| Support | | |
| Debugging | Yes, with limited functionality | Yes |
| Support | Free professional support, tutorials, community support | Paid service |
So what can both Octoparse and Content Grabber do for you?
Octoparse offers most of the web scraping power and scale of Content Grabber in a much easier-to-use package. Content Grabber is designed to work at a higher level in which most of the features of Octoparse are bundled together.
Both Octoparse and Content Grabber represent the new generation of visual web scrapers on the market. Both have a point-and-click UI in which users browse the website and click on data elements in the order they want to collect them.
Like a bot, they can follow links into deeper pages by clicking items and extracting the data found there. Both offer API options, IP rotation, and scheduling services that run extractors in real time. They can also export data in CSV format and transform data with manually edited regular expressions.
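To make the regular-expression transformation step more concrete, here is a minimal Python sketch of the kind of cleanup such a rule performs on scraped fields. It does not use either tool; the sample rows, the `clean_row` helper, and the two patterns are assumptions made up for illustration.

```python
import re

# Hypothetical raw values, as a scraper might capture them from a listing page.
raw_rows = [
    {"name": "  Joe's Pizza \n", "phone": "Phone: (212) 555-0147", "price": "$1,299.00 USD"},
    {"name": "Corner  Cafe", "phone": "Call 212.555.0199 today", "price": "Price: $8.50"},
]

# Illustrative patterns: a US-style phone number and a dollar amount.
PHONE_RE = re.compile(r"\(?(\d{3})\)?[\s.-]*(\d{3})[\s.-]*(\d{4})")
PRICE_RE = re.compile(r"\$\s*([\d,]+(?:\.\d+)?)")


def clean_row(row):
    """Normalize whitespace, phone format, and price for one scraped record."""
    name = re.sub(r"\s+", " ", row["name"]).strip()
    phone_match = PHONE_RE.search(row["phone"])
    phone = "-".join(phone_match.groups()) if phone_match else None
    price_match = PRICE_RE.search(row["price"])
    price = float(price_match.group(1).replace(",", "")) if price_match else None
    return {"name": name, "phone": phone, "price": price}


cleaned = [clean_row(r) for r in raw_rows]
```

In both tools the equivalent rule would be entered through the UI rather than written as a script; the sketch only shows what the expression does to the data.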
What's more, they can be instructed to do more than just extract data. Both offer a variety of options that make it possible to get data from interactive websites. You can instruct them to scrape very complex and dynamic sites because they can:
- Sign in to accounts
- Select choices from dropdown menus, pop-ups, hovers
- Search with a search bar
- Go to a new page simply by clicking on a "next" button
- Get data from infinitely scrolling pages and other dynamic webpages
- ...
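The "next" button behavior in the list above can be sketched in a few lines of Python. This is only an illustration of the crawl loop, not how Octoparse or Content Grabber implement it: the in-memory `PAGES` site, the `item` and `next` class names, and the `crawl` helper are all hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical three-page site held in memory so the sketch runs offline.
PAGES = {
    "/list?page=1": '<ul><li class="item">Alpha</li><li class="item">Beta</li></ul>'
                    '<a class="next" href="/list?page=2">Next</a>',
    "/list?page=2": '<ul><li class="item">Gamma</li></ul>'
                    '<a class="next" href="/list?page=3">Next</a>',
    "/list?page=3": '<ul><li class="item">Delta</li></ul>',
}


class ListingParser(HTMLParser):
    """Collects item texts and the href of the 'next' link on one page."""

    def __init__(self):
        super().__init__()
        self.items = []
        self.next_url = None
        self._in_item = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        css_classes = (attrs.get("class") or "").split()
        if tag == "li" and "item" in css_classes:
            self._in_item = True
        elif tag == "a" and "next" in css_classes:
            self.next_url = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_item = False

    def handle_data(self, data):
        if self._in_item and data.strip():
            self.items.append(data.strip())


def crawl(start_url, fetch, max_pages=50):
    """Follow 'next' links from start_url, returning all items in page order."""
    results, url, seen = [], start_url, set()
    # Stop when there is no next link, a page repeats, or the page cap is hit.
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        parser = ListingParser()
        parser.feed(fetch(url))
        results.extend(parser.items)
        url = parser.next_url
    return results


all_items = crawl("/list?page=1", PAGES.__getitem__)
```

Passing `fetch` as a parameter keeps the loop independent of how pages are retrieved; a real crawler would swap in an HTTP client, and the `seen` set guards against a "next" link that loops back.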
This means that these two web scrapers can be as flexible and universal as you'd expect. They can handle:
- Difficult tables, such as merged tables, tables with an indefinite number of columns, missing values and so on.
- Difficult block layouts, especially those with no direct HTML association among the data shown on screen, such as extracting all products while skipping advertisements, or scraping discounted products only.
- Text lists, where the HTML DOM structure is plain.
- Invalid HTML: unescaped characters, non-HTML tags, unclosed tags, unmatched quotes, missing spaces, invalid tag nesting.
- Scraping behind a login. Both scrapers can submit a login form via POST, follow the HTTP 302 redirect, and store the resulting cookies.
- CAPTCHA solving.
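As a rough illustration of two items from the list above (skipping advertisements while keeping discounted products only, and tolerating invalid HTML), here is a small Python sketch built on the standard library's lenient `html.parser`. The markup, the class names, and the `DiscountParser` class are assumptions for the example and say nothing about how either tool works internally.

```python
from html.parser import HTMLParser

# Hypothetical listing markup. The last product has an unclosed <span>,
# and one block is an advertisement that should be skipped.
HTML = """
<div class="product"><span class="name">Kettle</span><span class="price">$30</span><span class="sale">$24</span></div>
<div class="ad"><span class="name">Sponsored: Toaster</span></div>
<div class="product"><span class="name">Mug</span><span class="price">$8</span></div>
<div class="product"><span class="name">Lamp<span class="price">$40</span><span class="sale">$29</span></div>
"""

FIELDS = {"name", "price", "sale"}


class DiscountParser(HTMLParser):
    """Keeps product blocks that carry a sale price; ignores ad blocks."""

    def __init__(self):
        super().__init__()
        self.discounted = []
        self._record = None  # dict while inside a product block, else None
        self._depth = 0      # <div> nesting depth inside the current block
        self._field = None   # field name the next text chunk belongs to

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        if tag == "div":
            if self._depth:
                self._depth += 1
            elif classes & {"product", "ad"}:
                self._depth = 1
                self._record = {} if "product" in classes else None
        elif tag == "span" and self._record is not None:
            self._field = next(iter(classes & FIELDS), None)

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Block finished: keep it only if a sale price was seen.
                if self._record and "sale" in self._record:
                    self.discounted.append(self._record)
                self._record, self._field = None, None

    def handle_data(self, data):
        text = data.strip()
        if text and self._record is not None and self._field:
            self._record[self._field] = text


parser = DiscountParser()
parser.feed(HTML)
discounted = parser.discounted
```

Because the parser reacts to tags as they arrive instead of requiring a well-formed tree, the unclosed `<span>` in the last product does not stop the record from being captured.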
Both data extraction tools have enough functionality to extract data from all kinds of websites, provided you explore them fully. As a fan of Content Grabber, I recommend it in a few situations:
- Tight integration with an existing Python codebase and infrastructure via API
- Advanced debugging tool
- Third-party CAPTCHA solution
We are working on the second issue to make Octoparse more user-friendly.
However, if you are just starting out, we encourage you to try Octoparse, which will get you up and running much faster, either for free or at a much lower cost.
Cost Comparison
At first glance, the main difference between the two services appears to be their pricing. Octoparse packages its capabilities into conventional software-as-a-service (SaaS) plans: Free, Standard ($89 per month), and Professional ($189 per month).
Content Grabber is a paid service. There are two ways to purchase it: buying a license or paying a monthly subscription. The license comes in three editions and gives you a perpetual license outright, priced from $449 to $2495. The monthly subscription is charged upfront each month and also comes in three editions, priced from $69 to $299.
| Plan | Octoparse Basic | Octoparse Standard | Octoparse Professional | Content Grabber Server | Content Grabber Professional | Content Grabber Premium |
| --- | --- | --- | --- | --- | --- | --- |
| Monthly plan ($) | Free | 89 | 189 | 69 | 149 | 299 |
| Yearly plan / License ($) | Free | 900 | 1896 | 449 | 995 | 2495 |
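The arithmetic behind these prices can be worked through with a short Python sketch. The figures are taken from the table above; the helper names are made up for the example, and the calculation assumes a single user on a single computer and ignores any separate maintenance or support charges.

```python
import math

# Prices in USD, taken from the table above.
OCTOPARSE = {
    "Standard": {"monthly": 89, "yearly": 900},
    "Professional": {"monthly": 189, "yearly": 1896},
}
CONTENT_GRABBER = {
    "Server": {"monthly": 69, "license": 449},
    "Professional": {"monthly": 149, "license": 995},
    "Premium": {"monthly": 299, "license": 2495},
}


def yearly_saving(plan):
    """Amount saved by paying yearly instead of twelve monthly payments."""
    return 12 * plan["monthly"] - plan["yearly"]


def breakeven_months(edition):
    """First month in which cumulative subscription cost reaches the license price."""
    return math.ceil(edition["license"] / edition["monthly"])


octoparse_savings = {name: yearly_saving(p) for name, p in OCTOPARSE.items()}
cg_breakeven = {name: breakeven_months(e) for name, e in CONTENT_GRABBER.items()}
```

Under these assumptions, paying yearly saves $168 on Octoparse Standard and $372 on Professional, while a Content Grabber perpetual license costs less than the subscription from month 7 (Server and Professional) or month 9 (Premium) onward.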
The big difference between the Octoparse and Content Grabber premium plans is that Octoparse does not limit licenses or users. That is to say, more than one user can use Octoparse on different computers with the same premium account. Content Grabber is licensed per user per computer. This means you need a license for each computer where Content Grabber is installed, and if the computer is accessed by more than one user, you need a license for each user of the software on that computer. Also, one license does not cover both your desktop computer and your laptop, or both your office computer and your home computer.
As you can see, the Octoparse free plan offers powerful functionality without limiting how many web pages you can extract per task. The higher plans mainly offer more tasks, faster speed, and IP rotation for more money. Also, only the premium plans let you schedule crawlers to run on a regular basis.
For Content Grabber, the editions differ in functionality: export functions, API, self-contained agents, etc. Maintenance and support are also charged differently.
If you don't want to learn how to use a tool and just want your data on demand, both Octoparse and Content Grabber offer a data service that extracts the data for you. Just contact either company's sales team and they will scrape data from the website you want.
Web Scraping Example
The videos below show how to build a crawler/agent in Octoparse and in Content Grabber. Both projects scrape the US Yellowpages. Just follow the links for more details.
Octoparse Project: https://www.youtube.com/watch?v=hSVjxElKIUc
Content Grabber Project: https://www.youtube.com/watch?v=vr-IggETB5Q
Conclusion: Octoparse and Content Grabber
Like the earlier comparison, Octoparse vs. Content Grabber is somewhat of an apples-to-oranges comparison. Content Grabber is designed to work at a higher level in which most of the features of Octoparse are bundled together. If you are just starting out, we encourage you to try Octoparse, which will get you up and running easily with the free version or at a much lower cost.
As a final note, if anything in the information above is wrong, just contact me here.