With so many web scrapers available, a question arises: which one best fits our specific needs and can scrape everything we want? Most off-the-shelf web scrapers are quite generic and are designed to perform common, simple tasks (refer to Top 5 Web Scraping Tools Review for more information). As a result, they may not be as flexible and universal as you’d expect. In this post, I will compare two web scrapers, Octoparse and Content Grabber, to give you some insight before you choose the data extraction service that will serve you for a long time.
Here is a feature comparison table for Octoparse and Content Grabber:
|Feature||Octoparse||Content Grabber|
|Authoring environment||Windows-based desktop application (usable on Mac through a virtual machine)||Windows-based desktop application (usable on Mac through a virtual machine)|
|Smart Mode||Yes, getting extracted data just by entering the target URL||No|
|Scraper logic||Variables, loops, conditionals||Variables, loops, conditionals|
|Speed||Fast parallel execution||Fast parallel execution|
|Hosting||Hosted on a cloud of Octoparse servers if subscribed to Octoparse cloud or on the local machine||Local machine|
|Selecting elements||Point-and-click, XPath||Point-and-click, XPath|
|Transforming data||Regular expressions, string operations||Regular expressions|
|Knowledge of HTML and HTTP||Not required||Required|
|Knowledge of Regular expression and XPath||Not necessary, but would be better for further exploration||Not necessary, but would be better for further exploration|
|Pop-ups, infinite scroll, hover contents, tabs, logging in||Yes||Yes|
|Entering into search boxes||Yes||Yes|
|Capture text, links, files, meta tags, HTML and much more||Yes||Yes|
|Copy and paste commands, drag and drop commands||Yes||Yes|
|Pre-configured crawlers for commonly scraped websites||Yes||No|
|PDF and Excel extraction||No||Yes, by using 3rd-party document converters|
|Image and video extraction||No, only able to extract the image or file URLs||Yes|
|IP rotation||Included in paid plans, or manual IP proxy||Yes, by using the 3rd-party proxy rotation service Nohodo|
|CAPTCHA||Yes, on the local machine||Yes, with a 3rd-party CAPTCHA recognition service account|
|Website crawler function||Yes||Yes|
|Run-time configuration||With a premium Octoparse account||With a premium import.io account|
|Remove duplicate data||Yes||Yes|
|Track changes on a website||Yes (Incremental extraction)||Yes|
|RegEx tool and XPath tool||Yes||No|
|Data export||CSV, Excel, TXT, Databases||CSV, Excel, JSON, PDF, Databases|
|Debugging||Yes, with limited functionality||Yes|
|Support||Free professional support, tutorials, community support||Paid service|
So what could Octoparse and Content Grabber both do for you?
Octoparse offers most of the web scraping power and scale of Content Grabber in a much easier-to-use package. Content Grabber is designed to work at a higher level where most of the features of Octoparse are bundled together.
Both Octoparse and Content Grabber represent the new generation of visual web scrapers on the market. Both have a point-and-click UI in which users browse a website and click the data elements they want to collect.
Like a bot, they can follow links into deeper pages by clicking items and then extract data from those pages. Both offer API options, IP rotation, and scheduling services that run extractors in real time. They can also export data in CSV format and transform data with manually edited regular expressions.
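To make the regular-expression transformation step concrete, here is a minimal Python sketch of the kind of cleanup both tools let you configure. It is not code from either product: the sample strings, the `PRICE_PATTERN` expression, and the `extract_price` helper are all assumptions made up for illustration.

```python
import re

# Hypothetical raw values, as a scraper might capture them from a pricing page.
raw_prices = ["Price: $1,299.00", "  $89 / month", "Free"]

# A dollar sign followed by digits, optional thousands separators, and optional decimals.
PRICE_PATTERN = re.compile(r"\$(\d[\d,]*(?:\.\d+)?)")

def extract_price(text):
    """Return the first dollar amount found in text as a float, or None if there is none."""
    match = PRICE_PATTERN.search(text)
    if match is None:
        return None
    # Drop thousands separators before converting to a number.
    return float(match.group(1).replace(",", ""))

prices = [extract_price(value) for value in raw_prices]
print(prices)  # [1299.0, 89.0, None]
```

In a visual scraper the same idea is usually expressed as a pattern attached to a single extracted field rather than as a standalone script.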
What’s more, they can be instructed to do more than just extract data. They have a variety of options to choose from, making it possible to get data from interactive websites. You can instruct them to scrape data from very complex and dynamic sites because they can:
- Sign in to accounts
- Select choices from dropdown menus, pop-ups, and hover content
- Search using the search bar
- Go to a new page simply by clicking on the “next” button
- Get data from infinitely scrolling pages and other dynamic webpages
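The "next" button behaviour in the list above can be sketched in a few lines of Python. This is only an illustration of the general technique, not how Octoparse or Content Grabber implement it: the `PAGES` dictionary stands in for real HTTP responses, and the `item` and `next` class names are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical in-memory site standing in for real HTTP responses.
PAGES = {
    "/list?page=1": '<ul><li class="item">A</li><li class="item">B</li></ul>'
                    '<a class="next" href="/list?page=2">next</a>',
    "/list?page=2": '<ul><li class="item">C</li></ul>',
}

class ListingParser(HTMLParser):
    """Collects the text of item elements and the href of the 'next' link."""

    def __init__(self):
        super().__init__()
        self.items = []
        self.next_url = None
        self._in_item = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "item":
            self._in_item = True
        elif tag == "a" and attrs.get("class") == "next":
            self.next_url = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_item = False

    def handle_data(self, data):
        if self._in_item and data.strip():
            self.items.append(data.strip())

def crawl(start_url, fetch):
    """Follow 'next' links from start_url until none is left or a page repeats."""
    items, url, seen = [], start_url, set()
    while url and url not in seen:
        seen.add(url)
        parser = ListingParser()
        parser.feed(fetch(url))
        parser.close()
        items.extend(parser.items)
        url = parser.next_url
    return items

print(crawl("/list?page=1", PAGES.__getitem__))  # ['A', 'B', 'C']
```

The `seen` set guards against a site whose last page links back to an earlier one, which would otherwise loop forever.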
This means that these two web scrapers can be as flexible and universal as you’d expect. They can deal with:
- Difficult tables, like merged tables, tables with an indefinite number of columns, missing values and so on.
- Difficult block layouts, especially those with no direct HTML association among the data presented on screen, such as extracting all products while skipping advertisements, or scraping only discounted products.
- Text lists where the HTML DOM structure is plain.
- Invalid HTML: unescaped characters, non-HTML tags, unclosed tags, unmatched quotes, missing spaces, invalid tag nesting.
- Content behind a login. Both scrapers can submit a login form via POST, follow HTTP 302 redirects, and store cookies.
- CAPTCHA solving.
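To show why tables with missing values and unclosed tags are awkward, here is a small Python sketch that reads such a table leniently. It uses only the standard library's `html.parser`; the `LenientTableParser` class, the `parse_table` helper, and the sample markup are assumptions for illustration, not part of either tool.

```python
from html.parser import HTMLParser

class LenientTableParser(HTMLParser):
    """Collects table cells without relying on closing tags being present."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # A new row starts, even if the previous one was never closed.
            self.rows.append([])
            self._in_cell = False
        elif tag in ("td", "th"):
            if not self.rows:
                self.rows.append([])
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th", "tr", "table"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data

def parse_table(html):
    """Return rows of equal width, with None for empty or missing cells."""
    parser = LenientTableParser()
    parser.feed(html)
    parser.close()
    width = max((len(row) for row in parser.rows), default=0)
    return [
        [cell.strip() or None for cell in row] + [None] * (width - len(row))
        for row in parser.rows
    ]

# Hypothetical markup: no closing </td> or </tr>, and a short second row.
messy = "<table><tr><td>Widget<td>Blue<td>9.99<tr><td>Gadget<td>4.50</table>"
print(parse_table(messy))
```

Padding short rows with `None` keeps every row the same width, which is what a CSV or database export needs.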
Both data extraction tools have enough functionality to extract data from all kinds of websites, provided you fully explore what they offer. As a fan of Content Grabber, I would recommend it in a few situations:
- Tight integration with an existing Python codebase and infrastructure via API
- Advanced debugging tool
- Third-party CAPTCHA solution
We are working on solving the second issue to make Octoparse more user-friendly.
However, if you are starting out, we encourage you to try Octoparse, which will get you up and running with a free version or at a much lower cost.
At first glance, the main difference between the two services appears to be their pricing. Octoparse packages its capabilities into conventional software-as-a-service (SaaS) plans: Free, Standard ($89), and Professional ($189).
Content Grabber is a paid service with two purchasing options: buying a license or subscribing monthly. Buying a license outright gives you a perpetual license; it comes in three editions priced from $449 to $2,495. The monthly subscription is charged upfront each month and also comes in three editions, priced from $69 to $299.
|Product||Octoparse||Octoparse||Octoparse||Content Grabber||Content Grabber||Content Grabber|
|Monthly plan ($)||Free||89||189||69||149||299|
The big difference between the Octoparse and Content Grabber premium plans is that Octoparse does not limit the number of licenses or users. That is to say, more than one user can use Octoparse on different computers with the same premium account. Content Grabber is licensed per user per computer. This means you need a license for each computer where Content Grabber is installed, and if the computer is accessed by more than one user, you need a license for each user using the software on that computer. Also, one license does not cover both your desktop computer and your laptop, or both your office computer and your home computer.
You can see that the Octoparse free plan grants powerful functionality without limiting how many web pages you can extract in one task. The higher plans mainly offer more tasks, faster speed, and IP rotation for more money. Also, only the premium plans let you schedule crawlers to run on a regular basis.
For Content Grabber, the editions differ in functionality: export function, API, self-contained agents, etc. Charges for maintenance and support also differ.
If you don’t want to learn how to use a tool and just want your data on demand, both Octoparse and Content Grabber provide a data service that extracts data for you. Just contact either company's sales team and they will scrape data from the website you want.
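The per-user, per-computer rule can be turned into a quick count. The Python sketch below is based only on the licensing description above, not on either vendor's official terms, and the `content_grabber_licenses` function and the sample team are hypothetical.

```python
def content_grabber_licenses(users_by_computer):
    """Count licenses under a per-user, per-computer rule.

    Each computer with the software installed needs at least one license,
    and one more for every additional distinct user on that computer.
    """
    return sum(max(1, len(set(users))) for users in users_by_computer.values())

# Hypothetical team: Alice uses two machines, Bob shares the office PC with her.
team = {
    "office-pc": ["alice", "bob"],
    "laptop": ["alice"],
}

print(content_grabber_licenses(team))  # 3
```

Under the Octoparse model described above, the same team would share a single premium account.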
Octoparse and Content Grabber
Like the earlier comparison, Octoparse vs Content Grabber is something of an apples-to-oranges comparison. Content Grabber is designed to work at a higher level where most of the features of Octoparse are bundled together. If you are just starting out, we encourage you to try Octoparse, which will easily get you up and running with a free version or at a much lower cost.