Top 5 Web Scraping Tools Comparison
Thursday, January 21, 2021
Web scraping tools (also called data extraction tools or web scrapers) help you collect data from websites and store it in a local database or spreadsheet. There are many web scraping tools on the market, so before choosing one for your business, it's important to know what each tool provides. Here I have put together a detailed comparison chart of the top 5 web scraping tools: Octoparse, Parsehub, Mozenda, Dexi.io, and Import.io.
Overview
Here is a brief overview of these 5 web scraping tools.
Characteristics | Octoparse | Parsehub | Mozenda | Dexi.io | Import.io |
Usability | ★★★★★ | ★★★★☆ | ★★★★★ | ★★★★★ | ★★★★★ |
Functionality | ★★★★☆ | ★★★★☆ | ★★★★☆ | ★★★★☆ | ★★★★☆ |
Easy to learn | ★★★★★ | ★★★★☆ | ★★★★★ | ★★★★☆ | ★★★★★ |
Customer support | Email, phone, training, community support | Email, live chat, forum | Phone, email, video chat | Email, phone, community support | Email, training, chatbot, community support |
Price | $19 (Basic), $89 (Standard), $249 (Professional) | $149 (Standard), $499 (Professional) | Starting from $100 per 5,000 pages | $119 (Standard), $399 (Professional), $699 (Corporate) | $299 (Essential), $4,999 (Enterprise, annual), $9,999 (Premium, annual) |
Trial/free version | Free version | Free version | 30-day trial | Trial | 7-day trial |
OS | Windows | Windows, Mac, Linux | Windows | Windows, Mac, Linux | Windows, Mac, Linux |
Data export formats | TXT, CSV, Excel, databases (MySQL, SQL Server, Oracle) | CSV, JSON | CSV, TSV, XML, Excel, JSON | CSV, Excel, XML, JSON, ZIP | CSV, JSON, Google Sheets |
Multi-thread | Yes | Yes | Yes | Yes | No |
API | Yes | Yes | Yes (specific API) | Yes | Yes |
Scheduling | Yes | Yes | Yes | Yes | Yes |
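The "Multi-thread" row can be made concrete. Multi-threaded scraping means requesting several pages at the same time instead of one after another. The sketch below is a minimal Python illustration using the standard library's `ThreadPoolExecutor`. It is not how any of these tools works internally. `fetch_page` is a hypothetical placeholder that sleeps instead of making a real HTTP request, so the example runs offline.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_page(url: str) -> str:
    """Hypothetical stand-in for an HTTP request: sleeps to mimic network latency."""
    time.sleep(0.1)
    return f"<html>content of {url}</html>"


def scrape_concurrently(urls: list[str], max_workers: int = 4) -> list[str]:
    """Fetch every URL using a pool of worker threads.

    pool.map spreads the calls across threads but returns results in input order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_page, urls))


if __name__ == "__main__":
    # Hypothetical URLs used only for illustration
    urls = [f"https://example.com/page/{n}" for n in range(1, 9)]
    start = time.perf_counter()
    pages = scrape_concurrently(urls)
    print(f"{len(pages)} pages in {time.perf_counter() - start:.2f}s")
```

The example uses 8 URLs, 4 threads, and a simulated 0.1-second delay per page. The batch should finish in roughly 0.2 seconds, compared with about 0.8 seconds for a sequential loop. That speed-up is the benefit the "Multi-thread" row refers to.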
Build a crawler
A crawler is a task that scrapes data, usually from a single website, with either a limited or an unlimited number of page/URL queries. Here are the most important features to look for when building one.
Feature | Octoparse | Parsehub | Mozenda | Dexi.io | Import.io |
Built-in browser | Yes | Yes | Yes | No | Yes |
Keyboard shortcuts | No | Yes | No | Yes | Yes |
Pagination |
Next button | Yes | Yes | Yes | Yes | Yes |
Load more | Yes | Yes | Yes | Yes | Yes |
Numbers | Yes | Yes | Yes | Yes | Yes |
Infinite scrolling | Yes, with a customizable number of scrolls | Yes | Yes | Yes | Yes |
Enter text |
Various keywords | Yes | Yes | Yes | Yes | Yes |
Combine keywords from two lists | No | No | No | No | |
Date inputs | No | Yes | No | No | |
"In tandem" loop | No | No | No | | |
Enter a list of URLs/keywords |
A list of URLs | Yes | Yes | Yes | Yes | Yes |
JSON | No | No | No | No | |
Update URL lists | No | No | No | Yes | |
Input document | No | Yes, supports CSV input | Yes, supports CSV input | No | |
Select data |
Selecting elements | Point-and-click, XPath | Point-and-click, XPath, CSS | Point-and-click, XPath | Point-and-click, CSS | Point-and-click, XPath |
Data formats | Text, HTML, URL, etc. | Text, HTML, URL, etc. | Text, HTML, URL, etc. | Text, HTML, URL, etc. | Text, HTML, URL, etc. |
Transforming data | Yes, via regular expressions | Yes, via regular expressions | Yes, via regular expressions | Yes, via regular expressions | Yes, via regular expressions |
Use scraped data as input within one project | No | Yes, scrape data from one website and use it as input for scraping another | No | No | No |
Customized serial number | No | Yes, adds a number variable that increments on each iteration | No | No | No |
Different kinds of date strings | Yes | Yes | No | Yes | |
Crawler/task switching | Yes, supports multi-thread operation | Yes, supports switching to another crawler while configuring one | No | Yes | No |
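The "Combine keywords from two lists" and "In tandem" loop rows describe two different ways of pairing inputs. I am assuming that "combine" means pairing every keyword from one list with every keyword from the other. I am also assuming that "in tandem" means pairing items position by position. This is my reading of the row names, not a documented definition from any of these tools. A minimal Python sketch of both looks like this; the job and city lists are hypothetical:

```python
from itertools import product


def combine_keywords(list_a: list[str], list_b: list[str]) -> list[str]:
    """Every item of list_a paired with every item of list_b (Cartesian product)."""
    return [f"{a} {b}" for a, b in product(list_a, list_b)]


def tandem_keywords(list_a: list[str], list_b: list[str]) -> list[str]:
    """Items paired by position: first with first, second with second, and so on."""
    if len(list_a) != len(list_b):
        raise ValueError("lists looped in tandem must have the same length")
    return [f"{a} {b}" for a, b in zip(list_a, list_b)]


if __name__ == "__main__":
    jobs = ["nurse", "engineer"]    # hypothetical search keywords
    cities = ["Boston", "Denver"]   # hypothetical locations
    print(combine_keywords(jobs, cities))
    # ['nurse Boston', 'nurse Denver', 'engineer Boston', 'engineer Denver']
    print(tandem_keywords(jobs, cities))
    # ['nurse Boston', 'engineer Denver']
```

Combining two lists of n and m keywords produces n × m searches. A tandem loop produces only n searches, which is why the two are listed as separate features.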
Extract data
Some advanced features you may need when extracting data:
Feature | Octoparse | Parsehub | Mozenda | Dexi.io | Import.io |
Scraping mode | Local running, and cloud running on Octoparse servers | Cloud running | Cloud running | Cloud running | Cloud running |
Visual mode | Yes | No | Yes | No | Yes |
Test run | No | Yes, up to 5 pages | Yes | Yes | No |
Extract behind a login | Yes | Yes | Yes | Yes | Yes |
Scheduling | Yes, supports real-time/daily/weekly/monthly schedules | Yes | Yes | Yes, supports choosing the local time zone | Yes |
IP rotation | Yes, cloud running can rotate IPs automatically | Yes | Yes, supports IP rotation and choosing a geolocation before running a task | Yes | Yes |
CAPTCHA solving | Yes, only for local running | Yes, only for text-input CAPTCHAs | No | Yes, requires integration with a third-party CAPTCHA-solving service | No |
Error report/debugging | Yes, missing-data error report | No | Yes, error troubleshooting reminders | Yes, provides screenshots, error messages, a debug mode, and the execution log | No |
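IP rotation means spreading requests across several IP addresses, usually proxies, so that no single address sends too many requests to a site. A simple round-robin assignment can be sketched in Python as follows. The proxy addresses are hypothetical documentation-range IPs, and none of the tools above necessarily rotates IPs this way.

```python
from itertools import cycle


def assign_proxies(urls: list[str], proxies: list[str]) -> list[tuple[str, str]]:
    """Round-robin rotation: each URL gets the next proxy, wrapping around at the end."""
    if not proxies:
        raise ValueError("at least one proxy is required")
    proxy_pool = cycle(proxies)  # endless iterator over the proxy list
    return [(url, next(proxy_pool)) for url in urls]


if __name__ == "__main__":
    # Hypothetical proxies from the 203.0.113.0/24 documentation range
    proxies = ["203.0.113.10:8080", "203.0.113.11:8080", "203.0.113.12:8080"]
    urls = [f"https://example.com/item/{n}" for n in range(1, 6)]
    for url, proxy in assign_proxies(urls, proxies):
        print(f"{proxy} -> {url}")
```

In a real scraper, each (URL, proxy) pair would be passed to an HTTP client configured to use that proxy. The rotation logic itself stays the same.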
Get data
Features that affect how you get the data:
Feature | Octoparse | Parsehub | Mozenda | Dexi.io | Import.io |
Extraction speed / cloud server distribution |
Free plan | No cloud servers; depends on the local network | 1 worker (approx. 5 pages/minute) | - | - | - |
Standard plan | 6 cloud workers, depending on the crawler rules | 4 workers (approx. 20 pages/minute) | Depends on paid page credits | 1 worker | Depends on the number of URLs extracted |
Professional plan | 20 cloud servers, depending on the crawler rules | 24 workers (approx. 120 pages/minute) | Depends on paid page credits | 3 workers | Depends on the number of URLs extracted |
Concurrently running crawlers |
Free plan | 2 for local running | 1 | - | - | - |
Standard plan | Unlimited for local running, 6 for cloud running | 4 | 2 | 1 | Depends on the number of URLs extracted |
Professional plan | Unlimited for local running, 20 for cloud running | 24 | 5 | 3 | Depends on the number of URLs extracted |
Customized servers | No | Yes, servers can be allocated manually | No | Yes | No |
Local running | Yes | Only with Test Run | Only with Test Run | No | No |
Cloud running | Yes, cloud extraction capacity varies by paid plan (number of cloud servers) | Yes, cloud extraction speed varies by paid plan | Yes, cloud extraction speed varies with paid page credits | Yes, depends on the number of workers or robots | Yes |
Notification on task completion | No | Yes, email | Yes, email | Yes, message | Yes, email |
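The Parsehub figures in this table scale linearly, at roughly 5 pages per minute per worker (1 worker ≈ 5, 4 workers ≈ 20, 24 workers ≈ 120). If that rate holds, you can estimate how long a job will take on each plan. Real speed depends on the target site and the crawler rules. `estimate_minutes` is a hypothetical helper for illustration:

```python
import math

# Approximate Parsehub per-worker rate, derived from the table above (an assumption)
PAGES_PER_WORKER_PER_MINUTE = 5


def estimate_minutes(total_pages: int, workers: int,
                     rate_per_worker: float = PAGES_PER_WORKER_PER_MINUTE) -> int:
    """Rough time, in whole minutes (rounded up), to scrape total_pages with `workers` workers."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    throughput = workers * rate_per_worker  # pages per minute for the whole plan
    return math.ceil(total_pages / throughput)


if __name__ == "__main__":
    for plan, workers in [("Free", 1), ("Standard", 4), ("Professional", 24)]:
        print(plan, estimate_minutes(10_000, workers), "minutes")
    # Free 2000 minutes
    # Standard 500 minutes
    # Professional 84 minutes
```

For a 10,000-page job, this works out to about 33 hours on the free plan and under an hour and a half on the Professional plan. This kind of gap is why worker count matters more than list price for large jobs.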
Data export and data storage
Feature | Octoparse | Parsehub | Mozenda | Dexi.io | Import.io |
Data export |
API | Yes | Yes | Yes | Yes | Yes |
CSV | Yes | Yes | Yes | Yes | Yes |
JSON | No | Yes | Yes | Yes | No |
Google Sheets | No | Yes, via API | Yes | Yes | Yes |
Tableau | No | Yes, integrated with Tableau | No | No | Yes |
Web | No | Yes | No | Yes | No |
Data storage |
Free plan | No, the data must be exported to your own machine | 14 days | - | - | - |
Standard plan | 3 months | 14 days | 1 GB | No | No |
Professional plan | 3 months | 30 days | 5 GB | No | No |
Enterprise plan | - | 30 days | 50 GB | No | No |
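CSV is supported by all five tools, and JSON by several of them. To show how the same scraped records look in each format, here is a minimal Python sketch using only the standard library's `csv` and `json` modules. The records are made up for illustration.

```python
import csv
import io
import json

# Hypothetical scraped records, as a scraper might return them
records = [
    {"title": "Desk lamp", "price": "19.99", "url": "https://example.com/p/1"},
    {"title": "Office chair", "price": "89.00", "url": "https://example.com/p/2"},
]


def to_csv(rows: list[dict]) -> str:
    """Serialize records to CSV text: one header row, then one row per record."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows: list[dict]) -> str:
    """Serialize the same records to a JSON array of objects."""
    return json.dumps(rows, indent=2)


if __name__ == "__main__":
    print(to_csv(records))
    print(to_json(records))
```

CSV flattens each record into one row under a shared header, which suits spreadsheets. JSON repeats the field names on every record and can hold nested data, which suits APIs and further processing.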
Solutions
Web scraping tools are used on many different kinds of websites. Here are some typical websites that most people are interested in.
Website | Octoparse | Parsehub | Mozenda | Dexi.io | Import.io |
Job |
LinkedIn | No, easily detected and banned by LinkedIn's anti-scraping techniques | No | No | No | No |
Glassdoor | Yes | Yes | Yes | Yes | Yes |
SNS |
| Yes | No | No | No | No |
| Yes | Yes | Yes | Yes | Yes |
| Yes | Yes | Yes | Yes | Yes |
Real estate |
Airbnb | No, the updated website is not compatible with the Octoparse built-in browser | No | No | No | No |
Booking | Yes | Yes | Yes | Yes | Yes |
Realtor.com | Yes | Yes | Yes | Yes | Yes |
Tripadvisor | Yes | Yes | Yes | Yes | Yes |
Product details |
Yellowpages | Yes | Yes | Yes | Yes | Yes |
Yelp | Yes | Yes | Yes | Yes | Yes |
Amazon | Yes | Yes | Yes | Yes | Yes |
eBay | Yes | Yes | Yes | Yes | Yes |
Maps |
Google Maps (latitude and longitude data) | Yes | No | No | Yes | |
Others |
| Yes | No | No | No | No |
Premium Plans and support
Feature | Octoparse | Parsehub | Mozenda | Dexi.io | Import.io |
All paid plans |
Download images and files to Dropbox | No | Yes, integrates with Dropbox | Yes, integrates with Dropbox | Yes, integrates with Dropbox | No |
Download images and files to Amazon S3 | No | Yes, integrates with Amazon S3 | Yes, integrates with Amazon S3 | Yes, integrates with Amazon S3 | No |
External proxy | Yes, also available on the free plan | Yes | No | Yes | No |
IP rotation | Yes | Yes | Yes | Yes | Yes |
Crawlers/tasks |
Free plan | 10 crawlers | 5 public projects | - | - | - |
Standard plan | 100 crawlers | 20 private projects | 1 user, 10 agents | 1 worker | Depends on the number of URLs extracted |
Professional plan | 250 crawlers | 120 private projects | 2 users, 50 agents | 3 workers | Depends on the number of URLs extracted |
Enterprise/custom plan | - | Custom | 3 users, unlimited agents | Custom | Depends on the number of URLs extracted |
Enterprise |
OCR (optical character recognition) | No | Yes, scrapes text out of images | Yes, scrapes text out of documents | No | Yes |
URL queries |
Free plan | Unlimited | 200 | - | - | - |
Standard plan | Unlimited | 10,000 | 5,000 (up to 25,000) | Unlimited URLs with limited scraping time | 5,000 |
Professional plan | Unlimited | Unlimited | 25,000 (up to 125,000) | Unlimited URLs with limited scraping time | 250,000 |
Enterprise plan | - | Unlimited | Starting from 100,000 | Unlimited URLs with limited scraping time | 1,000,000 |
Support |
Response time | Within 1 day | 1 day | 1 day | 1 day | 1 day |
Support system | No | Intercom | Ticket | No | Intercom |
Author: The Octoparse Team