
Top 5 Web Scraping Tools Comparison

Friday, March 09, 2018

Web scraping tools (also called data extraction tools or web scrapers) help you collect data from websites and store it in a local database or spreadsheet. There are many web scraping tools on the market, so before choosing one for your business, it is important to know what each tool provides. Below is a detailed comparison chart of five popular web scraping tools: Octoparse, Parsehub, Mozenda, Dexi.io and Import.io.

 

Overview

Here is a brief overview of the five web scraping tools.

| Characteristics | Octoparse | Parsehub | Mozenda | Dexi.io | Import.io |
| --- | --- | --- | --- | --- | --- |
| Usability | ★★★★★ | ★★★★ | ★★★★★ | ★★★★★ | ★★★★★ |
| Functionality | ★★★★☆ | | | | |
| Easy to learn | ★★★★★ | | ★★ | | ★★ |
| Customer support | Email, phone, training, community support | Email, live chat, forum | Phone, email, video chat | Email, phone, community support | Email, training, chatbot, community support |
| Price | $19 Basic, $89 Standard, $249 Professional | $149 Standard, $499 Professional | Starting from $100 per 5000 pages | $119 Standard, $399 Professional, $699 Corporate | $299 Essential, $4999 Enterprise annual, $9999 Premium annual |
| Trial version/Free version | Free version; Professional version: 5-day trial | Free version | 30-day trial | Trial | 7-day trial |
| OS (Specifications) | Win | Win, Mac, Linux | Win | Win, Mac, Linux | Win, Mac, Linux |
| Data export formats | TXT, CSV, Excel, databases (MySQL, SQL Server, Oracle) | CSV, JSON | CSV, TSV, XML, Excel, JSON | CSV, Excel, XML, JSON, ZIP | CSV, JSON, Google Sheets |
| Multi-thread | Yes | Yes | Yes | Yes | No |
| API | Yes | Yes | Yes (specific API) | Yes | Yes |
| Scheduling | Yes | Yes | Yes | Yes | Yes |

 

Build a crawler

A crawler is a task that scrapes data, usually from a single website, with either a limited or an unlimited number of page/URL queries. This section compares the features that matter most when building one.
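To make the idea of a limited versus unlimited number of page queries concrete, here is a minimal Python sketch of a crawler loop that follows "next page" links. The `PAGES` dictionary and the `crawl` function are hypothetical stand-ins for a real website and a real downloader; they are not part of any tool compared here.

```python
# Hypothetical in-memory "website": each URL maps to
# (rows found on the page, URL of the next page or None).
PAGES = {
    "/list?page=1": (["item A", "item B"], "/list?page=2"),
    "/list?page=2": (["item C"], "/list?page=3"),
    "/list?page=3": (["item D", "item E"], None),
}


def crawl(start_url, max_pages=None):
    """Collect rows page by page, stopping at the last page or at max_pages."""
    rows = []
    url = start_url
    visited = 0
    while url is not None and (max_pages is None or visited < max_pages):
        # A real crawler would download and parse the page here.
        page_rows, next_url = PAGES[url]
        rows.extend(page_rows)
        visited += 1
        url = next_url
    return rows
```

Calling `crawl("/list?page=1")` walks every page, while `crawl("/list?page=1", max_pages=2)` stops after two pages, which is roughly what a plan with a page quota would enforce.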

| Content | Octoparse | Parsehub | Mozenda | Dexi.io | Import.io |
| --- | --- | --- | --- | --- | --- |
| Built-in browser | Yes | Yes | Yes | No | Yes |
| Keyboard shortcuts | No | Yes | No | Yes | Yes |
| Pagination: Next button | Yes | Yes | Yes | Yes | Yes |
| Pagination: Load more | Yes | Yes | Yes | Yes | Yes |
| Pagination: Numbers | Yes | Yes | Yes | Yes | Yes |
| Pagination: Infinite scrolling | Yes, supports a customized number of scrolls | Yes | Yes | Yes | Yes |
| Enter text: Various keywords | Yes | Yes | Yes | Yes | Yes |
| Enter text: Combine keywords from two lists | No | Yes | No | No | No |
| Enter text: Date inputs | No | Yes | Yes | No | No |
| Enter text: "In tandem" loop | No | Yes | No | No | No |
| Enter a list of URLs/keywords: A list of URLs | Yes | Yes | Yes | Yes | Yes |
| Enter a list of URLs/keywords: JSON | No | Yes | No | No | No |
| Enter a list of URLs/keywords: Update URL lists | No | Yes | No | No | Yes |
| Enter a list of URLs/keywords: Input document | No | Yes, supports Google Sheets input | Yes, supports CSV input | Yes, supports CSV input | No |
| Select data: Selecting elements | Point-and-click, XPath | Point-and-click, XPath, CSS | Point-and-click, XPath | Point-and-click, CSS | Point-and-click, XPath |
| Select data: Data formats | Text, HTML, URL, etc. | Text, HTML, URL, etc. | Text, HTML, URL, etc. | Text, HTML, URL, etc. | Text, HTML, URL, etc. |
| Select data: Transforming data | Yes, via regular expressions | Yes, via regular expressions | Yes, via regular expressions | Yes, via regular expressions | Yes, via regular expressions |
| Select data: Use scraped data as an input in one project | No | Yes, scrape data from one website and use it as an input for scraping another | No | No | No |
| Select data: Customized serial number | No | Yes, by adding a number variable that increments on each iteration | No | No | No |
| Select data: Different kinds of date strings | Yes | Yes | Yes | No | Yes |
| Crawlers/tasks switch | Yes, supports multi-thread operation | Yes, supports switching to another crawler while configuring a crawler | No | Yes | No |
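All five tools list regular expressions as their way of transforming scraped data. As a rough illustration of what such a transformation does, here is a small Python sketch that pulls a numeric price out of scraped text. The function name `clean_price` and the sample strings are my own assumptions; each tool exposes regular expressions through its own interface.

```python
import re


def clean_price(raw):
    """Return the first number in a scraped price string as a float, or None."""
    # Match digits with optional thousands separators and an optional decimal part.
    match = re.search(r"\d[\d,]*(?:\.\d+)?", raw)
    if match is None:
        return None
    return float(match.group(0).replace(",", ""))
```

For example, `clean_price("Price: $1,299.00 (incl. tax)")` yields `1299.0`, and a string with no digits yields `None`.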

 

Extract data

Some advanced features are needed when extracting data:

| Content | Octoparse | Parsehub | Mozenda | Dexi.io | Import.io |
| --- | --- | --- | --- | --- | --- |
| Scraping mode | Local running and cloud running with Octoparse servers | Cloud running | Cloud running | Cloud running | Cloud running |
| Visual mode | Yes | No | Yes | No | Yes |
| Test run | No | Yes, up to 5 pages | Yes | Yes | No |
| Extract behind a login | Yes | Yes | Yes | Yes | Yes |
| Scheduling | Yes, supports scheduling tasks in real time/daily/weekly/monthly | Yes | Yes | Yes, supports choosing a local timezone | Yes |
| IP rotation | Yes, cloud running can rotate IPs automatically | Yes | Yes, supports IP rotation and choosing a geo-location before running a task | Yes | Yes |
| Solving Captcha | Yes, only available for local running | Yes, only available for text input Captcha | No | Yes, requires integration with a third-party Captcha solving platform | No |
| Error report/debug | Yes, missing data error report | No | Yes, error troubleshooting reminder | Yes, provides screenshots, error messages, debug mode and the execution log | No |
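IP rotation simply means that consecutive requests go out through different addresses. How each tool implements it is not described here, so the following Python sketch only shows the general round-robin idea; `assign_proxies` and the proxy names are hypothetical.

```python
from itertools import cycle


def assign_proxies(urls, proxies):
    """Pair each URL with the next proxy in round-robin order."""
    if not proxies:
        raise ValueError("at least one proxy is required")
    rotation = cycle(proxies)  # repeats the proxy list endlessly
    return [(url, next(rotation)) for url in urls]
```

With three URLs and two proxies, the third URL wraps around to the first proxy again.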

 

Get data

Features that affect how you get the data:

| Content | Octoparse | Parsehub | Mozenda | Dexi.io | Import.io |
| --- | --- | --- | --- | --- | --- |
| Extraction speed or cloud server distribution: Free plan | No cloud servers, depends on the local network | 1 worker (approx. 5 pages/minute) | - | - | - |
| Extraction speed or cloud server distribution: Standard plan | 6 cloud workers, depending on the rule of the crawlers | 4 workers (approx. 20 pages/minute) | Depending on paid page credits | 1 worker | Depending on the number of URLs extracted |
| Extraction speed or cloud server distribution: Professional plan | 20 cloud servers, depending on the rule of the crawlers | 24 workers (approx. 120 pages/minute) | Depending on paid page credits | 3 workers | Depending on the number of URLs extracted |
| Concurrent running crawlers: Free plan | 2 for local running | 1 | - | - | - |
| Concurrent running crawlers: Standard plan | Unlimited for local running, 6 for cloud running | 4 | 2 | 1 | Depending on the number of URLs extracted |
| Concurrent running crawlers: Professional plan | Unlimited for local running, 20 for cloud running | 24 | 5 | 3 | Depending on the number of URLs extracted |
| Customized servers | No | Yes, servers can be distributed manually | No | Yes | No |
| Local running | Yes | Only available with Test Run | Only available with Test Run | No | No |
| Cloud running | Yes, cloud extraction varies by paid plan, based on the number of cloud servers | Yes, cloud extraction speed varies by paid plan | Yes, cloud extraction speed varies by paid page credits | Yes, depending on the number of workers or robots | Yes |
| Notification on task completion | No | Yes, email | Yes, email | Yes, message | Yes, email |
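The Parsehub figures above (1, 4 and 24 workers at roughly 5, 20 and 120 pages per minute) work out to about 5 pages per minute per worker. Assuming that rate holds, a back-of-the-envelope estimate of crawl time can be sketched in Python as follows. The function name and the constant are my own, and real speeds will vary with the target site.

```python
import math

# Approximate rate implied by the Parsehub figures in the table; an assumption, not a guarantee.
PAGES_PER_WORKER_PER_MINUTE = 5


def estimated_minutes(total_pages, workers):
    """Estimate crawl time in whole minutes for a given number of workers."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    pages_per_minute = workers * PAGES_PER_WORKER_PER_MINUTE
    return math.ceil(total_pages / pages_per_minute)
```

Under this assumption, 10,000 pages take about 500 minutes with 4 workers and about 84 minutes with 24 workers.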

 

Data export and data storage

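| Content | Octoparse | Parsehub | Mozenda | Dexi.io | Import.io |
| --- | --- | --- | --- | --- | --- |
| Data export: API | Yes | Yes | Yes | Yes | Yes |
| Data export: CSV | Yes | Yes | Yes | Yes | Yes |
| Data export: JSON | No | Yes | Yes | Yes | No |
| Data export: Google Sheets | No | Yes, with API | Yes | Yes | Yes |
| Data export: Tableau | No | Yes, integrates with Tableau | No | No | Yes |
| Data export: Web | No | Yes | No | Yes | No |
| Data storage: Free | No, you need to export the data to your own machine | 14 days | - | - | - |
| Data storage: Standard | 3 months | 14 days | 1 GB storage | No | No |
| Data storage: Professional | 3 months | 30 days | 5 GB storage | No | No |
| Data storage: Enterprise | - | 30 days | 50 GB storage | No | No |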
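CSV and JSON are the export formats that come up most often in this comparison. If you end up with scraped rows in your own code, for instance after pulling them through a tool's API, writing both formats takes only the Python standard library. This is a minimal sketch; `to_csv`, `to_json` and the sample rows are hypothetical and do not reflect any vendor's export code.

```python
import csv
import io
import json


def to_csv(rows):
    """Serialize a list of dicts to CSV text, using the first row's keys as the header."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Serialize the same rows to an indented JSON array."""
    return json.dumps(rows, indent=2)


rows = [
    {"name": "Widget", "price": "19.99"},
    {"name": "Gadget, large", "price": "5"},
]
```

Note that the CSV writer quotes the value containing a comma, which is the kind of detail that makes a plain string join a poor substitute.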

 

 

Solutions

Web scraping tools are used to scrape many different kinds of websites. Here are some typical websites that most people are interested in.

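| Content | Octoparse | Parsehub | Mozenda | Dexi.io | Import.io |
| --- | --- | --- | --- | --- | --- |
| Job: LinkedIn | No, easily detected and banned by LinkedIn's anti-scraping techniques | No | No | No | No |
| Job: Glassdoor | Yes | Yes | Yes | Yes | Yes |
| SNS: Facebook | Yes | No | No | No | No |
| SNS: Twitter | Yes | Yes | Yes | Yes | Yes |
| SNS: Instagram | Yes | Yes | Yes | Yes | Yes |
| Real estate: Airbnb | No, the updated website is not compatible with the Octoparse built-in browser | No | No | No | No |
| Real estate: Booking | Yes | Yes | Yes | Yes | Yes |
| Real estate: Realtor.com | Yes | Yes | Yes | Yes | Yes |
| Real estate: Tripadvisor | Yes | Yes | Yes | Yes | Yes |
| Product details: Yellowpages | Yes | Yes | Yes | Yes | Yes |
| Product details: Yelp | Yes | Yes | Yes | Yes | Yes |
| Product details: Amazon | Yes | Yes | Yes | Yes | Yes |
| Product details: eBay | Yes | Yes | Yes | Yes | Yes |
| Maps: Google Maps (latitude and longitude data) | Yes | Yes | No | No | Yes |
| Others: Google | Yes | No | No | No | No |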

 

 

Premium Plans and support

| Content | Octoparse | Parsehub | Mozenda | Dexi.io | Import.io |
| --- | --- | --- | --- | --- | --- |
| All paid plans: Download images and files to Dropbox | No | Yes, integrates with Dropbox | Yes, integrates with Dropbox | Yes, integrates with Dropbox | No |
| All paid plans: Download images and files to Amazon S3 | No | Yes, integrates with Amazon S3 | Yes, integrates with Amazon S3 | Yes, integrates with Amazon S3 | No |
| All paid plans: Outer proxy | Yes, also available for free plans | Yes | No | Yes | No |
| All paid plans: IP rotation | Yes | Yes | Yes | Yes | Yes |
| Crawlers/tasks: Free plan | 10 crawlers | 5 public projects | - | - | - |
| Crawlers/tasks: Standard plan | 100 crawlers | 20 private projects | 1 user, 10 agents | 1 worker | Depending on the number of URLs extracted |
| Crawlers/tasks: Professional plan | 250 crawlers | 120 private projects | 2 users, 50 agents | 3 workers | Depending on the number of URLs extracted |
| Crawlers/tasks: Enterprise/custom plan | - | Custom | 3 users, unlimited agents | Custom | Depending on the number of URLs extracted |
| Enterprise: OCR (Optical Character Recognition) | No | Yes, scrapes text out of images | Yes, scrapes text out of documents | No | Yes |
| URL queries: Free plan | Unlimited | 200 | - | - | - |
| URL queries: Standard | Unlimited | 10,000 | 5,000 (up to 25,000) | Unlimited URLs with limited scraping time | 5,000 |
| URL queries: Professional | Unlimited | Unlimited | 25,000 (up to 125,000) | Unlimited URLs with limited scraping time | 250,000 |
| URL queries: Enterprise | - | Unlimited | Starting from 100,000 | Unlimited URLs with limited scraping time | 1,000,000 |
| Support: Response time | Within 1 day | 1 day | 1 day | 1 day | 1 day |
| Support: Support system | No | Intercom | Ticket | No | Intercom |

 

Author: The Octoparse Team

 

More related sources:

What is data extraction?

Best Data Scraping Tools for 2018 (Top 10 Reviews)

Big Data: 70 Amazing Free Data Sources You Should Know for 2017
