My Experience Choosing Free Web Crawler Software

As the world is drowning in data, crawling or scraping data is becoming more and more popular. Web data crawlers and scrapers, also known as extraction tools, should no longer be strangers to people with crawling needs. Most of these crawlers, scrapers, or extractors are either web-based applications or programs installed on a local desktop, and they come with a user-friendly UI.

I once tried crawling data on my own by programming in Ruby and Python to retrieve the structured data I needed. Sometimes this was really time-consuming, tedious, and inefficient. I then began trying data crawler tools, having learned that some scrapers and crawlers require no programming and can help users crawl data much faster and with high quality. Searching Google for “Data Crawler Software” turns up hundreds of web crawlers. Here, I just want to introduce several free web crawler tools I have used, for your reference.
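To give a sense of what the hand-written approach involves, here is a minimal sketch in Python that uses only the standard library. It is an illustration, not the script I actually used: the sample HTML, the `TableParser` class, and the `extract_records` function are all made up for this example, and a real script would also have to download the page and cope with much messier markup.

```python
from html.parser import HTMLParser

# Hypothetical sample page; a real script would download this HTML first.
SAMPLE_HTML = """
<table>
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td>Widget</td><td>9.99</td></tr>
  <tr><td>Gadget</td><td>24.50</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_records(html):
    """Turn the first row into field names and the rest into dictionaries."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


records = extract_records(SAMPLE_HTML)
```

Even for a single clean table this takes some forty lines, which is the kind of effort the point-and-click tools below try to save.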

Octoparse

Octoparse is a powerful, visual, Windows-based free web data crawler. The UI can be seen below; its simple and friendly interface makes the tool really easy to grasp. To use it, you first need to download the application to your local desktop. As the figure below shows, you can click and drag the blocks in the Workflow Designer pane to customize your own task. Octoparse comes in two editions, the Free Edition and the Paid Edition, and both can satisfy users’ basic scraping or crawling needs. You can run your tasks locally and have the data exported in various formats.

Beyond that, if you switch from the Free Edition to a Paid Edition, you can use the cloud-based service by uploading your task and configurations to the Cloud Platform, where 6 or more servers run your tasks simultaneously, at higher speed and on a larger scale. You can also automate your data extraction without being traced by using Octoparse’s anonymous proxy service, which rotates through a large pool of IPs to prevent you from being blocked by certain websites. Octoparse also provides API creation to connect your system to your scraped data in real time: you can either import the Octoparse data into your own database or use the Octoparse API to request access to your account’s data. After you finish configuring a task, you can export the data in whichever format you need, such as CSV, Excel, HTML, TXT, or a database (MySQL, SQL Server, and Oracle).
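The idea behind rotating IPs can be pictured with a short Python sketch. This is not how Octoparse implements its proxy service, which is not documented here; it only shows the general round-robin technique. The `ProxyRotator` class is hypothetical, and the proxy addresses are placeholders from a documentation-only IP range that would need to be replaced with real proxies.

```python
import itertools
import urllib.request

# Placeholder addresses (documentation-only range); replace with real proxies.
PROXIES = [
    "http://203.0.113.10:8080",
    "http://203.0.113.11:8080",
    "http://203.0.113.12:8080",
]


class ProxyRotator:
    """Hand out proxies in round-robin order, one per request."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._cycle = itertools.cycle(proxies)

    def next_proxy(self):
        """Return the next proxy address, wrapping around at the end."""
        return next(self._cycle)

    def opener(self):
        """Build a urllib opener that routes traffic through the next proxy."""
        proxy = self.next_proxy()
        handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        return urllib.request.build_opener(handler)


rotator = ProxyRotator(PROXIES)
first_four = [rotator.next_proxy() for _ in range(4)]
```

With working proxies, each page could then be fetched through `rotator.opener().open(url, timeout=10)`, so that consecutive requests leave from different addresses.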

Import.io

Import.io is another web crawler that covers all levels of crawling needs. It offers a Magic tool that can convert a site into a table without any training sessions, and it suggests that users download its desktop app if more complicated websites need to be crawled. Once you’ve built your API, it offers a number of simple integration options, such as Google Sheets, Plot.ly, and Excel, as well as GET and POST requests. When you consider that all this comes with a free-for-life price tag and an awesome support team, it is a clear first port of call for those on the hunt for structured data. It also offers a paid enterprise-level option for companies looking for larger-scale or more complex data extraction.

Mozenda

Mozenda is also a user-friendly web data crawler. It has a point-and-click UI that users without any coding skills can work with. Mozenda also takes the hassle out of automating and publishing extracted data: tell Mozenda once what data you want, and then get it however frequently you need it. It also supports advanced programming through a REST API, which lets users connect directly to their Mozenda account. In addition, it provides a cloud-based service and IP rotation.

ScrapeBox

SEO experts, online marketers, and even spammers should be very familiar with ScrapeBox and its very user-friendly UI. Users can easily harvest data from a website to grab emails, check page rank, verify working proxies, and submit RSS feeds. By using thousands of rotating proxies, you will be able to sneak a look at competitors’ site keywords, do research on .gov sites, harvest data, and post comments without getting blocked or detected.

Web Scraper Plugin

Admittedly, the crawlers above are powerful enough to serve people with complicated crawling or scraping needs. But if you just want to scrape data in a simple way, I suggest you choose the Google Web Scraper Plugin. It is a browser-based web scraper that works like Firefox’s OutWit Hub. You can download it as an extension and install it in your browser. You highlight the data fields you’d like to crawl, right-click, and choose “Scrape similar…”. Anything similar to what you highlighted will be rendered in a table ready for export, compatible with Google Docs. The latest version still had some bugs with spreadsheets. Even though it is easy to handle, it is worth noting that it can’t scrape images or crawl large amounts of data.
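The “Scrape similar…” idea can be approximated in a few lines of Python. This is only a sketch under a stated assumption: it treats “similar” as “same tag name and a shared class”, which may well differ from the plugin’s own matching logic. The `SimilarCollector` class, the `scrape_similar` function, and the sample HTML are all hypothetical.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

# Hypothetical page fragment; pretend the user highlighted the first <li>.
SAMPLE_HTML = """
<ul>
  <li class="item">Alpha <b>one</b></li>
  <li class="item featured">Beta</li>
  <li class="ad">Sponsored</li>
  <li class="item">Gamma</li>
</ul>
<p class="item">Not a list entry</p>
"""


class SimilarCollector(HTMLParser):
    """Collect the text of every element with the given tag and class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.matches = []
        self._depth = 0   # nesting depth inside the current match; 0 = outside
        self._parts = []  # text fragments of the current match

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1
        elif tag == self.tag:
            classes = (dict(attrs).get("class") or "").split()
            if self.css_class in classes:
                self._depth = 1
                self._parts = []

    def handle_data(self, data):
        if self._depth:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            self.matches.append("".join(self._parts).strip())


def scrape_similar(html, tag, css_class):
    collector = SimilarCollector(tag, css_class)
    collector.feed(html)
    collector.close()
    return collector.matches


similar = scrape_similar(SAMPLE_HTML, "li", "item")
```

Here the three list entries carrying the `item` class are collected, while the `ad` entry and the paragraph are skipped.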