Easily Extract Data from the Web | Web Crawler Software Review
Monday, March 13, 2017
[Octoparse User Review] By Adriano from Italy - Basic Plan User
We use Octoparse to scrape pages, and we find it extremely powerful. The free version is good for users who don’t need many functions; once you reach its limits, you can buy an upgrade. The Wizard Mode is simple, while the Advanced Mode is a little difficult to use if you are new to it. We would love to see some of the advanced functionality moved to the simple side, considering that the scraper is intended mostly for non-developers.
The Wizard Mode lets you choose between:
- List or Table Extraction
- List and Detail Extraction
- URL List Extraction
- Single Page Extraction
Depending on your needs, you can extract all the fields from a single page (Single Page Extraction) or extract data in the form of a table (List or Table Extraction).
We mainly use List and Detail Extraction. In the first step you provide the result page of a query and select the list of similar URLs you need to extract; Octoparse detects the list automatically right after your second selection. In the next step you tell Octoparse which fields you wish to extract.
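To make the idea of "similar URLs" on a result page more concrete, here is a minimal Python sketch. It is not how Octoparse works internally (the review does not describe that); it simply assumes that two picked links share a path prefix and collects every link on the page with that prefix. The page markup, the `/product/` paths, and the names `LinkCollector` and `similar_links` are all hypothetical.

```python
import os.path
from html.parser import HTMLParser

# Hypothetical result page: three item links plus one unrelated link.
RESULT_PAGE = """
<ul>
  <li><a href="/product/101">Item A</a></li>
  <li><a href="/product/102">Item B</a></li>
  <li><a href="/product/205">Item C</a></li>
</ul>
<a href="/about">About us</a>
"""


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def similar_links(page_html, first_pick, second_pick):
    """Return all links that share the path prefix of the two picked links."""
    parser = LinkCollector()
    parser.feed(page_html)
    # Character-wise common prefix, trimmed back to the last "/" so that
    # "/product/10" becomes "/product/".
    prefix = os.path.commonprefix([first_pick, second_pick])
    prefix = prefix[: prefix.rfind("/") + 1]
    return [href for href in parser.links if href.startswith(prefix)]


if __name__ == "__main__":
    for href in similar_links(RESULT_PAGE, "/product/101", "/product/102"):
        print(href)
```

A prefix match is a deliberately crude stand-in for list detection; real pages may need matching on the surrounding markup rather than on the URL alone.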
For each field you can decide to extract the text, the inner HTML, the outer HTML, or the links behind the text (e.g. email addresses or web addresses). We find this functionality extremely good. In the free simple mode, the extraction speed depends on your computer and on your Internet connection.
If you don’t need to extract more than 2000 pages at a time with the Wizard Mode, the free version is good enough. If you upgrade to the more advanced plans, you can use the speed of multiple servers and run many tasks at the same time on your local machine or in their cloud service.
(Note: Support said that there’s no limit on the number of pages you can scrape with the Advanced Mode, but I am not very familiar with it.)
If List and Detail Extraction doesn’t provide results, we switch to URL List Extraction, because it is difficult for Octoparse to find the missing URLs on a result page (the new version 6.4.1 now provides an Extraction Failure Report to find the missing URLs). With this functionality you simply provide the crawler with a list of similar URLs that you need to crawl, and the rest is done automatically.
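For readers unsure how the four per-field options differ, the following Python sketch shows them on one small element. It uses only the standard library and assumes the markup is well-formed enough for an XML parser, which real web pages often are not. The snippet, the email address, and the function name `extract_options` are made up for illustration and are not part of Octoparse.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for one selected element on a page.
SNIPPET = (
    '<p class="contact">Write to '
    '<a href="mailto:info@example.com">our <b>team</b></a> today.</p>'
)


def extract_options(markup):
    """Return text, inner HTML, outer HTML and link for the first <a> element."""
    root = ET.fromstring(markup)
    link = root.find(".//a")
    if link is None:
        raise ValueError("no <a> element found")

    # Text: everything readable inside the element, with tags dropped.
    text = "".join(link.itertext())

    # Inner HTML: the element's content, keeping nested tags.
    inner_html = (link.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in link
    )

    # Outer HTML: the element's own tag plus its content. The copy lets us
    # drop the trailing text (" today.") that ElementTree stores on the node.
    element_only = copy.copy(link)
    element_only.tail = None
    outer_html = ET.tostring(element_only, encoding="unicode")

    # Link: the address behind the text.
    return {
        "text": text,
        "inner_html": inner_html,
        "outer_html": outer_html,
        "link": link.get("href"),
    }


if __name__ == "__main__":
    for name, value in extract_options(SNIPPET).items():
        print(f"{name}: {value}")
```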
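The URL list workflow, together with a simple failure report, can be sketched in Python as follows. This is only an illustration under stated assumptions: `crawl_url_list`, `fake_extract`, and the `example.com` URLs are hypothetical, and the extractor reads from an in-memory dictionary instead of the network so the sketch runs offline. It does not reproduce Octoparse's own Extraction Failure Report.

```python
# Hypothetical stand-in for real pages: URL -> fields found on that page.
FAKE_PAGES = {
    "https://example.com/item/1": {"title": "First item"},
    "https://example.com/item/2": {"title": "Second item"},
}


def fake_extract(url):
    """Pretend to fetch a page and extract its fields (KeyError if missing)."""
    return FAKE_PAGES[url]


def crawl_url_list(urls, extract):
    """Run `extract` on every URL; return (rows, failed_urls)."""
    rows = []
    failed = []
    for url in urls:
        try:
            fields = extract(url)
        except Exception:
            # Any error counts as a failed extraction for this URL.
            fields = None
        if fields:
            rows.append({"url": url, **fields})
        else:
            failed.append(url)
    return rows, failed


if __name__ == "__main__":
    url_list = [
        "https://example.com/item/1",
        "https://example.com/item/2",
        "https://example.com/item/3",  # not in FAKE_PAGES, so it fails
    ]
    rows, failed = crawl_url_list(url_list, fake_extract)
    print("extracted:", rows)
    print("failed:", failed)
```

Keeping the failed URLs in their own list makes it easy to retry only those pages later.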