Octoparse User Review
By Adriano from Italy – Basic Plan User
We use Octoparse to scrape pages, and we find it extremely powerful. The free tool is good for users who don’t need many functions. Once you reach its limits, you can buy upgrades. The Wizard mode is simple; the Advanced mode is a little difficult to use if you are new to it. We would love to use some of the advanced functionality if it could be “moved” to the simple side (considering that the scraper is intended mostly for non-developers).
The Wizard Mode gives you the possibility to choose between:
- List or Table Extraction
- List and Detail Extraction
- URL List Extraction
- Single Page Extraction
Depending on your needs, you can extract all the fields from a single page (the fourth option, Single Page Extraction) or extract data in the form of a table (the first option, List or Table Extraction).
We mainly use List and Detail Extraction. In the first step, you provide the result page of a query and select the similar URLs you want to extract; Octoparse detects the whole list automatically right after your second selection. In the next step, you tell Octoparse which fields you wish to extract.
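To make the first step more concrete, here is a minimal sketch of what "collecting the list of similar URLs from a result page" amounts to. It is not how Octoparse works internally; the `LinkCollector` class, the `detail_urls` helper, the sample page, and the `/item/` pattern are all assumptions made up for illustration, using only the Python standard library.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag found on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def detail_urls(result_html, base_url, pattern):
    """Return absolute URLs for the links that contain `pattern`.

    `pattern` is a hypothetical stand-in for "similar URLs": here it is
    simply a substring that all detail-page links share.
    """
    parser = LinkCollector()
    parser.feed(result_html)
    return [urljoin(base_url, href) for href in parser.links if pattern in href]


# A made-up result page: two detail links and one unrelated link.
RESULT_PAGE = """
<ul>
  <li><a href="/item/1">First</a></li>
  <li><a href="/item/2">Second</a></li>
</ul>
<a href="/about">About</a>
"""

urls = detail_urls(RESULT_PAGE, "https://example.com/search?q=x", "/item/")
print(urls)
```

Each URL in the resulting list would then be opened in turn to extract the fields of the detail page.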
For each field, you can decide to extract the text, the inner HTML, the outer HTML, or the link behind the text (e.g. an email address or a web address). We find this functionality extremely good. In the simple (free) mode, the speed depends on your computer and on your Internet connection.
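For readers unsure how these four options differ, the following sketch shows them side by side on one small snippet. It is only an illustration and does not use Octoparse. It assumes a well-formed fragment so that Python's standard XML parser can read it (real web pages usually need a more tolerant HTML parser), and the snippet and variable names are invented for the example.

```python
import xml.etree.ElementTree as ET

# A made-up, well-formed fragment: a link whose text contains a nested tag.
SNIPPET = '<a href="mailto:info@example.com">Contact <b>us</b></a>'
element = ET.fromstring(SNIPPET)

# Text: only the visible characters, with all tags removed.
text = "".join(element.itertext())

# Outer HTML: the element itself, including its own opening and closing tags.
outer_html = ET.tostring(element, encoding="unicode")

# Inner HTML: everything between the element's opening and closing tags.
inner_html = (element.text or "") + "".join(
    ET.tostring(child, encoding="unicode") for child in element
)

# Link: the address stored in the href attribute behind the text.
link = element.get("href")

print(text)
print(inner_html)
print(outer_html)
print(link)
```

In practice, the text is what you want for names and prices, the link for email or web addresses, and the inner or outer HTML when you need to keep the formatting.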
If you don’t need to extract more than 2000 pages at a time with the Wizard Mode, the free version is good enough. If you upgrade to the more advanced plans, you can use the speed of multiple servers and run many tasks at the same time, either on your local machine or in their cloud service.
(Note: Support said that there is no limit on the number of pages you can scrape with the Advanced Mode, but I am not very familiar with it.)
If List and Detail Extraction doesn’t provide results, we switch to URL List Extraction, because it is difficult for Octoparse to find the missing URLs on a result page (the new version 6.4.1 now provides an Extraction Failure Report to find the missing URLs). With this functionality, you simply provide the crawler with a list of similar URLs that you need to crawl, and the rest is done automatically.
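The idea of crawling a fixed list of URLs and reporting the ones that failed can be sketched in a few lines. This is a rough illustration under stated assumptions, not Octoparse's implementation or its report format: `crawl_url_list` and `fake_fetch` are hypothetical names, and the fetch function is a stand-in so that the example runs without network access.

```python
def crawl_url_list(urls, fetch):
    """Fetch every URL in `urls` and keep track of the ones that fail.

    `fetch` is any function that takes a URL and returns the page content,
    raising an exception on failure. Returns (results, failed), where
    `failed` plays the role of a simple failure report.
    """
    results = {}
    failed = []
    for url in urls:
        try:
            results[url] = fetch(url)
        except Exception as exc:  # record the failure and keep crawling
            failed.append((url, str(exc)))
    return results, failed


def fake_fetch(url):
    """Stand-in for a real download; pretends that page 2 is missing."""
    if url.endswith("/2"):
        raise ValueError("page not found")
    return f"<html>content of {url}</html>"


url_list = [
    "https://example.com/item/1",
    "https://example.com/item/2",
    "https://example.com/item/3",
]

results, failed = crawl_url_list(url_list, fake_fetch)
print(len(results), "pages extracted")
print("failed:", failed)
```

The list of failed URLs can then be fed back into the same function for a second attempt.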