Best Web Scraper | Web Crawler Software Review
Saturday, March 11, 2017
[Octoparse User Review] By Marcin from Poland - Basic Plan User
When we were looking for suitable scraping software, we tried every scraping program available on the market. After a few days of testing, it turned out that Octoparse was exactly what we were looking for. This powerful tool outclasses the competition in most of the hard tasks:
- It can load and work through very large and complex websites.
- You can set up more complex logic, which is very useful even if rarely needed.
- There are practically no websites that Octoparse can’t load, regardless of the platform or system they were built on.
- A friendly environment with an easy-to-use GUI.
- Great support from honest, kind people who analyze your problem in depth and respond as quickly as possible once they find the likely cause, including a way to fix it.
- And all of this without a single line of script, which is great!
There are, of course, some cons too, because nothing is perfect.
Sometimes the program hangs on certain sites. After some observation, we concluded that this is not a problem with the websites themselves. It happens on sites built like Wikipedia, for example, whose structure is practically the same under every link, and especially when there are many links. The workflow built in the Workflow Designer is probably not the cause either, because it is simple: loop through a list of URLs -> extract data. Sometimes the program shows that the site is still loading; sometimes it just hangs on one opened page for hours and then simply moves on. Note that I tried all of the program's options, including the advanced ones such as the timeout limit and reload web page. Nothing helped.
Finally, some functionality and enhancements that could be added to Octoparse in the future to improve it further. Example situation: we add 100 links to the program and 17 of them fail to execute. We found that the reload web page option is unclear. If, for example, we use a proxy and the connection through it fails, we can currently switch to another proxy or to several proxies, depending on the settings. It would be great if, when a proxy fails to connect within a specified number of attempts (say 5-15), Octoparse removed that proxy or marked it as malfunctioning and avoided it in the future, while still returning those 17 bad links in the results. It would be even better if the exported result list included the links that failed to execute, marked in a separate column or with the website's status error code or something similar. This would help when retrying to get data from the websites that failed to load.
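To make the suggestion concrete, here is a minimal Python sketch of the behavior we have in mind. It is not Octoparse's code or API: the names `scrape_with_proxy_rotation`, `rows_to_csv`, `fetch` and `max_failures` are all hypothetical, and `fetch` is assumed to be any function that takes a URL and a proxy and raises `ConnectionError` when the connection fails.

```python
import csv
import io
from typing import Callable, Dict, List, Tuple


def scrape_with_proxy_rotation(
    urls: List[str],
    proxies: List[str],
    fetch: Callable[[str, str], str],  # hypothetical: fetch(url, proxy) -> data
    max_failures: int = 5,             # assumed threshold (the review suggests 5-15)
) -> Tuple[List[Dict[str, str]], List[str]]:
    """Try each URL through the healthy proxies and keep failed URLs in the results."""
    failures = {proxy: 0 for proxy in proxies}
    rows = []
    for url in urls:
        # Assume failure until some proxy succeeds, so bad links are never dropped.
        row = {"url": url, "status": "failed", "error": "no healthy proxy left", "data": ""}
        for proxy in proxies:
            if failures[proxy] >= max_failures:
                continue  # proxy was marked as malfunctioning, avoid it
            try:
                row.update(status="ok", error="", data=fetch(url, proxy))
                break
            except ConnectionError as exc:
                failures[proxy] += 1
                row["error"] = str(exc)  # remember the last error for the export
        rows.append(row)
    bad_proxies = [p for p, count in failures.items() if count >= max_failures]
    return rows, bad_proxies


def rows_to_csv(rows: List[Dict[str, str]]) -> str:
    """Export all rows, including failed links, with separate status and error columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["url", "status", "error", "data"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

With an export like this, retrying is just a matter of filtering the rows whose status column says "failed" and feeding those links back into the program.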
Author: The Octoparse Team