Best Scalable Web Scraping Tool - Octoparse Review from a Korean User
Monday, May 15, 2017
[Octoparse User Review] By EunJeong Lee from Korea - Standard Plan User
(The original post in Korean is here)
Let me start off my first blog post with some useful information. A robot crawls web servers and extracts data automatically: this is called web scraping.
Why is this necessary?
(Sorry, paper, you'll have to wait ...)
This kind of information is not easy to find on Naver, no matter how hard you look. So after a few days of searching on Google, I found a web scraping service that I really like: OCTOPARSE!
Octoparse is a Chinese web scraping service. (The brand name comes from "octopus," and the mascot is a cute little octopus character!)
Below is a brief summary of the advantages and disadvantages.
The advantages of Octoparse
First, since it started in China, the service is also offered in Chinese. It did well in China and has since expanded to the United States. For those who are more comfortable with Chinese than English, it will be more accessible than American sites.
Second, it is very cheap compared to competitors. Other web scraping services cap you at 500 queries or 100 web pages, so you need a premium plan for large extractions, and those run $150 to $190 per month.
The surprising part is that Octoparse has a free plan! And the free plan has no web page limit or query limit.
However, you can only run two spiders at a time, and you have to run them locally on your own computer rather than in the cloud. That means if the data is large, you have to leave your computer on all day.
But even the paid plan, at $89 per month ($79/month if you pay for a year), is much cheaper than its competitors. I am using the Standard plan, so I can have several tasks running at the same time!
Third, it's really easy to learn! If you download Octoparse after signing up, you get a tutorial with sample data. Everything is also very well explained on the website (in English).
The disadvantages of Octoparse
To be fair, here are the disadvantages:
First, if you run a spider with local extraction rather than cloud extraction, it automatically stops after 4 hours.
At first I really panicked. But fortunately, when you start it back up, the data is still there: you can recover it, save it, and then start over with the next batch. This process is very annoying and takes longer than expected. So I contacted support myself, and the developers are already working on this part :)
If you have a lot of data to extract, I recommend the Standard plan.
Cloud extraction does not have this problem, because the data is extracted and stored on their servers.
Second, the spider that comes up empty:
I moved a task to the cloud because it had too many links (10,000 links), but it would not run in the cloud at all. So I broke the links up into many smaller batches, and even then the extraction was a real struggle!!! It was really frustrating.
I hope this information has helped ~ and now, back to my paper.