[Octoparse User Review] By Andrew Malik from USA - Basic Plan User
It took me about a day to look into all available web scrapers. In the end I settled on Octoparse for a couple of reasons.
Pros:
– Installs on Windows, so I could use a spare Windows Server for scraping. No need to learn Node.js or do any programming.
– The GUI was simple to understand: you can dump in a list of links that need to be scraped, select the content on the page that needs to go into an Excel spreadsheet, and click start. That’s it, no need to select specific HTML divs or write RegEx code. I don’t know how, but this was the only scraper that could analyze and grab a specific piece of text on the page without my setting any rules. The other scrapers I tried had a hard time, and I had to write complicated rules for them.
– During scraping it opens the pages in a real browser, so JavaScript and AJAX websites work as well.
– You can export to Excel, directly to a SQL, MySQL or Oracle database, or to a CSV, TXT or HTML file.
– You can also back up your scraped data to Octoparse; the backup is saved with your task.
– Configuration and scraping run as separate programs. If one suddenly shuts down because of an error, the other Octoparse tasks continue to work as if nothing had happened.
Cons:
– Had a hard time adding a list of 50,000 links to the queue, but that is not a problem because you can have multiple tasks (30-40K links each in my case); just divide the links between those tasks.
– It did not say anywhere that it saves the tasks to their servers, which is probably why it has trouble with large tasks. On the other hand, this is also a pro, because you can create tasks on your computer and load them on your server just by restarting the app.
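For anyone who hits the same limit, dividing a long link list between tasks can be done with a few lines of Python before pasting the links in. This is only a rough sketch and not part of Octoparse: the function name `split_links`, the example.com URLs and the 30,000-link batch size are all assumptions chosen for illustration.

```python
def split_links(links, max_per_task):
    """Split a list of links into consecutive batches of at most max_per_task items."""
    if max_per_task < 1:
        raise ValueError("max_per_task must be at least 1")
    return [links[i:i + max_per_task] for i in range(0, len(links), max_per_task)]


# Hypothetical input: 50,000 placeholder URLs standing in for the real link list.
links = [f"https://example.com/page/{n}" for n in range(50000)]

# Assumed batch size; pick whatever a single task handles comfortably.
batches = split_links(links, 30000)

for idx, batch in enumerate(batches, start=1):
    print(f"task {idx}: {len(batch)} links")
```

Each batch can then be saved to its own text file and loaded into a separate task.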
Overall:
You can have 2 active tasks running at the same time for free. If you want more, you can upgrade to a paid version. It takes about a second to open a page, so you can scrape roughly one page per second per task.
Overall this worked better than great. I did not have to ask our devs to write a scraper; the time I spent creating the scraper was about the same as the time I would have spent discussing with our devs how to scrape the content. And now the devs are asking me for stats on scraped data, not the other way around.
If you do any marketing and want to gather data for stats, or just want to build your own database from any website, it is super easy to do. I recommend it.
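To get a feel for what one page per second per task means for a big job, here is a back-of-the-envelope estimate in Python. It is a sketch under simple assumptions: the links are split evenly between tasks, every page takes the same time, and the 50,000-page total is just an example figure. The function name `estimated_hours` is made up for this illustration.

```python
def estimated_hours(total_pages, parallel_tasks, seconds_per_page=1.0):
    """Estimate wall-clock hours when pages are split evenly across parallel tasks."""
    # The busiest task gets ceil(total_pages / parallel_tasks) pages.
    pages_per_task = -(-total_pages // parallel_tasks)
    return pages_per_task * seconds_per_page / 3600


# Example: 50,000 pages on the 2 free tasks at roughly one page per second each.
hours = estimated_hours(50000, 2)
print(f"about {hours:.1f} hours")
```

With those assumed numbers, each task handles 25,000 pages, which comes to a little under 7 hours.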