Easy Web Scraping with Octoparse | Web Crawler Software Review
Saturday, March 11, 2017
[Octoparse User Review] By Brain Phillips from Ireland - Basic Plan User
I needed a way to extract data from a dynamic webpage for a college project. The page hosts timetables - over 500 of them - each one generated using a number of comboboxes. Having test-run some rival software, I finally settled on Octoparse: it was the only software that was simple enough for a complete novice to use, but sophisticated enough to handle what I needed.
My first task was simple; I needed to pull information from static fields about room availability from each room timetable. To do this I extracted room names, times and status from the relevant pages, output the results in CSV format, and then converted the contents to JSON using an online tool. The JSON objects are now stored in my NoSQL database and accessible by my project application. First part completed with ease!
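For anyone who would rather script the CSV-to-JSON step than use an online tool, here is a minimal sketch in Python using only the standard library. It is not the tool I used; the column names (room, time, status) and the sample rows are assumptions for illustration, and a real export may use different headers.

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Convert CSV text with a header row into a JSON array of objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [dict(row) for row in reader]  # one dict per CSV row, keyed by header
    return json.dumps(rows, indent=2)


# Hypothetical sample data; the real export's columns may differ.
sample_csv = "room,time,status\nA101,09:00,Free\nA101,10:00,Booked\n"
room_json = csv_to_json(sample_csv)
print(room_json)
```

Each CSV row becomes one JSON object, which is the shape most NoSQL document stores expect on import. For a real export, the file contents could be read with open(path, newline="") and passed to the same function.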
The second task was far more complicated. I needed to pull timetable data for students from the site, and each timetable had a different number of rows. This resulted in my scouring the many useful tutorials and examples provided by Octoparse, trying to modify the examples to suit my needs. I made some progress but my attention was being pulled away by other projects.
I decided to reach out to Support to explain my requirements and see if what I wanted was possible. The support agent who replied was very friendly and informed me of the Facebook group created for Octoparse users to discuss their tasks and any problems they may be facing. I joined and posted my question, attaching the .otd file I’d been experimenting with. To my surprise, one of the support staff took that file, examined it and made corrections to the Loops and Data Extractions I needed. He then returned the file to me, and posted a link to a guide I’d overlooked showing me how to properly extract the data in the way I needed.
Thanks to the excellent support response and the software, I now have all the timetable data I need to power my application. I’ve taken nearly 30,000 data records from the site, transformed the room data into JSON and added the course data Excel files to Google Sheets. A separate admin app is connecting to Sheets, pulling in the data and sorting it into collections I can use in my main app. What should have been a very time-consuming and complicated job has been made so easy by using Octoparse, and when I finish college and become a paid customer I’ll get access to all sorts of useful features to make life even easier :)
In short, I can’t recommend the software highly enough, and I’ve full confidence in the support staff should I ever need to call on them again!