Scraping JavaScript pages without Python

[Octoparse User Review] By Fabrice Siebert – Basic Plan User

A nightmare for a web crawler without using any tools!

I have been crawling and parsing websites for a while, using PHP and cURL. Year after year, it became clear that the extraction routines running on my server were getting harder and harder to keep in good working order. Websites regularly change minor things on their pages; in the best case, you no longer get some or all of the expected data, and in the worst case, you get completely inaccurate data.

Then came, for me (and, I must admit, for my limited skills), THE hammer: AJAX! Yes, HTML + JavaScript + CSS + DOM… And the dynamic pages that don’t load everything at first, that wait for you to click a button, that only show content as you scroll down, that swap static picture URLs for pictures displayed dynamically by JavaScript… In two words: a nightmare!
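To make the problem concrete, here is a minimal sketch in Python of what a plain HTTP client (cURL, for example) actually receives from such a page. The markup, the element ids and the `data-url` attribute are all hypothetical, invented for illustration; the point is only that the interesting container is empty until a script fills it in after the click.

```python
from html.parser import HTMLParser

# Hypothetical markup as a plain HTTP client would receive it: the "details"
# container stays empty until a script reacts to a click on the button.
STATIC_HTML = """
<div id="summary">Widget, in stock</div>
<div id="details"></div>
<button id="display" data-url="/ajax/details">Display</button>
"""


class TextById(HTMLParser):
    """Collect the text found directly inside each element that has an id."""

    def __init__(self):
        super().__init__()
        self.stack = []  # id of each currently open element (None if no id)
        self.text = {}   # id -> text collected so far

    def handle_starttag(self, tag, attrs):
        element_id = dict(attrs).get("id")
        self.stack.append(element_id)
        if element_id is not None:
            self.text.setdefault(element_id, "")

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Only keep text that sits inside an element carrying an id.
        if self.stack and self.stack[-1] is not None:
            self.text[self.stack[-1]] += data.strip()


def scrape_static(html: str) -> dict:
    """Parse the raw HTML only; no JavaScript is ever executed."""
    parser = TextById()
    parser.feed(html)
    return parser.text


result = scrape_static(STATIC_HTML)
print(result["summary"])        # text that is present in the static page
print(repr(result["details"]))  # empty: this part only arrives via Ajax
```

A scraper working this way sees the summary but never the details, because nothing in it clicks the button or runs the page's scripts.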

So I had to find a way to keep extracting the data I needed without having to earn an engineering degree in information technology… It had to be fast, and it had to be robust!

Why did I choose Octoparse?

I tried several scraping tools, and my final choice was Octoparse, for several reasons:

Easy to set up.
Lots of tutorials to start easily.
Ajax is handled as easily as a basic HTML URL… as if there weren’t any Ajax routines on the pages. That is really what made me give it a try… because I was unable to access the most important part of the data I needed… hidden behind a ‘Display’ Ajax button that I wasn’t able to deal with (using PHP / cURL).
10 tasks are offered for free. And as far as I know, they won’t be public tasks, as is the case with some of Octoparse’s competitors.
Smart Mode and Wizard Mode make it easy to find the data, often on the first try. Sometimes you need to look for alternatives… but Octoparse tries to do it for you.

Advanced Mode: the most important part!

But of course, the Advanced Mode is the most important part. You don’t need to start with it: start with Smart Mode or Wizard Mode, then edit in Advanced Mode… and extract exactly what you need.

I’ve been using XPath of a sort for years with PHP… but here, it’s easy and clear. You can even save a data extraction configuration file, to be reused in a new project or elsewhere.

The only drawback I have noticed is that, when Wizard Mode is used, Octoparse mostly generates child/child/child XPath expressions, which seem to me less robust than locating elements by specific attributes such as class, ID, or others. But of course, you can edit them in Advanced Mode to make them more robust.
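The difference in robustness can be sketched in a few lines of Python, using the limited XPath support of the standard library's `xml.etree.ElementTree`. The two page versions, the `price` class and both expressions are made up for illustration; they are not what Octoparse itself generates.

```python
import xml.etree.ElementTree as ET

# Two versions of the same hypothetical page: the second one only adds a banner.
PAGE_V1 = """
<html><body>
  <div><h1>Widget</h1></div>
  <div><span class="price">19.99</span></div>
</body></html>
"""

PAGE_V2 = """
<html><body>
  <div class="banner">Sale!</div>
  <div><h1>Widget</h1></div>
  <div><span class="price">19.99</span></div>
</body></html>
"""

POSITIONAL = "./body/div[2]/span"         # child/child/child style
BY_ATTRIBUTE = ".//span[@class='price']"  # anchored on the class attribute


def extract(page: str, xpath: str):
    """Return the text of the first node matching xpath, or None if no match."""
    root = ET.fromstring(page.strip())
    node = root.find(xpath)
    return node.text if node is not None else None


for name, page in (("v1", PAGE_V1), ("v2", PAGE_V2)):
    print(name, extract(page, POSITIONAL), extract(page, BY_ATTRIBUTE))
```

Both expressions find the price in the first version. Once the banner is added, the positional path points at the wrong `div` and finds nothing, while the attribute-based one still matches.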

Formatting the data before exporting it is now easy, and it helps to shrink the volume of data.

I haven’t been using Octoparse for long, but it should definitely help me save a lot of time… and money (as long as I’m able to set up the APIs… 😉)
