How to Extract Data from Webpages Loaded with AJAX (Example: gumtree.com)
Wednesday, May 04, 2016 3:22 AM
Welcome to the Octoparse tutorial. In this tutorial, I will guide you through one of the features of Octoparse: extracting data from web pages where the data is loaded with Ajax. I will show you a simple hands-on example to get you started.
In fact, you don’t need to know much about Ajax to extract data. All you need to figure out is whether the site you want to scrape uses Ajax. Many websites, such as Google, Amazon and eBay, use Ajax heavily. The usual sign is that the URL of the page does not change when part of the content is updated. With Octoparse, you can easily extract data from web pages where data is loaded with Ajax.
Take the site Gumtree as an example.
This page has contact details that are partly hidden, so we need to click the Reveal button to get the complete number. When we click “Reveal”, the rest of the contact number appears, and if we look at the URL, it has not changed.
So we know this page uses Ajax, and we need to set "Load with Ajax" in Octoparse. Otherwise, the result won't be extracted, or the extraction will take a very long time.
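The check described above can be written down as a rough heuristic. The Python sketch below is only an illustration and not something Octoparse exposes: the function name `looks_like_ajax`, the URL and the sample contact strings are all made up for the example. It simply encodes "the content changed after the click, but the URL did not".

```python
def looks_like_ajax(url_before, url_after, content_before, content_after):
    """Rough heuristic: the click changed the page content but not the URL."""
    same_url = url_before == url_after
    content_changed = content_before != content_after
    return same_url and content_changed


# Hypothetical snapshots taken before and after clicking "Reveal".
# The URL and the numbers are placeholders, not real data.
before = ("https://example.com/ad/123", "Contact: 0700000XXXX")
after = ("https://example.com/ad/123", "Contact: 07000000000")

print(looks_like_ajax(before[0], after[0], before[1], after[1]))  # True
```

A click that takes you to a different URL would return False here, which matches an ordinary page load rather than an Ajax update.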
First, open the page in the built-in browser. (I am using just one page as an example.)
Then click on "Reveal". Select “Click an item”.
This page uses Ajax, so we need to set Ajax load.
Choose “Load page with Ajax”. Set an Ajax timeout. Click “Save”.
Then extract the information you want.
Extract the brand: click on the title and select “extract text”.
Extract the price, and then extract the contact details you just revealed.
Then run the local extraction, and the data will be extracted.
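In Octoparse these fields are picked by pointing and clicking, so no code is needed. For readers who want to see what "extract text" amounts to, here is a minimal Python sketch using only the standard library. The HTML snippet, the class names (`title`, `price`, `contact`) and the `FieldExtractor` class are assumptions made up for this example; they are not Gumtree's real markup or anything Octoparse generates.

```python
from html.parser import HTMLParser


class FieldExtractor(HTMLParser):
    """Collect the text of simple, non-nested elements with a wanted class."""

    def __init__(self, wanted_classes):
        super().__init__()
        self.wanted = set(wanted_classes)
        self.current = None  # field currently being read, if any
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class in self.wanted:
            self.current = css_class
            self.fields[css_class] = ""

    def handle_data(self, data):
        if self.current is not None:
            self.fields[self.current] += data

    def handle_endtag(self, tag):
        # Any closing tag ends the current field (fine for flat markup).
        self.current = None


# Hypothetical page content after "Reveal" has been clicked.
SAMPLE_HTML = """
<h1 class="title">Example Brand Hatchback</h1>
<span class="price">GBP 2,500</span>
<span class="contact">07000 000000</span>
"""

parser = FieldExtractor(["title", "price", "contact"])
parser.feed(SAMPLE_HTML)
record = {name: text.strip() for name, text in parser.fields.items()}
print(record)
```

The result is one record with the three fields, which is the same shape of data the local extraction produces: one row per page, one column per extracted item.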
If you want to extract content that loads with Ajax, you need to set an Ajax timeout. Otherwise, the result won't come out, or the process will take a very long time.
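To see why a timeout matters, here is a small Python sketch of the general idea: keep checking for the content, and give up once a time limit is reached. It does not describe Octoparse's internals. The names `wait_for_content` and `make_fake_page` are hypothetical, and the "page" is simulated so the example runs on its own.

```python
import time


def wait_for_content(read_content, timeout_seconds, poll_interval=0.1):
    """Poll read_content() until it returns a value or the timeout expires.

    read_content is a hypothetical callable that returns the loaded text,
    or None while the Ajax request has not finished yet.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        content = read_content()
        if content is not None:
            return content
        if time.monotonic() >= deadline:
            return None  # gave up: the content did not load in time
        time.sleep(poll_interval)


def make_fake_page(ready_after_calls, text):
    """Simulate a page whose content appears after a number of polls."""
    calls = {"n": 0}

    def read_content():
        calls["n"] += 1
        return text if calls["n"] >= ready_after_calls else None

    return read_content


# Content shows up on the third poll, well inside the timeout.
print(wait_for_content(make_fake_page(3, "07000 000000"), timeout_seconds=2))
# Content never shows up, so the wait stops when the timeout expires.
print(wait_for_content(make_fake_page(10**9, "never"), timeout_seconds=0.3))
```

Without the deadline check, the second call would loop forever, which is the "very long time" case. With a timeout that is too short, the first call could return nothing even though the content was about to arrive, so the value is a trade-off.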
Now you know how to extract data from web pages loaded with Ajax. Try Octoparse now for yourself!
If this video tutorial is not available to you, you can click here to see the corresponding graphic tutorial.