Scrape real estate data on Realtor.com

Saturday, October 27, 2018

In this tutorial, we will show you how to scrape real estate listings from Realtor.com.

We will click into each property's detail page and extract the title, location, price, and rating.

To follow along, you can use the URL from this tutorial:

https://www.realtor.com/realestateandhomes-search/Tallassee_AL

Here are the main steps in this tutorial: [Download the demo task file here]

1) "Go To Web Page" - to open the target web page

2) Create a pagination loop - to scrape all the results from multiple pages

3) Create a "Loop Item" - to click into each item in each list

4) Extract data - to select the data for extraction

5) Save and start extraction - to run the task and get data

1) "Go To Web Page" - to open the target web page

· Click "+ Task" to start a task using Advanced Mode

Advanced Mode is a highly flexible and powerful web scraping mode. For websites with complex structures, such as Realtor.com, we strongly recommend starting your data extraction project in Advanced Mode.

· Paste the URL into the "Extraction URL" box and click "Save URL" to move on

2) Create a pagination loop - to scrape all the results from multiple pages

· Scroll down and click the "Next Page" button on the web page

· Click "Loop click single element" on "Action Tips"
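
Conceptually, a "Loop click single element" pagination loop follows a simple pattern: scrape the current page, click Next, and repeat until the Next button no longer appears. For readers who think in code, here is a minimal Python sketch of that pattern. It is not Octoparse's implementation: next_page is a hypothetical callback standing in for the click, and max_pages is an assumed safety cap, not an Octoparse setting.

```python
from typing import Callable, List, Optional


def paginate(
    start: int,
    next_page: Callable[[int], Optional[int]],
    max_pages: int = 100,
) -> List[int]:
    """Visit pages by repeatedly "clicking" Next until no Next button remains.

    next_page(page) returns the following page number, or None when the
    current page has no Next button (i.e. it is the last page).
    max_pages (hypothetical) is a safety cap so a Next button that never
    disappears cannot loop forever.
    """
    visited: List[int] = []
    page: Optional[int] = start
    while page is not None and len(visited) < max_pages:
        visited.append(page)     # scrape the current page here
        page = next_page(page)   # "click" Next; None means we are done
    return visited


# Example: three result pages, where page 3 has no Next button.
print(paginate(1, lambda p: p + 1 if p < 3 else None))  # prints [1, 2, 3]
```

The cap is a design choice for the sketch: without it, a page whose Next button keeps reappearing would keep the loop running indefinitely.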

Because Realtor.com loads its content with AJAX, we need to set up AJAX Load for the "Pagination" action.

· Uncheck "Auto retry when no response"

· Check "Load the page with AJAX"

· Set up "AJAX Timeout"
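
Conceptually, an AJAX timeout caps how long a scraper waits for content that arrives in the background after a click, because an AJAX update may not trigger a full page load to wait for. The Python sketch below expresses that idea as a polling loop with a deadline. wait_for_ajax and is_loaded are hypothetical names for illustration (is_loaded might check whether the new listings are present); this is not how Octoparse works internally, and in Octoparse you only need to set the "AJAX Timeout" value.

```python
import time
from typing import Callable


def wait_for_ajax(
    is_loaded: Callable[[], bool],
    timeout: float,
    poll_interval: float = 0.1,
) -> bool:
    """Poll is_loaded() until it returns True or `timeout` seconds pass.

    Returns True if the content appeared in time, False if the wait timed out.
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_loaded():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

In this sketch, a timeout that is too short gives up before slow content arrives, while a generous one costs extra time only when content is genuinely missing, since the loop returns as soon as is_loaded() succeeds.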

3) Create a "Loop Item" - to click into each item in each list

We are now on the second page. When creating a "Loop Item", we should always start with the first item on the first page, so let's return to the first page.

· Click "Go To Web Page" in the workflow

· Select the pagination loop

Doing this helps Octoparse determine the execution order and generate the Loop Item at the correct position in the workflow.

Now, let's build the Loop Item.

· Click the first link on the web page

· Click "Select All" on "Action Tips"

· Select "Loop click each image"

We also need to set up "AJAX Load" for this step, since clicking into each item loads the content with AJAX as well.

· Uncheck "Auto retry when no response"

· Uncheck "Open the link in the new tab"

· Check "Load the page with AJAX"

· Set up "AJAX Timeout"
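
Starting from the first item on the first page makes more sense once you see the nesting: the Loop Item runs inside the pagination loop, so every listing on one page is visited before the task moves to the next page. The minimal Python sketch below shows that nesting. The names are hypothetical, and pre-collected link lists stand in for real list pages.

```python
from typing import Callable, Dict, Iterable, List


def scrape_all(
    list_pages: Iterable[List[str]],
    scrape_detail: Callable[[str], Dict[str, str]],
) -> List[Dict[str, str]]:
    """Open every detail link on every list page, in order.

    The outer loop stands in for the pagination loop and the inner loop for
    the Loop Item, so all listings on one page are handled before moving on.
    """
    rows: List[Dict[str, str]] = []
    for links in list_pages:       # pagination loop (outer)
        for link in links:         # Loop Item: click into each listing (inner)
            rows.append(scrape_detail(link))
    return rows


# Two hypothetical list pages with placeholder detail links.
pages = [["listing-1", "listing-2"], ["listing-3"]]
print([row["url"] for row in scrape_all(pages, lambda url: {"url": url})])
# prints ['listing-1', 'listing-2', 'listing-3']
```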

4) Extract data - to select the data for extraction

· Click the information you need on the page

· Select "Extract data" on "Action Tips"
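
Conceptually, each piece of information you select becomes a named data field, paired with a path that locates it on the detail page. The Python sketch below mimics that mapping with the standard library's xml.etree.ElementTree. The markup and class names are invented for illustration and do not reflect Realtor.com's actual HTML. Real pages are HTML rather than well-formed XML, so a standalone scraper would need an HTML parser instead.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical, simplified markup for one listing; Realtor.com's real HTML differs.
SAMPLE_DETAIL = """
<div class="listing">
  <h1 class="title">3 bed, 2 bath home</h1>
  <div class="address">123 Example St, Tallassee, AL</div>
  <span class="price">$199,000</span>
  <span class="rating">4.5</span>
</div>
"""

# One path per output field; these class names are assumptions for the sketch.
FIELDS: Dict[str, str] = {
    "title": ".//h1[@class='title']",
    "location": ".//div[@class='address']",
    "price": ".//span[@class='price']",
    "rating": ".//span[@class='rating']",
}


def extract_fields(markup: str) -> Dict[str, str]:
    """Return one row with a value (possibly empty) for every field in FIELDS."""
    root = ET.fromstring(markup.strip())
    row: Dict[str, str] = {}
    for name, path in FIELDS.items():
        node = root.find(path)
        # Leave the field empty rather than failing when the element is missing.
        row[name] = node.text.strip() if node is not None and node.text else ""
    return row
```

Returning an empty string for a missing element keeps every row the same shape, which is convenient when some listings lack a field such as a rating.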

Tips!

If you want each value to be extracted into the correct data field, you may need to write a new XPath that always pinpoints the right element on every page. The following related tutorials may help:

Associate Data with Nearby Text

Locate Element with XPath

Data Fetched to the Incorrect Data Fields
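
The idea behind associating data with nearby text is to anchor the path on a label next to the value instead of on the value's position, since the position can shift from page to page. In standard XPath, one way to express this is //span[text()='Price']/following-sibling::span[1]. Python's xml.etree.ElementTree supports only a subset of XPath (no following-sibling axis), so the minimal sketch below uses its [span='Price'] predicate instead. The two pages are hypothetical markup in which the same fields appear in a different order.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical markup: the same fields appear in a different order on each page.
PAGE_A = """<ul>
  <li><span>Beds</span><span>3</span></li>
  <li><span>Price</span><span>$199,000</span></li>
</ul>"""
PAGE_B = """<ul>
  <li><span>Price</span><span>$250,000</span></li>
  <li><span>Beds</span><span>4</span></li>
</ul>"""


def value_by_label(markup: str, label: str) -> Optional[str]:
    """Return the value span next to the span whose text equals `label`.

    Assumes `label` contains no single quote, since it is spliced into the path.
    """
    root = ET.fromstring(markup)
    # [span='...'] keeps <li> elements with a child <span> whose text equals label.
    item = root.find(f".//li[span='{label}']")
    if item is None:
        return None
    spans = item.findall("span")
    return spans[1].text if len(spans) > 1 else None


# A position-based path such as ".//li[2]/span[2]" returns the price on PAGE_A
# but the bed count on PAGE_B; the label-based lookup finds the price on both.
```

This is the kind of mismatch that sends data to the wrong field: the positional path is correct on one page and silently wrong on the other.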

5) Save and start extraction - to run the task and get data

· Click "Start Extraction"

· Select "Local Extraction"

Here is the sample output.
