Scraping real estate info from Zillow

Tuesday, January 29, 2019

In this tutorial, we are going to show you how to scrape real estate info from Zillow.

To follow along, you can use the URL from this tutorial:

https://www.zillow.com/homes/

Here are the main steps in this tutorial:

1. "Go To Web Page" - to open the targeted web page

2. Enter text - to search for listings and load the search results

3. Create a pagination loop - to scrape all the results from multiple pages

4. Build a "Loop Item" - to loop click into each item on each page

5. Extract data - to select the data you need to scrape

6. Run extraction - to run your task and get data

1) "Go To Web Page" - to open the targeted web page

  • Click "+Task" to start a new task with Advanced Mode
  • Paste the URL into the "Input URL" box
  • Click "Save URL" to move on
  • Set up the timeout and "wait before execution"

This page takes a long time to load, so we need to set both options. Otherwise, the task may run into errors such as missing data.
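The two timing settings can be pictured with a simplified Python model. This is a hypothetical sketch, not Octoparse's implementation or API: `run_step`, `make_slow_page`, and the polling approach are assumptions made for illustration. "Wait before execution" is modeled as a fixed pause, and the timeout as the longest time the step keeps checking whether the page has finished loading.

```python
import time
from typing import Callable


def run_step(
    page_is_loaded: Callable[[], bool],
    wait_before: float,
    timeout: float,
    poll_interval: float = 0.05,
) -> bool:
    """Hypothetical, simplified model of a step's timing settings.

    wait_before  -- fixed pause before the step runs ("wait before execution")
    timeout      -- longest time to keep waiting for the page to finish loading
    Returns True if the page reported itself loaded before the timeout expired.
    """
    time.sleep(wait_before)  # unconditional pause before acting
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if page_is_loaded():
            return True  # page ready: safe to extract
        time.sleep(poll_interval)
    return False  # gave up: extracting now could yield missing data


def make_slow_page(checks_needed: int) -> Callable[[], bool]:
    """Fake (hypothetical) page that only reports 'loaded' after N checks."""
    checks = 0

    def is_loaded() -> bool:
        nonlocal checks
        checks += 1
        return checks >= checks_needed

    return is_loaded


if __name__ == "__main__":
    # A page needing 3 checks loads within a generous timeout: prints True
    print(run_step(make_slow_page(3), wait_before=0.1, timeout=1.0, poll_interval=0.01))
    # A far slower page does not load within a short timeout: prints False
    print(run_step(make_slow_page(1000), wait_before=0.0, timeout=0.05, poll_interval=0.01))
```

With a short timeout, the step gives up before the slow page is ready, which is the situation that leads to missing data.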

2) Enter text - to search for listings and load the search results

  • Click the search box in the built-in browser and select "Enter text" on "Action Tips"

When you click an input field in the built-in browser, Octoparse detects that you have selected a search box, and the "Enter text" action automatically appears on "Action Tips".

  • Enter "New York" on "Action Tips"
  • Click "OK". The "Enter Text" action is then generated in the workflow.
  • Click the search button on the web page and select "Click button" on "Action Tips". The "Click Item" action is added to the workflow.
  • Go to "New Tab", check "Open the link in new tab", and click "Save"

3) Create a pagination loop - to scrape all the results from multiple pages

  • Scroll down and click the "Next Page" button
  • Click "Loop click next page" on "Action Tips"
  • Uncheck "Auto retry when no response"
  • Check "Load the page with AJAX" and set a timeout
  • Click "Save"

Tips!

To learn more about AJAX, please refer to the tutorial "Deal with AJAX".
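The "Loop click next page" action can be thought of as a loop that keeps following the "Next Page" link until it disappears. The sketch below models this in Python under stated assumptions: the site is a hypothetical in-memory dictionary of pages, and the `max_pages` cap is an added safeguard rather than an Octoparse setting.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical site model: page id -> (listings on that page, id of the next page or None)
Site = Dict[str, Tuple[List[str], Optional[str]]]


def loop_next_page(site: Site, start: str, max_pages: int = 100) -> List[str]:
    """Collect listings by repeatedly following the "Next Page" link.

    Stops when a page has no next page (the button is gone) or after max_pages
    pages, a safety cap so a broken "Next Page" link cannot loop forever.
    """
    listings: List[str] = []
    page: Optional[str] = start
    pages_seen = 0
    while page is not None and pages_seen < max_pages:
        items, next_page = site[page]
        listings.extend(items)
        page = next_page  # the equivalent of clicking "Next Page"
        pages_seen += 1
    return listings


# Hypothetical three-page result set
site: Site = {
    "page1": (["listing A", "listing B"], "page2"),
    "page2": (["listing C"], "page3"),
    "page3": (["listing D"], None),  # last page: no "Next Page" button
}
print(loop_next_page(site, "page1"))  # ['listing A', 'listing B', 'listing C', 'listing D']
```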

4) Build a "Loop Item" - to loop click into each item on each page

We are now on the second page. When creating a "Loop Item", we should always start with the first item on the first page, so we need to go back to the first page.

  • Click "Go To Web Page" in the workflow.
  • Click "Enter Text"
  • Click "Click Item"
  • Select the pagination loop in the workflow

Doing this helps Octoparse determine the execution order and generate the Loop Item at the right position in the workflow.

  • Click the first item

You can see the first item is highlighted in green while the rest are highlighted in red. 

  • Click "Select All" on "Action Tips"

Now all the items on this page are highlighted in green.

  • Select "Loop click each URL"

Zillow uses AJAX to load new content, so we need to set up AJAX loading here.

  • Set up "wait before execution" to avoid missing data
  • Uncheck "Auto retry when no response"
  • Uncheck "Open the link in new tab"
  • Check "Load the page with AJAX" and set a timeout
  • Click "Save"
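The nesting that this workflow order produces, with the Loop Item running inside the pagination loop, can be sketched in Python. The page dictionary, the listing paths, and `fake_extract` are hypothetical stand-ins; the sketch only illustrates the loop structure and why the Loop Item should be built starting from the first page.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical model: page id -> (detail-page paths of the listings, next page id or None)
Pages = Dict[str, Tuple[List[str], Optional[str]]]


def scrape(
    pages: Pages,
    start: str,
    extract: Callable[[str], Dict[str, str]],
) -> List[Dict[str, str]]:
    """Pagination loop (outer) wrapping a Loop Item (inner), as in the workflow."""
    rows: List[Dict[str, str]] = []
    page: Optional[str] = start
    while page is not None:  # outer loop: "Loop click next page"
        item_urls, next_page = pages[page]
        for url in item_urls:  # inner loop: "Loop click each URL"
            rows.append(extract(url))  # open the listing and pull its fields
        page = next_page
    return rows


def fake_extract(url: str) -> Dict[str, str]:
    """Hypothetical stand-in for the "Extract data" step: one record per listing."""
    return {"url": url}


pages: Pages = {
    "page1": (["/home/1", "/home/2"], "page2"),
    "page2": (["/home/3"], None),
}

# Starting from the first page captures every listing: prints 3
print(len(scrape(pages, "page1", fake_extract)))
# Starting from the second page silently skips page 1: prints 1
print(len(scrape(pages, "page2", fake_extract)))
```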

5) Extract data - to select data you need to scrape

  • Click the data you need in the item block.
  • Click "Extract text of the selected element" and rename the "Field name" column if necessary.

You can rename a field by selecting a name from the predefined list or typing your own.

  • Click "OK" to save the result.
  • Click the "Close" button in the built-in browser
  • Choose "Click button" on "Action Tips"
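Renaming fields matters because the field names become the column headers of the exported data. As a hedged illustration, the sketch below renames extracted fields and writes them as CSV with Python's standard `csv` module; the default names `Text1` and `Text2` and the values are made up for the example and are not Octoparse's actual defaults.

```python
import csv
import io
from typing import Dict, List


def rename_fields(row: Dict[str, str], new_names: Dict[str, str]) -> Dict[str, str]:
    """Give fields readable names; any field without a new name keeps its old one."""
    return {new_names.get(name, name): value for name, value in row.items()}


def rows_to_csv(rows: List[Dict[str, str]]) -> str:
    """Serialize records as CSV text, one column per field name."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)  # values containing commas, like prices, get quoted
    return buffer.getvalue()


# Hypothetical default field names and values, for illustration only
raw = {"Text1": "$500,000", "Text2": "3 bds"}
row = rename_fields(raw, {"Text1": "Price", "Text2": "Beds"})
print(rows_to_csv([row]))
```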

6) Run extraction - to run your task and get data

  • Click "start extraction"
  • Select "local extraction" to run the task on your computer

Below is the output sample:

Author: Momo

Editor: Suire

Download Octoparse to start web scraping, or contact us with any questions about web scraping!