Scrape Data from Airbnb into Excel

Friday, October 7, 2016 10:09 PM

 

Airbnb is a popular website for finding a place to stay on vacation. In this tutorial, you will learn how to use Octoparse to get hotel info from Airbnb. The easiest way is to use the pre-built Airbnb task templates: you don't need to configure a scraping task, you just enter keywords or URLs and wait for the data. For further details, see Task Templates.

If you want to build the task from scratch, you can continue to read this tutorial.

 

Here is the Airbnb room source link that we will be using as an example.
https://www.airbnb.com/s/New-York--NY--United-States/homes?adults=2&search_type=pagination&s_tag=A2EV74MC&tab_id=home_tab&refinement_paths%5B%5D=%2Fhomes&children=1&place_id=ChIJOwg_06VPwokRYv534QaPC8g&federated_search_session_id=2e7da092-4a51-48db-ba26-9746f41ac068
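If you want to adapt the example URL to another search, it helps to see which parameters it carries. The following is a minimal sketch using only the Python standard library; the helper name `search_params` is made up for illustration, and what each parameter means to Airbnb is not documented here, so treat the names as opaque.

```python
from urllib.parse import urlsplit, parse_qs

# The example URL from this tutorial, written as one string.
URL = (
    "https://www.airbnb.com/s/New-York--NY--United-States/homes"
    "?adults=2&search_type=pagination&s_tag=A2EV74MC&tab_id=home_tab"
    "&refinement_paths%5B%5D=%2Fhomes&children=1"
    "&place_id=ChIJOwg_06VPwokRYv534QaPC8g"
    "&federated_search_session_id=2e7da092-4a51-48db-ba26-9746f41ac068"
)


def search_params(url: str) -> dict:
    """Return the query parameters of a URL as a flat dict (hypothetical helper)."""
    query = urlsplit(url).query
    # parse_qs percent-decodes keys and values and returns a list per key;
    # each key appears once here, so keep only the first value.
    return {key: values[0] for key, values in parse_qs(query).items()}


params = search_params(URL)
for name, value in params.items():
    print(f"{name} = {value}")
```

The percent-encoded `refinement_paths%5B%5D=%2Fhomes` decodes to the key `refinement_paths[]` with the value `/homes`.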

 

Here are the main steps in this tutorial [Download task file here]

1. Go to Web Page - open the target website

2. Build a Loop Item - click each hotel link

3. Extract data - scrape information from the detail page

4. Modify the XPath of data fields

5. Create pagination - scrape data from multiple pages

6. Modify the XPath of Pagination and Loop Item

7. Run your task - get and export data you want

 

1) "Go To Web Page" - open the target website

  • Enter the URL on the home screen and click "Start" to create a new task

 

2) Build a Loop Item - click each hotel link

 

  • Select the first two blocks to detect all blocks
  • Click on "Loop click each URL" to enter the detail page

A Loop Item will be created and Octoparse opens the first hotel page automatically.

3) Extract data from the detail page

  • Select any info you want and click on "Extract the text of the element"
  • Select Add custom field -> Page-level data -> Page URL if you would like to pull the URL of the current page
  • Double click the data field to modify the name

 

4) Modify the XPath of data fields

The Airbnb page design is tricky, and the auto-generated XPaths usually do not work for all pages. No worries! We have prepared everything you need: you can just use the element XPaths provided below.

  • Switch to Vertical View - Vertical View can help modify multiple data fields easily
  • Double click on the XPath to modify it
  • Input the new XPath to it

 

Tips!

XPath plays an important role in locating the correct element in Octoparse. If you want to learn more about it, please refer to the following tutorial:

What is XPath and how to use it in Octoparse

 

Here are XPaths for different fields of Airbnb pages:

Hotel Title: //h1
Number of reviews: //button[contains(@aria-label,'Rate')]
Review rating: //button[contains(@aria-label,'Rate')]/../preceding-sibling::span[1]
Number of guests: //span[contains(text(),'guest')]
Number of bedrooms: //span[contains(text(),'bedroom')]
Number of bathrooms: //span[contains(text(),'bathroom')]
Number of beds: //span[contains(text(),'bed')][not(contains(text(),'room'))]
Price: //div[contains(@style,'pricing')]/div[1]//span
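
The XPath for the number of beds carries an extra not(...) predicate because the word "bed" also appears inside "bedroom". The sketch below mirrors that predicate logic in plain Python. It does not evaluate real XPath; the span texts are made up for illustration, and `contains` and `is_beds_span` are hypothetical helper names.

```python
def contains(text: str, needle: str) -> bool:
    """Mirror XPath contains(): a case-sensitive substring test."""
    return needle in text


def is_beds_span(text: str) -> bool:
    """Mirror [contains(text(),'bed')][not(contains(text(),'room'))]."""
    return contains(text, "bed") and not contains(text, "room")


# Hypothetical span texts, as they might appear on a detail page.
spans = ["3 guests", "1 bedroom", "2 beds", "1 bathroom"]

# Without the second predicate, "1 bedroom" would match as well.
naive = [s for s in spans if contains(s, "bed")]
beds = [s for s in spans if is_beds_span(s)]
bedrooms = [s for s in spans if contains(s, "bedroom")]

print(naive)
print(beds)
print(bedrooms)
```

Note that contains() is case-sensitive, so a span reading "2 Beds" would not match either version of the test.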

 

5) Create pagination

  • Click on Go to Web Page to open the listing page again
  • Select the next page button (">") at the bottom of the main page
  • Choose Loop click single element from the Tips

A Pagination will be created in the workflow

  • Drag the workflow to the right position 
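
As a rough mental model of the finished workflow, assume that "the right position" means the Loop Item sits inside the Pagination loop, so every listing on a page is visited before the next page is clicked. The sketch below shows that nesting in Python. `FAKE_SITE` and `crawl` are hypothetical stand-ins and do not reflect how Octoparse works internally.

```python
from typing import Dict, Iterator, List, Optional

# Hypothetical stand-in for a paginated listing site: each page has
# some listing links and the key of the next page (None on the last page).
FAKE_SITE: Dict[str, dict] = {
    "page1": {"listings": ["room-a", "room-b"], "next": "page2"},
    "page2": {"listings": ["room-c"], "next": None},
}


def crawl(start: str) -> Iterator[str]:
    """Yield every listing, page by page, following the next-page link."""
    page: Optional[str] = start
    while page is not None:                           # Pagination loop
        for listing in FAKE_SITE[page]["listings"]:   # Loop Item
            yield listing                             # data extraction would run here
        page = FAKE_SITE[page]["next"]                # click the next page button


visited: List[str] = list(crawl("page1"))
print(visited)
```

If the inner loop were placed outside the outer one, only the listings of a single page would be visited.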

 

6) Modify the XPath of Pagination and Loop Item

 

The auto-generated XPath does not always work well. In this case, we need to modify the XPath of both the Pagination and the Loop Item.

  • Click on Pagination
  • Enter the XPath: //*[@aria-label='Next']
  • Click on Loop Item
  • Change Loop Mode to Variable list
  • Enter XPath: //a[contains(@aria-labelledby,'title')]
  • Click Apply to save

The next page is loaded with AJAX, so we need to add an AJAX timeout to the "Click to Paginate" action.

  • Open the settings of "Click to Paginate"
  • Tick "Load with AJAX"
  • Set the AJAX timeout to 5-10 seconds
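
The general idea behind such a timeout is that an AJAX update does not trigger a full page load, so a tool has no clear signal for when the new content is ready and needs an upper bound on how long to wait. The sketch below illustrates that idea with a simple polling loop. It is not Octoparse's implementation; `wait_for` and its parameters are hypothetical, and the "content" is simulated with a timer.

```python
import time
from typing import Callable


def wait_for(condition: Callable[[], bool], timeout: float, poll: float = 0.05) -> bool:
    """Poll `condition` until it is true or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(poll)
    # One last check once the deadline has been reached.
    return condition()


# Simulated AJAX content that becomes available 0.3 seconds after the click.
clicked_at = time.monotonic()


def content_ready() -> bool:
    return time.monotonic() - clicked_at >= 0.3


# A generous timeout succeeds; waiting on content that never arrives gives up.
loaded = wait_for(content_ready, timeout=2.0)
gave_up = not wait_for(lambda: False, timeout=0.2)
print(loaded, gave_up)
```

A timeout that is too short cuts the wait off before the new listings appear, which is why a range of several seconds is suggested above.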

With the Pagination and the Loop Item in place, the workflow is complete and you can move on to running the task.

7) Run your task - get and export data you want

  • Click "Save"
  • Click "Run" on the upper left side
  • Select "Run on your device" to run the task on your computer, or select "Run task in the cloud" to run the task in the Cloud (for premium users only)
  • Click Export Data and export the data to an Excel file
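
Fields scraped with the XPaths above come out as text such as "4 guests", so you may want to turn them into numbers before analysing them. The sketch below assumes the data was saved as CSV rather than Excel so that it can be read with the Python standard library. The column names and sample rows are made up for illustration, and `first_number` is a hypothetical helper.

```python
import csv
import io
import re
from typing import Optional

# Made-up sample standing in for an exported file.
SAMPLE = """title,guests,bedrooms,price
Cozy loft,4 guests,1 bedroom,$120
Small studio,2 guests,Studio,$95
"""


def first_number(text: str) -> Optional[float]:
    """Return the first number found in `text`, or None if there is none."""
    match = re.search(r"\d+(?:\.\d+)?", text.replace(",", ""))
    return float(match.group()) if match else None


rows = []
for row in csv.DictReader(io.StringIO(SAMPLE)):
    rows.append({
        "title": row["title"],
        "guests": first_number(row["guests"]),
        "bedrooms": first_number(row["bedrooms"]),  # None for "Studio"
        "price": first_number(row["price"]),
    })

for cleaned in rows:
    print(cleaned)
```

To read a real file, replace `io.StringIO(SAMPLE)` with an opened file object and adjust the column names to match your own data fields.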

 

Here is the sample output (screenshot: airbnb_hotel_info).

 

Happy Data Hunting!

Author: The Octoparse Team


 

 
