
Scrape Airbnb Data - Cloud Based Scraping

Wednesday, September 28, 2016 7:40 AM



Airbnb is a good website for finding the perfect vacation hotel. This tutorial shows how to use Octoparse to get hotel info from Airbnb.

The easiest way is to use the pre-built Airbnb task templates. You don't need to configure a scraping task; just enter keywords or URLs and wait for the data. For further details, see Task Templates.

 

If you want to build the task from scratch, read on. Here is the Airbnb listing page URL that we will use as an example:
https://www.airbnb.com/s/New-York--NY--United-States/homes?adults=2&search_type=pagination&s_tag=A2EV74MC&tab_id=home_tab&refinement_paths%5B%5D=%2Fhomes&children=1&place_id=ChIJOwg_06VPwokRYv534QaPC8g&federated_search_session_id=2e7da092-4a51-48db-ba26-9746f41ac068
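The example URL carries the whole search in its path and query string. As a rough illustration, the sketch below splits it with Python's standard library so each parameter can be read on its own. The helper name `describe_search_url` is made up for this example, and what each parameter means to Airbnb is not documented here, so treat any interpretation (for instance, that `adults` and `children` are guest filters) as an assumption.

```python
from urllib.parse import urlsplit, parse_qs

# The example search URL from this tutorial, split across lines for readability.
URL = (
    "https://www.airbnb.com/s/New-York--NY--United-States/homes"
    "?adults=2&search_type=pagination&s_tag=A2EV74MC&tab_id=home_tab"
    "&refinement_paths%5B%5D=%2Fhomes&children=1"
    "&place_id=ChIJOwg_06VPwokRYv534QaPC8g"
    "&federated_search_session_id=2e7da092-4a51-48db-ba26-9746f41ac068"
)


def describe_search_url(url):
    """Return the path and the decoded query parameters of a search URL."""
    parts = urlsplit(url)
    # parse_qs percent-decodes names and values and maps each name to a list;
    # every name appears once in this URL, so the first value is kept.
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return parts.path, params


path, params = describe_search_url(URL)
print(path)
for key, value in params.items():
    print(f"{key} = {value}")
```

Seeing the parameters separately makes it easier to tell the search itself (location in the path, guest counts) from values that look session-specific.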

 

Here are the main steps in this tutorial [Download task file here]

1. Go to Web Page - open the target website

2. Build a Loop Item - click each hotel link

3. Extract data - scrape information from the detail page

4. Modify the XPath of data fields

5. Create pagination - scrape data from multiple pages

6. Modify the XPath of Pagination and Loop Item

7. Run your task - get data you want

 

1) Go to Web Page - open the target website

  • Enter the URL on the home page and click Start

 

 

2) Build a Loop Item - click each hotel link

  • Select the first two blocks to detect all blocks
  • Click on "Loop click each URL" to enter the detail page

A Loop Item will be created and Octoparse opens the first hotel page automatically.

 

3) Extract data from the detail page

  • Select any info you want and click on Extract the text of the element
  • Select Add custom field -> Page-level data -> Page URL if you would like to pull the page URL from the current page
  • Double click the data field to modify the name

 

4) Modify the XPath of data fields

The Airbnb page design is tricky, and the auto-generated XPaths usually do not work for all pages. No worries! We have prepared everything you need: just use the element XPaths provided below.

  • Switch to Vertical View - Vertical View can help modify multiple data fields easily
  • Double click on the XPath to modify it
  • Input the new XPath to it

Here are the XPaths for the different fields on Airbnb pages:

Hotel title: //h1
Number of reviews: //button[contains(@aria-label,'Rate')]
Review rating: //button[contains(@aria-label,'Rate')]/../preceding-sibling::span[1]
Number of guests: //span[contains(text(),'guest')]
Number of bedrooms: //span[contains(text(),'bedroom')]
Number of bathrooms: //span[contains(text(),'bathroom')]
Number of beds: //span[contains(text(),'bed')][not(contains(text(),'room'))]
Price: //div[contains(@style,'pricing')]/div[1]//span
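To see why the beds XPath needs the extra not(contains(text(),'room')) predicate, here is a minimal sketch that applies the same text conditions to a tiny, hypothetical HTML fragment. Python's built-in ElementTree does not support the XPath contains() function, so the predicates are re-expressed as plain substring checks; the markup, the listing title, and the helper names are all invented for illustration and do not reflect Airbnb's real page structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; real Airbnb pages are far more complex.
SAMPLE = """
<div>
  <h1>Sunny loft near the park</h1>
  <span>4 guests</span>
  <span>2 bedrooms</span>
  <span>3 beds</span>
  <span>1 bathroom</span>
</div>
"""


def first_text(root, tag):
    """Rough equivalent of //tag: text of the first matching element."""
    element = root.find(f".//{tag}")
    return element.text if element is not None else None


def spans_containing(root, needle, excluding=None):
    """Rough equivalent of
    //span[contains(text(),needle)][not(contains(text(),excluding))]."""
    matches = []
    for span in root.iter("span"):
        text = span.text or ""
        if needle in text and (excluding is None or excluding not in text):
            matches.append(text)
    return matches


root = ET.fromstring(SAMPLE)
fields = {
    "title": first_text(root, "h1"),
    "guests": spans_containing(root, "guest"),
    "bedrooms": spans_containing(root, "bedroom"),
    "bathrooms": spans_containing(root, "bathroom"),
    # Without the exclusion, "bed" would also match "2 bedrooms".
    "beds": spans_containing(root, "bed", excluding="room"),
}
print(fields)
```

In this fragment, a plain "bed" check matches both the bedrooms and the beds spans, while adding the "room" exclusion leaves only the beds span, which is the effect of the second predicate in the XPath.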

 

5) Create pagination

  • Click on Go to Web Page to open the listing page again
  • Select the next page button (">") at the bottom of the main page
  • Choose Loop click single element from the Tips

A Pagination loop will be created in the workflow.

  • Drag the workflow to the right position 
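The overall shape of the finished workflow can be sketched as two nested loops. This is only an illustration under an assumption the tutorial does not spell out: that the Pagination loop sits on the outside and the Loop Item runs inside it for every page. The `PAGES` data and the `run_workflow` function are hypothetical stand-ins, not anything Octoparse exposes.

```python
# Hypothetical in-memory "site": each page lists the titles of its listings.
PAGES = [
    ["Listing A", "Listing B"],
    ["Listing C"],
]


def run_workflow(pages):
    """Visit every page (outer loop) and every listing on it (inner loop)."""
    rows = []
    page_index = 0  # "Go to Web Page": start on the first results page
    while True:
        for title in pages[page_index]:  # Loop Item: open each listing
            # Extract data: record one row per detail page
            rows.append({"page": page_index + 1, "title": title})
        if page_index + 1 >= len(pages):  # no next page button is left
            break
        page_index += 1  # click the next page button
    return rows


rows = run_workflow(PAGES)
for row in rows:
    print(row)
```

If the two loops were ordered the other way around, only the listings on the first page would be visited before pagination started, which is why the position of the Pagination step in the workflow matters.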

 

6) Modify the XPath of Pagination and Loop Item

The auto-generated XPath does not always work well. In this case, we need to modify the XPath of both the Pagination and the Loop Item.

  • Click on Pagination
  • Enter the XPath: //*[@aria-label='Next']
  • Click on Loop Item
  • Change Loop Mode to Variable list
  • Enter XPath: //a[contains(@aria-labelledby,'title')]
  • Click Apply to save
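The two XPaths above select by attribute rather than by position, which is what makes them hold up across pages. The sketch below tries the same two conditions on a small, hypothetical listing page using Python's standard library. ElementTree handles the exact attribute match directly, but it does not support contains(), so that check is written as a Python substring test; the markup and attribute values are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified listing page; not Airbnb's real markup.
SAMPLE = """
<main>
  <a aria-labelledby="title_101" href="/rooms/101">Listing 101</a>
  <a aria-labelledby="title_102" href="/rooms/102">Listing 102</a>
  <a href="/help">Help</a>
  <nav>
    <a aria-label="Previous" href="?page=1">&lt;</a>
    <a aria-label="Next" href="?page=3">&gt;</a>
  </nav>
</main>
"""

root = ET.fromstring(SAMPLE)

# //*[@aria-label='Next'] : an exact attribute match, supported by ElementTree.
next_buttons = root.findall(".//*[@aria-label='Next']")
next_hrefs = [element.get("href") for element in next_buttons]

# //a[contains(@aria-labelledby,'title')] : contains() is not available in
# ElementTree, so the substring condition is checked in Python instead.
listing_links = [
    a.get("href")
    for a in root.iter("a")
    if "title" in a.get("aria-labelledby", "")
]

print(next_hrefs)
print(listing_links)
```

In this fragment the first condition picks out only the next page link, and the second picks out only the two listing links, skipping the help link and the pagination links.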

 


    

Tip!

XPath plays an important role in locating the correct element in Octoparse. To learn more about it, please refer to the following tutorial:

What is XPath and how to use it in Octoparse

 

 

The next page is loaded with AJAX, so we need to add an AJAX timeout to the "Click to Paginate" action.

  • Click on Click to Paginate
  • Go to the Options
  • Tick Load with AJAX
  • Set the AJAX timeout to 5-10s
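The general idea behind a timeout for AJAX-loaded content is to give the page a bounded amount of time to finish updating before moving on. The sketch below shows one common way to express that in code: poll a condition until it is true or a deadline passes. It is not a description of how Octoparse implements its AJAX timeout internally; `wait_until` and `page_loaded` are hypothetical names used only for this example.

```python
import time


def wait_until(condition, timeout_s, poll_interval_s=0.1):
    """Poll `condition` until it returns True or `timeout_s` seconds elapse."""
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)


# Hypothetical stand-in for "the next page has finished loading":
# it reports success on the third check.
checks = {"count": 0}


def page_loaded():
    checks["count"] += 1
    return checks["count"] >= 3


loaded = wait_until(page_loaded, timeout_s=5)
print(loaded)
```

A timeout that is too short risks extracting data before the new listings appear, while a very long one slows down every page turn, which is consistent with the 5-10s range suggested above.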

 

7) Run your task - get data you want

 

Here is the sample output.

[Figure: sample output]

 

Author: The Octoparse Team

 
