Scrape Data from Jalan.net | Japanese Sites Case Study

Tuesday, September 19, 2017 9:02 AM

Welcome to Octoparse case tutorials!

Japan is a popular travel destination, and booking a good hotel before you go is important. Today we are going to learn how to scrape hotel information from a Japanese website: Jalan.net.

Now let's get started!

 

Step 1. Set up basic information

  • Click "Quick Start"
  • Create a new task in the Advanced Mode
  • Complete the basic information

 

Step 2. Navigate to the target website

  • Enter the target URL in the built-in browser (you can search for anything on Jalan.net or copy the example URL here)
  • Click the "Go" icon to open the webpage

 

Step 3. Create a list of items

We can see that all the hotel information sections share a similar layout, which means we can make a list of all these sections.

  • Move your cursor to the first hotel information section
  • Click when the highlighted part covers the whole block

If you cannot select the right part, just click anywhere in the first block and keep clicking the “Expand Area” button until the red dotted line encircles the whole block.

  • Click "Create a list of items" 
  • Click "Add current item to the list"

Now that the first section has been added to the list, we need to add the remaining sections.

  • Click "Continue to edit the list"
  • Click the second section with similar layout
  • Click "Add current item to the list" again

Now all the sections have been added to the list.

  • Click "Finish Creating List"
  • Click "Loop", which tells Octoparse to go through the list and extract data from each item

 

Step 4. Select the data to be extracted and rename data fields

In this step, we will begin extracting data from the loop list of hotel information sections. Navigate to the "Extract Data" action and click it; you will notice that the first information section is outlined with a green dotted line. That means the data we extract must come from within this section, following the steps below. Note that the extraction action we set up for this section will apply to every other item in the list. Say we want to capture the hotel name and price:

  • Click the hotel name
  • Select "Extract text"
  • Follow the same steps to extract the other data
  • Rename any field if necessary
  • Click "Save"
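
Conceptually, the loop and the "Extract Data" action apply one set of field rules to every block in the list. A minimal Python sketch of that idea, using only the standard library, might look like the following. The markup, the class names hotel and price, and the function name extract_hotels are hypothetical stand-ins for illustration; they are not Jalan.net's real page structure or anything Octoparse exposes.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup standing in for a results page.
SAMPLE_PAGE = """
<div>
  <div class="hotel"><h2>Hotel A</h2><span class="price">12,000円</span></div>
  <div class="hotel"><h2>Hotel B</h2><span class="price">9,500円</span></div>
</div>
"""

def extract_hotels(markup):
    """Apply the same field rules to every hotel block, like the loop item."""
    root = ET.fromstring(markup)
    rows = []
    for block in root.findall("./div[@class='hotel']"):
        rows.append({
            # findtext returns the text of the first matching child, or None
            "hotel_name": block.findtext("h2"),
            "price": block.findtext("span[@class='price']"),
        })
    return rows

for row in extract_hotels(SAMPLE_PAGE):
    print(row)
```

Each dictionary corresponds to one row of extracted data, and the keys play the role of the renamed data fields.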

 

 

Step 5. Set up pagination

Now we need to flip through multiple web pages to extract as much data as possible by setting up a pagination action.

  • Click “次へ” (the "Next" link)
  • Choose “Loop click the element”

Because the XPath for “次へ” differs from page to page, we need to modify the XPath to locate it precisely.

  • Go to the “Advanced Option”
  • Enter the correct XPath in the “Single Element” box: //*[text()='次へ']
  • Click “Save”
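
To see why matching on the link text is more robust, here is a small Python sketch. It assumes, purely for illustration, that the auto-generated XPath is positional and that a “前へ” (Previous) link appears from the second page on; the pager markup is hypothetical, not Jalan.net's real HTML. Python's standard-library ElementTree does not support text(), so the sketch uses its near-equivalent predicate [.='次へ'].

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified pager markup; the real page will differ.
PAGE_1 = "<div><a>1</a><a>2</a><a>次へ</a></div>"
PAGE_2 = "<div><a>前へ</a><a>1</a><a>2</a><a>次へ</a></div>"

POSITIONAL = "./a[3]"       # the third link in the pager
BY_TEXT = ".//*[.='次へ']"   # any element whose text is exactly 次へ

def link_text(markup, path):
    """Return the text of the first element matching path, or None."""
    match = ET.fromstring(markup).find(path)
    return None if match is None else match.text

for name, page in (("page 1", PAGE_1), ("page 2", PAGE_2)):
    print(name, link_text(page, POSITIONAL), link_text(page, BY_TEXT))
```

In this sketch the positional path lands on a different link once the pager gains an extra entry, while the text-based path keeps finding “次へ”.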

Because “次へ” still appears on the last page, the loop will not end by itself. We need to end the loop after clicking “次へ” 5 times, since there are 6 pages in total. You can set the number of clicks according to how many pages you want to extract.

  • Go to the “Advanced Option”
  • Open "End loop when"
  • Tick "Execution time reach"
  • Enter "5" in the box
  • Click "Save"
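
The arithmetic behind the limit (6 pages means 5 clicks) can be sketched in Python. This is only an illustration of a click-count limit, not how Octoparse implements it; the names scrape_with_click_limit and fake_read_page are hypothetical.

```python
def scrape_with_click_limit(read_page, max_next_clicks):
    """Read the first page, then click "next" at most max_next_clicks times."""
    page = 1
    rows = list(read_page(page))
    for _ in range(max_next_clicks):
        page += 1  # stands in for clicking 次へ
        rows.extend(read_page(page))
    return rows

visited = []

def fake_read_page(page_number):
    # Hypothetical stand-in for extracting one page of results.
    visited.append(page_number)
    return [f"hotel on page {page_number}"]

rows = scrape_with_click_limit(fake_read_page, max_next_clicks=5)
print(visited)
```

The first page is read before any click, so a limit of 5 visits pages 1 through 6.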

 

Step 6. Start running your task 

Now we are done configuring the task and it's time to run the task to get the data we want.

  • Click "Next"
  • Click "Next"
  • Click "Local Extraction"

 

There are two options: Local Extraction and Cloud Extraction (premium plan). With Local Extraction, the task runs on your own machine; with Cloud Extraction, the task runs on the Octoparse Cloud Platform, which means you can set it up to run, turn off your desktop or laptop, and the data will be automatically extracted and saved to the cloud. Features such as scheduled extraction, IP rotation and API are also supported with the Cloud. Find out more about Octoparse Cloud here.

 

Step 7. Check and export the data

After the data extraction process completes, we can check the extracted data or click the "Export" button to export the results to an Excel file, a database or another format and save the file to your computer.
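
Once the data is exported, it can be processed with any tool. As a rough sketch, assuming a CSV export with columns named hotel_name and price (hypothetical names that depend on how the fields were renamed in Step 4), Python's standard csv module can load the rows and compare prices:

```python
import csv
import io

# Hypothetical export contents; a real export would be read from a file.
EXPORTED_CSV = 'hotel_name,price\nHotel A,"12,000円"\nHotel B,"9,500円"\n'

def load_rows(csv_file):
    """Read every exported row into a dictionary keyed by column name."""
    return list(csv.DictReader(csv_file))

def price_to_int(price_text):
    """Keep only the digits of a price such as '12,000円'."""
    digits = "".join(ch for ch in price_text if ch.isdigit())
    return int(digits)

rows = load_rows(io.StringIO(EXPORTED_CSV))
cheapest = min(rows, key=lambda row: price_to_int(row["price"]))
print(cheapest["hotel_name"])
```

For a real file, the io.StringIO object would be replaced by a file opened with open(path, newline="", encoding="utf-8"), where path and the encoding are assumptions about the export.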

 

Done!

To learn more about how to crawl data from a website, you can refer to these tutorials:

Web Crawling Case Study | Crawling Data from Booking.com

Web Scraping Case Study | Crawling flight information from ticket websites

Scraping Hotel Reviews from Tripadvisor.com

 

Author: The Octoparse Team
