Switch Drop Down Menu for Data Extraction

Wednesday, September 06, 2017 10:30 AM

Welcome to Octoparse case tutorials! In this tutorial, I will explain how to switch through the options of a drop-down menu for data extraction in Octoparse. As an example, I will capture car and truck information from eBay Motors.

 

List features covered 

 

Now let's get started!

 

Step 1. Set up basic information

  • Click "Quick Start"
  • Create a new task in the Advanced Mode
  • Complete the basic information
  • Click "Next"

 

 

Step 2. Navigate to the target website

  • Enter the target URL in the built-in browser (the example URL is https://www.ebay.com/motors)
  • Click the "Go" icon to open the webpage

 

Step 3. Switch drop down menu

Now we will extract the values from the drop-down menu. Octoparse can automatically go through every item in a drop-down menu by using "Loop switch combobox".

  • Click the "Any Make" drop-down menu
  • Choose "Loop switch combobox"

All the values in the menu now appear in a loop list. The first value, "Any Make", is only a placeholder, so we need to modify the XPath to exclude it.

  • Enter the correct XPath: //SELECT[@id='mke_ct']/OPTION[position()>1] into the "Variable list" box
  • Remember to click "Save"

"Any Make" is no longer in the list.
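
To see what the predicate position()>1 keeps, here is a minimal Python sketch that runs outside Octoparse. It assumes a simplified, hypothetical copy of the drop-down markup (the sample makes are placeholders, not the real eBay list). Python's built-in ElementTree supports only a limited XPath subset, so the sketch uses a list slice as the equivalent of the position()>1 filter.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page is far larger.
SAMPLE = """
<FORM>
  <SELECT id="mke_ct">
    <OPTION>Any Make</OPTION>
    <OPTION>Acura</OPTION>
    <OPTION>Audi</OPTION>
    <OPTION>BMW</OPTION>
  </SELECT>
</FORM>
"""

def makes_without_placeholder(markup):
    """Return the option texts of the 'mke_ct' menu, skipping the first one."""
    root = ET.fromstring(markup)
    select = root.find(".//SELECT[@id='mke_ct']")
    if select is None:
        return []
    options = select.findall("OPTION")
    # XPath positions are 1-based, so position()>1 means
    # "everything after the first option", which is options[1:] here.
    return [opt.text.strip() for opt in options[1:] if opt.text]

print(makes_without_placeholder(SAMPLE))  # ['Acura', 'Audi', 'BMW']
```

The same idea applies to any menu whose first entry is a placeholder: keep the XPath that selects the options and add the position filter at the end.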

  • Click "Search Car & Trucks"
  • Choose "Click an item"

Note that we need to open the search result page in a new tab so that Octoparse can go back to the home page.

  • Click the "Click Item" action
  • Go to the "Advanced Options"
  • Tick "Open the link in a new tab"
  • Click "Save"
  • Go back to the "Go To Web Page" action, then click the "Click Item" action so that Octoparse reloads the page in a new tab

Step 4. Create a list of items

On the result page, we can see that all the car information blocks share a similar layout, which means we can make a list of those blocks and extract data from each one.

  • Move your cursor to the first car information block
  • Click when the highlighted part covers the whole block

If you cannot select the right part, click anywhere in the first block and keep clicking the "Expand Area Button" until the red dotted line encircles the whole block.

 

  • Click "Create a list of items" 
  • Click "Add current item to the list"

 

Now, the first item has been added to the list. We need to finish adding all the items to the list.

  • Click "Continue to edit the list"
  • Click the second section with similar layout
  • Click "Add current item to the list" again

Now all the sections have been added to the list.

  • Click "Finish Creating List"
  • Click "Loop", which tells Octoparse to go through the list and extract data from each item

 

Step 5. Select the data to be extracted and rename data fields

In this step, we will begin extracting data from the loop list of car information sections. Navigate to the "Extract Data" action and click it. You will notice that the first information section is outlined with a green dotted line, which means we need to extract data only within this section by following the steps below. The extraction action we set up for this section will be applied to the rest of the list. Say we want to capture the car name and price.

  • Click the car name
  • Select "Extract text"
  • Follow the same steps to extract the other data
  • Rename any field if necessary
  • Click "Save"
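
The idea behind Steps 4 and 5 (define the fields once on the first block, then apply the same rule to every block in the list) can be sketched in Python. This is only an illustration under stated assumptions: the markup, the class names "item" and "price", the sample listings, and the field names car_name and price are all hypothetical, and the code is not what Octoparse runs internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical result-page markup; tag and class names are invented.
RESULTS = """
<ul>
  <li class="item"><h3>Sample Car One</h3><span class="price">$18,500.00</span></li>
  <li class="item"><h3>Sample Car Two</h3><span class="price">$14,200.00</span></li>
</ul>
"""

def extract_rows(markup):
    """Apply the same two field rules to every block with a similar layout."""
    root = ET.fromstring(markup)
    rows = []
    # The "list of items": every block that shares the same layout.
    for block in root.findall(".//li[@class='item']"):
        # The "Extract text" rules, written relative to the current block.
        name = block.findtext("h3", default="").strip()
        price = block.findtext("span[@class='price']", default="").strip()
        rows.append({"car_name": name, "price": price})
    return rows

for row in extract_rows(RESULTS):
    print(row)
```

Because each rule is relative to the current block, adding another field means adding one more lookup inside the loop, which mirrors adding another "Extract text" field in the task.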

 

Step 5. Set up pagination  

Now we need to page through the results to extract as much data as possible by setting up a pagination action. However, the result page we are on has only one page because the number of results is small, so we need to search for another value that returns more results.

  • Go back to "Go To Web Page" action to reload the page
  • Click the "Loop Item" for the drop-down menu
  • Here we click "Acura" in the loop list
  • Click "Click an item"

We will immediately go to a new result page that contains a pagination button.

  • Click the next page button ">"
  • Choose "Loop click the element"
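
Conceptually, "Loop click the element" repeats two things: extract the current page, then click the next page button until there is none left. The Python sketch below models that loop under loud assumptions: PAGES is a hypothetical in-memory stand-in for the result pages, and max_pages is an invented safety limit, not an Octoparse setting.

```python
# Hypothetical stand-in for a paginated result set: each page holds its
# rows and the key of the next page (None on the last page).
PAGES = {
    "page1": {"rows": ["car A", "car B"], "next": "page2"},
    "page2": {"rows": ["car C"], "next": "page3"},
    "page3": {"rows": ["car D"], "next": None},
}

def scrape_all_pages(pages, start, max_pages=100):
    """Collect rows page by page until no next page remains."""
    collected = []
    current = start
    visited = 0
    while current is not None and visited < max_pages:
        page = pages[current]
        collected.extend(page["rows"])  # extract the data on this page
        current = page["next"]          # "click" the next page button
        visited += 1
    return collected

print(scrape_all_pages(PAGES, "page1"))
```

The max_pages guard is a design choice for the sketch: it keeps the loop from running forever if a site always shows a next page button.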

Step 6. Start running your task 

Now we are done configuring the task and it's time to run the task to get the data we want.

  • Click "Next"
  • Click "Next"
  • Click "Local Extraction"

 

There are two options: Local Extraction and Cloud Extraction (premium plan). With Local Extraction, the task runs on your own machine. With Cloud Extraction, the task runs on the Octoparse Cloud Platform, which means you can set it up to run, turn off your desktop or laptop, and the data will be extracted automatically and saved to the cloud. Features such as scheduled extraction, IP rotation and API are also supported in the cloud.

 

Step 7. Check and export the data

After the data extraction process completes, we can check the extracted data or click the "Export" button to export the results to an Excel file, a database or another format and save the file to the computer.

Done!

 

 

To learn more about how to crawl data from a website, you can refer to these tutorials:

Drop-down Menu Configuration Rule

Scraping Product Detail Pages from eBay.com

Web Scraping Case: Scraping Restaurants Information from Yell.com

Scrape Detail Page Data with Pagination

 

 

Author: The Octoparse Team
