
Scrape Web Data from A Drop-Down Menu 1

Sunday, October 9, 2016 9:49 AM


 

A drop-down menu is a list of items that appears when you click a button or a text selection.

This tutorial will show you how to select options in a drop-down menu in Octoparse. 

 

To follow along, you can use this sample URL:

https://www.mycarinfo.com.my/Valuation/SearchVehicle?version=free

 

Let's say we have created a task with the sample URL. Find the drop-down menu on the webpage.

1. Click the drop-down menu, then select "Loop through options in the dropdown".

2. A Loop Item is created and added to the workflow automatically. It loops through the options in the drop-down menu.

3. Check that the Loop Item includes all the options we need:

  • Click the Loop Item for the drop-down, then review the looped items in the list.
  • Check that every item in the loop is one you want. If not, refine the list using the XPath function position().

 

Tips!

position() refers to where an option is located in the drop-down menu, counting from 1.

 

For example, in this case the first option in the drop-down menu is "-Select-", which is a placeholder rather than a real option, so we want to remove it from the list.

[Screenshot: the drop-down menu]

To do that, add "[position()>1]" to the end of the current XPath. The Loop Item will then include every option whose position is greater than 1; in other words, it excludes only the first option.

[Screenshot: the drop-down menu XPath]
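The effect of "[position()>1]" can be sketched outside Octoparse. The snippet below is only an illustration: the markup and the function name `options_after_header` are hypothetical, and because Python's standard-library XPath subset does not support the position() function, the predicate is emulated with a list slice rather than evaluated as XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for a drop-down on a web page.
SELECT_XML = """
<select id="make">
  <option>-Select-</option>
  <option>Audi</option>
  <option>BMW</option>
  <option>Honda</option>
</select>
"""


def options_after_header(select_xml):
    """Return the option texts that [position()>1] would keep."""
    select = ET.fromstring(select_xml.strip())
    all_options = select.findall("option")
    # XPath positions start at 1, Python indexes start at 0,
    # so "position greater than 1" corresponds to the slice [1:].
    return [option.text for option in all_options[1:]]


print(options_after_header(SELECT_XML))
```

Only the placeholder at position 1 is dropped; every other option stays in the list in its original order.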

Tips!

When a drop-down menu is detected and created in Octoparse, all available options will be selected by default.

Besides adding [position()>1] to drop the first item, you can use the XPath function position() in other ways to refine the list.

Add [position()=x] to the end of the XPath, where x is a number, to include only the option at that position, e.g. position()=1, position()=2, etc. For this example, if you want to choose the year 1996, the XPath to add is [position()=27].
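To see how a single position picks out a single option, here is a minimal sketch using Python's standard library. The year list is an assumption made up for illustration (a "-Select-" placeholder followed by years from 2021 down to 1990), arranged so that 1996 sits at position 27 as in the example above; the real page may differ. The helper names `build_year_select` and `option_at` are hypothetical. The path "option[27]" is the abbreviated XPath form of "option[position()=27]".

```python
import xml.etree.ElementTree as ET


def build_year_select(newest=2021, oldest=1990):
    """Build a hypothetical <select> with a placeholder and descending years."""
    select = ET.Element("select", {"id": "year"})
    ET.SubElement(select, "option").text = "-Select-"
    for year in range(newest, oldest - 1, -1):
        ET.SubElement(select, "option").text = str(year)
    return select


def option_at(select, position):
    """Return the text of the option at a 1-based position, or None."""
    match = select.find(f"option[{position}]")
    return None if match is None else match.text


year_select = build_year_select()
print(option_at(year_select, 1))
print(option_at(year_select, 27))
```

With this assumed layout, position 1 is the placeholder and position 27 is the year 1996. A position beyond the end of the list matches nothing.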

To learn more tricks, please refer to the tutorial "How to select a specific option from the drop-down list?"

 

4. We are now done configuring the drop-down menus. Click on the confirmation button to complete the search.

When there are multiple drop-downs on the web page and we want to loop through all of them, i.e. get the results for the different combinations of options, we can follow the same steps used for a single drop-down menu and repeat them for each one.

Each newly built Loop Item should sit inside the previous one, like this:

[Screenshot: workflow with nested Loop Items for the drop-down menus]
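Nesting one Loop Item inside another behaves much like nested for-loops: for each option of the outer drop-down, every option of the inner drop-down is visited. The sketch below illustrates that idea only. The option values and the function name `visit_combinations` are hypothetical, and it assumes the inner drop-down offers the same options regardless of the outer selection.

```python
def visit_combinations(outer_options, inner_options):
    """Return every (outer, inner) pair in the order nested loops visit them."""
    visited = []
    for outer in outer_options:        # outer Loop Item
        for inner in inner_options:    # Loop Item nested inside the outer one
            visited.append((outer, inner))
    return visited


# Hypothetical options with the "-Select-" placeholder already removed.
makes = ["Audi", "BMW", "Honda"]
years = ["2021", "2020"]

for make, year in visit_combinations(makes, years):
    print(make, year)
```

Three outer options and two inner options give six combinations, so the number of searches grows quickly as more drop-downs are nested.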

 

Tips!

You may want to record which options were selected in the different drop-down menus for each set of results. The tutorial below shows how to do that:

How to extract the selected option from drop-down menus?

 

If you need further assistance with your data project, feel free to submit a request here.

 

Happy Data Hunting!

Author: The Octoparse Team
