Web Scraping Tutorial: Handling Searches with Multiple Dropdown Menus
Sunday, October 30, 2016 11:12 PM
“Could I use a web scraping tool to extract the values from multiple drop-down menus directly, or should I write the code myself when dealing with such web scraping requests?”
People often ask this question when they use web scraping tools to extract data from multiple drop-down menus, where the contents of one menu depend dynamically on what you choose in the previous one.
In this tutorial, I will use the Census of India as an example to show you how to scrape data from a website with drop-down lists using a web scraping tool.
(Download the extraction task of this tutorial HERE)
Open our web scraping application, Octoparse. ➜ Choose “Advanced Mode”. ➜ Click “Start”. ➜ Complete the basic information.
Enter the target URL in the built-in browser. ➜ Click the “Go” icon to open the web page.
(URL of the example:
Next, we extract the values from the drop-down lists. There are four drop-down menus on the website. In this tutorial, I will use the values in the last two drop-down menus as the loop items.
Assume that we want to find the profile of Andamans District in the state of Andaman and Nicobar Islands. To do so, we first extract these two values from the "Select state" and "Select districts" drop-down menus.
Click "Select state" and choose "Loop switch Combobox".
We need to change the XPath here because we only want the value Andaman and Nicobar Islands in the “Loop Item” box. Change the position in the XPath so that it matches only that option, and click “Save”.
Extract the value Andamans in the same way under the “Switch Dropdown” action. Remember to wait until the web page has fully loaded.
Next, we loop through the values in the last two drop-down menus.
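For readers who wonder what changing the position in an XPath does, here is a minimal sketch in Python. The markup, the `id="state"` attribute, the placeholder text and the extra state are assumptions made for illustration; the real page and the XPath that Octoparse generates may look different.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page's structure may differ.
HTML = """
<form>
  <select id="state">
    <option>&lt;-Select state-&gt;</option>
    <option>Andaman and Nicobar Islands</option>
    <option>Andhra Pradesh</option>
  </select>
</form>
"""

root = ET.fromstring(HTML)

# Without a position, the XPath matches every option in the menu.
all_options = root.findall(".//select[@id='state']/option")

# XPath positions are 1-based, so option[2] skips the placeholder
# and matches only the first real state.
first_state = root.findall(".//select[@id='state']/option[2]")

print(len(all_options))
print([o.text for o in first_state])
```

The only difference between the two expressions is the trailing `[2]`, which narrows the match from the whole menu to a single option.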
Click "Select Sub-district" under the “Switch Dropdown” action and choose "Loop switch Combobox".
We need to change the XPath here because we don't want the placeholder value "<-Select Sub-district->" in the Loop Item box. Change the position in the XPath and click “Save”. The value "<-Select Sub-district->" is then removed from the loop items.
Extract the values in the “Select village” drop-down list in the same way under the “Switch Dropdown” action.
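If you prefer to write the code yourself, the two nested loops over linked drop-down menus can be sketched roughly as follows. This is only an illustration: the dictionary stands in for the live page, the "<-Select village->" placeholder is assumed by analogy with the sub-district one, and `is_placeholder` and `iter_selections` are hypothetical helper names.

```python
# Hypothetical sample data standing in for the live page: each sub-district
# maps to the options its "Select village" menu would show once it reloads.
VILLAGES_BY_SUBDISTRICT = {
    "<-Select Sub-district->": [],
    "Diglipur": ["<-Select village->", "Aerial Bay"],
}


def is_placeholder(option_text):
    """Treat options such as "<-Select village->" as prompts, not values."""
    return option_text.startswith("<-") and option_text.endswith("->")


def iter_selections(villages_by_subdistrict):
    """Yield every (sub-district, village) pair, skipping placeholders."""
    for subdistrict, villages in villages_by_subdistrict.items():
        if is_placeholder(subdistrict):
            continue
        # The village list depends on the chosen sub-district, so it is
        # looked up inside the outer loop rather than computed up front.
        for village in villages:
            if not is_placeholder(village):
                yield subdistrict, village


pairs = list(iter_selections(VILLAGES_BY_SUBDISTRICT))
print(pairs)
```

The inner loop has to run inside the outer one because the village options only exist after a sub-district has been chosen, which is the same reason the second loop sits inside the first in the workflow.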
Now we can run the search. Click "Submit" under the last "Switch Dropdown" action in the Workflow Designer. ➜ Choose "Click an item".
We extract the sub-district and village text first, because their layout differs from that of the information below.
Click "Diglipur" and choose "Extract text". Extract “Aerial Bay” in the same way. ➜ Modify the name in “Field Name”.
Now we can extract the sections with a similar layout. Move your cursor over the target sections from which you want to extract data.
Click the first highlighted link. ➜ Click “TR” to expand the selected area. ➜ Click “Create a list of items” (sections with similar layout). ➜ Click “Add current item to the list”.
The first highlighted link has now been added to the list. ➜ Click “Continue to edit the list”.
Click the second highlighted link. ➜ Click “TR” to expand the selected area. ➜ Click “Add current item to the list” again.
Now we have all the links with a similar layout. ➜ Click “Finish Creating List”. ➜ Click “loop” to process the list and extract the elements from each page.
You will find that some items have two fields while others have only one. We therefore extract the data from the second item, "Area of the village", because it has a second field, "81.86", which the first item lacks.
Manually choose the second item, "Area of the village (in hectares)", in the Workflow Designer and extract the data.
Extract the title "Area of the village (in hectares)". ➜ Click it and select “Extract text”.
All the contents will be listed in Data Fields. ➜ Click “Field Name” to modify the names.
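The point about items with one field versus two can be shown with a short sketch. It assumes a simplified table in which each item is a TR row with one or two TD cells; only the "Area of the village (in hectares)" label and the value 81.86 come from this tutorial, while the heading row and the `parse_rows` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified table; the real page's markup may differ.
TABLE = """
<table>
  <tr><td>Village details</td></tr>
  <tr><td>Area of the village (in hectares)</td><td>81.86</td></tr>
</table>
"""


def parse_rows(table_markup):
    """Return (label, value) tuples; value is None when a row has one cell."""
    rows = []
    for tr in ET.fromstring(table_markup).iter("tr"):
        cells = [(td.text or "").strip() for td in tr.findall("td")]
        if not cells:
            continue
        label = cells[0]
        value = cells[1] if len(cells) > 1 else None
        rows.append((label, value))
    return rows


records = parse_rows(TABLE)
for label, value in records:
    print(label, "->", value)
```

A row with a single cell yields `None` for the value, which is why a template built from such a row would miss the second field.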
Click “Next”. ➜ Click “Next” again. ➜ Click “Local Extraction” to run the task on your computer. Octoparse will automatically extract all the selected data.
The extracted data will be shown in the “Data Extracted” pane. Click the “Export” button to export the results to an Excel file, a database or another format, and save the file to your computer.
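If you collect the data with your own code instead, a plain CSV export is a reasonable stand-in for the “Export” button. The sketch below uses Python's standard csv module; the field names and the sample record are assumptions modelled on the fields built in this tutorial, and `to_csv` is a hypothetical helper that expects a non-empty list of rows.

```python
import csv
import io

# Hypothetical record shaped like the fields built in this tutorial.
records = [
    {
        "sub_district": "Diglipur",
        "village": "Aerial Bay",
        "item": "Area of the village (in hectares)",
        "value": "81.86",
    },
]


def to_csv(rows):
    """Serialize a non-empty list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(records)
print(csv_text)
```

Writing to an in-memory buffer keeps the sketch free of side effects; to save a real file, pass a file opened with `newline=""` to `csv.DictWriter` instead.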
Author: The Octoparse Team