Scrape Web Data from A Drop-Down Menu 2
Thursday, October 13, 2016 3:30 AM
In the previous tutorial, we showed you how to scrape web data from a single drop-down menu. Some websites have multiple drop-down menus, where the contents of one drop-down menu do not depend on what you choose in another.
In this tutorial, I will take Booking as an example to show you how to scrape web data from a website with multiple drop-down lists.
(Download my extraction task of this tutorial HERE just in case you need it.)
Choose “Advanced Mode”. ➜ Click “Start” ➜ Complete basic information.
Enter the target URL in the built-in browser. ➜ Click the “Go” icon to open the webpage.
Now we extract the values from the drop-down menus. Assume that we want to find a room for different numbers of adults and children. Here we will extract the values from the “Adults” and “Children” drop-down lists under the “Search” tab.
It’s much easier this time, as we can extract the values directly from the drop-down menus without entering the XPath manually.
Click the combobox under the “Adults” tag. ➜ Click “Loop switch combobox”. Now we can loop through the values in the drop-down menu.
Extract the values in the “Children” drop-down menu in the same way. Click the combobox under the “Children” tag. ➜ Click “Loop switch combobox”.
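For readers who want to see what "the values of a drop-down menu" look like outside the tool, here is a minimal sketch in Python. It is not how Octoparse works internally; the sample HTML, the select names `group_adults` and `group_children`, and the helper `read_dropdowns` are all made up for illustration. It only reads the `value` attribute of every option in each select element.

```python
from html.parser import HTMLParser

# Hypothetical markup; a real page will use different names and values.
SAMPLE_HTML = """
<form>
  <select name="group_adults">
    <option value="1">1 adult</option>
    <option value="2" selected>2 adults</option>
  </select>
  <select name="group_children">
    <option value="0">No children</option>
    <option value="1">1 child</option>
  </select>
</form>
"""


class SelectOptionParser(HTMLParser):
    """Collect option values, grouped by the name of the enclosing select."""

    def __init__(self):
        super().__init__()
        self.options = {}      # select name -> list of option values
        self._current = None   # name of the select we are currently inside

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select":
            self._current = attrs.get("name")
            self.options[self._current] = []
        elif tag == "option" and self._current is not None:
            self.options[self._current].append(attrs.get("value"))

    def handle_endtag(self, tag):
        if tag == "select":
            self._current = None


def read_dropdowns(html):
    """Return a dict mapping each select name to its option values."""
    parser = SelectOptionParser()
    parser.feed(html)
    return parser.options


dropdowns = read_dropdowns(SAMPLE_HTML)
print(dropdowns)
```

Each list in the result corresponds to the values that one "Loop switch combobox" step would step through.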
Drag the second “Loop Item” box after the “Switch Dropdown” action in the first “Loop Item” box in the Workflow Designer, so that the second loop runs inside the first and we can grab the results for every combination of values from the two drop-down menus.
Click “Switch Dropdown” in the second “Loop Item” box. ➜ Click “Search”. ➜ Click “Click an item”. ➜ Click “Save”. Now we come to the search results page.
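Placing one loop inside the other means the search is run once for each pair of values. The short Python sketch below shows the idea only; the option lists and the function name `dropdown_combinations` are assumptions made for illustration, not values taken from the website.

```python
from itertools import product

# Hypothetical option values; the real lists come from the page's drop-down menus.
adults_options = ["1", "2", "3"]
children_options = ["0", "1"]


def dropdown_combinations(outer, inner):
    """Return (outer, inner) pairs in the order two nested loops would visit them."""
    return list(product(outer, inner))


combos = dropdown_combinations(adults_options, children_options)
for adults, children in combos:
    # In the workflow, this is the point where "Search" is clicked.
    print(f"search with adults={adults}, children={children}")
```

With three values in the outer menu and two in the inner one, the search runs six times, which is why the order of the two “Loop Item” boxes matters.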
Now we can extract the search results. Move your cursor over the sections with similar layout, from which you want to extract data.
Click the first highlighted link ➜ Create a list of sections with similar layout.
Click “Create a list of items” (sections with similar layout). ➜ Click “Add current item to the list”.
The first highlighted link has now been added to the list. ➜ Click “Continue to edit the list”.
Click the second highlighted link. ➜ Click “Add current item to the list” again.
Now we have all the links with similar layout. ➜ Click “Finish Creating List”. ➜ Click “loop” to process the list and extract the elements from each page.
Extract the title of the first section. ➜ Click the title. ➜ Select “Extract text”. Other contents can be extracted in the same way.
All the extracted contents will be listed in Data Fields. ➜ Click the “Field Name” to modify it.
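The "list of sections with similar layout" plus named data fields can be pictured as one row per section. The following Python sketch illustrates this on made-up markup; the class names `item`, `title` and `price`, the field names, and the function `extract_items` are all hypothetical, and the sample must be well-formed for the standard-library parser used here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample; real result pages are structured differently.
SAMPLE = """
<div id="results">
  <div class="item"><a class="title" href="/hotel/a">Hotel A</a><span class="price">120</span></div>
  <div class="item"><a class="title" href="/hotel/b">Hotel B</a><span class="price">95</span></div>
</div>
"""


def extract_items(markup):
    """Return one dict per section that shares the same layout."""
    root = ET.fromstring(markup)
    rows = []
    # Every div with class "item" is one section in the list.
    for item in root.findall(".//div[@class='item']"):
        title = item.find("a[@class='title']")
        price = item.find("span[@class='price']")
        rows.append({
            "title": title.text.strip(),   # comparable to "Extract text"
            "url": title.get("href"),
            "price": price.text.strip(),
        })
    return rows


rows = extract_items(SAMPLE)
for row in rows:
    print(row)
```

The dictionary keys play the role of the field names you edit in Data Fields.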
Click “Next” ➜ Click “Next” ➜ Click “Local Extraction” to run the task on your computer. Octoparse will automatically extract all the selected data.
The extracted data will be shown in the “Data Extracted” pane. Click the “Export” button to export the results to an Excel file, a database or another format, and save the file to your computer.
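If you later want to process exported rows yourself, a plain CSV file is often the simplest format to work with. The sketch below is an assumption-laden illustration, not Octoparse's export code: the row data, the field names, and the helper `rows_to_csv` are invented, and the output is written to an in-memory string rather than a file.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical rows standing in for the extracted data.
sample_rows = [
    {"title": "Hotel A", "price": "120"},
    {"title": "Hotel B", "price": "95"},
]
csv_text = rows_to_csv(sample_rows, ["title", "price"])
print(csv_text)
```

To save the result, write `csv_text` to a file opened in text mode.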
Author: The Octoparse Team