Scrape Web Data from a Drop-Down Menu
Sunday, October 9, 2016 9:49 AM
Drop-down menus appear on many websites, and the page content often changes dynamically depending on the option you choose.
In this tutorial, I will use eBay as an example to show you how to scrape data from a website with a drop-down list.
(Download my extraction task of this tutorial HERE just in case you need it.)
Choose “Advanced Mode”. ➜ Click “Start” ➜ Complete basic information.
Enter the target URL in the built-in browser. ➜ Click “Go” icon to open the webpage.
(URL of the example: http://www.ebay.co.uk/motors )
Next, we extract the values from the drop-down list. Because we cannot extract the values directly from the drop-down menu on this website, we need to manually add a “Loop” action and enter the XPath of the drop-down list.
Suppose that we want to find a car on this website. We will extract the values from the "Any make" drop-down list under the "Find a car" tab.
You can copy the XPath of the drop-down list with a Chrome XPath extension or with Firebug.
Drag a “Loop” action into the Workflow Designer. ➜ Click the loop box in the Workflow Designer. ➜ Choose "Variable list" and paste the XPath in the “Variable List” text box. ➜ Click “Save”.
The values from the drop-down menu now appear in the "Loop" action. See the "Loop Item" box.
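To see what an XPath for a drop-down list selects, here is a minimal sketch in Python using only the standard library. The markup, the `id` value, and the helper name `dropdown_values` are assumptions made for illustration; the real eBay page is structured differently, and the XPath you copy from the browser will not match this one.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup standing in for the "Any make" drop-down.
SAMPLE_HTML = """
<form>
  <select id="make">
    <option value="">Any make</option>
    <option value="audi">Audi</option>
    <option value="bmw">BMW</option>
    <option value="ford">Ford</option>
  </select>
</form>
"""

# ElementTree supports a limited XPath subset, which is enough here.
OPTION_XPATH = ".//select[@id='make']/option"


def dropdown_values(markup, xpath):
    """Return the visible text of every element matched by the XPath."""
    root = ET.fromstring(markup)
    return [(el.text or "").strip() for el in root.findall(xpath)]


makes = dropdown_values(SAMPLE_HTML, OPTION_XPATH)
for make in makes:
    print(make)
```

Each printed value corresponds to one entry that a loop over the drop-down list would visit.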
Drag a "Click Item" into the loop box. ➜ Choose “Advanced Options” and set the AJAX timeout. ➜ Click “Save”.
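The AJAX timeout gives the page time to update after an option is clicked. The general idea of waiting with a timeout can be sketched as below. This is only an illustration of the concept, not how Octoparse implements it; `wait_until`, `timeout`, and `interval` are hypothetical names, and the "page load" is simulated with a timer.

```python
import time


def wait_until(condition, timeout=5.0, interval=0.1):
    """Poll condition() until it returns True or the timeout expires.

    Returns True if the condition was met in time, otherwise False.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Simulated AJAX update that "finishes" about 0.3 seconds after the click.
clicked_at = time.monotonic()
loaded = wait_until(lambda: time.monotonic() - clicked_at > 0.3, timeout=2.0)

# A condition that never becomes true runs into the timeout instead.
never_loaded = wait_until(lambda: False, timeout=0.3)

print(loaded, never_loaded)
```

A timeout that is too short risks reading the page before the new content arrives, while one that is too long only slows the run down.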
Click "Search Cars & Trucks". ➜ Click “Click an item”. ➜ Click "Open the link in new tab" so that you land on the search results page, and click "Save".
Now we can extract the search results. Move your cursor over a section of the results that shares its layout with the others and contains the data you want to extract.
Click the first highlighted link ➜ Create a list of sections with similar layout.
Click “Create a list of items” (sections with similar layout). ➜ Click “Add current item to the list”.
The first highlighted link has now been added to the list. ➜ Click “Continue to edit the list”.
Click the second highlighted link. ➜ Click “Add current item to the list” again.
Now we have all the links with a similar layout. ➜ Click “Finish Creating List”. ➜ Click “loop” to process the list and extract the elements from each page.
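Adding two example links lets the tool generalize to every link that shares the same layout. In code, the equivalent idea is one XPath that matches all similar items while skipping the rest. The sketch below assumes hypothetical result markup and class names; it is not the actual structure of the eBay results page.

```python
import xml.etree.ElementTree as ET

# Hypothetical search results: three listings and one unrelated entry.
RESULTS_HTML = """
<ul id="results">
  <li class="item"><a href="/itm/101">Audi A3 2012</a></li>
  <li class="item"><a href="/itm/102">Audi A4 2014</a></li>
  <li class="ad"><a href="/promo">Sponsored</a></li>
  <li class="item"><a href="/itm/103">Audi Q5 2015</a></li>
</ul>
"""

root = ET.fromstring(RESULTS_HTML)

# One pattern covers every listing with the same layout.
item_links = [a.get("href") for a in root.findall(".//li[@class='item']/a")]

# Looping over the list is where each linked page would be processed.
for href in item_links:
    print(href)
```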
Extract the title of the first section. ➜ Click the title ➜ Select “Extract text”. Other contents can be extracted in the same way.
All the selected content will be listed under Data Fields. ➜ Click a “Field Name” to rename it.
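The relationship between field names and extracted text can be pictured as a mapping from a name to a selector. The following is a minimal sketch under assumed markup; the class names, the field names, and the helper `extract_record` are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for a single listing.
ITEM_HTML = """
<div class="item">
  <h3 class="title">Audi A3 2012</h3>
  <span class="price">GBP 8,500</span>
  <span class="mileage">42,000 miles</span>
</div>
"""

# Field name -> XPath of the element whose text should be extracted.
FIELDS = {
    "title": ".//h3[@class='title']",
    "price": ".//span[@class='price']",
    "mileage": ".//span[@class='mileage']",
}


def extract_record(markup, fields):
    """Build one record by extracting the text behind each field's XPath."""
    root = ET.fromstring(markup)
    record = {}
    for name, xpath in fields.items():
        element = root.find(xpath)
        # A missing element yields an empty string instead of an error.
        has_text = element is not None and element.text
        record[name] = element.text.strip() if has_text else ""
    return record


record = extract_record(ITEM_HTML, FIELDS)
print(record)
```

Renaming a field only changes the key in the mapping; the selector and the extracted text stay the same.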
Click “Next” ➜ Click “Next” ➜ Click “Local Extraction” to run the task on your computer. Octoparse will automatically extract all the data selected.
The extracted data will be shown in the “Data Extracted” pane. Click the “Export” button to export the results to an Excel file, a database, or another format, and save the file to your computer.
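For readers who handle the records in their own scripts, the export step corresponds roughly to writing rows to a CSV file that Excel can open. This is a hedged sketch with made-up sample records; the helper name `records_to_csv` is hypothetical and the code is unrelated to Octoparse's own export feature.

```python
import csv
import io

# Made-up sample records standing in for the extracted data.
records = [
    {"title": "Audi A3 2012", "price": "GBP 8,500"},
    {"title": "Audi A4 2014", "price": "GBP 11,200"},
]


def records_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = records_to_csv(records, ["title", "price"])
print(csv_text)

# To save to disk instead, the same text could be written with
# open("results.csv", "w", newline="", encoding="utf-8").
```

Values that contain a comma, such as the prices here, are quoted automatically by the `csv` module so that the columns stay aligned.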
Author: The Octoparse Team