List & Detail Web Page - Advanced Mode
Thursday, March 24, 2016 5:47 AM
When we scrape product information from e-commerce websites, we usually need data not only from the search result page but also from each product's detail page. In this tutorial, we will show you how to build a customized crawler that does both.
Let's say we need to search for "camera lens" on eBay. See the sample URL below:
In this case, we want to extract the title of the camera lens from the listing page first, and then go to its detail page to get the specifics.
Octoparse 8.4 includes a smart detection feature that identifies the data fields on a page automatically. We can use it here to save some setup time.
- Click "Auto-detect web page data" in the Tips box and wait for it to complete
- Switch between the auto-detect results to find your desired data fields (result 1 in this case)
- Modify the settings of the data fields by renaming them and removing the ones you don't want in the Data Preview section
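For readers who post-process exported data in code, the renaming and removing of fields described above can be sketched in Python. This is only an illustration, not how Octoparse works internally: the function name `tidy_fields` and the field names `Field1`, `Field2`, and `Field3` are hypothetical placeholders.

```python
def tidy_fields(rows, rename, drop):
    """Rename some fields and drop others in a list of row dictionaries.

    rows   -- list of dicts, one per extracted item
    rename -- mapping of old field name -> new field name
    drop   -- set of field names to remove entirely
    """
    cleaned = []
    for row in rows:
        cleaned.append(
            {rename.get(name, name): value
             for name, value in row.items()
             if name not in drop}
        )
    return cleaned


# Hypothetical auto-detected rows with generic field names.
detected = [
    {"Field1": "Lens A", "Field2": "$99", "Field3": "Sponsored"},
    {"Field1": "Lens B", "Field2": "$149", "Field3": ""},
]

tidied = tidy_fields(
    detected,
    rename={"Field1": "title", "Field2": "price"},
    drop={"Field3"},
)
```

Each row in `tidied` keeps only the renamed `title` and `price` fields.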
When we search for a popular product like the one in this example, the results usually span multiple pages, and we need to extract data from each of them.
- Click the "Check" button to see whether Octoparse has successfully located a next page button
- Uncheck "Add a page scroll" and click "Create workflow"
- Select "Click on link(s) to scrape the linked page(s)" to generate a "Click URLs in the list" action
Now Octoparse has taken us to the detail page for further data extraction.
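The workflow built so far (loop over result pages, read each title, then open each item's detail page) follows a common list-and-detail crawl pattern. The sketch below shows that pattern in plain Python under stated assumptions: the pages live in a hypothetical in-memory dictionary instead of being fetched from eBay, and the class names `item`, `next`, and `spec` are invented for the example. It is not Octoparse's implementation.

```python
from html.parser import HTMLParser

# Hypothetical in-memory "site": URL -> HTML.
# A real crawler would download these pages over HTTP instead.
PAGES = {
    "/search?page=1": '<a class="item" href="/item/1">Lens A</a>'
                      '<a class="next" href="/search?page=2">Next</a>',
    "/search?page=2": '<a class="item" href="/item/2">Lens B</a>',
    "/item/1": '<span class="spec">50mm f/1.8</span>',
    "/item/2": '<span class="spec">85mm f/1.4</span>',
}


class ClassTextParser(HTMLParser):
    """Collect (href, text) for every non-nested element with a given class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.found = []          # list of (href or None, text)
        self._capturing = False
        self._href = None
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.wanted_class in (attrs.get("class") or "").split():
            self._capturing = True
            self._href = attrs.get("href")
            self._buffer = []

    def handle_data(self, data):
        if self._capturing:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._capturing:
            self.found.append((self._href, "".join(self._buffer).strip()))
            self._capturing = False


def select(html, wanted_class):
    """Return (href, text) pairs for elements carrying wanted_class."""
    parser = ClassTextParser(wanted_class)
    parser.feed(html)
    parser.close()
    return parser.found


def crawl(start_url, pages=PAGES):
    """Walk the result pages and visit each item's detail page."""
    rows, seen, url = [], set(), start_url
    while url is not None and url not in seen:
        seen.add(url)                      # guard against pagination loops
        html = pages[url]
        for href, title in select(html, "item"):
            specs = [text for _, text in select(pages[href], "spec")]
            rows.append({"title": title, "specs": specs})
        next_links = select(html, "next")  # the "next page" button, if any
        url = next_links[0][0] if next_links else None
    return rows


rows = crawl("/search?page=1")
```

The outer `while` loop plays the role of the pagination step, and the inner `for` loop plays the role of clicking each link in the list to reach its detail page.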
If auto-detect fails on a website, we can also set up the workflow manually.
See the following steps:
1. Select the first item on the list page
2. Click "Select all" on the Tips panel
3. Click "Extract text of the selected elements"
4. Select the first title on the list page again
5. Click "Click element" and move to the detail page
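The manual steps above amount to picking one example element, generalizing it to every similar element, reading the text, and then following each link. A minimal Python sketch of that idea follows. The tutorial does not describe how Octoparse matches similar elements, so matching on tag and class is an assumption used here as a stand-in, and the markup, the class names `s-item` and `ad`, and the helper `select_all_like` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified listing markup (well-formed so ElementTree can parse it).
LISTING = """
<ul>
  <li class="s-item"><a href="/itm/1">Lens A</a></li>
  <li class="s-item"><a href="/itm/2">Lens B</a></li>
  <li class="ad"><a href="/promo">Sponsored</a></li>
</ul>
"""


def select_all_like(root, first):
    """Return every element that shares the first element's tag and class."""
    return [el for el in root.iter(first.tag)
            if el.get("class") == first.get("class")]


root = ET.fromstring(LISTING)

first_item = root.find("li")                   # step 1: select the first item
items = select_all_like(root, first_item)      # step 2: "Select all"
titles = [item.find("a").text for item in items]               # step 3: extract text
detail_links = [item.find("a").get("href") for item in items]  # steps 4-5: links to follow
```

Here the sponsored entry is skipped because its class differs from the first selected item, which is one reason the choice of the first element matters.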
If you have further issues with the task or have a suggestion that would make this a better resource for you, we’d love to hear about it. Submit a request here.
Happy Data Hunting!
Author: The Octoparse Team