
List & Detail Web Page - Advanced Mode

Thursday, March 24, 2016 5:47 AM


 

When we scrape product information from e-commerce websites, we usually want data not only from the search result page but also from each product's detail page. This tutorial shows how to build a customized crawler that does both.

Let's say we need to search for "camera lens" on eBay. See the sample URL below:

https://www.ebay.com/sch/i.html?_from=R40&_trksid=p2334524.m570.l1313&_nkw=camera+lens&_sacat=0&LH_TitleDesc=0&_odkw=camera+lens&_osacat=0 
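
The search keyword travels in the query string of that URL. As a small sketch using only Python's standard library, the snippet below splits the sample URL into its parameters. Reading `_nkw` as the keyword field is an inference from the URL itself, and the meaning of the other parameters is not covered here. The names `SAMPLE_URL` and `query_params` are made up for this illustration.

```python
from urllib.parse import urlsplit, parse_qs

# The sample search URL from this tutorial, split across two lines for readability.
SAMPLE_URL = (
    "https://www.ebay.com/sch/i.html?_from=R40&_trksid=p2334524.m570.l1313"
    "&_nkw=camera+lens&_sacat=0&LH_TitleDesc=0&_odkw=camera+lens&_osacat=0"
)

def query_params(url):
    """Return the query-string parameters of a URL as a dict of single values."""
    query = urlsplit(url).query
    # parse_qs decodes "+" to a space and returns a list per key; keep the first value.
    return {key: values[0] for key, values in parse_qs(query).items()}

params = query_params(SAMPLE_URL)
print(params["_nkw"])  # the search keyword, assuming _nkw is the keyword field
```

Changing the value of `_nkw` and re-encoding the query would presumably give the URL for a different search, though that is an assumption about how the site behaves.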

 

In this case, we want to extract the title of each camera lens from the listing page first, and then open its detail page to get the specifics.

Octoparse 8.4 can auto-detect the data on a web page, which saves us the time of selecting each field by hand.

  • Click Auto-detect web page data in the Tips box and wait for it to complete
  • Switch between the auto-detect results to find your desired data fields (result 1 in this case)
  • In the Data Preview section, rename the data fields and remove the ones you don't need

A search for a popular product line, like the one in this example, usually returns several pages of results, so we need to go through each result page and extract data from all of them.

  • Click on the Check button to see if Octoparse has successfully located a next page button

  • Uncheck Add a page scroll and click Create workflow
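
Conceptually, a pagination loop keeps loading the next page until no next-page button is found. The sketch below illustrates that idea in Python on a tiny in-memory site. The `PAGES` dictionary, its `next` key, and `walk_pages` are hypothetical stand-ins for illustration only; they are not Octoparse internals or eBay's real markup.

```python
# Hypothetical in-memory "site": each page lists item titles and names its next page.
PAGES = {
    "page1": {"titles": ["Lens A", "Lens B"], "next": "page2"},
    "page2": {"titles": ["Lens C"], "next": "page3"},
    "page3": {"titles": ["Lens D", "Lens E"], "next": None},
}

def walk_pages(pages, start, max_pages=100):
    """Collect titles from every result page by following 'next' links."""
    titles = []
    current = start
    visited = 0
    # Stop when there is no next page, or when the safety limit is reached.
    while current is not None and visited < max_pages:
        page = pages[current]
        titles.extend(page["titles"])
        current = page["next"]
        visited += 1
    return titles

all_titles = walk_pages(PAGES, "page1")
print(all_titles)
```

The `max_pages` limit is a design choice in this sketch: it keeps the loop from running forever if a site's next-page links happen to form a cycle.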

 

auto-detect results

 

 

  • Select Click on link(s) to scrape the linked page(s) to generate a Click URLs in the list action

 

Linked pages

Now Octoparse has taken us to the detail page for further data extraction.
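
For readers who think in code, the list-and-detail pattern can be sketched roughly as follows: read every item on the list page, then visit each item's link and extract more fields there. This is a minimal Python illustration using only the standard library. The `SITE` pages, the `item` class, the `brand` field, and all function and class names are hypothetical; they do not describe how Octoparse works internally or how eBay's pages are structured.

```python
from html.parser import HTMLParser

# Hypothetical pages standing in for a listing page and its detail pages.
SITE = {
    "/list": '<a class="item" href="/item/1">Lens A</a>'
             '<a class="item" href="/item/2">Lens B</a>',
    "/item/1": '<span id="brand">Canon</span>',
    "/item/2": '<span id="brand">Nikon</span>',
}

class ItemLinkParser(HTMLParser):
    """Collect (title, href) pairs from <a class="item"> elements."""
    def __init__(self):
        super().__init__()
        self.items = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "item":
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.items.append(("".join(self._text).strip(), self._href))
            self._href = None

class BrandParser(HTMLParser):
    """Grab the text inside <span id="brand">."""
    def __init__(self):
        super().__init__()
        self.brand = None
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == "span" and dict(attrs).get("id") == "brand":
            self._inside = True

    def handle_data(self, data):
        if self._inside:
            self.brand = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._inside = False

def scrape(site, list_url):
    """Read titles from the list page, then visit each linked detail page."""
    link_parser = ItemLinkParser()
    link_parser.feed(site[list_url])
    rows = []
    for title, href in link_parser.items:
        detail = BrandParser()
        detail.feed(site[href])
        rows.append({"title": title, "brand": detail.brand})
    return rows

rows = scrape(SITE, "/list")
print(rows)
```

Each row combines one field from the list page (the title) with one field from the detail page, which is the same shape of output the workflow in this tutorial aims for.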

 

Tip! 
If the auto-detect function fails on a website, we can also set up the workflow manually with the following steps:
1. Select the first item on the list page
2. Click Select all on the Tips panel
3. Click Extract text of the selected elements
4. Select the first title on the list page again
5. Click Click element and move to the detail page

 


 

Happy Data Hunting!

Author: The Octoparse Team
