Octoparse

Clicking each link in a list and scraping data from the page it opens is a common web scraping scenario. This tutorial shows you how to click through from a listing page to each detail page to get the data you need. This is especially useful when extracting from e-commerce sites (Amazon, eBay, etc.) and business directories (Yelp, Yellow Pages, etc.).

To follow along, use this example URL:

<a href="https://www.ebay.com/b/Car-Audio-Amplifiers/18795/bn_887008" rel="nofollow noopener noreferrer" target="_blank">https://www.ebay.com/b/Car-Audio-Amplifiers/18795/bn_887008</a>

___________________________________________________________

1. Use "Auto-detect" to set up the workflow

- Once you've created a new task using the example URL (<a href="https://www.ebay.com/b/Car-Audio-Amplifiers/18795/bn_887008" rel="nofollow noopener noreferrer" target="_blank">https://www.ebay.com/b/Car-Audio-Amplifiers/18795/bn_887008</a>), select Auto-detect web page data. Octoparse will now detect any data on the page, and you can click Create workflow to generate the workflow.

- Click "Select subpage URL" on the Tips panel and choose an option from the dropdown menu. In this example, choose Title_URL.

Octoparse will now take you to the detail page of the first product.

- Auto-detect the web data again, or click the target data fields (title, condition, price, etc.) to scrape them.

- Click on the first product title that contains the product page URL. The selected title will be highlighted in green while all the other similar product titles will be highlighted in red.

- Click "Select all similar elements" on the Tips panel.

Note: If the Tips panel shows no "Select all" option after you select the first URL, select the second URL as well.

- Select Loop click each element, or Loop click each URL from the Tips panel.

- When the pop-up appears, click No.

The loop-click step will be auto-generated and added to the workflow.
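For readers who think in code, the loop-click step behaves roughly like a loop that opens each detail page in turn and pulls fields from it. The Python sketch below is only an analogy, not how Octoparse is implemented: `TitleParser`, `loop_click`, and the `FAKE_PAGES` stand-in pages are hypothetical names, and the page fetcher is passed in so the example runs without any network access.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, Iterable, List


class TitleParser(HTMLParser):
    """Captures the text of the first <h1> element on a detail page."""

    def __init__(self) -> None:
        super().__init__()
        self._in_h1 = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        # Only start capturing if no title has been collected yet.
        if tag == "h1" and not self.title:
            self._in_h1 = True

    def handle_data(self, data):
        if self._in_h1:
            self.title += data

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False


def loop_click(urls: Iterable[str], fetch: Callable[[str], str]) -> List[Dict[str, str]]:
    """Visit each detail URL and extract one row of data per page."""
    rows = []
    for url in urls:
        parser = TitleParser()
        parser.feed(fetch(url))  # "click" the link, then read the page
        rows.append({"url": url, "title": parser.title.strip()})
    return rows


# Hypothetical stand-in pages so the sketch runs offline.
FAKE_PAGES = {
    "https://example.com/item/1": "<html><body><h1>Amplifier A</h1></body></html>",
    "https://example.com/item/2": "<html><body><h1> Amplifier B </h1></body></html>",
}

rows = loop_click(FAKE_PAGES, FAKE_PAGES.__getitem__)
for row in rows:
    print(row)
```

In a real script, `fetch` would be replaced by an HTTP client call, and the parser would collect more fields than the title.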

Note: To loop-click through all the links in the list, you must select the anchor element. Octoparse automatically identifies the tag of each selected item, so when you select an item with a URL, the selected tag should be "A", the anchor tag that usually links one page to another.

If Octoparse does not select the A tag, click "A" on the Tips panel.
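To see why the A tag matters, consider where the URL actually lives in the page markup. The sketch below uses Python's standard `html.parser` on a made-up listing snippet (`LISTING_HTML` and `AnchorCollector` are hypothetical names for illustration): the link target sits in the `href` attribute of the `<a>` element, not on the `<span>` that holds the visible title, so selecting the inner element alone would yield no URL to follow.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects the href of every <a> element it encounters."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Only anchor tags carry the link target; other tags are ignored.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


# Made-up listing markup: the title text is in a <span>, the URL is on the <a>.
LISTING_HTML = """
<ul>
  <li><a href="/item/1"><span>Amplifier A</span></a></li>
  <li><a href="/item/2"><span>Amplifier B</span></a></li>
  <li><span>No link here</span></li>
</ul>
"""

collector = AnchorCollector()
collector.feed(LISTING_HTML)
detail_links = collector.hrefs
print(detail_links)
```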

- Click the target data fields (title, review, price, etc.) to scrape them.

Note: Setting a wait time in Options for steps like "Click Item" or "Extract Data" helps prevent skipped data and makes the crawl look more human-like (2-5 seconds usually works well). Click Apply to confirm.
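The same idea can be sketched in a script as a short pause before each page visit. The Python example below is a minimal illustration, assuming a randomized 2-5 second delay; `visit_with_waits` and its parameters are hypothetical names, and the demo call swaps in a no-op sleep so it finishes instantly.

```python
import random
import time
from typing import Callable, Iterable, List, Optional


def visit_with_waits(
    urls: Iterable[str],
    visit: Callable[[str], None],
    low: float = 2.0,
    high: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Pause for a random time between low and high seconds before each visit."""
    rng = rng or random.Random()
    waits = []
    for url in urls:
        delay = rng.uniform(low, high)
        sleep(delay)  # give the page time to load and avoid a machine-like rhythm
        visit(url)
        waits.append(delay)
    return waits


# Demo with a no-op sleep so nothing actually blocks.
visited = []
waits = visit_with_waits(
    ["page-1", "page-2", "page-3"],
    visited.append,
    sleep=lambda seconds: None,
    rng=random.Random(0),
)
print(visited, [round(w, 2) for w in waits])
```

Passing `sleep` and `rng` as parameters keeps the function easy to test; in real use the defaults apply and each visit is preceded by an actual pause.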

2. Set up the workflow manually