Click each link in a list and scrape data from new pages

Clicking each link in a list and scraping data from the page it opens is a common web scraping scenario. This tutorial shows how to click through from a listing page to each detail page and extract the data you need. This is especially useful for e-commerce sites (Amazon, eBay, etc.) and business directories (Yelp, Yellow Pages, etc.).


You may need this link to follow through:


1. Use "Auto-detect" to set up the workflow

  • Select "Select subpage URL" on the Tips panel and choose an option from the dropdown menu. In this example, choose Title_URL.

Octoparse will now take you to the detail page of the first product.

  • Auto-detect the web data again or click on target data fields such as title, condition, price, etc. to scrape them


2. Set up the workflow manually

  • Click on the first product title that contains the product page URL. The selected title will be highlighted in green while all the other similar product titles will be highlighted in red.

  • Click Select all similar elements on the Tips panel

Note: If the Tips panel shows no "Select all similar elements" option after you select the first URL, select the second URL as well.

  • Select Loop click each element, or Loop click each URL from the Tips panel.

  • Once you get this pop-up, click on No

The loop-click step will be auto-generated and added to the workflow.

Note: To loop through and click every link in the list, you must select the anchor element. Octoparse automatically identifies the tag of each selected item, so when you select an item with a URL, the selected tag should be "A", the anchor tag that usually links one page to another.

If you find Octoparse does not locate the A tag, you can click the "A" on the Tips panel.
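To illustrate why the anchor tag matters, here is a minimal Python sketch that collects link targets from a listing. It is not how Octoparse works internally; the markup in LISTING_HTML, the example.com URLs, and the names AnchorCollector and collect_links are all hypothetical. It only shows that the URL lives on the "A" element, not on the heading or price around it.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Collect the href of every <a> tag and ignore all other elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":  # only anchors carry the link to the next page
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the listing page URL.
            self.links.append(urljoin(self.base_url, href))


# Hypothetical listing markup: each title sits inside an <a> inside an <h3>.
LISTING_HTML = """
<ul>
  <li><h3><a href="/item/1">First product</a></h3><span>$10</span></li>
  <li><h3><a href="/item/2">Second product</a></h3><span>$12</span></li>
</ul>
"""


def collect_links(html, base_url):
    """Return the absolute URL of every anchor found in the HTML."""
    parser = AnchorCollector(base_url)
    parser.feed(html)
    return parser.links


links = collect_links(LISTING_HTML, "https://example.com/list")
print(links)
```

Selecting the surrounding h3 or span instead of the anchor would yield text but no URL to follow, which is the situation the note above warns about.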

  • Click on target data fields such as title, review, price, etc. to scrape them
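Conceptually, the workflow built above is a loop: for each link in the list, open the detail page and extract the chosen fields. The Python sketch below mirrors that idea under stated assumptions. FAKE_SITE stands in for a real website, and fetch, extract_fields, and scrape_list are hypothetical names, not part of Octoparse.

```python
import re

# Hypothetical stand-in for a website: detail page URL -> page HTML.
FAKE_SITE = {
    "https://example.com/item/1": "<h1>First product</h1><p class='price'>$10</p>",
    "https://example.com/item/2": "<h1>Second product</h1><p class='price'>$12</p>",
}


def fetch(url):
    """Return the page HTML; a real scraper would make an HTTP request here."""
    return FAKE_SITE[url]


def extract_fields(html):
    """Pull the title and price out of one detail page (None if missing)."""
    title = re.search(r"<h1>(.*?)</h1>", html)
    price = re.search(r"class='price'>(.*?)</p>", html)
    return {
        "title": title.group(1) if title else None,
        "price": price.group(1) if price else None,
    }


def scrape_list(urls, fetch_page=fetch):
    """Visit each URL in turn (the 'loop click') and extract one row per page."""
    rows = []
    for url in urls:
        html = fetch_page(url)       # corresponds to clicking the link
        row = extract_fields(html)   # corresponds to the data extraction step
        row["url"] = url
        rows.append(row)
    return rows


rows = scrape_list(list(FAKE_SITE))
for row in rows:
    print(row)
```

Each iteration produces one row, which matches the way a loop-click step followed by a data extraction step yields one record per detail page.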

Note: Setting a wait time in Options for steps such as "Click Item" or "Extract Data" helps prevent skipped data and makes the crawl look more human-like. A wait of 2-5 seconds usually works well. Click Apply to confirm.
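The same idea can be sketched in Python as a pause before each step. This is an illustration only: wait_before_step is a hypothetical helper, and picking a random delay within the 2-5 second range is an assumption of this sketch, not something the Options panel is stated to do.

```python
import random
import time


def wait_before_step(min_seconds=2.0, max_seconds=5.0, sleep=time.sleep):
    """Pause for a random time in [min_seconds, max_seconds] and return it.

    The sleep function is injectable so the pause can be skipped in tests.
    """
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay


# Demonstration: record the requested delay instead of actually sleeping.
recorded = []
delay = wait_before_step(sleep=recorded.append)
print(f"would wait {delay:.2f} seconds before the next step")
```

In a real loop, calling wait_before_step() with the default sleep before each page visit gives the page time to load and spaces out the requests.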
