Clicking each link in a list and scraping data from the page it opens is a common web scraping scenario. This tutorial shows you how to click through from a listing page to each detail page to get the data you need. This is especially useful when extracting data from e-commerce sites (Amazon, eBay, etc.) and business directories (Yelp, Yellow Pages, etc.).
You may need this link to follow along:
1. Use "Auto-detect" to set up the workflow
Once you've created a new task using the example URL (https://www.ebay.com/b/Car-Audio-Amplifiers/18795/bn_887008), select Auto-detect web page data. Octoparse will now detect any data on the page, and you can click Create workflow to generate the workflow.
Choose "Select subpage URL" on the Tips panel and pick an option from the dropdown menu. Here you can choose Title_URL.
Octoparse will now take you to the detail page of the first product.
Auto-detect the web data again, or click target data fields such as title, condition, and price to scrape them.
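For readers who think in code, the listing-to-detail flow above can be sketched roughly as follows. This is not how Octoparse works internally; it is a minimal illustration using only the Python standard library. The markup, the `item-title` class, the `example.com` URLs, and the `fetch` callable are all assumptions made for the example, not eBay's real HTML.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class TitleLinkParser(HTMLParser):
    """Collect the href of every <a> whose class list contains `link_class`."""

    def __init__(self, base_url, link_class):
        super().__init__()
        self.base_url = base_url
        self.link_class = link_class
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        href = attr_map.get("href")
        if href and self.link_class in classes:
            # Resolve relative links against the listing page URL.
            self.urls.append(urljoin(self.base_url, href))


def subpage_urls(listing_html, base_url, link_class="item-title"):
    """Return the detail-page URLs found on a listing page."""
    parser = TitleLinkParser(base_url, link_class)
    parser.feed(listing_html)
    return parser.urls


def scrape_details(listing_html, base_url, fetch):
    """Visit each subpage URL; `fetch` is any callable mapping URL -> page content."""
    return {url: fetch(url) for url in subpage_urls(listing_html, base_url)}


# Hypothetical listing markup (an assumption for this sketch).
LISTING_HTML = """
<ul>
  <li><a class="item-title" href="/itm/101">Amplifier A</a></li>
  <li><a class="item-title" href="/itm/102">Amplifier B</a></li>
  <li><a class="seller" href="/usr/shop">Shop</a></li>
</ul>
"""

# Stand-in for a real HTTP request, so the sketch runs offline.
FAKE_PAGES = {
    "https://www.example.com/itm/101": "<h1>Amplifier A</h1>",
    "https://www.example.com/itm/102": "<h1>Amplifier B</h1>",
}

details = scrape_details(
    LISTING_HTML, "https://www.example.com/b/amplifiers", FAKE_PAGES.get
)
for url, page in details.items():
    print(url, page)
```

Only the title links are followed; the seller link is skipped because it does not carry the assumed `item-title` class, which mirrors choosing Title_URL as the subpage URL.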
2. Set up the workflow manually
Click on the first product title that contains the product page URL. The selected title will be highlighted in green while all the other similar product titles will be highlighted in red.
Click Select all similar elements on the Tips panel.
Note: If there is no Select all option on the Tips panel after you select the first URL, please continue to select the second URL.
Select Loop click each element, or Loop click each URL from the Tips panel.
When the pop-up appears, click No.
The loop-click step will be auto-generated and added to the workflow.
Note: To loop-click through all the links in the list, make sure you select the anchor element. Octoparse automatically identifies the tag of each selected item, so when you select an item with a URL, the selected tag should be "A", which stands for an anchor, the element that usually links one page to another.
If Octoparse does not select the A tag, click "A" on the Tips panel.
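To see why the A tag matters, consider a title nested inside a link. The sketch below is illustrative only and does not describe Octoparse's internals; the snippet and the `nearest_anchor` helper are hypothetical. It shows that the inner element carries no URL, while the enclosing anchor does.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed product markup (an assumption for this sketch).
SNIPPET = """
<li class="item">
  <a href="/itm/101">
    <h3><span class="title">Amplifier A</span></h3>
  </a>
</li>
"""


def nearest_anchor(root, element):
    """Walk up from `element` to the closest enclosing <a>, or return None."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parents = {child: parent for parent in root.iter() for child in parent}
    node = element
    while node is not None:
        if node.tag == "a":
            return node
        node = parents.get(node)
    return None


root = ET.fromstring(SNIPPET)
span = root.find(".//span[@class='title']")
anchor = nearest_anchor(root, span)

print(span.get("href"))    # the span itself has no href attribute
print(anchor.get("href"))  # the enclosing anchor holds the link
```

Selecting the span alone would give a click target without a link, which is why switching the selection to "A" on the Tips panel is the fix when the anchor is not picked up.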
Click target data fields such as title, review, and price to scrape them.
Note: Setting a wait time in Options for steps like "Click Item" or "Extract Data" can help avoid skipped data and make the crawling process more human-like (2-5 seconds usually works well). Then click Apply to confirm.
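The same idea in a hand-written scraper might look like the sketch below: pause for a random 2-5 seconds before each page visit. The `visit_with_delay` helper and its parameters are hypothetical names chosen for this example, and the demo passes a recording function in place of `time.sleep` so that it runs instantly.

```python
import random
import time


def visit_with_delay(urls, visit, min_wait=2.0, max_wait=5.0, sleep=time.sleep):
    """Call `visit(url)` for each URL, pausing a random interval before each call."""
    results = []
    for url in urls:
        # A randomized wait looks less mechanical than a fixed one.
        sleep(random.uniform(min_wait, max_wait))
        results.append(visit(url))
    return results


# Demo: record the waits instead of actually sleeping.
waits = []
pages = visit_with_delay(["/itm/101", "/itm/102"], visit=str.upper, sleep=waits.append)
print(pages)
print(len(waits))
```

In real use, leave `sleep` at its default and pass a function that loads the page as `visit`.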