Step-by-step tutorials for you to get started with web scraping
The latest version of this tutorial is available here. Check it out now!
In this tutorial, we are going to show you how to scrape product information from Amazon.com.
To follow along, you may want to use this URL in the tutorial:
We will open the detail page of each Bluetooth headphones product and scrape the details, including the product title, brand, rating, and price.
This tutorial will also cover:
Here are the main steps in this tutorial: [Download task file here]
1. "Go To Web Page" - to open the targeted web page
Advanced Mode is a highly flexible and powerful web scraping mode. For people who want to scrape from websites with complex structures, like Walmart.com, we strongly recommend Advanced Mode to start your data extraction project.
We strongly suggest you turn on "Workflow Mode" to get a clearer picture of what each step of your task is doing, which makes it easier to spot and fix a mistake in the steps.
2. Create a pagination loop - to scrape all the results from multiple pages
Amazon.com uses AJAX for its pagination button, so clicking it updates the results without reloading the whole page. Therefore, we need to set up AJAX Load for the "Click to paginate" action.
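To make the idea behind the pagination loop more concrete, here is a minimal Python sketch of the general technique: click "next", then wait (up to a timeout) for the AJAX-loaded content to change before reading the page again. It does not use or describe Octoparse's internals. `FakeResultsPage`, `wait_for_new_items`, and `scrape_all_pages` are hypothetical names invented for this illustration, and the page is simulated in memory rather than fetched from Amazon.

```python
import time


class FakeResultsPage:
    """Hypothetical stand-in for a results page whose "next" button loads data via AJAX."""

    def __init__(self, pages, delay=0.05):
        self._pages = pages        # one list of item names per result page
        self._index = 0            # page currently displayed
        self._delay = delay        # simulated AJAX response time in seconds
        self._pending = None       # index of the page that is still loading
        self._ready_at = 0.0

    def items(self):
        # The displayed content only switches once the simulated response has arrived.
        if self._pending is not None and time.monotonic() >= self._ready_at:
            self._index = self._pending
            self._pending = None
        return list(self._pages[self._index])

    def click_next(self):
        """Start loading the next page; return False if there is no next page."""
        if self._index >= len(self._pages) - 1:
            return False
        self._pending = self._index + 1
        self._ready_at = time.monotonic() + self._delay
        return True


def wait_for_new_items(page, previous, timeout=2.0, poll=0.01):
    """Poll until the page shows different items than before, or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if page.items() != previous:
            return True
        time.sleep(poll)
    return False


def scrape_all_pages(page, timeout=2.0):
    """Pagination loop: read the current page, click next, wait for the AJAX load."""
    collected = []
    while True:
        current = page.items()
        collected.extend(current)
        if not page.click_next():
            break                  # last page reached
        if not wait_for_new_items(page, current, timeout):
            break                  # new content never arrived within the timeout
    return collected


page = FakeResultsPage([["A", "B"], ["C", "D"], ["E"]])
all_items = scrape_all_pages(page)
print(all_items)
```

The timeout plays a role similar to an AJAX wait setting: because the URL does not change, the loop needs some other signal (here, changed content or elapsed time) before it moves on.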
3. Create a "Loop Item" - to scrape all the items on each page
When extracting data across multiple pages, you should always start building your task on the first page.
Octoparse will automatically select all the links to the detail pages on the current page. The selected links will be highlighted in green while other links to the detail pages will be highlighted in red.
Octoparse will click through each link captured in the "Loop Item", and open the detail page.
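As a rough illustration of what a "Loop Item" over detail links amounts to, the Python sketch below collects the detail-page links from a results page and then visits them one by one. The HTML snippet, the `detail-link` class name, and the `example.com` base URL are assumptions made up for this example; they are not Amazon's real markup, and this is not how Octoparse itself is implemented.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical markup: the class name "detail-link" is an assumption for this sketch.
SAMPLE_RESULTS_HTML = """
<div class="results">
  <a class="detail-link" href="/item/1">Headphones One</a>
  <a class="detail-link" href="/item/2">Headphones Two</a>
  <a class="sponsored" href="/ads/9">Ad</a>
  <a class="detail-link" href="/item/1">Headphones One (duplicate)</a>
</div>
"""


class DetailLinkCollector(HTMLParser):
    """Collects unique absolute URLs of <a> tags that carry the detail-link class."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        href = attr_map.get("href")
        if "detail-link" in classes and href:
            url = urljoin(self.base_url, href)   # resolve relative links
            if url not in self.links:            # skip duplicates, keep page order
                self.links.append(url)


collector = DetailLinkCollector("https://example.com/search")
collector.feed(SAMPLE_RESULTS_HTML)
detail_links = collector.links

# The "loop": each collected link would be opened in turn.
for url in detail_links:
    print("would open:", url)
```

Matching on a shared pattern (here, a class name) is what lets one rule select every item on the page while ignoring unrelated links.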
If you want to learn more about AJAX, here is a related tutorial you might need:
4. Extract data - to select the data for extraction
After you click "Loop click each element", Octoparse will open the detail page of the first product.
If the page content is already displayed but the page is still loading, you can click the "X" button at the right end of the navigation bar to stop loading.
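For readers who like to see the extraction step as code, here is a small Python sketch of pulling the four fields (title, brand, rating, price) out of a detail page. The sample HTML and its element ids (`title`, `brand`, `rating`, `price`) are hypothetical and chosen only for this illustration; a real product page uses different and more complex markup, and `FieldExtractor` is not part of any library.

```python
from html.parser import HTMLParser

# Hypothetical detail page: the element ids are assumptions for this sketch.
SAMPLE_DETAIL_HTML = """
<html><body>
  <span id="title"> Example Bluetooth Headphones </span>
  <a id="brand">ExampleBrand</a>
  <span id="rating">4.5 out of 5 stars</span>
  <span id="price">$29.99</span>
</body></html>
"""


class FieldExtractor(HTMLParser):
    """Reads the text of elements whose id is in wanted_ids.

    Assumes the wanted elements are flat (no nested tags of the same name).
    """

    def __init__(self, wanted_ids):
        super().__init__()
        self.wanted_ids = set(wanted_ids)
        self.fields = {}
        self._current = None   # id of the element currently being read
        self._tag = None       # tag name of that element

    def handle_starttag(self, tag, attrs):
        element_id = dict(attrs).get("id")
        if self._current is None and element_id in self.wanted_ids:
            self._current = element_id
            self._tag = tag
            self.fields[element_id] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.fields[self._current] += data

    def handle_endtag(self, tag):
        if self._current is not None and tag == self._tag:
            self.fields[self._current] = self.fields[self._current].strip()
            self._current = None


extractor = FieldExtractor(["title", "brand", "rating", "price"])
extractor.feed(SAMPLE_DETAIL_HTML)
record = extractor.fields
print(record)
```

Each selected field corresponds to one rule that locates an element and reads its text, which is roughly what selecting data on the page does in a point-and-click tool.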
5. Save and start extraction - to run the task and get data
Here is the sample output. You can see some blank fields in the column "Price". This is because these products are out of stock and thus they don't have the price information.
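The blank "Price" cells can be reproduced with a short Python sketch: when a field is missing from a scraped row, it is written as an empty cell rather than causing an error. The rows below are invented sample data, not real output from this task, and the field list simply mirrors the four fields used in this tutorial.

```python
import csv
import io

FIELDS = ["title", "brand", "rating", "price"]

# Hypothetical scraped rows; the second product has no price (e.g. out of stock).
scraped = [
    {"title": "Headphones One", "brand": "BrandA", "rating": "4.5", "price": "$29.99"},
    {"title": "Headphones Two", "brand": "BrandB", "rating": "4.1"},
]

buffer = io.StringIO()
# restval="" writes an empty cell for any field a row does not contain.
writer = csv.DictWriter(buffer, fieldnames=FIELDS, restval="")
writer.writeheader()
writer.writerows(scraped)
csv_text = buffer.getvalue()
print(csv_text)

# Listing the rows with a blank price makes them easy to review afterwards.
missing_price = [row["title"] for row in scraped if not row.get("price")]
print("rows without a price:", missing_price)
```

Checking which rows came back blank is a quick way to tell genuinely missing data (such as an out-of-stock product) apart from an extraction rule that did not match.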
By default, if Octoparse cannot find an element matching the defined pattern on the page, the field is left blank. However, Octoparse may fail to find the element even when it is visible on the website. If you encounter this problem, here is a related tutorial you might need:
Happy data hunting!