How to Extract Product Information from Amazon

Monday, March 28, 2016 5:06 AM

Before you get started, click HERE to download the .otd file, which stores the extraction rule for this task.

 

Step 1. Download and install Octoparse. Register a new account at www.octoparse.com, or click the “Sign up” option on the Login interface.

  

Step 2. Click “start” to build a new task, or hit the “Quick start” button in the Navigation Panel to create a new task. (Here we use Advanced Mode.)

 

Step 3. Complete the basic information. ➜ Click “Next”.

 

Step 4. Design the workflow to configure the extraction rule. If something goes wrong, you can check your configured rule in the Workflow Designer.

 

Step 6. Create a list of links to all the subcategories. Wait until the page has loaded, then click the first subcategory. ➜ Choose “create a list of items”.

 

 

Select “Add current item to the list” ➜ “Continue to edit the list” ➜ Click the second subcategory.

 

 

Select “Add current item to the list” again.

 

When you get all the subcategory links, click “Finish Creating List”. ➜ Select “Loop” to process the list.
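
For readers who think in code, the list-then-loop idea above can be pictured roughly as follows. This is only a conceptual sketch, not how Octoparse works internally: the sample markup, the "subcategory" class name, and the helper names are all hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags that carry a given class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and self.css_class in classes and attrs.get("href"):
            self.links.append(attrs["href"])


# Hypothetical markup standing in for a category page.
SAMPLE_PAGE = """
<ul>
  <li><a class="subcategory" href="/cat/laptops">Laptops</a></li>
  <li><a class="subcategory" href="/cat/tablets">Tablets</a></li>
  <li><a class="other" href="/help">Help</a></li>
</ul>
"""


def collect_subcategory_links(html, css_class="subcategory"):
    """Build the list of items: every link that matches the chosen pattern."""
    parser = LinkCollector(css_class)
    parser.feed(html)
    return parser.links


# "Loop": process each collected link in turn.
subcategory_links = collect_subcategory_links(SAMPLE_PAGE)
for link in subcategory_links:
    print("would open:", link)
```

Clicking the first and second subcategory in the tool plays the role of picking the pattern here: two examples are enough for the remaining similar links to be matched.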

 

Step 7. Now you can see that it automatically enters the first category page.

 

Click “Next Page” ➜ “loop click next page” to create a loop action that processes all the pages. The pagination action has now been added to the extraction rule.
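
As a rough mental model, a "loop click next page" action behaves like a loop that keeps following the next-page link until there is none. The sketch below assumes a made-up in-memory site (the PAGES dictionary) and a hypothetical max_pages safety limit; it does not describe Octoparse's actual implementation.

```python
# Hypothetical site: each page lists some items and names the next page.
PAGES = {
    "page1": {"items": ["A", "B"], "next": "page2"},
    "page2": {"items": ["C"], "next": "page3"},
    "page3": {"items": ["D", "E"], "next": None},
}


def paginate(start, pages, max_pages=100):
    """Yield (key, page) pairs until there is no next page.

    max_pages is a safety limit so a broken "next" link cannot loop forever.
    """
    key = start
    seen = 0
    while key is not None and seen < max_pages:
        page = pages[key]
        yield key, page
        key = page["next"]  # the "click next page" step
        seen += 1


visited = [key for key, _ in paginate("page1", PAGES)]
print(visited)
```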

 

Then go back to the first product section. To capture the information inside a product section, you have to click its detail link to open the detail page. ➜ Choose the detail link. ➜ Click the first product title and choose "create a list of items" again.

 

Then click “Add current item to the list” ➜ “Continue to edit the list”.

 

Then click the second product title. ➜ Click “Add current item to the list” ➜ “Finish Creating List”.

 

As can be seen, all the detail links on the first page are now in the list. Click “Loop” to process the list.

 

Step 8. Now you’re on the detail page, where you can extract any information you need. Click on the product title to extract it.

 

Click “Extract Text”.

 

Click on the price. ➜ Then click “Extract Text”. The product title and price now appear in the Customize Current Action box.

 

You can change the field names right here.

 

The same goes for any other information. Select whatever you want to extract!
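
Conceptually, each "Extract Text" choice maps one element on the detail page to a named field. The sketch below illustrates that idea on invented markup; the element ids ("title", "price"), the field names, and the helper names are assumptions made for the example, not Amazon's real page structure or Octoparse's internals.

```python
from html.parser import HTMLParser


class TextById(HTMLParser):
    """Collects the text of elements whose id appears in `wanted`.

    Simplification: text inside nested child tags is not followed.
    """

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted  # maps element id -> field name
        self.current = None   # field currently being filled, if any
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        element_id = dict(attrs).get("id")
        if element_id in self.wanted:
            self.current = self.wanted[element_id]
            self.fields[self.current] = ""

    def handle_endtag(self, tag):
        self.current = None

    def handle_data(self, data):
        if self.current is not None:
            self.fields[self.current] += data


# Hypothetical detail page.
DETAIL_PAGE = '<h1 id="title"> Example Product A </h1><span id="price">$19.99</span>'


def extract_fields(html, wanted):
    """Return a {field name: text} record for one detail page."""
    parser = TextById(wanted)
    parser.feed(html)
    parser.close()
    return {name: text.strip() for name, text in parser.fields.items()}


# Renaming a field only changes the key, not what is extracted.
record = extract_fields(DETAIL_PAGE, {"title": "product_title", "price": "price"})
print(record)
```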

 

Step 9. Now look at the Workflow Designer.

 

Drag the second “Loop Item” so that it sits before the “Click to paginate” action.
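
One way to picture why the order matters: if the pagination click ran before the items on the current page were processed, the first page's products would never be visited. The following sketch uses a simplified, assumed model of the workflow (a list of pages, each holding made-up detail links) to compare the two orders; the real tool's behavior may differ in detail.

```python
# Hypothetical result pages, each holding the detail links found on it.
PAGES = [
    ["/dp/1", "/dp/2"],  # page 1
    ["/dp/3"],           # page 2
]


def items_then_paginate(pages):
    """Loop Item first, then Click to paginate (the order set in this step)."""
    visited = []
    index = 0
    while True:
        visited.extend(pages[index])  # Loop Item: open every detail link
        if index + 1 >= len(pages):   # no next page left
            break
        index += 1                    # Click to paginate
    return visited


def paginate_then_items(pages):
    """Click to paginate first: page 1 is left before its items are read."""
    visited = []
    index = 0
    while index + 1 < len(pages):
        index += 1                    # Click to paginate
        visited.extend(pages[index])  # Loop Item runs on the new page only
    return visited


print(items_then_paginate(PAGES))
print(paginate_then_items(PAGES))
```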

 

Step 10. Now we are done configuring the extraction rule! Click “Next” to process the configured rule. If you don’t need images, you can choose not to load them to speed up the extraction.

 

Now the task is complete! Choose “Local extraction” to run the task on your computer.

 

The extracted data will be shown in the "Data Extracted" pane. Click the button to export the results to an Excel file, a database, or another format, and save the file to your computer.
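
If you later want to work with the exported data in your own scripts, a CSV export is essentially a header row of field names followed by one row per record. The sketch below builds such a file in memory with Python's standard csv module; the sample rows and field names are invented for illustration and do not come from a real extraction.

```python
import csv
import io

# Invented sample records, shaped like title/price fields from the task.
rows = [
    {"title": "Example Product A", "price": "$19.99"},
    {"title": "Example Product B", "price": "$5.49"},
]


def rows_to_csv(rows, fieldnames):
    """Serialize records to CSV text: a header line, then one line per row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(rows, ["title", "price"])
print(csv_text)
```

To save the result to disk instead, the same writer can be pointed at a file opened with newline="" rather than an in-memory buffer.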

 

 

Happy Data Hunting!

 

 

 

 

 

 

Author: The Octoparse Team

 

 

 
