How to Extract Product Information from Amazon
Monday, March 28, 2016 5:06 AM
Click HERE to download the .otd file before you get started. The extraction rule of this task is stored in this .otd file.
Step 1. Download and install Octoparse. Register a new account at www.octoparse.com, or click the “Sign up” option on the Login interface.
Step 2. Click “start” to build a new task, or hit the “Quick start” button in the Navigation Panel to create a new task. (Here we use Advanced Mode.)
Step 3. Complete basic information. ➜ Click “Next”.
Step 4. Design the workflow to configure the extraction rule. If something goes wrong, you can check your configured rule here in the Workflow Designer.
Step 6. Create a list of links to all the subcategories. Wait until the page has loaded, then click the first subcategory. ➜ Choose “create a list of items”.
Select “Add current item to the list” ➜ “Continue to edit the list” ➜ Click the second subcategory.
Select “Add current item to the list” again.
When you get all the subcategory links, click “Finish Creating List”. ➜ Select “Loop” to process the list.
Step 7. Now you can see that it automatically enters the first category page.
Click “Next Page” ➜ “loop click next page” to create a loop action that processes all the result pages. The pagination action has now been added to the extraction rule.
Then go back to the first product section. To capture the information inside a product section, you have to click its detail link to open the detail page. ➜ Choose the detail link. ➜ Click the first product title and choose "create a list of items" again.
Then click “Add current item to the list” ➜ “Continue to edit the list”.
Then click the second product title. ➜ Click “Add current item to the list” ➜ “Finish Creating List”
As you can see, all the detail links on the first page are now listed. Click “loop” to process the list.
Step 8. Now you’re on the detail page and can extract any information you need. Click on the product title to extract it.
Click “Extract Text”.
Click on the price to extract it. ➜ Then click “Extract Text”. The product title and price now appear in the Customize Current Action box.
You can change your field name right here.
The same goes for any other information. Select whatever you want to extract!
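For readers who want to picture what "Extract Text" with a custom field name amounts to, here is a minimal Python sketch. It is not how Octoparse works internally; the sample markup, the element ids ("title", "price") and the helper names are all hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser

# Hypothetical detail-page markup; real product pages are structured differently.
SAMPLE_PAGE = """
<html><body>
  <h1 id="title"> Example Widget </h1>
  <span id="price">$19.99</span>
</body></html>
"""


class FieldExtractor(HTMLParser):
    """Collect the text of elements whose id appears in field_ids."""

    def __init__(self, field_ids):
        super().__init__()
        self.field_ids = field_ids   # maps element id -> output field name
        self.current_field = None    # field currently being captured, if any
        self.record = {}

    def handle_starttag(self, tag, attrs):
        element_id = dict(attrs).get("id")
        if element_id in self.field_ids:
            self.current_field = self.field_ids[element_id]
            self.record[self.current_field] = ""

    def handle_data(self, data):
        if self.current_field is not None:
            self.record[self.current_field] += data

    def handle_endtag(self, tag):
        # Simplification: assumes the captured elements contain no nested tags.
        self.current_field = None


def extract_fields(html, field_ids):
    """Return one record with whitespace-trimmed text for each requested field."""
    parser = FieldExtractor(field_ids)
    parser.feed(html)
    return {name: text.strip() for name, text in parser.record.items()}


# The dictionary values play the role of the renamed field names.
record = extract_fields(SAMPLE_PAGE, {"title": "Product Title", "price": "Price"})
print(record)
```

Changing a value in the mapping passed to extract_fields corresponds to renaming a field in the Customize Current Action box.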
Step 9. Now look at the Workflow Designer.
Drag the second “Loop Item” so that it sits before the “Click to paginate” action.
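The tutorial does not say why the second “Loop Item” has to come before “Click to paginate”. One plausible reading is that every product on the current page should be processed before the workflow moves to the next page. The Python sketch below models that ordering with nested loops; the SITE data and the run_workflow function are hypothetical stand-ins, not part of Octoparse.

```python
# Hypothetical in-memory stand-in for a site: each subcategory maps to a list
# of result pages, and each page lists the products found on it.
SITE = {
    "Subcategory A": [["A1", "A2"], ["A3"]],
    "Subcategory B": [["B1"]],
}


def run_workflow(site):
    """Visit every subcategory, every result page, and every product on it."""
    visited = []
    for subcategory, pages in site.items():       # first "Loop Item"
        page_index = 0
        while True:                                # pagination loop
            for product in pages[page_index]:      # second "Loop Item"
                # This is where the detail page would be opened and extracted.
                visited.append((subcategory, product))
            # "Click to paginate" runs after the product loop, so the current
            # page is fully processed before moving on.
            if page_index + 1 >= len(pages):
                break
            page_index += 1
    return visited


rows = run_workflow(SITE)
print(rows)
```

If the pagination step ran before the inner loop instead, the products on the first page of each subcategory would be skipped in this model.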
Step 10. Now we are done configuring the extraction rule! Click “Next” to process the configured rule. If you do not need images, you can choose not to load them to speed up the extraction.
Now the task is complete! Choose “Local extraction” to run the task on your computer.
The extracted data will be shown in the "Data Extracted" pane. Click the button to export the results to an Excel file, a database, or another format, and save the file to your computer.
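As a rough picture of what a tabular export contains, the following Python sketch writes a few records to CSV text, one column per field name. The records are placeholder values and the to_csv helper is hypothetical; the actual export is done by the button in Octoparse, not by this code.

```python
import csv
import io

# Hypothetical extracted records; the values are placeholders.
records = [
    {"Product Title": "Example Widget", "Price": "$19.99"},
    {"Product Title": "Example Gadget", "Price": "$5.00"},
]


def to_csv(rows, fieldnames):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(records, ["Product Title", "Price"])
print(csv_text)
```

A file with this content opens directly in Excel, with the field names as the first row.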
Happy Data Hunting!
Author: The Octoparse Team
If this video tutorial is not available to you, you can click here to see the corresponding graphic tutorial.