Scraping Data from Walmart.com

Saturday, December 31, 2016 1:43 AM

Octoparse enables you to scrape the best sellers from walmart.com.

In this web scraping tutorial we will scrape all product data under the “Pick it up TODAY” tab from walmart.com with Octoparse.

The website URL we will use is https://www.walmart.com.

The data fields include product name, star rating score, number of reviews, price and product image URL.

You can download the ready-made task (the .otd file) and start collecting the data right away. Alternatively, follow the steps below to build a scraping task that collects the best sellers' information from walmart.com.

Step 1. Set up basic information.

Click “Quick Start” ➜ Choose “New Task (Advanced Mode)” ➜ Complete the basic information.

Step 2. Enter the target URL in the built-in browser ➜ Click the “Go” icon to open the webpage.

(URL of the example: https://www.walmart.com)

Step 3. Click the “Pick it up TODAY” tab ➜ Choose “Click an item” ➜ Click “Save”.

Step 4. Right-click the “Next” pagination link ➜ Choose “Loop click in the element” to turn the pages.

(Note:

To extract information from every page of the search results, you need to add a page navigation action.

Right-click (rather than left-click) the “Next” pagination link so that the link is selected without being triggered.

If “Loop click in the element” does not appear, click the “Expand the selected area” button until it does.)
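Octoparse runs this page loop for you. To illustrate the stopping condition behind “Loop click in the element” outside Octoparse, here is a minimal Python sketch using only the standard html.parser module. It assumes the “Next” control is an A tag whose visible text is exactly “Next”; that is an assumption, and the live walmart.com markup may use a button or a different label. find_next returns the link target on a page that has one and None on the last page, which is where a pagination loop would stop. The class, function and sample markup are hypothetical.

```python
from __future__ import annotations

from html.parser import HTMLParser


class NextLinkFinder(HTMLParser):
    """Record the href of the first <a> element whose visible text equals `label`."""

    def __init__(self, label: str = "Next") -> None:
        super().__init__()
        self.label = label
        self.next_href: str | None = None
        self._href: str | None = None  # href of the <a> currently being read
        self._text: list[str] = []     # text collected inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            if self.next_href is None and "".join(self._text).strip() == self.label:
                self.next_href = self._href
            self._href = None


def find_next(html: str, label: str = "Next") -> str | None:
    """Return the "Next" link's URL, or None when the page has no such link (last page)."""
    finder = NextLinkFinder(label)
    finder.feed(html)
    finder.close()
    return finder.next_href


# Hypothetical markup: a middle page with a "Next" link and a last page without one.
middle_page = '<a href="/search?page=1">1</a> <a href="/search?page=3">Next</a>'
last_page = '<a href="/search?page=1">1</a> <span>Next</span>'
print(find_next(middle_page))  # /search?page=3
print(find_next(last_page))    # None
```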

Step 5. Move your cursor over the sections with a similar layout, from which you want to extract data.

Click the first item ➜ Click the “Expand the selected area” button until you see the A tag ➜ Click “Create a list of items” (sections with similar layout) ➜ Click “Add current item to the list”.

The first item has now been added to the list ➜ Click “Continue to edit the list”.

Click the second item ➜ Click the “Expand the selected area” button until you see the A tag ➜ Click “Add current item to the list” again. Now all the links with a similar layout are in the list ➜ Click “Finish Creating List” ➜ Click “loop” to process the list and extract the elements from each page.
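Octoparse builds this list of similar sections for you. As a rough illustration of the same idea outside Octoparse, the sketch below treats “similar layout” as “A tags that share a class attribute” and collects their links with Python's standard html.parser. The class name product-title-link, the ItemLinkCollector class and the sample markup are hypothetical, not Walmart's actual HTML.

```python
from __future__ import annotations

from html.parser import HTMLParser


class ItemLinkCollector(HTMLParser):
    """Collect the href of every <a> tag whose class list contains `item_class`."""

    def __init__(self, item_class: str) -> None:
        super().__init__()
        self.item_class = item_class
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # Bare attributes have the value None, so fall back to an empty string.
        classes = (attr_map.get("class") or "").split()
        href = attr_map.get("href")
        if self.item_class in classes and href:
            self.links.append(href)


# Hypothetical listing markup; the real walmart.com HTML will differ.
LISTING = """
<div class="result"><a class="product-title-link" href="/ip/1">Item 1</a></div>
<div class="result"><a class="product-title-link" href="/ip/2">Item 2</a></div>
<a class="footer-link" href="/help">Help</a>
"""

collector = ItemLinkCollector("product-title-link")
collector.feed(LISTING)
collector.close()
print(collector.links)  # ['/ip/1', '/ip/2']
```

Links that do not share the class (the footer link here) are left out, much as sections with a different layout stay out of the Octoparse list.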

Step 6. Extract the product detail information.

Choose an item that contains all the information you need, since sometimes the first item does not include all the content you want to extract.

Click the product name ➜ Select “Extract text” to extract it. Extract the other fields in the same way. All the selected content appears under Data Fields ➜ Click a “Field Name” to rename the field, then click “Save”.
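The same field-by-field extraction can be sketched with html.parser. This is only an illustration and not how Octoparse works internally: it assumes each field sits in an element marked by a class name, and the class names in TEXT_FIELDS and IMAGE_CLASS, the field keys and the sample item are hypothetical placeholders. Each text field keeps the first non-empty text found after its element opens, and the image URL comes from the img tag's src attribute.

```python
from __future__ import annotations

from html.parser import HTMLParser

# Hypothetical class names standing in for the elements you click in Octoparse.
TEXT_FIELDS = {
    "product-name": "name",
    "stars": "rating",
    "review-count": "reviews",
    "price": "price",
}
IMAGE_CLASS = "product-image"


class FieldExtractor(HTMLParser):
    """Build one record from the first matching element of each field."""

    def __init__(self) -> None:
        super().__init__()
        self.record: dict[str, str] = {}
        self._pending: str | None = None  # field waiting for its text

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if tag == "img" and IMAGE_CLASS in classes:
            # The image URL is an attribute, not text.
            self.record.setdefault("image_url", attr_map.get("src") or "")
            return
        for cls in classes:
            field = TEXT_FIELDS.get(cls)
            if field and field not in self.record:
                self._pending = field
                break

    def handle_data(self, data):
        text = data.strip()
        if self._pending and text:
            self.record[self._pending] = text
            self._pending = None


def extract_fields(html: str) -> dict[str, str]:
    parser = FieldExtractor()
    parser.feed(html)
    parser.close()
    return parser.record


SAMPLE_ITEM = """
<div class="item">
  <img class="product-image" src="https://example.com/img/1.jpg">
  <a class="product-name" href="/ip/1">Example Kettle</a>
  <span class="stars">4.5</span>
  <span class="review-count">(120)</span>
  <span class="price">$19.99</span>
</div>
"""

print(extract_fields(SAMPLE_ITEM))
```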

Step 7. Drag the “Loop Item” box in front of the “Click to paginate” action inside the “Cycle Pages” box, so that all the items on each page are extracted before the task moves on to the next page.
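The order set in this step can be pictured as a nested loop. In the minimal sketch below, run_workflow and its parameters are hypothetical: fetch, list_items, extract and find_next stand in for loading a page, the Loop Item list, the extraction step and the “Next” link. The inner loop finishes every item on the current page before the pagination step moves on, and max_pages guards against a “Next” link that never disappears.

```python
from __future__ import annotations

from typing import Any, Callable, Iterable, Optional


def run_workflow(
    start: str,
    fetch: Callable[[str], Any],
    list_items: Callable[[Any], Iterable[Any]],
    extract: Callable[[Any], dict],
    find_next: Callable[[Any], Optional[str]],
    max_pages: int = 50,
) -> list[dict]:
    """Extract every item on a page, then paginate; stop when there is no next page."""
    rows: list[dict] = []
    url: Optional[str] = start
    pages_seen = 0
    while url is not None and pages_seen < max_pages:
        page = fetch(url)
        pages_seen += 1
        for item in list_items(page):  # "Loop Item": finish the current page first
            rows.append(extract(item))
        url = find_next(page)          # "Click to paginate": only after the items
    return rows


# Hypothetical two-page site: each page is (item names, key of the next page or None).
FAKE_SITE = {
    "page1": (["Kettle", "Toaster"], "page2"),
    "page2": (["Blender"], None),
}

rows = run_workflow(
    "page1",
    fetch=FAKE_SITE.__getitem__,
    list_items=lambda page: page[0],
    extract=lambda name: {"name": name},
    find_next=lambda page: page[1],
)
print(rows)  # [{'name': 'Kettle'}, {'name': 'Toaster'}, {'name': 'Blender'}]
```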

Step 8. Check the workflow.

Check the workflow by clicking each action in turn, starting from the beginning of the workflow:

Go to the webpage ➜ Click Item (tick the “AJAX Load” checkbox and set a timeout of 10 seconds) ➜ the Cycle Pages box (the “Next” button has been selected correctly) ➜ the Loop Item box (all the items have been selected) ➜ Click Item ➜ Extract Data (all the data fields are extracted correctly) ➜ Click to Paginate.
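Roughly speaking, the “AJAX Load” setting gives content that loads asynchronously after the click up to 10 seconds to appear before the workflow continues. As a generic illustration (not Octoparse's implementation), the sketch below polls a caller-supplied condition until it becomes true or the timeout expires. wait_for and the simulated products_loaded check are hypothetical; in a real scraper the condition might test whether the product list is present.

```python
import time
from typing import Callable


def wait_for(condition: Callable[[], bool], timeout: float = 10.0, interval: float = 0.5) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds have passed.

    Returns True if the condition was met in time, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval, remaining))


# Simulated AJAX content that "arrives" on the third check (hypothetical stand-in).
checks = {"count": 0}


def products_loaded() -> bool:
    checks["count"] += 1
    return checks["count"] >= 3


print(wait_for(products_loaded, timeout=2.0, interval=0.01))  # True
print(wait_for(lambda: False, timeout=0.05, interval=0.01))   # False
```

The condition is checked before each sleep, so content that is already present returns immediately instead of always waiting the full timeout.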

Step 9. Click “Save” to save your configuration. Then click “Next” ➜ Click “Next” ➜ Click “Local Extraction” to run the task on your computer. Octoparse will automatically extract all the selected data.

Step 10. The extracted data is shown in the “Data Extracted” pane. Click the “Export” button to export the results to an Excel file, a database or another format, and save the file to your computer.
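If you later post-process the data outside Octoparse, a minimal sketch of writing the five fields from this tutorial to a CSV file that Excel can open might look like the following. The field keys, export_csv and the file name walmart_items.csv are hypothetical; the utf-8-sig encoding writes a byte-order mark, which helps Excel recognize the file as UTF-8.

```python
from __future__ import annotations

import csv
import tempfile
from pathlib import Path

# Column order for the five fields collected in this tutorial (hypothetical keys).
FIELDS = ["name", "rating", "reviews", "price", "image_url"]


def export_csv(rows: list[dict[str, str]], path: str | Path) -> None:
    """Write rows to a CSV file; utf-8-sig adds a BOM so Excel detects UTF-8."""
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)  # missing fields are written as empty cells


rows = [
    {"name": "Example Kettle", "rating": "4.5", "reviews": "(120)",
     "price": "$19.99", "image_url": "https://example.com/img/1.jpg"},
]
out_path = Path(tempfile.gettempdir()) / "walmart_items.csv"
export_csv(rows, out_path)
print(out_path.read_text(encoding="utf-8-sig").splitlines()[0])  # name,rating,reviews,price,image_url
```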

Author: The Octoparse Team