Scraping Data from Walmart.com
Saturday, December 31, 2016 1:43 AM
Octoparse enables you to scrape the best sellers from walmart.com.
In this web scraping tutorial, we will scrape all product data under the “Pick it up TODAY” tab on walmart.com with Octoparse.
The website URL we will use is https://www.walmart.com.
The data fields include product name, star rating score, number of reviews, price and product image URL.
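If you plan to process the results in code later, it can help to picture one scraped row as a small record. The sketch below is only an illustration of the five fields listed above; the class name, field names and sample values are made up for this example and are not produced by Octoparse.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class ProductRecord:
    """One scraped row. Field names are illustrative, not Octoparse's own."""
    product_name: str
    star_rating: Optional[float]  # None when a listing shows no rating
    review_count: Optional[int]   # None when a listing shows no reviews
    price: Optional[str]          # kept as displayed text, e.g. "$19.99"
    image_url: Optional[str]


# Made-up sample values, only to show the shape of a row.
sample = ProductRecord(
    product_name="Example Product",
    star_rating=4.5,
    review_count=120,
    price="$19.99",
    image_url="https://example.com/image.jpg",
)

# A plain dict is convenient for writing to CSV or a database later.
row = asdict(sample)
```

The optional fields reflect the note in Step 6 that some items do not show every piece of content.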
You can download the task (the OTD file) directly to begin collecting the data, or you can follow the steps below to build a scraping task that scrapes the best sellers' information from walmart.com.
(Download my extraction task for this tutorial HERE in case you need it.)
Step 1. Set up basic information.
Click “Quick Start” ➜ Choose "New Task (Advanced Mode)" ➜ Complete the basic information.
Step 2. Enter the target URL in the built-in browser. ➜ Click the “Go” icon to open the webpage.
(URL of the example: https://www.walmart.com)
Step 3. Click on the “Pick it up TODAY” tab. ➜ Choose “Click an item”. ➜ Click "Save".
Step 4. Right-click the “Next” pagination link. ➜ Choose “Loop click in the element” to turn the page.
If you want to extract information from every page of search results, you need to add a page navigation action.
(Right-click the “Next” pagination link rather than left-clicking it, so that you select the link without triggering it.
If “Loop click in the element” does not appear, click the “Expand the selection area” button until it does.)
Step 5. Move your cursor over the sections with a similar layout, from which you will extract data.
Click the first item ➜ Click the "Expand the selected area" button until you see the A tag ➜ Click “Create a list of items” (sections with similar layout) ➜ Click “Add current item to the list”.
The first item has now been added to the list. ➜ Click “Continue to edit the list”.
Click the second item ➜ Click the "Expand the selected area" button until you see the A tag ➜ Click “Add current item to the list” again. Now all the links with a similar layout are in the list. ➜ Click “Finish Creating List” ➜ Click “loop” to process the list and extract the elements from each page.
Step 6. Extract the product detail information.
Select an item that has all the information you need, since the first item sometimes does not include all the content you want to extract.
Extract the product name: click the product name ➜ Select “Extract text”. Other content can be extracted in the same way. All the extracted content will be listed in Data Fields. ➜ Click a “Field Name” to modify it. Then click “Save”.
Step 7. Drag the “Loop Item” box before the “Click to paginate” action inside the "Cycle Pages" box, so that the items on each page are extracted before the page is turned and data is collected from multiple pages.
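The reason for this ordering is easier to see as nested loops: the item loop has to finish on the current page before the pagination click runs. The following is a minimal sketch of that control flow in plain Python, not Octoparse's implementation; `cycle_pages` and `fake_pages` are hypothetical names, and an in-memory list stands in for the website.

```python
from typing import Dict, Iterator, List

# Hypothetical stand-in for the site: each inner list is one page of results.
Page = List[Dict[str, str]]


def cycle_pages(pages: List[Page]) -> Iterator[Dict[str, str]]:
    """Yield every item on every page, turning the page only after it is done."""
    page_index = 0
    while page_index < len(pages):      # plays the role of "Cycle Pages"
        for item in pages[page_index]:  # plays the role of "Loop Item"
            yield item                  # plays the role of "Extract Data"
        page_index += 1                 # plays the role of "Click to paginate"


fake_pages = [
    [{"product_name": "A"}, {"product_name": "B"}],
    [{"product_name": "C"}],
]
names = [item["product_name"] for item in cycle_pages(fake_pages)]
```

If the page turn ran before the inner loop, the items on the first page would be skipped, which is the situation this step avoids.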
Step 8. Check the workflow.
Now check the workflow by clicking each action in order, starting from the beginning of the workflow.
Go to the webpage ➜ Click Item (tick the "AJAX Load" checkbox and set a timeout of 10 seconds) ➜ The Cycle Pages box (the "Next" button has been selected correctly) ➜ The Loop Item box (all the items have been selected) ➜ Click Item ➜ Extract Data (all the data fields are extracted correctly) ➜ Click to Paginate.
Step 9. Click “Save” to save your configuration. Then click “Next” ➜ Click “Next” ➜ Click “Local Extraction” to run the task on your computer. Octoparse will automatically extract all the selected data.
Step 10. The extracted data will be shown in the "Data Extracted" pane. Click the "Export" button to export the results to an Excel file, a database or another format, and save the file to your computer.
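Once the file is saved, the text fields usually need some light cleaning before analysis. The sketch below assumes the export was saved as CSV and that the columns are named `product_name`, `price` and `review_count`; both the format and the column names are assumptions for illustration, so adjust them to match the field names you set in Step 6.

```python
import csv
import io
import re
from typing import Dict, List, Optional


def parse_price(text: str) -> Optional[float]:
    """Pull the first number out of text such as "$1,299.00"."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    return float(match.group().replace(",", "")) if match else None


def parse_count(text: str) -> Optional[int]:
    """Pull the first integer out of text such as "(120)" or "1,024 reviews"."""
    match = re.search(r"\d[\d,]*", text)
    return int(match.group().replace(",", "")) if match else None


def load_rows(csv_file) -> List[Dict[str, object]]:
    """Read exported rows and convert price and review count to numbers."""
    rows = []
    for raw in csv.DictReader(csv_file):
        rows.append({
            "product_name": raw["product_name"].strip(),
            "price": parse_price(raw["price"]),
            "review_count": parse_count(raw["review_count"]),
        })
    return rows


# Made-up sample data standing in for an exported file.
sample_csv = io.StringIO(
    "product_name,price,review_count\n"
    "Example Product,$19.99,(120)\n"
    'Another Product,"$1,299.00","1,024 reviews"\n'
)
cleaned = load_rows(sample_csv)
```

For a real export, replace `sample_csv` with a file opened via `open(path, newline="", encoding="utf-8")`.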
Author: The Octoparse Team