We have created a series of web scraping tutorials to help you get started quickly with our latest version, Octoparse 8. By the end of the series, you will be able to build a crawler from scratch and pull data from the websites you need.
In this lesson, we will go through how to scrape eCommerce data using the auto-detect algorithm in Octoparse 8. No coding is required, and you can follow along to build the same crawler for practice.
Many websites share similar layouts. For example, an eBay listing page contains many items nested in a list.
Octoparse’s brand new auto-detect algorithm is specially designed to scrape these kinds of pages. It automatically detects listing data (including text elements and links), “Next page” buttons, and “load more” buttons, scrolls down the page, and then generates the scraping task for you.
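You do not need any code to follow this lesson. For readers curious about what detecting "items nested in a list" can mean in practice, here is a minimal sketch of one common heuristic: find the element whose direct children most often repeat the same tag and class. This is not Octoparse's actual algorithm, which is not described here. The class name, the sample HTML, and the function name are all hypothetical, and the sketch assumes well-formed HTML.

```python
from collections import Counter
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"br", "img", "input", "hr", "meta", "link"}


class RepeatedChildFinder(HTMLParser):
    """For every element, count how often each (tag, class) child repeats."""

    def __init__(self):
        super().__init__()
        self.stack = []  # one (path, Counter of child signatures) per open element
        self.best = (0, None, None)  # (repeat count, parent path, child signature)

    def handle_starttag(self, tag, attrs):
        signature = (tag, dict(attrs).get("class") or "")
        if self.stack:
            self.stack[-1][1][signature] += 1
        if tag not in VOID_TAGS:
            parent_path = self.stack[-1][0] if self.stack else ""
            self.stack.append((f"{parent_path}/{tag}", Counter()))

    def handle_endtag(self, tag):
        # Ignore stray or void end tags; the sketch assumes well-formed HTML.
        if not self.stack or not self.stack[-1][0].endswith("/" + tag):
            return
        path, children = self.stack.pop()
        if children:
            signature, count = children.most_common(1)[0]
            if count > self.best[0]:
                self.best = (count, path, signature)


# Hypothetical sample page: a header plus a list of three similar items.
SAMPLE = """
<html><body>
  <div class="header"><a href="/">Home</a></div>
  <ul class="results">
    <li class="item"><a href="/p/1">Item one</a><span>$10</span></li>
    <li class="item"><a href="/p/2">Item two</a><span>$12</span></li>
    <li class="item"><a href="/p/3">Item three</a><span>$15</span></li>
  </ul>
</body></html>
"""


def find_list_candidate(html):
    """Return (repeat count, parent path, child signature) of the best match."""
    finder = RepeatedChildFinder()
    finder.feed(html)
    finder.close()
    return finder.best


print(find_list_candidate(SAMPLE))
```

On the sample page, the `ul` element wins because it has three children with the same tag and class, while every other element has at most one child of any given kind.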
Step 1: Create a new task
Enter the example URL into the search box. Click “Start” to create a new task.
Step 2: Get data via auto-detect
Octoparse will load the webpage URL in the built-in browser and start the auto-detect process. Wait until the process is complete and more information appears on the “Tips” panel.
Step 3: Check the data
Once the auto-detection is complete, follow the instructions provided on “Tips” and check your data in the preview section. You can rename the data fields or remove those that are not needed. The detected data will also be highlighted on the webpage for you.
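Renaming and removing fields happens in the Octoparse preview section, with no code involved. If you later post-process exported data yourself, the same cleanup can be sketched in a few lines. The field names below (`Field1`, `Field2`, `Field3`) and the helper `tidy_rows` are hypothetical and only illustrate the idea.

```python
def tidy_rows(rows, rename=None, drop=()):
    """Return new rows with some fields renamed and others removed."""
    rename = rename or {}
    return [
        {rename.get(key, key): value for key, value in row.items() if key not in drop}
        for row in rows
    ]


# Hypothetical extracted rows with generic field names.
rows = [
    {"Field1": "Item one", "Field2": "$10", "Field3": "ad"},
    {"Field1": "Item two", "Field2": "$12", "Field3": "ad"},
]

cleaned = tidy_rows(rows, rename={"Field1": "title", "Field2": "price"}, drop={"Field3"})
print(cleaned)
```

The helper builds new dictionaries rather than editing the originals, so the raw rows stay available if you change your mind about a field.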
Step 4: Confirm your options
Now, go to “Tips” and check your options. Based on the type of data detected, a number of options are provided for you to choose from. In this example, list data is detected, so you are given the following options:
Option 1: Scrape the data in the list
This option is selected by default because scraping the list is most likely what you need.
Option 2: Click the “Next” button to capture multiple pages
Octoparse has detected a “Next” button on the page. Check this option if you want Octoparse to click the “Next” button to scrape data from more pages.
To find out if the button detected is the correct one, click “Check” and see if it gets highlighted on the webpage. If you need to re-select the “Next” button, click “Edit” and follow the instructions on “Tips”.
Option 3: Click on the “links” to capture data on the page that follows
Now Octoparse is asking if you want to click on the links detected and scrape more information from the detail pages. Check this option if this is what you need.
To confirm that the links are the ones you’d like to click through, click “Check” to have them highlighted on the webpage.
In this case, we only want to scrape the list information across all pages, so we’ll go ahead and check the first and second options.
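Choosing the first and second options amounts to a simple loop: scrape the list on the current page, click “Next”, and repeat until there is no “Next” button left. The sketch below shows that loop in code. It is only an illustration, not the workflow Octoparse generates. The pages are a hypothetical in-memory dictionary instead of real web requests, and regular expressions are used only because the sample markup is fixed and tiny.

```python
import re

# Hypothetical three-page listing, keyed by URL, standing in for a real site.
PAGES = {
    "/list?page=1": '<li class="item">A</li><li class="item">B</li>'
                    '<a class="next" href="/list?page=2">Next</a>',
    "/list?page=2": '<li class="item">C</li>'
                    '<a class="next" href="/list?page=3">Next</a>',
    "/list?page=3": '<li class="item">D</li>',
}

ITEM_RE = re.compile(r'<li class="item">(.*?)</li>')
NEXT_RE = re.compile(r'<a class="next" href="([^"]+)">')


def scrape_all_pages(start_url, fetch, max_pages=50):
    """Collect list items page by page, following the "Next" link each time."""
    items, url, seen = [], start_url, set()
    # Stop when there is no next link, a page repeats, or the page cap is hit.
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        html = fetch(url)
        items.extend(ITEM_RE.findall(html))      # option 1: scrape the list
        match = NEXT_RE.search(html)             # option 2: look for "Next"
        url = match.group(1) if match else None
    return items


print(scrape_all_pages("/list?page=1", PAGES.__getitem__))
```

The `seen` set and the `max_pages` cap guard against sites whose “Next” link loops back to an earlier page.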
Step 5: Save task settings
Octoparse will generate a workflow automatically based on the data detected and the saved settings. You can choose to run the task now or edit the workflow manually.
If everything looks good, save the task and run it to get your data.
Don’t forget to practice with the HelloWorld test site. If you encounter any difficulties, feel free to submit a ticket or email us at firstname.lastname@example.org. To learn how to optimize your task, check out lesson 2.