Step by Step Guide to Scraping & Cleansing eBay Product Data

Today, we’ll show you step by step for how to build your own no-code eBay web scraper to collect eBay product data in bulk. And then we’ll conduct a quick research using the extracted data (price, sold number, product specification, availabilities and more) to analyze if one kind of product sells better than the other kinds of product. If you are in the eCommerce business or are looking to run a similar analysis, definitely read on.

To makes things easier, we will use Octoparse for data collection and QuickTable for data cleansing & analysis. Both of these tools are built for non-coders and are quite intuitive to work with.

Why scrape eBay product data

As a globally popular C2C e-commerce platform, there are more than 1.7 billion product listings on eBay, making it an irreplaceable data source for all kinds of business analysis. It enables you to gather a variety of data, such as product categories, price specifications, prices, availabilities, top sellers, sales, etc. This data can be valuable for finding out the perfect market niche for your business, setting the right prices for the products you’re selling, monitoring how your customers are liking your products, sneaking around your competitors, and much more.

Define a goal for the study

The purpose of this study is to find out if there is a relationship between the prices, sold numbers and scales of the toy vehicles that are selling on eBay. For our source data, we’ll run a quick search on eBay using the keyword “toy car”. What we’ve got in the search results is the product data we need for the research, especially, this is the URL to the search results: https://www.ebay.com/sch/i.html?_from=R40&_nkw=toy+car&_sacat=0&_pgn=35

Collect data – Scrape product data using Octoparse

First, we’ll use Octoparse to scrape the product data from the search results pages.

Go ahead and download Octoparse on your device. As soon as you have it installed, sign up for a free account. The free plan is good for most web scraping scenarios unless you are on for something of a larger scale.

Step 1: Create a new task with auto-detection

Enter the target URL into the search bar, and click the “Start” button. Octoparse will start loading the page right away.

Once the page finishes loading, click “Auto-detect webpage data”. Octoparse will scan the page and try its best to “guess” what data you’d like to extract. Learn more about auto-detection.

Tips
Octoparse also provides pre-built templates for eBay. You can find it on the home page or by searching for “eBay” in the search bar. With the template, you’d only need to enter a few required parameters to start collecting data from eBay right away.

Step 2: Create and modify the workflow

Click “Create the workflow” after auto-detection is done. A scraping workflow will be auto-generated on the right-hand side. You may click on each step to check if it works properly, and add or delete any unwanted steps. The data to be extracted is highlighted in red inside the built-in browser and is also available in the preview section below. You can edit the data fields directly in the preview section whenever needed.

Tips
If Octoparse fails to detect the desired data on the page, you can turn off the “auto-detect” feature and build a workflow manually by following the tips on the Tips panel. Check out this step-by-step tutorial for how to build an eBay scraper from scratch.

Step 3: Scrape and export the data

Once the workflow is tested working, click on the Run button. You can choose to run your scraping task in the cloud if you have a premium subscription or on your local device if you don’t have one. It can take a while to scrape the data depending on the exact amount of data you’d like to fetch. You can wait till the task completes or stop the run when you have just enough data extracted. The data extracted can be exported and downloaded as CSV, Excel, or whatever formats work for you.

Clean data – turn data into usable formats

Now that we have the source data ready, it’s time to dig into the analysis part of it. We will use Quicktable to clean the dataset and create a graphic illustration of how these data points relate to one another.

Please sign up for a free account on QuickTable first.

Step 1: Upload the CSV file and create a new recipe

Launch QuickTable and create a new project named “Toy Car Analysis on eBay”. Upload the scraped CSV file into the project as a new dataset and open it. Click the Save Recipe button to create a new recipe.

Step 2: Clean and rename the data columns

You’d want to get rid of any unwanted data from the dataset and keep only the columns that are relevant to the research. For the sake of this study, we’ll keep the columns “Title”, “Price”, and “sitem__hotness”. Rename the columns as needed.

Step 3: Extract the data into usable formats

3.1 Extract price information
The “Price” data we have now is currently in string format so you’d want to convert the data into numerical values before doing any sort of calculations with the data. We’ll keep only the lower price for each product. Select the Price column, click the Text button, then select Substring->Extract numbers. A new column “Price0” will be created to include only the lower price of the price range. You can delete the original data column and rename Price0 to Min Price.

3.2 Extract sold number with a formula
The data we now have for “sitem__hotness” is quite messy. It can be something like “Last one” or any number followed by the word “sold”. Apparently, we are only interested in the numerical part of it.

To extract sold number information from the column “sitem__hotness”, we’ll use a formula. Click on the Formula button and the formula panel will show up on the right side. Enter a name for the new column and then input the below formula: IF(CONTAINS(`sitem__hotness`, “sold”), `sitem__hotness`, “”)

3.3 Extract scale information with regular expression
Notice there isn’t a single column for the scale of the toys but this information is very often revealed in the Title column. So in this case, we’ll need to clean the Title column using a regular expression that extracts the scale info from the product titles. Click “Text->Substring->Regular extraction”, then select column “Title” in the Substring panel, enter the regular expression, 1[:/][0-9]{1,3}, then enter the new column name “Scale”. The scale information will then be extracted to the new column.

Still, the extracted scale info is displayed with either “/” or “:.” and can be tough to work with. To keep the format consistent, we’ll replace one of them with the other one. Click Replace, select column Scale, and enter “/” as the Original value and “:” as the New value, then click the “Save” button.

3.4 Calculate the average price and sold numbers
Last but not least, we’ll group the products by scale, and find out what the average prices and sold numbers are for toys of various scales. To do that, we need to add a “Group by” step. Click the “Group by” button, select the column Scale in the list box “Group by”, and Min Price and Average in the list box “Column calculations”. Click Save and you’ve got the average prices and sold numbers of the product of different scales.

With QuickTable, every step of the data cleaning process will be saved and kept in the Recipe panel. You can review the steps or reuse the formula whenever needed.

Analyze data – visualize data using charts and graphs

Now we have the data cleaned and ready, the easiest way to see if there’s any relationship between the toy scales, the prices and the sold number is to create a chart. Click Chart, and define X and Y as shown in the following image. Click the “Save” button and you’ll get a chart created right away.

Looking at this chart, it is clear that the 1:32 toy cars are the best-selling ones on eBay, and the 1:18 toy cars are more expensive than toys of any other scale. In conclusion, our study does show that out of all the toy vehicles that are selling on eBay, those of 1:32 scale sell the best. So if you are a seller who wants to boost sales volume, you’d probably want to sell more of those. On the contrary, if you are aiming for higher prices items, you’d probably want to sell more 1:18 toy cars.

Wrap up

This study is only an oversimplified example of how you can conduct research of interest using product data on eBay. There’s no doubt about eBay being one of the largest data goldmines for market research, competitor analysis, trend research and much more. Tools like Octoparse and QuickTable make it possible for anyone to compile and analyze the data without the need to code. Of course, the idea is not restricted to eBay but can be extended to any other eCommerce stores like Amazon, AliExpress, and others.