Finding reliable suppliers is an ongoing challenge for many eCommerce businesses. If you face this challenge too, Alibaba, one of the largest B2B marketplaces connecting buyers and sellers, may have the answer.
In this article, you’ll learn how to scrape data from Alibaba and use it to find the best supplier. To make the process easier to follow, we’ll use “toy car” as an example and show you how to gain insight into toy car suppliers’ verified years, average product ratings, and total reviews.
Why scrape data from Alibaba
Over the past 20 years, Alibaba has attracted more than 26 million active buyers worldwide, along with suppliers from more than 190 countries and regions, across more than 5,900 product categories. This makes it a massive database of suppliers.
Unlike other retail platforms, Alibaba gives buyers more information about suppliers. For instance, Alibaba labels some suppliers as Verified Suppliers. Each Verified Supplier has passed an onsite verification and holds a business license, and every product they list on Alibaba shows how many years they have been verified. With this information, it is easier to identify suppliers that are likely to be more reliable than others.
Use Octoparse for data scraping
We’ll use Octoparse, a web scraper built for non-coders, to extract data from webpages easily. If you have not used it before, download it first. Once it is installed on your device, sign up for a free account and log in.
Step 1: Enter the target URL to create a new task
Launch Octoparse, log in, paste the target URL into the search bar, and click the “Start” button. The page will load in Octoparse’s built-in browser within a few seconds.
Target URL: https://www.alibaba.com/trade/search?fsb=y&IndexArea=product_en&CatId=&tab=all&SearchText=toy+car&viewtype=
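To scrape suppliers of a different product, you would normally only need to change the SearchText parameter of the target URL. The Python sketch below rebuilds the URL from its query parameters; the helper name build_search_url is hypothetical, and it assumes Alibaba keeps accepting this URL format.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.alibaba.com/trade/search"


def build_search_url(keyword: str) -> str:
    """Build an Alibaba product-search URL for a keyword (hypothetical helper).

    The query parameters mirror the toy car target URL; only SearchText changes.
    """
    params = {
        "fsb": "y",
        "IndexArea": "product_en",
        "CatId": "",
        "tab": "all",
        "SearchText": keyword,
        "viewtype": "",
    }
    # urlencode quotes values with quote_plus, so spaces become "+"
    return f"{BASE_URL}?{urlencode(params)}"
```

Calling build_search_url("toy car") reproduces the target URL above, so you can swap in another keyword and paste the result into Octoparse.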
Step 2: Create and customize the workflow
To have Octoparse scan the page for data you might be interested in, select “Auto-detect webpage data.” Octoparse will highlight the data it has detected and can extract. If the highlighted data is not what you want, select “Switch auto-detect results” to see what else Octoparse has detected.
Once you’ve selected all the data fields you need, click “Create workflow” on the Tips panel. A workflow will then appear on the right-hand side. Click through each step to check that it runs properly.
Step 3: Scrape and export data
When the task is ready, click “Run” to start it. You can choose to run the task on your device or in the Cloud. Running on your device is convenient for quick runs, while Cloud runs take place on Octoparse’s servers and will not interfere with anything else you are doing on your machine. Cloud runs are also usually faster, since several cloud servers can work on the task.
After the scraping is complete, export the data to a CSV file. We now have a structured data file ready for cleaning and analysis.
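If you also want to inspect the exported file outside QuickTable, a few lines of Python are enough to load it. This is a minimal sketch using only the standard library; the function names are hypothetical, and the file name in the usage note is a placeholder.

```python
import csv
import io
from pathlib import Path


def parse_csv_text(text: str) -> list:
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def load_rows(path: str) -> list:
    """Read an exported CSV file into a list of dicts.

    "utf-8-sig" drops a leading byte-order mark if the export contains one.
    """
    return parse_csv_text(Path(path).read_text(encoding="utf-8-sig"))
```

For example, rows = load_rows("toy_car.csv") followed by print(list(rows[0])) lists the column headers, a quick way to confirm that fields such as Title_URL and fc3_URL made it into the export.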
Use QuickTable for data cleansing and analysis
Now we can use QuickTable to clean and analyze the data. QuickTable is a no-code data modelling and transformation tool that lets anyone clean, transform, enrich, and analyze data with drag-and-drop operations. You can learn more about it on the QuickTable website and sign up for a free account.
Clean data in three steps
Step 1: Upload the data file and create a new recipe
Log in to QuickTable, create a new project named “Toy Car Analysis”, and upload the scraped CSV file into the project as a new dataset. Then open the dataset and click “Save Recipe”.
Step 2: Keep and rename the columns
The dataset contains more columns than we need, so keep only “Title_URL”, “Year”, “fc3_URL”, “score”, and “View”. Then rename them “product link”, “supplier-verified years”, “supplier profile link”, “rating”, and “review number”, respectively, for later use.
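For readers who prefer code, the keep-and-rename step can be mirrored in plain Python. This sketch assumes each row is a dict keyed by the original column headers (as produced by csv.DictReader); the names COLUMN_MAP and keep_and_rename are hypothetical, and this is not how QuickTable works internally.

```python
# Original scraped column name -> new name used in this article
COLUMN_MAP = {
    "Title_URL": "product link",
    "Year": "supplier-verified years",
    "fc3_URL": "supplier profile link",
    "score": "rating",
    "View": "review number",
}


def keep_and_rename(rows: list) -> list:
    """Keep only the columns in COLUMN_MAP and rename them.

    Any other column is dropped; a missing column becomes an empty string.
    """
    return [{new: row.get(old, "") for old, new in COLUMN_MAP.items()} for row in rows]
```

Building new dicts rather than deleting keys leaves the original rows untouched, so you can compare before and after.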
Step 3: Extract the numeric values of the columns “supplier-verified years” and “review number”
The columns “supplier-verified years” and “review number” are currently in string format. Because we want to calculate each supplier’s maximum verified years and total number of reviews, we first need to extract the numbers from these strings as numerical values.
To extract numbers from these columns, click the “Format” button and select Substring->Extract numbers. Then you can remove the original string columns and rename the new numerical columns for further processing.
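A rough Python equivalent of the Extract numbers step is sketched below. It assumes that the first number in each string is the value you want and removes thousands separators before matching; QuickTable’s own Extract numbers may behave differently, and the function names are hypothetical.

```python
import re
from typing import Optional

# An integer, optionally followed by a decimal part
NUMBER_PATTERN = re.compile(r"\d+(?:\.\d+)?")


def extract_number(text: Optional[str]) -> Optional[float]:
    """Return the first number found in text, or None if there is none.

    Thousands separators such as "1,234" are removed before matching.
    """
    if not text:
        return None
    match = NUMBER_PATTERN.search(text.replace(",", ""))
    return float(match.group()) if match else None


def convert_columns(rows: list, columns: list) -> list:
    """Replace the string values in the given columns with extracted numbers (in place)."""
    for row in rows:
        for column in columns:
            row[column] = extract_number(row.get(column))
    return rows
```

To convert the two columns from this step, call convert_columns(rows, ["supplier-verified years", "review number"]); add "rating" to the list if it was exported as text.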
Perform a simple data analysis with the “Group by” feature
A closer look at the data shows that a single supplier may list several products on Alibaba, so we need to group the data by supplier.
Click the “Group by” button and select “supplier profile link” in the “Group by” bar. Then add values in the “Column calculation” bar to get the maximum of “supplier-verified years”, the average of “rating”, and the total of “review number”.
After clicking the “Save” button, we can check the maximum verified years of each toy car supplier and judge which suppliers might be more reliable based on their average ratings and total number of reviews.
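The Group by step and the comparison described above can also be sketched in Python. This assumes rows that have already been renamed and converted to numbers (with None for missing values); the function names are hypothetical, and the ranking order (verified years first, then average rating, then total reviews) is just one possible heuristic, not a rule from Alibaba or QuickTable.

```python
from collections import defaultdict
from statistics import mean

FIELDS = ("max verified years", "average rating", "total reviews")


def group_by_supplier(rows: list) -> dict:
    """Group rows by "supplier profile link" and compute, per supplier, the
    maximum verified years, the average rating, and the total review number.
    Missing values (None) are skipped."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["supplier profile link"]].append(row)

    def present(items, column):
        # Non-missing values of one column for one supplier
        return [r[column] for r in items if r.get(column) is not None]

    summary = {}
    for supplier, items in groups.items():
        years = present(items, "supplier-verified years")
        ratings = present(items, "rating")
        summary[supplier] = {
            "max verified years": max(years) if years else None,
            "average rating": mean(ratings) if ratings else None,
            "total reviews": sum(present(items, "review number")),
        }
    return summary


def rank_suppliers(summary: dict) -> list:
    """Order suppliers by max verified years, then average rating, then total
    reviews, all descending; missing values sort last (heuristic only)."""
    def sort_key(item):
        stats = item[1]
        return tuple(stats[f] if stats[f] is not None else float("-inf") for f in FIELDS)

    return [s for s, _ in sorted(summary.items(), key=sort_key, reverse=True)]
```

rank_suppliers(group_by_supplier(rows)) returns the supplier profile links, most promising first under that heuristic; reorder FIELDS if you weigh the criteria differently.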
We’ve collected only a small amount of data from Alibaba, and there is much more you can gather. For example, on the product listing page, you can extract each product’s minimum order and the supplier’s region. Other information, such as business type, main products, and main markets, can be collected from suppliers’ profile pages. Go and explore Alibaba; chances are you’ll find exactly the data you need to choose the right suppliers.