
Web Scraping for Sports Stats

Tuesday, July 31, 2018

Statistics, or big data, has transformed the sports industry, from team composition and playing strategy to marketing operations, and from the organizations that own sports teams to the businesses around them, such as consulting, media, and even betting agencies. Forbes has estimated that the sports industry will reach $73.5 billion by 2019.

When it comes to scraping sports data from websites, many people think of using R, Python, or the websites' APIs. But all of these are quite difficult for people with no programming background, like me.

So here I would like to introduce a way for non-technical people to scrape sports data from websites: Octoparse, a beginner-friendly web scraping tool. The advantages you get are:

Easier - Point-and-click visual operation, no programming required.

Faster - You don’t need to study the websites or test your code.

Various Data Formats - Excel, CSV, JSON, HTML, or export to your database, including SQL Server, MySQL, and Oracle.

And last but not least, it’s FREE!

 

Where could you scrape the sports data?

To answer this question, we first need to understand what sports stats are for. The purposes of sports statistics break down into two parts: Performance Analytics and Market Value Analytics. The latter is, to some extent, affected by the former.

Sports performance analytics requires information such as tables, results, fixtures, and standings. This information can mainly be found on the relevant official sites, like NBA.com, FIFA.com, and NFL.com, or on third-party websites that provide aggregated information, like sportstats.com. Market value analytics requires, in addition to the above, information from social media or portal sites to evaluate social influence.

  

 

How can you scrape the sports data?

Instead of a step-by-step tutorial on a specific website, I prefer to show you a roadmap for scraping sports data from different kinds of platforms, to help you find the right path for your own project.

 

Scraping Table Information

Most sports data is shown in tables, so with the same scraping workflow you can extract information from official sports sites or any third-party website. To create a crawler that retrieves table information, you can follow these two articles:

3 Steps to Scrape Men’s Ranking from FIFA.com

Scraping Betting Odds for Sports Analytics
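For readers curious about what the programming route looks like, here is a minimal Python sketch of the same idea: turning an HTML table into rows of data. It is only an illustration under stated assumptions, not how Octoparse works internally. The sample markup, the team names, and the helper names `TableParser` and `parse_table` are all made up for this example, and real sites usually need extra handling for nested tables, pagination, or JavaScript-rendered content.

```python
from html.parser import HTMLParser

# Made-up sample markup standing in for a standings page; not real data.
SAMPLE_HTML = """
<table>
  <tr><th>Rank</th><th>Team</th><th>Points</th></tr>
  <tr><td>1</td><td>Team A</td><td>30</td></tr>
  <tr><td>2</td><td>Team B</td><td>27</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_table(html):
    """Return the table as a list of dicts keyed by the header row."""
    parser = TableParser()
    parser.feed(html)
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


records = parse_table(SAMPLE_HTML)
```

Each entry in `records` maps a column header to the cell value for one row, which is roughly the shape you would later export to CSV or load into a database.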

 

 

Scraping Data from Social Media

To scrape reviews or tweets from social media for market value analysis, you can open the search results page in Octoparse's built-in browser, or build scraping tasks that take keywords as input. Please follow the instructions in these articles:

YouTube: Scraping Video Information and Reviews of 2018 World Cup

Twitter: Scraping tweets from Twitter

Scraping with Key-words inputted
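The idea behind a keyword-driven task is simply "one search results page per keyword". As a rough sketch in Python, assuming a site whose search page takes the keyword in a query parameter, it could look like the code below. The endpoint `https://example.com/search`, the parameter name `q`, and the helper `build_search_urls` are placeholders for illustration, not the address or API of any real platform.

```python
from urllib.parse import urlencode

# Placeholder endpoint; substitute the real search page of the target site.
SEARCH_URL = "https://example.com/search"


def build_search_urls(keywords, param="q"):
    """Return one search-results URL per non-empty keyword."""
    urls = []
    for keyword in keywords:
        keyword = keyword.strip()
        if keyword:
            # urlencode escapes spaces and special characters safely.
            urls.append(f"{SEARCH_URL}?{urlencode({param: keyword})}")
    return urls


urls = build_search_urls(["World Cup 2018", "NBA finals", " "])
```

Each URL in `urls` would then be opened and scraped in turn; blank keywords are skipped so they do not produce empty searches.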

 

 

Build Your Sports Data Feed

If you need to build a sports data feed that keeps the extracted data updated automatically and continuously, you may want to use an Octoparse premium function: Cloud Extraction. Its benefits include:

- Scraping tasks can be scheduled to run in the cloud at any time and frequency

- Extracted data can be fed into your database automatically

- Data collection speed increases 6 to 20 times

- It connects with the Octoparse API, which lets you feed the data into your own systems
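To make the "feed the database automatically" idea concrete, here is a small Python sketch of what each scheduled run of a feed does: it writes the latest rows and overwrites older values for the same team. This is an assumption-laden illustration, not Octoparse's Cloud Extraction or its API. It uses SQLite from Python's standard library as a stand-in for SQL Server, MySQL, or Oracle, and the table name `standings`, its columns, and the helper `save_rows` are hypothetical.

```python
import sqlite3


def save_rows(conn, rows):
    """Insert or update scraped standings rows, keyed by team name."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS standings ("
        "team TEXT PRIMARY KEY, points INTEGER)"
    )
    # INSERT OR REPLACE overwrites the existing row for a team, so rerunning
    # the feed refreshes the data instead of duplicating it.
    conn.executemany(
        "INSERT OR REPLACE INTO standings (team, points) VALUES (?, ?)",
        [(row["team"], int(row["points"])) for row in rows],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database, for the demo only

# First run of the feed (made-up sample rows).
save_rows(conn, [
    {"team": "Team A", "points": "30"},
    {"team": "Team B", "points": "27"},
])

# A later scheduled run brings an updated value for Team A.
save_rows(conn, [{"team": "Team A", "points": "33"}])

stored = conn.execute(
    "SELECT team, points FROM standings ORDER BY team"
).fetchall()
```

Keying the table on the team name is the design choice that keeps the feed safe to rerun. In a real setup the scheduling itself would come from a separate tool such as a cloud scheduler or cron job.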

 

Conclusion

You don’t actually need to work through all the scraping tutorials above. Just one of them can help you understand the working logic of a scraping task, which you can then apply to other, similar websites.

 

If you run into any questions while scraping data, the Octoparse Support Team is ready to help!

 

 

 

Happy Data Hunting!

 

