Intro to Data Science Part 2: Sports Betting Data Basics

Jason Scavone
May 5, 2023

Before we can settle down and get to the business of analyzing data, first we need some data. You probably already figured that part out. There’s a crush of sports betting data out there; all we have to do is go get it. 

While we can’t provide an exhaustive list of places to source sports data for betting, we do have a couple of tried-and-true favorites to help get you started looking for baseball, pro football, college football, and women’s basketball. Plus, we’ll show you how to go hunting for more if the sport you want to analyze isn’t in that bunch. 

For now, it’s time to soak up some stats.

 

The Easy Way

First, we’re assuming most everyone can do basic spreadsheet work. If that’s not you, you may want to learn simple Excel functions before you come back to this series. 

There are several sites that allow you to export directly to .csv files that you can then import into Excel or Google Sheets. 

You’re probably already familiar with the Sports-Reference.com family of sites, and with good reason. Sometimes you need to kill hours on the clock combing over Baseball-Reference.com to find out what Troy Glaus did in the ’02 World Series. We get it. (10-for-26 with 3 homers and 8 RBIs. Saved you a click.)

Each table on the Sports-Reference sites will allow you to export the data as a .csv file using the “Share & Export” button. Or you can download tables as Excel workbooks directly. This works across sites for all of their sports, so it’s a useful task to master.

 

Sports betting data from baseball-reference.com

 

Another useful site for advanced baseball data is Baseball Savant. You can easily select columns of statistics you want to examine, then use the “Download CSV” button to export data into a file you can import into Excel or Sheets.

 

Pitcher data from Baseball Savant

 

Example Data

Here’s some sample data you can start to look through. These are the same .csv files we’re going to create in the next section, so if you’re dead set against doing anything with code, you can use them to get a sense of the kinds of files we’ll be dealing with.

The Advanced Method

This next part is optional, but it’s an easy skill to quickly pick up, even for those who have no experience coding. Or who break out in hives at the idea of code. This will be painless, we promise.

(Even if you do only want to work in Excel, grabbing these packages using the programming language R is sometimes the only method available. This is well worth taking on.)

R is a language designed to handle large chunks of data, so it’s ideal for tasks like these, where we’re pulling down large swaths of sports betting data. In our introduction piece, we suggested you may want to acquire an R integrated development environment. We use the open-source program RStudio.

The first step is to create a project by going to File > New Project. Note which directory you’re using. When you export your files, that’s where they’ll end up.
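
If you lose track of that directory later, R can report it for you. This is a minimal sketch using base R only; the file name is a made-up example, not one of the files from this series.

```r
# Print the current working directory (the project folder when you work inside an RStudio project)
project_dir <- getwd()
print(project_dir)

# Build the full path an exported file would get; "my_export.csv" is a hypothetical name
export_path <- file.path(project_dir, "my_export.csv")
print(export_path)
```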

Next, you have to install data packages.  

To install any package, use the command [install.packages("packagename")], without the brackets. Type the quotation marks yourself rather than pasting them: R only accepts straight quotes, and curly ones copied from a web page will cause an error. As you start typing, RStudio will autofill commands and help out with the syntax. Enter the install command on the first line and click the “Run” button in the upper right.

For example, to install the baseball package, enter [install.packages("baseballr")] and click “Run” or hit Ctrl-Enter. Package names are case sensitive, so if you get an error message, check your capitalization.

 

The console window will show the process going through its steps, and will display a message when it’s complete. 
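
If you would rather install everything in one pass, something like the following should work. It is only a sketch: `install_if_missing` is a hypothetical helper name of our own, and it simply skips any package R already has. The actual call is left commented out so nothing downloads until you choose to run it.

```r
# Hypothetical helper: install only the packages that are not already present
install_if_missing <- function(pkgs) {
  # rownames(installed.packages()) lists every package R can currently find
  missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))
  if (length(missing_pkgs) > 0) {
    install.packages(missing_pkgs)
  }
  # Hand back the names that needed installing, without printing them
  invisible(missing_pkgs)
}

# The four packages used in the scripts later in this piece:
# install_if_missing(c("baseballr", "nflfastR", "cfbfastR", "wehoop"))
```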

Once you’ve installed the packages, create a new script by going to File > New File > R Script (or using Ctrl-Shift-N). Paste the code into the new window and click the “Source” button in the upper right, next to “Run.”

 

Completed script in RStudio for sports betting data

 

Here are a few examples you can run to export data into .csv files. Anything with a pound sign in front of it is a note that explains the following bit of code. 

 

Baseball – baseballr

# load library
library(baseballr)
# Get pitch-by-pitch data from statcast
# you can use the player search functions on baseballr to find players
shohei <- statcast_search(
  start_date = "2022-04-06", end_date = "2022-04-15",
  playerid = 660271, player_type = "batter"
)
# write the data to CSV (for upload to Excel)
write.csv(shohei, "Shohei_statcast_data.csv")
# Get game-by-game logs from fangraphs
# this is the data we'll be using for the series
shohei <- fg_batter_game_logs(playerid = 19755, year = 2022)
# write data
write.csv(shohei, "Shohei_Fangraphs_game_logs.csv")
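
One wrinkle worth knowing before you open these files in Excel: by default, write.csv adds an unnamed first column of row numbers. Passing row.names = FALSE leaves it out. The sketch below uses a tiny made-up data frame (not real Statcast or FanGraphs output) and temporary files, so it runs without any downloads.

```r
# Stand-in data; the column names and values are invented for illustration
game_logs <- data.frame(
  game_date = c("2022-04-07", "2022-04-08"),
  hits = c(1, 2)
)

# Write to temporary files so nothing in the project folder is touched
with_rows <- tempfile(fileext = ".csv")
without_rows <- tempfile(fileext = ".csv")

write.csv(game_logs, with_rows)                        # default: extra row-number column
write.csv(game_logs, without_rows, row.names = FALSE)  # only the real columns

# Read both files back and compare the column names
print(names(read.csv(with_rows)))
print(names(read.csv(without_rows)))
```

To use this with the scripts above, add row.names = FALSE to the existing write.csv lines.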

 

One thing to note when you’re looking up players: player ID numbers generally correspond to their MLB Advanced Media IDs, though the FanGraphs functions use FanGraphs IDs. In the script above, 660271 is Shohei Ohtani’s MLBAM ID and 19755 is his FanGraphs ID. A shortcut to get those IDs without digging through either site is to ask ChatGPT. The baseballr package also includes a lookup function.
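
For reference, the lookup function in baseballr is playerid_lookup(). Here is a hedged sketch of how you might call it, assuming baseballr is installed and you are online. `find_player_ids` is a hypothetical wrapper name of our own, and the exact columns in the result may vary by package version, so inspect the output rather than trusting any particular column name.

```r
# Hypothetical wrapper around baseballr's playerid_lookup()
find_player_ids <- function(last_name, first_name) {
  # Fail with a readable message if the package is not installed
  if (!requireNamespace("baseballr", quietly = TRUE)) {
    stop("Install baseballr first: install.packages(\"baseballr\")")
  }
  # Returns a table of matching players; look for the MLBAM and FanGraphs ID columns
  baseballr::playerid_lookup(last_name = last_name, first_name = first_name)
}

# Example call (needs an internet connection):
# find_player_ids("Ohtani", "Shohei")
```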

Pro Football – nflfastR

# load library
library(nflfastR)
# get play by play data for 2019 through last year
nfl <- load_pbp(seasons = 2019:2022)
# write data
write.csv(nfl, "NFL_play_by_play_2019_2022.csv")
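
Four seasons of play-by-play makes for a large file with hundreds of columns, so you may want to trim it before exporting. The sketch below shows one way to do that in base R. It uses a tiny stand-in data frame so it runs without a download: the column names (season, posteam, play_type, epa) mirror ones found in nflfastR play-by-play data, but the values are invented.

```r
# Stand-in for the `nfl` data frame; values are invented for illustration
nfl_sample <- data.frame(
  season = c(2021, 2022, 2022, 2022),
  posteam = c("KC", "KC", "BUF", "KC"),
  play_type = c("pass", "run", "pass", "pass"),
  epa = c(0.4, -0.1, 0.7, 1.2)
)

# Keep one team's 2022 plays and only the columns we care about
kc_2022 <- subset(
  nfl_sample,
  season == 2022 & posteam == "KC",
  select = c(season, posteam, play_type, epa)
)

# Write the smaller table without row numbers; tempfile() keeps this demo out of your project folder
out_file <- tempfile(fileext = ".csv")
write.csv(kc_2022, out_file, row.names = FALSE)
```

With the real data, swap nfl_sample for nfl and replace the temporary file with a file name of your choosing.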

 

College Football – cfbfastR

# load library
library(cfbfastR)
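# get play by play data for 2022
# note: cfbd_ functions need a free CollegeFootballData.com API key
# stored in the CFBD_API_KEY environment variable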
cfb <- cfbd_plays(year = 2022)
# write data
write.csv(cfb, "CFB_play_by_play_2022.csv")

 

WNBA – wehoop

# load library
library(wehoop)
# get WNBA play by play data
wnba <- load_wnba_pbp(seasons = 2022)
# write data
write.csv(wnba, "WNBA_some_2022_play_by_play.csv")

 

There are dozens of R packages for sports data online. Some of them you’ll have to go hunting for. But you can also use Sportsdataverse.org, which serves as something of a clearinghouse for many available packages.

Get started pulling down data and bringing it into Excel if you want to practice your R. These scripts we’ve provided should give you the framework you need to adapt for other datasets with minimal editing. 

When we return, we’ll start exploring the data so we can begin answering questions and testing hypotheses. 

If you have any questions, hop into the Discord and fire away. We’ll be like one big study group. And then we’ll see you in about a week and a half.
