Intro to Data Science: Glossary

Jason Scavone

Intro to Data Science

April 1, 2023


Binomial distribution

A type of distribution that describes how many successes occur across a fixed number of trials, where each trial has two possible but mutually exclusive outcomes. In other words, a success or a failure.

Binomial distributions are used when you have a set number of independent trials and the probability of success is the same for every trial. You run X trials and record Y, the number of successes.

If we were to flip a coin 100 times and count how often heads comes up, that count would follow a binomial distribution.
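As a rough sketch of the coin-flip example, the standard binomial formula can be evaluated with Python's standard library. The function name `binomial_pmf` and the 45-to-55 range are illustrative choices, not something from the original article, and the coin is assumed to be fair.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Assumption: a fair coin (p = 0.5) flipped 100 times.
n_flips, p_heads = 100, 0.5

prob_exactly_50 = binomial_pmf(50, n_flips, p_heads)
prob_45_to_55 = sum(binomial_pmf(k, n_flips, p_heads) for k in range(45, 56))

print(f"P(exactly 50 heads) = {prob_exactly_50:.4f}")
print(f"P(45 to 55 heads)   = {prob_45_to_55:.4f}")
```

Even with a fair coin, landing on exactly 50 heads is fairly unlikely, while landing somewhere near 50 is quite likely.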

Distribution

A description of all possible outcomes for an event, and how often each individual outcome occurs in the overall set of possibilities.

For example, there are 36 equally likely outcomes when you roll two dice together, but only one way to roll a two (by rolling a one on one die, and a one on the other). One divided by 36 is about 2.8 percent. There are two ways to roll a three (a two and a one, or a one and a two). That happens about 5.6 percent of the time.

The full distribution of rolls looks like this:

Figure: A distribution of dice rolls (from math.stackexchange.com)
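The same distribution can be reproduced by counting outcomes directly. This is a minimal sketch that assumes two fair six-sided dice; the variable names are illustrative.

```python
from collections import Counter
from itertools import product

# Enumerate all 36 ordered outcomes of two six-sided dice and tally each total.
totals = Counter(a + b for a, b in product(range(1, 7), repeat=2))

for total in sorted(totals):
    ways = totals[total]
    print(f"{total:>2}: {ways} of 36 ({ways / 36:.1%})")
```

Seven is the most common total because it can be made six different ways, while two and twelve can each be made only one way.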

Monte Carlo

Simulations used to help predict the outcomes of events that are uncertain. These simulations use a range of values as inputs, rather than specific, fixed values.

By running the simulation over and over again with randomly drawn inputs, Monte Carlo methods build up a distribution of possible outcomes. Unabated’s prop simulators, for example, use Monte Carlo simulations to arrive at the likelihood of potential outcomes based on a range created from prop projections.
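To make the idea concrete, here is a toy Monte Carlo sketch. It is not how Unabated's simulators work. Every number in it (the strikeout-rate range, the batters faced, the 6.5 line) is a hypothetical value chosen for illustration, as is the function name `simulate_strikeouts`.

```python
import random

def simulate_strikeouts(rng, batters_faced=25, k_rate_low=0.22, k_rate_high=0.30):
    """Simulate one start. All default values are hypothetical."""
    # Draw the input from a range instead of using one fixed value.
    k_rate = rng.uniform(k_rate_low, k_rate_high)
    # Each batter faced is an independent strikeout / no-strikeout trial.
    return sum(rng.random() < k_rate for _ in range(batters_faced))

rng = random.Random(42)  # fixed seed so the run is repeatable
n_sims = 10_000
results = [simulate_strikeouts(rng) for _ in range(n_sims)]

mean_strikeouts = sum(results) / n_sims
share_over = sum(r >= 7 for r in results) / n_sims  # hypothetical line of 6.5

print(f"Average simulated strikeouts: {mean_strikeouts:.2f}")
print(f"Share of simulations over 6.5: {share_over:.1%}")
```

The output is not a single prediction but a spread of simulated results, which is what lets you estimate how often a given threshold is cleared.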

Multivariable regression

A type of regression that is used to establish the relationship of an outcome, or dependent variable, to two or more independent variables.

For example, if you want to learn how a baseball team scores runs (the dependent variable), you might use slugging percentage and on-base percentage as your independent variables.

See also: regression
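A minimal sketch of a two-variable regression follows, using ordinary least squares solved through the normal equations. The team numbers are made up, and the runs are generated from an invented rule purely so the fit can be checked. The helper names are hypothetical, and in practice you would more likely use a statistics package.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_least_squares(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # prepend a column of ones for the intercept
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Made-up (on-base percentage, slugging percentage) pairs for six hypothetical teams.
team_stats = [(0.310, 0.390), (0.320, 0.410), (0.335, 0.400),
              (0.300, 0.430), (0.345, 0.450), (0.325, 0.380)]
# Runs per game generated from an invented rule, not real data.
runs_per_game = [-5.0 + 15.0 * obp + 12.0 * slg for obp, slg in team_stats]

intercept, obp_coef, slg_coef = fit_least_squares(team_stats, runs_per_game)
print(f"runs = {intercept:.2f} + {obp_coef:.2f} * OBP + {slg_coef:.2f} * SLG")
```

Because the made-up runs follow the invented rule exactly, the fit recovers that rule. Real data would be noisy, and the coefficients would only approximate the underlying relationship.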

Negative binomial

The chief difference between a binomial distribution and a negative binomial distribution is what gets counted. Instead of recording how many successes we get over a fixed number of trials, we record how many trials it takes to reach a fixed number of successes.

A binomial distribution asks “Out of 100 coin flips, how many times will it come up heads?” A negative binomial distribution asks “How many times will you need to flip a coin to get heads 50 times?”
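As a sketch of the second question, the probability that the 50th head arrives on a particular flip can be computed with the standard negative binomial formula. The function name `trials_needed_pmf` is illustrative, and the coin is assumed to be fair.

```python
from math import comb

def trials_needed_pmf(n: int, r: int, p: float) -> float:
    """Probability that the r-th success happens on exactly trial n."""
    if n < r:
        return 0.0  # cannot collect r successes in fewer than r trials
    # The first n - 1 trials hold r - 1 successes, and trial n is a success.
    return comb(n - 1, r - 1) * p**r * (1 - p) ** (n - r)

# Assumption: a fair coin, and we want 50 heads.
r_heads, p_heads = 50, 0.5

expected_flips = r_heads / p_heads  # mean number of trials is r / p
prob_exactly_100 = trials_needed_pmf(100, r_heads, p_heads)
prob_within_100 = sum(trials_needed_pmf(n, r_heads, p_heads) for n in range(50, 101))

print(f"Expected flips needed: {expected_flips:.0f}")
print(f"P(50th head lands on flip 100) = {prob_exactly_100:.4f}")
print(f"P(50 heads within 100 flips)   = {prob_within_100:.4f}")
```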

Package

In the programming language R, a package is an extension you can add to your R installation. There are several packages that can be used to analyze sports data. We go into more depth on installing packages in our Sports Betting Data Basics section.

Poisson distribution

A type of distribution that measures the likelihood of a number of events happening over a set period of time, using an average of how often those events typically occur.

A Poisson distribution could be used, for example, to assess the likelihood that a pitcher who averages seven strikeouts a game gets exactly five, six, seven, or eight punchouts in a given matchup.
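The pitcher example can be sketched with the standard Poisson formula. This assumes strikeouts follow a Poisson distribution with a mean of seven, which is a simplification, and the function name `poisson_pmf` is illustrative.

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events when the average count is lam."""
    return lam**k * exp(-lam) / factorial(k)

avg_strikeouts = 7.0  # the pitcher's average from the example above

strikeout_probs = {k: poisson_pmf(k, avg_strikeouts) for k in (5, 6, 7, 8)}
for k, prob in strikeout_probs.items():
    print(f"P(exactly {k} strikeouts) = {prob:.1%}")
```

Under this model, six and seven strikeouts are equally likely, which is a general property of a Poisson distribution whose mean is a whole number.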

Regression

A process to establish the relationship between dependent and independent variables.

The most commonly used type is linear regression. This shows whether independent variables are good predictors of a dependent variable. It can also be used to analyze which variables are the best predictors of that outcome. A linear regression finds the best-fitting “line” through the data points, typically the one that minimizes the squared distances between the line and the points, which reveals underlying correlations. Linear regressions can be simple, exploring the relationship between one independent variable and the dependent variable. Or they can be multivariable, using several independent variables.

For example, if we think weighted on-base average will have a direct impact on runs scored in a baseball game, or average depth of target will mean more points in a football game, we could analyze this using a linear regression.
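Here is a minimal sketch of a simple linear regression using the textbook least-squares formulas for slope and intercept. The wOBA and runs figures are made up for illustration, and `fit_line` is a hypothetical helper name.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for one independent variable."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance of x and y divided by variance of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up numbers: team wOBA and runs scored per game.
woba = [0.300, 0.310, 0.320, 0.330, 0.340]
runs = [3.9, 4.3, 4.4, 4.8, 5.1]

slope, intercept = fit_line(woba, runs)
predicted_runs = intercept + slope * 0.325  # prediction for a hypothetical .325 wOBA

print(f"runs = {intercept:.2f} + {slope:.2f} * wOBA")
print(f"Predicted runs at .325 wOBA: {predicted_runs:.2f}")
```

With these invented numbers, the fitted line says each additional 10 points of wOBA is worth roughly 0.29 runs per game.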


