Intro to Data Science Extra: Build a Playoff Series Simulator

Jason Scavone
September 25, 2024

 

We’re rolling deep into the fall now, with everything that entails: hoodies; increased weekend wing consumption; buying Halloween candy early “just to get ahead” and then buying all new candy two weeks later after you eat all the Twix. Oh, and MLB and WNBA playoffs. With playoffs come series bets and with series bets come some surprisingly beatable prices at sportsbooks. Here’s how to build a simple Monte Carlo playoff series simulator in R to find them.

This is just a basic framework. Consider it a jumping-off point to building something bigger and badder. Bring in additional variables, add in conditions, or adapt it to other types of bets as you see fit.

But first, what’s Monte Carlo simulation, and why do we want to use it?

 

Why Use Monte Carlo Simulation?

The nerdy answer is that Monte Carlo simulation takes probabilities you provide for different variables, and then it generates an outcome based on the model you tell it to use. 

The fun answer is that it’s a set of rules built around computerized dart-throwing. 

This is a square with a circle in it. You can tell, because it’s a square with a circle in it.

The circle represents our dartboard. The simulator picks a number at random. Anything that falls within the confines of our circle, on the dartboard, is a hit. Anything not in the circle is a miss. And an opportunity for your wife to yell at you for putting a bunch of holes in the wall.

For example, if we run the simulation with just a few throws, we could see it end with 30 inside the circle and five outside. Our “hit” probability is 86 percent.

As we do more trials, we can be more certain of the results. After about 150 throws, about 15 are outside the target, which is around a 90 percent hit rate.

The full picture is coming into focus. And your wife is definitely hollering about 15 new holes in the wall.

Monte Carlo simulation works the same way. We generate a random number between 0 and 1. If the number is less than the probability we’ve assigned the event, we get a “win.” More, and it’s a “loss.” 

The simulator repeats this process however many times you designate – 1,000 or 10,000 or 100,000, whatever the number may be. You could do it millions of times if you need that level of precision.
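To make the dart-throwing idea concrete, here is a minimal sketch in R. The 60 percent probability and the variable names are placeholders picked for illustration, not values from the simulator itself.

```r
# Minimal sketch: estimate a known probability by repeated random draws.
set.seed(42)                 # fixed seed so the run is repeatable

win_prob   <- 0.60           # assumed probability of the event (placeholder)
iterations <- 100000         # number of simulated trials

draws <- runif(iterations)   # uniform random numbers between 0 and 1
wins  <- draws < win_prob    # TRUE ("win") when the draw lands under win_prob

estimate <- mean(wins)       # share of trials that came up wins
cat("Estimated win probability:", round(estimate, 3), "\n")
```

The estimate should land close to the 0.60 we fed in, and it tightens up as you raise the number of iterations.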

Monte Carlo simulation is great for events with uncertain outcomes – like sports – because it accounts for uncertainty and randomness by using a large sample of trials to help sharpen up the probabilities around potential outcomes. 

Why Build a Playoff Series Simulator in R?

There’s a cheat code to all of this: using advanced large language models like ChatGPT or Claude. If you give one the probabilities and parameters, an LLM can run a Monte Carlo for you.

You could even set all this up to run in Excel without doing any programming. That’s also an option if you don’t want to spring for a subscription to an AI site that may limit how much data analysis you’re allowed on a free plan.

But we prefer R for a couple of reasons. 

First, you have granular control of a more transparent process. Adding and subtracting variables can sometimes make LLMs drift off course. You have to double-check the work anyway, and it’s not always easy to see what’s happening under the hood. You may have less control over a process that isn’t as customizable. 

Working in Excel can be fine for small data sets, but as you increase the number of iterations and the number of variables, spreadsheets can bog down until they’re nearly unusable.

R is designed for statistics. It’s fast, it’s customizable, and it’s free. 

 

Building the Playoff Series Simulator Functions

You need a few basic functions to get your playoff series simulator working right. Obviously, the core simulation function, but also a means to input your variables, a way to extract your results, and a way to convert probabilities to price. 

We’ll start with the latter because it’s fairly quick and painless.

For the sections on building the code, we’ll give you the broad strokes, and then a brief rundown of the technical parts if you want to know what certain commands do. If you’re content with the broad strokes, you can skip the technical sections. We won’t be offended.

 

Probability to Price

This uses the standard formula for converting a probability X to American odds, where favorite odds = -(X / (1 - X)) * 100 and underdog odds = ((1 - X) / X) * 100.

It uses a conditional statement to determine whether the odds are negative or positive, and applies the appropriate formula.

The Technical Part

Whenever we need to convert probability to price, we can call this with an argument by using function(prob) where (prob) is the probability we feed it later in the script. 

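We’re using (prob >= 0.5) as the condition to find the right formula. When the probability is .500 or higher, the condition is true and it uses the favorite formula, which returns a negative price. Anything lower gets the underdog formula and a positive price.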

“Return” gives us our object – in this case the price. “Round” is simple rounding to eliminate decimals from the price.
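Here is a rough sketch of what a conversion function along these lines could look like. The name prob_to_odds is our own placeholder, and the version in the full script may be laid out differently.

```r
# Sketch of a probability-to-price converter (function name is hypothetical).
prob_to_odds <- function(prob) {
  if (prob >= 0.5) {
    # Favorite: negative American price
    odds <- -(prob / (1 - prob)) * 100
  } else {
    # Underdog: positive American price
    odds <- ((1 - prob) / prob) * 100
  }
  return(round(odds))  # drop the decimals from the price
}

prob_to_odds(0.60)  # -150
prob_to_odds(0.40)  # 150
```

A 60 percent favorite comes back as -150, and the 40 percent side of the same matchup comes back as +150.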

 

 

 

Prompts for Your Variables

For the main function, we’re including some prompts to help make the script more flexible.

It’s especially helpful in the first round of a playoff when there are multiple series going on. This saves you the trouble of having to get in and rewrite the code with new values each time. 

That way you can’t accidentally overlook changing certain values when you’re moving quickly and end up with very weird results.

Speaking from experience, here. 

Just a note: The prompt calls for the team with home field advantage first, which will typically – but not always – be the series favorite. For brevity in the rest of the playoff series simulator code, we’re calling this team the “favorite.”

The Technical Part

The prompts use the “readline” function to store the value in the variable. “Prompt” is the text the script will display. Using “paste0” isn’t strictly necessary, but it lets you call your variable to use in the prompt. If our team with home field is the New York Yankees, once we input “NYY”, the next prompt will be “Enter NYY’s home win probability as a decimal:”
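As an illustration, the prompts could be wrapped up something like this. The function name get_inputs, the list field names, and the reader argument are all assumptions of ours; the reader argument is only there so the prompts can be tested without a live console.

```r
# Sketch of the input prompts. `reader` defaults to readline(), but a stub
# can be passed in when the script is not running interactively.
get_inputs <- function(reader = readline) {
  favorite <- reader(prompt = "Enter the team with home field advantage: ")
  underdog <- reader(prompt = "Enter the other team: ")

  # paste0() drops the team name we just entered into the next prompt
  fav_home_prob <- as.numeric(reader(
    prompt = paste0("Enter ", favorite, "'s home win probability as a decimal: ")
  ))
  dog_home_prob <- as.numeric(reader(
    prompt = paste0("Enter ", underdog, "'s home win probability as a decimal: ")
  ))

  list(favorite = favorite, underdog = underdog,
       fav_home_prob = fav_home_prob, dog_home_prob = dog_home_prob)
}

# Only ask for input when someone is actually at the console
if (interactive()) {
  inputs <- get_inputs()
}
```

Note that readline returns text, so the probabilities have to go through as.numeric before you can do math with them.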

 

 

The Core Simulator

The next section defines a vector of the order for home court (or field, or ice) as a 2-2-1-1-1 series like you’d find in the NBA or NHL. (Essentially, a vector is like a list we can progress through.)

If you’re dealing with a series shorter than seven or a different home field progression, you will have to manually tweak this section for your proper home field format. 

For example, if you’re simming an MLB division series, you need to manually edit this part to read “c(favorite, favorite, underdog, underdog, favorite).”

Once we have that established, the playoff series simulator uses a “for loop” to work through the series one game at a time. It uses a random number to generate a win or loss for the home team, then loops through the next game. 

When one team reaches four wins, the simulation stops, and it records a series win or loss before the next iteration starts the process all over again.

The Technical Part

The function simulate_series takes two parameters, favorite_wins and underdog_wins. To get those parameters, we use a for loop over the entire home_court_order. 

First, the function checks to see if the game is a home game for the favorite. If it is, it uses runif(1) to generate a random number between zero and one. When that number is below the favorite’s home win probability, we assign the favorite a game win in the series. 

If that happens, it advances the count for favorite_wins by one. Otherwise, it assigns a win to the underdog, adding either ‘favorite,” 4”’ or ‘underdog,” 4”’ to the results object. 

If the game is in the part of the loop where the underdog is home, it repeats the process with the underdog home win probability instead. 

Lines 35 and 36 check to see if either the favorite or underdog has four wins in the series yet.

(Remember to manually edit this section, too, so you have the right win condition if you’re simming a series that’s best-of-five or best-of-three.) 

If a team wins the series in fewer than the full number of games, it triggers the condition to break the loop and assigns a series score where either the favorite or underdog has four wins and the other team has zero to two, whichever is stored in the favorite_wins or underdog_wins variable at the time.

If the loop runs all seven games, it goes to line 38 where the series is resolved and the winning team is at four wins, and the losing team is at three.
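The walkthrough above can be sketched roughly as follows. This is our own simplified version, not the script itself: passing the probabilities and wins_needed in as arguments, skipping games already played, and the exact text of the result string are all assumptions made to keep the example self-contained.

```r
# Simplified sketch of a single best-of-seven series (2-2-1-1-1 format).
simulate_series <- function(fav_home_prob, dog_home_prob,
                            favorite_wins = 0, underdog_wins = 0,
                            wins_needed = 4) {
  favorite <- "favorite"
  underdog <- "underdog"
  home_court_order <- c(favorite, favorite, underdog, underdog,
                        favorite, underdog, favorite)

  # Skip any games already in the books (for mid-series runs)
  games_played <- favorite_wins + underdog_wins
  remaining <- home_court_order[seq_along(home_court_order) > games_played]

  for (home_team in remaining) {
    if (home_team == favorite) {
      # Favorite at home: a draw under its home probability is a favorite win
      if (runif(1) < fav_home_prob) {
        favorite_wins <- favorite_wins + 1
      } else {
        underdog_wins <- underdog_wins + 1
      }
    } else {
      # Underdog at home: same idea with the underdog's home probability
      if (runif(1) < dog_home_prob) {
        underdog_wins <- underdog_wins + 1
      } else {
        favorite_wins <- favorite_wins + 1
      }
    }
    # Stop as soon as either team clinches
    if (favorite_wins == wins_needed || underdog_wins == wins_needed) break
  }

  paste0("favorite ", favorite_wins, ", underdog ", underdog_wins)
}

set.seed(7)
simulate_series(fav_home_prob = 0.60, dog_home_prob = 0.52)
```

Each call plays out one series and hands back a score string. Running it once tells you nothing; the Monte Carlo part comes from running it thousands of times.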

 

Head to Monte Carlo

“Iterations” defines how many times the simulation will run. The “replicate” function runs the simulate_series function over the number of iterations, storing the number of times the favorite and underdog each win. “Results” creates a vector of all iteration results. 

The Technical Part

The command “grepl” combs through the results vector looking for ‘favorite,” 4”’ (or 3, or 2 depending on your series win condition – make sure you update that number here if you’re doing a shorter series). 

When it finds one, it records a “1” for true. If the series in the vector doesn’t include ‘favorite, “ 4”’ it records a “0” for false.

The “mean” command then averages those 1s and 0s. Because every series win counts as a 1, the average is the share of iterations the favorite won, which is our series win probability.

If it finds 68,000 true results in 100,000 iterations, our favorite_prob would then be .68, or 68 percent. 
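Here is a compact sketch of that replicate, grepl, and mean chain. The stand-in simulate_series function, the two probabilities, and the "favorite 4" result text are placeholders we made up so the example runs on its own.

```r
set.seed(2024)

fav_home_prob <- 0.60   # placeholder probabilities
dog_home_prob <- 0.52

# Stand-in for the core simulator: one best-of-seven, 2-2-1-1-1
simulate_series <- function() {
  fav_is_home <- c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE)
  fav <- 0
  dog <- 0
  for (at_home in fav_is_home) {
    # Favorite's chance in this game depends on the venue
    p_fav <- if (at_home) fav_home_prob else 1 - dog_home_prob
    if (runif(1) < p_fav) fav <- fav + 1 else dog <- dog + 1
    if (fav == 4 || dog == 4) break
  }
  paste0("favorite ", fav, ", underdog ", dog)
}

iterations <- 10000
results <- replicate(iterations, simulate_series())  # one string per series

# grepl() returns TRUE/FALSE per series; mean() turns that into a share
favorite_prob <- mean(grepl("favorite 4", results))
underdog_prob <- mean(grepl("underdog 4", results))

cat("Favorite:", favorite_prob, " Underdog:", underdog_prob, "\n")
```

Since every simulated series ends with exactly one team at four wins, the two probabilities should add up to 1.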

 

 

Display the Results

Still with us? Great. We’re almost there. Now we’ve got our two series win probabilities. Next we just spit-shine the results and deliver the goods. 

This section prints out the results of our playoff series simulator, both as probabilities and as overall series prices.

The Technical Part

The “cat” function prints text in the console. The “round” section turns the favorite or underdog probability into a percentage, and caps the decimal places at two.

Now we tap in the probabilities-to-odds function and print the results of that conversion. The “\n” at the end of each piece of text is a newline character, so whatever prints next starts on a new line and is easier to read.
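A bare-bones version of the display step might look like the sketch below. The team codes, the 68/32 split, and the function names print_results and prob_to_odds are illustrative assumptions, not output from a real run.

```r
# Hypothetical converter, repeated here so the example stands alone
prob_to_odds <- function(prob) {
  if (prob >= 0.5) {
    odds <- -(prob / (1 - prob)) * 100
  } else {
    odds <- ((1 - prob) / prob) * 100
  }
  round(odds)
}

# Print series win probabilities and prices for both teams
print_results <- function(favorite, underdog, favorite_prob, underdog_prob) {
  cat(favorite, "series win probability:", round(favorite_prob * 100, 2), "%\n")
  cat(favorite, "series price:", prob_to_odds(favorite_prob), "\n")
  cat(underdog, "series win probability:", round(underdog_prob * 100, 2), "%\n")
  cat(underdog, "series price:", prob_to_odds(underdog_prob), "\n")
}

print_results("NYY", "DET", favorite_prob = 0.68, underdog_prob = 0.32)
```

Each cat call ends in "\n", so the four results print on four separate lines.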

 

Individual Series Outcomes

With all the heavy lifting done, we can add in a function to check series exact outcome prices. 

This just combs through our existing results to find specific outcomes, get the probabilities, and convert those to a price to display.

The Technical Part

The outcome_probs function creates a table out of the results vector so it can check how many times each series outcome occurred, and divides that by the number of total iterations. 

The for loop here just runs through that table and extracts the probabilities for each type of series (i.e., Favorite 4, Underdog 3; Favorite 4, Underdog 2; etc.).

Then we use our odds conversion function again, and print out each series outcome price. 
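To show the idea, here is a small sketch that tabulates a results vector and prices each exact outcome. The ten hard-coded results stand in for real simulation output, and the result text and prob_to_odds name are our own assumptions.

```r
# Hypothetical converter, repeated here so the example stands alone
prob_to_odds <- function(prob) {
  if (prob >= 0.5) {
    odds <- -(prob / (1 - prob)) * 100
  } else {
    odds <- ((1 - prob) / prob) * 100
  }
  round(odds)
}

# Stand-in results: ten made-up series instead of a real simulation run
results <- c(
  rep("favorite 4, underdog 0", 1),
  rep("favorite 4, underdog 1", 2),
  rep("favorite 4, underdog 2", 3),
  rep("favorite 4, underdog 3", 1),
  rep("favorite 2, underdog 4", 2),
  rep("favorite 3, underdog 4", 1)
)

# Count each outcome, then divide by the number of iterations
outcome_table <- table(results)
outcome_probs <- as.numeric(outcome_table) / length(results)
names(outcome_probs) <- names(outcome_table)

# Walk through the outcomes and print a probability and price for each
for (outcome in names(outcome_probs)) {
  prob <- outcome_probs[[outcome]]
  cat(outcome, "|", round(prob * 100, 2), "% | price:", prob_to_odds(prob), "\n")
}
```

With a real run you would feed in the full results vector from the simulation, and the probabilities across all outcomes would still sum to 1.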

With all of that in place, the final command runs the actual simulator.

Here’s what it looks like after you run through a sample series. We used the Yankees and Tigers with some basic fixed probabilities.

 

Gathering Data

The Game 1 series price will give us a solid jumping-off point for a home team favorite’s price, particularly if you use the Unabated Line as a source of true probability.

But you still need a win probability for the team that doesn’t have home field when the games are on their turf.

If you have a number in mind for what home field is worth, great. The Game 1 line will point you toward the neutral-field price and you can work out the price on the game when the series shifts to the next venue. 

In basketball, if you think home court is worth 2 points on the spread and the first game has the home favorite -4, you remove home field to get -2 on neutral. In games where the underdog is home, the two teams would be a pick’em. 

A -4 favorite converts in the Alt Lines Calculator to -157 using a 0-point alt line, and the Odds Converter tells us that’s a 61.1 percent implied win probability. And a game at pick’em is a 50 percent implied win probability.   
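If you want to do the price-to-probability step inside R instead, the standard American odds conversion is a few lines. The function name odds_to_prob is ours, and this sketch only covers the odds-to-probability step. The spread-to-moneyline conversion still comes from the Alt Lines Calculator.

```r
# Convert an American price to an implied win probability
# (function name is hypothetical; no vig is removed here)
odds_to_prob <- function(odds) {
  if (odds < 0) {
    # Favorite price: risk / (risk + 100)
    abs(odds) / (abs(odds) + 100)
  } else {
    # Underdog or even price: 100 / (price + 100)
    100 / (odds + 100)
  }
}

round(odds_to_prob(-157), 3)  # 0.611
odds_to_prob(100)             # 0.5 (pick'em)
```

Those two numbers are the home win probability for the favorite and the 50 percent you would use when the underdog is home.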

When Home Court Advantage Is Unknown

But if you don’t have a clue what home field is worth in the sport, you can still do a little detective work to get close. 

One way to approximate it would be to look at previous matchups between the teams during the regular season to see how books are changing their prices – but be careful to account for any other factors that may have affected that game’s lines like injuries or weather. 

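If in their most recent matchup the home favorite was -6 and the previous matchup was at the underdog’s court where the favorite was -3, the line moved three points between the two venues. That swing covers home court twice (once taken away from the favorite, once handed to the underdog), so you could ballpark a home-court advantage of 1.5 points. This won’t be precise, but it gives you something to work with from the outset.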
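The back-of-the-envelope math looks like this in R. The variable names are ours, and the sketch assumes nothing else (injuries, rest, weather) moved the line between the two games.

```r
# Favorite's spread in each venue, from the two regular-season meetings
line_at_favorite_home <- -6
line_at_underdog_home <- -3

# The gap between venues contains home court twice, so split it in half
home_court_value <- abs(line_at_favorite_home - line_at_underdog_home) / 2

# The midpoint of the two lines is the implied neutral-court spread
neutral_line <- (line_at_favorite_home + line_at_underdog_home) / 2

home_court_value  # 1.5
neutral_line      # -4.5
```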

 

Mid-Series Simulation

If you remember the prompts we set up at the beginning, one of them asked how many games each team has won so far in the series. 

Books hang adjusted series prices and individual outcome prices as most series progress. You can stay on top of that by running a simulation after every game and seeing how the adjusted prices compare to yours. 

This is using our Yankees-Tigers example, if the Yankees had home field and were up 2-1 in a best-of-seven.

 

 

Sometimes you can catch a good price in situations like when a heavy series favorite loses the first game at home. It’s not uncommon to see series prices swing too far the other way and you end up getting a good number on a superior team. 

Put it in Practice

Before the start of the Mercury and Lynx series in the opening round of the WNBA playoffs, we made a couple of tweaks and ran a simulation.

The WNBA playoffs are best-of-three in the first round. The first step was to adjust the home_court_order vector to just c(favorite, favorite, underdog).

We also needed to change every series win condition from “4” to “2.” 

Once we did that, we used a home win probability of 66 percent for the Lynx and 55 percent for Phoenix.
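A best-of-three is short enough that you can work out the exact series probability by hand, which makes a handy sanity check on whatever the simulator spits out. This sketch is our own cross-check using the two probabilities above, not part of the simulator script.

```r
fav_home_prob <- 0.66                 # Lynx at home
dog_home_prob <- 0.55                 # Mercury at home
fav_away_prob <- 1 - dog_home_prob    # Lynx on the road in Game 3

# Favorite sweeps the two home games
p_win_in_two <- fav_home_prob^2

# Favorite splits the home games (either order), then wins Game 3 on the road
p_win_in_three <- 2 * fav_home_prob * (1 - fav_home_prob) * fav_away_prob

favorite_prob <- p_win_in_two + p_win_in_three
underdog_prob <- 1 - favorite_prob

round(favorite_prob, 4)  # 0.6376
```

If your simulated series probability for the favorite lands far from roughly 63.8 percent with these inputs, something in the win conditions or the home court vector probably didn’t get updated.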

 

Take it Further

This playoff series simulator is functional, but it’s just the opening bid. Get creative in how you adapt it. 

Do you think favorites up 3-0 should get a probability boost to sweep? Write a condition for it. If you don’t want to manually change the home field vector, you could create prompts to do it. You would have to build a list of different sports, series lengths, and home-field scenarios.
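One possible starting point for that list is a simple lookup of formats. The structure and the names below are hypothetical; the three formats are the ones mentioned in this article.

```r
fav <- "favorite"
dog <- "underdog"

# Hypothetical lookup: home order and wins needed for each series format
series_formats <- list(
  best_of_seven_2_2_1_1_1 = list(
    home_court_order = c(fav, fav, dog, dog, fav, dog, fav),
    wins_needed = 4
  ),
  best_of_five_2_2_1 = list(
    home_court_order = c(fav, fav, dog, dog, fav),
    wins_needed = 3
  ),
  best_of_three_2_1 = list(
    home_court_order = c(fav, fav, dog),
    wins_needed = 2
  )
)

# Pull the settings for one format instead of editing the code by hand
chosen <- series_formats[["best_of_five_2_2_1"]]
chosen$home_court_order
chosen$wins_needed
```

From there, a prompt could ask which format to use and pass the matching home order and win condition into the simulator.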

For an MLB series, you may want to address probable starters. You could make a function to account for who’s pitching. 

If you were simulating an NBA series, you could build in an injury probability function, especially if you think a team has an injury-prone star player. You might assign a conditional penalty for any series where his injury probability kicks in.

Or maybe instead of working off of one strict set of probabilities, you could add a function to input multiple probabilities representing high, low and middle estimates and create a weighted blend of those probabilities to get your series price. 
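As a rough illustration of the blending idea, base R’s weighted.mean does the job. The three estimates and their weights below are made-up numbers.

```r
# Low, middle, and high estimates of the favorite's home win probability
estimates <- c(low = 0.58, middle = 0.62, high = 0.66)

# How much you trust each estimate (weights are placeholders)
weights <- c(low = 0.25, middle = 0.50, high = 0.25)

# Weighted blend of the three estimates
blended_prob <- weighted.mean(estimates, weights)
blended_prob  # 0.62
```

You would then feed the blended number into the simulator in place of a single fixed probability.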

Go nuts. It’s your code. 

Get the Playoff Series Simulator

To grab the complete code for the simulator, hit the Unabated GitHub.

For more data science you can add to your arsenal, check out our ongoing Intro to Data Science series.

Have questions or want to talk to other bettors on your modeling journey? Come on over to the free Discord and hop into the discussion.
