
Creating predictive online sports betting models
Cipher Sports co-founder and chief technology officer Darryl Woodford calls for more transparency from sportsbook operators to increase awareness and understanding

Within the sports betting world, a complex dance of numbers and predictions is constantly shaping the odds bettors engage with. While seasoned gamblers and casual enthusiasts alike judge whether an outcome is more or less likely than the odds imply, it’s safe to assume few of us grasp the intricate web of predictive analytics models that underpin these probabilities.
And that presents a problem. The casual bettor doesn’t have access to the data or the professional predictive models, and only the most serious bettors will build their own. For the betting industry, this opacity about how predictive models shape the odds could be detrimental to the public perception of sports betting.
This is why I believe it is worth peeling back the curtain to reveal how predictive models can be built, whether by a beginner or by a major sportsbook. Only through this kind of transparency can bettors gain a strong grasp of what they are up against.
Predictive models: From spreadsheets to artificial intelligence
When talking about predictive models, we should be clear about what we mean. While each predictive model is trying to do the same thing, namely estimating the likelihood of a future event, the complexity of how this is achieved varies widely.
The first level that could be called a predictive model is the Microsoft Excel spreadsheet. These models are typically used by bettors who have a little more data to work with than the average fan, such as basic team or player-level data. They may pull this data into a spreadsheet, run a series of formulae to turn it into a predicted rating for each team, and use those ratings to measure the difference between the two teams. While such models are quite basic, their simplicity is what makes them a popular choice for bettors. Andrew Mack, an author who has published two volumes on building statistical sports models in Excel, deserves much credit for popularizing this approach.
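As a rough illustration of the spreadsheet approach, the sketch below rates each team by its average scoring margin and subtracts one rating from the other. The team names, scores and the `home_advantage` parameter are all invented for the example, and average margin is only one of many possible rating formulas.

```python
from statistics import mean

# Hypothetical results: (points scored, points conceded) for each game played.
results = {
    "Team A": [(24, 17), (31, 20), (14, 21)],
    "Team B": [(10, 27), (20, 23), (17, 16)],
}

def team_rating(games):
    """Average scoring margin, the figure a spreadsheet AVERAGE column would give."""
    return mean(scored - conceded for scored, conceded in games)

def predicted_margin(home, away, home_advantage=0.0):
    """Difference between the two ratings, plus an optional home-field adjustment."""
    return team_rating(results[home]) - team_rating(results[away]) + home_advantage

ratings = {team: team_rating(games) for team, games in results.items()}
margin = predicted_margin("Team A", "Team B")
print(f"Predicted margin for Team A over Team B: {margin:.1f}")
```

With these made-up numbers, Team A comes out roughly 10 points better than Team B. A bettor would then compare that figure with the bookmaker’s line.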
Next, you’ve got more sophisticated models developed in programming languages like R and Python. With this approach, you’re building a computer script that synthesizes all the data and provides predictions on a game for you. This requires some basic knowledge of how to code, but you’re mostly just doing the same calculations that would be done in a spreadsheet. However, since the calculations are fully automated, it’s easier to scale to larger volumes of data.
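A minimal sketch of that scripted version might look like the following. It performs the same average-margin calculation, but reads results from a feed and loops over any number of fixtures. The CSV layout, team names and function names are assumptions made for the example, not a real data provider’s format.

```python
import csv
import io
from collections import defaultdict

# Hypothetical results feed; in practice this could be a file or an API export.
RESULTS_CSV = """home,away,home_pts,away_pts
Lions,Bears,24,17
Bears,Hawks,20,23
Hawks,Lions,14,21
Lions,Hawks,31,20
"""

def load_margins(text):
    """Collect every team's scoring margin from each game it played."""
    margins = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        diff = int(row["home_pts"]) - int(row["away_pts"])
        margins[row["home"]].append(diff)
        margins[row["away"]].append(-diff)
    return margins

def build_ratings(margins):
    """Rate each team by its average margin, as the spreadsheet would."""
    return {team: sum(m) / len(m) for team, m in margins.items()}

def predict(fixtures, ratings):
    """Predicted margin (home minus away) for every fixture in the list."""
    return [(home, away, ratings[home] - ratings[away]) for home, away in fixtures]

ratings = build_ratings(load_margins(RESULTS_CSV))
predictions = predict([("Lions", "Bears"), ("Bears", "Hawks")], ratings)
for home, away, margin in predictions:
    print(f"{home} vs {away}: predicted margin {margin:+.1f}")
```

Because the logic lives in functions, swapping the four-row sample for a full season of results needs no change to the calculation itself.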
Machine learning models
We now step into the realm of traditional machine learning models. This approach involves building models, frequently regression or random forest, and relying on the machine to dissect extensive data sets and identify which variables are most predictive. Whereas with spreadsheets and scripted models you had to determine which data matters most, now the machine decides for you.
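To make the idea of the machine weighing the variables concrete, here is a small sketch using logistic regression fitted by plain gradient descent. It stands in for the regression and random forest libraries a real project would use. The data is synthetic and built so that only the first feature, a rating difference, affects the result; the second feature is pure noise. All names and parameter values are assumptions for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def make_games(n, rng):
    """Synthetic games: only rating_diff drives the result; the other feature is noise."""
    games = []
    for _ in range(n):
        rating_diff = rng.gauss(0, 1)
        noise = rng.gauss(0, 1)
        home_win = 1 if rng.random() < sigmoid(2.0 * rating_diff) else 0
        games.append(((rating_diff, noise), home_win))
    return games

def fit_logistic(games, lr=0.5, epochs=300):
    """Batch gradient descent on the logistic loss; returns (weights, bias)."""
    weights = [0.0, 0.0]
    bias = 0.0
    n = len(games)
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for features, label in games:
            z = bias + sum(w * x for w, x in zip(weights, features))
            error = sigmoid(z) - label
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

rng = random.Random(0)  # fixed seed so the run is repeatable
weights, bias = fit_logistic(make_games(500, rng))
print(f"weight on rating_diff: {weights[0]:.2f}, weight on noise: {weights[1]:.2f}")
```

The fitted weight on the rating difference should come out far larger than the weight on the noise feature, which is the model telling you which variable carries the signal without being told in advance.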
Finally, we arrive at where I am spending a lot of time lately: reinforcement learning models. Unlike the traditional machine learning models above, reinforcement learning includes data about odds and is focused on a decision rather than a simulation. This allows the reinforcement learning model to identify betting opportunities where the market is undervaluing certain aspects or statistics. In other words, rather than trying to predict the whole game, the model is narrowing in on where a bettor may find an edge. Currently, reinforcement learning models are so new that most bookmakers haven’t yet grasped their full implications, a situation I expect will change as these models become more prevalent.
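One of the simplest reinforcement learning setups, an epsilon-greedy bandit, gives a feel for this decision-first framing. It is a toy and not the model described above. The agent never predicts a score. It only chooses between backing the home side, backing the away side or skipping the game, and it learns from the profit or loss each choice produces at the offered odds. The odds, the hidden win probability and the parameter values are all invented for the example.

```python
import random

# Hypothetical market: decimal odds on offer, and a true home-win chance the agent never sees.
ODDS = {"home": 2.10, "away": 1.75}
TRUE_HOME_WIN_PROB = 0.60
ACTIONS = ["home", "away", "skip"]

def play(action, rng):
    """Profit on a 1-unit stake for one simulated game."""
    if action == "skip":
        return 0.0
    home_won = rng.random() < TRUE_HOME_WIN_PROB
    won = home_won if action == "home" else not home_won
    return ODDS[action] - 1.0 if won else -1.0

def train(steps=20000, epsilon=0.2, seed=0):
    """Epsilon-greedy learning of the average profit of each action."""
    rng = random.Random(seed)
    value = {action: 0.0 for action in ACTIONS}
    count = {action: 0 for action in ACTIONS}
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)          # explore
        else:
            action = max(ACTIONS, key=value.get)  # exploit the best estimate so far
        reward = play(action, rng)
        count[action] += 1
        value[action] += (reward - value[action]) / count[action]  # running mean
    return value

value = train()
best_action = max(value, key=value.get)
print(best_action, {action: round(v, 3) for action, v in value.items()})
```

Here the home odds of 2.10 imply a win chance of about 48%, while the simulated true chance is 60%, so the agent should learn that backing the home side is the profitable decision. It reaches that conclusion from rewards alone, without ever modeling the game.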
There’s no question bookmakers and their data providers have a substantial advantage in determining the odds of an upcoming sports event. Casual bettors should understand this before placing a wager. As for the more serious bettors, they can try to level the playing field by using more advanced techniques where they may find an edge, particularly in smaller markets. How far they wish to go in that endeavor is entirely up to them.

Dr Darryl Woodford is the chief technology officer at sports analytics company Cipher Sports Technology Group. He is a data scientist and Python developer, with a particular interest in sports betting.