Alfredo García (left) and Andres Barge (right) are partners at sportyy. Together with me, they have developed a mathematical model for forecasting tennis matches. Both are university professors with a Ph.D. in Economics and are experts in quantitative models. The objective of this interview is to shed light on mathematical prediction models for sports betting. I would like to make it clear that, because of my involvement in sportyy, this interview cannot be considered fully independent.

**What was your first contact with the Sports Betting world?**

**AB:** I’ve been betting since I was young. Even before the Internet existed, I used to bet with my friends. My first contact with online betting was in 2004, when I saw some ads from several online bookmakers. Soon afterwards I discovered Betfair and started taking it more seriously.

**AG:** I have also been betting since I was young. Before the online sports betting explosion I used to “bet” (because basically that was what I did) on the stock exchange. I took intraday positions, looking for big, surprising movements. It was similar to betting on superdogs. Later on, as I was a big basketball fan, my beginnings in sports betting were linked to the great generation of Spanish basketball players.

**What is your method for finding value odds and selecting your picks?**

**AB:** The starting point is that human beings have biases in their perception. This is well documented in the scientific literature. Therefore, the subjective probability estimates made by punters are not necessarily correct. Our idea is to build statistical models that forecast the likelihood of each event on the basis of objective data. Then we can compare our probabilities with the probabilities implied by the odds.
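The comparison described here can be sketched in a few lines of Python. This is only an illustration, assuming decimal (European) odds; the function names and the example numbers are hypothetical and are not taken from the sportyy model.

```python
def implied_probability(decimal_odds: float) -> float:
    """Probability implied by decimal odds, ignoring the bookmaker's margin."""
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1")
    return 1.0 / decimal_odds


def expected_profit_per_unit(model_probability: float, decimal_odds: float) -> float:
    """Expected profit for a 1-unit stake, if the model probability is right."""
    return model_probability * decimal_odds - 1.0


# Hypothetical example: the model gives a player a 45% chance at odds of 2.59.
p_model = 0.45
odds = 2.59
print(f"implied probability: {implied_probability(odds):.3f}")
print(f"expected profit per unit: {expected_profit_per_unit(p_model, odds):+.4f}")
```

Under these assumptions a bet has value only when the model probability exceeds the implied one, which is the same as the expected profit being positive.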

**AG**: If you have only seen a tennis player play once and his performance was unbelievable, will you be able to detach yourself and correctly estimate his chances of winning the next match? Or will you be influenced by that great match? Moreover, will you be able to analyse a match properly after 10 consecutive losing bets? Models have no feelings, and this is what we are looking for when selecting our picks.

**Can you tell us more about these statistical models?**

**AB & AG**: Statistical models feed on past data. They try to answer the following question: what usually happens in similar situations? Obviously, there will never be two identical situations, but there can be very similar ones. The models used at sportyy take into account many variables, such as the ranking, current form, typical performance when the player is the favourite or the underdog, the surface, performance against left-handed players or big servers, and the standard patterns in the different tournaments and rounds. An important part is the calculation of our own ranking, as the ATP ranking is far from being a good estimator.

**Why can we trust statistical models?**

**AB**: The biggest advantage is that they can be tested *a posteriori*, which cannot be done with subjective probability estimates. That is, we can see what the model would have said for any match that has already been played, using, of course, only the information that existed before the match.
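A minimal sketch of this kind of *a posteriori* test is a walk-forward loop, in which each match is predicted by a model fitted only on strictly earlier matches. Everything here is an assumption made for illustration: the `Match` record, the `walk_forward` helper and the toy "model" (the past share of favourite wins) are hypothetical and far simpler than anything described in the interview.

```python
from typing import Callable, List, NamedTuple, Sequence, Tuple


class Match(NamedTuple):
    date: str            # ISO date (YYYY-MM-DD), so text order is chronological
    favourite_won: bool


def walk_forward(
    matches: Sequence[Match],
    fit: Callable[[List[Match]], float],
    min_history: int = 2,
) -> List[Tuple[Match, float]]:
    """Predict each match with a model fitted only on strictly earlier matches."""
    ordered = sorted(matches, key=lambda m: m.date)
    predictions = []
    for match in ordered:
        # Same-day matches are excluded: their results were not yet known.
        history = [m for m in ordered if m.date < match.date]
        if len(history) < min_history:
            continue  # not enough past data to fit anything yet
        predictions.append((match, fit(history)))
    return predictions


def favourite_base_rate(history: List[Match]) -> float:
    """Toy model: the share of past matches won by the favourite."""
    return sum(m.favourite_won for m in history) / len(history)


sample = [
    Match("2013-01-01", True),
    Match("2013-01-02", False),
    Match("2013-01-03", True),
    Match("2013-01-04", True),
]
for match, probability in walk_forward(sample, favourite_base_rate):
    print(match.date, f"predicted P(favourite wins) = {probability:.3f}")
```

The point of the design is the `history` filter: no result from the day of the match or later can leak into the prediction being evaluated.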

**AG**: There is no getting away from this fact: computers have an unbelievable capacity to store and analyze a huge volume of data. I have a friend who says to me, “I don’t think playing at home is a relevant variable in tennis.” I answer that it is relevant in some tournaments, for some nationalities, and when the difference between the players isn’t very large. To make this statement we rely on complex models that analyse a database with tens of thousands of matches. A variable like this, in isolation, doesn’t generate a 10% yield, but the accumulation of multiple variables can. Obviously, it isn’t enough for the model-estimated probabilities to be better than the market’s. We also have to beat the bookies’ overround.
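To make the overround concrete, here is a small sketch assuming decimal odds on a two-way market. The odds and the function names are hypothetical, and proportional rescaling is only one of several ways to strip the margin out of the implied probabilities.

```python
from typing import Dict


def overround(decimal_odds: Dict[str, float]) -> float:
    """Bookmaker margin: the sum of implied probabilities minus 1."""
    return sum(1.0 / odds for odds in decimal_odds.values()) - 1.0


def margin_free_probabilities(decimal_odds: Dict[str, float]) -> Dict[str, float]:
    """Rescale implied probabilities so they sum to 1 (proportional method)."""
    total = sum(1.0 / odds for odds in decimal_odds.values())
    return {name: (1.0 / odds) / total for name, odds in decimal_odds.items()}


# Hypothetical two-way tennis market.
market = {"Player A": 1.50, "Player B": 2.50}
print(f"overround: {overround(market):.2%}")
for name, probability in margin_free_probabilities(market).items():
    print(f"{name}: {probability:.3f}")
```

Because the implied probabilities add up to more than 1, a model has to be better than the market by more than this margin before its picks become profitable.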

**What are the advantages and disadvantages of using a mathematical model to select your picks?**

**AB & AG**: First of all, our mathematical model doesn’t only select picks; it does much more. It calculates the probability of victory for both players in a match. Then the model 1) provides our picks, 2) finds the bookie (only serious ones…) with the best odds, 3) optimizes the stake as a % of bankroll and 4) determines the minimum odds at which the bet has value for us. All of this is really difficult to optimize for someone who doesn’t use a powerful statistical model. Another advantage is the ability to assess objectively (with a number) the influence of certain variables and their independence.
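Steps 1, 2 and 4 can be sketched as follows. This is a simplified illustration, not the sportyy pipeline: the bookmaker names, the `required_edge` parameter and the helper functions are all hypothetical, and the break-even rule `odds >= (1 + required_edge) / p` is simply the condition for the expected profit per unit staked to reach the required edge.

```python
from typing import Dict, Optional, Tuple


def minimum_value_odds(model_probability: float, required_edge: float = 0.0) -> float:
    """Lowest decimal odds at which expected profit per unit reaches required_edge."""
    return (1.0 + required_edge) / model_probability


def best_offer(offers: Dict[str, float]) -> Tuple[str, float]:
    """Bookmaker name and price of the highest odds on offer."""
    name = max(offers, key=lambda bookie: offers[bookie])
    return name, offers[name]


def make_pick(
    model_probability: float,
    offers: Dict[str, float],
    required_edge: float = 0.0,
) -> Optional[dict]:
    """Return the pick details, or None when even the best odds hold no value."""
    name, odds = best_offer(offers)
    floor = minimum_value_odds(model_probability, required_edge)
    if odds < floor:
        return None
    return {"bookmaker": name, "odds": odds, "minimum_odds": floor}


# Hypothetical example: model probability 45%, two bookmakers quoting the player.
offers = {"Bookie A": 2.50, "Bookie B": 2.59}
print(make_pick(0.45, offers))
print(make_pick(0.30, offers))  # no value at these prices
```

Publishing the minimum odds alongside the pick lets a subscriber decide whether the bet is still worth placing after the price has moved.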

On the negative side, it’s very difficult for the model to incorporate information such as the players’ comments, their feelings about the match, injuries… Some of this information, which can be relevant in some matches, is lost. This is the greatest disadvantage: we cannot gather subjective information.

**What is the degree of uncertainty about your expected results?**

**AB**: It’s important to distinguish between trend and variance. The key for a positive trend to exist is that the patterns observed in the past repeat in the future. That is, the perception biases must be relatively stable. If we analyze a wide range of years, these biases look quite stable. On the other hand, in the sports betting world variance is very high, as you can read in this analysis. There you can see that the same model can generate very different results depending on the luck factor. The advantage of statistical models is that we can estimate the probability of a given maximum drawdown.
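One common way to estimate the probability of a given maximum drawdown is Monte Carlo simulation. The sketch below is a rough illustration under strong assumptions that are not from the interview: every bet is independent, placed at the same odds with the same flat stake, and the 40% win probability is a made-up figure. The function names are hypothetical.

```python
import random
from typing import Iterable


def max_drawdown(profits: Iterable[float]) -> float:
    """Largest peak-to-trough fall of the cumulative profit curve."""
    balance = peak = worst = 0.0
    for profit in profits:
        balance += profit
        peak = max(peak, balance)
        worst = max(worst, peak - balance)
    return worst


def drawdown_probability(
    threshold: float,
    n_bets: int,
    win_probability: float,
    decimal_odds: float,
    stake: float,
    n_simulations: int = 1000,
    seed: int = 0,
) -> float:
    """Share of simulated runs whose maximum drawdown reaches the threshold."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    hits = 0
    for _ in range(n_simulations):
        profits = [
            stake * (decimal_odds - 1.0) if rng.random() < win_probability else -stake
            for _ in range(n_bets)
        ]
        if max_drawdown(profits) >= threshold:
            hits += 1
    return hits / n_simulations


# Hypothetical run: 300 bets of 1.54 points each at odds of 2.59.
for threshold in (10.0, 20.0, 30.0):
    estimate = drawdown_probability(threshold, 300, 0.40, 2.59, 1.54)
    print(f"P(max drawdown >= {threshold:.0f} points) is roughly {estimate:.2f}")
```

Even with a positive expected profit per bet, a simulation like this typically shows that sizeable drawdowns are far from rare, which is the distinction between trend and variance made above.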

**AG**: Statistical models have two main sources of uncertainty. The first is related to randomness. A tipster can be brilliant, but bad luck can put him out of business. This uncertainty shrinks as the number of bets grows: the random factor in 5,000 bets is much smaller than in 100 bets. For that reason, at sportyy we always say that investing in “picks” is a long-term investment. The second source is called “parametric uncertainty”. The estimated effect of a variable does not necessarily coincide with its true value. For instance, if we estimate that playing on clay improves Nadal’s chances of winning by 10 percentage points, we can ask ourselves: is that the true value? Is that value constant over the whole sample? The size and quality of the database significantly reduce the parametric uncertainty, but you cannot make it disappear.
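The effect of the number of bets on the random factor can be put in numbers with a short calculation. It assumes independent, identical, flat-stake bets, which is a simplification; the 40% win probability is hypothetical and the function name is made up for this sketch. For a 1-unit stake the profit is `odds * X - 1` with `X` a win indicator, so its standard deviation is `odds * sqrt(p * (1 - p))`, and the standard deviation of the yield falls with the square root of the number of bets.

```python
import math


def yield_standard_deviation(n_bets: int, win_probability: float, decimal_odds: float) -> float:
    """Standard deviation of the yield over n_bets independent flat-stake bets."""
    per_bet_sd = decimal_odds * math.sqrt(win_probability * (1.0 - win_probability))
    return per_bet_sd / math.sqrt(n_bets)


# Hypothetical bets: 40% win probability at odds of 2.59.
for n in (100, 5000):
    sd = yield_standard_deviation(n, 0.40, 2.59)
    print(f"{n:>5} bets: standard deviation of yield = {sd:.2%}")
```

Going from 100 to 5,000 bets divides the random spread of the yield by the square root of 50, roughly a factor of 7.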

**The results of your model in its first 3 months are far from satisfactory: as of January 7th, 2014, a -2.2% yield over 330 bets in the Premium PRO version and -7.5% over 145 picks in the Premium version. What is the reason?**

**AB**: It’s true. The results since we launched have been considerably worse than we expected. The difference between the two products is circumstantial. The Premium version is less volatile than the Premium PRO: in bad runs it will lose less and in good runs it will also win less. I will talk about the Premium PRO subscription because it contains all the picks of our model. Although the results to date aren’t good, we are convinced that 300 bets are too few to see a trend. We plan to release around 1,100-1,200 picks per year (PRO), so we haven’t yet reached 30% of a year’s picks. Our average stake is 1.54% and our average odds are 2.59. The difference between winning and losing one bet can be around 4 points. This means that very few bets can bring us close to breakeven. On the other hand, if we analyse the figures of the last months we can see that the profit on the underdog picks has been in line with what we expected. It is the favorites that have produced the loss. We really think we will recover the losses, and we don’t expect this abnormal behaviour of the favorites to continue for much longer.
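The arithmetic behind these figures can be checked with a few lines. The inputs are the numbers quoted in the interview; the interpretation is an assumption: one point is taken to be 1% of bankroll, yield is taken to be profit divided by total amount staked, and every bet is treated as if it were placed at the average stake and average odds.

```python
average_stake = 1.54    # points (% of bankroll) per bet, from the interview
average_odds = 2.59     # decimal odds, from the interview
bets_so_far = 330       # Premium PRO picks to date
yield_so_far = -0.022   # -2.2% yield for Premium PRO

# Swing between winning and losing a single average bet.
profit_if_won = average_stake * (average_odds - 1.0)
loss_if_lost = average_stake
swing_per_bet = profit_if_won + loss_if_lost  # equals stake * odds

# Size of the current loss, expressed in such swings.
total_staked = bets_so_far * average_stake
current_loss = -yield_so_far * total_staked
swings_to_breakeven = current_loss / swing_per_bet

# Share of the planned yearly volume (1,100 to 1,200 picks) already played.
share_low = bets_so_far / 1200
share_high = bets_so_far / 1100

print(f"swing per bet: {swing_per_bet:.2f} points")
print(f"current loss: {current_loss:.2f} points")
print(f"bets that would need to flip from lost to won: {swings_to_breakeven:.1f}")
print(f"share of yearly picks played: {share_low:.1%} to {share_high:.1%}")
```

Under these assumptions the swing per bet is just under 4 points, and the whole loss to date corresponds to about three average bets going the other way.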

**AG**: The key has been the picks on favourites. It could be a change in how the market prices favorites, or it could just be randomness. If it’s the former, the model will adjust automatically. When something new happens the model goes through a short bad run, but it learns quickly and incorporates that information. If it’s just randomness, we will recover sooner or later, although we will probably miss some opportunities on favorites in the future.

**Have you made improvements in the last couple of months? Does the model evolve over time?**

**AB & AG:** Yes to both. The model evolves over time in two ways. On the one hand, we keep adding variables and improving our methodology. As an example, we have recently acquired a more powerful database. More and better data reduce the uncertainty, and we hope this translates into better results. On the other hand, the model adapts to the latest results automatically. That is, the bad results of favorites in the most recent months will influence the model’s picks in the future.

**Is there any risk that the factors analysed in the past that form the basis of your model won’t repeat in the future?**

**AB & AG:** That risk always exists. We just have to take it into account. It’s like driving a car: the risk of an accident is there and we cannot forget it. But we can gauge it to some extent. We have tested the stability of the factors in our model over annual and biannual periods, and it holds. For example, the market has overpriced the uncertainty in the pre-Australian Open tournaments. Does that mean it will be like this forever? Of course not. However, as we explained before, when something new happens the model usually goes through a short bad period until it learns and incorporates the change to improve its selections.

**Which is your staking method? Any advice for the punters?**

**AB & AG**: To decide the stake (how much to bet) we use the so-called Kelly method. The stake determined by the Kelly method depends on the odds (the higher the odds, and therefore the risk, the lower the stake, and vice versa) and on the value of the bet, also called the “overlay”. The overlay compares the probability calculated by our model with the one implied by the odds. The Kelly method has been mathematically proven and is very useful, but its pure version, usually called Full Kelly, carries too much risk, as it recommends bankroll percentages that may not match a bettor’s actual utility preferences. At sportyy we apply a modified Kelly method. We have no problem lowering our potential profits if that minimizes the impact of bad runs as well as the probability of bankruptcy. Many tipsters don’t want to talk about this, but traditional bankroll management systems are quite sensitive to bad streaks. As we know, randomness generates very good but also very bad runs. Don’t trust tipsters whose average suggested stakes are around or above 5% of your bankroll. They can be great sports experts but can also drive you to bankruptcy.
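For a single bet at decimal odds, the Full Kelly stake is the overlay divided by the net odds: `(p * odds - 1) / (odds - 1)`. The sketch below shows that formula together with one common way of taming it, a fractional Kelly with a cap. The interview does not say how the sportyy modification works, so the `fraction` and `cap` parameters, their default values and the function names are purely illustrative assumptions.

```python
def full_kelly_fraction(model_probability: float, decimal_odds: float) -> float:
    """Full Kelly stake as a fraction of bankroll; 0 when the bet has no value."""
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1")
    overlay = model_probability * decimal_odds - 1.0
    if overlay <= 0.0:
        return 0.0
    return overlay / (decimal_odds - 1.0)


def fractional_kelly(
    model_probability: float,
    decimal_odds: float,
    fraction: float = 0.25,  # illustrative choice, not the sportyy setting
    cap: float = 0.05,       # illustrative ceiling on any single stake
) -> float:
    """Scaled-down Kelly stake that never exceeds the cap."""
    return min(fraction * full_kelly_fraction(model_probability, decimal_odds), cap)


# Hypothetical bet: model probability 45% at odds of 2.59.
print(f"full Kelly:       {full_kelly_fraction(0.45, 2.59):.2%} of bankroll")
print(f"fractional Kelly: {fractional_kelly(0.45, 2.59):.2%} of bankroll")
```

For a fixed overlay the stake shrinks as the odds rise, which matches the description above, and scaling the stake down trades some expected growth for smaller swings in bad runs.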

#Matthew

Thanks for the insight. Given the highly statistical angle of your selections, do your picks still come with an explanation or reasoning, or is it just pick, stake and odds?

The picks currently come with the best odds available at that time at a serious bookie, the minimum odds at which the model thinks there is still value, and the optimal stake. Some days we send 10 picks and there is no time to explain each of them. However, we periodically upload information and explanations in our TennisLab section.

Hi,

Nice article. Please tell me, what kind of software do you use for your model?

Kind regards,

We mainly use MATLAB and Stata.