Using Convolutional Neural Networks in a betting use-case won’t make you a millionaire – but it will certainly help you to lose (much) less in the Lottery.
The German state-run lottery (“Lotto 6 out of 49”) has a fundamental construction error. Consider, for example, horse betting: you know the odds for any given horse. If you bet on the favorite, your expected profit is low. For a 1 Euro bet, you may only get 1.10 Euro if your horse is first across the finish line, because its probability of winning is high. If you bet on an outsider, you may get a 10 Euro return for a 1 Euro bet, but your probability of winning is very low. The odds are calculated so that the bet provider always wins. If the stake (the sum all players bet in a race) is, say, 100,000 Euro, the odds are set so that the payout is generally about 80 percent of the stake.
The German Lotto works similarly – with one fundamental difference. The payout is half of the stake (yes, a very bad ratio for players). BUT: the odds are not published. If you play the same numbers as lots of other players, i.e., a “favorite” combination, your payout (given that you win) is very low, and if you play a rare or “outsider” combination, your payout (again given that you win) is high. This does, in fact, make a huge difference. The lowest payout for six correct numbers (without super number) in German Lotto history was 8,644.41 Euro, and the highest was 4.1 million Euro. For three correct numbers you typically get between just under 10 and 15 Euro.
So, all you’d need to know is which numbers all the other players play – and you’d be well off. Unfortunately, these numbers are not published. There is, however, a way around it: the German Lotto company does publish the stake and the payout for each winning class. From this data, you can also extract the number of players in each round. Strictly speaking, this is the number of betting slips rather than the number of players, but that is what we want anyway. And, of course, you have the drawn numbers for each round.
With classical statistical analysis, you could easily calculate popular and unpopular numbers from that data. But it really doesn’t boil down to popular individual numbers, which you should avoid, like the “lucky” numbers 3 and 7 and the numbers that could appear in birth dates. It is the combinations that matter.
Figure 1: Lotto-Field (left) and its representation as an image (right), ready to be fed as input data into a Convolutional Neural Network (CNN). Each square symbolizes one pixel.
That is where neural networks enter the stage: Convolutional Neural Networks (CNNs), to be precise. You may now think: “What? CNNs are good at recognizing patterns in images, for example in object detection. But how can they learn integer combinations?” Well, the answer is quite simple. Just imagine your Lotto field as an image, as shown in Figure 1. It is 7 pixels wide and 7 pixels high, and it is grayscale.
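The encoding can be sketched in a few lines of Python. This is a minimal illustration, not the original preprocessing code: it assumes the numbers 1 to 49 are laid out row by row on the 7 x 7 field and that a drawn number becomes a pixel of value 1.0 while all other pixels are 0.0. The function name `encode_drawing` is hypothetical.

```python
def encode_drawing(numbers, size=7):
    """Return a size x size grid with 1.0 at each drawn number and 0.0 elsewhere.

    Assumption: numbers are laid out row by row, so 1..7 form the first row.
    """
    if len(set(numbers)) != len(numbers):
        raise ValueError("drawn numbers must be distinct")
    grid = [[0.0] * size for _ in range(size)]
    for n in numbers:
        if not 1 <= n <= size * size:
            raise ValueError(f"number {n} is outside 1..{size * size}")
        row, col = divmod(n - 1, size)  # 0-based position on the field
        grid[row][col] = 1.0
    return grid


# Example drawing (arbitrary numbers, for illustration only)
image = encode_drawing([3, 7, 12, 25, 31, 49])
for row in image:
    print(" ".join("#" if pixel else "." for pixel in row))
```

For training, such a grid would typically be converted to an array of shape 7x7x1 so that it matches the input shape of the network.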
So, setting up the input data is quite easy, as the drawn numbers are readily available from the Lotto company from the first drawing on 9.10.1955. We used the data from the beginning up to 28.10.2023. Wednesday drawings started on 6.12.2000 and are also included (but can be excluded, if specified). Altogether there are 4747 drawings in the dataset.
Setting up the labels is a different matter. The Lotto company publishes the number of winners per class – and the total sum paid out per class. The term “number of winners” is, as already mentioned, not a correct description, as it should really be called the number of betting slips – but we use these terms here interchangeably. From that data, one can calculate the winning expectation. It is linearly correlated to the fraction of the total number of winners (or winning betting slips) with respect to the total number of players (or betting slips). We used this fraction as the label for each drawing. This is, in a nutshell, how the labels are set up. The full picture is a bit more complicated, but we will not bother you with the details [1] here.
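The core of the label can be sketched as follows. This is a simplified illustration under stated assumptions: it ignores the rule and price changes mentioned in [1], the function name `winner_fraction` is hypothetical, and the numbers in the example are invented, not taken from a real drawing. The number of betting slips is derived from the total stake and the slip price.

```python
def winner_fraction(winners_per_class, total_stake, slip_price):
    """Fraction of winning betting slips among all betting slips of one drawing."""
    slips = total_stake / slip_price  # number of betting slips ("players")
    if slips <= 0:
        raise ValueError("total stake and slip price must be positive")
    return sum(winners_per_class.values()) / slips


# Invented example: winners per class, a stake of 24 million Euro, 1.20 Euro per slip
example_winners = {1: 0, 2: 1, 3: 9, 4: 90, 5: 900, 6: 17_000, 7: 4_000, 8: 350_000}
label = winner_fraction(example_winners, total_stake=24_000_000, slip_price=1.20)
print(f"label = {label:.6f}")
```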
Figure 2: Number of players (betting slips) since 1955, calculated by dividing the total stake by the price of a betting slip.
We are, in principle, ready to go – but let’s do some data exploration first. This should be standard in any data science project – but in this case, it also gives some interesting insights. Let’s first of all look at the number of players over time. You can see the development in Figure 2, for Saturday drawings.
Notice the huge drop in the number of players on 4.7.1981. Only about half as many players took part in the lottery as in the week before (27.6.1981). The reason is that the Lotto company doubled the price from 50 Pfennig (0.255646 Euro) to 1 Mark (0.511292 Euro) in that drawing. From a business perspective, this was a disaster. It turned out that the price elasticity is -1 (calculated with the mean value method), indicating that Lotto players were not prepared to pay more for a bet. This is not surprising. With one betting slip for one Mark, one could win exactly the same as with two betting slips for 50 Pfennig (0.5 Mark) the week before. So why should anyone stake more?
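The elasticity figure is easy to reproduce. The sketch below assumes that the “mean value method” refers to the usual midpoint (arc elasticity) formula, in which each change is divided by the mean of the old and new value. The quantities are relative: the number of slips drops from 1.0 to about 0.5 while the price rises from 0.50 to 1.00 Mark.

```python
def midpoint_elasticity(q_old, q_new, p_old, p_new):
    """Price elasticity of demand using the midpoint (arc) formula."""
    rel_dq = (q_new - q_old) / ((q_new + q_old) / 2)  # relative change in quantity
    rel_dp = (p_new - p_old) / ((p_new + p_old) / 2)  # relative change in price
    return rel_dq / rel_dp


# Players halve (1.0 -> 0.5) while the price doubles (0.50 -> 1.00 Mark)
elasticity = midpoint_elasticity(q_old=1.0, q_new=0.5, p_old=0.50, p_new=1.00)
print(elasticity)
```

An elasticity of -1 means that total revenue stays the same: the price increase is offset completely by the loss of players.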
In addition, the huge outliers after 1990 are noticeable – they are either due to extremely high Jackpots or special drawings (mainly due to the anniversary years of the Lotto company).
The distribution of the fraction of winners per drawing is also interesting. It is shown in Figure 3 for 8 classes only, omitting the class “two correct numbers and ‘Superzahl’ (super number)”.
Figure 3: Frequency of the number of winners as a fraction with respect to the total number of players (blue values) and expected frequency of the number of winners as a fraction with respect to the total number of players (red line). Winning class 9 is omitted (i.e., two correct numbers and Superzahl).
At first sight, this distribution looks sort of okay. However, it is slightly skewed, i.e., the bulk of the fraction of winners is slightly to the left of the mean. This is not quite what we expect, so let’s investigate this a bit further.
The expected probabilities of winning in each of the 8 classes are given by the Lotto company in [2]. They sum up to 0.0186338. The expected sum of the probabilities of winning in these classes can be calculated as 0.0186239. That is pretty close – but what about the fractions to the left of the mean? Why are they more frequent than expected?
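As a rough cross-check, the combined probability can also be computed from first principles. The sketch below assumes that the 8 classes together cover exactly the event “three or more correct numbers”; the Superzahl only splits each of these events into two classes and does not change the sum. The result, about 0.01864, agrees with the figures above to three significant digits. The small remaining differences might come from rounding in the published odds, but that is a guess.

```python
from math import comb

TOTAL = comb(49, 6)  # 13,983,816 possible combinations


def p_exactly(k):
    """Probability that exactly k of the 6 played numbers are among the 6 drawn."""
    return comb(6, k) * comb(43, 6 - k) / TOTAL


# Combined probability of three, four, five or six correct numbers
p_three_or_more = sum(p_exactly(k) for k in range(3, 7))
print(f"{p_three_or_more:.7f}")
```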
We will leave you to think about this for now – and suppose an answer at the end (*).
But let’s now turn to the model. Remember, our “image” size is 7 x 7 pixels, which is tiny. It would of course not make sense to apply transfer learning with a model pretrained on, e.g., ImageNet, especially as our data looks nothing like the images of the objects in that dataset. Remember that the complexity of the model should reflect the complexity of the data. A ResNet50 has about 25.6 million parameters.
For this use-case, we built our own toy-model, with just under 30000 parameters. The summary of the architecture is given in Figure 4.
Figure 4: Model architecture summary – the Input layer is omitted. Input shapes are 7x7x1.
As you can see, there are four consecutive convolutional layers with a Leaky ReLU activation followed by a dropout layer. We left out the typical pooling layers, as the size of the input data would otherwise shrink too quickly. We optimized the hyperparameters: the model is trained with a learning rate of 0.0005, a batch size of 800 and a dropout rate of 0.2, and the alpha value for the Leaky ReLU activation is also 0.2.
The dataset was split into a training set and a test set in the ratio 80:20. The data was not shuffled before the split, as we wanted to evaluate on the most recent drawings only. The reason for this is that the popularity of numbers might vary over time. 9 and 11 might have gone out of fashion after 2001 – and 42, despite not being a number that could represent a calendar day or month of birth, might be more popular since Siri gives it as the answer to the question: “What is the meaning of life?”.
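A chronological split of this kind can be sketched as follows. This is a minimal illustration, assuming the samples are already sorted from the oldest to the most recent drawing; the function name `chronological_split` is hypothetical.

```python
def chronological_split(samples, test_fraction=0.2):
    """Split time-ordered samples without shuffling; the newest ones form the test set."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    n_test = round(len(samples) * test_fraction)
    cut = len(samples) - n_test
    return samples[:cut], samples[cut:]


# Ten drawings, numbered in chronological order (illustration only)
drawings = list(range(1, 11))
train, test = chronological_split(drawings)
print(train, test)
```

Because nothing is shuffled, every test drawing is more recent than every training drawing, so the evaluation reflects how the model would perform on future drawings.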
Figure 5: Predicted ratios of winners to the number of players (blue) and true values of the same ratios (red) for the last two years (104 drawings) through 28.10.2023.
The labels, i.e., the ratio of the number of winners to the number of players, were scaled for the model training, using a Gaussian normalization. In the evaluation, we have rescaled these values for comparability with the true labels – and the result is shown in Figure 5.
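The scaling and rescaling steps might look like the sketch below. It assumes that “Gaussian normalization” means standardization to zero mean and unit variance, and that the mean and standard deviation are estimated on the training labels only; neither detail is confirmed above. The label values are invented.

```python
from statistics import mean, pstdev


def fit_scaler(train_labels):
    """Estimate mean and standard deviation on the training labels."""
    mu, sigma = mean(train_labels), pstdev(train_labels)
    if sigma == 0:
        raise ValueError("labels are constant and cannot be standardized")
    return mu, sigma


def scale(labels, mu, sigma):
    """Standardize labels for training."""
    return [(y - mu) / sigma for y in labels]


def unscale(scaled, mu, sigma):
    """Map standardized predictions back to winner fractions."""
    return [z * sigma + mu for z in scaled]


# Invented winner fractions, for illustration only
train_labels = [0.016, 0.018, 0.019, 0.021]
mu, sigma = fit_scaler(train_labels)
scaled = scale(train_labels, mu, sigma)
restored = unscale(scaled, mu, sigma)
print(scaled)
```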
The model appears to predict the ratio of the number of winners to the number of players quite well, based on the drawn numbers alone. A numerical evaluation is given in the table below:
Model | Mean squared error | Mean absolute error
Exp_003 | 1.386e-06 | 9.396e-04
Table 1: MSE and MAE for the model on the test set. These values are, unlike the ones shown in Figure 5 where only the last two years are displayed for reasons of readability, based on the entire test-set, which consists of 846 drawings.
But what is all that good for? Why should anyone be interested in a prediction of the fraction of winners in a lottery drawing? Well, the answer is quite simple: you can generate better Lotto numbers with this model. To simulate that, we took a brute-force approach and generated all possible combinations of 6 out of 49 numbers, 13,983,816 altogether. We then computed the model’s prediction for each combination and sorted the combinations by the predicted ratio of the number of winners to the number of players. Finally, we selected the 50,000 combinations with the lowest predicted ratio of winners to players, i.e., the combinations that the fewest other players are expected to share – and saved them.
We used all 4747 drawings from 9.10.1955 through 28.10.2023, that is, all Saturday and all Wednesday drawings in that period. In each drawing we played 2000 slips to smooth out the statistical outliers that would arise if we scored a high win while playing only a few games. The downside of this massive approach is runtime: the simulation takes more than 3.5 hours for one player. We simulated three players to benchmark them against each other.
Player 1 played randomly chosen combinations, Player 2 chose randomly from the 50,000 best combinations that our model predicted, and Player 3 chose only from the numbers between 1 and 31, as they represent calendar days and, in part, months – we assumed that lots of people would play their own or their partners’ birthdays.
All the Lotto rule changes over the years were implemented and considered. The additional number (Zusatzzahl), which no longer exists, and the super number (Superzahl) were randomly generated for each player. The graphic below shows the total sums that each player had won up to a given drawing. We also calculated the winning expectation for each player, without considering any inflation effects. The numbers given in the legend of the plot for each player are simply the total sum won divided by the total amount spent on Lotto slips.
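The selection step can be sketched with the standard library. This is an illustration only: `toy_predict` is a made-up placeholder for the trained model, and the demonstration runs on a tiny 3-out-of-10 lottery so that it finishes instantly. The idea is to keep the combinations that are expected to be shared with the fewest other winners.

```python
import heapq
from itertools import combinations
from math import comb


def best_combinations(predict, keep, n=49, k=6):
    """Return the `keep` combinations with the lowest predicted score.

    `predict` maps a combination (tuple of ints) to a predicted popularity score.
    """
    all_combos = combinations(range(1, n + 1), k)  # lazy, never stored as a list
    return heapq.nsmallest(keep, all_combos, key=predict)


def toy_predict(combo):
    """Placeholder score (NOT the trained model): low numbers count as popular."""
    return sum(1 / x for x in combo)


print(comb(49, 6))  # size of the full search space
best = best_combinations(toy_predict, keep=3, n=10, k=3)
print(best)
```

With a real network one would probably score the combinations in large batches rather than one at a time, since almost 14 million individual model calls would be slow.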
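The three strategies can be sketched as follows. The function names are hypothetical, and `best` stands for the saved list of model-selected combinations; here it holds just two invented entries.

```python
import random


def random_player(rng):
    """Player 1: any 6 distinct numbers out of 1..49."""
    return tuple(sorted(rng.sample(range(1, 50), 6)))


def model_player(rng, best):
    """Player 2: a random pick from the combinations selected with the model."""
    return rng.choice(best)


def birthday_player(rng):
    """Player 3: 6 distinct numbers out of 1..31 (calendar days)."""
    return tuple(sorted(rng.sample(range(1, 32), 6)))


rng = random.Random(0)  # fixed seed so that a simulation run is repeatable
best = [(1, 2, 3, 4, 5, 6), (32, 33, 40, 41, 45, 49)]  # invented stand-in list
slips = [random_player(rng), model_player(rng, best), birthday_player(rng)]
print(slips)
```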
Figure 6: Accumulated wins for all simulated players over all drawings. The winning expectation is given in the legend.
As you can see, the player using our model is doing considerably better than the other two players. The theoretical expectation value is 0.52, as the Lotto company pays out slightly over half of the revenue of each drawing to the winners.
Interestingly, the third player, only using calendar day numbers, is not doing worse than the one who just played random numbers.
We also ran several simulations with fewer games per drawing, e.g., 10 or 100 – and the results are much more erratic, as chance high wins hugely influence the results. To visualize this, we also plotted the accumulated expectation values that each player achieved for a specific and relatively small number of games per drawing in Figure 7.
As you can see there, the expected win shoots up at the very beginning for several players, i.e., when the drawings start. However, this is a purely random effect. Imagine you play only one game for 1.20 Euro, and you win 12 Euro. Your winning expectation is then 10. The second interesting observation is that the expected win is well under the theoretical average expectation. This is because one needs high wins to achieve the expected win of 0.52 (i.e., 52 percent of the stake) – but they are very rare. And this says a lot about the nature of the lottery.
Figure 7: Winning expectation, normalized to 1 Euro, for all players and 10 games per drawing (left) and 100 games per drawing (right).
This shows that Lotto is basically a very unsocial game. It takes the money from the vast majority of losers or “poor” winners, who get a few Euros every now and then to keep them playing – and gives it to the “rich” winners, who cash in millions. It resembles the American Dream, i.e., the possibility that a dishwasher can make his or her way right up to top positions in business or politics and earn millions. It happens – and it keeps people motivated but it happens extremely rarely. Most Americans stay put where they are.
But back to our model. Yes, the erratic nature of the results, even when 2000 games are played per drawing, is a nuisance. But there is a way to make the simulation of the random player and the player using our model much more comparable. Can you think of how?
(Yes, give it a try!!!)
Okay, here is what we did. We changed nothing for the player using our model: he or she takes a random combination out of the 50,000 best sets of numbers. The rules for the random player, however, are quite different: he or she now plays exactly the same numbers as the player using our model, but receives the payout for the winning class not from the current drawing but from the previous one, i.e., one week earlier. Both players therefore win in exactly the same classes, and any difference in their totals comes only from the payout per class. The result is shown in Figure 8.
Figure 8: Accumulated wins for two simulated players over all drawings. One plays the optimized combinations selected with the model (red line). The other plays the same numbers and hence wins exactly the same classes (blue line), but receives the class payouts from the previous week. The winning expectations, normalized to 1 Euro, are given in the legend.
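The comparison can be sketched like this. The function name, the data layout and all numbers are assumptions for illustration: `winning_classes` holds the class won in each drawing (or None), and `payouts` holds the payout per class for each drawing. How the original simulation treats a class that had no payout in the previous week is not stated; the sketch simply counts it as zero.

```python
def accumulated_win(winning_classes, payouts, shift=0):
    """Sum the payouts for the classes won, taken from `shift` drawings earlier."""
    total = 0.0
    for i, won_class in enumerate(winning_classes):
        source = i - shift  # drawing whose payout table is used
        if won_class is None or source < 0:
            continue
        total += payouts[source].get(won_class, 0.0)  # assumption: 0 if class missing
    return total


# Invented data: class 8 (three correct numbers) pays a different amount each week
payouts = [{8: 10.0}, {8: 14.0}, {8: 9.0}]
winning_classes = [8, None, 8]

model_total = accumulated_win(winning_classes, payouts)             # current payouts
control_total = accumulated_win(winning_classes, payouts, shift=1)  # previous week
print(model_total, control_total)
```

Since the payouts of the previous week belong to a different set of drawn numbers, they act as a reference that does not depend on how popular the played combination is.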
In conclusion, we have shown that you can improve your Lotto numbers with a Convolutional Neural Network. Which is a neat thing. However, we never managed to get a combination that boosted our expected win to over 1. Otherwise, we could now stop working and play Lotto instead. In our best simulation, we achieved 0.945. However, if you do play Lotto, you can reduce your expected loss from 50 percent of your stake to about 10 – 30 percent, possibly even less.
In other words: If you play 20 Euro a week, you can “win” (by not losing) about 25 to 30 Euro a month.
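The estimate can be checked with a few lines of arithmetic. The sketch assumes 52/12 weeks per month, a baseline return of 0.5 per Euro staked, and an improved return of 0.8 as one example value within the range of expected losses quoted above.

```python
WEEKS_PER_MONTH = 52 / 12


def monthly_saving(weekly_stake, baseline_return, improved_return):
    """Expected monthly amount that is no longer lost, in Euro."""
    monthly_stake = weekly_stake * WEEKS_PER_MONTH
    return monthly_stake * (improved_return - baseline_return)


saving = monthly_saving(weekly_stake=20, baseline_return=0.5, improved_return=0.8)
print(f"{saving:.2f} Euro per month")
```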
(*) We still owe you our presumed explanation about the slight skew with respect to the number of winners per drawing.
The probabilities for winning in each class are calculated based on getting a certain class right, e.g., the probability of getting three of the six numbers you bet on. However, if you get four numbers right, you automatically also have three numbers right, in four different ways. But these wins do not all count – you only ever get the payout for your highest winning class. This rule of the lottery leads to the slight skew in the number of winners, i.e., there are slightly fewer winners than one would expect.
[1] The complications in setting up the labels are that the Lotto rules changed several times over the years: new classes were added, and others, like the class six correct numbers with “Zusatzzahl” (additional number), were abolished. A jackpot was introduced, which was distributed among all winners once it had accumulated to more than 45 million Euro. This forced distribution has recently been abandoned. For the calculation of the winning expectation, the price changes must be known and considered – and, in theory, the winning expectation must be considered for each class separately. We have considered all these complications and implemented the data preparation accordingly – without getting into the nitty-gritty here.
[2] https://www.lotto.de/lotto-6aus49/info/gewinnwahrscheinlichkeit (accessed 20.11.2023)