Analyse some of the statistics of the National Lottery (especially sales information), and see whether there are any trends, similarities, patterns, or correlations in the data.

The National Lottery

Introduction

Aims

The Aims of this Project are:

To analyse some of the statistics of the National Lottery (especially sales information), and to see whether there are any trends, similarities, patterns, or correlations in the data.
To check whether some of the statistical variables surrounding the National Lottery fit a known statistical distribution.

How the Statistics Were Collected

The statistics were collected from the Internet. I originally found complete information on the first 187 lottery draws, but decided that this was too many to manage easily, so I randomly selected 50 draws from this set. I used my calculator to generate random numbers from 1 to 187, discarding any number that had already come up, and used them to build a final table of 50 randomly selected lottery draws, which I then sorted by draw number. The data are reproduced in portions throughout the project. No difficulties were encountered in collecting the data.
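As an illustration, the sampling step described above could be sketched in Python as follows. This is only a sketch under assumptions: the function name `sample_draws` and the `seed` parameter are hypothetical, and a software random number generator stands in for the calculator that was actually used.

```python
import random

def sample_draws(total_draws=187, sample_size=50, seed=None):
    """Pick sample_size distinct draw numbers from 1..total_draws, sorted.

    random.sample selects without replacement, which mirrors discarding
    any number that has already been generated.
    """
    rng = random.Random(seed)
    chosen = rng.sample(range(1, total_draws + 1), sample_size)
    return sorted(chosen)

# Hypothetical seed, used only so the selection can be repeated.
selected = sample_draws(seed=1)
print(len(selected), selected[:5])
```

Sorting at the end corresponds to ordering the final table by draw number.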

How the Lottery Works

People who play the lottery pick six different numbers on a ticket from the range 1 to 49. Any person can play the lottery as many times as they like, but each ticket sold counts separately in the final statistics.

Six numbers are selected using a machine which draws six ‘random’ balls from a set of balls marked 1 to 49. This is done without replacement. The machine also selects a ‘bonus ball’. There are three different machines and ten separate ball sets. The machine and ball set used for each lottery draw are selected by two separate people. Camelot, the company that runs the lottery, claims that this ensures the randomness of the lottery draw.

After the balls are selected by the machine, people can collect their prizes if they have matched enough balls. Possible valid matches (in descending order) are:

6 main numbers matched (jackpot).
5 numbers plus the bonus ball matched.
5 numbers matched (not including the bonus ball).
4 numbers matched (not including the bonus ball).
3 numbers matched (not including the bonus ball).
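The prize tiers listed above can be expressed as a short Python function. This is a minimal sketch, not Camelot's actual procedure: the function name `prize_tier` and the tier labels it returns are hypothetical, chosen only to illustrate the matching rules.

```python
def prize_tier(ticket, main_balls, bonus_ball):
    """Return a label for the prize tier of a ticket, or None if it wins nothing.

    ticket      -- the six numbers chosen by the player
    main_balls  -- the six main numbers drawn by the machine
    bonus_ball  -- the bonus ball drawn by the machine
    """
    matched = len(set(ticket) & set(main_balls))
    if matched == 6:
        return "jackpot"
    # The bonus ball only matters when exactly five main numbers match.
    if matched == 5 and bonus_ball in ticket:
        return "5 + bonus"
    if matched >= 3:
        return f"{matched} numbers"
    return None

print(prize_tier([1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 49], 6))
```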

Analysis of the Lottery Statistics

Sales Statistics

The first attempt I made at finding patterns in the National Lottery was to look at the statistics for sales. This is because these are fairly easy to predict. A table summarising the sales statistics for all of the randomly chosen 50 draws is in Table ...

There ought to be a correlation between ticket sales and the total number of prize winners, because the more tickets are bought, the more people should win prizes. To check whether this is the case, I carried out the correlation calculation below. (Unless stated otherwise, all calculated figures in this project are quoted to 3 decimal places, and as many significant figures as possible are carried through intermediate calculations.)

H0: There is no correlation between ticket sales and the total number of prize winners ($\rho = 0$).

H1: There is a correlation between ticket sales and the total number of prize winners ($\rho \neq 0$).

$S_{xx} = 1.443 \times 10^{16}$

$S_{yy} = 5.680 \times 10^{12}$

$S_{xy} = 2.445 \times 10^{14}$

$r = \dfrac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = 0.854$

(r is the product-moment correlation coefficient)
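The calculation of r from the three summary values can be checked with a few lines of Python. This is a sketch only: the function name `pmcc` is hypothetical, and the inputs are the rounded figures quoted above rather than the unrounded intermediate values.

```python
import math

def pmcc(sxx, syy, sxy):
    """Product-moment correlation coefficient from the summary sums."""
    return sxy / math.sqrt(sxx * syy)

# Rounded summary values quoted for ticket sales (x) and prize winners (y).
r_sales_winners = pmcc(sxx=1.443e16, syy=5.680e12, sxy=2.445e14)
print(round(r_sales_winners, 3))
```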

Using critical value tables, the critical value for a sample size of 50 at a significance level of 5% (a typical significance level) is 0.2353, and

$0.854 > 0.2353$

Because the correlation coefficient exceeds the critical value, I reject H0 and conclude that there is a significant positive correlation between ticket sales and the total number of prize winners. The correlation is more obvious when one looks at a scatter diagram of the data, together with a best-fit line. The calculation of the best-fit line is below and the scatter diagram is in Figure ?.

$y = a + bx$

$b = \dfrac{S_{xy}}{S_{xx}} = 0.017$

$a = \bar{y} - b\bar{x} = 1.034 \times 10^{6} - (0.017 \times 5.671 \times 10^{7}) = 7.288 \times 10^{4}$

$y = 7.288 \times 10^{4} + 0.017x$
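The gradient and intercept of the best-fit line can be reproduced approximately from the quoted figures. This is a sketch under assumptions: the function name `best_fit` is hypothetical, and because it starts from the rounded values printed above, the intercept it gives will differ slightly from the figure obtained with unrounded intermediate values.

```python
def best_fit(sxx, sxy, mean_x, mean_y):
    """Least-squares line y = a + b*x from summary statistics."""
    b = sxy / sxx              # gradient
    a = mean_y - b * mean_x    # intercept
    return a, b

# Rounded values quoted in the text: x = ticket sales, y = prize winners.
a, b = best_fit(sxx=1.443e16, sxy=2.445e14, mean_x=5.671e7, mean_y=1.034e6)
print(round(b, 3), round(a))

def predicted_winners(sales):
    """Predicted number of prize winners for a given level of ticket sales."""
    return a + b * sales
```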

The best-fit line has been drawn on the scatter diagram in Figure ? and it clearly shows the pattern: as ticket sales increase, in general, so does the number of prize winners.

The total prize fund depends entirely on the number of sales (as shown in Figure ?, 45% of the revenue generated by people playing the National Lottery goes towards the prize fund). The correlation coefficient for the number of sales and the prize fund should therefore be 1, because two variables that are in direct proportion with a positive constant have a correlation coefficient of exactly 1.

Thus, I checked the correlation coefficient between the number of sales and the prize fund:

H0: The number of sales and the prize fund are in direct proportion ($r = 1$).

H1: The number of sales and the prize fund are not in direct proportion ($r \neq 1$).

$S_{xx} = 1.443 \times 10^{16}$

$S_{yy} = 2.923 \times 10^{15}$

$S_{xy} = 6.495 \times 10^{15}$

$r = \dfrac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = 1$

Since r = 1, I accept H0: there is a directly proportional relationship between ticket sales and the total prize fund, as I expected. This is shown as a scatter diagram in Figure ?, and the diagram in Figure ? is overlaid with a best-fit line, which passes through all the points.
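The claim that direct proportion gives r = 1 can be illustrated with raw data. This is a minimal sketch: the function name `pmcc_from_data` is hypothetical, and the sales figures are invented for illustration and are not taken from the project's data; only the 45% proportion comes from the text.

```python
import math

def pmcc_from_data(xs, ys):
    """Product-moment correlation coefficient computed from raw paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical weekly sales figures (not from the project's table).
sales = [40e6, 55e6, 62e6, 70e6]
# The prize fund is taken to be 45% of sales, as stated in the text.
prize_fund = [0.45 * s for s in sales]

r_proportional = pmcc_from_data(sales, prize_fund)
print(round(r_proportional, 3))
```

Any positive constant of proportionality would give the same result; the 45% figure affects only the gradient of the best-fit line, not the correlation coefficient.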

Also, if these two quantities are directly proportional, the correlation coefficient between total prize winners and the total prize fund should be the same as that between total sales and total prize winners (r = 0.854). I tested this:

H0: There is no correlation between the number of total prize winners and the total prize fund ($\rho = 0$).

H1: There is a correlation between the number of total prize winners and the total prize fund ($\rho \neq 0$).

$S_{xx} = 2.923 \times 10^{15}$

$S_{yy} = 5.680 \times 10^{12}$

$S_{xy} = 1.100 \times 10^{14}$

$r = \dfrac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = 0.854$

I reject H0 – there is a correlation between the number of total prize winners and the total prize fund, and the correlation coefficient is the same as before.
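The reason the coefficient is unchanged can be shown with a short derivation. This is a sketch that assumes the prize fund is exactly k times ticket sales with k = 0.45; the primed symbols are notation introduced here for the prize fund and do not appear elsewhere in the project.

```latex
% Let x be ticket sales, x' = kx the prize fund (k = 0.45 > 0),
% and y the total number of prize winners.
\[
S_{x'x'} = k^{2} S_{xx}, \qquad S_{x'y} = k\, S_{xy}
\]
\[
r' = \frac{S_{x'y}}{\sqrt{S_{x'x'}\, S_{yy}}}
   = \frac{k\, S_{xy}}{\sqrt{k^{2} S_{xx}\, S_{yy}}}
   = \frac{S_{xy}}{\sqrt{S_{xx}\, S_{yy}}}
   = r
\]
% Check against the quoted figures:
%   0.45   x 2.445 x 10^14 = 1.100 x 10^14
%   0.45^2 x 1.443 x 10^16 = 2.922 x 10^15 (2.923 x 10^15 with unrounded values)
```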

The Numbers Chosen by the Lottery Machine

Another test I performed was to see whether there was any correlation between the number chosen by the lottery machine for each of the balls (1st ball drawn, 2nd ball drawn, etc., up to the 6th ball chosen), and the total number of Lottery winners that week. One might expect some correlation, if, for example, people in general tend to pick lower numbers. The data for the balls chosen, and the total number of winners each week, is presented in Table ?.

I found the product-moment correlation coefficient between each ball drawn (1st drawn, 2nd drawn, and so on up to the 6th ball drawn) and the total number of prize winners. The calculation is not given, but it was performed in the same way as before. The resulting correlation coefficients are given in Table ?:

Next I tested the correlation coefficients I had found:

H0: There is no correlation between each ball drawn and the total number of prize winners.

H1: There is a correlation between each ball drawn and total number of prize winners.

The critical values for n = 50 at the 5% level are ±0.2353 (found from tables).

See Table ?

Thus there appears to be a correlation between the first ball drawn every week and the total number of prize winners, although the correlation is only just significant, because the value of the correlation coefficient is very close to the critical value.

The correlation is negative, which implies that the lower the value of the ball chosen as the 1st ball, the higher the number of prize winners that week. This may seem strange, but it can be explained.

One simple explanation, which may or may not be the case, is to do with birthdays: if people pick their birthday dates for lottery numbers, as many do, then low numbers (31 and below) are picked more often. When these numbers are selected by the lottery machine, more people will win prizes.

Ball Set and Machine Used

After checking whether the number of prize winners was influenced by the balls drawn, I checked whether there was a correlation between the ball set used and the number of prize winners, as well as the machine used and the number of prize winners. The data for the ball sets, the machine used and the total prize winners for each week are presented in Table ?.

Again, the working is not given for the calculation of the product-moment correlation coefficient, but the coefficients are given in Table ?:

As can be seen, these coefficients are almost 0, and certainly do not indicate any relationship between the factors involved.

Is the Lottery Random?

Theoretically, the balls chosen by the National Lottery machine should follow a uniform distribution. This is what one would expect if the system used is truly random. One can test how well the balls chosen fit a uniform distribution. First I counted the total number of times each ball occurred in my sample of 50 draws. The data is presented in Table ? in a stem and leaf diagram.

Testing the numbers picked in the 50 draws at the 5% level,

H0: The numbers fit a uniform distribution (they are random).

H1: The numbers do not fit a uniform distribution (they are not random).

Expected number of times each ball is picked $= \dfrac{6}{49} \times 50 = 6.122$

See Table ?

$\chi^{2} = \sum \dfrac{(O - E)^{2}}{E} = 29.0$

Degrees of freedom $= \nu = 49 - 1 = 48$

$\chi^{2}_{48}(5\%) = 65.17$

$29.0 < 65.17$

Because the test statistic is below the critical value, I do not reject H0: there is no evidence to suggest that the selection of the balls is not random.
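The goodness-of-fit calculation can be sketched in Python as follows. This is an illustration under assumptions: the function name `chi_squared_uniform` is hypothetical, the observed counts are invented (they are not the counts from Table ?), and the critical value 65.17 is the tabulated figure quoted above rather than one computed by the code.

```python
def chi_squared_uniform(observed_counts, draws=50, balls_per_draw=6):
    """Chi-squared statistic for testing ball counts against a uniform distribution.

    Returns (statistic, expected count per ball, degrees of freedom).
    """
    n_balls = len(observed_counts)
    expected = draws * balls_per_draw / n_balls   # 300 / 49 for this project
    statistic = sum((o - expected) ** 2 / expected for o in observed_counts)
    return statistic, expected, n_balls - 1

# Hypothetical counts for balls 1 to 49; they sum to 300 (50 draws x 6 balls).
counts = [7] * 6 + [6] * 43

statistic, expected, dof = chi_squared_uniform(counts)
CRITICAL_5_PERCENT = 65.17   # tabulated value for 48 degrees of freedom
print(round(expected, 3), dof, statistic < CRITICAL_5_PERCENT)
```

A statistic below the critical value means H0 is not rejected, which is the same decision rule as used in the test above.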

Conclusions

The Conclusions drawn from this project are summarised below:

There is a significant positive correlation (r = 0.854) between the number of lottery winners and the number of ticket sales, as would be expected.
The total prize fund is in direct proportion to ticket sales, as claimed by Camelot, the lottery operators.
There is a negative correlation between the first ball picked and the total number of prize winners, although it is only just significant. The pattern suggests that picking high numbers may be a good way to maximise potential jackpot earnings, since fewer other players are likely to share the prize.
There is no correlation between the ball set used and the total number of prize winners.
There is no correlation between the machine used and the total number of prize winners.
There is no evidence that the selection of the lottery balls is not random.
