Statistics Coursework – Football

Introduction

I have chosen to base my project on football statistics because they are both readily available and interesting enough for deep analysis. As a starting point I decided to look at the generally accepted theory of ‘Home Advantage’.

Home advantage, or the tendency for the home team to do better than they would away, could have several causes. It could be partly psychological: the home team almost always has the majority of the crowd behind them, cheering them on. It could also be to do with the condition of the pitch: Premiership teams sometimes find it hard to play on the muddy, waterlogged pitches of some lower-division teams.

Another factor is the attitude of referees and officials. If they are intimidated by the home crowd they may give more decisions in favour of the home team, which would mean that teams also have a worse disciplinary record when playing away.

Hypotheses:

  1. Teams have a worse disciplinary record away than at home
  2. Better attended teams have a greater home advantage
  3. More successful teams have a better disciplinary record

Collecting Data

Football statistics were easy to find on the internet. I obtained mine from two main sites:

http://soccer-stats.football365.com

http://www.bettingzone.co.uk

There is a very small risk that some of the data I collected could be incorrect. However, I found alternative sites for the Premiership statistics (such as www.4thegame.com) which gave the same results. I also think that a betting site is likely to give accurate statistics because they are such an important part of gambling.

Using Software

I chose to input my data into Microsoft Excel because it makes it much quicker and easier to manipulate the data.


Hypothesis 1 – Teams have a worse disciplinary record away than at home

Discipline ‘points’ system

On the internet I was able to find out the numbers of red and yellow cards for each team at home and away. However, in order to give an overall impression of how good or bad the team’s discipline was I needed to turn these two pieces of data into one measurement. I decided to use the points system (as on www.4thegame.com). Under this system a yellow card counts for one point whereas a red card is more severe and counts for three.
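As an illustration only, the points system could be sketched in Python like this. It is not the Excel formulae used in the project; the function name and the card counts in the example are made up for the sketch.

```python
# Weights from the points system described above:
# a yellow card counts for one point, a red card for three.
YELLOW_POINTS = 1
RED_POINTS = 3


def discipline_points(yellow_cards, red_cards):
    """Combine yellow and red card counts into a single points total."""
    return yellow_cards * YELLOW_POINTS + red_cards * RED_POINTS


# Hypothetical team: 30 yellow cards and 2 red cards.
example_points = discipline_points(yellow_cards=30, red_cards=2)
print(example_points)  # 36
```

A higher total means a worse disciplinary record.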

To make this easier to calculate I used formulae in Excel.

Because some divisions have different numbers of teams than others, some teams played more games than others. This means their players had slightly more opportunities to get booked or sent off, so their points totals might be higher. To correct for this I divided the points scores by the number of games each team had to play to give a ‘Disciplinary Points Per Game’ score. This can then be compared to any other team in any division.
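The correction for the number of games might look something like the following Python sketch. The card counts are invented for illustration; the game counts assume a 20-team division (19 home games) and a 24-team division (23 home games).

```python
def points_per_game(yellow_cards, red_cards, games):
    """Disciplinary points (yellow = 1, red = 3) divided by games played."""
    if games <= 0:
        raise ValueError("games must be a positive number")
    points = yellow_cards * 1 + red_cards * 3
    return points / games


# Hypothetical home records for two teams in different divisions.
team_a_home = points_per_game(yellow_cards=30, red_cards=2, games=19)
team_b_home = points_per_game(yellow_cards=40, red_cards=2, games=23)

print(round(team_a_home, 2))  # 1.89
print(round(team_b_home, 2))  # 2.0
```

Team B has more points in total, but the per-game scores show that the two records are much closer than the raw totals suggest.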

To measure how much better or worse a team’s disciplinary record is away than at home I divided the away points per game score by the home score, subtracted one and expressed the result as a percentage. This gives a positive percentage if the team has a worse disciplinary record away and a negative one if it is worse at home.
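A short Python sketch of this away-versus-home measure is given below. The function name and the example scores are hypothetical, and raising an error when the home score is zero is an assumption of the sketch, since the text does not say how that case was handled.

```python
def away_vs_home_percent(away_ppg, home_ppg):
    """Percentage by which the away points-per-game score exceeds the home score.

    Positive means a worse record away; negative means a worse record at home.
    """
    if home_ppg == 0:
        # Assumption: the ratio is undefined when a team has no home points.
        raise ValueError("home points per game must not be zero")
    return (away_ppg / home_ppg - 1) * 100


# Hypothetical points-per-game scores.
print(away_vs_home_percent(away_ppg=2.5, home_ppg=2.0))  # 25.0
print(away_vs_home_percent(away_ppg=1.5, home_ppg=2.0))  # -25.0
```

A team with identical scores home and away would come out at 0%.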

Pilot Study

To find out how well my data supported my first hypothesis, I made a bar chart in Excel to show the difference between disciplinary points per game away and at home.

As you can see, most teams have a considerably worse disciplinary record away than at home, as shown by the taller red bars. For this bar chart I simply ranked the teams in the Premiership and Division 1 from the top of the Premiership (1) to the bottom of Division 1 (44). The names of these teams can be found in the appendix at the back.

Stratified random sampling

In order to better represent football at other levels of the game I also collected data for the lower divisions (Division 2 and Division 3). However, this gave me far too much data (a total of 92 teams) to perform statistical tests such as the Wilcoxon Signed Rank Test. To cut this down I decided to use random sampling to lower the number of teams involved.

However, if I just randomly selected teams from all of the divisions put together I might over-represent some divisions, affecting the results. To make this fairer I decided to use stratified random sampling, with the divisions as the strata. This way I was sure to get proportionate numbers of teams from each division.

I chose to take 25% of the teams in each division, to give me 23 sets of data – a much more manageable figure! I chose the teams by writing the numbers of the teams in each division e.g. 1-24 on small pieces of paper. I folded these up, shuffled them and picked them at random until I had the right number.
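The same kind of draw could be done by computer. The Python sketch below is only an illustration of stratified random sampling, not the paper-slip method actually used. The division sizes (20, 24, 24 and 24) are an assumption chosen to match the totals of 44 and 92 teams mentioned above, and the seed is arbitrary.

```python
import random

# Assumed division sizes: 20 + 24 + 24 + 24 = 92 teams in total.
DIVISION_SIZES = {
    "Premiership": 20,
    "Division 1": 24,
    "Division 2": 24,
    "Division 3": 24,
}


def stratified_sample(division_sizes, fraction, seed=None):
    """Pick the same fraction of team numbers at random from each division."""
    rng = random.Random(seed)
    sample = {}
    for division, size in division_sizes.items():
        how_many = round(size * fraction)
        # rng.sample draws without replacement, like picking folded slips of paper.
        sample[division] = sorted(rng.sample(range(1, size + 1), how_many))
    return sample


sample = stratified_sample(DIVISION_SIZES, fraction=0.25, seed=1)
total = sum(len(teams) for teams in sample.values())
print(total)  # 23
```

With these sizes, 25% gives 5 teams from the Premiership and 6 from each of the other divisions, which adds up to the 23 sets of data.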

Once I had chosen the teams I put them in a new spreadsheet. I produced another bar chart similar to the one I had produced for the preliminary test. This illustrates how well my randomly sampled ...
