For my sample I will use random quota sampling. I chose this method because I want my sample to be unbiased and random. I will take a sample of 30 players from each group of players: attackers, midfielders and defenders. There are other sampling methods I could have used:
Systematic Sampling: Taking data at regular intervals, such as every 3rd or 5th player. I did not use this because it is not entirely random: the players are listed in order of the club they play for, so this method would pick a player from every club, which is not random enough.
Stratified Sampling: The population is divided into strata, or groups. A sample is taken from each group, and the number taken from each group is proportional to that group's share of the population, so the proportions are replicated in the sample. I did not use this method because I wanted the same number of players from each group.
Quota Sampling: Choosing a sample according to a specific characteristic, for example age or sex. I used this by putting the players into groups of attackers, midfielders and defenders, because this helped me to organise the data effectively.
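The difference between the random quota method and the rejected systematic method can be sketched in Python. This is only an illustration under stated assumptions: each positional group is a plain list of placeholder names (A1, M1, D1 and so on, not real players), the group sizes are borrowed from the Hypothesis 2 counts, and `random_quota_sample` and `systematic_sample` are hypothetical helper names.

```python
import random

def random_quota_sample(groups, per_group, seed=None):
    """Draw the same number of players at random (without replacement) from each group."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable for checking
    return {name: rng.sample(players, per_group) for name, players in groups.items()}

def systematic_sample(players, step, start=0):
    """Take every `step`-th player from an ordered list, starting at index `start`."""
    return players[start::step]

# Placeholder names only; group sizes assumed from the Hypothesis 2 counts.
groups = {
    "attackers": [f"A{i}" for i in range(1, 55)],
    "midfielders": [f"M{i}" for i in range(1, 85)],
    "defenders": [f"D{i}" for i in range(1, 86)],
}

sample = random_quota_sample(groups, 30, seed=1)
for name, chosen in sample.items():
    print(name, len(chosen), chosen[:3])

print(systematic_sample(groups["attackers"], 5)[:4])  # ['A1', 'A6', 'A11', 'A16']
```

Because the systematic picks follow a fixed pattern through a list ordered by club, they spread across clubs in a predictable way, whereas `random.sample` gives every player in a group the same chance of selection.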
On the next couple of pages I have included the data I have used for this hypothesis. I have also included the cumulative frequency graph and the box plots that I drew for this hypothesis.
Table for cumulative frequency:
Attackers:
Midfielders:
Defenders:
Interpretation:
From my box plots I can see the following:
Attackers: There is a negative skew, which means that there are more high scores than low scores. Attackers also have the highest median, which means that their points are generally higher than those of midfielders and defenders. Attackers have the biggest range, which means that their points are less consistent, with a wide spread of high and low scores. This makes attackers unpredictable: they may score very high or very low.
Midfielders: The median is in the middle of the box, which makes this box plot symmetrical, so there are roughly as many high scores as low scores. Midfielders have the lowest median, which means that they generally score fewer points than attackers and defenders.
Defenders: There is a positive skew, which means that there are more low scores than high scores. Defenders have the smallest range, which means that their scores are the most consistent, and the positive skew shows that they are consistently low.
From my cumulative frequency graph I can see that attackers score more points than defenders and midfielders. The attackers' curve finishes further along the x axis, which means that the highest-scoring attackers achieve more points than the highest-scoring players in the other two groups.
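Box plots for grouped data are often drawn by reading the quartiles off the cumulative frequency curve. The Python sketch below shows that reading as linear interpolation between plotted points (upper class boundary, cumulative frequency). The classes and frequencies are hypothetical, not the data from this investigation, and whether the box plots above were drawn exactly this way is an assumption; `estimate_from_curve` is a hypothetical helper name.

```python
from itertools import accumulate

def cumulative_frequencies(freqs):
    """Running totals of the class frequencies (the y-values plotted on the curve)."""
    return list(accumulate(freqs))

def estimate_from_curve(lower_start, upper_bounds, freqs, p):
    """Estimate the value below which a fraction p of players lie, by straight-line
    interpolation between points (upper class boundary, cumulative frequency)."""
    cf = cumulative_frequencies(freqs)
    target = p * cf[-1]
    prev_bound, prev_cf = lower_start, 0
    for bound, c in zip(upper_bounds, cf):
        if c >= target and c > prev_cf:
            return prev_bound + (target - prev_cf) / (c - prev_cf) * (bound - prev_bound)
        prev_bound, prev_cf = bound, c
    return upper_bounds[-1]

# Hypothetical grouped points for 30 players in classes 0-40, 40-80, ..., 160-200.
upper = [40, 80, 120, 160, 200]
freqs = [3, 8, 10, 6, 3]

print(cumulative_frequencies(freqs))  # [3, 11, 21, 27, 30]
lq, med, uq = (estimate_from_curve(0, upper, freqs, p) for p in (0.25, 0.5, 0.75))
print(lq, med, uq)  # 62.5 96.0 130.0
```

The lower quartile, median and upper quartile found this way, together with the lowest and highest values, give the five points needed for each box plot.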
I also calculated the standard deviation for attackers, midfielders and defenders to see whether there were any outliers, and to see how many players had points close to the mean of their group. Standard deviation is a measure of spread: it shows how consistent the data is, which helps us to see which group of players scores points most consistently.
To work out the standard deviation of a group I first had to find the mean. To do this I used a frequency table similar to the one I drew earlier to find the cumulative frequency.
1. To find the mean from the table I divided the total of frequency multiplied by midpoint by the total frequency. The formula I used to represent that is this: $\frac{\sum fx}{\sum f}$, where $f$ is the frequency and $x$ is the midpoint of each class.
2. The mean is then represented by this: $\bar{x} = \frac{\sum fx}{\sum f}$
3. You then find the deviation from the mean, which is symbolised like this: $x - \bar{x}$
4. Then you square the deviations to get rid of any negatives: $(x - \bar{x})^2$
5. Then find the sum of these values: $\sum (x - \bar{x})^2$
6. Then divide by the number of values: $\frac{\sum (x - \bar{x})^2}{n}$
7. Then finally take the square root to get back to the same units. This is the formula for standard deviation: $\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$
If the data you are working with is in a frequency table, each squared deviation is multiplied by its frequency $f$, and the n value is replaced by: $\sum f$
This gives the formula: $\sigma = \sqrt{\frac{\sum f(x - \bar{x})^2}{\sum f}}$
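The frequency-table method for the mean and standard deviation can be checked with a short Python sketch. The class midpoints and frequencies below are hypothetical values chosen for illustration, not the data from this investigation, and `grouped_mean_sd` is a hypothetical helper name.

```python
import math

def grouped_mean_sd(midpoints, freqs):
    """Mean and standard deviation from a frequency table, using class midpoints."""
    total_f = sum(freqs)                                            # Σf
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / total_f   # Σfx / Σf
    # Σf(x - mean)² / Σf, then the square root returns to the original units
    variance = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs)) / total_f
    return mean, math.sqrt(variance)

# Hypothetical table: classes 0-40, 40-80, ..., 160-200 points.
midpoints = [20, 60, 100, 140, 180]
freqs = [3, 8, 10, 6, 3]

mean, sd = grouped_mean_sd(midpoints, freqs)
print(round(mean, 2), round(sd, 2))  # 97.33 44.94
```

Each squared deviation is weighted by its frequency before dividing by $\sum f$, which matches dividing by n when every frequency is 1.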
Using this formula and the step-by-step method, I can find the standard deviation for attackers, midfielders and defenders.
This is my working out:
I added twice the standard deviation to the mean, and subtracted twice the standard deviation from the mean, to find a range for detecting outliers: any player whose points fell outside this range was treated as an outlier. These are the ranges I used:
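The mean ± 2 standard deviations rule for outliers can be sketched in Python as follows. The mean, standard deviation and player names are hypothetical, not the figures from this investigation, and `outlier_range` and `find_outliers` are hypothetical helper names.

```python
def outlier_range(mean, sd, k=2):
    """Return (mean - k*sd, mean + k*sd); k=2 gives the mean ± 2 standard deviations range."""
    return mean - k * sd, mean + k * sd

def find_outliers(points, mean, sd, k=2):
    """Players whose points fall outside the mean ± k standard deviations."""
    low, high = outlier_range(mean, sd, k)
    return {name: p for name, p in points.items() if p < low or p > high}

# Hypothetical group: mean 100 points, standard deviation 40 points.
points = {"Player A": 95, "Player B": 190, "Player C": 15, "Player D": 140}
print(outlier_range(100, 40))          # (20, 180)
print(find_outliers(points, 100, 40))  # {'Player B': 190, 'Player C': 15}
```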
From my standard deviation graph I can see that attackers have the largest spread, and that midfielders and defenders have very similar standard deviations.
Attackers: 1 outlier – Shearer with 261 points.
Midfielders: 1 outlier - Pires with 196 points.
Defenders: 2 outliers - S Campbell with 179 points and Hyypia with 173 points.
These players are probably outliers because they consistently play at a high level. The standard deviations show that defenders and midfielders score most consistently, but attackers score the most.
The problem with using the mean as an average is that any outliers raise or lower it. A good alternative would be the median, because it is not affected by outliers.
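A quick Python illustration of why the median resists outliers, using Python's standard `statistics` module. The points totals are hypothetical, not data from this investigation, and `compare_averages` is a hypothetical helper name.

```python
from statistics import mean, median

def compare_averages(values):
    """Return (mean, median) so the effect of an extreme value can be compared."""
    return mean(values), median(values)

typical = [60, 70, 75, 80, 85]   # hypothetical points totals
with_outlier = typical + [260]   # the same group plus one very high scorer

print(compare_averages(typical))
print(compare_averages(with_outlier))
```

Adding the single high scorer moves the mean from 74 to 105, while the median only moves from 75 to 77.5.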
In conclusion, I can see that attackers score more points than midfielders and defenders. Both my cumulative frequency graph and my box plots clearly show this, so the data supports my hypothesis.
Hypothesis 2:
I think that 7+ Ratings will affect the number of points more than goals or clean sheets, because 7+ Ratings involve all players, whereas Goals Scored only apply to Attackers and Midfielders and Clean Sheets only apply to Defenders.
I will use a Stratified Sample for this hypothesis, where the number of players sampled from each group is proportional to that group's size in the original population.
I will use a sample size of 50, so I need to work out how many players from each group I should use:
Sample Size: 50
Number of Defenders: 85
Number of Midfielders: 84
Number of Attackers: 54
Total Players: 223
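Number from group in sample = $\frac{\text{Number of players in group} \times \text{Sample size}}{\text{Total number of players}}$
Defenders: $\frac{85 \times 50}{223} = 19.06 \approx 19$
Midfielders: $\frac{84 \times 50}{223} = 18.83 \approx 19$
Attackers: $\frac{54 \times 50}{223} = 12.11 \approx 12$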
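The stratified-sample arithmetic can be checked in Python using the group sizes given above. `stratified_allocation` is a hypothetical helper name; it rounds each group to the nearest whole player, which is an assumption that can occasionally make the total differ from the intended sample size, so the total is printed as a check.

```python
def stratified_allocation(group_sizes, sample_size):
    """Exact share of the sample for each group, and that share rounded to whole players."""
    total = sum(group_sizes.values())
    exact = {g: n * sample_size / total for g, n in group_sizes.items()}
    rounded = {g: round(v) for g, v in exact.items()}
    return exact, rounded

sizes = {"Defenders": 85, "Midfielders": 84, "Attackers": 54}
exact, rounded = stratified_allocation(sizes, 50)
for g in sizes:
    print(f"{g}: {exact[g]:.2f} -> {rounded[g]}")
print("Total sampled:", sum(rounded.values()))
```

Here the rounded shares (19, 19 and 12) add up to exactly 50, so no adjustment is needed.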
I will use the same random method that I used in hypothesis 1 to select the players from the groups.
I will draw scatter graphs to see which way of scoring points has the best correlation with the total number of points a player scores. I will then use Spearman's rank correlation coefficient to measure how strong each correlation is, and therefore find out which way of scoring points affects a player's total the most.
I plotted scatter graphs to see if there was a correlation between:
Points scored and 7+ ratings achieved
Points scored and goals scored
Points scored and clean sheets
I then drew the line of best fit for each graph and worked out its equation. I then used Spearman's rank to find out how strong the correlation was for each graph.
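A line of best fit should pass through the mean point $(\bar{x}, \bar{y})$. As a swapped-in technique, the Python sketch below calculates the least-squares line $y = mx + c$, which always passes through that mean point; it is not necessarily how the lines in this investigation were drawn. The data and the helper names `mean_point` and `least_squares_line` are hypothetical.

```python
def mean_point(xs, ys):
    """The point (mean of x, mean of y) that a line of best fit should pass through."""
    return sum(xs) / len(xs), sum(ys) / len(ys)

def least_squares_line(xs, ys):
    """Gradient m and intercept c of the least-squares line y = mx + c."""
    x_bar, y_bar = mean_point(xs, ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    c = y_bar - m * x_bar  # makes the line pass through the mean point
    return m, c

# Hypothetical data: 7+ ratings (x) against total points (y) for five players.
ratings = [2, 5, 8, 11, 14]
points = [40, 70, 95, 130, 150]
m, c = least_squares_line(ratings, points)
print(f"y = {m:.2f}x + {c:.2f}")  # y = 9.33x + 22.33
```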
Working out mean points on scatter graphs:
Equations for the lines of best fit:
Working out for Spearman’s Rank
Interpretation:
For Points Vs 7+ Ratings, there is a strong positive correlation, and Spearman's rank confirms this. This tells me that the more 7+ Ratings players achieve, the more points they score. All three groups (defenders, midfielders and attackers) show a positive correlation, which shows that all three are affected by 7+ Ratings. The Spearman's rank coefficient of 0.85 shows how strong the correlation is, which suggests that 7+ Ratings have a large effect on the number of points that players score.
For Points Vs Goals, there is a medium positive correlation. This can be seen clearly on my scatter graph: only the attackers have an obvious positive correlation, whereas midfielders and defenders have only a slightly visible positive correlation, so I drew my line of best fit using only the attackers, as they gave the best correlation. Spearman's rank gave a coefficient of 0.41, which is a medium positive correlation, meaning that the number of points a player scores depends only partly on the number of goals he scores.
For Points Vs Clean Sheets I was only able to plot defenders on the scatter graph, because midfielders and attackers do not earn points for clean sheets. The graph shows a positive correlation, and Spearman's rank confirms this with a coefficient of 0.91, which shows a very strong positive correlation between the number of points defenders score and the number of clean sheets they achieve. However, this tells me nothing about the effect of clean sheets on midfielders and attackers, which leads me to conclude that, overall, clean sheets do not affect how many points players in general score, unless the player is a defender.
Conclusion:
In conclusion, I can see that 7+ Ratings affect the number of points a player scores more than the other ways of scoring, because all players can achieve 7+ Ratings, whereas goals are most commonly scored by attackers and clean sheets are most commonly achieved by defenders. Therefore Goals Scored and Clean Sheets cannot have the same impact on players' points as 7+ Ratings do. I found that Clean Sheets have a large impact on the number of points a defender scores; however, my hypothesis was about which way of scoring affects any player, not just defenders.
Overall Conclusion:
The aim of this investigation was to find out which players you should include in a fantasy football team to win. I looked at two hypotheses in this investigation:
Hypothesis 1:
I think that attackers achieve more points than midfielders and defenders because they have more ways of gaining points, for example goals. You get 5 points for each goal and an extra 5 points if you score 3 or more goals. I also think that attackers will score more because, unless they are booked during the game, there is no way for them to lose points, unlike defenders, who lose points for conceding goals. I will use the data I have to test this.
Hypothesis 2:
I think that 7+ Ratings will affect the number of points more than goals or clean sheets, because 7+ Ratings involve all players, whereas Goals Scored only apply to Attackers and Midfielders and Clean Sheets only apply to Defenders.
I analysed the data I had and drew graphs to reach conclusions on both of my hypotheses. Both of my hypotheses were supported: in Hypothesis 1, attackers achieved more points than midfielders and defenders, and in Hypothesis 2, 7+ Ratings affected the number of points more than Clean Sheets and Goals Scored.
If I were to pick a fantasy football team based on my findings, I would make sure that I picked defenders, because they score more consistently. I would also pick attackers because they score the highest; however, they are not as consistent as defenders or midfielders, so there is a risk that they could score extremely low as well as extremely high.
However, there is a price limit in fantasy football, and the top names that score the big points, such as Henry, Shearer, S Campbell and van Nistelrooy, are not cheap. It would probably be a better idea to buy a team of average-scoring players at medium prices than to buy a few top-scoring players at high prices and then make up the rest of the team with low-scoring, low-priced players.
The only problem that I encountered in the investigation was handling the data, because there was such a large amount of it and different data was used in Hypothesis 1 and Hypothesis 2. However, this problem was soon solved once I got used to the data and organised it successfully.
In Hypothesis 1 we grouped the data to make it easier to work with and to put into a cumulative frequency graph. However, this reduced the accuracy of the results, because calculations from grouped data use class midpoints instead of the exact values. A problem with Spearman's rank was tied ranks when ranking the points or other ratings. This was easily solved by finding the mean of the tied rank numbers and using this instead.
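The tied-rank fix and Spearman's formula $r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$ can be sketched in Python. Assumptions: values are ranked from highest (rank 1) to lowest, tied values share the mean of the rank positions they occupy, and the data and helper names (`average_ranks`, `spearman_rs`) are hypothetical. When there are ties this formula is a close approximation; the exact coefficient is the ordinary (Pearson) correlation of the ranks.

```python
def average_ranks(values):
    """Rank from highest (1) to lowest; tied values get the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over every value tied with the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + 1 + j + 1) / 2  # mean of rank positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rs(xs, ys):
    """Spearman's rank correlation using r_s = 1 - 6Σd² / (n(n² - 1))."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical data: points and 7+ ratings for six players (two players tie on 95 points).
points = [120, 95, 95, 150, 60, 110]
ratings = [10, 7, 8, 14, 3, 9]
print(average_ranks(points))                # [2.0, 4.5, 4.5, 1.0, 6.0, 3.0]
print(round(spearman_rs(points, ratings), 3))  # 0.986
```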
The investigation had limitations. We could not include goalkeepers because they have a different scoring system, and therefore we could not effectively predict a complete winning team. We also had to work with last year's data, and some players, such as Wayne Rooney, have improved their performance dramatically since last year, just as some players' performance has declined dramatically, perhaps through injury or lack of training. The last limitation was that we only used Premiership players and not players from any other divisions.
If I had the chance I would extend my investigation to cover goalkeepers. I would also use more recent and accurate data to try and pick an actual fantasy football team. It would then be interesting to actually take part in the fantasy football game and see how well my team would score.