# GCSE STATISTICS/Data Handling Coursework 2008

GCSE: Data Handling Coursework

Introduction

For this data handling project, I shall use data from the Athletics data spreadsheet: results for track and field events, together with the mass and height of pupils in years 7 to 11. The subjects are all boys from one school. The sample contains a large amount of data, including times for the 100m, 200m, 400m, 800m and 1500m, results for field events such as long jump, triple jump, shot, javelin and discus, a bleep test result, and each student's height and mass. The data should be reliable; however, I shall check for anomalous records and discard any that I find from my sample.

I shall make three hypotheses based upon this data. In my plan I shall then show how I will test these hypotheses, to see whether the data supports or contradicts them.

Hypotheses

The bleep test, one of the events within the data, is an indication of aerobic fitness; it is a test of endurance. I think that fitness and health are related, and that a person's body mass index (BMI) can be a good representation of health, although it does not take into account people with a high muscle-to-fat ratio. I therefore think that people with a BMI in the “healthy” 20-24 bracket will have a better bleep test score than those outside it.
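BMI is a person's mass in kilograms divided by the square of their height in metres. A minimal sketch of how this could be calculated and checked against the 20-24 bracket is given here. It assumes that mass is recorded in kilograms and height in metres, which may not match the units in the spreadsheet; the function names and the example pupil are made up for illustration.

```python
def bmi(mass_kg, height_m):
    """Body mass index: mass in kilograms divided by height in metres squared."""
    return mass_kg / height_m ** 2


def in_healthy_bracket(value, low=20.0, high=24.0):
    """True if a BMI lies in the 20-24 bracket used in the first hypothesis."""
    return low <= value <= high


# Made-up example pupil, not a record from the spreadsheet.
example = bmi(50.0, 1.60)
print(round(example, 1), in_healthy_bracket(example))
```

If the heights turn out to be recorded in centimetres, they would need dividing by 100 first.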

As children grow, they become stronger and fitter, and the shot putt is a good test of strength. I therefore think that pupils will achieve a greater shot putt distance the older they are, as they are unlikely to lose strength as their age increases.

My final hypothesis is that the 100 metre times will be normally distributed with no skew. This means that extreme values will not lie in one direction more than the other, and the data should be distributed symmetrically about the median.

I will therefore investigate the following hypotheses:

1. People with a healthy BMI have a higher bleep test score.
2. The older people are, the further their shot putt distance.
3. 100 metre times follow a normal distribution.

Plan

I will test these hypotheses by presenting the data I choose to sample in different ways.

For the first hypothesis I shall use a scatter diagram to consider the relationship between body mass index and bleep test score. This will allow me to observe any patterns or correlations in the data. I will plot the double mean point and divide the diagram into quadrants about it. To analyse the correlation I can then look at how the data is spread across the quadrants and see whether it is possible to draw a line of best fit. If it is, I can find the equation of the line in the form y = mx + c. I will measure how correlated the data is using Spearman’s rank correlation coefficient, which gives a value between -1 and 1 showing the strength and direction of the correlation.
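The calculations in this part of the plan can be sketched in Python as follows. This is only an illustration under some assumptions: tied values are given the average of their ranks, the line of best fit is taken to be the least-squares line (which passes through the double mean point, but may differ from a line drawn by eye), and the function names and data are made up rather than taken from the spreadsheet.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every value tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of rank positions i+1 to j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Product-moment correlation of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def spearman(xs, ys):
    """Spearman's rank correlation: the correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def best_fit_line(xs, ys):
    """Least-squares gradient m and intercept c for y = mx + c."""
    mx, my = mean(xs), mean(ys)  # (mx, my) is the double mean point
    m = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return m, my - m * mx


# Made-up values for illustration only.
bmi_values = [18.0, 20.0, 22.0, 24.0, 26.0]
bleep_scores = [9.0, 8.0, 7.0, 5.0, 6.0]
print(spearman(bmi_values, bleep_scores))
print(best_fit_line(bmi_values, bleep_scores))
```

A coefficient close to 1 or -1 would suggest a strong correlation, while a value close to 0 would suggest little or none.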

In order to study the second hypothesis I shall use box plots. These will let me compare easily how the medians and the spread of the data, i.e. the interquartile range, change as the children grow older. I will group the data by year, then create box plots using Autograph, which will show the lower quartile, median, upper quartile and any outliers. I will then be able to see how the results change as the students get older.
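The values behind each box plot can be sketched in Python as follows. This is an illustration under stated assumptions: the quartiles use the "inclusive" method of Python's `statistics.quantiles`, which may not match the method Autograph uses, and outliers are taken to be values more than 1.5 times the interquartile range beyond the quartiles. The function name and the distances are made up.

```python
from statistics import quantiles


def box_plot_summary(distances):
    """Quartiles, interquartile range and outliers for one year group."""
    # quantiles() sorts the data itself and returns the three cut points.
    q1, median, q3 = quantiles(distances, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed outlier rule: more than 1.5 x IQR beyond the quartiles.
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [d for d in distances if d < low_fence or d > high_fence]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr, "outliers": outliers}


# Made-up shot putt distances in metres, grouped by year.
by_year = {
    7: [4.1, 4.8, 5.0, 5.3, 5.9, 6.2, 6.4],
    8: [4.9, 5.5, 6.0, 6.3, 6.8, 7.1, 7.7],
}
for year, distances in by_year.items():
    print(year, box_plot_summary(distances))
```

Comparing the medians and interquartile ranges from one year to the next would then show whether distances increase with age.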

I will analyse the distribution of 100 metre times by grouping the data into class intervals, making a grouped data table, calculating the frequency density for each group, then producing a histogram with Autograph. I can use this to see whether the data looks symmetrical, which would suggest that the distribution is likely to be normal. I will then test this by comparing the mean, median and mode of the grouped data; for a normal distribution these values would be approximately the same. Finally I can work out the standard deviation and look at the percentage of data within 1, 2 and 3 standard ...