# GCSE STATISTICS/Data Handling Coursework 2008

Extracts from this document...


GCSE: Data Handling Coursework

Introduction

For this data handling project, I shall use athletics data (track and field events), together with the mass and height of pupils in years 7 to 11, taken from the Athletics data spreadsheet. The subjects are all boys from one school. The sample contains a large amount of data, including times for the 100m, 200m, 400m, 800m and 1500m, and results for field events such as the long jump, triple jump, shot, javelin and discus. There are also bleep test results, heights and masses. The data should be reliable; however, I shall check for any anomalous records and discard them from my sample.

I shall make three hypotheses based on this data, and in my plan I shall show how I will test each one to prove or disprove it.

Hypotheses

The bleep test is the event within the data that indicates aerobic respiration; it is a test of both endurance and fitness. I think that fitness and health are related, and that a person's BMI (body mass index) can be a good representation of health, although it does not take account of people with high muscle-to-fat ratios. I therefore think that people with a BMI in the “healthy” 20-24 bracket will score better on the bleep test than those outside it.
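Hypothesis 1 could be tested along the following lines. This is a minimal Python sketch, not the method actually used in the coursework. It computes BMI as mass divided by height squared, splits pupils into the 20-24 bracket and everyone else, and compares the mean bleep test score of each group. It assumes three things that the source does not state: mass is in kilograms, height is in metres, and the bracket is inclusive at both ends. The `Pupil` record and its field names are hypothetical, and the spreadsheet's real columns may differ. Incomplete records are skipped, because many pupils have missing data.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Pupil:
    # Hypothetical record layout; the real spreadsheet columns may differ.
    mass_kg: Optional[float]   # assumed to be in kilograms
    height_m: Optional[float]  # assumed to be in metres
    bleep: Optional[float]     # bleep test level reached


def bmi(mass_kg: float, height_m: float) -> float:
    """Body mass index = mass (kg) / height (m) squared."""
    return mass_kg / height_m ** 2


def in_healthy_bracket(value: float, low: float = 20.0, high: float = 24.0) -> bool:
    """True if the BMI lies in the 'healthy' bracket (assumed inclusive at both ends)."""
    return low <= value <= high


def compare_bleep(pupils: list[Pupil]) -> tuple[Optional[float], Optional[float]]:
    """Return (mean bleep score inside the bracket, mean bleep score outside it).

    Pupils with any missing value are skipped; a mean is None if its group is empty.
    """
    inside: list[float] = []
    outside: list[float] = []
    for p in pupils:
        if p.mass_kg is None or p.height_m is None or p.bleep is None:
            continue  # incomplete record
        group = inside if in_healthy_bracket(bmi(p.mass_kg, p.height_m)) else outside
        group.append(p.bleep)
    return (mean(inside) if inside else None, mean(outside) if outside else None)
```

A higher mean inside the bracket would be consistent with the hypothesis. A fuller comparison would also look at the spread and size of each group, not just the means.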

Middle

Lower quartile = 5.5

Median = 6.15

Upper quartile = 6.925

Therefore the interquartile range is 6.925 − 5.5 = 1.425.

1.5 × 1.425 = 2.1375

Lower fence: 5.5 − 2.1375 = 3.3625, so there are no lower outliers (the smallest value, 3.8, is above the fence).

Upper fence: 6.925 + 2.1375 = 9.0625, so the only upper outlier is 9.5.

Key: 3: 0.8 means 3.8

3: 0.8

4: 0.2 0.4 0.7 0.7 0.8 0.8 0.95

5: 0 0 0 0.2 0.3 0.3 0.3 0.4 0.4 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.6 0.6 0.7 0.7 0.7 0.75 0.8 0.9

6: 0 0 0 0 0.1 0.1 0.1 0.1 0.2 0.4 0.4 0.4 0.4 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.6 0.6 0.6 0.6 0.75 0.8 0.8 0.9 0.9

7: 0 0 0.2 0.3 0.3 0.4 0.5 0.5 0.5 0.5

8: 0 0 0 0 0.2 0.3 0.3 0.5 0.7

9: 0.5

I discarded these data, then looked at the yellow year 8 diagram.

Lower quartile = 5.25

Median = 6

Upper quartile = 6.5625

Therefore the interquartile range is 6.5625 − 5.25 = 1.3125.

1.5 × 1.3125 = 1.96875

Lower fence: 5.25 − 1.96875 = 3.28125, so there are no lower outliers.

Upper fence: 6.5625 + 1.96875 = 8.53125, so there are no upper outliers either.

3: 0.8

4: 0 0 0.5 0.5 0.5 0.6 0.75 0.75 0.8

5: 0 0 0 0 0 0 0 0 0.2 0.25 0.25 0.25 0.3 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.6

6: 0 0 0 0 0 0 0 0 0 0 0 0.1 0.3 0.4 0.4 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.75 0.75 0.8

7: 0 0 0 0 0 0 0 0.1 0.2 0.3 0.5 0.5 0.75

8: 0 0.3 0.4 0.5
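The quartile and fence steps for the year 8 data can be checked with a short Python sketch. The quartiles quoted in this piece appear to follow the (n + 1)p positioning rule. That is the rule `statistics.quantiles` applies with `method="exclusive"`. The fences are then placed 1.5 × IQR below the lower quartile and above the upper quartile. The helpers `parse_stem_leaf` and `iqr_outliers` are hypothetical names. The parser assumes each leaf is written as a decimal to be added to its stem, as in the diagrams here (so `5: 0.2` means 5.2).

```python
import statistics


def parse_stem_leaf(rows: list[str]) -> list[float]:
    """Turn rows such as '5: 0 0.2 0.75' into sorted values [5.0, 5.2, 5.75].

    Assumes each leaf is a decimal added to its stem, as in the diagrams.
    """
    values: list[float] = []
    for row in rows:
        stem, leaves = row.split(":")
        values.extend(int(stem) + float(leaf) for leaf in leaves.split())
    return sorted(values)


def iqr_outliers(values: list[float], k: float = 1.5):
    """Return (quartiles, fences, outliers) using the k x IQR rule."""
    # method="exclusive" places the p-th quartile at position (n + 1) * p,
    # interpolating between neighbouring values when that is not a whole number.
    q1, median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    step = k * (q3 - q1)  # 1.5 x IQR by default
    lower_fence, upper_fence = q1 - step, q3 + step
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return (q1, median, q3), (lower_fence, upper_fence), outliers


# Rows copied from the year 8 stem-and-leaf diagram.
year8 = parse_stem_leaf([
    "3: 0.8",
    "4: 0 0 0.5 0.5 0.5 0.6 0.75 0.75 0.8",
    "5: 0 0 0 0 0 0 0 0 0.2 0.25 0.25 0.25 0.3 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.6",
    "6: 0 0 0 0 0 0 0 0 0 0 0 0.1 0.3 0.4 0.4 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.75 0.75 0.8",
    "7: 0 0 0 0 0 0 0 0.1 0.2 0.3 0.5 0.5 0.75",
    "8: 0 0.3 0.4 0.5",
])

quartiles, fences, outliers = iqr_outliers(year8)
print("quartiles:", quartiles)
print("fences:", fences)
print("outliers:", outliers)
```

Passing `k=3` instead would flag only "extreme" outliers. The same two functions can be reused for the other year groups by pasting in their rows.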

Finally, I looked at the orange year 7 diagram in the same way.

Lower quartile = 4

Median = 5

Upper quartile = 6

Therefore the interquartile range is 6 − 4 = 2.

1.5 × 2 = 3

Lower fence: 4 − 3 = 1, so there are no lower outliers.

Upper fence: 6 + 3 = 9, so there are no upper outliers either; the largest value, 8.3, lies below the fence.

2: 0.75

3: 0 0 0 0 0 0.25 0.5 0.5 0.5 0.5 0.5 0.5 0.5

4: 0 0 0 0 0 0 0 0 0 0 0 0.1 0.2 0.25 0.5 0.5 0.5 0.5 0.7 0.75 0.75

5: 0 0 0 0 0 0 0 0 0 0 0 0.2 0.25 0.25 0.5 0.5 0.5 0.5 0.6 0.75 0.8

6: 0 0 0 0 0 0 0 0 0 0.1 0.1 0.1 0.2 0.2 0.5 0.5 0.75

7: 0 0 0 0 0 0 0.3 0.5

8: 0.3

Having discarded all of the outlying data, I redrew the box-and-whisker diagrams.

Comparing the orange year 7 box with the yellow year 8 box, there is a definite increase in the median. The interquartile range is smaller in year 8, so the data are more closely grouped, and grouped towards longer throws. The highest and lowest values are also both greater in year 8 than in year 7. Neither set of data is particularly skewed.

The difference between years 8 and 9 is not as conclusive. The quartiles are all slightly higher, but the highest values are similar. This is probably due to the weight of the shot thrown, which increases between years 8 and 9 but not between years 7 and 8.
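The comparison of the redrawn box-and-whisker diagrams rests on each group's five-number summary. The Python sketch below computes that summary and adds one optional measure that the coursework does not use: Bowley's quartile skewness coefficient, (Q3 + Q1 − 2 × median) / (Q3 − Q1). This coefficient is 0 for a symmetric box and lies between −1 and 1. It could put a number on the judgement that neither set of data is particularly skewed. The quartiles again use the (n + 1)p rule via `statistics.quantiles`. The function names `five_number_summary` and `compare` are hypothetical.

```python
import statistics


def five_number_summary(values: list[float]) -> dict[str, float]:
    """Min, quartiles and max as drawn on a box-and-whisker diagram,
    plus the IQR and Bowley's quartile skewness coefficient."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="exclusive")
    iqr = q3 - q1
    # Bowley skewness: 0 for a symmetric box, positive when the upper part
    # of the box (median to Q3) is longer than the lower part (Q1 to median).
    bowley = (q3 + q1 - 2 * median) / iqr if iqr else 0.0
    return {"min": data[0], "q1": q1, "median": median, "q3": q3,
            "max": data[-1], "iqr": iqr, "bowley_skew": bowley}


def compare(younger: list[float], older: list[float]) -> dict[str, float]:
    """Change in each summary statistic from the younger to the older year group."""
    a, b = five_number_summary(younger), five_number_summary(older)
    return {key: b[key] - a[key] for key in a}
```

In `compare(year7_values, year8_values)`, a positive `median` difference means the older group throws further on average. A negative `iqr` difference means its results are more tightly grouped.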

Conclusion

If the data are normally distributed, approximately 68% of the data lie within one standard deviation of the mean (within μ ± σ), about 95% lie within μ ± 2σ, and about 99.7% lie within μ ± 3σ.

For year 7: Standard deviation = 2.1952
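The normal-distribution check described above can be sketched in Python. First compute the mean μ and standard deviation σ. Then count the proportion of values within μ ± σ, μ ± 2σ and μ ± 3σ, and compare those proportions with the normal figures. The coursework does not say whether the population or the sample standard deviation was used. This sketch assumes the sample version (`statistics.stdev`); `statistics.pstdev` would give the population version. The function names are hypothetical.

```python
import statistics

# Approximate proportions for a normal distribution within 1, 2 and 3 standard deviations.
NORMAL_PROPORTIONS = {1: 0.68, 2: 0.95, 3: 0.997}


def empirical_rule_check(values: list[float]) -> dict[int, float]:
    """Proportion of values lying within mu +/- k*sigma for k = 1, 2, 3."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)  # sample standard deviation (an assumption)
    n = len(values)
    return {
        k: sum(1 for v in values if abs(v - mu) <= k * sigma) / n
        for k in (1, 2, 3)
    }


def report(values: list[float]) -> None:
    """Print the observed proportions next to the normal-distribution figures."""
    for k, observed in empirical_rule_check(values).items():
        print(f"within {k} sd: observed {observed:.1%}, normal about {NORMAL_PROPORTIONS[k]:.1%}")
```

Observed proportions close to 68%, 95% and 99.7% are consistent with, though not proof of, an approximately normal distribution.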

Evaluation

Hypothesis 2 was the only hypothesis to be proven correct; however, I was able to analyse why hypothesis 3 was incorrect, and also to look at links between the distribution and age.

Hypothesis 1 was incorrect; however, it was always the least likely to be proven right, as BMI is a simple indicator of something (health) that is often too complicated to be captured in such a categorical way.

Overall, the project therefore had mixed results; however, I was able to draw conclusions from all three hypotheses, which is a strong positive. I deliberately chose hypotheses whose outcomes were not certain, as there would be no point in stating the obvious only to prove it correct, so it is understandable that not every part of the project went smoothly.

To improve the investigation, I would use a wider range of data if possible, as there are obvious limitations with the data I used for this project: it comes from only one school, and only from boys. Also, few pupils have complete records, and many pieces of data are missing. I could, for example, use a national database with much more data, including results for both genders, to reduce the risk of anomalous graphs and to make the project more reliable and valid.


This student written piece of work is one of many that can be found in our GCSE Miscellaneous section.
