Edexcel GCSE Statistics Coursework

Investigation into 100m times and long jump distances

Introduction

I intend to use my school’s athletic sports results database to investigate the relationship between 100m times and long jump distances across the year groups. This database contains secondary data, both quantitative and qualitative, for years 7 to 11 at RGS. The data should be reliable because it was recorded under supervision.

I have chosen to use quantitative data for my investigation because qualitative data tends to be much more limited: quantitative data takes numerical values that can be measured and compared, whereas qualitative data can only fall into set categories (e.g. colours: blue, red, green).

I believe that the faster somebody runs, the higher and further he/she will jump. I believe this because many fast runners have long legs, which enable them to run with a longer stride. Also, it takes more energy for someone with shorter legs to run the same distance at the same speed as somebody with longer legs.

I also believe that somebody’s running speed will improve as he/she gets older throughout secondary school. I believe this because many people start their growth spurt between year 8 and year 10 and continue growing until they are about 18. Also, older pupils will have had more practice.

I also think that the running times within each year group will follow a normal distribution, with the majority of people having a time near the centre of the distribution and a few people having a much faster or slower time. I believe this because most people of the same age will run at roughly the same speed, as their heights and weights are also similar. However, there will be some people who are heavier or lighter, and faster or slower, than average.

I will therefore investigate the following hypotheses:

The faster somebody runs the 100m, the further they will jump in the long jump.

The 100m times will improve over time throughout the year groups.

The 100m times will follow a normal distribution throughout the year groups.

Plan

For my first hypothesis, I will use a scatter graph to see if there is any relationship between the 100m times and the long jump distances. This will enable me to see if there is a correlation between the two. The type of correlation can be found by plotting the double mean point (the mean 100m time against the mean long jump distance), drawing a horizontal and a vertical line through it and examining how the data are spread over the four quadrants. Points lying mainly in the upper-left and lower-right quadrants indicate a negative correlation, while points lying mainly in the lower-left and upper-right quadrants indicate a positive correlation. If there is a reasonably strong correlation, I will draw a line of best fit and work out its equation later on. The strength of the correlation will show how strong the relationship between 100m times and long jump distances is.
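As a rough sketch of the quadrant method described above, the Python below finds the double mean point, counts the points in each quadrant and fits a line. The plan only says a line of best fit will be drawn, so the least-squares calculation here is an assumption about how its equation might be worked out. The function names and the five data pairs are made up for illustration and are not taken from the RGS database.

```python
from statistics import mean

def mean_point(times, jumps):
    """Return the double mean point (mean 100m time, mean long jump distance)."""
    return mean(times), mean(jumps)

def quadrant_counts(times, jumps):
    """Count how many points fall in each quadrant around the double mean point."""
    mx, my = mean_point(times, jumps)
    counts = {"upper_left": 0, "upper_right": 0, "lower_left": 0, "lower_right": 0}
    for x, y in zip(times, jumps):
        if x == mx or y == my:
            continue  # a point on a dividing line belongs to no quadrant
        vertical = "upper" if y > my else "lower"
        horizontal = "right" if x > mx else "left"
        counts[f"{vertical}_{horizontal}"] += 1
    return counts

def best_fit_line(times, jumps):
    """Least-squares line y = a + b*x (assumed method); it passes through the mean point."""
    mx, my = mean_point(times, jumps)
    sxy = sum((x - mx) * (y - my) for x, y in zip(times, jumps))
    sxx = sum((x - mx) ** 2 for x in times)
    b = sxy / sxx      # gradient
    a = my - b * mx    # intercept
    return a, b

# Made-up values for illustration only (not from the RGS database).
times = [13.0, 14.0, 15.0, 16.0, 17.0]  # 100m times in seconds
jumps = [4.6, 4.2, 4.0, 3.5, 3.2]       # long jump distances in metres

counts = quadrant_counts(times, jumps)
intercept, gradient = best_fit_line(times, jumps)
print(counts)
print(f"jump = {intercept:.2f} + ({gradient:.2f}) x time")
```

With these invented values the points fall only in the upper-left and lower-right quadrants and the gradient is negative, which is the pattern a negative correlation would give.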

For my second hypothesis, I will use box plots, which will enable me to compare at a glance the median, the inter-quartile range and the lowest and highest values of each year group. To do this, I will draw a stem and leaf diagram for each year group to put the data in order and then work out the median and the quartiles. I will also consider any outliers, i.e. values that are very high or very low compared with the rest of the data.
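The values needed for a box plot can be sketched in Python as follows. Two assumptions are made that the plan does not state: the quartiles are taken at the (n + 1)/4 and 3(n + 1)/4 positions (the "exclusive" method in Python's statistics module), and an outlier is any value more than 1.5 times the inter-quartile range beyond a quartile. The eleven times are invented for illustration.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (lowest, lower quartile, median, upper quartile, highest)."""
    ordered = sorted(values)  # does the same job as the stem and leaf diagram
    q1, q2, q3 = quantiles(ordered, n=4, method="exclusive")
    return ordered[0], q1, q2, q3, ordered[-1]

def find_outliers(values):
    """Values beyond 1.5 x IQR from the quartiles (assumed outlier rule)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_limit = q1 - 1.5 * iqr
    high_limit = q3 + 1.5 * iqr
    return [v for v in values if v < low_limit or v > high_limit]

# Made-up 100m times in seconds, for illustration only.
times = [14.8, 13.4, 19.9, 12.8, 15.6, 14.1, 15.0, 13.9, 16.0, 14.5, 15.3]

summary = five_number_summary(times)
outliers = find_outliers(times)
print(summary)
print(outliers)
```

Comparing the summaries for years 7, 9 and 11 side by side would show whether the median time falls as the pupils get older.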

For my third hypothesis, I will use a histogram for each year group, which will enable me to see whether the times follow a normal distribution. To do this, I will need to group the data and calculate frequency densities (frequency divided by class width). The shape of the histogram will show whether the year group follows a normal distribution: a symmetrical, bell-like shape will mean the distribution is likely to be normal. The distribution is also more likely to be normal if the mode, median and mean have similar values.
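The grouping step can be sketched as below, using frequency density = frequency ÷ class width. The class boundaries and the ten times are invented for illustration, and treating each class as "lower boundary included, upper boundary excluded" is an assumption.

```python
from statistics import mean, median

def frequency_densities(values, boundaries):
    """Return one (lower, upper, frequency, frequency density) row per class."""
    rows = []
    for lower, upper in zip(boundaries, boundaries[1:]):
        # Assumed convention: lower <= value < upper
        frequency = sum(1 for v in values if lower <= v < upper)
        density = frequency / (upper - lower)
        rows.append((lower, upper, frequency, density))
    return rows

# Made-up 100m times in seconds and class boundaries, for illustration only.
times = [12.5, 13.2, 13.8, 14.1, 14.4, 14.6, 14.9, 15.2, 15.7, 16.8]
boundaries = [12, 14, 15, 16, 18]

table = frequency_densities(times, boundaries)
for lower, upper, frequency, density in table:
    print(f"{lower} <= t < {upper}: frequency {frequency}, density {density}")

# Similar mean and median values are one sign of a roughly symmetrical distribution.
print(mean(times), median(times))
```

Because the classes have different widths, the bar heights must be the frequency densities rather than the raw frequencies, so that the area of each bar represents its frequency.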

Data collection

In the RGS athletic sports results database, there are 96 entries each for years 7 and 8, and 100 entries each for years 9, 10 and 11. Each entry (except those in year 7) also has results recorded for several academic years to measure progress and improvement: 2006-7, 2005-6 and 2004-5. I have decided to take a sample of 40 pupils from each year group. This sample size is large enough to represent each whole year group, as it is roughly 40% of the population (40 out of 96 or 100 pupils). I have also decided to include years 7, 9 and 11 but not years 8 and 10 in my sample, because removing the two years between the “milestone” years will have little effect on the overall sample. I have decided to use only the 2006-7 data, as it is the only year for which all the year groups have results.

I have decided to use quota sampling to create my sample because it is fast to conduct. Quota sampling is not strictly random, but if the entries are first sorted by something irrelevant to athletic performance, the selection should behave much like a random one and so should be largely free of bias. However, I will ignore incomplete entries (i.e. entries with missing information), which may have an effect on the sample.
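One possible reading of this sampling method is sketched below: sort the entries by an irrelevant field, leave out incomplete entries and take entries until the quota is filled. The coursework does not spell out the exact procedure, so the field names (surname, time_100m, long_jump), the choice of surname as the sorting field and the four entries are all hypothetical.

```python
def quota_sample(entries, quota, sort_key):
    """Sort by an unrelated field, skip incomplete entries, take the first `quota`."""
    complete = [
        e for e in entries
        if e["time_100m"] is not None and e["long_jump"] is not None
    ]
    ordered = sorted(complete, key=lambda e: e[sort_key])
    return ordered[:quota]

# Made-up entries for illustration only (not from the RGS database).
year_7 = [
    {"surname": "Patel", "time_100m": 14.2, "long_jump": 3.9},
    {"surname": "Adams", "time_100m": None, "long_jump": 4.1},  # incomplete, ignored
    {"surname": "Jones", "time_100m": 15.0, "long_jump": 3.6},
    {"surname": "Clark", "time_100m": 13.7, "long_jump": 4.3},
]

# The real investigation would use quota=40 for each of years 7, 9 and 11.
sample = quota_sample(year_7, quota=2, sort_key="surname")
print([e["surname"] for e in sample])
```

If the pupils with missing results tend to be the slower or less keen athletes, leaving them out could shift the sample towards better performances, which is the effect on the sample mentioned above.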