# Data Handling


Introduction

Data Handling Project

Planning

I intend to investigate the relationship between the number of hours of TV watched per week by students and their KS2 maths results. I think that the more TV a student watches, the less successful they will be.

Hence, I expect a negative correlation, i.e. as the hours of TV watched increase, the maths results will tend to decrease.

Firstly, I will retrieve the relevant data, i.e. gender, hours of TV watched and maths results, from the spreadsheet provided. As there are on average 200 students in each of the five year groups (Years 7-11), there are roughly 1000 students in total, and it would be difficult to analyse such a large data set. Therefore, I will pick one of the five year groups at random and base my investigation on it.

I will sort the year group into two sub-groups according to gender and then apply systematic sampling to the data. This will make the sample more representative. I have randomly selected Year 9.

As there are 261 students in Year 9 and I intend to have a sample of 30, I will select every 8th student, which gives 32 students, and then randomly eliminate two, leaving me with a sample of 30 students.

$$\frac{261}{30} = 8.7$$

$$\frac{261}{8} \approx 32, \qquad 32 - 2 = 30$$
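As a rough illustration, the sampling procedure described above could be sketched in Python as follows. The list `year9` is a hypothetical stand-in for the Year 9 rows of the spreadsheet (it just holds student numbers 1 to 261), and `systematic_sample` is an illustrative helper name, not part of the original project. The sketch treats the year group as a single ordered list, which is an assumption, since the exact ordering used in the project is not stated.

```python
import random


def systematic_sample(population, step, target_size, seed=None):
    """Take every `step`-th item, then randomly drop extras to reach `target_size`."""
    rng = random.Random(seed)
    # Every step-th student: the 8th, 16th, 24th, ... when step is 8
    picked = population[step - 1::step]
    # Randomly eliminate the surplus while keeping the original order
    surplus = len(picked) - target_size
    if surplus > 0:
        drop = set(rng.sample(range(len(picked)), surplus))
        picked = [p for i, p in enumerate(picked) if i not in drop]
    return picked


# Hypothetical stand-in for the Year 9 list: student numbers 1..261
year9 = list(range(1, 262))

every_8th = year9[7::8]  # systematic selection before any elimination
sample = systematic_sample(year9, step=8, target_size=30, seed=1)

print(len(every_8th), len(sample))
```

Fixing the `seed` only makes the random elimination repeatable; any two of the 32 selected students could equally well be removed.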

The collected data sample of 30 students is the raw data.

Middle

The evidence from the sample suggests males, on average, scored higher in their KS2 maths results than females, for this particular year.

In order to support the above statement I will compare the mean, mode, median and range of the KS2 maths results for males and females.

## Mean maths results

Mean maths result for females = 4.06

Mean maths result for males = 4.66

## Mode maths results

Mode maths result for females = 4

Mode maths result for males = 5

## Median maths results

Median maths result for females = 4

Median maths result for males = 5

## Range of maths results

Range of maths results for females = 2

Range of maths results for males = 2

I have summarised these results in a table:

| Maths results | Mean | Mode | Median | Range |
| --- | --- | --- | --- | --- |
| Females | 4.06 | 4 | 4 | 2 |
| Males | 4.66 | 5 | 5 | 2 |

### Stem and leaf diagrams

### Year 9 Females

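Key: stem 1, leaf 4 represents 14 hours of TV watched per week.

| Stem | Leaf | Frequency |
| --- | --- | --- |
| 0 | 1, 6 | 2 |
| 1 | 0, 4, 4, 6, 7, 7, 8 | 7 |
| 2 | 1, 1, 2, 4, 4 | 5 |
| 3 | 9 | 1 |
| 4 | | 0 |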

### Year 9 Males

| Stem | Leaf | Frequency |
| --- | --- | --- |
| 0 | 4, 8 | 2 |
| 1 | 0, 0, 0.5, 2 | 4 |
| 2 | 0, 0, 0, 0, 1, 2 | 6 |
| 3 | 0, 0 | 2 |
| 4 | 2 | 1 |

## Averages

| Hours of TV watched (hrs) | Mean | Modal class interval | Median | Range |
| --- | --- | --- | --- | --- |
| Females | 18 | 10-20 | 17 | 38 |
| Males | 19 | 20-30 | 20 | 38 |
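The averages in this table can be checked against the stem and leaf diagrams. The sketch below is one possible way to do so in Python. It assumes the stem is the tens digit and the leaf the units digit, and it reads the males' leaf "0.5" on stem 1 as 10.5 hours; both readings are assumptions. The helper names `modal_class` and `summarise` are illustrative only.

```python
from collections import Counter
from statistics import mean, median

# Hours of TV watched per week, read off the stem and leaf diagrams
# (assumed key: stem = tens, leaf = units; the leaf "0.5" is read as 10.5)
females = [1, 6, 10, 14, 14, 16, 17, 17, 18, 21, 21, 22, 24, 24, 39]
males = [4, 8, 10, 10, 10.5, 12, 20, 20, 20, 20, 21, 22, 30, 30, 42]


def modal_class(values, width=10):
    """Return the (lower, upper) bounds of the class interval holding the most values."""
    counts = Counter(int(v // width) for v in values)
    stem, _ = counts.most_common(1)[0]
    return stem * width, (stem + 1) * width


def summarise(values):
    """Mean (to the nearest hour), modal class interval, median and range."""
    return {
        "mean": round(mean(values)),
        "modal_class": modal_class(values),
        "median": median(values),
        "range": max(values) - min(values),
    }


print("Females:", summarise(females))
print("Males:  ", summarise(males))
```

Under these assumptions the computed values agree with the table: the unrounded means are 17.6 hours for females and about 18.6 hours for males.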

From the Year 9 sample, the mean, modal class interval and median were all higher for males than for females. The differences were small, however: 1 hour in the mean and 3 hours in the median.

The modal class interval, like the mean and median, shows that males tended to watch more hours of TV, yet they scored higher in their maths results. The range of hours of TV watched was the same for males and females. This contradicts my original hypothesis for my sample of students from Year 9.

Conclusion

i.e. $\frac{15 - 6.25}{15} \times 100 = 58.3\%$

whereas $\frac{15 - 10}{15} \times 100 = 33.3\%$ of females achieved level 4 and above.

Only $\frac{15 - 14}{15} \times 100 = 6.7\%$ of the males achieved a level 6.
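The three percentages above all follow the same pattern: subtract a count from the group size of 15, divide by 15 and multiply by 100. A minimal Python sketch of that arithmetic is given below. The function name `percent_above` is illustrative, and interpreting the subtracted numbers (6.25, 10 and 14) as cumulative counts read from a graph is an assumption, since the source does not say where they come from.

```python
def percent_above(total, cumulative_below):
    """Percentage of `total` students lying above a given cumulative count."""
    return (total - cumulative_below) / total * 100


# The three subtracted values used in the calculations above
for below in (6.25, 10, 14):
    print(f"({15} - {below}) / 15 x 100 = {percent_above(15, below):.1f}%")
```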

Review

I started with a hypothesis stating that the more time students spent watching TV, the less successful they would be in their KS2 maths results. Though this is a logical line of enquiry, my analysis of the selected data refutes this hypothesis.

It was difficult to establish any strong correlation from the scatter diagrams. The scatter diagrams show a positive correlation; however, it is weak, and the gradient of the line of best fit is small. This could be because my data is secondary and my sample was small and drawn from a single year group. Hence, my results may be slightly biased.
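The gradient of a line of best fit describes how steep the trend is, while the strength of a linear correlation is usually measured by Pearson's correlation coefficient r, which lies between -1 and 1. The sketch below shows one way both could be calculated in Python. The paired values in `hours` and `levels` are hypothetical and are NOT the project's data (the paired data is not reproduced in this document); the function name `best_fit_and_r` is also illustrative.

```python
from math import sqrt


def best_fit_and_r(xs, ys):
    """Least-squares gradient and intercept, plus Pearson's correlation coefficient r."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x
    r = sxy / sqrt(sxx * syy)
    return gradient, intercept, r


# Hypothetical illustration only: these pairs are NOT the project's data
hours = [5, 10, 15, 20, 25, 30]   # hours of TV watched per week
levels = [4, 5, 4, 5, 4, 5]       # KS2 maths level

gradient, intercept, r = best_fit_and_r(hours, levels)
print(f"gradient = {gradient:.3f}, intercept = {intercept:.2f}, r = {r:.2f}")
```

A value of r close to 0 indicates a weak linear relationship, whatever the sign of the gradient.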

Considering my analysis of my originally selected group, i.e. Year 9, and my hypothesis being refuted, I extended my project and drew a scatter diagram for a random sample of 60 students from the whole school.

Apart from a couple of results at the top of the level 5 column, which are also quite dispersed, I think I have a negative correlation for the overall data. The gradient in this case is more distinctly negative than the gradient for the first set of data was positive. The two gradients also have opposite signs, i.e. the gradient of the line of best fit for Year 9 was positive, while the gradient for the overall group is negative, which supports my hypothesis.
