# Data Handling

Ralph Weatherburn 11T

Maths Coursework: Data Handling

In this investigation I am comparing body mass index (BMI) against the average number of hours of TV watched per week to see what correlation, if any, there is. I chose TV viewing because it seems relevant to BMI: the more time people spend watching TV, the less exercise they are likely to do, and so the heavier they are likely to be, giving a higher BMI. The aim is to show the correlation clearly, if one exists, and to demonstrate it, taking into account any bad results or bias in the data.

To do this, I will first look at the population (1200 people). I need a sample of at least 5% for the results to be reliable and to limit bias, so I will take 240 people, which is 20% of the population and comfortably above that minimum. I will use stratified sampling and work out each person's body mass index, which is their weight in kilograms divided by the square of their height in metres. The strata will be based on gender and year group, and I will then take a simple random sample of 48 from each group. A more detailed investigation could add other factors, such as distance from school or method of transport to school (which allow more or less time to watch TV).

My hypothesis is that males will watch more TV than females, and that pupils in higher year groups will watch more TV than those in lower ones. The two categories I have chosen (gender and year group) should be sufficient to show a correlation, since they will reveal any trends in TV habits by gender and by year group. Gender shows whether boys or girls tend to watch more TV, and year group shows how people's TV habits change as they get older. Bias should be limited because the sample is well above 5% of the population, it contains an equal number of each gender, and the year groups differ only slightly in size.
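The sampling and the BMI calculation described above can be sketched in Python. This is only an illustration under assumptions: the population records are made up, the field names (`year`, `weight_kg`, `height_m`) are hypothetical, and I assume five year groups (7 to 11) so that 48 people from each group gives the 240 in the sample.

```python
import random


def bmi(weight_kg, height_m):
    """BMI = weight in kilograms divided by the square of height in metres."""
    return weight_kg / height_m ** 2


def stratified_sample(population, key, per_group, seed=None):
    """Take a simple random sample of `per_group` people from each stratum."""
    rng = random.Random(seed)
    strata = {}
    for person in population:
        # Group people by the stratum that `key` assigns them to.
        strata.setdefault(key(person), []).append(person)
    sample = []
    for group in sorted(strata):
        members = strata[group]
        # rng.sample picks without replacement, so nobody is chosen twice.
        sample.extend(rng.sample(members, min(per_group, len(members))))
    return sample


# Made-up population of 1200 pupils (assumption: years 7 to 11, 240 in each).
data_rng = random.Random(0)
population = [
    {
        "id": i,
        "gender": data_rng.choice("MF"),
        "year": 7 + i % 5,
        "weight_kg": data_rng.uniform(35, 80),
        "height_m": data_rng.uniform(1.3, 1.9),
    }
    for i in range(1200)
]

sample = stratified_sample(population, key=lambda p: p["year"], per_group=48, seed=1)
bmis = [bmi(p["weight_kg"], p["height_m"]) for p in sample]
print(len(sample))  # 240
```

To stratify by gender as well, the `key` could return `(p["gender"], p["year"])` instead, with `per_group` reduced so that the total stays at 240.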

I have collected the data and recorded it in a table (as printed) and in a spreadsheet. To make more sense of this data, I will pick out key points and trends using bar graphs, box-and-whisker diagrams, scatter diagrams and cumulative frequency graphs. I will also use standard deviation and other measures of dispersion to help interpret these results.
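As a rough sketch of the averages and measures of dispersion mentioned above, Python's standard `statistics` module could be used as follows. The numbers are made-up illustrative values, not my results, and I assume here that the standard deviation is taken over the whole data set (the population form, `pstdev`).

```python
import statistics

# Made-up illustrative values, not the coursework data.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(values)
median = statistics.median(values)
modes = statistics.multimode(values)   # list, because data can have several modes
spread = statistics.pstdev(values)     # population standard deviation
value_range = max(values) - min(values)

print(mean, median, modes, spread, value_range)
```

A small standard deviation means the values are bunched close to the mean, and a large one means they are more spread out.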

I will first tabulate the data in a frequency table and a cumulative frequency table, to list and sort it.
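A minimal sketch of how a grouped frequency table and its cumulative frequencies could be built in Python is shown here. The BMI values are made up for illustration, the function name `frequency_table` is my own, and I assume each class includes its lower bound but not its upper bound.

```python
def frequency_table(values, width=5):
    """Return rows of (lower, upper, frequency, cumulative frequency).

    Each class covers lower <= value < upper.
    """
    counts = {}
    for v in values:
        lower = int(v // width) * width  # lower bound of the class containing v
        counts[lower] = counts.get(lower, 0) + 1

    rows = []
    running_total = 0
    # Walk through every class in order, including any empty ones in between.
    for lower in range(min(counts), max(counts) + width, width):
        freq = counts.get(lower, 0)
        running_total += freq
        rows.append((lower, lower + width, freq, running_total))
    return rows


# Made-up BMI values for illustration only.
example_bmis = [14.8, 16.2, 17.9, 18.5, 19.8, 19.8, 21.4, 23.9, 26.3, 31.0]
rows = frequency_table(example_bmis)

for lower, upper, freq, cumulative in rows:
    print(f"{lower} <= BMI < {upper}: frequency {freq}, cumulative {cumulative}")
```

The last cumulative frequency should always equal the number of values, which is a quick check that nothing has been missed out.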

This table shows the BMI values grouped into classes of width 5. The frequency is the number of people in each class, and the cumulative frequency is the running total, which reaches the sample size of 240 in the last class. As we can see, the frequency rises to a peak and then decreases. The mean BMI is 20.11621, the median is 19.84 and there are 2 modes: 18.52 and 19.81. Now the results have ...