The correlation between the height and weight of pupils at Mayfield High.

Rosanna Marr

For this project I have decided to look at the correlation between the height and weight of pupils at Mayfield High.

We were given data about an imaginary high school, which showed the gender, date of birth, IQ and eye colour of each pupil, along with other information. This table shows the number of pupils in Mayfield High:

When deciding what to investigate, it became clear that much of this information was unnecessary. In the Excel database provided on my school computer, I therefore deleted certain categories but kept the information on height, weight, surname, forename, gender, year group and age, which I felt might come in handy during my enquiry.

To collect the data I needed, I simply copied it from the provided Excel spreadsheet into other worksheets, splitting the information into several different categories. My first category was a mixed sample, including both genders and all of the year groups. I used an equation:

I used this to generate random whole numbers between 1 and 1183, giving me a completely unsystematic mix of pupils from Mayfield High. Typing it into a cell in Excel and pressing Enter makes a number between 1 and 1183 appear; by repeating this equation 50 times I had the basis of my random mixed sample. The only difficulty is that the same number can come up more than once, so you have to check closely at the end to make sure there are no repeated numbers. To generate these random numbers I could also have used my calculator: pressing SHIFT and RAN# displays a number between 0 and 1. To turn this into a number between 1 and 1183, the number of students in the school, I would multiply the number shown on the screen by 1183 and then round it to the nearest whole number.
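The sampling step described above can also be sketched in Python. This is only an illustration, not the method used in this project: it assumes the pupils are numbered 1 to 1183 in the spreadsheet, and the function name `draw_sample` and the seed value are made up for the example. It uses `random.sample`, which picks without replacement, so the check for repeated numbers is not needed.

```python
import random

SCHOOL_SIZE = 1183  # number of pupils at Mayfield High (from the text)
SAMPLE_SIZE = 50    # size of the random mixed sample (from the text)


def draw_sample(rng):
    """Return SAMPLE_SIZE distinct row numbers between 1 and SCHOOL_SIZE."""
    # random.sample chooses without replacement, so no number can repeat.
    return sorted(rng.sample(range(1, SCHOOL_SIZE + 1), SAMPLE_SIZE))


# The seed is arbitrary; it only makes the example repeatable.
rng = random.Random(0)
rows = draw_sample(rng)
print(rows)
```

Each number in `rows` would then be looked up in the spreadsheet to find the matching pupil.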

Because I am taking a random sample of the whole school, in this section of work I will simply be looking at the correlation between individual pupils’ heights and their weights, rather than seeing whether age or gender affects it. Later, in the hope of finding a stronger correlation between height and weight, I will also split the data by gender and age group.

After I had generated the random numbers, I found the matching pupils in the data, then copied and pasted each one into a different worksheet. This made my random mixed sample.

Before I explain how I used this sample to look for correlation between my two chosen factors, I think it necessary to state what outcome I expect to find after putting this information into graph form and analysing it. I think I will definitely find a certain amount of positive correlation, but that the relationship will not be very strong. I think I will have to look into this in more depth in order to find a distinct relationship between pupils’ height and weight. I think that on the whole there will be a large range in the heights and weights of the pupils. High-school age is the time when people’s height and weight change the most; therefore I think the 5-year age gap will weaken the positive correlation. Puberty tends to control the growth of teenagers, but it starts at a different age for everyone. These factors are limitations that I shall have to overcome by splitting the school into more categories in order to find a more realistic relationship between height and weight. Generally I think I will notice that as height increases, so does weight; however, I think that in this section the connection will be less apparent.

To sort out the pupils into different weights I made a frequency table for continuous data.
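As a rough illustration of how a frequency table for continuous data is built, the sketch below sorts weights into 5 kg classes. The weights in it are invented for the example and are not taken from the Mayfield data. It also assumes that each class includes its lower bound and excludes its upper bound, so 45 kg goes into the 45 to 50 class.

```python
import math
from collections import Counter


def frequency_table(weights, width=5):
    """Group weights into classes lower <= w < lower + width and count them."""
    counts = Counter(math.floor(w / width) * width for w in weights)
    return [(lower, lower + width, counts[lower]) for lower in sorted(counts)]


# Made-up weights in kg, for illustration only.
example_weights = [41.0, 47.5, 45.0, 52.3, 49.9, 58.0]
example_table = frequency_table(example_weights)

for lower, upper, frequency in example_table:
    print(f"{lower} to {upper} kg: {frequency}")
```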

 

Then, using Microsoft Excel’s “Chart Wizard”, I produced a chart showing this information more clearly.

I then decided to work out the mean, median and mode of this data. In order to do this more easily I added 2 more columns to my frequency table.

To work out the mean I divided the total of the fx column by the total frequency: 2595/50, which gave me 51.9kg. To find the median class I found which group contained the 25½th value. This was 50<55kg. The modal class is simply the group with the largest frequency, which is 45<50kg with 13. I had to give the mode and median as classes because the data is continuous and grouped, so I cannot find a single number for these values. The range of this ...
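The same three calculations can be sketched in Python for any grouped frequency table. The table below is a small made-up example, not the table from this project, and the name `grouped_stats` is chosen just for the illustration. The mean uses the midpoint of each class as x, the median class is the one containing the (n + 1)/2 th value, and the modal class is the one with the largest frequency.

```python
def grouped_stats(table):
    """Return (estimated mean, median class, modal class) for grouped data.

    table is a list of (lower bound, upper bound, frequency) rows.
    """
    n = sum(f for _, _, f in table)

    # fx column: class midpoint multiplied by frequency, then totalled.
    total_fx = sum((lower + upper) / 2 * f for lower, upper, f in table)
    mean = total_fx / n

    # The median is the (n + 1) / 2 th value; find the class that contains it.
    position = (n + 1) / 2
    cumulative = 0
    median_class = None
    for lower, upper, f in table:
        cumulative += f
        if cumulative >= position:
            median_class = (lower, upper)
            break

    # The modal class has the largest frequency.
    modal_row = max(table, key=lambda row: row[2])
    modal_class = (modal_row[0], modal_row[1])

    return mean, median_class, modal_class


# Hypothetical table of weights in kg, for illustration only.
example = [(40, 45, 2), (45, 50, 4), (50, 55, 3), (55, 60, 1)]
mean, median_class, modal_class = grouped_stats(example)
print(mean, median_class, modal_class)
```

With a total frequency of 50, as in my sample, the position worked out in the code would be (50 + 1)/2 = 25.5, which matches the 25½th value used above.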
