Figure 1.0 also indicates a positive correlation, but I don’t think it is a strong one because the points are widely scattered. To get a better idea of how reliable the line of best fit is, I will work out the correlation coefficient. Because I know the correlation is positive, a coefficient between 0.5 and 1 means the straight line is a good fit, and a coefficient closer to 0 means it is not a good fit to the data.
Working out the correlation coefficient
The formula to find the Product Moment Correlation Coefficient is
r = Sxy ÷ (Sx × Sy)
Where r is the correlation coefficient, Sxy is the covariance and Sx and Sy are the standard deviations of x and y respectively. I calculated each of the values separately and used a spreadsheet to check my work.
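The same calculation can be sketched in Python as a further check. This is only an illustration: the function name pmcc and the heights and weights below are made up and are not my sample data, and I have assumed the covariance and standard deviations both divide by n.

```python
from math import sqrt

def pmcc(xs, ys):
    """Correlation coefficient r = Sxy / (Sx * Sy) for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance: mean of the products of the deviations from each mean
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    # Standard deviations of x and y (assumed to divide by n)
    s_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    s_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return s_xy / (s_x * s_y)

# Illustrative values only, not the sample used in this investigation
heights_m = [1.50, 1.55, 1.60, 1.65, 1.70]
weights_kg = [45, 52, 48, 60, 55]
r = pmcc(heights_m, weights_kg)
print(round(r, 2))
```

A value of r close to 1 would mean the points lie almost on a straight line; a value close to 0 would mean they are widely scattered.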
Fig 1.1
The correlation coefficient is about 0.5. This indicates that there is a relationship between the two variables (because it is between 0.5 and 1), but as it is much closer to 0.5 than to 1, I feel it doesn’t show that height and weight are particularly strongly related, so the straight line is not a very realistic representation of the data.
I will now look more closely at the straight line (line of regression) …
Fig 1.3
Extending the line of regression to the left, I can see that a child less than 0.5m tall would have a negative weight, which is not possible. In addition, extending the line to the right suggests that people can grow infinitely tall (over 3m), which is not realistic either. Therefore this line of best fit is only relevant within certain limits. These limits are between 1.25m and 2.03m tall.
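The idea of only trusting the line within its limits can be sketched in Python. This assumes the line of best fit is the least-squares regression line of weight on height, which is my assumption about what the graphing program draws. The function names and the data are made up for illustration; only the limits 1.25m and 2.03m come from my work above.

```python
def fit_line(xs, ys):
    """Least-squares line y = gradient * x + intercept (assumed method)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    gradient = s_xy / s_xx
    intercept = mean_y - gradient * mean_x
    return gradient, intercept

# Limits taken from the text: the line is only relevant in this range
LOWER_M, UPPER_M = 1.25, 2.03

def predict_weight(height_m, gradient, intercept):
    """Predict a weight, refusing to extrapolate outside the limits."""
    if not LOWER_M <= height_m <= UPPER_M:
        raise ValueError("height is outside the range where the line is relevant")
    return gradient * height_m + intercept

# Illustrative values only, not the sample used in this investigation
heights_m = [1.50, 1.55, 1.60, 1.65, 1.70]
weights_kg = [45, 52, 48, 60, 55]
gradient, intercept = fit_line(heights_m, weights_kg)
print(round(predict_weight(1.60, gradient, intercept), 1))
```

Asking for a prediction at 0.4m or 3m raises an error instead of giving a negative or unrealistic weight.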
Within these limits, I have chosen this graph below (figure 1.4) as a more accurate representation of the data – which allows better predictions. So someone who is 1.9m tall should have a weight of about 68kg (see red line).
Fig 1.4
From what I have read, predictions are probably even better if we separate the genders so this is what I will do next.
Data separated for Girls and Boys
I will use the same process, that is, enter the data and use the program Autograph to plot a scatter graph.
These are my results for girls.
For the correlation coefficient of this data, I used the function button in Excel.
These are my results for boys.
Perhaps to get a better correlation coefficient the data also needs to be separated by age.
Data in groups: yr 7 and 11, separated for Girls and Boys
Year 7 girls.
Year 7 boys.
My response to hypothesis 1, that weight increases as height increases, is that it is true. However, the correlation coefficient shows that the relationship varies between the sexes and between individual age groups. I am not going to go any further in this line of exploration because the data is so unpredictable and it’s driving me insane.
But I will look at the average height of the boys and girls.
Boys are naturally taller than girls
I believe that boys are naturally taller than girls, so I will conduct tests to see whether my hypothesis is right or not. I will first get the average height for the boys and for the girls in each year by adding up the heights and then dividing the total by the number of pupils. I will create a table showing the year, the total height and the average height for each year. The first table is for the females and the second table is for the males.
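The averaging step can be sketched in Python as follows. The function name mean_height and the heights listed are made up for illustration and are not the values in my tables.

```python
def mean_height(heights):
    """Average height: total of the heights divided by the number of pupils."""
    return sum(heights) / len(heights)

# Illustrative heights in metres for each year group, not my sample data
girls_by_year = {
    7: [1.48, 1.52, 1.55],
    8: [1.53, 1.57, 1.60],
    9: [1.56, 1.60, 1.62],
}

for year, heights in girls_by_year.items():
    print(year, round(mean_height(heights), 2))
```

The same function would be used on the boys' heights so that the two sets of averages are worked out in exactly the same way.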
In the two graphs above, both boys and girls increase in height as they increase in age. The comparisons are not very good because the scales are different.
This analysis shows that my hypothesis was correct and that boys are naturally taller than girls. Since there is a gender difference in height, the data will need to be separated into girls and boys if it is to be used to predict the height of a student given the student’s age. But before I can do this, I will need to look at the distribution of heights according to gender. I will do this using cumulative frequency.
Using a cumulative frequency graph to investigate distribution of heights in girls
This graph tells me that 50% of girls were less than 1.59m tall, that a quarter of them were smaller than 1.53m and three quarters were below 1.64m. This data can also be represented as a box and whisker diagram which is better for showing skew.
This box and whisker diagram shows a slight negative skew meaning that the range from the upper quartile to the median is less than the range from the lower quartile to the median so that there are more girls bunched together above the median. Using box and whisker diagrams it’s easier to compare the heights of girls to the heights of boys.
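The check for skew can be written as a short Python sketch using the quartiles I read from the cumulative frequency graph for girls (1.53m, 1.59m and 1.64m). The function name quartile_skew and the small tolerance for calling the data symmetrical are my own choices for illustration.

```python
def quartile_skew(q1, median, q3, tolerance=1e-9):
    """Compare the two halves of the box to describe the skew."""
    upper = q3 - median   # upper quartile to median
    lower = median - q1   # median to lower quartile
    if abs(upper - lower) < tolerance:
        return "symmetrical"
    # A shorter upper half means the data bunch above the median
    return "negative skew" if upper < lower else "positive skew"

# Quartiles for girls read from the cumulative frequency graph
print(quartile_skew(1.53, 1.59, 1.64))  # prints "negative skew"
```

Here the upper half of the box is about 0.05m and the lower half is about 0.06m, which is why the skew is only slight.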
To gain a better comparison I need to put girls and boys onto the same axes.
Comparing the two, the girls have a slight negative skew and the boys are more or less symmetrical. The boys appear to have a wider range than the girls. Although the lower quartiles are about the same (one quarter of boys are less than 1.52m and one quarter of girls are less than 1.527m), the median, the upper quartile and the topmost measurements for the boys are consistently higher than those for the girls. To get more depth in this hypothesis I will now create a histogram.
But this is not as accurate as I had wished it to be, because if I were to go back/forward 10 years then the height would probably be different for the year groups of 7-11 and the girls could be taller than the boys instead.
I will now make a histogram to compare the distribution of heights of year 7 girls to boys.
Comparing the distribution of heights of year 7 girls to boys
Female
Male
Year 7
How frequency density is calculated
The frequency density in the table above is calculated by dividing the frequency by the class width, e.g. for the interval 1.45 ≤ x < 1.5
f.d. = 7 ÷ 0.05 = 140
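The same calculation can be sketched in Python. The function name frequency_density is made up for illustration; the frequency of 7 and the class width of 0.05 are the values from the example above.

```python
def frequency_density(frequency, class_width):
    """Frequency density = frequency divided by class width."""
    return frequency / class_width

# Interval 1.45 <= x < 1.5 has frequency 7 and class width 0.05
fd = frequency_density(7, 0.05)
print(round(fd, 2))
```

Because the bar's height is the frequency density, the area of each bar (height times class width) gives back the frequency.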
Distribution of boys’ heights
A frequency polygon can be drawn from the histogram by connecting the mid-points of the tops of the bars. Because I think it is easier to compare the distribution of heights of girls and boys using frequency polygons, from now on I am only going to use frequency polygons.
Comparing the frequency polygon of boys’ heights (blue) to that of girls’ heights (red), I can see immediately that there are more boys than girls. By using proportional stratified sampling I ended up with 26 boys and 22 girls. To get over this problem of different sample sizes, I feel it is better to compare the percentage frequency of boys to the percentage frequency of girls.
Using Excel I worked out the percentages and entered them onto the Autograph data sheet.
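The conversion to percentage frequencies can be sketched in Python. The sample sizes of 26 boys and 22 girls are from my sample, but the way they are split between the classes below is made up for illustration, as is the function name.

```python
def percentage_frequencies(frequencies):
    """Convert class frequencies to percentages of the sample total."""
    total = sum(frequencies)
    return [100 * f / total for f in frequencies]

# Illustrative class frequencies only; the totals match my sample sizes
boys = [2, 6, 9, 6, 3]    # 26 boys in total
girls = [3, 5, 7, 5, 2]   # 22 girls in total

print([round(p, 1) for p in percentage_frequencies(boys)])
print([round(p, 1) for p in percentage_frequencies(girls)])
```

Each list of percentages adds up to 100, so the two polygons can be compared fairly even though the samples are different sizes.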
There are still more boys who are 1.56 metres or less than there are girls. The boys’ heights fall into a smaller range than the girls’ heights. As this range is probably important, I am going to compare the distribution of heights once again, but this time using box and whisker diagrams.
The box and whisker diagram shows that girls (yellow) have both a wider range and a wider inter-quartile range of heights than the boys (blue). The advantage of the box and whisker diagram is that it deals with sample proportions (quarters/quartiles), so it doesn’t matter how big or small each sample is.
Now that I know that boys are taller than girls, I predict that boys are also heavier than girls, so I will make a table and find the mean weights to see if I am right.
From this data I am sadly mistaken. I now see that girls are heavier than boys in years 7-9, but after that boys are heavier than girls. So overall girls are heavier than boys up to a certain point, but boys are heavier in the long run.
As age increases, the rate of growth decreases