4. The older someone is, the better the correlation between height and weight.
I will investigate this hypothesis by drawing a scatter graph for each year group (7, 8, 9, 10 and 11) and measuring the deviation of the points from the line of best fit. The graph with the least deviation will have the best correlation between height and weight.
Stratified Sample
I will need to take a stratified sample, because there are different numbers of pupils in each year group, so if I took the same number of pupils from each year group, my results would be biased. I will divide the school into ten groups, first by year group and then by gender.
So for example, to find out how many male students in year 7 I needed, I divided 151 by 1183 and multiplied the answer by 60.
151/1183 = 0.1276, and 0.1276 x 60 = 7.66
This rounds up to 8, so I had 8 male pupils from year 7 in my sample.
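As a check, the same proportional calculation can be written as a short Python sketch. The function name stratum_sample_size is only an illustrative choice; the numbers 151, 1183 and 60 are the ones used above.

```python
def stratum_sample_size(group_size, population_size, sample_size):
    """Share of the sample that one group should get, before rounding."""
    return group_size / population_size * sample_size


# Year 7 boys: 151 pupils out of 1183 in the school, total sample of 60
share = stratum_sample_size(151, 1183, 60)
print(round(share, 2))  # about 7.66
print(round(share))     # 8 pupils
```

The same function can be reused for each of the ten groups by changing the first number.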
To find the pupils to include in my sample, I pressed the Ran# button on my calculator, and then multiplied the result by the number of pupils in the group that I was selecting pupils from (for example, there are 151 male students in year 7, so I would press Ran# x 151). I rounded the answer up to a whole number, and included the pupil with that number in the list of males in year 7 in my sample. I kept repeating this until I had the right number of pupils in each group.
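The Ran# method can be imitated with a short Python sketch. It assumes that Ran# gives a decimal from 0 up to (but not including) 1, and that a repeated pupil number is simply drawn again; neither detail is stated above. The function name pick_pupils and the seed 2007 are made up for illustration.

```python
import math
import random


def pick_pupils(group_size, how_many, rng):
    """Pick distinct pupil numbers from 1 to group_size, Ran#-style."""
    chosen = set()
    while len(chosen) < how_many:
        r = rng.random()                    # like Ran#: 0 <= r < 1
        number = math.ceil(r * group_size)  # round up to a whole number
        if number >= 1:                     # r == 0 would give pupil 0, so skip it
            chosen.add(number)              # a set ignores repeated numbers
    return sorted(chosen)


rng = random.Random(2007)  # hypothetical seed, only so the run can be repeated
sample = pick_pupils(151, 8, rng)
print(sample)
```

Every pupil number from 1 to 151 has the same chance of being picked, which is what makes the selection random within each group.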
There should be no bias in my sampling, because the numbers generated by the calculator are random, so every pupil in a group has the same chance of being chosen.
1. Will there be a positive correlation between height and weight?
To test this hypothesis I used the computer program Autograph to draw a scatter graph, plotting height in metres on the x-axis and weight in kg on the y-axis. My graph shows that there is a reasonably strong positive correlation between height and weight. The line of best fit has the equation y=36.74x-6.597. This means that the line has a gradient of 36.74, so weight goes up by about 36.74 kg for every extra metre of height, and it intercepts the y-axis at -6.597. The standard deviation is 0.1223 from the x-axis, and 9.727 from the y-axis, which is the mean of the distance away from the line of best fit. So this has proved my hypothesis correct.
There is a positive correlation between height and weight.
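To show what the line of best fit means in practice, here is a small Python sketch that uses the equation y=36.74x-6.597 to estimate a weight from a height. The three heights are made-up examples, not pupils from my sample.

```python
def predicted_weight(height_m):
    """Weight in kg estimated from the line of best fit y = 36.74x - 6.597."""
    return 36.74 * height_m - 6.597


for height in (1.40, 1.60, 1.80):  # hypothetical heights in metres
    print(height, "m ->", round(predicted_weight(height), 1), "kg")
```

Because the gradient is positive, a taller height always gives a larger estimated weight, which is what a positive correlation means.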
Because I think gender influences height and weight, I will now plot separate scatter graphs for boys and girls, so I can see if a different line of best fit is needed. By calculating the gradient and y-intercept on my graphs, I can see that the equation for boys is y=28.9x+5.1, and the equation for girls is y=42.1x-15.1. The girls’ gradient is larger, which tells me girls’ weights increase more quickly with their heights than boys’ weights do. Also, the correlation on the girls’ graph is better than on the boys’ graph, which suggests that there is a larger spread of height and weight in boys than in girls. So I will now investigate my hypothesis concerning spread.
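The two equations can be compared with a short Python sketch. It works out the height at which the two lines cross, which is an extra calculation of my own and not something read from the graphs.

```python
def boys_weight(height_m):
    """Boys' line of best fit: y = 28.9x + 5.1."""
    return 28.9 * height_m + 5.1


def girls_weight(height_m):
    """Girls' line of best fit: y = 42.1x - 15.1."""
    return 42.1 * height_m - 15.1


# Setting 28.9x + 5.1 = 42.1x - 15.1 gives x = (5.1 + 15.1) / (42.1 - 28.9)
crossing = (5.1 + 15.1) / (42.1 - 28.9)
print(round(crossing, 2))  # about 1.53 metres
```

Below about 1.53 metres the boys’ line gives the larger weight, and above it the girls’ line does, because the girls’ line is steeper.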
2. The spread of boys’ height is larger than the spread of girls’ height.
I will now look at whether there is a larger spread of heights in my male sample than in my female sample. First I will group my data into groups starting at 1.21≤x≤1.25 and going up to 1.86≤x≤1.9, and draw up a frequency table for each gender. I will work out the cumulative frequency, and then plot a cumulative frequency graph, with the girls’ and boys’ data plotted separately, the cumulative frequency on the y-axis, and the height on the x-axis. Here are the two frequency tables: -
Boys
Girls
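Cumulative frequency is a running total of the frequencies. The Python sketch below shows the idea; the group upper bounds and frequencies in it are made up for illustration and are not the real figures from my tables.

```python
from itertools import accumulate

# Hypothetical height groups (upper bound in metres) and their frequencies
upper_bounds = [1.50, 1.55, 1.60, 1.65, 1.70]
frequencies = [2, 5, 9, 10, 4]

# Each cumulative frequency is the total of all frequencies so far
cumulative = list(accumulate(frequencies))

for bound, total in zip(upper_bounds, cumulative):
    print(f"up to {bound:.2f} m: {total}")
```

The last cumulative frequency is always the total number of pupils, and each point is plotted against the upper bound of its group.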
The cumulative frequency graph showed me that the boys and the girls both had quite a small interquartile range (IQR), because the gradient was quite steep for the central 50% of the y-axis. The boys’ lower quartile was 1.55 metres and their upper quartile was 1.7 metres, meaning there was an IQR of 0.15 metres. The lower quartile of the girls’ data was 1.545 metres and the upper quartile was 1.68 metres, meaning there was an IQR of 0.135 metres. So the boys’ interquartile range was larger than the girls’ interquartile range, which supports my hypothesis.
However, the minimum girls’ height was 1.2 metres and the maximum was 1.8 metres, giving a range of 0.6 metres, whereas the minimum boys’ height was 1.41 metres and the maximum was 1.91 metres, giving a range of 0.5 metres, which is 0.1 metres lower than the girls’. By this measure my hypothesis is incorrect, and the spread of the girls’ height is actually larger than the spread of the boys’ height. To make the minimum, maximum, IQR and range data easier to understand, I have drawn box and whisker diagrams for the girls’ and boys’ heights on the same axis.
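The IQR and range figures can be checked with a short Python sketch. It uses only the minimum, quartile and maximum values quoted above; the dictionary layout is just one convenient way of writing them down.

```python
# Heights in metres, taken from the values quoted above
heights = {
    "boys": {"min": 1.41, "lq": 1.55, "uq": 1.70, "max": 1.91},
    "girls": {"min": 1.20, "lq": 1.545, "uq": 1.68, "max": 1.80},
}

spread = {}
for name, h in heights.items():
    iqr = h["uq"] - h["lq"]           # interquartile range
    full_range = h["max"] - h["min"]  # range
    spread[name] = (round(iqr, 3), round(full_range, 2))
    print(name, "IQR:", spread[name][0], "range:", spread[name][1])
```

This gives an IQR of 0.15 and a range of 0.5 for the boys, and an IQR of 0.135 and a range of 0.6 for the girls, so the two measures disagree about which sex has the larger spread.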
I will now investigate the standard deviation of the girls’ and boys’ heights, which will be another indication as to whether the girls’ or the boys’ data has a larger spread. Standard deviation is the square root of the mean of the squares of the deviations from the mean. To do this I will need to find the mean height, and subtract it from the mid-class value for each group. The equation for standard deviation is
standard deviation = √( Σf(x - mean)² / Σf )
where x is the mid-class value of each group and f is its frequency.
Here are the standard deviation calculations:
Girls
0.7267/30=0.024223333
√0.024223333=0.15563847
Boys
0.65465541/30=0.021821847
√0.021821847=0.147722195
These calculations show that the girls’ standard deviation is larger than the boys’ standard deviation, indicating that there is a higher spread of height for girls than for boys.
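These calculations can be checked with a short Python sketch. The totals 0.7267 and 0.65465541 and the sample size of 30 come from the working above. The function grouped_sd shows how the whole calculation could be done from a grouped table; its name is an illustrative choice, and the small table used to try it out is hypothetical.

```python
import math


def sd_from_total(sum_of_squared_deviations, n):
    """Square root of the mean squared deviation."""
    return math.sqrt(sum_of_squared_deviations / n)


def grouped_sd(midpoints, frequencies):
    """Standard deviation estimated from mid-class values and frequencies."""
    n = sum(frequencies)
    mean = sum(m * f for m, f in zip(midpoints, frequencies)) / n
    total = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, frequencies))
    return math.sqrt(total / n)


girls_sd = sd_from_total(0.7267, 30)
boys_sd = sd_from_total(0.65465541, 30)
print(round(girls_sd, 4))  # 0.1556
print(round(boys_sd, 4))   # 0.1477

# Hypothetical grouped table, only to show grouped_sd being used
print(grouped_sd([1.0, 2.0, 3.0], [1, 2, 1]))
```

Because the grouped method uses mid-class values instead of the real heights, the answer it gives is an estimate.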
Out of my three measures of spread, two have shown that girls’ height has a larger spread, and one has shown that boys’ height has a larger spread.
- The IQR of the boys’ height is 0.15 metres, and the IQR of the girls’ height is 0.135 metres.
- The range of the boys’ height is 0.5 metres, and the range of the girls’ height is 0.6 metres.
- The standard deviation of the boys’ height is 0.147722195, and the standard deviation of the girls’ height is 0.15563847.
So, from my calculations, I know that my hypothesis, that the spread of boys’ height is larger than the spread of girls’ height, is incorrect, but the alternative, that the spread of the girls’ height is larger than the spread of the boys’ height, is not totally correct either. But, although my investigation has proved indecisive, there are 2 measures which indicate that the girls’ height has a larger spread, and just 1 which indicates that the boys’ height has a larger spread. So I will come to the conclusion that: -
The spread of girls’ height is larger than the spread of boys’ height.
3. The average boys’ weight will be larger than the average girls’ weight.
To investigate this hypothesis I will find the different numbers which characterise the centre of the data and can be interpreted as the average. I will use some sample data to explain these three measures of central tendency.
Sample Data: - 2,3,4,4,4,5,5,7,8,8,9
The Mode is the most frequently occurring number. This is the group, or number, which has the highest frequency. In the example, it would be 4, because there are three 4s, the highest frequency.
The Median is the middle value. When the data is arranged in ascending order, the group, or number, in the middle position is the median. In the example, there are 11 data points, so the middle position is (11+1)/2 = 6, and the sixth value is 5.
The Mean is the total of the data, divided by the number of items. So in the example, it would be 59/11, which equals 5.36.
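The three averages for the sample data can be checked with Python's built-in statistics module. This is only a quick check of the worked example above.

```python
import statistics

data = [2, 3, 4, 4, 4, 5, 5, 7, 8, 8, 9]

mode = statistics.mode(data)      # most frequent value
median = statistics.median(data)  # middle value of the sorted data
mean = statistics.mean(data)      # total divided by the number of items

print(mode)            # 4
print(median)          # 5
print(round(mean, 2))  # 5.36
```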
Here are 2 frequency tables with the grouped weight data of the boys and girls separately. From these tables I will calculate the mode, median and mean.
Boys
Girls
[Cumulative frequency graph of weight. Weight groups (kg): 35-39, 40-44, 45-49, 50-54, 55-59, 60-64, 65-69, 70-74, 75-79, 80-85]
From my calculations, I can see that the following is correct: -
So my hypothesis, that the average boys’ weight will be larger than the average girls’ weight, has been proved correct, because for all three measures of average the boys’ value was higher.
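With grouped data the mean can only be estimated, by using the mid-class value of each group. The Python sketch below shows the method for the first five weight groups; the frequencies in it are made up for illustration and are not the figures from my tables.

```python
# Weight groups in kg as (lower, upper), with hypothetical frequencies
groups = [(35, 39), (40, 44), (45, 49), (50, 54), (55, 59)]
frequencies = [3, 5, 8, 9, 5]

# Mid-class value of each group, for example (35 + 39) / 2 = 37
midpoints = [(lower + upper) / 2 for lower, upper in groups]

total_pupils = sum(frequencies)
estimated_mean = sum(m * f for m, f in zip(midpoints, frequencies)) / total_pupils

# The modal group is the group with the highest frequency
modal_group = groups[frequencies.index(max(frequencies))]

print(round(estimated_mean, 2))
print(modal_group)
```

The estimate assumes every pupil in a group weighs the mid-class value, which is why the averages from the ungrouped data can come out slightly different.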
I will now look at the averages for the ungrouped data.
Boys
Girls
From my calculations I can see that the following is correct: -
Also, as another way of discovering the central tendencies of the boys’ and girls’ weight data, I have calculated the frequency densities of different width groups, and plotted two histograms, one for each sex.
Boys Girls
The histograms that I have drawn from this data have shown me that the modal group for the girls is 55<x<59, and the modal groups for the boys are 45<x<49, 50<x<54 and 55<x<59. This means that, by modal group, the girls and the boys had the same measure of central tendency. The graphs also show that the frequency density bar in the middle of the girls’ graph was the highest bar, whilst the two frequency density bars in the middle of the boys’ graph aren’t the highest bars on the graph. This suggests that the central tendency for the boys is lower than for the girls.
On its own, this does not support my hypothesis.
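Frequency density, which is the height of each histogram bar, is the frequency divided by the class width. The Python sketch below shows the calculation; the groups and frequencies in it are hypothetical and are not the ones from my histograms.

```python
def frequency_density(frequency, lower, upper):
    """Frequency density = frequency / class width."""
    return frequency / (upper - lower)


# Hypothetical weight groups in kg: (lower bound, upper bound, frequency)
groups = [(35, 45, 4), (45, 50, 6), (50, 55, 7), (55, 60, 8), (60, 80, 5)]

for lower, upper, freq in groups:
    density = frequency_density(freq, lower, upper)
    print(f"{lower}-{upper} kg: {density:.2f}")
```

Dividing by the class width means that a wide group does not look more important than a narrow one just because it covers more weights.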
So, using the grouped data, 3 measures said the boys had a higher average weight, and none said the girls had a higher average weight. Using the ungrouped data, 3 measures said the boys had a higher average weight, and none said the girls had a higher average weight. Lastly, for the histograms, 1 measure said the girls had a higher central tendency, and 1 said the girls’ and boys’ central tendencies were equal. So I can safely come to the conclusion that: -
The average boys’ weight is higher than the average girls’ weight.
4. The older someone is, the better the correlation between height and weight.
To investigate this hypothesis, I have drawn a scatter graph for each year group, and here are the mean deviations of the data points from each line of best fit.
The higher the deviation is, the worse the correlation is. I will now plot the mean deviations on a graph to make my data easier to understand.
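One way of working out a mean deviation from a line of best fit is sketched below in Python. It assumes the deviation is measured as the vertical distance from each point to the line, which may not be exactly how the distances on my graphs were measured. The points and the line y=2x+1 are hypothetical.

```python
def mean_deviation(points, gradient, intercept):
    """Mean vertical distance of the points from the line y = gradient*x + intercept."""
    distances = [abs(y - (gradient * x + intercept)) for x, y in points]
    return sum(distances) / len(distances)


# Hypothetical points and line, only to show the method
points = [(1, 3), (2, 5), (3, 9)]
result = mean_deviation(points, gradient=2, intercept=1)
print(round(result, 2))  # 0.67
```

Here two of the points lie exactly on the line and one is 2 away from it, so the mean deviation is 2/3. A smaller mean deviation means the points lie closer to the line, so the correlation is better.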
This graph shows me that my hypothesis is wrong. I thought that the higher the year group, the better the correlation would be between height and weight, which means there would have been less deviation of the data points from the line of best fit.
But my results show me that instead of the deviation decreasing as the age gets higher, the deviation actually increases to 9.75mm in year 9, from 5.57mm in year 7, but then decreases to 6.5mm in year 11. This shows that the correlation between height and weight is actually best in year 7. This may be because the pupils grow most in years 8, 9 and 10, so their height and weight are unbalanced, and the correlation between the two isn’t very good, so there is a larger mean distance between the data points and the line of best fit. The correlation is better in year 7, because they haven’t started growing a lot yet, and in year 11 because they have finished having growth spurts. So I will come to the conclusion that: -
The correlation is best in younger people, gets worse, and then improves again towards the end of puberty.
Summary
There is a reasonably strong positive correlation between height and weight. I know this because I have plotted a scatter graph, and the equation of the line of best fit is y=36.74x-6.597. Also, the line of best fit for girls (equation y=42.1x-15.1) is steeper than the line of best fit for boys (equation y=28.9x+5.1). The spread of girls’ height is larger than the spread of boys’ height. I came to this conclusion because, although the IQR of the boys’ height (0.15 metres) is larger than the IQR of the girls’ height (0.135 metres), the range of the girls’ height (0.6 metres) is larger than the range of the boys’ height (0.5 metres), and the standard deviation of the girls’ height (0.15563847) is larger than the standard deviation of the boys’ height (0.147722195).
The average boys’ weight is higher than the average girls’ weight, because of statistics such as the mean, which is 53.67 for the boys and 52.83 for the girls, and the mode, which is 60 for the boys and just 47 and 50 for the girls.
Finally, the correlation is best in younger people, gets worse, and then improves again towards the end of puberty. The mean deviation of the data points from the line of best fit is 5.57 in year 7, 9.75 in year 9, and 6.5 in year 11.
Evaluation
I think my project went well, and I have fulfilled my aim, which was to prove my four hypotheses correct or incorrect. But although my sample was large enough to come to a reasonable conclusion as to whether my hypotheses were correct or incorrect, for my results to have been more foolproof I think I should have had a larger sample, perhaps 100 pupils altogether, with 50 of each sex. For example, I only had 8 pupils in year 11, whilst having 14 pupils in each of years 7 and 8. Eight is quite a low number of data points to plot a graph with, so to make my results more conclusive, I need more pupils. Because I need a stratified sample to keep my pupil sample fair, the only way I can get more pupils in the higher years is to increase my total sample. So this was one of my limitations: I did not have a big enough sample to provide foolproof results.
I don’t think there was any bias in my results, because my sample was chosen at random. I think my plan was very effective, because I went through it in order, and completed all of the graphs and calculations that I needed. To further my work, I could investigate other things which might affect height and weight, such as whether the pupil was left handed or right handed, and their IQ.
One of my other limitations was that I calculated the standard deviation of the height data using grouped data, which meant my answer was only an estimate. So to improve my project, I could have done the standard deviation calculations with the data ungrouped, and my results would then be more accurate.
©Mohamed Hassan
20-01-07