To draw out the histograms of weights, I must again calculate the frequency density.
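As a minimal sketch of this step, frequency density is the frequency of a class divided by its class width. The class boundaries and frequencies below are hypothetical values chosen only for illustration; they are not the figures from my sample.

```python
# Frequency density = frequency / class width.
# The classes and frequencies are hypothetical, for illustration only.
classes = [
    (30, 40, 5),   # (lower bound in kg, upper bound in kg, frequency)
    (40, 45, 8),
    (45, 50, 9),
    (50, 60, 6),
    (60, 80, 2),
]


def frequency_density(lower, upper, frequency):
    """Return the height of the histogram bar for one class."""
    return frequency / (upper - lower)


densities = [frequency_density(lo, hi, f) for lo, hi, f in classes]

for (lo, hi, f), d in zip(classes, densities):
    print(f"{lo}-{hi} kg: frequency {f}, frequency density {d:.2f}")
```

Because the bar height is the frequency density, the area of each bar represents the frequency, which keeps classes of different widths comparable.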
Now I am able to draw the histograms of girls’ and boys’ weights.
The histograms show that in general the boys weighed more than the girls. Both show a small dispersion of results from the mean.
In order to make a further comparison of girls’ and boys’ weights I will draw frequency polygons from the histograms.
The frequency polygons show that more girls than boys have a weight below 45kg, and more boys than girls have a weight above 45kg.
To continue, I will sort the data into stem and leaf diagrams as it is grouped, and calculate the averages so I can make further comparisons.
Key: 2/9 = 29
The three averages were all higher for boys than for girls, although the data for boys was less widely spread out with a range of 36kg compared to 39kg for the girls. The evidence from the sample would suggest that 14/30 or 47% of boys and 13/30 or 43% of girls have a weight between 40 and 50kg.
These conclusions for both height and weight have been taken using a sample of only 30 boys and 30 girls. To confirm that these results are accurate and true of the entire population, I would need to either enlarge the sample size or repeat the whole procedure using a different sample.
Following this line of enquiry, I have made this hypothesis:
In general, the taller a person is, the more they will weigh.
In order to test this hypothesis, I need to take a new sample of 30 students of either gender.
These values will be plotted on a scatter diagram so that I can identify a correlation and find the relationship between height and weight.
The scatter diagram shows a moderate positive correlation between weight and height, suggesting that the taller a person is the heavier they are. The line of best fit suggests that a person who is 1.80m tall will weigh 74kg.
Earlier in the investigation I found evidence to suggest that weight, and perhaps height, are affected by gender. I shall now investigate how gender affects the correlation between weight and height. I predict that:
Correlation between height and weight will improve if the genders are considered in isolation.
I will use the random sample of 30 boys and 30 girls taken at the start of the investigation to test this hypothesis, and plot this on 3 different scatter diagrams, showing the genders individually and the sample as a whole.
The evidence in the scatter diagrams supports my hypothesis that correlation between height and weight is stronger if boys and girls are studied individually.
The lines of best fit on the diagrams show that a boy who was 1.80m tall would weigh 70kg, whereas a girl of the same height would weigh 73kg.
The equations of the lines of best fit would enable me to calculate predictions for height or weight.
Finding the equations of the lines requires calculating the gradient of the line, and the point at which it crosses the y-axis.
Boys only: y = (0.1/10)x + 0.9, which simplifies to y = 0.01x + 0.9
Girls only: y = (0.5/7)x – 2.1, which simplifies to y = (5/70)x – 2.1
Mixed Population: y = (0.15/17)x + 1.2, which simplifies to y = (15/1700)x + 1.2
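The gradients above can be checked with a short sketch that divides the rise by the run as exact fractions. I am assuming here that y is height in metres and x is weight in kilograms; the dictionary and function names are my own.

```python
from fractions import Fraction

# (rise in metres, run in kg, y-intercept in metres), as in the working above.
# Assumption: y is height in metres and x is weight in kilograms.
lines = {
    "Boys only": (Fraction("0.1"), 10, Fraction("0.9")),
    "Girls only": (Fraction("0.5"), 7, Fraction("-2.1")),
    "Mixed population": (Fraction("0.15"), 17, Fraction("1.2")),
}


def gradient(rise, run):
    """Gradient of a straight line: vertical change over horizontal change."""
    return rise / run


gradients = {name: gradient(rise, run) for name, (rise, run, _) in lines.items()}

for name, m in gradients.items():
    intercept = lines[name][2]
    print(f"{name}: gradient = {m}, y-intercept = {float(intercept)}")
```

Using exact fractions avoids rounding the gradient before it is used in a prediction.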
Using the equation, I can predict that a girl 1.50m tall would weigh 50kg.
y = (5/70)x – 2.1
x = 70(y + 2.1)/5
x = 70(1.50 + 2.1)/5
x = 50.4, which is approximately 50kg
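The rearrangement above can be sketched as a pair of small functions, one for each direction of prediction. I am assuming that y is height in metres and x is weight in kilograms; the function names are my own.

```python
# Girls' line of best fit: y = (5/70) x - 2.1
# Assumption: y is height in metres and x is weight in kilograms.
GRADIENT = 5 / 70
INTERCEPT = -2.1


def predict_height(weight_kg):
    """Height in metres predicted from weight, using y = mx + c."""
    return GRADIENT * weight_kg + INTERCEPT


def predict_weight(height_m):
    """Weight in kg predicted from height, using x = (y - c) / m."""
    return (height_m - INTERCEPT) / GRADIENT


weight = predict_weight(1.50)
print(f"Predicted weight for a girl 1.50 m tall: {weight:.1f} kg")
```

Substituting the predicted weight back into `predict_height` should return 1.50, which is a quick check that the rearrangement was done correctly.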
The line of best fit is an estimation of the relationship between height and weight, using only the sample of data.
There are anomalous values, for example the girl who is 1.90m tall and weighs 40kg, who does not follow the relationship.
Cumulative frequency is very useful when comparing sets of continuous data. I will use it in cumulative frequency curves to show data trends.
The following tables show the cumulative frequency for height and weight for boys, girls and the mixed population.
The curves will be drawn on the same axes to make comparing them easier.
The curves have enabled me to read off easily and accurately the median, upper and lower quartiles and the interquartile range. These are shown for both height and weight in the following tables.
For height, the data for both boys and girls is very similar. They are both equally spread, discounting outliers in the lower and upper quartiles of values, and the median values are identical. This suggests that gender has little effect on height. However, there must be slight differences between the genders, as when the mixed population is considered the median is slightly raised, even though the interquartile range is smaller.
In terms of weight, all the values were lower for girls than for boys, suggesting that girls generally weigh less and have a tighter distribution than boys. For example, the median weight for girls is 45kg, 4kg less than the median weight for boys, and the interquartile range is 13kg compared to 14kg.
This was also demonstrated in the box and whisker diagrams drawn to present the above data.
The box plots show that the girls’ heights had both a higher maximum and a lower minimum than the boys’, but apart from that the diagrams are the same. This suggests that gender does not have an effect on trends in height. They also show that for weight, the lowest and highest values for boys (38kg and 74kg) were both higher than for the girls (29kg and 68kg). Also the interquartile range for girls was 1kg less than for boys, so the girls’ data is less widely spread.
The cumulative frequency curves also enable me to make predictions of percentages of students with heights or weights within a certain range. For example, I can estimate the number of boys in the school who will have a weight of between 50kg and 65kg. The curve shows that 16 boys had a weight of up to 50kg, and 26 had a weight of up to 65kg. So 26 – 16 = 10 boys had a weight of between 50 and 65kg. Using this information, I can estimate that 10/30 or 33% of boys in the school will be between 50 and 65kg in weight. In other words, if a boy was selected at random from the school, the probability that his weight would be between 50 and 65kg is 1/3.
The cumulative frequency graphs show the relationship between the data for the genders. The median weight for boys was 49kg. The curve shows that 19 girls had a weight of less than 49kg. So 11 girls have a weight greater than the median for boys. This shows that whilst in general boys are heavier than girls, there is evidence to suggest that 11/30 or 37% of girls have a weight greater than the median weight for boys.
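The two estimates above follow the same pattern: subtract cumulative frequency readings and divide by the sample size. A minimal sketch of that arithmetic, using the readings quoted above and function names of my own choosing, is:

```python
SAMPLE_SIZE = 30  # number of boys (or girls) in the sample


def proportion_between(cum_lower, cum_upper, n=SAMPLE_SIZE):
    """Proportion of the sample lying between two cumulative frequency readings."""
    return (cum_upper - cum_lower) / n


def proportion_above(cum_at_value, n=SAMPLE_SIZE):
    """Proportion of the sample lying above a cumulative frequency reading."""
    return (n - cum_at_value) / n


# Boys: 16 weigh up to 50 kg and 26 weigh up to 65 kg.
boys_50_to_65 = proportion_between(16, 26)

# Girls: 19 weigh less than 49 kg, the median weight for boys.
girls_above_49 = proportion_above(19)

print(f"Boys between 50 and 65 kg: {boys_50_to_65:.0%}")
print(f"Girls above 49 kg: {girls_above_49:.0%}")
```

These proportions are estimates for the whole school only to the extent that the sample of 30 is representative.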
Summary
During this investigation, I have stated and tested two hypotheses. I have found that:
- There is a positive correlation between height and weight – in general, the taller a person is the more they weigh.
- The points on the scatter diagram are less widely dispersed about the line of best fit for boys than they are for girls. This suggests that the correlation is stronger for boys, and that the boys’ heights and weights are more predictable.
- The points on the scatter diagrams for boys and girls are less dispersed than the points on the scatter diagram for the mixed population. This would suggest that the correlation between height and weight is stronger when the genders are considered individually.
- The scatter diagrams can be used to estimate height and weight, either by reading off the values from the graph or by using the equations of the lines of best fit.
- The cumulative frequency curves show that the girls’ and boys’ heights are very similar, but that boys are heavier than girls generally.
- The median weight for boys is higher than for girls.
- From the box plots, it can be seen that boys are heavier than girls in general, but not exclusively so. The cumulative frequency curves can be used to estimate that 37% of girls have a weight greater than 49kg, the median weight for boys.
- The results and conclusions would be more accurate and better supported if larger sample sizes had been used, or the ages of students had been taken into consideration.
- The relationships and predictions are based on general trends observed within the data sample. In both samples there were exceptional individuals whose measurements fell outside of these trends.
Based on these observations, I will extend my investigation to include the effect of age, alongside gender, on the relationship between height and weight. To do this, I will take a stratified sample of the population according to gender and age. By doing so I can be as sure as possible that my sample is representative of the whole school, with ages and genders in the correct ratios, so that the sample is unbiased. The sample size to be taken from each stratum is calculated below.
Year 7 Boys: 151/604 × 30 = 7.5 (8)
Year 7 Girls: 131/579 × 30 = 6.8 (7)
Year 8 Boys: 145/604 × 30 = 7.2 (7)
Year 8 Girls: 125/579 × 30 = 6.5 (7)
Year 9 Boys: 118/604 × 30 = 5.8 (6)
Year 9 Girls: 143/579 × 30 = 7.4 (7)
Year 10 Boys: 106/604 × 30 = 5.2 (5)
Year 10 Girls: 94/579 × 30 = 4.9 (5)
Year 11 Boys: 84/604 × 30 = 4.2 (4)
Year 11 Girls: 86/579 × 30 = 4.5 (4)
The numbers in brackets are the sample size to be taken. The answers have to be rounded, to get a sample of the correct size, and also because it is impossible to collect a sample of 7.5 boys.
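The working above can be sketched in a few lines: each stratum receives a share of the 30 places in proportion to its size. The year group counts are those used in the working; the function and variable names are my own.

```python
SAMPLE_SIZE = 30  # taken separately for boys and for girls

# Number of students in each year group, as used in the working above.
boys = {"Year 7": 151, "Year 8": 145, "Year 9": 118, "Year 10": 106, "Year 11": 84}
girls = {"Year 7": 131, "Year 8": 125, "Year 9": 143, "Year 10": 94, "Year 11": 86}


def stratum_sizes(counts, sample_size=SAMPLE_SIZE):
    """Unrounded sample size for each stratum: (stratum / total) * sample size."""
    total = sum(counts.values())
    return {year: n / total * sample_size for year, n in counts.items()}


boys_raw = stratum_sizes(boys)
girls_raw = stratum_sizes(girls)

for year in boys:
    print(f"{year}: boys {boys_raw[year]:.2f}, girls {girls_raw[year]:.2f}")
```

Rounding each unrounded value to the nearest whole number does not always give a total of exactly 30, so one stratum may have to be adjusted by hand to keep the sample at the intended size.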
Here is the data collected for the sample:
This is the summary of results for the stratified sample across the entire sample.
Although I know that this sample is an unbiased representation of the whole school, there isn’t enough data to make meaningful statements about individual year groups. This means that I will have to take a 10% sample of each year group and gender.
These are the sample sizes for 10% samples:
Due to time constraints, I have chosen to look at only Year 7 boys.
This is the summary of results for the stratified sample of Year 7 boys.
To compare the data, I will use standard deviation. This will allow me to calculate how strong the correlation is for the two different samples, and then prove or disprove my hypothesis.
Standard Deviation: Year 7 Boys
Σ|x – x̄| / n = (2 + 9 + 4 + 2 + 0 + 8 + 6 + 2 + 11 + 5 + 13 + 24 + 1 + 7 + 0) / 15
= 94 / 15
= 6.3 cm
Excluding exceptional value of 24cm:
= 70 / 14
= 5 cm
Standard Deviation: All Boys in Stratified Sample
Σ|x – x̄| / n = (7+3+18+17+5+14+25+9+13+9+27+9+5+1+7+6+6+6+19+9+9+11+19+4+19+19+3+21+4+20) / 30
= 344 / 30
= 11.5cm
Excluding exceptional values of 25 and 27cm:
= 292 / 28
= 10.4 cm
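The two calculations above can be checked with a short sketch. The lists are the differences from the mean height copied from the working. Strictly, dividing the sum of the differences by n gives the mean deviation rather than the standard deviation, so I have named the function accordingly; the names are my own.

```python
# Differences from the mean height (cm), copied from the working above.
year7_boys = [2, 9, 4, 2, 0, 8, 6, 2, 11, 5, 13, 24, 1, 7, 0]
all_boys = [7, 3, 18, 17, 5, 14, 25, 9, 13, 9, 27, 9, 5, 1, 7,
            6, 6, 6, 19, 9, 9, 11, 19, 4, 19, 19, 3, 21, 4, 20]


def mean_deviation(deviations, exclude=()):
    """Mean of the deviations, leaving out one copy of each value in `exclude`."""
    kept = list(deviations)
    for value in exclude:
        kept.remove(value)  # drop a single exceptional value
    return sum(kept) / len(kept)


year7_all = mean_deviation(year7_boys)
year7_trimmed = mean_deviation(year7_boys, exclude=[24])
school_all = mean_deviation(all_boys)
school_trimmed = mean_deviation(all_boys, exclude=[25, 27])

print(f"Year 7 boys: {year7_all:.1f} cm, excluding 24 cm: {year7_trimmed:.1f} cm")
print(f"All boys: {school_all:.1f} cm, excluding 25 and 27 cm: {school_trimmed:.1f} cm")
```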
The heights of the boys in the entire school have a much larger spread than the Year 7 boys, as the standard deviation of the stratified sample of the whole school was 10.4cm, more than twice the standard deviation of the Year 7 boys when considered in isolation. (Anomalous results were excluded when finding these values).
The best way to see the relationship between height and weight for Year 7 boys is to draw another scatter diagram and draw lines of best fit. As there are some untypical points, I have drawn three lines of best fit: one straight line, one straight line excluding the point (35, 130), and one curve. I will use the mean vertical dispersion of points from the lines of best fit to determine which is the most suitable for the data.
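As a minimal sketch of this comparison, the mean vertical dispersion is the average vertical distance between each point and the line. The points and the two candidate lines below are hypothetical values chosen only for illustration; they are not my sample data or my actual lines of best fit.

```python
# Hypothetical (weight in kg, height in cm) points, for illustration only.
points = [(38, 146), (42, 150), (45, 155), (50, 158), (55, 165)]


def mean_vertical_dispersion(points, predict):
    """Average vertical distance between each point and the fitted line."""
    return sum(abs(y - predict(x)) for x, y in points) / len(points)


# Two hypothetical candidate lines of the form height = m * weight + c.
def line_a(weight):
    return 1.0 * weight + 108


def line_b(weight):
    return 0.5 * weight + 130


dispersion_a = mean_vertical_dispersion(points, line_a)
dispersion_b = mean_vertical_dispersion(points, line_b)

print(f"Line A: {dispersion_a:.1f} cm, Line B: {dispersion_b:.1f} cm")
```

The line with the smaller mean vertical dispersion lies closer to the points on average, so it is the better approximation by this measure. The same function works for a curve, since `predict` can be any function of weight.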
Green Line of Best Fit
Red Line of Best Fit
Blue Line of Best Fit
Considering the means of vertical dispersions of points, the curve of best fit is the best approximation of the relationship between height and weight, as it has a mean of 7.1cm compared to 10.8cm. However, by excluding the point (35,130) and drawing a line of best fit, the correlation is still strong with a mean of 8cm. As a result, I will use the red line as the line of best fit, in order to compare this correlation to the correlation of boys from all years. I will draw another scatter diagram for all the boys in the stratified sample, and follow through the process for finding the mean of vertical dispersions for comparison.
Line of best fit for all boys in stratified sample
Comparing the mean of vertical dispersions of the boys in Year 7 with that of all the boys in the stratified sample, the evidence suggests that considering the year groups in isolation gives a stronger correlation between height and weight. The mean of vertical dispersions of Year 7 boys was 8cm. The mean of vertical dispersions for all of the boys was 12.6cm, more than 1.5 times that of the Year 7 boys.
Final Summary
These are the final conclusions I have made from this investigation after extending the line of enquiry and refining my hypotheses.
- A sample of 30 students stratified over age and gender shows that the mean height is 161 cm for both boys and girls. However, the range of heights was considerably greater for boys than for girls, which suggests that there would be many boys with a height smaller than the girls.
- A 10% sample of the boys in Year 7 suggests that this age and gender has a mean height of 154 cm, with a mean deviation about the mean of 5 cm, excluding exceptional values. Comparing this to the stratified sample of boys from the whole school, which has a mean height of 161cm and a mean deviation of 12.6 cm, the evidence suggests that both age and gender affect the strength of the correlation and therefore the accuracy of the approximation of the relationship between height and weight.
In taking a stratified sample, I eliminated the bias of age, where the proportion of boys to girls and the different ages was not reflected in the original sample. By keeping the sample within the ratio of numbers in each age and gender, I have reduced the possibility of one category being represented more than another and therefore affecting the results. The consequence has been a fairer representation of the school’s population, which theoretically will have contributed to the increased accuracy and reliability of the results and conclusions that I have drawn.