Mayfield High School
I am investigating the pupils of Mayfield High School. It is a fictitious school, although the data is based on that of a real school. The line of enquiry I have decided to follow is the relationship between height and weight of the pupils.
The following table shows the numbers of pupils in the school:
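| Year Group | Boys | Girls | Total |
| --- | --- | --- | --- |
| 7 | 151 | 131 | 282 |
| 8 | 145 | 125 | 270 |
| 9 | 118 | 143 | 261 |
| 10 | 106 | 94 | 200 |
| 11 | 84 | 86 | 170 |
| Total | 604 | 579 | 1183 |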
Using this information, I have chosen to use a sample size of 30, as it is a large enough number to get a fair representation of the population, and divides fully into 360 in the event that I would need to draw any pie charts.
To begin with this line of enquiry, I shall take a random sample of 30 boys and 30 girls from the whole school register, recording their heights and weights. In order to do this I will allocate each student a number, generate random numbers using my calculator, and take the data of the corresponding student.
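The sampling step described above can be sketched in Python. This is only an illustration: the numbering of the boys from 1 to 604, the function name `draw_sample` and the seed are assumptions, and `random.sample` draws without repeats, whereas a calculator may produce the same number twice (in which case the repeat would simply be ignored).

```python
import random


def draw_sample(register, sample_size, seed=None):
    """Pick sample_size distinct entries from the register at random."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(register, sample_size)


# Hypothetical register: each of the 604 boys is allocated a number from 1 to 604.
boys_register = list(range(1, 605))
sample = draw_sample(boys_register, 30, seed=1)
print(sorted(sample))
```

The same call with a register of 579 numbers would give the sample of girls.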
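| Boys: Height (cm) | Boys: Weight (kg) | Girls: Height (cm) | Girls: Weight (kg) |
| --- | --- | --- | --- |
| 162 | 48 | 132 | 35 |
| 141 | 45 | 130 | 36 |
| 153 | 40 | 173 | 51 |
| 146 | 53 | 150 | 40 |
| 147 | 47 | 159 | 38 |
| 147 | 45 | 142 | 29 |
| 158 | 48 | 152 | 33 |
| 165 | 50 | 159 | 52 |
| 154 | 40 | 166 | 50 |
| 173 | 59 | 149 | 47 |
| 164 | 42 | 157 | 45 |
| 160 | 41 | 171 | 40 |
| 155 | 68 | 163 | 47 |
| 154 | 48 | 155 | 66 |
| 132 | 48 | 160 | 60 |
| 152 | 38 | 165 | 45 |
| 155 | 74 | 161 | 38 |
| 172 | 42 | 169 | 48 |
| 170 | 50 | 162 | 54 |
| 170 | 57 | 151 | 39 |
| 157 | 64 | 154 | 68 |
| 168 | 64 | 157 | 40 |
| 152 | 45 | 153 | 65 |
| 162 | 52 | 190 | 40 |
| 169 | 65 | 174 | 47 |
| 180 | 68 | 179 | 45 |
| 168 | 58 | 163 | 48 |
| 152 | 38 | 133 | 55 |
| 152 | 45 | 178 | 55 |
| 170 | 72 | 159 | 48 |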
In doing this I have encountered a few extreme values in the data that I have had to discard because they are seemingly mistakes in filling in the forms or entering the data into the database. For example, a lower-school girl had a weight of 140kg, which in my opinion was not feasible, and so I discounted it from the sample and took another student's data instead.
Here are the frequency tables for the above data, separated by gender. As the data is continuous I have grouped it in a grouped frequency table.
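Boys

| Height, h (cm) | Tally | Frequency |
| --- | --- | --- |
| 130 ≤ h < 140 | ¦ | 1 |
| 140 ≤ h < 150 | ¦¦¦¦ | 4 |
| 150 ≤ h < 160 | ¦¦¦¦¦ ¦¦¦¦¦ ¦ | 11 |
| 160 ≤ h < 170 | ¦¦¦¦¦ ¦¦¦ | 8 |
| 170 ≤ h < 180 | ¦¦¦¦¦ | 5 |
| 180 ≤ h < 190 | ¦ | 1 |
| 190 ≤ h < 200 | | 0 |

| Weight, w (kg) | Tally | Frequency |
| --- | --- | --- |
| 20 ≤ w < 30 | | 0 |
| 30 ≤ w < 40 | ¦¦ | 2 |
| 40 ≤ w < 50 | ¦¦¦¦¦ ¦¦¦¦¦ ¦¦¦¦ | 14 |
| 50 ≤ w < 60 | ¦¦¦¦¦ ¦¦ | 7 |
| 60 ≤ w < 70 | ¦¦¦¦¦ | 5 |
| 70 ≤ w < 80 | ¦¦ | 2 |

Girls

| Height, h (cm) | Tally | Frequency |
| --- | --- | --- |
| 130 ≤ h < 140 | ¦¦¦ | 3 |
| 140 ≤ h < 150 | ¦¦ | 2 |
| 150 ≤ h < 160 | ¦¦¦¦¦ ¦¦¦¦¦ ¦ | 11 |
| 160 ≤ h < 170 | ¦¦¦¦¦ ¦¦¦ | 8 |
| 170 ≤ h < 180 | ¦¦¦¦¦ | 5 |
| 180 ≤ h < 190 | | 0 |
| 190 ≤ h < 200 | ¦ | 1 |

| Weight, w (kg) | Tally | Frequency |
| --- | --- | --- |
| 20 ≤ w < 30 | ¦ | 1 |
| 30 ≤ w < 40 | ¦¦¦¦¦ ¦ | 6 |
| 40 ≤ w < 50 | ¦¦¦¦¦ ¦¦¦¦¦ ¦¦¦ | 13 |
| 50 ≤ w < 60 | ¦¦¦¦¦ ¦ | 6 |
| 60 ≤ w < 70 | ¦¦¦¦ | 4 |
| 70 ≤ w < 80 | | 0 |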
Firstly, I shall consider the trends in height. To do this, I will record the data in a histogram because it is continuous.
In order to draw the histogram I must calculate the frequency density of the bars. This is done by: Frequency density = frequency ÷ class width
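Boys

| Height (cm) | Frequency | Frequency density |
| --- | --- | --- |
| 130 ≤ h < 140 | 1 | 0.1 |
| 140 ≤ h < 150 | 4 | 0.4 |
| 150 ≤ h < 160 | 11 | 1.1 |
| 160 ≤ h < 170 | 8 | 0.8 |
| 170 ≤ h < 180 | 5 | 0.5 |
| 180 ≤ h < 190 | 1 | 0.1 |
| 190 ≤ h < 200 | 0 | 0 |

Girls

| Height (cm) | Frequency | Frequency density |
| --- | --- | --- |
| 130 ≤ h < 140 | 3 | 0.3 |
| 140 ≤ h < 150 | 2 | 0.2 |
| 150 ≤ h < 160 | 11 | 1.1 |
| 160 ≤ h < 170 | 8 | 0.8 |
| 170 ≤ h < 180 | 5 | 0.5 |
| 180 ≤ h < 190 | 0 | 0 |
| 190 ≤ h < 200 | 1 | 0.1 |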
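As a check on the frequency densities, the calculation can be sketched in a few lines of Python. The function name `frequency_density` and the way the classes are stored are assumptions made for illustration; the figures are the boys' height frequencies from the table.

```python
def frequency_density(frequency, lower, upper):
    """Frequency density = frequency divided by class width."""
    return frequency / (upper - lower)


# (lower bound, upper bound, frequency) for the boys' heights in cm.
boys_height_classes = [
    (130, 140, 1),
    (140, 150, 4),
    (150, 160, 11),
    (160, 170, 8),
    (170, 180, 5),
    (180, 190, 1),
    (190, 200, 0),
]

densities = [frequency_density(f, lo, hi) for lo, hi, f in boys_height_classes]
print(densities)
```

Because every class here is 10cm wide, each density is simply the frequency divided by 10; the function would also cope with classes of unequal width.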
Now I am able to draw the histograms of girls' and boys' heights.
The histograms show that the heights of boys and girls are very similar. They show a small dispersion of results with little variation for the boys, although there are some outlying values for the girls (for example the girl who is over 190cm tall).
In order to make a further comparison between heights of boys and girls, I will use the histograms to draw frequency polygons.
The frequency polygons show that there are fewer boys with heights below 140cm and above 190cm than there are girls, but more who are between 140 and 150cm and 180 and 190cm.
To continue with the line of enquiry, I will sort the data into stem and leaf diagrams as it is grouped, and calculate the averages. This will enable me to compare the heights of the different genders further.
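| Boys: Frequency | Boys: Leaf | Stem | Girls: Leaf | Girls: Frequency |
| --- | --- | --- | --- | --- |
| 1 | 2 | 13 | 0,2,3 | 3 |
| 4 | 7,7,6,1 | 14 | 2,9 | 2 |
| 11 | 8,7,5,5,4,4,3,2,2,2,2 | 15 | 0,1,2,3,4,5,7,7,9,9,9 | 11 |
| 8 | 9,8,8,5,4,2,2,0 | 16 | 0,1,2,3,3,5,6,9 | 8 |
| 5 | 3,2,0,0,0 | 17 | 1,3,4,8,9 | 5 |
| 1 | 0 | 18 | | 0 |
| 0 | | 19 | 0 | 1 |

Key: 13 | 2 = 132 cm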
These are the average results for height:
| Heights (cm) | Mean | Modal Class Interval | Median | Range |
| --- | --- | --- | --- | --- |
| Boys | 159 | 150 ≤ h < 160 | 158 | 48 |
| Girls | 159 | 150 ≤ h < 160 | 159 | 60 |
Two of the three measures of average were the same for boys and girls, although the median height was slightly lower for boys (158 cm compared to 159cm). The data for boys showed tighter dispersion, with a spread less than that of the girls (the range for boys was 48cm compared to 60cm for the girls).
The evidence from the sample suggests that 11/30, or 37% of both boys and girls have a height of between 150 and 160cm.
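The averages for the boys' heights can be checked with a short Python sketch. The helper name `summarise` is an assumption made for illustration; the list holds the 30 boys' heights from the random sample, in cm.

```python
from statistics import mean, median

# Heights (cm) of the 30 boys in the first random sample.
boys_heights = [
    162, 141, 153, 146, 147, 147, 158, 165, 154, 173,
    164, 160, 155, 154, 132, 152, 155, 172, 170, 170,
    157, 168, 152, 162, 169, 180, 168, 152, 152, 170,
]


def summarise(values):
    """Return the mean, median and range of a list of measurements."""
    return {
        "mean": mean(values),
        "median": median(values),
        "range": max(values) - min(values),
    }


summary = summarise(boys_heights)
print(summary)
```

The mean comes out just under 159cm and the median at 157.5cm, which round to the 159cm and 158cm quoted for the boys; the range is 48cm.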
Now I shall investigate the weights of the sample, following the same process.
To draw out the histograms of weights, I must again calculate the frequency density.
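Boys

| Weight (kg) | Frequency | Frequency Density |
| --- | --- | --- |
| 20 ≤ w < 30 | 0 | 0 |
| 30 ≤ w < 40 | 2 | 0.2 |
| 40 ≤ w < 50 | 14 | 1.4 |
| 50 ≤ w < 60 | 7 | 0.7 |
| 60 ≤ w < 70 | 5 | 0.5 |
| 70 ≤ w < 80 | 2 | 0.2 |

Girls

| Weight (kg) | Frequency | Frequency Density |
| --- | --- | --- |
| 20 ≤ w < 30 | 1 | 0.1 |
| 30 ≤ w < 40 | 6 | 0.6 |
| 40 ≤ w < 50 | 13 | 1.3 |
| 50 ≤ w < 60 | 6 | 0.6 |
| 60 ≤ w < 70 | 4 | 0.4 |
| 70 ≤ w < 80 | 0 | 0 |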
Now I am able to draw the histograms of girls' and boys' weights.
The histograms show that in general the boys weighed more than the girls. Both show a small dispersion of results from the mean.
In order to make a further comparison of girls' and boys' weights I will draw frequency polygons from the histograms.
The frequency polygons show that more girls than boys have a weight below 45kg, and more boys than girls have a weight above 45kg.
To continue, I will sort the data into stem and leaf diagrams as it is grouped, and calculate the averages so I can make further comparisons.
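| Boys: Frequency | Boys: Leaf | Stem | Girls: Leaf | Girls: Frequency |
| --- | --- | --- | --- | --- |
| 0 | | 2 | 9 | 1 |
| 2 | 8,8 | 3 | 3,5,6,8,8,9 | 6 |
| 14 | 8,8,8,8,5,5,5,5,2,2,1,0,0 | 4 | 0,0,0,0,5,5,5,5,7,7,7,8,8 | 13 |
| 7 | 9,8,7,3,2,0,0 | 5 | 0,1,2,4,5,5 | 6 |
| 5 | 8,8,5,4,4 | 6 | 0,5,6,8 | 4 |
| 2 | 4,2 | 7 | | 0 |

Key: 2 | 9 = 29 kg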
| Weights (kg) | Mean | Modal Class Interval | Median | Range |
| --- | --- | --- | --- | --- |
| Boys | 48 | 40 ≤ w < 50 | 49 | 36 |
| Girls | 47 | 40 ≤ w < 50 | 46 | 39 |
The three averages were all higher for boys than for girls, although the data for boys was less widely spread out with a range of 36kg compared to 39kg for the girls. The evidence from the sample would suggest that 14/30 or 47% of boys and 13/30 or 43% of girls have a weight between 40 and 50kg.
These conclusions for both height and weight have been taken using a sample of only 30 boys and 30 girls. To confirm that these results are accurate and true of the entire population, I would need to either enlarge the sample size or repeat the whole procedure using a different sample.
Following this line of enquiry, I have made this hypothesis:
In general, the taller a person is, the more they will weigh.
In order to test this hypothesis, I need to take a new sample of 30 students of either gender.
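| Height (cm) | Weight (kg) |
| --- | --- |
| 145 | 52 |
| 154 | 40 |
| 163 | 60 |
| 160 | 50 |
| 160 | 46 |
| 160 | 51 |
| 159 | 52 |
| 156 | 74 |
| 154 | 52 |
| 165 | 56 |
| 165 | 59 |
| 164 | 42 |
| 172 | 46 |
| 165 | 44 |
| 149 | 37 |
| 165 | 72 |
| 170 | 52 |
| 106 | 74 |
| 157 | 36 |
| 180 | 42 |
| 175 | 57 |
| 179 | 45 |
| 162 | 72 |
| 163 | 45 |
| 172 | 51 |
| 160 | 55 |
| 165 | 48 |
| 167 | 66 |
| 177 | 57 |
| 162 | 56 |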
These values will be plotted on a scatter diagram so that I can identify a correlation and find the relationship between height and weight.
The scatter diagram shows a moderate positive correlation between weight and height, suggesting that the taller a person is the heavier they are. The line of best fit suggests that a person who is 1.80m tall will weigh 74kg.
Earlier in the investigation I found evidence to suggest that weight, and perhaps height, are affected by gender. I shall now investigate how gender affects the correlation between weight and height. I predict that:
Correlation between height and weight will improve if the genders are considered in isolation.
I will use the random sample of 30 boys and 30 girls taken at the start of the investigation to test this hypothesis, and plot this on 3 different scatter diagrams, showing the genders individually and the sample as a whole.
The evidence in the scatter diagrams supports my hypothesis that correlation between height and weight is stronger if boys and girls are studied individually.
The lines of best fit on the diagrams show that a boy who was 1.80m tall would weigh 70kg, whereas a girl of the same height would weigh 73kg.
The equations of the lines of best fit would enable me to calculate predictions for height or weight.
Finding the equations of the lines requires calculating the gradient of the line, and the point at which it crosses the y-axis.
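In these equations, x is the weight in kg and y is the height in m.
Boys only: y = (0.1/10)x + 0.9, which is y = 0.01x + 0.9
Girls only: y = (0.5/7)x - 2.1, which is y = (5/70)x - 2.1
Mixed Population: y = (0.15/17)x + 1.2, which is y = (15/1700)x + 1.2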
Using the equation, I can predict that a girl 1.50m tall would weigh about 50kg.
y = (5/70)x - 2.1
x = 70(y + 2.1) / 5
x = 70(1.50 + 2.1) / 5
x = 50.4, which is about 50kg
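The rearrangement used for this prediction can be sketched in Python. The function names and constants are assumptions made for illustration; the gradient and intercept are the ones quoted for the girls' line, with x as weight in kg and y as height in m.

```python
def height_from_weight(weight_kg, gradient, intercept):
    """Evaluate the line of best fit y = gradient * x + intercept."""
    return gradient * weight_kg + intercept


def weight_from_height(height_m, gradient, intercept):
    """Rearrange y = gradient * x + intercept to give x = (y - intercept) / gradient."""
    return (height_m - intercept) / gradient


# Girls only: y = (5/70)x - 2.1, as quoted above.
GIRLS_GRADIENT = 5 / 70
GIRLS_INTERCEPT = -2.1

predicted_weight = weight_from_height(1.50, GIRLS_GRADIENT, GIRLS_INTERCEPT)
print(round(predicted_weight, 1))
```

Substituting the predicted weight back into `height_from_weight` returns 1.50m, which is a quick way to confirm that the rearrangement was done correctly.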
The line of best fit is an estimation of the relationship between height and weight, using only the sample of data.
There are anomalous values, for example the girl who is 1.90m tall and weighs 40kg, which does not follow the relationship.
Cumulative frequency is very useful when comparing sets of continuous data. I will use it in cumulative frequency curves to show data trends.
The following tables show the cumulative frequency for height and weight for boys, girls and the mixed population.
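| Heights (cm) | Cumulative Frequency: Boys | Cumulative Frequency: Girls | Cumulative Frequency: Mixed Population |
| --- | --- | --- | --- |
| < 140 | 1 | 3 | 4 |
| < 150 | 5 | 5 | 10 |
| < 160 | 16 | 16 | 32 |
| < 170 | 24 | 24 | 48 |
| < 180 | 29 | 29 | 58 |
| < 190 | 30 | 29 | 59 |
| < 200 | 30 | 30 | 60 |

| Weights (kg) | Cumulative Frequency: Boys | Cumulative Frequency: Girls | Cumulative Frequency: Mixed Population |
| --- | --- | --- | --- |
| < 30 | 0 | 1 | 1 |
| < 40 | 2 | 7 | 9 |
| < 50 | 16 | 20 | 36 |
| < 60 | 23 | 26 | 49 |
| < 70 | 28 | 30 | 58 |
| < 80 | 30 | 30 | 60 |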
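The cumulative frequencies are running totals of the grouped frequencies. A minimal Python sketch, assuming the boys' weight frequencies from the grouped frequency table (the variable names are illustrative only):

```python
from itertools import accumulate

# Upper class boundaries (kg) and frequencies for the boys' weights.
upper_bounds = [30, 40, 50, 60, 70, 80]
boys_weight_frequencies = [0, 2, 14, 7, 5, 2]

# Each cumulative frequency is the total of all frequencies up to that boundary.
boys_cumulative = list(accumulate(boys_weight_frequencies))

for bound, total in zip(upper_bounds, boys_cumulative):
    print(f"< {bound} kg: {total}")
```

The final running total should always equal the sample size (30 here), which is a useful check on the table.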
The curves will be drawn on the same axes to make comparing them easier.
The curves have enabled me to read off easily and accurately the median, upper and lower quartiles and the interquartile range. These are shown for both height and weight in the following tables.
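| Heights (cm) | Median | Lower Quartile | Upper Quartile | Interquartile Range |
| --- | --- | --- | --- | --- |
| Mixed | 160 | 154 | 167 | 13 |
| Boys | 159 | 153 | 168 | 15 |
| Girls | 159 | 153 | 168 | 15 |

| Weights (kg) | Median | Lower Quartile | Upper Quartile | Interquartile Range |
| --- | --- | --- | --- | --- |
| Mixed | 47 | 42 | 57 | 15 |
| Boys | 49 | 45 | 59 | 14 |
| Girls | 45 | 41 | 54 | 13 |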
For height, the data for both boys and girls is very similar. They are both equally spread, discounting outliers in the lower and upper quartiles of values, and the median values are identical. This suggests that gender has little effect on height. However, there must be slight differences between the genders, as when the mixed population is considered the median is slightly raised, even though the interquartile range is smaller.
In terms of weight, all the values were lower for girls than for boys, suggesting that girls generally weigh less, and have a tighter distribution than boys. For example the median weight for girls is 45kg, 4kg less than the median weight for boys, and the interquartile range is 13kg compared to 14kg.
This was also demonstrated in the box and whisker diagrams drawn to present the above data.
The box plots show that the girls had both the highest and the lowest heights, but apart from that the diagrams are the same. This suggests that gender does not have an effect on trends in height. They also show that for weight, the lowest and highest values for boys (38kg and 74kg) were both higher than for the girls (29kg and 68kg). Also the interquartile range for girls was 1kg less than for boys, so the girls' data is less widely spread.
The cumulative frequency curves also enable me to make predictions of percentages of students with heights or weights within a certain range. For example, I can estimate the number of boys in the school who will have a weight of between 50kg and 65kg. The curve shows that 16 boys had a weight of up to 50kg, and 26 had a weight of up to 65kg. So 26 - 16 = 10 boys had a weight of between 50 and 65kg. Using this information, I can estimate that 10/30 or 33% of boys in the school will be between 50 and 65kg in weight. In other words, if a boy was selected at random from the school, the probability that his weight would be between 50 and 65kg is 1/3.
The cumulative frequency graphs show the relationship between the data for the genders. The median weight for boys was 49kg. The curve shows that 19 girls had a weight of less than 49kg. So 11 girls have a weight greater than the median for boys. This shows that whilst in general boys are heavier than girls, there is evidence to suggest that 11/30 or 37% of girls have a weight greater than the median weight for boys.
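The two estimates read from the curves can be reproduced with exact fractions. This is a sketch only: the function name `proportion_between` is an assumption, and the readings (16, 26 and 19) are the values quoted in the text.

```python
from fractions import Fraction

SAMPLE_SIZE = 30


def proportion_between(cumulative_lower, cumulative_upper, sample_size):
    """Proportion of the sample lying between two readings from a cumulative frequency curve."""
    return Fraction(cumulative_upper - cumulative_lower, sample_size)


# Boys weighing between 50kg and 65kg: readings of 16 and 26 from the curve.
boys_50_to_65 = proportion_between(16, 26, SAMPLE_SIZE)

# Girls heavier than the boys' median of 49kg: 19 girls weigh less than 49kg.
girls_above_boys_median = proportion_between(19, SAMPLE_SIZE, SAMPLE_SIZE)

print(boys_50_to_65, girls_above_boys_median)
```

Using `Fraction` keeps the answers as 1/3 and 11/30 rather than rounded decimals; converting 11/30 to a percentage gives the 37% quoted.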
Summary
During this investigation, I have stated and tested two hypotheses. I have found that:
* There is a positive correlation between height and weight - in general, the taller a person is the more they weigh.
* The points on the scatter diagram are less widely dispersed about the line of best fit for boys than they are for girls. This suggests that the correlation is stronger for boys, and that the boys' heights and weights are more predictable.
* The points on the scatter diagrams for boys and girls are less dispersed than the points on the scatter diagram for the mixed population. This would suggest that the correlation between height and weight is stronger when the genders are considered individually.
* The scatter diagrams can be used to estimate height and weight, either by reading off the values from the graph or by using the equations of the lines of best fit.
* The cumulative frequency curves show that the girls' and boys' heights are very similar, but that boys are heavier than girls generally.
* The median weight for boys is higher than for girls.
* From the box plots, it can be seen that boys are heavier than girls in general, but not exclusively so. The cumulative frequency curves can be used to estimate that 37% of girls have a weight greater than 49kg, the median weight for boys.
* The results and conclusions would be more accurate and better supported if larger sample sizes had been used, or the ages of students had been taken into consideration.
* The relationships and predictions are based on general trends observed within the data sample. In both samples there were exceptional individuals whose measurements fell outside of these trends.
Based on these observations, I will extend my investigation to include the effect of age, alongside gender, on the relationship between height and weight. To do this, I will take a stratified sample of the population according to gender and age. By doing so I can be as sure as possible that my sample is representative of the whole school, with ages and genders in the correct ratios so that the sample is unbiased. The sample size to be taken from each stratum is calculated below.
Year 7 Boys: 151/604 x 30 = 7.5 (8)
Year 7 Girls: 131/579 x 30 = 6.8 (7)
Year 8 Boys: 145/604 x 30 = 7.2 (7)
Year 8 Girls: 125/579 x 30 = 6.5 (7)
Year 9 Boys: 118/604 x 30 = 5.9 (6)
Year 9 Girls: 143/579 x 30 = 7.4 (7)
Year 10 Boys: 106/604 x 30 = 5.3 (5)
Year 10 Girls: 94/579 x 30 = 4.9 (5)
Year 11 Boys: 84/604 x 30 = 4.2 (4)
Year 11 Girls: 86/579 x 30 = 4.5 (4)
The numbers in brackets are the sample size to be taken. The answers have to be rounded, to get a sample of the correct size, and also because it is impossible to collect a sample of 7.5 boys.
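The stratum sizes can be sketched in Python. The rounding rule used here (round everything down, then give the spare places to the strata with the largest remainders) is an assumption chosen because it always totals exactly 30; it is not necessarily the rule used for the hand calculation, although it gives the same bracketed values. The function name `stratified_sizes` is illustrative.

```python
import math


def stratified_sizes(strata, sample_size):
    """Share sample_size between strata in proportion to their sizes.

    Exact shares are rounded down, then the remaining places go to the
    strata with the largest fractional parts (largest remainder rounding).
    """
    total = sum(strata.values())
    exact = {name: count * sample_size / total for name, count in strata.items()}
    sizes = {name: math.floor(share) for name, share in exact.items()}
    spare = sample_size - sum(sizes.values())
    by_remainder = sorted(exact, key=lambda name: exact[name] - sizes[name], reverse=True)
    for name in by_remainder[:spare]:
        sizes[name] += 1
    return sizes


# Number of boys and girls in each year group, from the school table.
boys = {7: 151, 8: 145, 9: 118, 10: 106, 11: 84}
girls = {7: 131, 8: 125, 9: 143, 10: 94, 11: 86}

boys_sample = stratified_sizes(boys, 30)
girls_sample = stratified_sizes(girls, 30)
print(boys_sample, girls_sample)
```

This explains why Year 8 girls (6.5) are rounded up while Year 11 girls (4.5) are rounded down: only three spare places are available once every share has been rounded down.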
Here is the data collected for the sample:
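| Year Group | Boys: Height (cm) | Boys: Weight (kg) | Girls: Height (cm) | Girls: Weight (kg) |
| --- | --- | --- | --- | --- |
| 7 | 154 | 43 | 150 | 47 |
| 7 | 164 | 40 | 148 | 39 |
| 7 | 143 | 41 | 149 | 40 |
| 7 | 144 | 42 | 162 | 54 |
| 7 | 156 | 35 | 150 | 44 |
| 7 | 147 | 47 | 161 | 48 |
| 7 | 136 | 38 | 176 | 57 |
| 7 | 170 | 57 | - | - |
| 8 | 148 | 40 | 152 | 58 |
| 8 | 170 | 49 | 159 | 64 |
| 8 | 134 | 42 | 157 | 51 |
| 8 | 152 | 37 | 155 | 42 |
| 8 | 156 | 59 | 149 | 47 |
| 8 | 162 | 44 | 159 | 65 |
| 8 | 154 | 48 | 162 | 54 |
| 9 | 155 | 67 | 155 | 65 |
| 9 | 167 | 48 | 164 | 55 |
| 9 | 167 | 52 | 159 | 42 |
| 9 | 180 | 48 | 153 | 48 |
| 9 | 170 | 55 | 153 | 65 |
| 9 | 153 | 46 | 166 | 45 |
| 9 | - | - | 158 | 55 |
| 10 | 150 | 65 | 179 | 45 |
| 10 | 180 | 72 | 170 | 50 |
| 10 | 165 | 50 | 173 | 51 |
| 10 | 180 | 40 | 175 | 50 |
| 10 | 180 | 68 | 172 | 50 |
| 11 | 158 | 54 | 169 | 51 |
| 11 | 182 | 52 | 178 | 55 |
| 11 | 165 | 45 | 162 | 48 |
| 11 | 181 | 72 | 162 | 48 |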
This is the summary of results for the whole stratified sample.

| Measure | Boys | Girls |
| --- | --- | --- |
| Median height (cm) | 160 | 160 |
| Mean height (cm) | 161 | 161 |
| Range of heights (cm) | 48 | 31 |
| Median weight (kg) | 48 | 50 |
| Mean weight (kg) | 50 | 51 |
| Range of weights (kg) | 37 | 26 |
Although I know that this sample is an unbiased representation of the whole school, there isn't enough data to make meaningful statements about individual year groups. This means that I will have to take a 10% sample of each year group and gender.
These are the sample sizes for 10% samples:
| Year | Boys | Girls | Total |
| --- | --- | --- | --- |
| 7 | 15 | 13 | 28 |
| 8 | 15 | 12 | 27 |
| 9 | 12 | 14 | 26 |
| 10 | 11 | 9 | 20 |
| 11 | 8 | 9 | 17 |
Due to time constraints, I have chosen to look at only Year 7 boys.
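Year 7 Boys

| Height (cm) | Weight (kg) |
| --- | --- |
| 152 | 37 |
| 165 | 46 |
| 150 | 59 |
| 152 | 25 |
| 154 | 43 |
| 162 | 47 |
| 160 | 38 |
| 152 | 54 |
| 165 | 50 |
| 159 | 47 |
| 141 | 31 |
| 130 | 35 |
| 155 | 32 |
| 161 | 56 |
| 154 | 48 |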
This is the summary of results for the 10% sample of Year 7 boys.

| Measure | Boys |
| --- | --- |
| Median height (cm) | 160 |
| Mean height (cm) | 154 |
| Range of heights (cm) | 35 |
| Median weight (kg) | 45 |
| Mean weight (kg) | 43 |
| Range of weights (kg) | 34 |
To compare the spread of the two samples, I will use the mean deviation from the mean, which is the average distance of each height from the mean height. This will show how closely the heights in each sample cluster together, and help me to test my hypothesis.
Mean Deviation: Year 7 Boys
Σ|x - x̄| / n = (2 + 9 + 4 + 2 + 0 + 8 + 6 + 2 + 11 + 5 + 13 + 24 + 1 + 7 + 0) / 15
= 94 / 15
= 6.3 cm
Excluding the exceptional value of 24cm:
= 70 / 14
= 5 cm
Mean Deviation: All Boys in Stratified Sample
Σ|x - x̄| / n = (7+3+18+17+5+14+25+9+13+9+27+9+5+1+7+6+6+6+19+9+9+11+19+4+19+19+3+21+4+20) / 30
= 344 / 30
= 11.5 cm
Excluding the exceptional values of 25cm and 27cm:
= 292 / 28
= 10.4 cm
The heights of the boys in the entire school have a much larger spread than those of the Year 7 boys: the mean deviation of the stratified sample of the whole school was 10.4cm, more than twice that of the Year 7 boys when considered in isolation (5cm). (Anomalous results were excluded when finding these values.)
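The averaging step in these calculations can be sketched in Python. The function name `mean_of_deviations` is an assumption; the list holds the deviations for the Year 7 boys exactly as they are listed in the working above, so the sketch checks the arithmetic rather than recalculating the deviations from the raw heights.

```python
def mean_of_deviations(deviations, exclude=()):
    """Average a list of absolute deviations, optionally leaving out exceptional values."""
    kept = list(deviations)
    for value in exclude:
        kept.remove(value)  # drops one occurrence of each excluded value
    return sum(kept) / len(kept)


# Deviations (cm) of the Year 7 boys' heights from their mean, as listed above.
year7_deviations = [2, 9, 4, 2, 0, 8, 6, 2, 11, 5, 13, 24, 1, 7, 0]

with_all_values = mean_of_deviations(year7_deviations)
without_exceptional = mean_of_deviations(year7_deviations, exclude=[24])
print(round(with_all_values, 1), without_exceptional)
```

Leaving out the single value of 24cm reduces the result from about 6.3cm to 5cm, which shows how strongly one exceptional height can affect a small sample.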
The best way to see the relationship between height and weight for Year 7 boys is to draw another scatter diagram with lines of best fit. As there are some untypical points, I have drawn 3 lines of best fit: one using every point (green), one excluding the point (35, 130) (red), and one that is a curve (blue). I will use the mean vertical dispersion of points from the lines of best fit to determine which is the most suitable for the data.
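Green Line of Best Fit

| Weight (kg) | Height on line of best fit (cm) | Dispersions from the line (cm) |
| --- | --- | --- |
| 25 | 127 | 25 |
| 31 | 136 | 5 |
| 32 | 138 | 17 |
| 35 | 142 | 12 |
| 37 | 146 | 6 |
| 38 | 147 | 13 |
| 43 | 155 | 1 |
| 46 | 160 | 5 |
| 47 | 161 | 1, 2 |
| 48 | 163 | 9 |
| 50 | 166 | 1 |
| 54 | 172 | 20 |
| 56 | 176 | 15 |
| 59 | 180 | 30 |

Mean of vertical dispersions: 10.8 cm

Red Line of Best Fit

| Weight (kg) | Height on line of best fit (cm) | Dispersions from the line (cm) |
| --- | --- | --- |
| 25 | 140 | 12 |
| 31 | 146 | 5 |
| 32 | 147 | 8 |
| 35 | 149 | 19 |
| 37 | 151 | 1 |
| 38 | 152 | 8 |
| 43 | 157 | 3 |
| 46 | 159 | 6 |
| 47 | 161 | 2, 1 |
| 48 | 162 | 8 |
| 50 | 164 | 1 |
| 54 | 168 | 16 |
| 56 | 169 | 8 |
| 59 | 172 | 22 |

Mean of vertical dispersions: 8 cm

Blue Line of Best Fit

| Weight (kg) | Height on line of best fit (cm) | Dispersions from the line (cm) |
| --- | --- | --- |
| 25 | 127 | 25 |
| 31 | 141 | 0 |
| 32 | 143 | 12 |
| 35 | 149 | 19 |
| 37 | 152 | 0 |
| 38 | 153 | 7 |
| 43 | 157 | 3 |
| 46 | 159 | 6 |
| 47 | 159 | 3, 0 |
| 48 | 160 | 6 |
| 50 | 160 | 5 |
| 54 | 161 | 9 |
| 56 | 161 | 0 |
| 59 | 161 | 11 |

Mean of vertical dispersions: 7.1 cm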
Considering the means of vertical dispersions of points, the curve of best fit is the best approximation of the relationship between height and weight as it has a mean of 7.1 compared to 10.8cm. However, by excluding the point (35,130) and drawing a line of best fit, the correlation is still strong with a mean of 8cm. As a result, I will use the red line as the line of best fit, in order to compare this correlation to the correlation of boys from all years. I will draw another scatter diagram for all the boys in the stratified sample, and follow through the process for finding the mean of vertical dispersions for comparison.
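The mean of vertical dispersions can be sketched in Python for the red line. The function name and data layout are assumptions made for illustration; the points are the Year 7 boys' (weight, height) pairs, and the heights on the line are the values read from the red line for each weight.

```python
def mean_vertical_dispersion(points, line_heights):
    """Average vertical distance between each point and the line of best fit.

    points       -- list of (weight_kg, height_cm) pairs
    line_heights -- height (cm) read from the line at each weight (kg)
    """
    distances = [abs(height - line_heights[weight]) for weight, height in points]
    return sum(distances) / len(distances)


# Year 7 boys: (weight in kg, height in cm).
year7_boys = [
    (37, 152), (46, 165), (59, 150), (25, 152), (43, 154),
    (47, 162), (38, 160), (54, 152), (50, 165), (47, 159),
    (31, 141), (35, 130), (32, 155), (56, 161), (48, 154),
]

# Heights on the red line of best fit, read off at each weight.
red_line = {
    25: 140, 31: 146, 32: 147, 35: 149, 37: 151, 38: 152, 43: 157,
    46: 159, 47: 161, 48: 162, 50: 164, 54: 168, 56: 169, 59: 172,
}

red_mean = mean_vertical_dispersion(year7_boys, red_line)
print(red_mean)
```

The same function can be reused for the green line, the blue curve or the line for all boys by swapping in the heights read from that line.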
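Line of best fit for all boys in stratified sample

| Weight (kg) | Height on line of best fit (cm) | Dispersions from the line (cm) |
| --- | --- | --- |
| 35 | 135 | 21 |
| 37 | 138 | 14 |
| 38 | 140 | 4 |
| 40 | 143 | 21, 5, 37 |
| 41 | 145 | 2 |
| 42 | 147 | 3, 13 |
| 43 | 148 | 6 |
| 44 | 150 | 12 |
| 45 | 152 | 13 |
| 46 | 154 | 2 |
| 47 | 155 | 8 |
| 48 | 157 | 3, 10, 23 |
| 49 | 159 | 11 |
| 50 | 160 | 5 |
| 52 | 164 | 3, 18 |
| 54 | 167 | 9 |
| 55 | 169 | 1 |
| 57 | 172 | 2 |
| 59 | 175 | 19 |
| 65 | 186 | 36 |
| 67 | 189 | 34 |
| 68 | 191 | 11 |
| 72 | 197 | 17, 16 |

Mean of vertical dispersions: 12.6 cm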
Comparing the mean of vertical dispersions of the boys in Year 7 with that of all the boys in the stratified sample, the evidence suggests that considering the year groups in isolation gives a stronger correlation between height and weight. The mean of vertical dispersions for the Year 7 boys was 8cm. The mean of vertical dispersions for all of the boys was 12.6cm, more than 1.5 times that of the Year 7 boys.
Final Summary
These are the final conclusions I have made from this investigation after extending the line of enquiry and refining my hypotheses.
* A sample of 30 students stratified over age and gender shows that the mean height is 161 cm for both boys and girls. However, the range of heights was considerably greater for boys than for girls, which suggests that there would be many boys with a height smaller than the girls.
* A 10% sample of the boys in Year 7 suggests that this age and gender has a mean height of 154 cm, with a mean deviation about the mean of 5 cm, excluding exceptional values. Comparing this to the stratified sample for the whole male sample, which has a mean height of 161cm and a mean deviation of 12.6 cm, the evidence suggests that both age and gender affect the strength of the correlation and therefore the accuracy of the approximation of the relationship between height and weight.
In taking a stratified sample, I eliminated the bias of age, where the proportion of boys to girls and the different ages was not reflected in the original sample. Keeping the sample within the ratio of numbers in each age and gender, I have reduced the possibility of one category being represented more than another and therefore affecting the results. The consequences have been a more fair representation of the school's population, which theoretically will have contributed to the increased accuracy and reliability of the results and conclusions that I have drawn.
GCSE Maths Statistics Coursework
Hannah Napier