Introduction:
What is my investigation about?
My second coursework for my Mathematics GCSE course is based on statistics. I have been provided with data for Mayfield High School. Mayfield is a fictitious High School but the data is based on a real school.
Mayfield has 1183 students in Years 7 to 11. The data provided on each student includes: name, age, year group, IQ, weight, height, hair colour, eye colour, distance from home to school, usual method of travel to school, number of brothers and sisters, and Key Stage 2 results in English, Mathematics and Science.
Year Group | Number of Boys | Number of Girls | Total
7 | 151 | 131 | 282
8 | 145 | 125 | 270
9 | 118 | 143 | 261
10 | 106 | 94 | 200
11 | 84 | 86 | 170
Total | 604 | 579 | 1183
This is the data given to us on the main question sheet. We had to retrieve the data for each pupil from the school's network in order to carry out the investigation.
There are a number of possible lines of enquiry that we could carry out. For example:
1. the variations in hair colour
2. the variations in eye colour
3. the relationship between the hair colour and the eye colour
4. the distances travelled to school
5. the relationship between the height and the weight
6. the relationship between two sets of Key Stage 2 results
7. the relationship between IQ and Key Stage 2 results
8. the height to weight ratio in terms of body mass index
9. the relationship between the number of hours of TV watched and the IQ
10. the relationship between the gender and the IQ
After discussion with my teacher I have decided to investigate the relationship between the height and the weight of a sample of students. I considered comparing the average number of hours of TV watched per week with the IQ of each student. However, I decided against this because the results would also depend on the types of programmes watched. Someone who watches a lot of television, but mostly educational programmes, may have a higher IQ than someone who watches less television but mostly cartoons and other programmes. Therefore I chose height vs. weight.
I will look to establish a relationship between the height of the students and the weight of a randomly selected group of students. I will use various methods to present and analyse my results. I believe that the two sets of data will be directly proportional, the taller you are the more you weigh.
Height vs. Weight
Sample:
I have decided to take a sample of 30 students from the school. I chose 30 because it divides exactly into 360, the number of degrees in a circle, which will make it easier to draw a pie chart from the data I collect.
I will first use the 30 students to compare the boys against the girls. Later in my investigation I will take each year in turn. I will have to make sure that I minimise the possibility of biased results and that every student has an equal chance of being selected.
Year Group | Number of Boys | Number of Girls | Total
7 | 151 | 131 | 282
8 | 145 | 125 | 270
9 | 118 | 143 | 261
10 | 106 | 94 | 200
11 | 84 | 86 | 170
Total | 604 | 579 | 1183
The total in the school is 1183, made up of 604 boys and 579 girls. To find a fair number of boys and girls to sample, I need to find the ratio of the number of girls to the total number of students and multiply it by 30 (the sample size). I will do the same for the boys. This will give me a fair number of boys and girls to select, taking into account that the proportion of boys is slightly higher than that of girls.
I will also need to consider the number of boys and girls to pick from each year. This is how I worked it out:
Year 7:
To determine how many boys and how many girls I will need to pick from year 7 I need to first find out how many boys and girls I need from year 7. To do this I will divide the total in year 7 by the total number of students.
==> 282/1183 = 0.24
==> 0.24 x 30 = 7.2 --> 7 (nearest whole number)
From the total of 30 students 7 will come from year 7. The next step is to find out how many boys and how many girls I will pick from year 7. I divided 7 by the total number of boys and girls then I multiplied that figure by the number of boys then by the number of girls.
Number of boys:
==> (7/282) x 151 = 3.748 --> 4 (nearest whole number)
This means that I will pick 4 boys from year 7. Again a slight degree of bias is introduced when I rounded up.
Number of girls:
==> (7/282) x 131 = 3.251 --> 3 (nearest whole number)
This means that I will pick 3 girls from year 7. Rounding the number has introduced bias again.
Year 8:
I will use the same method that I used for year 7 to generate the numbers of boys and girls from year 8.
==> 270/1183 = 0.23
==> 0.23 x 30 = 6.9 --> 7 (nearest whole number)
From year 8 I will also select 7 students.
Number of Boys:
==> (7/270) x 145 = 3.759 --> 4 (nearest whole number)
Number of girls:
==> (7/270) x 125 = 3.241 --> 3 (nearest whole number)
Year 9:
Total number of boys and girls from year 9:
==> 261/1183 = 0.22
==> 0.22 x 30 = 6.6 --> 7 (nearest whole number)
Number of Boys:
==> (7/261) x 118 = 3.165 --> 3 (nearest whole number)
Number of Girls:
==> (7/261) x 143 = 3.835 --> 4 (nearest whole number)
Year 10:
Total number of boys and girls from year 10:
==> 200/1183 = 0.17
==> 0.17 x 30 = 5.1 --> 5 (nearest whole number)
Number of Boys:
==> (5/200) x 106 = 2.65 --> 3 (nearest whole number)
Number of Girls:
==> (5/200) x 94 = 2.35 --> 2 (nearest whole number)
Year 11:
Total number of boys and girls from year 11:
==> 170/1183 = 0.14
==> 0.14 x 30 = 4.2 --> 4 (nearest whole number)
Number of Boys:
==> (4/170) x 84 = 1.976 --> 2 (nearest whole number)
Number of Girls:
==> (4/170) x 86 = 2.024 --> 2 (nearest whole number)
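The year-by-year working above can be sketched in a few lines of Python. This is only an illustration: the function name `stratified_allocation` is made up for this sketch, and it rounds only the final answers rather than rounding the proportion to two decimal places first, which happens to give the same whole numbers here.

```python
# Pupils per year group as (boys, girls), taken from the Mayfield table.
counts = {7: (151, 131), 8: (145, 125), 9: (118, 143), 10: (106, 94), 11: (84, 86)}
SAMPLE_SIZE = 30


def stratified_allocation(counts, sample_size):
    """Return {year: (boys, girls)} to sample, rounding at each stage."""
    school_total = sum(boys + girls for boys, girls in counts.values())
    allocation = {}
    for year, (boys, girls) in counts.items():
        year_total = boys + girls
        # Share of the whole sample that belongs to this year group.
        year_sample = round(year_total / school_total * sample_size)
        # Split that share between boys and girls in proportion.
        boys_sample = round(year_sample / year_total * boys)
        girls_sample = round(year_sample / year_total * girls)
        allocation[year] = (boys_sample, girls_sample)
    return allocation


allocation = stratified_allocation(counts, SAMPLE_SIZE)
for year, (boys, girls) in allocation.items():
    print(f"Year {year}: {boys} boys, {girls} girls")
```

Adding up the printed numbers is a quick check that the rounded figures still total 30 students.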
Now that I know how many boys and how many girls I need from each year, I need to sort the data so that I can easily pick students from it. I first sorted the data into year groups and then by gender. I put each set of data, e.g. boys in year 7, into a separate sheet and assigned each student a new number. I then needed to generate random numbers to pick the specific students I was going to use in my sample. I did this for each year group.
There are two ways of selecting random students according to the number assigned to them. I could put the names of all the boys (604) into a hat, pick out 15 at random and record the names. However, this is not appropriate here, as there are far too many people and the hat would have to be very big. Instead I will use the random number button on my calculator.
How to use the random number button:
Press [shift] [ran#] on the calculator and it will display a random number between 0 and 1. You then multiply this number by the number of possibilities. In my investigation, for example, the random number generated would be multiplied by the number of boys (604), and the result is the number of the boy to be picked. In almost all cases the number will have to be rounded, which will introduce some bias into this investigation. I will do this fifteen times, recording the number generated each time. If the calculator produces the same number twice, I will ignore it and generate another one.
E.g. In a lottery draw, the total number is 49.
Generate a number between 0 and 1 --> 0.496
Multiply the number by 49 --> 24.304
Gives 24.304 round the number --> 24
24 is the number generated
These four simple steps are a much better and much more efficient way rather than choosing numbers / people from a hat. That is why I will use the random button on my calculator for this investigation quite often.
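The calculator method can be imitated in Python as a rough sketch. Here `random.Random().random()` stands in for the Ran# button, the helper name `pick_ids` and the seed value are made up for this illustration, and repeats are thrown away and redrawn as described above.

```python
import random


def pick_ids(population_size, how_many, rng):
    """Pick `how_many` different IDs from 1..population_size, calculator style."""
    chosen = []
    while len(chosen) < how_many:
        r = rng.random()                        # plays the role of Ran#: 0 <= r < 1
        candidate = round(r * population_size)  # multiply, then round to nearest
        # Skip 0 (not a valid ID) and any repeat, then try again.
        if candidate == 0 or candidate in chosen:
            continue
        chosen.append(candidate)
    return chosen


rng = random.Random(2024)  # fixed seed (an arbitrary choice) so a run can be repeated
year7_girls = pick_ids(131, 3, rng)
print(year7_girls)

# The lottery example from the text: 0.496 x 49 = 24.304, which rounds to 24.
lottery_pick = round(0.496 * 49)
print(lottery_pick)
```

Rounding to the nearest whole number gives the lowest and highest IDs only about half the chance of the others. Using `int(r * population_size) + 1` instead would give every ID from 1 to the population size an equal chance, which is one way to reduce the bias mentioned above.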
Picking the sample:
To pick the random students that I will use in my investigation I have decided to use the random number button on my calculator. First I will pick the required number of students from year 7.
Year 7:
From year 7 I will need to pick four boys and three girls. I started with the girls, whom I arranged in a separate Excel spreadsheet.
Girls:
Each girl from year 7 was assigned a special ID number, in this case ranging from 1 to 131. I used the random button to generate three numbers.
==> I used the following formula to generate the numbers: (Ran#) x 131
Number of tries | Random Number | Number (nearest whole number)
1 | 54.234 | 54
2 | 69.037 | 69
3 | 7.816 | 8
There was some rounding involved, which would have introduced some bias into my results. Some numbers were also repeated, which meant that I had to ignore them and generate new ones.
ID | Year Group | Surname | Forename 1 | Forename 2 | Gender | Height (m) | Weight (kg)
8 | 7 | Carney | Esther | - | Female | 1.50 | 44
54 | 7 | Higgins | Joanne | Alicia | Female | 1.50 | 45
69 | 7 | Kelly | Jenifer | Fay | Female | 1.30 | 45
I took those three numbers I generated and I looked through my year 7 girls' sheet and I located the three and copied only the necessary information.
Boys:
I did the same for the boys as I had done for the girls. I first sorted the boys in year 7 into a separate spreadsheet and assigned each boy a special ID ranging from 1 to 151. I then used the random number button to generate four IDs, and those are the boys I will select for my sample.
==> This is the formula I used to generate the numbers below (Ran#) x 151
Number of Tries | Random Number | Number (nearest whole number)
1 | 18.988 | 19
2 | 20.385 | 20
3 | 7.214 | 7
4 | 97.093 | 97
There was some rounding involved, which would have introduced some bias into my results. Some numbers were also repeated, which meant that I had to ignore them and generate new ones.
I then took these numbers and looked up the students in my data.
ID | Year Group | Surname | Forename 1 | Forename 2 | Gender | Height (m) | Weight (kg)
7 | 7 | Bingh | Daniel | - | Male | 1.56 | 35
20 | 7 | Bond | James | Sean | Male | 1.47 | 50
97 | 7 | McKracken | Phil | Peter | Male | 1.58 | 48
19 | 7 | Sharpe | Billy | Richard | Male | 1.43 | 41
I used the same method for both the girls and the boys in each of the remaining years, from Year 8 to Year 11.
Year 8:
From year 8 I need four boys and three girls.
Girls:
==> (Ran#) x 125
Number of Tries | Random Number | Number (nearest whole number)
1 | 100.625 | 101
2 | 66.875 | 67
3 | 39.75 | 40
There was some rounding involved, which would have introduced some bias into my results. Some numbers were also repeated, which meant that I had to ignore them and generate new ones.
ID | Year Group | Surname | Forename 1 | Forename 2 | Gender | Height (m) | Weight (kg)
40 | 8 | Dom | Kate | - | Female | 1.59 | 50
67 | 8 | Indera | Emily | Sophia | Female | 1.52 | 45
101 | 8 | Neelam | Kate | - | Female | 1.45 | 81
Boys:
==> (Ran#) x 145
Number of Tries | Random Number | Number (nearest whole number)
1 | 46.4 | 47
2 | 53.215 | 53
3 | 66.12 | 66
4 | 72.5 | 73
There was some rounding involved, which would have introduced some bias into my results. Some numbers were also repeated, which meant that I had to ignore them and generate new ones.
ID | Year Group | Surname | Forename 1 | Forename 2 | Gender | Height (m) | Weight (kg)
47 | 8 | Fahmed | Ali | - | Male | 1.61 | 48
53 | 8 | Gore | Mike | John | Male | 1.63 | 56
66 | 8 | Jarvel | Kenneth | - | Male | 1.66 | 46
73 | 8 | Kevill | Dean | Michael | Male | 1.52 | 43
Year 9:
From year 9 I need four girls and three boys.
Girls:
==> (Ran#) x 143
Number of Tries | Random Number | Number (nearest whole number)
1 | 73.931 | 74
2 | 27.556 | 28
3 | 75.075 | 75
4 | 1.44 | 1
There was some rounding involved, which would have introduced some bias into my results. Some numbers were also repeated, which meant that I had to ignore them and generate new ones.
ID | Year Group | Surname | Forename 1 | Forename 2 | Gender | Height (m) | Weight (kg)
1 | 9 | Bellfield | Janet | - | Female | 1.58 | 40
74 | 9 | Jones | Sarah | Ann | Female | 1.53 | 40
75 | 9 | Jones | Samantha | Louise | Female | 1.62 | 45
28 | 9 | Smith | Anjelina | Louise | Female | 1.50 | 45
Boys:
==> (Ran#) x 118
Number of Tries | Random Number | Number (nearest whole number)
1 | 100.654 | 101
2 | 74.222 | 74
3 | 55.106 | 55
There was some rounding involved, which would have introduced some bias into my results. Some numbers were also repeated, which meant that I had to ignore them and generate new ones.
ID | Year Group | Surname | Forename 1 | Forename 2 | Gender | Height (m) | Weight (kg)
101 | 9 | Simons | Jack | - | Male | 1.64 | 59
74 | 9 | Laters | Richard | Tang | Male | 1.69 | 65
55 | 9 | Huggard | Malcolm | - | Male | 1.52 | 52
Year 10:
From year 10 I will need two girls and three boys.
Girls:
==> (Ran#) x 94
Number of Tries | Random Number | Number (nearest whole number)
1 | 39.95 | 40
2 | 74.448 | 74
There was some rounding involved, which would have introduced some bias into my results. Some numbers were also repeated, which meant that I had to ignore them and generate new ones.
ID | Year Group | Surname | Forename 1 | Forename 2 | Gender | Height (m) | Weight (kg)
40 | 10 | Hall | Jane | Samantha | Female | 1.51 | 36
74 | 10 | Scampion | Stephanie | - | Female | 1.55 | 60
Boys:
==> (Ran#) x 106
Number of Tries | Random Number | Number (nearest whole number)
1 | 25.122 | 25
2 | 48.442 | 48
3 | 55.014 | 55
There was some rounding involved, which would have introduced some bias into my results. Some numbers were also repeated, which meant that I had to ignore them and generate new ones.
ID | Year Group | Surname | Forename 1 | Forename 2 | Gender | Height (m) | Weight (kg)
25 | 10 | Chung | Jason | - | Male | 1.71 | 56
48 | 10 | Hunt | Gareth | Barry | Male | 1.72 | 62
55 | 10 | Kaura | Karan | Kaz | Male | 1.66 | 63
Year 11:
From year 11 I will pick two girls and two boys.
Girls:
==> (Ran#) x 86
Number of Tries | Random Number | Number (nearest whole number)
1 | 56.932 | 57
2 | 20.124 | 20
There was some rounding involved, which would have introduced some bias into my results. Some numbers were also repeated, which meant that I had to ignore them and generate new ones.
ID | Year Group | Surname | Forename 1 | Forename 2 | Gender | Height (m) | Weight (kg)
57 | 11 | McCreadie | Billie | Crystal | Female | 1.63 | 38
20 | 11 | Buyram | Dawn | Elizabeth | Female | 1.65 | 42
Boys:
==> (Ran#) x 84
Number of Tries | Random Number | Number (nearest whole number)
1 | 33.516 | 34
2 | 6.044 | 6
There was some rounding involved, which would have introduced some bias into my results. Some numbers were also repeated, which meant that I had to ignore them and generate new ones.
ID | Year Group | Surname | Forename 1 | Forename 2 | Gender | Height (m) | Weight (kg)
34 | 11 | Hawkins | Tim | - | Male | 1.62 | 63
6 | 11 | Cripp | Justin | Carl | Male | 1.67 | 50
Now that I have picked out the 30 students who I will investigate, I will group them together and I will present them in a data capture sheet.
Skewness:
Skewness is a measure of the asymmetry of the data around the sample mean. If skewness is negative, the data are spread out more to the left of the mean than to the right. If skewness is positive, the data are spread out more to the right. The skewness of the normal distribution (or any perfectly symmetric distribution) is zero. In my investigation I will use the skewness to judge how close my data are to a normal distribution.
Data I will use:
I have now gathered all my results into one table, which will make the data easier to group. I assigned a new ID to each individual so that I can refer to a student without writing out his or her full name each time.
New ID | ID | Surname | Forename | Forename 2 | Sex | Height (m) | Weight (kg)
1 | 8 | Carney | Esther | - | Female | 1.50 | 44
2 | 54 | Higgins | Joanne | Alicia | Female | 1.50 | 45
3 | 69 | Kelly | Jenifer | Fay | Female | 1.30 | 45
4 | 7 | Bingh | Daniel | - | Male | 1.56 | 35
5 | 20 | Bond | James | Sean | Male | 1.47 | 50
6 | 97 | McKracken | Phil | Peter | Male | 1.58 | 48
7 | 19 | Sharpe | Billy | Richard | Male | 1.43 | 41
8 | 40 | Dom | Kate | - | Female | 1.59 | 50
9 | 67 | Indera | Emily | Sophia | Female | 1.52 | 45
10 | 101 | Neelam | Kate | - | Female | 1.45 | 81
11 | 47 | Fahmed | Ali | - | Male | 1.61 | 48
12 | 53 | Gore | Mike | John | Male | 1.63 | 56
13 | 66 | Jarvel | Kenneth | - | Male | 1.66 | 46
14 | 73 | Kevill | Dean | Michael | Male | 1.52 | 43
15 | 1 | Bellfield | Janet | - | Female | 1.58 | 40
16 | 74 | Jones | Sarah | Ann | Female | 1.53 | 40
17 | 75 | Jones | Samantha | Louise | Female | 1.62 | 45
18 | 28 | Smith | Anjelina | Louise | Female | 1.50 | 45
19 | 101 | Simons | Jack | - | Male | 1.64 | 59
20 | 74 | Laters | Richard | Tang | Male | 1.69 | 65
21 | 55 | Huggard | Malcolm | - | Male | 1.52 | 52
22 | 40 | Hall | Jane | Samantha | Female | 1.51 | 36
23 | 74 | Scampion | Stephanie | - | Female | 1.55 | 60
24 | 25 | Chung | Jason | - | Male | 1.71 | 56
25 | 48 | Hunt | Gareth | Barry | Male | 1.72 | 62
26 | 55 | Kaura | Karan | Kaz | Male | 1.66 | 63
27 | 57 | McCreadie | Billie | Crystal | Female | 1.63 | 38
28 | 20 | Buyram | Dawn | Elizabeth | Female | 1.65 | 42
29 | 34 | Hawkins | Tim | - | Male | 1.62 | 63
30 | 6 | Cripp | Justin | Carl | Male | 1.67 | 50
Height:
To analyse the height I will need to draw a 'data capture sheet' which will help me to group my data into intervals to draw a bar chart. I will then refine the bar chart into a histogram.
I decided to make the groups 10cm wide as this gives a small and simple scale.
Boys:
I will look at the boys and girls separately.
Height, h (cm) | Frequency
130 ≤ h < 140 | 0
140 ≤ h < 150 | 2
150 ≤ h < 160 | 4
160 ≤ h < 170 | 8
170 ≤ h < 180 | 2
Histogram:
Class Interval (CI) | Frequency (f) | Frequency Density (f / CI) | Frequency Density x 10
130 ≤ h < 150 | 2 | 0.1 | 1
150 ≤ h < 155 | 2 | 0.4 | 4
155 ≤ h < 165 | 6 | 0.6 | 6
165 ≤ h < 180 | 6 | 0.4 | 4
I will draw a histogram as my data is continuous. The histogram will show me the spread of my data and will give me an idea as to what kind of data I am using.
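As a check on the frequency densities, the calculation (frequency divided by class width) can be sketched in Python. The function name `frequency_densities` is made up for this illustration; the heights are the sixteen boys' heights from my sample, written in centimetres.

```python
def frequency_densities(values, edges):
    """For each class lo <= v < hi, return (lo, hi, frequency, frequency / width)."""
    rows = []
    for lo, hi in zip(edges, edges[1:]):
        frequency = sum(1 for v in values if lo <= v < hi)
        rows.append((lo, hi, frequency, frequency / (hi - lo)))
    return rows


# Boys' heights in cm, from the sample table.
boys_heights = [156, 147, 158, 143, 161, 163, 166, 152,
                164, 169, 152, 171, 172, 166, 162, 167]
# Class boundaries used for the boys' histogram (unequal widths).
edges = [130, 150, 155, 165, 180]

rows = frequency_densities(boys_heights, edges)
for lo, hi, f, density in rows:
    print(f"{lo} <= h < {hi}: f = {f}, frequency density = {density:.2f}")
```

Because the classes have different widths, it is the frequency density, not the frequency, that gives the height of each bar.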
Girls:
Height, h (cm) | Frequency
130 ≤ h < 140 | 1
140 ≤ h < 150 | 1
150 ≤ h < 160 | 9
160 ≤ h < 170 | 3
170 ≤ h < 180 | 0
Histogram:
Class Interval (CI) | Frequency (f) | Frequency Density (f / CI) | Frequency Density x 10
130 ≤ h < 150 | 2 | 0.1 | 1
150 ≤ h < 155 | 6 | 1.2 | 12
155 ≤ h < 165 | 5 | 0.5 | 5
165 ≤ h < 180 | 1 | 0.067 | 0.67
Similarly I will draw a histogram for the girls to see what kind of spread there is and compare it to the boys.
Pie Chart for the Heights:
The pie charts show that the boys tend to be between 160 and 170 cm tall and the girls between 150 and 160 cm tall. I displayed the results in pie charts because the data was continuous, so I couldn't use a comparative bar chart.
The pie charts show that the modal class for the boys is 160 ≤ h < 170 and the modal class for the girls is 150 ≤ h < 160.
To work out the mean height of the boys or of the girls you simply add up the heights and divide the total by the number of students in that sample.
Boys:
New ID | Height (m)
4 | 1.56
5 | 1.47
6 | 1.58
7 | 1.43
11 | 1.61
12 | 1.63
13 | 1.66
14 | 1.52
19 | 1.64
20 | 1.69
21 | 1.52
24 | 1.71
25 | 1.72
26 | 1.66
29 | 1.62
30 | 1.67
Total | 25.69
Frequency Polygon for Heights:
I drew a frequency polygon to compare the boys against the girls. I also added the mixed group to see how the boys and girls affect the mixed population.
Height, h (cm) | Girls | Boys | Mixed
130 ≤ h < 140 | 1 | 0 | 1
140 ≤ h < 150 | 1 | 2 | 3
150 ≤ h < 160 | 9 | 4 | 13
160 ≤ h < 170 | 3 | 8 | 11
170 ≤ h < 180 | 0 | 2 | 2
The frequency polygon shows that girls are shorter than boys. Girls tend to be about 150cm and boys about 166cm. However, there is a higher proportion of girls in the 150cm range than there is of boys in the 166cm range. This suggests that the girls' heights are a little more consistent than the boys'.
Stem and Leaf Diagram for Heights:
Boys:
A stem and leaf diagram is useful for working out the median of the data. It arranges the data in order and it also resembles a bar chart if you turn it on its side.
Stem | Leaf | Frequency
130 | | 0
140 | 3, 7 | 2
150 | 2, 2, 6, 8 | 4
160 | 1, 2, 3, 4, 6, 6, 7, 9 | 8
170 | 1, 2 | 2
There is a total of 16 boys, therefore the median is between the 8th and 9th numbers. The 8th number is 162 and the 9th is 163, so we add them and divide by 2. We get 162.5cm, which is the median height for the boys. The stem and leaf diagram shows that there is a negative skew, as the bulk of the data is towards the upper end.
Girls:
Stem | Leaf | Frequency
130 | 0 | 1
140 | 5 | 1
150 | 0, 0, 0, 1, 2, 3, 5, 8, 9 | 9
160 | 2, 3, 5 | 3
170 | | 0
In the sample there are 14 girls. This means the median lies in between the 7th and 8th reading. The 7th number is 152 and the 8th number is 153. If we find the average of the two we get 152.5 cm which is the median. The diagram shows that the skewness is almost zero as the bulk of the data is in the middle.
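The means and medians can be checked with Python's standard `statistics` module. This is just a sketch using the heights from my sample table, converted to centimetres; with an even number of values, `median` averages the two middle values, which is the same method as above.

```python
from statistics import mean, median

# Heights in cm, taken from the sample table.
boys_heights = [156, 147, 158, 143, 161, 163, 166, 152,
                164, 169, 152, 171, 172, 166, 162, 167]
girls_heights = [150, 150, 130, 159, 152, 145, 158,
                 153, 162, 150, 151, 155, 163, 165]

for label, heights in (("Boys", boys_heights), ("Girls", girls_heights)):
    print(f"{label}: n = {len(heights)}, "
          f"mean = {mean(heights):.1f} cm, median = {median(heights)} cm")
```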
Range:
Boys:
The highest value for the boys was 172 cm and the lowest was 143 cm.
The range --> 172 - 143 = 29 cm
Girls:
The highest value for the girls was 165 cm and the lowest was 130 cm.
The range --> 165 - 130 = 35 cm
From the stem and leaf diagram I can work out the interquartile range.
Boys:
There are a total of 16 boys so the first quartile is:
* 16/4 = 4th number
* The fourth number is 152cm
The third quartile is:
* 16/4 = 4
* 4 x 3 = 12th number
* The 12th number is 166cm.
Therefore the interquartile range = Q3 - Q1 = 166 - 152
The interquartile range is 14cm.
Girls:
Similarly for the girls, the first quartile is between the third and fourth numbers, which is 150cm, and the third quartile is between the twelfth and thirteenth numbers, which is 162.5cm.
Q1 - 150cm
Q3 - 162.5cm
Interquartile range = 162.5 - 150 = 12.5cm
Averages:
Heights | Mean (m) | Modal Class Interval (cm) | Median (cm) | Range (cm)
Boys | 1.60 | 160 - 170 | 162.5 | 29
Girls | 1.53 | 150 - 160 | 152.5 | 35
The mean, the modal class and the median are all higher for the boys than for the girls. This suggests that the boys were taller overall. 50% of the boys are between 160 and 170 cm tall, and 64% of the girls are between 150 and 160 cm tall. The range is lower for the boys than for the girls, which suggests that the boys' heights were more consistent, while the girls' heights were spread over a bigger distance. However, a bigger proportion of the girls than of the boys were in their modal class interval. This may be due to the fact that there were more boys than girls.
My sample was a total of 30 students, of which 16 were boys and 14 were girls.
The median read from the cumulative frequency graph may not be the same, as the graph may not have been drawn accurately enough or the reading may not have been taken accurately.
I plotted box and whisker diagrams for the girls and the boys to compare the results. For the boys I can see that the majority of the data is towards the right hand side, making it a negative skew. The girls' data is also skewed but is much closer to a normal distribution, as it is more towards the centre of the scale.
Weight:
Now I will go on to analyse the weight. I will first group the weights together and then analyse them.
Boys:
Weight, w (kg) | Frequency
30 ≤ w < 40 | 1
40 ≤ w < 50 | 5
50 ≤ w < 60 | 6
60 ≤ w < 70 | 4
70 ≤ w < 80 | 0
80 ≤ w < 90 | 0
Histogram:
Class Interval (CI) | Frequency (f) | Frequency Density (f / CI) | Frequency Density x 10
30 ≤ w < 40 | 1 | 0.1 | 1
40 ≤ w < 60 | 11 | 0.55 | 5.5
60 ≤ w < 90 | 4 | 0.13 | 1.3
For the weight I will also draw a histogram as the weight is also continuous data.
Girls:
Weight, w (kg) | Frequency
30 ≤ w < 40 | 2
40 ≤ w < 50 | 9
50 ≤ w < 60 | 1
60 ≤ w < 70 | 1
70 ≤ w < 80 | 0
80 ≤ w < 90 | 1
Histogram:
Class Interval (CI) | Frequency (f) | Frequency Density (f / CI) | Frequency Density x 10
30 ≤ w < 45 | 5 | 0.33 | 3.33
45 ≤ w < 50 | 6 | 1.2 | 12
50 ≤ w < 90 | 3 | 0.075 | 0.75
Similarly for the girls I will draw a histogram to see how my data looks.
Pie Charts to show spread of Weights:
I will use pie charts to show the spread of the boys' and girls' weights over the respective groups. Once again I cannot use a comparative bar chart, as I have different numbers of boys and girls.
Boys: Girls:
From the pie charts I can conclude that a bigger proportion of the boys are between 50 and 60 kilograms. The girls have over 50% between 40 and 50 kilograms.
Therefore, the modal class interval for the boys is 50 ≤ w < 60 kilograms and the modal class interval for the girls is 40 ≤ w < 50 kilograms.
I then worked out the mean weight of the girls and the mean weight of the boys. I simply added up the weights for the girls and divided by the number of girls in the sample (14), and did the same for the boys but divided by 16, as there are more boys in the sample.
Boys:
New ID | Weight (kg)
4 | 35
5 | 50
6 | 48
7 | 41
11 | 48
12 | 56
13 | 46
14 | 43
19 | 59
20 | 65
21 | 52
24 | 56
25 | 62
26 | 63
29 | 63
30 | 50
Total | 837.00
Frequency Polygon for Weights:
Weight, w (kg) | Girls | Boys | Mixed
30 ≤ w < 40 | 2 | 1 | 3
40 ≤ w < 50 | 9 | 5 | 14
50 ≤ w < 60 | 1 | 6 | 7
60 ≤ w < 70 | 1 | 4 | 5
70 ≤ w < 80 | 0 | 0 | 0
80 ≤ w < 90 | 1 | 0 | 1
The frequency polygon shows that generally the boys weigh more than the girls. The boys' weights are more spread out than the girls', although one girl is probably an exception to this. The boys' weights are mainly around 55 kilograms whereas the girls' are mainly around the 45 kilogram mark. To me, the frequency polygon suggests that the boys' weights are more consistent than the girls' weights.
Stem and Leaf Diagrams for Weights:
Boys:
I will draw a stem and leaf diagram to make it easier to find the median of the data.
Stem | Leaf | Frequency
30 | 5 | 1
40 | 1, 3, 6, 8, 8 | 5
50 | 0, 0, 2, 6, 6, 9 | 6
60 | 2, 3, 3, 5 | 4
70 | | 0
80 | | 0
There is a total of 16 boys which means the median is between the 8th and 9th number. The 8th number is 50 and the 9th number is 52. If we add them and divide by two, the median is 51kg.
Girls:
Stem | Leaf | Frequency
30 | 6, 8 | 2
40 | 0, 0, 2, 4, 5, 5, 5, 5, 5 | 9
50 | 0 | 1
60 | 0 | 1
70 | | 0
80 | 1 | 1
In my sample there are 14 girls. Therefore the median lies in between the 7th and 8th number. In this case the 7th number is 45 and the 8th number is also 45. Therefore, if we add the two and divide by two, we get the median to be 45kg.
Range:
Boys:
The highest value for the boys was 65 kg and the lowest was 35 kg.
The range --> 65 - 35 = 30 kg
Girls:
The highest value for the girls was 81 kg and the lowest was 36 kg.
The range --> 81 - 36 = 45 kg
As with the height, I will use the stem and leaf diagram to find the interquartile range.
Boys: Interquartile range = 16kg
Girls: Interquartile range = 7.5kg
Averages:
Weights (kg) | Mean | Modal Class Interval | Median | Range
Boys | 52.3125 | 50 - 60 | 51 | 30
Girls | 46.86 | 40 - 50 | 45 | 45
The mean, the modal class and the median are again all higher for the boys than for the girls. This suggests that overall the boys weighed more than the girls, which agrees with the science that boys develop more muscle and tend to weigh more than girls. However, there is an exception: one girl weighs 81 kg. About 38% of the boys lie in their modal class interval, compared with about 64% of the girls. This suggests that the boys were more evenly spread across the classes than the girls. The girls' range is higher than the boys' by 15 kg, which shows that the boys' weights were more consistent than the girls'. My sample was a total of 30 students, of which 16 were boys and 14 were girls. From this I was expecting the boys to be higher than the girls in all measures. The cumulative frequency graph shows a different median, perhaps because the graph was not drawn accurately enough or the reading could not be taken accurately.
For the box and whisker diagram I will use the median from the stem and leaf diagram, as that seems to be more accurate. The box and whisker diagram showed a positive skew, as the data is concentrated more towards the left hand side.
Extending the Investigation:
I will now look at the height and the weight of the boys and girls together. I will then calculate a formula to link them.
Height:
Height, h (cm) | Frequency
130 ≤ h < 140 | 1
140 ≤ h < 150 | 3
150 ≤ h < 160 | 13
160 ≤ h < 170 | 11
170 ≤ h < 180 | 2
Histogram:
Class Interval (CI) | Frequency (f) | Frequency Density (f / CI) | Frequency Density x 10
130 ≤ h < 145 | 2 | 0.13 | 1.33
145 ≤ h < 155 | 10 | 1 | 10
155 ≤ h < 160 | 6 | 1.2 | 12
160 ≤ h < 165 | 6 | 1.2 | 12
165 ≤ h < 180 | 6 | 0.4 | 4
On the histogram I marked the mean and the median in different colours. I then drew the standard deviations to see how close the height is to a normal distribution.
One form of the mathematical model is an equation for the frequency distribution, in which the number of observations, or values, is assumed to be infinite:
\[ y = \frac{1}{\sqrt{2\pi}} \, e^{-x^2/2} \]
where e, which is the base of natural logarithms, is approximately 2.718, and y represents the frequency of the value x. For simplicity, values of x have been chosen such that the mean value of x equals 0, and the variance equals 1. In more general cases the equation is slightly more complicated. The graph of this equation is the bell-shaped curve called the normal, or Gaussian, probability curve.
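To get a feel for the bell shape, the curve can be evaluated at a few points. This sketch assumes the equation meant here is the standard normal curve, y = e^(-x²/2) / √(2π), which is the usual form when the mean is 0 and the variance is 1; the function name `standard_normal` is made up for the illustration.

```python
import math


def standard_normal(x):
    """Height of the standard normal curve (mean 0, variance 1) at x."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


for x in (-2, -1, 0, 1, 2):
    print(f"x = {x:+d}: y = {standard_normal(x):.4f}")
```

The values are highest at x = 0 and the same on either side of it, which is the symmetry that a skewness of zero describes.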
Mean:
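New ID | Height (m)
1 | 1.50
2 | 1.50
3 | 1.30
4 | 1.56
5 | 1.47
6 | 1.58
7 | 1.43
8 | 1.59
9 | 1.52
10 | 1.45
11 | 1.61
12 | 1.63
13 | 1.66
14 | 1.52
15 | 1.58
16 | 1.53
17 | 1.62
18 | 1.50
19 | 1.64
20 | 1.69
21 | 1.52
22 | 1.51
23 | 1.55
24 | 1.71
25 | 1.72
26 | 1.66
27 | 1.63
28 | 1.65
29 | 1.62
30 | 1.67
Total | 47.12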
I will now work out the standard deviation of my data. I am doing this to check how close the distribution is to a 'normal distribution'.
According to A-level books, for a normal distribution:
[mean ± 1 s.d.] => 68%
This means that the region from the mean minus one standard deviation to the mean plus one standard deviation should fill about 68% of the histogram if the data is normally distributed.
Also:
[mean ± 2 s.d.] => 95%
This means that the region from the mean minus two standard deviations to the mean plus two standard deviations should fill about 95% of the histogram.
Standard deviation of the height:
New ID | Height (m) | Height (m) ²
1 | 1.50 | 2.25
2 | 1.50 | 2.25
3 | 1.30 | 1.69
4 | 1.56 | 2.43
5 | 1.47 | 2.16
6 | 1.58 | 2.50
7 | 1.43 | 2.04
8 | 1.59 | 2.53
9 | 1.52 | 2.31
10 | 1.45 | 2.10
11 | 1.61 | 2.59
12 | 1.63 | 2.66
13 | 1.66 | 2.76
14 | 1.52 | 2.31
15 | 1.58 | 2.50
16 | 1.53 | 2.34
17 | 1.62 | 2.62
18 | 1.50 | 2.25
19 | 1.64 | 2.69
20 | 1.69 | 2.86
21 | 1.52 | 2.31
22 | 1.51 | 2.28
23 | 1.55 | 2.40
24 | 1.71 | 2.92
25 | 1.72 | 2.96
26 | 1.66 | 2.76
27 | 1.63 | 2.66
28 | 1.65 | 2.72
29 | 1.62 | 2.62
30 | 1.67 | 2.79
Total | 47.12 | 74.26
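The standard deviation can be worked out from these totals using the formula s.d. = √(Σh²/n − mean²). The sketch below does this in Python and then counts what share of the 30 students lies within one and two standard deviations of the mean. The helper names `std_dev` and `share_within` are made up for this illustration, and because it counts whole students rather than measuring areas on the histogram, its percentages need not match figures read from a drawn histogram.

```python
import math

# Heights in metres for New IDs 1 to 30, from the table above.
heights = [1.50, 1.50, 1.30, 1.56, 1.47, 1.58, 1.43, 1.59, 1.52, 1.45,
           1.61, 1.63, 1.66, 1.52, 1.58, 1.53, 1.62, 1.50, 1.64, 1.69,
           1.52, 1.51, 1.55, 1.71, 1.72, 1.66, 1.63, 1.65, 1.62, 1.67]


def std_dev(values):
    """Standard deviation as sqrt(mean of squares - square of mean)."""
    n = len(values)
    mean = sum(values) / n
    mean_of_squares = sum(v * v for v in values) / n
    return math.sqrt(mean_of_squares - mean ** 2)


def share_within(values, k):
    """Fraction of values lying within k standard deviations of the mean."""
    mean = sum(values) / len(values)
    sd = std_dev(values)
    inside = sum(1 for v in values if mean - k * sd <= v <= mean + k * sd)
    return inside / len(values)


print(f"mean = {sum(heights) / len(heights):.3f} m, s.d. = {std_dev(heights):.3f} m")
for k in (1, 2):
    print(f"within {k} s.d.: {share_within(heights, k):.0%}")
```

For a normal distribution these two shares would be close to 68% and 95%.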
One Standard Deviation:
Out of a total of 30 there were 23.6 people that filled the area. This is a total in percentage of 79%. This is higher than a normal distribution.
Two Standard Deviations:
Out of a total of 30 there were 29.74 people which filled the area. This was almost all of it and this expressed as a percentage is 99%. This is higher than a normal distribution.
Weight:
Weight, w (kg) | Frequency
30 ≤ w < 40 | 3
40 ≤ w < 50 | 14
50 ≤ w < 60 | 7
60 ≤ w < 70 | 5
70 ≤ w < 80 | 0
80 ≤ w < 90 | 1
Histogram:
Class Interval (CI) | Frequency (f) | Frequency Density (f / CI) | Frequency Density x 10
30 ≤ w < 45 | 9 | 0.6 | 6
45 ≤ w < 50 | 8 | 1.6 | 16
50 ≤ w < 60 | 7 | 0.7 | 7
60 ≤ w < 90 | 6 | 0.2 | 2
On the histogram I marked the mean and the median in different colours. I then drew the standard deviations to see how close the weight is to a normal distribution.
Mean:
New ID | Weight (kg)
1 | 44
2 | 45
3 | 45
4 | 35
5 | 50
6 | 48
7 | 41
8 | 50
9 | 45
10 | 81
11 | 48
12 | 56
13 | 46
14 | 43
15 | 40
16 | 40
17 | 45
18 | 45
19 | 59
20 | 65
21 | 52
22 | 36
23 | 60
24 | 56
25 | 62
26 | 63
27 | 38
28 | 42
29 | 63
30 | 50
Total | 1493.00
I will now work out the standard deviation of my data. I am doing this to check how close the distribution is to a 'normal distribution'.
According to A-level books, for a normal distribution:
[mean ± 1 s.d.] => 68%
This means that the region from the mean minus one standard deviation to the mean plus one standard deviation should fill about 68% of the histogram if the data is normally distributed.
Also:
[mean ± 2 s.d.] => 95%
This means that the region from the mean minus two standard deviations to the mean plus two standard deviations should fill about 95% of the histogram.
Standard deviation of the Weight:
New ID | Weight (kg) | Weight (kg) ²
1 | 44 | 1936.00
2 | 45 | 2025.00
3 | 45 | 2025.00
4 | 35 | 1225.00
5 | 50 | 2500.00
6 | 48 | 2304.00
7 | 41 | 1681.00
8 | 50 | 2500.00
9 | 45 | 2025.00
10 | 81 | 6561.00
11 | 48 | 2304.00
12 | 56 | 3136.00
13 | 46 | 2116.00
14 | 43 | 1849.00
15 | 40 | 1600.00
16 | 40 | 1600.00
17 | 45 | 2025.00
18 | 45 | 2025.00
19 | 59 | 3481.00
20 | 65 | 4225.00
21 | 52 | 2704.00
22 | 36 | 1296.00
23 | 60 | 3600.00
24 | 56 | 3136.00
25 | 62 | 3844.00
26 | 63 | 3969.00
27 | 38 | 1444.00
28 | 42 | 1764.00
29 | 63 | 3969.00
30 | 50 | 2500.00
Total | 1493.00 | 77369.00
One Standard Deviation:
Out of a total of 30 there were 18.6 people that filled the area. This is a total in percentage of 62%. This is lower than a normal distribution.
Two Standard Deviations:
Out of a total of 30 there were 26 people which filled the area. This was almost all of it and this expressed as a percentage is 87%. This is lower than a normal distribution.
Height vs. Weight:
I will now compare the height and the weight together to devise a formula. This will hopefully give me a relationship between the height and the weight of students at Mayfield High School.
I drew a table to show the results I am going to plot.
As I found earlier, the boys tend to be slightly taller than the girls, so I decided to investigate the boys and the girls separately. This lets me devise one formula which links the boys' weights and heights and a separate one which links the girls' heights and weights.
Boys:
This is the spread of the points for the boys. I will now draw a h on w regression line (purple) and a w on h regression line (blue). I will also mark the centroid with a circle around it.
I drew the scatter diagram on the next page with the y on x regression line and the x on y regression line. Below is the working for the equations of the lines.
y on x: y = mx + c
c = -41
m = change in y / change in x
m = 29 / 0.5
m = 58
y = 58x - 41
If I rearrange that I get:
x = (y + 41) / 58
I will use the PMCC to check the strength of the correlation.
PMCC:
Height (m) | Weight (kg)
1.56 | 35
1.47 | 50
1.58 | 48
1.43 | 41
1.61 | 48
1.63 | 56
1.66 | 46
1.52 | 43
1.64 | 59
1.69 | 65
1.52 | 52
1.71 | 56
1.72 | 62
1.66 | 63
1.62 | 63
1.67 | 50
Total: 25.69 | 837.00
Product Moment Correlation Coefficient:
r = product moment correlation coefficient.
$$r = \frac{S_{hw}}{\sqrt{S_w^2 \times S_h^2}}$$
The correlation coefficient will lie in the range $-1 \le r \le 1$.
Where $S_h^2 = \frac{\sum h^2}{n} - \bar{h}^2$
Where $S_w^2 = \frac{\sum w^2}{n} - \bar{w}^2$
Where $S_{hw} = \frac{\sum wh}{n} - \bar{w}\bar{h}$
I worked out the standard deviations on the calculator.
$S_h = 3.862$
$S_w = 8.505$
$S_{hw} = 26.06$
$$r = \frac{S_{hw}}{\sqrt{S_w^2 \times S_h^2}} = 0.802$$
The PMCC (product moment correlation coefficient) is 0.802 which suggests there is quite a strong positive correlation.
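As a cross-check on the hand calculation, the PMCC formula can be sketched in Python. This is a minimal illustration, not the calculator method used above: the function name pmcc is my own, and the data are the 16 boys' heights (in metres) and weights (in kg) as read from the table. Because the calculator working may have used different units or rounding, the value it prints need not agree exactly with the hand-worked figure.

```python
import math

def pmcc(heights, weights):
    """Product moment correlation coefficient, r = S_hw / sqrt(S_hh * S_ww),
    where each S term is a 'mean of products minus product of means'."""
    n = len(heights)
    mean_h = sum(heights) / n
    mean_w = sum(weights) / n
    s_hh = sum(h * h for h in heights) / n - mean_h ** 2
    s_ww = sum(w * w for w in weights) / n - mean_w ** 2
    s_hw = sum(h * w for h, w in zip(heights, weights)) / n - mean_h * mean_w
    return s_hw / math.sqrt(s_hh * s_ww)

# Boys' sample as read from the table above (heights in m, weights in kg).
boys_heights = [1.56, 1.47, 1.58, 1.43, 1.61, 1.63, 1.66, 1.52,
                1.64, 1.69, 1.52, 1.71, 1.72, 1.66, 1.62, 1.67]
boys_weights = [35, 50, 48, 41, 48, 56, 46, 43,
                59, 65, 52, 56, 62, 63, 63, 50]

print(f"r for the boys = {pmcc(boys_heights, boys_weights):.3f}")
```

A value close to +1 or -1 means the points lie close to a straight line, and a value close to 0 means there is little or no linear relationship.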
Girls:
The girls' scatter diagram looks like this. One point seems to be an outlier. There seems to be a negative correlation. When I drew the h on w regression line (blue) and the w on h regression line (purple) it looked like this.
The equations for the regression lines are:
y-on-x Regression Line: y=-33.49x+98.11
x-on-y Regression Line: x=-0.002014y+1.625
I will use the PMCC to check the strength of the correlation.
PMCC:
Height (m) | Weight (kg)
1.50 | 44
1.50 | 45
1.30 | 45
1.59 | 50
1.52 | 45
1.45 | 81
1.58 | 40
1.53 | 40
1.62 | 45
1.50 | 45
1.51 | 36
1.55 | 60
1.63 | 38
1.65 | 42
Total: 21.43 | 656.00
Boys and Girls:
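New ID | Height (m) | Weight (kg)
1 | 1.50 | 44
2 | 1.50 | 45
3 | 1.30 | 45
4 | 1.56 | 35
5 | 1.47 | 50
6 | 1.58 | 48
7 | 1.43 | 41
8 | 1.59 | 50
9 | 1.52 | 45
10 | 1.45 | 81
11 | 1.61 | 48
12 | 1.63 | 56
13 | 1.66 | 46
14 | 1.52 | 43
15 | 1.58 | 40
16 | 1.53 | 40
17 | 1.62 | 45
18 | 1.50 | 45
19 | 1.64 | 59
20 | 1.69 | 65
21 | 1.52 | 52
22 | 1.51 | 36
23 | 1.55 | 60
24 | 1.71 | 56
25 | 1.72 | 62
26 | 1.66 | 63
27 | 1.63 | 38
28 | 1.65 | 42
29 | 1.62 | 63
30 | 1.67 | 50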
I will plot the results for the girls and the boys on a scatter diagram to see what kind of correlation I get. I will then look at the correlation in more detail.
My results do not show much of a relationship. Therefore I plotted the line of best fit to
Once I had plotted the line of best fit it gave me a much clearer indication of the relationship between the height and the weight of students. It showed that as the height increased the weight also increased.
The formula for this line is: w=26.8h+7.677
From past experience I have seen that a single line of best fit is not a good way to predict a missing height from a weight, or a missing weight from a height. I did some research in an A-level textbook and found the x on y and the y on x regression lines. One of these lines accounts for the vertical errors between the line and the points and the other accounts for the horizontal errors. Each line is fitted separately, which makes it much more accurate for the prediction it is used for.
The regression h on w line was plotted in purple and the formula was: h =0.002212w + 1.461
The regression w on h line was plotted in blue and the formula was: w =26.8h + 7.677
The point where the mean height and the mean weight meet is called the centroid. This is also where the h on w regression line and the w on h regression line cross.
I can now use the w on h regression line to predict values for the weight from the height or I could use the h on w regression line to predict the values for the heights from the weight.
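The two predictions can be sketched as small Python functions. This is only an illustration using the two regression formulae quoted above for the combined sample; the function names are my own, and the centroid is taken from the totals of the 30 pupils (47.12 m and 1493 kg), which is an assumption about how the means were worked out.

```python
def weight_from_height(h):
    """w on h regression line for the combined sample: w = 26.8h + 7.677."""
    return 26.8 * h + 7.677

def height_from_weight(w):
    """h on w regression line for the combined sample: h = 0.002212w + 1.461."""
    return 0.002212 * w + 1.461

# Predict a weight from a height, and a height from a weight.
print(f"A pupil 1.60 m tall is predicted to weigh {weight_from_height(1.60):.1f} kg")
print(f"A pupil weighing 50 kg is predicted to be {height_from_weight(50):.2f} m tall")

# Both lines should pass (to rounding) through the centroid (mean h, mean w).
mean_h = 47.12 / 30
mean_w = 1493 / 30
print(f"centroid: ({mean_h:.3f} m, {mean_w:.2f} kg)")
print(f"w on h at mean height: {weight_from_height(mean_h):.2f} kg")
print(f"h on w at mean weight: {height_from_weight(mean_w):.3f} m")
```

Each line should only be used in its own direction: the w on h line to predict a weight from a known height, and the h on w line to predict a height from a known weight.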
Height (m) | Weight (kg)
1.50 | 44
1.50 | 45
1.30 | 45
1.56 | 35
1.47 | 50
1.58 | 48
1.43 | 41
1.59 | 50
1.52 | 45
1.45 | 81
1.61 | 48
1.63 | 56
1.66 | 46
1.52 | 43
1.58 | 40
1.53 | 40
1.62 | 45
1.50 | 45
1.64 | 59
1.69 | 65
1.52 | 52
1.51 | 36
1.55 | 60
1.71 | 56
1.72 | 62
1.66 | 63
1.63 | 38
1.65 | 42
1.62 | 63
1.67 | 50
Total: 47.12 | 1493.00
Conclusion:
In my investigation of Mayfield High School, I decided to investigate the relationship between the height and the weight of the pupils at this particular school. I took a stratified sample which took into account the class sizes and the number of boys and girls in proportion to the entire school, and I selected thirty students in total at random.
I first looked at the boys and the girls separately to see what kind of data I was handling and to see whether the relationship for the girls was different from that for the boys. The mean height for the boys was 160 cm, the modal class interval was 160 - 170 cm and the median was 162.5 cm. About 50% of the boys lie in the modal class interval. There is a positive skew for the boys' height. For the girls, the mean was 153 cm, the modal class interval was 150 - 160 cm and the median was 152.5 cm. About 64% of the girls lie in the modal class interval and there is also a positive skew for the girls' height. The girls' height distribution is much closer to a normal distribution than the boys', as it is closer to the centre of the scale and the mean and the median are much closer together.
I then looked at the weights for the boys and the girls. The mean for the boys' weight was 52.3125 kg, the modal class interval was 50 - 60 kg and the median was 51 kg. About 38% of the boys lie in the modal class interval, which shows that the boys' weights were a lot more spread out. The girls' mean was 46.86 kg, the modal class interval was 40 - 50 kg and the median was 45 kg. About 64% of the girls lie in the modal class interval. The skewness of the boys' weights is negative and the skewness for the girls is also negative. The boys' weight is closer to a normal distribution, as the girls' weights are predominantly on the left-hand side.
I then extended the investigation by looking at the boys' and girls' heights and weights together. The modal class interval for the height was 150 - 160 cm and 43% of the students lie in this class interval. The modal class interval for the weight was 40 - 50 kg and 47% of the students fall into this group.
I then looked to devise a formula which would link the boys' weights and their heights, and the same for the girls. For the boys I got a fairly strong positive correlation. I then drew two lines of best fit, which were the x on y regression line and the y on x regression line.
The formulae for these lines are:
H-on-W Regression Line: H=0.006162W+1.283
W-on-H Regression Line: W=64.96H-51.98
These are the formulae which link the boys' heights and weights.
The PMCC shows the strength of the correlation and in the boys' case it was quite strong.
I then looked at the relationship of the girls. The scatter diagram I plotted showed there was a negative correlation.
The equations for the regression lines are:
y-on-x Regression Line: y=-33.49x+98.11
x-on-y Regression Line: x=-0.002014y+1.625
I looked at the strength of the correlation which showed that it was a fairly weak negative correlation (-0.2597).
I then looked at the overall relationship. I did this last because I had noticed from my earlier results that there was quite a large difference between the boys and the girls, which would have affected the results. Therefore I wanted separate equations for the boys and the girls, and then an overall one to estimate generally.
h =0.002212w + 1.461
w =26.8h + 7.677
There was generally a positive correlation but it was quite weak. I calculated the PMCC to see how strong the correlation was (r = 0.2311).
I had predicted earlier in my investigation that the height and the weight are directly proportional. On the whole my results agree with my prediction, although the girls' results on their own contradict it. I will now investigate where and why this turn happens with the girls by looking at each year individually.
Looking at Each Year individually:
In this section I will look at each year individually to see whether there is a turn in the results. For this I will need to take another sample, as the number of students from each year in my previous sample is too small. I will take 10 girls and 10 boys from each year. I chose ten as it is a reasonably large sample for one year group and will enable me to plot a scatter diagram for each.
I again used the random number button on the calculator to pick 10 boys and then 10 girls from each year.
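The same selection could be sketched in Python instead of using the calculator's random number button. This is a hypothetical illustration only: the roster below is made-up placeholder data, not the real Mayfield records, and the names pick_sample and roster are my own.

```python
import random

def pick_sample(pupils, year, gender, size=10, seed=None):
    """Pick `size` different pupils at random from one year group and gender."""
    rng = random.Random(seed)
    group = [p for p in pupils if p["year"] == year and p["gender"] == gender]
    return rng.sample(group, size)  # sampling without replacement

# Hypothetical roster: 40 placeholder boys and 40 placeholder girls per year.
roster = [
    {"id": f"{year}-{gender[0]}-{i}", "year": year, "gender": gender}
    for year in (7, 8, 9, 10, 11)
    for gender in ("Male", "Female")
    for i in range(1, 41)
]

for year in (7, 8, 9, 10, 11):
    girls = pick_sample(roster, year, "Female")
    boys = pick_sample(roster, year, "Male")
    print(year, [p["id"] for p in girls], [p["id"] for p in boys])
```

Sampling without replacement means the same pupil cannot be picked twice within one group of ten.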
Year 7:
Girls:
Year | Surname | Forename 1 | Forename 2 | Gender | Height (m) | Weight (kg)
7 | Higgins | Joanne | Alicia | Female | 1.50 | 45
7 | Jones | Sarah | Ann | Female | 1.59 | 49
7 | Green | Emily | Joanne | Female | 1.64 | 140
7 | Smith | Mary | Ann | Female | 1.62 | 40
7 | Benjamin | Emma | Veronica | Female | 1.63 | 45
7 | Corrie | Rachel | Amy | Female | 1.50 | 40
7 | Harding | Tanya | | Female | 1.55 | 40
7 | Cook | Melissa | | Female | 1.62 | 49
7 | Croft | Trisha | | Female | 1.54 | 40
7 | Ormsby | Rebecca | Ann | Female | 1.52 | 52
w = 248.8h - 336.9
h = 0.0008147w + 1.527
Number of points, n: 10
Mean, x: 1.571
Mean, y: 54
Standard Deviation, x: 0.05243
Standard Deviation, y: 28.98
Correlation Coeff, r: 0.4502
I plotted the scatter diagram for the girls in year 7 and it shows that as the height increased the weight of the girls also increased. This shows that the girls in year 7 are following the pattern that would normally be expected. This fits with the fact that girls in year 7 tend to be less worried about their appearance and have not yet reached puberty.
Boys:
Year | Surname | Forename 1 | Forename 2 | Gender | Height (m) | Weight (kg)
7 | Carney | Jonathan | Alan | Male | 1.55 | 50
7 | Punnu | Aded | | Male | 1.65 | 69
7 | Austin | Steven | | Male | 1.54 | 43
7 | Afsal | Oliver | Fred | Male | 1.55 | 53
7 | Langly | Shane | | Male | 1.50 | 40
7 | Pearce | Stuart | | Male | 1.50 | 34
7 | Sharpe | Billy | Richard | Male | 1.43 | 41
7 | Brown | Ben | | Male | 1.58 | 40
7 | Seedat | Sajeed | Robert | Male | 1.65 | 46
7 | Henderson | Paul | Stephen | Male | 1.42 | 40
w = 78.73h - 75.4
h = 0.004984w + 1.31
Number of points, n: 10
Mean, x: 1.537
Mean, y: 45.6
Standard Deviation, x: 0.07457
Standard Deviation, y: 9.372
Correlation Coeff, r: 0.6264
The boys in year 7 had a positive relationship. The weight increased as the height increased. This is similar to the girls, as the boys have not yet reached the age of puberty. This is what I would have expected in year 7 as they are still relatively young.
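The regression equations and summary figures for a year group can be reproduced with a short Python sketch. This is an illustration under assumptions: it uses the usual least squares formulae with population (divide by n) standard deviations, the function name regression_summary is my own, and the data are the ten year 7 boys as read from the table above.

```python
import math

def regression_summary(heights, weights):
    """Least squares summary: both regression lines and the PMCC."""
    n = len(heights)
    mean_h = sum(heights) / n
    mean_w = sum(weights) / n
    var_h = sum((h - mean_h) ** 2 for h in heights) / n
    var_w = sum((w - mean_w) ** 2 for w in weights) / n
    cov = sum((h - mean_h) * (w - mean_w)
              for h, w in zip(heights, weights)) / n
    b_wh = cov / var_h              # gradient of the w on h line
    b_hw = cov / var_w              # gradient of the h on w line
    return {
        "mean_h": mean_h, "mean_w": mean_w,
        "sd_h": math.sqrt(var_h), "sd_w": math.sqrt(var_w),
        "w_on_h": (b_wh, mean_w - b_wh * mean_h),   # w = gradient*h + intercept
        "h_on_w": (b_hw, mean_h - b_hw * mean_w),   # h = gradient*w + intercept
        "r": cov / math.sqrt(var_h * var_w),
    }

# Year 7 boys, as read from the table above (heights in m, weights in kg).
heights = [1.55, 1.65, 1.54, 1.55, 1.50, 1.50, 1.43, 1.58, 1.65, 1.42]
weights = [50, 69, 43, 53, 40, 34, 41, 40, 46, 40]

summary = regression_summary(heights, weights)
m, c = summary["w_on_h"]
print(f"w = {m:.2f}h + ({c:.1f})")
m, c = summary["h_on_w"]
print(f"h = {m:.6f}w + {c:.2f}")
print(f"r = {summary['r']:.4f}")
```

A useful check is that the two gradients multiplied together equal r squared, so a weak correlation always goes with two regression lines that are far apart.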
Year 8:
I will now look at year 8 to see if the relationship changes here.
Girls:
Year | Surname | Forename 1 | Forename 2 | Gender | Height (m) | Weight (kg)
8 | Kelly | Vicky | Bethany | Female | 1.65 | 52
8 | Hall | Laura | | Female | 1.44 | 49
8 | Anderson | Kylie | Jane | Female | 1.52 | 52
8 | Moloney | Erin | Nazia | Female | 1.62 | 51
8 | Smith | Susan | Sandra | Female | 1.68 | 52
8 | Giles | Nichole | Caron | Female | 1.45 | 43
8 | Meert | Maureen | Margaret | Female | 1.62 | 46
8 | Dixon | Ayeshia | | Female | 1.55 | 60
8 | Healy | Emily | Elizabeth | Female | 1.67 | 52
8 | Latimer | Jade | Louise | Female | 1.57 | 52
w = 15.43h + 26.57
h = 0.005797w + 1.282
Number of points, n: 10
Mean, x: 1.577
Mean, y: 50.9
Standard Deviation, x: 0.08198
Standard Deviation, y: 4.23
Correlation Coeff, r: 0.2991
The height vs. the weight for the girls in year 8 shows there is still a positive correlation but it is not as strong as it was in year 7. This suggests that the girls are starting to become aware of their appearance.
Boys:
Year | Surname | Forename 1 | Forename 2 | Gender | Height (m) | Weight (kg)
8 | Bath | Arthur | Gordon | Male | 1.52 | 52
8 | Tong | David | James | Male | 1.65 | 51
8 | Morris | James | William | Male | 1.65 | 35
8 | Cullin | Adam | Shane | Male | 1.52 | 45
8 | Paine | Charlie | Bob | Male | 1.44 | 50
8 | Boye | Jay | | Male | 1.52 | 60
8 | Winters | Andy | Lucithen | Male | 1.63 | 55
8 | Johnson | Wayne | Leon | Male | 1.50 | 49
8 | Wilson | Christopher | Philip | Male | 1.53 | 32
8 | Dobson | Anthony | | Male | 1.60 | 60
Number of points, n: 10
Mean, x: 1.556
Mean, y: 48.9
Standard Deviation, x: 0.068
Standard Deviation, y: 8.904
Correlation Coeff, r: -0.03865
For the boys in year 8 there is no correlation, as r is very close to 0. This is quite surprising as I would have expected a similar trend to year 7. This may be because I was unlucky with my sample or simply because there is a big spread of data in year 8.
Year 9:
Girls:
Year | Surname | Forename 1 | Forename 2 | Gender | Height (m) | Weight (kg)
9 | Rogerson | Claire | | Female | 1.60 | 42
9 | Cireen | Claire | | Female | 1.62 | 55
9 | Dixon | Mary | Zeoy | Female | 1.49 | 52
9 | Brown | Chantelle | Margaret | Female | 1.80 | 62
9 | Taylor | Donna | Leigh | Female | 1.76 | 60
9 | Bowlker | Amna | | Female | 1.35 | 51
9 | Violet | Kate | May | Female | 1.52 | 50
9 | Bellfield | Janet | | Female | 1.58 | 40
9 | Power | Donna | Louise | Female | 1.66 | 45
9 | Yates | Christine | Anne | Female | 1.64 | 42
h = 0.006096w + 1.298
w = 21.05h + 16.17
Number of points, n: 10
Mean, x: 1.602
Mean, y: 49.9
Standard Deviation, x: 0.1235
Standard Deviation, y: 7.259
Correlation Coeff, r: 0.3583
The girls in year 9 seem to have a stronger positive correlation than in year 8. This is the year in which I would have expected to see some sort of change in the relationship. However, this particular school may be different and the change may come in year 10 or year 11 if there is a change.
Boys:
Year | Surname | Forename 1 | Forename 2 | Gender | Height (m) | Weight (kg)
9 | Glenn | Edward | | Male | 1.48 | 40
9 | John | Simon | | Male | 1.56 | 60
9 | Ether | Paul | David | Male | 1.53 | 45
9 | Samas | Sam | | Male | 1.48 | 43
9 | Abejuro | Herman | | Male | 1.60 | 60
9 | Bowy | Jake | | Male | 1.54 | 44
9 | Burgess | John | | Male | 1.80 | 64
9 | Booth | Gary | | Male | 1.73 | 52
9 | Saturn | Perry | | Male | 1.72 | 62
9 | Snelgrove | Jonathan | James | Male | 1.62 | 42
h = 0.008235w + 1.184
w = 60.03h - 45.21
Number of points, n: 10
Mean, x: 1.606
Mean, y: 51.2
Standard Deviation, x: 0.105
Standard Deviation, y: 8.964
Correlation Coeff, r: 0.7031
The boys' relationship in year 9 is a much stronger positive correlation. This suggests that the boys are now reaching puberty and the muscles in their bodies are beginning to develop, which will increase their weight. They are also growing in height at the same time.
Year 10:
Girls:
Year | Surname | Forename 1 | Forename 2 | Gender | Height (m) | Weight (kg)
10 | Dickson | Amy | Ruth | Female | 1.75 | 56
10 | Hamilton | Jo | | Female | 1.73 | 65
10 | Anderson | Taz | | Female | 1.80 | 60
10 | Lee | Canaice | | Female | 1.56 | 45
10 | Brown | Emily | Jayne | Female | 1.62 | 54
10 | Hughes | Donna | Louise | Female | 1.66 | 45
10 | Razwana | Tahira | | Female | 1.62 | 46
10 | Slater | Sara | | Female | 1.60 | 50
10 | Bullock | Janice | Maria | Female | 1.72 | 51
10 | Bhatti | Sadia | | Female | 1.62 | 48
h = 0.008505w + 1.226
w = 64.31h - 55.26
Number of points, n: 10
Mean, x: 1.668
Mean, y: 52
Standard Deviation, x: 0.07346
Standard Deviation, y: 6.387
Correlation Coeff, r: 0.7395
The girls in year 10 showed a positive correlation. The weight increased as the height increased. In year 10 girls begin to go into puberty and many become more conscious about their appearance. This set of results however does not show this for this particular school.
Boys:
Year | Surname | Forename 1 | Forename 2 | Gender | Height (m) | Weight (kg)
10 | Stallin | Joseph | | Male | 1.84 | 62
10 | Scundrick | Andrew | Edward | Male | 1.74 | 80
10 | Tison | Wilfred | Andrew | Male | 1.80 | 72
10 | Honda | Pablo | | Male | 1.85 | 62
10 | Grant | Michael | Paul | Male | 1.74 | 64
10 | Lock | Lee | James | Male | 1.50 | 50
10 | Lambert | Daniel | | Male | 1.77 | 80
10 | Ray | Adam | Daniel | Male | 1.80 | 40
10 | Agha | Shohaib | | Male | 1.66 | 70
10 | Stallin | Joseph | | Male | 1.84 | 62
w = 17.14h + 34.14
h = 0.001255w + 1.673
Number of points, n: 10
Mean, x: 1.754
Mean, y: 64.2
Standard Deviation, x: 0.1011
Standard Deviation, y: 11.81
Correlation Coeff, r: 0.1467
The boys in year 10 show generally a positive correlation. This is what I would have expected as at this age boys tend to be growing much more both in height and their muscles are developing more as well.
Year 11:
Finally, year 11. This is where I would have predicted the relationship to change for the girls.
Girls:
Year | Surname | Forename 1 | Forename 2 | Gender | Height (m) | Weight (kg)
11 | Wilson | Charlene | Astley | Female | 1.65 | 48
11 | Potter | Tara | | Female | 1.78 | 55
11 | Grace | Davina | | Female | 1.65 | 54
11 | Briggs | Sarah | Louise | Female | 1.63 | 48
11 | Berry | Shelly | | Female | 1.68 | 54
11 | Grot | June | Leah | Female | 1.60 | 48
11 | O'Donall | Megan | Gwenda | Female | 1.33 | 55
11 | Khan | Adila | | Female | 1.63 | 48
11 | Alsam | Samia | | Female | 1.55 | 36
11 | Heap | Louise | Stephanie | Female | 1.80 | 42
w = -4.539h + 56.2
h = -0.002008w + 1.728
Number of points, n: 10
Mean, x: 1.63
Mean, y: 48.8
Standard Deviation, x: 0.1233
Standard Deviation, y: 5.862
Correlation Coeff, r: -0.09548
The graph shows that the girls' correlation has turned negative in year 11 (r = -0.09548), although it is very weak. The weight tends to decrease slightly as the height increases. This may be because girls in year 11 become much more conscious of their appearance and are going through adolescence. Up to this point the girls' heights against their weights have had a positive correlation, but this is where the relationship changes.
Boys:
Year | Surname | Forename 1 | Forename 2 | Gender | Height (m) | Weight (kg)
11 | Vincent | Nigel | Barry | Male | 1.80 | 62
11 | Rottecth | Amine | Reggie | Male | 1.61 | 42
11 | Heath | Malcom | John | Male | 1.75 | 68
11 | Lewis | James | Adam | Male | 1.68 | 56
11 | Nickholas | Danny | Michael | Male | 1.52 | 38
11 | Black | Kevin | | Male | 1.86 | 56
11 | Curtis | David | John | Male | 1.78 | 67
11 | Major | William | Brian | Male | 1.80 | 68
11 | Hughes | Mark | | Male | 1.65 | 58
11 | Chinny | Anthony | Norton | Male | 1.62 | 56
h = 0.007945w + 1.253
w = 73.24h - 67.92
Number of points, n: 10
Mean, x: 1.707
Mean, y: 57.1
Standard Deviation, x: 0.1019
Standard Deviation, y: 9.782
Correlation Coeff, r: 0.7628
There is a positive correlation for the boys in year 11, which is as I had expected, as boys in year 11 have developed muscles and have grown in height. Also, they are now about 16 years old and are allowed to join gyms, which encourages the development of muscles. Boys are not as conscious about their weight as the girls are.
From my further investigation into the year in which the weight vs. height relationship changes, the boys' weights tend to keep increasing with their heights, whereas the girls' relationship turns negative in year 11 at Mayfield High School. This may be due to the physical changes boys and girls go through during puberty and adolescence and to boys and girls becoming more aware of their physical appearance.
Evaluation:
My investigation into Mayfield High School, which was based on a real school's data, was successful. I believe that I got fairly accurate results and I was able to define a relationship between the height and the weight of the boys and girls quite clearly. I was also able to find the year in which the girls and boys at this particular school appear to become more conscious of their physical appearance.
However, if I wanted to find a much clearer relationship I should have taken a much bigger sample. I took a sample of thirty to begin with and then I took 20 from each year. I could also have taken samples from a few schools in this area and compared them with schools in other areas. This might have given me information about natural or industrial factors which may delay or enhance the growth of girls and boys.
Also, rounding error was introduced in my project wherever I rounded numbers. I tried to minimise this as much as possible by limiting the number of times I rounded and by being careful about the number of decimal places I rounded to.