How the mass and height of the pupils differ from each other in different year groups
Maths Coursework
PLANNING
Introduction:
This piece of coursework is about Data Handling. The data I will be looking at comes from Mayfield High. A whole database of information is available, and I will be looking at two specific pieces of data from it. By examining this data I will be able to make various hypotheses to investigate in my project. I will then draw conclusions from my results and draw graphs that are relevant to supporting my points. The two pieces of data that I will be investigating are:
> How the mass and height of the pupils differ from each other in different year groups.
Hypotheses:
My hypotheses share one main theme, height and mass, which is split up into three hypotheses.
1) Boys are taller than girls in Years 7 and 8.
Aim: I will show this data by, firstly, plotting all the heights of boys and girls on separate graphs so that I can compare the heights of all the pupils and easily see the spread of data. Then I will calculate the average height of the boys and girls to see clearly which gender is taller. Furthermore, to support my hypothesis I will plot a cumulative frequency graph with a box and whisker diagram, with boys and girls on the same graph. This will show the quartiles, the spread and the median of the data. I will also use standard deviation, which shows how much the data deviates from the mean. This will again show the spread of data.
2) Boys are heavier than girls in KS4
Aim: To show this data I will use a histogram on which I will plot the weights of both the females and the males. This will show me the spread of data and which weight interval most of the girls and boys fall into, so I can easily tell which gender is the heavier. I will also use standard deviation for the same reason as in hypothesis 1.
3) Height and weight are positively correlated in Year 7.
Aim: I will show this data by using a scatter diagram with a line of best fit. This will represent the data best because it will show the correlation clearly. This is the only hypothesis where a scatter graph can be used, and I will use 30 points on the graph, which should make the result more reliable. Also, to explain the correlation given by the graph I will include Spearman's Rank.
Method:
I will start with a huge database of information on Mayfield High. The first step is to sort the data, which involves choosing the variables I want to investigate. The two I am going to choose are weight and height. All the other data apart from the names, years and gender will be hidden, so I will be left with a clear table of height, weight, gender, year and name. Once the data is sorted I will take my sample. Before I sample, I will go through the data and highlight any anomalies in each year group, as they will affect my results. However, I won't omit them; I will just highlight them so that I am aware of them, because omitting them would upset my sample numbers. As I am selectively sampling my data, I may come across an anomaly and have to include it. I will sample my data using two methods. The first will be stratified sampling. This is when the data is put into groups called strata and the number chosen from each group is in proportion to its size. I need 10% of the whole data, and the year groups will be my strata, so I will take a number of pupils from each year group in proportion to its size. I will then selectively sample within each stratum. This is when every nth item is chosen, which is also known as systematic sampling. The reason I chose this method is that it should give me a relatively even number of males and females. This is essential to my hypotheses, as I am studying the differences between boys and girls.
Description of data:
There are many different types of data, each describing a different kind of information. The data collected in this investigation will be quantitative data, which is data that consists of numbers. Weights are an example of quantitative data. Quantitative data can be either continuous or discrete. Discrete data can only take particular values. For example, you can buy shoes in exact sizes (6, 6 1/2, 7, 7 1/2 etc.). These values are discrete and there are no values in between them, so discrete data has an exact value.
Continuous data, on the other hand, can take any value. For example, your foot could be 18cm or 20cm or anywhere in between these two values. Continuous data cannot be measured exactly; the accuracy of a measurement depends upon the device you measure with.
Weight and height can take any value and are therefore examples of quantitative continuous data.
However, the data collected from the Mayfield High database is secondary data, because we didn't actually collect the data ourselves. I am going to collect my primary data by using the RGS database.
Actual Data representation:
Total number of people in data = 1,182.
Therefore the number I need to obtain in my sample is 10% of this, which is 118 people.
Below is a table to represent the different strata and the number of pupils I need to take from each:
| Year Group | Number of students | Size of sample 10% | Rounded number of pupils |
|---|---|---|---|
| 7 | 281 | 28.1 | 28 |
| 8 | 269 | 26.9 | 27 |
| 9 | 260 | 26.0 | 26 |
| 10 | 199 | 19.9 | 20 |
| 11 | 169 | 16.9 | 17 |
Now that I have my strata, I have taken 10% of each, which gives the number of pupils to take from each group and a total of 118 pupils. Next I have to selectively sample each stratum. I selectively sampled each year group by choosing every 10th person.
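The two sampling steps described above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for the example, the Year 10 and Year 11 sizes are taken as 199 and 169 (the values consistent with 10% samples of 19.9 and 16.9), and starting at the 10th person is an assumption, since the starting point is not stated.

```python
# Year-group sizes from the strata table. Years 10 and 11 are taken as
# 199 and 169, consistent with 10% samples of 19.9 and 16.9.
year_sizes = {7: 281, 8: 269, 9: 260, 10: 199, 11: 169}


def stratum_sample_sizes(sizes, fraction=0.10):
    """Number of pupils to take from each stratum, rounded to the nearest whole pupil."""
    return {year: int(n * fraction + 0.5) for year, n in sizes.items()}


def every_nth(records, n):
    """Pick the nth, 2nth, 3nth, ... record (assumed starting point: the nth record)."""
    return records[n - 1::n]


targets = stratum_sample_sizes(year_sizes)
total = sum(targets.values())

# Stand-in for the 281 Year 7 records: pupils numbered 1 to 281.
year7_positions = list(range(1, year_sizes[7] + 1))
year7_sample = every_nth(year7_positions, 10)

print(targets, total, len(year7_sample))
```

Choosing every 10th person does not always give exactly the rounded target for a stratum, so in practice the last pick or two may need adjusting by hand.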
Evaluation of data, its accuracy and potential problems:
There are not many problems with the method used to collect the data. The only real problem I can foresee is the anomalies causing inaccuracies in my graphs.
Also, the data is secondary data, and therefore there is no real way of knowing whether it is reliable, as I did not collect it myself.
There are no obvious signs of bias within the data I collected; however, it is hard to tell, as the only way to really find out is to determine who produced the data. We do know that the data was collected from the whole of Mayfield High School, and therefore there shouldn't be much bias in the data.
However, the data does show signs of inaccuracy. Whilst sorting the data I found pupils in each year that just do not fit the pattern. For example, there are some people in Year 7 who have a height of 2.00 metres and some who weigh 110 kg. These are far outside the pattern and are therefore anomalies. I dealt with this by sifting through the data and highlighting the anomalies to make sure I was aware of them. Apart from these few inaccuracies, the data on the whole appears accurate and is a good source to sample from. The main limitation of the method is that a 10% sample size may provide us with only a small number of pupils in each year, which will reduce the overall reliability of our results.
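The anomalies in this project were picked out by eye. As a hedged illustration of how the same check could be done with a rule, the sketch below uses the common 1.5 × interquartile range convention, which is not the method used above. The heights in the example are made up for demonstration and are not taken from the Mayfield sample.

```python
import statistics


def flag_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical heights in metres, for illustration only.
example_heights = [1.41, 1.43, 1.46, 1.48, 1.50, 1.51, 1.52, 1.54, 1.60, 2.00]
flagged = flag_outliers(example_heights)
print(flagged)
```

A rule like this only highlights candidates; deciding whether a flagged pupil is a genuine anomaly still needs judgement.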
Patterns within the data:
It is clear from looking at the pupil samples that the higher the age group, the heavier and taller the pupils are. However, my hypotheses relate to gender, and the basic trend that appears is that boys are taller and heavier than girls in each year. For example, in Year 7 the shortest girl is 1.3m and the shortest boy is 1.41m. Also, in Year 8 the tallest girl is 1.71m whereas the tallest boy is 2.00m. Therefore, just by looking quickly at the data and picking out the tallest and shortest, there is already some early support for my hypotheses.
For weight, the heaviest female in Year 10 is 60kg whereas the heaviest male is 68kg. This is further support for my hypotheses.
So the general trend that can be picked up simply by looking at the tables is that boys are generally taller and heavier than girls in the same age group.
ANALYSIS
Graphs, observations and calculations:
All the graphs that will be drawn will be relevant to the data. They will show how the height and weight of girls and boys vary in each year. All the results that are coloured in red are anomalies. They will be omitted from the mean calculations, because they would give inaccurate results, but they will be included in the graphs.
Hypothesis 1 - Boys are taller than girls in KS3:
Calculations:
| Group | Mean Height (m) | Modal Height (m) | Range (m) |
|---|---|---|---|
| Year 7 Girls | 1.51 | 1.48, 1.52 and 1.6 | 1.8 - 1.3 = 0.5 |
| Year 7 Boys | 1.54 | 1.5, 1.51 and 1.65 | 1.65 - 1.41 = 0.24 |
| Year 8 Girls | 1.59 | 1.42, 1.59, 1.62 and 1.7 | 1.71 - 1.42 = 0.29 |
| Year 8 Boys | 1.62 | 1.5, 1.55, 1.68 and 1.72 | 2.0 - 1.3 = 0.7 |
| Year 9 Girls | 1.61 | 1.55, 1.64 and 1.7 | 1.71 - 1.4 = 0.31 |
| Year 9 Boys | 1.64 | No modal height | 1.85 - 1.32 = 0.53 |
These values have been calculated so that it is easier to see the averages I will be using for this hypothesis. The mean shows clearly which gender in each year group has the taller average height, and the range has been calculated to show the spread of data. The modal height has also been used as further support for my hypothesis.
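As a minimal sketch of how the three summary values in the table are obtained, the Python below computes the mean, the modal values and the range for a list of heights. The heights are hypothetical and are used only to show that a sample can have several modes; they are not the pupils in my sample.

```python
import statistics


def summarise(heights):
    """Mean, modal value(s) and range of a list of heights in metres."""
    return {
        "mean": round(statistics.mean(heights), 2),
        "modes": statistics.multimode(heights),  # every value tied for most frequent
        "range": round(max(heights) - min(heights), 2),
    }


# Hypothetical heights in metres, for illustration only.
example_heights = [1.41, 1.50, 1.50, 1.51, 1.51, 1.56, 1.65, 1.65]
summary = summarise(example_heights)
print(summary)
```

When every value occurs the same number of times, `multimode` returns all of them, which corresponds to the "No modal height" entry in the table.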
YEAR 7:
Comparison:
The reason for including these two graphs is that they show the actual heights of every person sampled in the year, one showing all the male heights and the other the female heights. This helps show the spread of data.
The two graphs show the different heights of boys and girls in Year 7. Finding the mean shows which gender is taller. The boys have a mean height of 1.54m, taller than the girls, who average 1.51m. This is shown by the blue line.
There is a lot more spread in the graph for the girls, whereas the boys are all relatively the same height, meaning their range is small. The range for the boys is from 1.41 to 1.65m, whereas the girls' is from 1.30 to 1.8m. All this data can be summed up and shown clearly by drawing a cumulative frequency graph with a box and whisker diagram, shown below. The tables below give the values plotted on the cumulative frequency diagram and also show the standard deviation.
Table of Values of Boys (Males):
Class Int. Mid. Int. (x) Class Width Freq. Cum. Freq.
0 ≤ x < 0.2 0.1 0.2 0 0
0.2 ≤ x < 0.4 0.3 0.2 0 0
0.4 ≤ x < 0.6 0.5 0.2 0 0
0.6 ≤ x < 0.8 0.7 0.2 0 0
0.8 ≤ x < 1 0.9 0.2 0 0
1 ≤ x < 1.2 1.1 0.2 0 0
1.2 ≤ x < 1.4 1.3 0.2 0 0
1.4 ≤ x < 1.6 1.5 0.2 11 11
1.6 ≤ x < 1.8 1.7 0.2 4 15
1.8 ≤ x < 2 1.9 0.2 0 15
Σf = 15
Σfx = 23.3
Σfx² = 36.31
Mean = 1.553
Standard Deviation = 0.08844
Table of Values of Girls (Females):
Class Int. Mid. Int. (x) Class Width Freq. Cum. Freq.
0 ≤ x < 0.2 0.1 0.2 0 0
0.2 ≤ x < 0.4 0.3 0.2 0 0
0.4 ≤ x < 0.6 0.5 0.2 0 0
0.6 ≤ x < 0.8 0.7 0.2 0 0
0.8 ≤ x < 1 0.9 0.2 0 0
1 ≤ x < 1.2 1.1 0.2 0 0
1.2 ≤ x < 1.4 1.3 0.2 1 1
1.4 ≤ x < 1.6 1.5 0.2 7 8
1.6 ≤ x < 1.8 1.7 0.2 3 11
1.8 ≤ x < 2 1.9 0.2 1 12
Σf = 12
Σfx = 18.8
Σfx² = 29.72
Mean = 1.567
Standard Deviation = 0.1491
The standard deviation for each gender was calculated because it shows how far the data typically lies from the mean. From the tables it can be seen that the standard deviation is larger for the girls than it is for the boys, which means that there is more spread in the data for the girls. This agrees with the box and whisker diagram below, where there is a larger spread in the data for the girls and the boys' heights are concentrated in one small area. The larger the deviation, the further the data is from the mean, so the standard deviation should be smaller for the boys. The standard deviation for the boys is 0.08844, which means there is hardly any spread in their data and the results are very close to the mean. For the girls, however, the standard deviation is 0.1491, which means there is a lot more spread in their data and their heights typically lie about 15cm from the mean.
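The mean and standard deviation quoted under the tables can be reproduced from the mid-interval values and frequencies. The sketch below assumes the usual grouped-data formulas, mean = Σfx / Σf and standard deviation = √(Σfx² / Σf − mean²); the function name is only illustrative.

```python
import math


def grouped_mean_sd(midpoints, freqs):
    """Mean and standard deviation estimated from a grouped frequency table."""
    n = sum(freqs)                                               # Σf
    sum_fx = sum(f * x for x, f in zip(midpoints, freqs))        # Σfx
    sum_fx2 = sum(f * x * x for x, f in zip(midpoints, freqs))   # Σfx²
    mean = sum_fx / n
    variance = sum_fx2 / n - mean ** 2
    return mean, math.sqrt(variance)


# Only the intervals with pupils in them matter; empty intervals add nothing.
midpoints = [1.3, 1.5, 1.7, 1.9]
boys_mean, boys_sd = grouped_mean_sd(midpoints, [0, 11, 4, 0])
girls_mean, girls_sd = grouped_mean_sd(midpoints, [1, 7, 3, 1])

print(round(boys_mean, 3), round(boys_sd, 5))
print(round(girls_mean, 3), round(girls_sd, 4))
```

Because every height is replaced by the middle of its interval, these grouped values are estimates and need not match the means calculated from the individual heights.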
This is a cumulative frequency diagram with box and whisker diagrams plotted onto it. It shows the heights of the boys and girls, so I can see which gender is taller and also see the spread of data from the quartiles. The purple line is the boys and the red line is the girls. This cumulative frequency diagram generally shows that boys are taller than girls, because it shows that there are more boys whose height is between 1.7 and 2m. That means a larger proportion of the boys are in the taller heights. It also shows that there are more girls than boys who are between 1.3 and 1.4m.
The box and whisker diagram shows the median and the quartiles of the data. The yellow box shows girls and the green box shows boys. The girls' box and whisker diagram shows that the two extremes of the data, i.e. the range, run from 1.3 to 1.8m. For the boys there is a lot less spread and their range is from 1.41 to 1.65m. The large range for the girls is down to an anomalous result. The diagram also shows that the heights of the girls are concentrated mainly between 1.46 and 1.6m, whereas the heights of the boys are mainly concentrated between 1.5 and 1.6m. This shows that more boys are in the taller category than girls.
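Reading a median or quartile off a cumulative frequency curve amounts to linear interpolation between the plotted points. The sketch below shows this for the Year 7 boys' table, assuming points are plotted at the upper class boundaries and joined with straight lines; the function name and that plotting convention are assumptions for the example.

```python
def grouped_quantile(upper_bounds, cum_freqs, lower_start, p):
    """Estimate the p-quantile (0 < p < 1) by linear interpolation
    along a cumulative frequency polygon."""
    target = p * cum_freqs[-1]
    prev_bound, prev_cum = lower_start, 0
    for bound, cum in zip(upper_bounds, cum_freqs):
        if cum >= target and cum > prev_cum:
            step = (target - prev_cum) / (cum - prev_cum)
            return prev_bound + (bound - prev_bound) * step
        prev_bound, prev_cum = bound, cum
    return upper_bounds[-1]


# Year 7 boys: intervals 1.2-1.4, 1.4-1.6, 1.6-1.8, 1.8-2 with cumulative frequencies.
bounds = [1.4, 1.6, 1.8, 2.0]
boys_cum = [0, 11, 15, 15]

boys_q1 = grouped_quantile(bounds, boys_cum, 1.2, 0.25)
boys_median = grouped_quantile(bounds, boys_cum, 1.2, 0.50)
boys_q3 = grouped_quantile(bounds, boys_cum, 1.2, 0.75)
print(round(boys_q1, 3), round(boys_median, 3), round(boys_q3, 3))
```

With class intervals as wide as 0.2m these are rough estimates, so they will not match quartiles taken from the individual heights exactly.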
Therefore the cumulative frequency diagram, box and whisker diagrams and standard deviation show that:
Boys are taller than Girls in Year 7
YEAR 8
Comparison:
The graph showing the heights of the girls has a small range, as the results are all close to each other, whereas the boys have more spread. The tallest male is 2m whereas the tallest girl is 1.7m. The range for the girls is from 1.42 to 1.7m and the range of the boys is from 1.5 to 2m. This suggests that the boys are generally taller than the girls even without finding the mean. However, to make sure, the mean was also found, and it shows the average height of the girls to be 1.59m compared to an average height of 1.62m for the boys. The graph below shows this clearly:
Table of Values of Boys (Males):
Class Int. Mid. Int. (x) Class Width Freq. Cum. Freq.
0 ≤ x < 0.2 0.1 0.2 0 0
0.2 ≤ x < 0.4 0.3 0.2 0 0
0.4 ≤ x < 0.6 0.5 0.2 0 0
0.6 ≤ x < 0.8 0.7 0.2 0 0
0.8 ≤ x < 1 0.9 0.2 0 0
1 ≤ x < 1.2 1.1 0.2 0 0
1.2 ≤ x < 1.4 1.3 0.2 1 1
1.4 ≤ x < 1.6 1.5 0.2 7 8
1.6 ≤ x < 1.8 1.7 0.2 5 13
1.8 ≤ x < 2 1.9 0.2 1 14
Σf = 14
Σfx = 22.2
Σfx² = 35.5
Mean = 1.586
Standard Deviation = 0.1457
Table of Values of Girls (Females):
Class Int. Mid. Int. (x) Class Width Freq. Cum. Freq.
0 ≤ x < 0.1 0.05 0.1 0 0
0.1 ≤ x < 0.2 0.15 0.1 0 0
0.2 ≤ x < 0.3 0.25 0.1 0 0
0.3 ≤ x < 0.4 0.35 0.1 0 0
0.4 ≤ x < 0.5 0.45 0.1 0 0
0.5 ≤ x < 0.6 0.55 0.1 0 0
0.6 ≤ x < 0.7 0.65 0.1 0 0
0.7 ≤ x < 0.8 0.75 0.1 0 0
0.8 ≤ x < 0.9 0.85 0.1 0 0
0.9 ≤ x < 1 0.95 0.1 0 0
1 ≤ x < 1.1 1.05 0.1 0 0
1.1 ≤ x < 1.2 1.15 0.1 0 0
1.2 ≤ x < 1.3 1.25 0.1 0 0
1.3 ≤ x < 1.4 1.35 0.1 0 0
1.4 ≤ x < 1.5 1.45 0.1 2 2
1.5 ≤ x < 1.6 1.55 0.1 4 6
1.6 ≤ x < 1.7 1.65 0.1 3 9
1.7 ≤ x < 1.8 1.75 0.1 3 12
1.8 ≤ x < 1.9 1.85 0.1 0 12
1.9 ≤ x < 2 1.95 0.1 0 12
Σf = 12
Σfx = 19.3
Σfx² = 31.17
Mean = 1.608
Standard Deviation = 0.1037
The standard deviation tables above show the spread of data and how far the results are from the mean. It can be seen from the box and whisker diagram that the boys' data is more spread out and that the girls this time are more concentrated around the mean. The deviation for the boys is 0.1457. This is rather large and means that the boys are spread out over a larger range, which can be seen clearly on the box plot. The girls' deviation from the mean is a lower 0.1037, which means they are more closely concentrated around the mean.
The cumulative frequency graph shows again that more boys are in the taller categories. For example, the purple line, which represents boys, always has a larger frequency than the red line, which represents girls. There are higher frequencies of boys than girls between 1.6 and 2m.
The box and whisker diagram shows that there is a lot more spread in the data for boys and much less spread for girls. The girls' range is 1.42 to 1.71m whereas the boys' range is from 1.32 to 2m. Also, most of the heights of the girls are concentrated between 1.51 and 1.68m, whereas the boys' are concentrated between 1.55 and 1.72m. This means that there are more boys in the taller height categories. The standard deviation also matches the box and whisker diagrams, showing a larger deviation for the boys' box plot and a smaller deviation for the girls'.
Therefore, further supporting my hypothesis, the graphs for Year 8 show:
Boys are taller than Girls in Year 8
Hypothesis 2 - Boys are heavier than girls in KS4:
Calculations:
| Group | Mean Weight (kg) | Modal Weight (kg) | Range (kg) |
|---|---|---|---|
| Year 10 Girls | 45.2 | 36, 42 and 45 | 60 - 36 = 24 |
| Year 10 Boys | 58.2 | 57 and 68 | 68 - 40 = 28 |
| Year 11 Girls | 53.6 | 42, 48 and 60 | 65 - 42 = 23 |
| Year 11 Boys | 56.4 | 60 and 63 | 76 - 38 = 38 |
Again a table of averages has been calculated to support my hypothesis. It is very clear to see in the table which gender has the larger average weight. The modal weight shows which weights occur most often among the pupils, and the range shows the spread of the data for each gender in Years 10 and 11.
YEAR 10:
To test my hypothesis, I have drawn a histogram, which will tell me between which weights most girls and boys are situated. Below are the tables used to construct the histogram:
Table of Values of Histogram [Females]:
Class Int. Mid. Int. (x) Class Width Freq. Cum. Freq.
0 ≤ x < 10 5 10 0 0
10 ≤ x < 20 15 10 0 0
20 ≤ x < 30 25 10 0 0
30 ≤ x < 40 35 10 2 2
40 ≤ x < 50 45 10 4 6
50 ≤ x < 60 55 10 2 8
60 ≤ x < 70 65 10 1 9
70 ≤ x < 80 75 10 0 9
80 ≤ x < 90 85 10 0 9
90 ≤ x < 100 95 10 0 9
Σf = 9
Σfx = 425
Σfx² = 2.083E+004
Mean = 47.22
Standard Deviation = 9.162
Table of Values of Histogram [Males]:
Class Int. Mid. Int. (x) Class Width Freq. Cum. Freq.
0 ≤ x < 10 5 10 0 0
10 ≤ x < 20 15 10 0 0
20 ≤ x < 30 25 10 0 0
30 ≤ x < 40 35 10 0 0
40 ≤ x < 50 45 10 1 1
50 ≤ x < 60 55 10 6 6
60 ≤ x < 70 65 10 4 11
70 ≤ x < 80 75 10 0 11
80 ≤ x < 90 85 10 0 11
90 ≤ x < 100 95 10 0 11
Σf = 11
Σfx = 645
Σfx² = 3.828E+004
Mean = 58.64
Standard Deviation = 6.428
The boys' average weight is higher but their standard deviation is lower. This shows that the boys' weights are closely spread around the mean, typically deviating about 6kg from it. The girls, on the other hand, show a larger spread, with their weights typically deviating about 9kg from the mean.
| Weight (kg) | Frequency of boys | Frequency of girls |
|---|---|---|
| 30 to 40 | 0 | 2 |
| 40 to 50 | 1 | 4 |
| 50 to 60 | 6 | 2 |
| 60 to 70 | 4 | 1 |
The histogram above shows the weight intervals and the frequency of girls and boys in these intervals. In this case all the intervals are the same width (10kg), so the rectangles have the same width. My reason for choosing a histogram to present this data is that I can see which intervals most boys and girls fall into. This helps me find where the data is concentrated for each gender, and so which gender is heavier.
The histogram shows boys in yellow and girls in blue. The range can be seen to be larger for the girls, who range from 30 to 70kg, whereas the boys range from 40 to 70kg. The bars for the girls take up more intervals on the histogram, which shows this. The standard deviation calculated and explained above also shows this larger spread for the girls.
The histogram also shows that boys are heavier than girls, because there is a larger number of boys in the 50 to 60kg interval. Most boys fall in the 50-60kg interval and a large number also fall in the 60-70kg interval. However, most of the girls fall into the 40-50kg interval, and only a small number of boys come into this category. This means that the boys have larger weights than the girls.
Boys are heavier than girls in Year 10
YEAR 11:
Table of Values of Histogram [Females]:
Class Int. Mid. Int. (x) Class Width Freq. Cum. Freq.
0 ≤ x < 10 5 10 0 0
10 ≤ x < 20 15 10 0 0
20 ≤ x < 30 25 10 0 0
30 ≤ x < 40 35 10 0 0
40 ≤ x < 50 45 10 4 4
50 ≤ x < 60 55 10 0 4
60 ≤ x < 70 65 10 4 8
70 ≤ x < 80 75 10 0 8
80 ≤ x < 90 85 10 0 8
90 ≤ x < 100 95 10 0 8
Σf = 8
Σfx = 440
Σfx² = 2.5E+004
Mean = 55
Standard Deviation = 10
Table of Values of Histogram [Males]:
Class Int. Mid. Int. (x) Class Width Freq. Cum. Freq.
0 ≤ x < 10 5 10 0 0
10 ≤ x < 20 15 10 0 0
20 ≤ x < 30 25 10 0 0
30 ≤ x < 40 35 10 1 1
40 ≤ x < 50 45 10 1 2
50 ≤ x < 60 55 10 3 5
60 ≤ x < 70 65 10 3 8
70 ≤ x < 80 75 10 1 9
80 ≤ x < 90 85 10 0 9
90 ≤ x < 100 95 10 0 9
Σf = 9
Σfx = 515
Σfx² = 3.063E+004
Mean = 57.2
Standard Deviation = 11.33
The standard deviation this time shows that the boys have a larger spread than the girls. The boys' weights typically deviate about 11kg from the mean, whereas the girls' typically deviate about 10kg from the mean.
| Weight (kg) | Frequency of boys | Frequency of girls |
|---|---|---|
| 30 to 40 | 1 | 0 |
| 40 to 50 | 1 | 4 |
| 50 to 60 | 3 | 0 |
| 60 to 70 | 3 | 4 |
| 70 to 80 | 1 | 0 |
Again, my reason for choosing a histogram to represent this data is that it shows me the heavier gender by showing which intervals most boys and girls fall into, which will further support my hypothesis.
This histogram shows that there is a lot more spread in the male weights, which range from 30 to 80kg. This is also shown by the standard deviation, which is larger for the males as they deviate further from the mean than the girls do. The histogram also shows that there are four females in the 40 to 50kg interval and four females in the 60 to 70kg interval. This may cause people to assume that girls are heavier, but there are more boys than girls who weigh between 50 and 80kg. There are 7 boys who weigh between 50 and 80kg but only 4 girls who fall in this category. Therefore, even though the male data is a lot more spread out, the histogram still shows that:
Boys are heavier than girls in Year 11
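The count of pupils at or above 50kg used in this comparison can be checked directly from the Year 11 frequencies. The sketch below is only an illustration with made-up names; it also works out the proportion of each gender, which is a fairer comparison because the sample has 9 boys but only 8 girls.

```python
# Year 11 frequencies per 10kg interval, read from the tables of values.
intervals = [(30, 40), (40, 50), (50, 60), (60, 70), (70, 80)]
boys_freq = [1, 1, 3, 3, 1]
girls_freq = [0, 4, 0, 4, 0]


def count_at_or_above(intervals, freqs, threshold):
    """Number of pupils in intervals whose lower bound is at or above the threshold."""
    return sum(f for (low, _), f in zip(intervals, freqs) if low >= threshold)


boys_heavy = count_at_or_above(intervals, boys_freq, 50)
girls_heavy = count_at_or_above(intervals, girls_freq, 50)

boys_share = boys_heavy / sum(boys_freq)
girls_share = girls_heavy / sum(girls_freq)
print(boys_heavy, girls_heavy, round(boys_share, 2), round(girls_share, 2))
```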
Hypothesis 3 - Height and Weight are positively correlated:
For this hypothesis I am going to choose only one year to represent and test my hypothesis. I will choose the year with the largest number of pupils, as the scatter graph will be most accurate with a higher number of points. This year group is Year 7.
This graph supports my third hypothesis, as the line of best fit has a positive gradient, which shows a positive correlation. Most of the points are scattered in roughly the same area and are very close to each other, apart from one anomaly, which is circled. Because most of the points are close to the line of best fit, the correlation on the graph appears strong.
The centroid is the centre point of the data and is shown with the green box.
I will now use Spearman's Rank to see whether the correlation is strong.
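| Variable A (Height of pupils (m)) | Rank | Variable B (Weight of pupils (kg)) | Rank | Difference in ranks (d) | d² |
|---|---|---|---|---|---|
| 1.3 | 28 | 36 | 26 | 2 | 4 |
| 1.41 | 27 | 50 | 5 | 22 | 484 |
| 1.43 | 26 | 45 | 10 | 16 | 256 |
| 1.43 | 25 | 41 | 22 | 3 | 9 |
| 1.46 | 24 | 45 | 11 | 13 | 169 |
| 1.47 | 23 | 44 | 17 | 6 | 36 |
| 1.48 | 22 | 42 | 20 | 2 | 4 |
| 1.48 | 21 | 40 | 23 | -2 | 4 |
| 1.49 | 20 | 43 | 19 | 1 | 1 |
| 1.5 | 19 | 41 | 21 | -2 | 4 |
| 1.5 | 18 | 51 | 3 | 15 | 225 |
| 1.51 | 17 | 45 | 12 | 5 | 25 |
| 1.51 | 16 | 39 | 24 | -8 | 64 |
| 1.52 | 15 | 45 | 13 | 2 | 4 |
| 1.52 | 14 | 33 | 28 | -14 | 196 |
| 1.52 | 13 | 47 | 9 | 4 | 16 |
| 1.53 | 12 | 45 | 14 | -2 | 4 |
| 1.54 | 11 | 48 | 6 | 5 | 25 |
| 1.57 | 10 | 45 | 15 | -5 | 25 |
| 1.59 | 9 | 47 | 8 | 1 | 1 |
| 1.6 | 8 | 50 | 4 | 4 | 16 |
| 1.6 | 7 | 45 | 16 | -9 | 81 |
| 1.6 | 6 | 38 | 25 | -19 | 361 |
| 1.62 | 5 | 48 | 7 | -2 | 4 |
| 1.65 | 4 | 43 | 18 | -14 | 196 |
| 1.65 | 3 | 69 | 2 | 1 | 1 |
| 1.65 | 2 | 35 | 27 | -25 | 625 |
| 1.8 | 1 | 110 | 1 | 0 | 0 |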
Sum of d² = 2840
Spearman's Rank (rho) = 1 - (6 × Sum of d²) / (n(n² - 1)), where n is the number of pupils ranked
Therefore this shows a weak positive correlation. This could have occurred for many reasons. The closer the number is to 1, the stronger the positive correlation.
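The Spearman's Rank calculation can be checked with a short sketch. It assumes the standard formula rho = 1 - 6Σd² / (n(n² - 1)) and uses the 28 rank differences read from the table; the function name is illustrative.

```python
# Rank differences d (height rank minus weight rank) for the 28 Year 7 pupils,
# as read from the ranking table.
rank_differences = [2, 22, 16, 3, 13, 6, 2, -2, 1, -2, 15, 5, -8, 2,
                    -14, 4, -2, 5, -5, 1, 4, -9, -19, -2, -14, 1, -25, 0]


def spearman_rho(differences):
    """Spearman's rank correlation from rank differences (simple formula, no tie correction)."""
    n = len(differences)
    sum_d2 = sum(d * d for d in differences)
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))


sum_d2 = sum(d * d for d in rank_differences)
rho = spearman_rho(rank_differences)
print(sum_d2, round(rho, 3))
```

This simple formula assumes no tied ranks. Several pupils share the same weight here, so the value should be treated as approximate.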
EVALUATION
Summary:
Hypothesis 1 - Boys are taller than girls in Year 7 and 8.
Aim: I will show this data by, firstly, plotting all the heights of boys and girls on separate graphs so that I can compare the heights of all the pupils and easily see the spread of data. Then I will calculate the average height of the boys and girls to see clearly which gender is taller. Furthermore, to support my hypothesis I will plot a cumulative frequency graph with a box and whisker diagram, with boys and girls on the same graph. This will show the quartiles, the spread and the median of the data.
I was able to support this hypothesis by carrying out my aims. I plotted two Excel graphs for each year, male and female, to show the height of each pupil. I then condensed all of this data into a cumulative frequency graph with male and female box and whisker diagrams on it. The cumulative frequency diagram showed that the boys were taller than the girls, and the sample taken for Year 7 provided accurate results. However, when looking at the yellow box plot, which refers to girls, the range is rather large and the tallest female is 1.8m when the mean height for the girls in Year 7 is 1.51m. This result is therefore an anomaly and caused the box and whisker diagram to show a large range. The anomalous result is shown below:
Year 7, Thompson, Bethany Jane, Female, height 1.80m, weight 110kg
If this result had been taken out, the box and whisker diagram would have had a much smaller range. Despite this, the cumulative frequency diagram and box and whisker diagrams produced accurate results and clearly showed that boys are taller than girls in Year 7. The mean was also calculated to find the average heights of the boys and girls. The mean height for the boys was 1.54m, taller than the girls, who averaged 1.51m. The girls were unusually more spread out than the boys, but this came from the anomaly above. Overall, though, the boys were taller than the girls in Year 7.
To further support my hypothesis I decided to check it with another year group. The cumulative frequency diagram again showed that boys were the taller gender. The range this time was larger for the boys, who had more spread in their data, and most of their heights were concentrated around a greater height than the girls'. The boys also had a larger average height at 1.62m, whereas the girls had an average height of 1.59m. Again, the large range for the boys was clear to see on the box plot, where the boys' maximum height went to 2m. This is an anomaly:
Year 8, Vegeta, Goku Krillain, Male, height 2.00m, weight 35kg
Without this anomaly the range would not have been so large for the boys, but the heights were still largely concentrated in one area. Therefore again the boys are the taller gender.
Overall, the results obtained for Hypothesis 1 were accurate enough to give good evidence that boys are taller than girls in Years 7 and 8. There were two anomalies that caused inaccuracies within the data, and with these excluded the data would have been very reliable. The results remain accurate enough to support the hypothesis.
Hypothesis 2 - Boys are heavier than girls in KS4
Aim: To show this data I will use a histogram on which I will plot the weights of both the females and the males. This will show me the spread of data and which weight interval most of the girls and boys fall into, so I can easily tell which gender is the heavier.
Again I was able to support my hypothesis by carrying out the aims above. The histograms clearly showed the heavier gender. In Year 10 there were a lot more males than females in the higher weight intervals, meaning the boys were heavier than the girls. The mean weight for the boys was 58.2kg, a lot larger than the mean weight for the girls, which was 45.2kg. Further evidence that the boys were heavier than the girls was the modal weight for each gender. The girls' modal weights were 36, 42 and 45kg, whereas the boys' modal weights were 57 and 68kg. Therefore the most common weights were higher for the boys than for the girls. The range was also larger for the boys, who had the heaviest person in the year, while the girls had the lightest. In summary, the boys are heavier than the girls in Year 10.
I decided again to double check my hypothesis by studying another age group. The Year 11 data further supported my hypothesis, as it showed again that boys are heavier than girls. However, the histogram didn't provide obvious signs that boys were a lot heavier. It showed that the girls weighed between 40 and 70kg whereas the boys weighed between 30 and 80kg. There was therefore a lot more spread in the boys' data; however, most of the boys weighed between 50 and 80kg and exactly half of the girls weighed between 40 and 50kg. This tells us that the boys on the whole were heavier than the girls in Year 11. The average weight for males was 56.4kg and for the girls it was 53.6kg. Therefore the boys were heavier than the girls, but not by a very large margin. Again, the modal weight summarises the data well. The modal weights for the girls were 42, 48 and 60kg and for the boys the modal weights were 60 and 63kg, larger than those of the girls.
Having summarised the results and graphs, the results obtained for this hypothesis were very accurate. There weren't any real anomalies in this set of data and the results clearly supported my hypothesis.
Hypothesis 3 - Height and weight are positively correlated.
Aim: I will show this data by using a scatter diagram with a line of best fit. This will represent this data best because it will show the correlation clearly. This is the only hypothesis where a scatter graph can be used and I will use 30 points on the graph, which will ensure maximum reliability. Also, to explain the correlation given by the graph I will include Spearman's Rank.
This was a relatively short hypothesis and was tested by using a scatter graph. The scatter graph shows a positive correlation and supports my hypothesis. However, when Spearman's Rank was used to measure the correlation, it showed a very weak positive correlation. In my view this was to do with ranking the data: as many of the weights were the same, it was hard to decide which weight to rank first. This could have interfered with the calculation, making it inaccurate and unreliable. Therefore the results supported my third hypothesis, but not as reliably as the other two.
Conclusions:
I have found evidence for all three of my hypotheses by using various techniques, including averages and graphs. Therefore, after summarising all three hypotheses, I conclude:
* Hypothesis 1 - This provided accurate results, with the exception of a few anomalies. These didn't stop my results from being reliable, and I am able to link back to my prediction and conclude that boys are taller than girls in Years 7 and 8.
* Hypothesis 2 - This was the hypothesis that produced the most accurate and reliable data. There were no real exceptions to the pattern, which was clearly that boys are heavier than girls in KS4.
* Hypothesis 3 - This was the only hypothesis that was slightly unreliable. A positive correlation was obtained on the scatter graph, but when tested with Spearman's Rank a very poor correlation was calculated. The general trend, though, showed a positive correlation between height and weight.
Overall effectiveness of method:
Overall, the data that was collected and sampled provided us with accurate results. I say this because I was able to support each of my hypotheses with the results. My predictions were matched when I tested them using cumulative frequency diagrams, box and whisker plots, histograms and a scatter graph. The patterns in the data showed that boys were taller than girls in the years studied, that boys were heavier than girls in the years studied, and that height and weight were positively correlated. Exceptions to the patterns were because of anomalies, which were highlighted but not taken out, as this would have interfered with our sample size. There were a few problems with the method, which caused some limitations, but on the whole it provided enough accuracy to support all my hypotheses.
Limitations:
* A small sample size of only 10% limited the amount of data that we had and this consequently reduced our accuracy.
* We limited ourselves to studying only one school, Mayfield High. If we had studied more schools across the country, it would have made our project a lot more effective and reliable. This would have enabled us to check whether these patterns hold across the country.
Improvements:
There are many ways in which the method could be improved to provide us with better and more accurate results. Firstly, a more accurate method of sampling could have been used. The anomalies that occurred caused inaccuracies in our graphs; by not only highlighting these but also removing them from the sample list, our results would have been more accurate.
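One way to pick out anomalies consistently, instead of by eye, is to flag any value lying more than 1.5 times the interquartile range beyond the quartiles. This rule is a common convention and is my assumption here, not the method used in this project; the heights below are hypothetical.

```python
from statistics import quantiles


def flag_anomalies(values, k=1.5):
    """Return the values lying more than k * IQR below Q1 or above Q3."""
    q1, _median, q3 = quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical heights in metres with one obvious anomaly.
heights = [1.48, 1.50, 1.52, 1.55, 1.56, 1.57, 1.60, 1.62, 1.65, 2.00]
anomalies = flag_anomalies(heights)
cleaned = [h for h in heights if h not in anomalies]
print(anomalies, len(cleaned))
```

Removing flagged values shrinks the sample, so both the full and the cleaned lists should be kept and compared.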
Another huge improvement would have been to increase our sample size. We only took a 10% sample, and this left us with small numbers for each year group. By doubling this sample to about 20%, we would have had a lot more pupils to present data with. This would have led to increased reliability.
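The effect of moving from a 10% to a 20% sample can be sketched with a simple stratified sample, where the same fraction is drawn at random from each year group. The year-group sizes and pupil numbers below are hypothetical placeholders, not figures from the Mayfield database.

```python
import random


def stratified_sample(groups, fraction, seed=0):
    """Draw the same fraction of pupils at random from every year group."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sample = {}
    for year, pupils in groups.items():
        size = max(1, round(len(pupils) * fraction))
        sample[year] = rng.sample(pupils, size)
    return sample


# Hypothetical year-group sizes; each pupil is just a numbered record here.
year_sizes = {"Year 7": 280, "Year 8": 270, "Year 9": 260, "Year 10": 200, "Year 11": 170}
groups = {year: list(range(size)) for year, size in year_sizes.items()}

ten_percent = stratified_sample(groups, 0.10)
twenty_percent = stratified_sample(groups, 0.20)
for year in groups:
    print(year, len(ten_percent[year]), len(twenty_percent[year]))
```

Because each year group is sampled separately, doubling the fraction doubles the number of pupils in every year group, not just in the total.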
To refine our project we could have also used control groups and pre-sampling.
Significance of statistical results:
The statistical results were very significant in proving my first two hypotheses. Before drawing any graphs the statistics needed to be there and by glancing at these it could be seen which gender was either heavier or taller. The mean or average was key to knowing which gender was taller or heavier on average and the modal weight also further backed up the hypotheses. However also crucial was the standard deviation which clearly showed the spread of data. Therefore the statistical results were key in backing up my graphs and helped in proving my first two hypotheses.
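As an illustration of how these statistics back up a comparison, the sketch below summarises the Year 10 weights listed in the appendix sample. It covers only one of the two KS4 year groups, it uses the population form of standard deviation (an assumption on my part), and it is not the original calculation from this project.

```python
from statistics import mean, multimode, pstdev

# Year 10 weights as listed in the appendix sample.
girls_weights = [51, 45, 45, 36, 60, 36, 42, 50, 42]
boys_weights = [57, 40, 57, 50, 62, 58, 60, 56, 68, 64, 68]


def summarise(weights):
    """Mean, modal value(s) and population standard deviation of a list."""
    return {
        "mean": mean(weights),
        "modes": multimode(weights),  # every value sharing the highest frequency
        "sd": pstdev(weights),
    }


for label, weights in (("Girls", girls_weights), ("Boys", boys_weights)):
    stats = summarise(weights)
    print(f"{label}: mean {stats['mean']:.1f}, modes {stats['modes']}, sd {stats['sd']:.1f}")
```

Comparing the two means shows which gender is heavier on average, while the standard deviations show how spread out each group is around its mean.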
Practical consequences of work:
From writing this project I have learnt invaluable mathematical skills, such as different sampling techniques and how to sort data efficiently. I have learnt how to take an even sample and how to prove my hypotheses mathematically. Using cumulative frequency diagrams, box and whisker plots and histograms has increased my mathematical ability, as has the inclusion of standard deviation and Spearman's Rank.
Extension:
In this section I will collect primary data, this time to test whether age has an influence on height and weight. This is data I have collected myself, whereas the data used from Mayfield was secondary data, as I did not collect it.
The primary data will again be sampled, and I will again take a 10% sample of the overall data. This gives me a sample of 18 pupils.
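| Age | Height | Weight |
|---|---|---|
| 183 | 183 | 77 |
| 134 | 158 | 45 |
| 134 | 125 | 48 |
| 137 | 151 | 52 |
| 140 | 160 | 51 |
| 138 | 154 | 45 |
| 139 | 154 | 38 |
| 144 | 141 | 35 |
| 142 | 152 | 49 |
| 143 | 151 | 42 |
| 137 | 152 | 34 |
| 186 | 186 | 68 |
| 183 | 177 | 59 |
| 183 | 185 | 77 |
| 187 | 170 | 50 |
| 186 | 157 | 50 |
| 193 | 176 | 63 |
| 191 | 184 | 65 |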
With the data above I will prove that age has a big influence on height. To do this I will use Spearman's Rank to see what correlation I come out with.
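| Age | Age rank | Height | Height rank | Difference (d) | d² |
|---|---|---|---|---|---|
| 193 | 18 | 176 | 13 | 5 | 25 |
| 191 | 17 | 184 | 16 | 1 | 1 |
| 187 | 16 | 170 | 12 | 4 | 16 |
| 186 | 15 | 186 | 18 | -3 | 9 |
| 186 | 14 | 157 | 9 | 5 | 25 |
| 183 | 13 | 183 | 15 | -2 | 4 |
| 183 | 12 | 177 | 14 | -2 | 4 |
| 183 | 11 | 185 | 17 | -6 | 36 |
| 144 | 10 | 141 | 2 | 8 | 64 |
| 143 | 9 | 151 | 4 | 5 | 25 |
| 142 | 8 | 152 | 5 | 3 | 9 |
| 140 | 7 | 160 | 11 | -4 | 16 |
| 139 | 6 | 154 | 8 | -2 | 4 |
| 138 | 5 | 154 | 7 | -2 | 4 |
| 137 | 4 | 151 | 3 | 1 | 1 |
| 137 | 3 | 152 | 6 | -3 | 9 |
| 134 | 2 | 158 | 10 | -8 | 64 |
| 134 | 1 | 125 | 1 | 0 | 0 |
| Total | | | | | 316 |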
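As a check on the total above, the standard Spearman's Rank formula can be applied with n = 18, the sample size stated for this extension. This is a worked sketch using the version of the formula that assumes no tied ranks, not a reproduction of the original working:

```latex
r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}
    = 1 - \frac{6 \times 316}{18\,(18^2 - 1)}
    = 1 - \frac{1896}{5814}
    \approx 0.67
```

Several of the ages and heights in the sample are equal, so giving tied values an averaged rank could change this value.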
Therefore this proves my hypothesis for my extension, as the Spearman's Rank coefficient calculated from this total is reasonably close to 1 and so shows a fairly strong positive correlation. This means that as age increases, height increases.
Appendix
Sample of Pupils:
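| Year | Surname | Forename(s) | Gender | Height (m) | Weight (kg) |
|---|---|---|---|---|---|
| 7 | Black | Sarah | Female | 1.48 | 42 |
| 7 | Casey | Danielle | Female | 1.46 | 45 |
| 7 | Cullen | Sarah | Female | 1.57 | 45 |
| 7 | Earnshaw | Kayleigh Louise | Female | 1.48 | 40 |
| 7 | Harrison | Julie Jane | Female | 1.30 | 36 |
| 7 | Hunt | Melissa Ann | Female | 1.65 | 43 |
| 7 | Kelly | Danielle | Female | 1.47 | 44 |
| 7 | Lloyd | Leanne | Female | 1.52 | 45 |
| 7 | Minton | Jennifer | Female | 1.60 | 50 |
| 7 | Parreen | Saika | Female | 1.60 | 45 |
| 7 | Sing | Asha | Female | 1.43 | 45 |
| 7 | Thompson | Bethany Jane | Female | 1.80 | 10 |
| 7 | Wing | Jane Chow | Female | 1.52 | 33 |
| 7 | Andrew | Sohail Farooq | Male | 1.50 | 41 |
| 7 | Bolton | Paul | Male | 1.65 | 69 |
| 7 | Calins | Luke | Male | 1.60 | 38 |
| 7 | Connaugton | Nick Michael | Male | 1.62 | 48 |
| 7 | Dickinson | Ben | Male | 1.49 | 43 |
| 7 | Hardy | Rhys | Male | 1.51 | 45 |
| 7 | Jebron | Aysham | Male | 1.59 | 47 |
| 7 | Lewis | Declon | Male | 1.54 | 48 |
| 7 | Marriot | Jospheh David | Male | 1.50 | 51 |
| 7 | McMillan | William James | Male | 1.65 | 35 |
| 7 | Patel | Sean Wasim | Male | 1.52 | 47 |
| 7 | Sharpe | Billy Richard | Male | 1.43 | 41 |
| 7 | SUBRA | Kamber | Male | 1.41 | 50 |
| 7 | Vanvo | Ho | Male | 1.51 | 39 |
| 7 | Winstanley | Paul | Male | 1.53 | 45 |
| 8 | Barlow | Billie | Female | 1.62 | 49 |
| 8 | Bolton | Tracy Katie | Female | 1.63 | 45 |
| 8 | Connify | Elizabeth Claire | Female | 1.59 | 38 |
| 8 | Dom | Kate | Female | 1.59 | 50 |
| 8 | Hall | Gina Louise | Female | 1.42 | 50 |
| 8 | Hest | Louise Sarah | Female | 1.70 | 45 |
| 8 | Janeadoni | Natalie | Female | 1.70 | 43 |
| 8 | Kim | Sannita | Female | 1.62 | 46 |
| 8 | McDonald | Kath Sarah | Female | 1.71 | 40 |
| 8 | Nedpod | Jane Maria | Female | 1.51 | 38 |
| 8 | Right | Emma Louise | Female | 1.51 | 48 |
| 8 | Swali | Millias Karam | Female | 1.42 | 29 |
| 8 | Alfred | Anthony | Male | 1.72 | 51 |
| 8 | Bolard | Mike | Male | 1.56 | 59 |
| 8 | Carpenter | Daniel Bradley | Male | 1.55 | 51 |
| 8 | Cropper | Caio | Male | 1.50 | 45 |
| 8 | Drayton | Benjamin | Male | 1.55 | 43 |
| 8 | Greig | Richard John | Male | 1.68 | 45 |
| 8 | Irving | Bradley | Male | 1.80 | 57 |
| 8 | Kite | Jaike | Male | 1.54 | 42 |
| 8 | Marsh | Warren Anthony | Male | 1.72 | 57 |
| 8 | Moore | Robert Lee | Male | 1.32 | 35 |
| 8 | Phil | Faheem | Male | 1.57 | 51 |
| 8 | Shane | Paul | Male | 1.50 | 50 |
| 8 | Strange | Frank Fred | Male | 1.68 | 69 |
| 8 | Vegeta | Goku Krillain | Male | 2.00 | 35 |
| 8 | Wordsly | Kyle | Male | 1.60 | 43 |
| 9 | Barlow | Sandra Jane | Female | 1.64 | 55 |
| 9 | Brians | Holly | Female | 1.63 | 47 |
| 9 | Caby | Karen Erin | Female | 1.55 | 66 |
| 9 | Clough | Samantha Louise | Female | 1.71 | 54 |
| 9 | Gordon | Cicila Ruby | Female | 1.7 | 47 |
| 9 | Hardy | Ingrid | Female | 1.64 | 40 |
| 9 | Johnson | Claire Nichola | Female | 1.6 | 46 |
| 9 | Khan | Humspira | Female | 1.69 | 48 |
| 9 | Martin | Lisa | Female | 1.70 | 49 |
| 9 | O'Donald | Louise Krisila | Female | 1.55 | 36 |
| 9 | Read | Louise Emma | Female | 1.51 | 48 |
| 9 | Silverstone | Julie Margaret | Female | 1.59 | 50 |
| 9 | Taylor | Elizabeth Ann | Female | 1.40 | 41 |
| 9 | Williams | Ashley Charlene | Female | 1.65 | 48 |
| 9 | Anakin | Pauya | Male | 1.65 | 45 |
| 9 | Boyo | Jon | Male | 1.47 | 42 |
| 9 | Crawford | Jamie David | Male | 1.80 | 51 |
| 9 | Fenton | Aaron Mark | Male | 1.85 | 55 |
| 9 | Hardy | Jeff Matthew | Male | 1.57 | 45 |
| 9 | Hunt | Gareth Blake | Male | 1.61 | 62 |
| 9 | Jones | Brian Simon | Male | 1.32 | 38 |
| 9 | Luckler | Ryan | Male | 1.81 | 68 |
| 9 | Mozin | Warren Randell | Male | 1.72 | 57 |
| 9 | Samas | Sam | Male | 1.48 | 43 |
| 9 | Suggat | Bob | Male | 1.70 | 50 |
| 9 | Wilson | David Daniel | Male | 1.71 | 44 |
| 10 | Blashaw | Holly | Female | 1.73 | 51 |
| 10 | Cell | Jill | Female | 1.47 | 45 |
| 10 | Fawn | Attoosa | Female | 1.73 | 45 |
| 10 | Hall | Jane Samantha | Female | 1.51 | 36 |
| 10 | Kelson | Nina Leilah | Female | 1.80 | 60 |
| 10 | Mitchelle | Kiran | Female | 1.58 | 36 |
| 10 | Riley | Charlotte | Female | 1.62 | 42 |
| 10 | Slater | Sara | Female | 1.60 | 50 |
| 10 | Turner | Sarah | Female | 1.39 | 42 |
| 10 | Arnold | Kevin | Male | 1.70 | 57 |
| 10 | Brown | Thomas Andrew | Male | 1.63 | 40 |
| 10 | Collins | Alex Matthew | Male | 1.55 | 57 |
| 10 | Faraday | Micheal Joseph | Male | 1.65 | 50 |
| 10 | Honda | Pablo | Male | 1.85 | 62 |
| 10 | King | Joseph | Male | 1.70 | 58 |
| 10 | Little | Adam | Male | 1.67 | 60 |
| 10 | Paul | Thomas | Male | 1.61 | 56 |
| 10 | Saj | Shida Raja | Male | 1.65 | 68 |
| 10 | Smith | Saf | Male | 1.89 | 64 |
| 10 | Walton | Kevin Trevor | Male | 1.80 | 68 |
| 11 | Berry | Shelly Laura | Female | 1.73 | 64 |
| 11 | Buyram | Dawn Elizabeth | Female | 1.65 | 42 |
| 11 | Feehily | Christina Jean | Female | 1.72 | 60 |
| 11 | Heap | Louise Stephanie | Female | 1.80 | 42 |
| 11 | Kelly | Sarah | Female | 1.55 | 60 |
| 11 | McMillan | Collen Jade | Female | 1.58 | 48 |
| 11 | Protrochi | Ivanova Sege | Female | 1.54 | 65 |
| 11 | Thomson | Jade Louise | Female | 1.52 | 48 |
| 11 | Beck | William Lewis | Male | 1.72 | 63 |
| 11 | Chidgley | Steven | Male | 1.62 | 52 |
| 11 | Downey | Colin Clarke | Male | 1.68 | 50 |
| 11 | Hawkins | Tim | Male | 1.62 | 63 |
| 11 | Justice | Tony Philip | Male | 1.67 | 60 |
| 11 | Mamood | Keith Norman | Male | 1.68 | 58 |
| 11 | Olderson | Stuart Martin | Male | 1.62 | 48 |
| 11 | Singh | Norman Murray | Male | 1.51 | 38 |
| 11 | Warne | Michael Barry | Male | 1.84 | 76 |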
David Osborne Mayfield High Data Handling