# Analysing the height and weight of boys and girls in year 11 of Mayfield High School.


Statistics

## Introduction

In this investigation I will be analysing the height and weight of boys and girls in year 11 of Mayfield High School.

I will use different statistical methods to examine carefully the trends and patterns within this selected group of people. I must take a sufficient number of boys and girls to enable me to draw various graphs of my results and to write a firm conclusion supported by them. There are 170 students in year 11 in total; I will take a sample of 30 boys and 30 girls, 60 students in all.

## The Data

Below I have drawn a table showing the heights and weights of the 30 girls and 30 boys sampled from year 11. I chose to sample 30 boys and 30 girls because a sample of this size gives me a wide range of results to analyse.

| Girl | Height (m) | Weight (kg) | Boy | Height (m) | Weight (kg) |
|---|---|---|---|---|---|
| 1 | 1.62 | 54 | 1 | 1.85 | 73 |
| 2 | 1.72 | 51 | 2 | 1.72 | 58 |
| 3 | 1.65 | 54 | 3 | 1.32 | 45 |
| 4 | 1.65 | 42 | 4 | 1.62 | 52 |
| 5 | 1.68 | 48 | 5 | 1.62 | 56 |
| 6 | 1.61 | 54 | 6 | 1.61 | 54 |
| 7 | 1.61 | 54 | 7 | 1.67 | 50 |
| 8 | 1.52 | 38 | 8 | 1.88 | 75 |
| 9 | 1.70 | 50 | 9 | 1.55 | 54 |
| 10 | 1.68 | 47 | 10 | 1.51 | 40 |
| 11 | 1.75 | 56 | 11 | 1.78 | 67 |
| 12 | 1.63 | 45 | 12 | 1.60 | 38 |
| 13 | 1.59 | 45 | 13 | 1.61 | 47 |
| 14 | 1.76 | 56 | 14 | 1.68 | 50 |
| 15 | 1.71 | 42 | 15 | 1.78 | 37 |
| 16 | 1.52 | 48 | 16 | 2.03 | 86 |
| 17 | 1.53 | 42 | 17 | 1.70 | 72 |
| 18 | 1.60 | 55 | 18 | 1.65 | 45 |
| 19 | 1.70 | 54 | 19 | 1.70 | 54 |
| 20 | 1.69 | 50 | 20 | 1.80 | 62 |
| 21 | 1.65 | 48 | 21 | 1.84 | 76 |
| 22 | 1.67 | 48 | 22 | 1.67 | 66 |
| 23 | 1.83 | 60 | 23 | 1.71 | 57 |
| 24 | 1.67 | 52 | 24 | 1.52 | 60 |
| 25 | 1.60 | 54 | 25 | 1.72 | 63 |
| 26 | 1.63 | 44 | 26 | 1.91 | 82 |
| 27 | 1.63 | 44 | 27 | 1.77 | 57 |
| 28 | 1.72 | 51 | 28 | 1.80 | 60 |
| 29 | 1.68 | 54 | 29 | 1.86 | 56 |
| 30 | 1.73 | 64 | 30 | 1.81 | 72 |

## Tally Charts

The first statistical method I have chosen to use is the tally chart. A tally chart records how many values (the frequency) fall into each weight or height range, and from it I can draw bar charts. I will do this four times in all: for the boys' heights and weights, and for the girls' heights and weights. I will then draw frequency tables from my tally chart data to show how many of the 30 boys and 30 girls fall into each class. Below I have drawn these tally charts.

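The table below shows the height distribution of the 30 boys. As you can see, I have grouped the original data into class intervals 10 cm (0.10 m) wide so that I have a sufficient number of intervals. Each class includes its lower boundary but not its upper boundary, so a height of exactly 1.70 m is counted in the 1.70-1.80 class.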

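| Class interval (m) | Mid-value | Frequency | Mid-value × Frequency |
|---|---|---|---|
| 1.30-1.40 | 1.35 | 1 | 1.35 |
| 1.40-1.50 | 1.45 | 0 | 0 |
| 1.50-1.60 | 1.55 | 3 | 4.65 |
| 1.60-1.70 | 1.65 | 9 | 14.85 |
| 1.70-1.80 | 1.75 | 8 | 14.00 |
| 1.80-1.90 | 1.85 | 7 | 12.95 |
| 1.90-2.00 | 1.95 | 1 | 1.95 |
| 2.00-2.10 | 2.05 | 1 | 2.05 |
| Total | | 30 | 51.80 |

$$\text{Mean} = \frac{\text{Total (mid-value} \times \text{frequency)}}{\text{Total frequency}} = \frac{51.80}{30} \approx 1.73$$

Modal group = class with the highest frequency = 1.60-1.70, approximate mode = 1.65

The mid-value of the modal class gives an approximate value for the mode.

Median class = class which contains the middle value. With n = 30, the middle value is the 15.5th. The cumulative frequency is 13 up to 1.70 and 21 up to 1.80, so the median class is 1.70-1.80, approximate median = 1.75

The mid-value of the median class gives an approximate value for the median.

Range = maximum value (upper boundary of highest class) − minimum value (lower boundary of lowest class)

Range = 2.10 − 1.30 = 0.80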
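To make the tallying and grouped-mean steps concrete, here is a minimal Python sketch for the boys' heights. It assumes each class includes its lower boundary but not its upper one, so 1.70 m is counted in 1.70-1.80. The names `BOYS_HEIGHTS` and `frequency_table` are illustrative and not part of the original work.

```python
# Boys' heights in metres, copied from the data table (boys 1-30 in order).
BOYS_HEIGHTS = [
    1.85, 1.72, 1.32, 1.62, 1.62, 1.61, 1.67, 1.88, 1.55, 1.51,
    1.78, 1.60, 1.61, 1.68, 1.78, 2.03, 1.70, 1.65, 1.70, 1.80,
    1.84, 1.67, 1.71, 1.52, 1.72, 1.91, 1.77, 1.80, 1.86, 1.81,
]


def frequency_table(heights_m, start_cm=130, width_cm=10, n_classes=8):
    """Tally heights into classes [lower, upper).

    Heights are converted to whole centimetres first, so a boundary value
    such as 1.70 m is not misplaced by floating-point rounding.
    Returns a list of (lower, upper, mid_value, frequency) tuples.
    """
    freqs = [0] * n_classes
    for h in heights_m:
        cm = round(h * 100)
        index = (cm - start_cm) // width_cm
        if not 0 <= index < n_classes:
            raise ValueError(f"{h} m lies outside the classes")
        freqs[index] += 1
    table = []
    for i, f in enumerate(freqs):
        lower_cm = start_cm + i * width_cm
        mid = (lower_cm + width_cm / 2) / 100
        table.append((lower_cm / 100, (lower_cm + width_cm) / 100, mid, f))
    return table


table = frequency_table(BOYS_HEIGHTS)
total_f = sum(row[3] for row in table)
total_fx = sum(row[2] * row[3] for row in table)  # total of mid-value x frequency
estimated_mean = total_fx / total_f

for lower, upper, mid, f in table:
    print(f"{lower:.2f}-{upper:.2f}  mid {mid:.2f}  f = {f}  mid x f = {mid * f:.2f}")
print(f"Total frequency = {total_f}, total = {total_fx:.2f}, estimated mean = {estimated_mean:.2f}")
```

Working in whole centimetres is a design choice. Dividing `1.70 - 1.30` by `0.10` directly in floating point can land just below 4 and put a boundary value in the wrong class.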

The table below shows the weights (kg) of the 30 boys.

| Class interval (kg) | Mid-value | Frequency | Mid-value × Frequency |
|---|---|---|---|
| 30-40 | 35 | 2 | 70 |
| 40-50 | 45 | 4 | 180 |
| 50-60 | 55 | 11 | 605 |
| 60-70 | 65 | 6 | 390 |
| 70-80 | 75 | 5 | 375 |
| 80-90 | 85 | 2 | 170 |
| Total | | 30 | 1,790 |

$$\text{Mean} = \frac{\text{Total (mid-value} \times \text{frequency)}}{\text{Total frequency}} = \frac{1790}{30} \approx 59.67$$

Modal group = 50-60, approximate mode = 55

Median group = 50-60, approximate median = 55

Range = 90 − 30 = 60
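The modal class and median class can be found mechanically from a running (cumulative) total of the frequencies. The sketch below does this for the boys' weights. It again assumes 10 kg classes that include their lower boundary, and all names in it are illustrative.

```python
from collections import Counter
from itertools import accumulate

# Boys' weights in kg, copied from the data table (boys 1-30 in order).
BOYS_WEIGHTS = [
    73, 58, 45, 52, 56, 54, 50, 75, 54, 40,
    67, 38, 47, 50, 37, 86, 72, 45, 54, 62,
    76, 66, 57, 60, 63, 82, 57, 60, 56, 72,
]
WIDTH = 10  # class width in kg

# Each weight is counted in the class whose lower boundary is the weight
# rounded down to a multiple of 10, so 50 kg goes in 50-60.
counts = Counter(WIDTH * (w // WIDTH) for w in BOYS_WEIGHTS)
lowers = list(range(min(counts), max(counts) + WIDTH, WIDTH))  # keeps any empty classes
freqs = [counts[lo] for lo in lowers]
cumulative = list(accumulate(freqs))
n = cumulative[-1]

# Modal class: the class with the highest frequency (the first one if tied).
modal_lower = max(lowers, key=lambda lo: counts[lo])

# Median class: the first class whose cumulative frequency reaches 1/2(n + 1).
median_position = (n + 1) / 2
median_lower = next(lo for lo, cf in zip(lowers, cumulative) if cf >= median_position)

# Range from class boundaries: top of the highest class minus bottom of the lowest.
class_range = (lowers[-1] + WIDTH) - lowers[0]

for lo, f, cf in zip(lowers, freqs, cumulative):
    print(f"{lo}-{lo + WIDTH}: frequency {f}, cumulative {cf}")
print(f"Modal class {modal_lower}-{modal_lower + WIDTH}, approximate mode {modal_lower + WIDTH / 2}")
print(f"Median class {median_lower}-{median_lower + WIDTH}, approximate median {median_lower + WIDTH / 2}")
print(f"Range = {class_range}")
```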

The table below shows the heights (m) of the 30 girls.

| Class interval (m) | Mid-value | Frequency | Mid-value × Frequency |
|---|---|---|---|
| 1.50-1.60 | 1.55 | 4 | 6.20 |
| 1.60-1.70 | 1.65 | 17 | 28.05 |
| 1.70-1.80 | 1.75 | 8 | 14.00 |
| 1.80-1.90 | 1.85 | 1 | 1.85 |
| Total | | 30 | 50.10 |

$$\text{Mean} = \frac{\text{Total (mid-value} \times \text{frequency)}}{\text{Total frequency}} = \frac{50.10}{30} \approx 1.67$$

Median group = 1.60-1.70, approximate median = 1.65

Modal group = 1.60-1.70, approximate mode = 1.65

Range = 1.90 − 1.50 = 0.40

The table below shows the weights (kg) of the 30 girls.

| Class interval (kg) | Mid-value | Frequency | Mid-value × Frequency |
|---|---|---|---|
| 30-40 | 35 | 1 | 35 |
| 40-50 | 45 | 12 | 540 |
| 50-60 | 55 | 15 | 825 |
| 60-70 | 65 | 2 | 130 |
| Total | | 30 | 1,530 |

$$\text{Mean} = \frac{\text{Total (mid-value} \times \text{frequency)}}{\text{Total frequency}} = \frac{1530}{30} = 51.00$$

Median group = 50-60, approximate median = 55

Modal group = 50-60, approximate mode = 55

Range = 70 − 30 = 40

On the cumulative frequency graphs I have labelled three values in particular: the median, the lower quartile and the upper quartile. Each of these is explained below.

## Median

To find the median (middle value) of a set of data, you would usually arrange the values in ascending order and find the middle one. If n is the total number of values, the median is the ½(n+1)th value. For example, with n = 30 the median is the 15.5th value, halfway between the 15th and 16th values.
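The ½(n+1) rule translates directly into code. This illustrative sketch finds the median of the 30 boys' heights and cross-checks it against Python's standard `statistics.median`. The function name `median_by_position` is hypothetical.

```python
import math
import statistics

# Boys' heights in metres, copied from the data table (boys 1-30 in order).
BOYS_HEIGHTS = [
    1.85, 1.72, 1.32, 1.62, 1.62, 1.61, 1.67, 1.88, 1.55, 1.51,
    1.78, 1.60, 1.61, 1.68, 1.78, 2.03, 1.70, 1.65, 1.70, 1.80,
    1.84, 1.67, 1.71, 1.52, 1.72, 1.91, 1.77, 1.80, 1.86, 1.81,
]


def median_by_position(values):
    """Median as the 1/2(n+1)th value of the sorted data (positions counted from 1).

    When 1/2(n+1) is not a whole number (n even), take the value halfway
    between the two neighbouring positions.
    """
    ordered = sorted(values)
    position = (len(ordered) + 1) / 2
    below = ordered[math.floor(position) - 1]
    above = ordered[math.ceil(position) - 1]
    return (below + above) / 2


m = median_by_position(BOYS_HEIGHTS)
print(f"n = {len(BOYS_HEIGHTS)}, median position = {(len(BOYS_HEIGHTS) + 1) / 2}, median = {m:.3f} m")
# Cross-check against the standard library.
print(f"statistics.median gives {statistics.median(BOYS_HEIGHTS):.3f} m")
```

Here the 15th and 16th heights are 1.70 m and 1.71 m, so the median is 1.705 m.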

This suggests that to find the median from a cumulative frequency curve, you find ½(n+1) on the vertical axis (where n is the total frequency), draw a horizontal line across to the curve, and read off the corresponding value on the horizontal axis.
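Reading a value off a cumulative frequency graph can be imitated in code. The sketch plots the cumulative frequencies at the upper class boundaries and joins them with straight lines, which gives a cumulative frequency polygon. This only approximates a hand-drawn smooth curve, so the result may differ slightly from a graph reading. The frequencies are the boys' height tallies in 0.10 m classes, and the function name `read_off` is hypothetical.

```python
# Class boundaries (m) and frequencies for the boys' heights, grouped into
# 0.10 m classes with each class including its lower boundary.
BOUNDARIES = [1.30, 1.40, 1.50, 1.60, 1.70, 1.80, 1.90, 2.00, 2.10]
FREQUENCIES = [1, 0, 3, 9, 8, 7, 1, 1]


def read_off(boundaries, freqs, target_cf):
    """Return the horizontal-axis value where the polygon reaches target_cf.

    Cumulative frequencies are plotted at the upper class boundaries
    (0 at the lowest boundary) and joined with straight lines.
    """
    cumulative = [0]
    for f in freqs:
        cumulative.append(cumulative[-1] + f)
    if not 0 < target_cf <= cumulative[-1]:
        raise ValueError("target must be between 0 and the total frequency")
    for i in range(1, len(cumulative)):
        if cumulative[i] >= target_cf:
            # cumulative[i - 1] < target_cf here, so y1 > y0 and there is no division by zero.
            x0, x1 = boundaries[i - 1], boundaries[i]
            y0, y1 = cumulative[i - 1], cumulative[i]
            return x0 + (target_cf - y0) / (y1 - y0) * (x1 - x0)


n = sum(FREQUENCIES)
median_estimate = read_off(BOUNDARIES, FREQUENCIES, (n + 1) / 2)
print(f"n = {n}; median read off at cumulative frequency {(n + 1) / 2}: {median_estimate:.3f} m")
```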

## Upper, Lower quartile and Interquartile range

Knowing the range of a frequency distribution only tells me about the extreme values. To see how the data are spread around the median, the data (in ascending order) are divided into four quarters.

The value one quarter of the way through the data is called the lower (or first) quartile. The middle value, or second quartile, is the median itself. The value three quarters of the way through the data is called the upper (or third) quartile.

If the total frequency, n, is large, the first quartile is at cumulative frequency ¼n and the third quartile is at ¾n. If n is small, the first quartile is at position ¼(n+1) and the third quartile is at ¾(n+1).

The difference between the upper and lower quartiles is called the interquartile range. In any frequency distribution, half of the data lie within the interquartile range. This is a very useful way to measure the spread of a set of data, since it includes only the middle half of the data and so avoids distortion caused by unusually large or small values. There are other, more precise ways to measure the spread of a set of data; one of them is shown in the next section.
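The sketch below applies the ¼(n+1) and ¾(n+1) positions to the 30 boys' heights. When a position falls between two values, it interpolates linearly. That is an assumption: textbooks and software use several slightly different quartile conventions. The sketch also compares the range and the interquartile range with and without the shortest boy (1.32 m), to illustrate that the interquartile range is not distorted by an unusual value. The function names are hypothetical.

```python
import math

# Boys' heights in metres, copied from the data table (boys 1-30 in order).
BOYS_HEIGHTS = [
    1.85, 1.72, 1.32, 1.62, 1.62, 1.61, 1.67, 1.88, 1.55, 1.51,
    1.78, 1.60, 1.61, 1.68, 1.78, 2.03, 1.70, 1.65, 1.70, 1.80,
    1.84, 1.67, 1.71, 1.52, 1.72, 1.91, 1.77, 1.80, 1.86, 1.81,
]


def value_at_position(ordered, position):
    """Value at a 1-based position in sorted data, interpolating between neighbours."""
    k = math.floor(position)
    if k < 1:
        return ordered[0]
    if k >= len(ordered):
        return ordered[-1]
    fraction = position - k
    return ordered[k - 1] + fraction * (ordered[k] - ordered[k - 1])


def spread(values):
    """Return (range, lower quartile, upper quartile, interquartile range)."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = value_at_position(ordered, (n + 1) / 4)      # 1/4(n+1)th value
    q3 = value_at_position(ordered, 3 * (n + 1) / 4)  # 3/4(n+1)th value
    return ordered[-1] - ordered[0], q1, q3, q3 - q1


full = spread(BOYS_HEIGHTS)
# Remove the unusually short boy (1.32 m) to see which measure he distorts.
without_shortest = spread([h for h in BOYS_HEIGHTS if h != 1.32])
for label, (rng, q1, q3, iqr) in [("all 30 boys", full), ("without 1.32 m", without_shortest)]:
    print(f"{label}: range = {rng:.3f}, Q1 = {q1:.4f}, Q3 = {q3:.4f}, IQR = {iqr:.4f}")
```

With all 30 boys the range is 0.71 m and the interquartile range is 0.185 m. Removing the 1.32 m value cuts the range to 0.52 m but leaves the interquartile range at 0.185 m.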

## Deviation from the mean

The distance of a value from the mean is called its deviation from the mean. Because my data above were grouped, those statistics were only estimates. I will now use the original data to calculate the deviations exactly. First, to find the mean, I add up all the heights of the 30 boys and divide by 30:

$$\bar{x} = \frac{\sum x}{n} = \frac{51.29}{30} \approx 1.71$$

| Height ($x$) | Mean ($\bar{x}$) | Deviation ($x - \bar{x}$) |
|---|---|---|
| 1.85 | 1.71 | 0.14 |
| 1.72 | 1.71 | 0.01 |
| 1.32 | 1.71 | -0.39 |
| 1.62 | 1.71 | -0.09 |
| 1.62 | 1.71 | -0.09 |
| 1.61 | 1.71 | -0.10 |
| 1.67 | 1.71 | -0.04 |
| 1.88 | 1.71 | 0.17 |
| 1.55 | 1.71 | -0.16 |
| 1.51 | 1.71 | -0.20 |
| 1.78 | 1.71 | 0.07 |
| 1.60 | 1.71 | -0.11 |
| 1.61 | 1.71 | -0.10 |
| 1.68 | 1.71 | -0.03 |
| 1.78 | 1.71 | 0.07 |
| 2.03 | 1.71 | 0.32 |
| 1.70 | 1.71 | -0.01 |
| 1.65 | 1.71 | -0.06 |
| 1.70 | 1.71 | -0.01 |
| 1.80 | 1.71 | 0.09 |
| 1.84 | 1.71 | 0.13 |
| 1.67 | 1.71 | -0.04 |
| 1.71 | 1.71 | 0.00 |
| 1.52 | 1.71 | -0.19 |
| 1.72 | 1.71 | 0.01 |
| 1.91 | 1.71 | 0.20 |
| 1.77 | 1.71 | 0.06 |
| 1.80 | 1.71 | 0.09 |
| 1.86 | 1.71 | 0.15 |
| 1.81 | 1.71 | 0.10 |

The table above has three columns. The third column is the deviation: it is found by subtracting the mean ($\bar{x}$) from each value in the height column ($x$). Heights below the mean have negative deviations.

## Mean Deviation

If you try to find the mean of the deviations in the usual way, you will find that the answer is zero, because the positive and negative deviations cancel out. The deviations in the table add up to −0.01 only because the mean was rounded to 1.71.

Finding the mean of the deviations in this way takes into account which side of the mean each value lies (i.e. whether the deviation is positive or negative). To measure spread, this is not necessary.

To make the values more useful, just consider the size of each deviation and ignore its direction. This positive value is called the modulus (sometimes shortened to mod) and is written like this:

$$|x - \bar{x}|$$

The mean size of the deviation can now be calculated:

$$\text{Mean deviation} = \frac{\sum |x - \bar{x}|}{n}$$

The $\sum$ sign means "the sum of" (add up) all the values, and $n$ is the number of values that have been added.

Therefore the mean deviation of the heights of the 30 boys is found from:

0.14 + 0.01 + 0.39 + 0.09 + 0.09 + 0.10 + 0.04 + 0.17 + 0.16 + 0.20 + 0.07 + 0.11 + 0.10 + 0.03 + 0.07 + 0.32 + 0.01 + 0.06 + 0.01 + 0.09 + 0.13 + 0.04 + 0.00 + 0.19 + 0.01 + 0.20 + 0.06 + 0.09 + 0.15 + 0.10 = 3.23

$$\text{Mean deviation} = \frac{3.23}{30} \approx 0.11$$

So the average distance of the values from the mean is about 0.11 m.

## Variance

An alternative way to get positive values for the deviations from the mean is to square them. The squares of the deviations can then be added and their mean value calculated. This mean of the squared deviations is called the variance:

$$\text{Variance} = \frac{\sum (x - \bar{x})^2}{n}$$

where $n$ is the total number of values.

| Height ($x$) | $\bar{x}$ | $x - \bar{x}$ | $(x - \bar{x})^2$ |
|---|---|---|---|
| 1.85 | 1.71 | 0.14 | 0.0196 |
| 1.72 | 1.71 | 0.01 | 0.0001 |
| 1.32 | 1.71 | -0.39 | 0.1521 |
| 1.62 | 1.71 | -0.09 | 0.0081 |
| 1.62 | 1.71 | -0.09 | 0.0081 |
| 1.61 | 1.71 | -0.10 | 0.0100 |
| 1.67 | 1.71 | -0.04 | 0.0016 |
| 1.88 | 1.71 | 0.17 | 0.0289 |
| 1.55 | 1.71 | -0.16 | 0.0256 |
| 1.51 | 1.71 | -0.20 | 0.0400 |
| 1.78 | 1.71 | 0.07 | 0.0049 |
| 1.60 | 1.71 | -0.11 | 0.0121 |
| 1.61 | 1.71 | -0.10 | 0.0100 |
| 1.68 | 1.71 | -0.03 | 0.0009 |
| 1.78 | 1.71 | 0.07 | 0.0049 |
| 2.03 | 1.71 | 0.32 | 0.1024 |
| 1.70 | 1.71 | -0.01 | 0.0001 |
| 1.65 | 1.71 | -0.06 | 0.0036 |
| 1.70 | 1.71 | -0.01 | 0.0001 |
| 1.80 | 1.71 | 0.09 | 0.0081 |
| 1.84 | 1.71 | 0.13 | 0.0169 |
| 1.67 | 1.71 | -0.04 | 0.0016 |
| 1.71 | 1.71 | 0.00 | 0.0000 |
| 1.52 | 1.71 | -0.19 | 0.0361 |
| 1.72 | 1.71 | 0.01 | 0.0001 |
| 1.91 | 1.71 | 0.20 | 0.0400 |
| 1.77 | 1.71 | 0.06 | 0.0036 |
| 1.80 | 1.71 | 0.09 | 0.0081 |
| 1.86 | 1.71 | 0.15 | 0.0225 |
| 1.81 | 1.71 | 0.10 | 0.0100 |
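The deviation, mean deviation and variance calculations can be checked with a short Python sketch. It uses the unrounded mean (51.29 ÷ 30) rather than 1.71, so its results can differ very slightly from hand calculations that use the rounded mean. The final step takes the square root of the variance, which gives the standard deviation. Variable names are illustrative. `statistics.pvariance` is the standard-library population variance, which divides by n as the formula above does.

```python
import math
import statistics

# Boys' heights in metres, copied from the data table (boys 1-30 in order).
BOYS_HEIGHTS = [
    1.85, 1.72, 1.32, 1.62, 1.62, 1.61, 1.67, 1.88, 1.55, 1.51,
    1.78, 1.60, 1.61, 1.68, 1.78, 2.03, 1.70, 1.65, 1.70, 1.80,
    1.84, 1.67, 1.71, 1.52, 1.72, 1.91, 1.77, 1.80, 1.86, 1.81,
]

n = len(BOYS_HEIGHTS)
mean = sum(BOYS_HEIGHTS) / n                          # x-bar
deviations = [x - mean for x in BOYS_HEIGHTS]         # x - x-bar for each boy
mean_deviation = sum(abs(d) for d in deviations) / n  # mean of |x - x-bar|
variance = sum(d * d for d in deviations) / n         # mean of (x - x-bar)^2
standard_deviation = math.sqrt(variance)

print(f"mean = {mean:.2f} m")
print(f"sum of deviations = {sum(deviations):.2e} (zero apart from rounding error)")
print(f"mean deviation = {mean_deviation:.2f} m")
print(f"variance = {variance:.4f} m^2, standard deviation = {standard_deviation:.2f} m")
print(f"statistics.pvariance gives {statistics.pvariance(BOYS_HEIGHTS):.4f}")
```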


Sampling each group in proportion to its size (stratified sampling) with a total sample of 100 gives the following numbers.

The number of boys in year 10 is 106.

As a proportion of the total of 1183, this is 106/1183 = 0.09 (to 2 d.p.).

My total sample is to be 100, so I would sample:

0.09 × 100 = 9 boys from year 10.

The number of girls in year 10 is 94.

As a proportion of the total of 1183, this is 94/1183 = 0.08 (to 2 d.p.).

My total sample is to be 100, so I would sample:

0.08 × 100 = 8 girls from year 10.

The number of boys in year 11 is 84.

As a proportion of the total of 1183, this is 84/1183 = 0.07 (to 2 d.p.).

My total sample is to be 100, so I would sample:

0.07 × 100 = 7 boys from year 11.

The number of girls in year 11 is 86.

As a proportion of the total of 1183, this is 86/1183 = 0.07 (to 2 d.p.).

My total sample is to be 100, so I would sample:

0.07 × 100 = 7 girls from year 11.
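The same proportional calculation can be written once and applied to every group. This minimal sketch uses the group sizes, the total of 1183 and the sample size of 100 given above; the other groups that make up the rest of the 1183 are not listed. It rounds the unrounded proportion × 100 to the nearest whole student. For these four groups that gives the same numbers as rounding the proportion to 2 d.p. first. Names are illustrative.

```python
# Group sizes and total as given in the text above.
STRATA = {
    "Year 10 boys": 106,
    "Year 10 girls": 94,
    "Year 11 boys": 84,
    "Year 11 girls": 86,
}
SCHOOL_TOTAL = 1183
SAMPLE_SIZE = 100

sample_sizes = {}
for group, size in STRATA.items():
    proportion = size / SCHOOL_TOTAL
    # Number to sample = proportion of the total x sample size, to the nearest whole student.
    sample_sizes[group] = round(proportion * SAMPLE_SIZE)
    print(f"{group}: {size}/{SCHOOL_TOTAL} = {proportion:.2f}, sample {sample_sizes[group]}")
```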

## Conclusion

I think the statistical investigation was very successful. I was able to compare my results using scatter diagrams, and I calculated the standard deviation. I think the methods I used have given me some strong results to comment on. After drawing the scatter diagrams, I immediately realised that in most cases the taller the children were, the more they weighed: the graphs showed a positive correlation. I think the most accurate statistical method I used was the standard deviation, as it showed quite clearly the spread of each set of data. The next time I carry out an investigation like this one, I may use stratified sampling, as it gives a sample in which each group is represented in proportion to its size. This would enable me to write a stronger conclusion.
