# Maths Data Handling

Introduction

The line of enquiry that I have chosen is ‘The relationship between height and weight’. To investigate it, I am using secondary data (which I will obtain from the internet) to avoid the bias and unfairness that can arise when primary data is collected through questionnaires. The data is presented as belonging to a fictitious school, Mayfield High School, but it was collected from a real school. This is useful because the investigation can cover five year groups. Pupils younger than 11 or older than 16 will not be considered, as they fall outside the age range of Mayfield High School.

There are 1183 pupils in Mayfield High School, and I will be using the following pieces of data on each pupil: year group, age, gender, height and weight. This gives a total of 5915 data points (1183 × 5). This is too many to work with, so I will use a sample of 100 pupils. Since I will be using stratified sampling, I need to know how many boys and girls there are in each year. The table below shows the exact figures.

| Year | Girls | Boys | Total |
|---|---|---|---|
| 7 | 131 | 151 | 282 |
| 8 | 125 | 145 | 270 |
| 9 | 143 | 118 | 261 |
| 10 | 94 | 106 | 200 |
| 11 | 86 | 84 | 170 |

I will need this table throughout my investigation to construct a stratified sample, because it shows how many girls and boys there are in each year. This lets me build a fair sample in which each year and gender is represented in proportion to its actual numbers in the school.
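The proportional allocation described above can be sketched in a few lines of Python. This is only an illustration: the function name `stratified_counts` is made up for this example, and it assumes each stratum (year and gender) is simply rounded to the nearest whole pupil.

```python
# Pupils per year and gender, copied from the table above.
population = {
    7: {"girls": 131, "boys": 151},
    8: {"girls": 125, "boys": 145},
    9: {"girls": 143, "boys": 118},
    10: {"girls": 94, "boys": 106},
    11: {"girls": 86, "boys": 84},
}

SAMPLE_SIZE = 100


def stratified_counts(population, sample_size):
    """Return how many pupils to sample from each (year, gender) stratum."""
    total = sum(sum(genders.values()) for genders in population.values())
    return {
        (year, gender): round(count / total * sample_size)
        for year, genders in population.items()
        for gender, count in genders.items()
    }


counts = stratified_counts(population, SAMPLE_SIZE)
for (year, gender), n in counts.items():
    print(f"Year {year} {gender}: {n}")

boys = sum(n for (_, gender), n in counts.items() if gender == "boys")
girls = sum(n for (_, gender), n in counts.items() if gender == "girls")
print("boys:", boys, "girls:", girls)  # boys: 51 girls: 49
```

With these figures the rounded counts add up to exactly 100. Rounding each stratum separately does not guarantee this in general, so the total should always be checked.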

| Girls | Height (cm) | Boys |
|---|---|---|
| 0, 0, 0, 1, 5, 5, 9 | 170 | 0, 0, 0, 3, 5, 5, 5, 5, 5, 5 |
| 0, 0 | 180 | 0, 5, 7 |
| | 190 | 0 |

Conclusion

| Height (cm) | Mean | Modal Class Interval | Median | Range |
|---|---|---|---|---|
| Girls | 161.7 | 160-170 | 160 | 60 |
| Boys | 163.6 | 160-170 | 162 | 70 |

In the sample, the mean and median heights were a little higher for the boys than for the girls, but the modal class interval was the same for both: 160-170 cm. The boys' heights were more spread out, with a range of 70 cm compared with 60 cm for the girls. In the sample, 18 out of 51 boys (35%) have a height between 160 and 170 cm, while 19 out of 49 girls (39%) fall within the same interval. The table above suggests that the distributions of height for girls and boys are fairly similar. However, the frequency polygons and histograms drawn for the boys and girls show that more girls than boys have a height below 170 cm, and more boys than girls have a height above 170 cm. Even though there are more girls than boys in the modal class interval, the sample shows that, in general, boys are taller than girls. This supports my prediction that boys are likely to be taller and heavier than girls.

Evaluation

I used many techniques in my pre-test. I used frequency tables to find the mean from grouped data. After I had created the frequency tables, I produced four stem and leaf diagrams, comparing the girls and boys, so that I could easily work out the median and analyse the individual weights and heights of the girls and boys accurately. Once this was done I was able to create the histograms. I am investigating two pieces of continuous data (height and weight), so histograms are an appropriate way to illustrate the different heights and weights in my sample. I did not use any pie charts, as they would be inappropriate for this data and would make the results difficult to read and interpret.
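As a rough check on the grouped-data method, the sketch below estimates the mean height from class midpoints. The class frequencies are not listed in this part of the investigation, so they are recovered here from the cumulative frequency table for height given in the further investigation. The function names are illustrative only.

```python
# Class boundaries for height in cm (each class is lower <= h < upper).
classes = [(120, 130), (130, 140), (140, 150), (150, 160),
           (160, 170), (170, 180), (180, 190), (190, 200)]

# Cumulative frequencies taken from the height table in the further investigation.
cumulative = {
    "boys": [1, 1, 5, 19, 37, 47, 50, 51],
    "girls": [1, 1, 4, 21, 40, 47, 49, 49],
}


def class_frequencies(cum):
    """Turn cumulative frequencies back into per-class frequencies."""
    return [c - p for c, p in zip(cum, [0] + cum[:-1])]


def grouped_mean(classes, freqs):
    """Estimate the mean by treating every value as its class midpoint."""
    weighted = sum((lo + hi) / 2 * f for (lo, hi), f in zip(classes, freqs))
    return weighted / sum(freqs)


def modal_class(classes, freqs):
    """Return the class interval with the highest frequency."""
    return classes[freqs.index(max(freqs))]


for group, cum in cumulative.items():
    freqs = class_frequencies(cum)
    print(group, round(grouped_mean(classes, freqs), 1), modal_class(classes, freqs))
# boys 163.6 (160, 170)
# girls 161.7 (160, 170)
```

The estimates agree with the means and modal class interval in the summary table, and the 160-170 cm class holds 18 boys and 19 girls, matching the percentages quoted in the conclusion.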

The conclusions on the relationship between height and weight are based on a stratified sample of 51 boys and 49 girls, which reduced bias and made the large number of students in Mayfield High School manageable. To confirm the results, I could extend the sample or repeat the whole investigation with a different sample and compare the two sets of results.

I think that my conclusions are quite reliable, as they comment on the relationship between height and weight with gender taken into account and suggest that there is a relationship, which will be looked at later in the investigation. However, I think that the range is quite unreliable to use in the conclusion. This is because the two extreme values created a large range, especially for the boys, so the comparison of spread could be misleading. To improve this, I would find the interquartile range, as it is based on the upper and lower quartiles rather than the highest and lowest values, so my conclusion would be strengthened.

Overall my strategy worked quite well. I gained accurate results by making the sample fair through the use of stratified sampling. The bias of age and gender that would have affected my results was reduced by stratifying the sample and then sampling randomly within each stratum. The histograms and frequency polygons allowed me to compare the heights and weights of girls and boys effectively and to see trends in the relationship between height and weight, which led to a new hypothesis:

‘In general, the taller a person the heavier that person is likely to be’.

This means that my prediction that the taller a person is, the heavier they are likely to be appears to be correct, which also implies that there is a strong relationship between height and weight. The new hypothesis will be looked at in detail in the next section of the investigation.

I think that drawing the histograms and frequency polygons for girls and boys alongside each other was beneficial, as it allowed me to see easily any trends and patterns that should be included in my conclusions. This was a vital part of my strategy, as it enabled me to compare the data efficiently.

Extension

Hypothesis

- In general, the taller the person is, the heavier that person is likely to be.

Plan

I have now found a new hypothesis to study. I am going to try to show that, in general, the taller a person is, the heavier that person is likely to be. To investigate this hypothesis I will need a new random sample of 50 students of any gender. It will be random because I need to make sure that every student has an equal chance of being selected. I do not need to use stratified sampling, as I am not comparing boys and girls. I am only trying to show that a randomly selected person who is tall is also likely to be heavy. After I have sampled the data, I will have a table that consists only of the heights and weights of 50 randomly selected students.
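A simple random sample like this could be drawn with a short Python sketch. It assumes, purely for illustration, that the pupils are numbered 1 to 1183 in the data file. The function name and the seed value are hypothetical.

```python
import random

POPULATION_SIZE = 1183  # pupils in Mayfield High School
SAMPLE_SIZE = 50


def simple_random_sample(population_size, sample_size, seed=None):
    """Pick distinct pupil numbers so that every pupil is equally likely to be chosen."""
    rng = random.Random(seed)
    pupil_numbers = range(1, population_size + 1)
    return sorted(rng.sample(pupil_numbers, sample_size))


# A fixed seed makes the sample repeatable; omit it to get a fresh sample each run.
sample = simple_random_sample(POPULATION_SIZE, SAMPLE_SIZE, seed=2024)
print(sample)
```

Because `random.sample` selects without replacement, no pupil can appear in the sample twice.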

| Height (cm) | Weight (kg) |
|---|---|
| 143 | 33 |
| 143 | 41 |
| 148 | 40 |
| 152 | 37 |
| 154 | 40 |
| 160 | 42 |
| 160 | 50 |
| 162 | 51 |
| 165 | 40 |
| 165 | 40 |
| 167 | 53 |
| 144 | 50 |
| 150 | 39 |
| 153 | 45 |
| 155 | 60 |
| 159 | 44 |
| 159 | 55 |
| 162 | 53 |
| 164 | 44 |
| 168 | 56 |
| 168 | 59 |
| 173 | 59 |
| 180 | 57 |
| 152 | 50 |
| 156 | 60 |
| 159 | 46 |
| 160 | 48 |
| 162 | 52 |
| 162 | 40 |
| 166 | 45 |
| 171 | 54 |
| 175 | 56 |
| 152 | 70 |
| 153 | 48 |
| 155 | 50 |
| 160 | 47 |
| 162 | 46 |
| 172 | 71 |
| 173 | 51 |
| 175 | 59 |
| 180 | 60 |
| 182 | 57 |
| 184 | 62 |
| 151 | 40 |
| 152 | 44 |
| 162 | 54 |
| 165 | 54 |
| 165 | 58 |
| 170 | 56 |
| 177 | 57 |

To compare this sampled data, I will produce a scatter diagram of height against weight. This will enable me to see any trends that will appear, or, in other words, if there is any correlation between height and weight. The scatter diagram is a good way to compare the points that are plotted as height against weight as a general trend can be seen straight away.

If there is a positive correlation, the hypothesis being tested will be supported: the greater the height, the greater the weight tends to be. If there is a negative correlation, the hypothesis will be rejected. This is very unlikely, although there may be a few exceptions that do not follow the general trend. To spot these exceptions as well as the trend, I will draw a line of best fit. This will show me the trend straight away when I look at the scatter diagram, so I will be able to interpret it easily when I am writing a conclusion. The line of best fit will also enable me to predict a weight from a given height. I can also compare the data by taking one large height and one small height and comparing the weights given by the line of best fit.
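In this investigation the line of best fit is drawn by eye. As a cross-check, the sketch below uses a different technique, a least-squares line together with Pearson's correlation coefficient, on the 50 sampled pairs from the table above. The function names are illustrative, and the fitted line is not expected to match a hand-drawn one exactly.

```python
from math import sqrt

# (height in cm, weight in kg) for the 50 randomly sampled students.
data = [
    (143, 33), (143, 41), (148, 40), (152, 37), (154, 40), (160, 42),
    (160, 50), (162, 51), (165, 40), (165, 40), (167, 53), (144, 50),
    (150, 39), (153, 45), (155, 60), (159, 44), (159, 55), (162, 53),
    (164, 44), (168, 56), (168, 59), (173, 59), (180, 57), (152, 50),
    (156, 60), (159, 46), (160, 48), (162, 52), (162, 40), (166, 45),
    (171, 54), (175, 56), (152, 70), (153, 48), (155, 50), (160, 47),
    (162, 46), (172, 71), (173, 51), (175, 59), (180, 60), (182, 57),
    (184, 62), (151, 40), (152, 44), (162, 54), (165, 54), (165, 58),
    (170, 56), (177, 57),
]


def summary_sums(points):
    """Return the means and the sums of squares and products about the means."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return mean_x, mean_y, sxx, syy, sxy


def pearson_r(points):
    """Correlation coefficient: +1 is a perfect positive trend, 0 is no linear trend."""
    _, _, sxx, syy, sxy = summary_sums(points)
    return sxy / sqrt(sxx * syy)


def least_squares_line(points):
    """Gradient m and intercept c of the line y = mx + c that minimises squared error."""
    mean_x, mean_y, sxx, _, sxy = summary_sums(points)
    m = sxy / sxx
    return m, mean_y - m * mean_x


r = pearson_r(data)
m, c = least_squares_line(data)
print(f"r = {r:.2f}")
print(f"weight = {m:.3f} * height + ({c:.1f})")
```

A positive value of `r` and a positive gradient `m` would both support the hypothesis. The closer `r` is to 1, the more tightly the points cluster around the line.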

Limitations

Similar to the first section of the investigation, there are limitations that involve the different lifestyles of students that cannot be controlled. This means that there are diverse diets and also different metabolic rates that could affect the height and weight of each student. This then implies that there are bound to be exceptions to the general trend within the scatter diagram.

Conclusion

You can see on the scatter diagram that there is a positive correlation between height and weight. This means that the taller a person is, the heavier that person is likely to be. The line of best fit suggests that a person with a height of 155 cm will weigh about 47 kg, whereas a person with a height of 175 cm will weigh about 56 kg. This shows the difference that a person's height makes to their weight: the greater the height, the greater the weight is likely to be. This supports my hypothesis.

The scatter diagram helped me show that the hypothesis was correct, because a general trend appeared when height was plotted against weight. A line of best fit made the trend easier to see. However, there are a few exceptions on the scatter diagram that go against the general trend. For example, the person with a height of 152 cm has a weight of 70 kg, but the person with a height of 180 cm has a weight of 57 kg. According to my line of best fit, the person with a height of 152 cm should weigh less than the person with a height of 180 cm, but the data disagrees. These exceptions occur because of the students' different diets and metabolic rates. These are factors that a researcher cannot control, so there are likely to be exceptions. Therefore, my hypothesis is a fair statement to use when commenting on these results, as its wording allows for probability:

‘In general, the taller a person is, the heavier that person is likely to be’.

Evaluation

The random sampling that I used made this part of the investigation fair. Every student was equally likely to be selected so I got firm results that were not biased. To improve the results I got and strengthen them, I could extend the random sample, or even repeat this experiment with a completely different random sample of 50.

My conclusion is reliable, as the results clearly show that there is a positive correlation between height and weight. However, the plotted points are a little spread out, which makes it harder to spot a clear trend. Although the line of best fit helps with this, the overall conclusion could still be argued to be unreliable. This suggests looking for a way to obtain a stronger correlation in the scatter diagram, and this is where gender should be considered. The sample could have been slightly biased if more girls than boys were randomly selected, or the other way round. In that case it could be difficult to find a line of best fit, because the heights and weights are affected by gender. However, the purpose of this part of the investigation was to explore the hypothesis:

‘In general, the taller a person is, the heavier that person is likely to be’.

This was answered as true and so we now know that there is this correlation between height and weight. This can lead on to another extension to the line of enquiry, where a new hypothesis should be tested that involves gender. This is that:

‘There will be a better correlation between height and weight if boys and girls are considered separately’.

Further Investigation

Hypothesis

- There will be a better correlation between height and weight if boys and girls are considered separately.

Plan

In the early section of this investigation, or, in other words the pre-test, evidence was found that suggested that height and weight were both affected by gender. I am now trying to show that there is a better correlation between height and weight if girls and boys are considered separately.

To investigate this hypothesis, I will need to use the stratified sample that I created before of 51 boys and 49 girls so that the data that I interpret will reflect the whole population of Mayfield High School. The use of stratified sampling will also eliminate the bias of age that may occur in this part of the investigation. This means that I will have accurate results within my sample.

At first, the sample will be presented in three scatter diagrams (one each for boys, girls and the mixed population in the sample). These diagrams will have a scale from 20 to 100 on the y-axis and from 120 to 200 on the x-axis. This will allow me to compare the different results and see whether the correlation has become stronger once a line of best fit is drawn. After this, I will find the equations of the lines and use them to analyse the data further, by predicting weights when the height is known and heights when the weight is known. I will need to calculate the y-intercept first, because my axes do not start at the origin. This will be done by making c the subject of the formula y = mx + c.

As I am dealing with two pieces of continuous data, height and weight, cumulative frequency diagrams would be appropriate and a powerful tool to compare the different data sets. I will need to create a cumulative frequency table for height and weight separately and then draw cumulative frequency curves for boys, girls and the mixed population in my sample on the same graph. On my cumulative frequency graphs, I will only demonstrate how to find the median and interquartile range for the mixed population so the graphs will still stay clear. Through these diagrams, I will be able to find the median and the interquartile range. This will allow me to draw box-and-whisker diagrams, which will allow me to make further comparisons between girls and boys. I will also predict percentages of students who have a height or weight within a given range. This will consider ranges other than the interquartile range, which ignores 50% of the students. I can analyse distinct ranges when dealing with percentiles. The cumulative frequency graphs will also allow me to see the relationships between the data for boys and the data for girls, which I can conclude from.

There is a positive correlation between height and weight when girls are considered.

There is a positive correlation between height and weight when boys are considered.

Conclusion

The evidence gained from the scatter diagrams supports the hypothesis that there is a better correlation between height and weight if girls and boys are considered separately. The correlations for girls, boys and the mixed population were all positive. They have all become stronger than in the previous scatter diagram for the hypothesis ‘In general, the taller a person is, the heavier that person is likely to be’. The separate diagrams for girls and boys showed a much better correlation than the diagram for the combined sample. This means that better results can be found for the relationship between height and weight if boys and girls are considered separately. The lines of best fit were drawn with ease because of the better correlations.

As the correlations for the girls and boys were quite different and now it is known that there is a better correlation when boys and girls are considered separately, the hypothesis made in the first section of this investigation, which was, ‘the relationship between height and weight is affected by gender’, can further be proven as true. This is presented on my scatter diagrams through the lines of best fit. For example, the lines of best fit on my diagrams predict that a girl who is 160 cm tall would have a weight of 49 kg, whereas a boy of the same height would have a weight of 53 kg. This comparison shows that boys are likely to be heavier than girls even if both genders are of the same height.

The lines of best fit that I have created are straight lines. This means that their equations can be found using the formula y = mx + c, by finding the gradient of the line and the y-intercept. The y-intercept needs to be calculated because my axes do not start at the origin, so it cannot be read directly from the graph. This is done by working out the gradient, substituting the x and y values of any point on the line, and then making c the subject of the formula y = mx + c. Here are the y-intercepts for the data set:

Boys: c = y – mx = 40 – 87.5 = –47.5

Girls: c = y – mx = 40 – 58.5 = –18.5

Combined Sample: c = y – mx = 40 – 80.62 = –40.62 ≈ –40.6

If y represents weight, and x represents height, the equations of the lines of best fit for the data set are:

Boys: y = 0.625x – 47.5

Girls: y = 0.45x – 18.5

Combined Sample: y = 0.556x – 40.6
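The intercept calculation can be reproduced with a short sketch. The gradients and the y-value of 40 come from the working above. The x-values (140, 130 and 145) are not stated in the text: they are inferred by dividing each product mx (87.5, 58.5 and 80.62) by its gradient, so they should be treated as an assumption.

```python
# Gradient of each line of best fit and one point (x, y) assumed to lie on it.
# The x-values are inferred from the products m*x in the working, not quoted directly.
lines = {
    "Boys": {"m": 0.625, "point": (140, 40)},
    "Girls": {"m": 0.45, "point": (130, 40)},
    "Combined Sample": {"m": 0.556, "point": (145, 40)},
}


def intercept(m, point):
    """Rearrange y = mx + c to c = y - mx for a known point on the line."""
    x, y = point
    return y - m * x


for name, line in lines.items():
    c = intercept(line["m"], line["point"])
    print(f"{name}: y = {line['m']}x + ({c:.2f})")
```

Any other point on the same line would give the same value of c, which is a useful check when reading points off a hand-drawn graph.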

Now that I have these equations, the weight can be predicted when the height is known, or height when the weight is known. For example to predict the weight of a boy who is 170 cm tall:

With x = 170:

y = 0.625x – 47.5

= 0.625 × 170 – 47.5

= 106.25 – 47.5

= 58.75 kg

Since weight is continuous data, a value with a decimal point is appropriate. Keeping the decimal places makes the prediction more precise, although it can only be as accurate as the line of best fit itself.
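The same equations can be used in both directions: to predict weight from height, or, after rearranging to x = (y – c) / m, to predict height from weight. The sketch below assumes the three equations given above. The function names are illustrative.

```python
# (gradient m, intercept c) for each line of best fit, with y = weight and x = height.
LINES = {
    "boys": (0.625, -47.5),
    "girls": (0.45, -18.5),
    "combined": (0.556, -40.6),
}


def predict_weight(group, height_cm):
    """Weight in kg predicted by y = mx + c."""
    m, c = LINES[group]
    return m * height_cm + c


def predict_height(group, weight_kg):
    """Height in cm from the rearranged equation x = (y - c) / m."""
    m, c = LINES[group]
    return (weight_kg - c) / m


print(predict_weight("boys", 170))    # 58.75
print(predict_height("boys", 58.75))  # 170.0
```

Predictions are only sensible for heights and weights inside the range covered by the sample, since the lines were fitted to that data alone.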

Evaluation

The conclusion that I drew from the scatter graphs was reliable. It commented on the general trend and supported the claim that there is a better correlation between height and weight when boys and girls are considered separately. However, there were a few exceptional values that fell outside the trend. For example, in the scatter graph for boys there is a boy who has a height of 162 cm but a weight of 92 kg. According to the line of best fit, the boy should have a weight of about 49 kg. This has probably happened because the boy has a different diet from the average student, or because he has a different metabolic rate. These factors cannot be controlled, so it is likely that some values will fall outside the general trend.

Cumulative Frequency for Weight

| Weight (kg) | Boys | Girls | Mixed |
|---|---|---|---|
| 20 ≤ w < 30 | 1 | 0 | 1 |
| 30 ≤ w < 40 | 7 | 5 | 12 |
| 40 ≤ w < 50 | 20 | 26 | 46 |
| 50 ≤ w < 60 | 34 | 41 | 75 |
| 60 ≤ w < 70 | 43 | 49 | 92 |
| 70 ≤ w < 80 | 49 | 49 | 98 |
| 80 ≤ w < 90 | 50 | 49 | 99 |
| 90 ≤ w < 100 | 51 | 49 | 100 |
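The plan mentions estimating the percentage of students whose weight lies within a given range. A rough way to do this from the table is to join the cumulative frequency points with straight lines and read off values in between. The sketch below does this for the mixed sample. The 45-65 kg range is only an example, and a smooth hand-drawn curve would give slightly different readings.

```python
# Class boundaries (kg) and the mixed cumulative frequency reached at each boundary.
BOUNDS = [20, 30, 40, 50, 60, 70, 80, 90, 100]
MIXED_CUM = [0, 1, 12, 46, 75, 92, 98, 99, 100]


def cumulative_at(value, bounds, cum):
    """Estimate the cumulative frequency at `value` by linear interpolation."""
    if value <= bounds[0]:
        return 0.0
    if value >= bounds[-1]:
        return float(cum[-1])
    for i in range(1, len(bounds)):
        if value <= bounds[i]:
            lo, hi = bounds[i - 1], bounds[i]
            fraction = (value - lo) / (hi - lo)
            return cum[i - 1] + fraction * (cum[i] - cum[i - 1])


def percent_between(low, high, bounds, cum):
    """Estimated percentage of the sample with low <= weight < high."""
    inside = cumulative_at(high, bounds, cum) - cumulative_at(low, bounds, cum)
    return 100 * inside / cum[-1]


print(percent_between(40, 60, BOUNDS, MIXED_CUM))  # 63.0
print(percent_between(45, 65, BOUNDS, MIXED_CUM))  # 54.5
```

When the range ends on class boundaries, as with 40-60 kg, the answer comes straight from the table (75 – 12 = 63 students out of 100). Only ranges that cut through a class rely on the interpolation.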

Cumulative Frequency for Height

| Height (cm) | Boys | Girls | Mixed |
|---|---|---|---|
| 120 ≤ h < 130 | 1 | 1 | 2 |
| 130 ≤ h < 140 | 1 | 1 | 2 |
| 140 ≤ h < 150 | 5 | 4 | 9 |
| 150 ≤ h < 160 | 19 | 21 | 40 |
| 160 ≤ h < 170 | 37 | 40 | 77 |
| 170 ≤ h < 180 | 47 | 47 | 94 |
| 180 ≤ h < 190 | 50 | 49 | 99 |
| 190 ≤ h < 200 | 51 | 49 | 100 |
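The median and interquartile range that would be read from the cumulative frequency curves can be estimated numerically. The sketch below assumes straight lines between the plotted points and uses the n/4, n/2 and 3n/4 positions, which is the usual convention for cumulative frequency graphs. Readings from a hand-drawn curve, or medians found from a stem and leaf diagram, will differ a little from these estimates.

```python
# Class boundaries (cm) and the cumulative frequency reached at each boundary.
BOUNDS = [120, 130, 140, 150, 160, 170, 180, 190, 200]
CUMULATIVE = {
    "boys": [0, 1, 1, 5, 19, 37, 47, 50, 51],
    "girls": [0, 1, 1, 4, 21, 40, 47, 49, 49],
    "mixed": [0, 2, 2, 9, 40, 77, 94, 99, 100],
}


def value_at(fraction, bounds, cum):
    """Estimate the height below which the given fraction of the sample lies."""
    target = fraction * cum[-1]
    for i in range(1, len(bounds)):
        # Skip empty classes; stop at the first class that reaches the target.
        if cum[i] >= target and cum[i] > cum[i - 1]:
            lo, hi = bounds[i - 1], bounds[i]
            return lo + (target - cum[i - 1]) / (cum[i] - cum[i - 1]) * (hi - lo)
    return float(bounds[-1])


def quartile_summary(bounds, cum):
    """Return (lower quartile, median, upper quartile, interquartile range)."""
    q1 = value_at(0.25, bounds, cum)
    median = value_at(0.50, bounds, cum)
    q3 = value_at(0.75, bounds, cum)
    return q1, median, q3, q3 - q1


for group, cum in CUMULATIVE.items():
    q1, median, q3, iqr = quartile_summary(BOUNDS, cum)
    print(f"{group}: Q1={q1:.1f} median={median:.1f} Q3={q3:.1f} IQR={iqr:.1f}")
```

The quartiles and median, together with the lowest and highest values in the sample, are the five numbers needed to draw a box-and-whisker diagram for each group.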

Conclusion

There is a strong correlation between height and weight if gender and age are considered. Once studied, it is found that in general, boys are taller and heavier than girls.

Throughout this investigation I have found that there is a positive correlation between height and weight, both across the school and within each year group. The correlation appears to be much stronger when individual year groups and separate genders are considered. However, I can only support this through the comparison of the year 7 boys with the boys in the whole school. If I were to improve my investigation, I would investigate each year group and gender separately, but the outcome would probably be predictable, and I think that the results from the 10% sample of year 7 boys are reliable enough to support this conclusion.

I have to remember that the school is only a very small part of the world's population. Many other factors affect height and weight, such as people's lifestyles, and another possible factor affecting weight could be how much money a person earns. Within the school specifically, factors such as the distance a student lives from school and the mode of transport used to get there could affect the correlation between height and weight. A person who walks to school is more likely to be lighter than a person who frequently comes to school by car.
