From the data I have selected I can create charts to show the correlation between weight and height, using a best fit or trend line. The chart looks as follows:
From the above chart, with a best fit line, I can see that the line shows positive correlation. This is because the points appear to lie close to the trend line and the line slopes in an upward (positive) direction. On the chart I have put the value of R, which is the correlation coefficient. The figure I get for R is 0.4412, which on a chart with ten points would be unreliable, but with 119 points on the chart this figure is more reliable. I can also use the critical value: when n=120 the critical value is 0.151, and the critical value for n=119 would be slightly higher than this but still lower than the R value of 0.4412, so I can therefore say that I am 95% confident the hypothesis is correct.
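The comparison between R and the critical value can be sketched in code. This is only an illustration: the function names `pearson_r` and `is_significant` are my own, the five heights and weights are made-up values rather than the real sample, and 0.151 is the n=120 critical value quoted above (the n=119 value would be slightly higher).

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def is_significant(r, critical_value):
    """True when the size of r is greater than the tabulated critical value."""
    return abs(r) > critical_value


# Hypothetical heights (m) and weights (kg), not the real Mayfield sample.
example_heights = [1.50, 1.55, 1.60, 1.65, 1.70]
example_weights = [45, 50, 48, 55, 60]
example_r = pearson_r(example_heights, example_weights)

# The R value and critical value quoted in the text above.
school_result = is_significant(0.4412, 0.151)
```

Because 0.4412 is well above 0.151, the check passes with room to spare, which is why the slightly higher n=119 value does not change the conclusion.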
From the chart I can use the equation of the trend line to estimate the correct value of any anomalies. I can only estimate data for height values between 1.3 and 1.9 metres and weight values between 40 and 60 kilograms, as this is the region the trend line goes through. The anomalies were as follows:
The equation y = 37.394x – 9.5096 can now be used to find the correct data for the anomalies above (highlighted in italics). The corrected results are:
I am happy to use this equation as it has been taken from a chart that uses 119 points and has an R-value of 0.4412 that shows good correlation.
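As a short sketch of how the trend line is used, the code below takes the equation y = 37.394x – 9.5096 with x as height in metres and y as weight in kilograms, and refuses to estimate outside the ranges stated earlier. The function names are my own, and rearranging the same line to estimate a height from a weight is an assumption about how a height anomaly would be corrected.

```python
SLOPE = 37.394
INTERCEPT = -9.5096
HEIGHT_RANGE_M = (1.3, 1.9)    # heights the trend line covers
WEIGHT_RANGE_KG = (40.0, 60.0)  # weights the trend line covers


def estimate_weight(height_m):
    """Estimate weight (kg) from height (m) using the trend line."""
    low, high = HEIGHT_RANGE_M
    if not low <= height_m <= high:
        raise ValueError("height is outside the range covered by the trend line")
    return SLOPE * height_m + INTERCEPT


def estimate_height(weight_kg):
    """Estimate height (m) from weight (kg) by rearranging the same line."""
    low, high = WEIGHT_RANGE_KG
    if not low <= weight_kg <= high:
        raise ValueError("weight is outside the range covered by the trend line")
    return (weight_kg - INTERCEPT) / SLOPE


# A pupil 1.6 m tall: 37.394 * 1.6 - 9.5096 = 50.3208 kg
example_estimate = estimate_weight(1.6)
```

Raising an error outside the stated ranges keeps the estimates to interpolation only, which matches the restriction described above.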
Now that I have found that my first hypothesis is correct, I can check my second hypothesis, also using charts. The first chart is of the males’ weight against height:
The second chart shows the females’ weight against height:
The above charts can be used to show whether or not my second hypothesis is correct. The value I need to see if my hypothesis is correct is the R-value. The R-value for the males’ chart is 0.5118 and the females’ chart has an R-value of 0.3154. From these figures I can tell that the males have a greater correlation than the females do, so my second hypothesis is confirmed. Another way to judge the correlation is using a visual comparison, but these two graphs show little difference, making it impossible to compare them.
From this data I can also compare the R-values for the best-fit line of males and females to that of the value for the whole school. From this I can tell if the correlation from the school sample is made up of males or females. The R-value for the school is 0.4412, for the males it is 0.5118 and for the females it is 0.3154. From these figures I can say that the correlation for the whole school is made up from both sexes, but is largely made up of the male population of the school.
Weight throughout the school:
Now that I have analysed data comparing the differences between males and females, I can go on to compare the differences between age groups throughout the school. For this I will focus solely on weight. I have chosen to focus my investigation on weight as this information can show the trends of weight gain in different sexes and the differences in weight between year groups. To investigate this further, I again need to take a sample of each year group. I will only analyse pupils in years 7, 9 and 11, as this shows the progress and change of weight over time. This part of the investigation will use a similar sample to the one previously used, but due to the small number of pupils in year 11, a larger sample will be needed. For this I have chosen to use a 25% random stratified sample.
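A 25% random stratified sample can be sketched as follows. This is only an outline under assumptions: I have assumed the strata are year group and sex, the pupil records and group sizes are invented, and the function name `stratified_sample` is my own.

```python
import random


def stratified_sample(pupils, stratum_of, fraction, seed=None):
    """Randomly pick the same fraction of pupils from every stratum."""
    rng = random.Random(seed)
    strata = {}
    for pupil in pupils:
        strata.setdefault(stratum_of(pupil), []).append(pupil)
    sample = []
    for members in strata.values():
        # Round to the nearest whole pupil for this stratum.
        size = round(len(members) * fraction)
        sample.extend(rng.sample(members, size))
    return sample


# Hypothetical population: 40 year 7 males and 20 year 11 females.
population = (
    [{"year": 7, "sex": "M"} for _ in range(40)]
    + [{"year": 11, "sex": "F"} for _ in range(20)]
)
sample = stratified_sample(
    population, lambda p: (p["year"], p["sex"]), fraction=0.25, seed=1
)
# 25% of 40 is 10 and 25% of 20 is 5, so the sample holds 15 pupils.
```

Sampling each stratum separately keeps every group represented in proportion to its size, which is the reason for choosing a stratified sample when year 11 is small.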
Hypothesis:
I can now add more hypotheses to improve my understanding of the investigation. These are:
- The year 7 males’ weight will be less than the females’. This is because younger males tend to be more active than females, which will also cause a smaller spread of data.
- The year 9 males’ weight will be greater than the females’, because as males get older they tend to become less active, and females start to be concerned about their appearance.
- The year 11 males’ weight will be greater than the females’, because as males get older they tend to become less active, and females tend to be more concerned about their appearance. This will cause a small spread of data for the females, but a large spread of data for males.
- Males’ weight will increase in year 9, then decrease during year 11. This is because people tend to be more conscious about their weight as they get older. The spread of data will follow a similar trend, becoming wider as age increases.
- Females’ weight will increase from year 7 to year 9, and will decrease in year 11. This is because females put on weight during puberty (in year 9) and then become image conscious during year 11. This will cause a variation in the spread of data, similar to that of males.
I have taken a 25% stratified sample, which gives a large enough sample for year 11. The best way to analyse the difference in weights between different year groups and sexes is to use a series of box plots. In order to test my third hypothesis I will use a box plot which compares the weights of year 7 females with those of year 7 males. The chart is as follows:
The above box plot can be used to show a number of different things, and combined with the summary table provides me with information about the weight of year 7 males and females. The first thing I can see is the similarity in medians. On the males’ box plot I can see that the median is 45, while the females’ median is 44. From the box plot, I can also compare the interquartile range of each box plot to see how each varies. For the males’ box plot the interquartile range is 11, because the males’ weights are more spread out towards the upper quartile, while the females’ is 8.5. This shows that more males weigh more than females. This is also shown by the males having the highest weights, with an outlier of 75 kilograms.
For each box plot I can also look at skewness. The females’ box plot looks to be near a normal distribution, while the males’ has a slight positive skew, which would entail that there are more males of a lower weight. There are two outliers on the box plot, so for further investigation I will remove these results. Overall, I can say that my hypothesis is rejected, as the box plots show that males are heavier than females.
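The figures read from a box plot (median, quartiles, interquartile range and outliers) can be worked out as in the sketch below. It rests on assumptions: the weights are invented, outliers are taken to be values more than 1.5 times the interquartile range beyond the quartiles (a common rule that may not be the one used by the data-handling tool), and the quartile method in Python's `statistics.quantiles` may give slightly different quartiles from other software.

```python
import statistics


def box_plot_summary(values):
    """Return the median, quartiles, IQR and outliers for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "outliers": outliers,
    }


# Hypothetical weights in kilograms, not the real year 7 sample.
example_weights = [38, 40, 42, 44, 45, 46, 48, 50, 75]
summary = box_plot_summary(example_weights)
# Quartiles are 42 and 48, the median is 45, the IQR is 6,
# so the fences are 33 and 57 and only 75 is flagged as an outlier.
```

Comparing the `median` and `iqr` entries for two groups gives the same comparison as reading the two box plots side by side.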
For the next two hypotheses I can use a similar method to the one I have already used. I will first analyse the data for the year 9 females and males. I will again use a box plot to analyse the data. The box plot is as follows:
Again with this box plot I can compare the medians for the males and the females. In this case the median for the males is 51, while the females’ is 50. This shows that the middle weights for males and females are similar. The difference comes in the lower quartile: for the females this quartile stretches over a longer distance, showing that females tend to be lighter. In contrast to this, the highest figure is for the females. I can also look at the skew of the box plots, which shows that the females have a slight positive skew, as do the males. This shows that my hypothesis about the weight of year nines is correct, as males tend to be heavier than females.
To analyse the information for year 11 I need another box plot. This box plot is as follows:
These box plots allow easier analysis of the data, as they clearly show different medians and upper and lower quartiles. The box plots show that the lower quartile for males is similar to the upper quartile for females, which means that the upper 75% of the male population are heavier than roughly 50kg, while only 25% of the female population are heavier than 50kg. The skewness for each box plot is very similar, with both showing normal distributions. Overall, this hypothesis is confirmed, as there is a large spread of data for males but a smaller spread for females.
Now that I have compared the data between the sexes in each year group, I can go on to investigate the difference between ages within each sex. For this I need two further box plots. The first of these is the chart which shows the box plots for males through the school:
From the box plots it is clear to see that the weight of males increases as they become older. The increase in weight is indicated by the medians of each year group, being 45, 51 and 53 for years seven, nine and eleven respectively. What the box plots also show is how the spread of weights changes through the school. Year seven sees pupils with a wide spread of weights, followed by a decrease in the spread in year nine, which could be due to males being image conscious. After year nine, the spread of data increases, often because pupils in year eleven become less active. This shows that my hypothesis about the change in males’ weight over time is incorrect.
I can now go on to my seventh hypothesis, which concerns how females’ weight changes through the school. For this I need another series of box plots:
For this hypothesis, I can again see a trend in my results. It is clear that weight increases as females grow older, until year 11, when they start to lose weight; this is often because of dieting, due to peer pressure. Looking at the medians, it is clear to see the change in weight, these being 44, 50 and 45. If I look at the spread I can also see a narrowing, which is because peer pressure in year eleven is very evident. This confirms my seventh hypothesis, as the weight of females increases before decreasing in year eleven.
Body Mass Index:
Now that I have shown the differences between year groups throughout the school, I can go on to analyse the difference in age within one sex. I have chosen to investigate boys, and will focus on the body mass index (BMI) of each year group. Focusing on the BMI will allow me to compare the obesity of pupils as they go through the school. I will use the same sample as used in the previous section.
Hypothesis:
Before I go on to analyse the data I need a new set of hypotheses. They are:
- Year 7 Males’ BMI will have a positive skew; this is because it is more likely that males will have a low level of obesity. This is because they are often more active at a younger age.
- Year 9 Males’ BMI will have a normal distribution; this is because a large proportion of males tend to be around an average obesity level. This is because it is the transition stage, where they are caring about their looks, but before they stop caring.
- Year 11 Males’ BMI will have a negative skew; this is because older males tend to be lazier and less bothered about their appearance.
For my first hypothesis I need to analyse the weight and height data for the year 7 males. I have used the same sample as I used previously, and will use Microsoft Excel to find the BMI. The equation for BMI is:
BMI = weight (kg) ÷ height (m)²
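BMI is a pupil's weight in kilograms divided by the square of their height in metres. The investigation itself used Microsoft Excel for this step; the sketch below only shows the same calculation in code, with a function name and example pupil of my own choosing.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in metres squared."""
    if height_m <= 0:
        raise ValueError("height must be greater than zero")
    return weight_kg / height_m ** 2


# Hypothetical pupil: 50 kg and 1.60 m tall gives 50 / 2.56 = 19.53125
example_bmi = bmi(50, 1.60)
```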
After working out the BMI for each of the pupils, I can now create histograms, which will show the distribution of each year group. From these I can then analyse the spread of data, as I did for the box plots, and come to a conclusion about the obesity levels in the school.
The histograms can be made very easily using the data-handling tool we were provided with. This allows me to split the data into appropriate groups, and gives me a clear understanding of the spread of data, and therefore how obesity levels vary throughout the school. The histograms have irregular class widths, so that there is no danger of a class having no data in it. The first histogram is for the year 7 males.
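When class widths are irregular, the height of each bar is normally the frequency density (frequency divided by class width), so that the area of the bar represents the frequency. The sketch below assumes that convention was followed; the BMI values and class boundaries are invented, and the function name is my own.

```python
def frequency_densities(values, boundaries):
    """Return (lower, upper, frequency, density) for each class.

    Each class includes its lower boundary and excludes its upper boundary.
    """
    rows = []
    for lower, upper in zip(boundaries, boundaries[1:]):
        frequency = sum(1 for v in values if lower <= v < upper)
        density = frequency / (upper - lower)
        rows.append((lower, upper, frequency, density))
    return rows


# Hypothetical BMI values and unequal class boundaries.
example_bmis = [15.5, 17, 18, 18.5, 19, 19.5, 20, 21, 23, 27]
example_boundaries = [14, 18, 19, 20, 22, 30]
table = frequency_densities(example_bmis, example_boundaries)
# Every class here holds 2 pupils, but the densities differ with the width:
# 0.5, 2.0, 2.0, 1.0 and 0.25.
```

The wide classes at each end stop any class from being empty, while the densities keep the bars comparable despite the different widths.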
This histogram shows the distribution of the BMI of year 7 males. I have not included outliers for this series of histograms, as they only cause the data spread to be inaccurate. The first comment I can make about the histogram is its skewness. Skewness is often judged from the shape of a box plot, but it can also be judged from a histogram.
Pearson’s coefficient of skewness gives a numerical measure of skew; the formula is 3 × (mean – median) ÷ standard deviation. Pearson’s method allows me to see the skewness as a figure, often between 3 and –3, with 0 indicating an even distribution.
The above histogram appears to have a positive skew, and according to Pearson’s coefficient the measure of skewness is 0.214. This would entail that the eighth hypothesis is confirmed, because there are more males with low BMI levels than with higher levels.
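Pearson's measure can be sketched in code, assuming the median-based form 3 × (mean – median) ÷ standard deviation, which is the version that normally falls between –3 and 3. The choice of the population standard deviation (`statistics.pstdev`) is my assumption, and the data below are invented rather than the real BMI values.

```python
import statistics


def pearson_skewness(values):
    """Pearson's median-based coefficient of skewness."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    spread = statistics.pstdev(values)  # population standard deviation (assumed)
    if spread == 0:
        raise ValueError("all values are equal, so skewness is undefined")
    return 3 * (mean - median) / spread


# Hypothetical data sets, not the real BMI samples.
symmetric = [1, 2, 3, 4, 5]          # mean equals median, so the result is 0
long_upper_tail = [1, 2, 2, 3, 10]   # mean above median, so the result is positive
long_lower_tail = [1, 8, 9, 9, 10]   # mean below median, so the result is negative
```

A positive result means the mean is pulled above the median by a tail of high values, which is how a figure such as 0.214 is read as slight positive skew.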
For year nine males the following histogram was created.
Again I have removed the outliers. Looking at the histogram, it shows a slight negative skew, which would entail there are more males with high BMI levels. This is confirmed by Pearson’s coefficient, which is –0.271. I can therefore say that the ninth hypothesis is rejected, as there is a greater number of males with high BMI levels than with low levels.
For the year eleven males a final histogram is needed.
This histogram shows a near-normal distribution, and again Pearson’s measure of skewness will be used to confirm the skew. Although this histogram has a slight negative skew, when looking at it, it can be seen that the distribution is close to normal. This means my tenth hypothesis can be rejected.
Overall, the males at Mayfield show varying BMI results, with year 7 having a positive skew, year 9 a negative skew and year 11 a near-normal distribution, which shows variation between the year groups.
Overall, the investigation has shown that as height increases, so too does weight, and vice versa. This relationship can then be used to estimate corrected values for anomalous points. The investigation has also shown how the weight of each sex differs, and how it changes through the school. Hypotheses that were confirmed included one that showed that males have a wider spread of weight data in year eleven than females do. The final part of the investigation shows the obesity levels, and how they change throughout the school. One thing that can be taken from the investigation is that obesity levels are low in all year groups; although two histograms show negative skew, this is to a small degree and therefore would not affect the results too greatly. All the hypotheses were based on real life, and therefore I can use them to show whether Mayfield is a good example of the population. The investigation showed that six of the ten hypotheses were correct, so I can therefore say that Mayfield gives a good example of the young population, although it does not necessarily give an accurate one.