If any piece of data does not fit the general trend of the rest (an outlier), I will remove it, but only if it is the only value that lies far from the others.
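As a rough sketch of how "far off" could be checked, the code below uses the common 1.5 x IQR fence rule. This rule is my assumption here and is not necessarily how the outliers in this investigation were picked. The function name flag_outliers and the heights are made up for illustration.

```python
import statistics

def flag_outliers(values, k=1.5):
    """Return the values lying outside Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical heights in metres for one year group (not the real sample).
heights = [1.48, 1.52, 1.55, 1.57, 1.58, 1.60, 1.62, 2.10]
outliers = flag_outliers(heights)

# Follow the rule in the text: only remove a value if it is the single outlier.
if len(outliers) == 1:
    cleaned = [h for h in heights if h not in outliers]
else:
    cleaned = list(heights)

print(outliers, cleaned)
```

Other quartile conventions would give slightly different fences, so a borderline value could be treated differently.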
This table shows the population statistics of the school and my sample:
Calculations
These are the calculations which I will do:
Median vs. mean – In theory, the median is in some ways more reliable than the mean, because it is not affected by outliers as much as the mean is. However, for this type of data, I do not think that this will be the case. Firstly, I will remove any outliers from my data anyway, and secondly, if the data for a year group is very similar, the median may become too high or too low, depending on where most of the data is. The mean will not have this problem because it takes all the data into account, and not just the middle value. These will help me with my hypothesis because I can calculate the percentage differences of the mean and median between one year group and the next. I can then see how the percentage difference changes with each year and by how much.
Mode – This will be unsuitable because very few heights and weights will be the same.
Interquartile range vs. range – The interquartile range (IQR) will be more reliable because it depends on the middle 50% of the data and not just the largest and smallest values (range), so a single extreme value cannot distort it. The IQR will help me get an idea of where most of the pupils’ heights and weights are.
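The calculations above could be sketched like this. The weights are invented to show the effect of one extreme value and are not from my sample. The function name summarise is also just for illustration, and the quartiles use the default method in Python's statistics module, which may differ slightly from the textbook method.

```python
import statistics

def summarise(values):
    """Return the mean, median, range and IQR of a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
    }

# Hypothetical weights in kg with one very heavy pupil (85).
weights = [40, 44, 46, 47, 50, 52, 85]
summary = summarise(weights)
print(summary)
```

With these made-up numbers the single heavy pupil pulls the mean above the median and makes the range much larger than the IQR.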
Diagrams and tables
I will use box plots to prove my hypothesis. I feel that these will give me enough information, because I can read the median, IQR and range from them. The mean cannot be read from a box plot, so I will calculate it separately from the data. I will then put all of the results into a table, so that I can make a bar graph and compare the means graphically. I will do the same for the heights and the weights.
Planning the analysis
To try to prove my hypothesis, I will take the mean and median of the data of each year, and work out what the percentage difference is from year to year. If my hypothesis is right, then the heights will increase faster than the weights.
RESULTS
Box plot of heights
Results table of heights
Bar graph of means of heights
Box plot of weights
Results table of weights
Bar graph of means of weights
ANALYSIS
My original hypothesis was that as the year group increases, the heights and weights of the pupils will increase, but at a decreasing rate each time. I will now do some percentage difference calculations of the means to see if my original hypothesis was correct:
Heights
These are the percentage differences for the means:
Year 7 to 8: 0.08 / 1.55 x 100 = 5.16%
Year 8 to 9: 0.02 / 1.63 x 100 = 1.23%
Year 9 to 10: 0.02 / 1.65 x 100 = 1.21%
Year 10 to 11: 0.01 / 1.67 x 100 = 0.60%
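The height calculations above can be checked with a short sketch. The mean heights are read back from the working shown (the year 11 mean of 1.68 is inferred from the 0.01 increase on 1.67), and the function name percentage_changes is only illustrative.

```python
def percentage_changes(means):
    """Percentage change from each value to the next, relative to the earlier one."""
    return [
        (later - earlier) / earlier * 100
        for earlier, later in zip(means, means[1:])
    ]

# Mean heights in metres for years 7 to 11, taken from the calculations above.
height_means = [1.55, 1.63, 1.65, 1.67, 1.68]
changes = [round(c, 2) for c in percentage_changes(height_means)]

# True if every percentage increase is smaller than the one before it.
decreasing = all(a > b for a, b in zip(changes, changes[1:]))
print(changes, decreasing)
```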
Weights
These are the percentage differences for the means:
Year 7 to 8: 3.29 / 46.71 x 100 = 7.04%
Year 8 to 9: 1.33 / 50 x 100 = 2.66%
Year 9 to 10: 6.84 / 51.33 x 100 = 13.33%
Year 10 to 11: -6.03 / 58.17 x 100 = -10.37%
The results for the heights prove my hypothesis. This is because the heights always increase from year to year, but at a decreasing rate: from year 7 to 8 the increase is 5.16%, from year 8 to 9 it is 1.23%, and it keeps going down until year 11.
However, this is not the case for the weights. Although the weights increased from year 7 to 10, there was no distinguishable pattern in the percentage increases. From year 7 to 8, the increase was 7.04%, then from year 8 to 9 it was 2.66%. However, it jumped hugely to 13.33% from year 9 to 10, and then the weights fell by 10.37% from year 10 to 11. The overall percentage difference from year 9 to 11 was 1.58%.
From this, there are two possibilities. Either the year 11s were very light, or the year 10s were very heavy. I think that the reason is that the year 10s were very heavy. I think this because the percentage increases for the heights were generally lower than those for the weights, but the two followed the same pattern until year 10. For the heights, the percentage difference from year 9 to 11 was 1.82%, and this seems consistent with the weights, which showed a 1.58% increase over the same two years. Looking at my box plots, this seems to be true, because the median for the year 10 weights is much larger than the median for the other years.
While I was collecting my results, I found a few outliers. After comparing the results before and after removing any outliers, I can conclude that removing them did not have a big impact on the overall results.
Criticism of box plots
Despite being able to prove my hypothesis for the heights, I still do not know how weight changes with increasing height. I know that both increase with year group, but the box plots do not show how they increase in relation to each other. Therefore, I need to compare the two directly to find this out. I will make a new plan to do this.
PLAN
Introduction
Now that I have looked at box plots to compare heights and weights, I want to compare them directly. I will compare the heights and weights of the pupils, but will also split them into boys and girls, so I can see how the relationship between height and weight differs between the genders.
Hypothesis
I think that as the height of the pupils increases, the weight will also increase. This will be true for all the pupils. However, I think that the rate of change will be different for boys and girls. This is because boys and girls have very different bodies, so they will grow differently. Therefore, I think that with both boys and girls, their weight will increase with their height, but the boys’ weights will increase with a steeper gradient.
Sampling
The data which I will be comparing will be the same as the data I used for my previous investigation. Therefore, the data I will use should represent the data for the whole school.
Calculations
Gradient of the lines: This is important because it will allow me to see how the weight increases with height and by how much. It will also prove or disprove my hypothesis. For example, if the gradient for boys is larger than for girls, then I will know that my hypothesis is right, because it will show that the weights for boys increase by a larger amount than for girls.
Diagrams and tables
I will use a scatter diagram to prove my hypothesis. This will help me because I can see the equation of each line that I make. Therefore, I can see the equation for the boys and the girls. From this, I will be able to calculate the gradient, and so prove or disprove my hypothesis.
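One way the equation of a line of best fit could be found is the least squares method, sketched below. This is an assumption about the method, since the lines could equally come from a graphing program. The data and the function name best_fit_line are made up for illustration.

```python
def best_fit_line(xs, ys):
    """Return (gradient, intercept) of the least squares line y = gradient*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x
    return gradient, intercept

# Hypothetical heights (m) and weights (kg) lying exactly on a straight line.
heights = [1.50, 1.60, 1.70, 1.80]
weights = [45.0, 50.0, 55.0, 60.0]
gradient, intercept = best_fit_line(heights, weights)
print(gradient, intercept)
```

Running this separately on the boys' data and the girls' data would give one gradient for each, which can then be compared.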
RESULTS
Scatter diagram
Data of lines:
Boys: y=51.84x-30.08. Gradient = 51.84
Girls: y=23.05x+11.93. Gradient = 23.05
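To show what these gradients mean in practice, the sketch below uses the two equations to compare the predicted weight gain for a 0.1 increase in height. It assumes x is height in metres and y is weight in kilograms, and the heights 1.60 and 1.70 are chosen only as an example.

```python
def predicted_weight(height, gradient, intercept):
    """Weight predicted by the line y = gradient*x + intercept."""
    return gradient * height + intercept

# Gradients and intercepts taken from the equations of the lines above.
BOYS = (51.84, -30.08)
GIRLS = (23.05, 11.93)

# Predicted gain in weight when height goes from 1.60 to 1.70 (assumed metres).
boys_gain = predicted_weight(1.70, *BOYS) - predicted_weight(1.60, *BOYS)
girls_gain = predicted_weight(1.70, *GIRLS) - predicted_weight(1.60, *GIRLS)
print(round(boys_gain, 3), round(girls_gain, 3))
```

The gain is the gradient multiplied by 0.1 in each case, so the boys' line predicts more than twice the increase of the girls' line.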
ANALYSIS
The scatter diagram shows that with the boys, the weight increases by a greater amount with the height than with the girls, because the gradient for the boys (51.84) is larger than the gradient for the girls (23.05). Therefore, this proves my hypothesis.