You have to recognise bias. Averages, and all the other statistics, show you the trend in the data, the general drift, which means they show you what the ‘normal’ height is in each year group as a whole. Bias is when the statistics do not represent the data, so they do not show the actual trend. For example, outliers would raise or lower the mean. This is one of the things you have to talk about.
The following statistical calculations were performed on the data sets:
Mean: we use grouped data. Normally the mean height is all the heights added together, divided by how many students you had.
You can’t add all the heights together like you would normally, e.g. 139cm + 140cm, because the heights are grouped into intervals.
Because of this, we take the midpoint x of each interval and use this instead. Then we do frequency * x, or fx, for each interval and add these up to get the sum ∑fx. You divide by n, which is how many students you have, i.e. the total of the sampled data. The result is the mean height of that stratum.
The disadvantage of this average is that if you have, let's say, a couple of heights which are really low, then the mean is badly affected and also becomes low. If a few heights are very high then the mean is raised as well. In summary, ‘outliers’ affect this average easily, creating some bias in it. What this means is that you can’t take it as a very good indicator of what the general, average height for this stratum/group is. Therefore we have to take the outliers into account when analysing/evaluating.
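As a rough sketch of the grouped mean and the outlier effect described above: the frequency table here is made up for illustration (it is not the real survey data), and the name grouped_mean is just a label chosen for this example.

```python
# Hypothetical frequency table for one stratum (NOT the real survey data).
# Each row is (lower bound, upper bound, frequency), heights in cm.
table = [
    (130, 139, 4),
    (140, 149, 12),
    (150, 159, 16),
    (160, 169, 6),
    (170, 179, 2),
]

def grouped_mean(rows):
    """Estimate the mean as sum(f * x) / n, where x is each interval's midpoint."""
    n = sum(f for _, _, f in rows)
    sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in rows)
    return sum_fx / n

mean = grouped_mean(table)

# Two made-up outliers in a 190-199 interval pull the mean upwards.
mean_with_outliers = grouped_mean(table + [(190, 199, 2)])

print(mean, mean_with_outliers)
```

With only two extra very tall students out of 42, the estimated mean already moves up by about 2cm, which is the kind of bias to mention in the evaluation.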
Median: is the middle value when all the heights are arranged in order. So if you have 40 heights you arrange them in order, and the median is the one in the middle, which for 40 heights falls between positions 20 and 21. To find the position with our grouped data you simply do:
(N + 1) / 2
(40 + 1) / 2 = 20.5th value. This lies between positions 20 and 21; let's take it as the height in position 21. For year 7, all, the median is in the range 150-159. Therefore we take it as the midpoint, 154.5.
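A minimal sketch of finding the median interval from a grouped table. The frequencies are invented for illustration; they were only chosen so that n = 40 and the 150-159 interval has frequency 16, as in the year 7 example. The function name median_interval is an assumption of this sketch.

```python
# Hypothetical frequency table (NOT the real survey data).
# Each row is (lower bound, upper bound, frequency), heights in cm.
table = [
    (130, 139, 4),
    (140, 149, 12),
    (150, 159, 16),
    (160, 169, 6),
    (170, 179, 2),
]

def median_interval(rows):
    """Return the (n + 1) / 2 position, the interval holding it, and its midpoint."""
    n = sum(f for _, _, f in rows)
    position = (n + 1) / 2
    cumulative = 0
    for lo, hi, f in rows:
        cumulative += f  # running total of students counted so far
        if cumulative >= position:
            return position, (lo, hi), (lo + hi) / 2
    raise ValueError("empty table")

position, interval, median_estimate = median_interval(table)
print(position, interval, median_estimate)
```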
Mode: The value that occurs the most. With grouped data, the interval whose frequency is the highest is your modal interval, and its midpoint is taken as the mode. So for year 7, all, the highest frequency is 16, in interval 150-159. The mode is taken as 154.5. This shows that the averages are similar, so they agree with each other and back up my calculations.
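Picking out the modal interval can be sketched like this. Again the table is hypothetical; only the 150-159 interval with frequency 16 comes from the year 7 example.

```python
# Hypothetical frequency table (NOT the real survey data).
# Each row is (lower bound, upper bound, frequency), heights in cm.
table = [
    (130, 139, 4),
    (140, 149, 12),
    (150, 159, 16),
    (160, 169, 6),
    (170, 179, 2),
]

# The modal interval is the row with the highest frequency.
modal_lo, modal_hi, modal_freq = max(table, key=lambda row: row[2])

# The midpoint of the modal interval is taken as the mode.
mode_estimate = (modal_lo + modal_hi) / 2

print((modal_lo, modal_hi), modal_freq, mode_estimate)
```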
Range: shows you how spread out the data is. Highest value – lowest value.
Interquartile Range = Upper Quartile – Lower Quartile
Lower Quartile = the value ¼ of the way along the ordered data, i.e. the 25th percentile. Y7, all =
Upper Quartile = the value ¾ of the way along the ordered data, i.e. the 75th percentile.
* Using cumulative frequency (c.f.) graphs, the UQ and LQ are determined and the IQR calculated; box and whisker plots are drawn as well (separate sheet).
- Standard deviation also calculated for each set (put on the same sheet as the mean and median).
Standard deviation: the amount by which the heights deviate, which means how spread out they are, from the mean. So the higher the SD, the more the heights are spread out from the mean.
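Reading the quartiles off a c.f. graph can be approximated numerically by interpolating in a straight line inside each interval. This is only a sketch under stated assumptions: the table is hypothetical, the class boundaries are assumed to sit at .5 (heights measured to the nearest cm), and the quartile positions are taken as n/4 and 3n/4 along the cumulative frequency axis. The name value_at is made up for this example.

```python
# Hypothetical frequency table (NOT the real survey data).
# Each row is (lower bound, upper bound, frequency), heights in cm.
table = [
    (130, 139, 4),
    (140, 149, 12),
    (150, 159, 16),
    (160, 169, 6),
    (170, 179, 2),
]

def value_at(rows, position):
    """Estimate the height at a cumulative-frequency position by linear interpolation."""
    cumulative = 0
    for lo, hi, f in rows:
        lower_boundary = lo - 0.5          # assumed class boundary
        width = (hi + 0.5) - lower_boundary
        if cumulative + f >= position:
            return lower_boundary + (position - cumulative) / f * width
        cumulative += f
    raise ValueError("position is beyond the data")

n = sum(f for _, _, f in table)
lq = value_at(table, n / 4)        # 25th percentile
uq = value_at(table, 3 * n / 4)    # 75th percentile
iqr = uq - lq

print(lq, uq, iqr)
```

A hand-drawn curve will give slightly different readings, so treat these numbers as a check on the graph rather than a replacement for it.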
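For grouped data the standard deviation can be estimated from the midpoints in the same way as the mean. The sketch below assumes the population form, sqrt(∑f(x – mean)² / n), and uses a hypothetical table; grouped_sd is just a name chosen for this example.

```python
import math

# Hypothetical frequency table (NOT the real survey data).
# Each row is (lower bound, upper bound, frequency), heights in cm.
table = [
    (130, 139, 4),
    (140, 149, 12),
    (150, 159, 16),
    (160, 169, 6),
    (170, 179, 2),
]

def grouped_sd(rows):
    """Estimate the standard deviation using interval midpoints (population form)."""
    n = sum(f for _, _, f in rows)
    mean = sum(f * (lo + hi) / 2 for lo, hi, f in rows) / n
    variance = sum(f * ((lo + hi) / 2 - mean) ** 2 for lo, hi, f in rows) / n
    return math.sqrt(variance)

sd = grouped_sd(table)
print(round(sd, 2))
```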
- % change of median height from each year group to the next -- new sheet, and draw pie charts of these.
(Height of next group – height of this group) / height of this group * 100. E.g. (mean y8 – mean y7) / mean y7 * 100 = % change from y7 to y8.
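Percentage change is conventionally the difference between the new and old value, divided by the old value, times 100. A minimal sketch, using made-up mean heights rather than the real results:

```python
def percent_change(old, new):
    """Percentage change from old to new: (new - old) / old * 100."""
    return (new - old) / old * 100

# Hypothetical mean heights in cm (NOT the real results).
mean_y7 = 150.0
mean_y8 = 156.0

change_y7_to_y8 = percent_change(mean_y7, mean_y8)
print(round(change_y7_to_y8, 1))
```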
* % of girls taller than the mean height of boys in each year group -- new sheet. E.g. if 7 of the 10 girls in y7 are taller than 150, then % of girls taller = 7 / 10 * 100 = 70%.
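The same count can be sketched in code. The ten girls' heights below are invented so that 7 of them are above 150, matching the worked example; percent_taller is a name chosen for this sketch.

```python
def percent_taller(heights, threshold):
    """Percentage of heights strictly greater than the threshold."""
    taller = sum(1 for h in heights if h > threshold)
    return 100 * taller / len(heights)

# Hypothetical y7 girls' heights in cm (NOT the real sample).
girls_y7 = [146, 148, 150, 151, 152, 153, 155, 156, 158, 160]
boys_mean_y7 = 150  # threshold taken from the worked example

percent_girls_taller = percent_taller(girls_y7, boys_mean_y7)
print(percent_girls_taller)
```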
- Scatter diagram of mean heights from each year group. Two lines, one for male heights, the other for female. Positive correlation will show that people get taller as they age.
Dots for the means of y7-y11, with a line of best fit. If the line has a +ve gradient (sloping up from left to right), then heights are increasing as people get older, which should support hypothesis 1. Secondly, to show the rate of this increase, calculate the gradient m in y = mx + c for the line of best fit. That’s the rate of change.
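A line of best fit drawn by eye can be checked against a least-squares fit. This sketch uses least squares, which is an assumption here and not necessarily how the line on the graph is drawn, and the mean heights are made up for illustration.

```python
def best_fit(xs, ys):
    """Least-squares gradient m and intercept c for y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    m = s_xy / s_xx
    c = mean_y - m * mean_x
    return m, c

# Hypothetical mean heights in cm for y7 to y11 (NOT the real results).
years = [7, 8, 9, 10, 11]
mean_heights = [152, 158, 163, 167, 170]

m, c = best_fit(years, mean_heights)
print(m, c)
```

Here m comes out positive, about 4.5cm per year group, which is the rate of change the gradient is meant to show.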
%girls ... you do a table of girls taller than boys
Another scatter diagram of the % of girls taller than boys for each year group. The gradient should be +ve but not too high, and it should tail off; this shows girls getting taller.
Glossary:
Mean:
Median:
Mode:
Range:
IQR:
UQ:
LQ:
SD:
% Change:
∑