To test whether this hypothesis is right or wrong I need data to analyse so that I can reach a conclusion. I have decided to use a data sheet created by Mayfield Secondary School. This data should be very useful because the school is state-run and therefore welcomes all children. The data will not be skewed by admissions, as the selection process does not involve any check on the height and weight of pupils, nor is selection based on physical ability.
Pursuing this hypothesis will require the removal of data that is irrelevant to this investigation. However, I will keep data such as year group and weight, which could be applied later in this enquiry.
In order to view the whole of the data, I decided to group my data onto one sheet, which meant that I was able to analyse it without the disruption caused by switching between individual sheets. Additionally, I separated the data according to gender to keep the analysis simple and to allow me to use Autograph effectively.
With all surveys, there is always some data that is not legitimate, and the direct result of this is rogue data. This may be because some people are self-conscious about their weight or height, and so lie about their physical attributes in order to be accepted or to boost their self-esteem.
It isn’t sinful to submit false data; however, I think that such data must be removed as it may affect the results of my investigation.
Height above 2 m, which I took as the maximum allowed height.
The modified data was then imported into Autograph to produce the following graphs for boys and girls across the whole school:
Height Distribution among girls.
Height Distribution among boys
[compare mean, median, mode etc here for the whole as well as differences in distribution, and draw conclusion accordingly]
I will now investigate this data further to determine if there is a correlation between the heights and weights, and if such a correlation occurs, whether it is stronger in boys or girls.
My first action will be to remove any outliers, eliminating any values which differ significantly from the majority at either end of the range.
To do this, I will sort my data by weight so that such outliers are easy to identify.
The following records were removed:
Weight significantly lower than the next lowest record, 25.
Weight far above 93, the next largest record. For Massoma Abbas, this data was not present.
Even though the height and weight may not both be anomalous in each of these records, I have decided to remove a record completely if just one of the two values is anomalous. The reason for this is to ensure that the same number of samples is used for height and for weight, so that they can be compared fairly and consistently.
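The removal rule described above can be sketched in Python. This is only an illustration and not the method used in the investigation, which was carried out on the data sheet itself. The records are invented, and the weight limits of 25 and 93 are assumed cut-offs taken from the lowest and highest weights that were kept.

```python
# Hypothetical records: (label, height in metres, weight in kg).
# These values are invented for illustration and are NOT from the Mayfield data.
records = [
    ("A", 1.52, 45.0),
    ("B", 1.65, 54.0),
    ("C", 4.65, 50.0),   # implausible height
    ("D", 1.70, 5.0),    # implausible weight
    ("E", 1.80, None),   # weight missing
    ("F", 1.75, 63.0),
]

MAX_HEIGHT_M = 2.0     # maximum allowed height, as stated in the text
MIN_WEIGHT_KG = 25.0   # assumed lower cut-off for this sketch only
MAX_WEIGHT_KG = 93.0   # assumed upper cut-off for this sketch only


def is_valid(record):
    """Keep a record only if height and weight are both present and plausible."""
    _, height, weight = record
    if height is None or weight is None:
        return False
    return height <= MAX_HEIGHT_M and MIN_WEIGHT_KG <= weight <= MAX_WEIGHT_KG


# Sorting by weight puts extreme values at either end, where they are easy to spot.
present = [r for r in records if r[2] is not None]
by_weight = sorted(present, key=lambda r: r[2])

# A record is dropped as a whole if either value fails the check,
# so the height list and the weight list always have the same length.
cleaned = [r for r in records if is_valid(r)]
heights = [r[1] for r in cleaned]
weights = [r[2] for r in cleaned]
```

Because each record is kept or dropped as a unit, every remaining height still has its matching weight, which is what allows the two to be plotted against each other.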
With my modified data, I will now plot a scatter graph in order to identify any correlations or trends, and to determine the range and average values for my data. I will also construct a line of best fit, the equation of which can be used to make predictions based on the overall trend. The strength of the correlation is measured separately by the correlation coefficient, which is not the same as the gradient of the line.
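To make the two quantities concrete, here is a minimal Python sketch that calculates a least-squares line of best fit and the product-moment correlation coefficient from the same points. It is an illustration only: Autograph produced the actual graphs, the function name is my own, and the heights and weights below are invented rather than taken from the Mayfield data.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def best_fit_and_correlation(xs, ys):
    """Return (gradient, intercept, r) for the least-squares line y = gradient * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    gradient = sxy / sxx                 # change in y per unit of x
    intercept = my - gradient * mx       # the line passes through the mean point
    r = sxy / sqrt(sxx * syy)            # correlation coefficient, between -1 and 1
    return gradient, intercept, r


# Hypothetical heights (m) and weights (kg), invented for illustration.
heights = [1.50, 1.55, 1.60, 1.65, 1.70, 1.75]
weights = [45.0, 50.0, 48.0, 55.0, 58.0, 60.0]

gradient, intercept, r = best_fit_and_correlation(heights, weights)
print(f"gradient  = {gradient:.2f} kg per metre")
print(f"intercept = {intercept:.2f} kg")
print(f"r         = {r:.3f}")
```

The gradient has units (kg per metre) and can take any size, while r has no units and always lies between -1 and 1, so the two answer different questions: how steep the trend is, and how closely the points follow it.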
This graph shows a relatively clear positive correlation between height and weight. As height increases, so, generally, does weight, although at a moderate rate.
The equation of the line of best fit is:
y = 43.03x-18.47
This equation reveals a number of important facts about the correlation of my data:
The coefficient of x, which is the gradient of the line, shows that for every metre of height gained, statistically, weight will increase by 43.03 kg.
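As a quick check of how the equation behaves, the short Python sketch below applies it in both directions. It assumes that x is height in metres and y is weight in kilograms; the function names and the example values of 1.60 m and 50 kg are my own and are not taken from the data.

```python
GRADIENT = 43.03     # kg per metre, from the line of best fit
INTERCEPT = -18.47   # kg, from the line of best fit


def predict_weight(height_m):
    """Apply y = 43.03x - 18.47 to estimate weight from height."""
    return GRADIENT * height_m + INTERCEPT


def predict_height(weight_kg):
    """Rearranged form x = (y + 18.47) / 43.03, estimating height from weight."""
    return (weight_kg - INTERCEPT) / GRADIENT


print(round(predict_weight(1.60), 2))   # 50.38
print(round(predict_height(50.0), 2))   # 1.59
```

The two functions are inverses of each other, so converting a height to a predicted weight and back again returns the original height.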
I will rearrange this equation to make x the subject and apply it to the weight data, which will create an alternative set of heights. I will then compare these statistically calculated heights to the true heights in order to determine the reliability of the equation: