Because both height and weight are continuous data, I have chosen to group the data into class intervals of ten, as this makes large sets of data easier to handle and will make plotting graphs easier. In both the height and weight tables, an interval such as '120 ≤ h < 130' means '120 up to but not including 130': any value greater than or equal to 120 but less than 130 goes in this interval. I am now ready to present my results in graph form. This will allow me to analyse my data and compare the results for boys and girls, which I cannot do with the tables above.
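The grouping rule above can be sketched in Python. This is a minimal illustration under stated assumptions, not how the tables were actually produced: it assumes the raw values are in a list, and the function names `class_interval`, `frequency_table` and `label` are hypothetical.

```python
from collections import Counter


def class_interval(value, width=10):
    """Lower bound of the interval containing value.

    Intervals include their lower bound and exclude their upper bound,
    so 120 goes in 120 <= h < 130 and 130 goes in 130 <= h < 140.
    """
    return int(value // width) * width


def frequency_table(values, width=10):
    """Count the values in each interval, keyed by the interval's lower bound."""
    counts = Counter(class_interval(v, width) for v in values)
    return dict(sorted(counts.items()))


def label(lower, width=10, symbol="h"):
    """Format an interval in the same style as the tables, e.g. '120 ≤ h < 130'."""
    return f"{lower} ≤ {symbol} < {lower + width}"
```

For example, `frequency_table([120, 129.5, 130, 145])` puts 120 and 129.5 in the first interval, because 129.5 is still below 130.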
Weight
As I mentioned earlier, both height and weight are continuous data, so I cannot use bar charts to represent them; instead I will use histograms, which are a suitable way of displaying grouped continuous data. Before I produce the graphs, I am going to make another hypothesis:
"Boys will generally weigh more than girls."
Histogram of boys' weights
Histogram of girls' weights
By looking at the two graphs I can already see a contrast between the girls' and boys' weights, but to make a proper comparison I need to plot both sets of data on the same graph. Two histograms drawn on the same axes would not give a very clear graph, so I am going to use frequency polygons, which make the comparison much clearer.
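A frequency polygon is drawn by plotting the frequency of each class at the midpoint of its class interval and joining the points with straight lines. The sketch below only computes the points to plot. It assumes a frequency table keyed by the lower bound of each interval, and the function name `polygon_points` is hypothetical. Joining the ends down to zero is a drawing convention, so it is left as an option.

```python
def polygon_points(freq_table, width=10, close=False):
    """(midpoint, frequency) pairs for a frequency polygon.

    freq_table maps the lower bound of each class interval to its frequency.
    If close is True, a zero-frequency point is added one interval beyond
    each end so the polygon meets the horizontal axis.
    """
    points = [(lower + width / 2, f) for lower, f in sorted(freq_table.items())]
    if close and points:
        first_mid, last_mid = points[0][0], points[-1][0]
        points = [(first_mid - width, 0)] + points + [(last_mid + width, 0)]
    return points
```

Plotting the boys' and girls' points as two lines on the same axes gives the combined frequency polygon.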
Frequency polygons for boys' and girls' weights
This graph supports my hypothesis: there were boys who weighed between 80 kg and 90 kg, whereas no girls weighed more than the 60-70 kg group. Similarly, there were girls who weighed between 20 kg and 30 kg, whereas the boys' weights started in the 30-40 kg interval. Although I can find the modal group from my graph, it is not as easy to work out the mean, median and range. To do this I have decided to produce stem and leaf diagrams, mainly because they let me read each individual weight rather than only grouped weights. Stem and leaf diagrams show the individual weights of the pupils clearly, rather than just a frequency for each group, which can be quite inaccurate.
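Building a stem and leaf diagram can be sketched as below. This assumes the weights are recorded to the nearest whole kilogram, so the stem is the tens digit and the leaf is the units digit (key: 4 | 7 means 47 kg). The function names are hypothetical; a back-to-back diagram for girls and boys would place two of these side by side, sharing the stems.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted leaves (units digits)."""
    diagram = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(int(v), 10)
        diagram[stem].append(leaf)
    return dict(diagram)


def render(diagram):
    """Text lines of the diagram, keeping any empty stems in the middle."""
    lines = []
    for stem in range(min(diagram), max(diagram) + 1):
        leaves = " ".join(str(leaf) for leaf in diagram.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}".rstrip())
    return lines
```

Keeping empty stems matters: a gap in the data should still show up as an empty row.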
Stem and leaf diagrams of girls' and boys' weights
From these stem and leaf diagrams I can now work out the mean, median, modal group (rather than the mode, because I have grouped the data) and range of the results. This is a table showing the results for boys and girls:
(NB. The values for the mean and median have been rounded to the nearest whole number.)
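The four measures in the table can be computed as sketched below. The function names are hypothetical, and the sketch assumes the raw weights are in a list. It rounds halves upwards, the usual "nearest whole number" convention taught in school, because Python's built-in `round()` sends halves to the nearest even number (`round(2.5) == 2`).

```python
import math
import statistics
from collections import Counter


def round_half_up(x):
    """Round a non-negative number to the nearest whole number, halves going up."""
    return math.floor(x + 0.5)


def summary(values, width=10):
    """Mean, median, modal class and range of a list of values."""
    counts = Counter(int(v // width) * width for v in values)
    top = max(counts.values())
    # More than one class can share the highest frequency.
    modal = [f"{lo}-{lo + width}" for lo in sorted(counts) if counts[lo] == top]
    return {
        "mean": round_half_up(statistics.mean(values)),
        "median": round_half_up(statistics.median(values)),
        "modal class": modal,
        "range": max(values) - min(values),
    }
```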
Both boys and girls have their modal class in the 40-50 kg interval, but 13 out of 30 girls (43%) fell into this class, whereas only 11 out of 30 boys (37%) did, which is easy to see on my frequency polygon. This result does not really support my hypothesis in the way the other measures do. My evidence shows that the mean weight of the boys is 4 kg greater than that of the girls, and that the boys' median weight is 3 kg above the girls'. My sample also suggests that the boys' weights were more spread out, with a range of 50 kg compared with 31 kg for the girls. The difference in range is also shown on my frequency polygon, where the girls' weights fall in 5 class intervals, whereas the boys' weights fall in 6.
Height
I am now going to use the height frequency tables to produce graphs and tables similar to those I produced for weight. As height is also continuous data, I am going to use histograms to show both the boys' and girls' heights. I am also going to make another hypothesis:
"In general the boys will be of a greater height than the girls."
Histogram of boys' heights
Histogram of girls' heights
As with the weight, I can see obvious contrasts between the boys' and girls' heights, but the data is not presented in a practical way for a comparison, so I am going to put the two data sets on one frequency polygon.
Frequency Polygon of Boys' and Girls' Heights
This graph supports my hypothesis, as the boys' heights reach up to the 190-200 cm interval, whereas the girls' heights only go up to the 170-180 cm group. Similarly, there were girls in the 120-130 cm category, whereas the boys' heights started at 130-140 cm. As this data is presented in
Stem and leaf diagrams of girls' and boys' heights
With these more detailed results, I can now see the exact frequency of each group and exactly which heights fall into each group, which the grouped graphs cannot show. For all I know, every height in the group 140 ≤ h < 150 could be 140 cm, which is why I feel it is sensible to see exactly which data points I am dealing with. I can also now work out the mean, median and range of the data. These are the results I worked out:
Unlike the weight results, the modal classes for the boys' and girls' heights differ, and, much to my surprise, the girls' modal class is one group higher than the boys'. This is clearly visible on my frequency polygon, where the girls' line reaches higher than the boys'. However, this does not really undermine my hypothesis, as the modal class only tells me which group had the highest frequency, not which gender is taller. On the other hand, the mean height supports my prediction, as the boys' mean height is 6 cm above the girls'. The medians were closer together than they were for weight, with only 1 cm between them, although again the boys' median was higher. As with weight, the boys' range was larger than the girls', although the contrast was nowhere near as great, with a difference of only 6 cm between the two ranges.
All of my conclusions so far are based only on a random sample of 30 boys and 30 girls, so they are not necessarily accurate for the whole school, and I will therefore extend my sample later in the project.
Before I go further, I feel it is necessary to work out the quartiles and medians of both data sets from the grouped data, rather than from the individual points in my stem and leaf diagrams. To do this I am going to produce cumulative frequency graphs, as these are a very powerful tool for comparing grouped continuous data sets and will allow me to reach a further conclusion about height and weight separately. I am also going to draw box and whisker diagrams for each data set on the same axes as the curves, as these make it very simple to find the median, the lower and upper quartiles and the interquartile range (I have attached a small sheet explaining how I can find these results from the graphs I am going to produce).
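Reading the median and quartiles off a cumulative frequency graph can be approximated in code by joining the plotted points with straight lines (a cumulative frequency polygon) instead of a smooth hand-drawn curve, so results may differ slightly from readings taken off a curve. This sketch uses the positions n/4, n/2 and 3n/4, a common convention for grouped data; some textbooks use (n + 1)/4 instead. The function names are hypothetical.

```python
def cumulative_points(freq_table, width=10):
    """(upper class boundary, cumulative frequency) points for the graph.

    freq_table maps each interval's lower bound to its frequency. The graph
    starts at 0 on the lower boundary of the first interval; empty intervals
    in between are treated as frequency 0.
    """
    first, last = min(freq_table), max(freq_table)
    points = [(first, 0)]
    running = 0
    for lower in range(first, last + width, width):
        running += freq_table.get(lower, 0)
        points.append((lower + width, running))
    return points


def read_off(points, target):
    """Value at which the cumulative frequency reaches target (linear interpolation)."""
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 < target <= c1:
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("target is outside the cumulative frequency range")


def quartiles(freq_table, width=10):
    """Lower quartile, median, upper quartile and interquartile range."""
    points = cumulative_points(freq_table, width)
    n = points[-1][1]
    lq, med, uq = (read_off(points, n * k / 4) for k in (1, 2, 3))
    return {"lower quartile": lq, "median": med, "upper quartile": uq, "IQR": uq - lq}
```

The same four numbers, together with the smallest and largest values, are what a box and whisker diagram displays.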
I am going to look at weight first, and to produce the best comparison possible I am going to plot the boys, the girls and the mixed population on one graph.
Cumulative frequency curves for weight
All three of my curves clearly show the trend towards greater weights among the boys than the girls. From my box and whisker diagrams I have obtained the following evidence:
These results continue to agree with my earlier prediction that the boys would be heavier than the girls. I can see this because the girls' lower quartile, median and upper quartile are all lower than the boys'. The boys' weights are also shown to be more spread out, as their interquartile range is 2 kg greater than the girls'.
Cumulative frequency for heights
These results also show the trend towards greater heights among the boys than the girls. As with my weight diagram, I have obtained the following evidence:
As with the weight results, these results further support my prediction that the boys would be taller than the girls. This can again be seen from the lower quartile, median and upper quartile, which for the girls are all smaller than for the boys.
From all of the graphs and tables I have produced so far, I can fairly confidently say that the boys' weights and heights are greater than the girls', but none of the evidence I have collected so far allows me to test my original hypothesis: "The taller the pupil, the heavier they will weigh."
When I look at my cumulative frequency graphs of height and weight, the two diagrams appear very similar, but this does not let me establish any relationship between height and weight. I am now going to extend my investigation to see how height and weight are related, and the most effective way to do this is with scatter diagrams. I will plot boys and girls on separate graphs, as I feel the results will show a stronger correlation this way, and it continues the style I have used so far. Scatter diagrams allow me to compare the correlation on the two graphs and the equations of the lines of best fit (the best estimate of the relationship between height and weight) for each gender.
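The strength of correlation on the scatter diagrams is judged by eye. One standard way to put a number on it, which is an addition here and not part of the original method, is Pearson's product-moment correlation coefficient r: values close to +1 mean the points lie close to a rising straight line, and values near 0 mean little linear relationship. The function name `pearson_r` is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient of paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)
```

On Python 3.10 or later, `statistics.correlation(xs, ys)` computes the same coefficient.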
Boys' Scatter diagram of height and weight
This graph shows a positive correlation between height and weight, and most of the data points lie reasonably close to the line of best fit. I have circled a few points that do not fit in with the line of best fit; these are called anomalous points, meaning that they do not follow the trend of the results.
Girls' Scatter diagram of height and weight
This graph also shows a positive correlation, and the correlation is stronger than for the boys, as the points are more spread out on the boys' graph than on the girls'. The data points on this graph are quite closely bunched together in the middle, whereas on the boys' graph the results are more widely spread, which agrees with my earlier conclusion that the boys' heights and weights have a larger range than the girls'. I have again circled the anomalous points on this graph to show which data did not fit the trend. As both of my lines of best fit are straight lines, their equations take the form:
$$y = mx + c$$
where y represents height in cm and x represents weight in kg. The equations of the lines of best fit for my data are as follows (I obtained these equations from my graphs in Autograph, as an exact result was available; if I were to find them myself, I would find the gradient of each line and the point where it intercepts the y-axis. NB. Attached is a small diagram of how I would do so):
Boys: y = 0.8004x + 121.6 Girls: y = 0.7539x + 123.6
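How Autograph calculates its line of best fit is not stated here; the usual method for a straight line is ordinary least squares, which chooses m and c to minimise the sum of squared vertical distances from the points to the line. A minimal sketch, assuming least squares and regressing height (y) on weight (x) to match the scatter diagrams' axes; the function name is hypothetical.

```python
def least_squares_line(xs, ys):
    """Gradient m and intercept c of y = m*x + c by ordinary least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal, so the gradient is undefined")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    m = sxy / sxx
    c = my - m * mx  # the line always passes through the mean point
    return m, c
```

On Python 3.10 or later, `statistics.linear_regression(xs, ys)` returns the same slope and intercept.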
These equations can be used to predict a pupil's weight from their height, or vice versa. For example, to predict the weight of a girl who is 165 cm tall, I rearrange the girls' equation to make x the subject:
$$y = 0.7539x + 123.6 \quad\Rightarrow\quad x = \frac{y - 123.6}{0.7539}$$
If y = 165 cm, then
$$x = \frac{165 - 123.6}{0.7539} = \frac{41.4}{0.7539} \approx 54.91$$
Therefore, using the equation of my line of best fit, I would predict that a girl of 165 cm would weigh about 55 kg (rounded to a whole number, as used on my graphs and data tables). I checked this by lightly drawing a pencil line on my graph from 165 cm on the y-axis across to the line of best fit and then down to the x-axis; the line met the x-axis at around 56 kg, which is close to the calculated value.
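The rearrangement above can be checked with a short sketch. The default gradient and intercept are the girls' values quoted above; the function name is hypothetical. Note that this inverts the height-on-weight line, exactly as the worked example does; a line fitted with weight as the dependent variable would generally give a slightly different prediction.

```python
def predict_weight(height_cm, m=0.7539, c=123.6):
    """Predicted weight in kg: rearrange y = m*x + c to x = (y - c) / m."""
    return (height_cm - c) / m


# Girls' line (defaults) and boys' line for a pupil 165 cm tall.
girl_165 = predict_weight(165)
boy_165 = predict_weight(165, m=0.8004, c=121.6)
```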
I have now reached a point in my investigation where my random sample of 30 boys and 30 girls is no longer sufficient. Some clear conclusions have already been drawn from my graphs and tables, and all of them have fitted in with my predictions. However, these conclusions are based only on general trends in my data, and in both the girls' and boys' samples there were individuals whose results did not fit the general trend. I cannot have complete confidence in my results so far, because they come from a random sample of only 30 girls and 30 boys and do not take age into account, which I now feel is a necessary factor.
I have spent a good amount of time comparing genders, but now I am going to look at differences in age. It is common sense that age will affect height and weight; you would expect a year 7 pupil to be shorter and lighter than a pupil in year 11. As Mayfield is a growing school, there are more pupils in year 7 than in year 11, so a simple random sample of 30 may not contain each year group in the same proportion as the school as a whole. To ensure that my data set accurately represents the whole school, I am going to take a stratified sample. In a stratified sample, the number of pupils sampled from each group (for example, year 8) is proportional to that group's size within the whole population (the whole school).
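The proportional allocation for a stratified sample can be sketched as below. Each group's share is sample size × group size ÷ population size; because these shares are rarely whole numbers, this sketch uses the largest-remainder method (an assumption, not necessarily the rounding used later in the project) so the rounded numbers still add up to the sample size. The function name is hypothetical, and the year-group sizes in the usage example are invented for illustration, not Mayfield's real numbers.

```python
def stratified_allocation(group_sizes, sample_size):
    """Number to sample from each group, proportional to the group's size."""
    total = sum(group_sizes.values())
    exact = {g: sample_size * size / total for g, size in group_sizes.items()}
    alloc = {g: int(q) for g, q in exact.items()}
    shortfall = sample_size - sum(alloc.values())
    # Give the leftover places to the groups with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda g: exact[g] - alloc[g], reverse=True)
    for g in by_remainder[:shortfall]:
        alloc[g] += 1
    return alloc


# Hypothetical year-group sizes, for illustration only.
sizes = {"Year 7": 200, "Year 8": 180, "Year 9": 160, "Year 10": 140, "Year 11": 120}
plan = stratified_allocation(sizes, 30)
```

Within each year group, the pupils themselves would then be chosen at random, for example with `random.sample`.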