Height Frequency Tables
Weight Frequency Tables
As I mentioned earlier, both height and weight are continuous data, so I cannot use bar charts to represent them. Instead I will use histograms, which are a suitable type of graph for grouped continuous data. Before I produce the graphs, I am going to make a further hypothesis:
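A histogram needs the raw measurements sorted into class intervals first. The sketch below shows one way to do that, using classes of the form 140 ≤ h < 150 so that a value on a boundary goes into the higher class. It is only an illustration: the helper name `group_frequencies` and the example heights are hypothetical, not my survey data.

```python
from collections import Counter


# Hypothetical helper (illustration only)
def group_frequencies(values, start, width, n_classes):
    """Count values into classes start <= v < start + width, and so on.

    Returns a list of ((lower, upper), frequency) pairs.
    Values outside the chosen classes are ignored.
    """
    counts = Counter()
    for v in values:
        k = int((v - start) // width)  # index of the class that contains v
        if 0 <= k < n_classes:
            counts[k] += 1
    return [((start + k * width, start + (k + 1) * width), counts[k])
            for k in range(n_classes)]


# Hypothetical heights in cm (illustration only, not the survey data)
heights = [142, 149.5, 150, 155, 163, 171, 158, 166, 150.2]
for (lower, upper), f in group_frequencies(heights, 140, 10, 4):
    print(f"{lower} <= h < {upper}: {f}")
```

The bar for each class is then drawn over its interval on the height axis.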
"In general the boys will be of a greater height than the girls."
Frequency Diagram of boys' heights
Frequency Diagram of girls' heights
As you can see in the two diagrams, there is an apparent contrast between the male and female heights. However, two separate diagrams are not a practical way to compare the data, which is why I am going to present the two data sets on one frequency polygon.
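A frequency polygon plots each group's frequency against the midpoint of its class interval and joins the points with straight lines. The sketch below only works out those points; the helper `polygon_points` and the grouped frequencies are hypothetical, chosen for illustration rather than taken from my data.

```python
# Hypothetical helper (illustration only)
def polygon_points(classes):
    """Turn ((lower, upper), frequency) pairs into (midpoint, frequency) points."""
    return [((lower + upper) / 2, f) for (lower, upper), f in classes]


# Hypothetical grouped heights in cm (illustration only)
boys = [((150, 160), 4), ((160, 170), 9), ((170, 180), 7)]
girls = [((150, 160), 6), ((160, 170), 10), ((170, 180), 2)]

# Plotting both lists of points on one pair of axes gives the two polygons.
print(polygon_points(boys))   # [(155.0, 4), (165.0, 9), (175.0, 7)]
print(polygon_points(girls))  # [(155.0, 6), (165.0, 10), (175.0, 2)]
```

Because each data set is reduced to a single line, two polygons can share the same axes without hiding each other the way overlapping bars would.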
This graph supports my hypothesis, as the male pupils' heights reach the 190-200 cm interval, whereas the female pupils' heights do not go beyond the 170-180 cm interval. The data, in the format of a stem and leaf diagram, is shown below. I have chosen stem and leaf diagrams because they show each pupil's individual height, rather than just a frequency for each group, which can be quite inaccurate.
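A stem and leaf diagram splits each value into a stem (here the tens of centimetres) and a leaf (the units), and lists the leaves in order. The sketch below assumes whole-number heights in cm; the helper `stem_and_leaf` and the heights are hypothetical.

```python
from collections import defaultdict


# Hypothetical helper (illustration only); assumes whole-number values
def stem_and_leaf(values):
    """Return {stem: sorted leaves}, using tens as the stem and units as the leaf."""
    table = defaultdict(list)
    for v in sorted(values):          # sorting first keeps stems and leaves in order
        table[v // 10].append(v % 10)
    return dict(table)


heights = [152, 147, 158, 161, 165, 149, 170, 155]  # hypothetical, in cm
for stem, leaves in stem_and_leaf(heights).items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
# Key: 15 | 2 means 152 cm
```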
With these more detailed results, I can see the exact frequency of each group and exactly which heights fall into each group, which the grouped graphs do not show. For all I know, all of the points in the group 140 ≤ h < 150 could be at 140 cm, which is why I think it is sensible to see exactly what data points I am dealing with. I can also now work out the mean, median and range of the data.
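Once the individual values are known, the mean, median and range follow directly. This minimal sketch uses Python's standard `statistics` module; the helper `summarise` and the heights are hypothetical.

```python
import statistics


# Hypothetical helper (illustration only)
def summarise(values):
    """Mean, median and range of a list of individual (ungrouped) values."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
    }


heights = [147, 149, 152, 155, 158, 161, 165, 170]  # hypothetical, in cm
print(summarise(heights))  # {'mean': 157.125, 'median': 156.5, 'range': 23}
```

With an even number of values the median is the mean of the two middle values, here (155 + 158) / 2.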
Unlike the weight results, the modal groups for height differ between male and female pupils and, much to my surprise, the female pupils' modal class is one group higher than the male pupils'. This is clearly visible on my frequency polygon, where the female pupils' line reaches higher than the male pupils'. This does not undermine my hypothesis, however, as the modal class is only the group with the highest frequency; it does not tell me which gender is taller overall. The mean height, on the other hand, supports my prediction, as the male pupils' mean height is 0.044 m above the female pupils'. The medians were closer, with only 0.02 m between them, although the male pupils' median was again the higher. The range of the male pupils' heights was also wider than that of the female pupils. All of these conclusions are based on a random sample of only 30 boys and 30 girls, so they are not necessarily accurate for the whole school, and I will therefore extend my sample later in the project.

Before I go further, I need to work out the medians and quartiles of both data sets from the grouped data, rather than from the individual points in my stem and leaf diagrams. To do this I am going to draw cumulative frequency graphs, which are a powerful tool for comparing grouped continuous data and will let me draw further conclusions about height and weight separately. I am also going to draw a box and whisker diagram for each data set on the same axis as the curves, as this makes it simple to read off the median, the lower and upper quartiles and the interquartile range. I am going to look at weight first and, to give the best comparison possible, I will plot boys, girls and the mixed population on one graph.
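Reading the median and quartiles off a cumulative frequency graph can be approximated by calculation. The sketch below assumes the points are joined with straight lines (a cumulative frequency polygon rather than a smooth curve), so each value is found by linear interpolation inside its class at the n/4, n/2 and 3n/4 positions; a hand-drawn curve will give slightly different readings. The helpers and grouped weights are hypothetical.

```python
# Hypothetical helpers (illustration only)
def cumulative_points(classes):
    """(upper class boundary, cumulative frequency) points for the graph,
    starting from 0 at the lower boundary of the first class."""
    points = [(classes[0][0][0], 0)]
    running = 0
    for (lower, upper), f in classes:
        running += f
        points.append((upper, running))
    return points


def estimate_at(classes, position):
    """Estimate the value at a cumulative position (e.g. n/4 for the lower
    quartile) by linear interpolation inside the class that contains it."""
    cf_before = 0
    for (lower, upper), f in classes:
        if f and cf_before + f >= position:
            return lower + (position - cf_before) / f * (upper - lower)
        cf_before += f
    raise ValueError("position is larger than the total frequency")


# Hypothetical grouped weights in kg (illustration only)
weights = [((30, 40), 4), ((40, 50), 11), ((50, 60), 9), ((60, 70), 6)]
n = sum(f for _, f in weights)
lower_q, median, upper_q = (estimate_at(weights, n * p) for p in (0.25, 0.5, 0.75))
print(cumulative_points(weights))  # [(30, 0), (40, 4), (50, 15), (60, 24), (70, 30)]
print(round(lower_q, 1), median, round(upper_q, 1), round(upper_q - lower_q, 1))
# 43.2 50.0 58.3 15.2
```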
Weight
I am now going to use the weight frequency tables to produce similar graphs and tables to those I produced for height. As weight is also continuous data, I am going to use frequency diagrams to show both the boys' and the girls' weights. I am also going to make another hypothesis:
"Boys will generally weigh more than girls."
By looking at the two graphs, I can tell that there is a contrast between the male and female weights, but to make a proper comparison I need to plot both sets of data on the same graph. Two histograms drawn on the same axes would overlap and be hard to read, which is why I am using a frequency polygon to make the comparison clearer.
This graph supports my hypothesis that the male pupils will generally weigh more than the female pupils, as it shows that some male pupils weighed between 80 kg and 90 kg, whereas no female pupils weighed beyond the 60-70 kg group. Although I can read the modal group from my graph, it is not as easy to work out the mean, median and range from it. To do this I have again decided to draw stem and leaf diagrams, as these make each of these measures very clear. A stem and leaf diagram shows each pupil's individual weight, rather than just a frequency for each group, which can be quite inaccurate.
From the diagram above, I can now work out the mean, median, modal group (rather than the mode, because the data are grouped) and range. The following table shows the results for both male and female pupils:
Although both the male and female pupils have their modal group in the 40-50 kg interval, 11 out of 30 male pupils (about 37%) fell into this group, compared with 13 out of 30 female pupils (about 43%), which is easy to see on my frequency polygon. This measure therefore does not support my hypothesis in the way the others do. My evidence shows that the mean weight of the male pupils is 5.16 kg higher than that of the female pupils, and that the median weight of the male pupils is 6 kg above that of the female pupils. My sample also suggests that the male pupils' weights were more spread out, with a staggering range of 61 kg compared with a mere 29 kg for the female pupils. The difference in range also shows on my frequency polygon, where the female pupils' weights fall in 4 group intervals whereas the male pupils' weights fall in 6.
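The modal group and the share of a sample that falls into it are quick to check by calculation: the percentage is 100 × count ÷ total. In this sketch the counts 11 of 30 and 13 of 30 are the ones quoted above, while the helpers and the grouped frequencies passed to `modal_class` are hypothetical.

```python
# Hypothetical helpers (illustration only)
def modal_class(classes):
    """Class interval with the highest frequency (the first one if there is a tie)."""
    return max(classes, key=lambda item: item[1])[0]


def percentage(count, total):
    """count as a percentage of total."""
    return 100 * count / total


# Hypothetical grouped weights in kg (illustration only)
weights = [((30, 40), 4), ((40, 50), 11), ((50, 60), 9), ((60, 70), 6)]
print(modal_class(weights))          # (40, 50)
print(round(percentage(11, 30), 1))  # 36.7
print(round(percentage(13, 30), 1))  # 43.3
```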
These results continue to agree with my earlier prediction that the male pupils will be heavier than the female pupils. I can see this because the female pupils' median, lower quartile, upper quartile and mean are all lower than the male pupils'. The results also show that the male pupils' weights are more spread out, as their interquartile range is 5.3 kg greater than the female pupils'.
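A box and whisker diagram is drawn from five values: the minimum, lower quartile, median, upper quartile and maximum. The sketch below computes them from individual (ungrouped) values with `statistics.quantiles` from Python's standard library. Quartile conventions differ between textbooks, and this uses the module's default "exclusive" method, so results may differ slightly from a hand method. The helper and the weights are hypothetical.

```python
import statistics


# Hypothetical helper (illustration only)
def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile and maximum."""
    data = sorted(values)
    q1, q2, q3 = statistics.quantiles(data, n=4)  # default 'exclusive' method
    return data[0], q1, q2, q3, data[-1]


weights = [40, 43, 47, 52, 55, 61, 70]  # hypothetical, in kg
low, q1, med, q3, high = five_number_summary(weights)
print(low, q1, med, q3, high)  # 40 43.0 52.0 61.0 70
print("IQR:", q3 - q1)         # IQR: 18.0
```

Comparing two groups' IQRs, as in the paragraph above, is then a subtraction of the two `q3 - q1` values.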
As with the weight results, these results support my prediction that the male pupils would be taller than the female pupils. Again this can be seen from the upper quartile and the mean, both of which are smaller for the female pupils than for the male pupils.
From all of the graphs and tables I have produced so far, I can fairly confidently say that the male pupils' weights and heights are greater than the female pupils', but none of the evidence collected so far lets me test my original hypothesis: "The taller the pupil, the heavier they will weigh."
Looking at my cumulative frequency graphs of height and weight, I could say that the two diagrams look very similar, but this does not show any relationship between height and weight. I am now going to extend my investigation to see how height and weight are related, and the most effective way to do this is to draw scatter diagrams. I will plot male and female pupils on separate graphs, as I expect this to show a stronger correlation for each group and it continues the approach I have used so far. Scatter diagrams let me compare the correlation for each gender and the equations of the two lines of best fit (the best estimate of the relationship between height and weight).
This graph shows a positive correlation between height and weight, and most of the data points lie reasonably close to the line of best fit. I have labelled one point that does not fit the line of best fit; points like this are called anomalous or outlying points, meaning that they do not follow the trend of the rest of the results.
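One way to make "does not fit the trend" precise is to measure each point's vertical distance (residual) from the line of best fit and flag the points beyond a chosen tolerance. The sketch uses the male pupils' line, with height in m and weight in kg; the helper, the 10 kg tolerance and the pupil values are all hypothetical assumptions, not the rule used to label the point on my graph.

```python
# Hypothetical helper (illustration only)
def anomalies(points, gradient, intercept, tolerance):
    """Return (x, y, residual) for points whose vertical distance from
    y = gradient * x + intercept is greater than tolerance (units of y)."""
    flagged = []
    for x, y in points:
        residual = y - (gradient * x + intercept)
        if abs(residual) > tolerance:
            flagged.append((x, y, round(residual, 2)))
    return flagged


# Male line: weight = 81.173 * height - 77.571 (height in m, weight in kg)
pupils = [(1.50, 45), (1.60, 52), (1.70, 60), (1.65, 80)]  # hypothetical points
print(anomalies(pupils, 81.173, -77.571, tolerance=10))  # [(1.65, 80, 23.64)]
```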
This graph also shows a positive correlation, although it is weaker than for the male pupils, as the points on the female pupils' graph are more widely scattered about the line of best fit. The points on this graph are quite closely bunched together in the middle, whereas on the male pupils' graph the results are spread over a wider range, which agrees with my earlier conclusion that the male pupils' heights and weights have a larger range than the female pupils'. As both of my lines of best fit are straight, their equations take the form:
y = mx + c, where x represents height in m and y represents weight in kg. The equations of the lines of best fit for my data are given below. I obtained them from my graphs in "Autograph", which gives exact values; to find them by hand, I would work out the gradient of each line and read off where it crosses the y-axis.
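The exact method Autograph uses is not stated here; a common choice, and the one assumed in this sketch, is the least-squares regression line of y on x, with gradient m = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and intercept c = ȳ - m·x̄. The helper name and test points are hypothetical. Python 3.10 and later also provide `statistics.linear_regression`, which returns the same slope and intercept.

```python
# Hypothetical helper (illustration only): least-squares line of y on x
def least_squares_line(xs, ys):
    """Gradient m and intercept c of the least-squares line y = m*x + c."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    c = y_bar - m * x_bar  # the line always passes through (x_bar, y_bar)
    return m, c


# Hypothetical points lying exactly on weight = 50 * height - 40
heights_m = [1.4, 1.5, 1.6, 1.7]
weights_kg = [30, 35, 40, 45]
m, c = least_squares_line(heights_m, weights_kg)
print(round(m, 3), round(c, 3))  # 50.0 -40.0
```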
Male pupils: y = 81.173x - 77.571
Female pupils: y = 21.996x + 13.78
The reason for displaying the equation of the line of best fit is that it can be used to make predictions. For example, if the equation for Year 8 boys were y = 50x - 40, a Year 8 boy's weight could be predicted from his height by calculating weight = 50 × height - 40.
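Making a prediction is a substitution into the equation. The sketch below applies the hypothetical Year 8 line from the example and the two lines of best fit above, taking height in metres and giving weight in kilograms; the helper name and the 1.6 m height are hypothetical. Predictions are only sensible for heights inside the range that was measured.

```python
# Hypothetical helper (illustration only)
def predict_weight(height_m, gradient, intercept):
    """Weight in kg predicted from height in m by weight = gradient * height + intercept."""
    return gradient * height_m + intercept


print(round(predict_weight(1.6, 50, -40), 1))          # Year 8 example line: 40.0
print(round(predict_weight(1.6, 81.173, -77.571), 1))  # male pupils' line: 52.3
print(round(predict_weight(1.6, 21.996, 13.78), 1))    # female pupils' line: 49.0
```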
I will now work out the correlation coefficient, which is a more precise way to compare correlation than judging the scatter by eye. It uses the mean of each set of data and looks at how far each point lies from the means. The formula, known as the product moment correlation coefficient (PMCC), or r, is

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$$

(where $\bar{x}$ and $\bar{y}$ are the means of the x and y values respectively)
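The PMCC translates directly into code: multiply each point's distances from the two means, add these products up, and divide by the square root of the product of the two sums of squared distances. This is a minimal sketch with a hypothetical helper name and made-up values; Python 3.10 and later also include `statistics.correlation`, which computes the same (Pearson) coefficient.

```python
import math


# Hypothetical helper (illustration only)
def pmcc(xs, ys):
    """Product moment correlation coefficient r for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired values (illustration only)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(pmcc(xs, ys), 3))  # 0.775
```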
The value of r shows both the direction and the strength of the correlation. It always lies between -1 and 1.
1 = perfect positive correlation
0.8 = good positive correlation
0.5 = some positive correlation
0 = no correlation
-0.5 = some negative correlation
-0.8 = good negative correlation
-1 = perfect negative correlation
r² (shown as R² in Excel) is the square of the correlation coefficient, known as the coefficient of determination. It gives the proportion of the variation in y that is explained by the straight-line relationship with x (i.e. how much of the variation in weight is accounted for by height), so it shows how reliable predictions from a line of best fit are likely to be.
For example, if the correlation coefficient r = 0.8,
then r² = 0.8² = 0.64 = 64%.
That means about 64% of the variation in weight would be explained by the linear relationship with height.
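Squaring r is simple arithmetic; the sketch below does it for the worked example and for the two PMCC values quoted in this section. The helper name `r_squared` is hypothetical.

```python
# Hypothetical helper (illustration only)
def r_squared(r):
    """Square of the correlation coefficient."""
    return r * r


for label, r in [("example", 0.8), ("male pupils", 0.776466), ("female pupils", 0.33719)]:
    print(f"{label}: r = {r}, r^2 = {r_squared(r):.3f} ({r_squared(r):.0%})")
# example: r = 0.8, r^2 = 0.640 (64%)
# male pupils: r = 0.776466, r^2 = 0.603 (60%)
# female pupils: r = 0.33719, r^2 = 0.114 (11%)
```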
PMCC for Male Pupils: 0.776466
[Number line from -1 to +1 showing the position of the male pupils' PMCC]
On a number line from -1 to +1, the PMCC for the relationship between height and weight for the male pupils lies at about 0.78, close to the 0.8 mark for good positive correlation. This shows that the male pupils' graph has a fairly strong positive correlation.
PMCC for Female pupils: 0.33719
[Number line from -1 to +1 showing the position of the female pupils' PMCC]
For the female pupils, the PMCC for the relationship between height and weight lies at about 0.34 on the number line. This shows that the graph is positively correlated, but the correlation is weak: 0.337 is well below the 0.5 mark for some correlation.
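Overall, as both PMCC values are positive, they support my first hypothesis that the taller the pupil, the heavier they will weigh. In addition, the PMCC for the male pupils is considerably higher than for the female pupils, which shows that the relationship between height and weight is much stronger for boys than for girls. My earlier averages and quartiles also showed that, in most cases, boys are taller and weigh more than girls.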
Throughout this project I have made many hypotheses including;
- The taller the pupil, the heavier they will be
- In general male pupils will weigh more than female pupils
- In general male pupils will be of a greater height than female pupils
I have answered all of these predictions throughout the project with either graphs or written analysis, and the evidence suggests that all of my hypotheses were generally correct. A few results went against the predictions, but overall they were supported. My original task was to compare height and weight, but I have also considered other factors that could bias the results, such as age.
I spent a great deal of time looking at whether gender affected the height and weight of pupils. To do this I produced frequency diagrams, frequency polygons, stem and leaf diagrams, cumulative frequency graphs, box and whisker diagrams and scatter diagrams. The overall conclusion was that boys are generally taller and heavier than girls, shown mainly by their higher mean values.
All of these hypotheses were part of my main prediction, "The taller the pupil, the heavier they will weigh", and from answering them I can say with reasonable confidence that it is true. I have based this conclusion on all of the graphs, diagrams, tables and statements I have made. On the other hand, some data went against this prediction, but that could have been because of the small samples I chose to use. When producing the random sample of 50, I felt that it was a satisfactory amount to work with, as analysing the data and producing graphs from it was quick and simple. If I were to repeat or extend this investigation, I would definitely use a larger sample of pupils, because when the sample was scaled down to each year group, I sometimes ended up with very few points, for example a scatter graph with only 4 data points for the Year 11 students. To get accurate results from this method of sampling, I feel it is necessary to use a sample of at least 100. As well as the stratified work, with a larger sample I would also produce additional graphs, e.g. cumulative frequency graphs and box and whisker diagrams, as I feel I could draw better conclusions from these; I felt the scatter diagrams I produced were rather pointless.
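In a stratified sample, each year group receives a share of the sample in proportion to its size: (year group size ÷ school size) × sample size, rounded. The sketch below uses hypothetical year-group sizes, not the real school figures, to show why a small year group ends up with very few pupils and how doubling the sample helps. Simple rounding can make the shares add up to slightly more or less than the sample size.

```python
# Hypothetical helper (illustration only)
def stratified_allocation(group_sizes, sample_size):
    """Number of pupils to sample from each group, proportional to its size."""
    total = sum(group_sizes.values())
    return {name: round(size * sample_size / total)
            for name, size in group_sizes.items()}


# Hypothetical year-group sizes (illustration only)
school = {"Year 7": 300, "Year 8": 260, "Year 9": 240, "Year 10": 120, "Year 11": 80}
print(stratified_allocation(school, 50))
# {'Year 7': 15, 'Year 8': 13, 'Year 9': 12, 'Year 10': 6, 'Year 11': 4}
print(stratified_allocation(school, 100))
# {'Year 7': 30, 'Year 8': 26, 'Year 9': 24, 'Year 10': 12, 'Year 11': 8}
```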
I feel my overall strategy for handling the investigation was satisfactory, but if I had given myself more time to plan what I was going to do, I think I would have come up with a better method and possibly a more successful project. There is definitely room for improvement; if I were to do it again, I would spend much more time planning instead of starting the investigation in a hurry. Despite that, I feel my investigation was successful, as it allowed me to draw conclusions and summaries from the data used.