After making the adjustments, my graph now looks like this:
This graph is a lot more sensible.
I still have the same number of records as I did with my first graph.
The graph shows a positive correlation, which is what I had expected from the comparison between ‘height’ and ‘weight’. I deleted a few outlying records that did not follow the pattern.
After working out the correlation co-efficient, using the formula ‘=CORREL(C3:C484,B3:B484)’ in Excel, the program gave me a value of 0.572523. This means I have a positive correlation, because the value is above 0. Negative correlations are represented by a number between –1 and 0, and positive correlations by a number between 0 and 1; the closer the value is to –1 or 1, the stronger the relationship. The number 0.572523 is almost 0.6, which is closer to 1 than to 0, meaning the correlation is stronger than it is weak.
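To show what the CORREL formula is working out, here is a minimal sketch of the same calculation (the Pearson correlation co-efficient) in Python. The function name `pearson_r` and the sample heights and weights are made up for illustration; they are not the survey data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation co-efficient, the quantity Excel's CORREL returns."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list never varies")
    return sxy / sqrt(sxx * syy)

# Hypothetical heights (m) and weights (kg), NOT the survey data
heights = [1.50, 1.55, 1.60, 1.65, 1.70, 1.75]
weights = [45, 52, 48, 60, 58, 66]
r = pearson_r(heights, weights)
print(round(r, 3))
```

A result near 1 or –1 means the points lie close to a straight line, and a result near 0 means they are scattered.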
Now I will compare results from different groups to see if they change depending on gender.
All girls:
There are no outliers worth deleting.
Correlation:
=CORREL(P6:P241,Q6:Q241)
=0.448597
The value 0.448597 tells me that the correlation is positive, but quite weak. I can tell this is correct by simply looking at the graph.
All boys:
There are no outliers worth deleting.
Correlation:
=CORREL(S6:S251,T6:T251)
=0.635105
The value 0.635105 tells me the correlation of this graph is also positive, and quite strong. It forms more of a straight line than the graph for the ‘Height & Weight of all girls’ does.
The boys’ correlation, since it is quite strong, represents a stronger relationship between the heights and weights of the boys. This makes it easier to estimate the weight of a boy, if it were unknown, by looking at his height.
However, as the girls’ correlation wasn’t very strong, it is harder to estimate the weight of a girl, if it were unknown, by looking at her height.
Therefore a stronger correlation makes it easier to estimate one measurement from the other.
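One common way to estimate a weight from a height is a line of best fit. The sketch below fits a least-squares line and uses it to make an estimate. This is an illustration only: the function name `fit_line` and the sample data are hypothetical, and the project itself did not calculate a line of best fit.

```python
def fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are the same, so no line can be fitted")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical heights (m) and weights (kg), NOT the survey data
heights = [1.50, 1.55, 1.60, 1.65, 1.70, 1.75]
weights = [45, 52, 48, 60, 58, 66]

slope, intercept = fit_line(heights, weights)
# Estimated weight for a pupil who is 1.62 m tall
estimate = slope * 1.62 + intercept
print(round(estimate, 1))
```

The weaker the correlation, the further the real points lie from this line, so the estimate becomes less reliable.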
Earlier, I predicted my results would show a positive correlation, and I was correct; both of the graphs lean in a positive direction, and the numbers I obtained from Excel show positive correlations.
Since my hypothesis was correct, I will go ahead and make a new one. This time I predict that as the year increases, the correlation will become stronger, displaying higher values. I am making this prediction because I think that right in the middle of adolescence (year 11) there will be more balanced heights and weights.
I will now calculate the correlation co-efficient for each of the years to help me find out if my hypothesis is correct.
Year 7: 0.532377
Year 8: 0.453785
Year 9: 0.379894
Year 10: 0.279076
Year 11: 0.780212
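The figures above came from Excel. As a sketch of the same idea, the Python below splits records into year groups and works out one co-efficient per group. The records and the function names are hypothetical and are only there to show the grouping step.

```python
from collections import defaultdict
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation co-efficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list never varies")
    return sxy / sqrt(sxx * syy)

def correlation_by_year(records):
    """Group (year, height, weight) records by year and correlate each group."""
    groups = defaultdict(lambda: ([], []))
    for year, height, weight in records:
        groups[year][0].append(height)
        groups[year][1].append(weight)
    return {year: pearson_r(hs, ws) for year, (hs, ws) in sorted(groups.items())}

# Hypothetical records: (year, height in m, weight in kg), NOT the survey data
records = [
    (7, 1.45, 40), (7, 1.50, 44), (7, 1.55, 43), (7, 1.60, 50),
    (11, 1.60, 52), (11, 1.68, 60), (11, 1.75, 66), (11, 1.80, 72),
]
by_year = correlation_by_year(records)
for year, r in by_year.items():
    print(f"Year {year}: {r:.3f}")
```

The same function would work for gender, or for year and gender together, by changing what each record is grouped on.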
My prediction was very inaccurate, as the complete opposite happened; the co-efficients showed lower values as the year increased, excluding year 11. I am not sure about the co-efficient of year 11, so to check this I will make a graph using the data to see if the correlation is really as strong as it appears…
After looking at the graph, I can tell the correlation is quite strong after all. Therefore I believe the co-efficient I received earlier is correct.
I will now make a new hypothesis.
I predict that the heights and weights recorded will differ depending on gender.
Above is a table with all of the initial data in it. I will use this in conjunction with all of the heights and weights to create the graphs for all 5 years.
Here are my graphs with evaluations:
After all of the data I have received, it is time to compare the correlations for each gender in each year. I will work out the correlation for every year and for both genders and complete the table. I am going to do this to see whether the correlations depend on gender. I do not need to draw graphs for this because Microsoft Excel can calculate the correlation of two sets of data much faster. The following correlations have been acquired using this program:
This table compares the correlations of both genders in every year. After looking at all of the correlations, I have found that in all cases the correlation of the girls’ heights and weights is lower than the correlation of the boys’ heights and weights. I think that this signifies that girls and boys have genetically different bodies. Since the boys all have stronger correlation co-efficients, this may mean that the relationship between height and weight is more consistent for boys than for girls. The heights and weights of girls are more liable to vary.
However, the correlation for the girls of year 10 is so different from the others that I doubt it is correct. I will check it by making a graph for girls in year 10.
After examining this graph, I’m now convinced that the correlation of 0.07993 that I had received before was correct. There is almost no correlation here at all, because the points are so spread out.
My new hypothesis is that the relationship between ‘height and weight’ for boys is stronger than the relationship between ‘height and weight’ for girls.
I will now investigate the quartiles of the data and ultimately make box plots for each set of data to see if I am correct. Putting my data into box plots will make it easier to tell where each of the quartiles for the years and genders lie, making it a lot easier to compare certain years and genders with each other.
I am unable to make box plots on a computer, so I will plot them on a piece of paper.
I plotted the box plots after using the ‘QUARTILE’ formula in Excel to work out the quartiles of the data. This gave me accurate medians, lower and upper quartiles, and highest and lowest values.
I stacked the box plots on top of each other, so that they can easily be compared with each other.
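The five values needed for each box plot can also be worked out with a short Python sketch. It uses `statistics.quantiles` with `method="inclusive"`, which I understand interpolates in the same way as Excel's QUARTILE formula. The function name `five_number_summary` and the list of heights are hypothetical.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (lowest, lower quartile, median, upper quartile, highest)."""
    data = sorted(values)
    # n=4 splits the data into quarters and returns the three cut points
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]

# Hypothetical heights in metres, NOT the survey data
heights = [1.58, 1.42, 1.65, 1.50, 1.72, 1.55, 1.60, 1.52, 1.63]

lowest, q1, median, q3, highest = five_number_summary(heights)
interquartile_range = q3 - q1  # width of the box
full_range = highest - lowest  # distance between the whisker ends
print(lowest, q1, median, q3, highest)
```

The box is drawn from the lower quartile to the upper quartile with a line at the median, and the whiskers run out to the lowest and highest values.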
By looking at my box plots I can see a slight increase in the interquartile ranges as the year increases. I expected this, because an interquartile range covers the middle 50% of the data and is an accurate generalisation of the heights of the year; I know that year 11s are taller than year 7s in most cases, and this is why I expected this.
Another thing I have noticed is that the ranges for the girls are mostly larger than the ranges for the boys. This means that girls’ heights must be more varied than the boys’. I think the reason for this is that girls start to near the end of their growth sooner than boys, so it makes sense that their heights are spread over a wide range.
However, the interquartile ranges for the boys are larger in every year except year 7. These large interquartile ranges suggest that there is more variation in the middle 50% of the boys. The girls have small interquartile ranges, which suggests that the middle 50% of the girls have very little variation in their heights. Taken together with their larger overall ranges, this means the variation in the girls’ heights comes mainly from the tallest and shortest girls rather than from the middle of the group. This difference suggests that the two sexes have very different bodies.
In some of the boxes, the medians are shown to be further to the left or to the right. When the median is on the left of the box this is called a positive skew. This is where all the heights between the lower quartile and the median are not very varied at all, but all the heights on the right of this median are more spaced out and varied. Negative skews work exactly the same way but the median is further to the right.
An example of a positive skew is the box plot for year 9 boys. This implies that the curve for this data (if sketched onto a cumulative frequency diagram) would be steep at first, but would level off gradually towards the right of the graph as it reaches the last part of the data.
An example of a negative skew is the box plot for year 10 girls. The interquartile range here is quite small but the whiskers are quite long, so the curve would be very steep in the middle and much flatter at either end.
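The rule for reading skew from a box plot can be written as a simple comparison of the two halves of the box. The sketch below assumes the quartiles are already known; the function name `skew_direction` and the example values are hypothetical.

```python
def skew_direction(lower_quartile, median, upper_quartile):
    """Classify skew from where the median sits inside the box."""
    left_half = median - lower_quartile    # box width left of the median
    right_half = upper_quartile - median   # box width right of the median
    if right_half > left_half:
        return "positive"   # median nearer the left of the box
    if right_half < left_half:
        return "negative"   # median nearer the right of the box
    return "symmetric"

# Hypothetical quartiles in centimetres, NOT the survey data
print(skew_direction(150, 155, 165))
print(skew_direction(150, 160, 165))
```

This only looks at the box, not the whiskers, which matches the way the skew is described above.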
Conclusion
After investigating the relationship between height and weight using scatter diagrams, and then finding out how height is affected by age, I found that the information I received back from the project reinforced what I already knew. I was aware that year 11s are taller than year 7s, and the project confirmed this with numbers. It also taught me general things about how children’s bodies change as they grow, as stated above.
However, I could further my investigation by creating box plots for weight, and exploring how this is affected by age too. I could also have created cumulative frequency curves that reflect the shapes of the box plots, so that I have information on how height is affected by age in two different kinds of graph.