After doing all this I made a table of all the data I will be using. I had to manually copy and paste each line of data from the spreadsheet into a new workbook, and the data I have accumulated is as follows:
Now that I have 60 pieces of data which represent the population proportionally I can begin my investigation. With the data that I have I can draw several graphs to represent the heights and the weights of the different ages or genders. Because there are so many things I can do with the data, I need to decide on a systematic way to approach the investigation so that I am not wasting time repeating calculations. Because height and weight are continuous data I will have to construct histograms to represent the data, and in order to do this I need to make cumulative frequency tables.
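As a rough illustration of how a frequency table and its cumulative frequencies can be built from continuous data, here is a minimal Python sketch. The heights, the class width of 10cm and the function name frequency_table are all assumptions made for the example; they are not the sample or the class intervals used in this investigation.

```python
def frequency_table(values, start, width, n_classes):
    """Count values in equal-width classes [lower, upper) and keep a running total."""
    counts = [0] * n_classes
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_classes:  # values outside the classes are ignored
            counts[index] += 1

    table = []
    running = 0
    for i, count in enumerate(counts):
        running += count
        lower = start + i * width
        table.append((lower, lower + width, count, running))
    return table


# Made-up heights in cm, for illustration only.
heights_cm = [132, 148, 151, 155, 158, 160, 162, 163, 165, 168, 171, 174, 180]

table = frequency_table(heights_cm, start=130, width=10, n_classes=6)
for lower, upper, freq, cum in table:
    print(f"{lower} <= h < {upper}: frequency {freq}, cumulative frequency {cum}")
```

The last cumulative frequency should always equal the number of values, which is a quick check that nothing has been missed.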
I am now ready to begin recording my results in a table. To start with I will look at height. When I arranged the data in order of height I wasn’t surprised to find that the tallest person was in year 11. However, I expected the shortest person to be in year 7, so I was shocked to find that the shortest male in my data was in year 8 and that the shortest female was in year 11. The girls’ heights varied quite widely and showed less of a correlation with age.
Boys
Girls
Boys
Girls
Now that I have constructed cumulative frequency tables I can draw histograms to represent the data in a visual way. As the histograms I have drawn show, the data is easier to understand when displayed in the form of a graph or chart. I constructed frequency polygons to help me compare the boys’ and girls’ heights and weights with each other. They clearly show the highest and lowest values for both sexes, which helps me understand the relationship between the variables more efficiently.
Frequency Polygon for boys’ and girls’ Heights
From this frequency polygon you can clearly see how the data is spread. The shortest person is a girl and the majority of people taller than 170cm are boys. What is interesting is that 14 girls are between 160cm and 170cm whereas only 11 boys are in that category. You can see clearly that the modal group is 160cm - 170cm for both boys and girls.
Since the data is grouped into class intervals, it also makes sense to record it in a stem and leaf diagram. This will make it easier to read off median values.
Boys
Girls
The mean height for boys is 159cm, which is higher than that of girls, which is 156cm. The sample of girls was more spread out than that of boys, with a range of 0.77m. The difference between the medians was very slight at only 50mm. The modal class interval was the same, which is why I expected the means of the heights to be reasonably close.
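The averages and the range compared above could be worked out along the following lines. This is only a sketch: the two short lists of heights are made up for illustration, and summarise is a hypothetical helper name, not something taken from my spreadsheet.

```python
from statistics import mean, median


def summarise(values):
    """Return the mean, median and range of a list of measurements."""
    return {
        "mean": mean(values),
        "median": median(values),
        "range": max(values) - min(values),
    }


# Made-up heights in cm, for illustration only.
boys_cm = [150, 155, 160, 162, 165, 170, 172]
girls_cm = [140, 152, 158, 160, 161, 166, 169]

print("Boys:", summarise(boys_cm))
print("Girls:", summarise(girls_cm))
```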
The evidence from this sample suggests that 11 out of 30, or 37%, of the boys have a height between 160 and 170cm. The evidence also suggests that 14 out of 30, or 47%, of the girls have heights in the same range. The histograms I drew show clearly how the data is spread, but I decided to calculate the lower and upper quartiles and the inter-quartile range as well. This is a more accurate estimation than the histogram.
Frequency Polygon for boys’ and girls’ Weights
The boys’ weights are spread out a lot more than the girls’ weights so I expect that they have a greater standard deviation. As you can see in the graph, the boys’ data has a sudden dip rather than a steady rise and fall. It is also easy to see that the boys have a greater range than the girls. I have decided to draw a stem and leaf diagram because, just like the previous data, this data is also grouped into class intervals. This will help me understand and analyse the data more efficiently.
Boys
Girls
All of the boys’ averages are greater than the girls’, even though the boys’ sample was more spread out, with a range of 54kg compared to 33kg for the girls. It is fair to say that 26 of the 30 girls, in other words 87%, fall in the range of 40-60kg.
I calculated the lower and upper quartiles so that I could find the inter-quartile range. This gives me a general idea of the width of the middle section of the data, in other words how spread out the middle half of the data is. I can therefore conclude that the girls’ weights are generally quite close to each other, with little spread.
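To show how the quartiles and inter-quartile range can be found from a list of raw values, here is a short Python sketch. It assumes one common convention (the quartiles are the medians of the lower and upper halves of the ordered data); other conventions give slightly different answers. The weights are made up for illustration and quartiles is a hypothetical function name.

```python
from statistics import median


def quartiles(values):
    """Return (lower quartile, median, upper quartile) using medians of the two halves."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]
    upper_half = ordered[-half:]  # leaves out the middle value when the count is odd
    return median(lower_half), median(ordered), median(upper_half)


# Made-up weights in kg, for illustration only.
weights_kg = [38, 42, 45, 47, 50, 52, 54, 57, 60, 72]

q1, q2, q3 = quartiles(weights_kg)
iqr = q3 - q1
print("Lower quartile:", q1, "Median:", q2, "Upper quartile:", q3, "IQR:", iqr)
```

In this made-up list the single heavy value of 72kg stretches the range but does not affect the inter-quartile range, which is why the inter-quartile range is a useful measure of the middle of the data.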
Now that I have a basic idea of how weight and height vary according to sex, I could extend my investigation by exploring height and weight in greater depth, either by comparing them with each other or by determining why height and weight are affected by gender and how they vary according to age.
Extending the Investigation
Hypothesis:
I think that the taller a person is the more their weight is likely to be.
To test this hypothesis I need a new sample of possibly 30 students of any gender. I will obviously have to select the data randomly using the same procedure I used before.
The reason that the age or the gender doesn’t matter is because I am only testing the relationship between the weight and height. The data I selected randomly is as follows.
Once I had collected all the necessary data in order to compare the height to the weight I constructed a scatter diagram because I thought that it would be the most sensible and appropriate way to compare the two.
Scatter diagram of Height and Weight
Further Investigation
There is a positive correlation between the height and weight of the students in my sample but it isn’t very strong. This does however support my prediction that the taller a person is the more they are likely to weigh.
There are several results which are reasonably far from the trend line and this is something I should consider when evaluating the investigation. I am however able to use the line of best fit to make predictions.
The line of best fit suggests that if someone weighs 60kg they are likely to have a height of about 168cm.
Further Investigation
So far I have found evidence which suggests that height and weight are both affected by gender. The next obvious step is to investigate the difference, by extending my line of enquiry to how the correlation between height and weight is affected by gender. To do so I have decided to test this hypothesis.
There will be a better correlation between weight and height if we consider both genders separately.
Seeing as I already have a random sample of 30 boys and 30 girls I intend to continue my investigation with the same data. To start with I will look at the boys:
Scatter diagram for boys
Scatter diagram for Girls
Scatter diagram for mixed population
There is a better correlation between weight and height if boys and girls are considered separately, so this supports my hypothesis.
The lines of best fit on my diagrams show that if a boy is 170cm he would weigh 67kg whereas a girl of the same height would weigh 69kg.
I am aware that trend lines have an equation of the form y = mx + c and therefore I have calculated this equation using Excel.
If y represents the height in cm, and x the weight in kg, the equations for the lines of best fit are as follows.
Boys Only: y = 0.7846x + 117.94
Girls Only: y = 0.7472x + 118.38
Both Genders: y = 0.7785x + 117.53
This can allow me to check my previous predictions using the Excel Graphs.
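The check can also be done by calculation rather than by reading the graph. The sketch below simply substitutes into the three equations above; LINES, predict_height and predict_weight are names I have made up for the example. Values worked out this way may differ by a kilogram or so from values read off a graph by eye.

```python
# (gradient m, intercept c) copied from the equations above,
# where y is the height in cm and x is the weight in kg.
LINES = {
    "boys": (0.7846, 117.94),
    "girls": (0.7472, 118.38),
    "both": (0.7785, 117.53),
}


def predict_height(group, weight_kg):
    """Use y = mx + c to estimate a height in cm from a weight in kg."""
    m, c = LINES[group]
    return m * weight_kg + c


def predict_weight(group, height_cm):
    """Rearrange y = mx + c to x = (y - c) / m to estimate a weight in kg."""
    m, c = LINES[group]
    return (height_cm - c) / m


for group in ("boys", "girls"):
    print(group, "at 170cm:", round(predict_weight(group, 170), 1), "kg")
```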
The line of best fit is an estimation of the relationship between the height and weight. There are exceptional values in my data such as the girl with a height of 1.03m.
Cumulative frequency graphs
I think that cumulative frequency graphs are very useful when comparing different sets of data, so I have decided to construct one allowing me to compare males, females and a mixed population. To begin with I will look at height:
I have constructed a graph to display the data in order to determine the median, upper quartile, lower quartile and inter-quartile range again, only this time more accurately. The reason that my results will be more accurate is that the cumulative frequency graph is a continuous approximation of the distribution of values.
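Reading the median and quartiles off a cumulative frequency graph amounts to interpolating between the plotted points. Here is a minimal sketch of that idea, assuming the points are joined by straight lines. The cumulative frequencies are made up for illustration and value_at is a hypothetical function name.

```python
def value_at(points, target):
    """Find the value where the cumulative frequency reaches target.

    points is a list of (upper class boundary, cumulative frequency) pairs,
    starting with (lowest boundary, 0), joined by straight lines.
    """
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c1 > c0 and c0 <= target <= c1:
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("target is outside the cumulative frequency range")


# Made-up cumulative frequencies for heights in cm, for illustration only.
points = [(130, 0), (140, 2), (150, 6), (160, 14), (170, 24), (180, 30)]
total = points[-1][1]

lower_q = value_at(points, total / 4)
median_est = value_at(points, total / 2)
upper_q = value_at(points, 3 * total / 4)
print("Lower quartile:", lower_q)
print("Median:", median_est)
print("Upper quartile:", upper_q)
print("Inter-quartile range:", upper_q - lower_q)
```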
I can also use the cumulative frequency graph to predict the percentage of students who have a height or weight within a given range.
For example if I wanted to estimate how many boys in the school were
The box and whisker diagrams below show the minimum and maximum values, the median, and the upper and lower quartiles:
Box and whisker diagram for height
Box and whisker diagram for weight
Summary of My Results
● There is a definite positive correlation between height and weight. In general the taller a person is the more they weigh.
● The points on the scatter diagram for the boys were less dispersed about the trend line than those for the girls.
● The points on the scatter diagrams for boys and girls were less dispersed than those for the mixed population. This suggests that the correlation between weight and height is better when boys and girls are considered separately.
● The fact that my results for the median and the lower and upper quartiles were quite similar for the stem and leaf diagrams and the frequency polygons means that I have been carrying out my investigation accurately so far.
● The scatter diagrams can be used to give reasonable estimations of the weight and height. This can be done in two ways, either by reading from the graph or by using the equations of the lines of best fit.
● Cumulative frequency curves confirm that more boys in general are taller and heavier than girls. This can also be backed up by the mean which I calculated.
● The median for boys for both height and weight is higher than that for girls.
● From the box and whisker diagrams we can conclude that in general the inter-quartile range for girls was higher than that for boys for heights but lower when it came to weights.
● There are several things that I could have done to make my results more reliable; one of the more obvious ones is taking a larger sample.
● All of the predictions made were based on general trends observed in the data. These trends were obviously affected by either very high or very low values which fell outside the general trend.
● The mean for the boys was higher than the mean for girls for both height and weight.
Detailed Investigation
When I began this investigation I tried to ensure that I minimised bias by using a stratified sample which reflected both the year group and gender of the whole population. This meant that my sample was not unfair because I made these considerations. However if I am to extend this investigation even further I should now be looking at a more detailed method of analysing the data using a more systematic approach.
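For reference, the idea behind a stratified sample can be sketched in a few lines of Python. Each group (year and gender) gets a share of the sample in proportion to its size, and students are then picked at random within each group. The group sizes below are made up, not the real school numbers, and allocate and stratified_sample are hypothetical names. With awkward group sizes the rounded shares may not add up exactly to the sample size and would need adjusting by hand.

```python
import random


def allocate(strata_sizes, sample_size):
    """Share the sample between groups in proportion to group size (rounded)."""
    total = sum(strata_sizes.values())
    return {name: round(sample_size * size / total)
            for name, size in strata_sizes.items()}


def stratified_sample(strata, sample_size, seed=None):
    """Pick students at random, without repeats, from each group."""
    rng = random.Random(seed)
    sizes = {name: len(members) for name, members in strata.items()}
    quota = allocate(sizes, sample_size)
    return {name: rng.sample(members, quota[name])
            for name, members in strata.items()}


# Made-up group sizes, for illustration only.
sizes = {"Y7 boys": 120, "Y7 girls": 120, "Y8 boys": 100,
         "Y8 girls": 100, "Y9 boys": 80, "Y9 girls": 80}
strata = {name: [f"{name} #{i}" for i in range(n)] for name, n in sizes.items()}

quota = allocate(sizes, 60)
sample = stratified_sample(strata, 60, seed=1)
print(quota)
```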
I intend to calculate both the standard deviation and Spearman’s rank correlation coefficient, which hopefully will give me a better understanding of the data. Seeing as it is the most time consuming, this will be the penultimate stage of my project, before I evaluate all my results.
I aim for this part of my investigation to be extremely detailed. I will look at the data in small clusters in order to get a better understanding of height and weight. Simply because I am going to carry out a detailed investigation, I need to take a systematic approach to prevent confusion.
To begin with I will calculate the standard deviation for the height of the whole population and then just boys and girls. To extend the investigation even further I will then calculate the standard deviation for the different year groups.
I will use the exact same procedure for the weight of the population and this will give me an understanding of how the data is spread in the different categories.
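A sketch of how the standard deviation could be worked out for each category in one go is shown below. It assumes the population form of the standard deviation (dividing by n), which may not be the form used in my spreadsheet. The four records and the field names are made up for illustration, and spread_by_group is a hypothetical helper.

```python
from collections import defaultdict
from statistics import pstdev


def spread_by_group(records, key, field):
    """Group the records by key and return the standard deviation of field per group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record[field])
    return {name: pstdev(values) for name, values in groups.items()}


# Made-up records, for illustration only.
records = [
    {"year": 7, "gender": "boy", "height_m": 1.50, "weight_kg": 40},
    {"year": 7, "gender": "girl", "height_m": 1.55, "weight_kg": 44},
    {"year": 8, "gender": "boy", "height_m": 1.70, "weight_kg": 60},
    {"year": 8, "gender": "girl", "height_m": 1.65, "weight_kg": 50},
]

height_by_gender = spread_by_group(records, "gender", "height_m")
weight_by_year = spread_by_group(records, "year", "weight_kg")
height_overall = pstdev(r["height_m"] for r in records)
print(height_by_gender, weight_by_year, height_overall)
```

The same function works for height or weight and for gender or year group, which avoids repeating the calculation by hand for every category.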
I will then continue my investigation by constructing scatter graphs for each of the years and then calculating the gradient of each line of best fit, giving me an idea of how steep the line is.
After that I will calculate Spearman’s rank correlation coefficient. The reason I have decided to do this is that I want to find out how strongly correlated the height and weight are. I will calculate the coefficient for the boys, the girls and the different year groups. That way I will be able to compare the values and decide which year group or gender has the better correlation.
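As a sketch of the calculation, Spearman’s rank correlation coefficient can be found by ranking each set of values and using 1 - 6Σd²/(n(n² - 1)), where d is the difference between the two ranks for each student. The code below gives tied values the average of their ranks; with ties the formula is only an approximation. The heights and weights are made up for illustration, and ranks and spearman are hypothetical function names.

```python
def ranks(values):
    """Rank from 1 (smallest); tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of values equal to the current one.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rank correlation coefficient using the sum of squared rank differences."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Made-up heights (cm) and weights (kg), for illustration only.
heights_cm = [150, 160, 155, 170, 165]
weights_kg = [45, 50, 52, 60, 58]
print(spearman(heights_cm, weights_kg))
```

A value close to 1 means the tallest students also tend to be the heaviest, a value close to 0 means there is little relationship, and a value close to -1 would mean the opposite relationship.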
I will then evaluate all my results and make any necessary comments on the results I obtained. I need to show my understanding of my results by evaluating all of the outcomes.
I may also carry out research on the BMI. This would help me understand how the relationship between height and weight varies according to age.
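For reference, the BMI (body mass index) is a person’s weight in kilograms divided by the square of their height in metres. A minimal sketch of the calculation is below; the function name bmi and the example values are my own, not taken from the data.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kg divided by height in metres squared."""
    return weight_kg / height_m ** 2


# Made-up example: a student who weighs 50kg and is 1.60m tall.
print(round(bmi(50, 1.60), 1))
```

The height has to be in metres, so a height recorded in centimetres would need dividing by 100 first.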
Standard Deviation
To begin with I have calculated the standard deviation for the whole population. The standard deviation is a statistic that tells you how tightly all the various examples are clustered around the mean in a set of data. When the examples are pretty tightly bunched together and the bell-shaped curve is steep, the standard deviation is small. When the examples are spread apart and the bell curve is relatively flat, that tells you that you have a relatively large standard deviation.
The formula for standard deviation is as follows:
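The formula itself has not survived here. The usual population form, written in LaTeX, is given below as an assumption about what was intended; here x_i stands for each value, \bar{x} for the mean and n for the number of values. The sample form divides by n - 1 instead of n.

```latex
\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}
```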
Standard deviation for the height (in metres):
Whole population: 0.150282
Boys: 0.148333
Girls: 0.15358
Year sevens: 0.11498686
Year eights: 0.163164112
Year nines: 0.097848653
Year tens: 0.113705272
Year elevens: 0.240023147
Scatter graph for year sevens (axis label: Weight)
Scatter graph for year eights (axis label: Weight)
Equation of the line: y = 0.9158x + 107.43 (gradient 0.9158)
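The gradient and intercept of a trend line like this come from a least-squares fit, which is what Excel’s linear trend line uses. A minimal sketch of the calculation is below, with weight as x and height as y. The data is made up for illustration and line_of_best_fit is a hypothetical function name.

```python
def line_of_best_fit(xs, ys):
    """Return (gradient m, intercept c) of the least-squares line y = mx + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x  # the line always passes through the mean point
    return m, c


# Made-up weights (kg) and heights (cm), for illustration only.
weights_kg = [40, 50, 60, 70]
heights_cm = [150, 158, 164, 172]

m, c = line_of_best_fit(weights_kg, heights_cm)
print(f"y = {m:.4f}x + {c:.2f}")
```

A larger gradient means that height rises more quickly as weight increases, so comparing the gradients for different year groups shows how the relationship changes with age.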