Below are the outlier checks for every sample, for each year group and each sex. Each one shows the lower quartile, inter-quartile range, upper quartile and median value:
Year 7 Girls
Year 8 Girls
Year 9 Girls
Year 10 Girls
Year 11 Girls
Year 7 Boys
Year 8 Boys
Year 9 Boys
Year 10 Boys
Year 11 Boys
Graphs
For hypothesis A, the best graph to choose would be a scatter diagram. This is because both sets of data that I am using (height and weight) are quantitative and I have to compare them. I will plot the height on the y axis and the weight on the x axis, so that when I plot the data on the graph I can clearly see whether there is a correlation or not. A histogram would not be a good choice for the first hypothesis, as it is designed to display the grouped frequencies of one set of data, not the relationship between two.
For hypothesis B I would use a histogram, as a histogram is used to display frequencies and the distribution of the data. I would sort the data into groups and then work out the frequency for each one.
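As a rough sketch of the grouping step, the frequencies for a histogram could be worked out as below. The heights and the class boundaries are made-up values for illustration, not figures from my sample, and `group_frequencies` is just a name chosen for this example.

```python
from bisect import bisect_right


def group_frequencies(values, boundaries):
    """Count the values in each class boundaries[i] <= v < boundaries[i + 1].

    Values outside the first and last boundary are ignored.
    """
    counts = [0] * (len(boundaries) - 1)
    for v in values:
        i = bisect_right(boundaries, v) - 1  # index of the class containing v
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts


# Made-up heights in metres (assumption, for illustration only)
heights = [1.45, 1.52, 1.58, 1.61, 1.66, 1.69, 1.75]
boundaries = [1.4, 1.5, 1.6, 1.7, 1.8]

frequencies = group_frequencies(heights, boundaries)
for low, high, f in zip(boundaries, boundaries[1:], frequencies):
    print(f"{low:.1f}m <= h < {high:.1f}m: frequency {f}")
```

A height that sits exactly on a boundary is counted in the class that starts at that boundary, so no value is counted twice.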
For hypothesis C I would use a box plot, as it shows how the data is distributed and makes it easy to compare two sets of data. A box plot shows the median (the middle value), the lower quartile (the value a quarter of the way through the ordered data), the upper quartile (the value three quarters of the way through) and the inter-quartile range (the spread of the middle half of the data).
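The figures that a box plot needs could be worked out with a short sketch like the one below. It is only an illustration: the heights are made up, the rule of treating anything more than 1.5 inter-quartile ranges beyond the quartiles as an outlier is an assumption (a common convention, not necessarily the one used here), and spreadsheet programs may use a slightly different quartile method, so the results can differ a little.

```python
import statistics


def box_plot_summary(values):
    """Return the quartiles, IQR and any outliers for a list of numbers."""
    lower_q, median, upper_q = statistics.quantiles(values, n=4)
    iqr = upper_q - lower_q
    # Assumed outlier rule: more than 1.5 * IQR beyond either quartile
    low_fence = lower_q - 1.5 * iqr
    high_fence = upper_q + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {
        "lower_quartile": lower_q,
        "median": median,
        "upper_quartile": upper_q,
        "iqr": iqr,
        "outliers": outliers,
    }


# Made-up heights in metres (assumption, for illustration only)
heights = [1.50, 1.55, 1.60, 1.62, 1.65, 1.70, 1.75, 2.30]
summary = box_plot_summary(heights)
for name, value in summary.items():
    print(name, value)
```

With these made-up values the height of 2.30m is the only one flagged, which is the kind of value that would be removed before drawing the box plot.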
Sampling
The data that I have been given is in Excel, so I do not need to organise it. The sample I am going to take from the data is going to be stratified. This is so I can ensure that the sample is spread across the whole of the database.
There are different types of samples, some of them are:
Simple random sample
Everyone has an equal chance of being selected. The person taking the sample chooses a certain number of people at random. This is one of the least biased types of sampling. An example is picking a number out of a hat full of numbers without looking.
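The "numbers in a hat" idea can be sketched in Python as below. The list of pupils is invented for the example, and the `seed` argument is only there so that the same sample can be reproduced.

```python
import random


def simple_random_sample(population, size, seed=None):
    """Pick `size` different members, each equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, size)


# Hypothetical list of 50 pupils (not the real database)
pupils = [f"Pupil {n}" for n in range(1, 51)]
chosen = simple_random_sample(pupils, 10, seed=1)
print(chosen)
```

Because the sample is taken without replacement, nobody can be picked twice.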
Census
A census is when the whole of the population is surveyed. Strictly it is not a sample at all, because 100% of the population is included. In general, the larger a sample is, the more reliable and accurate it will be.
Systematic random sample
In a systematic random sample people are selected according to some rule. For example, the data could be in alphabetical order and every tenth person is chosen.
However, a sample chosen in this way may not represent the whole population and may not be fairly distributed. For example, it may contain only girls or only Yr 10s, which would make it a very biased sample.
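A minimal sketch of the "every tenth person" rule is shown below, assuming a random starting point within the first ten people. The register numbers are invented for the example.

```python
import random


def systematic_sample(population, step, seed=None):
    """Take every `step`-th member, starting from a random position."""
    rng = random.Random(seed)
    start = rng.randrange(step)  # random start among the first `step` members
    return population[start::step]


# Hypothetical register numbers 1 to 50 (not the real database)
register = list(range(1, 51))
every_tenth = systematic_sample(register, 10, seed=1)
print(every_tenth)
```

If the list happened to be ordered so that every tenth entry was, say, a Yr 10 girl, this rule would pick only Yr 10 girls, which is the bias described above.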
Quota samples
Quota samples are often used in market research.
The population is divided into groups (gender, age etc). A given number (quota) is surveyed from each group. This type of sample is not random, so it can be very biased, but it is cheap to carry out and can be done quickly.
Stratified samples
Stratified sampling is when you divide the population of the data, which is the students in this case, into strata, or categories, which in this case will be year groups. Then from each stratum, take a random sample. The size of each sample is in proportion to the relative size of the stratum from which it is taken.
I will take 20% of the girls from Yr 7 and then 20% of the boys from Yr 7. I will then do the same for each year; this is stratified sampling. The following table shows how many people will be sampled from each year.
I worked out the sample sizes by first finding out how many boys and how many girls there were in each year group. I then used the formula =name of box*0.2 in each of the boxes in the “Amount of Sample” area. I had to round each sample size to a whole number, as I cannot have half of a person.
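The same calculation can be sketched in Python. The pupil counts below are hypothetical, not the real numbers in each year group, and `sample_size` is a name made up for this example. Whole-number arithmetic is used so that halves always round up; Python's built-in `round` rounds halves to the nearest even number instead.

```python
def sample_size(count, percent=20):
    """Return percent% of count, rounded to the nearest whole person.

    Uses integer arithmetic so that exact halves always round up.
    """
    return (2 * count * percent + 100) // 200


# Hypothetical numbers of pupils in each stratum (not the real figures)
strata = {"Year 7 girls": 128, "Year 7 boys": 150}

sizes = {name: sample_size(count) for name, count in strata.items()}
for name, size in sizes.items():
    print(f"{name}: sample {size} of {strata[name]}")
```

Here 20% of 128 is 25.6, which rounds to 26, and 20% of 150 is exactly 30.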
Each scatter diagram below shows the value of “R2”, which is used to work out the product moment correlation coefficient. To produce the actual product moment correlation coefficient I need to take the square root of “R2” and give it the same sign as the gradient of the line of best fit. The diagrams also show the correlation between the two sets of data, making it possible to compare them.
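As a small worked sketch of this step: for a single straight line of best fit, the product moment correlation coefficient is the square root of R2, with a minus sign if the line slopes downwards. The R2 value and gradients below are invented for the example, not read from my graphs.

```python
import math


def pmcc_from_r_squared(r_squared, gradient):
    """Square root of R squared, given the sign of the line's gradient."""
    return math.copysign(math.sqrt(r_squared), gradient)


# Invented values, for illustration only
print(round(pmcc_from_r_squared(0.25, gradient=40.0), 3))   # upward-sloping line
print(round(pmcc_from_r_squared(0.25, gradient=-40.0), 3))  # downward-sloping line
```

The sign matters because R2 is never negative, so on its own it cannot tell a positive correlation from a negative one.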
These are all my samples:
Year 7 Girls
Year 8 Girls
Year 9 Girls
Year 10 Girls
Year 11 Girls
Year 7 Boys
Year 8 Boys
Year 9 Boys
Year 10 Boys
Year 11 Boys
Hypothesis 1
(There is a positive correlation between height and weight)
I believed that there would be a positive correlation between height and weight. As you can see, all of the graphs show a positive correlation, with a few of them (Yr 7 and Yr 11 boys) showing a strong positive correlation.
You can see that the correlation is positive because the line of best fit on each graph slopes upwards. This means that, in general, taller people also weigh more, although how much more varies from person to person.
A good example is of the Yr 11 boys’ graph. It shows that as the height increases, so does the weight.
This is an example from the Yr 11 boys’ graph: the shortest person, at 1.78m, weighs 37kg, which is also the lowest weight, and the tallest person, at 2.03m, weighs 86kg, which is the highest weight.
This clearly shows that there is a positive and fairly strong correlation between height and weight.
The above table shows the Product Moment Correlation Coefficient for each year, split by gender. This is worked out by taking the “R2” value (in the top left hand corner of each graph), square rooting it and rounding it to 3 decimal places. This is not the same as Spearman’s Rank Correlation Coefficient, which is worked out from the ranks of the data instead of the values themselves. From the table you can see that the lowest figure is 0.136 and the highest is 0.707, which is a big difference in terms of correlation. A perfect negative correlation would be -1, no correlation would be 0 and a perfect positive correlation would be 1. Two of the graphs have a fairly strong positive correlation, Yr 7 boys and Yr 11 girls. The rest of the figures have a weak positive correlation.
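To show how the two coefficients differ, here is a minimal sketch that works out both from the same data. The heights and weights are made up, and the function names are chosen just for this example. Spearman's coefficient is found by ranking each set of data and then applying the product moment formula to the ranks.

```python
import math


def pmcc(xs, ys):
    """Product moment correlation coefficient of two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Rank from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rank correlation: the PMCC of the ranks."""
    return pmcc(ranks(xs), ranks(ys))


# Made-up heights (m) and weights (kg), for illustration only
heights = [1.5, 1.6, 1.7, 1.8]
weights = [40, 45, 55, 80]
print("PMCC:", round(pmcc(heights, weights), 3))
print("Spearman:", round(spearman(heights, weights), 3))
```

In this made-up data the weights always go up with the heights, so the ranks match perfectly, but the points do not lie on a straight line, so the product moment coefficient is lower.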
Hypothesis 2
(On average the students’ heights will become greater as they age)
I will show that on average the students’ heights increase with age. Below are the graphs for boys and girls that compare Yr 7 with Yr 9 and Yr 9 with Yr 11. This will show that height increases as the students age. I will take the samples from both genders.
The above diagram shows two box and whisker diagrams, one for the Yr 7 heights and one for the Yr 9 heights. From this you can see that, on average, the Yr 9 heights are greater than the Yr 7 heights, indicating that height has increased over the two years. I have used box and whisker diagrams to make it easier to compare both sets of data.
The two histograms above are from the two sets of data, Yr 7 heights and Yr 9 heights. From this you can also clearly see that the Yr 9s’ median height is greater than the Yr 7s’. The Yr 9s also have a smaller percentage of pupils at the lower heights.
Above are the statistics from the box and whisker diagram. You can see the lower quartile, the upper quartile and the median for both sets of data. As shown, all of the Yr 9 statistics are greater than the Yr 7 statistics. The lower quartile is greater by 0.09, the median is greater by 0.04 and the upper quartile is greater by 0.04, which means that overall the heights are greater in Yr 9 than in Yr 7. In Yr 9 the standard deviation is less than in Yr 7, which means that the heights are less varied. This strengthens my hypothesis.
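The standard deviation comparison can be sketched as below. The two lists of heights are made up to illustrate the idea and are not my samples, and using the population form of the standard deviation (`statistics.pstdev`) is an assumption, since a spreadsheet may use the sample form instead.

```python
import statistics


def mean_and_spread(values):
    """Return the mean and the (population) standard deviation."""
    return statistics.mean(values), statistics.pstdev(values)


# Made-up heights in metres (assumption, for illustration only)
year_7 = [1.40, 1.50, 1.55, 1.60, 1.70]
year_9 = [1.55, 1.60, 1.62, 1.65, 1.68]

for label, heights in (("Yr 7", year_7), ("Yr 9", year_9)):
    mean, sd = mean_and_spread(heights)
    print(f"{label}: mean {mean:.2f}m, standard deviation {sd:.3f}m")
```

A smaller standard deviation means the heights are bunched more closely around the mean, which is what "less varied" means here.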
From the above diagram, which shows the Yr 9 heights and the Yr 11 heights, you can see that on average the height also increases. This again shows that height increases as pupils get older.
Above you can see two histograms that have been drawn from the two sets of data, Yr 9 heights and Yr 11 heights. I have used histograms so that it is easier to see and compare the two sets of data. From this you can see that Yr 11 has a greater number of pupils in the tallest height groups than Yr 9. In Yr 11, 29 pupils are between the heights of 1.5m and 1.8m; in Yr 9 there are fewer. This shows that taller pupils are more common in Yr 11 than in Yr 9. You can also see that the range of heights is bigger in Yr 9 than in Yr 11: the Yr 11s’ range is 0.9m while the Yr 9s’ range is 1.4m.
Above are the statistics from the Yr 9 heights and Yr 11 heights. You can plainly see that the Yr 11 heights are greater than the Yr 9 heights, indicating an increase with age. The lower quartile is greater by 0.09, the upper quartile is greater by 0.07 and the median is greater by 0.1. In Yr 11 the standard deviation is more than in Yr 9, which means that the heights are more varied.
From the two comparisons above, I believe I have shown my hypothesis to be right, as the Yr 9s’ statistics are greater than the Yr 7s’ and the Yr 11s’ statistics are greater than the Yr 9s’. This suggests that the pupils’ heights increase as they become older.
Hypothesis 3
(On average boys are taller than girls)
In this hypothesis I want to show that on average boys are taller than girls. For this I will use the data from both genders from Yr 7, Yr 9 and Yr 11. I am not going to use the years in between, because people do not grow significantly in 1 year, so neighbouring years will not differ much, but you will be able to see the difference over 2 years.
The above box and whisker diagram shows data from Yr 7 boys and Yr 7 girls. From this you can see that the girls’ Upper Quartile, Median and Inter-Quartile Range are greater than the boys’. Only the boys’ Lower Quartile is slightly greater than the girls’. The girls’ Inter-Quartile Range is greater by 0.06, the Median is greater by 0.02 and the Upper Quartile is greater by 0.02. Overall, from this data, you can see that the girls’ heights are greater than the boys’. This contradicts my hypothesis, as I have said that “On average boys are taller than girls”. This is evidence that my hypothesis is wrong for this year group. This surprised me as I did not expect this to happen, but after research I have come to a possible explanation. I believe that the girls’ average height is greater in this table of results because girls mature much earlier than boys. This means that they will start to grow taller earlier than boys do, resulting in Yr 7 girls being taller than Yr 7 boys.
The above diagram shows Yr 9 boys and Yr 9 girls. Just from this diagram you can see that the boys’ height, on average, is greater than the girls’. The table below also shows the statistics for the data. From there you can see that the boys’ Lower Quartile is greater by 0.03, the boys’ Upper Quartile is greater by 0.07, the Median is greater by 0.06 and the Inter-Quartile Range is greater by 0.04. Mathematically this shows that the boys’ average height is greater than the girls’, but not by much. This strengthens my hypothesis.
The above box and whisker diagram shows data from Yr 11 boys and Yr 11 girls. From looking at the graph you get the impression that the average height of the boys is greater than the girls’. Looking at the statistics below, the girls’ Lower Quartile is greater by 0.05, but the boys’ Upper Quartile is greater by 0.12, the Median is greater by 0.01 and the Inter-Quartile Range is greater by 0.17. This is a big difference compared to the previous results for Yr 9 and Yr 7. This result shows that on average the boys’ height is greater than the girls’, by more than in the previous results.
Conclusion
Hypothesis 1
(There is a positive correlation between height and weight)
For the first hypothesis I believe I have shown it to be true. This is because in all of my examples I have shown that as the weight increases so does the height, and vice versa. My evidence is from all 10 of the samples I have produced. There is a definite positive correlation in the samples and it is fairly strong. I used the product moment correlation coefficient, Spearman’s rank correlation coefficient, upper quartile, lower quartile, inter-quartile range and the median to show the correlation and the relationship between the two sets of data from each year group. I have used box and whisker diagrams to portray my results and I have produced tables for each of the year groups and genders to show the upper quartile, lower quartile, inter-quartile range and the median. I also removed the outliers to make my findings accurate, as with them my results would have been distorted. I could have improved my research if I had used a bigger sample, to make my findings more accurate.
Hypothesis 2
(On average the students’ heights will become greater as they age)
I believe that I have shown this hypothesis to be correct. I used samples from Yr 7, Yr 9 and Yr 11 to test this prediction. They show that as the pupils age, they become taller. I used frequency density histograms and box and whisker diagrams to show my results. I also removed the outliers from the samples that I used in this hypothesis. However, the evidence comes from a small proportion of the pupils, so to strengthen my conclusion I would have to use a bigger sample.
Hypothesis 3
(On average boys are taller than girls)
For my third hypothesis I believe that it has partly been proven wrong. This is because in Yr 7 the girls’ average height is greater than the boys’, but in the other years the boys had a bigger average. I believe this is because girls mature more quickly than boys, leading to their earlier height increase. In Yr 9, the boys’ average height was slightly bigger than the girls’. I believe this is because this is the time when the boys start to mature and are maturing much more quickly than the girls, so their average is only slightly bigger. In Yr 11, I have shown that the boys’ average height is much bigger than the girls’, in contrast to the two previous years. In my opinion this is because the boys have now almost fully matured, so their height is greater than the girls’, as my hypothesis “On average boys are taller than girls” stated. This means that my hypothesis is only partly true. If I had used a bigger sample from a lot more people then I would have got a more accurate result, but I believe that the results would stay similar and not vary too much.
Evaluation
Overall, I think I have given accurate and reliable results for my investigation. I have taken out the outliers, used graphs to examine every hypothesis, and accurately calculated other figures like standard deviation and the mean. This helped me to identify correlations between the different data types and to provide me with reliable conclusions.
To improve my investigation, I would have used more data, such as more pupils from the Mayfield data or pupils from other schools. This would have given me more accurate results, because there would be more information to look at. It would also have been easier to identify any patterns, and with more data to analyse I would have been able to provide more precise results.
I could also have developed my investigation further by adding more hypotheses to examine other factors.