Method
To investigate my hypothesis, I used the data provided from Mayfield High School. It contained information such as name, age, height, weight, IQ, eye colour and hair colour.
Using a computer, I opened the data in Microsoft Excel. As I was only investigating the height and weight of year 7 students, I deleted everything irrelevant: the rows for students in years 8, 9, 10 and 11, and the columns for IQ, hair colour, eye colour etc. I then sorted the data, again using the computer.
I discarded the outliers in my data. These were records showing very tall or very heavy people, which may be false or unreliable data and could affect my final result.
I needed a sample of 50 students and I had 200 year 7 students. To get my sample, I used systematic sampling. Since 200 / 50 = 4, I chose every 4th student. I then copied and pasted every 4th record onto a new data sheet to get the sample I needed.
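As a rough illustration only, the systematic sampling step could be sketched in Python. This is not how the work was actually done (that was copy and paste in Excel). The function name `systematic_sample`, the list `students` and the choice of starting at the first record are all assumptions made for the example.

```python
def systematic_sample(rows, step, start=0):
    """Return every `step`-th item of `rows`, beginning at index `start`."""
    if step <= 0:
        raise ValueError("step must be a positive whole number")
    return rows[start::step]


# Hypothetical stand-in for the 200 sorted year 7 records (ID numbers only).
students = list(range(1, 201))

# 200 students / 50 needed = a step of 4.
sample = systematic_sample(students, step=4)
print(len(sample))  # 50
```

Starting from a different record (for example `start=2`) would give a different but equally sized sample, which is one way to check how much the result depends on the starting point.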
Again with the computer, I drew a scatter graph of my sample. I calculated r and drew a regression line on the graph.
I then printed out my scatter graph and my sample.
Calculating r
r is the correlation coefficient, which measures how closely the points follow a straight line. To show how r is calculated, I chose a small sample of 10 from my sample of 50. To get my small sample of 10, I again used systematic sampling and chose every 5th student.
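The calculation of r can be sketched in Python using the standard product-moment formula, r = Sxy / sqrt(Sxx × Syy). This is a minimal sketch, not the working from the coursework itself: the function name `pearson_r` is an assumption, and the heights and weights below are made-up values used only to show the steps, not the real sample of 10.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of the deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable never changes")
    return sxy / sqrt(sxx * syy)


# Made-up illustrative data: heights in metres, weights in kilograms.
heights = [1.42, 1.47, 1.50, 1.52, 1.55, 1.58, 1.60, 1.62, 1.65, 1.70]
weights = [35, 38, 41, 40, 45, 47, 50, 49, 54, 58]

print(round(pearson_r(heights, weights), 2))
```

A value close to 1 means strong positive correlation, a value close to -1 means strong negative correlation, and a value near 0 means little or no linear correlation.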
Conclusion and Evaluation
My graph supports my hypothesis. It shows positive correlation, which means that as the height of the students increased, so did their weight.
My result for r, from my small sample of 10, was 0.97.
The computer's result for r, for my sample of 50, was 0.5.
Both results fit my hypothesis that the correlation coefficient would be between 0.5 and 1. However, I had expected the two numbers to be closer together. This might be because the systematic sampling of my small sample happened to choose students who were all in proportion to each other, with weight increasing with height.
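To see whether I would get a different result, I could have tried stratified sampling. This would be where I split the students into groups, such as boys and girls, and took a sample from each group in proportion to its size.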
From the regression line on my graph, it is possible to estimate different X and Y values. For example, it shows that someone weighing 50 kg should be around 1.6 m tall.
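Reading an estimate off a regression line can also be sketched in Python. This is only a sketch under assumptions: it fits a least-squares line of height on weight (the coursework does not say which variable was on which axis), the function names `fit_line` and `predict` are invented for the example, and the data are made-up values rather than the real sample.

```python
def fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept through the points."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a line when every x value is the same")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Estimate y for a given x using the fitted line."""
    return slope * x + intercept


# Made-up illustrative data: weights in kilograms (x), heights in metres (y).
weights = [35, 38, 41, 40, 45, 47, 50, 49, 54, 58]
heights = [1.42, 1.47, 1.50, 1.52, 1.55, 1.58, 1.60, 1.62, 1.65, 1.70]

slope, intercept = fit_line(weights, heights)
print(round(predict(slope, intercept, 50), 2))  # estimated height for 50 kg
```

Estimates are most trustworthy for values inside the range of the sample; using the line far outside that range is extrapolation and is less reliable.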
To extend and improve my investigation, I could look at more students than just year 7. I could look at years 8, 9, 10 and 11. This would help me to see if the same conclusion applies to most people. I could also investigate boys and girls separately to see if the relationship between height and weight differs between them.
To do my coursework, I used Microsoft Excel. I think this had a lot of advantages because it sorted my data for me, drew a scatter graph, calculated r, and drew the regression line. This all made it a lot quicker and easier for me. I don’t feel that using the computer had any disadvantages.