The Data
Modelling Procedures - The Scatter Diagram
For the data on the previous page I have drawn a scatter diagram and a line of best fit passing through the point (mean of x, mean of y). The term "line of best fit" describes a straight line drawn through a set of data points so as to fit them as closely as possible.
Mean of x = 11458 / 68 = 168.50 cm
Mean of y = 5488 / 68 = 80.71 kg
Using these means, I was able to draw a line of best fit on the scatter diagram shown on the next page.
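The calculation of the means, and the fact that a fitted line passes through the mean point, can be sketched in Python. This is only an illustration: the totals and sample size come from the text, but the five data pairs are hypothetical (not the coursework data), and the least-squares method is an assumption, since the line in this report may equally have been drawn by eye through the mean point.

```python
from statistics import mean

# Totals and sample size quoted in the text.
SUM_HEIGHT_CM = 11458
SUM_WEIGHT_KG = 5488
N = 68

mean_height = SUM_HEIGHT_CM / N  # mean of x, in cm
mean_weight = SUM_WEIGHT_KG / N  # mean of y, in kg
print(f"Mean height = {mean_height:.2f} cm, mean weight = {mean_weight:.2f} kg")


def least_squares_line(xs, ys):
    """Return (gradient, intercept) of the least-squares line y = gradient * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    gradient = sxy / sxx
    # Choosing the intercept this way forces the line through (x_bar, y_bar).
    intercept = y_bar - gradient * x_bar
    return gradient, intercept


# Hypothetical illustrative sample, NOT the coursework data.
heights = [160, 165, 170, 175, 180]
weights = [70, 76, 80, 86, 93]

gradient, intercept = least_squares_line(heights, weights)
print(f"y = {gradient:.2f}x + {intercept:.2f}")
```

Because the intercept is defined from the two means, substituting the mean of x into the line always returns the mean of y, which is why the mean point is a useful anchor when drawing the line by hand.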
Modelling Procedures Continued
The scatter diagram I have drawn (on page 5) shows that my data has positive correlation: both variables increase together. The points lie close to the line of best fit, which indicates good positive correlation (the closer the points are to the line, the stronger the correlation). This suggests that there is a linear correlation between the two variables.
In short, the scatter diagram shows a positive linear correlation.
I will now test if the data shows normal distribution of the two variables.
I will do this by putting the height into intervals and counting how many people there are in each interval.
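The counting step can be sketched as follows. The interval start (155 cm), the interval width (5 cm) and the list of heights are all hypothetical choices made for illustration; they are not the intervals or data used in this investigation.

```python
from collections import Counter


def count_in_intervals(values, start, width):
    """Count how many values fall in each interval [lower, lower + width)."""
    counts = Counter()
    for v in values:
        # Lower bound of the interval that contains v.
        lower = start + ((v - start) // width) * width
        counts[lower] += 1
    return dict(sorted(counts.items()))


# Hypothetical heights in cm, NOT the coursework data.
heights = [158, 162, 164, 167, 168, 169, 171, 172, 176, 181]

table = count_in_intervals(heights, start=155, width=5)
for lower, count in table.items():
    # Each interval includes its lower bound and excludes its upper bound.
    print(f"{lower} <= height < {lower + 5}: {count}")
```

If the counts rise to a single peak near the mean and fall away roughly symmetrically on either side, that is consistent with an approximately normal distribution.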
The line graph and the column graph both suggest that each variable is random and approximately normally distributed. Since the variables also show linear correlation, I will treat the data as approximately bivariate normal.
Analysis and Interpretation
I calculated the correlation coefficient on the previous pages and found r = 0.9038.
The rule is that the nearer the value of r is to +1 or -1, the stronger the correlation. Here the correlation is positive, so what matters is how close r is to +1. Since r is less than 0.1 away from +1, the correlation between height and weight is very strong.
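For reference, the product moment correlation coefficient can be computed as r = Sxy / sqrt(Sxx * Syy). The sketch below applies that formula to a small hypothetical sample (not the coursework data), so the value it prints is not the 0.9038 reported above.

```python
import math


def pearson_r(xs, ys):
    """Product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical illustrative sample, NOT the coursework data.
heights = [160, 165, 170, 175, 180]
weights = [70, 76, 80, 86, 93]

r = pearson_r(heights, weights)
print(f"r = {r:.4f}")
```

Points lying exactly on a rising straight line give r = +1 and points on a falling line give r = -1, which is the basis of the rule stated above.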
I will now carry out a hypothesis test to check that this correlation is statistically significant.
H0 (Null Hypothesis): There is no correlation between height and weight
ρ = 0 (where ρ is the population correlation coefficient)
H1 (Alternative Hypothesis): There is a positive correlation between height and weight
ρ > 0
Significance Level: 5%
This is a one-tailed test. The table of critical values for the product moment correlation coefficient only goes up to n = 60, whereas I need the critical value for n = 68 at the 5% significance level (one-tailed).
For n = 60 the critical value is 0.2144. Because the critical value decreases as n increases, the critical value for n = 68 must be less than 0.2144.
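The claim that the critical value falls as n grows can be checked with the standard conversion from a critical t value to a critical r value, r = t / sqrt(t^2 + n - 2). This is a sketch rather than part of the original method: the two t values below are assumptions read from standard one-tailed 5% t tables, not figures from this report.

```python
import math


def critical_r(t_critical, n):
    """Convert a one-tailed critical t value (n - 2 degrees of freedom) into a critical r."""
    df = n - 2
    return t_critical / math.sqrt(t_critical ** 2 + df)


# Assumed one-tailed 5% critical t values from standard t tables.
T_CRIT_DF58 = 1.6716  # 58 degrees of freedom, i.e. n = 60
T_CRIT_DF66 = 1.6683  # 66 degrees of freedom, i.e. n = 68

r_crit_60 = critical_r(T_CRIT_DF58, 60)
r_crit_68 = critical_r(T_CRIT_DF66, 68)
print(f"n = 60: {r_crit_60:.4f}")
print(f"n = 68: {r_crit_68:.4f}")
```

Under these assumptions the n = 60 figure agrees with the tabulated 0.2144, and the n = 68 figure comes out smaller, so using 0.2144 as the threshold is a cautious choice.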
Comparing this with the correlation coefficient r = 0.9038, I found that:
0.9038 > 0.2144, so r also exceeds the (smaller) critical value for n = 68. The null hypothesis is therefore rejected in favour of the alternative hypothesis: there is evidence at the 5% significance level of a positive correlation between height and weight.
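An equivalent way to reach the same decision, shown here only as a cross-check and not as the method used in this report, is to convert r into a t statistic, t = r * sqrt(n - 2) / sqrt(1 - r^2), and compare it with the critical t value. The critical value of about 1.668 for 66 degrees of freedom is an assumption taken from standard t tables.

```python
import math


def t_statistic(r, n):
    """t statistic for testing rho = 0, with n - 2 degrees of freedom."""
    df = n - 2
    return r * math.sqrt(df) / math.sqrt(1 - r ** 2)


R = 0.9038  # correlation coefficient reported in the text
N = 68      # sample size reported in the text
T_CRIT_ASSUMED = 1.668  # assumed one-tailed 5% critical t for 66 degrees of freedom

t = t_statistic(R, N)
reject_null = t > T_CRIT_ASSUMED
print(f"t = {t:.2f}, reject H0: {reject_null}")
```

The statistic is far above the assumed critical value, which agrees with the conclusion reached from the correlation coefficient table.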
I have now both calculated the correlation coefficient and tested its significance.
Interpretation Continued - What has been discovered?
I have discovered that the correlation coefficient is 0.9038, which indicates a very strong positive correlation between height and weight, since the closer the correlation coefficient is to +1, the stronger the correlation.
In addition, I have confirmed that this correlation is statistically significant by performing a hypothesis test, which I explained on the previous page.
I also found several summary statistics for the sample, such as the mean height and mean weight, as well as the distributions, which could be useful for surveys and even for fitness clubs.
Conclusion
In conclusion, I have fulfilled my aim of helping the clothing industry.
I have shown that there is a positive correlation between height and weight, so taller people in the sample tend to be heavier, which can help when deciding which clothing to order or produce. In addition, the distribution graphs and some of the statistics I found earlier, such as the means, could be used for the same purpose.
Efficient use of data and Implications of Conclusion
The data was worth collecting because the results can help the clothing industry: they can be used to decide which sizes of clothing to order or produce for stores, which may increase sales and, in turn, market share.
For the population in question, this implies a wider variety of clothing that actually suits their needs in terms of size, which may improve their quality of life. In short, the population from which I collected the sample will have a suitable variety of clothing available in stores.
Accuracy and Refinements – Errors and Restrictions of Data
There are some obvious possible sources of error. When the heights and weights were measured, readings could have been misread, for example by reading the scales incorrectly. Two people took each reading to improve accuracy, but some possibility of error remains.
In addition I am unaware how sensitive the equipment was for the collection of this data.
When the weights were measured, people may have been wearing heavy clothing, which was not taken into account and could have introduced error into my results.
The data was restricted to a single year group, the sixth form, and consisted only of males, as it came from an all-boys school. I was unable to obtain heights and weights for a wider group of people.
Another restriction was the degree of accuracy: the measurements were taken to the nearest whole number. I would have preferred greater precision, possibly to two decimal places. This would have required more expensive equipment but would have improved the accuracy of my findings.
Refinements
One possible refinement would have been to collect data for both boys and girls and compare them, so that the clothing industry could benefit from a wider set of results.
A larger sample would also have been an improvement, as it would have allowed me to investigate the correlation between height and weight in more detail and increased the accuracy of my results.
Overall, I have fulfilled my aim efficiently and with accurate results, so I consider the investigation a success.