Modelling Procedures:
I used Excel to enter my data into a table (shown above). From this table I used Excel to draw a scatter diagram of all the data.
Scatter Diagram to Compare Life Expectancies to People Per Doctor For 50 Random Countries
The scatter diagram gives a good visual representation of the data and shows that the points are spread in a roughly elliptical pattern. From this I can draw the initial conclusion that both variables are random and approximately normally distributed. Because the data are roughly elliptical, I was able to fit a regression line to them. The regression line shows visually how strong or weak the correlation is, and in this instance the data show a relatively strong negative correlation. The strength of the correlation can be calculated using Pearson’s Product Moment Correlation Coefficient.
To do this I used Excel to set up a table with columns $x_i$, $y_i$, $x_i^2$, $y_i^2$ and $x_i y_i$, together with the sum of each column (shown on page 5).
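As a rough illustration, the same table of columns and column sums could be built in Python. This is only a sketch: the investigation itself used Excel, the function name `build_sums_table` is hypothetical, and the four data pairs are made-up values, not the investigation's data.

```python
# Hypothetical sample of (people per doctor, life expectancy) pairs.
# These values are illustrative only, not the investigation's data.
data = [(400, 76.0), (2500, 68.0), (9000, 55.0), (600, 74.0)]


def build_sums_table(pairs):
    """Return the rows (x, y, x^2, y^2, xy) and the total of each column."""
    rows = [(x, y, x * x, y * y, x * y) for x, y in pairs]
    # zip(*rows) turns the list of rows into columns so each can be summed.
    totals = tuple(sum(column) for column in zip(*rows))
    return rows, totals


rows, totals = build_sums_table(data)
sum_x, sum_y, sum_x2, sum_y2, sum_xy = totals
print(totals)
```

The five totals are exactly the quantities needed for the means, standard deviations and covariance.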
Pearson’s Product Moment Correlation Coefficient
This is denoted by ‘r’
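$r = \frac{S_{xy}}{S_x S_y}$
$S_x$ = standard deviation of x = $\sqrt{\frac{1}{50}\sum x_i^2 - \bar{x}^2}$
$S_y$ = standard deviation of y = $\sqrt{\frac{1}{50}\sum y_i^2 - \bar{y}^2}$
$S_{xy}$ = covariance = $\frac{1}{50}\sum x_i y_i - \bar{x}\,\bar{y}$
$r = \frac{\frac{1}{50}\sum x_i y_i - \bar{x}\,\bar{y}}{S_x S_y}$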
Sx = 11588.897
Sy = 12.312
Sxy = -87234.776
r = -0.624
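The calculation of r can be sketched in Python as a check on the spreadsheet working. This assumes the "divide by n" form of the standard deviation and covariance, matching the 1/50 factor used here. The function name `pearson_r` and the small data set are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r using the divide-by-n covariance and standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance: (1/n) * sum(x*y) - mean_x * mean_y
    s_xy = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    # Standard deviations: sqrt((1/n) * sum(x^2) - mean^2)
    s_x = math.sqrt(sum(x * x for x in xs) / n - mean_x ** 2)
    s_y = math.sqrt(sum(y * y for y in ys) / n - mean_y ** 2)
    return s_xy / (s_x * s_y)


# Made-up data lying exactly on a downward-sloping straight line.
example_r = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
print(example_r)
```

Because the example points lie exactly on a falling straight line, the result should be -1 up to rounding error, which is a quick way to confirm the formula has been entered correctly.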
Hypothesis Test
I’m going to test my data at the 5% significance level, where ρ is the population product moment correlation coefficient.
H0: ρ = 0 (no correlation between people per doctor and life expectancy)
H1: ρ < 0 (negative correlation between people per doctor and life expectancy)
I’m using a one-tailed test because the initial scatter diagram and Pearson’s Product Moment Correlation Coefficient indicate that the correlation, if significant, will be negative.
n = 50    r = -0.624    r (critical value) =
Using the tables of critical values of r for n = 50 at the 5% significance level, the magnitude of r (0.624) is greater than the critical value, so r lies in the critical region.
H0 is therefore rejected in favour of H1: ρ < 0 (negative correlation between people per doctor and life expectancy). This shows that, at the 5% significance level, there is evidence of negative correlation between people per doctor and life expectancy.
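The investigation compares r with tables of critical values of r. An equivalent check, swapped in here purely as an illustration, converts r into a t statistic with n - 2 degrees of freedom and compares it with a critical t value. The function names are hypothetical, and the critical value of about 1.677 (one-tailed 5%, 48 degrees of freedom) is an assumption taken from standard t tables, not from the investigation.

```python
import math


def t_statistic(r, n):
    """Convert a sample correlation r from n pairs into a t statistic (n - 2 d.f.)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def reject_h0_negative(r, n, t_critical):
    """One-tailed test of H0: rho = 0 against H1: rho < 0.

    t_critical is the positive table value for n - 2 degrees of freedom.
    """
    return t_statistic(r, n) < -t_critical


# r and n are the values quoted in the investigation.
# 1.677 is an assumed table value for 48 d.f. at the one-tailed 5% level.
t = t_statistic(-0.624, 50)
decision = reject_h0_negative(-0.624, 50, 1.677)
print(t, decision)
```

If the t statistic is below the negative critical value, H0 is rejected, which agrees with the comparison against the r tables.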
Regression Line
Using the equation for the regression line of y on x: $y - \bar{y} = \frac{S_{xy}}{S_x^2}(x - \bar{x})$
I've generated an equation to calculate the value of y from x:
$y - 66.8 = \frac{-87234.776}{11588.897^2}(x - 5879.22)$
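The regression equation can be turned into a small Python function for estimating life expectancy from people per doctor. This is only a sketch using the means, covariance and standard deviation quoted in this investigation. The function name `predict_life_expectancy` is hypothetical, and estimates far outside the range of the data should not be trusted.

```python
# Values quoted in the investigation.
MEAN_X = 5879.22    # mean people per doctor
MEAN_Y = 66.8       # mean life expectancy
S_XY = -87234.776   # covariance of x and y
S_X = 11588.897     # standard deviation of x

# Gradient of the regression line of y on x: S_xy / S_x^2
SLOPE = S_XY / S_X ** 2


def predict_life_expectancy(people_per_doctor):
    """Estimate y from x using y - mean_y = slope * (x - mean_x)."""
    return MEAN_Y + SLOPE * (people_per_doctor - MEAN_X)


# The line always passes through the point (mean x, mean y).
print(predict_life_expectancy(MEAN_X))
print(predict_life_expectancy(10000))
```

The gradient is negative, so the estimated life expectancy falls as the number of people per doctor rises.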
Conclusion
The scatter diagram is a good initial indication of negative correlation between people per doctor and life expectancy, suggesting that countries with low life expectancy have a greater number of people per doctor than countries with higher life expectancy.
Pearson’s Product Moment Correlation Coefficient measures the strength of correlation between data, i.e.
- if r = 0 (no correlation)
- if r = -1 ( perfect negative correlation)
- if r = 1 (perfect positive correlation)
Because my calculation gave r = –0.624, it supported the initial interpretation that the data are negatively correlated and indicated that the negative correlation was of reasonable strength.
I decided to carry out a hypothesis test on the data by comparing r (-0.624) with the corresponding critical value of r from the tables. This showed negative correlation between people per doctor and life expectancy at the 5% significance level.
Accuracy
My raw data are likely to be highly accurate because the figures were obtained from the CIA (Central Intelligence Agency) web site, so I can be confident that the data are recent and, for my investigation, reliable. The only likely source of error is the ever-changing patient-to-doctor ratio, although this was accounted for before the raw data were published by the CIA. I found this to be the most accurate and up-to-date source of information available to me.
The calculations themselves are also as accurate as I could make them. I used Excel to calculate Pearson's Product Moment Correlation Coefficient, the means, the standard deviations and the covariance, which were then checked by hand using a calculator and the formulae included in my investigation. I kept the data to 3 significant figures, as accuracy beyond this wasn't necessary for this particular investigation.
The regression line was also drawn by Excel rather than by hand, to be as accurate as possible.
The only inaccuracy that I felt might have affected my investigation is one significant outlier, or anomalous result (a result more than two standard deviations from the mean). This could have caused my standard deviation of x to increase and that of y to decrease compared with the other data, leading to a possible inaccuracy in my covariance and Pearson's Product Moment Correlation Coefficient. The anomaly is highlighted in my scatter diagram (including the regression line) to show how the regression line changes to incorporate this outlier, another factor that may have affected my investigation.
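The "more than two standard deviations from the mean" rule can be sketched in Python as follows. The function name `flag_outliers` is hypothetical, the divide-by-n standard deviation is assumed, and the sample values are made up rather than taken from the investigation's data.

```python
import math


def flag_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    # Divide-by-n variance; max() guards against tiny negative rounding errors.
    variance = max(0.0, sum(v * v for v in values) / n - mean ** 2)
    sd = math.sqrt(variance)
    return [v for v in values if abs(v - mean) > threshold * sd]


# Made-up values with one result far from the rest.
sample = [10, 12, 11, 13, 12, 11, 10, 12, 11, 60]
outliers = flag_outliers(sample)
print(outliers)
```

The rule is applied to one variable at a time, so it would be run separately on the people per doctor figures and on the life expectancy figures.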