Modeling Procedures:
I am now going to include the scatter diagram to see if there is correlation.
I can see that there is a negative correlation between the two variables. However, I am not completely satisfied that the data takes an elliptical shape, so I am going to draw histograms of each variable to check whether it is appropriate to use the PMCC (product moment correlation coefficient). If the histograms are not roughly normally distributed I shall use Spearman's rank correlation coefficient instead.
The histograms are roughly normally distributed so I will use the PMCC.
Now I am going to calculate the PMCC using Excel.
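As a cross-check on the spreadsheet value, the same calculation can be sketched in Python from the sums Sxx, Syy and Sxy. This is only a minimal sketch: the function name `pmcc` is my own, and the four data pairs are made-up illustration values, not values from my sample.

```python
from math import sqrt


def pmcc(xs, ys):
    """Product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross products about the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


# Made-up illustration values (NOT from the sample): y falls as x rises.
example_x = [5.0, 8.0, 11.0, 14.0]
example_y = [75.0, 70.0, 62.0, 55.0]
r_example = pmcc(example_x, example_y)
print(round(r_example, 3))
```

A value close to -1 means the points lie close to a straight line with a negative gradient.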
Now I am going to do a hypothesis test to see if there is enough evidence from my sample to conclude that there is correlation in the whole population.
H0: ρ = 0 (No correlation between my variables in all the countries in the world)
H1: ρ < 0 (Negative correlation)
I will do a one-tailed test at the 5% significance level.
The critical value for the PMCC with 50 items of data is 0.2353. The PMCC for my data is -0.77, and since -0.77 is less than -0.2353 I reject the null hypothesis. There is very strong evidence that there is negative correlation between the two variables in the population.
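The decision rule for this one-tailed test can be written as a short sketch. The function name is my own; the two numbers are the PMCC and the critical value quoted above.

```python
def reject_h0_negative_correlation(r, critical_value):
    """One-tailed test of rho = 0 against rho < 0.

    The null hypothesis is rejected when the sample PMCC is below
    minus the tabulated critical value.
    """
    return r < -critical_value


r = -0.77                # PMCC of my sample, as quoted above
critical_value = 0.2353  # 5% one-tailed table value for 50 items, as quoted above

print(reject_h0_negative_correlation(r, critical_value))
```

Because the alternative hypothesis is ρ < 0, only a negative PMCC can lead to rejection; a positive value of the same size would not count as evidence.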
I am going to find the regression line of y on x to give a line of best fit for the data.
y = a + bx
b = Sxy / Sxx
Using Excel I have found:
x̄ = 9.21
ȳ = 67.69
b = -1.72
I will substitute the mean of x and the mean of y to find a.
67.69 = a + (-1.72*9.21)
a = 67.69 + 15.84 = 83.53
So the equation of the line of regression is:
y = 83.53 - 1.72x
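The regression line always passes through the point given by the two means, so a = (mean of y) - b * (mean of x). The substitution can be checked with a small sketch using the three values found above; the variable names and the `predict` helper are my own.

```python
mean_x = 9.21   # mean of x, from the spreadsheet
mean_y = 67.69  # mean of y, from the spreadsheet
b = -1.72       # gradient of the regression line, from the spreadsheet

# The line passes through (mean_x, mean_y), so rearrange mean_y = a + b * mean_x
a = mean_y - b * mean_x
print(round(a, 2))  # 83.53


def predict(x):
    """Value of y on the regression line y = a + b * x."""
    return a + b * x


# Sanity check: putting the mean of x into the line gives back the mean of y
print(round(predict(mean_x), 2))  # 67.69
```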
I will use Excel to draw the line of regression:
Interpretation:
I have discovered that there is negative correlation between my two sets of data. This is shown on my graph and regression line.
I can firmly say that I have achieved my aim as there is negative correlation as predicted.
The sample was taken from countries across the whole world, so it should be a good representation of the whole population.
Using the correlation I can predict that a country with a very high death rate is expected to have a low average life expectancy. This should be the trend for most countries, but there may be exceptions. For example, a country may have a high death rate during a war without its life expectancy falling as much as the regression line suggests, because the extra deaths are due to the war and not to natural causes. A few exceptions like this should not affect the overall correlation.
I think that the data was worth collecting because I know it is important to realize that people in third world countries are more likely to die young because of high death rates due to diseases and other factors.
Accuracy and refinements:
The most important factor in the reliability of my work is how good the sampling method is. I could have improved my sampling method by using simple random sampling instead of systematic sampling. This is because my systematic sample only included every 4th country, so, for example, every 3rd country did not have a chance of being chosen. Simple random sampling ensures that every item of data has an equal chance of being chosen.
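The difference between the two sampling methods can be shown with a short sketch. It assumes a list of 200 countries and a sample of 50, as in my data; the country labels, the function names and the fixed starting position are hypothetical and only there for illustration.

```python
import random


def systematic_sample(items, step, start=0):
    """Take every `step`-th item, beginning at index `start`."""
    return items[start::step]


def simple_random_sample(items, size, seed=None):
    """Choose `size` distinct items so that each item is equally likely."""
    rng = random.Random(seed)
    return rng.sample(items, size)


# Hypothetical labels standing in for the 200 countries in the source list
countries = [f"country_{i}" for i in range(1, 201)]

# Every 4th country: the 4th, 8th, 12th, ... up to the 200th
systematic = systematic_sample(countries, step=4, start=3)

# 50 countries chosen at random; the seed only makes the run repeatable
randomised = simple_random_sample(countries, size=50, seed=1)

print(len(systematic), len(randomised))
print("country_3" in systematic)
```

With a fixed starting position the 3rd country can never appear in the systematic sample, whereas in the simple random sample it has the same chance as any other country.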
Although the data is very reliable there are some improvements that could be made. First of all, the data was only collected for one given year. For more accurate results I could have used data over five years, to see whether the figures actually change and whether, for example, the death rate in that given year was unusually high because of a factor like war or disease. The death rates were given per thousand people. For more precise data I could have used rates per ten thousand people. Also, death rates may be very different for men, women and children.
On the scatter diagram I can see that there seem to be two outliers in my data. I will delete these and see if the line of regression is different.
As you can see, although the equation is different, the change is not enough to conclude that the outliers make a big difference.
Also, the sample was only taken from 200 countries and there are more countries in the world, so a fairer representation would come from randomly sampling from every country in the world. This was out of my hands because my source did not include some of these countries for political reasons.
Overall I am very happy with the accuracy of my data because I got it from a very reliable source (www.CIA.gov website). Having a reliable source for my data gives me confidence in my conclusion that there is negative correlation, which was my aim.