Modelling procedures: -
I am going to draw a scatter diagram of GDP per capita against life expectancy at birth for my 50 pairs of data to see whether there is any correlation. A scatter diagram is an appropriate modelling procedure because it shows clearly whether there is a relationship between two variables.
As you can see from the scatter diagram, the points follow a curve rather than a straight line, so I am going to transform the data to try to get a more linear relationship. First I am going to take logs of the GDP per capita data only, leaving the life expectancy data unlogged, and draw a scatter diagram of this data. Next I am going to take logs of the life expectancy data only, leaving the GDP per capita data unlogged, and draw a scatter diagram of this data. Finally I am going to take logs of both the GDP per capita and the life expectancy at birth data and draw a scatter diagram. I will then check which scatter diagram shows the strongest linear correlation and choose that data.
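The comparison of transformations described above can be sketched in Python. This is a minimal illustration and not the Excel workflow I actually used. The function names are my own, the logs are assumed to be base 10 (like Excel's `LOG` function; natural logs would give exactly the same correlations, because changing the base only rescales the data), and all values are assumed to be positive. The numbers at the bottom are made up purely to show that the code works. They are not my coursework data.

```python
import math

def pmcc(x: list[float], y: list[float]) -> float:
    """Pearson's product moment correlation coefficient r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def compare_transformations(gdp: list[float], life: list[float]) -> dict[str, float]:
    """r for each combination of logged / unlogged data (base-10 logs, data > 0)."""
    log_gdp = [math.log10(v) for v in gdp]
    log_life = [math.log10(v) for v in life]
    return {
        "GDP vs life": pmcc(gdp, life),
        "log GDP vs life": pmcc(log_gdp, life),
        "GDP vs log life": pmcc(gdp, log_life),
        "log GDP vs log life": pmcc(log_gdp, log_life),
    }

def most_linear(gdp: list[float], life: list[float]) -> str:
    """Name of the combination with the largest |r|, i.e. the most linear one."""
    results = compare_transformations(gdp, life)
    return max(results, key=lambda name: abs(results[name]))

# Made-up illustrative values (NOT my coursework data): life expectancy rises
# by 5 years each time GDP per capita doubles, so it is exactly linear in log GDP.
gdp = [500, 1000, 2000, 4000, 8000]
life = [50, 55, 60, 65, 70]
print(most_linear(gdp, life))  # log GDP vs life
```

With real data, the four r values would simply be compared by eye in the same way as the four scatter diagrams.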
You can see from the scatter diagrams that the log of GDP per capita against life expectancy shows the strongest linear correlation, so I am going to use the data for the log of GDP per capita and the life expectancy at birth.
From this scatter diagram I can see that there is a positive correlation between the two variables. The points form an elliptical shape, and because the ellipse is quite narrow this suggests a strong positive correlation, i.e. as one variable increases, the other tends to increase as well. The data therefore shows a clear linear relationship.
Another technique that I am going to use is a histogram, because it shows the distribution of each variable clearly and so helps me decide whether to use Pearson's product moment correlation coefficient (PMCC) or Spearman's rank correlation coefficient. I am going to draw a histogram for each variable. If either variable does not appear to be normally distributed I shall use Spearman's, and if both do I shall use PMCC.
As both histograms show a roughly normal distribution, I am going to use the PMCC.
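Spearman's coefficient, the alternative I would have used if the histograms had not looked normal, is just the PMCC calculated on the ranks of the data instead of the data itself. Below is a minimal Python sketch of that idea. The function names are my own, and it assumes the usual convention that tied values share the average of the ranks they occupy. When there are no ties, it gives the same value as the textbook formula 1 − 6Σd²/(n(n² − 1)).

```python
import math

def ranks(values: list[float]) -> list[float]:
    """Rank values from 1 (smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of any run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j correspond to ranks i+1..j+1; each gets their mean.
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared_rank
        i = j + 1
    return result

def pmcc(x: list[float], y: list[float]) -> float:
    """Pearson's product moment correlation coefficient r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rank correlation coefficient: the PMCC of the ranks."""
    return pmcc(ranks(x), ranks(y))
```

For example, `spearman([1, 2, 3, 4, 5], [2, 4, 6, 8, 100])` gives 1.0, because the relationship is perfectly monotonic even though it is not linear. This is why Spearman's does not need the normality check that the histograms are used for.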
Analysis: -
Now I am going to calculate the PMCC with the help of Microsoft Excel.
Excel gives r = 0.8339 (to 4 d.p.), which shows that my variables have a strong positive correlation.
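For reference, the PMCC calculated by Excel's `CORREL` function should match the textbook formula r = Sxy / √(Sxx × Syy). The Python sketch below uses the summary-sum versions of Sxx, Syy and Sxy. It only illustrates the formula and is not the spreadsheet I actually used. The function name is my own.

```python
import math

def pmcc_from_sums(x: list[float], y: list[float]) -> float:
    """PMCC r = Sxy / sqrt(Sxx * Syy) using the textbook summary-sum formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sxx = sum(a * a for a in x) - sum_x ** 2 / n                 # Σx² − (Σx)²/n
    syy = sum(b * b for b in y) - sum_y ** 2 / n                 # Σy² − (Σy)²/n
    sxy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n   # Σxy − ΣxΣy/n
    return sxy / math.sqrt(sxx * syy)

print(pmcc_from_sums([1, 2, 3], [1, 3, 2]))  # 0.5
```

These summary-sum formulas are the ones used when calculating by hand. For very large values the "deviations from the mean" form is numerically safer, but for 50 pairs of data either form is fine.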
I am now going to carry out a hypothesis test on the correlation coefficient to see if there is enough evidence from my sample to conclude that there is a positive correlation in the whole population of countries.
H₀: ρ = 0 (There is no correlation between the two variables across all the countries in the world)
H₁: ρ > 0 (There is a positive correlation)
n = 50, and I will be doing a one-tailed test at the 5% significance level.
The critical value for n = 50 is therefore 0.2353.
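Since r = 0.833872644 > 0.2353, the result is significant, so I reject the null hypothesis.
Therefore I can conclude that there is enough evidence from the sample to suggest that there is a positive correlation between the log of GDP per capita and life expectancy at birth.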
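The critical value of 0.2353 comes from printed tables. As a cross-check, those tables can be reproduced from the t-distribution. If there is no correlation, and the data is bivariate normal, then t = r√(n − 2)/√(1 − r²) follows a t-distribution with n − 2 degrees of freedom. Rearranging gives the critical value r = t/√(n − 2 + t²). The Python sketch below finds t numerically using Simpson's rule and bisection, and uses only the standard library. The function names are my own, and this is not how I obtained the value in my coursework.

```python
import math

def t_pdf(t: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

def t_cdf(t: float, df: int, steps: int = 1000) -> float:
    """P(T <= t) for t >= 0: symmetry gives 0.5, plus Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_upper_quantile(alpha: float, df: int) -> float:
    """Find t with P(T > t) = alpha by bisection (assumes 0 < alpha < 0.5)."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > alpha:
            lo = mid  # upper tail still too big, so the quantile is further right
        else:
            hi = mid
    return (lo + hi) / 2

def pmcc_critical_value(n: int, alpha: float) -> float:
    """One-tailed critical value of r for H0: rho = 0 against H1: rho > 0."""
    df = n - 2
    t = t_upper_quantile(alpha, df)
    return t / math.sqrt(df + t * t)

def is_significant(r: float, n: int, alpha: float) -> bool:
    """Reject H0 when the sample PMCC exceeds the critical value."""
    return r > pmcc_critical_value(n, alpha)

print(round(pmcc_critical_value(50, 0.05), 4))  # 0.2353
print(is_significant(0.833872644, 50, 0.05))    # True
```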
Regression line
The equation of the regression line is:
On this page is my scatter diagram with the regression line drawn on it. Both were produced in Excel.
This is the regression line of y on x.
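The y on x regression line is the least-squares line. It minimises the sum of the squared vertical distances from the points to the line, which gives the gradient b = Sxy/Sxx and the intercept a = ȳ − b x̄. As far as I know, Excel's linear trendline uses the same least-squares method. The Python sketch below shows these formulas. I have not reproduced the coefficients of my own line here, so the test values are made up. The prediction helper is hypothetical and assumes that x is the base-10 log of GDP per capita and y is life expectancy. That is how the line would be used for the predictions in my interpretation.

```python
import math

def regression_y_on_x(x: list[float], y: list[float]) -> tuple[float, float]:
    """Least-squares line y = a + b*x (the y on x line); returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
    sxx = sum((p - mean_x) ** 2 for p in x)
    b = sxy / sxx            # gradient
    a = mean_y - b * mean_x  # the line passes through (mean_x, mean_y)
    return a, b

def predict_life_expectancy(a: float, b: float, gdp_per_capita: float) -> float:
    """Hypothetical helper: assumes x = log10(GDP per capita), so take the log first."""
    return a + b * math.log10(gdp_per_capita)

# Made-up points lying exactly on y = 1 + 2x.
a, b = regression_y_on_x([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # 1.0 2.0
```

The x values have to be logged before predicting, because the line was fitted to the log of GDP per capita and not to GDP per capita itself.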
Interpretation: -
From my investigation I have found that there is a positive correlation between my two sets of data, as shown by my scatter diagram and regression line.
The aim of my investigation was to see whether there is any correlation between the GDP per capita ($) of a country and its life expectancy at birth (years). I can now confidently say that I have achieved my aim, as there is a positive correlation as predicted. My sample was drawn from countries across the whole world, so it should be a reasonably good representation of the whole population.
Using the regression line, I can predict that a country with a low GDP per capita would be expected to have a low average life expectancy. This trend would be expected for every country in a similar position, but some countries may have lower life expectancies than expected because of some external factor, e.g. war, the outbreak of a new disease or a natural disaster. However, a few such exceptions are unlikely to change the overall correlation very much.
I think that this data was worth collecting and investigating because I now realise how closely the GDP per capita of a country is linked to how long its people live, and how a higher GDP tends to go with a better quality of life. This investigation has shown that people living in developing countries are more likely to die young and are less likely to enjoy as high a quality of life as we do in a country like the UK. I also think this investigation would be very good evidence to help convince richer nations to help poorer ones. This data should be given to an organisation like the United Nations to act as a catalyst, convincing them to do something about this before it is too late.
Accuracy and refinements: -
One possible source of error was that the data may have been displayed incorrectly on the website or I may have copied it incorrectly. I would improve this by comparing data from a number of different sources to ensure accurate and reliable results.
The sampling method that I used could have been a source of error. This is because my systematic sample only included every 4th country in the list, so, for example, the 3rd country had no chance of being chosen. I could have improved my sampling method by using simple random sampling instead of systematic sampling. Simple random sampling gives every country, and every possible combination of countries, an equal chance of being chosen. This is a very important factor in ensuring the reliability of my work.
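The difference between the two sampling methods can be sketched with Python's `random` module. The frame of 228 countries matches my source. The random starting point for the systematic sample is an assumption, because I have not stated how the first country was chosen. The seed is arbitrary. With a systematic sample, neighbouring countries in the list (such as the 3rd and 4th) can never both be chosen, whereas with simple random sampling they can.

```python
import random

def systematic_sample(frame: list, k: int, rng: random.Random) -> list:
    """Every k-th item, starting at a random position among the first k."""
    start = rng.randrange(k)
    return frame[start::k]

def simple_random_sample(frame: list, n: int, rng: random.Random) -> list:
    """n items without replacement; every possible set of n items is equally likely."""
    return rng.sample(frame, n)

# Countries numbered 1 to 228, as in my source's list; the seed is arbitrary.
countries = list(range(1, 229))
rng = random.Random(2003)
systematic = systematic_sample(countries, 4, rng)   # always 57 countries, 4 apart
srs = simple_random_sample(countries, 50, rng)      # 50 distinct countries
```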
Even though the data is very reliable, there are some improvements that could be made. First of all, the data was only collected for a single year, which in my case was 2003. For more accurate results I could have used data over five years. This would show whether there is actually a difference between years, and whether, for example, life expectancy was unusually low in a given year because of an external factor like war or disease. Also, my sample was only taken from the 228 countries listed by my source, and there are more countries in the world, so a fairer representation would be to sample randomly from every country in the world. This was not possible because my source did not include some of these countries, either for political reasons or because of a lack of information about them.
In my investigation I had to reject the data for 11 countries, which reduced the randomness of my sample. I would improve this by making sure that data was available for every item in the parent population.
Overall I am very happy with the accuracy and reliability of my data because I got it from a very reliable source, www.CIA.gov. Having a reliable source for my data means I can be more confident that the positive correlation I found is genuine.