QUALITY OF DATA
This data was found on an official government website and so I have treated it as accurate, although it could still be subject to human error. Such an error could have been made either by the ‘Department for Education and Skills’ when calculating and publishing the information or by me when transferring the results. To improve the quality of the data I am using, I removed schools that had no data and special needs schools, so that they would not affect the results. From the scatter diagram, you can see that there are no outliers that need to be tested.
In the appendix, both the original population and the sample can be viewed.
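The cleaning step described above can be sketched in a few lines of Python. This is only an illustration: the field names (`special`, `maths_l5`, `science_l5`) and the four records are made up, and are not taken from the actual data set.

```python
# Hypothetical records: field names and values are illustrative only,
# not taken from the Department for Education and Skills data.
schools = [
    {"name": "School A", "special": False, "maths_l5": 42.0, "science_l5": 55.0},
    {"name": "School B", "special": False, "maths_l5": None, "science_l5": None},
    {"name": "School C", "special": True, "maths_l5": 10.0, "science_l5": 12.0},
    {"name": "School D", "special": False, "maths_l5": 30.0, "science_l5": 41.0},
]


def clean(records):
    """Keep mainstream schools that have both percentages recorded."""
    return [
        r for r in records
        if not r["special"]
        and r["maths_l5"] is not None
        and r["science_l5"] is not None
    ]


cleaned = clean(schools)
print([r["name"] for r in cleaned])  # prints ['School A', 'School D']
```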
NATURE OF THE VARIABLES
I have chosen the percentage of pupils with level 5 in maths at Key Stage 2 to be on my x-axis and the percentage of level 5 in science to be on my y-axis.
Before collecting my data, I had to ensure that both variables are random, which they are, as one cannot predict the results that a child will get. I then drew a scatter diagram plotting maths results against science results. Looking at the graph, you can see that the points form a roughly elliptical shape with a strong positive correlation, which suggests that the data come from a bivariate Normal distribution.
CORRELATION CO-EFFICIENT
Now I will do a hypothesis test using the Pearson's Product Moment Correlation Co-efficient. This is calculated using the formula below:
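A common textbook form of the coefficient is given here. The symbols S_xy, S_xx and S_yy (the sums of products and of squares about the means) are the usual notation and are an assumption, since they are not named in the text above:

```latex
r = \frac{S_{xy}}{\sqrt{S_{xx}\, S_{yy}}}
```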
This is easier than it looks! The first step is to calculate the following:
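In the usual notation (assumed here, not taken from the text), the quantities needed are the sums of squares and products about the means, where n is the number of schools in the sample and x̄ and ȳ are the sample means of the maths and science percentages:

```latex
S_{xx} = \sum x^2 - n\bar{x}^2, \qquad
S_{yy} = \sum y^2 - n\bar{y}^2, \qquad
S_{xy} = \sum xy - n\bar{x}\bar{y}
```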
And then put all of these into the formula to find r (which is always between 1 and -1). Using programs such as Microsoft Excel, you can highlight the data and the computer can automatically calculate the PPMCC.
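As an alternative to a spreadsheet, the same calculation can be sketched in Python. This is a minimal illustration, not the method actually used in this investigation. The function name `pearson_r` and the two short lists of percentages are made up for the example. The sums are computed about the means, which is equivalent to the usual S_xy / √(S_xx S_yy) form.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product moment correlation coefficient via S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and products about the means.
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return s_xy / sqrt(s_xx * s_yy)


# Made-up percentages for illustration only (not the coursework sample).
maths = [20.0, 25.0, 31.0, 38.0, 44.0, 52.0]
science = [30.0, 33.0, 45.0, 47.0, 58.0, 61.0]
print(round(pearson_r(maths, science), 4))
```

Running the function on the real sample of 55 schools should give the same value as the spreadsheet, up to rounding.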
Doing this, the PPMCC is 0.8528. This backs up my thought that my variables have a strong positive correlation, as perfect correlation is at 1 or -1.
I will now carry out a hypothesis test on the correlation co-efficient, using r as the test statistic to test a claim about ρ (the parent population correlation co-efficient). It will be a 1-tailed test at a 5% significance level.
Important things to know:
- The null hypothesis, H0, represents a theory that has been put forward, either because it is believed to be true or because it is to be used as a basis for argument, but has not been proved.
- The alternative hypothesis, H1, is a statement of what a statistical hypothesis test is set up to establish.
- The final conclusion once the test has been carried out is always given in terms of the null hypothesis. We either 'reject H0 in favour of H1' or 'do not reject H0'; we never conclude 'reject H1', or even 'accept H1'.
- If we conclude 'do not reject H0', this does not necessarily mean that the null hypothesis is true; it only suggests that there is not sufficient evidence against H0 in favour of H1. Rejecting the null hypothesis, then, suggests that the alternative hypothesis may be true.
- The critical value for a hypothesis test is the threshold against which the value of the test statistic from the sample is compared to determine whether or not the null hypothesis is rejected. It is found from statistical tables.
HYPOTHESIS TESTING
H0 : ρ = 0 (there is no correlation between the two variables)
H1 : ρ > 0 (there is a positive correlation between the variables: as the maths results increase, so do the science results)
n = 55. From the tables, the critical value = 0.273.
0.8528 > 0.273 so we reject the null hypothesis.
If done at a 0.5% significance level, the critical value = showing that there is also evidence here to reject the null hypothesis.
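The decision rule for the 5% test can be written as a short Python sketch. The function name `one_tailed_positive_test` is made up for illustration. The two numbers are the values quoted in this section; the critical value is taken as read from the tables rather than computed here.

```python
def one_tailed_positive_test(r, critical_value):
    """Return True when H0 (rho = 0) is rejected in favour of H1 (rho > 0)."""
    return r > critical_value


# Values quoted in the text: r = 0.8528, and 0.273 read from tables for n = 55.
r = 0.8528
critical_value = 0.273
reject_h0 = one_tailed_positive_test(r, critical_value)
print("reject H0" if reject_h0 else "do not reject H0")  # prints "reject H0"
```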
INTERPRETATION OF RESULTS
This test gives sufficient evidence to reject the null hypothesis in favour of the alternative hypothesis, suggesting that there is a positive correlation in the data and supporting my original assumption. Together with the value of r, this indicates a strong positive correlation between the percentages of children achieving level 5 at Key Stage 2 in maths and in science.
So, one can conclude that if a school has a high percentage of students doing well in maths, then it will tend to have a similarly high percentage of students doing well at science and achieving level 5. Similarly, a school that has a poor performance in maths will tend to have a poor performance in science.
This means that if my parents want to find a good primary school for my brother, then they should choose a school which has a high percentage of students doing well in maths and science.
IMPROVING THE QUALITY OF DATA
There are many different ways I could do this if I were to repeat the investigation. If my parents decide what type of school they want to send my brother to (e.g. public or private), then I could sort the data into these categories first, and then sample and test. Another thing that I could do is remove all the schools that have percentages below the national average to see if this makes a difference to the result of my hypothesis test.