I have presented the sample on a scatter diagram (graph 1). The KS3 data has been plotted on the x-axis and the GCSE results on the y-axis. This is because, in general, y is dependent on x, and in this case the GCSE scores are expected to depend on the performance achieved at KS3. For the x-axis, I have placed a break from 0 to 3, because the smallest KS3 value in my sample is 3.7 and it is very unlikely that any student will score below 3 at KS3 level. I do realise there is a small chance of a student scoring below 3, but no one in the sample does, and so I can ignore the range 0 to 3. I have calculated the means of x and y and plotted the mean point, which allows me to see the central region of the data. It also helps me draw the best-fit line, which will be drawn after I have calculated Pearson’s Product Moment Correlation Coefficient. This calculation will tell me whether the data has a linear correlation or not. If it does not, then the aim of my investigation cannot be fulfilled. If it does, then I can draw a line of best fit and work further on my aim. Calculating the means of x and y also shows that I know how to calculate the mean.

Table 2 shows the sample in terms of x and y, where x = KS3 and y = GCSE point scores. It also shows x², y², xy, and the means of x and y. These have been calculated for Pearson’s Product Moment Correlation Coefficient. From these, I can use the formula to calculate whether there is a linear correlation or not. If there is no correlation, ‘r’ will take a value close to zero. The nearer ‘r’ is to +1 or –1, the stronger the correlation. From observation of the scatter diagram alone, I expect the correlation to be positive.
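The calculation that Table 2 sets up can be sketched in Python as follows. This is only an illustration: the five (x, y) pairs are made-up values and not the sample from Table 2, and the function name pearson_r is my own choice. It works from the same column totals that the table holds (the sums of x, y, x², y² and xy).

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's product moment correlation coefficient from column totals."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)               # total of the x^2 column
    sum_y2 = sum(y * y for y in ys)               # total of the y^2 column
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # total of the xy column

    s_xy = sum_xy - sum_x * sum_y / n
    s_xx = sum_x2 - sum_x ** 2 / n
    s_yy = sum_y2 - sum_y ** 2 / n
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical values for illustration only (NOT the Table 2 sample).
ks3 = [3.7, 4.5, 5.2, 6.0, 6.8]
gcse = [3.1, 4.0, 4.9, 5.5, 6.9]

r = pearson_r(ks3, gcse)
print(round(r, 3))
```

A value of r near +1 or –1 from this function indicates a strong linear correlation, and a value near zero indicates little or no linear correlation.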

I have calculated the correlation coefficient and the value I have ended up with is 0.935. As stated before, the nearer ‘r’ is to +1 or –1, the stronger the correlation. For my sample, ‘r’ = 0.935 to 3 s.f. This shows that there is a very strong positive linear correlation. This means that a line of best fit is suitable for the data, as the line will be fairly accurate. The line of best fit has been drawn on graph 1.

The calculated value ‘r’ can be used as an estimate for ‘ρ’, where ‘ρ’ is the correlation coefficient of the parent population. ‘r’ can also be used to carry out a hypothesis test on this value of ‘ρ’. The test consists of a null hypothesis, that there is no correlation in the parent population. This is denoted as H0: ρ = 0. There are three possible alternative hypotheses (ρ > 0, ρ < 0 or ρ ≠ 0). For this data, however, only one alternative will be used: because the sample shows a positive correlation, the alternative is H1: ρ > 0, which makes this a ‘one-tailed test’. The test will be carried out below at a 5% significance level.
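One way to sketch this one-tailed test in Python is shown below. It is an assumption on my part that the test is done this way: coursework of this kind usually compares r directly with a critical value from a table, whereas this sketch uses the equivalent t-statistic form with n – 2 degrees of freedom. The sample size n = 50 is the sample size used in this investigation, and the critical value 1.677 is the tabulated one-tailed 5% value of t for 48 degrees of freedom, which should be checked against a set of statistical tables. The function name t_statistic is my own.

```python
from math import sqrt


def t_statistic(r, n):
    """Test statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


r = 0.935           # sample correlation coefficient from the investigation
n = 50              # sample size
T_CRITICAL = 1.677  # assumed table value: one-tailed, 5%, 48 degrees of freedom

t = t_statistic(r, n)
reject_h0 = t > T_CRITICAL   # one-tailed test of H1: rho > 0

print(f"t = {t:.2f}, reject H0: {reject_h0}")
```

If the statistic is larger than the critical value, H0 is rejected in favour of H1: ρ > 0 at the 5% significance level.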

From the hypothesis testing, I have shown that there is evidence, at the 5% significance level, that the correlation coefficient of the parent population is greater than zero, and therefore I can say there is a positive correlation in the parent population.

What I have discovered from the investigation is that there is a clear relationship between the KS3 and the GCSE average point scores. My aim was to find a sufficient correlation between the KS3 and GCSE scores and from that use a line of best fit to help predict future GCSE scores from any current set of KS3 scores. Sampling the data and presenting it on a scatter diagram gave a first view of whether there would be a correlation or not. I had dotted an ellipse around all the points. From this I could tell, without calculating Pearson’s Product Moment Correlation Coefficient, whether there would be a strong or weak correlation by studying the width of the ellipse. The narrower the elliptical profile, the greater the correlation. Pearson’s Product Moment Correlation Coefficient confirmed that there was a very strong linear correlation. This strong linear correlation allowed me to place a line of best fit fairly accurately. If the correlation had been weak, then the line of best fit would have been hard to place. The hypothesis testing helped show that the correlation found for the sample was a

Collecting and investigating the data was worthwhile, because this could be a simple and easy way to predict the GCSE grade points that a student could achieve just by looking at their KS3 scores. The point scores at GCSE are important because a student has to achieve a minimum number of points to continue with further education, such as A-levels. By being able to predict the average point score soon after the KS3 results, the student has more time to think about their future, which will help with early decision making.

However, this conclusion has been based on the sample. From what I have concluded, I believe that, because of the strong linear correlation, the results from the sample can be applied to the parent population. It would have been easier to apply them to the parent population if I had found the equation of the best-fit line and produced a general rule to predict a GCSE score from a given KS3 score.

I believe the main weakness in the data was that the scores given were all averages. The sample of 50 was also too big, considering that the parent population consisted of only 90 pieces of data. The parent population itself was made up of KS3 and GCSE point scores from just one particular year group. It would have been more appropriate to gather data from several year groups to give the school a more general idea of the educational level over a period of time.

To improve the investigation I would have sampled from a much bigger population to get more varied pieces of data. Having tested for linear correlation, if there was a strong linear correlation I would have calculated the least squares regression line to find an equation for the line of best fit. This equation could then be used by itself to calculate GCSE scores from given KS3 scores. I believe that using average point scores generalises a student’s ability. They may be weak in certain subjects and stronger in others. To improve this investigation, I would take the KS3 point scores for each subject and investigate their relation to the GCSE point scores in the same subject. This would help show the student’s ability in each subject and not just their overall ability, which may seem weaker because a single weakness in one subject brings their average point score down.
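The least squares regression line mentioned above could be found along the following lines. This is a minimal sketch under stated assumptions: the data values are made up for illustration and are not the sample from this investigation, and the names least_squares_line and predict_gcse are my own. It fits the line y = a + bx of y on x, which always passes through the mean point.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares regression line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx            # gradient
    a = mean_y - b * mean_x    # intercept, so the line passes through the mean point
    return a, b


def predict_gcse(ks3_score, a, b):
    """Predict a GCSE average point score from a KS3 average point score."""
    return a + b * ks3_score


# Hypothetical values for illustration only (NOT the investigation's sample).
ks3 = [3.7, 4.5, 5.2, 6.0, 6.8]
gcse = [3.1, 4.0, 4.9, 5.5, 6.9]

a, b = least_squares_line(ks3, gcse)
print(f"y = {a:.3f} + {b:.3f}x")
print(f"Predicted GCSE score for KS3 = 5.5: {predict_gcse(5.5, a, b):.2f}")
```

A prediction of this kind should only be trusted for KS3 scores inside the range covered by the sample, since the line has not been tested outside it.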