Statistics Coursework - Bivariate Data.

Bivariate Data                Shahida Jaffer


Moving to a new area with so much choice, my parents are unsure which middle school my brother should go to.  They want to find a school that doesn’t just do well in one subject at Key Stage 2, but in at least two of the three subjects.  I am going to investigate this by looking at the percentage of children achieving level 5 in maths and in science at Key Stage 2 in each school, to see whether there is a positive correlation between the two.  This will be shown if schools that do well in maths do just as well in science, and schools that do badly in maths do equally badly in science.


The data I have used were collected from the ‘Department for Education and Skills’ website, under the section of performance tables for primary schools at Key Stage 2.  I chose the data for schools within 15 miles of my postcode (MK5 8BS), listed with the nearest first.

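Now I will do a hypothesis test using Pearson's Product Moment Correlation Coefficient (PPMCC).  This is calculated using the formula below:

$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$$

This is easier than it looks!  The first step is to calculate the following:

$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}, \qquad S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n}, \qquad S_{xy} = \sum xy - \frac{\sum x \sum y}{n}$$

Here x is a school's maths percentage, y is its science percentage and n is the number of schools in the sample.

And then put all of these into the formula to find r (which is always between -1 and 1).  Using programs such as Microsoft Excel, you can highlight the data and the computer can automatically calculate the PPMCC.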
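The same calculation can also be sketched in a few lines of Python instead of a spreadsheet.  This is only an illustration: the function name pmcc and the two short lists of percentages are made up for the example and are not the coursework data.  It works out Sxx, Syy and Sxy (the sums of squares and products about the means) and then divides Sxy by the square root of Sxx × Syy.

```python
from math import sqrt


def pmcc(xs, ys):
    """Return Pearson's product moment correlation coefficient r."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    sum_x, sum_y = sum(xs), sum(ys)
    # Sxx, Syy and Sxy as used in the PPMCC formula
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    return s_xy / sqrt(s_xx * s_yy)


# Made-up percentages for five imaginary schools (NOT the real data).
maths = [20, 35, 40, 55, 60]
science = [30, 38, 50, 58, 70]
print(round(pmcc(maths, science), 4))
```

With the real maths and science percentages in place of the made-up lists, the printed value should agree with the one a spreadsheet gives.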

Doing this, I found that the PPMCC is 0.8528.  This backs up my thought that my variables have a strong positive correlation, as the value is close to 1 (perfect positive correlation is at 1 and perfect negative correlation is at -1).

I will now carry out a hypothesis test on the correlation coefficient, using r as the test statistic to draw a conclusion about ρ (the correlation coefficient of the parent population).  It will be a 1-tailed test at the 5% significance level.

Important things to know:

  • The null hypothesis, H0, represents a theory that has been put forward, either because it is believed to be true or because it is to be used as a basis for argument, but has not been proved.
  • The alternative hypothesis, H1, is a statement of what a statistical hypothesis test is set up to establish.
  • The final conclusion once the test has been carried out is always given in terms of the null hypothesis. We either 'reject H0 in favour of H1' or 'do not reject H0'; we never conclude 'reject H1', or even 'accept H1'.
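The decision rule in the points above can be sketched in Python.  This is a rough illustration under several assumptions that are not taken from the coursework: the hypotheses are assumed to be H0: ρ = 0 and H1: ρ > 0, the sample size n = 10 is made up, and instead of looking r up in a table of PPMCC critical values the sketch converts r into a t value with n - 2 degrees of freedom, which is an equivalent way of doing the test when the data come from a bivariate normal population.  The critical value still has to be read from standard t tables.

```python
from math import sqrt


def t_from_r(r, n):
    """Convert a sample correlation r from n pairs into a t value (n - 2 d.f.)."""
    if n < 3:
        raise ValueError("need at least 3 pairs of data")
    if not -1 < r < 1:
        raise ValueError("r must be strictly between -1 and 1")
    return r * sqrt((n - 2) / (1 - r * r))


def one_tailed_decision(r, n, t_critical):
    """Test H0: rho = 0 against H1: rho > 0, worded in terms of H0."""
    if t_from_r(r, n) > t_critical:
        return "reject H0 in favour of H1"
    return "do not reject H0"


# n = 10 is a made-up sample size, used only to show how the function is called.
# 1.860 is the 5% one-tailed critical value of t with 8 degrees of freedom,
# taken from standard tables.
print(one_tailed_decision(0.8528, 10, 1.860))
```

Note that the function only ever returns 'reject H0 in favour of H1' or 'do not reject H0', matching the wording in the last point above.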

So, one can conclude that if a school has a high percentage of students achieving level 5 in maths, then it tends to have a similarly high percentage achieving level 5 in science.  Similarly, a school that performs poorly in maths tends to perform equally poorly in science.

This means that if my parents want to find a good primary school for my brother, then they should choose a school which has a high percentage of students doing well in maths and science.


There are many different ways I could extend this if I were to repeat the investigation.  If my parents decide what type of school they want to send my brother to (e.g. public or private), then I could sort the data into these categories first, and then sample and test.  Another thing that I could do is remove all the schools that have percentages below the national average to see if this makes a difference to the outcome of my hypothesis test.

