# The purpose of this investigation is to find out whether there is any correlation between two variables taken from a 5% random sample of the Mayfield Data provided.

Introduction:

In this GCSE coursework, I will test three hypotheses using statistical techniques we have learned throughout the GCSE course. My line of enquiry is the relationship between a pupil’s IQ and various Key Stage 2 results. Depending on which is suitable for each hypothesis, I will consider using histograms (or bar charts), box-and-whisker plots, the mean, median and mode, standard deviation, quartiles, scatter diagrams, the product-moment correlation coefficient (PMCC) and other diagrams to represent the data. After analysing the collected data, I will describe each method and explain why I chose that particular technique. From each method, I should be able to draw a conclusion on whether or not there is a correlation between the data I have chosen to compare. So that further investigations can be improved, the evaluation at the end will suggest ways of improving the methods used, or of choosing another (more suitable) method.

In this coursework, I will need to include and use the data below:

• Intelligence Quotient (IQ)
• KS2 results in English (level)
• KS2 results in Mathematics (level)
• KS2 results in Science (level)
• Average number of hours of television watched per week

Random Sampling Method

Because there is too much data to analyse in full, we were asked to take 5% of it, a reasonable amount so that the results are meaningful and represent the whole population. If the sample is too small, it may not be representative. Simple random sampling is used so that every record has an equal chance of being selected to be part of the 5%. The steps below extract 5% on the spreadsheet out of the 31941 data points from KS3 and KS4 combined in the Mayfield Data:

1. Insert a new column and assign a random number between 0 and 1 to each record using the random number function =rand(). This number will change every time the spreadsheet is updated. Label this new column as Rand1.
2. Copy the random numbers, then right-click in a new column and use Paste Special to paste the values only. This gives fixed random numbers that no longer change. Label this second column as Rand2.
3. In a third column, multiply the random number in Rand2 by 20 to get a number between 0 and 20. Label this third column as Rand3.
4. Sort the records using the sort function, according to the random number value in Rand3.
5. Pick and highlight the records whose Rand3 value is less than 1. This should be about 5% of the whole data, because 1 out of 20 is 5%.
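
The steps above can also be sketched in Python as a cross-check. This is only an illustration under assumptions, not the spreadsheet method itself: the function name `sample_five_percent` and the record IDs are made up for the example, and the real Mayfield records would need to be loaded in their place.

```python
import random


def sample_five_percent(records, seed=None):
    """Return roughly 5% of the records, mirroring the spreadsheet steps."""
    rng = random.Random(seed)  # a fixed seed keeps the numbers from changing

    # Steps 1-3: give each record one fixed random number between 0 and 1
    # (Rand2) and multiply it by 20 to get a number between 0 and 20 (Rand3).
    keyed = [(rng.random() * 20, record) for record in records]

    # Step 4: sort the records by their Rand3 value.
    keyed.sort(key=lambda pair: pair[0])

    # Step 5: keep only the records whose Rand3 value is less than 1.
    return [record for rand3, record in keyed if rand3 < 1]


if __name__ == "__main__":
    # Hypothetical record IDs standing in for the real data.
    records = list(range(1, 1001))
    sample = sample_five_percent(records, seed=1)
    print(len(sample), "of", len(records), "records selected")
```

Because each record is kept with probability 1/20, the sample size is only approximately 5% and will vary slightly from one run to the next unless the seed is fixed.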

Variables

We have been given secondary data to use for our investigation, including both qualitative data and quantitative data (discrete and continuous). The following variables are included in the Mayfield Data survey.

Data provided for each student:

Name, age, year group, intelligence quotient (IQ), weight, height, hair colour, eye colour, distance from home to school, usual method of travel to school, number of brothers or sisters, and Key Stage 2 results in English, Mathematics and Science.

From these variables, we were asked to decide on a line of enquiry and come up with three ...