  • Level: GCSE
  • Subject: Maths
  • Word count: 5329

Correlation and regression


Introduction

Ibrar Khan, AS Use of Mathematics coursework

Correlation and regression

Background information

In statistics, correlation indicates the strength and direction of a linear relationship between two random variables. If there is no correlation between the two sets of data, the points on a scatter diagram will be widely scattered and will require a roughly circular shape to enclose them. If the two sets of data have weak positive correlation, the points can be enclosed in a broad ellipse sloping upwards from bottom left to top right. If the two sets of data have strong positive correlation, the points lie within a narrow ellipse sloping upwards.

If the two sets of data have weak negative correlation, where one quantity generally increases as the other decreases, the points can be enclosed in a broad ellipse sloping downwards from top left to bottom right. If the two sets of data have strong negative correlation, the points lie within a narrow ellipse sloping downwards.

Finally, ‘perfect’ positive correlation (a correlation coefficient of +1) implies that as one variable moves, either up or down, the other variable moves in lockstep in the same direction. ‘Perfect’ negative correlation (a correlation coefficient of -1) means that if one variable moves in either direction, the other variable moves by a proportional amount in the opposite direction. In both cases all the data points on a scatter diagram lie exactly on a straight line, sloping upwards for perfect positive correlation and downwards for perfect negative correlation.

A correlation coefficient is a number between -1 and +1 that measures the degree to which two variables are linearly related. If there is a perfect linear relationship with positive slope between the two variables, the correlation coefficient is +1. More generally, if there is positive correlation, then whenever one variable has a high (low) value, the other tends to have a high (low) value as well.
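As a rough illustration of how such a coefficient can be computed, the sketch below uses the standard product-moment (Pearson) formula. The function name `pearson_r` and the small data lists are made up for the example and are not taken from the coursework data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


# Illustrative data only: points lying exactly on a straight line
perfect_positive = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])  # line sloping upwards
perfect_negative = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])  # line sloping downwards
print(perfect_positive, perfect_negative)
```

Points on an upward-sloping line give a value of +1 and points on a downward-sloping line give -1, matching the description of perfect correlation above.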


Middle

th value. The 18th value in my case is ‘29’ marks, so the median mark for the female population on the non-calculator written paper (Paper 1) is 29 marks.

Having found the median marks on the non-calculator written paper (Paper 1) for the male and female populations, I can say that so far my prediction that ‘the male population will have higher marks on the non-calculator written paper (Paper 1) and the calculator written paper (Paper 2) than the female population’ is supported, because the median mark on the non-calculator written paper (Paper 1) for the male population was ‘35’ marks, whereas for the female population it was only ‘29’ marks.

Next, I am going to find the median marks on the calculator written paper (Paper 2) for both the male and female populations by following the same process as above.

I will start with the calculator written paper (Paper 2) for the male population.

0.5 × (29 + 1) = 15th value

Then you start with the highest mark, which for the male population on the calculator written paper (Paper 2) was ‘46’ marks, and count along, the next being ‘45’ and then ‘45’ again, and so on until you reach the 15th value. The 15th value in my case is ‘30’ marks, so the median mark for the male population on the calculator written paper (Paper 2) is 30 marks.

Next, the calculator written paper (Paper 2) for the female population.

0.5 × (35 + 1) = 18th value

Then you start with the highest mark, which for the female population on the calculator written paper (Paper 2) was ‘51’ marks, and count along, the next being ‘47’ and then ‘42’, and so on until you reach the 18th value. The 18th value in my case is ‘26’ marks, so the median mark for the female population on the calculator written paper (Paper 2) is 26 marks.
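The counting method used above can be sketched in a few lines of Python. This is only an illustration: the function names and the short list of marks are invented for the example, and only the group sizes 29 and 35 come from the calculations above.

```python
def median_position(n):
    """Position of the median in an ordered list of n values: 0.5 * (n + 1)."""
    return 0.5 * (n + 1)


def median_by_counting(marks):
    """Order the marks from highest to lowest and count along to the median."""
    ordered = sorted(marks, reverse=True)
    position = median_position(len(ordered))
    whole = int(position)
    if position == whole:
        # Odd number of values: the median is a single value in the list
        return ordered[whole - 1]
    # Even number of values: average the two values either side of the position
    return (ordered[whole - 1] + ordered[whole]) / 2


# Positions used in the coursework: 29 values and 35 values
male_position = median_position(29)    # 15th value
female_position = median_position(35)  # 18th value

# Hypothetical marks, not the real data set
example_median = median_by_counting([12, 46, 30, 45, 21, 45, 28])
print(male_position, female_position, example_median)
```

With seven hypothetical marks the median is the 4th value when counting down from the highest mark, in the same way that the 15th and 18th values are used above.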

After finding the median marks for the non-calculator written paper (Paper 1) and the calculator written paper (Paper 2)

...read more.

Conclusion

≤ 1, which would indicate strong positive correlations.

Hypothesis

For my new hypothesis I am going to compare the correlation coefficients of the male and female populations for the mental test and the calculator written paper (Paper 2) results. On this occasion I predict that ‘the correlation coefficient for the female population will be larger than the correlation coefficient for the male population. Furthermore I again predict the values of the correlation coefficients for both the male and female populations to be in the range 0.5 < r ≤ 1, which would indicate strong positive correlations’. I will find the values of the correlation coefficients using a CASIO graphical calculator (CFX-9850GC PLUS).

After carrying out the method I explained in the previous hypothesis, I found the correlation coefficient of the mental test and the calculator written paper (Paper 2) for the male population to be ‘r = 0.775’ and the correlation coefficient for the female population to be ‘r = 0.698’. Now that I have found these correlation coefficients, I can say that the first part of my hypothesis, ‘the correlation coefficient for the female population will be larger than the correlation coefficient for the male population’, was incorrect, because the correlation coefficient for the male population (‘r = 0.775’) was larger than the correlation coefficient for the female population (‘r = 0.698’). The second part of my hypothesis was correct: both correlation coefficients that I obtained were in the range 0.5 < r ≤ 1, which indicates strong positive correlations.
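The two parts of this hypothesis can be checked mechanically. The sketch below uses only the two coefficients reported above; the variable and function names are invented for the illustration, and the 0.5 < r ≤ 1 threshold is the one stated in the hypothesis.

```python
# Correlation coefficients reported above (mental test vs Paper 2)
r_male = 0.775
r_female = 0.698


def is_strong_positive(r):
    """True when r lies in the range 0.5 < r <= 1 used in the hypothesis."""
    return 0.5 < r <= 1


# Part 1 of the hypothesis: the female coefficient is larger than the male one
female_larger = r_female > r_male

# Part 2 of the hypothesis: both coefficients indicate strong positive correlation
both_strong = is_strong_positive(r_male) and is_strong_positive(r_female)

print("Female coefficient larger:", female_larger)
print("Both strong positive:", both_strong)
```

The first check fails and the second passes, which is the same outcome described in the paragraph above.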
