Correlation and regression


Ibrar Khan        AS Use of mathematics coursework


Background information

In statistics, correlation describes the strength and direction of a linear relationship between two random variables. If there is no correlation between the two sets of data, the points on a scatter diagram are widely scattered and can only be enclosed by a roughly circular shape. If the two sets of data have weak positive correlation, the points lie within a broad ellipse sloping upwards from bottom left to top right. If the two sets of data have strong positive correlation, the points lie within a narrow ellipse sloping upwards.

If the two sets of data have weak negative correlation, where one quantity generally increases as the other decreases, the points lie within a broad ellipse sloping downwards from top left to bottom right. If the two sets of data have strong negative correlation, the points lie within a narrow ellipse sloping downwards.

Finally, ‘perfect’ positive correlation (a correlation coefficient of +1) implies that as one variable moves, either up or down, the other variable moves in lockstep in the same direction. ‘Perfect’ negative correlation (a correlation coefficient of -1) means that if one variable moves in either direction, the other variable moves by a proportional amount in the opposite direction. In both cases all the data points on a scatter diagram lie exactly on a straight line.

A correlation coefficient is a number between -1 and +1 which measures the degree to which two variables are linearly related. If there is a perfect linear relationship with positive slope between the two variables, the correlation coefficient is +1; more generally, if there is positive correlation, then when one variable has a high (low) value, the other tends to as well. If there is a perfect linear relationship with negative slope between the two variables, the correlation coefficient is -1; if there is negative correlation, then when one variable has a high (low) value, the other tends to have a low (high) value. A correlation coefficient of 0 means that there is no linear relationship between the variables, although a non-linear relationship may still exist.
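As an illustration of how such a coefficient can be calculated, the sketch below uses the product-moment formula r = Sxy / sqrt(Sxx × Syy). The function name `pearson_r` and the small data sets are made up for this example and are not the coursework results.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of the deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)

# Made-up illustrative values, not the coursework data.
xs = [1, 2, 3, 4, 5]
r_up = pearson_r(xs, [2, 4, 6, 8, 10])    # points on a line sloping upwards
r_down = pearson_r(xs, [10, 8, 6, 4, 2])  # points on a line sloping downwards
r_vee = pearson_r(xs, [2, 1, 0, 1, 2])    # V-shaped pattern, no linear trend
print(r_up, r_down, r_vee)
```

The third data set shows why a coefficient of 0 only rules out a linear relationship: the points follow a clear V shape, yet the upward and downward halves cancel out.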

Lines of best fit can be drawn by eye to obtain useful estimates, but their placement would differ from person to person. It is therefore useful to have a systematic method that always gives the same result. One procedure commonly used is the “method of least squares”, which chooses the line that minimises the sum of the squared vertical distances between the points and the line. The equation of the line of best fit according to the method of least squares is ‘y = ax + b’, with gradient ‘a’ and y-intercept ‘b’. A line of best fit determined in this way is called a regression line.
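A minimal sketch of the least squares calculation follows, assuming the standard results a = Sxy / Sxx and b = (mean of y) - a × (mean of x). The function name `least_squares_line` and the data are hypothetical and chosen only so that the answer is easy to check by hand.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the regression line y = ax + b of y on x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal, so the gradient is undefined")
    a = sxy / sxx            # gradient
    b = mean_y - a * mean_x  # y-intercept: the line passes through the mean point
    return a, b

# Made-up points that lie exactly on y = 2x + 1.
a, b = least_squares_line([1, 2, 3, 4], [3, 5, 7, 9])
print(f"y = {a}x + {b}")
```

Because the calculation depends only on the data, two people using it on the same marks will always obtain the same line.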

Introduction

I have been provided with the results of a year 9 top set mock SAT exam in mathematics. The exam is in three sections: a mental paper, a non-calculator written paper (Paper 1) and a calculator written paper (Paper 2). The top set consists of 65 pupils, 30 male and 35 female. My task is to conduct a study of the results, testing hypotheses and supporting my interpretation with statistical charts and other measures.

Eliminating any data

A male student, who achieved 29 marks on the non-calculator written paper (Paper 1) and 30 marks on the calculator written paper (Paper 2), did not take the mental test for an unknown reason. I am therefore going to eliminate his data because it is incomplete.
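If the results were held in a program, removing incomplete records might look like the sketch below. The record layout, the function name `has_all_marks` and pupil "B" are assumptions made for illustration; only the marks of 29 and 30 with a missing mental test come from the text above.

```python
PAPERS = ("mental", "paper1", "paper2")

def has_all_marks(record):
    """True when the pupil has a mark for every one of the three papers."""
    return all(record[paper] is not None for paper in PAPERS)

records = [
    # The pupil described above: no mental test mark.
    {"pupil": "A", "mental": None, "paper1": 29, "paper2": 30},
    # Hypothetical pupil with a full set of marks.
    {"pupil": "B", "mental": 20, "paper1": 35, "paper2": 38},
]

complete = [r for r in records if has_all_marks(r)]
print(len(complete), "complete record(s) kept")
```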

Hypothesis

First of all, I am going to compare the non-calculator written paper (Paper 1) and the calculator written paper (Paper 2) results of the males and females. I predict that ‘on average the male population will have higher marks on the non-calculator written paper (Paper 1) and the calculator written paper (Paper 2) than the female population’. I am going to test this hypothesis by drawing two ‘Back to back’ Stem and leaf diagrams, one showing the non-calculator written paper (Paper 1) results of both the male and female population and the other showing the calculator written paper (Paper 2) results. A ‘Back to back’ Stem and leaf diagram will help me to find measures of location such as the median, which is the middle value in an ordered list, and the modal value, which is the most common value. Furthermore I am going to find the remaining measure of location, which is the mean. The mean cannot be read off a ‘Back to back’ Stem and leaf diagram as directly as the other measures of location, but it can be found by dividing the total of the values by the number of values. By finding these measures of location I can compare the non-calculator written paper (Paper 1) and the calculator written paper (Paper 2) results of the males and females and observe which population achieved the higher marks in the mathematics top set for year 9.
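The three measures of location can be sketched with Python's standard `statistics` module. The function name `location_summary` and the marks below are invented for illustration and are not taken from the mock exam results.

```python
from statistics import mean, median, multimode

def location_summary(marks):
    """Return the mean, median and modal value(s) of a list of marks."""
    return {
        "mean": mean(marks),        # total of the values / number of values
        "median": median(marks),    # middle value of the ordered list
        "modes": multimode(marks),  # most common value(s); there may be more than one
    }

# Made-up marks for two small groups.
group_a = [21, 25, 25, 30, 34]
group_b = [19, 24, 28, 28, 31]
print(location_summary(group_a))
print(location_summary(group_b))
```

`multimode` returns a list because a data set can have several values tied for most common, which a single modal value would hide.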

‘Back to back’ Stem and leaf diagram

Data can be shown in a variety of ways including graphs, charts and tables. A Stem and Leaf Plot is a type of graph that is similar to a histogram but shows more information. The Stem-and-Leaf Plot summarizes the shape of a set of data (the distribution) and provides extra detail regarding individual values. The data is arranged by place value. The digits in the largest place are referred to as the stem and the digit in the smallest place is referred to as the leaf. In an ordinary plot the leaves are displayed to the right of the stem; in a ‘Back to back’ plot the two sets of data share one stem, with the leaves of one set displayed to the left and the leaves of the other to the right. Stem and Leaf Plots are a compact way of organizing a set of data.
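A back-to-back diagram for two-digit marks could be built along the following lines. This is a sketch under the assumption that the stem is the tens digit and the leaf is the units digit; the function names and the marks are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens) to its sorted list of leaves (units)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)

def back_to_back(left_values, right_values, width=12):
    """Return a text diagram with a shared stem column in the middle."""
    left = stem_and_leaf(left_values)
    right = stem_and_leaf(right_values)
    lowest = min(min(left), min(right))
    highest = max(max(left), max(right))
    lines = []
    for stem in range(lowest, highest + 1):
        # Left-hand leaves are reversed so that they increase away from the stem.
        left_leaves = " ".join(str(d) for d in reversed(left.get(stem, [])))
        right_leaves = " ".join(str(d) for d in right.get(stem, []))
        lines.append(f"{left_leaves:>{width}} | {stem} | {right_leaves}")
    return "\n".join(lines)

# Made-up marks, not the coursework data.
males = [21, 23, 28, 30, 35]
females = [25, 25, 31, 36, 42]
print(back_to_back(males, females))
```

Every stem between the lowest and highest is printed, even when it has no leaves, so that gaps in the distribution remain visible.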

Non-calculator written paper (Paper 1)

[Back-to-back stem and leaf diagram: male population (left), female population (right)]


Calculator written paper (Paper 2)

[Back-to-back stem and leaf diagram: male population (left), female population (right)]

Stem and leaf diagrams have both advantages and disadvantages in their use. One advantage is that a large amount of data can be stored in a small space. Stem and leaf diagrams can also be drawn and filled in more quickly than a line plot. Furthermore it is easy to find the ...
