Bivariate Data Exploration

Maths Coursework                Tim Durden

STATISTICS 2:

Bivariate Data Exploration

Aim:

The aim of this investigation is to determine whether there is a correlation between the engine size of a car and the insurance group in which it is placed.

Introduction:

Today there is an ever-increasing public demand for value-for-money products and services, especially in the car, shopping and clothing markets. For students this is even more important, as everything they buy (unless they are particularly affluent) can easily add to their debt (through extensive student loans). For students in particular, a car is very often an essential means of transport, and so, as with most purchases, it is important for a student to get the best deal on their car.

However, insurance companies and car dealers are very much aware of the student situation and have classified certain cars as ‘student cars’. These include cars from Peugeot (106, 306), Renault (Clio), Citroen (Saxo) and Vauxhall (Nova), to name but a few.

These cars all appear to have relatively small engines, commonly ranging from 900 to 1800cc, and are all placed in relatively low insurance groups (and therefore have lower insurance costs). However, this may not be the case for all cars, especially those with larger engines.

This investigation will examine data from a range of cars, varying

Engine size (litres)    Insurance group
...                     6
1.6                     9
1.4                     10
1.8                     15
1.4                     5
2                       12
2.5                     16
1.2                     3
2.5                     15
0.95                    3
1.6                     3
1.2                     3
1.2                     4
1.4                     8
2                       11
1.8                     11
1.8                     10
1.5                     7
0.9                     2
1.1                     5
1.6                     10
1.1                     4
1.2                     4
1.4                     5
1.6                     6
1.4                     7
1.4                     5
1.6                     10
1.3                     5
1.3                     6
1                       3
2                       11
1.1                     4
1.4                     4

Modelling Procedures:

The data could now be examined for correlation. The first step was to draw a scatter diagram, with engine size on the x-axis and insurance group on the y-axis.
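As a rough numerical companion to the scatter diagram, the strength of a linear relationship can be summarised with Pearson's product-moment correlation coefficient. The sketch below is an illustration only: it uses the 33 complete (engine size, insurance group) pairs visible in this extract rather than the full sample of 50, it assumes the engine sizes are in litres, and the extract does not confirm which coefficient the original investigation used. The names `PAIRS` and `pearson_r` are chosen here for illustration.

```python
import math

# (engine size, insurance group) pairs visible in this extract.
# Assumption: engine sizes are in litres. Only 33 of the 50 sampled
# pairs appear here, so the result is not the investigation's own figure.
PAIRS = [
    (1.6, 9), (1.4, 10), (1.8, 15), (1.4, 5), (2.0, 12), (2.5, 16),
    (1.2, 3), (2.5, 15), (0.95, 3), (1.6, 3), (1.2, 3), (1.2, 4),
    (1.4, 8), (2.0, 11), (1.8, 11), (1.8, 10), (1.5, 7), (0.9, 2),
    (1.1, 5), (1.6, 10), (1.1, 4), (1.2, 4), (1.4, 5), (1.6, 6),
    (1.4, 7), (1.4, 5), (1.6, 10), (1.3, 5), (1.3, 6), (1.0, 3),
    (2.0, 11), (1.1, 4), (1.4, 4),
]


def pearson_r(xs, ys):
    """Return Pearson's r = Sxy / sqrt(Sxx * Syy) for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


engine_sizes = [engine for engine, _ in PAIRS]
insurance_groups = [group for _, group in PAIRS]
r = pearson_r(engine_sizes, insurance_groups)
print(f"Pearson's r for the {len(PAIRS)} visible pairs: {r:.3f}")
```

A value of r close to +1 would indicate a strong positive linear relationship, a value close to 0 little or no linear relationship.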

The following graph was

Accuracy & Refinements:

Firstly, the sample (50 datasets) was selected using random numbers generated by a calculator. Whilst this method does produce random numbers, the numbers are generated by a formula, and so may not be completely random. A much better approach would have been to use a systematic sample, obtained from the parent population (once the data had been ordered by a variable, e.g. insurance group) by counting through the sampling frame and selecting, for example, every 2nd or 4th dataset.
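The two sampling methods compared above can be sketched side by side. This is a minimal illustration under stated assumptions: the sampling frame is a made-up list of placeholder (car id, insurance group) records, not the investigation's data, and the function names `simple_random_sample` and `systematic_sample` are hypothetical.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Pick n records using pseudo-random numbers, as a calculator would."""
    rng = random.Random(seed)
    return rng.sample(frame, n)


def systematic_sample(frame, n, key=None, seed=None):
    """Order the frame, then take every k-th record from a random start."""
    ordered = sorted(frame, key=key)
    k = len(ordered) // n  # sampling interval
    if k < 1:
        raise ValueError("the sample cannot be larger than the sampling frame")
    start = random.Random(seed).randrange(k)  # random start within first interval
    return ordered[start::k][:n]


# Hypothetical sampling frame: placeholder (car id, insurance group) records.
frame = [(car_id, 1 + car_id % 20) for car_id in range(200)]

random_50 = simple_random_sample(frame, 50, seed=1)
systematic_50 = systematic_sample(frame, 50, key=lambda car: car[1], seed=1)
print(len(random_50), len(systematic_50))
```

With 200 records and a sample of 50, the interval is 4, so the systematic sample takes every 4th record of the frame once it has been ordered by insurance group.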

Secondly, if a larger sample had been collected, the estimate of the correlation would have been more accurate. There would be more points to plot, and therefore the correlation would be much more representative of the entire population (e.g. a sample of 500 cars out of 50,000 in Essex), even if the larger sample contained more outliers.
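The effect of sample size can be illustrated with a small simulation. Everything in the sketch below is an assumption made for illustration: the population of 50,000 "cars" is synthetic, and the linear rule plus noise used to generate it is invented purely to have something to sample from, so it says nothing about real cars. It repeatedly draws samples of 50 and of 500 and compares how much the sample correlation coefficient varies from one sample to the next.

```python
import math
import random
import statistics


def pearson_r(pairs):
    """Pearson's correlation coefficient for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


def spread_of_r(population, n, repeats, rng):
    """Standard deviation of r across repeated random samples of size n."""
    rs = [pearson_r(rng.sample(population, n)) for _ in range(repeats)]
    return statistics.pstdev(rs)


rng = random.Random(1)

# Synthetic population: an invented linear trend plus random noise.
population = []
for _ in range(50_000):
    engine = rng.uniform(0.9, 2.5)
    group = 5 * engine + rng.gauss(0, 2.5)
    population.append((engine, group))

spread_50 = spread_of_r(population, 50, 200, rng)
spread_500 = spread_of_r(population, 500, 200, rng)
print(f"spread of r with n=50:  {spread_50:.4f}")
print(f"spread of r with n=500: {spread_500:.4f}")
```

A smaller spread for the larger sample size means that any single sample of 500 is likely to give a correlation closer to that of the whole population than a single sample of 50 would.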

Thirdly, it was felt that using ‘secondary’ data gave rise to bias and errors in data collection. If the data had been ‘primary’, that is, collected by the researchers themselves, it may have been more accurate. In this investigation, it is possible that, because the company was selling cars, there was some bias in which cars it chose to buy and sell. Cars of a poor standard would not have been purchased for resale.
