Bivariate Data Exploration

Maths Coursework                Tim Durden

STATISTICS 2:

Bivariate Data Exploration

Aim:

The aim of this investigation is to determine whether there is a correlation between a car’s engine size and the insurance group in which it is placed.

Introduction:

Today there is an ever-increasing public demand for value-for-money products and services, especially in the car, retail and clothing markets. For students this matters even more because, unless they are particularly affluent, everything they buy can easily add to their debt (through extensive student loans). Cars are very often an essential means of transport for students, so it is important for a student to get the best deal on a car.

However, insurance companies and car dealers are well aware of the student situation and have classified certain cars as ‘student cars’. These include models from Peugeot (106, 306), Renault (Clio), Citroen (Saxo), and Vauxhall (Nova), to name but a few.

These cars all appear to have relatively small engines, commonly ranging from 900 to 1800 cc, and all are placed in relatively low insurance groups (and therefore have lower insurance costs). This may not be the case for all cars, however, especially those with larger engines.

This investigation will examine data from a range of cars that vary in both engine size and insurance group. If a positive correlation is found between insurance group and engine size, then the concept of ‘student cars’ need not be such a worrying factor when a student goes to buy their first car. If there is no correlation, however, it is entirely possible that insurance companies are charging too much for cars in the ‘student car’ category.

Data Collection:

Before any conclusions could be drawn, data had to be collected. A local used car showroom was approached, and data on all cars on its forecourt (and in the showroom) was collected. The data was taken directly from the records that the company keeps for each car it attempts to sell, which makes it ‘secondary’ data. Two variables were recorded for each car: engine size and insurance group. The population consisted of all cars on the company’s current sales list (regardless of age, mileage, fuel type, etc.). The population size was 106 (the maximum number of cars the company could fit on its land).

The next objective was to draw a random sample of size 50 from the population. First, the data was numbered from 1 to 106, in the order in which it was collected. A calculator’s random number generator was then used to produce numbers less than 1, given to 3 decimal places. All numbers above 0.106 were rejected, as were any duplicates. Each accepted number was matched to the car with the corresponding number in the population list (so 0.057, for example, selected car 57), and this continued until 50 data pairs had been collected. This ...
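The sampling procedure described above can be sketched in Python. This is only an illustration of the rejection method, not the calculator routine actually used. The function name draw_sample_indices, the fixed seed and the decision to reject 0.000 (no car is numbered 0) are assumptions made for the sketch. Random numbers are held as whole thousandths (0 to 999) so that 0.057 is represented by 57.

```python
import random

POPULATION_SIZE = 106  # cars numbered 1-106, as in the text
SAMPLE_SIZE = 50       # sample size stated in the text


def draw_sample_indices(rng, population_size=POPULATION_SIZE, sample_size=SAMPLE_SIZE):
    """Return car numbers chosen by rejecting unusable 3-d.p. random numbers."""
    chosen = []
    seen = set()
    while len(chosen) < sample_size:
        # Calculator-style random number 0.000-0.999, stored as thousandths.
        thousandths = rng.randrange(1000)
        # Reject anything above 0.106. Rejecting 0.000 is an assumption,
        # since the population list starts at car 1.
        if thousandths == 0 or thousandths > population_size:
            continue
        # Reject duplicates so that no car is sampled twice.
        if thousandths in seen:
            continue
        seen.add(thousandths)
        chosen.append(thousandths)
    return chosen


# The seed is arbitrary; it only makes the sketch repeatable.
indices = draw_sample_indices(random.Random(2))
print(sorted(indices))
```

Roughly nine in every ten generated numbers are rejected under this method, which is why it is slow by hand. In Python the same kind of simple random sample without replacement could be drawn directly with random.sample(range(1, 107), 50).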