Handling Data: Used Car Prices



Introduction

The overall aim of this project is to find out which factors have the greatest influence on the price of a second-hand car. Several factors affect the price directly, for example the age of the car, the number of miles it has done since it was produced and the number of modern features it has. In this project I will look at age and mileage and how they affect the price of a second-hand car.

Aims

  1. Assess whether there is a link between the percentage of depreciation in the value of the car from new to second-hand and the age of it and its mileage.

My first aim is to find out how the two biggest influences, age and mileage, affect the value of a car when it is bought second-hand. The percentage depreciation in the value of a car is affected by many other factors; however, here I am trying to find out how age and mileage alone affect it. The first two hypotheses stated below will help me investigate this aim.

  2. Investigate which style of car has the largest depreciation.

This aim differs from the first because it looks at how the style of the car affects its depreciation. I will look at hatchbacks, saloons and others (coupes, utilities and azuras) as part of this investigation. From this I will be able to see which style of car has the largest drop in price, regardless of the age of the car and the miles it has done.

Hypotheses

  1. Relating to the first aim, I predict that as the age of the car increases, the percentage of depreciation in the value of the car increases.

My prediction here is that older second-hand cars will have lost more of their value, and so will be cheaper to buy, than newer second-hand cars. For this hypothesis I will be using secondary data, from the database of used cars, and primary data collected from used car magazines or from the internet to add to the database so that I can select appropriate data. Firstly I will group the data into different age groups. I will then take a stratified sample: random numbers will be used to select cars from each group in proportion to that group's share of the whole data set. This is necessary because there are likely to be fewer cars as the age increases. With a stratified sample there should be no bias towards a particular age group, as each group will be fairly represented. For each car in the stratified sample I will work out the depreciation from its price when new and its price when bought second-hand. I will then work out the average cost of a second-hand car for each age group. The results will then be plotted on a scatter diagram, where a line of best fit will show the general trend in the relationship. I will then measure the spread of the data using the standard deviation.
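The sampling and calculations described above can be sketched in Python. This is only an illustration under my own assumptions, not the method used in the project: each car is assumed to be a dictionary with the hypothetical keys "age", "new" and "used", the age bands are the two-year groups used later in this project, and the function names are invented for the example. Because the number taken from each group is rounded, the sample size can differ slightly from the size requested.

```python
import random
from collections import defaultdict
from statistics import mean, pstdev


def percent_depreciation(price_new, price_used):
    """Percentage of the new price lost by the time the car is resold."""
    return (price_new - price_used) * 100 / price_new


def age_group(age):
    """Index of the age band: 0 for 1-2 years, 1 for 3-4, ..., 5 for 11+."""
    return min(max(age - 1, 0) // 2, 5)


def stratified_sample(cars, key, total, seed=0):
    """Pick cars at random from each stratum in proportion to its size."""
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    strata = defaultdict(list)
    for car in cars:
        strata[key(car)].append(car)
    sample = []
    for label in sorted(strata):
        group = strata[label]
        # Proportional allocation, rounded, with at least one car per stratum.
        n = max(1, round(total * len(group) / len(cars)))
        sample.extend(rng.sample(group, min(n, len(group))))
    return sample


def summarise(sample, key):
    """Mean and standard deviation of percentage depreciation per stratum."""
    groups = defaultdict(list)
    for car in sample:
        groups[key(car)].append(percent_depreciation(car["new"], car["used"]))
    return {label: (mean(vals), pstdev(vals)) for label, vals in groups.items()}
```

For example, `stratified_sample(cars, lambda c: age_group(c["age"]), 30)` would request about 30 cars spread across the age groups, and `summarise` would then give the mean and standard deviation of the percentage depreciation in each group.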

  2. Relating to the first aim, I predict that as the mileage of a car increases, its percentage depreciation increases.

This is closely linked to the first hypothesis. Again I will take a stratified sample, this time grouping the cars by mileage. I will then pick cars at random from each group and work out the depreciation for each car picked. From this I will be able to plot the information on a scatter diagram, where a line of best fit can be used to show the general trend. I will then have two sets of data, both plotted on scatter diagrams. With these I can work out how strong the relationship is for the two sets of data (mileage and age) using the product moment correlation coefficient.
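The product moment correlation coefficient mentioned above can be calculated as in the sketch below. This is a minimal illustration written from the standard formula r = Sxy / sqrt(Sxx * Syy); the function name `pmcc` is my own choice, and I am assuming it would be applied to paired values such as mileage against percentage depreciation, or age against percentage depreciation.

```python
from math import sqrt


def pmcc(xs, ys):
    """Product moment correlation coefficient of two paired lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and products about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return sxy / sqrt(sxx * syy)
```

A value close to 1 would indicate a strong positive correlation, a value close to -1 a strong negative correlation, and a value close to 0 little or no linear relationship.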

  3. Relating to the second aim, I predict that saloons will have the lowest depreciation.

Saloons are very modern, attractive cars compared to the hatchbacks and others, therefore I think that the depreciation values for saloons will generally be lower than for the other cars. To test this I will list all the cars in the database under the headings of saloons, hatchbacks and others. I will then group the data for each style into classes of percentage depreciation. The depreciation will be worked out using the formula function in Excel. With this information I can plot three histograms, one for each style of car, and from these form three frequency polygons to compare. If my prediction is correct, then when comparing the three frequency polygons the one for saloons should show the lowest depreciation of the three. To avoid any bias I will make sure that each group contains a varied range of depreciation values.
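The grouping step described above could also be done in Python rather than Excel. The sketch below is only an illustration: the class width of 10 percentage points and the function names are my own assumptions, not values taken from the project. It counts how many depreciation values fall into each class and gives the class midpoints needed to draw a frequency polygon.

```python
def grouped_frequencies(values, width=10, upper=100):
    """Count depreciation percentages in classes 0-10, 10-20, ..., up to upper."""
    n_classes = upper // width
    counts = [0] * n_classes
    for v in values:
        if not 0 <= v <= upper:
            raise ValueError(f"depreciation {v} is outside 0 to {upper}")
        # A value exactly equal to upper goes into the last class.
        index = min(int(v // width), n_classes - 1)
        counts[index] += 1
    return counts


def polygon_points(counts, width=10):
    """(class midpoint, frequency) pairs for plotting a frequency polygon."""
    return [(i * width + width / 2, c) for i, c in enumerate(counts)]
```

Running `grouped_frequencies` once for each style of car would give three frequency tables that can be plotted on the same axes and compared.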


Representation & Interpretation of Data

Hypothesis 1 - I predict that as the age of the car increases, the percentage of depreciation in the value of the car increases.

For this hypothesis I have split the cars into six groups. These six strata are: 1-2 years, 3-4 years, 5-6 years, 7-8 years, 9-10 years and 11+ years old. These age groups cover the whole population. The following ...
