I have been given the task of finding out what affects the price of a used car, using a spreadsheet of data on a hundred cars. The data recorded for each car were: (See Spreadsheet 1)

Make                  Model                  Price When New
Used Price            Age                    Colour
Engine Size           Fuel Type              MPG
Mileage               Service                Owners
Length of MOT         Tax (Months left)      Insurance Group
Doors (Amount)        Style                  Central Locking
Seats                 Gearbox                Air Conditioning


Immediately from looking at those categories I omitted colour, fuel type, service, doors, style, central locking, seats, gearbox, air conditioning and airbags. I omitted this data because it either has a low range or consists of words; these would be hard to show on graphs and would give me little evidence of what affects a used car's price.

E.g. Colour: A scatter graph cannot be produced as the data are words.

        Seats: Has a range of only 2-5, so it would produce poor scatter graphs on which it would be hard to find a direct relationship.

Then from the remaining categories I picked age, insurance group, MPG, mileage and of course used price, as this is what I was investigating. It then dawned on me that I could use the depreciation, that is, the price when new minus the used price. This could perhaps give a more accurate look at the data, as some cars depreciate quicker than others. Looking further into
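The depreciation figure described above is a simple subtraction, and could be sketched like this. The prices below are made up for illustration and are not taken from the spreadsheet; the function name `depreciation` is also just a label chosen here.

```python
# Hypothetical rows of (price when new, used price) in pounds.
# These are NOT values from the spreadsheet.
example_cars = [(12000, 7500), (9500, 3200), (15250, 11000)]

def depreciation(price_new, price_used):
    """Amount of value lost: price when new minus used price."""
    return price_new - price_used

# One depreciation figure per car, in the same order as the rows.
losses = [depreciation(new, used) for new, used in example_cars]
print(losses)
```

Each figure could then be plotted against age, mileage and so on in place of the used price.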

The mileage groups were:   0-5000






With these sorted I took 40% at random from each group and ended up with this. I ensured it was random by drawing numbers out of a hat, each number corresponding to the number of a car. I noted each number drawn and placed it back in, so that the chance of drawing any single number was equal each time and did not change. If I drew the same number twice I simply ignored it, placed it back in and redrew. (See Spreadsheet 3)

If actually counted, there are 41 cars rather than the 40 expected. As 40 and 41 are very close, I simply left the sample as it was rather than tamper with the results, which could have made them biased.
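The hat method above could be sketched in code roughly as follows. This is only an illustration: the car data, the band width of 20,000 miles and the names `stratified_sample` and `mileage_band` are all assumptions made for the sketch, not the actual groups or spreadsheet values. It also shows one possible reason a 40% sample need not come out at exactly 40 cars: the 40% is rounded separately within each group.

```python
import random
from collections import defaultdict

def stratified_sample(cars, group_of, fraction=0.4, seed=None):
    """Pick `fraction` of the cars from each group, never the same car twice."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for car in cars:
        groups[group_of(car)].append(car)
    sample = []
    for key in sorted(groups):
        members = groups[key]
        # Rounding happens per group, so the overall total
        # need not be exactly fraction * len(cars).
        k = round(fraction * len(members))
        # rng.sample never returns an item twice, which has the same effect
        # as redrawing whenever a number comes out of the hat a second time.
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical data: (car number, mileage). NOT the spreadsheet values.
cars = [(n, (n * 7919) % 100000) for n in range(1, 101)]

def mileage_band(car, width=20000):
    """Band number for a car; the band width is an assumption."""
    return car[1] // width

chosen = stratified_sample(cars, mileage_band, seed=1)
print(len(chosen), "cars chosen out of", len(cars))
```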

From this data I then compiled scatter graphs, just as before. My predictions were:


  • Age: I believe there will be a strong negative correlation, as there was before, but as this is supposedly a more reliable sample it should be more evident.
  • MPG: I believe there will be a strong negative correlation, as there was before, but it should be more evident due to the sample being more reliable.
  • Mileage: should have a strong negative correlation, for the reasons above.
  • Insurance group: should have a strong positive correlation, for the reasons mentioned above.

See graphs 5, 6, 7 and 8.

Conclusions on Stratified Sampling.

As you can see some very strange results came up.

  • Age showed the very strong negative correlation that I said there would be.
  • MPG showed a strong negative correlation as well, as I said.
  • Mileage proved very weird. The data fell into two groups, one showing high mileage and low price and the other low mileage and low price. From this I can deduce that mileage is a limiting factor of used price.
  • Insurance group showed no correlation, with data all over the place, so perhaps my random sample was a mishap and in fact insurance group has little or no relationship with used price.
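Words such as "strong" and "no correlation" above are judged by eye from the scatter graphs. One way to put a number on them, which was not part of the work done here, is the product moment correlation coefficient, which runs from -1 (perfect negative) through 0 (none) to +1 (perfect positive). A minimal sketch, using made-up ages and prices rather than the spreadsheet data:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical values for illustration only (age in years, used price in pounds).
ages = [1, 2, 3, 4, 5, 6]
used_prices = [9000, 8000, 6500, 6000, 4500, 4000]

r = pearson_r(ages, used_prices)
print(f"r = {r:.2f}")  # a value close to -1 means a strong negative correlation
```

The same function could be run on each pair of columns from the random and stratified samples to compare them.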
From the individual graphs you can see that the majority of the cars are in the 20,000 to 60,000 mile range in both the random and stratified samples. Standard deviation could perhaps tell me which sample is more accurate, so that could be an extension to the work done.

I mentioned a bell-shaped graph before. By this I mean one which slowly rises to a peak and then falls away, with the majority of the data displayed in the middle and little or no data displayed in the highest and lowest areas.

However, from the histograms I did not find any reason for the weirdly shaped stratified scatter graph and its odd correlation. Further investigation into this could prove interesting.
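The standard deviation extension mentioned above could start from something like the sketch below. It only measures how spread out the mileages in each sample are around their mean; what that says about which sample is better would still need interpreting. The mileages are made up for illustration and the name `summarise` is just a label chosen here.

```python
from statistics import mean, pstdev

def summarise(mileages):
    """Return (mean, standard deviation) of a list of mileages."""
    return mean(mileages), pstdev(mileages)

# Hypothetical mileages for illustration only, NOT the spreadsheet values.
random_sample = [21000, 35000, 42000, 48000, 55000, 61000]
stratified_sample_mileages = [4000, 18000, 33000, 47000, 62000, 88000]

for name, data in [("random", random_sample),
                   ("stratified", stratified_sample_mileages)]:
    m, sd = summarise(data)
    print(f"{name}: mean = {m:.0f} miles, standard deviation = {sd:.0f} miles")
```

A larger standard deviation means the mileages are more widely spread, and a smaller one means they are bunched closer to the mean.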

Overall Conclusion

From all the work carried out above you can clearly see that many different things affect used car prices, some more than others. You could say that the different categories are limiting factors, and a combination of these results in the depreciation of a car's price.

As a further investigation I would look into the strange scatter graph produced by my stratified mileage sample. Perhaps by using standard deviation or other methods of representing data I could find out why it is so peculiar. I could also look at how one category affects another, such as engine size and mileage or engine size and MPG, and find a relationship between those. There are many more aspects that I could have considered; however, from the work I've done there are things that are certainly clear.

