- Level: AS and A Level
- Subject: Maths
- Word count: 3318
Statistics. The purpose of this coursework is to investigate the relationship between the depreciation of a car's price and the factors that affect it.
Extracts from this document...
Introduction
Statistics: Analysis of used cars database
Introduction:
The purpose of this coursework is to investigate the relationship between the depreciation of a car’s price and the factors that affect it. The factors that I wish to investigate are the age and mileage of a car, as these are the easiest to compare with depreciation. To do this, I shall use random sampling. I shall give a number of hypotheses, each stating whether a factor has a clear effect on depreciation, and I shall attempt to test them using the data given to me in Excel. I work in terms of percentage depreciation so that depreciation can be compared fairly across every car in my sample. Here are the hypotheses and questions:
<< Hypothesis 1 >>
The older the car, the greater the percentage depreciation of the price. I believe this because, as a car travels further, essential parts may wear down and stop the car from working to its optimum standard. After a certain mileage, the car’s fuel costs may begin to increase, as its reduced efficiency uses more fuel per mile.
The following data values are needed to calculate the depreciation in a car's value (as a rule) at higher or lower mileage:
- Sale price (no miles attached)
- Mileage
Mileage will affect the percentage depreciation of the car’s original price, so no other variables should be needed in the data to support or refute this hypothesis.
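The percentage depreciation used in this hypothesis can be sketched as below. This is only a minimal illustration, assuming that depreciation is measured as the fall in price expressed as a percentage of the price when new. The function name `percentage_depreciation` is my own choice, and the two example prices are those of the Beetle in my sample.

```python
def percentage_depreciation(price_new, price_second_hand):
    """Return the fall in price as a percentage of the price when new."""
    if price_new <= 0:
        raise ValueError("price_new must be positive")
    return (price_new - price_second_hand) / price_new * 100


# Example: a car costing 14950 when new and 13500 second hand.
beetle = percentage_depreciation(14950, 13500)
print(f"Depreciation: {beetle:.1f}%")
```

Working in percentages rather than pounds means that a cheap car and an expensive car can be compared on the same scale.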
Middle
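No. | Make | Model | Price When New (£) | Price Second Hand (£) | Age (years) | Mileage | Number of Owners |
 | | Beetle | 14950 | 13500 | 1 | 6500 | 1 |
75 | Rover | 623 GSi | 24086 | 2975 | 6 | 96000 | 2 |
76 | Suzuki | Vitara | 10800 | 2995 | 8 | 50000 | 2 |
77 | Mercedes | AvantGarde | 17915 | 11750 | 2 | 17000 | 1 |
78 | Audi | 80 | 17683 | 3995 | 7 | 103000 | 2 |
79 | Volkswagen | Polo | 9960 | 7550 | 1 | 5000 | 1 |
80 | Ford | Escort | 13183 | 3495 | 7 | 43000 | 2 |
81 | Ford | Mondeo | 17780 | 7995 | 4 | 30000 | 1 |
82 | Mazda | Pegasus | 10420 | 2495 | 7 | 50000 | 3 |
83 | Rover | 416i | 14486 | 3685 | 6 | 64000 | 1 |
84 | Vauxhall | Corsa | 7840 | 4976 | 4 | 21000 | 2 |
85 | Vauxhall | Corsa | 7440 | 3495 | 6 | 55000 | 2 |
86 | Ford | Fiesta | 6590 | 1664 | 10 | 37000 | 3 |
87 | Nissan | Primera | | 2574 | 9 | 49000 | 2 |
88 | Citroen | Xantia | 14065 | | 8 | 49000 | 1 |
89 | Peugeot | Graduate | 7600 | 2497 | 8 | 71000 | 2 |
90 | Peugeot | 306 | 12350 | 3995 | 6 | 71000 | 2 |
91 | Fiat | Punto | 7518 | 3769 | 4 | 38000 | 2 |
92 | Volkswagen | Polo | 8710 | 4693 | 5 | 50000 | 2 |
93 | Vauxhall | Calibra | 18675 | 6995 | 6 | 63000 | 2 |
94 | Rover | Metro | 5495 | 1995 | 7 | 52000 | 2 |
95 | Rolls Royce | Silver Spirit | 94651 | 14735 | 9 | 70000 | 2 |
96 | Ford | Escort | 15405 | 3995 | 5 | 57000 | 2 |
97 | Vauxhall | Astra | 9795 | 3191 | 6 | 43000 | 2 |
98 | Renault | 19 | 11695 | 2748 | 6 | 52000 | 2 |
99 | Ford | Escort | 9995 | 2995 | 6 | 64000 | 2 |
100 | Vauxhall | Vectra | 13435 | | 5 | 52000 | 2 |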
The sample is randomly ordered so that it reflects the general trend in the data and my results will not be biased. However, I have 4 pieces of missing data. To solve this problem I will remove the cars with missing data from the sample. To find out if there are any outliers, I will work out the mean and the standard deviation and use them to find the upper and lower bounds. (By comparison, the upper quartile is the value below which 75% of the data lie, and the lower quartile is the value below which 25% lie.) The formulae for the bounds in terms of the standard deviation are as follows:
Upper Bound = Mean + 2 × Standard Deviation
Lower Bound = Mean – 2 × Standard Deviation
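The two bounds can be sketched in code as follows. This is a minimal illustration rather than the method I used in Excel: the function names are my own, the prices in the example are made up and are not from my sample, and I have assumed the population standard deviation (`statistics.pstdev`), since the formulae do not say which version is meant.

```python
import statistics


def outlier_bounds(values):
    """Return (lower, upper) bounds as mean -/+ 2 standard deviations."""
    mean = statistics.mean(values)
    # Assumption: population standard deviation. Swap in statistics.stdev
    # if the sample standard deviation is wanted instead.
    sd = statistics.pstdev(values)
    return mean - 2 * sd, mean + 2 * sd


def find_outliers(values):
    """Return the values lying strictly outside the two bounds."""
    lower, upper = outlier_bounds(values)
    return [v for v in values if v < lower or v > upper]


# Made-up prices (not from my sample): nine similar cars and one expensive one.
prices = [10, 10, 10, 10, 10, 10, 10, 10, 10, 100]
print(outlier_bounds(prices))
print(find_outliers(prices))
```

Any value returned by `find_outliers` lies more than two standard deviations from the mean and would be treated as an outlier.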
There is a value outside the upper bound in the column concerning the Porsche: it is approximately £6,000 higher than the upper bound, so it is an outlier. However, the effect is not drastic and will not distort my results or the shape of my curve. If I identify any extreme outliers I will remove them, though this should not have much of an impact. Because of the 4 missing pieces of data, I will need to delete the rows for the cars concerned, as a missing value would distort an average. One example is the Lexus: with no mileage recorded, it is impractical for me to include it in my investigation, because it cannot be used for my 3rd hypothesis. Once I remove all the other cars which lack data, I have a remaining sample of 47.
I have constructed a table to show the range of the data and to see how the variables compare with each other.
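Removing the cars with missing data can be sketched as below. This is only an illustration under my own assumptions: the three records and the field names are hypothetical, not rows copied from my sample, and `None` stands for a missing value.

```python
# Each car is a dict; None marks a missing value.
# These three records are hypothetical, not rows copied from my sample.
cars = [
    {"make": "Car A", "price_new": 12000, "price_used": 6000, "age": 4, "mileage": 40000},
    {"make": "Car B", "price_new": None, "price_used": 2500, "age": 9, "mileage": 49000},
    {"make": "Car C", "price_new": 15000, "price_used": 9000, "age": 3, "mileage": None},
]


def drop_incomplete(rows):
    """Keep only the rows in which every field has a value."""
    return [row for row in rows if all(v is not None for v in row.values())]


complete = drop_incomplete(cars)
print(len(complete), "of", len(cars), "cars kept")
```

Dropping the whole row, rather than only the empty cell, keeps every average based on the same set of cars.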
Data | Highest Value | Lowest Value | Range |
Price When New | £170,841 | £5,495 | £165,346 |
Price Second Hand | £37,995 | £1,995 | £36,000 |
Age | 10 years | 1 year | 9 years |
Mileage | 103,000 miles | 2,000 miles | 101,000 miles |
Number of Owners | 3 | 1 | 2 |
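Each range in the table above is the highest value minus the lowest value. A short sketch to check the arithmetic is given below; the dictionary layout and the name `value_range` are my own, and the figures are the highest and lowest values from the table.

```python
# Highest and lowest values as given in the table above.
extremes = {
    "Price When New": (170841, 5495),
    "Price Second Hand": (37995, 1995),
    "Age": (10, 1),
    "Mileage": (103000, 2000),
    "Number of Owners": (3, 1),
}


def value_range(highest, lowest):
    """Return the range of a variable: highest value minus lowest value."""
    return highest - lowest


ranges = {name: value_range(hi, lo) for name, (hi, lo) in extremes.items()}
for name, r in ranges.items():
    print(name, r)
```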
Conclusion
If I had the chance to do this investigation again, I would probably look at some new questions. For example, the third scatter graph, of the number of previous owners against mileage, did not give me strong results: it had a weak correlation and no clear trend. In place of this, I would test a new hypothesis: ‘the older a car, the greater the mileage it will have gained.’ This would be an improvement, because comparing age with mileage, instead of the number of owners with mileage, would allow me to see how the mileage built up over time. I have spotted some flaws in the original graph: for example, some people may have owned a car for only a very short time and sold it shortly after buying it. With age included, I can see how far the car travelled in relation to time, rather than in relation to the number of people who drove it. I believe this hypothesis would give me a strong correlation and provide me with more reliable results.
Perhaps if I had more time, I would use multiple regression to see how the different influential factors affect each other, rather than only how they affect depreciation.
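The strength of the correlation for this new hypothesis could be measured as sketched below. This is an illustration only: I am assuming the product-moment correlation coefficient is the measure used, the function name `pearson_r` is my own, and the ages and mileages are those of eight cars from the data extract above rather than my full sample.

```python
import math


def pearson_r(xs, ys):
    """Return the product-moment correlation coefficient of two lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Age (years) and mileage for eight cars from the data extract.
ages = [1, 6, 8, 2, 7, 1, 7, 4]
mileages = [6500, 96000, 50000, 17000, 103000, 5000, 43000, 30000]
r = pearson_r(ages, mileages)
print(f"r = {r:.2f}")
```

A value of r close to 1 would support the hypothesis, while a value close to 0 would suggest that age and mileage are only weakly related.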
This student written piece of work is one of many that can be found in our AS and A Level Probability & Statistics section.