I now have the sample of cars that I need for my investigation, so I will move on to compare the factors that affect the price of a second hand car the most. I do not think that comparing the fall in price in pounds is the best way to do this. For example, the Fiat Bravo decreased in price by £5815 and the Fiat Uno by £5369, so at face value the Fiat Bravo decreased most in price when compared to the Fiat Uno. However, if we calculate the percentage depreciation, the Fiat Bravo has depreciated by 53.8% whereas the Fiat Uno has depreciated by 78.2%. So in actual fact the Fiat Uno lost the greater share of its new price. Therefore, I will use percentage depreciation when comparing the new and second hand prices of cars.
The percentage depreciation is calculated as follows:
Formula: Percentage depreciation = [(New Price – Second Hand Price) ÷ New Price] x 100%
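As a minimal sketch of this calculation in Python, reading percentage depreciation as the share of the new price that has been lost (the reading that matches the Fiat figures above). The function name and the two sets of prices are hypothetical and are not taken from my sample.

```python
def percentage_depreciation(new_price, second_hand_price):
    """Return the percentage of the new price that the car has lost."""
    if new_price <= 0:
        raise ValueError("new_price must be positive")
    return (new_price - second_hand_price) / new_price * 100


# Hypothetical prices in pounds (assumptions for illustration only).
car_a = percentage_depreciation(10000, 4000)  # loses 6000 pounds
car_b = percentage_depreciation(7000, 1500)   # loses 5500 pounds

# Car A loses more pounds, but car B loses the larger share of its new price.
print(round(car_a, 1), round(car_b, 1))
```

This shows the same effect as the Fiat Bravo and Fiat Uno comparison: the car with the bigger fall in pounds is not necessarily the one with the bigger percentage depreciation.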
I have found the percentage depreciation for each car. I will now draw scatter graphs for age, engine size and mileage plotted against the percentage depreciation, for each car.
Comparing the graphs for age
I have drawn the scatter diagrams and each one has a straight line passing through the points, which is called the linear trend line. It is the line of best fit through all the plotted values and can be used to estimate the percentage depreciation of a second hand car of a given age. Its gradient tells us how quickly the percentage depreciation changes as the age increases, which tells us a lot about the relationship between the two variables the graph is based upon.
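A linear trend line of this kind is usually found by the least-squares method. The sketch below assumes that is how the trend lines on my graphs were produced, which I have not checked. The function name and the ages and depreciation values are hypothetical.

```python
def linear_trend(xs, ys):
    """Return (gradient, intercept) of the least-squares line through the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are the same, so no gradient exists")
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x
    return gradient, intercept


# Hypothetical data: age in years and percentage depreciation.
ages = [1, 3, 5, 7, 9]
depreciation = [25, 45, 60, 72, 80]

gradient, intercept = linear_trend(ages, depreciation)
# The gradient is the extra percentage depreciation per additional year of age.
print(round(gradient, 2), round(intercept, 2))
```

A steeper gradient means that each extra year of age adds more to the percentage depreciation.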
All the graphs show positive correlation, some strong (Fiat and Rover) and some weak (Vauxhall and Ford). This means that as the age increases, the percentage depreciation increases and thus the second hand price decreases. Rover has the steepest gradient, which suggests that age has a greater effect on Rover cars than on any other car make. Ford has the gentlest gradient, which suggests that age has less effect on Ford cars than on any other car make.
The second trend line I used is the curved one, which is called the polynomial trend line. It is used to show how fast the second hand price for each car decreases. If you look carefully at the trend line, you will notice that when the car is young, i.e. has been used for one or two years, its price declines drastically, but as it gets older its second hand price falls more slowly.
The second hand price of a car will keep on decreasing as it gets older, but it will never become negative, because that would mean the seller having to pay someone to take the car, and I know that this is unrealistic.
Comparing the graphs for engine size
From the graphs I have drawn of percentage depreciation against engine size for each car make, I can draw some conclusions. Firstly, at face value all the graphs show weak correlation, except for the Fiat graph. The Fiat, Rover and Vauxhall graphs have positive correlation, which suggests that as the engine size increases so does the percentage depreciation. The Ford graph does not show this pattern and so contradicts the other graphs. However, I would expect the correlation to be negative, which would show that as the engine size increases the percentage depreciation decreases. The positive correlation might have occurred because I had too little data for each car make, and with more data I would expect the correlation to become negative.
I came across three outliers for engine size. The first one was a Ford Puma. It has an engine size of 1.4 and a percentage depreciation of 37.64% (2 d.p.), which is quite low. The Ford Escort, which also has an engine size of 1.4, has a higher percentage depreciation of 64.58% (2 d.p.) and follows the pattern of the linear trend line, whereas the Ford Puma is isolated and doesn't follow the pattern. This could be because other factors influence the price of the Ford Puma. One reason that its price may be higher than expected is that it has a low mileage of 34000, compared with 64000 for the Ford Escort Duet. Compared with the other Ford cars, the Ford Puma is also relatively young and has had only one previous owner, which could be another reason why it has lost so little of its value. The Ford Puma also has a service history, which means that it has been checked thoroughly to prevent breakdowns, and this could increase its price further.
The second outlier was a Rover Club. It has an engine size of 1.8 and a percentage depreciation of 23.20% (2 d.p.). This is very low when compared with the rest of the Rover cars. Other factors may also apply to the Rover Club, such as mileage. It has a mileage of 2000, which is very low compared with the rest of the cars. It is also one year old, which makes it the youngest of all the Rover cars, and this could be the reason for its second hand price being high. It also has airbags and air conditioning, which increase the value of the car even more. All these factors have led to the Rover Club having a higher price.
The third outlier was a Vauxhall Nova. It has an engine size of 1.4 and a percentage depreciation of 82.14% (2 d.p.), which is very high when compared with the other Vauxhalls that have an engine size of 1.4. The Vauxhall Nova has a high mileage of 75000, which may decrease its value. It is also 10 years old and has had 4 previous owners, which means that it has been used more, and this could be the reason for its low price.
Even when a car has a large engine, it could be old or have a high mileage, and this could seriously affect its second hand price. The same applies to cars with a small engine: they might have higher prices than cars with a larger engine because they are still in good condition, still very new and have a lower mileage. Therefore I realised that engine size alone cannot be used to determine the price of a used car, as factors such as mileage and age also have an impact on the second hand price.
Comparing the graphs for mileage
All the graphs of mileage plotted against percentage depreciation show a very weak positive correlation. This tells us that as the mileage of a car increases, so does the percentage depreciation, which means that the higher the mileage, the cheaper the second hand price. The linear trend lines are close to identical in all the scatter diagrams. The car make with the steepest gradient is Fiat, which means that mileage has the greatest effect on Fiat cars. The car make with the gentlest gradient is Rover, which tells us that it is least influenced by mileage. There are no outliers for mileage in any of the four car makes.
Mean, Median, Upper Quartile, Lower Quartile and Interquartile Range:
The mean will tell me the average second hand price of the cars and the median will give the middle value. The median is vital because in some cases it is the best average to use: as the middle value, it is not affected by a few extremely large values in a data set where the rest of the values are small. I have decided not to calculate the range because it can be misleading: it is calculated as the highest value minus the lowest value, so it is affected by extreme values. Instead I have decided to find the interquartile range, which is the range of the middle 50% of the data and is a better way to measure the spread.
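The effect of one extreme value on these measures can be sketched with Python's statistics module. The prices below are hypothetical and are not from my sample. The quartile rule used by statistics.quantiles is the module's default and may differ slightly from the method I used by hand.

```python
import statistics

# Hypothetical second hand prices in pounds, with one extreme value at the end.
prices = [1500, 2000, 2500, 3000, 3500, 4000, 15000]

mean_price = statistics.mean(prices)
median_price = statistics.median(prices)

# quantiles(..., n=4) returns the three cut points: lower quartile, median, upper quartile.
lower_q, _, upper_q = statistics.quantiles(prices, n=4)

price_range = max(prices) - min(prices)
interquartile_range = upper_q - lower_q

print(mean_price, median_price, price_range, interquartile_range)
```

Here the single expensive car pulls the mean well above the median and makes the range very large, while the interquartile range only reflects the middle 50% of the prices.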
Fiat:
Mean: £3606
Median: £3882
Lower Quartile: £1871
Upper Quartile: £4624
Interquartile Range: £2753
Ford:
Mean: £3902
Median: £3200
Lower Quartile: £1664
Upper Quartile: £4700
Interquartile Range: £3036
Rover:
Mean: £4251
Median: £2975
Lower Quartile: £2120
Upper Quartile: £3768
Interquartile Range: £1648
Vauxhall:
Mean: £4961
Median: £4995
Lower Quartile: £3495
Upper Quartile: £6499
Interquartile Range: £3004
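The interquartile ranges above can be checked by subtracting each lower quartile from its upper quartile. The short sketch below uses the quartiles listed above. The variable names are my own choice for illustration.

```python
# Lower and upper quartiles in pounds, copied from the lists above.
quartiles = {
    "Fiat": (1871, 4624),
    "Ford": (1664, 4700),
    "Rover": (2120, 3768),
    "Vauxhall": (3495, 6499),
}

# Interquartile range = upper quartile - lower quartile.
iqr = {make: upper - lower for make, (lower, upper) in quartiles.items()}

largest = max(iqr, key=iqr.get)
smallest = min(iqr, key=iqr.get)

for make, value in iqr.items():
    print(make, value)
print("largest:", largest, "smallest:", smallest)
```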
I will now move on to construct box plots. These are a useful way of representing the range, the median and the quartiles of a set of data. They can be drawn from cumulative frequency graphs or from raw data. Box plots are useful because they are easy to interpret and are a good way to compare two or more distributions.
Comparing the box plots:
All the box plots have been drawn on the same scale so that I can compare them. Rover has the largest spread of data, telling us that it has cars with a wide variety of prices. Fiat has the smallest spread of data, which tells us that Fiat cars don't have a wide range of prices. Vauxhall has the largest median whereas Rover has the smallest median. The car make with the largest interquartile range is Ford. The interquartile range covers the middle 50% of the cars, so this tells us that the middle 50% of Ford cars are priced between £1664 and £4700. The car make with the smallest interquartile range is Rover, which tells us that the middle 50% of Rover cars are priced between £2120 and £3768. Even though Rover has the largest spread, it also has the smallest interquartile range. This means that some of the cars in the other 50% lie far outside the interquartile region, and other factors may apply to them which make them much more expensive or cheaper than the other cars. Vauxhall has the highest mean of £4961, which tells us that on average it has the highest priced second hand cars. Fiat has the lowest mean of £3606, which tells us that on average Fiat has the lowest priced second hand cars.
Histograms
Histograms are useful for representing data in grouped frequency distributions. For this coursework, the histograms will show the frequency density of the second hand prices for each car make.
Analysis of the histograms
The Ford histogram shows that the 0<p≤2000 interval has the highest frequency of 6. This tells us that there are more Ford cars in this interval than in any other. The 5000<p≤9000 interval has the lowest frequency of 3, which tells us that the fewest Ford cars are priced between £5000 and £9000. Overall, more Ford cars are priced between £0 and £2000 than in any other interval, which tells us that the Ford cars are cheap.
The Fiat histogram shows us that the 0<p≤3000 interval has the highest frequency of 5. This tells us that more Fiat cars are priced between £0 and £3000 than in any other interval. The 6000<p≤7000 interval has the lowest frequency of 0, which tells us that no Fiat cars are priced in the highest interval.
The Rover histogram shows us that the 0<p≤2000 interval has the highest frequency of 6. This means that more Rover cars are priced in this interval than in any other. The 4000<p≤8000 and 8000<p≤15000 intervals have the lowest frequency of 1, which tells us that very few Rover cars are in these intervals. In addition, Rover has the most expensive car, priced at £14,999.
The Vauxhall histogram shows us that the 3000<p≤6000 interval has the highest frequency of 4. This means that more Vauxhall cars are priced in this interval than in any other. The 6000<p≤8000 interval has the lowest frequency of 2, which tells us that the fewest Vauxhall cars are priced in this interval. As a whole, Vauxhall cars are priced higher than the other makes.
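Because the intervals have different widths, the height of each bar in a histogram is the frequency density, which is the frequency divided by the class width. The sketch below works this out for the Ford and Vauxhall intervals quoted above. The function name and the choice of "cars per £1000" as the unit are assumptions made for illustration.

```python
def frequency_density(lower, upper, frequency, unit=1000):
    """Return frequency per `unit` pounds of class width."""
    width = upper - lower
    if width <= 0:
        raise ValueError("upper must be greater than lower")
    return frequency / (width / unit)


# (lower bound, upper bound, frequency) for the intervals quoted in the analysis.
ford = [(0, 2000, 6), (5000, 9000, 3)]
vauxhall = [(3000, 6000, 4), (6000, 8000, 2)]

ford_density = [frequency_density(*interval) for interval in ford]
vauxhall_density = [frequency_density(*interval) for interval in vauxhall]

print(ford_density)
print([round(d, 2) for d in vauxhall_density])
```

For Vauxhall the two frequencies are 4 and 2, but the bar heights are much closer together because the first interval is wider than the second.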
Conclusion:
In my investigation I have found that age affects the second hand price the most, since the correlation was strongest and there were fewer outliers in the scatter graphs. This is followed by mileage, as similar correlations were found for every car make and there were no outliers. The least influential factor on the second hand price of the cars is engine size, as it had the most outliers and the graph correlations were not the same for each car make. This means that my hypothesis was incorrect, because mileage comes second after age.
If I had had additional time I would have used standard deviation, which measures the spread using every value in the data and so gives more reliable results. I would also have used the correlation coefficient to test whether age is the factor that affects the second hand price the most. The correlation coefficient, or r, gives a value between -1 and +1 which can determine which factor is most important. If r = +1, the correlation is perfectly positive, and the closer the value of r is to +1, the stronger the positive correlation. The same applies to -1: if r = -1 the correlation is perfectly negative, and the closer the value of r is to -1, the stronger the negative correlation. I would have used this method to find the values of r for mileage, age and engine size for all four makes. This would have shown which factor influences the second hand price of a car the most.
If I had used more data for this investigation, the results would have been more accurate and reliable. The gradients of the scatter graphs would also have been clearer, and so would the box plots.
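A minimal sketch of how r could be calculated, assuming the usual product-moment formula. The function name and the ages and depreciation values are hypothetical, so the result says nothing about my actual sample.

```python
from math import sqrt


def correlation_coefficient(xs, ys):
    """Return the product-moment correlation coefficient r of two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable never changes")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: age in years and percentage depreciation.
ages = [1, 3, 5, 7, 9]
depreciation = [25, 45, 60, 72, 80]

r_age = correlation_coefficient(ages, depreciation)
print(round(r_age, 3))
```

Repeating this for age, mileage and engine size for each make would give values of r that can be compared directly: the factor whose r is furthest from 0 has the strongest linear relationship with percentage depreciation.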