Therefore the number of cars to sample from each car make will be:
(No. of cars of that make ÷ Total no. of cars of all four makes) × 40
Ford: 16/51 × 40 ≈ 13
Vauxhall: 13/51 × 40 ≈ 10
Rover: 12/51 × 40 ≈ 9
Fiat: 10/51 × 40 ≈ 8
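The proportional calculation above can be checked with a short Python sketch. The names `cars_per_make` and `stratified_allocation` are my own labels for illustration; the counts and the sample size of 40 are the figures used above.

```python
# Number of cars of each make (figures taken from the calculation above).
cars_per_make = {"Ford": 16, "Vauxhall": 13, "Rover": 12, "Fiat": 10}
sample_size = 40


def stratified_allocation(counts, sample_size):
    """Share out the sample in proportion to the size of each group."""
    total = sum(counts.values())
    # round() gives the nearest whole number of cars for each make.
    return {make: round(n / total * sample_size) for make, n in counts.items()}


allocation = stratified_allocation(cars_per_make, sample_size)
print(allocation)
```

Rounding each share separately does not always add up to the full sample size, so it is worth checking the total, as it happens to work out here.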
Now that I know how many cars of each make I need, I will make a random selection. The reason for doing this is to avoid any bias.
I will number the cars within each of the four makes, i.e. 1 to 16 for Ford, 1 to 13 for Vauxhall and so on. I will then use my calculator to generate random numbers, picking 13 random cars for Ford, 10 random cars for Vauxhall and so on.
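The same kind of selection could be made on a computer instead of a calculator. The sketch below uses Python's `random` module rather than the calculator method described above; the function name `pick_random_cars` and the seed value are assumptions made for illustration only.

```python
import random


def pick_random_cars(number_of_cars, sample_count, seed=None):
    """Pick sample_count different car numbers from 1 to number_of_cars."""
    rng = random.Random(seed)  # a fixed seed makes the selection repeatable
    # sample() never picks the same number twice.
    return sorted(rng.sample(range(1, number_of_cars + 1), sample_count))


ford_sample = pick_random_cars(16, 13, seed=1)      # 13 of the 16 Fords
vauxhall_sample = pick_random_cars(13, 10, seed=1)  # 10 of the 13 Vauxhalls
print(ford_sample)
print(vauxhall_sample)
```

Because no number can be chosen twice, every car in a make has the same chance of being picked, which is what keeps the selection free of bias.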
The table below shows the results I got
Now that I have randomly picked the cars needed for my investigation, I will analyse each car make thoroughly, since the data set is relatively small and convenient to work with.
I will calculate the averages for each make, i.e. the mean (regardless of age), the median and the mode (if any), together with the range.
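As a rough sketch of how these four measures can be worked out, the Python below summarises a list of prices. The function name `summarise` and the five prices are made up for illustration; they are not taken from the real data.

```python
import statistics
from collections import Counter


def summarise(prices):
    """Return the mean, median, range and mode(s) of a list of prices."""
    counts = Counter(prices)
    highest = max(counts.values())
    # A mode only exists if at least one price occurs more than once.
    modes = [p for p, c in counts.items() if c == highest] if highest > 1 else []
    return {
        "mean": statistics.mean(prices),
        "median": statistics.median(prices),
        "range": max(prices) - min(prices),
        "modes": modes,
    }


# Hypothetical second hand prices, for illustration only.
example_prices = [1500, 2995, 2995, 4200, 7800]
summary = summarise(example_prices)
print(summary)
```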
The table below shows the averages of the second hand price for Ford, Vauxhall, Rover and Fiat.
As you can see from the table above, the mean second hand prices of Ford, Vauxhall and Rover lie in a close range, i.e. about 4,300 to 4,800, while Fiat has a much lower mean. This suggests that Fiat is generally much cheaper than Ford, Vauxhall and Rover.
If you look at the range of all four makes, you will notice that Rover has the largest figure, 14104. However, its median is only 2975. With a middle price that low and a range that large, Rover’s cheapest car is probably cheaper than those of the other makes and its most expensive car is probably dearer, although I cannot prove that yet and I do not know the main factor behind this result.
I will now analyse each factor that I chose, i.e. age, engine size and mileage, by comparing it with the second hand price. I will draw a graph for each and make a comparison between the four car makes.
Throughout the investigation I will be using scatter diagrams, because they give a visual representation of how far points deviate from the trend and they are simple, effective and easy to understand. I thought they would be the best way to show this amount of data.
The first factor I will look at is the engine size.
The graphs on the next page show scatter diagrams of Engine size against Second Hand Price for all the four car makes.
This graph shows a positive correlation. It has a gradient of 2954.2. It seems as though its engine size has a low value compared to Ford and Vauxhall.
This graph also shows a positive correlation, but this one looks a bit steeper: its gradient is higher, at 3306.3. This indicates that engine size has a higher value compared to Vauxhall.
This graph has a positive correlation. It has the highest gradient, which means it’s the steepest. This also means its engine size has the highest value amongst the four.
This graph shows a very weak negative correlation, indicated by the gradient of -29.74. It has the lowest gradient, and it contradicts the other three graphs. The negative gradient suggests that as the engine size increases the price decreases, which is not what I would expect. But I will have to look into this at a later stage.
From the graphs on the previous page, all four car makes seem to have a very weak correlation. The trend shows that the bigger the engine size, the higher the price, except for Fiat. There are several reasons why the correlation is weak. Even when a car has a large engine, it could be old or have a high mileage, and this could seriously affect its second hand price. The same applies to cars with small engines: they might have higher prices than cars with larger engines because they are still in good condition, nearly new and have a low mileage. Therefore I realised that engine size alone cannot be used to determine the price of a used car.
I can conclude that engine size on its own has no clear link with second hand price.
I will now look at whether there is any correlation between engine size and price when new, so that no other factors like age or mileage affect it. My prediction is that, since no variable other than the model type affects the relationship, there should be a good positive correlation between engine size and price when new.
The graphs below show Engine size against price when new for all the four car makes.
The equation on the line indicates that it’s a positive correlation. The gradient m is 80015; this indicates that the engine size has a high value.
This line’s equation indicates that the gradient m is 5036.7; this tells us that its engine size has the lowest price compared to the rest.
Unlike the above graph of Ford, this graph is steeper. This tells us that the gradient is higher, 6359.4. It also tells us that its engine size is worth much more than the one for Ford.
In comparison to the rest of the cars, this graph’s equation line tells us that it’s the steepest of all, with a gradient of 15232. This indicates that its engine size is worth a lot more than the rest.
From the scatter diagrams on the previous page, you can clearly see that I was right: there is quite a good correlation. I cannot say they all have a strong positive correlation because, for example, Ford prices vary with the model: a Fiesta with an engine size of 1.8 has a price of £8680, whereas an Orion with the same engine size is priced at £16000. The same happens with Vauxhall: a Nova with engine size 1.4 is worth £5599, whilst a Tigra from the same make with engine size 1.4 is worth £13510. So you can see why the correlation between engine size and price when new is not very strong. Therefore I can conclude that engine size isn’t a major factor influencing the second hand price, but it has a slight effect on it.
I will now move on to see what effect mileage has on a used car.
The scatter diagrams below show the relationship between mileage and second hand price for all four car makes.
The equations of these lines tell me how steep the lines are, i.e. their gradients. I can then use them, though not very reliably, to estimate the price of a used car whose mileage is known.
The reason I say they are not very reliable is that there are some exceptions, indicated by blue circles. They are known as anomalies: they don’t follow the same trend as the other points.
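To illustrate where a gradient like this comes from, here is a minimal sketch of a least-squares line of best fit in Python. I am assuming the trend lines on the graphs are ordinary least-squares lines; the function name `line_of_best_fit` and the four data points are made up for illustration and are not the real cars.

```python
def line_of_best_fit(xs, ys):
    """Return (gradient, intercept) of the least-squares line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sxy and Sxx are sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x
    return gradient, intercept


# Hypothetical mileages and second hand prices, for illustration only.
mileages = [10000, 20000, 30000, 40000]
prices = [9000, 8000, 7000, 6000]
gradient, intercept = line_of_best_fit(mileages, prices)
print(gradient, intercept)
```

A negative gradient here means the price falls as the mileage rises. A single anomaly far from the other points can pull the line noticeably, which is one reason the estimates are not very reliable.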
From the scatter diagrams on the previous page I can draw several conclusions about the relationship between mileage and second hand price. The first is that there is a negative correlation: as the mileage increases, the price drops. The price also drops faster when the car is younger than when it is older, and it will never become negative. It might gradually become cheaper and cheaper, but it will not reach a negative value.
However, there are three anomalies, circled in blue on the graphs, that lie far away from the other points. The first is a Vauxhall with a mileage of 63000 and a price of 6995, which is still quite high. It is six years old, yet it still costs more than the others, which is very odd. However, after looking at its engine size of 2.0, I think that is the reason why it costs much more than the other cars.
The second blue point is a Fiat. The car has a high mileage and yet its second hand price is still high. Just below it, another car with the same mileage of 51000 is much cheaper; in fact the price difference is £2000. When I analysed it further, I noticed that the circled car has an engine size of 1.4 and has only been used for five years, while the one below it has an engine size of 1.0 and has been used for 8 years. Now I can see where the difference comes from.
The third one is a Rover. It has a second hand price of 14999, has been used for one year and has an engine size of 1.8. I think it has a high value because its original price was 19530: it was an expensive car, and that influences its second hand price. The second reason could be that it has been used for just one year. Engine size does not explain it, because with an engine size of 1.8 this car should be worth less than the 623 GSi, which has an engine size of 2.3, yet the 623 GSi’s price is just a fifth of this one’s.
I can conclude that as the mileage increases, its second hand price decreases.
I also realised that when the car is young, i.e. one or two years old, its price drops faster than when it is old.
Another thing I noted is that the price will keep on dropping but it will never reach a negative value. If it did, the seller would have to pay the buyer to take the car, which I don’t suppose normally happens.
As the mileage of a car increases, mileage becomes a more and more dominant variable for estimating its price. What I mean is that one can estimate the price of a car by looking at its mileage, and can also tell roughly how old it might be.
For cars with low mileage, prices vary a great deal with other variables, such as the brand or engine size, whereas for a car with high mileage the price depends mostly on its mileage.
I will now move on to the relationship between age and second hand price for all four car makes. My prediction is the same as the one I made for mileage: I believe that the older a car gets, the less value it has, and hence the lower its second hand price.
Equations for the curves:
Ford: y = 82.8x² - 1703.5x + 10127
Fiat: y = 68.0x² - 1379.8x + 7930
Rover: y = 253.5x² - 4240.9x + 19088
Vauxhall: y = -65.1x² + 34.3x + 7002.1
I made four scatter diagrams to see the relationship between age and second hand price. As you will notice, I drew two trend lines on each: one is curved while the other is straight. I used the curved trend line, which is a polynomial type, to show how the second hand price falls as age increases, which agrees with my prediction. If you look carefully at the trend, you will notice that when the car is young, i.e. it has been used for one or two years, its price declines drastically, but as it gets older its second hand price falls more steadily. The second hand price keeps on declining as the car gets older, but it will never reach a negative value. It might fall to next to nothing, but it will not go negative. This is quite obvious, because if the second hand price went negative the seller would have to pay the buyer on top of handing over the car, which in practice does not happen.
However, there is one anomaly in the Vauxhall graph, circled in blue. This car is old and yet its second hand price is still high, so I reckon another factor is causing this result. It is exactly the same car as the anomaly on the earlier graph of mileage against second hand price. It is exceptional because it is not only old but also has a high mileage. What keeps its value high is its engine size of 2.0.
The second trend line I used is linear. Its function is to estimate how much the second hand price declines per year, which is given by its gradient. The makes all have different depreciation rates, and this takes me to the next stage, where I look at the depreciation rate of each car make. But before that, I will see what the relationship between mileage and age is, because I believe that whichever way age affects a car, so does mileage. They seem to be strongly related. My prediction is that they will have a good positive correlation.
Uses of my results
The equation of the line of best fit can help me predict the second hand price if the age or mileage is known. Any of the results could be used by car dealers. If someone, for example, wishes to compare the second hand prices of 5-year-old cars, I can simply find the cost by putting x = 5 years into my equation.
E.g. Ford = -728.87 × 5 + 7853.4 = 4209.05
Fiat = -778.68 × 5 + 6957.5 = 3064.10
Therefore, the price for a 5-year-old Ford would be £4209.05 and the price for a 5-year-old Fiat would be £3064.10.
However, I can use the equation of the curved line to get more reliable results. Using the equation of the curved line, a 5-year-old Ford is worth £3679.50 while a Fiat would be worth £2731.
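These substitutions can be checked with a few lines of Python. The coefficients are copied from the straight-line and curved equations quoted in this coursework; the names `linear`, `curved`, `predict_linear` and `predict_curved` are my own and are only for illustration.

```python
# Straight-line equations y = m*x + c (gradient, intercept).
linear = {"Ford": (-728.87, 7853.4), "Fiat": (-778.68, 6957.5)}
# Curved equations y = a*x^2 + b*x + c.
curved = {"Ford": (82.8, -1703.5, 10127), "Fiat": (68.0, -1379.8, 7930)}


def predict_linear(make, age):
    """Estimated second hand price from the straight line of best fit."""
    m, c = linear[make]
    return m * age + c


def predict_curved(make, age):
    """Estimated second hand price from the curved (quadratic) trend line."""
    a, b, c = curved[make]
    return a * age ** 2 + b * age + c


for make in ("Ford", "Fiat"):
    print(make, round(predict_linear(make, 5), 2), round(predict_curved(make, 5), 2))
```

Changing the age passed to the functions gives the estimate for a car of any other age, although the estimates are only sensible within the range of ages in the data.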
This graph shows me that age and mileage are very closely linked. I was right: the graph agrees with my prediction. The gradient of the best-fit line is close to 7334, which means that mileage increases by roughly 7300 miles per year. In theory the y-intercept should be 0, because the car had not been driven when it was new, but in this graph the y-intercept is 5758.1. This tells us that the line does not fit the data perfectly. There are also a few points on the graph circled in blue, which are anomalies. I am not surprised to see them, because a few exceptional cars have either a large engine size or a popular model type, which makes them expensive in the market.
Weaknesses
I have noticed that depreciation varies according to age, e.g. the Ford Orion lost 50% of its value after the first year.
Therefore, I have now decided to look at percentage depreciation per year. The advantage of using percentage depreciation is that it gives a better idea of the loss, because it compares the loss with the car’s price when new instead of using the absolute price difference.
E.g. Mercedes Benz, £14425 when new, shows a loss of £3426, which is a 23.8% loss
Nissan Micra, £7995 when new, shows a loss of £3996, which is a 50% loss
This shows that even when the absolute price differences are in a close range, the percentage losses differ greatly, which makes percentage depreciation more reliable.
I am now going to proceed to the stage where I will check each car make’s depreciation rate so as to see which make depreciates most. The formula I am going to use to find this value is as follows:
Formula: Depreciation rate (% per year) = [(New Price – Second Hand Price) / New Price × 100%] / Age
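The formula can be written as a small Python sketch. The function names are my own. The Mercedes and Micra figures are the ones quoted above, with the second hand price worked out as the new price minus the loss; the last car is entirely made up for illustration.

```python
def percentage_loss(new_price, second_hand_price):
    """Total loss in value as a percentage of the price when new."""
    return (new_price - second_hand_price) / new_price * 100


def depreciation_rate(new_price, second_hand_price, age_years):
    """Average percentage of the new price lost per year of age."""
    return percentage_loss(new_price, second_hand_price) / age_years


# Figures quoted above: new price and loss in pounds.
mercedes_loss = percentage_loss(14425, 14425 - 3426)
micra_loss = percentage_loss(7995, 7995 - 3996)

# Hypothetical car: £10000 new, worth £2000 after 10 years.
example_rate = depreciation_rate(10000, 2000, 10)

print(round(mercedes_loss, 1), round(micra_loss, 1), round(example_rate, 1))
```

Dividing by the age gives an average over the whole life of the car, so it hides the fact that most of the loss may happen in the first few years.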
The results for each car make are on the next page.
(Tables of depreciation rates for Vauxhall, Rover, Fiat and Ford)
Just by observing the results in the tables, I can conclude that the older the car, the lower its depreciation rate, while the younger the car, e.g. one or two years old, the higher its depreciation rate. This agrees with what I found earlier: the younger the car, the faster it loses its value. However, the first statement is not the whole story, because an old car with a low depreciation rate went through the same process: in its first few years it had a high depreciation rate, and as it became older its depreciation rate fell. A good example is the Vauxhall Nova, which is 10 years old with a depreciation rate of 8%, compared with the Fiat Bravo, which is 1 year old with a depreciation rate of 37%.
However, we have to bear in mind that 8% is the average of the depreciation over all 10 years. It could mean that in the first year the car depreciated by 30%, in the second year by 25%, in the third year by 17% and so on. There are also some cars with an exceptionally high depreciation rate, e.g. the Ford Orion, which has been used for one year and has depreciated by 50%. The reason could be that it is not as popular in the market as it used to be. I say this because the Ford Orion has quite a large engine size of 1.8 and a low mileage of 7000, which I would expect to give it a higher second hand price.
If I had time, I would look at cars in age clusters of 1 year old, 2 years old, 3 years old and so on, because age seems to be an indicator of price.
Otherwise, I have once again shown that young cars seem to decline in value very fast. Therefore age affects price.
Cumulative Frequency
I am now going to find the cumulative frequency for all the car makes. I made a cumulative frequency table below for all the hundred cars. This is going to help me find the median, the upper and lower quartiles and the inter-quartile range. To get these values I will need to draw a cumulative frequency graph.
The following is a cumulative frequency graph of all the cars.
From the graph, I can read off the median, upper quartile, lower quartile and inter-quartile range.
Median = ½ x 100 = 50th value
Upper Quartile = ¾ x 100 = 75th value =
Lower Quartile = ¼ x 100 = 25th value =
Inter Quartile = U quartile – L quartile =
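The idea of counting along to the 25th, 50th and 75th values can be sketched in Python as below. The function name `quartiles_by_position` is my own, and the 100 prices are made up for illustration; they are not the real cars. Other conventions for the exact quartile position exist, so this is only one simple way of doing it.

```python
import math


def quartiles_by_position(values):
    """Find the quartiles by counting along the sorted data."""
    ordered = sorted(values)
    n = len(ordered)

    def value_at(position):
        # Positions start at 1; a fractional position is rounded up.
        index = max(1, math.ceil(position))
        return ordered[index - 1]

    lower = value_at(n / 4)
    median = value_at(n / 2)
    upper = value_at(3 * n / 4)
    return {"lower": lower, "median": median, "upper": upper, "iqr": upper - lower}


# Hypothetical data: 100 made-up prices from 100 to 10000 in steps of 100.
example_prices = list(range(100, 10100, 100))
summary = quartiles_by_position(example_prices)
print(summary)
```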
I am now going to draw box and whisker diagrams as well as cumulative frequency graphs to compare these four car makes, and I will comment on them. Cumulative frequency graphs and box plots show the spread of distributions. The smaller the spread, the more consistent the predictions, which adds reliability to the results.
The median (mid-point) divides the data in half and is a better indicator of second hand price if there are extreme values. These extreme values affect the mean (average) and distort it. The interquartile range tells us the range of the middle 50% of cars.
Standard Deviation
I am now going to find the standard deviation because it measures the spread of the data about the mean value. It will give me a more detailed picture of the way in which the data are dispersed about the mean as the centre of the distribution. Its main use is to compare two sets of data. If a set has a low standard deviation, the values are not spread out too much. In a normal distribution, about 68% of the values are within one standard deviation of the mean and about 95% are within two standard deviations of the mean. The assumption I am going to make in this case is that the data follow a normal distribution.
I am going to find the standard deviation of all the hundred cars, analyse it and hopefully come to a conclusion.
The formula I am going to use is as follows:
I will skip the calculations and just go straight to the answer that is:
Standard Deviation = 4939.6 (1 DP)
Now I am going to use the standard deviation and the mean of all the 100 cars to find the price ranges that contain about 68% and about 95% of all the cars.
The mean of Second Hand Price for all the 100 cars = 5241.6 (1 DP)
By subtracting the standard deviation from the mean and adding it to the mean, I can find the range in which the second hand price of about 68% of the cars would lie, and by doing the same with two standard deviations, the range for about 95% of the cars.
Within one standard deviation, about 68% of the cars lie between:
5241.6 + 4939.6 = 10181.2
5241.6 - 4939.6 = 302
This means that 302 < Second hand price < 10181.2
Within two standard deviations, about 95% of the cars lie between:
5241.6 + 9879.2 = 15120.8
5241.6 - 9879.2 = -4637.6
This means that -4637.6 < Second hand price < 15120.8. Since a price cannot be negative, in practice the lower limit is 0.
I can check this by calculating the percentage of cars that lie within these limits, i.e. below 10181.2 and below 15120.8.
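A check of this kind can be sketched in Python. The mean and standard deviation are the values stated above; the function names `interval` and `percent_within` and the short list of prices are assumptions for illustration, since the full list of 100 prices is not reproduced here.

```python
mean_price = 5241.6  # mean second hand price of the 100 cars (from above)
std_dev = 4939.6     # standard deviation of the 100 cars (from above)


def interval(mean, sd, k):
    """Limits k standard deviations either side of the mean, to 1 DP."""
    return (round(mean - k * sd, 1), round(mean + k * sd, 1))


def percent_within(prices, low, high):
    """Percentage of prices lying between low and high inclusive."""
    inside = sum(1 for p in prices if low <= p <= high)
    return inside / len(prices) * 100


one_sd = interval(mean_price, std_dev, 1)
two_sd = interval(mean_price, std_dev, 2)
print(one_sd, two_sd)

# Hypothetical prices, for illustration only.
example_prices = [100, 500, 9000, 12000]
print(percent_within(example_prices, *one_sd))
```

Running `percent_within` on the real 100 prices would show how close the data come to the percentages expected for a normal distribution.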
I am now going to look at the relationship between second hand price and each factor (age, mileage, engine size) for all four car makes using a more rigorous, numerical approach. The measure I will use is called the coefficient of correlation, r. This will settle my hypothesis by showing which factor has the strongest correlation.
Example:
If r = +1, then all the points lie exactly on a straight line with positive gradient; this is perfect positive correlation.
If r = -1, then all the points lie exactly on a straight line with negative gradient; this is perfect negative correlation.
The closer r is to +1 or -1, the closer the points are to a straight line.
The formula for the coefficient of correlation, also known as the product moment correlation coefficient, r, is defined as:
r = cov(x, y) / (Sx × Sy)
Cov X, Y =
Sx =
Sy =
Where Sx and Sy are the standard deviation of x and y.
The units of cov(x, y) are (units of x) × (units of y), and the units of the product Sx × Sy are also (units of x) × (units of y), so the units cancel and the value of r does not depend on the units in which x and y are measured.
I shall later show that
-1 ≤ r ≤ 1
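As a minimal sketch of how r can be calculated, the Python below uses the usual definitions of covariance and standard deviation with a divisor of n. That choice of divisor is my assumption; using n - 1 throughout would give the same r because it cancels. The function name `correlation` and the data are made up for illustration.

```python
import math


def correlation(xs, ys):
    """Product moment correlation coefficient r = cov(x, y) / (Sx * Sy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance: average product of the deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    # Standard deviations of x and y.
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sx * sy)


# Hypothetical data lying exactly on straight lines.
ages = [1, 2, 3, 4]
rising = [2, 4, 6, 8]
falling = [8, 6, 4, 2]
print(correlation(ages, rising), correlation(ages, falling))
```

Points lying exactly on a rising line give r = +1 and points on a falling line give r = -1, up to rounding in the arithmetic, which matches the two examples described above.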
I am now going to find the coefficient of correlation of the second hand price of Ford against age, mileage and engine size.
Ford
For each of the tables on the previous page, the values for r are as follows:
Coefficient of correlation between second hand price and Age is: r = -0.844
Coefficient of correlation between second hand price and Engine size is: r = 0.3667
Coefficient of correlation between second hand price and Mileage is: r = -0.895
From this I can see that the value of r for mileage is closest to -1, which means mileage has the strongest correlation, followed by age. I am going to see if this is the same for Vauxhall, Rover and Fiat.
Vauxhall
Coefficient of correlation between second hand price and Age is: r = -0.9016
Coefficient of correlation between second hand price and Engine size is: r = 0.564604
Coefficient of correlation between second hand price and Mileage is: r = -0.77514
In this case age seems to be more dominant, with r = -0.9016; this shows that it has the stronger correlation since it is closer to -1. It is followed by mileage.
Rover
Coefficient of correlation between second hand price and Age is: r = - 0.948
Coefficient of correlation between second hand price and Engine size is: r = 0.338
Coefficient of correlation between second hand price and Mileage is: r = -0.620
The same thing is happening again: the correlation between age and second hand price is stronger than the other two. I predict that it will be the same with Fiat. I am going to carry out the calculations to check this.
Fiat
Coefficient of correlation between second hand price and Age is: r = -0.957
Coefficient of correlation between second hand price and Engine size is: r = -0.003
Coefficient of correlation between second hand price and Mileage is: r = -0.712
I was right: age has the most effect on the second hand price, followed by mileage.
Note that r for engine size is very slightly negative (-0.003), which is so close to zero that it shows practically no correlation rather than a real fall in price as engine size increases. I came across the same thing previously in the scatter diagram and explained that it was an exception.
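The comparison of the three factors can be summarised with a short Python sketch that ranks them by the size of r, ignoring the sign. The r values are the ones listed above for each make; the names `r_values` and `rank_factors` are my own labels for illustration.

```python
# Coefficients of correlation with second hand price, as listed above.
r_values = {
    "Ford": {"Age": -0.844, "Engine size": 0.3667, "Mileage": -0.895},
    "Vauxhall": {"Age": -0.9016, "Engine size": 0.564604, "Mileage": -0.77514},
    "Rover": {"Age": -0.948, "Engine size": 0.338, "Mileage": -0.620},
    "Fiat": {"Age": -0.957, "Engine size": -0.003, "Mileage": -0.712},
}


def rank_factors(rs):
    """Order the factors from strongest to weakest correlation."""
    # abs() is used because r = -0.9 is as strong a link as r = +0.9.
    return sorted(rs, key=lambda factor: abs(rs[factor]), reverse=True)


strongest = {make: rank_factors(rs)[0] for make, rs in r_values.items()}
print(strongest)
```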
I experienced some difficulties doing this coursework, mainly because of the following:
- The data provided was secondary. This means that I did not personally collect the data, so errors might have been made whilst it was collected, which would affect my results.
- The data was insufficient in that there were only 100 cars provided. This restricted me from doing further analysis.
- Even within the same make, cars have different specifications, so I was not able to compare like with like.
This brings me to my conclusion. As I stated in my hypothesis, age is the dominant factor affecting the second hand price, followed by mileage. Age had the strongest correlation for Vauxhall, Rover and Fiat, while for Ford mileage was slightly stronger. I used all my mathematical and statistical knowledge to come to this conclusion.