I used three scatter graphs, because they show a visual representation of deviation from the trend, and they are easy to comprehend. They are also simple and effective. I thought that they would be the best way to show this amount of results, and be easily shown and understood.
I used a pie chart to show the relative proportions in my sample represented by each car manufacturer. This was so I can easily display the make of cars, and the visual representation is easy to comprehend. I also did not use ‘Bar Charts’ or ‘Frequency Polygons’, because the data is continuous and not discrete, so I used scatter graphs.
Conclusion 1:
I can conclude that all three of my graphs agree with my earlier hypothesis. This shows that, in general, the older the car is, the more miles it has been driven and the less it costs. It also shows that it is impossible to compare a 1994 Bentley TurboR with a 1990 Volkswagen Passat Catalyst and a 2002 Fiat Punto! I surmise that the Bentley is a definite anomaly, because it is a very expensive specialist vehicle, whereas all the rest of the cars are mass-produced volume cars that were not as expensive when first registered.
To obtain better results I will have to change my original hypothesis and narrow the number of makes of vehicles down. Because Fords were the most popular cars in my sample, I will use that, instead of something less popular like Vauxhall. This will hopefully create better and stronger correlating graphs. I can then compare my used car prices more easily and satisfy my hypothesis more clearly.
So I propose to abandon my original 55 random cars and investigate a new sample of about half the original size, but more narrowly specified: I will be looking at only Fords. By looking at the same make of car, I predict that I will get a stronger correlation and a more reliable set of results.
Section 2 - Fords
Hypothesis 2:
I make the same predictions as before, but with just Fords:
- The older the Ford is, the more miles it would have been driven.
- The more miles it has been driven, the less it costs.
- As the Ford ages, it will be worth less.
Data Collection 2:
I chose to use the exact same elements as before, but with just Fords; I am again using the model, year, mileage and price. As before, this is so that I can specify each car properly and have a reliable set of results that I can compare as fairly and as equally as I can. I am hoping that 30 Fords will give me enough results to find a trend and to make my sample representative.
Method of Data Collection 2:
I used a similar method as before to collect the raw data, but this time, I only used the ‘Auto Trader’ magazine. I used another stratified random sample, by picking out all the pages that contain Fords, and then I used the random number generator again. If the number was too high (e.g. if it said 112 and there were only 32 Ford cars on one page) I would take off the first digit, and then use that car. If the last two digits were too high still, I would simply press the ‘=’ button again, and use the next number.
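As a rough sketch, the selection rule described above could be written in Python like this. The function name `pick_car` and the range of the random numbers (1 to 999) are assumptions made for illustration; the calculator's actual range is not stated.

```python
import random


def pick_car(cars_on_page, draw=None):
    """Pick a car position (counting from 1) on a page of adverts.

    Rule sketched here: if the random number is too high, take off its
    first digit; if it is still too high, draw a new number and try again.
    """
    if draw is None:
        # Assumption: the random number generator gives whole numbers 1-999.
        draw = lambda: random.randint(1, 999)

    while True:
        number = draw()
        if 1 <= number <= cars_on_page:
            return number

        # Too high: take off the first digit (for example 112 becomes 12).
        remaining_digits = str(number)[1:]
        trimmed = int(remaining_digits) if remaining_digits else 0
        if 1 <= trimmed <= cars_on_page:
            return trimmed
        # Still too high (or nothing left): loop round and draw again.


# Example from the text: the generator says 112 and the page has 32 Fords.
print(pick_car(32, draw=lambda: 112))
```

One point this makes visible is that taking off the first digit means some positions can be reached in more than one way, so not every car on a page has exactly the same chance of being picked.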
Justification of Method of Data Collection 2:
I used the same stratified random sample method, because I wanted to make sure that I still got data that would satisfy a fair test. I chose to investigate 30 cars because with this amount of data, just over half of my first amount, if I find a trend I will be able to say that the results are reliable and not anomalous. I used my calculator again, to produce random numbers because this is a way that I can be sure that I will get random numbers.
Data Representation 2:
Scatter Graph 4 - Price against Mileage
If you look at the graph, you can see that there is a medium-strong negative correlation. We can see that as the mileage increases, the price declines. It has one anomaly, which is the ‘Mondeo Ghia’. It still agrees with my hypothesis that the more miles a car has been driven, the less it costs. These results are quite scattered, so the correlation is not a tight one.
Scatter Graph 5 – Price against Year
If you look at the graph, you can see that there is a medium-strong positive correlation. We can see that as the year decreases, the price declines with it, i.e. the older the car, the cheaper it gets. There are still some anomalies, the largest being the ‘Mondeo Ghia’ and another being the ‘Sierra Sapphire’. This graph also agrees with my earlier prediction that as the car ages, the less money it costs.
Scatter Graph 6 – Mileage against Year
If you look at the graph, you can see that there is a very strong negative correlation. We can see that as the year decreases, the mileage increases, i.e. the older the car, the more miles it has been driven. There are no anomalies, but the overall results are not very tight; they are quite spread out. This graph also agrees with my hypothesis that as the car ages, the more miles it has been driven.
Pie Chart 2 – Ford Models
If you look at the pie chart, it shows each Ford model as a proportion of the 30 Fords in my sample. In order to get the best accuracy from the sample, I will choose the model of Ford with the largest representation. We can see that Ford Fiestas are the most popular with 40%.
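As a small worked example, the percentages and slice angles for a pie chart like this can be calculated from the counts. Only the Fiesta figure (40% of 30 cars, which is 12 cars) follows from the text; the other counts below are made up for illustration, and the function name `pie_slices` is hypothetical.

```python
def pie_slices(counts):
    """Return {name: (percentage, angle in degrees)} for each category."""
    total = sum(counts.values())
    return {
        name: (100 * count / total, 360 * count / total)
        for name, count in counts.items()
    }


# 12 Fiestas out of 30 matches the 40% in the text.
# The other four counts are invented so that the total comes to 30.
model_counts = {"Fiesta": 12, "Focus": 6, "Mondeo": 5, "Ka": 4, "Sierra": 3}

slices = pie_slices(model_counts)
for name, (percentage, angle) in slices.items():
    print(f"{name}: {percentage:.1f}% of the sample, {angle:.1f} degrees")
```

With these figures the Fiesta slice is 40% of the circle, which is an angle of 144 degrees.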
Justification of Data Representation 2:
I used three scatter graphs, because they show a visual representation of deviation from the trend, and they are easy to comprehend. They are also simple and effective.
I used another pie chart to show the relative proportions in my sample represented by each Ford model. This was so I can easily display the model of Fords, and the visual representation is easy to comprehend.
Conclusion 2:
I can conclude from the second section that all three of my graphs agree with my earlier (2nd) hypothesis. This shows that, in general, the older the car is, the more miles it has been driven and the less it costs. It also shows that it is almost impossible to compare a ‘1990 Sierra Sapphire’ with a ‘2001 Mondeo Ghia’ costing almost £13,000 and a ‘2002 Ka that’s only done 500 miles’, even though they are all Fords. I surmise that the Mondeo is an anomaly not because it is a very expensive specialist vehicle, like the Bentley, but simply because it was a very new car that had been driven very few miles. Ford Fiestas were the most popular model in my sample. Because it is the most popular, I will use it to achieve the greatest accuracy, instead of something less popular like the Focus.
To obtain better correlations I will have to not only narrow the number of makes of vehicles down, but also narrow down the Fords to only Ford Fiestas. This will hopefully create better and stronger correlating graphs. I can then compare my used car prices more realistically and satisfy my hypothesis more clearly.
So I then propose to abandon my 30 Fords and investigate a new sample of the same size, 30, but this time only Ford Fiestas. I chose the most popular model from my last ‘Ford Table’, which was the ‘Ford Fiesta’. I am specifying the cars more closely by choosing them from the same model as well as the same make. By looking at these, I expect to get an even stronger correlation and a more reliable set of results than with Fords in general.
Section 3 – Ford Fiestas
Hypothesis 3:
I make the same predictions as hypotheses 1 and 2:
- The older the Ford Fiesta, the more miles it would have been driven.
- The more miles the Ford Fiesta has been driven, the less it will cost.
- As the Ford Fiesta gets older, it will be worth less.
Data Collection 3:
I again decided to choose the exact same statistics as the last two times, but of course with just Ford Fiestas. I am again using the model, year, mileage and price. Once more, this is so that I can specify each car properly and have a reliable set of results that I can compare as fairly and as equally as I can. With a sample of 30, I should be able to compare these results easily enough to identify a trend, and I am hoping the sample is representative of Ford Fiestas.
Method of Data Collection 3:
I used almost the exact same method as before to collect the raw data, because it was successful, using only the ‘Auto Trader’ magazine. I used another stratified random sample, by picking out all the pages that contain Ford Fiestas, and then I used the random number generator again. If the number was too high (e.g. if it said 112 and there were only 9 Ford Fiestas on one page), I would take off the first digit, and then use that car. If the last two digits were still too high, I would simply press the ‘=’ button again, and do as before, using the next number.
Justification of Method of Data Collection 3:
I used another stratified random sample method to make sure that I still got data that would satisfy a fair test. I chose to investigate 30 cars, because with this amount of data, as I have said before, if I find a trend I will be able to say that it is reliable. I used my calculator again to produce random numbers because this is a way that I can be sure once more, that I will get random numbers.
Data Representation 3:
Scatter Graph 7 - Price against Mileage
If you look at the graph, you can see that there is quite a strong negative correlation. We can see that as the mileage increases, the price declines. It has no anomalies. It does agree with my hypothesis that the more miles a car has been driven, the less it costs. The results are quite spread out either side of the best-fit line.
Scatter Graph 8 – Price against Year
If you look at the graph, you can see that there is quite a strong positive correlation. We can see that as the year decreases, the price declines with it. There are no anomalies again, and we can see that the results are quite close together, except for the lowest and highest points. It does agree with my hypothesis that as a car gets older, it is worth less.
Scatter Graph 8a – Price against Year
If you look at the graph, you can see that it shows the exact same results as graph 8, but with a best-fit curve instead of a straight line. This is because it is evident from the results that they follow a curve rather than a line, unlike the other scatter graphs. It shows that the price falls more quickly when the car is new than when it is older. The curve is the better best-fit line, because it follows the results more closely than a straight line.
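One way to check whether a curve follows the results better than a straight line is to fit both and compare how far the points are from each. The sketch below is only an illustration: the report does not say what kind of curve was drawn, so an exponential decay curve is assumed here, and the prices are made up (they lose a fixed 20% each year) rather than taken from the sample.

```python
import math


def fit_line(xs, ys):
    """Least-squares straight line y = a + b*x. Returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    top = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    bottom = sum((x - mean_x) ** 2 for x in xs)
    b = top / bottom
    return mean_y - b * mean_x, b


def squared_error(predict, xs, ys):
    """Sum of squared vertical distances from the points to the fit."""
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))


# Made-up data: age in years and a price that falls by 20% every year.
ages = [0, 1, 2, 4, 6, 8, 10, 13]
prices = [9000 * 0.8 ** age for age in ages]

# Straight line fitted directly to the prices.
a, b = fit_line(ages, prices)
line_error = squared_error(lambda x: a + b * x, ages, prices)

# Curve: fit a straight line to log(price), then turn it back into a price.
log_a, log_b = fit_line(ages, [math.log(p) for p in prices])
curve_error = squared_error(lambda x: math.exp(log_a + log_b * x), ages, prices)

print("Straight line error:", round(line_error))
print("Curve error:", round(curve_error))
```

Because the made-up prices fall steeply at first and then level off, the curve has the smaller error here. With real data the two errors would need to be compared in the same way before deciding which fit is better.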
Scatter Graph 9 – Mileage against Year
If you look at the graph, you can see that there is a strong negative correlation. We can see that as the year decreases, the mileage increases, i.e. the older the car, the more miles it has clocked up. The only anomaly is the Ford Fiesta from 1989 costing £495 (number 13), but the overall results are not very closely positioned: they are spread out either side of the line. This graph also agrees with my hypothesis that as the car ages, the more miles it has been driven.
Pie Chart 3 – Year in Percentage
If you look at the pie chart, it shows each year as a proportion of the 30 Ford Fiestas in my sample. In order to get the best accuracy from the sample, I will choose the year with the largest representation of Ford Fiestas. We can see that 1998 is the most popular year with 20%.
Justification of Data Representation 3:
I used another three scatter graphs because, as I have said twice before, they are easy to comprehend and make it easy to identify a trend. They are also simple and effective.
I used another pie chart, as in sections 1 and 2, this time to show the relative proportions in my sample represented by each year. This was again so that I can easily display the year of the Ford Fiestas, and the visual representation is easy to comprehend.
Conclusion 3:
I can conclude that all three of my scatter graphs agree with my earlier (3rd) hypothesis. This shows once more that, in general, the older the car is, the more miles it has been driven and the less it costs. It also shows that it is hard to compare a ‘1989 Fiesta costing £495’ with a ‘2002 Fiesta costing almost £9,000’, even though they are both Ford Fiestas. Ford Fiestas from 1998 were the most popular in my sample. Because 1998 is the most popular year, I will use it to achieve the greatest accuracy, instead of something less popular like the year 1989.
To obtain better results I will have to change my hypothesis and narrow the Ford Fiestas down even more, to Ford Fiestas from the same year. This will hopefully create better and stronger correlating graphs. I can then compare my used car prices more realistically and satisfy my prediction more clearly.
So I then propose to abandon my 30 Ford Fiestas and investigate 20 different Ford Fiestas. This time, the sample will be more specific: I will be looking at the most popular year of Ford Fiestas, which was 1998 (see pie chart 3). I am specifying the cars more closely by choosing them from the same year, as well as the same model and make. By looking at these, I expect to get an even stronger correlation and a more reliable set of results than with Ford Fiestas in general.
Section 4 – 1998 Ford Fiestas
Hypothesis 4:
I am making the same predictions as before; but as I have chosen only Ford Fiestas from 1998, I can only make one main prediction:
- The more miles it would have been driven, the less it costs.
- I also predict that, now the sample is so specific, I will not have any anomalies, or at least not as many as when it was more vague.
Data Collection 4:
I chose the exact same characteristics as before to collect the raw data, but of course with just Ford Fiestas, as well as only from one year (1998). I am again using the mileage and price in addition to the model and year. This is so I can again, specify it properly and have a decent reliable set of results so I can compare them as fairly and as equally as I can.
Method of Data Collection 4:
I used almost the exact same method as before, because it was successful, using only the ‘Auto Trader’ magazine, and because I wouldn’t be able to obtain enough Ford Fiestas from 1998 from the given sample of only 100 cars. I used another stratified random sample, by picking out all the pages that contain Ford Fiestas from 1998, and then I used the random number generator again. If the number was too high (e.g. if it said 112 and there were only 9 Ford Fiestas on one page), I would take off the first digit, and then use that car. If the last two digits were still too high, I would simply press the ‘=’ button again, and do as before, using the next number.
Justification of Method of Data Collection 4:
Finally, I used another stratified random sample method to make sure that I once more got data that would satisfy a fair test. I chose to investigate 20 cars because with this amount of data, two thirds of my last amount, if I find a trend I will be able to say that it is reliable. I used my calculator again to produce random numbers.
Data Representation 4:
Scatter Graph 10 - Price against Mileage
If you look at the scatter graph, you can see that there is quite a strong negative correlation. We can see that as the mileage increases, the price declines. There are no anomalies. We can clearly see that these results fit the best-fit line well, and that they are tighter now that the sample is more specific. It agrees with my hypothesis that the more miles a car has been driven, the less it costs.
Spearman’s Rank Theory:
If you look at the Spearman’s rank calculation, you can see that the final result is –0.81. The coefficient always lies between –1.0 and +1.0, so I can safely say that this is quite a strong negative correlation. It agrees with my hypothesis that the more miles a car has been driven, the less it costs.
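For reference, Spearman's rank correlation coefficient can be worked out by ranking both lists, finding the difference d between each pair of ranks, and using 1 - 6Σd² / (n(n² - 1)). The sketch below shows the calculation on five made-up cars; these are not the 20 cars from the sample, so the answer is not the –0.81 above. The function names are hypothetical, and the formula is only exact when there are no tied values.

```python
def ranks(values):
    """Rank values from 1 (smallest) upwards; ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the run of equal values starting at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rank correlation using 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Made-up mileages and prices for five cars (illustration only).
mileages = [12000, 25000, 31000, 40000, 52000]
prices = [4500, 4200, 3900, 3995, 2900]

print(round(spearman(mileages, prices), 2))
```

A result close to –1 means that the car with the highest mileage tends to have the lowest price, and a result close to +1 would mean the opposite.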
Cumulative Frequency & Box Plot Graphs 1 - Mileage
If you look at cumulative frequency graph 1, you can see that there is not much of an ‘S-shaped’ curve. This means that there is considerable variation in the mileages that the cars have been driven. The graph shows that the deviation from the median is very large. There is no large increase in the number of cars anywhere between the lowest and highest points, and the interquartile range is nowhere near the centre of the line.
If you buy a 1998 Ford Fiesta, the sample shows that the median mileage is 30,450 miles, although the range of mileages is wide. There are large deviations in the sample, but the distribution is almost symmetrical on either side of the median: the interquartile range is 20,250 miles, the difference from the median to the lower quartile is 10,200 miles, and the difference from the median to the upper quartile is 10,050 miles. These differences are almost identical, so the middle half of the cars is evenly balanced about the median. As with any median, a car chosen at random from the sample is just as likely to be above it as below it.
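The quartiles themselves can be recovered from the figures quoted above, taking 30,450 miles as the median. The short calculation below checks that they give the stated interquartile range. It also works out a quartile skewness measure, which is an addition of this sketch and was not part of the original analysis; a value near 0 means the quartiles are balanced about the median.

```python
# Figures quoted in the text (miles).
median = 30_450
median_minus_lower = 10_200   # distance from the median down to the lower quartile
upper_minus_median = 10_050   # distance from the median up to the upper quartile

lower_quartile = median - median_minus_lower
upper_quartile = median + upper_minus_median
interquartile_range = upper_quartile - lower_quartile

# Quartile skewness: (UQ + LQ - 2 * median) / IQR.
quartile_skewness = (upper_quartile + lower_quartile - 2 * median) / interquartile_range

print("Lower quartile:", lower_quartile)
print("Upper quartile:", upper_quartile)
print("Interquartile range:", interquartile_range)
print("Quartile skewness:", round(quartile_skewness, 3))
```

The lower quartile comes out at 20,250 miles and the upper quartile at 40,500 miles, so the interquartile range is 20,250 miles, which matches the figure given.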
Cumulative Frequency & Box Plot Graphs 2 - Price
If you look at cumulative frequency graph 2, you can see that there is a definitely stronger ‘S-shaped’ curve than in cumulative frequency graph 1. This means that there is less variation in the prices. The graph shows that the deviation from the median is very large, just as large as before. There are large deviations in the sample, but the distribution is not symmetrical on either side of the median, as it was with the mileage distribution. There is a large increase in the number of cars between £3,000 and £3,500. The interquartile range is a lot nearer the centre of the line, so more 1998 Ford Fiestas have a low price.
Justification of Data Representation 4:
I used a scatter graph because it is easy to comprehend and makes it easy to identify a trend. The scatter graph is also simple and effective. I used cumulative frequency graphs and box plots to show the average mileages and prices, together with the ranges and the most likely random result, to show how neatly my hypothesis fits the results and whether it is correct. I used ‘Spearman’s Rank Correlation Coefficient Test Of Price against Mileage’ to show whether the scatter graph really did have a definite negative correlation, and how well it fitted the results as well as my hypothesis. It was the best way to confirm the accuracy of the sample results.
Conclusion 4:
I can conclude that my scatter graph agrees with my earlier (4th) hypothesis. This shows that the more miles the car has been driven, the lower its price. It has no anomalies, which shows that it is more reliable and consistent. The negative correlation confirmed by Spearman’s rank shows us that the correlation is a strong one, and that we can have confidence in the results.
I can say, using the data from both of my cumulative frequency graphs and box plots, that even though there is a large range of mileages in 1998 Ford Fiestas, the prices don’t differ very much. The ‘S-shaped’ curve on the cumulative frequency graph does not keep the same shape from beginning to end; it is not constant. The distribution of prices around the median is not symmetrical, which shows that the greatest variation is above the median rather than below it. The range of prices for cars above the average price is much greater than the range of prices for cars below the average price. I believe that this is because cars that have been driven more miles than average do not have a big decrease in price because of their mileage, whereas cars that have been driven fewer miles than average may also be in better condition than average, so their owners think that they are worth more by different amounts. Therefore, there is a larger range of above-average prices than below-average prices.
We can see this effect on graph 2 in section 1, where the deviation above the line of best fit is greater than the deviation below the line.
Overall Conclusion and Summary:
- From all of my data, I can conclude that cars taken at random cannot be statistically compared, because the variations caused by the make, model, mileage, and date of first registration are generally too great.
- As a vehicle gets older, its price decreases, following a curve rather than a straight line. The biggest decrease in price is when it’s nearly new, and the rate of price decrease gets smaller as the car ages. (See graph 8a)
- For cars of the same year, the more miles a car has been driven, the lower its price becomes (See graph 10)
- As the sample of cars becomes more selective, we can see that all the results then become more accurate and tighter to the best-fit line.
- Using mainly the cumulative frequency graphs (and graph 2), I can conclude from the later results that the cars above the average price have a wider range of prices than the cars below the average price.
Limitations:
- The main reason I only used 20 Ford Fiestas from 1998 was because there were only 21 Ford Fiestas from that year in the ‘Auto Trader’ magazine.
- My sample size was quite small, because
- I didn’t look at the condition of the car (e.g. if a builder used it to carry bricks, or if it was for weddings, the condition would be quite different!)
- I didn’t take into account the extras that could affect the final price of the car, like air conditioning, colour, central locking, etc.
- Some scatter graphs have best fit lines as curves, but I only showed this on one (8a)
- The sample size becomes smaller, and therefore less reliable, as the types of cars get narrowed down