# Used Cars - find which factors will influence the price of a second hand car and in what way


Maths Statistics – Used Cars

Pilot Study

Aim

The aim of the coursework is to find which factors influence the price of a second-hand car, and in what way. I believe that the price at which second-hand cars are sold depends on several factors, and that some factors have a much larger effect on the price than others. In my investigation I am going to choose the two most popular manufacturers (that is, the two makes with the most cars in the data provided by Edexcel), identified from a tally chart of all the cars. I am doing this because the make is the first thing people look at when they are buying a car, and I believe it also affects the second-hand price. Different makes have different prices and depreciate at different rates each year, so some cars hold their value because of their make. In addition, certain cars have a very good reputation for reliability while others do not. Some cars also have a higher social status than others; for example, many people would prefer a Mercedes over a Ford.

Before developing a hypothesis, I first explored the data, information and details needed for my statistics coursework. Before this, I was given a candidate sheet listing factors that I might like to consider. The factors are:

- Price
- Age
- Mileage
- Cost when new
- Engine size
- Colour
- Make
- Fuel type
- Estate/saloon/hatchback

I then began investigating whether second-hand prices change with the different factors of a car. Whether these factors affect the depreciation of a car from its new price to its used price is another aspect to investigate. I will have to reflect on the following questions, the first being the main one:

- What affects the price of a Used Car?
- What is my population?
- What data do I need to collect?
- What sampling method am I going to use?
- How large a sample do I need?
- What method of data collection am I going to use?
- How am I going to record the data I collect?

As secondary data was provided on the Edexcel website, I used that data and then began with a simple hypothesis.

Factors Elaborated / Example Hypotheses

Age: The older the car, the cheaper it is; however, this may not apply as strongly when comparing prestigious cars to standard cars.

Mileage: The lower the mileage, the higher the price of the car.

Cost when new: A standard car may lose a large part of its new price once it is used, whereas premium cars may depreciate differently.

Engine Size: As engines differ in size or capacity, I consider that the smaller the engine, the less expensive the car will be, as less petrol would be needed.

Colour: Colour preferences differ between regions. In Middle Eastern countries a white car is said to be quite costly, whereas in European countries white is said to be rather cheap. To sum up, I believe cars in rich colours are more expensive.

Make: The make of a car is one of the most important factors when considering the depreciation of a used car. As I have previously stated, standard cars are affected to a great extent by other factors, whereas premium or prestigious cars are hardly affected by some factors. Therefore, the more prestigious the make of a used car, the higher the price.

All the variables above are ratio variables (they are measured as numbers) except the make and the colour of the car, which are nominal variables.

Hypothesis

First, I will try to find any relationships between the current price of the car and the other variables, such as age, mileage, engine size and make. After finding the relationships, I will process the data so that I can eliminate weak relationships and find stronger relationships between specific variables, and eventually I will try to find a general relationship between them. From this, I will draw conclusions about how the price decreases and what affects this.

As secondary data was easy to collect and access, I began with a simple hypothesis for my pilot study: ‘the higher the mileage, the lower the price of the used car’. The drawback of secondary data is that it may not be exactly correct; data may also be missing or out of date. I decided not to use primary data because collecting it can be very time consuming, although its advantage is that the data is accurate most of the time. Sometimes the entire population is small enough to be included in the study. However, since the data provided was too large to use in full, I will have to sample it. I will carefully choose a sample that can be used to represent the population: it needs to be large enough to represent the population but small enough to be manageable, and it should reflect the characteristics of the population from which it is drawn. I took a sample of approximately 15% of my data, which is 30 cars. We use samples because it would take too long to investigate every piece of data in the database (a census), so we investigate only a portion of the population. It is important to clearly define the target population.

Sampling Methods

Sampling methods are classified as either probability or non-probability. In probability samples, each member of the population has a known non-zero probability of being selected. Probability methods include random sampling, systematic sampling, and stratified sampling. In non-probability sampling, members are selected from the population in some non-random manner. These include convenience sampling and quota sampling. The advantage of probability sampling is that sampling error can be calculated. Sampling error is the degree to which a sample might differ from the population.

Random sampling is the purest form of probability sampling. Each member of the population has an equal and known chance of being selected. When there are very large populations, it is often difficult or impossible to identify every member of the population, so data may become biased.

Systematic sampling is often used instead of random sampling. It is also called an ‘Nth’ name selection technique. After the required sample size has been calculated, every ‘Nth’ record is selected from a list of population members. As long as the list does not contain any hidden order, this sampling method is as good as the random sampling method. Its only advantage over the random sampling technique is simplicity. Systematic sampling is frequently used to select a specified number of records from a computer file (e.g. the Edexcel database).

Stratified sampling is a commonly used probability method that can be superior to random sampling because it reduces sampling error. The researcher first identifies the relevant strata and their actual representation in the population. Random sampling is then used to select a sufficient number of subjects from each stratum. Stratified sampling is often used when one or more of the strata in the population have a low incidence relative to the other strata.

Convenience sampling is used in exploratory research where the researcher is interested in getting an inexpensive approximation of the truth. As the name implies, the sample is selected because its members are convenient. This non-probability method is often used during preliminary research efforts to get a rough estimate of the results, without incurring the cost or time required to select a random sample.

Quota sampling is the non-probability equivalent of stratified sampling. As in stratified sampling, the researcher first identifies the strata and their proportions as they are represented in the population. Then convenience or judgment sampling is used to select the required number of subjects from each stratum. This differs from stratified sampling, where the strata are filled by random sampling.
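The three probability methods described above can be sketched in Python. This is only an illustration under stated assumptions: the population is taken to be spreadsheet rows 2 to 205 (204 cars), and the two strata `make_A` and `make_B` are a made-up equal split, not the real makes in the Edexcel data.

```python
import random

def simple_random_sample(population, k, rng):
    """Pick k distinct members, each with the same chance of selection."""
    return rng.sample(population, k)

def systematic_sample(population, k, rng):
    """Pick every Nth member after a random start inside the first interval."""
    step = len(population) // k          # the 'N' in 'every Nth record'
    start = rng.randrange(step)
    return [population[start + i * step] for i in range(k)]

def stratified_sample(strata, k, rng):
    """Randomly sample each stratum in proportion to its share of the population."""
    total = sum(len(members) for members in strata.values())
    chosen = {}
    for name, members in strata.items():
        quota = round(k * len(members) / total)
        chosen[name] = rng.sample(members, quota)
    return chosen

rng = random.Random(1)                   # fixed seed so the sketch is repeatable
rows = list(range(2, 206))               # assumed: 204 cars in rows 2..205
strata = {"make_A": rows[:102], "make_B": rows[102:]}  # hypothetical strata

random_rows = simple_random_sample(rows, 30, rng)
systematic_rows = systematic_sample(rows, 30, rng)
stratified_rows = stratified_sample(strata, 30, rng)
```

With 204 rows and a sample of 30, the systematic method takes every 6th row, and the stratified method takes 15 rows from each of the two equal strata.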

As shown above, I have examined each sampling method in detail. Since random sampling gives every car an equal chance of being selected, I decided to use it for my pilot study. As the pilot study works with only a small part of my data, it will be easy to avoid incorrect conclusions. For my main study I will not use this sampling method, since I will be working with a much larger amount of data. Quota and convenience sampling are non-probability methods, so they may distort my final conclusions and invalidate my hypothesis. That leaves a choice between systematic and stratified sampling. As I will be focusing on a more complex hypothesis for my main study, I decided to use stratified sampling because it reduces the sampling error.

Source: http://en.wikipedia.org/wiki/Sampling_(statistics)

Method

The data I had at first was only a sample of the used car population in the country, taken from recent adverts and reputable guides to the motor trade. For the pilot study, I deleted the unnecessary factors and left the ones I was going to work on. For my sample I used the formula =INT(RAND()*204)+2.

I clicked on the cell again and moved the cursor to the bottom right of the cell until it changed to a black cross. I dragged down until I reached the bottom of the data.
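As a rough Python equivalent of the spreadsheet formula, the sketch below draws whole numbers from 2 to 205 in the same way as =INT(RAND()*204)+2. The function names and the seed are hypothetical, and the step that skips repeated rows is an added convenience rather than something the spreadsheet formula does by itself.

```python
import random

N_CARS = 204  # taken from the formula =INT(RAND()*204)+2

def random_row(rng):
    """Mirror =INT(RAND()*204)+2: a whole number from 2 to 205."""
    return int(rng.random() * N_CARS) + 2

def draw_rows(k, rng):
    """Keep drawing until k distinct rows are collected, skipping repeats."""
    rows = []
    while len(rows) < k:
        row = random_row(rng)
        if row not in rows:
            rows.append(row)
    return sorted(rows)

sample_rows = draw_rows(30, random.Random(2024))  # arbitrary seed
print(sample_rows)
```

Skipping repeats at the time of drawing avoids having to replace duplicated car numbers by hand afterwards.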

Here is the data with the random sample numbers on the side.

Selected data:

As you can see, I have selected a random sample from my database. I chose the first 30 numbers that my random sample formula gave me, and noted and replaced any errors among them.

Here is the original random sample of numbers and their data that I had (in ascending order):

Random Numbers

2, 2, 4, 11, 12, 15, 16, 18, 23, 24, 34, 43, 70, 71, 75, 79, 92, 100, 107, 114, 120, 120, 140, 146, 148, 156, 166, 169, 177, 190

Random Data

| Car no. | Make | Model | Price Used | Mileage |
| --- | --- | --- | --- | --- |
| 2 | Mercedes | E-Class 2000 | 11395 | 12000 |
| 2 | Mercedes | E-Class 2000 | 11395 | 12000 |
| 4 | Rover | 25 | 2970 | 50000 |
| 11 | Nissan | Micra | 860 | 28000 |
| 12 | Fiat | Bravo | 1885 | 51000 |
| 15 | Mercedes | C-Class 93-01 | Missing | 90000 |
| 16 | Ford | Ka | 2090 | 10000 |
| 18 | Rover | Mini | 1190 | 12000 |
| 23 | Honda | Prelude | 1810 | 6000 |
| 24 | BMW | 3-Series 91-99 | 12825 | 68000 |
| 34 | Mazda | 121 | 1620 | 55000 |
| 43 | Mazda | Demio | 1920 | 71000 |
| 70 | Daihatsu | Sirion | 4915 | 17500 |
| 71 | Mercedes | Cab E-Class | 10920 | 9500 |
| 75 | Ford | Mondeo 96-00 | 3335 | 22000 |
| 79 | Subaru | Forester | 4550 | 50000 |
| 92 | Mitsubishi | Carisma | 1385 | 71000 |
| 100 | Nissan | 100 NX | 1005 | 43000 |
| 107 | Mercedes | SL-Class 89-02 | 19260 | 12000 |
| 114 | Fiat | Bravo | 1125 | 90000 |
| 120 | Nissan | Almera | 9075 | 90000 |
| 120 | Nissan | Almera | 9075 | 90000 |
| 140 | Fiat | Stilo | 4900 | 60000 |
| 146 | Toyota | Previa | 10700 | 12000 |
| 148 | Chrysler | GrandVoyager | 6690 | 15000 |
| 156 | Land Rover | Range Rover | 7735 | 12000 |
| 166 | Mercedes | M-Class | 25810 | 19000 |
| 169 | Ford | Explorer | 4715 | 10000 |
| 177 | Mercedes | A-Class | 12320 | 80000 |
| 190 | Ford | Escort | 1225 | 10000 |

As you can see, I have marked the data that the random sample repeated (cars 2 and 120). I have also highlighted a very significant error which may affect my results: this record had an empty cell, with no price used. This is the problem with secondary data; some mistakes may occur in the data, and the affected records need to be left out. Here are the details of car no. 15.

15 | Mercedes | C-Class 93-01 | Missing | 90000 |

As mistakes cannot be included in the sample, because they would cause an error and an anomaly in my graph, I decided to remove the car numbers that had errors. I replaced them by choosing new random numbers myself, using a calculator.

The scientific calculator has a random number function, Ran#, which can be used to generate the random numbers. The command generates a random number larger than zero and less than one. Over many uses, the random numbers produced are spread evenly over the whole interval from zero to one.

The following calculator command will be used to generate the random numbers I need for my sample:

- Enter 204Ran# to generate a random number between 0 and 204


$$s = \sqrt{\frac{1022834768}{30}}$$

$$s = 5839.048918$$

$$s = 5839.05 \text{ (2 d.p.)}$$

I also worked out the standard deviation for the mileage. Here is what I did to get it:

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$

$$s = \sqrt{\frac{21813883000}{30}}$$

$$s = 26965.33763$$

$$s = 26965.34 \text{ (2 d.p.)}$$
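The two standard deviations can be checked with a short Python sketch. It assumes that the figures were worked out from the price used and mileage columns of the 30-car table given further on in this study, and that the population form of the formula (dividing by n) was used. Both assumptions are consistent with the totals quoted above.

```python
from statistics import pstdev

# Price used and mileage for the 30 cars, in table order (car no. 2 to 190).
price_used = [11395, 2970, 860, 1885, 2090, 1190, 1155, 1810, 12825, 3585,
              1620, 1920, 4915, 10920, 3335, 4550, 1385, 1005, 19260, 1125,
              6145, 9075, 4900, 10700, 6690, 7735, 25810, 4715, 12320, 1225]
mileage = [12000, 50000, 28000, 51000, 10000, 12000, 10000, 6000, 68000, 20000,
           55000, 71000, 17500, 9500, 22000, 50000, 71000, 43000, 12000, 90000,
           19900, 90000, 60000, 12000, 15000, 12000, 19000, 10000, 80000, 10000]

# pstdev divides by n (population standard deviation), matching the formula above.
price_sd = pstdev(price_used)
mileage_sd = pstdev(mileage)

print(round(price_sd, 2))    # 5839.05
print(round(mileage_sd, 2))  # 26965.34
```

Using `statistics.stdev` instead would divide by n - 1 and give slightly larger values.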

Since the relationship between mileage and price used appeared to be weak, and the standard deviations showed that the data was widely spread, I have decided to change my variables from mileage and price used to age and price used. I expect that the older the car, the lower the price.

I have adjusted my data and have added the ages of all the chosen cars. Here is the data that I am going to use:

| Car no. | Make | Model | Price Used | Mileage | Age |
| --- | --- | --- | --- | --- | --- |
| 2 | Mercedes | E-Class 2000 | 11395 | 12000 | 7 |
| 4 | Rover | 25 | 2970 | 50000 | 8 |
| 11 | Nissan | Micra | 860 | 28000 | 12 |
| 12 | Fiat | Bravo | 1885 | 51000 | 9 |
| 16 | Ford | Ka | 2090 | 10000 | 10 |
| 18 | Rover | Mini | 1190 | 12000 | 12 |
| 19 | Volvo | 440 | 1155 | 10000 | 12 |
| 23 | Honda | Prelude | 1810 | 6000 | 12 |
| 24 | BMW | 3-Series 91-99 | 12825 | 68000 | 6 |
| 31 | Skoda | Fabia | 3585 | 20000 | 7 |
| 34 | Mazda | 121 | 1620 | 55000 | 10 |
| 43 | Mazda | Demio | 1920 | 71000 | 9 |
| 70 | Daihatsu | Sirion | 4915 | 17500 | 5 |
| 71 | Mercedes | Cab E-Class | 10920 | 9500 | 10 |
| 75 | Ford | Mondeo 96-00 | 3335 | 22000 | 8 |
| 79 | Subaru | Forester | 4550 | 50000 | 10 |
| 92 | Mitsubishi | Carisma | 1385 | 71000 | 12 |
| 100 | Nissan | 100 NX | 1005 | 43000 | 12 |
| 107 | Mercedes | SL-Class 89-02 | 19260 | 12000 | 9 |
| 114 | Fiat | Bravo | 1125 | 90000 | 12 |
| 116 | BMW | 5-Series 1996 | 6145 | 19900 | 10 |
| 120 | Nissan | Almera | 9075 | 90000 | 3 |
| 140 | Fiat | Stilo | 4900 | 60000 | 5 |
| 146 | Toyota | Previa | 10700 | 12000 | 7 |
| 148 | Chrysler | Grand Voyager | 6690 | 15000 | 10 |
| 156 | Land Rover | Range Rover | 7735 | 12000 | 12 |
| 166 | Mercedes | M-Class | 25810 | 19000 | 5 |
| 169 | Ford | Explorer | 4715 | 10000 | 10 |
| 177 | Mercedes | A-Class | 12320 | 80000 | 3 |
| 190 | Ford | Escort | 1225 | 10000 | 11 |

Below there is a scatter graph showing the results that I got with the data I have shown above.

The line of best fit shows a moderate negative correlation in the data: as the age of a car increases, its value decreases. This is shown in this section of the graph.

This supports my assumption; however, the anomalies may have affected the results in my graph.

I have worked out the correlation coefficient:

r = -0.555694

r² = 0.3087958216

= 30.87958216%

The correlation coefficient (product moment correlation) of the data for age and price used is -0.556. According to the scale, this indicates a moderate negative correlation. The value of r² shows that about 31% of the variation in price used is accounted for by age.
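The product moment correlation coefficient can be reproduced from the age and price used columns of the table above. The sketch below is one way to do this in Python; the function name `pearson_r` is only illustrative.

```python
from math import sqrt

# Age and price used for the 30 cars, in table order (car no. 2 to 190).
age = [7, 8, 12, 9, 10, 12, 12, 12, 6, 7, 10, 9, 5, 10, 8,
       10, 12, 12, 9, 12, 10, 3, 5, 7, 10, 12, 5, 10, 3, 11]
price_used = [11395, 2970, 860, 1885, 2090, 1190, 1155, 1810, 12825, 3585,
              1620, 1920, 4915, 10920, 3335, 4550, 1385, 1005, 19260, 1125,
              6145, 9075, 4900, 10700, 6690, 7735, 25810, 4715, 12320, 1225]

def pearson_r(xs, ys):
    """Product moment correlation: Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r = pearson_r(age, price_used)
r_squared = r ** 2  # always between 0 and 1, so never negative

print(round(r, 4))          # -0.5557
print(round(r_squared, 4))  # 0.3088
```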

I have expanded the working out of the correlation by using Spearman’s rank.
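The Spearman's rank working itself is not reproduced in this extract. As a hedged sketch of how it could be carried out on the same table, the code below ranks the ages and prices (giving tied values the mean of their ranks, since many cars share the same age) and then applies the product moment formula to the ranks. The function names are illustrative, and this is not necessarily the working used in the original study.

```python
from math import sqrt

age = [7, 8, 12, 9, 10, 12, 12, 12, 6, 7, 10, 9, 5, 10, 8,
       10, 12, 12, 9, 12, 10, 3, 5, 7, 10, 12, 5, 10, 3, 11]
price_used = [11395, 2970, 860, 1885, 2090, 1190, 1155, 1810, 12825, 3585,
              1620, 1920, 4915, 10920, 3335, 4550, 1385, 1005, 19260, 1125,
              6145, 9075, 4900, 10700, 6690, 7735, 25810, 4715, 12320, 1225]

def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Product moment correlation: Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def spearman_rho(xs, ys):
    """Spearman's rank correlation as the product moment correlation of ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

rho = spearman_rho(age, price_used)
print(round(rho, 3))
```

A negative value of `rho` would agree with the direction of the product moment result, that older cars tend to have lower prices.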


Evaluation

To evaluate, the data I used was only a portion of the population that exists. It was also secondary data, which has both negative and positive aspects. If I had more time, I would have collected primary data to supplement my secondary data, which would make my results more reliable than they are. I would also increase the number of cars in my main study to about 100-150 to make my results more accurate, and include other makes of car, such as Renault and Jaguar. I would have to make sure that I had equal numbers of each make in my investigation so that it does not become biased.

The part I found easy was collecting the data, since it was already provided by Edexcel. I found it difficult to attempt to collect primary data, since it is very time consuming. Another difficult aspect of the study was sorting the data and choosing a sample without biasing the results. This led on to analysing the results, which was quite hard, since the formulas and methods were complicated to use for some data.

Next time, I believe that I should consider the factor of ‘colour’ because:

Colour: Colour preferences differ between regions. In Middle Eastern countries a white car is said to be quite costly, whereas in European countries white is said to be rather cheap. To sum up, I believe cars in rich colours are more expensive.

This would be a great thing to investigate, since it is interesting and I would like to see the results that I obtain. Overall, I feel that my results support my hypotheses, and I am not surprised by the result I obtained.
