Statistics. The purpose of this coursework is to investigate the relationship between the depreciation of a car's price and the factors that affect it.

Extracts from this document...

Introduction

Statistics: Analysis of used cars database

Introduction:

The purpose of this coursework is to investigate the relationship between the depreciation of a car’s price and the factors that affect it. The factors that I wish to investigate are the age and mileage of a car, as these are the easiest to compare with depreciation. To do this, I shall use random sampling. I shall give a number of hypotheses, each claiming that a factor has a significant effect on depreciation, and I shall attempt to validate them using the data given to me in Excel. I have worked in terms of percentage depreciation so that depreciation can be compared fairly across every car in my sample. Here are the hypotheses and questions:

<< Hypothesis 1 >>

The older the car, the greater the percentage depreciation of the price. I believe this because, as a car travels further, essential parts may wear down and stop the car from working to its optimum standard. After a certain mileage, the car’s fuel costs may begin to increase, as its decreased efficiency uses up more fuel per mile.

The following data values are needed to work out, as a rule, how the depreciation of a car's value changes with more or less mileage:

  • Sale price (no miles attached)
  • Mileage

Mileage will affect the percentage depreciation of the car’s original price, so no other variables should be included in the data needed to prove or refute this hypothesis.
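Percentage depreciation can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual definition (the fall in price as a percentage of the price when new); the function name `percent_depreciation` is my own and is not part of the coursework.

```python
def percent_depreciation(price_new: float, price_used: float) -> float:
    """Fall in price as a percentage of the price when new (assumed definition)."""
    if price_new <= 0:
        raise ValueError("price_new must be positive")
    return (price_new - price_used) / price_new * 100


# Rover 623 GSi from the sample: 24086 when new, 2975 second hand.
rover = percent_depreciation(24086, 2975)
print(f"Rover 623 GSi: {rover:.1f}% depreciation")
```

Working in percentages means that an expensive car and a cheap car can be compared on the same scale.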

Middle

No.  Make         Model          Price when new (£)  Price second hand (£)  Age (years)  Mileage (miles)  Owners
…    …            Beetle         14950               13500                  1            6500             1
75   Rover        623 GSi        24086               2975                   6            96000            2
76   Suzuki       Vitara         10800               2995                   8            50000            2
77   Mercedes     AvantGarde     17915               11750                  2            17000            1
78   Audi         80             17683               3995                   7            103000           2
79   Volkswagen   Polo           9960                7550                   1            5000             1
80   Ford         Escort         13183               3495                   7            43000            2
81   Ford         Mondeo         17780               7995                   4            30000            1
82   Mazda        Pegasus        10420               2495                   7            50000            3
83   Rover        416i           14486               3685                   6            64000            1
84   Vauxhall     Corsa          7840                4976                   4            21000            2
85   Vauxhall     Corsa          7440                3495                   6            55000            2
86   Ford         Fiesta         6590                1664                   10           37000            3
87   Nissan       Primera        -                   2574                   9            49000            2
88   Citroen      Xantia         14065               -                      8            49000            1
89   Peugeot      Graduate       7600                2497                   8            71000            2
90   Peugeot      306            12350               3995                   6            71000            2
91   Fiat         Punto          7518                3769                   4            38000            2
92   Volkswagen   Polo           8710                4693                   5            50000            2
93   Vauxhall     Calibra        18675               6995                   6            63000            2
94   Rover        Metro          5495                1995                   7            52000            2
95   Rolls Royce  Silver Spirit  94651               14735                  9            70000            2
96   Ford         Escort         15405               3995                   5            57000            2
97   Vauxhall     Astra          9795                3191                   6            43000            2
98   Renault      19             11695               2748                   6            52000            2
99   Ford         Escort         9995                2995                   6            64000            2
100  Vauxhall     Vectra         13435               -                      5            52000            2

(- = value missing. Rows 87, 88 and 100 give only one price; the column shown for it is inferred from its size.)

The data are randomly ordered, so that I get a general trend and my results will not be biased. However, 4 pieces of data are missing; to solve this problem I will remove the cars that lack a value. To find out whether there are any outliers, I will use the standard deviation to calculate upper and lower bounds. (The upper quartile is the value below which 75% of the data lie, and the lower quartile is the value below which 25% lie.) The formulae for the bounds in terms of standard deviation are as follows:

Upper Bound = Mean + 2 × Standard Deviation

Lower Bound = Mean - 2 × Standard Deviation
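The two bounds can be sketched in Python as below. This is only an illustration: the function names are hypothetical, and using the population standard deviation (`statistics.pstdev`) is an assumption, since the coursework does not say which form of standard deviation was used in Excel. The prices are the "price when new" values of a few cars from the table.

```python
from statistics import mean, pstdev


def outlier_bounds(values, k=2):
    """Return (lower, upper) = mean -/+ k standard deviations.

    Assumption: population standard deviation; k=2 follows the formulae above.
    """
    m = mean(values)
    sd = pstdev(values)
    return m - k * sd, m + k * sd


def find_outliers(values, k=2):
    """List every value that lies outside the two bounds."""
    lower, upper = outlier_bounds(values, k)
    return [v for v in values if v < lower or v > upper]


# Price when new for a few cars in the sample (including the Rolls Royce).
prices_new = [24086, 10800, 17915, 17683, 9960, 13183, 17780, 10420, 94651]
lower, upper = outlier_bounds(prices_new)
print(f"Lower bound: {lower:.0f}, upper bound: {upper:.0f}")
print("Outliers:", find_outliers(prices_new))
```

Any value reported by `find_outliers` would then be checked by hand before deciding whether to remove it.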

There is a value outside the upper bound in the data for the Porsche: it is approximately £6,000 higher than the upper bound, so it is an outlier. However, the effect is not drastic and will not distort my results. If I identify any huge outliers, I will remove them, though this should not have much of an impact. Because of the 4 missing values, I will need to delete the rows for the cars concerned, as a missing value would distort an average. One example is the Lexus: with no mileage recorded, it is impractical for me to include it in my investigation, because it cannot be used for my 3rd hypothesis. Once I remove all the cars which lack data, I have a remaining sample of 47.
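Removing the incomplete rows could be sketched as follows. The tuple layout and the name `complete_rows` are assumptions for illustration only, and which price is missing for the Xantia is also an assumption; `None` stands for a value that is absent from the data.

```python
# Each car: (make, model, price_new, price_used, age, mileage, owners).
# None marks a missing value (assumed representation).
cars = [
    ("Ford", "Fiesta", 6590, 1664, 10, 37000, 3),
    ("Citroen", "Xantia", 14065, None, 8, 49000, 1),
    ("Fiat", "Punto", 7518, 3769, 4, 38000, 2),
]


def complete_rows(rows):
    """Keep only the cars for which every value is present."""
    return [row for row in rows if all(value is not None for value in row)]


usable = complete_rows(cars)
print(len(usable), "of", len(cars), "cars kept")
```

Dropping the whole row, rather than only the empty cell, keeps every car in the remaining sample usable for every hypothesis.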

I have constructed a table to show the range of the data and to see how the data compare with each other.

Data                Highest value   Lowest value   Range
Price when new      £170,841        £5,495         £165,346
Price second hand   £37,995         £1,995         £36,000
Age                 10 years        1 year         9 years
Mileage             103,000 miles   2,000 miles    101,000 miles
Number of owners    3               1              2
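Each row of this table is the highest value, the lowest value and their difference. A minimal sketch, with a hypothetical helper `summarise`, using the ages of cars 75 to 86 from the sample:

```python
def summarise(values):
    """Return (highest, lowest, range) for one column of data."""
    highest, lowest = max(values), min(values)
    return highest, lowest, highest - lowest


# Ages in years of cars 75 to 86 in the sample.
ages = [6, 8, 2, 7, 1, 7, 4, 7, 6, 4, 6, 10]
highest, lowest, spread = summarise(ages)
print(f"Age: highest {highest}, lowest {lowest}, range {spread}")
```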

Conclusion

If I had the chance to do this again, I would probably investigate some new questions. For example, the third scatter graph, of number of previous owners against mileage, did not give me strong results: it had a weak correlation and no clear trend. In place of this, I would test a new theory: ‘the older a car, the greater the mileage it will have gained.’ This would be an improvement, because comparing age with mileage, instead of the number of owners with mileage, would allow me to see how the mileage built up over time. I have also spotted some flaws in the original graph: for example, some people may have owned a car for only a very short time, selling it shortly after buying it. With age included, I can see how far the car travelled in relation to time, rather than in relation to the number of people who drove it. I believe this hypothesis would give me a stronger correlation and more reliable results.
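The strength of the proposed age against mileage relationship could be measured with a correlation coefficient. The sketch below uses Pearson's product-moment coefficient, which is my assumption (the coursework does not name a measure), and the function name `pearson_r` is hypothetical. The data are cars 75 to 86 from the sample.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length, at least 2 values")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return sxy / sqrt(sxx * syy)


# Age (years) and mileage (miles) of cars 75 to 86 in the sample.
ages = [6, 8, 2, 7, 1, 7, 4, 7, 6, 4, 6, 10]
mileages = [96000, 50000, 17000, 103000, 5000, 43000,
            30000, 50000, 64000, 21000, 55000, 37000]
r = pearson_r(ages, mileages)
print(f"Correlation between age and mileage: {r:.2f}")
```

A value close to 1 would support the new theory, while a value close to 0 would suggest that age alone does not explain mileage.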

Perhaps, if I had more time, I would use multiple regression to see how the different influential factors affect each other, rather than only how each one affects depreciation.

This student written piece of work is one of many that can be found in our AS and A Level Probability & Statistics section.
