Statistics. The purpose of this coursework is to investigate the relationship between the depreciation of a car's price and the factors that affect it.

Introduction

Statistics: Analysis of used cars database

The purpose of this coursework is to investigate the relationship between the depreciation of a car’s price and the factors that affect it. The factors I wish to investigate are the age and mileage of a car, as these are the easiest to compare with depreciation. To do this, I shall use random sampling. I shall state a number of hypotheses, each claiming that a factor has a significant effect on depreciation, and I shall attempt to test them using the data given to me in Excel. I work in terms of percentage depreciation so that depreciation can be compared fairly across every car in my sample, whatever its original price. Here are the hypotheses and questions:

<< Hypothesis 1 >>

The older the car, the greater the percentage depreciation of its price. I believe this because as a car travels further, essential parts may wear down and stop the car from working to its optimum standard. After a certain mileage, the car’s fuel costs may also begin to increase, as its decreased efficiency uses more fuel per mile.

The following data values are needed to calculate how a car's depreciation changes with its mileage:

  • Sale price (no miles attached)
  • Mileage

This hypothesis concerns only the effect of mileage on the percentage depreciation of the car’s original price, so no other variables should be included in the data used to support or refute it.
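The percentage depreciation used throughout can be sketched in a few lines of Python. The formula (price lost divided by price when new, times 100) is the usual definition and is assumed here, since the coursework does not spell it out; the function name is hypothetical, and the two cars are taken from the data table.

```python
def percentage_depreciation(price_new, price_used):
    """Percentage of the original price lost (assumed standard definition)."""
    if price_new <= 0:
        raise ValueError("price_new must be positive")
    return (price_new - price_used) / price_new * 100


# Two cars from the sample: (make, model, price when new, price second hand)
cars = [
    ("Rover", "623 GSi", 24086, 2975),
    ("Volkswagen", "Polo", 9960, 7550),
]

for make, model, new, used in cars:
    loss = percentage_depreciation(new, used)
    print(f"{make} {model}: {loss:.1f}% depreciation")
```

Working in percentages puts an expensive car and a cheap car on the same scale, which is why the raw price difference is not used.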

Middle

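| No. | Make | Model | Price when new (£) | Price second hand (£) | Age (years) | Mileage | Owners |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ... | ... | Beetle | 14950 | 13500 | 1 | 6500 | 1 |
| 75 | Rover | 623 GSi | 24086 | 2975 | 6 | 96000 | 2 |
| 76 | Suzuki | Vitara | 10800 | 2995 | 8 | 50000 | 2 |
| 77 | Mercedes | AvantGarde | 17915 | 11750 | 2 | 17000 | 1 |
| 78 | Audi | 80 | 17683 | 3995 | 7 | 103000 | 2 |
| 79 | Volkswagen | Polo | 9960 | 7550 | 1 | 5000 | 1 |
| 80 | Ford | Escort | 13183 | 3495 | 7 | 43000 | 2 |
| 81 | Ford | Mondeo | 17780 | 7995 | 4 | 30000 | 1 |
| 82 | Mazda | Pegasus | 10420 | 2495 | 7 | 50000 | 3 |
| 83 | Rover | 416i | 14486 | 3685 | 6 | 64000 | 1 |
| 84 | Vauxhall | Corsa | 7840 | 4976 | 4 | 21000 | 2 |
| 85 | Vauxhall | Corsa | 7440 | 3495 | 6 | 55000 | 2 |
| 86 | Ford | Fiesta | 6590 | 1664 | 10 | 37000 | 3 |
| 87 | Nissan | Primera | - | 2574 | 9 | 49000 | 2 |
| 88 | Citroen | Xantia | 14065 | - | 8 | 49000 | 1 |
| 89 | Peugeot | Graduate | 7600 | 2497 | 8 | 71000 | 2 |
| 90 | Peugeot | 306 | 12350 | 3995 | 6 | 71000 | 2 |
| 91 | Fiat | Punto | 7518 | 3769 | 4 | 38000 | 2 |
| 92 | Volkswagen | Polo | 8710 | 4693 | 5 | 50000 | 2 |
| 93 | Vauxhall | Calibra | 18675 | 6995 | 6 | 63000 | 2 |
| 94 | Rover | Metro | 5495 | 1995 | 7 | 52000 | 2 |
| 95 | Rolls Royce | Silver Spirit | 94651 | 14735 | 9 | 70000 | 2 |
| 96 | Ford | Escort | 15405 | 3995 | 5 | 57000 | 2 |
| 97 | Vauxhall | Astra | 9795 | 3191 | 6 | 43000 | 2 |
| 98 | Renault | 19 | 11695 | 2748 | 6 | 52000 | 2 |
| 99 | Ford | Escort | 9995 | 2995 | 6 | 64000 | 2 |
| 100 | Vauxhall | Vectra | 13435 | - | 5 | 52000 | 2 |

A dash marks a missing value. Rows 87, 88 and 100 each give only one price; it is shown in the column that best fits the car's age, which is an inference.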

The data are randomly ordered, so that they show a general trend and my results will not be biased. However, I have 4 pieces of missing data. To solve this problem I will remove the cars with missing data. To find out whether there are any outliers, I will use the standard deviation to find upper and lower bounds. The upper quartile is the value below which 75% of the data lie, and the lower quartile is the value below which 25% lie. The formulae for the bounds in terms of standard deviation are as follows:

Upper Bound = Mean + 2 × Standard Deviation

Lower Bound = Mean - 2 × Standard Deviation
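The two bounds above can be sketched in Python as follows. This is only an illustration: the coursework does not say whether the population or the sample standard deviation was used, so the population version (`statistics.pstdev`) is an assumption, as are the function name and the choice of eleven "price when new" values taken from the data table.

```python
from statistics import mean, pstdev


def outlier_bounds(values, k=2):
    """Return (lower, upper) = mean -/+ k standard deviations.

    Assumption: population standard deviation; the source does not specify.
    """
    centre = mean(values)
    spread = pstdev(values)
    return centre - k * spread, centre + k * spread


# "Price when new" for eleven cars from the data table
prices_new = [7600, 12350, 7518, 8710, 18675, 5495,
              94651, 15405, 9795, 11695, 9995]

lower, upper = outlier_bounds(prices_new)
outliers = [p for p in prices_new if p < lower or p > upper]

print(f"Lower bound: {lower:.0f}, upper bound: {upper:.0f}")
print("Outliers:", outliers)
```

In this small selection only the Rolls Royce price falls outside the bounds. A single very large value also inflates the standard deviation itself, which is one reason a two-standard-deviation rule can miss less extreme outliers.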

One value lies outside the upper bound: the Porsche is approximately £6,000 above it, so it is an outlier. However, the effect is not drastic and will not distort my results. If I identify any extreme outliers I will remove them, though this should not have much impact. Because of the 4 pieces of missing data, I will need to delete the rows for the cars affected, as a missing value would distort an average. One example is the Lexus: with no mileage recorded, I cannot include it in my investigation, because it cannot be used for my 3rd hypothesis. Once I remove all the cars which lack data, I have a remaining sample of 47.
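Removing incomplete rows can be sketched like this. The field names and the helper are hypothetical, and `None` stands for a blank cell. For the Nissan and the Citroen the data table gives only one price, so which price is missing is an assumption made for this example.

```python
# Three cars from the data table; None marks a missing value.
cars = [
    {"make": "Ford", "model": "Fiesta", "price_new": 6590,
     "price_used": 1664, "age": 10, "mileage": 37000, "owners": 3},
    # Assumption: the surviving figure is the second-hand price.
    {"make": "Nissan", "model": "Primera", "price_new": None,
     "price_used": 2574, "age": 9, "mileage": 49000, "owners": 2},
    # Assumption: the surviving figure is the price when new.
    {"make": "Citroen", "model": "Xantia", "price_new": 14065,
     "price_used": None, "age": 8, "mileage": 49000, "owners": 1},
]


def is_complete(car):
    """True when no field of the row is missing."""
    return all(value is not None for value in car.values())


complete_cars = [car for car in cars if is_complete(car)]
removed = [car for car in cars if not is_complete(car)]

print("Kept:", [f"{c['make']} {c['model']}" for c in complete_cars])
print("Removed:", [f"{c['make']} {c['model']}" for c in removed])
```

Dropping the whole row, rather than only the blank cell, keeps every remaining car usable for every hypothesis.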

I have constructed a table to show the range of each variable and to see how the variables compare with one another.

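| Data | Highest Value | Lowest Value | Range |
| --- | --- | --- | --- |
| Price When New | £170,841 | £5,495 | £165,346 |
| Price Second Hand | £37,995 | £1,995 | £36,000 |
| Age | 10 years | 1 year | 9 years |
| Mileage | 103,000 miles | 2,000 miles | 101,000 miles |
| Number of Owners | 3 | 1 | 2 |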
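Each range in the table is the highest value minus the lowest. A short Python sketch, using the highest and lowest values quoted in the table (the function and variable names are illustrative):

```python
def data_range(values):
    """Range of a data set: largest value minus smallest value."""
    return max(values) - min(values)


# (highest, lowest) pairs as quoted in the table
extremes = {
    "Price when new (£)": (170841, 5495),
    "Price second hand (£)": (37995, 1995),
    "Age (years)": (10, 1),
    "Mileage (miles)": (103000, 2000),
    "Number of owners": (3, 1),
}

ranges = {name: data_range(pair) for name, pair in extremes.items()}

for name, value in ranges.items():
    print(f"{name}: range = {value}")
```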

Conclusion

If I had the chance to do the investigation again, I would probably explore some new questions. For example, the third scatter graph, of number of previous owners against mileage, did not give me strong results: it showed only a weak correlation and no clear trend. In its place, I would test a new hypothesis: ‘the older a car, the greater the mileage it will have gained.’ This would be an improvement: instead of comparing the number of owners with the mileage, using age would let me see how the mileage built up over time. The original graph also had a flaw: some people may have owned a car for only a very short time, selling it shortly after buying it, so the number of owners says little about how far the car has been driven. With age included, I can see how far the car travelled in relation to time, rather than in relation to the number of people who drove it. I believe this hypothesis would give me a stronger correlation and more reliable results.
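The strength of such a relationship could be measured with a correlation coefficient. The sketch below computes Pearson's r for age against mileage using eleven cars from the data table. Pearson's r is an assumed choice of measure, the function name is hypothetical, and a result from eleven cars says nothing definite about the full sample of 47.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient of two lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a list is constant")
    return sxy / sqrt(sxx * syy)


# Age (years) and mileage for eleven cars from the data table
ages = [8, 6, 4, 5, 6, 7, 9, 5, 6, 6, 6]
mileages = [71000, 71000, 38000, 50000, 63000, 52000,
            70000, 57000, 43000, 52000, 64000]

r = pearson_r(ages, mileages)
print(f"Correlation between age and mileage: r = {r:.2f}")
```

A value of r near +1 would support the new hypothesis, a value near 0 would suggest little linear relationship, and a negative value would contradict it.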

Perhaps if I had more time, I would use multiple regression to see how the different influential factors affect each other, rather than how they affect depreciation.

This student written piece of work is one of many that can be found in our AS and A Level Probability & Statistics section.
