Sampling Method
Now that I’ve decided how big my sample needs to be, I have to decide how I will sample. There are two main methods, random and stratified. Eventually I want to try both, but for now I will use a random sample. To do this I will use the random number function on my calculator.
I pressed the random number button and a number with 3 decimal places was displayed. I then took the first 2 digits after the decimal point and used them as the car number. If a number was repeated I ignored it and chose again.
For example:
Random produced number 0.311 so I chose car number 31
Random produced number 0.981 so I chose car number 98
Using this sampling method I chose my first group of cars. They ended up being numbers:
1 2 4 5 7 8 15 16 17 18 21 22 24 26 27 31 32 35 37 38 44 51 53 63 65 67 68 70 71 73 76 77 83 86 91 95 96 97 98 98
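The calculator method described above can be imitated in a short Python sketch. The function names, the seed, and the decision to skip a display starting with 00 are assumptions made for illustration only; the write-up does not say what was done if the first two digits were 00.

```python
import random


def car_from_display(display):
    """Take the first two digits after the decimal point, e.g. "0.311" -> 31."""
    return int(display[2:4])


def calculator_sample(sample_size, seed=None):
    """Pick distinct car numbers the way the calculator method does."""
    rng = random.Random(seed)
    chosen = []
    while len(chosen) < sample_size:
        # Imitate a 3 decimal place calculator display such as 0.311
        display = f"0.{rng.randrange(1000):03d}"
        car_number = car_from_display(display)
        # Assumption: 00 matches no car, so it is skipped.
        # Repeated numbers are ignored and another number is drawn.
        if car_number == 0 or car_number in chosen:
            continue
        chosen.append(car_number)
    return sorted(chosen)


sample = calculator_sample(40, seed=1)
print(sample)
```

Because repeats are thrown away, the result always contains 40 different car numbers, which is the same as sampling without replacement.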
From these car numbers I made a table with all the data I needed on the cars above, such as used price, MPG and mileage. (See Spreadsheet 2)
From this data I compiled four scatter graphs on:
- Age against used price
- MPG against used price
- Mileage against used price
- Insurance group against used price
I used scatter graphs as they display the relationship between two sets of data, which is why used price is in every one. A scatter graph also lets me put in a line of best fit, which gives me the ability to predict future data.
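As a rough illustration of how a line of best fit and the strength of a correlation can be worked out, here is a minimal Python sketch using the least-squares method and the product-moment correlation coefficient. The ages and prices are made up so that they lie exactly on a straight line; they are not taken from the spreadsheets, and the function name is hypothetical.

```python
from math import sqrt


def line_of_best_fit(xs, ys):
    """Return (gradient, intercept, r) for the least-squares line y = gradient * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x
    # r is close to -1 or +1 for a strong correlation and close to 0 for none
    r = sxy / sqrt(sxx * syy)
    return gradient, intercept, r


# Hypothetical data: age in years and used price in pounds
ages = [1, 2, 3, 4, 5, 6]
prices = [9000, 8000, 7000, 6000, 5000, 4000]

gradient, intercept, r = line_of_best_fit(ages, prices)
predicted_price = gradient * 7 + intercept  # prediction for a 7 year old car
print(gradient, intercept, r, predicted_price)
```

Real data would be scattered around the line, so r would fall somewhere between -1 and 1 rather than exactly on either end.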
Predictions
- For age I believe there will be a very strong negative correlation, as the older the car gets the lower the price.
- For MPG I believe there will be a weak positive correlation, as the higher the MPG the higher the price, but I don’t believe it affects the price that much.
- For mileage I believe there will be a very strong negative correlation, as when the mileage increases the price will decrease.
- For insurance group I believe there will be a weak negative correlation, as the higher the insurance group the lower the price, but not by much.
As you can see from my predictions, I believe that mileage will affect used price the most, while insurance group will affect it the least of the ones I chose.
See scatter graphs 1, 2, 3 and 4.
Conclusions of Random Sampling.
As you can see, some of my predictions were right while others weren’t.
- Age had a big effect on price and had quite a strong negative correlation, as I predicted.
- MPG had a very strong negative correlation, showing it did affect price a lot. I predicted this wrongly, as I expected a weak positive correlation.
- Mileage had quite a strong negative correlation, but not a very strong one as I predicted. It shows mileage affects price, but only to a degree. By the shape of the graph it appears a curved line of best fit would suit it better, but I shall leave it at that.
- Insurance group had a positive correlation, and quite a strong one at that, showing that as the insurance group went up so did used price.
Observations
As you can see, on all of the graphs there are pieces of data that are way off the lines of best fit and away from the rest of the data. I purposely kept this data in as it gives me a valid reason to try another sampling method. These pieces of data can be called anomalies as they differ from the rest of the data. I could cut them out to make the sample fairer, but then it wouldn’t be a true random sample.
With these observations made I can say a few things about what affects used car prices, but now I shall move on to a stratified sample and see if the data is more reliable.
Stratified
A stratified sample is one where all the data has been put into groups and then the same proportion is picked out of each group. For my stratified sample I have ordered the cars by mileage, then grouped the mileages and picked 40% from each group. This should give me 40 cars again so I can compare the random and stratified samples evenly.
The mileage groups were:
- 0-5,000
- 5,000-10,000
- 10,000-20,000
- 20,000-40,000
- 40,000-70,000
- 70,000-110,000
With these sorted I took 40% at random from each group and ended up with this sample. (See Spreadsheet 3) I ensured it was random by drawing numbers out of a hat, with each number corresponding to a car. I noted the number drawn and placed it back in, so that the chance of drawing any single number was equal each time and didn’t change. If I drew the same one twice I simply ignored it, placed it back in and redrew.
If actually counted, there are 41 cars rather than 40. As 40 and 41 are very close, I simply left the sample as it was rather than tamper with the results, which could make them biased.
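The stratified method above, and one way a sample of 41 rather than 40 can come about, can be sketched in Python. The group sizes below are invented purely for illustration (the real sizes are in the spreadsheets and are not given here), and the function name is hypothetical. Taking 40% of each group and rounding to the nearest whole car does not always add up to exactly 40% of the total.

```python
import random


def stratified_sample(groups, fraction, seed=None):
    """Pick the same fraction at random from every group, without repeats."""
    rng = random.Random(seed)
    picked = {}
    for label, cars in groups.items():
        take = round(fraction * len(cars))  # whole number of cars from this group
        # rng.sample never picks the same car twice, like redrawing a repeated number
        picked[label] = sorted(rng.sample(cars, take))
    return picked


# Hypothetical group sizes adding up to 100 cars (NOT the real data)
sizes = {
    "0-5,000": 11,
    "5,000-10,000": 14,
    "10,000-20,000": 19,
    "20,000-40,000": 24,
    "40,000-70,000": 19,
    "70,000-110,000": 13,
}

# Give each group its own run of made-up car numbers
groups = {}
next_car = 1
for label, size in sizes.items():
    groups[label] = list(range(next_car, next_car + size))
    next_car += size

picked = stratified_sample(groups, 0.4, seed=1)
total = sum(len(cars) for cars in picked.values())
print(total)  # 41 with these made-up sizes, because of rounding in each group
```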
From this data I then compiled scatter graphs, just as before.
Predictions
- Age: I believe there will be a strong negative correlation as there was before, but as this is supposedly a more reliable sample it should be more evident.
- MPG: I believe there will be a strong negative correlation as there was before, but it should be more evident due to the sample being more reliable.
- Mileage: there should be a strong negative correlation, for the reasons above.
- Insurance group: there should be a strong positive correlation, for the reasons above.
See graphs 5, 6, 7 and 8.
Conclusions on Stratified Sampling.
As you can see, some very strange results came up.
- Age showed a very strong negative correlation, as I said there would be.
- MPG showed a strong negative correlation as well, as I said.
- Mileage proved very weird. The data was basically in two groups, one showing high mileage and low price and the other showing low mileage and low price. From this I can deduce that mileage is a limiting factor of used price.
- Insurance group showed no correlation, with data all over the place. This suggests that perhaps the result from my random sample was a mishap and in fact insurance group has little or no relationship with used price.
Observations
Correlations were generally a lot tighter, showing that stratified sampling reduces anomalous data but can produce strange results, such as the one for mileage. This result may not be wrong, however. It may in fact be right and the random results wrong. To find this out I shall become more specific and look at another way of representing the data.
Histograms
After some thought, I decided that a great way of comparing two sets of data visually would be a histogram.
To make a histogram I had to group the mileages. This was easy, as I took the groups I used for stratifying the data.
The mileage groups were:
- 0-5,000
- 5,000-10,000
- 10,000-20,000
- 20,000-40,000
- 40,000-70,000
- 70,000-110,000
I then made a tally chart with the groups and both random and stratified data.
Random
Stratified
Then to construct a histogram I had to work out the frequency density to go on the vertical axis. This is worked out by:
Frequency Density = Frequency / Group Width
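As a worked illustration of the formula, the sketch below calculates the frequency density for each mileage group in Python. The frequencies are made up for the example and are not the tallies from either sample. Each group is assumed to include its lower boundary but not its upper one, so that a mileage of exactly 5,000 is only counted once.

```python
# Mileage groups as (lower, upper) boundaries in miles
groups = [
    (0, 5000),
    (5000, 10000),
    (10000, 20000),
    (20000, 40000),
    (40000, 70000),
    (70000, 110000),
]

# Hypothetical frequencies, NOT the real tallies
frequencies = [4, 6, 8, 10, 8, 5]


def frequency_densities(groups, frequencies):
    """Frequency density = frequency / group width, for each group."""
    return [f / (upper - lower) for (lower, upper), f in zip(groups, frequencies)]


densities = frequency_densities(groups, frequencies)
for (lower, upper), density in zip(groups, densities):
    # Shown per 1,000 miles to avoid very small decimals
    print(f"{lower}-{upper}: {density * 1000:.3f} cars per 1,000 miles")
```

Because the groups have different widths, the area of each bar (frequency density multiplied by group width) represents the frequency, not the height.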
So I ended up with this.
Random
Stratified
Predictions
- I predict that the random histogram will have a much more erratic distribution of car mileage, while the stratified distribution will be more of a bell shape, displaying the majority in the mid range with few or no extreme values.
I then proceeded to draw the graphs.
See Graphs 9, 10 and 11
Results
- As seen on the two histograms, there are some slight differences. The spread of the random sample is a little more erratic and uneven than the more bell-shaped graph of the stratified data. From this you could deduce that the stratified sample is a more reliable source of data than the random sample.
- From the individual graphs you can see that the majority of the cars are in the 20,000 to 60,000 miles range in both the random and stratified samples. Standard deviation could perhaps tell me which sample is more accurate, so that could be an extension to the work done.
- I mentioned a bell-shaped graph before. By this I mean one which slowly rises to a peak then falls away, with the majority of the data in the middle and little or no data in the highest and lowest areas.
However, from the histograms I did not find any reason for the weirdly shaped stratified mileage scatter graph. Further investigation into this could prove interesting.
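The standard deviation extension mentioned in the results could start from the grouped data. The Python sketch below estimates the mean and standard deviation of the mileages by treating every car as if it sat at the midpoint of its group, which is only an approximation. The frequencies are hypothetical and the function name is an assumption for illustration.

```python
from math import sqrt


def grouped_mean_and_sd(groups, frequencies):
    """Estimate the mean and standard deviation from grouped data using midpoints."""
    total = sum(frequencies)
    midpoints = [(lower + upper) / 2 for lower, upper in groups]
    mean = sum(f * m for f, m in zip(frequencies, midpoints)) / total
    variance = sum(f * (m - mean) ** 2 for f, m in zip(frequencies, midpoints)) / total
    return mean, sqrt(variance)


# Mileage groups as (lower, upper) boundaries in miles
groups = [
    (0, 5000),
    (5000, 10000),
    (10000, 20000),
    (20000, 40000),
    (40000, 70000),
    (70000, 110000),
]

# Hypothetical frequencies, NOT the real tallies
frequencies = [4, 6, 8, 10, 8, 5]

mean, sd = grouped_mean_and_sd(groups, frequencies)
print(round(mean), round(sd))
```

Running this once with the random tallies and once with the stratified tallies would give two standard deviations to compare, with the smaller one showing the less spread out sample.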
Overall Conclusion
From all the work carried out above you can clearly see that many different things affect used car prices, some more than others. You could say that the different categories are limiting factors and that a combination of these results in the depreciation of a car’s price.
As a further investigation I would look into the strange scatter graph produced by my stratified mileage sample. Perhaps by using standard deviation or other methods of representing data I could find out why it is so peculiar. I could also look at how one category affects another, such as engine size and mileage or engine size and MPG, and find a relationship between those. There are many more aspects that I could have considered; however, from the work I’ve done some things are certainly clear.