I could use a systematic sampling method. This would mean choosing cars from the given list of used car data at a fixed interval, for instance every other one or every fourth one. However, as the list is not ordered into any sort of strata, this method would not reliably give me good enough results either. For example, because the data is listed in no particular order, if I took every fourth car I would miss out many cars and could by chance end up with all the Fords and Nissans again, or with only cars priced under £5000. This again would not solve my problem, for the reasons given before.
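As a rough illustration of the risk described above, here is a minimal Python sketch of systematic sampling. The function name `systematic_sample` and the short list of makes are made up for this example and are not the real used car data; the list is deliberately contrived so that the pattern in the data lines up with the sampling interval.

```python
def systematic_sample(items, step, start=0):
    """Return every `step`-th item of `items`, beginning at index `start`."""
    if step < 1:
        raise ValueError("step must be at least 1")
    return items[start::step]


# Hypothetical, deliberately unlucky ordering of makes (not the real data).
cars = ["Ford", "Nissan", "Ford", "Nissan", "Ford", "Nissan", "Ford", "Nissan"]

every_other = systematic_sample(cars, 2)   # indices 0, 2, 4, 6
every_fourth = systematic_sample(cars, 4)  # indices 0, 4

print(every_other)
print(every_fourth)
```

With this particular ordering both samples contain only Fords, which is the kind of chance result that a purely systematic method cannot rule out.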
A stratified sampling method could also be used. This is a method where I would first have to sort the given data into strata, and then choose a calculated number of cars from each stratum. If I grouped the data by make of car, I would take a certain number of cars from one make, then a certain number from the next make, and so on. This would guarantee that I get some cars of almost every make, which is statistically a much better method. I could also put the data in strata of price. For example, I could group all the cars under £1000, then all the cars from £1000 to under £2000, then from £2000 to under £4000, etc. This would be a good idea, as it would make sure I do not just get similar data, e.g. all the Fords and Nissans. To know how many cars to choose from each stratum, a mathematical formula is needed:
Number to take from a stratum = (total sample needed ÷ total data the sample is going to be taken from) × number of items in the stratum
For my investigation, because I have decided to take a sample of 50 out of a total of 100, the fraction is 50 ÷ 100 = 1/2, so I need to take half of the data from each stratum.
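The calculation can be sketched in Python as follows. The sample size of 50 and the total of 100 come from my plan; the stratum labels and sizes in `stratum_sizes` are invented purely to show the arithmetic, and the function name `stratum_allocation` is my own.

```python
def stratum_allocation(sample_size, population_size, stratum_size):
    """Number to take from one stratum: (sample / population) x stratum size."""
    return sample_size * stratum_size / population_size


SAMPLE_SIZE = 50        # from the plan
POPULATION_SIZE = 100   # from the plan

# Hypothetical stratum sizes (assumed for illustration; they add up to 100).
stratum_sizes = {"band A": 7, "band B": 20, "band C": 31, "band D": 42}

allocations = {
    name: stratum_allocation(SAMPLE_SIZE, POPULATION_SIZE, size)
    for name, size in stratum_sizes.items()
}

for name, amount in allocations.items():
    print(name, amount)
```

Any stratum with an odd number of cars produces a half (for example 7 cars gives 3.5), so the result still has to be rounded before cars can actually be picked.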
Overall I believe I will need to make use of all of the above methods. Firstly, I will sort the given data into strata. I have decided that for my specific problem the strata will be price bands. I have chosen this because it ensures I get a very good variety of prices, which will help show whether price is correlated with mileage. If I used any other strata, it would reduce the chance of getting a good variety of prices. If, for instance, I grouped the cars by make, there would be a slight chance that all the cars chosen fell in a very narrow price range.
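Sorting the cars into price strata could look something like the sketch below. The band edges in `BAND_EDGES` follow the example bands mentioned earlier (£1000, £2000, £4000) but are only an assumption, as are the function names and the four made-up cars.

```python
from bisect import bisect_right

# Assumed band edges in pounds: under 1000, 1000-1999, 2000-3999, 4000 and over.
BAND_EDGES = [1000, 2000, 4000]


def price_band(price):
    """Return the index of the price band that `price` falls into (0 = cheapest)."""
    return bisect_right(BAND_EDGES, price)


def group_by_price(cars):
    """Group car records (dicts with a 'price' key) into strata keyed by band index."""
    strata = {}
    for car in cars:
        strata.setdefault(price_band(car["price"]), []).append(car)
    return strata


# Hypothetical records, not taken from the real data.
example_cars = [
    {"make": "Ford", "price": 950},
    {"make": "Nissan", "price": 1000},
    {"make": "Rover", "price": 3999},
    {"make": "Ford", "price": 6500},
]

strata = group_by_price(example_cars)
for band in sorted(strata):
    print(band, [car["make"] for car in strata[band]])
```

A car priced exactly on an edge (such as £1000) goes into the higher band here; that is a choice made for this sketch, and the opposite convention would work just as well provided it is applied consistently.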
Once the data is grouped by price, I will need to take half from each stratum to give me my total sample of 50, or as close to it as possible. If the total falls below 50 I will then have to choose which car(s) to add to my sample, and if it goes over 50, which car(s) to reject.
The next big question for me will be: which cars of each stratum will be picked? I know half of them will need to be picked, but which ones? This is where the other two methods come in. I will need to use a systematic technique to obtain the half from each stratum. Therefore I have decided to choose every other one. Taking every other one gives exactly half of a stratum with an even number of cars, and half rounded up or down for a stratum with an odd number of cars, depending on which car in the stratum I begin with. This is where the random sampling comes into play. Which car do I start on? The first, or the second? I have decided to use a dice to decide. A fair dice (1-6) will be thrown. An even number will mean I start systematically from the FIRST car in the stratum; an odd number will mean I start from the SECOND car. The dice will be THROWN FOR EACH STRATUM. I have done this to give each car as much chance as possible of being chosen.
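The dice rule and the "every other one" selection can be sketched as below. The rule that an even throw starts from the first car is from my plan; treating an odd throw as "start from the second car" is the natural counterpart and is assumed here. The function names and the seven placeholder cars are hypothetical.

```python
import random


def start_index_from_dice(roll):
    """Even throw: start at the first car (index 0). Odd throw: second car (index 1)."""
    if roll not in (1, 2, 3, 4, 5, 6):
        raise ValueError("a dice throw must be between 1 and 6")
    return 0 if roll % 2 == 0 else 1


def pick_every_other(stratum, roll):
    """Take every other car from one stratum, starting where the dice throw says."""
    return stratum[start_index_from_dice(roll)::2]


# Hypothetical stratum of 7 cars, labelled A to G.
stratum = ["A", "B", "C", "D", "E", "F", "G"]

roll = random.randint(1, 6)  # one throw per stratum
print(roll, pick_every_other(stratum, roll))
```

For this 7-car stratum an even throw picks 4 cars (A, C, E, G) and an odd throw picks 3 (B, D, F), so in an odd-sized stratum the starting car also decides how many cars are taken.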
I intend to do exactly the same for the second set of data, taking exactly half of the data there as well. I then intend to draw relevant diagrams of each, such as stem and leaf diagrams and scatter diagrams, so that the two sets of data can be compared in as many relevant ways as possible.
From this plan, I see there will be a limitation. When I take half of the cars from each stratum, if there is an odd number of cars, how many do I take? For example, if there are 7 cars in a group I need to try to take half of 7 cars. I cannot take 3.5 cars.
I will deal with this limitation in as definite and fair a way as possible. I have worked out that I will have exactly 4 groups of cars which contain an odd number of cars. I will handle these in a systematic way: in every other odd-numbered group, I will round half of the number of cars up. Taking the example from above with 7 cars, if I halve 7 I get 3.5, so I will round it up to 4 and take 4 cars. Then the next time I come across a group with an odd number of cars, I will round down, and so on alternately. However, I can foresee a problem with this way of correcting the limitation:
How will I know which group to round up on? I have decided to use a fair coin. The coin will be tossed. If it is heads, I will round up on the 1st group with an odd number of cars, and then on every other odd-numbered group from then on. If it is tails, I will start rounding up on the 2nd group with an odd number of cars in it.
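The alternating rounding rule can be sketched as follows. The stratum sizes are invented for illustration (they add up to 100 and include exactly 4 odd-sized groups, matching my plan), and the function name `sizes_to_take` is my own; the coin toss is passed in as a value so that the effect of heads and tails can be compared.

```python
def sizes_to_take(stratum_sizes, heads):
    """Halve each stratum size, rounding odd-sized groups up and down alternately.

    heads=True  : round up on the 1st odd-sized group, down on the 2nd, and so on.
    heads=False : round down on the 1st odd-sized group, up on the 2nd, and so on.
    """
    round_up = heads
    result = []
    for size in stratum_sizes:
        if size % 2 == 0:
            result.append(size // 2)               # even: exactly half
        else:
            result.append(size // 2 + 1 if round_up else size // 2)
            round_up = not round_up                # alternate for the next odd group
    return result


# Hypothetical stratum sizes (assumed): total 100, four odd-sized groups.
sizes = [7, 20, 31, 21, 21]

if_heads = sizes_to_take(sizes, heads=True)
if_tails = sizes_to_take(sizes, heads=False)

print(if_heads, sum(if_heads))
print(if_tails, sum(if_tails))
```

Because there is an even number of odd-sized groups, two are rounded up and two are rounded down whichever way the coin lands, so under these assumed sizes the sample comes to exactly 50 either way.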
From this plan, I cannot foresee any more problems; however, I may encounter some while actually carrying out the sampling.