I am going to use secondary data from the World Roller Coasters Database, which consists of published statistics that I have not produced myself. However, when I design my questionnaire, I will be collecting primary data, because I will be collecting the information myself. A questionnaire is a set of questions on a given topic, which in my case is whether the fastest rides are the most exciting. I will carry out a pilot survey on a small group in order to check that my questionnaire works.
I will not have any qualitative data, because the only data that will be relevant to this investigation is numerical. I will be collecting discrete data, as it can only take certain values.
I will record the collected data in a table in Microsoft Excel.
I do not expect to have any problems when collecting my data, as I am going to carry out a simple procedure.
The data has already been grouped in the database, so it will not be necessary to group it myself. When all the data has been collected, I intend to place the rollercoasters in order of max speed, so that I can see whether there really is a relationship between the speed of a rollercoaster and its thrill factor. I will take account of any incomplete data, although I will not let one incorrect result affect my decision.
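The ordering step can be sketched in Python as an illustration (the report itself uses Excel). The Ride record, its field names, the ride names and the speeds below are hypothetical placeholders rather than values from the database, and marking a missing value as None and leaving that ride out is only one assumed way of taking account of incomplete data.

```python
from typing import List, NamedTuple, Optional


class Ride(NamedTuple):
    """One row of the data table (hypothetical field names)."""
    name: str
    max_speed_kmh: Optional[float]  # None if the entry is missing
    thrill_factor: Optional[int]    # out of 10; None if missing


def order_by_speed(rides: List[Ride]) -> List[Ride]:
    """Leave out incomplete rows, then sort from fastest to slowest."""
    complete = [
        r for r in rides
        if r.max_speed_kmh is not None and r.thrill_factor is not None
    ]
    return sorted(complete, key=lambda r: r.max_speed_kmh, reverse=True)


rides = [
    Ride("Ride A", 90.0, 6),
    Ride("Ride B", 120.0, 9),
    Ride("Ride C", None, 7),  # incomplete, so it is left out
]
for ride in order_by_speed(rides):
    print(ride.name, ride.max_speed_kmh, ride.thrill_factor)
```

In Excel, the same ordering is a descending sort on the max speed column.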
I am planning to create a data table that shows a rollercoaster’s max speed against its thrill factor. I hope that it will show the relationship between these two aspects of the rollercoaster.
I do not intend to do any of the statistical calculations by hand, as there is a risk that I may make simple errors. Instead, I will use the mathematical functions in Microsoft Excel, a spreadsheet program that will allow me to store the data and carry out these calculations. I hope that the results of the calculations will, firstly, support my hypothesis and, secondly, allow me to make statements about the data.
I am planning to draw a scatter graph with a line of best fit, and a vertical line graph. Both will display the data visually, so that I can easily read off the results and draw conclusions.
Below is a plan for the design of my questionnaire, which I will give to 20 people:
This questionnaire is not leading or biased: it has short, simple categories, and each answer only needs a tick. Therefore, I think it is good enough to use in my investigation. The results will give me an indication of what the data may show.
I obtained the following results from the questionnaire:
The results of the questionnaire indicate that 75% of people believe that the fastest rides are the most exciting. Only 5% thought that there was no relationship and 20% of people were undecided. I would therefore expect the fastest rides to be the most exciting.
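Since the questionnaire was given to 20 people, these percentages correspond to 15, 1 and 4 people respectively. A minimal Python sketch of the conversion from counts to percentages is below; the function name and the answer labels are illustrative assumptions, not the wording of the actual questionnaire.

```python
from typing import Dict


def to_percentages(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert answer counts into percentages of all respondents."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no responses to convert")
    return {answer: 100 * n / total for answer, n in counts.items()}


# 20 respondents in total (labels are illustrative).
answers = {"fastest are most exciting": 15, "no relationship": 1, "undecided": 4}
print(to_percentages(answers))
```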
On the following page you can see a copy of the World Roller Coasters Database.
I have collected all the data that I planned to collect. The only problem I encountered was that the scientific calculator generated random numbers to 2 d.p. This meant that I had to round them to whole numbers, which sometimes produced the same number twice. When this happened, I replaced the repeated number with another randomly selected number.
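This procedure (draw a whole number, and draw again if it repeats) is simple random sampling without replacement. Below is a Python sketch of it. The name pick_sample, the table size of 50 rides and the sample size of 30 (60% of 50) are hypothetical, and Python's random.randint stands in for the calculator's random-number key, so no rounding step is needed.

```python
import random
from typing import List, Optional


def pick_sample(population_size: int, sample_size: int,
                seed: Optional[int] = None) -> List[int]:
    """Pick distinct ride numbers from 1..population_size at random."""
    if not 0 <= sample_size <= population_size:
        raise ValueError("sample_size must be between 0 and population_size")
    rng = random.Random(seed)
    chosen: List[int] = []
    while len(chosen) < sample_size:
        ride_number = rng.randint(1, population_size)  # both ends included
        if ride_number not in chosen:  # a repeat is discarded and redrawn
            chosen.append(ride_number)
    return chosen


# Hypothetical: a 50-ride table, sampling 60% of it.
print(pick_sample(50, 30, seed=1))
```

Python's random.sample(range(1, n + 1), k) produces the same kind of sample in a single call; the loop is written out here only to mirror the redraw-on-repeat step.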
The summary table opposite shows the randomly selected rollercoasters in descending order of max speed, together with the thrill factor out of 10 for each ride. In general, the higher the max speed, the higher the thrill factor, which suggests that the faster rides are more exciting.
In the case of Black Hole and The Great Escape, the thrill factor was higher than that of faster rides, such as Mamba and Cool Runner. This may have been because they had extra features, such as simulated carriage movements or dark tunnels, which would have made them more exciting. Although these particular results do not fit my original hypothesis, the general trend is that the faster rides are the most exciting, and this provides sufficient evidence to support my prediction.
The mean max speed of the sampled rollercoasters was about 112 km/h, and the mean thrill factor was about 8. Each mean was calculated by adding up all the values in the relevant column and dividing by the number of values.
The results of the questionnaire were reasonable in the context of the investigation: they supported the hypothesis, as did the results of the random sample.
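As a check on the spreadsheet's AVERAGE function, the same calculation can be written in a few lines of Python. The function name column_mean and the speeds in the example are made up purely to show the arithmetic.

```python
import statistics
from typing import Sequence


def column_mean(values: Sequence[float]) -> float:
    """Mean = sum of the column divided by the number of values."""
    if len(values) == 0:
        raise ValueError("cannot take the mean of an empty column")
    return sum(values) / len(values)


speeds = [100, 110, 126]  # made-up km/h values
# The standard library's statistics.mean should give the same answer.
print(column_mean(speeds), statistics.mean(speeds))
```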
The scatter graph shows the relationship between the max speeds of the rollercoasters and their thrill factors. The line of best fit shows positive correlation: the thrill factor increases as the max speed increases. The line gives the approximate average thrill factor for each max speed, and its equation is:
S = 26T + 48
(S = max speed, T = thrill factor)
There is some variation within the results, as you can see from the points above and below the line of best fit. This was probably due to the features of each individual ride, such as its height or added attractions, which made it more or less exciting than other rollercoasters with the same speed. One measure of spread that shows this variation is the standard deviation, σ = √(Σ(x − x̄)² / n), where x̄ is the mean and n is the number of results. It measures how far the results typically lie from the mean, rather than the difference between the mean and any single result. However, I think that my results are accurate enough to provide sufficient evidence for the hypothesis.
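The report does not say how the line of best fit was devised; one common method, swapped in here as an assumption, is least squares, which Excel offers through its SLOPE and INTERCEPT functions. The Python sketch below fits S = aT + b in that way and also computes the standard deviation using the formula given above. The function names and example points are hypothetical; the points were chosen to lie exactly on S = 26T + 48 so that the fit is easy to check.

```python
from math import sqrt
from typing import Sequence, Tuple


def standard_deviation(values: Sequence[float]) -> float:
    """Population SD: square root of the mean squared distance from the mean."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    m = sum(values) / n
    return sqrt(sum((x - m) ** 2 for x in values) / n)


def least_squares_line(t: Sequence[float], s: Sequence[float]) -> Tuple[float, float]:
    """Fit S = a*T + b by least squares and return (a, b)."""
    n = len(t)
    if n != len(s) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    t_mean = sum(t) / n
    s_mean = sum(s) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    if sxx == 0:
        raise ValueError("thrill factors must not all be equal")
    sxy = sum((ti - t_mean) * (si - s_mean) for ti, si in zip(t, s))
    a = sxy / sxx            # gradient
    b = s_mean - a * t_mean  # intercept, so the line passes through the mean point
    return a, b


thrill = [5, 6, 7]       # hypothetical T values
speed = [178, 204, 230]  # hypothetical S values, exactly 26T + 48
print(least_squares_line(thrill, speed))
print(standard_deviation([2, 4, 4, 4, 5, 5, 7, 9]))
```

Dividing by n gives the population standard deviation (Excel's STDEV.P); dividing by n − 1 instead gives the sample version (STDEV.S).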
The bar graph, drawn as a vertical line graph, also shows the distribution of the results. Like the scatter graph, it shows that the faster the max speed of a rollercoaster, the higher the thrill factor. The modal thrill factor is 7, because this category contains the highest number of results. Both graphs therefore support my hypothesis.
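Reading the mode off the vertical line graph amounts to finding the most frequent value. A short Python sketch of that is below; the function name and the thrill factors are hypothetical, and the function returns every value that ties for the highest count.

```python
from collections import Counter
from typing import List, Sequence


def modal_values(values: Sequence[int]) -> List[int]:
    """Return every value that occurs most often (more than one if tied)."""
    if len(values) == 0:
        raise ValueError("no values")
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


thrill_factors = [7, 8, 7, 9, 6, 7, 8]  # hypothetical
print(modal_values(thrill_factors))
```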
I feel that the sample I used was large enough to represent the population, because 60% of the data was used. This provided sufficient results while being practical and manageable. The data I collected should be a good representation of the population (the data table), because random sampling gives each member (piece of data) in the population an equal chance of being selected.
The overall results were as I had predicted, so I did not need to change or add to my hypothesis in any way. If the investigation was repeated by somebody else, I would expect the results to be the same, as the people who answered the questionnaire generally agreed with my prediction. If I were to carry out the investigation again, I would try to extend my hypotheses to cover more questions, such as “Are the oldest rides the slowest?” or “Are the most recent rides the most exciting?”
If somebody else were to read my report, I think they would find it relatively easy to understand my hypothesis, experiments, data collection, calculations and results. I feel that the mean calculation was irrelevant to the investigation, as it did not affect my conclusions in any way.