Introduction

The hypothesis I am testing is:

- “The number of mistakes a candidate makes during their driving test is affected by the number of one hour lessons that they have had.”

In this report I aim to find out what affects the number of mistakes a candidate makes during their driving test.

Expectations

- I expect that the number of mistakes made will be affected by the number of lessons taken. I think this because the more lessons a candidate has, the more they learn and the more practice they get. More lessons could also give the driver more confidence in driving.

Exceptions

There are some factors which could affect this hypothesis.

- The driving instructor- this could affect the number of mistakes as the instructor may not be very good. Also some instructors have better teaching methods than others and this may make a candidate learn quicker.
- Gender of the candidate- it could depend on whether the candidate is male or female. It has been stated that boys and girls perform differently, and that boys often do better in practical tests than girls do. This may mean that males won’t need as many one-hour lessons as females need.
- Any extra practice from siblings or parents- some candidates may have used their spare time for extra driving practice. The data provided doesn’t state whether or not anyone has had any extra practice, and this could affect the number of mistakes made, because those candidates will have had more driving practice than the rest.
- Weather conditions- if a candidate takes their test in sunny weather, this could affect the mistakes made, as sometimes the sun may prevent the candidate from seeing properly. Also, if it snows or the roads are icy, this may result in an accident.
- Natural learners- some people are just more natural than others and pick up driving quite quickly, which may result in fewer lessons taken. On the other hand, some candidates may be a lot slower and not take in the information from their driving lessons as quickly. The natural learners will then do a lot better, as they have understood and remembered what they have been taught.
- The day and time a candidate takes their test- this may affect the number of mistakes made, as the candidate could take their test during rush hour or when there is hardly any traffic. If the test is taken during rush hour, the amount of traffic may make the candidate more nervous. They could also be afraid of having an accident and therefore make more mistakes. However, if the candidate takes their test at a time when there is not much traffic, they may be less stressed and make fewer mistakes.
- Area that the test is taken in- if the candidate takes their test in a place that they are quite familiar with, they may make fewer mistakes as they know their way around. However, if the candidate takes their test in a place that they are not familiar with, then they may make more mistakes.

Data

The data I am using to test my hypothesis is secondary data. It consists of the gender of the candidate, the number of one-hour lessons that the candidate has had, the number of minor mistakes the candidate made in their driving test, the instructor they had before taking the test, and the time and date of their driving test. Most of the data is quantitative (e.g. the time of day, the number of one-hour lessons and the number of mistakes), while gender and instructor are qualitative. The quantitative data includes both discrete and continuous data. The time of day is continuous, as candidates can be tested at any time of day, and the numbers of mistakes and lessons are discrete, as they take only whole-number values and not decimals. This is secondary data as it has been obtained from another source, e.g. the internet. As it is secondary data it may contain anomalies, outliers and missing data, which may have an impact on the results. As some parts of the data may be incorrect, I will need to clean it up and delete any data which is no good for me before making any assumptions about the hypothesis.

Missing data- several rows have been deleted as not all of the information was present in the spreadsheet. I deleted these rows as they aren’t reliable for my project. In one example, the number of minor mistakes isn’t present. This row wouldn’t have been useful, as it couldn’t help to show whether my hypothesis is true or false: the hypothesis relies on the “number of minor mistakes made” data for a full investigation. The entire row has to be deleted in order to keep further investigations accurate.
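The cleaning step described above could be sketched in Python as follows. This is only an illustration: the field names (`lessons`, `mistakes` and so on) and the four rows are made up and are not taken from the real spreadsheet. It keeps a row only when both values that the hypothesis relies on are present.

```python
# Hypothetical records: field names and values are made up for illustration.
records = [
    {"gender": "F", "lessons": 20, "mistakes": 15, "instructor": "A"},
    {"gender": "M", "lessons": 12, "mistakes": None, "instructor": "B"},  # mistakes missing
    {"gender": "M", "lessons": None, "mistakes": 22, "instructor": "A"},  # lessons missing
    {"gender": "F", "lessons": 30, "mistakes": 9, "instructor": "C"},
]

# The two fields the hypothesis depends on.
REQUIRED = ("lessons", "mistakes")


def is_complete(row):
    """Return True when every required field has a value."""
    return all(row.get(field) is not None for field in REQUIRED)


# Delete the entire row whenever a required value is missing.
cleaned = [row for row in records if is_complete(row)]
removed = len(records) - len(cleaned)
print(f"kept {len(cleaned)} rows, removed {removed}")
```

Deleting the whole row, rather than guessing the missing value, means every remaining candidate has both a lesson count and a mistake count to plot.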

- I also found an outlier, as shown below. The outlier is a candidate who had taken only 10 lessons and made only 1 mistake. However, I have decided not to delete this from the data, as it shows an example of an ‘extreme’ and it is plausible, so it tells me something important about candidates and driving tests: some candidates are faster at learning than others.

- I didn’t find any anomalies.

I did a test sample in order to make sure that the hypothesis I am testing has an answer. This also makes sure that the rest of the work I do will be worthwhile, and it tells me whether there is any relationship between the number of mistakes and the number of lessons taken. I took the data of the first ten candidates from each instructor and put this into the graph, so there is a sample of 40 candidates in total. I chose 40 as it is a suitable sample size.
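One way this sample could be drawn is sketched below in Python. The helper name `first_n_per_group`, the instructor labels A to D and the stand-in rows are all assumptions made for illustration. The real rows would come from the cleaned spreadsheet.

```python
from collections import defaultdict


def first_n_per_group(rows, key, n):
    """Keep the first n rows seen for each distinct value of key, in original order."""
    counts = defaultdict(int)
    sample = []
    for row in rows:
        group = row[key]
        if counts[group] < n:
            sample.append(row)
            counts[group] += 1
    return sample


# Hypothetical stand-in data: 4 instructors with 25 candidates each.
rows = [{"instructor": name, "candidate": i} for name in "ABCD" for i in range(25)]

# First ten candidates from each instructor, giving 4 x 10 = 40 in total.
sample = first_n_per_group(rows, "instructor", 10)
print(len(sample))
```

Taking the first ten from each instructor is simple, but it is not a random sample, so it is only suitable as a quick check before the full investigation.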

- The graph shows a negative correlation, with a coefficient of -0.5905. This is a moderate correlation between the number of one-hour lessons and the number of mistakes made: it isn’t very strong, but it is clearly present. This means that my hypothesis is worth testing. Because the correlation is not perfect, it also suggests that there are other factors which can have an effect on the number of mistakes made during a driving test. These other factors are the exceptions.
- Therefore, the hypothesis is likely to be supported: the sample suggests that the more one-hour lessons you have, the fewer mistakes you will make. In order to find a more definite correlation I will need to study the data and hypothesis further.
- I put in a line of best fit, as there seems to be some kind of correlation between the points. The equation of the line of best fit is “y = -0.6124x + 29.58”, where x is the number of one-hour lessons and y is the number of mistakes.
- The gradient is -0.6124, which tells you that for every extra lesson taken, the predicted number of mistakes goes down by 0.6124.
- The y-intercept is 29.58. This tells you that a candidate with no lessons would be predicted to make 29.58 mistakes. This isn’t a valid prediction, as you wouldn’t consider taking your driving test without having had any lessons beforehand.
- The sample supports the expectation that I wrote down, because the more one-hour lessons candidates had, the fewer mistakes they made. To get a more accurate correlation, I need to analyse the data further and take other factors into account later.
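The figures above could be checked with a short Python sketch like the one below. The functions `pearson_r`, `best_fit_line` and `predicted_mistakes` are hypothetical helpers written for this illustration, and the seven data points are made up, so they will not reproduce the coefficient of -0.5905. Only the default gradient and intercept in `predicted_mistakes` are taken from the line of best fit reported above.

```python
from math import sqrt


def _sums(xs, ys):
    """Return the means and the sums of squares/products both calculations need."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return mean_x, mean_y, sxx, syy, sxy


def pearson_r(xs, ys):
    """Product-moment correlation coefficient, between -1 and 1."""
    _, _, sxx, syy, sxy = _sums(xs, ys)
    return sxy / sqrt(sxx * syy)


def best_fit_line(xs, ys):
    """Least-squares line of best fit, returned as (gradient, intercept)."""
    mean_x, mean_y, sxx, _, sxy = _sums(xs, ys)
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x
    return gradient, intercept


def predicted_mistakes(lessons, gradient=-0.6124, intercept=29.58):
    """Predict mistakes from lessons using the line reported in this section."""
    return gradient * lessons + intercept


# Made-up (lessons, mistakes) values, for illustration only.
lessons = [10, 15, 20, 25, 30, 35, 40]
mistakes = [25, 18, 20, 12, 14, 6, 8]

r = pearson_r(lessons, mistakes)
gradient, intercept = best_fit_line(lessons, mistakes)
print(f"r = {r:.4f}")
print(f"y = {gradient:.4f}x + {intercept:.2f}")
print(f"predicted mistakes after 20 lessons: {predicted_mistakes(20):.2f}")
```

With the reported line, a candidate with 20 lessons would be predicted to make about 17 mistakes, since -0.6124 × 20 + 29.58 = 17.332.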