Maths driving test

GCSE Maths

Introduction

The hypothesis I am testing is:

“The number of mistakes a candidate makes during their driving test is affected by the number of one hour lessons that they have had.”

In this report I will investigate what affects the number of mistakes a candidate makes during his/her driving test.

Expectations

I expect that the number of mistakes made will be affected by the number of lessons taken. I think this because the more lessons a driver has, the more they learn. More lessons could also give the driver more confidence in driving.

Exceptions

There are some factors which could affect this hypothesis.

The driving instructor- this could affect the number of mistakes as the instructor may not be very good. Also, some instructors have better teaching methods than others and this may make a candidate learn more quickly.
Gender of the candidate- it could depend on whether the candidate is male or female. It has been stated that boys and girls perform differently, and that boys often do better in practical tests than girls do. This may mean that males won’t need as many one hour lessons as females need.
Any extra practice from siblings or parents- some candidates may have used their spare time for extra driving practice. The data provided doesn’t state whether or not anyone has had any extra practice, and this could affect the number of mistakes made, because those candidates will have had more driving practice than the rest.
Weather conditions- if a candidate takes their test in sunny weather, this could affect the mistakes made as sometimes the sun may prevent the candidate from seeing properly. Also, if it snows or the roads are icy, this may result in an accident.
Natural learners- some people are more natural drivers than others and pick up driving quite quickly, which may result in fewer lessons taken. On the other hand, some candidates may be a lot slower and not take in what they have learnt in their driving lessons as quickly. The natural learners will then do a lot better, as they have understood and remembered what they have been taught.
The day and time a candidate takes their test- this may affect the number of mistakes made as the candidate could take their test during rush hour or when there is hardly any traffic. If the test is taken during rush hour then this may make the candidate more nervous as there may be a lot of traffic; they could also be afraid of causing an accident and therefore make more mistakes. However, if the candidate takes their test at a time when there is not much traffic, they may be less stressed and make fewer mistakes.
Area that the test is taken in- if the candidate takes their test in a place that they are quite familiar with, they may make fewer mistakes as they know their way around. However, if the candidate takes their test in a place that they are not familiar with, they may make more mistakes.

DATA- The data I am using to test my hypothesis is secondary data. It consists of the gender of the candidate, the number of one hour lessons that the candidate has had, the number of minor mistakes the candidate made in their driving test, the instructor they had before taking the test, and the time and date of their driving test. Most of the data is quantitative (e.g. the time of day, number of one hour lessons and number of mistakes). Within the quantitative data there is discrete and continuous data. The time of day is continuous as the candidates can be tested at any time of day, and the number of mistakes and lessons is discrete as it takes only whole values and not decimals. This is secondary data as it has been obtained from another source, e.g. the internet. As it is secondary data it may contain anomalies, outliers and missing data, which may have an impact on the results. As some parts of the data may be incorrect, I will need to clean it up and delete any data which is no good for me before making any assumptions about the hypothesis.

Missing data- several rows have been deleted because not all the information is present in the spreadsheet. I deleted these rows as they aren’t reliable for my project. In the example below, the number of minor mistakes isn’t present. This row wouldn’t have been useful as it couldn’t help show my hypothesis to be true or false. The hypothesis relies on the “number of minor mistakes made” data for a full investigation, so the entire row has to be deleted to keep further investigations accurate.

I also found an outlier as shown below. The outlier was a candidate who had taken only 10 lessons and made 1 mistake. However, I have decided not to delete this from the data as it is plausible and shows an example of an ‘extreme’, so it tells me something important about candidates and driving tests: some candidates are faster at learning than others.

I didn’t find any examples of any anomalies.

I did a test sample in order to make sure that the hypothesis I am testing can be answered. This will also make sure that the rest of the work I do is worthwhile, by telling me whether there is any relationship between the number of mistakes and the number of lessons taken. I have taken the data of the first ten candidates from each instructor and put this into the graph, so there is a sample of 40 candidates in total. I chose 40 as it is a suitable sample size.

The graph shows that there is a negative correlation with a coefficient of -0.5905. This tells us that there is a moderate correlation between the number of one hour lessons and the number of mistakes made. Even though it isn’t really strong, it shows some kind of correlation. This means that my hypothesis is worth testing, and suggests that there are other factors which can have an effect on the number of mistakes made during a driving test. These other factors are the exceptions.
Therefore, the hypothesis is likely to be reliable, and the sample suggests that the more one hour lessons you have, the fewer mistakes you will make. In order to find a definite correlation I will need to study the data and hypothesis further.
I put in a line of best fit as there seems to be some kind of correlation between the points. The equation of the line of best fit is y=-0.6124x+29.58.
The gradient is -0.6124, which tells you that for every extra lesson taken, the predicted number of mistakes goes down by 0.6124.
The y-intercept is 29.58, which tells you that a candidate with no lessons would make 29.58 mistakes. This isn’t valid, as you wouldn’t consider taking your driving test without having any lessons beforehand.
My expectation was supported, because the more one hour lessons you have, the fewer mistakes you make. To get a more accurate correlation, I need to analyse the data further and take other factors into account later.
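
The way the gradient and y-intercept are read can be checked with a small sketch. It assumes only the line of best fit quoted for the test sample (y=-0.6124x+29.58); the function name and the lesson counts tried are made up for illustration and do not come from the data.

```python
# Coefficients copied from the test-sample line of best fit: y = -0.6124x + 29.58
GRADIENT = -0.6124
INTERCEPT = 29.58


def predicted_mistakes(lessons):
    """Number of mistakes the line of best fit predicts for a given number of lessons."""
    return GRADIENT * lessons + INTERCEPT


# Illustrative lesson counts (assumed values, not taken from the spreadsheet)
for lessons in (10, 20, 30):
    print(lessons, "lessons ->", round(predicted_mistakes(lessons), 2), "mistakes")
```

Each extra lesson lowers the prediction by 0.6124, and putting in 0 lessons gives back the y-intercept, which is why that value is not meaningful for real candidates.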

Hypothesis: “The number of mistakes a candidate makes during their driving test is affected by the number of one hour lessons that they have had.”

I will need to take a stratified sample to see whether or not the number of one hour lessons affects the number of mistakes. I am only taking a sample of 100 as it would be too time consuming to use the whole population. I need a fairly large portion of the population in order to get an accurate representation, in case I coincidentally choose a section of the data that has a particularly strong or weak relationship. Using a stratified sample will also ensure that the sample is proportional, so that none of the instructors is under-represented or over-represented, which makes it fairer. I chose a sample size of 100 because it is a suitable amount of data to test. In order to get the correct number of samples from each instructor I will need to do the calculation:

The number of candidates for each instructor ÷ the total number of candidates × 100

Total number of candidates for all the instructors was 227.
Total number of candidates for A was 60
Total number of candidates for B was 93
Total number of candidates for C was 24
Total number of candidates for D was 50

Instructor A) 60/ 227 * 100 = 26.431718

There will be 26 samples from A

Instructor B) 93/227 * 100 = 40.969163

There will be 41 samples from B

Instructor C) 24/227 * 100 = 10.572687

There will be 11 samples from C

Instructor D) 50/227 * 100 = 22.026432

There will be 22 samples from D
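
The four calculations above can be repeated in one go. This is a minimal sketch using the candidate counts listed above; the variable and function names are my own, and rounding to the nearest whole number is assumed because that is what the working above does.

```python
# Candidates per instructor, as listed in the data
candidates = {"A": 60, "B": 93, "C": 24, "D": 50}
SAMPLE_SIZE = 100

total = sum(candidates.values())  # total number of candidates


def stratum_size(count):
    """Candidates for one instructor / total candidates x sample size, rounded."""
    return round(count / total * SAMPLE_SIZE)


allocation = {name: stratum_size(count) for name, count in candidates.items()}
print(total)
print(allocation)
print(sum(allocation.values()))
```

Here the rounded values add up to exactly 100. With other numbers, rounding each group separately can leave the total one above or below the target, so it is worth checking the sum.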

After working out the stratified sample sizes, I used the random number generator on the calculator to choose the sample data from the spreadsheet. This method makes the choice of people sampled unbiased and fair. I key in the random number generator and multiply this number by 226, as there are 227 people in the database. The data must be numbered from 0 to 226 so that, when the numbers are generated from the calculator, the highest and lowest values also have a chance of being chosen, which keeps it fair. I also made two separate stem and leaf diagrams from the stratified sample: one for the number of lessons taken and one for the number of minor mistakes made.
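
The random selection could also be done on a computer. The sketch below is not the calculator method used here: it swaps in Python's `random` module, numbers the rows of one instructor from 0, and picks rows without repeats. The function name, the seed and the choice of instructor A (60 candidates, 26 needed) are assumptions made for illustration.

```python
import random


def pick_rows(population_size, sample_size, seed=None):
    """Pick sample_size different row numbers from 0 to population_size - 1."""
    rng = random.Random(seed)  # a fixed seed makes the choice repeatable
    return sorted(rng.sample(range(population_size), sample_size))


# Assumed example: instructor A has 60 candidates and 26 are needed
rows_for_a = pick_rows(60, 26, seed=1)
print(rows_for_a)
```

Because `random.sample` never picks the same row twice, every row has the same chance of being chosen and no candidate is counted more than once.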

[Stem and leaf diagrams with key: number of lessons and number of minor mistakes]

The range of the number of minor mistakes was 35, which shows that the results are varied and probably depended on more than one factor.
The mean was 25.29, which shows that the average number of mistakes made was about 25 and therefore could have been achieved easily.

I drew these stem and leaf diagrams as they were the easiest way to order the data and look at each variable separately, before I made any conclusions about the relationship between the number of lessons and the number of mistakes.

Below is the graph of the results found from the sample: y=-0.3468x+22.97

I put in a line of best fit as some of the results looked like they followed a trend, and I also wanted to check whether the sample followed the hypothesis.
The gradient of the line of best fit was -0.3468, which shows that for every extra lesson the number of mistakes decreases by 0.3468. Since the relationship isn’t perfect, there are people who do not follow the trend.
The y-intercept is 22.97, which tells us that with no lessons you would make 22.97 mistakes. This isn’t valid, as you wouldn’t take your driving test if you hadn’t taken any lessons.
The correlation coefficient is -0.3538, which is a negative, moderately weak correlation. This weak correlation doesn’t give me confidence in the hypothesis, but it could be affected by the exceptions written previously, which could have an impact on how many mistakes you make.
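
For reference, a correlation coefficient like the one above can be worked out from pairs of (lessons, mistakes). The sketch below uses the standard Pearson formula. The function name is my own and the six data pairs are invented purely to show the function being called, so they are not the sample used in this project and will not give -0.3538.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical pairs for illustration only (NOT taken from the spreadsheet)
lessons = [10, 15, 20, 25, 30, 35]
mistakes = [28, 20, 24, 15, 18, 9]
print(round(pearson_r(lessons, mistakes), 4))
```

A value close to -1 means the points lie close to a downward-sloping line, a value close to 0 means there is little linear relationship, and a value close to 1 means the points lie close to an upward-sloping line.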

New hypothesis: The number of mistakes that a candidate makes during their driving test is affected by the number of one hour lessons that they have had and the instructor they have been taught by during the lessons.

This hypothesis will be focused on separating the data by the instructors whilst comparing the number of lessons and mistakes. This will allow me to see if the relationship is more reliable and if the number of lessons and mistakes relationship is affected by instructor.

Expectations: I think that the number of mistakes made will depend on both the instructor and the number of one hour lessons taken. I think this because some instructors may have better teaching methods than others. This will mean that certain candidates will learn more quickly or more slowly than others.

Exceptions: after stating my expectations in the introduction, there are other exceptions which may affect the number of mistakes made during a driving test.

Nervousness: Some candidates may be more confident than others and therefore be less likely to make mistakes. Some candidates may be more nervous and make more mistakes.
Gender of instructor: The gender of the instructor and also the gender of the candidate may have an impact on how much the candidate learns in their lessons, which will affect how well the candidate does in their driving test. For example, if a female candidate has her driving lessons with a male instructor, she may not learn as much as she would with a female instructor. This is because she may be more self conscious and focused on impressing the instructor rather than focusing her attention on learning about driving.
Gender of the Examiner: The candidate may be attracted to the examiner and therefore may commit more mistakes as he/ she may be more self conscious.

DATA

The data I have chosen will have a total of 24 people from each instructor. This is because instructor C has only 24 candidates, which limits the size of the sample I can use. I will take the same amount of data from each instructor so that comparing them is fair; otherwise the instructor with less data would be under-represented and the instructor with more data over-represented. Even though I would like to choose more than 24 candidates, I will have to stick with 24 from each instructor. Therefore, I will use all the data from instructor C and 24 each from A, B and D. I will choose the data from A, B and D by using the random number generator on the calculator. This will make it fair and keep my answer unbiased. I will compare the instructors using both scatter graphs and box plots. This will show me whether there are any relationships or differences between the instructors.

These are the box plots for each instructor.
The box plot for A shows that the median is 11. The lower quartile is 8.25 and the upper quartile is 14. The inter quartile range is 5.75. This tells us that the average number of mistakes made is 11, and that the middle half of the candidates made between about 8 and 14 mistakes. Instructor A has the lowest median, which suggests that his/her students require fewer lessons to reach a standard where they can take the test, as they have made fewer mistakes.
The box plot for B shows that the median is 18. The lower quartile is 11.25 and the upper quartile is 23.75. The inter quartile range is 12.5. This tells us that the average number of mistakes made is 18, and that the middle half of the candidates made between about 11 and 24 mistakes.
The box plot for C shows that the median is 22. The lower quartile is 10.75 and the upper quartile is 29.25. The inter quartile range is 18.5. This tells us that the average number of mistakes made is 22. Instructor C had the highest median, which suggests that his/her students require more lessons as they have made more mistakes.
The box plot for D shows that the median is 16.5. The lower quartile is 5.25 and the upper quartile is 26.5. The inter quartile range is 21.25. This tells us that the average number of mistakes made is 16.5, which rounds up to 17, and that the middle half of the candidates made between about 5 and 27 mistakes.
The results show that on average the candidates who have instructor A do better in their driving test than the candidates of the other instructors, who make more mistakes on average. This suggests that instructor A is a better instructor, as his/her candidates have made fewer mistakes. The inter quartile range is also lowest for instructor A, which means there is less variation: instructor A’s results are more consistent, whereas the other instructors show more variation. This shows that candidates who have instructor A are quite consistently making fewer mistakes and it is not just a few good students.
The lowest number of mistakes on average was made by the candidates who had instructor A and the highest by the candidates who had instructor C. This tells us that the candidates who had instructor C did the worst on average. This may be because of the exceptions stated previously. For example, the candidates who had instructor A may have had previous experience, and the candidates who had instructor C may not be as good at learning quickly as the others.
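
The inter quartile ranges quoted for the box plots can be checked from the quartiles. This is a short sketch using only the lower and upper quartiles stated above; the variable names are my own.

```python
# (lower quartile, upper quartile) of minor mistakes for each instructor
quartiles = {
    "A": (8.25, 14),
    "B": (11.25, 23.75),
    "C": (10.75, 29.25),
    "D": (5.25, 26.5),
}

# Inter quartile range = upper quartile - lower quartile
iqr = {name: upper - lower for name, (lower, upper) in quartiles.items()}

# The instructor with the smallest spread in the middle half of the results
most_consistent = min(iqr, key=iqr.get)

print(iqr)
print(most_consistent)
```

The smallest inter quartile range belongs to instructor A, which is what the comparison of consistency above relies on.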

INSTRUCTOR A

The equation of the line of best fit is y=0.07936x+9.622.
The correlation coefficient is 0.08658. This shows a positive, very weak correlation, which tells me that the hypothesis is not as trustworthy for this instructor. I will need to study this further to see if there are other factors affecting the lesson/mistake relationship.
The gradient of the line is 0.07936, which means that for every extra lesson a candidate has with instructor A they make 0.07936 more mistakes. This doesn’t fit my expectation, as I thought that the number of mistakes would decrease as you had more lessons. However, this could be affected by other things like being nervous or the gender of the instructor.
The y-intercept is 9.622, which tells me that when a candidate has no lessons, they will make 9.622 mistakes in their driving test. This isn’t reliable, as a candidate wouldn’t take a test if they hadn’t taken any lessons.
Perhaps gender could be an issue: as the correlation coefficient is quite low, the data could contain separate groups of males and females who behave differently.

INSTRUCTOR B

The equation of the line of best fit is y=-0.3741x+25.26. The line of best fit tells us that the number of mistakes decreases when you have more lessons. The odd points on this graph may be explained by the exceptions listed previously.
The correlation coefficient is -0.3913, which shows a negative, moderately weak correlation. This tells us that the hypothesis is not as reliable.
The gradient of the line of best fit is -0.3741, which tells us that for every extra lesson a candidate has, the number of mistakes decreases by 0.3741.
The y-intercept is 25.26 which isn’t valid as this tells us when a candidate has no lessons they will make 25.26 mistakes. This isn’t reliable as a candidate wouldn’t take a test if they hadn’t taken any lessons.

INSTRUCTOR C

The equation of the line of best fit is y=-1.033x+43.35. There is a line of best fit which decreases when you have more lessons.
The correlation coefficient is -0.9539, which shows a strong negative correlation. This tells us that the hypothesis is quite reliable. The correlation is nearly perfect (-0.9539 is very close to -1), which means the candidates for Instructor C generally behave in a similar way and their points lie close to the line of best fit.
The gradient of the line of best fit is -1.033 which shows that every time a candidate has a lesson the number of mistakes decreases by 1.033.
The y-intercept is 43.35 which isn’t valid as this tells us that when a candidate has no lessons they will make 43.35 mistakes. This isn’t reliable as a candidate wouldn’t take a test if they hadn’t taken any lessons.

INSTRUCTOR D

The equation of the line of best fit is y=-1.047x+46.99. There is a line of best fit which decreases when you have more lessons.
The correlation coefficient is -0.8253 which shows negative strong correlation and tells us that the hypothesis is quite reliable.
The gradient of the line of best fit is -1.047 which tells us that every time a candidate has a lesson the number of mistakes decreases by 1.047.
The y-intercept is 46.99 which isn’t valid as this tells us that when a candidate has no lessons they will make 46.99 mistakes. This isn’t reliable as a candidate wouldn’t take a test if they hadn’t taken any lessons.
Two students have taken twenty nine lessons, but one made 25 mistakes in their test and the other made only 13. Perhaps the student with fewer mistakes took extra lessons from a family member. There are no extreme outliers, which helps to keep the correlation coefficient high. The relationship seems to get stronger as students take more lessons.

Conclusion: Overall the 4 scatter graphs and the box plots are all different, and this tells me that the instructor has an impact on how well the candidate does in their driving test. The box plots have shown that instructor A shows less variation and more consistency, so most of his/her candidates made a similar number of mistakes. They also show that instructors C and D have more variation, and this could be affected by exceptions like gender. From the scatter graphs I have found that instructors A and B have weak correlation and instructors C and D show strong correlation. Also, instructor A has positive correlation, which doesn’t fit the hypothesis; this could be because of the exceptions. Perhaps gender was affecting how the students learnt to drive. Some males believe they are better drivers than women, and for this reason may think they need fewer lessons, although they make the same number of mistakes as women.

New hypothesis: The number of mistakes a candidate makes during their driving test is affected by the number of one hour lessons that they have had, by the instructor the driver had and also the gender of the driver.

For this hypothesis, I will need to include the gender of the candidates. Therefore I will need to draw separate scatter graphs for the females and the males of each instructor. For instructors A and B the correlation coefficient is quite weak, so these two need to be tested further. I will discard instructors C and D, as their correlation is already strong and doesn’t need to be tested further. I now need to test A and B using both the female and male data. Here is the data presented as scatter graphs.

FEMALE CANDIDATES WHO HAD INSTRUCTOR A

The line of best fit was y=-0.2175x+15.44. There is a line of best fit which decreases when you have more lessons.

The correlation coefficient was -0.2661, which shows negative weak correlation.
The gradient was -0.2175 which tells us that every time a candidate has one lesson their number of mistakes decreases by 0.2175.
The y-intercept is 15.44 which isn’t valid as this tells us that when a candidate has no lessons they will make 15.44 mistakes. This isn’t reliable as a candidate wouldn’t take a test if they hadn’t taken any lessons.

MALE CANDIDATES WHO HAD INSTRUCTOR A

The line of best fit was y=-0.7462x+24.83. There is a line of best fit which decreases when you have more lessons.
The correlation coefficient was -0.6295, which shows negative, moderate correlation.
The gradient was -0.7462, which shows that every time a candidate has one lesson, the number of mistakes decreases by 0.7462.
The y-intercept is 24.83 which isn’t valid as this tells us that when a candidate has no lessons they will make 24.83 mistakes. This isn’t reliable as a candidate wouldn’t take a test if they hadn’t taken any lessons.

FEMALE CANDIDATES WHO HAD INSTRUCTOR B

The line of best fit was y=-0.06399x+18.41. There is a line of best fit which decreases when you have more lessons.

The correlation coefficient is -0.06176, which shows negative, very weak correlation.
The gradient is -0.06399, which shows that every time a candidate has one lesson, the number of mistakes decreases by 0.06399.
The y-intercept is 18.41 which isn’t valid as this tells us that when a candidate has no lessons they will make 18.41 mistakes. This isn’t reliable as a candidate wouldn’t take a test if they hadn’t taken any lessons.

MALE CANDIDATES WHO HAD INSTRUCTOR B

The line of best fit is y=-0.7075x+31.13. There is a line of best fit which decreases when you have more lessons.
The correlation coefficient is -0.7871. This shows negative fairly strong correlation.
The gradient is -0.7075, which tells you that every time a candidate has a lesson their number of mistakes decreases by 0.7075.
The y-intercept is 31.13 which isn’t valid as this tells us that when a candidate has no lessons they will make 31.13 mistakes. This isn’t reliable as a candidate wouldn’t take a test if they hadn’t taken any lessons.

Conclusion for cycle 3

I found that the males with both instructors had a stronger correlation coefficient, which shows that they have more consistent results. The females’ results with both instructors have more variability. This suggests that males learn more consistently with instructors A and B than females do.

At the start of the project the data was very scattered and the correlation coefficient was quite weak. Even though the points were scattered, there was a slight trend showing that as your number of lessons increased the number of mistakes decreased. Therefore I decided that I should also take other factors into account and see if they affect the relationship between the number of lessons and number of mistakes.

In cycle 2 I was looking at the relationship between the number of lessons and the number of mistakes while also taking the instructors into account. This gave me an idea of how many lessons people took and how many mistakes people made. Overall, I concluded that Instructor A seemed to be the best instructor, as his/her students were mostly making very few mistakes although they took the fewest lessons. However, the scatter graphs showed that the relationship between lessons and mistakes was very weak for Instructors A and B, so this needed to be studied further. On the other hand, instructors C and D showed a very good lesson/mistake relationship, which suggests that their males and females learnt to drive equally well. Therefore I didn’t test the data for C and D further.

In cycle 3 I found that the gender of the driver also affects the lesson and mistake relationship. The male drivers perform about the same whichever instructor they are with. However, the female drivers show a difference in performance with different instructors. With the limitation of not having enough people with instructor C, I could not see whether gender also affected the lesson and mistake relationship, so, as the relationship was very strong in cycle 2, I did not include that data in cycle 3. I also didn’t include instructor D, as the correlation coefficient was very high in cycle 2 and therefore didn’t need to be tested further. The best drivers seem to be the males of Instructor B and the males of Instructor A, and they seem to get the best value for money, as for every four lessons taken almost 3 fewer mistakes are made (their gradients are -0.7075 and -0.7462). Lessons are expensive, so these students have saved the most money, or made the most use of the money they spent.
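
The value-for-money comparison depends on how many fewer mistakes the male lines of best fit predict over a block of lessons. A minimal sketch, using only the male gradients found in cycle 3; the function name and the block sizes of 2 and 4 lessons are chosen for illustration.

```python
# Gradients of the male lines of best fit from cycle 3
male_gradients = {"A": -0.7462, "B": -0.7075}


def fewer_mistakes(gradient, lessons):
    """Predicted drop in mistakes after taking this many extra lessons."""
    return -gradient * lessons


for name, gradient in male_gradients.items():
    for lessons in (2, 4):  # illustrative block sizes
        drop = round(fewer_mistakes(gradient, lessons), 2)
        print("Instructor", name, "-", lessons, "lessons:", drop, "fewer mistakes")
```

The drop is simply the size of the gradient multiplied by the number of extra lessons, so it grows in proportion to the number of lessons taken.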

One of the limitations of the project was that there were not enough candidates of Instructor C, so this limited the data. This means my results may not be completely reliable, but the data itself seems relatively reliable and not too biased.

All the way through, males seem to be the better drivers. Referring back to the original hypothesis, the number of lessons does affect the number of mistakes made, but this relationship is also affected by gender and instructor. Furthermore, it can be seen that males seem to be better drivers, although they are unaffected by instructor. In this data females do not appear to be as good drivers, so gender affects performance, but the females also show that the instructor has an effect on how well a person performs in their driving test. Overall, if I were one of the candidates I would choose instructor C, mainly because gender wouldn’t affect the number of mistakes and also because the number of lessons against mistakes is quite low.
