Male 29 49 18 20 116
Female 31 51 22 20 124
Total 60 100 40 40 240
A population of 240 students is too large to work with, so to investigate my hypothesis I will take a sample of sixty students. I chose sixty because it is a quarter of the population. The students will be selected at random, and to keep the sample representative I will use a stratified sample. To calculate the stratified sample I will divide each instructor’s male and female populations by the total population of 240 and multiply by the sample size of 60. See the example below:
Instructor A’s male sample = (29 ÷ 240) x 60 = 7.25, which rounds to 7
A B C D Total
Male 7 12 5 5 29
Female 8 13 5 5 31
Total 15 25 10 10 60
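The proportional allocation can be checked with a short Python sketch. It is only an illustration of the calculation described above; the variable names are my own, and the population figures are copied from the table of students per instructor.

```python
# Population counts per instructor (A-D), copied from the population table.
population = {
    "Male":   {"A": 29, "B": 49, "C": 18, "D": 20},
    "Female": {"A": 31, "B": 51, "C": 22, "D": 20},
}
SAMPLE_SIZE = 60

# Total population across every group (240 students).
total = sum(sum(row.values()) for row in population.values())

# Proportional share of the sample for each group, before rounding.
shares = {
    (sex, instructor): count * SAMPLE_SIZE / total
    for sex, row in population.items()
    for instructor, count in row.items()
}

for (sex, instructor), share in shares.items():
    print(f"{sex:<6} {instructor}: {share:.2f}")
```

Two of the shares fall exactly on a half (4.5 for instructor C’s males and 5.5 for instructor C’s females). The table uses 5 for both, which keeps the overall total at 60.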
This is the data that I will use in the scatter graphs. I will plot a scatter graph of all the data that I obtain and then plot a separate scatter graph based on my stratified sample of students. If my hypothesis is correct I expect to see negative correlation in all the scatter graphs.
Now that I have calculated my stratified sample, I know how many students I need from each instructor. To select these students I will use systematic sampling until I have collected the correct number from each group.
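Systematic sampling can be sketched as follows. This is a minimal illustration, not the exact procedure used in the investigation: the function name, the random starting point and the list of student labels are all assumptions made for the example.

```python
import random

def systematic_sample(students, n, seed=None):
    """Return n students taken at a fixed interval from a random start."""
    if not 1 <= n <= len(students):
        raise ValueError("n must be between 1 and the number of students")
    step = len(students) // n                    # sampling interval
    start = random.Random(seed).randrange(step)  # random starting position
    return [students[start + i * step] for i in range(n)]

# Hypothetical labels standing in for instructor A's 29 male students.
instructor_a_males = [f"A-male-{i + 1}" for i in range(29)]

# The stratified sample needs 7 of them, so every 4th student is taken.
chosen = systematic_sample(instructor_a_males, 7, seed=1)
print(chosen)
```

Using a whole-number interval means a few students at the end of the list can never be picked when the group size is not an exact multiple of the sample size, which is a known limitation of this simple version.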
Graph 3 - everyone
Graph 4
Conclusion for graph 3
Graph 3 shows a scatter graph of lessons against mistakes for the whole population. If taking more lessons means that you make fewer mistakes, this should appear on the graph as negative correlation. On my graph there is no correlation, so it gives no evidence to support my hypothesis.
Conclusion for graph 4
Graph 4 is the scatter graph for the stratified sample group of students, and once again there was no form of correlation. This suggests that my hypothesis may be incorrect, that the sample I picked may not have been representative of the population, or that the hypothesis itself needs to be refined.
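The graphs above are judged by eye. As a possible numerical check, which is not part of the original investigation, the product-moment correlation coefficient r could be calculated: values near -1 indicate strong negative correlation and values near 0 indicate none. The sketch below uses made-up lesson and mistake figures purely for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product-moment correlation coefficient for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)

# Hypothetical data, not taken from the coursework.
lessons = [10, 15, 20, 25, 30, 35]
mistakes = [30, 24, 25, 15, 12, 8]

r = pearson_r(lessons, mistakes)
print(f"r = {r:.2f}")
```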
Hypothesis two
The more lessons a student takes, the fewer mistakes they will make, depending on their instructor. This means that some instructors are better than others.
Techniques to be used
To investigate this hypothesis I will be using three techniques:
- Stratified sampling
- Scatter graphs and correlation
- Comparison of each individual instructor’s whole population and stratified sample population.
Expected outcomes
Hypothesis two is a refinement of hypothesis one; in this next stage of the investigation I suggest that some instructors are better than others. To investigate this hypothesis I decided to use scatter graphs. Because hypothesis one showed no correlation between the number of lessons and the number of mistakes a student makes, it is important that I look at each instructor individually, as the results of one or more instructors might be distorting the overall graph.
I would like to find out whether the instructor has an effect on the students’ ability to make fewer mistakes. I expect the best instructor to have the strongest negative correlation, which will help me to identify the best instructor.
Graph 5 & 6
Graph 5 is a scatter graph of instructor A’s stratified sample population. It shows positive correlation, suggesting that instructor A may not be a good instructor. To investigate further I felt it was necessary to look at instructor A’s original student population, using the same technique of presentation. From analysing graph 6 I can see that the stratified sample used in graph 5 is not a good representative sample: while graph 5 shows positive correlation, graph 6 shows loose negative correlation. Overall this suggests that instructor A is a better instructor than I originally thought, but the findings suggest that instructor A teaches both students who make mistakes and others who do not. From this investigation I feel that I do not have enough statistical evidence to judge instructor A’s performance accurately.
Graph 7 & 8
Graph 7 shows instructor B’s stratified sample population presented in a scatter graph. Examining the graph, I can see that there is a very loose negative correlation. Before drawing any conclusions from this graph I decided to check that the chosen sample is a good representative. To do this I plotted instructor B’s original student population on a second graph, which shows the actual population’s correlation between driving lessons and mistakes. From examining this graph I can see that although instructor B seems to be a poor instructor, he does get results for some of his students and not for others. I will investigate this further (see graphs 13 & 14).
Graph 9 & 10
Examining both the stratified sample population and the original student population for instructor C, I can see that there is a tight negative correlation between the number of driving lessons and the number of mistakes, particularly within the original population. I can conclude so far that instructor C is the best instructor.
Graph 11 & 12
Graphs 11 and 12 display instructor D’s stratified sample population and the original population of students’ results. After exploring the plots on the graphs I can conclude that there is evident negative correlation, which I would describe as average, suggesting that instructor D obtains results with a lot of his students.
Conclusion
From investigating hypothesis two, I feel that my scatter graphs do display evidence that some instructors are better than others. The best instructor seems to be instructor C, followed by D, then B, then A. Hypothesis two only investigates the number of lessons, the number of mistakes and the instructor, so I recognise that my findings may lack validity: there are variables that I have overlooked which will have had an effect on the students’ ability to drive well. These include the weather, the day of the lesson, the time of the lesson and the amount of traffic present. These factors are uncontrollable and will affect the students’ performance, which in turn has a knock-on effect on how well each instructor appears to teach students to make fewer mistakes.
Further Investigation
While investigating my second hypothesis I came to the conclusion that instructor C was the best instructor because the scatter graph for him showed the best negative correlation. The worst instructor was A, closely followed by B; however, I noticed that instructor B’s scatter graph seemed to show a slight suggestion of loose negative correlation. I am now going to investigate further by dividing instructor B’s students into two separate scatter graphs, one for the males’ and the other for the females’ performance, again looking at the number of lessons and mistakes. As before I will be looking for negative correlation.
Graph 13 & 14 Conclusion
After separating instructor B’s male and female students’ results into four scatter graphs I can now see a very clear pattern which has greatly influenced instructor B’s results. Comparing the four scatter graphs, I can see that instructor B’s male students display a very good negative correlation, whereas his female students show no correlation, which has a knock-on effect on his overall performance. From this investigation I have discovered that instructor B is particularly effective in teaching male students, but not so good when instructing female students.
Hypothesis 3
Hypothesis three states that on average males make fewer mistakes than females. This hypothesis will be much more difficult to investigate in comparison to the last two hypotheses, as a scatter graph alone will not provide enough evidence to support or reject it.
Techniques to be used
- Estimations of mean from grouped data.
- Frequency polygons.
- Cumulative frequency graphs.
- Box plots
I need to use these four techniques to investigate hypothesis three in order to find as accurate an answer as possible. To start the investigation I need to draw up one tally table for the males and one for the females in order to sort the data into separate groups. This will make it easier to work with the data that I previously selected by stratified sampling for hypothesis one.
Male – Tally Table
Female – Tally Table
Now that I have grouped my data I will use it to plot frequency polygons of the male and female students’ mistakes.
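The grouping step and the points for a frequency polygon can be sketched in Python. This is an illustration only: the mistake counts and the class width of 10 are made up for the example, and the function name is my own. A frequency polygon plots each class frequency against the class midpoint.

```python
from collections import Counter

def group_into_classes(values, width, lowest=0):
    """Count how many values fall in each class lower <= value < lower + width."""
    if not values or min(values) < lowest:
        raise ValueError("values must be non-empty and not below the lowest bound")
    counts = Counter((v - lowest) // width for v in values)
    table = []
    for k in range(max(counts) + 1):
        lower = lowest + k * width
        table.append((lower, lower + width, counts[k]))
    return table

# Hypothetical numbers of mistakes, not taken from the coursework.
mistakes = [3, 7, 12, 14, 15, 18, 21, 22, 26, 33]

table = group_into_classes(mistakes, width=10)

# Frequency polygon points: (class midpoint, frequency).
points = [((lower + upper) / 2, freq) for lower, upper, freq in table]

for lower, upper, freq in table:
    print(f"{lower} to under {upper}: {freq}")
```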
Graph 15, 16 & 17 conclusions
Graph 15 shows the frequency polygon for the male students’ mistakes. Graph 16 shows the frequency polygon for the female students’ mistakes. In order to compare these two graphs I have plotted them on the same axes, which is graph 17. Graph 17 shows the male graph shifted slightly to the left of the female graph, which suggests that males make fewer mistakes and supports my hypothesis. Frequency polygons only give a quick picture of the data, so for a more accurate result I will use the stratified sample data to calculate estimates of the mean for the male and female students’ mistakes.
Estimates of the Mean
Male mean = 432 ÷ 29 = 14.90
Female mean = 548 ÷ 31 = 17.68
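An estimate of the mean from grouped data is the sum of (frequency x class midpoint) divided by the total frequency. The sketch below shows the method on a made-up grouped table, since the tally tables are not reproduced here, and then checks the two divisions quoted above. The function name and the example classes are assumptions for illustration.

```python
def estimated_mean(groups):
    """Estimate the mean from (lower, upper, frequency) classes using midpoints."""
    total_f = sum(freq for _, _, freq in groups)
    if total_f == 0:
        raise ValueError("total frequency must be greater than zero")
    total_fx = sum(freq * (lower + upper) / 2 for lower, upper, freq in groups)
    return total_fx / total_f

# Hypothetical grouped data, not the coursework tally tables.
example_groups = [(0, 10, 2), (10, 20, 4), (20, 30, 3), (30, 40, 1)]
example_mean = estimated_mean(example_groups)
print(f"Example estimate: {example_mean:.2f}")

# Totals quoted in the text: sum of f x midpoint, and number of students.
male_mean = 432 / 29
female_mean = 548 / 31
print(f"Male mean = {male_mean:.2f}, female mean = {female_mean:.2f}")
```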
From the estimates of the mean I can now see that on average males make 14.90 mistakes, whereas females make 17.68 mistakes. This suggests that females make more mistakes than males, which again supports my hypothesis. I will now draw cumulative frequency graphs and box plots to explore this hypothesis fully, using different techniques to present and check my findings to date.
Graph 18 & 19 Conclusion
In graph 18 I have plotted the cumulative frequency for both the male and female students. From my graph I can see that more males make between 5 and 23 mistakes, but more females make between 23 and 40 mistakes. This again supports my hypothesis that females make more mistakes than males. I used my cumulative frequency graph to produce box plots, which are displayed on graph 19. After examining graph 19 I can see that the female students have a bigger range than the males; however, the males’ box plot shows a bigger inter-quartile range than the females’. Lastly, it can be clearly seen on the box plots that the females have a higher median than the males. These findings once again suggest that the females make more mistakes than the male students, supporting my hypothesis and indicating that it was correct.
Having found that my hypothesis was correct, I feel that a further investigation could explore the variables that may have affected the validity of my findings. My investigation only considered a comparison of mistakes. If I were given the opportunity to repeat this piece of coursework I would take into account the factors which may have contributed towards the students’ mistakes. These would include the weather, the time of the lesson, the day of the lesson, the type of manoeuvre, and many other factors which may have affected the accuracy of my overall outcome.
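Reading the median and quartiles from a cumulative frequency graph can be imitated in Python by interpolating in a straight line within each class. This is a sketch under assumptions: the grouped data is made up, the function names are my own, and the quartiles are read at positions n/4, n/2 and 3n/4, which is one common convention for grouped data.

```python
def cumulative_frequencies(groups):
    """Return (upper bound, running total) pairs for (lower, upper, frequency) classes."""
    running = 0
    points = []
    for _, upper, freq in groups:
        running += freq
        points.append((upper, running))
    return points

def value_at(groups, position):
    """Read the value at a cumulative position by straight-line interpolation."""
    running = 0
    for lower, upper, freq in groups:
        if freq > 0 and running + freq >= position:
            return lower + (position - running) / freq * (upper - lower)
        running += freq
    raise ValueError("position exceeds the total frequency")

# Hypothetical grouped data, not the coursework tally tables.
groups = [(0, 10, 2), (10, 20, 4), (20, 30, 3), (30, 40, 1)]
n = sum(freq for _, _, freq in groups)

cumulative = cumulative_frequencies(groups)
lower_quartile = value_at(groups, n / 4)
median = value_at(groups, n / 2)
upper_quartile = value_at(groups, 3 * n / 4)
iqr = upper_quartile - lower_quartile

print(cumulative)
print(lower_quartile, median, upper_quartile, iqr)
```

The three quartile values, together with the smallest and largest values, are what a box plot displays, so the same figures could be used to compare the male and female groups.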