Each number from 1 to 76 has 12 or 13 random numbers assigned to it, so the selection is relatively fair. However, 77 has only 5 random numbers assigned to it, while 0 has 6. We do not need 0 as one of our results, so if the number rounds to 0 we can treat it as 77. This gives 77 a total of 11 random numbers and makes the selection fairer.
Top set sample
This is my chosen sample of 17 people from the top set.
Middle sets sample
1000/70 = 14.29, which is not a whole number, so it is not 100% fair. This is my chosen sample of 15 people from the middle sets.
Bottom sets sample
1000/38 = 26.32, which is not a whole number, so it is not 100% fair. This is my chosen sample of 8 people from the bottom sets.
Now that I have chosen my sample data, I need to start comparing it. I have put the information on stem and leaf diagrams so it will be easier to read and to work out the averages.
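As a rough illustration of how a stem and leaf diagram is built, the sketch below splits each estimate into a stem (the tens) and a leaf (the units). The function name `stem_and_leaf` and the estimates are hypothetical and are not the real sample data.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return stem-and-leaf rows as strings (stem = tens, leaf = units).

    Stems with no values are simply left out in this simplified sketch.
    """
    rows = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        rows[stem].append(leaf)
    return [
        f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
        for stem, leaves in sorted(rows.items())
    ]


# Hypothetical estimates in degrees, NOT the real sample data.
example_estimates = [50, 55, 60, 60, 62, 70, 75, 87]
diagram = stem_and_leaf(example_estimates)
for row in diagram:
    print(row)
```

Because the leaves are sorted within each stem, the median and mode can be read straight off the rows.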
Stem and leaf for angles
From the stem and leaf diagrams I can calculate the mean, median, mode and range. The mean is the arithmetic average: the sum of the data divided by the sample size. One problem with using the mean is that it does not always show the typical outcome. If there is one outcome that is very far from the rest of the data, then the mean will be affected by this outcome.
The median is a measure of the central tendency of a data set. It is the middle value in a data set, when the values are ranked from lowest to highest. The median is better for describing the typical value.
The mode is the single class in a statistical distribution having the greatest frequency. The mode shows what most people guessed.
The range shows the difference between the minimum value and the maximum value in a set of data. The range helps identify the best and worst estimates and how variable the estimates are.
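The four measures described above can be sketched with Python's standard `statistics` module. The estimates below are hypothetical, chosen only to show how one extreme guess moves the mean much more than the median.

```python
import statistics

# Hypothetical estimates, NOT the real sample data.
estimates = [50, 55, 60, 60, 65, 70, 74]

mean = statistics.mean(estimates)             # sum of the data / sample size
median = statistics.median(estimates)         # middle value once ranked
mode = statistics.mode(estimates)             # most frequent value
data_range = max(estimates) - min(estimates)  # maximum - minimum

# Add one guess that is very far from the rest of the data.
with_outlier = estimates + [180]
mean_with_outlier = statistics.mean(with_outlier)
median_with_outlier = statistics.median(with_outlier)

print(mean, median, mode, data_range)
print(mean_with_outlier, median_with_outlier)
```

In this made-up sample the single guess of 180 raises the mean from 62 to 76.75, while the median only moves from 60 to 62.5.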
Top set
Mean = (sum of all numbers) / 17 = 64
Mode = 60
Median = 60
Range = 87-50 = 37
Middle set
Mean = (sum of all numbers) / 15 = 72
Mode = 70
Median = 70
Range = 100-55 = 45
Bottom set
Mean = (sum of all numbers) / 8 = 57
Mode = 60
Median = 60
Range = 75-30 = 45
To make my results clearer to read I have condensed them into the table below:
This table can help us to compare the data. The top set mean is less than the overall mean and is equal to the actual value. It is closer to the actual value than the other means, so I am on the right track with my hypothesis. The middle set mean is too large and is 8 away from the actual value. The bottom set mean is too small and is 7 away from the actual value, so the bottom set had a better mean than the middle set.
The top and bottom sets have a mode of 60, which is 4 away from the actual value and better than the middle set mode of 70, which is 6 away from the actual value. This means that the most common guesses for the size of the angle were 60 and 70.
As I said before, the median shows a typical value. Surprisingly, the top and bottom sets also share the same median, again 60, and the middle set is further off with a median of 70. So far it seems as though the bottom set predicted angles better than the middle set.
The range shows how spread out the data is. The top set has the smallest range, 37, while the middle and bottom sets both have a range of 45. This means that the top set's results were less spread out than those of the other two sets.
The top set had the best mean and range. It tied with the bottom set for the best mode and median, so it is fair to say that the top set were better at estimating angles than the bottom and middle sets so far.
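The comparison of the averages can be checked with a few lines of arithmetic. In the sketch below the means, modes and medians are the rounded figures reported for the angles. The actual angle of 64 is an assumption inferred from the distances quoted in the text (for example, a mean of 72 being 8 away), and the names `angle_summary` and `distances` are only illustrative.

```python
# Assumption: actual angle inferred from the distances quoted in the text.
ACTUAL_ANGLE = 64

# Rounded averages reported for the angle estimates.
angle_summary = {
    "top": {"mean": 64, "mode": 60, "median": 60},
    "middle": {"mean": 72, "mode": 70, "median": 70},
    "bottom": {"mean": 57, "mode": 60, "median": 60},
}


def distances(summary, actual):
    """Absolute distance of each average from the actual value."""
    return {
        set_name: {name: abs(value - actual) for name, value in averages.items()}
        for set_name, averages in summary.items()
    }


angle_distances = distances(angle_summary, ACTUAL_ANGLE)
for set_name, result in angle_distances.items():
    print(set_name, result)

# Set whose mean is closest to the actual value.
closest_mean = min(angle_distances, key=lambda s: angle_distances[s]["mean"])
print(closest_mean)
```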
The surprising thing is that the bottom set seem to be better than the middle set by the measures I have been using to compare so far. Now I will look at the lengths.
Stem and leaf for lengths
I will now do stem and leaf diagrams for the lengths. From the stem and leaf diagrams I can calculate the mean, median, mode and range.
Top set
Mean = (sum of all numbers) / 17 = 64
Mode = 65
Median = 65
Range = 84-50 = 34
Middle set
Mean = (sum of all numbers) / 15 = 58
Mode = 61, 70
Median = 61
Range = 75-30 = 45
Bottom set
Mean = (sum of all numbers) / 8 = 63
Mode = 60
Median = 60
Range = 90-40 = 50
To make my results easier to read I have condensed them into the table below:
For the lengths, the top set's results were not as good as they were for the angles. The top set's mean is the furthest from the actual value and the middle set's is the closest. Even the bottom set got a closer mean than the top set.
All the modes were larger than the actual value, but the bottom set's was only 1 away. The middle set had two modes that are far apart, so we cannot really get any information from them. The top set's mode was off the actual value by 6.
The medians were also all larger than the actual value, and again the bottom set has the closest median, off by only 1. The top set's median was the furthest off of the three sets, as the middle set's was closer. On the other hand, the top set had the smallest range, so its data was less spread out.
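A data set with two modes, like the middle set's lengths, can be detected with `statistics.multimode`, which returns every value tied for the highest frequency. The list below is hypothetical, built only so that 61 and 70 are tied. It is not the real middle set sample.

```python
import statistics

# Hypothetical length estimates, NOT the real middle set sample.
lengths = [30, 45, 61, 61, 61, 65, 70, 70, 70, 75]

# multimode lists every value tied for the highest frequency,
# in the order the values first appear in the data.
modes = statistics.multimode(lengths)

# statistics.mode reports only the first of the tied values,
# so on its own it would hide the second mode.
first_mode = statistics.mode(lengths)

gap_between_modes = max(modes) - min(modes)
print(modes, first_mode, gap_between_modes)
```

When the gap between the modes is large, as here, no single value represents the most common guess.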
Box Plots
To draw a box plot I need some of the information I got before, such as the median, but I also need to work out the lower and upper quartiles. The formula for them is written below:
The information above is enough for me to draw a box plot for the angles.
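The quartile working can be sketched as follows. This is only an illustration under stated assumptions. The quartile formula itself is not reproduced in this text, so the sketch assumes the common convention of taking the (n + 1)/4-th and 3(n + 1)/4-th ranked values, which is what `statistics.quantiles` does with `method="exclusive"`. The estimates are hypothetical and the actual value of 64 is inferred from the earlier comparison of the angle means.

```python
import statistics

# Hypothetical angle estimates, NOT the real sample data.
estimates = [50, 55, 60, 60, 65, 70, 87]

# Assumed convention: quartiles at positions (n + 1)/4, (n + 1)/2, 3(n + 1)/4.
lower_q, median, upper_q = statistics.quantiles(
    estimates, n=4, method="exclusive"
)

# The box of a box plot runs from the lower to the upper quartile.
interquartile_range = upper_q - lower_q

# Assumption: actual angle inferred from the text, not stated with the plots.
actual_value = 64
actual_in_box = lower_q <= actual_value <= upper_q

print(lower_q, median, upper_q, interquartile_range, actual_in_box)
```

Together with the minimum and maximum, these three values are the five numbers needed to draw a box plot.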
The bottom set has the smallest interquartile range and the top set has the largest. However, the actual value is not in the bottom set's or the middle set's interquartile range, but it is in the top set's. So although the top set's interquartile range is larger, it is more accurate because it lies around the actual value. The lower quartile of the middle set is larger than the actual value and the upper quartile of the bottom set is smaller than the actual value. In conclusion, from the box plots, the top set estimated the angles better than the middle and bottom sets. Now I need to do box plots for the lengths.
The information above is what I need to draw my box plots.
The box plots look better for the lengths than they did for the angles, as the actual value is in all three interquartile ranges. The top set has the smallest range, and most of its predictions were larger than the actual value. The middle set has the largest range, so its data is more spread out. The top set's median is the largest and the furthest from the actual value. The actual value is closer to the lower quartile for the top set and bottom set, but is more or less in the middle of the middle set's interquartile range. From these box plots it is difficult to say which set did best.
The mean, mode and median do a good job of telling us where the average of the data is, but often we are interested in more. We need a measure of how far the data is spread apart. This is what standard deviation does.
Standard deviation for angles
Standard deviation is a statistical measure of spread or variability. It is a statistic that measures the dispersion of a sample. This is the formula
σ = √( ∑(x − x̄)² / n )
∑ - sigma (the sum of)
x̄ - mean
n - number of values
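A minimal sketch of the calculation, assuming the population form of the standard deviation (dividing by the sample size n) and taking x to be each estimate minus the actual value. The function name `deviation_summary` and the estimates are hypothetical. The actual value of 64 is inferred from the earlier comparison of the angle means.

```python
import math


def deviation_summary(estimates, actual):
    """Return (mean of x, standard deviation of x), where x = estimate - actual."""
    x = [estimate - actual for estimate in estimates]
    n = len(x)
    mean_x = sum(x) / n
    # Population variance: average squared distance of x from its mean.
    variance = sum((value - mean_x) ** 2 for value in x) / n
    return mean_x, math.sqrt(variance)


# Hypothetical estimates, NOT the real sample data.
example_estimates = [58, 60, 62, 64, 66]
mean_x, sigma = deviation_summary(example_estimates, actual=64)
print(mean_x, round(sigma, 2))
```

A negative mean of x shows that the estimates were below the actual value on average, while the standard deviation shows how spread out they were.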
Top set Middle set Bottom set
The mean of x showed some interesting results. First of all, the top set got a mean of -0.47, which is very small and means that the far-off negative values balanced out the far-off positive ones very well. It had the mean closest to zero, so it was the best one. The middle set's mean was not too bad, but it was further from zero than the top set's. The bottom set, on the other hand, got a mean of -7.13, which is not very good. All the sets got a negative mean, which shows they guessed below the actual value, but the bottom set guessed far too low.
The standard deviation of the top set is less than that of the other sets, which shows that it had less spread from the actual value than the middle set and bottom set. There is not a lot of difference between the middle and bottom sets' standard deviations, although the bottom set's is slightly smaller. For angles, in terms of standard deviation, the top set estimated the best.
Standard deviation for lengths
I wanted the mean to be close to zero, and the middle set got the best mean, which was negative. That means that overall they guessed the length too small. The top set and bottom set got positive means, so they guessed the lengths too big. However, the top set's mean was the furthest off, as even the bottom set's mean was smaller.
For the standard deviation, however, the top set got the smallest value, so their data was less spread out. The bottom set has the highest standard deviation, so its data was the most spread out and few people got the accurate length. As the standard deviation is more accurate than the mean, the top set still got good estimates.
Scatter diagram
I wanted to see if there was any correlation between people's estimates of the angles and the lengths. To make this fair I took the x value from what I did in the standard deviation, which showed either “Angle – Actual value (64)” or “Length – Actual value (59)”. I plotted the angle on the x axis and the length on the y axis.
For the top set, there seems to be a very weak positive correlation between the lengths and angles. I have drawn a line of best fit but 7 out of the 17 pieces of data are very far away from the line of best fit.
I drew another scatter diagram for the middle sets and bottom sets on the following page.
There seems to be no correlation in these scatter diagrams either. This shows that the estimates the pupils made about the length of the line and the size of the angle were not related.
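The scatter diagrams here are judged by eye from a line of best fit. As a different, numeric check that is not part of the original method, a correlation coefficient could be calculated instead. The sketch below implements Pearson's r directly. The function name `pearson_r` and both lists of errors are hypothetical and are not the real sample data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient: +1 or -1 is a perfect line, 0 is none."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical values of (estimate - actual value), NOT the real sample data.
angle_errors = [-4, -2, 0, 2, 4]
length_errors = [1, -2, 0, 2, -1]

r = pearson_r(angle_errors, length_errors)
print(r)
```

In this made-up example r works out to zero, which is what no correlation between the two sets of errors looks like numerically.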
Conclusion
For angles, my hypothesis was correct, as the top set had a better mean, median, mode and range. It also had a better box plot and standard deviation.
For the lengths of the line, however, it was not as simple to see which set predicted the lengths better. The top set had the worst mean but the best standard deviation. The top set didn’t get a good mode or median, but it had the smallest range. From the box plots, it was impossible to see which set did better.
So overall my hypothesis was right for the size of the angle but not for the length of the line.