Maths Statistics Coursework

Guesstimate Joel Morrison


Introduction

The title of my investigation is ‘Guesstimate’, as I will be looking at how accurate different people are at estimating. The aim of the investigation is to compare the estimating skills of pupils of different ages, abilities and genders. To do this I have created the following hypotheses:

1) The older you are, the better you are at estimating.
2) The higher the band you are in, the better you are at estimating.
3) Boys are better at estimating than girls.

In order to test these hypotheses I will need to collect information for Key Stage 3, Key Stage 4 and Key Stage 5, and within each stage for ability (band) and gender. I will collect this information from a database, which gives for each pupil: Key Stage, maths ability (higher, middle or lower band), gender and their estimates of an acute (17°), obtuse (147°) and reflex (302°) angle.

However, I will only be using the information for the obtuse angle, because the acute angle is so small that people may guess zero, which would affect my results. Also, reflex angles could be mistaken for acute angles and vice versa, so people may not be giving an accurate estimate.

I will assume that the data is reliable, as I will eliminate bias from my sample by looking at the errors in guessing. To calculate percentage error I will use:

This will make it easier to see how far out the pupils were from guessing the correct angle, making it much easier to compare results. This means that 0% error will be a perfect estimate, so the higher the percentage error the worse the estimate.
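As a minimal sketch of this calculation, assuming the usual definition of percentage error (the estimate minus the true value, divided by the true value, times 100) and the true obtuse angle of 147°: the function name and the example guesses are hypothetical and are not taken from the database. The sign is kept here so that under-estimates come out negative; taking the absolute value instead would give the size of the error only.

```python
TRUE_ANGLE = 147.0  # the obtuse angle in degrees, as stated in the introduction


def percentage_error(estimate, true_value=TRUE_ANGLE):
    """Signed percentage error (assumed definition).

    Negative for an under-estimate, positive for an over-estimate,
    and exactly 0 for a perfect estimate.
    """
    return (estimate - true_value) / true_value * 100.0


# Hypothetical guesses, for illustration only
for guess in (147, 140, 162):
    print(guess, round(percentage_error(guess), 2))
```

A guess of 162°, for example, is 15° too high, which is an error of about 10.2%.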

I will use a stratified sample to reduce bias, as this will also take into account the ratio of boys to girls and the different abilities, so that each group is represented in proportion to its size. I will use a sample size of 100 for each Key Stage, which makes each of the strata a sensible size to analyse.
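The proportional allocation described above could be worked out along the following lines. This is only a sketch: the pupil counts are invented for illustration (the real counts would come from the database), the function name is hypothetical, and the rule of rounding down and then giving the spare places to the largest remainders is an assumption, since whole pupils have to be chosen.

```python
def proportional_allocation(stratum_sizes, sample_size):
    """Share sample_size between strata in proportion to their sizes."""
    total = sum(stratum_sizes.values())
    # Exact (fractional) share of the sample for each stratum
    exact = {k: n * sample_size / total for k, n in stratum_sizes.items()}
    # Round every share down first
    alloc = {k: int(v) for k, v in exact.items()}
    # Give the remaining places to the strata with the largest remainders
    leftover = sample_size - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc


# Hypothetical counts for one Key Stage: (band, gender) -> number of pupils
ks3_counts = {
    ("higher", "boy"): 60, ("higher", "girl"): 55,
    ("middle", "boy"): 70, ("middle", "girl"): 65,
    ("lower", "boy"): 40, ("lower", "girl"): 35,
}

allocation = proportional_allocation(ks3_counts, 100)
for stratum, places in allocation.items():
    print(stratum, places)
```

With these made-up counts the six strata always add up to exactly 100 pupils, and each stratum's share matches its share of the Key Stage.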

I will also be using a random sample to select the pupils in each stratum, because a random sample means that each member of the population has the same chance of being chosen. I will do this by using a calculator to generate random numbers, which I will then use to pick out the matching rows in the original data.
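The same idea can be sketched in code instead of on a calculator. The example below assumes the pupils in one stratum are numbered by their row in the data; the row numbers, the stratum size of 60 and the sample of 18 are hypothetical. Python's `random.sample` chooses without replacement, so no pupil can be picked twice.

```python
import random


def pick_random_pupils(row_numbers, how_many, seed=None):
    """Choose how_many distinct rows, each equally likely to be chosen."""
    rng = random.Random(seed)  # a fixed seed makes the selection repeatable
    return rng.sample(row_numbers, how_many)


# Hypothetical stratum: rows 1 to 60 of the original data
stratum_rows = list(range(1, 61))
chosen = pick_random_pupils(stratum_rows, 18, seed=1)
print(sorted(chosen))
```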

However, there may be outliers or anomalies. I will treat any estimates that are above 50% error as anomalies. I will discard any anomalies when analysing the data and creating box plots.
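A short sketch of this rule is given below. It assumes that "above 50% error" refers to the size of the error in either direction, so an estimate that is 55% too low is treated the same as one that is 55% too high; the error values are hypothetical.

```python
ANOMALY_THRESHOLD = 50.0  # percentage error beyond which an estimate is discarded


def split_anomalies(errors, threshold=ANOMALY_THRESHOLD):
    """Separate percentage errors into kept values and anomalies."""
    kept = [e for e in errors if abs(e) <= threshold]
    anomalies = [e for e in errors if abs(e) > threshold]
    return kept, anomalies


# Hypothetical percentage errors, for illustration only
errors = [-4.8, 10.2, 63.3, 0.0, -55.1, 50.0]
kept, anomalies = split_anomalies(errors)
print("kept:", kept)
print("discarded:", anomalies)
```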

Investigating Hypothesis 1

I will now investigate my first hypothesis: the older you are, the better you are at estimating. To do this I will take my stratified sample by selecting and taking out only the pupils that I selected randomly. I will then discard any anomalies (results above 50% error) and create box plots for Key Stages 3, 4 and 5 separately. By creating them on the same sheet, it will be easy to compare the box plots using the average (mean), spread (using the I.Q.R. or standard deviation), range, maximum/minimum values and the skew. I predict that as age (or Key Stage) increases, spread (range and standard deviation) decreases and the average becomes closer to zero. The summary statistics for this have also been created in Autograph and inserted.

Results

This is how the results look for each Key Stage, with all of the key statistics shown in a results box to the right:

Analysing the Data

The first thing that is noticeable just by looking at the box plots is that the semi inter-quartile ranges are the same for every box plot. This is unusual, but what it does tell us is that, ignoring the highest and lowest values, age does not have a great effect on how consistent the pupils are at estimating. This is because Key Stages 3, 4 and 5 all had semi inter-quartile ranges of 5.1 and therefore inter-quartile ranges of 10.2.
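The semi inter-quartile range is half of the inter-quartile range, which is why a value of 5.1 goes with an inter-quartile range of 2 × 5.1 = 10.2. The sketch below shows how the box plot statistics could be calculated from a list of percentage errors. The data is hypothetical, and the quartile method used (`method="inclusive"` in Python's `statistics.quantiles`) is an assumption that may not match the method Autograph uses.

```python
import statistics


def box_plot_summary(errors):
    """Return the statistics a box plot is drawn from, plus IQR and semi-IQR."""
    q1, median, q3 = statistics.quantiles(errors, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "min": min(errors),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(errors),
        "iqr": iqr,
        "semi_iqr": iqr / 2,
    }


# Hypothetical percentage errors, for illustration only
sample_errors = [-12.0, -6.0, -3.0, 0.0, 2.0, 4.0, 7.0, 9.0, 15.0]
summary = box_plot_summary(sample_errors)
for name, value in summary.items():
    print(name, value)
```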

However, in order to come to a better conclusion, we must also look at the rest of the box plots and statistics. We can measure the spread of data by looking at the range; this is simply the distance between the lowest value and the highest value. As I expected, the highest range of 51.02 belongs to Key Stage 3, with Key Stage 4 having a slightly lower range of 46.26. The lowest range was Key Stage 5 with 41.5, as expected, as these are the oldest students and are also all of the highest ability. However, the differences between the ranges are actually quite small (only around 5 percentage points of error between each Key Stage), and this could be due to just one person guessing poorly. Using range is not a very reliable way of measuring spread, as it doesn’t take into account the fact that one or two people may have estimated poorly whilst the others estimated very accurately. Despite this, we can still say that for each Key Stage the range is especially high, and therefore age does not, to a huge extent, affect the ability of all of the pupils in that Key Stage to estimate accurately.
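The weakness of the range can be shown with a small example. In the sketch below, both lists are hypothetical and are identical except that one estimate has been replaced by a single poor guess. The quartiles are again calculated with `method="inclusive"`, which is an assumption about the method.

```python
import statistics


def value_range(data):
    """Distance between the lowest and highest value."""
    return max(data) - min(data)


def inter_quartile_range(data):
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q3 - q1


# Hypothetical percentage errors, for illustration only
typical = [-6.0, -4.0, -2.0, 0.0, 1.0, 3.0, 5.0, 6.0, 8.0]
one_poor_guess = typical[:-1] + [40.0]  # same pupils, but one estimated badly

for name, data in (("typical", typical), ("one poor guess", one_poor_guess)):
    print(name, "range:", value_range(data), "IQR:", inter_quartile_range(data))
```

Here a single poor guess raises the range from 14 to 46, while the inter-quartile range stays at 7, because the quartiles ignore the extreme values.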

A more commonly used and accurate method of measuring spread is standard deviation. The standard deviation is a statistic that tells you how tightly the values in a set of data are clustered around the mean, although it is affected by very large or small values. When the values are tightly bunched together the standard deviation is a low figure. It is important we get an accurate representation of what the spread is like, as spread measures how closely the data is clustered. As expected, the largest is Key Stage 3 with 9.56396, then 4 with 9.2567, and ...