Analyse a set of results and investigate the provided hypothesis.

Introduction

My name is Khalil Sayed-Hossen. I am a Year 10 student carrying out the “Guesstimate” coursework task. For this coursework I am going to analyse a set of results and investigate the provided hypothesis.

Plan

While producing this “Guesstimate” coursework, I will first investigate the given hypothesis: that people estimate the length of lines better than the size of angles. Once I have done this I will investigate hypotheses of my own. I will need to find a way of proving or disproving these hypotheses by analysing relevant data.

The data I will be using comes from a pooled set of results that members of my class collected and combined to form a broader, clearer set of results. To compare two sets of results there must be a common measure. Since the length of the line was recorded in millimetres (mm) and the size of the angle in degrees (°), the raw results cannot be compared directly. To compare these two different types of data I will need to calculate the percentage error for each result. This is done by first calculating the difference between each guess and the correct value (the actual length of the line or the actual size of the angle), i.e. the error, and then using the formula:

Error ÷ Correct × 100 = percentage error
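The formula above can be sketched in Python as a rough illustration. The function name and the example numbers are my own assumptions and are not taken from the pooled results. I have also assumed that only the size of the error matters, not whether the guess was too high or too low.

```python
def percentage_error(guess, correct):
    """Percentage error of one guess, following Error / Correct * 100."""
    # Assumption: the error is taken as a size only, so the sign is dropped.
    error = abs(guess - correct)
    return error / correct * 100


# Hypothetical values for illustration only (not real results).
line_error = percentage_error(guess=50, correct=40)   # a line, in mm
angle_error = percentage_error(guess=30, correct=40)  # an angle, in degrees
print(line_error, angle_error)
```

Because both answers are percentages, a line guess and an angle guess can be compared even though they were measured in different units.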

Ways in which I can compare this data include looking at the mean of the results, the standard deviation, and producing scatter graphs. Scatter graphs are useful because, once the line of best fit has been drawn, we can then analyse the inter-quartile range. I will also use any other methods that become apparent during this coursework, and apply them when investigating my other hypotheses as well.
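As a small sketch of the mean and standard deviation comparison, the Python below uses the standard `statistics` module. The two lists of percentage errors are made up for illustration and are not from the pooled results. I have assumed the population standard deviation (`pstdev`) here; the sample version (`stdev`) could be used instead.

```python
import statistics

# Hypothetical percentage errors, for illustration only.
line_errors = [5.0, 10.0, 15.0, 20.0]
angle_errors = [10.0, 20.0, 30.0, 40.0]

# The mean shows how large the errors are on average.
line_mean = statistics.mean(line_errors)
angle_mean = statistics.mean(angle_errors)

# The standard deviation shows how spread out the errors are.
line_sd = statistics.pstdev(line_errors)
angle_sd = statistics.pstdev(angle_errors)

print("line: ", line_mean, round(line_sd, 2))
print("angle:", angle_mean, round(angle_sd, 2))
```

A smaller mean percentage error would suggest better estimates on average, and a smaller standard deviation would suggest more consistent estimates.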

During the course of my investigation I will try to eliminate any bias that might occur. Bias is most likely to arise when I select a sample of data from the pool of results. When selecting data I will sample as randomly as I can and make sure that the sample has not all come from one person.

Collection of data

As part of this coursework, we were given the task of collecting data from random people by asking them to estimate the length of a line in millimetres (mm) and the size of an angle in degrees (°). Once these results had been taken they were entered into an Excel spreadsheet as raw data. Each member of the class carried this out, and once all of us had completed the task we pooled our results to give a broader, clearer set of data, which could be used to investigate any hypotheses.

Data analysis

Once all the data has been collected I will analyse it and apply the analysis to the hypothesis given in the coursework, and also to my own hypotheses. Before I can do this I need to change the raw data into data I can compare. As said earlier, this can only be achieved by working out the percentage error of each data point, for both the line guesses and the angle guesses. I will now work out the percentage errors. I will start by splitting the investigation into different parts, depending on which methods I am using to prove or disprove the hypothesis for the line.

I will first select the data from the pool that I will analyse. This is not as simple as it sounds. When selecting data from the pooled set of results we need to take into consideration how many males and how many females were asked; this is called stratified random sampling. We do this to prevent bias. For example, if our pooled set of results contained 40 males and 90 females and we then selected the results of 20 males and 20 females to analyse, our data would be biased, because the ratio of males to females in the sample would differ significantly from that of the original set of results.

Stratified random sampling prevents this. In this case it is achieved by taking the number of males, dividing it by the total number of people, and multiplying the result by the number of data points needed in the sample. Repeating the process for the number of females then gives the correct ratio of males to females. The formula looks like this:

Group (male or female) ÷ total × preferred number of data points

I will now use this method to select a set of data points from the pooled set of results.

In total, 167 people estimated the line and the angle; of these, 85 were male and 82 were female. Knowing this, we can use stratified random sampling to calculate how many males' and females' results are needed in my sample.

Stratified Random Sampling

I want to sample forty angle data points from the total of 167. I will now do this using the stratified random sampling formula.

Group (male or female) ÷ total × preferred number of data points

Males

85 ÷ 167 × 40 = 20.36 *(say 20)

Females

82 ÷ 167 × 40 = 19.64 *(say 20)

*Rounded to the nearest whole number to give the exact number needed.
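The calculation above can be checked with a short Python sketch. The function name is my own and only illustrates the formula. The group sizes (85 males, 82 females) and the sample size of 40 are the figures used in this coursework.

```python
def stratified_allocation(group_sizes, sample_size):
    """Apply: group / total * preferred number of data points, then round."""
    total = sum(group_sizes.values())
    return {
        group: round(size / total * sample_size)
        for group, size in group_sizes.items()
    }


allocation = stratified_allocation({"male": 85, "female": 82}, 40)
print(allocation)  # {'male': 20, 'female': 20}
```

With other group sizes the rounded numbers might not add up exactly to the sample size, so the total should always be checked afterwards.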

From these results I can see that, when rounded to the nearest whole number, the numbers of males and females needed are equal. With this information I can now sample 40 random data points from the pooled set of results.

My sample data

When selecting the data I had to take into account not only the ratio of males to females but also the fact that each person’s results may not be reliable. To guard against this, I spread my data selection throughout the pool rather than taking it all from one section. This was another way of preventing bias and unreliable data.
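One possible way of picking the data points at random from the whole pool is sketched below in Python. This is not how my sample was actually chosen. The record layout (`id`, `gender`) and the placeholder pool are assumptions for illustration only, and no real guesses are included.

```python
import random


def stratified_sample(pool, allocation, seed=None):
    """Pick the allocated number of records at random from each group."""
    rng = random.Random(seed)  # a seed makes the selection repeatable
    sample = []
    for group, count in allocation.items():
        members = [person for person in pool if person["gender"] == group]
        # rng.sample picks without replacement, so nobody is chosen twice.
        sample.extend(rng.sample(members, count))
    return sample


# Placeholder pool: 85 males and 82 females, identified by number only.
pool = [{"id": i, "gender": "male"} for i in range(85)]
pool += [{"id": i, "gender": "female"} for i in range(85, 167)]

sample = stratified_sample(pool, {"male": 20, "female": 20}, seed=1)
print(len(sample))
```

Because every record in a group has the same chance of being picked, the selection is spread across the pool rather than coming from one section.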

Once I had finished selecting my sample data, I noticed that it contained an outlier (anomaly), which I have highlighted in green. This anomaly must be removed and replaced, as it is not a fair representation of the average guess of the length of the line. When calculating the mean of the line guesses, the anomaly would carry a disproportionately large weight and would make the mean unrepresentative and unreliable.
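I found the anomaly by looking at the data, but one common rule for checking is to flag any value more than 1.5 times the inter-quartile range beyond the quartiles. The Python sketch below uses that rule, which is an assumption on my part and not the method used in this coursework. The guesses are made up for illustration.

```python
import statistics


def find_outliers(values):
    """Flag values more than 1.5 * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical line guesses in mm, for illustration only.
guesses = [38, 40, 41, 42, 43, 44, 45, 120]
outliers = find_outliers(guesses)
cleaned = [g for g in guesses if g not in outliers]

print(outliers)
print(statistics.mean(guesses), statistics.mean(cleaned))
```

Comparing the two means printed at the end shows how much a single extreme guess can pull the mean away from the typical guess.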

Revised set of sample data

This set of sample data is going to be used throughout my investigation of the length of the line.

I will now begin my investigation.

Firstly, I will convert all the line and angle data points into their percentage errors. As said in my plan, this is done to allow a clear comparison.

I will first need to work out the errors of all the data points. We do this by subtracting the original guesses from the correct length of the ...