# Analyse a set of results and investigate the provided hypothesis.

Introduction

My name is Khalil Sayed-Hossen. I am a Year 10 student carrying out the “Guesstimate” coursework task. For this coursework I am going to analyse a set of results and investigate the provided hypothesis.

Plan

While producing this Guesstimate coursework, I will first investigate the hypothesis given, that people estimate the length of lines better than the size of angles. Once I have done this I will begin to investigate hypotheses of my own. I will need to find a way of proving or disproving these hypotheses by analysing relevant data.

The data I will be using is from a pooled set of results that members of my class have collected and combined to form a broader, clearer set of results. To be able to compare two sets of results there must be a common measure. Since the lengths of the line were given in mm and the sizes of the angle in ° (degrees), there is no direct comparison. To compare these two different types of data I will need to calculate the percentage error for each result. This is done by first calculating the difference between each estimate and the actual length of the line or size of the angle, i.e. the error, and then using the formula:

Error ÷ Correct × 100 = percentage error

Ways in which I can compare this data include looking at the mean of the results, the standard deviation, and graphs. Scatter graphs are useful because a line of best fit can be drawn to show any trend, and cumulative frequency graphs allow us to analyse the inter-quartile range. I will also use any other methods that become apparent during this coursework and apply them when investigating my other hypotheses as well.

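No. | Line (mm) | Age | Gender | Line error (mm) | Line percentage error (%) |

26 | 40 | 41 | F | -5 | -11.11111111 |

27 | 50 | 14 | M | 5 | 11.11111111 |

28 | 55 | 50 | M | 10 | 22.22222222 |

29 | 40 | 71 | F | -5 | -11.11111111 |

30 | 20 | 16 | F | -25 | -55.55555556 |

31 | 50 | 14 | M | 5 | 11.11111111 |

32 | 40 | 14 | M | -5 | -11.11111111 |

33 | 40 | 41 | F | -5 | -11.11111111 |

34 | 60 | 15 | M | 15 | 33.33333333 |

35 | 70 | 14 | M | 25 | 55.55555556 |

36 | 53.2 | 28 | M | 8.2 | 18.22222222 |

37 | 40 | 34 | F | -5 | -11.11111111 |

38 | 45 | 45 | F | 0 | 0 |

39 | 37 | 79 | F | -8 | -17.77777778 |

40 | 10 | 12 | F | -35 | -77.77777778 |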

I will start by investigating the line.

I first calculated the errors by subtracting the correct length of the line from each guess. Once I had calculated the errors I was able to use the percentage error formula:

Error ÷ Correct × 100 = percentage error

In Excel we do this in the percentage error column by dividing the first data point in the line error column by 45, the correct length of the line in mm, and then multiplying by 100 to find the percentage.

This gives the percentage error for the first data point. Because the formula is the same for every other data point in this column, we simply select the first cell and drag it down the column, and the formula works out the percentage error in each cell.
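The same calculation can be sketched outside Excel. The Python below is only an illustration, not the spreadsheet used for this coursework: the function name `percentage_error` is an assumption, and the four estimates are those of respondents 27 to 30 in the line results above, with 45 mm as the correct length.

```python
CORRECT_LENGTH_MM = 45  # correct length of the line, as used in the text


def percentage_error(estimate, correct):
    """Return the signed percentage error of one estimate."""
    error = estimate - correct      # guess minus the correct value
    return error / correct * 100    # Error ÷ Correct × 100


# Line estimates (mm) of respondents 27 to 30 from the results above.
line_estimates = [50, 55, 40, 20]

# Equivalent of dragging the formula down the column: apply it to every row.
line_percentage_errors = [
    percentage_error(guess, CORRECT_LENGTH_MM) for guess in line_estimates
]

for guess, pct in zip(line_estimates, line_percentage_errors):
    print(f"{guess} mm -> {pct:.2f}%")
```

A guess below 45 mm gives a negative percentage error and a guess above it gives a positive one, which matches the signs in the line percentage error column.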

Calculating the percentage error for angle guesstimates

No. | Angle (°) | Age | Gender | Angle error (°) | Angle percentage error (%) |

1 | 30 | 78 | M | -6 | -16.66666667 |

2 | 52 | 12 | F | 16 | 44.44444444 |

3 | 43 | 45 | F | 7 | 19.44444444 |

4 | 45 | 14 | M | 9 | 25 |

5 | 40 | 46 | M | 4 | 11.11111111 |

6 | 50 | 14 | M | 14 | 38.88888889 |

7 | 45 | 17 | F | 9 | 25 |

8 | 40 | 45 | F | 4 | 11.11111111 |

9 | 32 | 44 | M | -4 | -11.11111111 |

10 | 30 | 14 | M | -6 | -16.66666667 |

11 | 70 | 47 | F | 34 | 94.44444444 |

12 | 40 | 15 | M | 4 | 11.11111111 |

13 | 36 | 14 | F | 0 | 0 |

14 | 35 | 61 | M | -1 | -2.777777778 |

15 | 40 | 45 | F | 4 | 11.11111111 |

16 | 30 | 41 | M | -6 | -16.66666667 |

17 | 40 | 46 | F | 4 | 11.11111111 |

18 | 40 | 16 | F | 4 | 11.11111111 |

19 | 38 | 36 | M | 2 | 5.555555556 |

20 | 45 | 32 | F | 9 | 25 |

21 | 40 | 66 | M | 4 | 11.11111111 |

22 | 35 | 34 | M | -1 | -2.777777778 |

23 | 35 | 34 | F | -1 | -2.777777778 |

24 | 40 | 62 | M | 4 | 11.11111111 |

25 | 35 | 46 | F | -1 | -2.777777778 |

26 | 40 | 41 | F | 4 | 11.11111111 |

27 | 45 | 14 | M | 9 | 25 |

28 | 45 | 50 | M | 9 | 25 |

29 | 9 | 71 | F | -27 | -75 |

30 | 45 | 16 | F | 9 | 25 |

31 | 45 | 14 | M | 9 | 25 |

32 | 50 | 14 | M | 14 | 38.88888889 |

33 | 45 | 41 | F | 9 | 25 |

34 | 50 | 15 | M | 14 | 38.88888889 |

35 | 75 | 14 | M | 39 | 108.3333333 |

36 | 47.2 | 28 | M | 11.2 | 31.11111111 |

37 | 35 | 34 | F | -1 | -2.777777778 |

38 | 45 | 45 | F | 9 | 25 |

39 | 45 | 79 | F | 9 | 25 |

40 | 45 | 12 | F | 9 | 25 |

When calculating the percentage errors for the angle guesstimates, we repeat the same process used for the line guesstimates, except that in this case we divide the errors by 36, as this was the correct size of the angle in degrees.
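To show why percentage errors make millimetres and degrees comparable, here is a short Python sketch that puts the two side by side for the same people. It is an illustration under stated assumptions: the helper `percentage_error` is a made-up name, and the rows are respondents 27 to 30 taken from the line and angle results above.

```python
CORRECT_LENGTH_MM = 45   # correct length of the line (from the text)
CORRECT_ANGLE_DEG = 36   # correct size of the angle (from the text)


def percentage_error(estimate, correct):
    """Signed percentage error: (estimate - correct) / correct * 100."""
    return (estimate - correct) / correct * 100


# (respondent, line estimate in mm, angle estimate in degrees)
rows = [(27, 50, 45), (28, 55, 45), (29, 40, 9), (30, 20, 45)]

comparison = []
for respondent, line_guess, angle_guess in rows:
    line_pct = percentage_error(line_guess, CORRECT_LENGTH_MM)
    angle_pct = percentage_error(angle_guess, CORRECT_ANGLE_DEG)
    comparison.append((respondent, round(line_pct, 2), round(angle_pct, 2)))
    print(f"respondent {respondent}: line {line_pct:.2f}%, angle {angle_pct:.2f}%")
```

Both columns are now in the same unit (per cent of the correct value), so a line guess and an angle guess from the same person can be compared directly.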

Now that I have calculated the percentage errors for all line and angle data points in my sample, I can proceed with my first method of proving or disproving the hypothesis: calculating the mean of the line percentage errors and the mean of the angle percentage errors. I will then compare both means.

Calculating the mean of the line percentage errors

Line percentage errors (%), negatives made positive |

11.11111111 |

22.22222222 |

11.11111111 |

11.11111111 |

6.666666667 |

22.22222222 |

44.44444444 |

33.33333333 |

16.66666667 |

33.33333333 |

122.2222222 |

33.33333333 |

33.33333333 |

11.11111111 |

11.11111111 |

33.33333333 |

33.33333333 |

11.11111111 |

0 |

33.33333333 |

0 |

44.44444444 |

22.22222222 |

11.11111111 |

11.11111111 |

11.11111111 |

11.11111111 |

22.22222222 |

11.11111111 |

55.55555556 |

11.11111111 |

11.11111111 |

11.11111111 |

33.33333333 |

55.55555556 |

18.22222222 |

11.11111111 |

0 |

17.77777778 |

77.77777778 |

Line percentage errors (%), as calculated |

-11.11111111 |

-22.22222222 |

11.11111111 |

11.11111111 |

6.666666667 |

22.22222222 |

-44.44444444 |

-33.33333333 |

-16.66666667 |

33.33333333 |

122.2222222 |

33.33333333 |

-33.33333333 |

11.11111111 |

11.11111111 |

33.33333333 |

-33.33333333 |

-11.11111111 |

0 |

-33.33333333 |

0 |

44.44444444 |

22.22222222 |

11.11111111 |

-11.11111111 |

-11.11111111 |

11.11111111 |

22.22222222 |

-11.11111111 |

-55.55555556 |

11.11111111 |

-11.11111111 |

-11.11111111 |

33.33333333 |

55.55555556 |

18.22222222 |

-11.11111111 |

0 |

-17.77777778 |

-77.77777778 |

To calculate the mean percentage error, we use the usual method for calculating any mean: add up all the percentage error data points and divide by the number of data points. But before we can do this we need to make any negative percentage errors positive. If this is not done, the negative values will cancel out some of the positive values when we add up the data. We do not want this, because we are only interested in how far each guess was from the correct value as a percentage; whether the guess was too high or too low is not significant.
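The cancelling effect can be seen with a few values. The Python sketch below is only an illustration: it uses the signed line percentage errors of the first four respondents from the list above, and the variable names are assumptions.

```python
# Signed line percentage errors (%) of the first four respondents.
signed_errors = [-11.11111111, -22.22222222, 11.11111111, 11.11111111]

# Mean with the signs left in: negatives cancel out positives.
signed_mean = sum(signed_errors) / len(signed_errors)

# Mean after making every error positive: measures distance from correct.
absolute_mean = sum(abs(e) for e in signed_errors) / len(signed_errors)

print(f"mean of signed errors:   {signed_mean:.2f}%")
print(f"mean of absolute errors: {absolute_mean:.2f}%")
```

With the signs left in, the mean comes out close to zero even though every one of the four guesses was at least 11% away from the correct length.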

Adding all percentage errors

To add the percentage errors we need to convert the negatives into positives, as said earlier. I did this in Excel by squaring each percentage error, using ^2, and then taking the square root of the result. Once I had done this I was able to add up all the percentage errors by highlighting all the data points in the percentage error column and using the ∑ function in Excel, which means "the sum of". This gave me the sum of all the percentage errors for the line and for the angle. The sum of the percentage errors for the line was 981.5555556% and for the angles 945%.
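Squaring a number and then taking the square root gives the same result as simply dropping the minus sign. As a hedged sketch in Python (not the Excel workbook itself), the check below uses four signed angle percentage errors from the angle table, respondents 1, 2, 29 and 13; the variable names are assumptions. In Excel, the built-in ABS function would give the same result in one step.

```python
import math

# Signed angle percentage errors (%) of respondents 1, 2, 29 and 13.
signed_errors = [-16.66666667, 44.44444444, -75.0, 0.0]

# Method described in the text: square each value, then take the square root.
via_square_root = [math.sqrt(e ** 2) for e in signed_errors]

# Shortcut: the absolute value.
via_abs = [abs(e) for e in signed_errors]

# The two methods agree, allowing for floating-point rounding.
methods_agree = all(
    math.isclose(a, b) for a, b in zip(via_square_root, via_abs)
)
total = sum(via_abs)

print("methods agree:", methods_agree)
print(f"sum of absolute errors: {total:.2f}%")
```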

Line percentage errors (%) | Angle percentage errors (%) |

11.11111111 | 16.66666667 |

22.22222222 | 44.44444444 |

11.11111111 | 19.44444444 |

11.11111111 | 25 |

6.666666667 | 11.11111111 |

22.22222222 | 38.88888889 |

44.44444444 | 25 |

33.33333333 | 11.11111111 |

16.66666667 | 11.11111111 |

33.33333333 | 16.66666667 |

122.2222222 | 94.44444444 |

33.33333333 | 11.11111111 |

33.33333333 | 0 |

11.11111111 | 2.777777778 |

11.11111111 | 11.11111111 |

33.33333333 | 16.66666667 |

33.33333333 | 11.11111111 |

11.11111111 | 11.11111111 |

0 | 5.555555556 |

33.33333333 | 25 |

0 | 11.11111111 |

44.44444444 | 2.777777778 |

22.22222222 | 2.777777778 |

11.11111111 | 11.11111111 |

11.11111111 | 2.777777778 |

11.11111111 | 11.11111111 |

11.11111111 | 25 |

22.22222222 | 25 |

11.11111111 | 75 |

55.55555556 | 25 |

11.11111111 | 25 |

11.11111111 | 38.88888889 |

11.11111111 | 25 |

33.33333333 | 38.88888889 |

55.55555556 | 108.3333333 |

18.22222222 | 31.11111111 |

11.11111111 | 2.777777778 |

0 | 25 |

17.77777778 | 25 |

77.77777778 | 25 |

24.53888889 | 23.625 |

#### Finding the mean percentage error

What I did next was divide both totals by 40, as this was the number of data points. I was left with 24.53888889% for the line and 23.625% for the angles, which are the mean percentage errors. These appear in the final row of the table above.
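The whole calculation can be checked with a short Python sketch. This is an illustration rather than the method used in the coursework (which was done in Excel); the values are the absolute percentage errors copied from the two-column table above, and the names `line_errors`, `angle_errors` and `mean` are assumptions.

```python
# Absolute percentage errors (%) copied from the two-column table above.
line_errors = [
    11.11111111, 22.22222222, 11.11111111, 11.11111111, 6.666666667,
    22.22222222, 44.44444444, 33.33333333, 16.66666667, 33.33333333,
    122.2222222, 33.33333333, 33.33333333, 11.11111111, 11.11111111,
    33.33333333, 33.33333333, 11.11111111, 0, 33.33333333,
    0, 44.44444444, 22.22222222, 11.11111111, 11.11111111,
    11.11111111, 11.11111111, 22.22222222, 11.11111111, 55.55555556,
    11.11111111, 11.11111111, 11.11111111, 33.33333333, 55.55555556,
    18.22222222, 11.11111111, 0, 17.77777778, 77.77777778,
]
angle_errors = [
    16.66666667, 44.44444444, 19.44444444, 25, 11.11111111,
    38.88888889, 25, 11.11111111, 11.11111111, 16.66666667,
    94.44444444, 11.11111111, 0, 2.777777778, 11.11111111,
    16.66666667, 11.11111111, 11.11111111, 5.555555556, 25,
    11.11111111, 2.777777778, 2.777777778, 11.11111111, 2.777777778,
    11.11111111, 25, 25, 75, 25,
    25, 38.88888889, 25, 38.88888889, 108.3333333,
    31.11111111, 2.777777778, 25, 25, 25,
]


def mean(values):
    """Add up all the values and divide by how many there are."""
    return sum(values) / len(values)


line_total, angle_total = sum(line_errors), sum(angle_errors)
line_mean, angle_mean = mean(line_errors), mean(angle_errors)

print(f"line:  total {line_total:.4f}%, mean {line_mean:.4f}%")
print(f"angle: total {angle_total:.4f}%, mean {angle_mean:.4f}%")
```

Run as written, this reproduces the two means shown in the last row of the table above.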

The hypothesis states that people estimate lines better than angles. From calculating the mean of the percentage errors, I have found that my findings contradict the hypothesis: people tend to estimate the size of angles better than the length of lines, although the difference between the two means is less than one percentage point. My assumption that people would estimate the size of the angle better than the length of the line, for reasons mentioned earlier, was found to be true through this investigation.

To make these findings more reliable, I would sample a larger amount of data from a more extensive pool, as this would decrease the effect that unreliable, biased data has on the mean.

I will now investigate through other methods of proving and disproving the hypothesis.

Cumulative frequency

I could have produced a frequency graph at this point, but because time is limited I have decided to produce a cumulative frequency graph instead, as it is a clearer representation of the data and I will be able to deduce more information from it.

If we represent the line and angle percentage errors separately in frequency tables, we can calculate cumulative frequencies. Once we have done this we can plot these new values on a graph to form a cumulative frequency curve. This is useful as we will be able to find the median from the halfway point, and we will be able to locate the upper and lower quartiles.

The upper quartile is the value at 75% of the total cumulative frequency and the lower quartile is the value at 25%. Knowing the upper and lower quartiles, we can calculate the inter-quartile range by subtracting the lower quartile from the upper quartile. The inter-quartile range covers the middle half of the data and shows how widely spread the data is. If the inter-quartile range is small, the distribution is bunched together and the results are more consistent; if it is large, the distribution is spread out and the results vary more widely.

We can compare the line inter-quartile range with the angle inter-quartile range. Whichever is smaller belongs to the more consistent set of estimates, as this means the percentage errors vary less.

Line percentage errors cumulative frequency table

Line percentage errors (%) | Frequency | cumulative frequency | upper limits |

0-10 | 4 | 4 | ≤ 10 |

11-20 | 18 | 22 | ≤ 20 |

21-30 | 4 | 26 | ≤ 30 |

31-40 | 8 | 34 | ≤ 40 |

41-50 | 2 | 36 | ≤ 50 |

51-60 | 2 | 38 | ≤ 60 |

61-70 | 0 | 38 | ≤ 70 |

71-80 | 1 | 39 | ≤ 80 |

81-90 | 0 | 39 | ≤ 90 |

91-100 | 0 | 39 | ≤ 100 |

101-110 | 0 | 39 | ≤ 110 |

111-120 | 0 | 39 | ≤ 120 |

121-130 | 1 | 40 | ≤ 130 |
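The steps described above (counting cumulative frequencies, then reading off the median and quartiles) can be sketched in Python. This is a minimal sketch under stated assumptions, not the graph drawn for this coursework: it counts directly from the 40 absolute line percentage errors listed earlier, it joins the plotted points with straight lines starting from (0, 0) instead of a hand-drawn curve, and the function name `read_off` is made up for the illustration.

```python
# Absolute line percentage errors (%), copied from the lists earlier on.
line_errors = [
    11.11111111, 22.22222222, 11.11111111, 11.11111111, 6.666666667,
    22.22222222, 44.44444444, 33.33333333, 16.66666667, 33.33333333,
    122.2222222, 33.33333333, 33.33333333, 11.11111111, 11.11111111,
    33.33333333, 33.33333333, 11.11111111, 0, 33.33333333,
    0, 44.44444444, 22.22222222, 11.11111111, 11.11111111,
    11.11111111, 11.11111111, 22.22222222, 11.11111111, 55.55555556,
    11.11111111, 11.11111111, 11.11111111, 33.33333333, 55.55555556,
    18.22222222, 11.11111111, 0, 17.77777778, 77.77777778,
]

upper_limits = list(range(10, 131, 10))  # 10, 20, ..., 130, as in the table

# Cumulative frequency: how many errors are at or below each upper limit.
cumulative = [sum(1 for e in line_errors if e <= limit) for limit in upper_limits]

# Class frequency: the rise in cumulative frequency from one class to the next.
frequencies = [c - p for c, p in zip(cumulative, [0] + cumulative[:-1])]


def read_off(target, limits, cum_freqs):
    """Estimate the value at a cumulative frequency by straight-line
    interpolation between plotted points, starting the graph at (0, 0)."""
    xs = [0] + list(limits)
    ys = [0] + list(cum_freqs)
    for i in range(len(xs) - 1):
        x0, y0, x1, y1 = xs[i], ys[i], xs[i + 1], ys[i + 1]
        if y1 > y0 and y0 <= target <= y1:
            return x0 + (target - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("target lies outside the cumulative frequency range")


n = len(line_errors)
lower_quartile = read_off(n / 4, upper_limits, cumulative)      # 25% point
median = read_off(n / 2, upper_limits, cumulative)              # halfway point
upper_quartile = read_off(3 * n / 4, upper_limits, cumulative)  # 75% point
iqr = upper_quartile - lower_quartile

print("cumulative frequencies:", cumulative)
print(f"median {median:.1f}%, inter-quartile range {iqr:.1f}%")
```

The same steps could be repeated with the angle percentage errors, so that the two inter-quartile ranges can be compared.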

Conclusion

The standard deviation of the male line and angle estimates is 25.8% to 3 s.f.

Comparing data

From investigating my hypothesis, I have found, through comparing the mean percentage errors for male and female estimates, that males were more accurate. But when I investigated the percentage errors through standard deviation, I found that females were more consistent, and that female estimates were more typical of their mean than male estimates: the standard deviation of the female estimates was 18.1% and the standard deviation of the male estimates was 25.8%, a difference of 7.7 percentage points. This does not change the result for accuracy, as the mean percentage errors still show that males were more accurate. My findings contradict my hypothesis: males were more accurate at estimating lengths of lines and sizes of angles.
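The difference between being accurate (a small mean error) and being consistent (a small standard deviation) can be shown with a short Python sketch. The numbers below are hypothetical values chosen only to illustrate the idea; they are not the coursework data. The use of the population standard deviation (`statistics.pstdev`) is also an assumption, as the exact formula used in the coursework is not shown here.

```python
from statistics import mean, pstdev

# Hypothetical absolute percentage errors (%), NOT the coursework data.
group_a = [20, 22, 21, 19]   # tightly bunched, but far from zero
group_b = [2, 30, 5, 25]     # closer to zero on average, but widely spread

for name, errors in (("A", group_a), ("B", group_b)):
    print(
        f"group {name}: mean error {mean(errors):.1f}%, "
        f"standard deviation {pstdev(errors):.1f}%"
    )
```

In this made-up example group A is the more consistent (smaller standard deviation) while group B is the more accurate (smaller mean error), so the two measures can point in different directions.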

Evaluation

I believe that I have investigated both hypotheses as much as I could in the time I have been given. The conclusions I have come to were based upon the data pooled by my class. I believe that some of this data may have been unreliable due to errors. I believe that with a more extensive pool of data, my findings would have been more conclusive and more indicative of a true representation.

I have reached the end of my investigation. If more time had been available, I could have investigated another hypothesis, such as “Younger people estimate lines and angles better than older people”.

STATISTICAL COURSEWORK

GUESSTIMATE

COURSEWORK

Khalil Sayed-Hossen 10B
