
Statistics - My aim is to investigate whether it is possible to gain information about authorship of a text by using statistical measures.


Introduction

Kuljit Bahra        AS Maths Coursework        10/03/2003

Statistics Coursework – Authorship

Design

Aim

        My aim is to investigate whether it is possible to gain information about the authorship of a text by using statistical measures. I will compare a text written for adults with a text written for children. For a sample of words from each text, I will calculate the mean number of letters per word, and from this the variance and standard deviation, using the unbiased estimator of the population variance in both cases. I will then calculate the standard error and confidence intervals for each population mean. I will present my data in frequency distribution tables and frequency distribution graphs, and I will illustrate the confidence intervals with normal distribution diagrams.
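
        The measures named in the aim can be written out as the standard formulas below. This is a sketch only: the notation is assumed here and is not taken from the coursework. It uses x_i for the number of letters in the i-th sampled word, n for the sample size, s^2 for the unbiased variance estimate and z for the normal critical value (about 1.96 for a 95% interval; the extract does not state which confidence level was used).

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2, \qquad
s = \sqrt{s^2}

\mathrm{SE}(\bar{x}) = \frac{s}{\sqrt{n}}, \qquad
\text{confidence interval: } \bar{x} \pm z\,\frac{s}{\sqrt{n}}
```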

Hypothesis

        I predict that words in Great Expectations by Charles Dickens will, on average, contain more letters than words in Charlie and the Great Glass Elevator by Roald Dahl, so the mean word length for Great Expectations will be larger. I also expect Great Expectations to have a larger standard deviation because it uses a larger vocabulary.

Population

        I will randomly select 50 pages from each book using the RAND function in Microsoft Excel. For each of these pages I will select a random line, and from each line a random word, giving a sample of 50 words per book.
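
        The selection procedure described above can be sketched in Python. This is an illustration under stated assumptions, not the spreadsheet actually used. The helper names, the book dimensions (190 pages, 33 lines per page, 12 words per line) and the seed are all hypothetical, and INT(RAND() * N) + 1 is only one common way of turning Excel's RAND output into a whole number from 1 to N. Real lines vary in length, and the extract does not say how a word number beyond the end of a line was handled.

```python
import random


def random_index(upper, rng):
    """Return a whole number from 1 to upper.

    Mirrors the spreadsheet idiom INT(RAND() * upper) + 1, since both
    RAND() and rng.random() return a value in the range [0, 1).
    """
    return int(rng.random() * upper) + 1


def pick_locations(n_samples, pages_in_book, lines_per_page, words_per_line, seed=None):
    """Pick n_samples (page, line, word) locations, each chosen at random.

    Pages are drawn independently, so the same page can appear twice.
    """
    rng = random.Random(seed)
    locations = []
    for _ in range(n_samples):
        page = random_index(pages_in_book, rng)
        line = random_index(lines_per_page, rng)
        word = random_index(words_per_line, rng)
        locations.append((page, line, word))
    return locations


# Hypothetical book dimensions, chosen only for illustration.
locations = pick_locations(50, pages_in_book=190, lines_per_page=33,
                           words_per_line=12, seed=1)
for page, line, word in locations[:5]:
    print(f"page {page}, line {line}, word {word}")
```

        Fixing the seed makes the selection repeatable, which a spreadsheet that recalculates RAND on every change does not do by default.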

Using the RAND function


Middle

Page No.  Line No.  Word No.  Word        Letters in word
...       ...       ...       ...         3
327       2         12        THIS        4
474       8         6         MY          2
459       33        9         YOU'VE      5
454       23        10        PUT         3
308       25        1         HAD         3
406       29        11        TONE        4

        Raw data for Charlie and the Great Glass Elevator by Roald Dahl.

Page No.  Line No.  Word No.  Word        Letters in word
77        18        7         GRIN        4
150       11        6         MORE        4
131       9         9         TO          2
143       14        1         EXPLOSIONS  10
164       12        1         ISN'T       4
140       31        7         AGAIN       5
92        2         1         RED         3
176       26        2         ALL         3
74        2         6         EYE         3
41        1         8         OFF         3
14        30        7         GREEN       5
120       2         3         A           1
55        25        5         ONE         3
146       16        9         FEEDING     7
93        19        1         CRIPPLED    8
57        8         10        MARS        4
23        8         2         ABOUT       5
119       9         1         LOOK        4
26        29        1         WORTH       5
74        22        5         WONKA       5
24        7         2         YOU         3
111       25        3         YOU         3
138       2         6         I           1
70        23        6         RAN         3
158       27        1         VAPOUR      6
152       28        3         PINE        4
165       18        6         OLD         3
89        5         4         BESIDE      6
111       26        7         TO          2
43        20        6         MANDARIN    8
23        3         1         SERIOUS     7
181       12        3         MOMENT      6
117       18        2         ABOUT       5
38        5         6         SPY         3
170       18        3         SAID        4
181       13        9         OF          2
78        7         6         YOU         3
65        21        9         OR          2
75        28        8         BUMP        4
50        24        1         STRAIGHT    8
14        8         7         OUT         3
98        1         6         ELEVATOR    8
172       10        1         FORTY       5
130       19        3         QUIET       5
104       9         8         WONKA       5
183       2         5         LETTER      6
17        1         7         MR          2
183       27        4         A           1
107       16        8         TO          2
129       9         1         PILL        4
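
        The 50 letter counts for Charlie and the Great Glass Elevator listed above can be tallied into a frequency distribution and summarised with the measures named in the aim. The sketch below recomputes these from the raw data in this extract; it does not quote the coursework's own results, which are not shown here. The variable names and the 95% level (z = 1.96) are assumptions.

```python
from collections import Counter
from math import sqrt
from statistics import mean, stdev, variance

# "Letters in word" column of the raw data, in the order listed.
dahl_letters = [
    4, 4, 2, 10, 4, 5, 3, 3, 3, 3,
    5, 1, 3, 7, 8, 4, 5, 4, 5, 5,
    3, 3, 1, 3, 6, 4, 3, 6, 2, 8,
    7, 6, 5, 3, 4, 2, 3, 2, 4, 8,
    3, 8, 5, 5, 5, 6, 2, 1, 2, 4,
]

# Frequency distribution: how many sampled words have each length.
frequency = Counter(dahl_letters)
for length in sorted(frequency):
    print(f"{length:>2} letters: {frequency[length]} words")

n = len(dahl_letters)
sample_mean = mean(dahl_letters)
sample_variance = variance(dahl_letters)  # unbiased estimator, divides by n - 1
sample_sd = stdev(dahl_letters)           # square root of the unbiased variance
standard_error = sample_sd / sqrt(n)

z = 1.96  # assumed 95% level; the extract does not state the level used
interval = (sample_mean - z * standard_error,
            sample_mean + z * standard_error)

print(f"n = {n}, mean = {sample_mean:.2f}")
print(f"variance = {sample_variance:.2f}, sd = {sample_sd:.2f}")
print(f"standard error = {standard_error:.3f}")
print(f"approximate 95% interval: {interval[0]:.2f} to {interval[1]:.2f}")
```

        The same calculation applied to the Great Expectations sample would allow the two means and standard deviations to be compared directly, but only the last few rows of that sample appear in this extract.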

Frequency distribution

        I ...

Conclusion

Communication

Limitations

        One major limitation was the number of words that I sampled. If I had collected a larger sample, my estimates would have been more accurate. Given the time allowed to complete the investigation, collecting 50 words from each book seemed sensible. If I were to repeat the investigation, I would increase the sample size to improve the accuracy of my results.
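
        The effect of sample size on accuracy can be made concrete with the standard error formula, s divided by the square root of n. The sketch below uses a hypothetical standard deviation of 2 letters, which is not a figure from the coursework, to show how the standard error of the mean shrinks as more words are sampled.

```python
from math import sqrt


def standard_error(sample_sd, n):
    """Standard error of the sample mean for a sample of size n."""
    return sample_sd / sqrt(n)


assumed_sd = 2.0  # hypothetical value, for illustration only

for n in (50, 100, 200):
    print(f"n = {n:>3}: standard error = {standard_error(assumed_sd, n):.3f}")
```

        Because the standard error falls with the square root of n, halving it requires four times as many words, so going from 50 to 200 words per book would halve the width of the confidence interval for the same standard deviation.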

Extensions

        To extend the investigation, I could have looked at the number of words per line, the number of words per page or the number of paragraphs per page.

Improvements

        To improve the investigation, I could have collected more results. With a larger sample, the sample mean would be likely to lie closer to the population mean.

        I could also have collected different types of data, such as the number of words per page.

Conclusion

        In conclusion, my results show that it is possible to gain information about the authorship of a text using statistical measures. My results show this because the adult text has a higher mean number of letters per word and more variation in word length. More information could be gained by collecting a larger sample.
