
# Statistics - My aim is to investigate whether it is possible to gain information about authorship of a text by using statistical measures.


Kuljit Bahra, AS Maths Coursework, 10/03/2003

Statistics Coursework – Authorship

Design

## Aim

My aim is to investigate whether it is possible to gain information about the authorship of a text by using statistical measures. I will compare an adult text with a children's text by measuring the number of letters per word in a sample of words from each. For both samples I will calculate the mean, and then the variance and standard deviation, using the unbiased estimator of the population variance. I will then calculate the standard error of the mean and a confidence interval for each population. I will record my data in frequency distribution tables and display it in frequency distribution graphs, and I will use normal distribution diagrams to illustrate the confidence intervals.
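The calculations listed in the aim can be sketched in Python. This is a minimal sketch, assuming each sample is a plain list of word lengths and that a 95% confidence interval uses the normal critical value 1.96 (a reasonable approximation for samples of 50); the function name `summarise` is hypothetical. `statistics.variance` divides by n - 1, which gives the unbiased estimator of the population variance, and the standard deviation is taken as its square root.

```python
import math
import statistics


def summarise(lengths, z=1.96):
    """Summary statistics for a sample of word lengths (letters per word).

    z=1.96 gives an approximate 95% confidence interval based on the
    normal distribution (an assumption suited to reasonably large samples).
    """
    n = len(lengths)
    mean = statistics.mean(lengths)
    var = statistics.variance(lengths)  # divides by n - 1 (unbiased estimator)
    sd = math.sqrt(var)
    se = sd / math.sqrt(n)              # standard error of the mean
    return {
        "n": n,
        "mean": mean,
        "variance": var,
        "sd": sd,
        "se": se,
        "ci": (mean - z * se, mean + z * se),
    }


if __name__ == "__main__":
    # Hypothetical toy sample, not data from either book.
    print(summarise([2, 4, 4, 4, 5, 5, 7, 9]))
```

The same function can be applied to the Dahl sample and the Dickens sample in turn, so that both populations are summarised in exactly the same way.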

## Hypothesis

I predict that words in Great Expectations by Charles Dickens will contain more letters on average than words in Charlie and the Great Glass Elevator by Roald Dahl, so the mean word length for Great Expectations will be larger. I also expect Great Expectations to have a larger standard deviation, because Dickens uses a wider vocabulary.

## Population

I will select 50 pages at random from each book using the RAND function in Microsoft Excel. For each of these pages I will select a random line, and from each line I will select a random word, recording the number of letters in it.
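The three-stage selection (page, then line, then word) can be sketched as below. In Excel, a random whole number from 1 to N can be produced with `=RANDBETWEEN(1, N)` or `=INT(RAND()*N)+1`; the Python sketch uses `random.randint` in the same way. The book is modelled as a hypothetical nested list of pages, lines and words, and `sample_words` and `letters_in` are illustrative names, not part of the original method. The letter count ignores apostrophes, which matches how ISN'T (4) and YOU'VE (5) were recorded in the raw data. One caveat: because a line is chosen within the chosen page, and a word within the chosen line, words on short lines or sparsely filled pages are more likely to be picked than other words, so this is not a simple random sample of all words unless every page and line has the same length.

```python
import random


def sample_words(book, k, rng=random):
    """Pick k words by choosing a random page, then a random line on that
    page, then a random word on that line (sampling with replacement).

    `book` is a list of pages; each page is a list of lines; each line is a
    list of words. Empty pages or lines are assumed not to occur.
    """
    chosen = []
    for _ in range(k):
        page = book[rng.randint(1, len(book)) - 1]  # like =RANDBETWEEN(1, pages)
        line = page[rng.randint(1, len(page)) - 1]  # random line on that page
        word = line[rng.randint(1, len(line)) - 1]  # random word on that line
        chosen.append(word)
    return chosen


def letters_in(word):
    """Count letters only, so ISN'T counts as 4 (apostrophes ignored)."""
    return sum(ch.isalpha() for ch in word)


if __name__ == "__main__":
    # A tiny made-up "book" of two pages, purely for illustration.
    toy_book = [
        [["THE", "GLASS", "ELEVATOR"], ["WENT", "UP"]],
        [["A", "GREAT", "EXPECTATION"]],
    ]
    rng = random.Random(2003)  # fixed seed so the run is repeatable
    words = sample_words(toy_book, 5, rng)
    print(words, [letters_in(w) for w in words])
```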

## Using the RAND function


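| Page No. | Line No. | Word No. | Word | Letters in word |
|---|---|---|---|---|
| … | … | … | … | 3 |
| 327 | 2 | 12 | THIS | 4 |
| 474 | 8 | 6 | MY | 2 |
| 459 | 33 | 9 | YOU'VE | 5 |
| 454 | 23 | 10 | PUT | 3 |
| 308 | 25 | 1 | | 3 |
| 406 | 29 | 11 | TONE | 4 |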

Raw data for Charlie and the Great Glass Elevator by Roald Dahl.

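| Page No. | Line No. | Word No. | Word | Letters in word |
|---|---|---|---|---|
| 77 | 18 | 7 | GRIN | 4 |
| 150 | 11 | 6 | MORE | 4 |
| 131 | 9 | 9 | TO | 2 |
| 143 | 14 | 1 | EXPLOSIONS | 10 |
| 164 | 12 | 1 | ISN'T | 4 |
| 140 | 31 | 7 | AGAIN | 5 |
| 92 | 2 | 1 | RED | 3 |
| 176 | 26 | 2 | ALL | 3 |
| 74 | 2 | 6 | EYE | 3 |
| 41 | 1 | 8 | OFF | 3 |
| 14 | 30 | 7 | GREEN | 5 |
| 120 | 2 | 3 | A | 1 |
| 55 | 25 | 5 | ONE | 3 |
| 146 | 16 | 9 | FEEDING | 7 |
| 93 | 19 | 1 | CRIPPLED | 8 |
| 57 | 8 | 10 | MARS | 4 |
| 23 | 8 | 2 | ABOUT | 5 |
| 119 | 9 | 1 | LOOK | 4 |
| 26 | 29 | 1 | WORTH | 5 |
| 74 | 22 | 5 | WONKA | 5 |
| 24 | 7 | 2 | YOU | 3 |
| 111 | 25 | 3 | YOU | 3 |
| 138 | 2 | 6 | I | 1 |
| 70 | 23 | 6 | RAN | 3 |
| 158 | 27 | 1 | VAPOUR | 6 |
| 152 | 28 | 3 | PINE | 4 |
| 165 | 18 | 6 | OLD | 3 |
| 89 | 5 | 4 | BESIDE | 6 |
| 111 | 26 | 7 | TO | 2 |
| 43 | 20 | 6 | MANDARIN | 8 |
| 23 | 3 | 1 | SERIOUS | 7 |
| 181 | 12 | 3 | MOMENT | 6 |
| 117 | 18 | 2 | ABOUT | 5 |
| 38 | 5 | 6 | SPY | 3 |
| 170 | 18 | 3 | SAID | 4 |
| 181 | 13 | 9 | OF | 2 |
| 78 | 7 | 6 | YOU | 3 |
| 65 | 21 | 9 | OR | 2 |
| 75 | 28 | 8 | BUMP | 4 |
| 50 | 24 | 1 | STRAIGHT | 8 |
| 14 | 8 | 7 | OUT | 3 |
| 98 | 1 | 6 | ELEVATOR | 8 |
| 172 | 10 | 1 | FORTY | 5 |
| 130 | 19 | 3 | QUIET | 5 |
| 104 | 9 | 8 | WONKA | 5 |
| 183 | 2 | 5 | LETTER | 6 |
| 17 | 1 | 7 | MR | 2 |
| 183 | 27 | 4 | A | 1 |
| 107 | 16 | 8 | TO | 2 |
| 129 | 9 | 1 | PILL | 4 |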
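As a check on the raw data for Charlie and the Great Glass Elevator, the letter counts can be grouped into a frequency distribution and the summary statistics computed from it with the frequency-table formulas: mean = Σfx / Σf and s² = (Σfx² - (Σfx)²/n) / (n - 1). This is a sketch only: the 50 values are copied from the "Letters in word" column in table order, and the 1.96 multiplier assumes a normal approximation for a 95% confidence interval.

```python
import math
from collections import Counter

# "Letters in word" column for Charlie and the Great Glass Elevator, in table order.
dahl = [4, 4, 2, 10, 4, 5, 3, 3, 3, 3,
        5, 1, 3, 7, 8, 4, 5, 4, 5, 5,
        3, 3, 1, 3, 6, 4, 3, 6, 2, 8,
        7, 6, 5, 3, 4, 2, 3, 2, 4, 8,
        3, 8, 5, 5, 5, 6, 2, 1, 2, 4]

freq = Counter(dahl)  # word length x -> frequency f

print("Letters  Frequency")
for x in range(1, max(freq) + 1):
    print(f"{x:>7}  {freq[x]:>9}")  # freq[x] is 0 for lengths that never occur

n = sum(freq.values())                             # sum of f
sum_fx = sum(x * f for x, f in freq.items())       # sum of f*x
sum_fx2 = sum(x * x * f for x, f in freq.items())  # sum of f*x^2

mean = sum_fx / n
var = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)        # unbiased estimator
sd = math.sqrt(var)
se = sd / math.sqrt(n)                             # standard error of the mean
ci = (mean - 1.96 * se, mean + 1.96 * se)          # approximate 95% interval

print(f"n = {n}, mean = {mean:.2f}, s = {sd:.3f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

The same script, with the Great Expectations letter counts substituted for `dahl`, would give the figures needed for the comparison in the hypothesis.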

Frequency distribution



#### Communication

##### Limitations

One major limitation was the number of samples that I collected. With more samples, my estimates of the mean and standard deviation would have been more precise. Given the time available to complete the investigation, collecting 50 samples from each book seemed sensible. If I were to repeat the investigation, I would collect more samples, because a larger sample reduces the standard error of the mean.
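The effect of sample size can be made precise: the standard error of the mean is s/√n, so it shrinks only with the square root of the sample size. The sketch below assumes an illustrative sample standard deviation of 2 letters (a hypothetical value, not a result from this investigation) and shows that quadrupling the sample from 50 to 200 words halves the standard error, and therefore halves the width of the 95% confidence interval.

```python
import math

S = 2.0  # hypothetical sample standard deviation (letters per word)


def standard_error(s, n):
    """Standard error of the sample mean: s / sqrt(n)."""
    return s / math.sqrt(n)


for n in (50, 100, 200, 400):
    se = standard_error(S, n)
    width = 2 * 1.96 * se  # width of an approximate 95% confidence interval
    print(f"n = {n:>3}: SE = {se:.3f}, 95% CI width = {width:.3f}")
```

Because of the square root, each further halving of the standard error needs four times as many words, which is worth weighing against the time available.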

##### Extensions

To extend the investigation, I could have looked at the number of words per line, the number of words per page, or the number of paragraphs per page.

##### Improvements

To improve the investigation, I could have collected more results, so that the sample mean would be a more reliable estimate of the population mean.

I could also have collected different types of data, such as the number of words per page.

Conclusion

In conclusion, my results show that it is possible to gain information about the authorship of a text using statistical measures. The adult text has a higher mean number of letters per word and greater variation in word length than the children's text. More information could be gained by collecting a larger sample.
