
The aim of this investigation is to gain statistical information to show authorship of a text.

Extracts from this document...


AS Mathematics: (AQA) Statistics Coursework

DESIGN

Introduction:
The aim of this investigation is to gain statistical information to show authorship of a text. For this investigation, I will use two pieces of text. For the investigation to be valid, the two texts should differ in theme. By theme, I mean that they need to be different in a broad way, e.g. a different genre or a different age of reader. I had a number of texts to compare, but I decided to use one adult text and one children's text, as this should give a more obvious variation and a clearer expectation.

For each population I will calculate the sample mean, and then the variance and standard deviation using the unbiased estimator. I will then calculate the standard error and confidence intervals for both populations. My data will be presented in frequency distribution tables, whose trends can be shown on frequency distribution graphs. Normal distribution diagrams will also be used to represent the confidence intervals.

Population:
In a statistical enquiry, you often need information about a particular group. This group is known as the POPULATION and it could be small, large or infinite. The population for my investigation is all the words of each separate book.

Sampling:
Sampling is the selection of individual members of a population. The advantage of taking a sample is that it is cheaper and quicker, the results are easier to analyse, and it is appropriate for this type of investigation. Unfortunately, it does have some disadvantages that are difficult to avoid: the results may include natural variation or bias, so they may not be representative of the whole population, in which case the results are meaningless. ...
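The quantities named in the introduction can be written out for data held in a frequency table. The block below is a sketch under assumed notation, not the coursework's own working: x is the word length, f its frequency, n the sample size, s^2 the unbiased variance estimate, and z the normal critical value for the chosen confidence level (for example 1.96 for 95%). None of these symbols or the confidence level are given in the extract.

```latex
\begin{align*}
n &= \sum f, &
\bar{x} &= \frac{\sum f x}{n}, \\
s^2 &= \frac{1}{n-1}\left(\sum f x^2 - n\,\bar{x}^2\right), &
s &= \sqrt{s^2}, \\
\text{standard error} &= \frac{s}{\sqrt{n}}, &
\text{confidence interval} &= \bar{x} \pm z\,\frac{s}{\sqrt{n}}.
\end{align*}
```

Dividing by n - 1 rather than n is what makes s^2 an unbiased estimate of the population variance.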


No. of letters (x)   Tally                            Frequency (f)
...                  ... I I I                        7
7                    I I I                            3
8                    I I I I                          4
9                    I I I I I I I                    7
10                   I I                              2
11                   I                                1
12                   I                                1

The distribution is not normal, and I will discuss how a certain theorem accounts for this.

Raw Data for Children's Text:

Word No.   Page   Word      Word Length
1          24     mum       3
2          8      to        2
3          7      was       3
4          29     their     5
5          31     trouble   7
6          3      she       3
7          18     was       3
8          45     there     5
9          5      the       3
10         19     very      4
11         37     a         1
12         38     he        2
13         20     it        2
14         45     to        2
15         26     and       3
16         15     eggs      4
17         25     chris     5
18         30     friends   7
19         35     archie    6
20         33     to        2
21         40     yellow    6
22         2      hands     5
23         10     out       3
24         43     house     5
25         6      on        2
26         42     jacket    6
27         14     was       3
28         38     oh        1
29         17     said      4
30         25     there     5
31         35     for       3
32         37     chris     5
33         36     cat       3
34         10     coops     5
35         43     half      4
36         40     of        2
37         16     the       3
38         15     place     5
39         33     bring     5
40         19     six       3
41         45     picture   7
42         46     lots      4
43         27     sing      4
44         41     down      4
45         4      glass     5
46         36     the       3
47         23     too       3
48         37     it        2
49         14     the       3
50         26     want      4

Frequency Distribution table and graph for Children's Text:

No. of letters (x)   Tally                            Frequency (f)
1                    I I                              2
2                    I I I I I I I I                  8
3                    I I I I I I I I I I I I I I I    15
4                    I I I I I I I I                  8
5                    I I I I I I I I I I I            11
6                    I I I                            3
7                    I ...
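As an illustration of how the children's raw data leads to the statistics named in the design, here is a minimal Python sketch. It tallies the 50 recorded word lengths into a frequency distribution and then computes the sample mean, the unbiased variance estimate, the standard error and a confidence interval. The word lengths are copied as recorded in the raw data table rather than recomputed from the words. The 95% level (z = 1.96) and all variable names are assumptions for illustration; the extract does not state which level the coursework used.

```python
from collections import Counter
from math import sqrt
from statistics import mean, variance

# Word lengths for the children's text, in sample order (words 1 to 50),
# copied as recorded in the raw data table.
lengths = [
    3, 2, 3, 5, 7, 3, 3, 5, 3, 4,
    1, 2, 2, 2, 3, 4, 5, 7, 6, 2,
    6, 5, 3, 5, 2, 6, 3, 1, 4, 5,
    3, 5, 3, 5, 4, 2, 3, 5, 5, 3,
    7, 4, 4, 4, 5, 3, 3, 2, 3, 4,
]

# Frequency distribution: word length -> number of sampled words.
freq = Counter(lengths)

n = len(lengths)
sample_mean = mean(lengths)
s2 = variance(lengths)   # unbiased estimator: divides by n - 1
s = sqrt(s2)
se = s / sqrt(n)         # standard error of the sample mean

Z_95 = 1.96              # assumed confidence level of 95%
ci = (sample_mean - Z_95 * se, sample_mean + Z_95 * se)

for x in sorted(freq):
    print(f"{x:>2} letters: {'I ' * freq[x]}({freq[x]})")
print(f"n = {n}, mean = {sample_mean:.2f}, s = {s:.3f}, SE = {se:.3f}")
print(f"95% confidence interval: {ci[0]:.2f} to {ci[1]:.2f}")
```

On this sample the mean is 3.78 letters per word, and the assumed 95% interval runs from roughly 3.36 to 4.20. Using a normal critical value here relies on the sample of 50 being large enough for the sample mean to be approximately normally distributed, even though the word lengths themselves are not.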


Another progression could be to sample a number of horror books by the same author to see whether their population parameters are at all similar. If I were to do this coursework again, I could change the populations. I could choose a book that has both an adult's version and a child's version. I feel this would be fairer, because it is possible that I chose an advanced children's book or an easy adult's book. Instead of collecting two sets of data for each population, I could have chosen two adult books and two children's books. I could then have taken 100 words from each book and combined the two children's samples, and likewise the two adult samples. Another possible way would be to choose two fact books. It would be interesting to see whether this made any difference to the final results.

Improvements
To improve the investigation, I could have collected more results. This would make the sample mean likely to be closer to the population mean. I could also have collected different types of results; for example, I could have looked at the number of words per page.

Conclusion
In conclusion, my results show that it is possible to gain information about the authorship of a text using statistical measures. My results show this because the adult text has a higher mean number of letters per word and also more variation in word length. However, the reliability of the investigation could be checked further by using larger samples. Overall the investigation was a success, as my hypotheses were supported and the data seemed to be accurate. Its reliability, however, can only be tested if the true mean of the parent population is known, since the consistency of the investigation could then be checked against it. My sampling method and generation of random numbers were also good, so I would not change the method; instead I would opt to use a much larger sample.

Khaled Hamid
...


This student written piece of work is one of many that can be found in our AS and A Level Probability & Statistics section.


