The aim of this investigation is to gain statistical information to show authorship of a text.


AS Mathematics (AQA): Statistics Coursework

DESIGN

Introduction: The aim of this investigation is to gain statistical information to show authorship of a text. For this investigation, I will use two pieces of text in order to investigate authorship. For the investigation to be valid, the two pieces of text should have different themes. By theme, I mean they need to differ in a broad way, i.e. different genre or different age of reader. I had a number of different texts to compare, but I decided to use one adult text and one child text, as this should give a more obvious variation and a clearer expectation.

For this investigation I will calculate the mean of the distribution for both populations. I will then be able to calculate the standard deviation and variance, using the unbiased estimator for both populations. I will also calculate the standard error and confidence intervals for both populations. My data will be presented in frequency distribution tables, whose trends can then be shown on frequency distribution graphs. Normal distribution diagrams will also be used to represent the confidence intervals.

Population: In a statistical enquiry, you often need information about a particular group. This group is known as the POPULATION and it could be small, large or infinite. The population for my investigation is all the words of each separate book.

Sampling: Sampling is the selection of individual members of a population. The advantage of taking a sample is that it is cheaper and quicker, the results are easier to analyse, and it is appropriate for this type of investigation. Unfortunately, it has some disadvantages that are difficult to avoid: the results may include natural variation or bias, and so may not be representative of the whole population, which would make the results meaningless. ...
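The statistics named in the introduction can be written out for a frequency table with word lengths x and frequencies f. This is a sketch of the standard formulas rather than a reproduction of the working in the full coursework. The multiplier z depends on the confidence level, which the extract does not state; z = 1.96 (a 95% interval) is an assumption used here for illustration only.

```latex
\begin{align*}
  n &= \sum f, &
  \bar{x} &= \frac{\sum f x}{n} \\
  s^2 &= \frac{\sum f x^2 - n\bar{x}^2}{n - 1}
      && \text{(unbiased estimator of the population variance)} \\
  \mathrm{SE}(\bar{x}) &= \frac{s}{\sqrt{n}} \\
  \text{confidence interval} &= \bar{x} \pm z\,\frac{s}{\sqrt{n}}
      && \text{(assumed } z = 1.96 \text{ for 95\%)}
\end{align*}
```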


No. of letters (x)   Tally                            Frequency (f)
...                  ... I I I                        7
7                    I I I                            3
8                    I I I I                          4
9                    I I I I I I I                    7
10                   I I                              2
11                   I                                1
12                   I                                1

The distribution is not normal, and I will discuss how a certain theorem accounts for this.

Raw Data for Children's Text:

Word no.   Page   Word      Word length
1          24     mum       3
2          8      to        2
3          7      was       3
4          29     their     5
5          31     trouble   7
6          3      she       3
7          18     was       3
8          45     there     5
9          5      the       3
10         19     very      4
11         37     a         1
12         38     he        2
13         20     it        2
14         45     to        2
15         26     and       3
16         15     eggs      4
17         25     chris     5
18         30     friends   7
19         35     archie    6
20         33     to        2
21         40     yellow    6
22         2      hands     5
23         10     out       3
24         43     house     5
25         6      on        2
26         42     jacket    6
27         14     was       3
28         38     oh        1
29         17     said      4
30         25     there     5
31         35     for       3
32         37     chris     5
33         36     cat       3
34         10     coops     5
35         43     half      4
36         40     of        2
37         16     the       3
38         15     place     5
39         33     bring     5
40         19     six       3
41         45     picture   7
42         46     lots      4
43         27     sing      4
44         41     down      4
45         4      glass     5
46         36     the       3
47         23     too       3
48         37     it        2
49         14     the       3
50         26     want      4

Frequency Distribution table and graph for Children's Text:

No. of letters (x)   Tally                            Frequency (f)
1                    I I                              2
2                    I I I I I I I I                  8
3                    I I I I I I I I I I I I I I I    15
4                    I I I I I I I I                  8
5                    I I I I I I I I I I I            11
6                    I I I                            3
7                    I ...
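As a rough check on the children's text, the following sketch recomputes the frequency distribution and the summary statistics from the 50 word lengths exactly as they are recorded in the raw data table. The function name `summarise` and the multiplier `z = 1.96` (a 95% interval) are assumptions introduced for illustration; the extract does not say which confidence level the coursework used.

```python
from collections import Counter
from math import sqrt
from statistics import mean, variance

# Word lengths copied from the raw data table for the children's text,
# in sample order (50 words, values as recorded in the table).
word_lengths = [
    3, 2, 3, 5, 7, 3, 3, 5, 3, 4,
    1, 2, 2, 2, 3, 4, 5, 7, 6, 2,
    6, 5, 3, 5, 2, 6, 3, 1, 4, 5,
    3, 5, 3, 5, 4, 2, 3, 5, 5, 3,
    7, 4, 4, 4, 5, 3, 3, 2, 3, 4,
]


def summarise(lengths, z=1.96):
    """Return sample statistics for a list of word lengths.

    z is the normal multiplier for the confidence interval;
    1.96 (95%) is an assumed value, not one taken from the coursework.
    """
    n = len(lengths)
    x_bar = mean(lengths)
    s2 = variance(lengths)   # sample variance, divisor n - 1 (unbiased estimator)
    s = sqrt(s2)
    se = s / sqrt(n)         # standard error of the sample mean
    return {
        "n": n,
        "mean": x_bar,
        "variance": s2,
        "std_dev": s,
        "std_error": se,
        "interval": (x_bar - z * se, x_bar + z * se),
    }


# Frequency distribution: number of letters -> number of words.
frequency = dict(sorted(Counter(word_lengths).items()))
summary = summarise(word_lengths)

print(frequency)
for name, value in summary.items():
    print(name, value)
```

The same function could be applied to the adult text once its raw word lengths are listed in full, so that the two means and confidence intervals can be compared directly.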


Another progression could be to sample a number of horror books by the same author to see whether they are at all similar in their population parameters. If I were to do this coursework again, I could change the populations. I could choose a book which has both an adult's version and a child's version. I feel this would be fairer, because it is possible that I chose an advanced child's book or an easy adult's book. Instead of collecting one set of data for each population, I could have chosen two adult's books and two child's books. I could then have taken 100 words from each book and combined the two child's samples and the two adult's samples. Another possible way would be to choose two fact books. It would be interesting to see whether this made any difference to the final results.

Improvements

To improve the investigation, I could have collected more results. A larger sample would tend to bring the sample mean closer to the population mean. I could also have collected different types of results; for example, I could have looked at the number of words per page.

Conclusion

In conclusion, my results show that it is possible to gain information about the authorship of a text using statistical measures. My results show this because the adult text has a higher mean number of letters per word and also more variation in word length. However, the reliability of the investigation could be checked further by using larger samples. Overall the investigation was a success, as my hypotheses were proved right and the data seemed to be accurate. The reliability, however, can only be tested fully if the true mean of the parent population is known, since the consistency of the investigation could then be checked against it. My sampling method and generation of random numbers were also good, so I would not change the method; instead I would opt to use a much larger sample.

Khaled Hamid

This student written piece of work is one of many that can be found in our AS and A Level Probability & Statistics section.
