
The aim of this investigation is to gain statistical information to show authorship of a text.



AS Mathematics (AQA): Statistics Coursework

DESIGN

Introduction: The aim of this investigation is to gain statistical information to show authorship of a text. To investigate authorship, I will use two pieces of text. For the investigation to be valid, the two texts should have different themes. By theme, I mean that they should differ in a broad way, e.g. in genre or in the age of their intended readers. I had a number of different texts to compare, but I decided to use one adult text and one children's text, as this should give a more obvious variation and a clearer expectation.

For this investigation I will calculate the mean of the distribution for both populations. I will then calculate the variance and standard deviation, using the unbiased estimator for both populations, followed by the standard error and confidence intervals. My data will be presented in frequency distribution tables, together with frequency distribution graphs that show the trends in each distribution. Normal distribution diagrams will also be used to represent the confidence intervals.

Population: In a statistical enquiry, you often need information about a particular group. This group is known as the POPULATION, and it could be small, large or infinite. The population for my investigation is all the words in each separate book.

Sampling: Sampling is the selection of individual members of a population. The advantages of taking a sample are that it is cheaper and quicker, the results are easier to analyse, and it is appropriate for this type of investigation. Unfortunately, sampling also has disadvantages that are difficult to avoid: the results may include natural variation or bias, and so may not be representative of the whole population, in which case the conclusions drawn from them would be misleading. [...]
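The calculations planned above (mean, unbiased variance, standard deviation, standard error and a confidence interval) can be sketched in Python. This is a minimal illustration, not the author's working: the function name `summarise` is hypothetical, the input is assumed to be a frequency table mapping word length to frequency, and the 95% interval uses z = 1.96. That multiplier relies on the sample mean being approximately normal, which for samples of about 50 words is usually justified by the Central Limit Theorem even though word lengths themselves are not normally distributed.

```python
import math


def summarise(freq: dict[int, int], z: float = 1.96) -> dict[str, float]:
    """Mean, unbiased variance, standard deviation, standard error and an
    approximate confidence interval for a sample given as {value: frequency}."""
    n = sum(freq.values())
    sum_fx = sum(x * f for x, f in freq.items())
    sum_fx2 = sum(x * x * f for x, f in freq.items())

    mean = sum_fx / n
    # Unbiased estimator of the population variance: divide by n - 1, not n.
    variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
    sd = math.sqrt(variance)
    # Standard error of the sample mean.
    se = sd / math.sqrt(n)
    return {
        "n": n,
        "mean": mean,
        "variance": variance,
        "sd": sd,
        "se": se,
        "ci_low": mean - z * se,
        "ci_high": mean + z * se,
    }


# Children's-text frequencies (word length: frequency), counted from the
# children's raw data recorded in this coursework.
children = {1: 2, 2: 8, 3: 15, 4: 8, 5: 11, 6: 3, 7: 3}
for key, value in summarise(children).items():
    print(f"{key}: {value:.4f}")
```

For the children's sample this gives a mean of 3.78 letters and an unbiased variance of about 2.30; the same function can be applied unchanged to the adult frequency table.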


No. of letters (x)   Tally         Frequency (f)
(row truncated)      ... III       7
7                    III           3
8                    IIII          4
9                    IIIII II      7
10                   II            2
11                   I             1
12                   I             1

The distribution is not normal, and I will discuss how a certain theorem accounts for this.

Raw Data for Children's Text:

Word no.   Page   Word      Word length
1          24     mum       3
2          8      to        2
3          7      was       3
4          29     their     5
5          31     trouble   7
6          3      she       3
7          18     was       3
8          45     there     5
9          5      the       3
10         19     very      4
11         37     a         1
12         38     he        2
13         20     it        2
14         45     to        2
15         26     and       3
16         15     eggs      4
17         25     chris     5
18         30     friends   7
19         35     archie    6
20         33     to        2
21         40     yellow    6
22         2      hands     5
23         10     out       3
24         43     house     5
25         6      on        2
26         42     jacket    6
27         14     was       3
28         38     oh        1
29         17     said      4
30         25     there     5
31         35     for       3
32         37     chris     5
33         36     cat       3
34         10     coops     5
35         43     half      4
36         40     of        2
37         16     the       3
38         15     place     5
39         33     bring     5
40         19     six       3
41         45     picture   7
42         46     lots      4
43         27     sing      4
44         41     down      4
45         4      glass     5
46         36     the       3
47         23     too       3
48         37     it        2
49         14     the       3
50         26     want      4

Frequency Distribution table and graph for Children's Text:

No. of letters (x)   Tally               Frequency (f)
1                    II                  2
2                    IIIII III           8
3                    IIIII IIIII IIIII   15
4                    IIIII III           8
5                    IIIII IIIII I       11
6                    III                 3
7                    III                 3
[...]
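As a cross-check on the children's frequency table, the 50 recorded word lengths can be tallied with Python's standard library. This sketch assumes the lengths are copied exactly as recorded in the raw-data table (so the entry for "oh" keeps its recorded length of 1); `statistics.variance` divides by n − 1, so it matches the unbiased estimator used in this investigation.

```python
from collections import Counter
import statistics

# Word lengths of the 50 sampled words from the children's text, in the
# order of the raw-data table (word numbers 1-50), exactly as recorded.
lengths = [3, 2, 3, 5, 7, 3, 3, 5, 3, 4,
           1, 2, 2, 2, 3, 4, 5, 7, 6, 2,
           6, 5, 3, 5, 2, 6, 3, 1, 4, 5,
           3, 5, 3, 5, 4, 2, 3, 5, 5, 3,
           7, 4, 4, 4, 5, 3, 3, 2, 3, 4]

freq = Counter(lengths)
for x in sorted(freq):
    # Word length, tally marks and frequency, as in the frequency table.
    print(x, "I" * freq[x], freq[x])

print("mean:", statistics.mean(lengths))
print("unbiased variance:", statistics.variance(lengths))
```

Run as-is, it prints frequencies 2, 8, 15, 8, 11, 3 and 3 for word lengths 1 to 7, matching the tallies, and a mean of 3.78.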


Another progression could be to sample a number of horror books by the same author to see whether their population parameters are similar. If I were to do this coursework again, I could change the populations. I could choose a book that has both an adult version and a children's version. I feel this would be fairer, because it is possible that I chose an advanced children's book or an easy adult book. Instead of sampling a single book for each population, I could have chosen two adult books and two children's books, taken 100 words from each, and combined the two children's samples and the two adult samples. Another possible approach would be to choose two factual books. It would be interesting to see whether this made any difference to the final results.

Improvements: To improve the investigation, I could have collected more results. A larger sample would make the sample mean likely to be closer to the population mean, because the standard error of the mean decreases as the sample size increases. I could also have collected different types of results, such as the number of words per page.

Conclusion: In conclusion, my results show that it is possible to gain information about the authorship of a text using statistical measures. My results show this because the adult text has a higher mean number of letters per word and also greater variation in word length. However, the reliability of the investigation could be checked further by using much larger samples. Overall, the investigation was a success, as my hypotheses were supported and the data appeared to be accurate. However, its reliability can only be fully tested if the true mean of the parent population is known, because only then can the consistency of this investigation be checked. My sampling method and generation of random numbers also worked well, so I would not change the method; instead, I would opt to use a much larger sample.

Khaled Hamid
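The suggested improvement of collecting more results can be quantified: the standard error of the mean is s/√n, so a confidence interval for the mean narrows as the sample grows. The sketch below assumes a sample standard deviation of 1.52 letters (close to the value implied by the children's frequency table) and uses a hypothetical helper, `ci_half_width`; it is an illustration of the formula, not a result from this investigation.

```python
import math


def ci_half_width(sd: float, n: int, z: float = 1.96) -> float:
    """Half-width of an approximate 95% confidence interval for a mean: z * sd / sqrt(n)."""
    return z * sd / math.sqrt(n)


# Assumed standard deviation of word length (letters), roughly that of the
# children's sample; this is an illustrative value, not a measured result.
SD = 1.52

for n in (50, 100, 200, 400, 800):
    print(f"n = {n:4d}: mean +/- {ci_half_width(SD, n):.3f} letters")
```

Because the width depends on 1/√n, each fourfold increase in sample size halves the interval, which is why a much larger sample would give a noticeably more precise estimate of each population mean.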
