Determine whether it is possible to gain information about authorship of a text using statistical measures.


Aim: the aim of this investigation is to determine whether it is possible to gain information about the authorship of a text using statistical measures. I will compare two books: the first is aimed at an adult audience, and the second is aimed at children and written at a lower reading level.

I will compare the complexity of the two books statistically by calculating the average number of words per sentence and the average number of letters per word in each book. From this information I will calculate confidence intervals, which I will then compare between the two books. I have chosen these two measures because neither is affected by font size, the number of lines on a page, or the number of words per line.
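
The calculation described above can be sketched in Python. This is only a minimal illustration under stated assumptions: the 95% confidence level and the normal approximation for the sample mean are my assumptions, not values given in the text, and the function names `confidence_interval` and `intervals_overlap` are hypothetical.

```python
import math
import statistics


def confidence_interval(data, level=0.95):
    """Return (lower, upper) bounds for the population mean.

    Assumes a normal approximation for the sample mean; the
    default level of 0.95 is an assumption, not from the essay.
    """
    n = len(data)
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    # Two-sided critical value, about 1.96 when level is 0.95.
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


def intervals_overlap(a, b):
    """Return True if the intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]
```

Passing the 50 words-per-sentence values for each book to `confidence_interval` would give one interval per book. If `intervals_overlap` returns False for the pair, that suggests a genuine difference in average sentence length between the books; the same comparison applies to letters per word.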

The outcome of this investigation should provide evidence of which of the two books is more complex, that is, which uses longer words and sentences.

The two books I have used are:

“Diana, Her True Story” – Andrew Morton (174 pages, max 39 lines per page)

“A Series of Unfortunate Events” – Lemony Snicket (190 pages, max 21 lines per page)

Both books are from the same decade, so the comparison should be fair: differences in language and structure between periods should not bias it. Had I used a book written before the 20th century, it might reflect the language spoken and written in that period, possibly with longer or shorter words and sentences.


        For each book I have taken a sample of 50 for both the number of words per sentence and the number of letters per word.  This gives 50 pieces of data per measure for each book (i.e. 50 values for words per sentence and 50 values for letters per word for the adult book, and the same for the children’s book).
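
One way the two measures could be collected and sampled is sketched below. The text does not say how the 50 values were chosen, so simple random sampling is an assumption here, as are the rules for splitting sentences and words. The function names are hypothetical.

```python
import random
import re


def words_per_sentence(text):
    """Count the words in each sentence, splitting on '.', '!' and '?'."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def letters_per_word(text):
    """Count the letters in each word; apostrophes are not counted."""
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text)
    return [sum(ch.isalpha() for ch in w) for w in words]


def draw_sample(values, size=50, seed=None):
    """Draw a simple random sample without replacement (assumed method)."""
    if len(values) < size:
        raise ValueError("not enough values to draw the requested sample")
    return random.Random(seed).sample(values, size)
```

For each book, `draw_sample(words_per_sentence(text))` and `draw_sample(letters_per_word(text))` would give the two sets of 50 values. Abbreviations such as "Mr." would be treated as sentence endings by this simple splitter, so counts taken by hand may differ slightly.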

        I believe that a sample of 50 is large enough to represent each book as a whole (the parent population), as 30 is the standard sample size and ...
