# The aim of this coursework is to compare the word and sentence lengths of an adult's book and a child's book. The results should reflect a higher level of difficulty in the adult's book.

Introduction

## The Normal Distribution

## Design

The strategy that I will be using is simple. I am going to take a sample of word lengths and sentence lengths from both books. I will be taking two sets of each measure for each book. To make the comparison fair and reliable, the samples will be random.

The main objective of the coursework is to demonstrate the difficulty of an adult’s book compared to a child’s book. If the words and sentences in the adult’s book are longer by a reasonable amount, I will judge that the adult’s book is more difficult.

The population that I will be using consists of two fiction books chosen from a library. One book was selected from the adult’s section and one from the child’s. The adult’s book is ‘The Regeneration Trilogy’ by Pat Barker. The child’s book is ‘The Borrowers Afloat’ by Mary Norton.

To obtain our sample we decided to do each measure separately. We first took the word lengths from the child’s book, followed by the word lengths from the adult’s book. We began by randomly selecting a page in the child’s book using the random number generator on a calculator. Once we had our page, we randomly selected a line on that page using the same method. This gave us our starting point. We then counted the number of letters in each word, starting with the first word of the selected line, until we had 100 words.
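The counting step described above can be sketched in Python. This is only an illustration under stated assumptions: the function name `word_lengths` and the sample text are hypothetical, Python's `random` module stands in for the calculator's random number generator, and the page and line selection is collapsed into a single random starting word for brevity.

```python
import random

def word_lengths(text, start, count=100):
    """Return the letter counts of up to `count` consecutive words,
    beginning at word index `start` (0-based)."""
    words = text.split()
    lengths = []
    for word in words[start:start + count]:
        # Count letters only, so attached punctuation is ignored.
        letters = [ch for ch in word if ch.isalpha()]
        if letters:
            lengths.append(len(letters))
    return lengths

# Hypothetical stand-in for a page of text (not taken from either book)
sample_text = "The quick brown fox jumps over the lazy dog, twice."
start = random.randrange(len(sample_text.split()))  # random starting word
print(word_lengths(sample_text, start, count=5))
```

Because the 100 words are consecutive, they all come from the same part of the book, which is worth bearing in mind when judging how representative the sample is.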

Middle

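| | Frequency |
| --- | --- |
| 1 | |
| 2 | |
| 3 | |
| 4 | |
| 5 | |
| 6 | |
| 7 | |
| 8 | |
| 9 | |
| 10 | |
| 11 | |
| 12 | |
| 13 | |
| 14 | |
| 15 | |
| 16 | |
| 17 | |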

The adult’s book (sentence length)

| Words Per Sentence | Tally | Frequency |
| --- | --- | --- |
| 1 | | |
| 2 | | |
| 3 | | |
| 4 | | |
| 5 | | |
| 6 | | |
| 7 | | |
| 8 | | |
| 9 | | |
| 10 | | |
| 11 | | |
| 12 | | |
| 13 | | |
| 14 | | |
| 15 | | |
| 16 | | |
| 17 | | |
| 18 | | |
| 19 | | |
| 20 | | |
| 21 | | |
| 22 | | |
| 23 | | |
| 24 | | |
| 25 | | |
| 26 | | |
| 27 | | |
| 28 | | |
| 29 | | |
| 30 | | |
| 31 | | |
| 32 | | |

There was also one sentence of 39 words. I show it separately below, as extending the table up to 39 would waste space:

| Words Per Sentence | Tally | Frequency |
| --- | --- | --- |
| 39 | | |

I have used an adequate amount of data. For each population I have sampled 200 word lengths and 40 sentence lengths. As I said earlier, a good sample size to use is n ≥ 30. My word length samples far exceed this, and my sentence length samples also meet it.

I am now going to start analysing my data. As I briefly stated before, I am going to find confidence intervals for the population means of the two books. I will first try 95% for each. I will then be able to see whether I can be 95% confident that the population mean of the adult’s book is larger than the population mean of the child’s book (or vice versa). If the intervals of the two populations overlap, I will not be able to say that I am 95% confident that the population mean of one is higher than the population mean of the other.
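A minimal sketch of how such an interval could be calculated, assuming the usual large-sample formula: sample mean plus or minus z times the sample standard deviation divided by the square root of n, with z = 1.96 for 95%. The function name `confidence_interval` and the data are hypothetical and are not results from this coursework. Using the sample standard deviation (n - 1 divisor) is also an assumption.

```python
import math
import statistics

def confidence_interval(data, z=1.96):
    """Large-sample confidence interval for the population mean.
    z = 1.96 corresponds to 95% confidence."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)        # sample standard deviation (n - 1 divisor)
    margin = z * s / math.sqrt(n)     # half-width of the interval
    return (mean - margin, mean + margin)

# Made-up word lengths, for illustration only
lengths = [3, 4, 4, 5, 2, 6, 3, 4, 7, 5] * 4   # n = 40
low, high = confidence_interval(lengths)
print(f"95% interval: ({low:.2f}, {high:.2f})")
```

Passing a smaller `z` gives a narrower interval at a lower confidence level; for example, `z=1.0` corresponds to roughly 68%.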

For example, if I were 95% confident that the population mean of the child’s book was in the first interval below, and 95% confident that the population mean of the adult’s book was in the second, I could not be 95% confident that the adult’s population mean is higher than the child’s:

Child’s book (3.4, 4.5)    Adult’s book (4.1, 5.5)

Although the population mean for the adult’s book looks higher, the intervals overlap between 4.1 and 4.5. Each population mean could be anywhere in its interval. For example, the population mean for the child’s book could be as high as 4.5 while the population mean for the adult’s book could be as low as 4.1.
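The overlap check in this example can be written as a short Python sketch. The function name `intervals_overlap` is hypothetical; the two intervals are the illustrative ones given above, not real results.

```python
def intervals_overlap(a, b):
    """Return True if the closed intervals a = (low, high) and
    b = (low, high) share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

child = (3.4, 4.5)   # example interval from the text
adult = (4.1, 5.5)   # example interval from the text
print(intervals_overlap(child, adult))   # True: both contain 4.1 to 4.5
```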

Conclusion

In my aim I set out to show that the adult’s book was more difficult than the child’s book. I have not done this because all of my intervals overlap. I could have kept on lowering the confidence level until there was no overlap, but I didn’t see the point. If the intervals had not overlapped at 95%, there would have been a clear difference between the two population means. If the intervals still overlap at 68%, then the results are very similar. I stopped at this point because I concluded that I could not show that the adult’s book was more difficult.

My sampling method is a possible reason why I did not get the results that I expected. Firstly, I could have used more pages to sample. I could have selected a random page, then a random line, and finally a random word on that line, and repeated this 100 times for each book. For the sentence length I could have selected a page at random, followed by a sentence from that page, and done this 40 times for each book. This would have reduced the chance of the sample being dominated by an unusually easy or hard page in the book.
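The improved word sampling could be sketched as follows. This is a rough illustration only: the function name `random_positions` and the book dimensions are hypothetical, and it assumes every page has the same number of lines and every line the same number of words, which a real book would not.

```python
import random

def random_positions(n_pages, lines_per_page, words_per_line, draws=100, seed=None):
    """Pick `draws` independent (page, line, word) positions,
    each chosen uniformly at random (1-based numbering)."""
    rng = random.Random(seed)
    return [
        (rng.randint(1, n_pages),
         rng.randint(1, lines_per_page),
         rng.randint(1, words_per_line))
        for _ in range(draws)
    ]

# Hypothetical book dimensions, for illustration only
positions = random_positions(n_pages=180, lines_per_page=30, words_per_line=10)
print(positions[:3])
```

Each position would then be looked up in the book and the length of the word at that position recorded, so that the 100 words are spread across the whole book rather than taken from one passage.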

If I were to do this coursework again, I could change the populations. I could choose a book that has both an adult’s version and a child’s version. I feel this would be fairer because it is possible that I chose an advanced child’s book or an easy adult’s book. Instead of collecting two sets of data for each population, I could have chosen two adult’s books and two child’s books. I could then have taken 100 words from each book and combined the two child’s samples and the two adult’s samples. Another possibility would be to choose two non-fiction books. It would be interesting to see whether this made any difference to the final results.

