# Collect data with a view to estimating population parameters using estimation techniques.

Introduction

Statistics Coursework

Task: You are required to collect data with a view to estimating population parameters using estimation techniques. This should involve taking a random sample as well as calculating and comparing confidence intervals.

I have decided to estimate the population parameters for sentence length in two different genres of book. I have chosen a horror book and a drama book to see how sentence length varies between them. In theory, I would expect the horror book to have much shorter sentences, to add suspense, whilst I would expect the drama to have longer, more descriptive sentences.

Method:

As it would be too time-consuming to record the sentence length for the whole population (the whole book), I am going to use sampling. To try to avoid bias, I will use the random number function on a calculator to pick a page in the book, and then I will record the length of the first full sentence on that page. I will take a sample of 100 sentences from each book, as this should be enough to give reasonably accurate estimates of the population parameters without taking too much time. If by chance the random number function produces a page number that has already been used, I will simply take the length of the second sentence on that page.
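The sampling procedure described above can be sketched in Python. This is only an illustration under stated assumptions: the function name `choose_sentences` and the parameter `num_pages` are hypothetical (the length of the book is not given), Python's random module stands in for the calculator's random number function, and the rule for repeated pages is extended so that a third draw of the same page takes the third sentence, which goes slightly beyond what the method states.

```python
import random
from collections import Counter


def choose_sentences(num_pages, sample_size, seed=None):
    """Pick (page, sentence_number) pairs by choosing pages at random.

    num_pages is a hypothetical parameter: the book's length is not stated.
    """
    rng = random.Random(seed)      # seeded generator so a run can be repeated
    times_drawn = Counter()        # how often each page has come up so far
    picks = []
    for _ in range(sample_size):
        page = rng.randint(1, num_pages)   # every page equally likely, ends inclusive
        times_drawn[page] += 1
        # First draw of a page -> sentence 1, a repeat -> sentence 2, and so on
        # (going past sentence 2 is an assumption, not part of the stated method).
        picks.append((page, times_drawn[page]))
    return picks


# Example run with an assumed 300-page book and a sample of 100 sentences.
picks = choose_sentences(num_pages=300, sample_size=100, seed=1)
print(picks[:5])
```

Each pair says which page to open and which full sentence on that page to measure, so no sentence is recorded twice.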

Data For Drama Book

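| Page | Sentence Length | Page | Sentence Length | Page | Sentence Length | Page | Sentence Length | Page | Sentence Length |
|---|---|---|---|---|---|---|---|---|---|
| 51 | 19 | 148 | 20 | 234 | 29 | 114 | 18 | 195 | 6 |
| 313 | 4 | 239 | 19 | 115 | 11 | 10 | 2 | 203 | 9 |
| 191 | 8 | 118 | 21 | 109 | 10 | 317 | 4 | 217 | 9 |
| 298 | 9 | 241 | 9 | 10 | 6 | 232 | 10 | 57 | 11 |
| 114 | 32 | 80 | 11 | 196 | 14 | 49 | 11 | 67 | 9 |
| 282 | 15 | 280 | 31 | 226 | 18 | 71 | 24 | 315 | 16 |
| 308 | 5 | 203 | 9 | 226 | 14 | 147 | 38 | 224 | 10 |
| 236 | 19 | 185 | 18 | 257 | 5 | 317 | 11 | 1 | 29 |
| 169 | 15 | 66 | 9 | 267 | 17 | 106 | 20 | 232 | 28 |
| 160 | 37 | 300 | 25 | 322 | 8 | 49 | 21 | 26 | 29 |
| 276 | 41 | 214 | 15 | 233 | 7 | 131 | 9 | 76 | 8 |
| 71 | 8 | 317 | 9 | 177 | 5 | 155 | 13 | 266 | 6 |
| 95 | 5 | 308 | 3 | 93 | 6 | 55 | 8 | 96 | 4 |
| 311 | 6 | 65 | 9 | 128 | 21 | 288 | 18 | 203 | 4 |
| 210 | 19 | 166 | 20 | 175 | 14 | 280 | 13 | 249 | 8 |
| 245 | 19 | 182 | 4 | 312 | 19 | 52 | 23 | 73 | 13 |
| 221 | 6 | 204 | 12 | 73 | 13 | 189 | 9 | 129 | 25 |
| 50 | 25 | 230 | 6 | 273 | 22 | 218 | 12 | 31 | 39 |
| 149 | 28 | 96 | 7 | 48 | 14 | 80 | 18 | 13 | 11 |
| 167 | 4 | 34 | 23 | 43 | 10 | 94 | 7 | 49 | 16 |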

The first thing for me to do is to find the mean, standard deviation and variance of the sample I have taken. As it would be extremely time-consuming to find the exact mean and variance for 100 results, I have set up frequency tables which will allow me
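The frequency-table calculation for the drama book sample, together with a confidence interval for the mean, could be checked with a short Python sketch like the one below. Several things here are assumptions rather than part of the write-up: the 95% confidence level, the use of the normal value z = 1.96 (reasonable for a sample of 100), the divisor n - 1 for the unbiased variance estimate, and the function names `summarise` and `confidence_interval`. The data are the 100 sentence lengths recorded for the drama book, in sample order.

```python
from collections import Counter
from math import sqrt

# Sentence lengths from the drama book sample, in the order they were recorded.
lengths = [
    19, 20, 29, 18, 6, 4, 19, 11, 2, 9,
    8, 21, 10, 4, 9, 9, 9, 6, 10, 11,
    32, 11, 14, 11, 9, 15, 31, 18, 24, 16,
    5, 9, 14, 38, 10, 19, 18, 5, 11, 29,
    15, 9, 17, 20, 28, 37, 25, 8, 21, 29,
    41, 15, 7, 9, 8, 8, 9, 5, 13, 6,
    5, 3, 6, 8, 4, 6, 9, 21, 18, 4,
    19, 20, 14, 13, 8, 19, 4, 19, 23, 13,
    6, 12, 13, 9, 25, 25, 6, 22, 12, 39,
    28, 7, 14, 18, 11, 4, 23, 10, 7, 16,
]


def summarise(values):
    """Return (n, mean, unbiased variance, standard deviation) via a frequency table."""
    freq = Counter(values)                         # length -> how many sentences had it
    n = sum(freq.values())
    sum_x = sum(x * f for x, f in freq.items())    # sum of f * x
    sum_x2 = sum(x * x * f for x, f in freq.items())  # sum of f * x^2
    mean = sum_x / n
    variance = (sum_x2 - n * mean ** 2) / (n - 1)  # divisor n - 1 (assumed convention)
    return n, mean, variance, sqrt(variance)


def confidence_interval(values, z=1.96):
    """Approximate interval for the population mean; z = 1.96 assumes a 95% level."""
    n, mean, _, sd = summarise(values)
    half_width = z * sd / sqrt(n)
    return mean - half_width, mean + half_width


n, mean, variance, sd = summarise(lengths)
low, high = confidence_interval(lengths)
print(f"n = {n}, mean = {mean:.2f}, variance = {variance:.2f}, sd = {sd:.2f}")
print(f"Approximate 95% confidence interval for the mean: ({low:.2f}, {high:.2f})")
```

Running the same two functions on the horror book sample would give a second interval, and the two intervals could then be compared to see whether they overlap.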

Obviously this is highly impractical, but it shows how imprecise my estimate is because my sample was so small. Also, I only sampled one book from each genre, so I cannot say with confidence that all books from these genres will behave in the same way. Different authors with different writing styles may well produce different sentence lengths. For example, another horror writer may use longer sentences whilst another drama writer might use shorter ones.

So if I were to extend this investigation, I would firstly take a larger sample to improve accuracy, which would allow greater certainty in any conclusions drawn. Secondly, I would compare a number of different horror books against each other to see whether their population parameters were similar or whether they varied. Another progression could be to sample a number of horror books by the same author to see whether their population parameters are at all similar.
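To illustrate why a larger sample gives greater accuracy, the sketch below shows how the half-width of a confidence interval for the mean shrinks as the sample size grows, and how large a sample would be needed for a chosen half-width. It assumes the normal approximation with z = 1.96 (a 95% level). The function names and the standard deviation of 10 are hypothetical values chosen for illustration, not figures from this investigation.

```python
from math import ceil, sqrt


def half_width(sd, n, z=1.96):
    """Half-width of an approximate confidence interval for the mean."""
    return z * sd / sqrt(n)


def required_sample_size(sd, target_half_width, z=1.96):
    """Smallest n whose interval half-width is no more than the target."""
    return ceil((z * sd / target_half_width) ** 2)


sd = 10.0  # hypothetical standard deviation, for illustration only
for n in (100, 400, 1600):
    # Quadrupling the sample size halves the half-width.
    print(n, round(half_width(sd, n), 2))

print(required_sample_size(sd, target_half_width=0.5))
```

Because the half-width is proportional to 1/sqrt(n), halving the width of the interval requires four times as many sentences, which is why very narrow intervals quickly become impractical.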

