Collect data with a view to estimating population parameters using estimation techniques.

Statistics Coursework

Task: You are required to collect data with a view to estimating population parameters using estimation techniques. This should involve taking a random sample as well as calculating and comparing confidence intervals.

I have decided to estimate the population parameters for sentence length in two different genres of book. I have chosen a horror book and a drama book to see how sentence length varies between them. In theory, I would expect the horror book to have much shorter sentences, to add suspense, whilst I would expect the drama to have longer, more descriptive sentences.

Method:

As it would be too time-consuming to record the sentence length for the whole population (every sentence in the book), I am going to use sampling. To try to avoid bias I will use the random number function on a calculator to choose a page in the book, and then I will record the length of the first full sentence on that page. I will take a sample of 100 sentences from each book, as this is large enough to give reasonably accurate estimates of the population parameters without taking too much time. If by chance the random number function produces a page number that has already been used, I will simply take the length of the second sentence on that page.
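The sampling procedure above can be sketched in Python as a rough illustration. The function name `pick_sentences`, the page count of 300 and the seed are all assumptions made for the example and are not part of the coursework; in practice the page numbers come from the calculator.

```python
import random
from collections import Counter


def pick_sentences(num_pages, sample_size, seed=None):
    """Return a list of (page, sentence_number) pairs to record.

    sentence_number is 1 for the first full sentence on a page,
    2 if the same page is drawn a second time, and so on.
    """
    rng = random.Random(seed)
    times_drawn = Counter()
    plan = []
    for _ in range(sample_size):
        page = rng.randint(1, num_pages)  # both ends inclusive
        times_drawn[page] += 1
        plan.append((page, times_drawn[page]))
    return plan


# Hypothetical book of 300 pages, 100 sentences to record
plan = pick_sentences(num_pages=300, sample_size=100, seed=1)
```

Each pair in `plan` says which page to open and which full sentence on that page to measure, so a repeated page number never leads to the same sentence being recorded twice.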

The Central Limit Theorem

Because I don’t know anything about how the population is distributed, I have to use the Central Limit Theorem. Even if you don’t know how the parent population is distributed, the Central Limit Theorem allows you to make predictions about the distribution of the sample means. Also, with a large enough sample the sample mean is likely to be close to the population mean. The Central Limit Theorem says that:

If the sample size is large enough, the sample means will be approximately normally distributed.
The mean of the sample means is equal to the population mean.
The variance of the sample mean is the population variance divided by the sample size.
The larger the sample size, the closer the sample mean and variance are likely to be to the population mean and variance.

If X ~ (unknown)(μ, σ²), then X̄ ~ N(μ, σ²/n) approximately, for large n.
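A small simulation can illustrate the result above. This is only a sketch: the population here is an assumed exponential distribution with mean 15 (so variance 225), chosen because it is clearly not normal, and it is not data from either book.

```python
import random
import statistics

rng = random.Random(0)

# Assumed skewed population: exponential with mean 15, variance 15**2 = 225
population_mean = 15.0
population_variance = 225.0

n = 100             # size of each sample
num_samples = 2000  # number of samples drawn

# Mean of each sample of size n
sample_means = [
    statistics.fmean([rng.expovariate(1 / population_mean) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
variance_of_means = statistics.variance(sample_means)

print("mean of sample means:", mean_of_means)
print("variance of sample means:", variance_of_means)
print("population variance / n:", population_variance / n)
```

The mean of the sample means should come out close to 15 and their variance close to 225/100 = 2.25, even though the individual values are far from normally distributed.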

Once I have collected the data I will calculate the mean, variance and standard deviation of each sample. From these I can estimate the variance and standard deviation of the population. Next I will calculate the standard error of the mean, which will allow me to calculate confidence intervals for the population mean. When calculating confidence intervals I will use tables for the normal distribution.
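The calculation described above can be sketched as follows. The function name `confidence_interval` and the ten sentence lengths are made up purely to show the steps; they are not results from either book, and the real samples will have 100 values each. The z value is taken from Python's `statistics.NormalDist` rather than from printed tables.

```python
import math
import statistics
from statistics import NormalDist


def confidence_interval(lengths, confidence=0.95):
    """Return (lower, upper) limits for the population mean."""
    n = len(lengths)
    sample_mean = statistics.fmean(lengths)
    # statistics.variance divides by n - 1, which gives the
    # unbiased estimate of the population variance
    estimated_variance = statistics.variance(lengths)
    standard_error = math.sqrt(estimated_variance / n)
    # Two-sided z value, e.g. about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return sample_mean - z * standard_error, sample_mean + z * standard_error


# Made-up sentence lengths, for illustration only
example_lengths = [12, 7, 23, 15, 9, 31, 18, 11, 14, 20]
low, high = confidence_interval(example_lengths)
print("95% confidence interval:", low, "to", high)
```

Running the same function on each book's sample, and again with `confidence=0.9` or `confidence=0.99`, would give the intervals to compare.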