Collect data from a population with a view to estimating population parameters.

S1 Task D: Authorship

You are required to collect data from a population with a view to estimating population parameters (e.g. µ and σ²) by using the estimation techniques in this module. This should involve taking a random sample as well as calculating and comparing confidence intervals.

Investigate whether it is possible to gain information about authorship of a text using statistical measures:

e.g.

  • Modern text v Historical text
  • Books by the same authors
  • Adult text v Child’s text, etc.

AIM

My aim is to investigate how sentence lengths differ between an adult text and a child’s text. The adult text I will be using is ‘Jurassic Park’ and the child’s text is ‘A treasury of stories from Hans Christian Anderson.’ It would usually be assumed that a child’s text has shorter sentences and shorter words; I will investigate whether this is true by taking random samples from the two texts and analysing my data.

The two populations I will be using are the two books named above. Both books were chosen at random from the bookshelf in my study room.
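
As a rough illustration of the sampling step, the Python sketch below draws a simple random sample of sentences from a text and records the number of words in each. The function names (`split_sentences`, `sample_sentence_lengths`), the punctuation-based rule for splitting sentences and the short example text are all assumptions made for illustration; they are not the procedure actually applied to the two books.

```python
import random
import re


def split_sentences(text):
    """Split text into sentences at '.', '!' or '?' (a crude, assumed rule)."""
    parts = re.split(r"[.!?]+", text)
    return [p.strip() for p in parts if p.strip()]


def sample_sentence_lengths(text, n, seed=None):
    """Return word counts for a simple random sample of n sentences."""
    sentences = split_sentences(text)
    rng = random.Random(seed)
    chosen = rng.sample(sentences, n)  # sampling without replacement
    return [len(s.split()) for s in chosen]


# Hypothetical stand-in text; the real populations are the two books.
example_text = (
    "The gate opened slowly. Nobody spoke! What was behind it? "
    "A long shadow moved across the wet grass and vanished."
)
lengths = sample_sentence_lengths(example_text, n=3, seed=1)
print(lengths)
```

In a real run the text would be a whole book and n would be 50. A rule this simple will miscount around abbreviations and quoted dialogue, so it would need checking by hand.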

HYPOTHESIS

        I hypothesise that sentences will be longer in the child’s text than in the adult text, because there seems to be more dialogue in the adult text, which suggests shorter sentences.

THEORIES USED

        The most important theory, and the one from which predictions about the distribution of the sample mean can be made, is the Central Limit Theorem. It states that if the sample size is large enough, the distribution of the sample mean is approximately normal, irrespective of the distribution of the parent population. The approximation improves as the sample size increases. I can use the Central Limit Theorem because n = 50, which is a sufficiently large value of n. The investigation also relies on the ideas of the ‘Unbiased Estimator’ and ‘Confidence Intervals.’ An unbiased estimator is one for which the mean of its distribution (i.e. the mean of all possible values of the estimator) is equal to the population value it is estimating.
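
The Central Limit Theorem and the idea of an unbiased estimator can be checked informally by simulation. The sketch below is only an illustration under assumed conditions: the parent population is an invented, strongly skewed set of 10,000 values (not sentence lengths from either book), and 2,000 samples of size n = 50 are drawn from it. If the theory holds, the mean of the sample means should be close to the population mean, and their spread should be close to σ/√n.

```python
import random
import statistics

rng = random.Random(0)

# Assumed parent population: 10,000 values from a skewed (exponential)
# distribution, standing in for something clearly non-normal.
population = [rng.expovariate(1 / 15) for _ in range(10_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

n = 50  # sample size used in this investigation
sample_means = [
    statistics.fmean(rng.sample(population, n)) for _ in range(2_000)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

print(f"population mean      = {mu:.2f}")
print(f"mean of sample means = {mean_of_means:.2f}")
print(f"sigma / sqrt(n)      = {sigma / n ** 0.5:.2f}")
print(f"sd of sample means   = {sd_of_means:.2f}")
```

The first two printed values should nearly agree, as should the last two.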

The sample mean is an unbiased estimator of the population mean. Confidence intervals express the degree of confidence in an estimate more precisely than simply stating the standard error of the mean and the size of the sample. This is done by using an interval estimate.

I will be calculating confidence intervals of 95% and 90% for my data.
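
A minimal sketch of how the two intervals could be calculated is given below, assuming the normal-based form x̄ ± z·s/√n justified by the Central Limit Theorem. The helper `confidence_interval` and the list of sentence lengths are hypothetical and are not my collected data. The code also assumes the n − 1 divisor for the sample standard deviation (Python's `statistics.stdev`); with n = 50 the choice of divisor changes the result only slightly.

```python
import math
import statistics


def confidence_interval(sample, level):
    """Normal-based interval: mean +/- z * s / sqrt(n)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    # Two-sided critical value, e.g. about 1.96 for a 95% interval.
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * s / math.sqrt(n)
    return mean - half_width, mean + half_width


# Hypothetical sentence lengths (words per sentence), not my real data.
lengths = [12, 7, 21, 9, 15, 30, 5, 18, 11, 14] * 5  # n = 50

for level in (0.90, 0.95):
    low, high = confidence_interval(lengths, level)
    print(f"{level:.0%} interval: ({low:.2f}, {high:.2f})")
```

The 95% interval is always wider than the 90% interval for the same sample, because a higher level of confidence requires a larger value of z.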

With a sample, the formula for the standard deviation is the same as that for the parent population, except that σ is replaced with S and µ is replaced with x̄:

...
