Statistics Coursework - Comparing Newspapers

The Plan

Introduction

I have been told to produce a statistical investigation on the subject of newspapers. The investigation needs to draw a comparison in some way between at least two newspapers. I will have to form my own hypotheses and then collect primary data.

Original Hypothesis

I am going to investigate and evaluate the following hypothesis:

"Based on the sports sections of three different types of newspaper: The Sun (tabloid), The Daily Mirror, and The Times (broadsheet), The Sun will be the easiest newspaper to read or the 'most readable' on average."

Data Collection

As I have mentioned, the data used for this coursework has to be primary data that I collect myself. To test my hypothesis, I am going to collect, for each newspaper, data on the readability of sentences from all the articles on four different sports: Football, Cricket, Rugby and Horse Racing. I have chosen these four sports because, having looked through each paper, these are the four sports that have articles in all of the papers I am testing.

I will produce a table of my results (using a sample of 100 sentences per newspaper, if possible, because this is large enough to represent the data but small enough to be manageable). I will then develop the investigation further from there.

For my method of testing readability, I am going to use the "Readability Statistics" function in Microsoft Word. In "Spelling and Grammar" on the "Tools" menu, there is an option which will show you the readability statistics of a document. Enabling this option shows you different statistics including: the number of words, characters, paragraphs and sentences, and the average number of words per sentence, sentences per paragraph and characters per word. I am going to use the "Flesch Reading Ease" score as the statistic for testing the readability of the sentences.

Here is a little information about how the Flesch Reading Ease statistic works.

This computes readability based on the average number of syllables per word and the average number of words per sentence. Scores range from 0 (zero) to 100. Standard writing averages approximately 60 to 70. The higher the score, the greater the number of people who can readily understand the document (i.e. the more 'readable' the text is).

Short, choppy text with little variation in length will score as "easy to read" with this measure, but it is not a good style. Check the average sentences per paragraph and average words per sentence to detect this.

Info from http://www.writepage.com/writing/gramchek.htm
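To show how a score like this can be worked out, here is a rough sketch in Python using the standard published Flesch Reading Ease formula: 206.835 - 1.015 x (average words per sentence) - 84.6 x (average syllables per word). The function names and the way syllables are counted (groups of vowels) are my own assumptions, not how Microsoft Word does it, so the scores it gives may not match Word's exactly.

```python
import re

def count_syllables(word):
    """Rough syllable estimate (an assumption): count groups of vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A silent 'e' at the end (as in "score") usually adds no syllable,
    # but words ending in "le" (as in "table") keep theirs.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_reading_ease(text):
    """Flesch Reading Ease for a piece of text with at least one word."""
    # Split into sentences on full stops, question marks and exclamation marks.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

# Example: score one short sentence.
print(round(flesch_reading_ease("The cat sat on the mat."), 1))
```

With this formula, very short and simple sentences can come out above 100, and very dense ones below 0, so the 0 to 100 range describes ordinary text rather than a strict limit.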

I have chosen this method of testing readability for two reasons. Firstly, using Microsoft Word is quicker than counting the words per sentence and syllables per word by hand. Secondly, I think that the Flesch Reading Ease is a more standard and more commonly used way of testing readability. However, I realize that there are other methods of testing readability, and that the Flesch Reading Ease method can be a bit general in some cases because it does not take individual reading levels and standards into account.

Pre-test and Practical Problems

I am going to do a pre-test because this will enable me to know what type of sampling to use, how much data I need to use and whether my hypothesis is worth pursuing. If my hypothesis will not enable me to produce a detailed enough investigation, e.g. because there is not enough range in the data or the comparison is so simple that it doesn't need an investigation, then I will go back, amend it and repeat the pre-test process. Otherwise, I will be able to investigate and draw conclusions about my hypothesis. I will be able to see whether the data is sufficient and whether there is enough range in the sampled data set when I draw histograms of my data sets. If the data is good enough to continue the investigation and draw comparisons then I will do so. I will record any practical problems I come across as I go along, and discuss them at the end.

Data Analysis

For each of the three newspapers, I am going to use stratified sampling to collect data from my sports articles because I want each of the four sports I am focusing on to be fairly represented in the data, and stratified sampling will make sure this happens. I will measure the total area of all of the football articles, cricket articles, rugby articles and horse-racing articles, add the four totals together and then work out the area each sport covers as a percentage of the total area of all four sports. The percentage I obtain for each sport will tell me the number of sentences I need to take from that sport to contribute to the final sample of 100. I may need to round the percentages up or down accordingly. However, the sentences needed from the four sports should add up to 100 for each newspaper, even though the number for each sport will differ from newspaper to newspaper.
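The calculation described above can be sketched in Python as follows. The areas in the example are made-up figures in square centimetres, not real measurements, and the function name is my own. The rounding rule is also an assumption: each share is rounded down first, and any sentences left over go to the sports with the largest fractional parts, so that the total always comes to exactly 100.

```python
def allocate_sample(areas, total=100):
    """Share `total` sentences between sports in proportion to article area."""
    total_area = sum(areas.values())
    # Exact (unrounded) share for each sport.
    exact = {sport: total * area / total_area for sport, area in areas.items()}
    # Round every share down to begin with.
    counts = {sport: int(share) for sport, share in exact.items()}
    leftover = total - sum(counts.values())
    # Give the remaining sentences to the largest fractional parts.
    by_remainder = sorted(exact, key=lambda s: exact[s] - counts[s], reverse=True)
    for sport in by_remainder[:leftover]:
        counts[sport] += 1
    return counts

# Hypothetical areas (cm^2) for one newspaper, for illustration only.
areas = {"Football": 1180, "Cricket": 295, "Rugby": 362, "Horse Racing": 163}
sample_sizes = allocate_sample(areas)
print(sample_sizes, sum(sample_sizes.values()))
```

Rounding each percentage to the nearest whole number on its own can give a total of 99 or 101, which is why the leftover sentences are handed out separately here.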

When I know the number of sentences required from each 'group' for all three newspapers, I am then going to use systematic sampling to select the individual sentences because it reduces bias, as every sentence in each group has an equal chance of being selected. I will use the random number function on my scientific calculator to generate a single random number between 1 and 10. The corresponding sentence in each group will then be the first sentence in the sample. I will then generate another random number between 1 and 10 with my calculator which will be ...