N.B. I have included the articles that I am comparing, and I have highlighted the sentences that I sampled.
---------------------------------------------------------------------------------------------------------------------------------
The Independent
News
"Doubts were raised yesterday about the authenticity of shocking photographs purporting to show British troops apparently mistreating prisoners in Iraq."
This sample has: 18 words (excluding British and Iraq)
120 characters
Approx. 6.67 characters per word
"Questions were also raised over the sharp image quality and the absence of sweat, dirt and injuries on a captive, supposedly arrested for suspected theft and subjected to an eight-hour beating."
This sample has: 32 words
159 characters
Approx. 4.97 characters per word
"British military police are also investigating another eight alleged cases of torture and abuse."
This sample has: 13 words
75 characters
Approx. 5.77 characters per word
"In the US, Janis Karpinski, the reserve brigadier-general relieved of her command at the Abu Ghraib prison, said Army intelligence, and the CIA, were directly responsible for interrogations in the cell-block where the abuses were committed."
This sample has: 31 words
165 characters
Approx. 5.32 characters per word
"He wrote in a letter to his family in January that military intelligence had 'encouraged us and told us, "Great job"'."
This sample has: 20 words
85 characters
Approx. 4.25 characters per word
Sports
"He had the luck to see Alan Smith and Mark Viduka squander free headers, but he saved bravely from Smith and exceptionally from Miller after half an hour."
This sample has: 22 words
95 characters
Approx. 4.32 characters per word
"Had Michael Duberry, whose reactions and positional sense were impeccable throughout an intense evening, not blocked what appeared to be a routine tap-in from the rebound, Wayne Rooney would have had his second goal of the match."
This sample has: 34 words
163 characters
Approx. 4.79 characters per word
"Their task was to hold the centre of the pitch while Jermaine Pennant and Milner drove forward down either flank."
This sample has: 17 words
60 characters
Approx. 3.53 characters per word
"Martyn still lives in Yorkshire and his return to Elland Road, where he had spent his last season occupying the bench, was predictably inspired from the moment he appeared to be a standing ovation."
This sample has: 30 words
135 characters
Approx. 4.50 characters per word
Adverts
"It was never goodbye, just see you later."
This extract has: 8 words
32 characters
Approx. 4.00 characters per word
"Too tempting."
This extract has: 2 words
11 characters
Approx. 5.50 characters per word
"Use your head."
This extract has: 3 words
11 characters
Approx. 3.67 characters per word
The Times
News
"Senior investigators from the Royal Military Police were flying to Cyprus yesterday to interview soldiers from the Queen's Lancashire Regiment, some of whose troops are said to have been involved in the alleged incident at Basra."
This sample has: 28 words
137 characters
Approx. 4.89 characters per word
"The photographs have come at a time when the MoD is drawing up several options for sending several thousand more troops to Iraq to take charge of a new area of the country south of Baghdad, including the volatile city of Najaf where thousands of militia loyal to an extremist Shia Muslim cleric are based."
This sample has: 49 words
220 characters
Approx. 4.49 characters per word
"Defence sources said one of the options was to send the headquarters element of 3 Commando Brigade Royal Marines to command the reinforcements."
This sample has: 18 words
92 characters
Approx. 5.11 characters per word
"She dismissed any suggestions that there may have been revenge attacks on Iraqis for the death of her husband."
This sample has: 18 words
85 characters
Approx. 4.72 characters per word
"Five of the cases had been completed, and five were still under investigation."
This sample has: 13 words
64 characters
Approx. 4.92 characters per word
Sports
"However, he could not prevent the James Milner from leveling five minutes into the second half with his third Premiership goal of the season."
This sample has: 18 words
80 characters
Approx. 4.44 characters per word
"United were below par and below strength, but Leicester were unable to take a point."
This sample has: 13 words
53 characters
Approx. 4.08 characters per word
"Leeds were going for a third successive victory which would have taken them level on points with the three clubs above them, but although they had almost all the play they could only beat Martyn once."
This sample has: 32 words
139 characters
Approx. 4.34 characters per word
"Leeds United and Leicester City failed to improve their chances of staying in the Premiership last night, neither of them managing to secure the three points that would have boosted their chances, not to mention their morale."
This sample has: 31 words
146 characters
Approx. 4.71 characters per word
Adverts
"Picking up email from your phone."
This sample has: 6 words
25 characters
Approx. 4.17 characters per word
"Machines that think for you."
This sample has: 5 words
24 characters
Approx. 4.80 characters per word
"What's the matter, Lagerboy, afraid you might taste something?"
This sample has: 8 words
42 characters
Approx. 5.25 characters per word
"Is this how your stairs feel to you?"
This sample has: 8 words
28 characters
Approx. 3.50 characters per word
The Daily Mirror
News
"Lt. Col. Mendonca has vowed no mercy will be shown to the culprits."
This sample has: 10 words
39 characters
Approx. 3.90 characters per word
"Lt. Col. Mendonca's tough message came a day after the Mirror published shock pictures of his soldiers humiliating an Iraqi."
This sample has: 15 words
76 characters
Approx. 5.07 characters per word
"Lt. Col. Mendonca has been asked to stay silent until the probe is over."
This sample has: 11 words
43 characters
Approx. 3.91 characters per word
"They insisted pictures they gave the Mirror showing a hooded prisoner being urinated on and battered with rifle butts were real."
This sample has: 20 words
101 characters
Approx. 5.05 characters per word
"And Sky TV defence analyst Francis Tusa insisted there was no evidence to suggest the pictures were fake."
This sample has: 13 words
55 characters
Approx. 4.23 characters per word
"The two squaddies admit they cannot answer questions regarding minor details in the photos, which were taken months ago."
This sample has: 19 words
100 characters
Approx. 5.26 characters per word
Sports
"Leeds must feel like they are running the wrong way down an escalator and despite their Herculean efforts, they are still two points adrift of safety."
This sample has: 24 words
109 characters
Approx. 4.54 characters per word
"The keeper lost the ball in the corner of his own area, but he somehow managed to scurry back to brilliantly touch away Milner's goal-bound curling shot."
This sample has: 27 words
116 characters
Approx. 4.30 characters per word
"However, Leeds could not find the winner they so badly needed and next up are Arsenal at Highbury on Friday."
This sample has: 16 words
61 characters
Approx. 3.81 characters per word
Adverts
"Extra, extra."
This extract has: 2 words
10 characters
Approx. 5.00 characters per word
"Join a meeting today!"
This extract has: 4 words
17 characters
Approx. 4.25 characters per word
"Don't be left with Peanuts!"
This sample has: 5 words
21 characters
Approx. 4.20 characters per word
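The word and character counts above were made by hand. As a cross-check, a short Python sketch can recount a sampled sentence. It assumes the counting rules that the figures appear to follow: hyphenated words such as "eight-hour" count as two words, punctuation and apostrophes are not counted as characters, and proper nouns are left out. The function name `word_stats` and its `exclude` parameter (the list of proper nouns to skip) are my own, for illustration only.

```python
import re


def word_stats(sentence, exclude=()):
    """Return (words, letters, letters per word) for one sampled sentence.

    Assumed counting rules: hyphens split words ("eight-hour" is two words),
    only letters are counted as characters, and any word in `exclude`
    (e.g. proper nouns) is skipped entirely.
    """
    # Runs of letters/apostrophes; hyphens, commas and spaces all split tokens.
    tokens = [t.strip("'") for t in re.findall(r"[A-Za-z']+", sentence)]
    words = [t for t in tokens if t and t not in exclude]
    # Count letters only, so the apostrophe in "What's" is ignored.
    letters = sum(ch.isalpha() for word in words for ch in word)
    return len(words), letters, round(letters / len(words), 2)


first = ("Doubts were raised yesterday about the authenticity of shocking "
         "photographs purporting to show British troops apparently "
         "mistreating prisoners in Iraq.")
print(word_stats(first, exclude={"British", "Iraq"}))  # (18, 120, 6.67)
```

Passing the proper nouns explicitly keeps the exclusion rule visible, so the same function can be run over every highlighted sentence to recheck the counts.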
-------------------------------------------------------------------------------------------------------------------------------
I am now going to construct cumulative frequency tables for each newspaper (for both word length and sentence length). I am going to draw the cumulative frequency graphs by hand, and make box-and-whisker diagrams on the computer to find the medians and the interquartile ranges. Because I sampled different numbers of sentences from each newspaper, I will plot cumulative percentage frequencies rather than raw cumulative frequencies, so that the newspapers can be compared directly.
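Because the samples are different sizes (12 Independent sentences but 13 from the Times), converting cumulative frequencies to percentages puts them on the same 0 to 100 scale. A minimal Python sketch of that conversion follows. The class boundaries of 10, 20, 30, 40 and 50 words are an assumption for illustration, not necessarily the intervals used in my tables; the lists are the sentence lengths from the samples above.

```python
def cumulative_percentages(values, upper_bounds):
    """Build a cumulative frequency table with percentages.

    Each row is (upper class bound, cumulative frequency, cumulative %),
    where the cumulative frequency counts the values <= that bound.
    """
    n = len(values)
    table = []
    for bound in upper_bounds:
        cf = sum(1 for v in values if v <= bound)
        table.append((bound, cf, round(100 * cf / n, 1)))
    return table


# Sentence lengths (words) of the sampled sentences, in the order listed above.
independent = [18, 32, 13, 31, 20, 22, 34, 17, 30, 8, 2, 3]
times = [28, 49, 18, 18, 13, 18, 13, 32, 31, 6, 5, 8, 8]
bounds = [10, 20, 30, 40, 50]  # hypothetical class boundaries

for name, data in (("Independent", independent), ("Times", times)):
    for bound, cf, pct in cumulative_percentages(data, bounds):
        print(f"{name:12} <= {bound:2} words: {cf:2} sentences ({pct}%)")
```

Each row gives the percentage of a newspaper's sampled sentences that are no longer than the class bound, which is what a cumulative percentage graph plots.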
I am going to use the mean and the median as my measures of central tendency, since together they should give the best picture of the average sentence and word length (the median is less affected than the mean by a few unusually long or short sentences). I am also going to use the standard deviation, since it is arguably the most useful measure of dispersion: it takes every value into account.
-------------------------------------------------------------------------------------------------------------------------------
The Independent
Sentence length
Median sentence length = 20 words per sentence
IQR = 32 - 10 = 22
Mean sentence length = Σ sentence lengths / total sentences
= 230 / 12
= 19.17 words per sentence
Standard deviation of sentence length = √(Σ(variable - mean value)² / number of variables)
= √(1375.6668 / 12)
= 10.71
Word length
Median word length = 4.65 characters per word
IQR = 5.50 - 4.15 = 1.35
Mean word length = Σ (mean characters per word of each sentence) / total sentences
= 57.29 / 12
= 4.77 characters per word
Standard deviation of word length = √(Σ(variable - mean value)² / number of variables)
= √(9.3721 / 12)
= 0.88
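The Independent's calculations above can be checked with Python's standard `statistics` module. This is a sketch: `pstdev` divides by the number of values, matching the formula used above, while the median and quartiles come from `statistics.quantiles` (default "exclusive" method) applied to the raw values. They are therefore not read off a graph and may differ slightly from the graph readings (a median of 20 and quartiles of 10 and 32). The helper name `summarise` is my own.

```python
import statistics

# The 12 Independent samples, in the order listed above.
sentence_lengths = [18, 32, 13, 31, 20, 22, 34, 17, 30, 8, 2, 3]
word_lengths = [6.67, 4.97, 5.77, 5.32, 4.25, 4.32, 4.79, 3.53,
                4.50, 4.00, 5.50, 3.67]


def summarise(values):
    """Mean, population standard deviation, median and IQR, to 2 d.p."""
    q1, median, q3 = statistics.quantiles(values, n=4)  # exclusive method
    return {
        "mean": round(statistics.fmean(values), 2),
        "sd": round(statistics.pstdev(values), 2),  # divides by n, not n - 1
        "median": round(median, 2),
        "iqr": round(q3 - q1, 2),
    }


print(summarise(sentence_lengths))
# {'mean': 19.17, 'sd': 10.71, 'median': 19.0, 'iqr': 21.5}
print(summarise(word_lengths)["mean"], summarise(word_lengths)["sd"])  # 4.77 0.88
```

Building the same lists for the Times and the Daily Mirror from their samples and passing them to `summarise` reproduces their means and standard deviations as well.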
The Times
Sentence length
Median sentence length = 14 words per sentence
IQR = 27 - 8 = 19
Mean sentence length = Σ sentence lengths / total sentences
= 247 / 13
= 19 words per sentence
Standard deviation of sentence length = √(Σ(variable - mean value)² / number of variables)
= √(1976 / 13)
= 12.33
Word length
Median word length = 4.55 characters per word
IQR = 4.90 - 4.20 = 0.70
Mean word length = Σ (mean characters per word of each sentence) / total sentences
= 59.42 / 13
= 4.57 characters per word
Standard deviation of word length = √(Σ(variable - mean value)² / number of variables)
= √(2.7076 / 13)
= 0.46
The Daily Mirror
Sentence length
Median sentence length = 15 words per sentence
IQR = 20 - 10 = 10
Mean sentence length = Σ sentence lengths / total sentences
= 166 / 12
= 13.83 words per sentence
Standard deviation of sentence length = √(Σ(variable - mean value)² / number of variables)
= √(685.6668 / 12)
= 7.56
Word length
Median word length = 4.40 characters per word
IQR = 5.15 - 4.00 = 1.15
Mean word length = Σ (mean characters per word of each sentence) / total sentences
= 53.52 / 12
= 4.46 characters per word
Standard deviation of word length = √(Σ(variable - mean value)² / number of variables)
= √(2.887 / 12)
= 0.49
---------------------------------------------------------------------------------------------------------------------------------
Box plots
Sentence lengths
These results show that:
* The Independent has the greatest spread of sentence lengths (the largest interquartile range)
* The Daily Mirror has the least spread (the smallest interquartile range)
* The Independent has the highest median
* The Times has the lowest median
* The Times has the greatest range
* The Daily Mirror has the smallest range
Word lengths
These results show that:
* The Independent has the largest range and the largest interquartile range
* The Times has the smallest interquartile range
* The Daily Mirror has the lowest median
* The Independent has the highest median
============================================================================
As well as comparing readability across different newspapers, I think it would be interesting to compare readability across different categories. I hypothesize that readability varies across categories, and my prediction is that it will be easiest in the tabloid newspaper and hardest in the quality newspaper. I have data from just three categories (News, Sport and Adverts) because these are the only categories common to all three newspapers, so I will compare only these.
---------------------------------------------------------------------------------------------------------------------------------
News
Sentence length
Median sentence length = 18 words per sentence
IQR = 26 - 14 = 12
Mean sentence length = Σ sentence lengths / total sentences
= 328 / 16
= 20.50 words per sentence
Standard deviation of sentence length = √(Σ(variable - mean value)² / number of variables)
= √(1529.75 / 16)
= 9.78
Word length
Median word length = 4.85 characters per word
IQR = 5.30 - 4.30 = 1.00
Mean word length = Σ (mean characters per word of each sentence) / total sentences
= 78.53 / 16
= 4.91 characters per word
Standard deviation of word length = √(Σ(variable - mean value)² / number of variables)
= √(7.5781 / 16)
= 0.69
Sports
Sentence length
Median sentence length = 22 words per sentence
IQR = 31 - 15 = 16
Mean sentence length = Σ sentence lengths / total sentences
= (22 + 34 + 17 + 30 + 18 + 13 + 32 + 31 + 24 + 27 + 16) / 11
= 24 words per sentence
Standard deviation of sentence length = √(Σ(variable - mean value)² / number of variables)
= √(532 / 11)
= 6.95
Word length
Median word length = 4.35 characters per word
IQR = 4.65 - 4.10 = 0.55
Mean word length = Σ (mean characters per word of each sentence) / total sentences
= (4.32 + 4.79 + 3.53 + 4.50 + 4.44 + 4.08 + 4.34 + 4.71 + 4.54 + 4.30 + 3.81) / 11
= 4.31 characters per word
Standard deviation of word length = √(Σ(variable - mean value)² / number of variables)
= √(1.4087 / 11)
= 0.36
Adverts
Sentence length
Median sentence length = 5 words per sentence
IQR = 7 - 4 = 3
Mean sentence length = Σ sentence lengths / total sentences
= (6 + 5 + 8 + 8 + 8 + 2 + 3 + 2 + 4 + 5) / 10
= 5.1 words per sentence
Standard deviation of sentence length = √(Σ(variable - mean value)² / number of variables)
= √(50.9 / 10)
= 2.26
Word length
Median word length = 4.25 characters per word
IQR = 4.70 - 3.90 = 0.80
Mean word length = Σ (mean characters per word of each sentence) / total sentences
= (4.00 + 5.50 + 3.67 + 4.17 + 4.80 + 5.25 + 3.50 + 5.00 + 4.25 + 4.20) / 10
= 4.43 characters per word
Standard deviation of word length = √(Σ(variable - mean value)² / number of variables)
= √(4.0261 / 10)
= 0.63
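The category figures reuse the same sampled sentences, regrouped by category instead of by newspaper. Here is a sketch of that regrouping in Python, with the sentence lengths copied from the samples above; the dictionary layout and the helper name `pool_by_category` are my own choices, for illustration only.

```python
import statistics

# Sentence lengths (words) of each sample, grouped by newspaper then category.
samples = {
    "Independent": {"News": [18, 32, 13, 31, 20], "Sport": [22, 34, 17, 30],
                    "Adverts": [8, 2, 3]},
    "Times": {"News": [28, 49, 18, 18, 13], "Sport": [18, 13, 32, 31],
              "Adverts": [6, 5, 8, 8]},
    "Daily Mirror": {"News": [10, 15, 11, 20, 13, 19], "Sport": [24, 27, 16],
                     "Adverts": [2, 4, 5]},
}


def pool_by_category(by_paper):
    """Merge every newspaper's lists into one new list per category."""
    pooled = {}
    for categories in by_paper.values():
        for category, lengths in categories.items():
            pooled.setdefault(category, []).extend(lengths)
    return pooled


pooled = pool_by_category(samples)
for category, lengths in pooled.items():
    mean = round(statistics.fmean(lengths), 2)
    sd = round(statistics.pstdev(lengths), 2)  # population SD, as above
    print(f"{category:8} n={len(lengths):2} mean={mean} sd={sd}")
```

Run as written, this gives 24.0 and 6.95 for Sport and 5.1 and 2.26 for the Adverts, matching the hand calculations. For News it gives a mean of 20.5 but a standard deviation of about 9.79 rather than 9.78, because the squared deviations of the 16 News sentence lengths from 20.5 sum to 1532.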
---------------------------------------------------------------------------------------------------------------------------------
Box plots
Sentence lengths
These results show that:
∙ The Adverts have a lower range, a lower median and a lower interquartile range than the other categories; in fact, the longest advert sentence (8 words) is shorter than the shortest News sentence (10 words) or Sport sentence (13 words)
∙ News has the largest range of sentence lengths
∙ Sport has the highest median (22 words per sentence), so its sentences are generally the longest
Word lengths
These results show that:
∙ News has the largest range
∙ Sport has the smallest range
∙ News has the highest median
∙ The Adverts have the lowest median
∙ Sport has the smallest interquartile range
∙ News generally has the longest words, although its word lengths are also the most varied
∙ Sport's word lengths are generally short and the least varied, although some of its sentences have a longer mean word length than some sentences in the other categories
3) To summarize, interpret, discuss and compare results
To help me to compare the results that I got, and to write a report based on these findings, I have made some tables showing all the data that I collected.
I have found in my research that the readability levels of the newspapers and the categories (and even the proportion of space allocated to each category) vary quite a lot. I think that the most plausible explanation for this is that the people who read each newspaper also vary quite a lot in terms of education, interests, culture, etc.
The quality newspaper, the Times, had the largest proportion of space devoted to adverts (32%), probably because advertisers consider it to have a larger number of readers, so an advert placed there is more likely to be effective. The Daily Mirror had a slightly larger proportion of space for news (48%) than the Independent or the Times.
In the newspapers, the mean sentence lengths and their variation (standard deviation) are:
Newspaper                       Mean sentence length (words per sentence)   Standard deviation
The Independent (broadsheet)    19.17                                       10.71
The Times (quality)             19.00                                       12.33
The Daily Mirror (tabloid)      13.83                                       7.56
This shows that the Independent (broadsheet) has the highest mean sentence length (19.17 words per sentence), which implies harder readability, although it is only slightly higher than the Times (19). The Daily Mirror (tabloid), on the other hand, has the lowest mean sentence length, 13.83 words per sentence, which implies easier readability.
The Daily Mirror also has the lowest variation in sentence length: its standard deviation is 7.56, which means that many of its sentences are close to the mean. Since that mean is low, this is significant because it shows that most of its sentences are fairly short, i.e. easy to read. The Independent had the highest standard deviation of word length (0.88) and the highest IQR of word length (1.35). It is not surprising that these two measures agree, since both measure spread, although they need not always do so: the standard deviation is affected by every value, including extreme ones, whereas the IQR depends only on the middle half of the data.
The mean word lengths of the newspapers and their variation (standard deviation) are:
Newspaper                       Mean word length (characters per word)   Standard deviation
The Independent (broadsheet)    4.77                                     0.88
The Times (quality)             4.57                                     0.46
The Daily Mirror (tabloid)      4.46                                     0.49
This shows that the Independent (broadsheet) has the highest mean word length, which implies the hardest readability. It also has the largest standard deviation (0.88), which means that many of its sentences have a mean word length well above or well below the average. The Daily Mirror (tabloid) has the lowest mean word length (4.46), which suggests easier readability.
The Times has the lowest standard deviation of word length (0.46), which shows that most of its sentences have a mean word length close to its overall mean of 4.57 characters per word.
I concluded that the broadsheet, the Independent, had the hardest readability, with the highest mean and median values of both sentence length (mean = 19.17, median = 20) and word length (mean = 4.77, median = 4.65). This is probably because it is read more by people from a more educated background, who like a wide variety of words because it makes the articles more interesting. They might also choose this newspaper for features such as reviews and business news (the Daily Mirror had neither reviews nor business news).
I found that the Daily Mirror (the tabloid) generally had the easiest readability, since it had the lowest mean sentence length and the lowest mean and median word length. There are two possible explanations for this: 1) it contains a lot of show-business news, which is usually read by less educated people, so the language is less sophisticated; 2) it contains a lot of short slang words, which are considered more modern and "cooler".
The Times had the greatest sentence length variation (standard deviation = 12.33), and the Independent had the greatest word length variation (standard deviation = 0.88).
These findings support my hypothesis that readability levels differ between types of newspaper.
---------------------------------------------------------------------------------------------------------------------------------
I am also going to compare the categories.
News had the greatest proportion of space in the newspapers (average = approx. 44.3% of the newspaper), compared to Sport (approx. 28%) and Adverts (approx. 26.3%).
The mean sentence lengths of the categories and their variation (standard deviation) were:
Category    Mean sentence length (words per sentence)   Standard deviation
News        20.5                                        9.78
Sport       24.0                                        6.95
Adverts     5.1                                         2.26
This shows that Sport has the highest mean sentence length (24.0), which suggests a harder level of readability. The Adverts have the lowest mean sentence length (5.1), which suggests an easier level of readability.
News has the largest standard deviation (9.78), which shows that it contains some unusually long or short sentences. The Adverts have the smallest standard deviation (2.26), which shows that their sentences are all close to the mean of 5.1 words. Since this mean is low, most advert sentences must be quite short.
The mean word lengths of the categories and their variation (standard deviation) were:
Category    Mean word length (characters per word)   Standard deviation
News        4.91                                     0.69
Sport       4.31                                     0.36
Adverts     4.43                                     0.63
From this, I can see that News has the highest mean word length (4.91), suggesting a harder level of readability. However, News also has the largest standard deviation (0.69), implying that not all of its sentences have words this long: a few sentences with much longer words (such as the Independent's 6.67 characters per word) pull the mean up. Surprisingly, Sport has a shorter mean word length than even the Adverts. It also has the smallest standard deviation (0.36), so most of its sentences have a mean word length close to 4.31 characters.
So, I conclude that News has the hardest level of readability, even though Sport had a greater mean sentence length, because Sport also had a relatively low mean word length. I think that the Adverts have the easiest level of readability because they had the lowest mean sentence length and a fairly low mean word length. Sport is neither the easiest nor the hardest: its sentence length suggests the hardest readability, but its word length suggests the easiest.
This also supports my hypothesis that readability differs between categories.
---------------------------------------------------------------------------------------------------------------------------------
Some of my results were quite surprising, such as the variation in sentence length and word length in the Times. It had the highest standard deviation of sentence length (12.33) but the lowest standard deviation of word length (0.46). This seemed strange to me because I had expected the two variations to be in similar proportions, since they come from the same newspaper. By comparing these statistics with the others, I think this was because the Times had such a large proportion of adverts (32%, compared with 23% in the Independent and 24% in the Daily Mirror). Because the advert sentences are so short and the other Times sentences are relatively long, the Times has a very wide spread of sentence lengths. I think the variation of word length was not affected in the same way, because adverts use "normal" words, so the mean word length of the Adverts (4.43) was not very different from that of News (4.91) or Sport (4.31).
In the same way, the mean lengths for Sport also surprised me: Sport had the lowest mean word length but the highest mean sentence length. I had expected that a high mean sentence length would almost guarantee a high mean word length, and I cannot think of a likely explanation for this.
I also noticed that the Daily Mirror had a much lower standard deviation of sentence length (7.56) than the Independent (10.71) and the Times (12.33). This is not simply because the Daily Mirror has few adverts (it has quite a few); I suggest that it is because the Daily Mirror's other sentences are not much longer than its adverts, so its sentence lengths have a small range around a low mean.
On the other hand, most of my results were as expected, such as the advert sentence lengths (mean = 5.1). I had expected this because adverts have the same function as headings: to attract attention to themselves. So they need short, bold sentences to catch the reader's eye.
---------------------------------------------------------------------------------------------------------------------------------
From my cumulative frequency tables, I could estimate the probability that a word or sentence of a certain length would appear in a particular newspaper or category.
I can see that 100% of the sentences in my Times sample had a mean word length under 5.5 characters. Therefore I can predict that a sentence with a mean of more than 5.5 characters per word is unlikely to appear in the Times, and that a sentence with a mean of more than 6.5 characters per word is extremely unlikely.
None (0%) of the sentences in either News or Sport were shorter than 10 words. Therefore, a sentence shorter than 10 words is unlikely to appear in the News or Sport sections of a newspaper. However, 100% of the Advert sentences were shorter than 10 words, so it is very likely that any advert sentence in these three newspapers will be shorter than 10 words.
50% of the Independent sentences were shorter than 19 words and 50% were longer, so a sentence in the Independent is equally likely to be shorter or longer than 19 words.
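The percentages above are relative frequencies in my samples, used as estimates of probability; with only 10 to 27 sentences in each group they are rough estimates. A short Python sketch computes them directly from the sample lists (the helper name `proportion` is my own, for illustration only).

```python
def proportion(values, condition):
    """Relative frequency: share of sampled values satisfying condition."""
    return sum(1 for v in values if condition(v)) / len(values)


# Values copied from the samples above.
times_word_lengths = [4.89, 4.49, 5.11, 4.72, 4.92, 4.44, 4.08, 4.34, 4.71,
                      4.17, 4.80, 5.25, 3.50]
news_and_sport = [18, 32, 13, 31, 20, 28, 49, 18, 18, 13, 10, 15, 11, 20, 13,
                  19, 22, 34, 17, 30, 18, 13, 32, 31, 24, 27, 16]
adverts = [8, 2, 3, 6, 5, 8, 8, 2, 4, 5]
independent = [18, 32, 13, 31, 20, 22, 34, 17, 30, 8, 2, 3]

print(proportion(times_word_lengths, lambda x: x < 5.5))  # 1.0
print(proportion(news_and_sport, lambda x: x < 10))       # 0.0
print(proportion(adverts, lambda x: x < 10))              # 1.0
print(proportion(independent, lambda x: x < 19))          # 0.5
```

A proportion of 0 or 1 in a small sample does not mean an event is impossible or certain, only that it never (or always) happened in the sentences I sampled.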
---------------------------------------------------------------------------------------------------------------------------------
I think that my method was quite accurate, since I used stratified sampling and tried to avoid bias; for example, I chose samples from different categories, since readability is likely to differ between categories. All of my data is primary (I collected it myself), so I know exactly how it was collected and counted.
To improve the method and make it more accurate, I could collect more data by using more than one newspaper of each type, e.g. two or more tabloids instead of one. Using bigger samples, such as more than one article from each category, could change the results, because larger samples reduce the effect of any single unusual sentence and so make the results more reliable.
I could have:
∙ done a survey questionnaire about the type of newspaper people read
∙ written to the newspapers' editorial boards, asking where each newspaper sells most and least widely
∙ looked at a lot of different newspapers to investigate the relationship between cost and readability.
All of these would have helped my research because they would have given me more insight into the circulation, the types of people who read the newspapers and even their attitudes towards the different newspapers.