An investigation into the difference in readability between a tabloid and a broadsheet newspaper.
Katherine Allen
Introduction and Aims
Readability can be defined as how easy or how difficult something is to read. In this investigation, a text that is hard to read is said to have a high readability. I think it is possible to measure readability by the number of letters in a word, or perhaps by the number of words in a sentence. Shorter words should be easier to read, as they are more commonly used, whereas longer words may be more difficult to recognise. Longer sentences are likely to go into more depth, so they would be more complicated to read.
There are different types of newspaper, aimed at different groups of readers. There are tabloid, quality and broadsheet newspapers. Some papers are published daily, while some come out weekly. There are also special issues that are in shops on Saturdays or Sundays.
I have chosen to investigate The Independent, a daily broadsheet, and The Mirror, a daily tabloid, because I feel there will be a greater difference in readability between these than if I were to investigate the comparison between a tabloid and a quality newspaper, or a broadsheet and a quality newspaper.
Broadsheets are aimed at the more intellectual reader - people who would buy newspapers for detailed, in-depth news stories. Tabloids are aimed at people who prefer only a brief overview of the main news stories, and more 'gossipy' articles. Therefore I would expect The Independent to have a higher readability than The Mirror. It is the aim of this investigation to confirm this theory, or otherwise discover reasons for its refutation.
Hypothesis
* The readability of a tabloid (The Mirror) will be lower than that of a broadsheet (The Independent).
Pre-test
Before I begin my investigation I will carry out a short pre-test in order to ensure that my proposed method will be successful, and to highlight any problems that would be likely to arise.
I will take a small sample of 50 words from each newspaper. The newspapers that I will be testing are both from the same day and are both national, so there will be similar articles in each. The words will be taken from articles about the same topic. I will take the first 50 words of the article as I feel this will give a fair, random sample.
I expect to encounter some problems whilst counting the words, such as words that are not everyday language. This could include people's names, or other words, which are specific to the article. These will be left out as I count.
Table of Results
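| Letters per word | The Independent: Tally | The Independent: Frequency | The Mirror: Tally | The Mirror: Frequency |
|---|---|---|---|---|
| 4 | IIIII IIIII IIIII | 15 | IIIII IIII | 9 |
| 5 | IIIII IIII | 9 | IIIII IIIII | 10 |
| 6 | III | 3 | IIIII III | 8 |
| 7 | IIIII | 5 | IIIII IIII | 9 |
| 8 | IIIII | 5 | IIIII I | 6 |
| 9 | II | 2 | IIII | 4 |
| 10 | III | 3 | II | 2 |
| 11 | IIIII | 5 | | 0 |
| 12 | II | 2 | II | 2 |
| 13 | I | 1 | | 0 |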
I can see that it is hard to spot any patterns or major differences between the articles in the two newspapers. When I conduct the investigation I will take a bigger sample. The pre-test drew attention to other types of words or letters that I will need to consider when counting words. These are:
* Headlines - headlines often do not form sentences, or they play on words. They will be left out when counting.
* Hyphenated words - no hyphenated words will be counted.
* Numbers - numbers will not be included, unless written as words.
* Words that use apostrophes - words such as 'weren't' will be left out.
* Dialogue - speech and words such as 'he said', which accompany dialogue, will not be counted because the words that somebody says are not going to depend upon the type of newspaper.
* Abbreviations - abbreviations, such as WMD, will not be counted as words.
* People's names and titles (Mr, Mrs, etc.), the names of countries, and any other nouns that begin with a capital letter for any reason other than starting a sentence. None of the above will be counted in the sample.
Planning
The newspapers that have been chosen for this investigation are The Independent and The Mirror. They are both daily, national papers. The Independent was chosen because it is a broadsheet newspaper, known to be read by people who wish to find detailed, serious stories. The Mirror is a tabloid newspaper. This was chosen as opposed to other tabloids because it is known to have more news stories than other tabloids, so it is more likely that it will contain an article on the same topic as one in The Independent. It is important that articles about the same story are selected to be investigated, as they will use the same type of language. It will, therefore, only be the length of the words that varies, and not the types of words that are used. The fact that the newspapers are both national, daily papers and that the topic of the articles is the same means that there should not be any bias in the investigation.
From carrying out my pre-test I discovered that I would need to count more than 50 words in each sample. Therefore, 100 words from each newspaper will be counted. To make it a fair sample, I will leave out all the problem words that were highlighted during the pre-test. They will not be counted from either newspaper. I have also decided that I will not count any words that contain three letters or fewer. This is because these small words are likely to be used very frequently, and they are often not the type of words that can be replaced with alternatives. To avoid bias, the sample will be random. I will take the first 100 usable words from the article, as I feel this is enough to represent the whole article, and it will mean that the sample is random and fair.
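The counting rules above can be sketched in Python. This is only an illustration, not the method used in the investigation, where the counting was done by hand. The function names `is_usable` and `usable_word_lengths` are made up for this example. It is assumed that headlines and dialogue have already been removed by hand, because those rules cannot be checked word by word. A name that happens to start a sentence would still be counted by this sketch.

```python
MIN_LETTERS = 4  # words of three letters or fewer are not counted


def is_usable(word, starts_sentence):
    """Apply the exclusion rules that can be checked one word at a time."""
    if not word.isalpha():
        # Rejects hyphenated words, words with apostrophes and numbers in digits.
        return False
    if word.isupper():
        # Rejects abbreviations such as WMD.
        return False
    if word[0].isupper() and not starts_sentence:
        # Rejects names, titles and countries (capital letter mid-sentence).
        return False
    return len(word) >= MIN_LETTERS


def usable_word_lengths(text, limit=100):
    """Return the letter counts of the first `limit` usable words in the text."""
    lengths = []
    starts_sentence = True
    for token in text.split():
        word = token.strip('.,;:!?"()')
        if word and is_usable(word, starts_sentence):
            lengths.append(len(word))
            if len(lengths) == limit:
                break
        # The next word starts a sentence if this token ended one.
        starts_sentence = token.endswith((".", "!", "?"))
    return lengths


sample = ("The minister weren't told about WMD reports. "
          "Officials in London found 45 well-known experts disagreed.")
print(usable_word_lengths(sample))
```

In the made-up sample sentence, 'The' and 'in' are too short, while 'weren't', 'WMD', 'London', '45' and 'well-known' each fall under one of the exclusion rules.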
As I count the words I will tally them in a table. After working out the total number of words containing each number of letters, I will display the data in a Box and Whisker Plot and, because the Box and Whisker Plots do not show the specific lengths of the words, I will also draw a Back-to-Back Bar Chart. I feel that this will be a clear way to represent the data, and that, by looking at these graphs, I will be able to spot differences and make comparisons between the two newspapers.
Table of Results
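| Letters per word | The Independent: Tally | The Independent: Frequency | The Mirror: Tally | The Mirror: Frequency |
|---|---|---|---|---|
| 4 | IIIII IIIII IIIII IIIII IIIII I | 26 | IIIII IIIII IIIII IIIII IIII | 24 |
| 5 | IIIII IIIII IIIII II | 17 | IIIII IIIII IIIII III | 18 |
| 6 | IIIII IIIII II | 12 | IIIII IIIII IIIII | 15 |
| 7 | IIIII IIIII I | 11 | IIIII IIIII IIII | 14 |
| 8 | IIIII IIIII I | 11 | IIIII IIIII IIII | 14 |
| 9 | IIIII | 5 | IIIII III | 8 |
| 10 | IIIII | 5 | IIII | 4 |
| 11 | IIIII II | 7 | | 0 |
| 12 | IIIII | 5 | III | 3 |
| 13 | I | 1 | | 0 |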
From the table alone it seems that, although there are longer words in The Independent, there are just as many, if not more, short words. There seem to be more mid-length words in The Mirror.
I will draw my proposed graphs to get a clearer picture of the spread of results. I will need to know the median and the upper and lower quartiles of the results for each newspaper in order to draw the Box and Whisker Plot.
The Independent
Median - 6
Lower Quartile - 4
Upper Quartile - 8
The Mirror
Median - 6
Lower Quartile - 5
Upper Quartile - 8
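The medians and quartiles above can be checked with a short Python sketch. It assumes the frequencies in the results table read as 26, 17, 12, 11, 11, 5, 5, 7, 5, 1 for The Independent and 24, 18, 15, 14, 14, 8, 4, 0, 3, 0 for The Mirror, for word lengths 4 to 13 (each set totals 100). The quartile rule used here, the median of each half of the sorted data, is one common convention and is an assumption, since the investigation does not say which rule was used. The function names are made up for this example.

```python
from statistics import median

# Frequencies for word lengths 4 to 13, as read from the results table (assumed reading).
WORD_LENGTHS = range(4, 14)
INDEPENDENT = [26, 17, 12, 11, 11, 5, 5, 7, 5, 1]
MIRROR = [24, 18, 15, 14, 14, 8, 4, 0, 3, 0]


def expand(frequencies):
    """Turn a frequency table into a sorted list with one entry per word."""
    values = []
    for length, count in zip(WORD_LENGTHS, frequencies):
        values.extend([length] * count)
    return values


def five_number_summary(frequencies):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    values = expand(frequencies)
    half = len(values) // 2
    lower_half = values[:half]   # shortest half of the words
    upper_half = values[-half:]  # longest half of the words
    return (values[0], median(lower_half), median(values),
            median(upper_half), values[-1])


print("Independent:", five_number_summary(INDEPENDENT))
print("Mirror:", five_number_summary(MIRROR))
```

Under these assumptions the sketch gives the same medians and quartiles as those listed above. It also gives the minimum and maximum word lengths needed for the whiskers.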
Stratified Sample
The previous part of this investigation only took into account one section of the newspaper. This may not have given the whole picture of the difference in word length, as only certain sections were investigated. Carrying out a stratified sample would mean there is a representation of the length of words throughout the whole paper.
To take a stratified sample I will first need to ascertain the different sections in each paper, for example, news, business, sport etc. I will then need to count how many pages there are of each section. There will not be the same sections in each paper but this will not be a problem, as it is a comparison between the overall newspapers, not the sections themselves. When counting the pages I will leave out those that are adverts or taken up almost completely with adverts. I will also exclude television guides and pages covered with tables showing sports results or similar. I don't think that the wording on those types of pages will differ in length between the two papers, and often it is not formed into sentences, so the level of readability would be very hard to determine.
I will take the number of pages in the section and divide it by the number of pages altogether. I will then multiply this number by 100, which is the number of words that I shall count from each newspaper. I am using this number of words in my samples because it seemed to give a wide spread of results during the last part of the investigation and I therefore consider it to be a reasonable number in order to gain a fair sample. From my calculations I will know how many words need to be counted from each section.
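The calculation described above can be sketched in Python. The section names and page counts below are invented for illustration and are not the figures from either newspaper. The function name `allocate_sample` is also made up. Dividing pages by the total and multiplying by 100 will not always give whole numbers, so this sketch rounds down and then gives any leftover words to the sections with the largest remainders. That rounding rule is an assumption, as the investigation does not say how fractions were handled.

```python
def allocate_sample(pages_by_section, sample_size=100):
    """Share out the sample between sections in proportion to their page counts."""
    total_pages = sum(pages_by_section.values())
    counts = {}
    remainders = {}
    for section, pages in pages_by_section.items():
        # pages / total_pages * sample_size, split into a whole part and a remainder
        counts[section], remainders[section] = divmod(pages * sample_size, total_pages)
    # Hand out any words lost to rounding down, largest remainder first.
    shortfall = sample_size - sum(counts.values())
    for section in sorted(remainders, key=remainders.get, reverse=True)[:shortfall]:
        counts[section] += 1
    return counts


# Hypothetical page counts, after leaving out adverts, TV guides and results tables.
example_pages = {"news": 20, "business": 6, "sport": 10, "features": 4}
print(allocate_sample(example_pages))
```

With these invented page counts, news makes up half of the 40 pages, so half of the 100 words would be taken from the news section.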
When counting the words I will, again, count from the beginning of the first article in the section. I feel that counting from the beginning will give a random sample, as I will not be choosing the words for any specified reason. This will avoid bias. Words that were highlighted as problems after the pre-test will not be included when counting. Also, I will, again, exclude words with three letters or fewer.
I will record the number of letters in each word of each section in a table. I will then work out a total by adding up the number of words with a certain number of letters from each section. I will display the results in Box and Whisker Plots and Back-to-Back Bar Charts so that clear comparisons between The Independent and The Mirror can be made.
Data Interpretation
The Box and Whisker Plots show that the data from both The Independent and The Mirror have the same median and the same upper quartile, which does nothing to confirm the hypothesis. It is the lower 50% and the upper 25% that vary. The results from The Independent have a symmetrical inter-quartile range, whereas the results for The Mirror have a positively skewed inter-quartile range. In fact, the whole of the graph for The Mirror has a positive skew. This means that there are more words squashed into the lower half. The top 50%, representing the words with the most letters, is more spread out. This suggests that there are a lot more short words than long words.
For The Independent, the box runs from the lower quartile of 4 letters to the median of 6, and there is no lower whisker, because the lowest 25% of the words all have the minimum counted length of 4 letters. In The Mirror the box runs from the lower quartile of 5 letters to the median of 6. This goes against my hypothesis, because it means that there are more words with the fewest letters in the broadsheet newspaper than in the tabloid newspaper. My hypothesis stated that the broadsheet would have a higher readability, meaning that the tabloid would have shorter words and the broadsheet would have more long words. However, the upper 25% of the data supports the hypothesis, because The Independent has words with lengths ranging from 8-13 letters and The Mirror has words with lengths ranging from 8-12 letters. There are longer words in the broadsheet newspaper.
The Box and Whisker Plots did not show the specific lengths of the words, so it was necessary to draw another graph in order to do this. The Back-to-Back Bar Chart shows clearly that, although there are longer words in The Independent, there are also more words with the fewest letters. In The Mirror there are more words of medium length. This does not confirm my hypothesis.
I need to carry out further investigations on the two newspapers, in order to discover whether my hypothesis really was wrong, or whether it was my method that was at fault.
Stratified Data Interpretation
Again, the Box and Whisker Plots show that the data from both newspapers have the same median. Also, both have a lower quartile of 4 letters per word and a median of 5. Neither has a lower whisker, because the lowest 25% of the data all has the minimum counted length of 4 letters. The inter-quartile range for The Independent goes from 4-7 and has a positive skew, whereas the inter-quartile range for The Mirror is symmetrical. This supports my hypothesis, because it shows that there are more words of greater length in the inter-quartile range of the broadsheet newspaper than there are in the tabloid newspaper. In The Mirror, the majority of the data is crammed between the three lowest possible lengths of word, whereas in The Independent the majority of the data is spread further into the larger word lengths. The upper 25% of the data ranges from 7-11 for The Independent, and only 7-10 for The Mirror. Although this is not a dramatic difference, it does support my hypothesis.
The Back-to-Back Bar Chart, showing specifically the number of letters in each word, also goes some way to confirming my hypothesis. Again, there is not a spectacular difference, but I can see that there are slightly more short words in the tabloid newspaper and slightly more long words in the broadsheet.
This result supports the idea that, with my definition of the word, the broadsheet does have a higher readability than the tabloid newspaper. It does not, however, prove the theory. Although my stratified method of data collection appeared to work better than the original method, it still is not enough proof that The Independent has a higher readability than The Mirror, or that any broadsheet newspaper would have a higher readability than a tabloid newspaper.
Evaluation
Overall, I do not think this investigation is very accurate or very practical.
When considering the number of words in a whole newspaper, taking just 100 of them to be counted in a sample is not really enough to provide an accurate idea of what the lengths of the words throughout the paper are really like. A lot more words would be needed to do this, and counting the letters in as many words as would be required is not practical. Even if this could be done, it would not be fair to say that a hypothesis that might have been proved through an investigation of just two newspapers would be applicable to all broadsheets and all tabloids. So, not only would a greater number of words need to be sampled, they would also need to be sampled from a greater number of newspapers.
I feel that the way I took my samples was random and unbiased, although a problem I encountered whilst taking my stratified sample was to do with counting the number of pages in each section. Many pages had large pictures or advertisements taking up part of the space. I had to judge how much of the page should be counted, with no real way of measuring. Perhaps, in further investigations I would need to measure the images and work out exactly how much of the page was part of the particular section, in order to get a more accurate result.
I do not think that readability can necessarily be determined by the length of words. This assumption that I made at the beginning of the investigation may well be false and could, therefore, invalidate the whole investigation. It would be true to say that, often, longer words are harder to read than shorter ones, but this is not always the case. For example, words such as 'augur', which are not commonly known or used, would, for most people, be harder to read than regularly used, longer words such as 'determinedly'. I think the readability also depends upon the order of letters in words, and the sounds they make.
Another way of investigating the readability of newspapers might be to actually ask somebody to read the paper and either time how long it takes them, or count how many times they stumble over words, or make mistakes. It would be best if the person reading the newspaper were a child, as they are more likely to have difficulties in reading. I would have to get quite a few children to read, as it would not be fair to make conclusions after testing the theory on just one person. However, it would be very difficult to find a large group of people who are of the same reading level and, if they were not all of the same ability, this could invalidate the investigation.