Before I collect the data from the newspapers, I must work out how many words to collect from each section, because my planning requires data from each section separately. I first counted how many pages there are in the newspaper altogether; for the Herald Tribune there were 18.
I then subtract the number of Other (Advertisements) pages. In this newspaper there are none, so nothing needs to be subtracted.
Then, to work out how many words should be collected from the News section, I use the formula:
No. of words to be sampled = (No. of pages in section ÷ Total pages in newspaper minus adverts) × 100
For this example that is:
(7 ÷ 18) × 100 = 38.9 (39)
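The calculation above can be checked with a minimal Python sketch. It assumes a target of about 100 words per newspaper, which is what the answer of 39 words from 7 pages out of 18 implies. The function name words_to_sample and its parameter names are made up for this illustration.

```python
def words_to_sample(section_pages, total_pages, advert_pages=0, sample_size=100):
    """Share out the sample in proportion to the pages in a section.

    sample_size=100 is an assumption inferred from the worked example
    (7 pages out of 18 giving 39 words); it is not stated outright.
    """
    editorial_pages = total_pages - advert_pages
    if editorial_pages <= 0:
        raise ValueError("there must be at least one non-advert page")
    return round(section_pages / editorial_pages * sample_size)


# Herald Tribune News section: 7 pages out of 18, no advert pages.
news_words = words_to_sample(7, 18, advert_pages=0)
print(news_words)  # 39
```

The same function can be reused for every section of every newspaper by changing the page counts.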
So I have worked out that I need to collect 39 words from the News section of the Herald Tribune. However, I must collect the data randomly, and to do this I must follow these steps:
There are many problems which need to be discussed before sampling data, so that the data is fair and unbiased.
That explains the problems I may face while collecting data, and how to deal with them. I can now collect my data.
RESULTS:
Firstly I collected data from the Herald Tribune: News Section
From these results we are able to draw a Box & Whisker diagram:
The Box & Whisker diagram breaks the data down and presents it in a simple way that is easy to read. As you can see from this diagram, the maximum word length in the News section of the Herald Tribune is 13 letters, while the minimum is 2. The median is 6 letters per word.
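The values that a Box & Whisker diagram is drawn from can be worked out with a short Python sketch. It uses the word-length frequencies from the stem & leaf diagram for the Herald Tribune News section. The quartiles it gives (4 and 7) are calculated by the sketch rather than quoted in the text, and other quartile conventions may give slightly different values. The name five_number_summary is only for illustration.

```python
import statistics

# Word-length frequencies for the Herald Tribune News section,
# read off the stem & leaf diagram (word length -> number of words).
news_frequencies = {2: 2, 3: 2, 4: 9, 5: 5, 6: 4, 7: 8, 8: 7, 12: 1, 13: 1}


def five_number_summary(frequencies):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    # Expand the tally into a sorted list of individual word lengths.
    lengths = sorted(
        length for length, count in frequencies.items() for _ in range(count)
    )
    # The "exclusive" method places the quartiles at the (n + 1) / 4 positions.
    lower_q, median, upper_q = statistics.quantiles(
        lengths, n=4, method="exclusive"
    )
    return lengths[0], lower_q, median, upper_q, lengths[-1]


print(five_number_summary(news_frequencies))  # (2, 4.0, 6.0, 7.0, 13)
```

The minimum of 2, median of 6 and maximum of 13 agree with the values read from the diagram.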
However, one of the problems with Box & Whisker diagrams is that they do not show the frequency of each word length. This is a drawback, as the diagram can never show the mode of the word lengths in the newspapers.
The stem & leaf diagram does show the mode of the word lengths. It is very basic, but it presents the data well and shows the frequencies simply.
Stem and Leaf Diagram for Herald Tribune- News:
1:
2: 0 0
3: 0 0
4: 0 0 0 0 0 0 0 0 0
5: 0 0 0 0 0
6: 0 0 0 0
7: 0 0 0 0 0 0 0 0
8: 0 0 0 0 0 0 0
9:
10:
11:
12: 0
13: 0
The stem & leaf diagram works by listing the word lengths on the left and showing the frequency of each word length by the number of 0s after it. From this diagram for the Herald Tribune News section, we can see that 4 is the mode, as there were nine four-letter words. The next most common length was 7, as there were eight seven-letter words. The stem & leaf diagram is very effective at presenting modal values, but it is limited to that.
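Reading the mode off the diagram can also be sketched in Python. The frequencies below are copied from the stem & leaf diagram above, and the variable names are only for illustration.

```python
from collections import Counter

# Word length -> number of words, from the Herald Tribune News diagram.
news_frequencies = Counter(
    {2: 2, 3: 2, 4: 9, 5: 5, 6: 4, 7: 8, 8: 7, 12: 1, 13: 1}
)

# most_common(2) returns the two word lengths with the highest counts.
ranked = news_frequencies.most_common(2)
mode_length, mode_count = ranked[0]

print(ranked)                          # [(4, 9), (7, 8)]
print(sum(news_frequencies.values()))  # 39 words in the sample
```

The total of 39 matches the number of words that had to be collected from the News section.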
Another diagram we can use to present the data is a Histogram. It works in the same way as a stem & leaf diagram, but is more attractive and looks similar to a bar chart.
The Histogram shows the word length along the x axis and the frequencies along the y axis. It is very simple and shows that the mode is 4, as it occurs nine times. It tells us the same as the stem & leaf diagram, but presents it in a different way.
I will now present the results for the entire Herald Tribune newspaper:
From these results we are able to present a Box & Whisker diagram:
The Box & Whisker diagram shows that the median for the entire Herald Tribune is 5, which means that the median length of a word in the Herald Tribune is 5 letters. The maximum length of a word in the Herald Tribune was 13, while the minimum was 2.
I will now show the stem & leaf diagram, as I used before:
Stem and Leaf Diagram for Herald Tribune:
1:
2: 0 0 0 0 0 0 0
3: 0 0 0 0 0 0 0 0 0 0
4: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5: 0 0 0 0 0 0 0 0 0 0 0 0
6: 0 0 0 0 0 0 0 0 0 0 0 0
7: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
8: 0 0 0 0 0 0 0 0 0
9: 0 0
10: 0 0 0 0
11:
12: 0
13: 0
The stem & leaf diagram shows the mode to be 4, as it occurs 20 times, followed by 7, which occurs 19 times. This stem & leaf diagram holds more data, as it covers the entire Herald Tribune newspaper.
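The median of 5 for the whole newspaper can be checked straight from these frequencies by adding them up until the middle position is reached (a cumulative frequency). This is a minimal Python sketch; the counts are read off the diagram above and the function name median_from_frequencies is only for illustration.

```python
# Word length -> number of words, for the whole Herald Tribune sample.
paper_frequencies = {
    2: 7, 3: 10, 4: 20, 5: 12, 6: 12, 7: 19, 8: 9, 9: 2, 10: 4, 12: 1, 13: 1,
}


def median_from_frequencies(frequencies):
    """Find the median by running a cumulative total through the tally."""
    total = sum(frequencies.values())
    # 1-based positions of the middle value (or two middle values if even).
    lower_position = (total + 1) // 2
    upper_position = (total + 2) // 2

    def value_at(position):
        running_total = 0
        for length in sorted(frequencies):
            running_total += frequencies[length]
            if running_total >= position:
                return length
        raise ValueError("position is beyond the end of the data")

    return (value_at(lower_position) + value_at(upper_position)) / 2


print(median_from_frequencies(paper_frequencies))  # 5.0
```

Working from the tally avoids having to write out every word length in order.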
The Histogram presents the information in a similar way, so I will again show you the results as the Histogram presents them:
The histogram again shows the modal values. The x axis shows word length, and the y axis shows the frequency.
All Results:
Herald Tribune: NEWS, ENTERTAINMENT, BUSINESS, SPORT
The Daily Mail: NEWS, ENTERTAINMENT, BUSINESS, SPORT
The Sun: NEWS, ENTERTAINMENT, BUSINESS, SPORT
I now want to set about testing my hypotheses. To do this I need to review them:
- “Broadsheet newspapers have longer words than Tabloid newspapers.”
- “Words in the News section of all newspapers will have longer words than the other sections of that newspaper.”
- “Broadsheets give a higher proportion of the newspaper to news articles, than tabloid newspapers.”
To test the first hypothesis, I need to present the word lengths of all the newspapers, so I will use Box & Whisker diagrams, as they show the medians and the upper and lower quartiles.
The diagram will show the spread of data in all three newspapers. To determine whether one newspaper has longer words than another, I will judge them by these criteria:
- The largest median value.
- The consistency of the data (i.e. that it does not spread over a long range of word lengths).
- The largest upper and lower quartiles.
- The longest word.
- The highest minimum word length.
The diagram shows that The Sun has the lowest median value, so it is very likely that The Sun has shorter words than the other two papers. We can therefore already partly prove that the first hypothesis is correct, as the Herald Tribune has a larger median than the tabloid.
The Herald Tribune and the Daily Mail have the same median value of 5 letters per word. However, if you refer back to the criteria I set for judging a newspaper’s word length, you will notice that the second criterion is the consistency of the data. In the Herald Tribune the data is more consistent than in the Daily Mail; you can see this because the distance between the lower and upper quartiles is smaller for the Herald Tribune than for the Daily Mail.
Another criterion was the quartiles. Both papers have the same upper quartile, of seven. However, the Herald Tribune has a lower quartile of four, while the Daily Mail’s is lower, at three.
The last criteria are the minimum and maximum values. The Herald Tribune has a higher minimum, at two letters, while the Daily Mail has a minimum of one letter. The Daily Mail’s longest word was eleven letters long, while the Herald Tribune’s longest word was thirteen letters long.
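The consistency comparison above comes down to the interquartile range (upper quartile minus lower quartile). This is a small Python sketch using only the values quoted in the text for the two papers. The class name BoxSummary and its field names are made up for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoxSummary:
    """The five values a Box & Whisker diagram is drawn from."""

    minimum: int
    lower_quartile: int
    median: int
    upper_quartile: int
    maximum: int

    @property
    def interquartile_range(self):
        # A smaller interquartile range means more consistent word lengths.
        return self.upper_quartile - self.lower_quartile


# Values as quoted in the text for the whole of each newspaper.
herald_tribune = BoxSummary(2, 4, 5, 7, 13)
daily_mail = BoxSummary(1, 3, 5, 7, 11)

print(herald_tribune.interquartile_range)  # 3
print(daily_mail.interquartile_range)      # 4
```

The Herald Tribune's interquartile range of 3 is smaller than the Daily Mail's 4, which is the sense in which its data is more consistent.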
I can then say that the Herald Tribune (broadsheet) has the longest words, narrowly ahead of The Daily Mail (quality tabloid) but both clear of The Sun (tabloid), whose average word length is only four letters per word.
I am now going to present the histograms of all the newspapers so you can compare their modal values:
We have now proved that the first hypothesis is correct:
“Broadsheet newspapers have longer words than Tabloid newspapers.”
Now we have to prove the other two hypotheses to be either correct or incorrect.
“Words in the News section of all newspapers will have longer words than the other sections of that newspaper.”
This hypothesis requires me to present Box & Whisker diagrams for all the sections of each newspaper.
I will first present the Herald Tribune:
From the Herald Tribune’s Box & Whisker diagram you can see that the “News”, “Entertainment” and “Business” sections are all very similar, with a median of 6. The upper quartile is 7 in all three, and they have very similar lower quartiles. The Sport section is very different; it has a much lower median, 3, and its longest word was only 5 letters long.
In relation to the hypothesis that the News section has the longest words, it is difficult to say, as three sections have the same median and upper quartile. However, we can say that the News and Entertainment sections have longer words than the Business section, as they have a higher lower quartile.
We can say that the News section has narrowly got longer words, as the longest word in the News section was 3 letters longer than the longest in the Entertainment section.
We now must look at the other newspapers:
The Daily Mail:
The Daily Mail is very similar to the Herald Tribune, as the News, Entertainment and Business sections all have longer words than the Sport section. However, the News section has the lowest upper quartile of the three sections, with a median of 5. Therefore you would have to say that the Entertainment section of the Daily Mail has longer words than the Business section, and both sections have longer words than the News section.
This disproves our hypothesis.
However you could argue that the News section was very consistent, the distance between lower and upper quartiles was small, and therefore it was a consistent result, while the Entertainment section was very varied and inconsistent.
Finally I must present The Sun newspaper. Although the hypothesis has been proved on the broadsheet and disproved on the quality tabloid, it is worth finding out what the tabloid’s results were. We have also noticed that the Sport sections of the last two newspapers had very low medians; will this be the same in the tabloid?
The Sun’s results are totally different to the other newspapers: the Sport section has much longer words, while the News section has the shortest words.
This totally disproves the hypothesis.
It is worth noting that, because there is only one Business page in The Sun, we only needed to collect 2 word lengths. So although the Business section seems to have had the longest and most consistent words, I would ignore it, as a sample that small is not at all accurate.
The News section can, however, be said to be the most consistent section in all three newspapers, but it cannot be said to have the longest words.
The final hypothesis must be proved or disproved.
“Broadsheets give a higher proportion of the newspaper to news articles, than tabloid newspapers.”
This hypothesis would require data on the percentage of news articles in all three newspapers.
I have worked out the percentage of news articles in all three newspapers:
The Broadsheet newspaper (Herald Tribune) gives the highest percentage of the paper to news, compared with the other two papers. It had no pages of adverts, whereas the Daily Mail gives 29% of the paper to adverts, 2% more than it gives to news. The Sun, however, is the worst: it gives 31% of its paper to adverts, which is 7% more than it gives to news.
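The percentages being compared can be worked through in a short Python sketch. The Herald Tribune figure uses the 7 News pages out of 18 counted earlier. For the other two papers the sketch assumes that "2% more" and "7% more" mean percentage points, so the news shares it gives are derived figures, not ones quoted in the text. The function name percentage_of_paper is only for illustration.

```python
def percentage_of_paper(section_pages, total_pages):
    """Percentage of the newspaper's pages taken up by one section."""
    return section_pages / total_pages * 100


# Herald Tribune: 7 News pages out of 18 pages in total.
herald_tribune_news = round(percentage_of_paper(7, 18), 1)

# Assumption: the stated gaps are percentage points, so the news share is
# the advert share minus the gap.
daily_mail_news = 29 - 2
sun_news = 31 - 7

print(herald_tribune_news, daily_mail_news, sun_news)  # 38.9 27 24
```

On these figures the broadsheet gives the largest share of its pages to news, which is what the hypothesis predicts.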
I am now showing the proportion of the newspaper and where it goes in each paper:
The Herald Tribune:
The Daily Mail:
The Sun:
These pie charts show the proportion the paper gives to each section. It is also worth noting a few more points:
- The Business section of the Herald Tribune is much larger than in the other two newspapers. This may be because the readers of the Herald Tribune are more interested in business than the readers of the other newspapers.
- The Sport section of the Herald Tribune is smaller than in the other newspapers; this may be because the readers of the other newspapers have a greater interest in sport than the readers of the Herald Tribune.
- The names of the newspapers are very different. The full name of the Herald Tribune is “The International Herald Tribune”, a total of 29 letters; “The Daily Mail” has 12 letters and “The Sun” has 6 letters. This is another sign that broadsheets have longer words.
- The prices of the newspapers are also very different. On a weekday the Herald Tribune costs 120p, the Daily Mail costs 40p and The Sun costs 25p. This may be because the Herald Tribune believes that people will pay more for a quality newspaper; it may also be because it believes that wealthier people read the Herald Tribune and so can afford to pay for it. Another fact to consider is that The Sun and the Daily Mail sell more copies than the Herald Tribune in the United Kingdom, so they can bring the price down. However, the Herald Tribune is one of the world’s best-selling newspapers.
The project could have been improved with a larger sample, as it would then have been more accurate and we could have made more definitive points. On the other hand, it was a successful project, as I was able to compare all the sections and prove or disprove my hypotheses. If I were to do the project again, I would use more newspapers, such as two tabloids, two quality tabloids and two broadsheets. I would then conduct the sampling over several days’ newspapers (making sure the days are the same for each paper), so that I would have a more accurate result. I would also look at other hypotheses, such as paragraph length and complexity of words. One thing I found was that when the broadsheet used a shorter word, it was usually more complex than the tabloid’s equivalent. For instance, the word “old-fashioned” was used in the Sun, while the word “antiquated” was used in the Herald Tribune. “Antiquated” has 10 letters and means the same as “old-fashioned”, which has 12 letters (13 characters counting the hyphen). “Antiquated” is the more complex word, however. So it is unfair to say that a more intelligent reader would read a newspaper with longer words.
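The letter counts used in this project can be checked with a small Python sketch. It counts letters only, so spaces and hyphens are ignored, which is an assumption about how the words were counted. The function name count_letters is only for illustration.

```python
def count_letters(text):
    """Count alphabetic characters, ignoring spaces, hyphens and punctuation."""
    return sum(1 for character in text if character.isalpha())


examples = [
    "The International Herald Tribune",
    "The Daily Mail",
    "The Sun",
    "old-fashioned",
    "antiquated",
]

for phrase in examples:
    print(phrase, count_letters(phrase))
# The International Herald Tribune 29
# The Daily Mail 12
# The Sun 6
# old-fashioned 12
# antiquated 10
```

Counted this way, "old-fashioned" has 12 letters, or 13 characters if the hyphen is included, so how hyphenated words are treated needs to be decided before sampling.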