Tabloid and broadsheet newspaper comparison maths coursework
Maths Coursework Michael Grainger
S1 Task D: Authorship
Introduction:
Tabloid and broadsheet newspapers are aimed at different audiences, so they are written differently to suit those audiences. Tabloid newspapers supposedly give an easier read than broadsheet newspapers, and this is what this investigation will test.
Aim:
The aim of this piece of coursework is to gain information about authorship of a text using statistical measures. I will collect data from a population with a view to estimating population parameters (e.g. mean & variance) by using estimation techniques from the previous module.
The task will involve taking a random sample, expressing my results in various forms appropriate to the work, and calculating and comparing confidence intervals.
Prediction:
I predict that, after counting the letters in sample words from both a tabloid and a broadsheet newspaper, I will find that the broadsheet newspaper has longer words overall because it is aimed at a more educated audience. The tabloid would then have a shorter mean word length, making it an easier read for the audience that it is aimed at.
The populations being studied are the word lengths in a tabloid newspaper and a broadsheet newspaper, and a random sample is taken from each. When collecting the data, people's names, place names, numbers and hyphenated words will not be included in the results. A sample of 50 words will be taken from the second page of each paper, containing political news, and also from the last and second last pages of each paper, containing sporting news. This will allow me to gain an appropriate and fairer sample, which will be less susceptible to bias.
The two newspapers being used are "The Sun" and "The Guardian."
My sample is a random sample and has been obtained by using the random number generator on my calculator. This allowed me to choose particular words, at random, and then count the number of letters within each.
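The selection step could be sketched in Python roughly as follows. This is only an illustration: the coursework used a calculator's random number generator, and the function name `sample_word_lengths`, the example sentence and the seed are all assumptions made for the sketch. Numbers and hyphenated words are filtered automatically, but people's names and place names would still have to be removed by hand.

```python
import random

def sample_word_lengths(text, k, seed=None):
    """Pick k eligible words at random and return their letter counts."""
    rng = random.Random(seed)
    # Strip surrounding punctuation so that "decade." counts as 6 letters.
    words = [w.strip(".,;:!?\"'()") for w in text.split()]
    # isalpha() rejects numbers ("40") and hyphenated words ("well-known").
    # Names of people and places are NOT detected here (manual step).
    eligible = [w for w in words if w.isalpha()]
    chosen = rng.sample(eligible, k)  # random sample without replacement
    return [len(w) for w in chosen]

# Hypothetical sentence standing in for a newspaper page.
page = "The minister said the well-known plan would cost 40 million pounds over the next decade."
lengths = sample_word_lengths(page, k=5, seed=1)
print(lengths)
```

For the real investigation, `k` would be 50 and `text` would be the full page being sampled.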
Results:
Below is the information that I have collected:
Tabloid Newspaper
Political Pages
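Word length | Frequency | Cum. Frequency | Word length | Frequency | Cum. Frequency
1 | 4 | 4 | 9 | 3 | 45
2 | 5 | 9 | 10 | 3 | 48
3 | 7 | 16 | 11 | 0 | 48
4 | 5 | 21 | 12 | 2 | 50
5 | 4 | 25 | 13 | 0 | 50
6 | 7 | 32 | 14 | 0 | 50
7 | 5 | 37 | 15 | 0 | 50
8 | 5 | 42 | 16 | 0 | 50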
Sports Pages
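Word length | Frequency | Cum. Frequency | Word length | Frequency | Cum. Frequency
1 | 6 | 6 | 9 | 3 | 48
2 | 10 | 16 | 10 | 2 | 50
3 | 4 | 20 | 11 | 0 | 50
4 | 6 | 26 | 12 | 0 | 50
5 | 6 | 32 | 13 | 0 | 50
6 | 6 | 38 | 14 | 0 | 50
7 | 4 | 42 | 15 | 0 | 50
8 | 3 | 45 | 16 | 0 | 50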
Broadsheet Newspaper
Political Pages
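Word length | Frequency | Cum. Frequency | Word length | Frequency | Cum. Frequency
1 | 3 | 3 | 9 | 3 | 37
2 | 2 | 5 | 10 | 4 | 41
3 | 4 | 9 | 11 | 3 | 44
4 | 3 | 12 | 12 | 3 | 47
5 | 5 | 17 | 13 | 1 | 48
6 | 5 | 22 | 14 | 2 | 50
7 | 7 | 29 | 15 | 0 | 50
8 | 5 | 34 | 16 | 0 | 50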
Sports Pages
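Word length | Frequency | Cum. Frequency | Word length | Frequency | Cum. Frequency
1 | 2 | 2 | 9 | 7 | 43
2 | 6 | 8 | 10 | 5 | 48
3 | 8 | 16 | 11 | 0 | 48
4 | 3 | 19 | 12 | 2 | 50
5 | 4 | 23 | 13 | 0 | 50
6 | 5 | 28 | 14 | 0 | 50
7 | 6 | 34 | 15 | 0 | 50
8 | 2 | 36 | 16 | 0 | 50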
On separate sheets of graph paper you will find cumulative frequency graphs, box plots and standard bar charts for the data.
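The cumulative frequency column used for the graphs is a running total of the frequency column. A minimal sketch, assuming the tabloid political frequencies for word lengths 1 to 12 (the variable names are chosen for illustration only):

```python
from itertools import accumulate

# Tabloid political pages: frequencies for word lengths 1..12.
word_lengths = list(range(1, 13))
frequencies = [4, 5, 7, 5, 4, 7, 5, 5, 3, 3, 0, 2]

# Each cumulative value is the sum of all frequencies up to that length.
cumulative = list(accumulate(frequencies))

for length, freq, cum in zip(word_lengths, frequencies, cumulative):
    print(f"{length:>2} {freq:>2} {cum:>2}")
```

The last cumulative value should equal the sample size of 50, which is a quick check that no word has been missed or counted twice.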
Results Collected:
Tabloid Newspaper
Political Pages
n = 50
Mean: 5.460
Variance: 8.488
Standard deviation: 2.913
Standard error: s.d. / √n = 2.913 / √50 = 0.412
Median: 5.00
Upper Quartile: 7.00
Lower Quartile: 2.50
Inter-quartile range: 4.50
Sports Pages
n = 50
Mean: 4.540
Variance: 6.891
Standard deviation: 2.625
Standard error: s.d. / √n = 2.625 / √50 = 0.371
Median: 4.00
Upper Quartile: 6.00
Lower Quartile: 2.00
Inter-quartile range: 4.00
Broadsheet Newspaper
Political Pages
n = 50
Mean: 7.040
Variance: 11.676
Standard deviation: 3.417
Standard error: s.d. / √n = 3.417 / √50 = 0.483
Median: 6.50
Upper Quartile: 9.00
Lower Quartile: 4.00
Inter-quartile range: 5.00
Sports Pages
n = 50
Mean: 5.690
Variance: 7.317
Standard deviation: 2.705
Standard error: s.d. / √n = 2.705 / √50 = 0.383
Median: 5.50
Upper Quartile: 8.00
Lower Quartile: 3.00
Inter-quartile range: 5.00
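As a check on the arithmetic, the mean, variance, standard deviation and standard error can be worked out directly from a frequency table. The sketch below uses the tabloid political frequencies and assumes the variance is calculated with divisor n, which is the form that matches the figures quoted for that section; an unbiased estimate would divide by n - 1 instead and give a slightly larger value.

```python
from math import sqrt

# Tabloid political pages: word length -> frequency.
frequencies = {1: 4, 2: 5, 3: 7, 4: 5, 5: 4, 6: 7,
               7: 5, 8: 5, 9: 3, 10: 3, 11: 0, 12: 2}

n = sum(frequencies.values())
mean = sum(x * f for x, f in frequencies.items()) / n
# Variance with divisor n: mean of the squares minus the square of the mean.
variance = sum(x * x * f for x, f in frequencies.items()) / n - mean ** 2
sd = sqrt(variance)
standard_error = sd / sqrt(n)

print(f"n = {n}")
print(f"mean = {mean:.3f}")
print(f"variance = {variance:.3f}")
print(f"standard deviation = {sd:.3f}")
print(f"standard error = {standard_error:.3f}")
```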
Calculating Confidence Intervals for the results:
Tabloid Newspaper
Political Pages
90% confidence interval:
(5.460 - (1.645 x 0.412)), (5.460 + (1.645 x 0.412))
(5.460 - 0.678), (5.460 + 0.678)
We can be 90% confident that the mean lies between:
(4.782, 6.138)
95% confidence interval:
(5.460 - (1.96 x 0.412)), (5.460 + (1.96 x 0.412))
(5.460 - 0.808), (5.460 + 0.808)
We can be 95% confident that the mean lies between:
(4.652, 6.268)
99% confidence interval:
(5.460 - (2.575 x 0.412)), (5.460 + (2.575 x 0.412))
(5.460 - 1.061), (5.460 + 1.061)
We can be 99% confident that the mean lies between:
(4.399, 6.521)
Sports Pages
90% confidence interval:
(4.540 - (1.645 x 0.371)), (4.540 + (1.645 x 0.371))
(4.540 - 0.610), (4.540 + 0.610)
We can be 90% confident that the mean lies between:
(3.930, 5.150)
95% confidence interval:
(4.540 - (1.96 x 0.371)), (4.540 + (1.96 x 0.371))
(4.540 - 0.727), (4.540 + 0.727)
We can be 95% confident that the mean lies between:
(3.813, 5.267)
99% confidence interval:
(4.540 - (2.575 x 0.371)), (4.540 + (2.575 x 0.371))
(4.540 - 0.955), (4.540 + 0.955)
We can be 99% confident that the mean lies between:
(3.585, 5.495)
Broadsheet Newspaper
Political Pages
90% confidence interval:
(7.040 - (1.645 x 0.483)), (7.040 + (1.645 x 0.483))
(7.040 - 0.795), (7.040 + 0.795)
We can be 90% confident that the mean lies between:
(6.245, 7.835)
95% confidence interval:
(7.040 - (1.96 x 0.483)), (7.040 + (1.96 x 0.483))
(7.040 - 0.947), (7.040 + 0.947)
We can be 95% confident that the mean lies between:
(6.093, 7.987)
99% confidence interval:
(7.040 - (2.575 x 0.483)), (7.040 + (2.575 x 0.483))
(7.040 - 1.244), (7.040 + 1.244)
We can be 99% confident that the mean lies between:
(5.796, 8.284)
Sports Pages
90% confidence interval:
(5.690 - (1.645 x 0.383)), (5.690 + (1.645 x 0.383))
(5.690 - 0.630), (5.690 + 0.630)
We can be 90% confident that the mean lies between:
(5.060, 6.320)
95% confidence interval:
(5.690 - (1.96 x 0.383)), (5.690 + (1.96 x 0.383))
(5.690 - 0.751), (5.690 + 0.751)
We can be 95% confident that the mean lies between:
(4.939, 6.441)
99% confidence interval:
(5.690 - (2.575 x 0.383)), (5.690 + (2.575 x 0.383))
(5.690 - 0.986), (5.690 + 0.986)
We can be 99% confident that the mean lies between:
(4.704, 6.676)
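Every interval above follows the same pattern: mean ± z × standard error. A short sketch of that calculation, using the z values quoted in this coursework (1.645, 1.96 and 2.575; many tables give 2.576 for 99%) and the tabloid political figures as input. The function name `confidence_interval` is an illustrative choice.

```python
# z values as used in the working above.
Z_VALUES = {90: 1.645, 95: 1.96, 99: 2.575}

def confidence_interval(mean, standard_error, level):
    """Return (lower, upper) for the given confidence level, to 3 d.p."""
    margin = Z_VALUES[level] * standard_error
    return (round(mean - margin, 3), round(mean + margin, 3))

# Tabloid political pages: mean 5.460, standard error 0.412.
for level in (90, 95, 99):
    print(level, confidence_interval(5.460, 0.412, level))
```

The same function can be called with the mean and standard error of any of the other three sections.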
Distribution of the sample means:
Another calculation that can be done with the data is to find the distribution of the sample means. The population variance is σ² and samples of size "n" have been taken. To find the variance of the distribution of the sample means from these figures, the calculation σ² / n can be used.
Tabloid Newspaper
Political Pages
σ² / n = 8.488 / 50 = 0.170
Sports Pages
σ² / n = 6.891 / 50 = 0.138
Broadsheet Newspaper
Political Pages
σ² / n = 11.676 / 50 = 0.234
Sports Pages
σ² / n = 7.317 / 50 = 0.146
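The σ² / n result can be illustrated with a small simulation. The sketch below is an assumption-laden illustration rather than part of the original method: it treats the 50 tabloid political word lengths as if they were the whole population, repeatedly draws samples of size 50 with replacement, and compares the variance of the resulting sample means with σ² / n. The seed and the number of repetitions (2000) are arbitrary choices.

```python
import random
from statistics import fmean, pvariance

# Tabloid political pages: word length -> frequency.
frequencies = {1: 4, 2: 5, 3: 7, 4: 5, 5: 4, 6: 7,
               7: 5, 8: 5, 9: 3, 10: 3, 11: 0, 12: 2}

# Expand the table into the 50 individual word lengths.
population = [x for x, f in frequencies.items() for _ in range(f)]

n = 50
sigma_sq = pvariance(population)   # variance with divisor n
predicted = sigma_sq / n           # theoretical variance of the sample mean

rng = random.Random(0)
# Draw 2000 samples of size n with replacement and record each mean.
means = [fmean(rng.choices(population, k=n)) for _ in range(2000)]
simulated = pvariance(means)

print(f"predicted variance of sample means: {predicted:.3f}")
print(f"simulated variance of sample means: {simulated:.3f}")
```

The two printed figures should be close to each other, and the agreement tends to improve as the number of repetitions increases.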
If the sample means follow a normal distribution, their mean should be close to the population mean and their variance close to σ² / n, and this holds more closely for larger sample sizes. However, the data here do not show a clear normal distribution, because the figures are some way from what a normal distribution would give.
Interpretation and analysis of results:
From the box plots that I have drawn of the data collected, you can see that for the tabloid newspaper's political section the longest words found had 12 letters and the shortest had 1 letter, so the range is 11 letters. The inter-quartile range for this section is 4.50. This is a fairly low inter-quartile range, which shows that the middle half of the word lengths fell within a narrow band of values.
For the sports section of the tabloid newspaper, the longest words in the article were 10 letters long. This suggests that writing about sport does not need words as long as those used for political subjects, given the audience that it is aimed at. The range for this set of data is, therefore, 9 letters. The inter-quartile range for this section is 4.00. This is the lowest inter-quartile range of all four data sets, so the middle half of the data lies within an even narrower band of values than for the tabloid political news.
The political section in the broadsheet newspaper has a mean of approximately 7 letters and a median of 6.5. These are the highest averages of all the data sets and show that, on average, the political section of the broadsheet newspaper has the greatest number of letters per word. However, the inter-quartile range is greater than that of the two sections in the tabloid newspaper, showing a greater overall spread of word lengths for this section of the broadsheet.
The sports section of the broadsheet newspaper also has a higher mean word length than both of the tabloid sections. This again shows that, on average, the sports section of the broadsheet has a greater number of letters per word than the sports section of the tabloid. The inter-quartile range is the same as that of the political section and is greater than both of the tabloid sections.
Overall, the standard deviation for "The Sun" is less than that for "The Guardian": approximately 2.8 compared with 3.0. This shows that "The Sun" has more consistent word lengths and that "The Guardian" has a wider spread, with both longer and shorter words.
You can see from the confidence intervals, used to estimate each mean with a stated level of confidence, that some of the intervals overlap. This means that two or more of the population means may be equal, because the same values lie within more than one interval. The consequence is that the data do not show everything they were intended to: where intervals overlap, we cannot tell which newspaper section has the longer mean word length.
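The overlap check can be made explicit. The sketch below takes the four 95% intervals calculated earlier and lists every pair of sections whose intervals do not overlap. Comparing intervals in this way is only a rough guide, not a formal hypothesis test, and the helper name `overlaps` is an illustrative choice.

```python
from itertools import combinations

# 95% confidence intervals from the working above.
intervals_95 = {
    "Tabloid political": (4.652, 6.268),
    "Tabloid sports": (3.813, 5.267),
    "Broadsheet political": (6.093, 7.987),
    "Broadsheet sports": (4.939, 6.441),
}

def overlaps(a, b):
    """True if intervals a and b share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]

# Collect the pairs of sections whose intervals are fully separated.
separated = [
    (name_a, name_b)
    for (name_a, a), (name_b, b) in combinations(intervals_95.items(), 2)
    if not overlaps(a, b)
]
print(separated)
```

With these figures, the only pair of 95% intervals that does not overlap is the tabloid sports section and the broadsheet political section; every other pair shares some values.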
Conclusion:
In conclusion, because the average values for the broadsheet were higher, the results suggest that it is a harder read than the tabloid newspaper and that the tabloid, which has fewer letters per word on average, is an easier read.
Also, as mentioned before, we can tell from the standard deviations that the tabloid has more consistent data, with values centred around a narrower set. The broadsheet, however, has more very long and very short words, with a greater range between them; its data are therefore less consistent.
Altogether this means that my prediction is supported, but the investigation is not necessarily conclusive. Only a relatively small sample was taken, and it came from only one newspaper of each type. To improve the results, a larger sample could have been used and more newspapers could have been investigated.
Another point worth mentioning is that shorter words do not necessarily make a newspaper an easier read. This assumption may have been incorrect, so this investigation may not truly show that a tabloid newspaper is easier to read.