Mathematics Coursework: Read All About It
Introduction
I have chosen the "Read All About It" option for my Maths Coursework. This involves selecting articles from newspapers and comparing them. I have chosen to compare articles from two newspapers, one tabloid and one broadsheet. The papers I have chosen are "The Mirror" and "The Guardian".
I predict that the articles in the broadsheet newspaper will be more complex and often longer. I also expect the broadsheet to have a higher reading age. I view broadsheets as newspapers for more intelligent readers and for people looking for in-depth reading, whereas I think that tabloid readers will be less advanced readers who want a lighter read. It will be interesting to see how accurate my prediction is.
I will be looking at:
* Average Word Length
* Average Sentence Length
* Reading age
There are many different newspapers, ranging from tabloids to broadsheets. Tabloids are a lighter read than the more involved, descriptive broadsheets, and different newspapers are written to suit these different preferences.
In tabloid papers the wording is simpler and therefore more easily understood, whereas in broadsheet newspapers the writing is more complicated and harder to read.
Analysis 1
Investigation into the word lengths of two different samples of writing from two different types of newspaper.
Hypothesis: My hypothesis is that a sample of two hundred words from a broadsheet newspaper will contain more long words than a sample of two hundred words from a tabloid newspaper. I also think that the most popular word length will be four letters in both papers.
Method: I will select two pieces of writing of two hundred words each from two different newspapers, one a tabloid and the other a broadsheet. The topic of both samples will be the same, e.g. the Olympics, so that the two pieces of writing are comparable. Two hundred words from each article is a big enough sample to give meaningful results, yet small enough to be easy to gather and sort. I shall ignore proper nouns, e.g. names of people and places, and also numbers. I shall leave these out as they are not everyday words and could introduce bias into my results, e.g. in one article the name of someone may be 'Tim Smith', and in another article it may be 'Alfredo Schevschenko'. As one name is much longer than the other, this would skew the results.
Handling the data: I shall first record how many words there are of each word length in a tally-table. I will then work out the median, the mode and the range. After this I will plot my results as frequency polygons so that trends are easy to see. I will then compare the results of the tabloid investigation with the results of the broadsheet investigation. Finally, I will calculate the mean and analyse the data further using box plots, the variance and the standard deviation.
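The tallying step can also be sketched in code. This is only a minimal Python illustration, not the method used in this coursework, where the counting was done by hand. The function name `word_length_tally`, the `excluded` set of proper nouns and the example sentence are all assumptions invented for the sketch.

```python
import re
from collections import Counter


def word_length_tally(text, excluded=frozenset()):
    """Count how many words there are of each length.

    `excluded` is a hand-made set of proper nouns to skip (an assumption:
    the script cannot recognise names by itself).
    """
    # Keep runs of letters, allowing an apostrophe inside a word, so that
    # numbers such as "2" are ignored.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text)
    kept = [w for w in words if w not in excluded]
    # Count letters only, so "didn't" counts as five letters.
    return Counter(len(w.replace("'", "")) for w in kept)


sample = "Tim Smith won 2 gold medals and didn't stop smiling"
tally = word_length_tally(sample, excluded={"Tim", "Smith"})
for length in sorted(tally):
    # One tally mark per word, followed by the frequency.
    print(length, "I" * tally[length], tally[length])
```

Each printed row has the same three columns as the tally-table: word length, tallies and frequency.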
Results: Word length in a broadsheet newspaper.
Tally-table to record raw data.
Word Length | Tallies | Frequency
1 | IIIII I | 6
2 | IIIII IIIII IIIII IIIII IIIII IIIII | 30
3 | IIIII IIIII IIIII IIIII IIIII IIIII IIIII IIII | 39
4 | IIIII IIIII IIIII IIIII IIIII IIIII IIIII I | 36
5 | IIIII IIIII IIIII IIIII III | 23
6 | IIIII IIIII II | 12
7 | IIIII IIIII IIIII IIIII IIIII II | 27
8 | IIIII IIIII II | 12
9 | III | 3
10 | I | 1
11 | IIIII III | 8
12 | II | 2
13 |  | 0
14 | I | 1
15 |  | 0
Totals | 200 | 200
Median - the median word length is four letters long.
Mode - the modal word length is three letters.
Range - the range is thirteen. One to fourteen (1 to 14). 14 - 1= 13
Mean -
Formula: Mean = Total no. of letters ÷ Total no. of words
Mean = 962 ÷ 200 = 4.81
Mean word length in the broadsheet sample was 4.81 letters long.
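The averages above can be checked with a short Python sketch. It is an illustration only: the dictionary name `broadsheet` and the other variable names are assumptions, and the frequencies are the ones recorded in the broadsheet tally-table for word lengths 1 to 15.

```python
# Broadsheet frequencies from the tally-table: word length -> number of words.
broadsheet = {1: 6, 2: 30, 3: 39, 4: 36, 5: 23, 6: 12, 7: 27, 8: 12,
              9: 3, 10: 1, 11: 8, 12: 2, 13: 0, 14: 1, 15: 0}

total_words = sum(broadsheet.values())
total_letters = sum(length * freq for length, freq in broadsheet.items())
mean = total_letters / total_words

# The mode is the word length with the highest frequency.
mode = max(broadsheet, key=broadsheet.get)

# The range only uses word lengths that actually occur in the sample.
used = [length for length, freq in broadsheet.items() if freq > 0]
value_range = max(used) - min(used)

print(total_words, total_letters, mean, mode, value_range)
```

The totals of 200 words and 962 letters, the mode of three letters and the range of thirteen all agree with the values worked out by hand.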
Quartiles and interquartile range -
Finding the median and upper and lower quartiles.
1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,3,3,3,LQ,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,M,4,4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,6,6,6,6,6,6,6,6,6,6,6,6,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,8,8,8,8,8,8,8,UQ,8,8,8,8,8,9,9,9,10,11,11,11,11,11,11,11,11,12,12,14.
LQ - Lower quartile LQ = 3
M - Median M = 4
UQ - Upper quartile UQ = 8
Interquartile range -
Upper Quartile - Lower Quartile = Interquartile Range
8 - 3 = 5
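Reading the quartiles off the running totals can also be sketched in Python. This is a minimal illustration under one common convention, taking the values at positions n/4, n/2 and 3n/4 in the sorted data. The helper name `value_at_position` is an assumption, and the frequencies are those from the broadsheet tally-table. Under this convention the sketch gives 3, 4 and 7, so its upper quartile is one letter lower than the value marked in the listing, which suggests that the position of the UQ marker is worth rechecking.

```python
def value_at_position(freqs, position):
    """Return the value found at a 1-based position in the sorted data."""
    running = 0
    for value in sorted(freqs):
        running += freqs[value]  # cumulative frequency so far
        if running >= position:
            return value
    raise ValueError("position is beyond the end of the data")


# Broadsheet frequencies from the tally-table: word length -> number of words.
broadsheet = {1: 6, 2: 30, 3: 39, 4: 36, 5: 23, 6: 12, 7: 27, 8: 12,
              9: 3, 10: 1, 11: 8, 12: 2, 13: 0, 14: 1, 15: 0}

n = sum(broadsheet.values())
lq = value_at_position(broadsheet, n / 4)
median = value_at_position(broadsheet, n / 2)
uq = value_at_position(broadsheet, 3 * n / 4)

print(lq, median, uq, uq - lq)
```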
Box Plot - Box plot to show how widely values are spread around the median.
Standard deviation -
The standard deviation is a measure of how widely values are dispersed from the average value (the mean).
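Letters per word (x) | Frequency (f) | x² | fx | fx²
1 | 6 | 1 | 6 | 6
2 | 30 | 4 | 60 | 120
3 | 39 | 9 | 117 | 351
4 | 36 | 16 | 144 | 576
5 | 23 | 25 | 115 | 575
6 | 12 | 36 | 72 | 432
7 | 27 | 49 | 189 | 1323
8 | 12 | 64 | 96 | 768
9 | 3 | 81 | 27 | 243
10 | 1 | 100 | 10 | 100
11 | 8 | 121 | 88 | 968
12 | 2 | 144 | 24 | 288
13 | 0 | 169 | 0 | 0
14 | 1 | 196 | 14 | 196
15 | 0 | 225 | 0 | 0
Totals | 200 |  | 962 | 5946
Average word length = Σfx ÷ Σf = 962 ÷ 200 = 4.81
A = Σfx² ÷ Σf = 5946 ÷ 200 = 29.73
B = (Average word length)² = 4.81² = 23.1361
Variance = A - B = 29.73 - 23.1361 = 6.5939
Standard Deviation = √Variance = √6.5939 = 2.56785903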
Word length in a tabloid newspaper.
Tally-table to record raw data.
Word Length | Tallies | Frequency
1 | IIIII | 5
2 | IIIII IIIII IIIII IIIII IIIII IIIII IIII | 34
3 | IIIII IIIII IIIII IIIII IIIII IIIII IIIII IIIII I | 41
4 | IIIII IIIII IIIII IIIII IIIII IIIII IIIII IIII | 39
5 | IIIII IIIII IIIII IIIII I | 21
6 | IIIII IIIII IIIII IIIII | 20
7 | IIIII IIIII IIIII IIIII | 20
8 | IIIII III | 8
9 | III | 3
10 | III | 3
11 | II | 2
12 | II | 2
13 | I | 1
14 |  | 0
15 | I | 1
Totals | 200 | 200
Median - the median word length is four letters long.
Mode - the modal word length is three letters.
Range - the range is fourteen. One to fifteen (1 to 15). 15 - 1= 14
Mean -
Formula: Mean = Total no. of letters ÷ Total no. of words
Mean = 912 ÷ 200 = 4.56
Mean word length in the tabloid sample was 4.56 letters long.
Quartiles and interquartile range -
Finding the median and upper and lower quartiles.
1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,LQ,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,M,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,UQ,7,7,7,7,7,8,8,8,8,8,8,8,8,9,9,9,10,10,10,11,11,12,12,13,15.
LQ - Lower quartile LQ = 3
M - Median M = 4
UQ - Upper quartile UQ = 7
Interquartile range -
Upper Quartile - Lower Quartile = Interquartile Range
7 - 3 = 4
Box Plot - Box plot to show how widely values are spread around the median.
Standard deviation -
The standard deviation is a measure of how widely values are dispersed from the average value (the mean).
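Letters per word (x) | Frequency (f) | x² | fx | fx²
1 | 5 | 1 | 5 | 5
2 | 34 | 4 | 68 | 136
3 | 41 | 9 | 123 | 369
4 | 39 | 16 | 156 | 624
5 | 21 | 25 | 105 | 525
6 | 20 | 36 | 120 | 720
7 | 20 | 49 | 140 | 980
8 | 8 | 64 | 64 | 512
9 | 3 | 81 | 27 | 243
10 | 3 | 100 | 30 | 300
11 | 2 | 121 | 22 | 242
12 | 2 | 144 | 24 | 288
13 | 1 | 169 | 13 | 169
14 | 0 | 196 | 0 | 0
15 | 1 | 225 | 15 | 225
Totals | 200 |  | 912 | 5338
Average word length = Σfx ÷ Σf = 912 ÷ 200 = 4.56
A = Σfx² ÷ Σf = 5338 ÷ 200 = 26.69
B = (Average word length)² = 4.56² = 20.7936
Variance = A - B = 26.69 - 20.7936 = 5.8964
Standard Deviation = √Variance = √5.8964 = 2.428250399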
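The standard deviation working for both samples can be reproduced with a short Python sketch. It follows the same route as the tables, with A as the mean of the squares and B as the square of the mean. The function name `sd_from_table` and the dictionary names are assumptions made for the illustration, and the frequencies are those from the two tally-tables.

```python
import math


def sd_from_table(freqs):
    """Return (mean, variance, standard deviation) for a frequency table."""
    n = sum(freqs.values())                           # total frequency
    sum_fx = sum(x * f for x, f in freqs.items())     # total of fx
    sum_fx2 = sum(x * x * f for x, f in freqs.items())  # total of fx squared
    a = sum_fx2 / n           # A: mean of the squares
    b = (sum_fx / n) ** 2     # B: square of the mean
    variance = a - b
    return sum_fx / n, variance, math.sqrt(variance)


broadsheet = {1: 6, 2: 30, 3: 39, 4: 36, 5: 23, 6: 12, 7: 27, 8: 12,
              9: 3, 10: 1, 11: 8, 12: 2, 13: 0, 14: 1, 15: 0}
tabloid = {1: 5, 2: 34, 3: 41, 4: 39, 5: 21, 6: 20, 7: 20, 8: 8,
           9: 3, 10: 3, 11: 2, 12: 2, 13: 1, 14: 0, 15: 1}

b_mean, b_var, b_sd = sd_from_table(broadsheet)
t_mean, t_var, t_sd = sd_from_table(tabloid)
print("broadsheet", round(b_mean, 2), round(b_var, 4), round(b_sd, 4))
print("tabloid", round(t_mean, 2), round(t_var, 4), round(t_sd, 4))
```

Putting both samples through the same function makes the comparison direct: the broadsheet standard deviation comes out larger than the tabloid one, as in the hand calculation.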
Results Explanation:
My results show that there is not a huge difference between word length in a tabloid and a broadsheet newspaper, but the broadsheet's word lengths fluctuate more, as you can see in Fig. 1. Both frequency polygons follow the same pattern: a steep rise to a peak at three-letter words, followed by a general decrease in frequency as the words get longer. In both types of newspaper the most commonly used words are three letters long, closely followed by four-letter and two-letter words. In the tabloid newspaper the frequency of words declines as the words get longer, whereas in the broadsheet newspaper there are further peaks at seven-letter and eleven-letter words. The box plots show that the tabloid sample has a narrower spread of word lengths than the broadsheet sample. The broadsheet has a greater number of long words, suggesting that it uses more complex language. The standard deviation is greater in the broadsheet (2.57 compared with 2.43), meaning that its word lengths are more widely dispersed from the mean (average value).
Conclusion:
My hypothesis was supported by the research I undertook: the sample of two hundred words from the broadsheet newspaper contained more long words than the sample of two hundred words from the tabloid newspaper. However, I predicted that the most popular word length would be four letters, when it was actually three letters.
I am aware that there are a number of weaknesses in the data collected and analysed: i.) I chose two hundred words. Whilst this was enough to work on, I would need a much larger sample to give me confidence about whether my hypothesis was true.
ii.) I have chosen an article by one reporter from each newspaper. It is possible that if I had chosen another article by a different writer I would have had very different results, therefore I should sample more articles.
iii.) I have chosen a paper from one day. If I had more time I would have chosen a number of different reporters, different days and a range of items such as sports reports, editorials and general news items.
Analysis 2
Investigation into sentence lengths from two different samples from two different types of newspaper.
Hypothesis: I predict that the most popular sentence length will be sixteen to twenty words. I also think that, from the sample of twenty sentences per newspaper, the broadsheet will have more long sentences and will contain the longest sentence.
Method: I will select two pieces of writing of twenty sentences each from two different newspapers, one a tabloid and the other a broadsheet. The topic of both samples will be the same, e.g. the Sudan Crisis, so that the two pieces of writing are comparable. Twenty sentences from each article is a big enough sample to give meaningful results, yet small enough to be easy to gather and sort. I will allow names and numbers, as similar sorts of information are being used in both articles.
Handling the data: I will first choose a piece of writing from each paper consisting of twenty sentences. I will then count the number of words per sentence and put the information into a grouped frequency table, as some sentences will be very long and some will be very short. In the frequency table I will calculate the mid-point and the cumulative frequency. I will then draw a cumulative frequency curve and use it to find the median and the interquartile range. I have chosen these measures because there may be some very long sentences, and if I used only the mean they would distort the result.
Results-
Results of investigation into sentence length in a broadsheet newspaper.
Frequency table to record raw data.
No. of words per sentence | Tallies | Frequency | Cumulative Frequency | Mid-point
1 - 5 |  | 0 | 0 | 2.5
6 - 10 |  | 0 | 0 | 7.5
11 - 15 | I | 1 | 1 | 12.5
16 - 20 | IIIII II | 7 | 8 | 17.5
21 - 25 | III | 3 | 11 | 22.5
26 - 30 | IIII | 4 | 15 | 27.5
31 - 35 | II | 2 | 17 | 32.5
36 - 40 | II | 2 | 19 | 37.5
41 - 45 | I | 1 | 20 | 42.5
46 + |  | 0 | 20 | NA
Totals | 20 | 20 | 20 |
Median - Group 21 - 25
Upper Quartile - Group 26 - 30
Lower Quartile - Group 16 - 20
Mean - Group 21 - 25
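The cumulative frequency column, and the groups that contain the quartiles, can be reproduced with a small Python sketch. It is an illustration only: the helper name `group_at` is an assumption, the quartile positions are taken as n/4, n/2 and 3n/4, and the frequencies are those in the broadsheet grouped frequency table.

```python
from itertools import accumulate

groups = ["1 - 5", "6 - 10", "11 - 15", "16 - 20", "21 - 25",
          "26 - 30", "31 - 35", "36 - 40", "41 - 45", "46 +"]
frequency = [0, 0, 1, 7, 3, 4, 2, 2, 1, 0]

# Running total of the frequencies, one entry per group.
cumulative = list(accumulate(frequency))


def group_at(position):
    """Return the first group whose cumulative frequency reaches position."""
    for group, running in zip(groups, cumulative):
        if running >= position:
            return group
    raise ValueError("position is beyond the end of the data")


n = cumulative[-1]
print(cumulative)
print("LQ:", group_at(n / 4))
print("Median:", group_at(n / 2))
print("UQ:", group_at(3 * n / 4))
```

A cumulative frequency curve gives the same groups when read across at the 5th, 10th and 15th sentences.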
Finding the mean -
Sentence Length.
Words per sentence: 12, 16, 17, 16, 20, 19, 16, 20, 21, 22, 24, 30, 26, 26, 28, 34, 33, 42
Mean = Total no. of words ÷ Total no. of sentences = 477 ÷ 20 = 23.85
Results-
Results of investigation into sentence length in a tabloid newspaper.
Frequency table to record raw data.
No. of words per sentence | Tallies | Frequency | Cumulative Frequency | Mid-point
1 - 5 | I | 1 | 1 | 2.5
6 - 10 | III | 3 | 4 | 7.5
11 - 15 | I | 1 | 5 | 12.5
16 - 20 | IIIII | 5 | 10 | 17.5
21 - 25 | IIIII I | 6 | 16 | 22.5
26 - 30 | III | 3 | 19 | 27.5
31 - 35 | I | 1 | 20 | 32.5
36 - 40 |  | 0 | 20 | 37.5
41 - 45 |  | 0 | 20 | 42.5
46 + |  | 0 | 20 | NA
Totals | 20 | 20 | 20 |
Median - Group 16 - 20
Upper Quartile - Group 21 - 25
Lower Quartile - Group 11 - 15
Mean - Group 21 - 25
Finding the mean -
Sentence Length.
Words per sentence: 12, 16, 17, 16, 20, 19, 16, 20, 21, 22, 24, 30, 26, 26, 28, 34, 33, 42
Mean = Total no. of words ÷ Total no. of sentences = 473 ÷ 20 = 23.65
Results Explanation:
My results show that there is a difference between sentence length in a tabloid and a broadsheet newspaper. There is a pattern in the two cumulative frequency curves: the broadsheet's lower quartile, median and upper quartile each fall one group higher than the tabloid's. The tabloid paper had a median in the group 16 - 20, whereas the broadsheet's median was in the group 21 - 25, showing that the broadsheet contains more long sentences. This suggests that the broadsheet uses more complex language. The standard deviation is greater in the tabloid sample, meaning that its sentence lengths are more widely dispersed from the mean.
Conclusion:
My hypothesis was supported by the research I undertook, apart from my prediction that the most popular sentence length would be in the group of 16 - 20 words: this was true for the broadsheet sample, but in the tabloid sample the most common group was 21 - 25 words.
I am aware that there are a number of weaknesses in the data collected and analysed: i.) I chose twenty sentences. Whilst this was enough to work on, I would need a much larger sample to give me confidence about whether my hypothesis was true.
ii.) I have chosen an article by one reporter from each newspaper. It is possible that if I had chosen another article by a different writer I would have had very different results, therefore I should sample more articles.
iii.) I have chosen a paper from one day. If I had more time I would have chosen a number of different reporters, different days and a range of items such as sports reports, editorials and general news items.
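Standard deviation - broadsheet sample
Words per sentence (x) | Frequency in sentence (f) | x² | fx | fx²
12 | 1 | 144 | 12 | 144
13 | 0 | 169 | 0 | 0
14 | 0 | 196 | 0 | 0
15 | 0 | 225 | 0 | 0
16 | 3 | 256 | 48 | 768
17 | 1 | 289 | 17 | 289
18 | 0 | 324 | 0 | 0
19 | 1 | 361 | 19 | 361
20 | 2 | 400 | 40 | 800
21 | 1 | 441 | 21 | 441
22 | 1 | 484 | 22 | 484
23 | 0 | 529 | 0 | 0
24 | 1 | 576 | 24 | 576
25 | 0 | 625 | 0 | 0
26 | 2 | 676 | 52 | 1352
27 | 1 | 729 | 27 | 729
28 | 2 | 784 | 56 | 1568
29 | 0 | 841 | 0 | 0
30 | 1 | 900 | 30 | 900
31 | 0 | 961 | 0 | 0
32 | 0 | 1024 | 0 | 0
33 | 1 | 1089 | 33 | 1089
34 | 1 | 1156 | 34 | 1156
35 | 0 | 1225 | 0 | 0
36 | 0 | 1296 | 0 | 0
37 | 0 | 1369 | 0 | 0
38 | 0 | 1444 | 0 | 0
39 | 0 | 1521 | 0 | 0
40 | 0 | 1600 | 0 | 0
41 | 0 | 1681 | 0 | 0
42 | 1 | 1764 | 42 | 1764
Totals | 20 |  | 477 | 12421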
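Standard deviation - tabloid sample
Words per sentence (x) | Frequency in sentence (f) | x² | fx | fx²
12 | 1 | 144 | 12 | 144
13 | 0 | 169 | 0 | 0
14 | 0 | 196 | 0 | 0
15 | 0 | 225 | 0 | 0
16 | 3 | 256 | 48 | 768
17 | 1 | 289 | 17 | 289
18 | 0 | 324 | 0 | 0
19 | 2 | 361 | 38 | 722
20 | 2 | 400 | 40 | 800
21 | 1 | 441 | 21 | 441
22 | 1 | 484 | 22 | 484
23 | 0 | 529 | 0 | 0
24 | 1 | 576 | 24 | 576
25 | 0 | 625 | 0 | 0
26 | 2 | 676 | 52 | 1352
27 | 0 | 729 | 0 | 0
28 | 1 | 784 | 28 | 784
29 | 0 | 841 | 0 | 0
30 | 1 | 900 | 30 | 900
31 | 0 | 961 | 0 | 0
32 | 1 | 1024 | 32 | 1024
33 | 1 | 1089 | 33 | 1089
34 | 1 | 1156 | 34 | 1156
35 | 0 | 1225 | 0 | 0
36 | 0 | 1296 | 0 | 0
37 | 0 | 1369 | 0 | 0
38 | 0 | 1444 | 0 | 0
39 | 0 | 1521 | 0 | 0
40 | 0 | 1600 | 0 | 0
41 | 0 | 1681 | 0 | 0
42 | 1 | 1764 | 42 | 1764
Totals | 20 |  | 473 | 12293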
Table to show summary of broadsheet and tabloid sample analysis.
Analysis | Broadsheet | Tabloid
Median sentence length (no. of words) | Group 21 - 25 | Group 16 - 20
Mode sentence length (no. of words) | 16 | 16
Range | 42 - 12 = 30 | 42 - 12 = 30
Mean sentence length (no. of words) | Group 21 - 25 / 23.85 | Group 21 - 25 / 23.65
Standard Deviation | 7.23 (2dp) | 7.44 (2dp)
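The ungrouped figures in the summary can be checked with Python's `statistics` module. This is a sketch under stated assumptions: the two lists are the sentence lengths expanded from the frequency columns of the two standard deviation tables, and which list belongs to which paper is inferred from the totals (477 and 473) matching the two means. `pstdev` is used because the working above divides by the number of sentences, n, and not by n - 1.

```python
from statistics import mean, mode, pstdev

# Sentence lengths expanded from the standard deviation tables (assumed mapping).
broadsheet = [12, 16, 16, 16, 17, 19, 20, 20, 21, 22,
              24, 26, 26, 27, 28, 28, 30, 33, 34, 42]
tabloid = [12, 16, 16, 16, 17, 19, 19, 20, 20, 21,
           22, 24, 26, 26, 28, 30, 32, 33, 34, 42]

summary = {}
for paper, lengths in (("broadsheet", broadsheet), ("tabloid", tabloid)):
    summary[paper] = {
        "mode": mode(lengths),
        "range": max(lengths) - min(lengths),
        "mean": mean(lengths),
        "sd": round(pstdev(lengths), 2),  # population standard deviation, 2dp
    }
    print(paper, summary[paper])
```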
Analysis 3
Investigation into the reading age of two different types of newspapers using the 'Fry Readability Test'.
Hypothesis: My hypothesis is that the broadsheet newspaper will have the higher reading age, which I think will be 17, and that the tabloid's reading age will be 14.
Method: I will first select two samples of 100 words from each paper, the broadsheet and the tabloid. From each paper I will choose a sporting article of 100 words and a political article of 100 words. I will then find the average number of syllables per 100 word sample. After this I will find the average number of sentences per 100 word sample. Then I will use the Fry graph to determine the reading age.
Handling the data: After I have collected the data I will find the mean number of sentences and syllables per 100 words. After this I will plot the points on the graph.
Gathering the data - Broadsheet
100 word samples. 2 samples. Excluding names.
Sample 1 -
5 sentences
144 syllables
Sample 2 -
3.3 sentences
175 syllables
Mean number of sentences per 100 words = 4.2 sentences
Mean number of syllables per 100 words = 160 syllables
Gathering the data - Tabloid
100 word samples. 2 samples. Excluding names.
Sample 1 -
5.9 sentences
143 syllables
Sample 2 -
4 sentences
153 syllables
Mean number of sentences per 100 words = 5.0 sentences
Mean number of syllables per 100 words = 148 syllables
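The averaging step before plotting on the Fry graph can be sketched in Python. This is an illustration only: the dictionary layout and names are assumptions, the counts are the sample values recorded above, and the Fry graph itself is not reproduced, so the reading age still has to be read off the printed graph.

```python
# Counts per 100 word sample, as recorded above (two samples per paper).
samples = {
    "broadsheet": {"sentences": [5, 3.3], "syllables": [144, 175]},
    "tabloid": {"sentences": [5.9, 4], "syllables": [143, 153]},
}

averages = {}
for paper, counts in samples.items():
    # Mean of the two samples for each measure.
    mean_sentences = sum(counts["sentences"]) / len(counts["sentences"])
    mean_syllables = sum(counts["syllables"]) / len(counts["syllables"])
    averages[paper] = (mean_sentences, mean_syllables)
    print(paper, mean_sentences, mean_syllables)
```

The unrounded means are about 4.15 sentences and 159.5 syllables for the broadsheet, and about 4.95 sentences and 148 syllables for the tabloid. The figures quoted above are these values rounded.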
Results Explanation:
My results show that the broadsheet paper has an estimated reading age of 17 years and the tabloid has an estimated reading age of 14 years. Both of the papers plotted above the curve on the Fry graph, which indicates that they both use relatively difficult vocabulary.
Conclusion:
My hypothesis was supported by the research I undertook.
I chose the 'Fry Readability Test' because it was a quick and efficient way to find approximate reading ages from different pieces of writing. Once again, I recognise that taking more samples with more variation, e.g. different writers, days and subjects, would give me much more reliable results.
Overall Conclusion:
Overall I have enjoyed my research. I found it very interesting to compare two newspapers, one a broadsheet and the other a tabloid, to see if my hypotheses were correct. Most of my hypotheses were correct, and I was surprised that my hypothesis for analysis three was true, as I have never done any research on reading age before. In fact, the results for the two papers were very similar in all three analyses. In any future research I might count how many sentences there are in comparable articles, to see if different papers restrict the information given to the reader. I would also like to do research on many more tabloid papers and many more broadsheet papers, from many different days and about many different subjects.