From this data I can now produce three frequency polygons: one for broadsheets, one for tabloids and one comparing the two.
[Graph 1.1]
[Graph 1.2]
[Graph 1.3]
Median
The median word lengths for the different sections of the broadsheet and tabloid newspapers are as follows:-
Tabloids Front Page
111111111111112222222222222222222222222333333333333333333333333333333333333333344444444444444444444(4)44444
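The number in brackets is the median for the front page of the tabloid newspapers.
200 words were counted in total (100 for The Mirror and 100 for The Daily Mail).
14 one-letter words.
25 two-letter words.
49 three-letter words.
33 four-letter words.
With 200 words the median lies between the 100th and 101st numbers. The first 88 numbers (14 + 25 + 49) are 3 or less and the next 33 are all 4, so the 100th and 101st numbers are both 4.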
The median for the front page on tabloids is 4.
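As a check on the counting above, the position of the median can also be found by adding up the frequencies in order. The following is only a minimal Python sketch using the four frequencies quoted above; the function name value_at_position is my own choice for illustration.

```python
def value_at_position(frequencies, position):
    """Return the word length found at a 1-based position in the ordered data."""
    running_total = 0
    for length in sorted(frequencies):
        running_total += frequencies[length]
        if running_total >= position:
            return length
    raise ValueError("position is beyond the data supplied")


# Frequencies for the first four word lengths, as listed above.
tabloid_front_page = {1: 14, 2: 25, 3: 49, 4: 33}

# With 200 words the median lies between the 100th and 101st values.
middle_values = [value_at_position(tabloid_front_page, p) for p in (100, 101)]
median = sum(middle_values) / 2
print(median)
```

Only the first four frequencies are needed here because the running total passes 101 before the four-letter words run out.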
Tabloid Sport
The median is 4
Tabloid Financial
The median is 4
Broadsheet Front Page
The median is 4
Broadsheet Sport
The median is 4
Broadsheet Financial
The median is 5
Mode
The modal number is the most common number of letters per word. The values below are for each section of the two different types of newspaper.
Tabloid Newspapers:-
Front Page
The modal number is 3
Sport
The modal number is 3
Financial
The modal number is 3
Broadsheet Newspapers:-
Front Page
The modal number is 3
Sport
The modal number is 3
Financial
The modal number is 4
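The modal number can be read straight from a frequency table by picking the word length with the largest frequency. This is a small Python sketch of that idea, assuming the Broadsheet Financial frequencies used in the mean calculation for that section; modal_length is a name I have made up for the example.

```python
# Frequencies (word length -> count) for the Broadsheet Financial section,
# copied from the mean calculation for that section.
broadsheet_financial = {
    1: 3, 2: 17, 3: 34, 4: 36, 5: 25, 6: 24, 7: 20, 8: 11,
    9: 15, 10: 3, 11: 3, 12: 3, 13: 1, 14: 3, 15: 2,
}


def modal_length(frequencies):
    """Return the word length with the highest frequency."""
    return max(frequencies, key=frequencies.get)


print(modal_length(broadsheet_financial))
```

If two word lengths shared the highest frequency this sketch would return only the first one it meets, so a tie would need to be checked by eye.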
Mean
Mean = Σfx / Σf
Using this formula I am able to calculate the mean, where f is the frequency and x is the number of letters in the word.
Tabloid Front Page
Mean = [(1x14)+(2x25)+(3x59)+(4x33)+(5x20)+(6x14)+(7x15)+(8x15)+(9x4)+(10x3)+(11x4)+(12x4)] / [14 + 25 + 59 + 33 + 20 + 14 + 15 + 15 + 4 + 3 + 4 + 4]
= 940 / 200
= 4.7
Tabloid Sports
Mean = [(1x5)+(2x25)+(3x47)+(4x44)+(5x22)+(6x16)+(7x12)+(8x12)+(9x8)+(10x4)+(11x4)+(13x1)] / [5 + 25 + 47 + 44 + 22 + 16 + 12 + 12 + 8 + 4 + 4 + 1]
= 927 / 200
= 4.635
Tabloid Financial
Mean = [(1x5)+(2x29)+(3x41)+(4x40)+(5x23)+(6x17)+(7x17)+(8x8)+(9x12)+(10x3)+(11x1)+(12x1)+(13x2)+(16x1)+(18x1)] / [5 + 29 + 41 + 40 + 23 + 17 + 17 + 8 + 12 + 3 + 1 + 1 + 2 + 1 + 1]
= 967 / 200
= 4.835
Broadsheet Front Page
Mean = [(1x13)+(2x34)+(3x35)+(4x28)+(5x16)+(6x18)+(7x14)+(8x13)+(9x10)+(10x4)+(11x8)+(12x2)+(13x1)] / [13 + 34 + 35 + 28 + 16 + 18 + 14 + 13 + 10 + 4 + 8 + 2 + 1]
= 943 / 200
= 4.715
Broadsheet Sport
Mean = [(1x6)+(2x24)+(3x40)+(4x32)+(5x24)+(6x16)+(7x14)+(8x12)+(9x15)+(10x4)+(11x5)+(12x5)+(13x2)+(14x1)] / [6 + 24 + 40 + 32 + 24 + 16 + 14 + 12 + 15 + 4 + 5 + 5 + 5 + 2 + 1]
= 1090 / 200
= 5.45
Broadsheet Financial
Mean = [(1x3)+(2x17)+(3x34)+(4x36)+(5x25)+(6x24)+(7x20)+(8x11)+(9x15)+(10x3)+(11x3)+(12x3)+(13x1)+(14x3)+(15x2)] / [3 + 17 + 34 + 36 + 25 + 24 + 20 + 11 + 15 + 3 + 3 + 3 + 1 + 3 + 2]
= 1099 / 200
= 5.495
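The mean calculations above all follow the same Σfx / Σf pattern, so they can be sketched as one short Python function. This is only an illustration using the Tabloid Sports and Broadsheet Financial frequencies from above; the function name mean_word_length is my own.

```python
def mean_word_length(frequencies):
    """Mean of a frequency table given as {word length: frequency}."""
    total_letters = sum(length * count for length, count in frequencies.items())  # Σfx
    total_words = sum(frequencies.values())  # Σf
    return total_letters / total_words


tabloid_sport = {
    1: 5, 2: 25, 3: 47, 4: 44, 5: 22, 6: 16,
    7: 12, 8: 12, 9: 8, 10: 4, 11: 4, 13: 1,
}
broadsheet_financial = {
    1: 3, 2: 17, 3: 34, 4: 36, 5: 25, 6: 24, 7: 20, 8: 11,
    9: 15, 10: 3, 11: 3, 12: 3, 13: 1, 14: 3, 15: 2,
}

for name, table in (("Tabloid Sports", tabloid_sport),
                    ("Broadsheet Financial", broadsheet_financial)):
    print(name, round(mean_word_length(table), 3))
```

Keeping the table as word length against frequency means the same function works for any section without retyping the sums.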
This table shows the median, mode and mean for each section of the tabloid and broadsheet newspapers.
Analysis of Graphs
From the grouped frequency table I produced two frequency polygons, one for the tabloid data and one for the broadsheet data. I made the following observations:-
- Both polygons start with an almost equally steep ascent up to a word length of 3. This is the most dramatic change in frequency on either graph for an increase of only 1 letter in word length. Both the tabloid and the broadsheet have a high frequency of 2 letter words due to the modern English language used in the newspapers. The same applies to the couple of one letter words in the English language.
- Both polygons reach their maximum peak when the word length is 3. This shows that the most frequent words in both types of newspaper were 3 letters long.
- After the word length reaches 5, the two graphs follow slightly different paths. The broadsheet polygon rises gradually, then takes a less steep descent to a word length of 8 before rising slightly up to a word length of 9 and then descending steeply to 10. Again it rises slightly, and after one more peak it gradually descends to 0, where it levels off. The tabloid polygon descends steeply until it reaches words of 10 letters. It then gradually decreases to 0 before rising again to a frequency of 1 at 16 letters. After one more dip back down to 0 it ascends again, finishing at 18 letters with a frequency of 1.
- The lines first cross on the first descent after they have both peaked at 3. This is because the tabloid line drops steeply from 65 to 47, whereas the broadsheet line decreases gradually from 65 to 58. At this point the two cross over, indicating that the broadsheet has a higher frequency of 6 letter words than the tabloids do.
- The overall shapes of the two polygons are basically the same, but the peaks and troughs are more exaggerated on the tabloid than on the broadsheet. This suggests that the tabloid has a higher number of shorter words than the broadsheet, and a lower number of longer words.
Conclusion
My original hypothesis was that broadsheet newspapers have a longer average word length than tabloid newspapers. From the data I have collected, the graphs and the mean, mode and median all indicate that this is true: broadsheet newspapers do have a longer average word length than tabloid newspapers.
The mean word length for the tabloid newspapers was 4.72 and the mean word length for the broadsheets was 5.22. These two figures are still very similar (a difference of only 0.5 of a letter), which seems surprising, but my frequency polygons explain why. Both types of newspaper had a similarly high frequency of words with 3 or fewer letters, and they had quite similar results for words with 5, 8 and 10 letters. These are the places on the graphs where the lines are closest together. I also found that both types of newspaper contain a high percentage of short words, as I predicted, but that tabloids still have a slightly higher frequency of words under 4 letters long. I was incorrect about the longest word in my investigation coming from a broadsheet newspaper, as I recorded an 18 letter word from The Daily Mail, a tabloid newspaper. All in all, the investigation I performed produced evidence that strongly supports my hypothesis; however, it does not prove it.
If I were to repeat this investigation with more time, I would make the following changes:-
- I would test a greater number of words, simply because the more unbiased evidence I can obtain, the more reliable my results will be. I would test as many words as possible from each newspaper, giving more evidence to back up my hypothesis.
- I would test a greater number of newspapers. To increase the accuracy of my results I would use a wider selection of newspapers, including different broadsheets and tabloids. In my investigation I only used two tabloids and two broadsheets, which might not be representative of their whole category. If I used a variety of tabloids and broadsheets, e.g. The Sun, The Daily Express, The Daily Telegraph and The Financial Times, then I would have a more reliable source and my results would be fairer and less biased.
To further develop my investigation I could use stratified random sampling. In my investigation I chose to use a systematic random sample because it eliminated more bias than a simple random sample. However, I could use a stratified random sample to overcome the possible bias of random samples and improve the accuracy of my investigation. In this sampling method the composition of the newspaper is taken into account. For example, if 35% of the whole newspaper consisted of news articles, then 35% of the total word sample would be words from the news pages. This method ensures that the sample is representative of the whole newspaper, which a purely random selection does not guarantee.
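The way a stratified sample is shared out can be sketched in a few lines of Python. This is only an illustration: apart from the 35% figure for news used in the example above, the section names and percentages are hypothetical, and stratified_allocation is a name I have invented.

```python
def stratified_allocation(section_percentages, sample_size):
    """Share a word sample between sections in proportion to their size."""
    return {
        section: round(percent * sample_size / 100)
        for section, percent in section_percentages.items()
    }


# Hypothetical make-up of one newspaper (percent of the whole paper).
# Only the 35% news figure comes from the example in the text.
newspaper_sections = {"news": 35, "sport": 25, "financial": 15, "other": 25}

allocation = stratified_allocation(newspaper_sections, 200)
print(allocation)
```

With percentages that do not divide the sample exactly, rounding could leave the total one or two words away from the intended sample size, so the total should be checked afterwards.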
Investigation 2
Hypothesis 2
“Broadsheet newspapers have a higher reading age than tabloid newspapers.”
I expect broadsheet newspapers to have a higher reading age than tabloid newspapers because broadsheets use a more complex style of writing. I think that tabloids use shorter sentences and fewer complex words than broadsheets, which would significantly lower their reading age. There are many different methods for calculating reading age, but most use the number of sentences and syllable counts to determine the difficulty of the language. A higher frequency of short sentences in an article lowers the reading age because shorter sentences are easier to understand. Longer, more complex sentences mean the reader needs greater experience of English to comprehend the meaning. In general, I believe that broadsheet newspapers will have a higher reading age than tabloid newspapers.
I am planning to use the FOGG test of readability in this investigation to compare the reading ages of tabloid newspapers and broadsheet newspapers, as it is a simple and clear way of finding the reading age of newspapers.
Method
- Use 2 broadsheet newspapers and 2 tabloid newspapers published on the same day.
- Select as many articles as possible that appear in all 4 newspapers (of at least 150 words), ignoring advertising, puzzles and horoscopes.
- Randomly select a page from each newspaper using the random function on a scientific calculator.
- Choose the article that seems to be the biggest on the page. Then underline the first hundred words in the article, discarding proper nouns and capitalized words.
- Then use the FOGG test to calculate the reading age of the 4 newspapers.
The FOGG Test
There are a number of readability tests I could have used to investigate the reading ages of newspapers. I considered using the Flesch score; however, many of the readability tests were unsuitable for this investigation because they are not aimed at adults, who are the audience for newspaper articles. I feel the FOGG test is the most suitable test for my investigation.
- Select a sample of 100 words.
- Count the number of whole sentences in the sample. In the event that the last sentence stops short of the 100th word, count only the number of whole sentences.
- Ignore proper nouns and capitalized words.
- Count the number of words with 3 or more syllables. The result is y.
- Find x by dividing 100 by the number of sentences.
- Add the number of words with 3 or more syllables (y) to x.
- Multiply the result of y + x by 0.3.
- Add 5 to that to give the equivalent English reading age in years.
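The steps above can be sketched as a single Python function. This is a minimal sketch of the formula as I have described it, reading age = 0.3 × (x + y) + 5, and the function and parameter names are my own. The example call uses 4 whole sentences and 24 words of 3 or more syllables, the counts used for The Independent in the results of this investigation.

```python
def fogg_reading_age(whole_sentences, long_words, sample_size=100):
    """Reading age in years from the FOGG steps described in this report."""
    x = sample_size / whole_sentences  # average number of words per sentence
    y = long_words                     # words with 3 or more syllables
    return 0.3 * (x + y) + 5


print(round(fogg_reading_age(4, 24), 1))
```

The sample size is left as a parameter only so that the 100 in the steps is visible in one place; the test as described always uses 100 words.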
Cancelling out Bias
To eliminate bias from the investigation I am planning to use newspapers published on the same day: 2 broadsheets (The Independent and The Guardian) and 2 tabloids (The Daily Mail and The Mirror). This ensures that my investigation is fair and unbiased and that each type of newspaper is represented by more than one title.
Firstly, before I pick which pages to sample I have to discard pages that do not have a sufficient number of words in at least one article. The FOGG test only needs a 100 word sample to produce an answer; however, since I am ignoring proper nouns and capitalized words, a longer article may be needed to supply 100 countable words, so I will ensure that the articles I sample are at least 150 words long. I will ignore pages that are purely advertising, as these do not represent the content of the newspaper. I will also ignore horoscopes, stocks and shares pages, sports results pages, racing pages, weather pages and puzzle pages, because the majority of their content is numbers and there is not enough written English.
I have randomly selected an article from each newspaper by using a scientific calculator. To do this I listed the page numbers of each newspaper that had a sufficient number of words in their articles. I then used the random function on the calculator to choose a page number for each newspaper, which was unbiased, quick and easy.
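The same random page selection could be done on a computer instead of a calculator. The Python sketch below is only an illustration: the list of eligible page numbers is hypothetical, and pick_page is a name I have made up.

```python
import random


def pick_page(eligible_pages, rng=random):
    """Choose one page at random, each eligible page being equally likely."""
    return rng.choice(eligible_pages)


# Hypothetical page numbers left after discarding advertising, puzzle,
# results and other unsuitable pages.
eligible_pages = [1, 2, 4, 5, 7, 9, 12, 15]

chosen_page = pick_page(eligible_pages)
print(chosen_page)
```

Passing a seeded generator such as random.Random(1) as the second argument would make the choice repeatable if the selection ever needed to be checked.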
Investigation 2, Results
I collected and recorded my data in the table below.
To find out the reading age of each newspaper I have to now make the following calculations for each paper:
The Independent
(100/4) + 24 = 49
(49 x 0.3) + 5 = 19.7 years of age
The Guardian
(100/3) + 22 = 55.3
(55.3 x 0.3) + 5 = 21.6 years of age
The Daily Mail
(100/4) + 9 = 34
(34 x 0.3) + 5 = 15.2 years of age
The Mirror
(100/5) + 14 = 34
(34 x 0.3) + 5 = 15.2 years of age
This table shows the average reading ages of Broadsheet newspapers and of tabloid newspapers.
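The averages can be reproduced from the four reading ages calculated above. This is a short Python sketch that assumes the rounded reading ages from my calculations; the variable names are my own.

```python
from statistics import mean

# Reading ages in years, taken from the calculations for each newspaper.
reading_ages = {
    "broadsheet": {"The Independent": 19.7, "The Guardian": 21.6},
    "tabloid": {"The Daily Mail": 15.2, "The Mirror": 15.2},
}

# Average reading age for each type of newspaper.
averages = {
    kind: mean(list(ages.values())) for kind, ages in reading_ages.items()
}
difference = averages["broadsheet"] - averages["tabloid"]

for kind, average in averages.items():
    print(kind, round(average, 2))
print("difference", round(difference, 2))
```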
Conclusion
My results using the FOGG test of readability support my hypothesis, showing that the broadsheet newspapers I tested have a significantly higher reading age than the tabloids. The highest reading age, 21.6 years, was from a broadsheet (The Guardian) and the lowest, 15.2 years, was shared by both tabloids (The Mirror and The Daily Mail). I calculated the average reading age for each type of newspaper, as the mean takes into account all the values I recorded, including extremes. From these calculations I found that the broadsheets I tested have an average reading age of 20.65 years, which is just under 5.5 years older than the tabloid average of 15.2 years. This evidence supports my hypothesis and indicates it is possibly true, as there is a difference in the reading ages for the broadsheets and the tabloids. I think my use of the FOGG test provided enough unbiased evidence to make an analysis, and that I performed the investigation thoroughly and with great care.
Although my results support my prediction, they do not prove it. To confirm the hypothesis fully I would need more time and greater resources in order to gain more unbiased evidence. If I were to undertake the experiment again I would improve it by the following:-
- I would use a greater selection of broadsheets and tabloids, instead of limiting my investigation to two of each. I would try to test samples from as many different broadsheets and tabloids as possible, to confirm whether my hypothesis is true for all newspapers without exception.
- I could investigate the reading ages of broadsheets and tabloids published at different times, instead of only newspapers published on the same day. I could test a selection of broadsheets and tabloids from past years, different months or even different days of the week. This would show whether my hypothesis holds regardless of the date of the newspaper, even if it was published 2 years ago, and whether the reading age remains constant throughout a month. However, without the time or money I would be unwilling to investigate this.
- I would use larger word samples from each newspaper to increase the accuracy of my results. In my investigation I only used one 100 word sample from each newspaper, so to improve my next investigation I would test at least 6 samples from each newspaper. I could then find an average reading age for different newspapers and for sections within the newspaper, e.g. sport and finance, and then the average age for broadsheets and tabloids.
- I would experiment using a wide range of readability tests on more word samples. In my investigation I only used one test, but if I had more time I would use some of the tests I researched and mentioned in my plan. Then I could say whether the test I used was biased, since it could exaggerate the true reading age or may not be a very accurate formula.
In short, I would test a larger number of sample words from each newspaper, test more newspapers, different tabloids and broadsheets and use different readability tests. To further extend my investigation of the reading ages of tabloids and broadsheets, I could test newspapers published at different times or even in different parts of the world, as long as they are written in English.
Investigation 3
“The distribution of word length in tabloids will be of a more positive skew than that of broadsheets”
I believe that tabloids contain a greater proportion of shorter words than broadsheets in similar samples of text. This will result in the data collected having a more positive skew, as the majority of the words will be short. I think that both the tabloid and the broadsheet data will show a positive skew. This is because word samples from any newspaper will contain many short words, less than 4 letters long. Although both types of newspaper contain a high proportion of short words, tabloids will contain a higher number than broadsheets. Therefore, I predict that the word length frequency data for tabloid newspapers will show a more positive skew than that of broadsheets.
Method
I will use the data collected in Investigation 1 to compare the lengths of words in broadsheet newspapers and tabloid newspapers. To begin with, I will present the data in tables showing the cumulative frequency of different word lengths for the different newspapers. I am then going to calculate the average cumulative frequency for broadsheet and tabloid newspapers and record my results in a table.
I will then use this data to produce 2 cumulative frequency graphs. From the graphs I can work out the median word length, the upper and lower quartiles and the interquartile range of the word lengths for each type of newspaper. I am using the interquartile range rather than the range because the interquartile range discards any extreme values that may sway the results. The interquartile range is found by subtracting the lower quartile from the upper quartile; it measures the spread of the middle 50% of the data.
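The cumulative frequencies and the quartile positions can be sketched in Python as follows. This is only an illustration, not the method I used on the graphs: it assumes the Broadsheet Financial frequencies from Investigation 1 instead of the averaged data, and it gives whole letter values where a graph would give interpolated ones. The names length_at and broadsheet_financial are my own.

```python
from bisect import bisect_left
from itertools import accumulate

# Broadsheet Financial frequencies (word length -> count) from Investigation 1.
broadsheet_financial = {
    1: 3, 2: 17, 3: 34, 4: 36, 5: 25, 6: 24, 7: 20, 8: 11,
    9: 15, 10: 3, 11: 3, 12: 3, 13: 1, 14: 3, 15: 2,
}

lengths = sorted(broadsheet_financial)
cumulative = list(accumulate(broadsheet_financial[n] for n in lengths))
total = cumulative[-1]


def length_at(position):
    """Word length at which the cumulative frequency first reaches position."""
    return lengths[bisect_left(cumulative, position)]


lower_quartile = length_at(total / 4)
median = length_at(total / 2)
upper_quartile = length_at(3 * total / 4)
interquartile_range = upper_quartile - lower_quartile
print(cumulative)
print(lower_quartile, median, upper_quartile, interquartile_range)
```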
After I have collected all my results from the 2 cumulative frequency graphs, I will plot another one comparing the tabloid and the broadsheet. I will then use box and whisker diagrams to determine how skewed the word lengths are. The box and whisker diagram is a useful way of representing the interquartile range, the median and the quartiles.
I will also calculate the standard deviation of each newspaper to use as a comparison with the interquartile range. Standard deviation is more precise than the box and whisker diagram and takes all the values into account. I will work out an average standard deviation result for broadsheet and tabloid newspapers.
Cancelling out Bias
To cancel out bias I will make sure that my calculations are correct and draw a precise line of best fit on my cumulative frequency line graph. Since the data being used is from investigation 1, I do not feel there is any bias in using it.
Investigation 3, Results
Broadsheets
Tabloids
I can now work out the average cumulative frequency data for the tabloid and the average for the broadsheet. These tables show the data.
Standard Deviation
Standard deviation is a more reliable and precise measure of spread than the range, as it takes all values into account, even the extreme ones. I will calculate the standard deviation for each 100 word sample. I will then be able to compare my results for broadsheet and tabloid newspapers.
Name of Newspaper - The Independent
Type of Newspaper – Broadsheet
Type of Article – Front Page
Average Word length = Σfx/Σf
= 470/100
= 4.70
Variance = Σfx² / Σf – (AWL)²
= 3066/100 – 4.70²
= 30.66 – 22.09
= 8.57
Standard Deviation = √Variance
= √8.57
= 2.93
Name of Newspaper - The Independent
Type of Newspaper – Broadsheet
Type of Article – Sport
Average Word length = Σfx/Σf
= 544/100
= 5.44
Variance = Σfx² / Σf – (AWL)²
= 3588/100 – 5.44²
= 35.88 – 29.59
= 6.29
Standard Deviation = √Variance
= √6.29
= 2.51
Name of Newspaper - The Independent
Type of Newspaper – Broadsheet
Type of Article – Financial
Average Word length = Σfx/Σf
= 571/100
= 5.71
Variance = Σfx² / Σf – (AWL)²
= 4169/100 – 5.71²
= 41.69 – 32.60
= 9.09
Standard Deviation = √Variance
= √9.09
= 3.01
Name of Newspaper – The Mirror
Type of Newspaper – Tabloid
Type of Article – Front Page
Average Word length = Σfx/Σf
= 437/100
= 4.37
Variance = Σfx² / Σf – (AWL)²
= 2533/100 – 4.37²
= 25.33– 19.10
= 6.23
Standard Deviation = √Variance
= √6.23
= 2.50
Name of Newspaper – The Mirror
Type of Newspaper – Tabloid
Type of Article – Sport
Average Word length = Σfx/Σf
= 436/100
= 4.36
Variance = Σfx² / Σf – (AWL)²
= 2369/100 – 4.36²
= 23.69 – 19.01
= 4.68
Standard Deviation = √Variance
= √4.68
= 2.16
Name of Newspaper – The Mirror
Type of Newspaper – Tabloid
Type of Article – Financial
Average Word length = Σfx/Σf
= 459/100
= 4.59
Variance = Σfx² / Σf – (AWL)²
= 2613/100 – 4.59²
= 26.13 – 21.07
= 5.06
Standard Deviation = √Variance
= √5.06
= 2.25
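The six calculations above all use the same three totals (Σf, Σfx and Σfx²), so they can be checked with one short Python function. This is a minimal sketch using the totals quoted above; the function name spread_from_totals is my own.

```python
from math import sqrt


def spread_from_totals(sum_f, sum_fx, sum_fx2):
    """Return (mean, variance, standard deviation) from Σf, Σfx and Σfx²."""
    mean = sum_fx / sum_f
    variance = sum_fx2 / sum_f - mean ** 2
    return mean, variance, sqrt(variance)


# (Σf, Σfx, Σfx²) for each 100 word sample, as quoted above.
samples = {
    ("The Independent", "Front Page"): (100, 470, 3066),
    ("The Independent", "Sport"): (100, 544, 3588),
    ("The Independent", "Financial"): (100, 571, 4169),
    ("The Mirror", "Front Page"): (100, 437, 2533),
    ("The Mirror", "Sport"): (100, 436, 2369),
    ("The Mirror", "Financial"): (100, 459, 2613),
}

for (paper, section), totals in samples.items():
    mean, variance, sd = spread_from_totals(*totals)
    print(f"{paper} {section}: mean {mean:.2f}, variance {variance:.2f}, SD {sd:.2f}")
```

The function does not round the mean before squaring it, so the variance can differ slightly in the third decimal place from the hand calculations, but the standard deviations agree to 2 decimal places.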
Analysis of Graphs
- Graphs [3.1] and [3.2] show the average cumulative frequency for the broadsheet and the tabloid respectively. I then combined both onto [Graph 3.3]. To compare the median and quartiles of the data I drew a box and whisker diagram underneath the x-axis on each graph.
- Graph 3.1 (the average cumulative frequency for the broadsheet data) shows a steep initial ascent that increases steadily before leveling off around 10 letters. This indicates that there is a high proportion of short words in the data. The virtually straight line between 4 and 7 letters shows that words of 4, 5, 6 and 7 letters occurred with a consistent frequency.
- The lower quartile, median and upper quartile values are all below 7 letters, showing that most of the data is clustered between 1 and 7 letters. This is shown more clearly on the box and whisker diagram.
- Graph 3.2 (the average cumulative frequency for the tabloid newspaper) also shows a steep initial curve, steeper than [Graph 3.1] by a fraction, which levels off when the word length reaches 7 letters. This indicates that the tabloid contained a high percentage of shorter words and fewer words with 7 or more letters.
- The median, lower quartile and upper quartile were all found to be between 2 and 7, which indicates that most of the data is clustered between 2 and 7 letters.
- The interquartile range of [Graph 3.1] is 4.1 letters, compared with 2.8 letters on [Graph 3.2]. This shows that the tabloid newspapers have a smaller spread of word lengths than broadsheets do.
- [Graph 3.3] shows the tabloid and the broadsheet plotted on one graph, with the corresponding box and whisker diagrams below. The way that the two lines drift apart from each other between 2 and 9 letters shows that the tabloids contain a higher number of short words and that the broadsheets contain a higher number of long words.
I can determine whether the box and whisker diagrams are positively skewed, negatively skewed or symmetrical using the following equations.
Positive Skew
Middle Quartile – Lower Quartile < Upper Quartile – Middle Quartile
If the median (middle quartile) is closer to the lower quartile than the upper quartile then it is a positive skew.
Negative Skew
Middle Quartile – Lower Quartile > Upper quartile – Middle Quartile
If the median is closer to the upper quartile than it is to the lower quartile then it has a negative skew.
Symmetry
Middle Quartile – Lower Quartile = Upper Quartile – Middle Quartile
Using these rules I can calculate that the distributions of both the tabloid and the broadsheet have a positive skew. The box and whisker diagrams show that the median is closer to the lower quartile than the upper quartile for both newspapers. However, I am interested in the extent of the skew, so I will make calculations to determine which newspaper has the more positive skew.
Tabloid
Median – Lower = 3.4 – 2.3 = 1.1
Upper – Median = 5.1 – 3.4 = 1.7
The median is closer to the lower quartile by 0.6 letters.
Broadsheet
Median - Lower = 4 - 2.5 = 1.5
Upper – Median = 6.6 – 4 = 2.6
The median is closer to the lower quartile by 1.1 letters.
This shows that the tabloid and the broadsheet do have positive skews, suggesting my hypothesis may be correct.
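The quartile rules and calculations above can be sketched in Python. This is only a small illustration using the quartiles I read from the graphs; describe_skew is a name I have made up, and it compares the distance from the median to each quartile exactly as the rules do.

```python
from math import isclose


def describe_skew(lower_quartile, median, upper_quartile):
    """Classify the skew from the distances between the median and quartiles."""
    lower_gap = median - lower_quartile
    upper_gap = upper_quartile - median
    if isclose(lower_gap, upper_gap):
        return "symmetrical"
    return "positive" if lower_gap < upper_gap else "negative"


# (lower quartile, median, upper quartile) read from the graphs.
quartiles = {"tabloid": (2.3, 3.4, 5.1), "broadsheet": (2.5, 4.0, 6.6)}

for paper, (lq, med, uq) in quartiles.items():
    print(paper, describe_skew(lq, med, uq), "IQR", round(uq - lq, 1))
```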
Standard Deviation
I found the standard deviation of each sample because it is a more sophisticated and precise measure of the spread and because it takes all values, even extremes, into account. My results for standard deviation show that the broadsheet had a wider variation in word length. This is because it had roughly the same number of short words as the tabloid but a greater number of longer words, resulting in a higher standard deviation.
This helps when discussing how varied the word lengths are in the tabloid and the broadsheet. The interquartile range is another measure of the spread, but it only measures the middle 50% of the data. I feel this is an unsuitable range for such a small sample of words. This is why I found the average standard deviation for each type of newspaper, and I found that the broadsheet had a larger spread.
Conclusion
I found that the skew of word length distribution was indeed more positive in tabloids than broadsheets. This is shown by the box and whisker diagrams on Graph 3.3. The tabloid has a more positive skew because it contains more short words than the broadsheet. Both distributions were positively skewed because both newspapers contained a high proportion of short words. I think this is because they cannot be avoided in the modern English language, and so even in the most complex passage of writing with many long, complex words the simple 1 and 2 letter words cannot be substituted.
I have also used the graphs and the data sheets to make percentage comments about the distribution of word length. 50% of the words in the tabloid newspaper sample were 3.4 or fewer letters long, compared with 50% in the broadsheet being 4 or fewer. This confirms what the cumulative frequency graphs show, that the broadsheet has more long words than the tabloid and thus a less positive skew.
Final Conclusion
My investigation has been successful and I am pleased with the evidence I have obtained. I feel I have thoroughly investigated all my hypotheses and provided a detailed analysis and conclusion on the results of each experiment.
My plans for what I was aiming to achieve were clear and precise before I undertook every experiment. In the early stages of planning I made a note of the problems I might encounter, and in some cases did, i.e. the rules for ignoring proper nouns, capitalized words and foreign names when recording word length. The amount of detail contained in my plan helped enormously when I collected the data, which could become quite complicated at times. I used a variety of sampling methods to obtain the evidence for the various investigations, including a systematic random sample to select the word samples for Investigation 1.
I tried to present and process the data in the most appropriate form, including frequency polygons, cumulative frequency graphs, box and whisker diagrams and calculations. I put a lot of effort into ensuring all my calculations were correct. As it is a long report I decided a contents page was necessary to give my report a structure and make it easy to understand.
I analysed my findings thoroughly and wrote evaluations on how to improve my work if I were to repeat it. Overall, I am pleased with these investigations and believe I have written a detailed report, investigating 3 complex hypotheses to the best of my ability.