I would give the section with the smallest percentage a sample of 50 words.
This table shows that my hypothesis was wrong in one respect: I thought The Sun would have the biggest Sports section, but the Telegraph proved to have the biggest.
3. Max Lacome- Shaw
As well as this, because the Daily Mail's percentage of Sports News was so small (just 2%), the Telegraph was given a 200-word sample.
I selected a stratified sample as follows:
I chose the newspaper with the lowest percentage for Home News, which was the Daily Mail (21%), and took a fifty-word sample from it. I then calculated the sample sizes for the other papers as follows:
The Daily Telegraph: 27%, so 27/21 × 50 = 64.3 ≈ 64 words
The Sun: 22%, so 22/21 × 50 = 52.4 ≈ 52 words
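The stratified calculation above can be sketched in Python as a quick check of the arithmetic. This is a minimal illustration, assuming each sample size is rounded to the nearest whole word; the function name `stratified_sizes` is hypothetical and not part of the original working. The percentages are the Home News figures given above.

```python
def stratified_sizes(percentages, base_size=50):
    """Scale each paper's sample so the smallest section gets base_size words."""
    smallest = min(percentages.values())
    # Each sample is proportional to that paper's percentage for the section,
    # rounded to the nearest whole word.
    return {paper: round(pct / smallest * base_size)
            for paper, pct in percentages.items()}


home_news = {"Daily Mail": 21, "Daily Telegraph": 27, "The Sun": 22}
print(stratified_sizes(home_news))
# {'Daily Mail': 50, 'Daily Telegraph': 64, 'The Sun': 52}
```

One design note: Python's `round` sends exact halves to the nearest even number, unlike a calculator; none of the Home News figures land on a half, so the results match the hand calculation.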
I repeated this process for the Foreign News and Sports News results above. To pick a random sample of words, I used the random number button on my calculator as follows:
Shift, Ran# (gives a random number between 0 and 1) × (number of words in the article)
I rounded the result to the nearest integer. Example:
798 words in the article
Random number = 0.375, so 0.375 × 798 = 299.25
= 299 (rounded)
In this case I chose the 299th word in the article and wrote it down along with the number the random number method had given me. If the process selected the same number more than once, I ignored the repeat and carried on. I repeated the process until I had the required sample size. I then counted the number of letters in each sampled word and recorded the results in tally charts, on the next pages.
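The calculator method can be imitated in Python as a rough sketch, with `random.random()` playing the part of Ran#. The function names are hypothetical, and the sketch makes some assumptions the text does not state: words are numbered from 1, a result that rounds to 0 is simply redrawn, and letters are counted with `str.isalpha`, so apostrophes and hyphens are not counted as letters.

```python
import random
from collections import Counter


def sample_word_positions(n_words, sample_size, rng=random.random):
    """Pick distinct word positions (1 to n_words) in the style of the calculator method."""
    if not 0 < sample_size <= n_words:
        raise ValueError("sample size must be between 1 and the number of words")
    chosen = []
    while len(chosen) < sample_size:
        # Random number in [0, 1) times the article length, rounded to the nearest integer.
        position = round(rng() * n_words)
        # Position 0 is not a word (assumption: redraw), and repeats are ignored.
        if position == 0 or position in chosen:
            continue
        chosen.append(position)
    return chosen


def letter_count(word):
    """Count only alphabetic characters in a word."""
    return sum(ch.isalpha() for ch in word)


def tally_lengths(words):
    """Tally chart as a Counter: word length -> number of sampled words."""
    return Counter(letter_count(w) for w in words)


# Worked example from the text: 0.375 x 798 = 299.25, rounded to 299.
fixed = iter([0.375])
print(sample_word_positions(798, 1, rng=lambda: next(fixed)))  # [299]
```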
Once I had all my tally tables, I converted my results into a cumulative frequency column, added to the side of each tally table. I did this by keeping a running total of the number of words up to and including each word length:
Example, Daily Mail Sports News:
3 + 14 = 17. This is the cumulative frequency for three-letter words: the 3 is the cumulative frequency up to two-letter words (the number of sampled words with one or two letters), and the 14 is the number of three-letter words in the sample.
34 + 7 = 41. This is the cumulative frequency for seven-letter words: the 34 is the cumulative frequency up to six-letter words (the number of sampled words with six letters or fewer), and the 7 is the number of seven-letter words in the sample.
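The running-total step above can be sketched with `itertools.accumulate`. This is a minimal illustration; the function name is hypothetical, and the frequencies in the usage line are made up (not the project's data), chosen only so the first steps reproduce the 3 + 14 = 17 example.

```python
from itertools import accumulate


def cumulative_frequencies(freq_by_length):
    """Running total of word counts, in order of increasing word length."""
    lengths = sorted(freq_by_length)
    running_totals = accumulate(freq_by_length[n] for n in lengths)
    return dict(zip(lengths, running_totals))


# Hypothetical frequencies: 1 one-letter word, 2 two-letter words, 14 three-letter words.
print(cumulative_frequencies({1: 1, 2: 2, 3: 14}))  # {1: 1, 2: 3, 3: 17}
```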
From these columns I drew cumulative frequency graphs for all three sections of each of the three newspapers, as shown on the next pages.
On these cumulative frequency graphs I have marked three values. The median is the word length read off the graph at half of the total cumulative frequency; the lower quartile is the word length at a quarter of the total; and the upper quartile is the word length at three quarters of the total. From the medians and quartiles I drew box and whisker diagrams (box plots), which give a pictorial summary of my results, as shown on the next pages.
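Reading the median and quartiles off a cumulative frequency graph can be sketched in Python. This assumes the plotted points are joined with straight lines starting at (0, 0); the project reads values off a hand-drawn curve, so its estimates may differ slightly. The function names and the frequencies in the usage line are hypothetical.

```python
def read_off(cum_freq, target):
    """Word length at which the cumulative frequency reaches target (linear interpolation)."""
    points = [(0, 0)] + sorted(cum_freq.items())
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 < target <= y1:
            return x0 + (target - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("target is outside the cumulative frequencies")


def quartiles(cum_freq):
    """Lower quartile, median and upper quartile at 1/4, 1/2 and 3/4 of the total."""
    total = max(cum_freq.values())  # the last cumulative frequency is the number of words
    return {
        "lower quartile": read_off(cum_freq, total / 4),
        "median": read_off(cum_freq, total / 2),
        "upper quartile": read_off(cum_freq, 3 * total / 4),
    }


# Hypothetical cumulative frequencies for word lengths 1, 2 and 3.
print(quartiles({1: 10, 2: 30, 3: 40}))
# {'lower quartile': 1.0, 'median': 1.5, 'upper quartile': 2.0}
```

The interquartile range used in the box plot comments is then simply the upper quartile minus the lower quartile.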
These box and whisker diagrams show that:
In the ‘Home News’: The Sun and the Daily Mail had almost identical interquartile ranges, although the Daily Mail's was slightly larger, so the middle half of its word lengths was a little more spread out. Unexpectedly, the Telegraph had a lower median word length than both the Daily Mail and The Sun, so the typical word in The Sun and Daily Mail samples was longer than in the Telegraph sample. This contradicts my hypothesis that the Telegraph, because of its expected broadsheet audience, would use longer words than the other two. Although The Sun had a higher median than the Telegraph, the Telegraph had a higher upper quartile, so the Telegraph did have longer words, just not enough of them to raise its median. The Daily Mail had the highest median and the highest upper quartile of the three, so both its typical words and its longer words were the longest. All three newspapers also had the same range: in each, word lengths ran from one letter to ten letters.
In the ‘World News’: These box plots fit my hypothesis more closely, with The Sun having both a very small interquartile range and a low median. The Daily Mail and the Telegraph had practically the same interquartile range and similar word lengths. The Telegraph's interquartile range was slightly larger, but its median was lower than the Daily Mail's: the Telegraph had some longer words (or fewer short words) than the Daily Mail, but not enough to give it a higher median. The Sun's median was the lowest of the three, at about 3.3 letters per word, and its interquartile range was nearly half that of the Daily Mail and the Telegraph. All three newspapers again had the same range, from one letter to ten letters.
In the ‘Sports News’: The Telegraph had a very large interquartile range, from 2 letters (lower quartile) to 6.5 letters (upper quartile), so its sample contained many short words and also several long ones. The Daily Mail again had the highest median, meaning it had proportionally more long words and fewer short ones. The Daily Mail and The Sun again had very similar interquartile ranges, but the Daily Mail's was again at a higher word length. The Telegraph also had the largest range of word lengths, from one to thirteen letters, compared with one to ten letters for the Daily Mail and one to twelve letters for The Sun. I also expected the Sports articles to have shorter words on average than the other sections; the box plots show that this hypothesis was wrong.
I will now combine each newspaper's three section tables into one frequency table, so that each newspaper can be treated as a whole. I do this by adding together the frequencies for each word length across the three sections. Here is an example of what I did:
In the Telegraph's Sports News, three words had one letter. I added this to the number of one-letter words in its World News, which was fourteen, and then to the number of one-letter words in its Home News, which was only two:
3 + 14 + 2 = 19 (this total went into the combined table for the Telegraph)
I then used the same method for every word length in the three sections. I added a cumulative frequency column to each combined table by keeping a running total of the frequencies, and from these columns I drew cumulative frequency graphs, as shown on the next pages.
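Combining the section tables is just adding counts for matching word lengths, which `collections.Counter.update` does directly. A minimal sketch; the function name is hypothetical, and the usage line uses the Telegraph one-letter counts from the example above.

```python
from collections import Counter


def combine_sections(*sections):
    """Add word-length frequencies from several sections into one table."""
    combined = Counter()
    for section in sections:
        combined.update(section)  # adds the counts for matching word lengths
    return dict(sorted(combined.items()))


# One-letter words in the Telegraph: Sports 3, World 14, Home 2.
print(combine_sections({1: 3}, {1: 14}, {1: 2}))  # {1: 19}
```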
The cumulative frequency graphs show:
For the Telegraph: the cumulative frequency rises steadily, with one or two large jumps, such as between two-letter and three-letter words. Towards the top of the curve the increases get smaller, meaning there are fewer and fewer words at each of the longer word lengths.
For the Daily Mail: the cumulative frequencies are much larger than the Telegraph's, which means there were more words with higher numbers of letters in them.
For The Sun: like the Telegraph's, its curve flattens out towards the top, so there are fewer words at each of the longer lengths. Also like the Telegraph, and unlike the Daily Mail, its curve starts at a higher cumulative frequency, meaning it had more one-letter words, such as ‘a’ or ‘I’, than the Daily Mail.
I then estimated the median and the lower and upper quartiles from the cumulative frequency graphs, and used these to draw box plots comparing the three newspapers with all three sections combined, as shown on the next pages.
Comparing the three newspapers, with all three sections combined, in box plots shows that:
The Sun and the Telegraph have very similar lower quartiles, both estimated at around 2.8, so about a quarter of the words in each had fewer than three letters. The Daily Mail had the highest median word length and also the highest upper quartile, at around 6.2 letters. The Telegraph had the second-highest median and, as expected, The Sun had the lowest. Although the Daily Mail had the highest median, it did not have the greatest spread: the Telegraph's word lengths ranged from 1 to 14 letters, whereas the Daily Mail's ranged only from 1 to 10. The Telegraph also had the largest interquartile range, so the middle half of its word lengths was the most spread out, but this was not enough to make a difference to its median. Even though The Sun had the lowest median, its range was bigger than the Daily Mail's: it had words at least 12 letters long, but not enough of them to raise its upper quartile. Had I used the mean instead, I would definitely have got different results. My hypothesis is that the Telegraph would then have had a higher average, probably the highest, and The Sun's average would have been lower.
I have now decided to calculate the mean word length for each paper taken as a whole. The mean is a different type of average, and I will see whether it gives a different result from the medians. I expect the Telegraph to have the highest mean, followed by the Daily Mail and then The Sun. The mean is affected by extreme values, such as very long words, and the Telegraph sample is the only one containing words of more than 12 letters; this is the reason for my hypothesis.
Standard Deviation
Standard deviation is a measure of dispersion, or spread, and will give me information about the spread of word lengths for each paper. I will compare these results in a table. I am going to use standard deviation because, unlike the median and the mean, it is not an average: it measures how far the word lengths are spread around the mean, which gives me a different kind of measure to compare the papers with.
To calculate the standard deviation, I first needed the mean word length for each newspaper as a whole, giving three means. To find each mean, I went back to my handwritten ‘Combine Tally Tables’ page and, for each word length, worked out:
Frequency × number of letters in the word = one part of the total used for the mean
Example: 10 × 1 = 10 (10 one-letter words)
27 × 2 = 54 (27 two-letter words)
39 × 3 = 117 (39 three-letter words)
(I give a full worked example for one paper, including the set-up of the standard deviation calculation, later in the project.)
I did this for every row of the table. I then added up all of these products and divided the sum by the total frequency (the final cumulative frequency, which is the number of words in the sample):
Mean = (sum of all the products) ÷ (total frequency)
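The mean calculation above can be sketched for a frequency table held as a Python dictionary (word length mapped to frequency). The function name is hypothetical. The usage line uses only the three rows shown above, which are a partial table, so the printed value is an illustration rather than one of the project's results.

```python
def mean_word_length(freq_by_length):
    """Mean = sum of (frequency x word length) / total frequency."""
    total_letters = sum(f * length for length, f in freq_by_length.items())
    total_words = sum(freq_by_length.values())  # equals the last cumulative frequency
    return total_letters / total_words


# Only the three example rows: (10 + 54 + 117) / (10 + 27 + 39) = 181 / 76
print(round(mean_word_length({1: 10, 2: 27, 3: 39}), 2))  # 2.38
```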
This gave me the mean, and I then started on the standard deviation. First I subtracted the mean from each word length, squared the result and multiplied it by the frequency of that word length.
Example: (1 − 5.76)² = 22.6576, and 22.6576 × 10 (the frequency of one-letter words) = 226.58
(I am working to 2 d.p.)
I then did this for every word length in the table.
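The text stops after the squared terms; the usual remaining steps are to add them all up, divide by the total frequency and take the square root. This sketch assumes the project used that population formula (dividing by the total frequency n rather than n − 1), which the text does not state. The function name is hypothetical.

```python
import math


def standard_deviation(freq_by_length):
    """Population standard deviation of word length from a frequency table."""
    total_words = sum(freq_by_length.values())
    mean = sum(f * x for x, f in freq_by_length.items()) / total_words
    # frequency x (word length - mean)^2 for each word length, as in the example above
    squared_terms = sum(f * (x - mean) ** 2 for x, f in freq_by_length.items())
    return math.sqrt(squared_terms / total_words)


# The single term from the worked example: (1 - 5.76)^2 x 10, to 2 d.p.
print(round((1 - 5.76) ** 2 * 10, 2))  # 226.58
```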
The full process is worked through for The Sun as an example; I then did the same for all three newspapers.
I have made a results table of the mean and standard deviation for each newspaper.
This table shows that the Daily Mail has the greatest mean, meaning that on average it had longer words than the other newspapers. This was quite surprising, as its longest words were shorter than those of the other papers and it did not have the largest total cumulative frequency. The Telegraph had the second-highest mean, probably helped by its very long words; it also had the largest total cumulative frequency, 329 words. As expected, The Sun came lowest, because its words were shorter on average.
The standard deviation results are interesting because the Daily Mail and The Sun are so similar: there is only 0.03 between them, which shows that both have about the same spread of word lengths. The Telegraph has the highest standard deviation, which I expected, because the standard deviation, like the mean, is affected by extreme values and the Telegraph sample included a 14-letter word. The higher the standard deviation, the greater the spread of word lengths, so the Telegraph's word lengths are the most spread out, probably because of the long words in its sample.
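For data that is roughly normally distributed, about two-thirds of the values lie within one standard deviation of the mean, that is, between the mean minus one standard deviation and the mean plus one standard deviation.
Example (mean 4.46, standard deviation 2.35): 4.46 + 2.35 = 6.81
4.46 − 2.35 = 2.11
So about two-thirds of the words should be between 2.11 and 6.81 letters long.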
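The two-thirds figure is a rule of thumb for roughly bell-shaped (normal) data, so for a skewed spread of word lengths the actual share may differ. It can be checked against a frequency table by counting the words that fall between the two limits. A minimal sketch; the function name and the table in the test are hypothetical.

```python
def share_within_one_sd(freq_by_length, mean, sd):
    """Fraction of words whose length lies between mean - sd and mean + sd."""
    low, high = mean - sd, mean + sd
    inside = sum(f for x, f in freq_by_length.items() if low <= x <= high)
    return inside / sum(freq_by_length.values())


# Limits from the example above: mean 4.46, standard deviation 2.35
low, high = 4.46 - 2.35, 4.46 + 2.35
print(round(low, 2), round(high, 2))  # 2.11 6.81
```

With the interval above, only word lengths 3, 4, 5 and 6 fall inside, so the check amounts to adding those frequencies and dividing by the total.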
Summary:
My conclusion is that overall my results slightly support the hypothesis I made at the beginning of the project, although parts of it were false. For example, in the box plots ‘Comparing Newspapers – Home News’ I thought the Telegraph's median would almost certainly be higher than The Sun's, but I was proved wrong. I could have improved this project in many ways, such as sampling more data, using more articles, and using several editions of the same newspapers from different days. I could also have used other articles to back up my research and give more reliable results. Throughout the project I could also have made my results more accurate by working to more decimal places. I did not expect the Daily Mail to do better than the Telegraph, because of its audience, but throughout the project it seemed to.