Statistical Investigation

Statistical Investigation: Newspaper Comparison

For my statistical investigation coursework project, I have chosen to compare “The Sun” (tabloid) and “Financial Times” (broadsheet).

I will analyse both newspapers for content and style and make comparisons between them. I will consider the amount of space devoted to different items such as images, articles and headlines. Most importantly, I will analyse readability in terms of word length.

Finally, I will conclude my investigation by commenting on the data collected and on how my results relate to my initial hypotheses.

Hypotheses

The pages of the tabloid will be covered with a larger area of images than the broadsheet.

I will prove this by measuring the total area of the images on the front page of each newspaper, working that out as a percentage of the total area of the page, and then comparing the two.

The headlines of the tabloid are larger than the headlines of the broadsheet.

I will test this by measuring and comparing the sizes of the front page headlines of each paper.

There is more space dedicated to the actual text in the front page article of the broadsheet than of the tabloid.

I will confirm this by measuring and comparing the area of the section devoted to the text in each front page article.

There is more sport in the tabloid than in the broadsheet.

I will demonstrate this by calculating the total percentage of sports pages in each newspaper.

There are more international news articles in the broadsheet than in the tabloid.

I will verify this by calculating the percentage of international news pages in each newspaper.

Solutions

Tabloid: number of images on front page: 3

Image 1: 33.5 x 10.5 = 351.75

Image 2: 4.5 x 6.7 = 30.15

Image 3: 4.1 x 4 = 16.4

Total area: 351.75 + 30.15 + 16.40 = 398.3cm²

Broadsheet: number of images on front page: 3

Image 1: 19 x 8.5 = 161.5

Image 2: 3.5 x 3.6 = 12.6

Image 3: 12 x 5 = 60

Total area: 161.5 + 12.6 + 60.0 = 234.1cm²
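As a rough check, the image areas above can be recomputed with a short Python sketch. The dimensions are the ones measured above (taken to be in centimetres); the function names are my own, and percentage_of_page is left uncalled because the page dimensions are not recorded in this section.

```python
# Front page image dimensions (width, height) as measured above, assumed to be in cm.
tabloid_images = [(33.5, 10.5), (4.5, 6.7), (4.1, 4.0)]
broadsheet_images = [(19.0, 8.5), (3.5, 3.6), (12.0, 5.0)]


def total_area(images):
    """Sum width x height over all images, rounded to 2 decimal places."""
    return round(sum(width * height for width, height in images), 2)


def percentage_of_page(image_area, page_area):
    """Express an image area as a percentage of the page area (hypothetical helper)."""
    return round(100 * image_area / page_area, 1)


tabloid_total = total_area(tabloid_images)
broadsheet_total = total_area(broadsheet_images)
print("Tabloid image area:", tabloid_total, "cm²")
print("Broadsheet image area:", broadsheet_total, "cm²")
```

Once the page area has been measured, percentage_of_page(tabloid_total, page_area) would give the figure needed for the first hypothesis.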

Front page headlines:

Broadsheet: 12.5 x 9.5 = 118.75cm²

Tabloid: 14.5 x 6.5 = 94.25 and 13 x 9.5 = 123.5

94.25 + 123.5 = 217.75cm²

Text in front page article:

Broadsheet: 9.3 x 13.2 = 122.76cm²

Tabloid: 4.3 x 4.8 = 20.64cm²

Broadsheet: Total pages: 18

Sports pages: 0.333 of the total (about 33.3%)

Tabloid: Total pages: 60

Sports pages: 39

Tabloid: 1 page of international news

Broadsheet: 5 pages of international news
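The page counts above can be turned into percentages with a small sketch like the one below. The page totals (60 for the tabloid, 18 for the broadsheet) and the counts come from the figures above; for broadsheet sport only the fraction 0.333 is given, so I take it as one third. The function name is my own.

```python
def page_percentage(pages, total_pages):
    """Percentage of the newspaper taken up by `pages`, to 1 decimal place."""
    return round(100 * pages / total_pages, 1)


tabloid_sport = page_percentage(39, 60)
tabloid_international = page_percentage(1, 60)
broadsheet_international = page_percentage(5, 18)
# Only the fraction is quoted for broadsheet sport, so it is assumed to be 1/3.
broadsheet_sport = round(100 * (1 / 3), 1)

print("Sport: tabloid", tabloid_sport, "% vs broadsheet", broadsheet_sport, "%")
print("International: tabloid", tabloid_international,
      "% vs broadsheet", broadsheet_international, "%")
```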

Readability

Hypothesis

I believe that the broadsheet newspaper will have a higher number of longer words than the tabloid and will therefore be harder to read.

To test this I will create a frequency table of the number of letters per word for each newspaper. The data will not be grouped. From these frequency tables I will build cumulative frequency tables and then construct two cumulative frequency graphs. I will also create two grouped frequency tables in order to construct two histograms. I will analyse both the histograms and the cumulative frequency graphs in order to find out whether my hypothesis is correct.
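The step from a list of words to an ungrouped frequency table and a cumulative frequency table could be sketched as follows. The sample sentence is made up purely for illustration and is not data from either newspaper; the function name is also my own.

```python
from collections import Counter
from itertools import accumulate


def frequency_table(word_lengths):
    """Return rows of (letters, frequency, cumulative frequency), sorted by letters."""
    counts = Counter(word_lengths)
    lengths = sorted(counts)
    frequencies = [counts[n] for n in lengths]
    cumulative = list(accumulate(frequencies))
    return list(zip(lengths, frequencies, cumulative))


# Illustrative words only, not taken from the newspapers.
sample_words = "the cat sat on the mat with a very large hat".split()
table = frequency_table([len(word) for word in sample_words])

for letters, frequency, running_total in table:
    print(letters, frequency, running_total)
```

The last cumulative frequency always equals the sample size, which is a quick way to check the table before plotting the graph.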

To avoid bias I will collect a random sample of words. In order to do this I will use the RAN# button on my calculator. I will number each page in the tabloid from 1 to 60 and each page in the broadsheet from 1 to 18. I will also number the first 200 words on each page from 1 to 200.

I will press SHIFT RAN# to give me a random number between 0 and 1. I will then multiply this number by 60 or 18 (depending on which newspaper I am collecting data from) and round it to an integer to identify the page of the newspaper. I will press SHIFT RAN# again to give a second number between 0 and 1, multiply that by 200 and round it to an integer to identify the word on that page.

If this process selects a word that has already been chosen (the same page and the same position on that page), I will ignore that result and repeat the process until a different word is selected. However, if the same word appears on a different page, or in a different position on the same page, I will not ignore it.
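The sampling procedure described above could be simulated along these lines, with Python's random number generator standing in for the calculator's RAN# button. This is only a sketch: the function names are my own, the sample size of 100 matches the number of words collected from each newspaper, and redrawing a result that rounds to 0 is my assumption, since pages and words are numbered from 1 and the procedure does not say what to do in that case.

```python
import random


def draw_index(rng, upper):
    """Multiply a random number in [0, 1) by `upper` and round to an integer.

    Assumption: a result of 0 is redrawn, because numbering starts at 1.
    """
    while True:
        value = round(rng.random() * upper)
        if value >= 1:
            return value


def sample_positions(total_pages, words_per_page=200, sample_size=100, seed=None):
    """Pick distinct (page, word number) pairs, ignoring repeated positions."""
    rng = random.Random(seed)
    seen = set()
    positions = []
    while len(positions) < sample_size:
        position = (draw_index(rng, total_pages), draw_index(rng, words_per_page))
        if position in seen:
            continue  # same page and same word number: ignore and draw again
        seen.add(position)
        positions.append(position)
    return positions


tabloid_positions = sample_positions(total_pages=60, seed=1)
broadsheet_positions = sample_positions(total_pages=18, seed=1)
print(tabloid_positions[:5])
```

Only repeated positions are rejected here, so the same word can still be counted twice if it turns up in two different places.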

Number of letters in 100 words extracted from a broadsheet newspaper

Number of letters in 100 words extracted from a Tabloid newspaper

Analysis and Evaluation

In the first section of this project I came up with a few simple hypotheses and used simple calculations like finding area and working out percentages to prove them.

However, my readability hypothesis posed a more complex problem, so I needed to perform more complex statistical calculations and procedures in order to verify it.

My readability hypothesis was that “the broadsheet newspaper will have a higher number of longer words than the tabloid newspaper and will therefore be harder to read.” To test whether this was true I needed to gather a sufficient number of randomly chosen words from each newspaper and compare the lengths of the words by counting the letters.

I collected 100 words from each newspaper, counted how many letters there were in each and put the data in two separate frequency tables (one for the broadsheet and one for the tabloid). From these I constructed two cumulative frequency tables so that I could draw two cumulative frequency graphs and box plot diagrams.

I also created a grouped frequency table for each newspaper so that I could construct a Histogram for each set of data. I did all of this so that I could make observations that would show whether my hypothesis is true, and also see whether I could make any improvements that would make my results more accurate.

To avoid bias I used the RAN# button on my calculator to collect a completely random sample of words.

The cumulative frequency table for the words in the tabloid gives a Median of 3.2, which suggests a low number of letters per word. The same can be said of the Mode, as it is very similar to the Median. The Mode is 3, which shows that 3 letters is the most common word length in my random sample of words from the tabloid.

The Median of the broadsheet is 5.7, which indicates that its words are typically longer.

Beyond this direct comparison, I have drawn on a number of other measures to support this conclusion.

For example, the Histogram can show us the general shape of the data (whether it is positively or negatively skewed). The tabloid supports my hypothesis very well, as it is positively skewed, which means the majority of the data is concentrated at the lower word lengths, with a tail towards the longer words.

However, the broadsheet does not fit the theory as neatly. I had expected its Histogram to be negatively skewed, with most of the data concentrated at the longer word lengths; in practice it was more negatively skewed only in comparison with the tabloid.

The cumulative frequency graph also provided a lot of information. The gradient of the tabloid curve is a lot steeper at the beginning of the graph, showing that most of the data lies at the shorter word lengths. Further along the x-axis the line gets flatter. The lower quartile is 1.8 and the upper quartile is 5.6, giving an interquartile range of 3.8. The interquartile range for the tabloid is much less than that of the broadsheet, showing that the tabloid's word lengths are more consistent and more closely clustered around the Median.

The box plot diagram gives us another perspective on the data.
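The interquartile range is the upper quartile minus the lower quartile. A minimal check using the tabloid quartiles quoted above (the function name is my own):

```python
def interquartile_range(lower_quartile, upper_quartile):
    """Upper quartile minus lower quartile, rounded to 1 decimal place."""
    return round(upper_quartile - lower_quartile, 1)


tabloid_iqr = interquartile_range(1.8, 5.6)
print("Tabloid interquartile range:", tabloid_iqr)
```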

For example the Median is closer to the lower quartile in the tabloid than in the broadsheet. This was hypothesised, but a larger discrepancy was anticipated.

The Median of the broadsheet, although further from the lower quartile than that of the tabloid, is still lower than I expected. This is because there are large numbers of 3-4 letter words in all newspapers, whether broadsheet or tabloid, so the word lengths of any newspaper will never be concentrated entirely at the longer words.

I have concluded that my tabloid data fitted my hypothesis very well, which suggests that tabloids have a larger number of words with few letters.

My broadsheet data did not fit my hypothesis ideally, but nevertheless supported the main idea. I believe this is because there are a large number of 3 letter words in all newspapers.
