Investigation into some of the statistical differences between The Times and The Telegraph on a specific day


Design and Planning

The aim of this project is to compare two daily published broadsheets. The two papers that will be used are THE TIMES and THE TELEGRAPH, both purchased on the same day. A lot of data can be easily collected from a newspaper, ranging from average word length to area devoted to adverts per page.

The project will attempt to reach conclusions regarding three specific questions. In answering these questions a range of sampling methods, presentation of data, and statistical calculations will be used in order to interpret and evaluate the data and come to a valid conclusion, drawing together all the data.

Each question is presented below, together with the statistical methods that will be used to draw conclusions from it.

Question 1:

· How does the font size of the headline text affect the length of the article?

This involves comparing two sets of data:

· Font Size of Headline text: A sheet was printed from Microsoft Word that had various font sizes in the Times New Roman font, the standard font for the two papers, printed on it. This was used as a guideline when compiling all the data.

· Length of column of each article: In The Times and The Telegraph there is a standard column width, so measuring the vertical length of all the columns in an article gives a suitably accurate indication of its length.

To make any calculations accurate enough to draw a valid conclusion, at least twenty sets of data from each paper will need to be collected. As each page has approximately three articles on it and both newspapers have roughly thirty pages, a systematic sample of every fourth page will provide enough data to support any conclusion.
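The arithmetic behind this sample size can be checked with a short sketch. It assumes exactly thirty pages, a starting page of 1 and three articles on every page. These are only the rough figures quoted above, and the function name is illustrative.

```python
def systematic_sample(n_pages, step, start=1):
    """Return the page numbers chosen by taking every `step`-th page."""
    return list(range(start, n_pages + 1, step))


# Assumed figures: about 30 pages per paper and about 3 articles per page.
pages = systematic_sample(n_pages=30, step=4)
expected_articles = len(pages) * 3

print(pages)              # [1, 5, 9, 13, 17, 21, 25, 29]
print(expected_articles)  # 24
```

Eight pages at about three articles each gives roughly twenty-four articles per paper, which leaves only a small margin over the twenty required if some pages carry fewer articles.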

The best way to find out whether the size of the headline text affects the length of the article is to draw a scatter diagram with a line of best fit and to calculate Spearman's rank correlation coefficient.

Question 2:

· What is the most common type of advertisement and how much space is given to each?

This involves collecting two sets of data:

· Number of times a pre-defined type of advert occurs : This will be done simply by looking through the paper and making a tally chart.

· Area devoted to each pre-defined advert type: Whilst making the tally chart, the area of each advert will also be recorded in square centimetres. These results can then be added up to give the total area devoted to each advert type.

To make any calculations accurate enough to draw a valid conclusion at least twenty sets of data from each paper will need to be collected. The only fair way to do this is to collect data from the whole of both papers, as this gives a much better picture of how much advert space there is and will provide at least twenty sets of data from each paper.

The best way to compare the data collected is to draw two sets of comparative pie charts, one set comparing the number of adverts of each type and the other comparing the area devoted to each type.

Question 3:

· What are the dispersion and averages of the number of words in each article and how do they differ between the two newspapers?

This involves collecting one set of data:

· The number of words : This will be done by counting the number of words in the first sentence as this usually gives a good indication of the depth of the article. The data will be collected in a grouped frequency table.

To make any calculations accurate enough to draw a valid conclusion, at least twenty sets of data from each paper will need to be collected. Fifty samples in total will therefore be taken across the two papers as a stratified random sample, with the number of samples divided proportionally between the two papers. For each sample a page number will be randomly generated and the first article on that page sampled.
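The proportional split and the random choice of pages could be sketched as follows. The page counts are hypothetical, and the sketch assumes that each paper's share of the fifty samples is proportional to its number of pages and that no page is chosen twice. Neither point is stated in the plan, and the function names are illustrative.

```python
import random


def proportional_allocation(total, strata_sizes):
    """Split `total` samples across strata in proportion to their sizes.

    Largest-remainder rounding keeps the allocations summing to `total`.
    """
    population = sum(strata_sizes.values())
    exact = {name: total * size / population for name, size in strata_sizes.items()}
    allocation = {name: int(share) for name, share in exact.items()}
    shortfall = total - sum(allocation.values())
    # Give the leftover samples to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda name: exact[name] - allocation[name], reverse=True)
    for name in by_remainder[:shortfall]:
        allocation[name] += 1
    return allocation


def random_pages(n_pages, k, rng):
    """Pick `k` distinct page numbers from 1..n_pages at random."""
    return sorted(rng.sample(range(1, n_pages + 1), k))


# Hypothetical page counts, for illustration only.
page_counts = {"The Times": 32, "The Telegraph": 28}
allocation = proportional_allocation(50, page_counts)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
pages = {paper: random_pages(page_counts[paper], k, rng) for paper, k in allocation.items()}

print(allocation)  # {'The Times': 27, 'The Telegraph': 23}
```

With these assumed page counts both papers still receive more than the twenty samples required.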

The best way to compare these two sets of data will be to use standard deviation, mean deviation, the quartile ranges, the averages (mean, median, mode), and histograms with box and whisker diagrams.
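As a rough sketch of how these measures might be calculated, the following uses Python's standard statistics module. The example values are made up for illustration and are not taken from either newspaper. The sketch works on raw values rather than a grouped frequency table, and it uses one particular convention for the quartiles; textbooks differ on this, so hand-calculated quartiles may not match exactly.

```python
import statistics


def summarise(values):
    """Return the averages and measures of dispersion for one data set."""
    mean = statistics.mean(values)
    # The "inclusive" method treats the data as the whole population.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": mean,
        "median": median,
        "modes": statistics.multimode(values),
        "standard deviation": statistics.pstdev(values),
        "mean deviation": sum(abs(v - mean) for v in values) / len(values),
        "lower quartile": q1,
        "upper quartile": q3,
        "interquartile range": q3 - q1,
    }


# Made-up values for illustration only; not taken from either newspaper.
example = [2, 4, 4, 4, 5, 5, 7, 9]
summary = summarise(example)

for name, value in summary.items():
    print(f"{name}: {value}")
```

Running the same function on the word counts from each paper would give two summaries that can be compared measure by measure.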

Collection, selection, presentation, analysis and interpretation and evaluation of data

Question 1:

To make the calculations accurate enough to draw a valid conclusion, twenty sets of data from each paper were collected. As each page has approximately three articles on it and both newspapers have roughly thirty pages, a systematic sample of every fourth page was used to provide enough data to support any conclusion. As with all continuous data, each column length has a maximum and a minimum possible error, so the measurements are not exact. However, these errors are too small to noticeably affect any of the statistical calculations.

THE TIMES                            THE TELEGRAPH

Font Size    Column Length (cm)      Font Size    Column Length (cm)

72           57                      90           45
36           11                      48           20
24           16                      36           17
48           59                      48           20
72           68                      72           46
36           20                      20           5
28           20                      36           30
72           34                      72           75
36           14                      30           24
80           36                      28           16
72           49                      48           42
36           21                      24           6
28           20                      36           22
48           35                      72           38
36           18                      28           14
90           83                      72           34
24           8                       90           80
60           46                      28           19
36           18                      90           104
72           34                      72           67

It was found that not every page had three articles on it, so fewer samples were available than had been hoped, but twenty samples were still collected from each paper.

Scatter diagrams

Firstly, a scatter diagram was drawn for each of the two sets of data. This consists of laying out each of the two measures along one of the axes of the grid, then considering each item in turn. The two measures for that item act exactly like an ordered pair and thus like the coordinates of a point on the grid. Each item considered is thereby linked to one point on the grid, and that point can be plotted in the normal way. From the scatter of points that is built up a pattern can be identified, and a line of best fit summarises this trend. To plot this line the mean point (average font size, average column length) was plotted first and the line was drawn through it. The two diagrams were then compared.
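The mean points can be checked from the table of results with a short sketch. The line of best fit in the project is drawn by eye through the mean point; the least-squares line calculated here is a substituted technique, used only because it can be computed and because it also passes through the mean point. The function names are illustrative.

```python
# Data copied from the table of results: (font size, column length in cm).
times = [(72, 57), (36, 11), (24, 16), (48, 59), (72, 68), (36, 20), (28, 20),
         (72, 34), (36, 14), (80, 36), (72, 49), (36, 21), (28, 20), (48, 35),
         (36, 18), (90, 83), (24, 8), (60, 46), (36, 18), (72, 34)]
telegraph = [(90, 45), (48, 20), (36, 17), (48, 20), (72, 46), (20, 5), (36, 30),
             (72, 75), (30, 24), (28, 16), (48, 42), (24, 6), (36, 22), (72, 38),
             (28, 14), (72, 34), (90, 80), (28, 19), (90, 104), (72, 67)]


def mean_point(pairs):
    """Return (mean font size, mean column length)."""
    xs, ys = zip(*pairs)
    return sum(xs) / len(xs), sum(ys) / len(ys)


def least_squares_line(pairs):
    """Return (gradient, intercept) of the least-squares line.

    This line always passes through the mean point.
    """
    mx, my = mean_point(pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    gradient = sxy / sxx
    return gradient, my - gradient * mx


print(mean_point(times))      # (50.3, 33.35)
print(mean_point(telegraph))  # (52.0, 36.2)
print(least_squares_line(times))
print(least_squares_line(telegraph))
```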

Spearman's Rank Correlation Coefficient

Another method of finding the relationship between two sets of data is to use Spearman's Rank Correlation Coefficient. Each distribution must first be put into an order of merit. Each item being considered then has two ranks allocated to it, and the difference between these two ranks can be found. If the symbol d is used to represent this difference then the coefficient of rank correlation can be written as:

\[ r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} \]

where n is the number of items in the distribution.

If two or more measures in one distribution are equal it is convenient, though not mathematically justifiable, to allocate them a rank which is the average of the ranks they would have occupied if they had been different. For example, if the third and fourth measures in a distribution are equal they would both be allocated the rank 3.5, and if the fifth, sixth and seventh are equal they would all be allocated the rank 6.
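The ranking rule for equal measures and the coefficient itself can be sketched as follows. The sketch assumes the standard formula, 1 - 6Σd² / (n(n² - 1)), and ranks the largest value as 1; the function names are illustrative.

```python
def average_ranks(values):
    """Rank values from 1 (largest) downwards.

    Equal values share the average of the ranks they would otherwise occupy.
    """
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend `end` to cover every value tied with the one at `pos`.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + 1 + end + 1) / 2  # average of the 1-based ranks pos+1 .. end+1
        for i in order[pos:end + 1]:
            ranks[i] = shared
        pos = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rank correlation coefficient: 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2
                    for rx, ry in zip(average_ranks(xs), average_ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Third and fourth measures equal: both receive rank 3.5.
print(average_ranks([90, 80, 70, 70, 60]))  # [1.0, 2.0, 3.5, 3.5, 5.0]
# Fifth, sixth and seventh measures equal: all receive rank 6.
print(average_ranks([100, 90, 80, 70, 60, 60, 60, 50]))
# Perfect agreement and perfect disagreement between the rankings.
print(spearman([1, 2, 3, 4], [10, 20, 30, 40]))  # 1.0
print(spearman([1, 2, 3, 4], [40, 30, 20, 10]))  # -1.0
```

Passing the font sizes and column lengths from one paper to `spearman` would give the coefficient for that paper.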

The easiest way to represent this data and to calculate ...
