Which paper is easier to read, the tabloid, or the broadsheet?
Data Handling Coursework.
In this investigation I shall first create a hypothesis comparing the content of a tabloid newspaper with that of a broadsheet. I shall then decide what type of data to collect, based on which data is relevant to my hypothesis, and how much data to collect so that the outcome gives an accurate answer. I will also need to decide how to choose my sample so that the comparison of the two newspapers is fair. The data will then be recorded in tables, and displayed in graphs appropriate to the averages and ranges needed to answer the hypothesis.
Hypothesis:
Which paper is easier to read, the tabloid, or the broadsheet?
My hypothesis is that the tabloid is easier to read than the broadsheet. By investigating the three main factors affecting how easy or difficult a paper is to read, the average and spread of the data should help to identify this. When counting the number of letters in the words of the articles, I think the tabloid will have the lower average and range, as shorter words and less variation in word length would make it easier to read. When counting how many words are in each sentence, I think the tabloid will again have the lower average and range, as shorter, more consistent sentences would also make it easier to read. When timing how long it takes to read the articles, I think the tabloid will have the lower average, as the easier paper should take less time to read. The tabloid should also have the lower range: since the samples come from different people, a lower range would mean the times are more consistent, indicating that the paper is easier to read.
Plan / Methodology :
Firstly, I will collect the relevant data. I will count the letters of the first 50 words of an article in each paper that covers the same story. I will choose my sample by taking the first articles in each paper that match up to the same story and are at least 50 words long. Choosing an article on the same story from each paper makes the comparison fair, as the type of story can affect how easy it is to read, e.g. sport compared to politics - politics is usually harder to read as it often has longer, more complicated words, longer sentences, etc.
Then I shall count the number of words in the first 30 sentences, moving to the adjacent article if the initial article has fewer than 30 sentences. The two articles cover the same story to keep the investigation fair (as explained previously).
Also, I shall take a story from each paper and present both to 30 Year 11 students. They will each read the first 400 words of each story, and I will time how long it takes them to read that number of words.
I shall initially record the results of the letters per word in a frequency table. Once I have done this I shall work out the mean and standard deviation of the data. The mean will show me the average number of letters per word, and may agree or disagree with my hypothesis. The standard deviation will give me an accurate measure of the spread of the number of letters per word, as it uses all of the data. From this I can state which paper has a more consistent number of letters per word: the larger the standard deviation, the less consistent the word lengths.
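The calculation described above can be sketched in Python. The frequencies below are made-up illustrative values, not the actual coursework data; they show how the mean and standard deviation come out of a letters-per-word frequency table.

```python
from math import sqrt

# letters per word -> frequency (hypothetical sample of 50 words)
freq = {1: 2, 2: 6, 3: 12, 4: 11, 5: 8, 6: 5, 7: 3, 8: 2, 9: 1}

n = sum(freq.values())                        # total number of words sampled
mean = sum(x * f for x, f in freq.items()) / n

# standard deviation: square root of the mean squared deviation,
# weighting each word length by its frequency
variance = sum(f * (x - mean) ** 2 for x, f in freq.items()) / n
std_dev = sqrt(variance)

print(round(mean, 2), round(std_dev, 2))      # rounded to two decimal places
```

The same two lines of arithmetic apply to both papers' tables, so the results can be compared directly.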
As there is a wide variety in the number of words per sentence, I shall group my data. Because the data is grouped and I predict an uneven trend (there are probably more 10-word sentences than 40-word sentences), I shall make some groups larger than others to avoid having too many groups. I shall then work out the frequency density to obtain fair readings across the unequal groups, and place my results on a histogram, which is the most appropriate way to display frequency density. From the data I shall also work out the mean and standard deviation. The mean will show me the average number of words per sentence in each newspaper, which will help me to compare the two papers; if it backs my hypothesis, the tabloid should have the lower mean. The standard deviation will show the spread across all 30 sentences in each paper. Hopefully the tabloid will have the smaller standard deviation, as this spread indicates the diversity of sentence length, which could add to the difficulty of reading.
For the time it takes to read 400 words, I shall record my results initially in a tally/frequency table. I shall then group the data and present it in a grouped cumulative frequency table, before plotting it on a cumulative frequency polygon from which I can read off the median and interquartile range. The median is the best average for this data as it ignores anomalous results and gives a fair average of the samples for comparing the two papers. The interquartile range is appropriate as it gives a fair measure of spread using the middle 50% of the data. This should show me how consistently the 30 pupils read the 400 words of each paper: the larger the range, the less consistent the reading times, which may indicate that the paper is harder to read. All averages will be rounded to two decimal places.
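Reading the median and quartiles off a cumulative frequency polygon is equivalent to linear interpolation between the plotted points. The sketch below uses made-up reading times in seconds (not the real sample of 30 pupils) to show the idea.

```python
# class upper boundaries (seconds) and cumulative frequencies - illustrative
bounds = [60, 80, 100, 120, 140, 160]
cum_freq = [0, 5, 13, 22, 28, 30]

def interpolate(target):
    """Return the reading time whose cumulative frequency equals `target`,
    by linear interpolation between neighbouring polygon points."""
    for i in range(1, len(cum_freq)):
        if cum_freq[i] >= target:
            lo_b, hi_b = bounds[i - 1], bounds[i]
            lo_cf, hi_cf = cum_freq[i - 1], cum_freq[i]
            return lo_b + (target - lo_cf) / (hi_cf - lo_cf) * (hi_b - lo_b)

n = cum_freq[-1]                       # 30 pupils in this example
median = interpolate(n / 2)            # middle value
iqr = interpolate(3 * n / 4) - interpolate(n / 4)

print(round(median, 2), round(iqr, 2))
```

Because the quartiles use only the middle half of the data, one very slow or very fast reader does not distort the result, which is the reason given above for preferring the median and interquartile range here.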
Analysing the Data:
The data on letters per word has been recorded in two tables. The raw data was first collected in a tally/frequency table; I then grouped the data and placed it in a frequency density table. I obtained the mean and standard deviation for the two papers from the raw data in the first table (graph 1). The broadsheet had the higher mean of 4.82 compared to 4.24. This supports my hypothesis: the paper with the higher mean has, on average, longer words, which suggests that it is harder to read. The broadsheet also had the higher standard deviation, which takes all of the data into account when calculating the spread. This suggests that the broadsheet has less consistent word lengths, supporting my hypothesis that the less consistent the word lengths, the harder the paper may be to read.
The data on words per sentence has been recorded in two tables and two histograms. First, the raw data was recorded in a tally/frequency table. It was then split into five unequal groups and placed in a frequency density table, with the frequency density worked out to give a fair and accurate trend across the groups. To show the difference in trends between the two papers clearly, the data was then plotted on a histogram for each paper. The mean was estimated using the grouped data in the frequency density table (graph 4) and was then worked out accurately using the raw data in the first table (graph 3); the difference between the two papers' estimated means was greater than that between their actual means. The broadsheet had the higher mean at 28.84 compared to the tabloid's 21.61. This supports my hypothesis, because the higher mean shows that, on average, that paper has more words per sentence, which may make it harder to read. The standard deviation was calculated using the raw data in graph 3. The broadsheet had the larger standard deviation of 13.57 compared to the tabloid's 7.09. This suggests that the tabloid, with the lower standard deviation, has the more consistent number of words per sentence, potentially making it easier to read than the broadsheet, which supports my hypothesis.
The samples of the time taken to read the two similar articles were placed first in raw form in a tally/frequency table, then grouped into even classes and put into a cumulative frequency table in preparation for transfer onto a cumulative frequency polygon. The medians and interquartile ranges were all worked out from the cumulative frequency polygon. As predicted by my hypothesis, the broadsheet had the higher of the two medians at 2.02 compared to the tabloid's 1.53. The higher median means that, on average, it takes someone longer to read that article, so that article is harder to read, which agrees with my hypothesis. The broadsheet also has the higher interquartile range, at 67 seconds compared to the tabloid's 30 seconds. This suggests that the time taken to read the broadsheet is more varied, which agrees with my hypothesis. This factor indicates that the broadsheet newspaper is harder to read.
Conclusion:
From this investigation I have analysed three factors relevant to my hypothesis. For the letters per word data, the averages and ranges of the two papers support my hypothesis that the broadsheet is harder to read than the tabloid, as do the averages and ranges of the other two factors. These averages and ranges were calculated using the graphs, tables and calculations described in the previous section.
The aim of this investigation was to compare the two types of newspaper: to answer the study question with a hypothesis, to gather the data required, and to display and calculate the relevant information in order to test whether my hypothesis was accurate. I conclude that the aim of the investigation has been fulfilled and my hypothesis supported. As the information in the previous section shows, the answer to the question is that the broadsheet newspaper is the harder of the two to read, as indicated by the outcomes of the three main factors in this investigation.
Evaluation:
On the whole this project has been successful, yet a great deal could have been done to improve it. The main success was that the study question was answered and my hypothesis proved plausible. There were no real problems in handling the data, but the gathering and presentation of the data could have been improved. I only used and calculated one average and one measure of spread per factor, which could have been expanded; e.g. for words per sentence, I could have used the mode and median as well as the mean to analyse the data. I could also have displayed this data not just in a histogram, but possibly in a radar diagram or pie chart to show different trends to analyse.
When gathering my data on how long it takes to read 400 words of the two articles, instead of just using my maths class, who may have a higher or lower literacy than other sets, I could have chosen 30 random people of different abilities and possibly different ages, to make the data less biased and a better representation of the population that reads the two newspapers. I could also have transferred the data onto a histogram as well as the cumulative frequency polygon, in order to obtain the mean and standard deviation as well as the median, since the mean and standard deviation use all of the data, whereas the interquartile range measures spread using only the middle half of it.
When gathering the data, if I had taken a larger sample - 100 words instead of 50 - the statistics would have been more valid and possibly more accurate than with the smaller sample.