4. Thirty people will be selected at random, regardless of sex or age, and asked to read each article aloud for thirty seconds. After they complete this task, the number of words they managed to read will be counted and recorded. The article with the fewest words read will be assumed to be the more difficult one to read, owing to a number of contributing factors.
- No data will be collected for this hypothesis because the data collected in hypothesis 2 will be used instead. After producing a form of data presentation from this information, the range will be calculated.
- All the paragraphs will be included in this sample because there is not a large number of paragraphs in an article. The number of sentences per paragraph will be tallied and recorded in a table, and then the average will be calculated and compared.
Note: Hyphenated words are counted as one word. E.g. fire-fighters is one word.
Numbers are counted as one word and when counting syllables the entire phrase is included. E.g. 2001 is one word and ‘two thousand and one’ is five syllables.
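The word-counting rules above can be sketched in Python. This is only an illustration, since the counting for the log book was done by hand; the function name `count_words` and the sample sentence are hypothetical.

```python
def count_words(text: str) -> int:
    """Count words, treating hyphenated words and numbers as one word each."""
    # Splitting on whitespace keeps "fire-fighters" and "2001" as single tokens.
    tokens = text.split()
    # Ignore tokens with no letters or digits, such as a free-standing dash.
    return sum(1 for token in tokens if any(ch.isalnum() for ch in token))


sample = "In 2001 the fire-fighters arrived."  # hypothetical sentence
print(count_words(sample))
```

Counting syllables in a number would need the number written out in words first, which this sketch does not attempt.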
Results
‘The Times will have more syllables per word than The Independent.’
(Log Book pg. 3 & 4)
The systematic method of data sampling provided me with well over 150 words, but because one newspaper provided 156 words and the other provided 233 words, I had to convert the frequencies to percentages so that they could be drawn on the same graph and compared.
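The conversion to percentages can be sketched as follows. Only the sample totals of 156 and 233 come from my data; the individual frequencies below are hypothetical and are there purely to show the arithmetic.

```python
def to_percentages(frequencies: dict) -> dict:
    """Convert raw frequencies to percentages of the sample total."""
    total = sum(frequencies.values())
    return {value: 100 * count / total for value, count in frequencies.items()}


# Hypothetical tables (syllables per word -> number of words).
# Only the totals, 156 and 233, match the samples described above.
sample_156 = {1: 100, 2: 36, 3: 14, 4: 6}
sample_233 = {1: 150, 2: 50, 3: 23, 4: 10}

for name, table in (("156-word sample", sample_156), ("233-word sample", sample_233)):
    shares = to_percentages(table)
    print(name, {value: round(share, 1) for value, share in shares.items()})
```

Once both samples are expressed as percentages, each adds up to 100, so the two can be plotted on the same axes even though the raw totals differ.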
A graph can be drawn of the cumulative frequency distribution, giving a curve with a characteristic shape called a cumulative frequency curve. Both curves have a similar shape, which is the shape produced by values that follow a 'normal distribution'. This means that the data, if plotted as a frequency curve, would give a symmetrical curve peaking in the middle. The cumulative frequency curve for The Independent has a slightly steeper incline, but its median is at the same position along the x-axis as that of The Times. This suggests that The Independent is fractionally closer to a 'normal distribution' than The Times, and that the percentage values per interval in The Independent are on average higher than those in The Times. The curve for The Independent is also smoother towards the end than that of The Times, which again suggests that the decline from the peak is more regular in The Independent.
The interquartile range discards the highest and lowest values and concentrates on the middle values; this shows how spread out the main part of the data is. The interquartile range for The Times is 1.25 and for The Independent it is 1.1, so the two differ by only 0.15.
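As a rough sketch of how a cumulative frequency curve is built and read, the code below keeps a running total of the percentages and then reads off a value by joining the plotted points with straight lines. The class boundaries and percentages are hypothetical, and the function names are my own; a hand-drawn smooth curve may give slightly different readings.

```python
from itertools import accumulate


def cumulative_points(boundaries, percentages):
    """Pair each class boundary with the running total reached at that boundary.
    The curve starts at (first boundary, 0)."""
    totals = [0.0, *accumulate(percentages)]
    return list(zip(boundaries, totals))


def read_off(points, target):
    """Estimate the x-value where the curve reaches `target` percent,
    using straight lines between neighbouring points."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= target <= y1 and y1 > y0:
            return x0 + (target - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("target percentage lies outside the curve")


# Hypothetical class boundaries and percentage frequencies.
boundaries = [0.5, 1.5, 2.5, 3.5, 4.5]
percentages = [60, 25, 10, 5]

points = cumulative_points(boundaries, percentages)
lower_quartile = read_off(points, 25)
median = read_off(points, 50)
upper_quartile = read_off(points, 75)
print(round(lower_quartile, 2), round(median, 2), round(upper_quartile, 2))
```

The quartiles are read at 25% and 75% of the total and the median at 50%, which is the same procedure as drawing horizontal lines across the graph.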
Using the results from my table I calculated the following.
The Times.
Mean = 1.61 to 2 d.p.
Median = 1
Mode = 1
The Independent
Mean = 1.51 to 2 d.p.
Median = 1
Mode = 1
These show that although the percentage values in The Independent are on average higher per interval, the overall mean is higher in The Times by 0.10.
My hypothesis has been proven true: The Times does have more syllables per word than The Independent.
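The three averages can be worked out from a frequency table as in the sketch below. The table is hypothetical (my real figures are in the log book); it is chosen to show how the mean can sit above 1 while the median and mode are both 1, as in my results.

```python
from statistics import mean, median, mode


def expand(frequencies: dict) -> list:
    """Turn a frequency table into the full list of observations."""
    return [value for value, count in sorted(frequencies.items()) for _ in range(count)]


# Hypothetical frequency table (syllables per word -> number of words).
table = {1: 12, 2: 5, 3: 2, 4: 1}
observations = expand(table)

print("Mean =", round(mean(observations), 2))
print("Median =", median(observations))
print("Mode =", mode(observations))
```

Because most words have one syllable, the median and mode stay at 1, and only the mean responds to the smaller number of longer words.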
‘The number of letters per word will be greater in The Independent than The Times.’
(Log Book pg. 5 & 6)
Although the RAN function was time-consuming, it provided random numbers, and I did not have to calculate percentages because I had the same quantity of data from both newspapers. I chose to plot a bar graph because all the intervals were equal and it allowed me to compare the variation of letters per word in each interval with ease.
The red bars indicate The Times and the green bars indicate The Independent. Eleven of the thirteen intervals contained data; in six of these eleven the greater frequency belongs to The Independent, whilst in only five it belongs to The Times. Straight away it can be seen that the averages will be very close, and that The Times is likely to have the greater average because its bars reach the highest frequencies of 12 and 11.
The bar chart reveals that neither newspaper has a typical distribution of values. The frequencies do not increase steadily towards the middle of the range; instead they vary from 9 to 3, although they are lower at the extremes of the range. The Times peaks over the 3-5 group whereas The Independent peaks over a larger 2-6 group. This suggests that The Independent has on average fewer letters per word than The Times.
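A computer equivalent of picking words with random numbers might look like the sketch below. It is an analogue of the calculator's RAN function, not the method I actually used; the function name `sample_words`, the seed and the short article text are all hypothetical.

```python
import random
from typing import List, Optional


def sample_words(words: List[str], sample_size: int, seed: Optional[int] = None) -> List[str]:
    """Pick `sample_size` distinct word positions at random and return those words."""
    rng = random.Random(seed)
    positions = rng.sample(range(len(words)), sample_size)
    # Sort the positions so the chosen words appear in article order.
    return [words[i] for i in sorted(positions)]


# Hypothetical article text, split into words.
article = "the quick brown fox jumps over the lazy dog near the river bank".split()

chosen = sample_words(article, 5, seed=1)
letters_per_word = [len(word) for word in chosen]
print(chosen, letters_per_word)
```

Taking the same sample size from each newspaper is what removes the need for percentages, because the frequencies can then be compared directly.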
Using the results from my table I calculated the following.
The Times.
Mean = 4.92 to 2 d.p.
Median = 5
Mode = 3
The Independent
Mean = 4.88 to 2 d.p.
Median = 5
Mode = 2
These show that The Times does have a higher average number of letters per word than The Independent, although the difference is fractional at only 0.04. This is also supported by the mode: the most frequently occurring number of letters per word in The Times is 3, whereas in The Independent it is 2.
Therefore my hypothesis is incorrect. The most common word length in The Independent was two letters, whereas in The Times it was three letters.
‘There will be more words per sentence in The Times than The Independent.’
(Log Book pg. 7 & 8)
This was an easy method of data collection because it simply involved counting the number of words in every sentence. It provided a wide range of results, and I chose comparative pie charts because they allow you to compare not only the percentage components but also the totals of the components: the areas of the pie charts must be proportional to the totals.
It can clearly be seen from the data that The Independent has 4% more sentences with 11 – 20 words than The Times. In both newspapers the interval of 11 – 20 words per sentence is the most frequent, and it occupies over half of each pie chart.
The comparative pie charts also allow you to compare the percentage of the total article that each interval represents. The charts show that if an interval has an equal frequency in The Times and The Independent, it represents a higher percentage of the total in The Independent than in The Times. A frequency of 1 represents 4% in The Independent, but only 3% in The Times.
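The two ideas behind comparative pie charts can be sketched in a few lines: the share that one sentence represents is 100 divided by the total, and the radii are scaled by the square root of the ratio of the totals so that the areas are in proportion. The totals of 25 and 33 sentences are an assumption, chosen only because they are consistent with the 4% and 3% quoted above; the base radius of 5 is also arbitrary.

```python
import math


def share_of_one(total: int) -> float:
    """Percentage of the chart that a single sentence represents."""
    return 100 / total


def scaled_radius(base_radius: float, base_total: int, other_total: int) -> float:
    """Radius of the second pie so that the two areas are in the ratio of the totals."""
    return base_radius * math.sqrt(other_total / base_total)


# Assumed totals, not taken from the log book.
independent_total = 25
times_total = 33

independent_radius = 5.0  # arbitrary radius for the smaller chart
times_radius = scaled_radius(independent_radius, independent_total, times_total)

print(round(share_of_one(independent_total)), round(share_of_one(times_total)))
print(round(times_radius, 2))
```

The square root is needed because area depends on the radius squared; scaling the radius directly by the ratio of the totals would exaggerate the larger sample.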
Using the results from my table I calculated the following.
The Times.
Mean = 22.27 to 2 d.p.
Median interval = 11 - 20
Mode = 11 – 20
The Independent
Mean = 17.35 to 2 d.p.
Median Interval = 11 - 20
Mode = 11 – 20
These results show that The Times has on average 4.92 more words per sentence than The Independent. This may have been influenced by one abnormally high or low value. As they both have the same mode and median interval, the results should be re-recorded in another table with smaller intervals, which would show a more distinct difference.
‘The Independent will be more difficult to read than The Times.’
(Log Book pg. 9 & 10)
This was an easy hypothesis for which to collect data because people showed an unusual enthusiasm for participating. I chose to use a histogram because the final interval had to be expanded, so it does not have a width of 10 like all the other intervals but instead has a width of 20. A bar graph would have been incorrect here because the area of the bar would not have been proportional to the frequency. The height required in a histogram is calculated by dividing the frequency of the interval by its width. In this case, the frequencies of the first three intervals were divided by ten and that of the last interval by twenty.
The two histograms show that The Times has the more variable results, with a range of seventeen, while The Independent has a smaller range of seven. They also show that The Independent has more consistent results, with a smaller frequency in the 109.5 – 129.5 interval than The Times but a higher frequency in the 79.5 – 89.5 and 89.5 – 99.5 intervals, showing that The Independent has on average fewer words read in 30 seconds.
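The bar heights for a histogram with unequal intervals can be worked out as in this sketch. The class boundaries are the ones used in this section; the frequencies are hypothetical values for thirty readers, since the real figures are in the log book.

```python
def frequency_densities(boundaries, frequencies):
    """Bar height for each class = frequency divided by the size of the interval."""
    widths = [upper - lower for lower, upper in zip(boundaries, boundaries[1:])]
    return [frequency / width for frequency, width in zip(frequencies, widths)]


# Class boundaries from the intervals used for this hypothesis.
boundaries = [79.5, 89.5, 99.5, 109.5, 129.5]
# Hypothetical frequencies for thirty readers.
frequencies = [3, 8, 12, 7]

heights = frequency_densities(boundaries, frequencies)
for lower, upper, height in zip(boundaries, boundaries[1:], heights):
    print(f"{lower} to {upper}: height {height:.2f}")
```

With these figures the last class holds 7 readers but is drawn at a height of 0.35, lower than the second class with 8 readers at 0.8, because it is twice as wide; the areas, not the heights, stay proportional to the frequencies.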
Using the results from my table I calculated the following.
The Times.
Mean = 110.98 to 2 d.p.
Median interval = 99.5 – 109.5
Mode = 99.5 – 109.5
The Independent
Mean = 103.50 to 2 d.p.
Median Interval = 99.5 – 109.5
Mode = 99.5 – 109.5
These results show that The Times was read at an average of 110.98 words per 30 seconds, whilst The Independent was read at a slower 103.50 words per 30 seconds. The median interval and modal interval are the same for both newspapers, showing that although the two differ slightly they are still quite similar in reading difficulty.
The reading difficulty would have been influenced by many different factors, including word length, sentence length, clauses etc.
By the criterion set out in my method, the article with fewer words read is the more difficult one. The results therefore support my hypothesis: The Independent is more difficult to read than The Times, although the difference is small.
‘The number of letters in a word will be more varied in The Independent than The Times.’
(Log Book pg. 5 & 6)
This did not require data collection because it had already been collected for a previous hypothesis and these results were adequate for usage. There is a wide range of results and I chose a cumulative frequency graph because it shows range and totals clearly and efficiently.
The graph shows that The Independent has the steepest initial gradient, indicating that a larger percentage of its results were words containing three letters or fewer. The curves cross at 4, and then The Times rises more quickly than The Independent, showing that The Times has a greater percentage of words with 4 to 9 letters. The curves meet again at 9 and remain equal until 11, where they separate briefly before rejoining at 13. Overall The Times and The Independent have very similar gradients, which indicates that the variation in the number of letters in the two newspapers is very similar.
From the graph I calculated the following:
The Times
Lower quartile: 2
Median: 4.6
Upper Quartile: 6.2
The Independent
Lower quartile: 2.6
Median: 4.2
Upper Quartile: 5.7
From these I calculated the interquartile range (IQR):
The Times IQR = 6.2 – 2
= 4.2
The Independent IQR = 5.7 – 2.6
= 3.1
The range of both The Times and The Independent is 12. Because this figure was the same, I calculated the interquartile range, which eliminates the extremes and concentrates only on the middle 50% of the cumulative frequency. This showed that The Times has a greater interquartile range of 4.2 compared with The Independent, which has an interquartile range of 3.1. This figure proves my hypothesis incorrect because The Times has a larger interquartile range than The Independent.
‘The Independent will have more sentences per paragraph than The Times.’
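The interquartile range calculation can be checked with a short sketch using the quartiles read from my graph. The function name and the dictionary layout are my own choices for illustration.

```python
def interquartile_range(lower_quartile: float, upper_quartile: float) -> float:
    """IQR = upper quartile minus lower quartile (the spread of the middle 50%)."""
    return upper_quartile - lower_quartile


# Quartiles read from the cumulative frequency graph: (lower, upper).
quartiles = {
    "The Times": (2.0, 6.2),
    "The Independent": (2.6, 5.7),
}

iqr = {paper: interquartile_range(lq, uq) for paper, (lq, uq) in quartiles.items()}
for paper, value in iqr.items():
    print(paper, "IQR =", round(value, 1))

# The paper with the larger IQR has the more varied word lengths.
print("More varied:", max(iqr, key=iqr.get))
```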
(Log Book pg. 11 & 12)
The data was collected rapidly because the method was very simple. It provided relatively few results, and this will have to be taken into account in my conclusions. I chose comparative pie charts because they allow you to compare not only the percentage components but also the totals of the components: the areas of the pie charts must be proportional to the totals. This allows different quantities of results to be displayed, and because there was relatively little data for this hypothesis it was ideal.
It can clearly be seen from the data that the largest percentage of the data in The Independent is in the interval of 3 sentences per paragraph, whereas the largest percentage in The Times is in the interval of 2 sentences per paragraph. This already suggests that The Times has shorter paragraphs on average. The Independent has 11% fewer paragraphs with two sentences and The Times has 14% fewer paragraphs with three sentences. Confirming this result is the complete absence of any paragraph with four sentences in The Times.
The pie charts show that The Times yields more consistent and regular results owing to its shorter range of 3 intervals and 13%. The Independent has a range of 4 intervals and 41% in percentages.
The comparative pie charts also allow you to compare the percentage of the total article that each interval represents. The charts show that if an interval has an equal frequency in The Times and The Independent, it represents a higher percentage of the total in The Times than in The Independent. A frequency of 1 represents 7% in The Times, but only 6% in The Independent.
Using the results from my table I calculated the following.
The Times.
Mean = 2.07 to 2 d.p.
Median = 2
Mode = 2
The Independent
Mean = 2.41 to 2 d.p.
Median = 3
Mode = 3
These results confirm my conclusion as The Independent has a higher mean, median and mode than The Times. If I had more data, it would have reduced the effect of any anomalous results in the calculations.
Evaluation
In measuring anything we are limited in our accuracy by the equipment available and our own human limitations. It is important that we are aware of what error is implied by our measurements and what the maximum possible error is likely to be. Throughout the coursework all data has been subject to human error, and if more time had been available I would have repeated my data collection several times to reduce the likelihood of a miscalculation.
The greatest problems encountered during the coursework were the lack of time and the lack of data. The lack of data limited two of my hypotheses, and the only solution is to use a larger source of data instead of a single article, possibly a whole newspaper or simply a longer, more substantial article. For some hypotheses it was easy to collect over 150 samples, and this reduced the effect of any anomalies in the results.
If more time had been available, I would have carried out more thorough and extensive data collection. I would also have chosen some more complex hypotheses to create a greater challenge, as I believe they would have been more interesting to research.
Although a more complex hypothesis may expand an investigation, the results of all the questions could also be improved by a simple method: collecting more data from the papers. This would improve the accuracy and reliability of the conclusions and make them more valid.