By Samuel Bell
Data Handling Coursework
Newspaper Comparisons
Aim: In this coursework I am going to compare the readability of three newspapers: a broadsheet (The Times), an informative tabloid (Daily Mail), and a tabloid (Daily Mirror).
Hypothesis: 1) I predict that broadsheet newspapers use longer words than tabloid newspapers.
2) I predict that broadsheet newspapers have longer sentences than tabloid newspapers.
Planning: In my coursework I am going to look at three newspapers of different types: one broadsheet and two tabloids. This will give me a reliable basis for comparing my results. I purchased all three papers on Thursday, March 18th 2004. I chose a Thursday because it will have normal news, whereas on a Saturday or Monday there will be more sport or entertainment. This may affect the readability, as that content is targeted at a particular audience rather than a general one. Taking the papers from the same day makes the investigation fairer: no paper will have extra news (e.g. sport on Monday), and the articles will cover similar stories, so it will be a fair comparison.
I will divide each newspaper into four sections: News, Entertainment, Sport, and Other. This will give me a fair, wide range of results. After I have counted how many pages are in each section, I will use stratified sampling to work out how many words to take from each section, in proportion to its number of pages. Within each section I will use simple random sampling: I will use the Ran# button on my calculator to select a page, then in the same way an article on that page, then a paragraph, and finally a word. This word will be my starting point. From there I will count the number of words I have worked out using stratified sampling. If I come to the end of the article, I will go back to the top of it and carry on counting from there.
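The allocation step described above can be sketched in Python. This is only an illustration under stated assumptions: the page counts and the total sample size of 100 words are hypothetical (the coursework does not give them), and the function names are made up for the example. `pick_random` imitates scaling the calculator's Ran# value to choose a page, article, paragraph or word.

```python
import random


def stratified_allocation(pages_per_section, sample_size):
    """Share sample_size between sections in proportion to their page counts."""
    total_pages = sum(pages_per_section.values())
    # Exact (possibly fractional) share for each section.
    exact = {s: sample_size * p / total_pages for s, p in pages_per_section.items()}
    # Start from the rounded-down shares...
    allocation = {s: int(x) for s, x in exact.items()}
    # ...then give the leftover words to the largest fractional remainders,
    # so the shares always add up to sample_size.
    leftover = sample_size - sum(allocation.values())
    by_remainder = sorted(exact, key=lambda s: exact[s] - allocation[s], reverse=True)
    for s in by_remainder[:leftover]:
        allocation[s] += 1
    return allocation


def pick_random(n_items, rng=random):
    """Pick an item number from 1 to n_items by scaling a random value in [0, 1)."""
    return int(rng.random() * n_items) + 1


# Hypothetical page counts and sample size, for illustration only.
example_pages = {"News": 20, "Entertainment": 8, "Sport": 10, "Other": 2}
example_allocation = stratified_allocation(example_pages, 100)
example_page = pick_random(example_pages["News"])
```

With these made-up figures, News has half of the pages and so receives half of the 100 words.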
To analyse the results for hypothesis one, I will first record the data in a tally chart, forming a frequency table. I will then create an ordered stem and leaf diagram. This is good because it shows the distribution and retains the detail of the data. From the stem and leaf diagram I will calculate the median (Q2), the lower quartile (Q1), the upper quartile (Q3), and the inter-quartile range (Q3-Q1). I will use these measures because they are not distorted by extreme values. From these values I will make a box and whisker diagram for each paper. From this I will be able to identify outliers, which are extreme values: any value that lies more than 1.5 times the inter-quartile range below the lower quartile or above the upper quartile is an outlier. I will still include these values in my results. Box plots represent the data clearly, so comparisons can be made easily.
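As a rough check on the quartile and outlier calculations, the same steps can be sketched in Python. The word lengths below are invented for illustration and are not the coursework data. The quartiles use the standard library's "exclusive" method, which places Q1, Q2 and Q3 at the (n+1)/4, (n+1)/2 and 3(n+1)/4 positions; this is an assumption about the method, since the coursework does not say which quartile rule is used.

```python
import statistics


def box_plot_summary(values):
    """Return the quartiles, inter-quartile range, fences and outliers of a data set."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="exclusive")
    iqr = q3 - q1
    # Fences: 1.5 times the inter-quartile range beyond each quartile.
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in sorted(values) if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "iqr": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers,
    }


# Invented word lengths, for illustration only.
example_lengths = [2, 3, 3, 4, 4, 5, 5, 6, 7, 8, 14]
example_summary = box_plot_summary(example_lengths)
```

For this invented sample Q1 = 3, the median is 5 and Q3 = 7, so the inter-quartile range is 4, the fences are at -3 and 13, and the 14-letter word is the only outlier.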
For the second hypothesis I will record the raw data in a tally chart, which I will transfer into a frequency table. From this I can calculate the mean and the standard deviation. The standard deviation measures how spread out the data is around the mean. I will then draw a cumulative frequency graph for easy comparison of the results. It also allows me to read off the medians and quartiles.
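The calculations from a frequency table can be sketched as follows. This is a minimal illustration: the sentence lengths and frequencies are invented, the table is treated as ungrouped, and the standard deviation uses the formula sqrt(sum(f*x^2)/sum(f) - mean^2), which is assumed to be the one intended here.

```python
import math


def mean_and_sd(freq_table):
    """Mean and standard deviation from a {value: frequency} table."""
    n = sum(freq_table.values())
    sum_fx = sum(x * f for x, f in freq_table.items())
    sum_fx2 = sum(x * x * f for x, f in freq_table.items())
    mean = sum_fx / n
    variance = sum_fx2 / n - mean ** 2
    return mean, math.sqrt(variance)


def cumulative_frequencies(freq_table):
    """Running totals of frequency, in order of value, for a cumulative frequency graph."""
    running = 0
    points = []
    for x in sorted(freq_table):
        running += freq_table[x]
        points.append((x, running))
    return points


# Invented sentence lengths (words per sentence) and frequencies, for illustration only.
example_table = {10: 2, 20: 4, 30: 2}
example_mean, example_sd = mean_and_sd(example_table)
example_points = cumulative_frequencies(example_table)
```

The last cumulative frequency always equals the total number of sentences, which is a quick check that nothing has been missed when tallying.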
Pre-test: I am going to perform a pre-test. I will take 30 words at random from each newspaper to identify any possible problems.
Pre-test for The Times:
Number of letters    Frequency
1                    1
2                    4
3                    5
4                    2
5                    6
6                    2
7                    3
8                    1
9                    3
10                   1
11                   1
12                   0
13                   0
14                   1
15                   0
Total:               30
Pre-test for the Daily Mail:
Number of letters    Frequency
1                    0
2                    7
3                    6
4                    4
5                    3
6                    1
7                    5
8                    1
9                    1
10                   2
11                   0
12                   0
13                   0
14                   0
15                   0
Total:               30
Pre-test for the Mirror:
Number of letters    Frequency
1                    2
2                    5
3                    7
4                    6
5                    4
6                    3
7                    2
8                    0
9                    0
10                   1
11                   0
12                   0
13                   0
14                   0
15                   0
Total:               30
Problems I encountered: During my pre-test I encountered a few problems. I have set these out in a table with the solutions next to them.
Problem encountered    Solution
Titles                 I will not include headlines or titles because they often use extravagant words.
Names                  I will not include names because they are often unusually long or short.
Places                 I will not include places because they are often unusually long or short.
Numbers                I will not include numbers written as digits (e.g. 10), but numbers written as words (e.g. Ten) will be included.
Supplements            I will not include supplements as they cover different subjects.
Captions               I will not include captions as they are often summarised or abbreviated.
Adverts                I will not include adverts as these are not articles.
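The counting rules above can be sketched in Python. This is only an illustration: the word list and the list of excluded names and places are made up, and the function name is hypothetical. Headlines, captions, adverts and supplements are assumed to have been left out already when the text was chosen, so the sketch only handles digits, names and places.

```python
from collections import Counter
import string


def word_length_frequencies(words, excluded=()):
    """Tally word lengths, skipping digits and any excluded names or places."""
    excluded_lower = {w.lower() for w in excluded}
    lengths = []
    for raw in words:
        # Remove punctuation attached to the start or end of the word.
        word = raw.strip(string.punctuation)
        if not word:
            continue
        # Numbers written as digits (e.g. 10) are not counted.
        if any(ch.isdigit() for ch in word):
            continue
        # Names and places are supplied by hand and skipped.
        if word.lower() in excluded_lower:
            continue
        lengths.append(len(word))
    return Counter(lengths)


# Made-up words, for illustration only.
example_words = ["Ten", "people", "left", "London", "at", "10", "noon."]
example_frequencies = word_length_frequencies(example_words, excluded=["London"])
```

Here "Ten" is counted because it is written as a word, while "10" and the place name "London" are skipped.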