Read All About- Analysis & Data Collection

Statistics Coursework: Read All About It

I am investigating whether or not sentences in broadsheet newspapers contain more words than sentences in tabloid newspapers.

My main hypothesis is that sentences contain more words in broadsheet newspapers compared to tabloid newspapers.

I believe this because broadsheet newspapers are generally targeted at a more intellectual audience, and articles in that type of newspaper are more detailed and informative.

My sub hypothesis is that there are more pictures per page in a tabloid newspaper compared to a broadsheet newspaper.

To avoid confusion, this count will also include pictures in adverts. This links to my main hypothesis that sentences contain more words in broadsheets: in a tabloid newspaper more space is given to pictures, so there is less space for writing.

To collect data I will use one mainstream broadsheet and one mainstream tabloid newspaper: The Times and the Daily Mail. Bias may occur if a major story breaks that contains more information than usual (journalists may have more to say about a devastating natural disaster than about an MP's latest antics), so to minimise this I will use two editions of each newspaper, bringing the total number of newspapers to four.

For my main hypothesis I will use simple random sampling, stratified sampling and systematic sampling to collect data. First, I will randomly choose 20 pages in each newspaper. Then, after counting all the sentences in the selected article on each page, I will use stratified sampling to work out how many sentences from that article I need to collect data from. For instance, if an article has 24 sentences out of a total of 329 sentences across the sampled articles, I will sample 7 sentences from it (24/329 × 100 = 7.29, rounded down to 7). My aim is to get 100 pieces of data from each newspaper, bringing the total sample to 400, which I believe is a good size.

Once I have done this I will use systematic sampling to choose the specific sentences. For instance, if an article has 15 sentences and I want to sample 5 of them, I will collect data from every 3rd sentence (15 ÷ 5 = 3). I will do this for every article, collecting the number of sentences that the stratified sampling gives for it. This way I get a range of sentences from throughout each article, which reduces bias. Systematic sampling is a simple and quick way to select a sample, and it is unlikely that a piece of writing will contain a regular pattern that would bias it.

When counting the number of words I may encounter problems such as how to count numbers. To solve this, a number written in figures will be counted as one word; for example, 7528 (seven thousand five hundred and twenty-eight) counts as a single word.
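
The stratified and systematic steps can be sketched in Python. This is a minimal sketch under stated assumptions: the function names are hypothetical, the allocation rounds down exactly as in the 24/329 example, the systematic step starts at the k-th sentence (as in "every 3rd sentence"), and words are assumed to be separated by spaces, so a number written in figures such as 7528 is a single token.

```python
import math


def stratified_allocation(article_sizes, total_sample=100):
    """Proportional allocation: an article holding n of the N sentences in the
    sampled articles gets floor(n / N * total_sample) sentences."""
    population = sum(article_sizes)
    return [math.floor(n / population * total_sample) for n in article_sizes]


def systematic_sample(num_sentences, sample_size, start=None):
    """Return 1-based positions of every k-th sentence, k = num_sentences // sample_size.

    By default the first chosen sentence is the k-th one; pass start (1..k)
    to begin earlier, e.g. from a randomly chosen starting point.
    """
    if num_sentences <= 0 or sample_size <= 0:
        return []
    sample_size = min(sample_size, num_sentences)  # cannot sample more than exist
    k = num_sentences // sample_size
    first = k if start is None else start
    return [first + k * i for i in range(sample_size)]


def count_words(sentence):
    """Count space-separated tokens, so a number in figures counts as one word."""
    return len(sentence.split())


# Example from the plan: 24 of 329 sentences; the other 305 sentences are
# lumped together here purely for illustration.
print(stratified_allocation([24, 305]))           # [7, 92]
print(systematic_sample(15, 5))                   # [3, 6, 9, 12, 15]
print(count_words("The total was 7528 pounds."))  # 5
```

Because every allocation is rounded down, the total can fall slightly short of 100 (7 + 92 = 99 here), so a few extra sentences may be needed to reach the target.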

I will use simple random sampling for my sub hypothesis. I will randomly choose 20 pages from each newspaper and count how many pictures are on each sampled page. Several problems could be encountered, for instance whether or not to include diagrams, logos and adverts in the count. I will not count diagrams and logos, but to avoid too much confusion I will include pictures that are part of adverts.
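
A minimal Python sketch of this picture-counting plan. The page count of 64 and the item labels ("photo", "advert_photo", "diagram", "logo") are hypothetical; in practice each page is inspected by hand, and the labels only stand in for that decision.

```python
import random

# Hypothetical labels for things found on a page. Following the plan, pictures
# (including those in adverts) are counted; diagrams and logos are not.
COUNTED_KINDS = {"photo", "advert_photo"}


def choose_pages(num_pages, sample_size=20, rng=None):
    """Simple random sample of page numbers: every page is equally likely."""
    rng = rng or random.Random()
    return sorted(rng.sample(range(1, num_pages + 1), sample_size))


def count_pictures(page_items):
    """Count the items on one page that qualify as pictures."""
    return sum(1 for kind in page_items if kind in COUNTED_KINDS)


pages = choose_pages(64, rng=random.Random(1))  # seeded so the draw can be repeated
print(pages)
print(count_pictures(["photo", "logo", "advert_photo", "diagram"]))  # 2
```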

I will not be using cluster sampling, quota sampling, convenience sampling, opinion polls or questionnaires. I am collecting data from newspapers, so asking people through opinion polls, questionnaires or convenience sampling would be pointless. Cluster sampling requires the population to be divided into natural groups, and quota sampling requires the sample to contain set amounts of data of a particular type; since my raw data fits neither requirement, these methods would not work.

For my main hypothesis I will use a variety of diagrams: histograms, cumulative frequency diagrams, box plots, population pyramids and comparative pie charts. I expect to get a wide range of data, so it will need to be grouped, and these are the best diagrams for grouped data. Because all my data is grouped I will not be using scatter graphs, as these require ungrouped, paired data and are used to show the relationship between two variables, which is not what I am trying to find.
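
Grouping the raw word counts is the step all of these diagrams rely on. A minimal Python sketch, assuming class intervals of the form lower ≤ x < upper; the boundaries and word counts in the example are hypothetical. Frequency density (frequency ÷ class width) is what a histogram plots when class widths are unequal, and the running total gives the points for a cumulative frequency diagram.

```python
def frequency_table(values, boundaries):
    """Group values into classes lower <= x < upper taken from consecutive
    boundaries; values outside the boundaries are ignored."""
    rows = []
    cumulative = 0
    for lower, upper in zip(boundaries, boundaries[1:]):
        freq = sum(1 for v in values if lower <= v < upper)
        cumulative += freq
        rows.append({
            "class": (lower, upper),
            "frequency": freq,
            "cumulative": cumulative,           # plotted at the upper class bound
            "density": freq / (upper - lower),  # histogram bar height
        })
    return rows


# Hypothetical words-per-sentence counts, with a wider last class.
for row in frequency_table([3, 12, 15, 22, 35, 41], [0, 10, 20, 30, 50]):
    print(row)
```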

These diagrams will also help me with my calculations. From the cumulative frequency diagrams and box plots I will find the median and interquartile range of each of the four newspapers tested, and from the grouped frequency tables I will estimate the mean and identify the modal class. I will also use Spearman's rank to look for a correlation between the two types of newspaper, and between the broadsheet and tabloid papers issued on the same day.
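
These calculations can be sketched in Python, working from a grouped frequency table given as (lower, upper, frequency) tuples; the table in the example is hypothetical. The mean is estimated from class midpoints, the median and quartiles are read from the cumulative frequency by linear interpolation (the same reading a cumulative frequency diagram gives), and the modal class is taken as the class with the highest frequency density so that unequal widths are handled. Spearman's rank needs paired values, and the plan does not say exactly what is paired, so the sketch simply assumes two equal-length lists, for instance the frequencies in matching classes for two papers. Tied values get the average of their ranks, in which case the 1 - 6Σd²/(n(n² - 1)) formula is only an approximation.

```python
def estimated_mean(classes):
    """Estimate the mean from (lower, upper, frequency) using class midpoints."""
    n = sum(f for _, _, f in classes)
    return sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n


def interpolated_quantile(classes, p):
    """Estimate the value below which a fraction p of the data lies
    (p = 0.5 median, 0.25 lower quartile, 0.75 upper quartile)."""
    n = sum(f for _, _, f in classes)
    target = p * n
    cumulative = 0
    for lo, hi, f in classes:
        if f > 0 and cumulative + f >= target:
            return lo + (target - cumulative) / f * (hi - lo)
        cumulative += f
    return classes[-1][1]


def modal_class(classes):
    """Class with the highest frequency density (frequency / width)."""
    lo, hi, _ = max(classes, key=lambda c: c[2] / (c[1] - c[0]))
    return lo, hi


def ranks(values):
    """1-based ranks, smallest value ranked 1; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman_rank(x, y):
    """Spearman's rank correlation: 1 - 6 * sum(d^2) / (n(n^2 - 1))."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


table = [(0, 10, 4), (10, 20, 10), (20, 30, 6)]  # hypothetical grouped data
print(estimated_mean(table))                 # 16.0
print(interpolated_quantile(table, 0.5))     # 16.0
iqr = interpolated_quantile(table, 0.75) - interpolated_quantile(table, 0.25)
print(round(iqr, 2))                         # 10.67
print(modal_class(table))                    # (10, 20)
print(spearman_rank([1, 2, 3], [3, 2, 1]))   # -1.0
```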

Selection And Collection Of Data

I collected all of my data from four newspapers: two issues of the Daily Mail and two issues of The Times. Using two issues of each paper made the results for my main hypothesis more reliable, especially if one paper happened to have an unusually low number of words per sentence on a particular day, which could have led me to the wrong conclusion. For my main hypothesis I collected 100 pieces of data from each newspaper, 400 in all. I believe this was a good sample size: it is large enough for the results to be reliable and for the effect of bias to be reduced, and any larger sample would have been too time-consuming to obtain. To get a broad range of articles (news, sport, columnists etc.) I used simple random sampling and stratified sampling. I used simple random sampling to select 20 pages in each newspaper, which ensured that every page had an equal chance of being chosen. I then used stratified sampling to take a fair proportion of sentences from each article, so from a short article I would sample only a couple of sentences, whilst from a longer article I would sample more. This technique helps to avoid biased results, and to reduce that possibility further I then used systematic sampling, a simple and quick method for choosing which sentences from each article to sample. Systematic sampling can give unrepresentative data if a pattern exists in the text, but I found no such pattern when collecting my data.

For my main hypothesis I found that using all three of these types of sampling was the best way to get a quick sample, and it made biased results much less likely. I did not use convenience sampling or opinion polls, as these require data to be collected from people, which was not necessary because I was collecting data from newspapers. Cluster sampling and quota sampling could not be used, as there were no obvious groupings of data in the newspapers and no way to target data of a particular type. Some pages had no articles because the whole page was dedicated to adverts, so no results could be obtained from them. My solution was to ignore such a page and randomly choose another one.
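
The rule for advert-only pages can be sketched as a small extension of simple random sampling. In this minimal Python sketch, has_article is a hypothetical stand-in for checking a page by hand, and the page numbers are made up. Shuffling the page numbers and keeping the first 20 pages that have an article gives the same kind of sample as drawing pages one at a time without repeats and redrawing whenever an advert-only page comes up.

```python
import random


def choose_article_pages(num_pages, has_article, sample_size=20, rng=None):
    """Randomly choose sample_size pages that contain an article.

    Pages for which has_article(page) is False (e.g. whole-page adverts) are
    skipped, which amounts to ignoring them and drawing another page.
    """
    rng = rng or random.Random()
    pages = list(range(1, num_pages + 1))
    rng.shuffle(pages)  # random order: every page is equally likely to come early
    chosen = [p for p in pages if has_article(p)][:sample_size]
    if len(chosen) < sample_size:
        raise ValueError("not enough pages with articles to fill the sample")
    return sorted(chosen)


advert_only = {2, 5, 7}  # hypothetical page numbers that are entirely adverts
print(choose_article_pages(64, lambda p: p not in advert_only, rng=random.Random(3)))
```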

For my sub hypothesis I collected data from 20 pages in each newspaper, using simple random sampling to select the pages. This was the quickest and easiest way to select a random sample, and it works especially well when only a small sample is required, as it was here. As with the main hypothesis, every page had an equal chance of being chosen, although there was no guarantee that the sample would be unbiased. The other sampling methods were unnecessary, as I did not need to divide the sample into categories and no data had to be obtained from people. As predicted in my plan, I encountered the problem of whether or not to include adverts, diagrams, logos etc. As stated in my plan, I counted pictures that were part of adverts but excluded everything else, as it is not really a picture. One unexpected problem that I encountered early on was the large difference in the number of pictures per page between two newspapers of the same type. This made my data less accurate, and there were quite a few anomalies that would affect the mean later on. I also had timing issues, so for the sub hypothesis I collected data from only one broadsheet and one tabloid newspaper, not two of each as I had done for the main hypothesis.
