10 3 4 10 2 1 8 5 2 3
5 5 2 3 3 3 4 1 7 3
7 5 2 9 2 3 3 3 5 3
2 10 7 4 12 4 2 3 3 5
I have highlighted the numbers with different colours for ease of counting. Now I am going to classify the data in a tally chart.
Median – half of 100 is 50, so the median corresponds to the 50th value in order. The median word length is therefore 3 letters.
Mode – the modal word length is three letters.
Range – the range is eleven letters, from one to twelve (12 – 1 = 11).
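Mean – calculated from the tally chart using the formula $\bar{x} = \frac{\sum fx}{\sum f}$, where $x$ is the word length and $f$ is its frequency.
The mean word length in the magazine sample was 0,78 letters.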
Name of newspaper:
The Baku Post.
Name of article:
Premier league – city maintain unbeaten home run.
This is the data that I have:
7 7 4 4 2 3 3 7 2 4
6 2 7 3 6 6 6 7 7 2
3 5 2 3 6 2 4 1 5 2
3 6 3 5 5 2 8 4 5 3
5 3 7 6 7 4 5 2 7 8
4 3 4 3 2 4 4 5 7 4
4 5 3 4 2 4 3 7 3 6
7 7 10 7 10 3 5 4 3 3
4 3 4 2 6 7 3 5 5 3
2 9 3 3 2 3 2 8 4 4
I have highlighted the numbers for ease of counting. Now I am going to classify the data in a tally chart.
Median – half of 100 is 50, so the median corresponds to the 50th value in order. The median word length is therefore 4 letters.
Mode – the modal word length is three letters.
Range – the range is nine letters, from one to ten (10 – 1 = 9).
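Mean – calculated from the tally chart using the formula $\bar{x} = \frac{\sum fx}{\sum f} = \frac{448}{100}$, where $x$ is the word length and $f$ is its frequency.
The mean word length in the newspaper sample was 4,48 letters.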
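As a cross-check, the four measures can be recomputed directly from the hundred newspaper word lengths listed above. This is only a minimal Python sketch: the function name `summarise` is my own, and `statistics.median` averages the 50th and 51st ordered values rather than taking the 50th value alone, so it is an assumption that the two methods agree for this sample.

```python
from collections import Counter
from statistics import mean, median

# Word lengths copied row by row from the newspaper data listed above.
newspaper = [
    7, 7, 4, 4, 2, 3, 3, 7, 2, 4,
    6, 2, 7, 3, 6, 6, 6, 7, 7, 2,
    3, 5, 2, 3, 6, 2, 4, 1, 5, 2,
    3, 6, 3, 5, 5, 2, 8, 4, 5, 3,
    5, 3, 7, 6, 7, 4, 5, 2, 7, 8,
    4, 3, 4, 3, 2, 4, 4, 5, 7, 4,
    4, 5, 3, 4, 2, 4, 3, 7, 3, 6,
    7, 7, 10, 7, 10, 3, 5, 4, 3, 3,
    4, 3, 4, 2, 6, 7, 3, 5, 5, 3,
    2, 9, 3, 3, 2, 3, 2, 8, 4, 4,
]


def summarise(lengths):
    """Return the mean, median, mode and range of a list of word lengths."""
    tally = Counter(lengths)              # the tally chart: word length -> frequency
    mode = tally.most_common(1)[0][0]     # the most frequent word length
    return {
        "mean": mean(lengths),
        "median": median(lengths),
        "mode": mode,
        "range": max(lengths) - min(lengths),
    }


summary = summarise(newspaper)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The `Counter` plays the role of the tally chart, so the same function could be run on the magazine sample once its hundred values are typed in.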
From these tables I can see, quite surprisingly, that the magazine and the newspaper have similar distributions of word lengths. This is not very easy to see from the tables, however, so I will display the information in a clearer form.
Fig. 1
These charts show me that both articles have a similar distribution of word lengths. However, I can just about see that the newspaper has slightly longer words than the magazine. To check the mean, mode and median, I will create a cumulative frequency table to see which has the longer word length.
“IN Baku” magazine:
Now I want to estimate the value of the median. I will first draw a cumulative frequency graph.
I will now estimate the median by drawing a horizontal line from 50 on the cumulative frequency axis across to the cumulative frequency polygon. Then I will drop a vertical line down to the “No of letters per word” axis and read off the value of the median.
After that I will estimate the lower quartile (Q1) and the upper quartile (Q3) in the same way. I need these to find the interquartile range.
The lower quartile is the $\frac{n+1}{4}$th value, i.e. with $n = 100$, Q1 = 25,25th value.
The upper quartile is the $\frac{3(n+1)}{4}$th value, i.e. with $n = 100$, Q3 = 75,75th value.
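The two positions can be checked with a short calculation. This is a minimal Python sketch, assuming the positions 25,25 and 75,75 come from the (n + 1)/4 and 3(n + 1)/4 rules with n = 100; the function name `quartile_positions` is my own.

```python
def quartile_positions(n):
    """Positions of Q1 and Q3 in an ordered sample of n values, using the (n + 1)/4 rule."""
    q1_position = (n + 1) / 4
    q3_position = 3 * (n + 1) / 4
    return q1_position, q3_position


q1_pos, q3_pos = quartile_positions(100)
print(q1_pos, q3_pos)  # 25.25 75.75
```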
As I can see from Fig. 3
The median letters
letters
letters
The interquartile range is the difference between the upper and lower quartiles (Q3 – Q1).
In this case, interquartile range is:
letters
“The Baku Post” newspaper:
I will now estimate the median by drawing a horizontal line from 50 on the cumulative frequency axis across to the cumulative frequency polygon. Then I will drop a vertical line down to the “No of letters per word” axis and read off the value of the median.
After that I will estimate the lower quartile (Q1) and the upper quartile (Q3) in the same way. I need these to find the interquartile range.
The lower quartile is the $\frac{n+1}{4}$th value, i.e. with $n = 100$, Q1 = 25,25th value.
The upper quartile is the $\frac{3(n+1)}{4}$th value, i.e. with $n = 100$, Q3 = 75,75th value.
As I can see from Fig. 4
The median letters
letters
letters
The interquartile range is the difference between the upper and lower quartiles (Q3 – Q1).
In this case, interquartile range is:
letters
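The reading-off from the cumulative frequency polygon can also be sketched numerically. The Python sketch below assumes the polygon is plotted at (word length, cumulative frequency) with straight lines between the points, and that it starts from zero one unit before the shortest word length. The frequencies are my own tally of the newspaper rows listed earlier, and the helper name `read_off` is hypothetical. It only illustrates the method and does not replace the values read from the graph.

```python
from itertools import accumulate

# Frequency of each word length, tallied from the newspaper rows listed earlier.
lengths = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
freqs = [1, 15, 23, 19, 12, 9, 15, 3, 1, 2]
cum_freqs = list(accumulate(freqs))  # running total, ending at 100


def read_off(target, xs, cfs):
    """Find the x value where the polygon reaches cumulative frequency `target`."""
    prev_x, prev_cf = xs[0] - 1, 0  # assumed starting point of the polygon
    for x, cf in zip(xs, cfs):
        if cf >= target:
            # Linear interpolation along the segment that crosses the target.
            return prev_x + (target - prev_cf) * (x - prev_x) / (cf - prev_cf)
        prev_x, prev_cf = x, cf
    raise ValueError("target is larger than the total frequency")


median_est = read_off(50, lengths, cum_freqs)
q1_est = read_off(25.25, lengths, cum_freqs)
q3_est = read_off(75.75, lengths, cum_freqs)
iqr_est = q3_est - q1_est

print(f"median: {median_est:.2f}, Q1: {q1_est:.2f}, Q3: {q3_est:.2f}, IQR: {iqr_est:.2f}")
```

Because the polygon treats word length as a continuous scale, the interpolated median falls between two whole numbers of letters, so it need not match the 50th ordered value exactly.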
I was going to draw a box plot but, as I can see, there is no need for that, because the median, the lower quartile, the upper quartile and the interquartile range are the same for both the magazine and the newspaper. So I can move on to the overview of my investigation.
Results Explanation:
My results show that there is not a large difference between word lengths in the magazine and the newspaper, although the newspaper's word lengths fluctuate more, as can be seen in Fig. 1. Both frequency polygons follow the same pattern: a steep rise in frequency up to a peak at three-letter words, followed by a steady decrease as the number of letters per word increases. In both types of printed source the most commonly used words are three letters long, closely followed by four-letter and two-letter words. In both the magazine and the newspaper the frequency declines as the words get longer. There was no need to draw the box plots that I had planned at the beginning of my investigation, because the estimated values from which they would be built (the median, the quartiles and the interquartile range) were the same for both sources, so the plots would not have added anything to my investigation.
Conclusion:
My hypothesis was proven correct by the research I undertook: a sample of one hundred words from a magazine has about the same average word length as a sample of one hundred words from a newspaper. However, I predicted that the most popular word length would be four letters, when it was actually three letters.
I am aware that there are a number of weaknesses in the data that I collected and analysed:
- I chose one hundred words. Whilst this was enough to work on, I would need a much larger sample to be confident about whether my hypothesis was true.
- I chose one article from each printed source. If I had chosen another article I might have had very different results, so I should sample more articles.
- I chose a magazine and a newspaper from one day. If I had more time I would have chosen a number of different reporters, different days and a range of article types, such as reports and general news items.
Investigation 2.
Investigation of sentence lengths in two samples from two different types of printed source, i.e. a magazine and a newspaper.
Hypothesis:
I predict that the most popular sentence length will be ten to fifteen words. I also think that, in the sample of twenty sentences per printed source, the newspaper will have longer sentences than the magazine.
Plan:
- I will select two pieces of writing of twenty sentences each, one from a magazine and one from a newspaper.
- The topic of each of the samples will be the same, e.g. Spice Girls. I will do this so that the two pieces of writing are comparable. Twenty sentences from each article should give me enough data to draw meaningful results, yet the sample is small enough to be easy to gather and sort. I will allow names and numbers, as similar sorts of information are used in both articles.
- I will then count the number of words per sentence and put the information into a grouped frequency table, as some sentences will be very long and some will be very short.
- In the frequency table I will calculate the mid-point of each group and the cumulative frequency.
- I will then draw a cumulative frequency curve and find the median and the interquartile range. I have chosen these measures because there may be some very long sentences, and if I used the mean they would distort the result.
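The grouped frequency table in this plan can be sketched in Python. The sentence lengths below are made up purely for illustration and are not the article data. The groups of five words (1-5, 6-10, ...) and the mid-point rule, which gives 7,5 for the 6-10 group, are assumptions chosen to match the mid-points quoted later in this investigation. The function name `grouped_table` is my own.

```python
# Hypothetical word counts for twenty sentences (illustrative only, not the article data).
sentence_lengths = [4, 8, 9, 12, 12, 13, 14, 15, 16, 17,
                    18, 18, 19, 21, 22, 23, 24, 26, 27, 31]

GROUP_WIDTH = 5  # assumed group width: 1-5, 6-10, 11-15, ...


def grouped_table(values, width=GROUP_WIDTH):
    """Build rows of (group label, mid-point, frequency, cumulative frequency)."""
    rows = []
    cumulative = 0
    lower = 1
    while lower <= max(values):
        upper = lower + width - 1
        freq = sum(lower <= v <= upper for v in values)
        cumulative += freq
        midpoint = (lower - 1 + upper) / 2  # assumed rule: 6-10 gives 7.5
        rows.append((f"{lower}-{upper}", midpoint, freq, cumulative))
        lower += width
    return rows


table = grouped_table(sentence_lengths)
for label, midpoint, freq, cumulative in table:
    print(f"{label:>6}  mid-point {midpoint:>4}  f = {freq:>2}  cf = {cumulative:>2}")
```

The last cumulative frequency should equal the number of sentences, which is a quick way to check that no sentence was missed when tallying.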
Name of magazine:
IN Baku.
Name of article:
Psy – factor
Median – group 11 – 15
Upper Quartile - Group 21 - 25 (midpoint 22,5)
Lower Quartile - Group 6 - 10 (midpoint 7,5)
Interquartile Range: 22,5 – 7,5 = 15
Name of newspaper:
The Baku Post.
Name of article:
Self – image may affect future weight.
Median – group 16 – 20
Upper Quartile - Group 26 – 30 (midpoint 27,5)
Lower Quartile - Group 11 – 15 (midpoint 12,5)
Interquartile Range: 27,5 – 12,5 = 15
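The two interquartile ranges can be compared side by side with a short calculation. This is a minimal Python sketch that takes the quartile mid-points used in the two interquartile range calculations above as given; the dictionary layout and variable names are my own.

```python
# Quartile estimates, taken as the group mid-points used in the two calculations above.
quartile_midpoints = {
    "magazine": {"lower": 7.5, "upper": 22.5},
    "newspaper": {"lower": 12.5, "upper": 27.5},
}

# Interquartile range = upper quartile - lower quartile.
iqr = {name: q["upper"] - q["lower"] for name, q in quartile_midpoints.items()}

# How far the newspaper's lower quartile sits above the magazine's.
shift = quartile_midpoints["newspaper"]["lower"] - quartile_midpoints["magazine"]["lower"]

print(iqr)    # {'magazine': 15.0, 'newspaper': 15.0}
print(shift)  # 5.0
```

The spread is the same for both sources, while the newspaper's quartiles sit one group (five words) higher than the magazine's.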
Results Explanation:
My results show that there is a difference between sentence length in the magazine and the newspaper. Both cumulative frequency curves follow the same pattern: the lower quartile, the median and the upper quartile of the newspaper each fall one group higher than those of the magazine. The magazine had a median in the 11 – 15 group, whereas the newspaper’s median was in the 16 – 20 group, showing that the newspaper uses longer sentences. This may suggest that it uses more complex language.
I am aware that there are a number of weaknesses in the data that I collected and analysed:
- I chose twenty sentences. Whilst this was enough to work on, I would need a much larger sample to be confident about whether my hypothesis was true.
- I chose one article from each printed source. If I had chosen another article I might have had very different results, so I should sample more articles.
- I chose a magazine and a newspaper from one day. If I had more time I would have chosen a number of different reporters, different days and a range of issues.
Overall Conclusion:
Overall I have enjoyed my research. I found it very interesting to compare two printed sources, a magazine and a newspaper, to see if my hypotheses were correct. Most of my hypotheses were correct, and I was surprised that my first and second hypotheses held, as I have never done any research on the number of words or letters in two printed sources before. In any future research I might count how many sentences there are in comparable articles to see whether different papers restrict the information given to the reader. I would also like to carry out research on many more magazines and newspapers, from different days and on different subjects.