Statistically analyse books or newspapers.
Read All About It
This is my maths statistics coursework, which is called Read All About It. The aim of this coursework is for me to statistically analyse books or newspapers.
My Plan
My plan is to compare children's books and adults' books by setting an appropriate hypothesis and then collecting relevant data to test it. My hypothesis is that adult fiction books contain longer words than children's fiction books. I am going to test this by counting the number of letters in each word. I am going to use three children's adventure books and three adult romance books; using more than one book of each type reduces bias and makes the data more reliable. The books are by different authors, so that one author's style does not bias the results. From each book I am going to use the first one hundred words on the middle page; this also reduces bias, because the same fixed rule picks the sample in every book rather than me choosing the passage.
With these words, I will make a tally chart to show the number of letters in each word. From the data I am going to find the mean, mode, median and range, and produce cumulative frequency graphs and box plots for each book so that I can compare them. I am also going to work out the standard deviation for each book and compare them to show the spread.
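The counting step in the plan can be sketched in Python. This is only an illustration under assumptions: the passage is a made-up sentence rather than text from any of the books, `word_lengths` is a hypothetical helper, and the rule that apostrophes are not counted as letters is my own choice, since the plan does not say how such words were treated.

```python
import re
from collections import Counter
from statistics import mean, median, multimode

def word_lengths(text, limit=100):
    """Return the letter counts of the first `limit` words in `text`."""
    # Treat a run of letters (with internal apostrophes) as one word
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text)
    # Count letters only, so "didn't" counts as 5 letters
    return [sum(ch.isalpha() for ch in w) for w in words[:limit]]

sample = "The cat sat on the mat and didn't move"  # made-up passage
lengths = word_lengths(sample)

# The Counter plays the role of the tally chart
tally = Counter(lengths)
print(sorted(tally.items()))
print(mean(lengths), median(lengths), multimode(lengths),
      max(lengths) - min(lengths))
```

With a real book, `sample` would be the text of the middle page and `limit` would stay at 100.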
Results- Lightning Strikes/Adults Book
Number of letters in a word   Frequency   fx
1                             2           2
2                             21          42
3                             14          42
4                             16          64
5                             13          65
6                             13          78
7                             11          77
8                             6           48
9                             1           9
10                            1           10
11                            2           22
12                            0           0
Total                         100         459
Mean = 4.59   Mode = 2   Median = 4   Range = 10
Standard Deviation = 15.19
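As a check on the arithmetic, the mean, mode, median and range can be recomputed from the frequency column. The sketch below assumes the Lightning Strikes frequencies add up to the 100 sampled words and uses only Python's standard library; the variable names are illustrative.

```python
from statistics import median

# Lightning Strikes: number of letters -> frequency (100 words in total)
freq = {1: 2, 2: 21, 3: 14, 4: 16, 5: 13, 6: 13,
        7: 11, 8: 6, 9: 1, 10: 1, 11: 2, 12: 0}

n = sum(freq.values())                           # total number of words
total_fx = sum(x * f for x, f in freq.items())   # sum of letters x frequency
mean = total_fx / n

# Mode: the word length with the highest frequency
mode = max(freq, key=freq.get)

# Median: expand the table back into individual word lengths
lengths = [x for x, f in freq.items() for _ in range(f)]
middle = median(lengths)

# Range: longest observed word minus shortest observed word
value_range = max(lengths) - min(lengths)

print(n, total_fx, mean, mode, middle, value_range)
```

Run on these frequencies, the sketch agrees with the mean of 4.59, mode of 2, median of 4 and range of 10 given in the table.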
Results- the Obsession/Adults Book
Number of letters in a word   Frequency   fx
1                             1           1
2                             12          24
3                             37          111
4                             22          88
5                             5           25
6                             8           42
7                             2           14
8                             5           40
9                             4           36
10                            1           10
11                            3           33
Total                         100         424
Mean = 4.24   Mode = 3   Median = 3   Range = 10
Standard Deviation = 15.54
Results-Leap of Faith/Adults Book
Number of letters in a word   Frequency   fx
1                             5           5
2                             10          20
3                             36          108
4                             18          72
5                             5           25
6                             9           54
7                             9           63
8                             3           24
9                             2           18
10                            2           20
11                            0           0
12                            1           12
Total                         100         421
Mean = 4.21   Mode = 3   Median = 3   Range = 11
Standard Deviation = 15.21
Results- Mr Majeika/Children's Book
Number of letters in a word   Frequency   fx
1                             4           4
2                             21          42
3                             19          57
4                             19          76
5                             10          50
6                             12          72
7                             8           56
8                             4           32
9                             2           18
10                            1           10
Total                         100         417
Mean = 4.17   Mode = 2   Median = 4   Range = 9
Standard Deviation = 14.13
Results- Naughty Amelia Jane/ Children's Books
Number of letters in a word   Frequency   fx
1                             3           3
2                             12          24
3                             34          102
4                             24          96
5                             9           45
6                             13          78
7                             1           7
8                             2           16
9                             1           9
10                            1           10
Total                         100         390
Mean = 3.9   Mode = 3   Median = 4   Range = 9
Standard Deviation = 16.18
Results- The Hodgeheg/ Children's Book
Number of letters in a word   Frequency   fx
1                             3           3
2                             19          38
3                             32          96
4                             14          56
5                             18          90
6                             9           54
7                             3           21
8                             1           8
9                             1           9
Total                         100         375
Mean = 3.75   Mode = 3   Median = 3   Range = 8
Standard Deviation = 15.15
Cumulative Frequency Tables
Lightning Strikes
No of letters in a word   Frequency   Cumulative Frequency
1                         2           2
2                         21          23
3                         14          37
4                         16          53
5                         13          66
6                         13          79
7                         11          90
8                         6           96
9                         1           97
10                        1           98
11                        2           100
The Obsession
No of letters in a word   Frequency   Cumulative Frequency
1                         1           1
2                         12          13
3                         37          50
4                         22          72
5                         5           77
6                         8           85
7                         2           87
8                         5           92
9                         4           96
10                        1           97
11                        3           100
Leap of Faith
No of letters in a word   Frequency   Cumulative Frequency
1                         5           5
2                         10          15
3                         36          51
4                         18          69
5                         5           74
6                         9           83
7                         9           92
8                         3           95
9                         2           97
10                        2           99
11                        0           99
12                        1           100
Mr Majeika
No of letters in a word   Frequency   Cumulative Frequency
1                         4           4
2                         21          25
3                         19          44
4                         19          63
5                         10          73
6                         12          85
7                         8           93
8                         4           97
9                         2           99
10                        1           100
Naughty Amelia Jane
No of letters in a word   Frequency   Cumulative Frequency
1                         3           3
2                         12          15
3                         34          49
4                         24          73
5                         9           82
6                         13          95
7                         1           96
8                         2           98
9                         1           99
10                        1           100
The Hodgeheg
No of letters in a word   Frequency   Cumulative Frequency
1                         3           3
2                         19          22
3                         32          54
4                         14          68
5                         18          86
6                         9           95
7                         3           98
8                         1           99
9                         1           100
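The cumulative frequency column is just a running total of the frequency column, and it can be used to look up which word length the 25th, 50th and 75th words have. The sketch below uses The Hodgeheg's frequencies; `value_at` is a hypothetical helper. It returns whole-number word lengths, so it will not match quartiles read off a smooth cumulative frequency graph, which are usually decimals.

```python
from itertools import accumulate

# The Hodgeheg: number of letters and their frequencies
letters = [1, 2, 3, 4, 5, 6, 7, 8, 9]
freqs = [3, 19, 32, 14, 18, 9, 3, 1, 1]

# Running total of the frequency column
cum_freq = list(accumulate(freqs))

def value_at(position):
    """Return the word length of the word at the given rank (1-based)."""
    for x, c in zip(letters, cum_freq):
        if c >= position:
            return x
    raise ValueError("position is beyond the end of the data")

n = cum_freq[-1]
lower_q = value_at(n / 4)       # 25th word
middle = value_at(n / 2)        # 50th word
upper_q = value_at(3 * n / 4)   # 75th word
print(cum_freq, lower_q, middle, upper_q)
```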
What I have found From The Tally Charts
Calculating the means from the data in the tables, the mean number of letters per word in the adult books is 4.21, 4.24 and 4.59, whereas the children's books have lower means of 3.75, 3.9 and 4.17. This shows that the adult books do have longer words on average, but not by much.
The ranges of the adult books are 10, 11 and 10, whereas the children's are 8, 9 and 9. This shows that word length is more widely spread in the adult books and more closely grouped in the children's books.
The tables show that the adult books still have a lot of short words (2, 3 and 4 letters), but they also have more long words (8, 9, 10 and 11 letters) than the children's books, and none of the children's samples contains a word longer than 10 letters.
What I have Found from the Cumulative Frequency Graphs and Box Plots
Looking at the cumulative frequency graphs and the box plots, the adult books have upper quartiles of 4.6, 6 and 5.1, whereas the children's books have upper quartiles of 4.4, 4.2 and 5.2. Overall the adult books tend to have higher upper quartiles, although the highest children's value (5.2) is above two of the adult values.
When looking at the lower quartiles, there does not seem to be much difference between the adult and the children's books. The adult books' lower quartiles are 2, 2.2 and 2.2 and the children's are 2, 1.8 and 2.1. This shows that the adult books' lower quartiles are only slightly higher.
The interquartile ranges of the adult books are 3.1, 3.8 and 2.4; those of the children's books are 3, 2.8 and 2. The adult books therefore have larger interquartile ranges, showing that their data is more widely spread about the median. This supports the idea that the adult books contain more long words than the children's books.
What I have found from the Standard Deviation
I have used the "divisor n-1" formula since I am working with a sample data set. The standard deviations of the adult books are 15.21, 15.54 and 15.19 and the standard deviations of the children's books are 14.13, 16.18 and 15.15. I worked out the mean standard deviation for each group to see which is larger: 15.31 for the adult books and 15.15 for the children's books. Overall the adult books therefore have a slightly higher standard deviation, which means that their word lengths are slightly more widely spread. A wider spread on its own does not show that the words are longer, but it is consistent with the adult books containing some longer words.
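For reference, one standard way to write the "divisor n-1" calculation for a frequency table is shown below, where x is the number of letters in a word, f is its frequency and n is the total number of words. The working behind the figures quoted above is not shown in this coursework, so this is a general statement of the formula rather than a reconstruction of how those figures were obtained.

```latex
\[
\bar{x} = \frac{\sum f x}{\sum f},
\qquad
s = \sqrt{\frac{\sum f\,(x - \bar{x})^{2}}{n - 1}},
\qquad
n = \sum f
\]
```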
Conclusion
All this information supports the hypothesis that adult books have longer words than children's books, although adult books still contain quite a few short words. Although the children's books have some quite long words (7, 8 and 9 letters), the longest words in the adult books are even longer (9, 10, 11 and 12 letters). So I have come to the conclusion that my hypothesis is correct.
Developing my hypothesis
I am now beginning to wonder whether sentence length changes along with word length. To see if this is correct I am going to change my hypothesis to 'there are longer sentences in adult fiction books than in children's fiction books'.
I am going to collect my data from the same books as last time, on the same page; this is again to avoid bias. I am going to take the first 15 sentences from the page, record my results in tally charts and then produce frequency polygons to show my results.
Data
Lightning strikes
No of words in a sentence   Mid Point   Frequency   fx
1-5                         3           4           12
6-10                        8           3           24
11-15                       13          3           39
16-20                       18          2           36
21-25                       23          2           46
26-35                       30.5        1           30.5
Total                                   15          187.5
Mean = 12.5   Mode = 1-5
Median = 11-15
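For grouped data like this, the mean is only an estimate, because each sentence is treated as if its length were the midpoint of its class. A minimal sketch of that calculation for the first 15 sentences of Lightning Strikes follows; the tuple layout and variable names are illustrative, and using the (n + 1) / 2 th sentence to find the median class is an assumption about the method.

```python
# Lightning Strikes, first 15 sentences: (class label, midpoint, frequency)
classes = [("1-5", 3, 4), ("6-10", 8, 3), ("11-15", 13, 3),
           ("16-20", 18, 2), ("21-25", 23, 2), ("26-35", 30.5, 1)]

n = sum(f for _, _, f in classes)
total_fx = sum(mid * f for _, mid, f in classes)
mean = total_fx / n  # estimate: every sentence sits at its class midpoint

# Modal class: the class with the highest frequency (first one if tied)
modal_class = max(classes, key=lambda c: c[2])[0]

# Median class: the class containing the middle sentence
middle_rank = (n + 1) / 2
running = 0
for label, _, f in classes:
    running += f
    if running >= middle_rank:
        median_class = label
        break

print(n, total_fx, mean, modal_class, median_class)
```

On these figures the sketch agrees with the table: a total of 187.5, a mean of 12.5, a modal class of 1-5 and a median class of 11-15.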
The Obsession
No of words in a sentence   Mid Point   Frequency   fx
1-5                         3           4           12
6-10                        8           7           56
11-15                       13          1           13
16-20                       18          2           36
21-25                       23          1           23
26-35                       30.5        0           0
Total                                   15          140
Mean = 9.33   Mode = 6-10
Median = 6-10
Leap of Faith
No of words in a sentence   Mid Point   Frequency   fx
1-5                         3           2           6
6-10                        8           3           24
11-15                       13          3           39
16-20                       18          4           72
21-25                       23          2           46
26-35                       30.5        1           30.5
Total                                   15          217.5
Mean = 14.5   Mode = 16-20
Median = 11-15
Mr Majeika
No of words in a sentence   Mid Point   Frequency   fx
1-5                         3           5           15
6-10                        8           2           16
11-15                       13          2           26
16-20                       18          3           54
21-25                       23          1           23
26-35                       30.5        2           61
Total                                   15          195
Mean = 13   Mode = 1-5
Median = 11-15
Naughty Amelia Jane
No of words in a sentence   Mid Point   Frequency   fx
1-5                         3           1           3
6-10                        8           6           48
11-15                       13          5           65
16-20                       18          0           0
21-25                       23          1           23
26-35                       30.5        2           61
Total                                   15          200
Mean = 13.33   Mode = 6-10
Median = 11-15
The Hodgeheg
No of words in a sentence   Mid Point   Frequency   fx
1-5                         3           4           12
6-10                        8           4           32
11-15                       13          4           52
16-20                       18          1           18
21-25                       23          1           23
26-35                       30.5        1           30.5
Total                                   15          167.5
Mean = 11.16   Mode = 1-5, 6-10, 11-15
Median = 6-10
What I have found From the Tally Charts
By calculating the means from the grouped data, the adult book means are 12.5, 9.33 and 14.5 and the children's book means are 13, 13.33 and 11.16. From this, I can see that one of the adult books has the highest mean of all the books and another has the lowest. This suggests that adult books do not always have longer sentences than children's books. All of the children's books have quite high means, showing that the children's books do have quite long sentences.
Looking at the modes of the adult books (Lightning Strikes 1-5, The Obsession 6-10, Leap of Faith 16-20) and those of the children's books (Mr Majeika 1-5, Naughty Amelia Jane 6-10 and The Hodgeheg 1-5, 6-10 and 11-15), the adult books overall have slightly higher modes. This suggests that the adult books have more words per sentence.
Developing my Hypothesis
I am now beginning to wonder if the word length changes the length of the sentences. To see if this is correct I am going to try a new hypothesis, which is 'there are longer sentences in adult books than in children's books'.
I am going to collect my data from the same books as before, and again from the same pages, to avoid bias. I am going to use the first 30 sentences from each page and record my results in tally charts. I will then work out the mean, median and modal class and produce frequency polygons to show my results. I will also work out the standard deviation to find the spread of the data.
Mr Majeika- Children books
Number of words in a sentence   Mid Point   Frequency   fx
1-5                             3           7           21
6-10                            8           9           72
11-15                           13          7           91
16-20                           18          4           72
21-25                           23          1           23
26-35                           30.5        2           61
Total                                       30          340
Mean = 11.33   Modal class = 6-10
Median = 6-10   Standard Deviation = 23.78
Naughty Amelia Jane- Childrens Books
No of words in a sentence   Mid Point   Frequency   fx
1-5                         3           9           27
6-10                        8           7           56
11-15                       13          7           91
16-20                       18          2           36
21-25                       23          1           23
26-35                       30.5        4           122
Total                                   30          355
Mean = 11.83   Modal class = 1-5
Median = 6-10   Standard Deviation = 27.1
The Hodgeheg- Childrens books
No of words in a sentence   Mid Point   Frequency   fx
1-5                         3           10          30
6-10                        8           8           64
11-15                       13          4           52
16-20                       18          3           54
21-25                       23          3           69
26-35                       30.5        2           61
Total                                   30          330
Mean = 11   Modal class = 1-5
Median = 6-10   Standard Deviation = 20.81
The Obsession- Adult book
No of words in a sentence   Mid Point   Frequency   fx
1-5                         3           5           15
6-10                        8           12          96
11-15                       13          5           65
16-20                       18          5           90
21-25                       23          2           46
26-35                       30.5        1           30.5
Total                                   30          342.5
Mean = 11.42   Modal class = 6-10
Median = 6-10   Standard Deviation = 24.76
Leap of Faith- Adult Book
No of words in a sentence   Mid Point   Frequency   fx
1-5                         3           1           3
6-10                        8           7           56
11-15                       13          5           63
16-20                       18          10          180
21-25                       23          5           115
26-35                       30.5        2           61
Total                                   30          478
Mean = 15.93   Modal class = 6-10
Median = 16-20   Standard Deviation = 38.4
Lightning Strikes- Adult Book
No of words in a sentence   Mid Point   Frequency   fx
1-5                         3           6           18
6-10                        8           5           40
11-15                       13          9           117
16-20                       18          3           54
21-25                       23          4           92
26-35                       30.5        3           91.5
Total                                   30          412.5
Mean = 13.75   Modal class = 11-15
Median = 11-15   Standard Deviation = 29.45
What I found by looking at the mean, mode and medians
The adult book means are 11.42, 13.75 and 15.93, whereas the children's book means are 11, 11.83 and 11.33. This clearly shows that the adult books have higher means, so on average the adult books have more words per sentence.
The adult book modal classes are 11-15, 6-10 and 6-10, whereas the children's book modal classes are 1-5, 1-5 and 6-10. This shows that overall the adult books have higher modal classes, which also supports the hypothesis that adult books have longer sentences.
The medians of the adult books are 11-15, 6-10 and 16-20, whereas for all the children's books the median is 6-10. So overall the adult books have higher medians, which supports the hypothesis that adult books have longer sentences than children's books do.
What I have found from the frequency polygons
When looking at the frequency polygons I can see that all the children's books start at a high frequency and then drop to a lower frequency as the number of words in a sentence increases. This shows that the children's books have more short sentences than long sentences.
The adult books' graphs go up and down much more all the way along. This shows that the adult books have more long sentences than the children's books have, but they still have a lot of short sentences.
What I have found out from the standard deviation
Again I have used the 'divisor n-1' formula since it is only a sample set of data.
The standard deviations of the children's books are 20.81, 27.1 and 23.78, which gives an average of 23.9. The standard deviations of the adult books are 38.4, 29.45 and 24.76, which gives an average of 30.87. The adult books therefore have a higher standard deviation, which indicates that sentence length is more widely spread in the adult books than in the children's books. To check this I will produce two cumulative frequency graphs, one for the adult books and one for the children's books.
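One common way to apply the divisor n-1 formula to grouped data is to treat every sentence as sitting at its class midpoint. The sketch below does this for the 30 Mr Majeika sentences and cross-checks the result against Python's `statistics.stdev`. It is an illustration of the general method only: `grouped_sample_sd` is a hypothetical helper, and because the coursework does not show its working, the result is not expected to reproduce the figures quoted above.

```python
from math import sqrt
from statistics import stdev

def grouped_sample_sd(midpoints, freqs):
    """Sample standard deviation (divisor n - 1) from class midpoints."""
    n = sum(freqs)
    mean = sum(m * f for m, f in zip(midpoints, freqs)) / n
    squared = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, freqs))
    return sqrt(squared / (n - 1))

# Mr Majeika, first 30 sentences
midpoints = [3, 8, 13, 18, 23, 30.5]
freqs = [7, 9, 7, 4, 1, 2]

sd = grouped_sample_sd(midpoints, freqs)

# Cross-check: expand the table into 30 values and use the library routine
expanded = [m for m, f in zip(midpoints, freqs) for _ in range(f)]
check = stdev(expanded)
print(sd, check)
```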
What I have found from the cumulative frequency graphs
The interquartile range of the adult books is 11.5, whereas the children's books' interquartile range is 10. The adult books therefore have a higher interquartile range, which confirms that they have a wider spread.
The adult upper quartile is 18.5, whereas the children's upper quartile is 15. This shows that the adult books have longer sentences than the children's books, because their upper quartile is higher.
The lower quartile of the adult books is 7, whereas the lower quartile of the children's books is 5, so the children's books have shorter sentences because their lower quartile is lower.
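Reading a quartile off a cumulative frequency graph amounts to interpolating between the plotted points. The sketch below does this with straight lines, under several assumptions of mine: the three 30-sentence tables for each group are added together to give 90 sentences, points are plotted at the top of each class (5, 10, 15, 20, 25, 35), and the graph starts at (0, 0). `quantile_from_graph` is a hypothetical helper.

```python
from itertools import accumulate

def quantile_from_graph(upper_bounds, freqs, fraction):
    """Estimate a quantile by straight-line interpolation on a
    cumulative frequency graph."""
    cum = list(accumulate(freqs))
    target = fraction * cum[-1]
    prev_bound, prev_cum = 0, 0  # assumption: the graph starts at (0, 0)
    for bound, c in zip(upper_bounds, cum):
        if c >= target:
            share = (target - prev_cum) / (c - prev_cum)
            return prev_bound + share * (bound - prev_bound)
        prev_bound, prev_cum = bound, c
    return upper_bounds[-1]

upper_bounds = [5, 10, 15, 20, 25, 35]   # top of each class, in words
adult = [12, 24, 19, 18, 11, 6]          # three adult tables added together
child = [26, 24, 18, 9, 5, 8]            # three children's tables added together

for name, freqs in (("adult", adult), ("children", child)):
    lq = quantile_from_graph(upper_bounds, freqs, 0.25)
    uq = quantile_from_graph(upper_bounds, freqs, 0.75)
    print(name, round(lq, 1), round(uq, 1), round(uq - lq, 1))
```

Under these assumptions the interpolated upper quartiles come out close to the 18.5 and 15 read from the graphs. The lower quartiles agree less closely, which is to be expected when values are read by eye from a hand-drawn curve.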
Overall Conclusion
From all this information I have come to the strong conclusion that my hypothesis is correct, because all the information points to adult books having longer sentences than children's books. I have also come to the conclusion that books with longer words also have longer sentences, because the adult books have both the longest words and the longest sentences. This also means that children's books have shorter words and shorter sentences, which is what you would expect, to help children learn to read.
Limitations and Improvements
The limitations of this investigation were that I had a limited amount of time and only a limited number of books available to me. This means that neither hypothesis is conclusively proven, because only a small sample was taken. To improve the investigation, I could have looked at a wider range of books by many more authors, which would have made bias even less likely. I would also take data from more pages of each book, which would again make bias less likely. I could also investigate page length, paragraph length or the number of pages in a chapter.