Part two
- Find reading ages of both texts
- Say how this ties in with the results from part one and if hypothesis was correct
Part three
- Find out the cost of both books and compare, refer to the hypothesis
‘I predict that the word length will be longer in Tarzan of the Apes than in Disney’s Tarzan.’
To get the data I recorded the first one hundred words from both texts.
Disney’s Tarzan
4, 5, 2, 3, 4, 5, 7, 3, 4, 2, 3, 6, 1, 4, 3, 6, 3, 4, 3, 2, 3, 3, 7, 3, 8, 1, 9, 4, 4, 5, 6, 2, 1, 5, 6, 5, 2, 1, 5, 6, 5, 3, 6, 5, 4, 4, 5, 6, 4, 4, 3, 5, 4, 3, 4, 4, 3, 4, 5, 1, 5, 4, 3, 5, 4, 3, 6, 1, 2, 5, 3, 5, 6, 5, 3, 8, 4, 2, 7, 2, 3, 4, 6, 3, 3, 3, 5, 2, 5, 3, 4, 3, 4, 3, 4, 3, 2, 6, 4, 3.
Tarzan of the Apes
1, 4, 4, 5, 4, 4, 4, 3, 4, 3, 3, 3, 3, 7, 1, 4, 6, 4, 6, 4, 9, 9, 3, 4, 8, 5, 3, 8, 3, 3, 9, 3, 2, 4, 4, 10, 10, 8, 8, 5, 5, 8, 4, 3, 7, 3, 2, 4, 7, 5, 4, 2, 9, 4, 10, 4, 2, 4, 4, 4, 3, 4, 1, 2, 3, 4, 3, 5, 2, 4, 4, 3, 1, 4, 3, 11, 6, 2, 8, 3, 4, 7, 2, 2, 2, 3, 1, 4, 6, 10, 5, 4, 10, 3, 7, 8, 4, 4, 5.
Mean
This is the average, and it requires more calculation than the other two measures. You add all of the values together and then divide the total by the number of values you added together.
Disney’s Tarzan
Total: 402
Total number of values: 100
Mean: 4.02
So, to the nearest whole number, the mean word length is 4 letters.
Tarzan of the Apes
Total: 467
Total number of values: 100
Mean: 4.67
So, to the nearest whole number, the mean word length is 5 letters.
This shows that, on average, Tarzan of the Apes has longer words than Disney's Tarzan.
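The mean calculation can be checked with a short Python sketch. This is only an illustration: the function name `mean` is my own choice, and the totals and counts are the ones reported in this section rather than being recomputed from the raw word lists.

```python
def mean(total, count):
    """Mean = sum of all the values divided by the number of values."""
    return total / count

# Totals and counts as reported above (assumed correct, not recomputed here).
disney_mean = mean(402, 100)
apes_mean = mean(467, 100)

print(disney_mean)  # 4.02
print(apes_mean)    # 4.67

# Rounded to the nearest whole number of letters.
print(round(disney_mean), round(apes_mean))  # 4 5
```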
Mode
This is the most frequently occurring value and the easiest to obtain: it is simply the value that occurs most often in the set of data.
I am going to put the results into a tally chart for each book, which makes it easy to see which word length is most common.
Disney's Tarzan
This shows that 3 is the most common word length.
Tarzan of the Apes
This shows that 4 is the most common word length.
This shows that Disney's Tarzan has a lower modal word length than Tarzan of the Apes.
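A tally chart and the mode can also be produced in Python. The sketch below is only an illustration: the names `tally` and `mode` and the short `sample` list are made up for the example and are not the data from either book.

```python
from collections import Counter

def tally(word_lengths):
    """Count how many times each word length occurs."""
    return Counter(word_lengths)

def mode(word_lengths):
    """Return the most frequently occurring word length."""
    return tally(word_lengths).most_common(1)[0][0]

# Hypothetical sample of word lengths, for illustration only.
sample = [3, 4, 3, 2, 5, 3, 4, 1]

# Print a simple tally chart, one row per word length.
for length, count in sorted(tally(sample).items()):
    print(length, "|" * count)

print("Mode:", mode(sample))  # Mode: 3
```

If two word lengths tie for the highest count, this sketch returns only one of them, so a tie would need to be checked by eye on the tally chart.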
Median
This is the central value in the data when it is arranged in numerical order.
So we need to arrange our values into numerical order and then find the middle value.
Because we have 100 words, an even number, there are two middle values (the 50th and 51st). If they are the same, that value is the median; otherwise we add them together and divide by 2.
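The rule for finding the median can be sketched in Python as follows. The function name `median` and the two short lists are only examples of my own, not the data from the books.

```python
def median(values):
    """Middle value of the data once it is in numerical order."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of values: there is a single middle value.
        return ordered[middle]
    # Even number of values: average the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2

# Hypothetical examples with an even number of values.
print(median([1, 2, 4, 4, 5, 7]))  # 4.0 (both middle values are 4)
print(median([1, 2, 3, 6, 7, 9]))  # 4.5 (middle values are 3 and 6)
```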
Disney's Tarzan
1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6 6 6 7 7 7 7 8 9
The middle values are both 4, so the median value is 4.
Tarzan of the Apes
1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 6 6 6 6 6 7 7 7 7 7 7 8 8 8 8 9 9 9 9 10 10 10 10 10 11
The middle values are both 4, so the median value is 4.
This shows that both sets of data have a median value of 4. I expected the median word length of Tarzan of the Apes to be greater, as its words were longer in general. My expectation was probably wrong because, although the novel has longer words, short words are still the most common and so still supply the middle value: even a more challenging read needs basic words, e.g. 'that'.
Range
Range is the difference between the highest and lowest values in the data. It gives us an idea of the scale of difference between the most basic words and the longer, more complicated words.
Disney's Tarzan
Greatest value: 9
Smallest value: 1
Range: 8
Tarzan of the Apes
Greatest value: 11
Smallest value: 1
Range: 10
Tarzan of the Apes has the greater range. Both books start at 1, so this shows that Tarzan of the Apes contains longer words.
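The range calculation is simple enough to sketch in a few lines of Python. The function name `value_range` is my own, and only the greatest and smallest values reported above are used, since these are all the range depends on.

```python
def value_range(values):
    """Range = greatest value minus smallest value."""
    return max(values) - min(values)

# Greatest and smallest word lengths as reported above.
disney_range = value_range([1, 9])
apes_range = value_range([1, 11])

print(disney_range)  # 8
print(apes_range)    # 10
```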
Cumulative Frequency
This is a running total of all the frequencies.
I have drawn a cumulative frequency diagram and table for both sets of data.
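A cumulative frequency table can be built from a frequency table by keeping a running total. The sketch below shows the idea in Python; the `frequencies` table is invented for the example and is not taken from either book.

```python
from itertools import accumulate

# Hypothetical frequency table: word length -> number of words of that length.
frequencies = {1: 2, 2: 3, 3: 5, 4: 4, 5: 1}

lengths = sorted(frequencies)
# Running total of the frequencies, taken in order of word length.
running_totals = list(accumulate(frequencies[length] for length in lengths))

for length, total in zip(lengths, running_totals):
    print(f"up to {length} letters: {total} words")

# The final running total always equals the total number of words.
print(running_totals[-1] == sum(frequencies.values()))  # True
```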
Interquartile range
This is the difference between the upper and lower quartiles. It tells you how spread out the central half of the data is. Only the middle half of the data is used, so the interquartile range is not affected by extreme values.
I worked this out for both sets of data on my cumulative frequency graphs.
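The interquartile range can also be calculated rather than read off a graph. The Python sketch below uses the standard library's `statistics.quantiles` with its "inclusive" method, which is one of several ways of defining quartiles and is my assumption here, so the answers may differ slightly from values read off a cumulative frequency graph. The two sample lists are invented to show that an extreme value changes the range but not the interquartile range.

```python
import statistics

def interquartile_range(values):
    """Upper quartile minus lower quartile."""
    lower, _median, upper = statistics.quantiles(values, n=4, method="inclusive")
    return upper - lower

# Hypothetical data: the second list replaces the largest value with an extreme one.
typical = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_extreme = [1, 2, 3, 4, 5, 6, 7, 8, 90]

print(interquartile_range(typical))       # 4.0
print(interquartile_range(with_extreme))  # 4.0

# The range, by contrast, is changed a lot by the extreme value.
print(max(typical) - min(typical))            # 8
print(max(with_extreme) - min(with_extreme))  # 89
```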
Disney's Tarzan
Deviation from the mean
The difference between a particular value and the mean is the deviation from the mean for that value. If you use x for the value and x̄ (x-bar) for the mean, then the deviation from the mean is x - x̄.
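The deviation from the mean for each value can be sketched in Python as below. The function name `deviations_from_mean` and the small `sample` list are my own and are only there to illustrate the definition.

```python
def deviations_from_mean(values):
    """Return (value - mean) for every value in the data."""
    mean = sum(values) / len(values)
    return [x - mean for x in values]

# Hypothetical sample with a mean of 5.
sample = [2, 4, 6, 8]
deviations = deviations_from_mean(sample)

print(deviations)       # [-3.0, -1.0, 1.0, 3.0]
print(sum(deviations))  # 0.0
```

The deviations always add up to zero, because the values below the mean cancel out the values above it.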
This shows that Tarzan of the Apes has a higher median word length than Disney’s Tarzan. It also shows that Tarzan of the Apes has a higher upper quartile, which suggests longer words. Disney’s Tarzan has a lower lower quartile, which suggests that it has shorter words. These results could differ from the original calculations because the graph is not 100% accurate or because of incorrect plotting.
I predict that the reading age of Disney’s Tarzan will be lower than the reading age of Tarzan of the Apes.
This shows that my hypothesis was correct. Tarzan of the Apes is exactly on the borderline for ages seventeen to eighteen, showing us that it is definitely an adult’s text.
Disney’s Tarzan is in the age eleven section, showing us that it is definitely a children’s book. Tarzan of the Apes has a higher reading age than Disney’s Tarzan, as expected.
Box and Whisker Diagrams
A box and whisker diagram (or box plot) is a diagram showing the spread of a set of data.
For example, if heights of 16 year old pupils are taken, the box and whisker diagram might look like this:
This sort of diagram can also be drawn horizontally.
I have drawn a box and whisker plot to represent the length of words for both of my books, using the data from my cumulative frequency charts.
Disney’s Tarzan
Tarzan of the Apes
This shows us that Tarzan of the Apes has the bigger range of word length, the greater median and the higher upper quartile, while Disney’s Tarzan has the lower lower quartile.
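A box and whisker diagram is drawn from five values: the smallest value, the lower quartile, the median, the upper quartile and the greatest value. The Python sketch below works these out for a made-up `sample` list. The function name `five_number_summary` is my own, and the "inclusive" quartile method is an assumption, so quartiles read from a cumulative frequency graph may come out slightly differently.

```python
import statistics

def five_number_summary(values):
    """Smallest value, lower quartile, median, upper quartile, greatest value."""
    lower, median, upper = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), lower, median, upper, max(values)

# Hypothetical sample, for illustration only.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]

print(five_number_summary(sample))  # (1, 3.0, 5.0, 7.0, 9)
```

The box runs from the lower quartile to the upper quartile with a line at the median, and the whiskers reach out to the smallest and greatest values.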
Conclusion
Hypothesis 1 - I predict that the word length will be longer in Tarzan of the Apes than in Disney’s Tarzan.
When I was finding the averages, i.e. the mean, mode and median, my results were mostly as I expected.
Disney’s Tarzan had a lower mean word length and a lower modal value than Tarzan of the Apes. However, I expected the median value to be greater for Tarzan of the Apes, but the two values were equal.
The longest word in Disney’s Tarzan had 9 letters, giving it a range of eight, whereas the longest word in Tarzan of the Apes had 11 letters, giving it a range of ten. This shows that its word length was longer.
When I investigated cumulative frequency and box plots, I found that Tarzan of the Apes had a higher median and upper quartile than Disney’s Tarzan, and that Disney's Tarzan had a lower lower quartile, suggesting that word length was longer in Tarzan of the Apes.
When investigating standard deviation…
All of the testing and evidence that I have conducted and collected backs up my prediction that the word length in Tarzan of the Apes would be longer than in Disney's Tarzan.
Hypothesis 2 - I predict that the reading age of Disney’s Tarzan will be lower than the reading age of Tarzan of the Apes.
When I was finding the data in order to plot the reading test, I found that the average number of sentences per 100 words was much lower in Tarzan of the Apes. This means its sentences were longer, suggesting that it would be more difficult to read. I also found that the number of syllables per one hundred words was a lot higher, showing that longer words were being used.
When I transferred my data on to the graph, I found that Tarzan of the Apes had a much higher reading age than Disney's Tarzan. Tarzan of the Apes was for adults and Disney's Tarzan was for children around eleven.
This investigation proves that my hypothesis was correct.
Hypothesis 3 - I predict that the longer adult novel will cost more than the short Disney book.
I predicted this because I knew that the adult book had a lot more pages and words per page than the Disney book, so it would cost more to print. When I investigated the prices of the books and compared them, I found that Tarzan of the Apes cost more at its R.R.P than the Disney book.
I predict that the reading age of Disney’s Tarzan will be lower than the reading age of Tarzan of the Apes.
I searched on the internet to find a way to test a book for its reading age. This is what I found:
Fry Readability Graph
Select samples of 100 words.
(i) Find y, the average number of sentences per 100-word passage (calculating to the nearest tenth).
(ii) Find x, the average number of syllables per 100-word sample.
Then use the Fry graph (below) to determine the reading age, in years.
This test is suitable for all ages, from infant to upper secondary.
The curve represents normal texts. Points below the curve imply longer than average sentence lengths. Points above the curve represent text with a more difficult vocabulary.
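Steps (i) and (ii) above can be sketched in Python. This only finds the two averages that are plotted on the graph; the reading age itself still has to be read from the Fry graph. The function name `fry_coordinates` and the sample counts are invented for the example and are not my results for either book.

```python
def fry_coordinates(samples):
    """Average the counts from several 100-word samples.

    samples: list of (syllables, sentences) pairs, one pair per 100-word sample.
    Returns (x, y): x is the average number of syllables per 100 words and
    y is the average number of sentences per 100 words, to the nearest tenth.
    """
    n = len(samples)
    x = sum(syllables for syllables, _ in samples) / n
    y = round(sum(sentences for _, sentences in samples) / n, 1)
    return x, y

# Hypothetical counts from three 100-word samples, for illustration only.
samples = [(124, 6.5), (131, 5.8), (120, 7.1)]

x, y = fry_coordinates(samples)
print(x, y)  # 125.0 6.5
```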
I predict that the longer adult novel will cost more than the short Disney book
On the back of Disney's Tarzan it says the R.R.P is £1.99. The novel didn’t have a price on the back, so I went into a local book store and found that the R.R.P was £4.99 for a paperback copy.
The children’s shorter book cost less than the adult’s much longer book. However, although the children’s book was much shorter, it was in full colour and had printed illustrations all the way through.
Despite these limitations, my hypothesis was correct: the adult’s book was more expensive, most likely due to printing costs because of its size in comparison to the small children’s book.
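The price comparison can be checked with a short Python sketch using the two R.R.P values given above. The variable names are my own, and `Decimal` is used so that the pounds and pence are handled exactly.

```python
from decimal import Decimal

# R.R.P values as reported above, in pounds.
disney_price = Decimal("1.99")
novel_price = Decimal("4.99")

# How much more the novel costs, and how many times more expensive it is.
difference = novel_price - disney_price
ratio = novel_price / disney_price

print(difference)       # 3.00
print(round(ratio, 1))  # 2.5
```

So the novel costs £3.00 more, which is about two and a half times the price of the Disney book.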