2) I also believe that the same applies to females and travel by train: with an equal sample, there should still be more females than males travelling by this method.
3) Finally, because there is too little evidence to show a clear link between age and method of travel, an estimate must be made. I believe that, using an equal sample of each age, “bike”, “other”, and “motor” will generally be used more by younger children.
Testing the Hypotheses
- Because the data was in PivotTable format, it was not possible to delete individual records to balance the male and female data. The only remaining option was to use mathematical ratios to test this hypothesis.
There are approximately 1.2627 times as many females as males, so each male value must be multiplied by this ratio to give a fair comparison.
There are 3268 males who get to school by bus, and 4468 females. Because there are more females than males overall, 3268 must be multiplied by the above ratio:
3268 * 1.2627 ≈ 4126.5
4468 > 4126.5
Therefore, bus travel is used more by females than by males (ages 9-15).
The same method can be applied to the rest of the hypothesis.
There are 3738 males who walk to school, and 4903 females.
3738 * 1.2627 ≈ 4720
4903 > 4720
Therefore, walking to school is more common among females than among males.
There are 470 males who use a method of transport not listed in the survey, and 650 females who do the same.
470 * 1.2627 ≈ 593.5
650 > 593.5
Therefore, Hypothesis 1 is entirely incorrect.
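The scaling comparison used for Hypothesis 1 can be sketched in Python as below. It hard-codes the female-to-male ratio and the counts quoted in this section; `scaled_comparison` is a hypothetical helper written for illustration, not part of the original Excel work.

```python
# Female-to-male ratio quoted in the text (total females / total males, ages 9-15).
RATIO = 1.2626990664470071

# (males, females) for each method of travel, as quoted above.
COUNTS = {
    "bus": (3268, 4468),
    "walk": (3738, 4903),
    "other": (470, 650),
}


def scaled_comparison(males: int, females: int, ratio: float = RATIO) -> tuple[float, bool]:
    """Scale the male count up to an equal-sized sample and report
    whether females still outnumber males."""
    scaled_males = males * ratio
    return scaled_males, females > scaled_males


for method, (m, f) in COUNTS.items():
    scaled, more_females = scaled_comparison(m, f)
    print(f"{method}: scaled males = {scaled:.1f}, females = {f}, more females: {more_females}")
```

The same function works for any pair of counts, so the train figures for Hypothesis 2 can be checked with `scaled_comparison(121, 217)`.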
- Exactly the same method can be applied to this hypothesis to determine whether travel by train is more common among females.
There are 121 males travelling by train, and 217 females.
121 * 1.2627 ≈ 152.8
217 > 152.8
Therefore, travel by train is more common among females than among males, so Hypothesis 2 is correct.
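Multiplying the male count by the ratio is the same as comparing the proportion of each sex that uses a given method. Writing $F$ and $M$ for the total numbers of females and males aged 9-15, $r = F/M$ for the ratio (about 1.2627), and $f$ and $m$ for the numbers of females and males using one method (symbols introduced here for illustration), and noting that $F$ and $M$ are both positive:

```latex
f > r\,m
\iff f > \frac{F}{M}\,m
\iff \frac{f}{F} > \frac{m}{M}
```

For train travel, $217 > 1.2627 \times 121 \approx 152.8$, so a larger proportion of females than of males travel by train.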
- This hypothesis is harder to test fairly, because age has seven categories rather than two: instead of “male” and “female”, there are “9”, “10”, “11”, “12”, “13”, “14”, and “15”. The same method can still be applied, using the largest age group, 13-year-olds (6744 children), as the reference and scaling every other age group up to match it:
6744 ÷ 2327 ≈ 2.8982
Therefore, all data relating to 9-year-olds must be multiplied by 2.8982 in order to be equal to data relating to 13-year-olds
6744 ÷ 4525 ≈ 1.4904
Therefore, all data relating to 10-year-olds must be multiplied by 1.4904 in order to be equal to data relating to 13-year-olds
6744 ÷ 4928 ≈ 1.3685
Therefore, all data relating to 11-year-olds must be multiplied by 1.3685 in order to be equal to data relating to 13-year-olds
6744 ÷ 5511 ≈ 1.2237
Therefore, all data relating to 12-year-olds must be multiplied by 1.2237 in order to be equal to data relating to 13-year-olds
6744 ÷ 6675 ≈ 1.0103
Therefore, all data relating to 14-year-olds must be multiplied by 1.0103 in order to be equal to data relating to 13-year-olds
6744 ÷ 2201 ≈ 3.0641
Therefore, all data relating to 15-year-olds must be multiplied by 3.0641 in order to be equal to data relating to 13-year-olds
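The age multipliers can be recomputed directly from the group sizes, which is a quick way to check each division. This is a minimal sketch assuming the group sizes quoted above and 13-year-olds as the reference group; `age_multipliers` is a hypothetical name.

```python
# Number of children in each age group, as quoted in the text.
AGE_TOTALS = {9: 2327, 10: 4525, 11: 4928, 12: 5511, 13: 6744, 14: 6675, 15: 2201}

REFERENCE_AGE = 13  # the largest group, used as the common scale


def age_multipliers(totals: dict[int, int], reference: int = REFERENCE_AGE) -> dict[int, float]:
    """Return the factor that scales each age group up to the size of the reference group."""
    ref_total = totals[reference]
    return {age: ref_total / n for age, n in totals.items()}


for age, factor in sorted(age_multipliers(AGE_TOTALS).items()):
    print(f"age {age}: multiply by {factor:.4f}")
```

The reference group's own factor is exactly 1, so its data are left unchanged.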
After these calculations have been applied, the data may still show an unclear trend. To draw a fair conclusion, trendlines need to be added. This can be done in Microsoft Excel 2007, which can add several different types of mathematical trendline. Three types of trendline are particularly well suited to this type of data:
Exponential trendlines are least-squares fits of the equation $y = ce^{bx}$, where $c$ and $b$ are constants and $e$ is the base of the natural logarithm.
Logarithmic trendlines are least-squares fits of the equation $y = c\ln x + b$, where $c$ and $b$ are constants.
Power trendlines are least-squares fits of the equation $y = cx^b$, where $c$ and $b$ are constants. This is by far the simplest trendline.
Adding the R-squared value to each trendline shows how closely the line fits the data (the closer R-squared is to 1, the better the fit), and therefore how reliable the trend it shows is.
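The three trendlines can be reproduced outside Excel. A common approach, assumed here, is to transform each equation into a straight line and fit it by ordinary least squares: the exponential fit regresses $\ln y$ on $x$, the logarithmic fit regresses $y$ on $\ln x$, and the power fit regresses $\ln y$ on $\ln x$. The R-squared values are computed on the transformed data, which is an assumption about how Excel reports them. All function names are hypothetical, and every value must be positive wherever a logarithm is taken.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    # R-squared is undefined when every y is equal; report 1.0 since a flat line fits exactly.
    r_squared = 1 - ss_res / ss_tot if ss_tot else 1.0
    return slope, intercept, r_squared


def exponential_trend(xs, ys):
    """y = c * e^(b x): fit ln y against x. Returns (c, b, r_squared)."""
    b, ln_c, r2 = linear_fit(xs, [math.log(y) for y in ys])
    return math.exp(ln_c), b, r2


def logarithmic_trend(xs, ys):
    """y = c * ln x + b: fit y against ln x. Returns (c, b, r_squared)."""
    return linear_fit([math.log(x) for x in xs], ys)


def power_trend(xs, ys):
    """y = c * x^b: fit ln y against ln x. Returns (c, b, r_squared)."""
    b, ln_c, r2 = linear_fit([math.log(x) for x in xs], [math.log(y) for y in ys])
    return math.exp(ln_c), b, r2


ages = list(range(9, 16))
# Hypothetical values that follow y = 2e^(0.5x) exactly, used only to check the fit.
demo = [2 * math.exp(0.5 * a) for a in ages]
print(exponential_trend(ages, demo))
```

Because the demo values lie exactly on an exponential curve, the fit should recover $c = 2$ and $b = 0.5$ with an R-squared of 1; real survey data would give an R-squared below 1.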
The above chart relates to the number of children of each age who travel to school by bike. This data is unedited.
After the age ratios above are applied to even out the data, the results can be compared fairly:
The average of these values is 382. The average of the first (youngest) three values is 383 and the average of the last (oldest) three values is 345.3; these results suggest that cycling to school is slightly more popular among younger children. Despite this, because cycling is unpopular among 9-year-olds, the overall trend of the graph is very slightly upwards:
Exponential:
Logarithmic:
Power:
Unlike the averages, all three trendlines show a very slight upward slope towards the older ages. Therefore, the general trend of the data suggests that older children are more likely than younger children to cycle to school, unless 9-year-olds are excluded.
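The comparison of averages used for the bike data can be expressed as a small helper. The adjusted bike values appear only in the chart, so the list below is purely hypothetical placeholder data, and `group_averages` is an illustrative name.

```python
from statistics import mean


def group_averages(values: list[float], group_size: int = 3) -> tuple[float, float, float]:
    """Return (overall average, average of the youngest group, average of the oldest group).

    `values` must be ordered from the youngest age to the oldest."""
    return mean(values), mean(values[:group_size]), mean(values[-group_size:])


# Hypothetical adjusted values for ages 9-15 (not the survey's figures).
example = [40.0, 35.0, 30.0, 25.0, 20.0, 15.0, 10.0]
overall, youngest, oldest = group_averages(example)
print(f"overall {overall:.1f}, youngest three {youngest:.1f}, oldest three {oldest:.1f}")
```

A youngest-group average above the oldest-group average points towards greater popularity among younger children, but, as the bike data show, a single unusual age can make the trendlines disagree with the averages.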
The above graph contains unedited data for all children aged 9-15 who said that they travel to and/or from school using an “other” method (one not among the methods of transport listed in the survey). The data must be adjusted using the correct age ratios before it can be compared fairly.
The above chart clearly shows a general downward trend, supporting this part of the hypothesis: the data show that younger children are more likely to use an “other” method to get to school. Adding trendlines and calculating the R-squared values adds further depth to this conclusion:
Exponential:
Logarithmic:
Power:
All three trendlines show a general downward curve.
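Applying the age multipliers to a chart's raw counts can be sketched as follows, assuming the group sizes quoted earlier and 13-year-olds as the reference. The raw counts for each chart are not reproduced in the text, so the example uses hypothetical counts in which every age group uses the method at exactly the same rate; after adjustment every age comes out equal, which is the fair comparison the ratios are meant to produce. `adjust_counts` is a hypothetical name.

```python
# Number of children in each age group, as quoted in the text.
AGE_TOTALS = {9: 2327, 10: 4525, 11: 4928, 12: 5511, 13: 6744, 14: 6675, 15: 2201}


def adjust_counts(raw: dict[int, float], totals: dict[int, int], reference: int = 13) -> dict[int, float]:
    """Rescale each age's count as if every age group were the size of the reference group."""
    ref_total = totals[reference]
    return {age: count * ref_total / totals[age] for age, count in raw.items()}


# Hypothetical raw counts: every age group uses the method at exactly 10%.
same_rate = {age: 0.1 * n for age, n in AGE_TOTALS.items()}
print(adjust_counts(same_rate, AGE_TOTALS))
```

With real counts, any remaining differences between ages after adjustment reflect genuine differences in popularity rather than differences in group size.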
Like the other two unedited graphs, the above graph is disproportionate and therefore unreliable until the values are multiplied by the correct ratios to make the data fair.
The adjusted version of the graph, like the previous adjusted graph, clearly shows a downward slant, and this time gives even stronger support for the hypothesis. Adding trendlines to this graph, complete with R-squared values, shows even more information:
Exponential:
Logarithmic:
Power:
Overall, the third hypothesis is largely correct, with the “travel by bike” graph being a partial exception.
Final Conclusion and Evaluation
During this project, I analysed a set of data from a New Zealand census survey, obtained from the internet. By putting this data into many different kinds of graphs and tables using Microsoft Excel, I was quickly able to spot several clear and simple trends. I later used more specific and complex graphs, and added trendlines with R-squared values, to find harder-to-spot trends and to judge them more accurately. I also overcame several significant difficulties that would otherwise have severely hampered the project.
I can conclude that, contrary to my first hypothesis, females aged 9-15 are overall more likely than males of the same age to travel to school by bus, to walk, or to use an “other” method of transport (one not listed in the survey). I also proved my second hypothesis correct, as females aged 9-15 were also more likely than males of the same age group to travel by train. Finally, I found that travel by motor and by “other” methods was generally more common among the younger ages in the 9-15 range, and that travel by bike is most common among children aged 10-12, followed by ages 13-15, and finally age 9. This shows that cycling is more common among older children overall, although this would be reversed if children aged 9 were not taken into account.
Microsoft Excel proved extremely useful for this project, especially its easy-to-use graph maker and trendline tool.
I managed to prove or disprove every hypothesis I put forward, resulting in one correct hypothesis, one incorrect hypothesis, and one hypothesis that was partly correct and partly incorrect. Although many parts of the project were difficult and somewhat confusing, I am moderately pleased with the results. If I were to do the project again, I would use the newest version of Microsoft Excel throughout, as it is far more efficient than MS Excel XP.