The kurtosis index is low (0.14), suggesting a hypo-normal (flatter) distribution relative to a normal distribution. In other words, the data have a flat top near the mean rather than a sharp peak.
- Gender
The data show that there were more males than females on TITANIC. More than half of the population of TITANIC was male, while only about 40% of the passengers were female.
- Ticket Fare
There was considerable variation in ticket fares on TITANIC.
The minimum ticket price was 0, which means that some passengers did not pay anything for the trip. The average ticket price was about 33 dollars, which is quite low considering that the most expensive fare was 512 dollars. That is because 75% of the passengers (up to the 3rd quartile) bought tickets priced below 31.27 dollars, while the remaining 25% paid between 32 and 512 dollars. The mean is not an appropriate measure of the typical fare here because it is sensitive to extreme values, and our data contain some extreme prices.
Accordingly, the box plot shows the outlier (512), which is far above the mean ticket fare. This means that a very select clientele paid extremely high prices, perhaps in exchange for corresponding luxury. Additionally, the distribution is asymmetric and skewed to the right (positively skewed): the mean is much greater than the median, the middle value. The upper range is quite wide, since the 3rd quartile is comparatively high while the 1st quartile is only 7 dollars.
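The effect of a single extreme fare on the mean, and the usual box-plot rule for flagging outliers, can be illustrated with a small sketch. The fares below are hypothetical values chosen for illustration, not the report's data, and the 1.5 × IQR fence and the "inclusive" quartile method are common conventions that we assume here; the software used for the report may have applied different ones.

```python
import statistics

# Hypothetical fares (NOT the report's data): mostly cheap tickets plus one extreme value.
fares = [7.0, 7.25, 8.05, 13.0, 14.5, 26.0, 31.0, 512.0]

mean_fare = statistics.mean(fares)      # pulled upward by the 512 fare
median_fare = statistics.median(fares)  # unaffected by how extreme the top fare is

# Quartiles; "inclusive" is one of several quartile conventions (an assumption here).
q1, q2, q3 = statistics.quantiles(fares, n=4, method="inclusive")
iqr = q3 - q1

# Common box-plot rule: values above Q3 + 1.5 * IQR are flagged as outliers.
upper_fence = q3 + 1.5 * iqr
outliers = [f for f in fares if f > upper_fence]

# A mean well above the median is a simple indicator of right (positive) skew.
right_skewed = mean_fare > median_fare

print(f"mean={mean_fare:.2f}, median={median_fare:.2f}")
print(f"Q1={q1:.2f}, Q3={q3:.2f}, upper fence={upper_fence:.2f}")
print("outliers:", outliers, "| right-skewed:", right_skewed)
```

In this toy sample the single 512 fare lifts the mean far above the median, which is the same pattern described for the TITANIC fares.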
As we can see in the histogram above, the majority of the passengers bought their tickets for less than 50 dollars. Among the tickets priced between 32 and 512 dollars, many sold for about 200-250 dollars. In fact, only 4 of the 1309 passengers held a 512-dollar ticket. As a result, most of the passengers on TITANIC had comparatively cheap tickets, even though a select few enjoyed the luxury travel that an expensive ticket bought them. Below is a more representative histogram of fares, in which we have purposely omitted these luxury ticket prices and increased the number of classes:
Other Statistics:
The high kurtosis (27.028) tells us that we are dealing with a hyper-normal distribution, which means that the ticket fare data are very peaked relative to a normal distribution. More specifically, the data have a distinct peak near the mean, decline rapidly, and have heavy tails.
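For readers who want to see how such an index can be obtained, here is a minimal sketch of one common definition, the population excess kurtosis (fourth central moment divided by the squared variance, minus 3, so that a normal distribution scores 0). The report does not state which kurtosis convention its software used, so this definition is an assumption, and the function name `excess_kurtosis` and both samples are hypothetical.

```python
def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment (variance)
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3


# Hypothetical samples, for illustration only.
flat_sample = [1, 2, 3, 4, 5]        # evenly spread values: flatter than normal
peaked_sample = [10] * 9 + [100]     # tight cluster plus one extreme value: heavy tail

print(excess_kurtosis(flat_sample))    # negative: "hypo-normal" in the report's terms
print(excess_kurtosis(peaked_sample))  # positive: "hyper-normal" in the report's terms
```

Under this convention, a large positive value such as the one reported for fares goes together with a concentration of values near the centre and a few very distant ones.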
- Survival ratio
The most important aspect of our analysis of the TITANIC data concerned the survival ratio and whether there was a statistical link between this ratio and the other variables discussed above. To analyze this parameter, we coded non-survival as 0 and survival as 1; the average survival ratio was only 0.382. In other words, only 500 people survived out of the total of 1309.
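The link between the 0/1 coding and the ratio can be checked with a short sketch: the mean of a 0/1 indicator is simply the proportion of ones. The counts are those quoted above; the variable names are illustrative only.

```python
# Counts quoted in the text.
survivors = 500
total = 1309

# Rebuild the 0/1 coding: 1 = survived, 0 = did not survive.
outcomes = [1] * survivors + [0] * (total - survivors)

# The mean of a 0/1 variable equals the share of ones.
survival_ratio = sum(outcomes) / len(outcomes)

print(round(survival_ratio, 3))  # 0.382
```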
2. Comparison
- Age and Survival
We compared the age distribution of those who survived with that of those who did not, to see whether and how survival depended on the age of the population of TITANIC. We found that survival does not appear to depend on the age of the passengers, as the statistics are very similar in both groups. The average age of the survivor group was about 29, with a variance of 226.3, while the deceased group had an average age of about 30, with a variance of 193.5. The standard deviations were 15 and 13.9 respectively.
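The standard deviations follow directly from the variances, since the standard deviation is the square root of the variance. A quick check using the figures quoted above (variable names are illustrative):

```python
import math

# Variances of age quoted in the text.
variance_survived = 226.3
variance_deceased = 193.5

# Standard deviation = square root of the variance.
sd_survived = math.sqrt(variance_survived)
sd_deceased = math.sqrt(variance_deceased)

print(round(sd_survived, 1), round(sd_deceased, 1))  # 15.0 13.9
```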
- Gender and Survival
We also compared the relative frequencies of gender and survival, to see how each outcome group was split by gender. Here is what we found:
Basically, 68% of the survivors were female while 32% were male. On the other hand, 16% of the passengers who did not survive were female, while 84% were male. However, from this statistic alone we cannot say definitively that survival depended on one gender or another.
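The percentages above describe the gender split within each outcome group. A complementary view is the survival rate within each gender, which can be roughly reconstructed from those percentages together with the 500 survivors out of 1309 passengers reported in the Survival ratio section. The sketch below is only approximate, because it starts from rounded percentages, and it is descriptive arithmetic rather than a test of dependence.

```python
# Totals quoted in the Survival ratio section.
total = 1309
survivors = 500
non_survivors = total - survivors

# Approximate cell counts implied by the rounded percentages in the text.
female_survived = 0.68 * survivors
male_survived = 0.32 * survivors
female_died = 0.16 * non_survivors
male_died = 0.84 * non_survivors

# Survival rate within each gender (condition on gender instead of on outcome).
female_rate = female_survived / (female_survived + female_died)
male_rate = male_survived / (male_survived + male_died)

print(f"female survival rate: about {female_rate:.2f}")
print(f"male survival rate: about {male_rate:.2f}")
```

Whether a difference between these two rates is statistically meaningful would require a formal test, which is outside the scope of this descriptive comparison.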
- Ticket Price and Survival
Last, we compared ticket price paid and survival, to see if there is any dependency between the two, or more importantly to answer the question: Did the passengers of TITANIC buy a ticket to “survival”?
We found that there was a difference in the prices paid: those who survived paid more for their tickets. However, we cannot infer that survival depended on the ticket price, because of the high variability in the fares of those who survived. The maximum price paid by a survivor was 512, while the maximum paid by a non-survivor was 263. This doesn’t tell us much, except that all 4 people who paid the extreme amount of 512 survived.
On average, the survivors paid more than double what the non-survivors paid for their tickets, but again, the average was heavily influenced by the outliers.
- Conclusion
By performing a simple analysis of the general information on TITANIC, we were able to describe the population of the ship in terms of several variables. To summarize, the population was relatively young, and more than half were males who did not pay much for their tickets. This fits the stereotype of the poor young man in his 20s, looking for a new life in the land of dreams. Unfortunately, the survival ratio on TITANIC was not very high: slightly less than 40% of the passengers were able to further pursue their dreams.