How to lie with statistics
Nowadays, we are all becoming aware that using statistics in the media is a good way to persuade people. The problem is that statistics can be used to present data in whatever way we want. As a result, people who trust statistics from the media are in danger of being manipulated. Some of the techniques for such manipulation are uncovered in this essay. Mark Twain popularised the saying: "There are three kinds of lies: lies, damned lies, and statistics". The main argument of this essay is: "statistical analysis is a mathematical way of making some inference about the data or summarising it, hence data analysed using formal methods is unbiased". In the first part I discuss the fact that averages hide a lot of information. Afterwards, I present the case of O.J. Simpson and the incorrect interpretation of conditional probabilities, which is a good example of unwarranted assumptions, in particular black-and-white thinking. The final argument shows how important it is to make correct assumptions, a problem also known as false or misleading presuppositions.

The first argument is: “there are several types of averages, such as the median, mode and mean, so averages hide a lot of information”. Even though the arithmetic mean is used as the average in most cases, the other two can be used as well. To support the argument, let’s assume we are given the collection of values {1, 2, 3, 4, 20, 20, 20}. The arithmetic mean is the sum of all the values in the set divided by the number of values. The mode is the most frequent value in the set. The median is the middle value once all the values are sorted. In this particular example, the arithmetic mean, mode
and median are 10, 20, and 4 respectively. Therefore, any of these three values can be reported as "the average". Simpson's paradox, where the rate for the aggregate is very different from the rates for the sub-groups, is another good demonstration that averages hide a lot of information. As a result, we have at least two examples of how statistics can mislead people.

The second argument is: “the majority of people do not know how to use formal methods, so common sense is a good substitute for them”. This argument is unsound because it is invalid, as is shown below. A good example is the O.J. Simpson murder case: Simpson, a former American football star and actor, was brought to trial for the 1994 murder of his ex-wife and her friend. All the evidence was against him. The main statement used to defend O.J. Simpson was as follows. According to official statistics from a reliable source for that period, only 30% of husbands who beat their wives go on to kill them. Therefore, the majority of the jury members were under the impression that 70% of husbands who beat their wives don't kill them. This claim played a crucial role in the final decision of the jury, and O.J. Simpson was acquitted in 1995.

Let’s analyse this case more formally, instead of using common sense. For that we need to introduce some concepts and theorems from probability theory. Conditional probability is quite a simple concept, even though people sometimes misinterpret it, which leads to invalid conclusions. The conditional probability P(A|B) is the probability of event A occurring given that event B has occurred:

P(A|B) = P(A and B) / P(B)

The theorem we are going to use in our analysis is the Law of Total Probability, which states that, for events A and B, the probability of A is the probability of A given B weighted by the probability of B, plus the probability of A given not B weighted by the probability of not B, i.e.

P(A) = P(A|B) P(B) + P(A|not B) P(not B)

The statistic used by the defence gives

P(killed | beat) = 0.3

and, since the probabilities of an event and its complement add up to 1,

P(not killed | beat) = 1 - P(killed | beat) = 0.7

Common sense stops here and reads 0.7 as the probability that Simpson is innocent. That ignores a fact nobody disputed: the wife was murdered. The probability relevant to the trial is therefore P(not killed | beat, murdered), not P(not killed | beat). If the husband killed his wife, she was certainly murdered, so P(murdered | killed, beat) = 1. Write q = P(murdered | not killed, beat) for the probability that a beaten wife is murdered by someone other than her husband. Applying the Law of Total Probability within the group of husbands who beat their wives, we get

P(murdered | beat) = P(murdered | killed, beat) P(killed | beat) + P(murdered | not killed, beat) P(not killed | beat) = 0.3 + 0.7q

and, by the definition of conditional probability,

P(not killed | beat, murdered) = P(murdered | not killed, beat) P(not killed | beat) / P(murdered | beat) = 0.7q / (0.3 + 0.7q)

This probability tends to zero as q tends to zero. For instance, with a purely illustrative value of q = 0.001 it equals 0.0007 / 0.3007, which is about 0.0023 and far below the 0.7 suggested by common sense.

In the example above I showed that the probability of O.J. Simpson being innocent can tend to zero if we interpret conditional probabilities in a formal way, instead of using our common sense. We can now apply the "should" pattern as follows. Using formal methods for interpreting conditional probabilities, instead of common sense, will achieve a justified conclusion. When it comes to interpreting conditional probabilities, formal methods are the best known way to achieve a justified conclusion. All things considered, using formal methods for interpreting conditional probabilities (and achieving a justified conclusion) is better than not achieving a justified conclusion. Therefore, we should use formal methods for interpreting conditional probabilities.

Noam Chomsky has infamously stated, "There is no such thing as the probability of a sentence."
For that, he is roundly mocked by computational linguists, which brings us to the final argument I want to discuss in this essay: “the probability of a sentence is zero, so philosophy is dead!” Let's set aside the fact that the word "dead" is quite ambiguous in the context of philosophy. As the former President of the USA, Bill Clinton, said on August 17, 1998: "It depends upon what the meaning of the word 'is' is". Therefore, to disambiguate the argument, I state explicitly that by "dead" I mean that no philosophy exists at all.

An important assumption is that sentences are uniformly distributed. For a fair coin or a fair die, the probability of each outcome is 1/2 or 1/6 respectively, where 2 and 6 are the numbers of possible outcomes. Likewise, the probability of a sentence is one divided by the number of possible sentences. We can easily show that the number of sentences in any natural language is infinite. The natural numbers are a subset of the set of all possible words, while the latter is a subset of the set of all possible sentences. Archimedes proved around 250 BC that there is no upper bound on the natural numbers; in other words, the number of natural numbers is infinite. Therefore, the number of possible sentences is infinite. The probability of a sentence is then the limit of one divided by the number of sentences as that number tends to infinity, and this limit is zero. So we have proved that the probability of a sentence is zero.

If the probability of a sentence is zero, then we cannot make any sentence. If we cannot make any sentence, then we cannot speak. If we cannot speak, then philosophy is dead. By Hypothetical syllogism we obtain: "If the probability of a sentence is zero, then philosophy is dead". Applying Modus ponens, we conclude that philosophy is dead, since the probability of a sentence was shown above to be zero.
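The limiting step can be illustrated numerically. The sketch below assumes, purely for illustration, that a "sentence" is any sequence of words up to a given length over a fixed vocabulary, and that all such sequences are equally likely. The function name and the vocabulary size are hypothetical and are not part of the argument itself.

```python
from fractions import Fraction


def uniform_sentence_probability(vocabulary_size: int, max_length: int) -> Fraction:
    """Probability of one sentence when every word sequence of length
    1..max_length is treated as equally likely (an assumption of this sketch)."""
    # Count all word sequences of each length from 1 up to max_length.
    n_sentences = sum(vocabulary_size ** k for k in range(1, max_length + 1))
    return Fraction(1, n_sentences)


if __name__ == "__main__":
    # Hypothetical vocabulary of 1000 words; allow longer sentences at each step.
    for max_length in (1, 2, 5, 10):
        p = uniform_sentence_probability(1000, max_length)
        print(max_length, float(p))
```

As the allowed length grows, the number of sentences grows without bound and one divided by that number shrinks towards zero, which is the limit used in the argument.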
To summarise, we almost formally proved that philosophy is dead! What went wrong in the argument? Even though all the steps were rigorously explained and formal methods such as Modus ponens and Hypothetical syllogism were used, the crucial point is the assumption we made. We made a seemingly obvious assumption that all sentences are uniformly distributed; in other words, we said that the probability of a sentence is one divided by the number of possible sentences. Since the number of possible sentences in any natural language is infinite, as proved above, we obtained that the probability of a sentence is zero. Consider the following anecdote. A man and a blonde were asked: "What is the probability that you will now see a real dinosaur walking towards you?” The man said that the probability is one in a billion, whereas the blonde said that it is fifty-fifty: "either I am going to see it or I am not". This example shows that making assumptions is quite a dangerous thing to do: we obtained a valid argument, but it is not sound. This situation is also known as false or misleading presuppositions.

In conclusion, I argued both for and against the main argument of this essay, stated in the introduction. Using common sense in reasoning about conditional probabilities is an example of a false dilemma, whereas making seemingly obvious assumptions is an example of unjustified premises. These two arguments were carefully analysed and showed that the proper use of formal methods and rigorous proof resolves some issues where common sense might mislead. However, the main argument is unsound, because it is not valid, which was shown using the fact that averages hide a lot of information.
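The point about averages can be checked directly. The sketch below uses Python's standard statistics module on the collection {1, 2, 3, 4, 20, 20, 20} from the first argument, and then illustrates Simpson's paradox with made-up success counts for two hypothetical treatments, X and Y. Those counts and all function names are assumptions chosen for illustration, not data from any study.

```python
from fractions import Fraction
from statistics import mean, median, mode


def three_averages(data):
    """Return the arithmetic mean, mode and median of a list of numbers."""
    return mean(data), mode(data), median(data)


def success_rate(successes, trials):
    """Exact success rate as a fraction."""
    return Fraction(successes, trials)


def aggregate_rate(groups):
    """Pool all sub-groups of one treatment and return the overall rate."""
    successes = sum(s for s, _ in groups.values())
    trials = sum(t for _, t in groups.values())
    return success_rate(successes, trials)


# Values taken from the essay's first argument.
VALUES = [1, 2, 3, 4, 20, 20, 20]

# Hypothetical (successes, trials) counts, invented for this illustration.
COUNTS = {
    "X": {"group A": (8, 10), "group B": (20, 100)},
    "Y": {"group A": (70, 100), "group B": (1, 10)},
}

if __name__ == "__main__":
    print("mean, mode, median:", three_averages(VALUES))
    for treatment, groups in COUNTS.items():
        for name, (s, t) in groups.items():
            print(treatment, name, float(success_rate(s, t)))
        print(treatment, "overall", float(aggregate_rate(groups)))
```

In the made-up counts, X has the higher success rate inside each group, yet Y has the higher rate once the groups are pooled, because the groups differ greatly in size. A single aggregate figure would hide that reversal.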