Joy: The objective of statisticians is to interpret raw data so that the data become meaningful. But how exactly do they collect such data? How do they ensure optimum accuracy and precision?
Candice: Statistics Canada provides guidelines for producing statistical information. Say I wanted to investigate how many hours of sleep a typical IB student at Churchill gets each night: I would first formulate the survey objectives and then design a questionnaire.
Joy: In this case, the subjects would be IB Year One and Two students, and they would be asked questions such as: how many hours of sleep do you get before a ToK presentation, a chemistry test and a math test all on the same day?
Candice: I wonder how they do it. One thing to consider when designing a survey is whether to use a sample or a census.
Joy: Surveying all the IB students at Churchill or a sample of them.
Candice: And any possible sources of survey error.
Joy: Are the students too sleep-deprived to even answer your questionnaire?
Candice: Well, there is a variety of collection methods to choose from: personal interviews, telephone interviews, or even computer-assisted data collection versus the basic paper-based questionnaire.
Joy: After we've collected the data, is it time to process it? We have all these numbers here; what do we do?
Candice: We compile the data, make charts and graphs, draw max-min lines and error bars, and consider uncertainties, but we'll do that later. Uncertainty in statistics is much more interesting.
Joy: I couldn't agree more. Those uncertainty calculations in Physics labs were simply delightful. In fact, Dennis Lindley, a British statistician, once said: "...it is only the manipulation of uncertainty that interests us. We are not concerned with the matter that is uncertain. Thus we do not study the mechanism of rain; only whether it will rain." To what extent, then, are uncertainties allowed in the study of statistics?
Candice: In statistics there is what we call the confidence interval, which we have mentioned before: a particular kind of interval estimate of a population parameter. Instead of estimating the parameter by a single value, we give an interval that is likely to include the parameter. Thus, confidence intervals are used to indicate the reliability of an estimate. How likely the interval is to contain the parameter is determined by the confidence level, or confidence coefficient.
Joy: Earlier we mentioned that Rosling's studies had a confidence interval of ±0.4, or 1.8 ± 0.4 right answers.
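As a rough illustration of the idea, here is a minimal Python sketch of a confidence interval for a mean, assuming a large-sample normal approximation (for a sample as small as ten, a t critical value would be more appropriate). The function name `confidence_interval` and the sleep-survey answers are hypothetical, invented for this example.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def confidence_interval(sample, level=0.95):
    """Normal-approximation confidence interval for a population mean."""
    n = len(sample)
    centre = mean(sample)
    # Standard error of the mean, estimated from the sample standard deviation.
    se = stdev(sample) / sqrt(n)
    # Critical value z such that the central `level` of N(0, 1) lies in [-z, z].
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return centre - z * se, centre + z * se

# Hypothetical survey answers: hours of sleep reported by 10 students.
hours = [6, 5, 7, 6, 4, 8, 6, 5, 7, 6]
low, high = confidence_interval(hours)
print(f"95% CI for mean sleep: {low:.2f} to {high:.2f} hours")
```

Raising `level` (say, to 0.99) widens the interval: more confidence that the interval contains the parameter comes at the price of a less precise estimate.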
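The comparison with the chimps is simple interval arithmetic. The sketch below assumes, as in Rosling's talk, five questions that are each a choice between two countries, so random guessing averages 2.5 right answers; that benchmark is our assumption and is not stated in the dialogue.

```python
# Reported result: 1.8 right answers on average, with a margin of 0.4.
estimate, margin = 1.8, 0.4
low, high = estimate - margin, estimate + margin

# Assumption: 5 two-option questions, so random guessing (the "chimp"
# benchmark) averages 5 * 0.5 = 2.5 right answers.
chance = 5 * 0.5

print(f"interval: {low:.1f} to {high:.1f}; chance level: {chance}")
print("whole interval below chance:", high < chance)
```

Because even the upper end of the interval (2.2) falls below 2.5, the uncertainty does not rescue the students.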
Candice: Not quite on par with the chimps, even with the uncertainty factored in.
Joy: But don't confidence intervals make assumptions about the nature of the data? Since the calculation of confidence intervals is primarily a parametric method, it may depend on the assumption that the population from which the sample came is normally distributed. Like a bell curve.
Joy: But data are not always distributed that way. How, then, if not with confidence intervals, do we judge the quality of a statistical study?
Candice: There is no commonly accepted definition of data quality for official statistics. Statistics Canada has defined data quality in terms of "fitness for use", and has identified six dimensions of quality within that concept.
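One way to see why the normality assumption matters is to simulate it. This minimal sketch repeatedly draws small samples (n = 5) and counts how often a textbook 95% t-interval actually contains the true mean. It assumes a normal population and an exponential (strongly skewed) population purely for illustration; the constants and the `coverage` helper are hypothetical choices for this example.

```python
import random
from math import sqrt
from statistics import mean, stdev

N = 5            # sample size for every simulated survey
T_CRIT = 2.776   # t critical value for a 95% interval with N - 1 = 4 degrees of freedom
TRIALS = 10_000  # number of simulated surveys per population

def coverage(draw, true_mean, seed=1):
    """Fraction of simulated 95% t-intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(TRIALS):
        sample = [draw(rng) for _ in range(N)]
        half_width = T_CRIT * stdev(sample) / sqrt(N)
        if abs(mean(sample) - true_mean) <= half_width:
            hits += 1
    return hits / TRIALS

# Bell-shaped population: normal with mean 7 and standard deviation 1.
normal_cov = coverage(lambda rng: rng.gauss(7, 1), true_mean=7)
# Strongly skewed population: exponential with mean 1.
skewed_cov = coverage(lambda rng: rng.expovariate(1.0), true_mean=1)

print(f"normal population: {normal_cov:.3f}")
print(f"skewed population: {skewed_cov:.3f}")
```

For the normal population the interval contains the mean close to 95% of the time, as advertised; for the skewed population it falls noticeably short, even though the formula is the same.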
The relevance of statistical information reflects the degree to which it meets the real needs of users. The accuracy of statistical information is the degree to which the information correctly describes the phenomena it was designed to measure. The timeliness of statistical information refers to the delay between the reference point (or the end of the reference period) to which the information pertains, and the date on which the information becomes available. The accessibility of statistical information refers to the ease with which it can be obtained by users. The interpretability of statistical information reflects the availability of the supplementary information necessary to interpret and use it appropriately. The coherence of statistical information reflects the degree to which it can be successfully brought together with other statistical information within a broad analytic framework and over time.
Joy: Now that we have defined what good statistics are, we would like to address some misconceptions people have about statistics.
Candice: Some people believe that statistics are just averages. I personally do not trust averages. Why? Because averages are like a person having his head in the fridge and his feet in the oven and saying that he feels comfortably warm. On the average.
Joy: Despite the great wisdom of the Math SL textbook, people still make decisions based on averages. For those in Math Studies, let us first define an average.
Candice: An average is a measure of the "middle" or "expected" value of a data set. The most common way of finding it is the arithmetic mean, the sum of all the values in the list divided by the number of values, but an average can also be the median or the mode.
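The three kinds of average can disagree on the same data. This short sketch uses Python's standard `statistics` module on a hypothetical set of six students' nightly sleep, where one long sleeper pulls the mean above the median and mode.

```python
from statistics import mean, median, mode

# Hypothetical nightly sleep (hours) for six students; one sleeps 12 hours.
hours = [5, 6, 6, 7, 8, 12]

print("mean:  ", round(mean(hours), 2))  # sum / count = 44 / 6, prints 7.33
print("median:", median(hours))          # average of the two middle values, prints 6.5
print("mode:  ", mode(hours))            # most frequent value, prints 6
```

Which one counts as "the" average depends on the question being asked, which is exactly why a single average can mislead.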
Joy: I'm hoping we don't need a visual to explain this, but here's a simple and concise one that will illustrate an average quite nicely. So... yeah.
Candice: Speaking of misleading averages: a common example of averages being used to mislead people about an issue is the subject of Africa. When we first hear the word Africa, many things come to mind.
Joy: Poverty.
Candice: Famine.
Joy: Malaria.
Candice: HIV/AIDS.
Joy: UNICEF.
Candice: Drought.
Joy: Global Families.
Candice: All of these issues are true to a certain extent, as one can see here.
(Graph: bubble chart of world regions, GDP per capita vs. child survival rate)
Joy: Each coloured bubble corresponds to a region of the world, and the size of the bubble indicates its population. The x-axis shows GDP per capita, and the y-axis shows child survival rate. The red bubble at the bottom is Sub-Saharan Africa, which shows how Africa is doing, on average, compared to the rest of the world.
Candice: That is where averages are flawed. Within Africa, there is a huge variation in how well each individual country is doing. The highest quintile in Africa is comparable to the middle-income countries of the world.
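A region drawn as a single bubble is effectively a population-weighted average, and that average can sit far from its best and worst members. The countries and numbers below are entirely hypothetical, chosen only to show the effect.

```python
# Hypothetical countries in one region: (population in millions, child survival %).
# Illustrative numbers only, not real data.
countries = {
    "A": (80, 82.0),
    "B": (40, 88.0),
    "C": (10, 97.0),
    "D": (1, 98.5),
}

total_pop = sum(pop for pop, _ in countries.values())
# One bubble for the region = population-weighted average of its countries.
regional = sum(pop * rate for pop, rate in countries.values()) / total_pop
rates = [rate for _, rate in countries.values()]

print(f"regional average: {regional:.1f}%")
print(f"range within region: {min(rates)}% to {max(rates)}%")
```

The single regional figure (about 85%) says nothing about the 16.5-point spread between countries A and D.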
Joy: This means that aid to these countries must be highly contextualized.
Candice: Here in Sierra Leone, you need humanitarian aid; here in Uganda, development aid; here, it is time to invest; and here in Mauritius, you can go on holiday.
Joy: Even among the Arab states, with similar climates, similar cultures and similar religions, there are huge differences, even between neighbours. In the lowest quintile, Yemen has a civil war going on. Way up top is the United Arab Emirates, where money has been equally and well used.
Joy: The factors we chose to compare in these graphs (GDP per capita, life expectancy at birth and infant mortality rate) are universal statistics, meaning that they are comparable from country to country at given time periods.
Candice: However, some statistics are what we call era-specific or culture-specific statistics.
Joy: Take, for example, TV viewership ratings across Japan, Taiwan and the US. Nielsen ratings for American Idol show viewership peaking at 10.7 million out of 114.9 million TV households, but regular daytime soap operas are lucky to exceed 4.7% viewership.
Candice: Taiwanese dramas also very rarely achieve viewership above 10%, so when a single episode achieved 13.86% viewership, it made Taiwanese television history.
Joy: The situation in Japan, however, is radically different.
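To compare the American Idol figure with the soap-opera percentage, both need to be on the same scale. A Nielsen rating is the percentage of all TV households tuned in; this sketch assumes the 10.7 million figure counts households (the dialogue does not say so explicitly), and `rating` is a hypothetical helper name.

```python
def rating(households_tuned_millions, tv_households_millions):
    """Rating as a percentage of all TV households tuned in."""
    return 100 * households_tuned_millions / tv_households_millions

# Assumption: the 10.7 million American Idol figure counts households.
idol = rating(10.7, 114.9)
print(f"American Idol peak: {idol:.1f}% vs. daytime soaps: about 4.7%")
```

On that assumption, the peak works out to roughly 9.3%, about twice the soap-opera benchmark.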
Joy: The situation in Japan however, is radically different.
Candice: Japanese TV viewership can easily surpass 20%, and even 30%.
Joy: A survey completed in 2000 by NHK, Japan's public broadcasting network, showed that 95% of Japanese people watch television every day.
Candice: 86% said they consider television an indispensable medium, and 68% said the same of newspapers.
Joy: Japanese culture also dictates that evenings should be spent with families, and what else can you do with your family every single evening besides watch television?
Candice: Therefore, even though a high GDP per capita can be interpreted as a sign of economic strength, high TV ratings cannot be interpreted in the same way as a sign of the shows' quality. High viewership may say more about a country's culture than about how popular the show is.
Joy: US TV ratings from the 1950s to 2009 show a consistent drop. This might also indicate that viewers have migrated to other sources such as the Internet or iTunes. Nielsen TV ratings account for neither Internet-streamed viewing nor iTunes downloads.
Candice: CD sales also aren't what they used to be. The only album ever to have exceeded 100 million in sales is Michael Jackson's Thriller, released in 1982. Other best-selling albums that sold 50 million copies worldwide include Whitney Houston's The Bodyguard and Pink Floyd's The Dark Side of the Moon, all released between the 1970s and the early 1990s. In fact, do you know when an album released since 2000 last made it onto the best-selling list?
Joy: I'll hazard a guess: Britney Spears?
Candice: In 2000, with Oops!... I Did It Again at 22 million copies worldwide, along with the Backstreet Boys' Black and Blue, released the same year, at 21 million copies worldwide.
Joy: Ah, good memories.
Candice: This just goes to show how much technology has affected these numbers. The music is not necessarily more dissonant than it was 30 years ago; people have just gotten out of the habit of buying CDs. And who would, when you can download only the songs you actually like off iTunes?
Joy: You know, just as cultural and era-specificity influences statistics, there is a general perception that statistical knowledge is all too frequently misused on purpose, by interpreting only the data that are favorable to the presenter. A famous saying attributed to Benjamin Disraeli is, "There are three kinds of lies: lies, damned lies, and statistics."
Candice: A president of Harvard wrote in 1909 that statistics, "...like veal pies, are good if you know the person that made them, and are sure of the ingredients."
Joy: If various studies appear to contradict one another, then the public may come to distrust such studies. For example, one study may suggest that a given diet or activity raises blood pressure, while another may suggest that it lowers blood pressure. The discrepancy can arise from subtle variations in experimental design, such as differences in the patient groups or research protocols, which are not easily understood by the non-expert.
Candice: Media reports usually omit this vital contextual information entirely, because of its complexity. By choosing, rejecting, or modifying a certain sample, results can be manipulated, and reality is therefore distorted. Such manipulations need not be malicious or devious; they can arise from unintentional biases of the researcher.
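A toy example shows how "rejecting" part of a sample can flip a conclusion. The blood-pressure changes below are hypothetical, made up for illustration; dropping the two participants whose pressure rose most turns a slight average rise into an average fall.

```python
from statistics import mean

# Hypothetical change in blood pressure (mmHg) for ten participants on a diet.
changes = [-6, -4, -3, -1, 0, 1, 2, 3, 5, 8]

full = mean(changes)                  # every participant counted
trimmed = mean(sorted(changes)[:-2])  # two largest increases "rejected"

print(f"all participants: {full:+.1f} mmHg")
print(f"after dropping two: {trimmed:+.1f} mmHg")
```

Nothing about the remaining data is false; only the choice of which data to keep has changed the story.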
Joy: The graphs used to summarize data can also be misleading.
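One common way a graph misleads is a truncated y-axis. This sketch computes how much taller one bar looks than another when the axis starts above zero; the values 50 and 52, and the `apparent_ratio` helper, are hypothetical.

```python
def apparent_ratio(a, b, axis_start=0.0):
    """Ratio of drawn bar heights when the y-axis begins at axis_start."""
    return (b - axis_start) / (a - axis_start)

# Hypothetical values that differ by 4%.
a, b = 50.0, 52.0
print(apparent_ratio(a, b))        # axis from 0: bars in true proportion, 1.04
print(apparent_ratio(a, b, 48.0))  # axis from 48: bars of 2 and 4 units, 2.0
```

Starting the axis at 48 makes a 4% difference look like a doubling.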
Candice: Statisticians can be malicious like that.
Joy: But we're not manipulative at all. Let's go agonize more IB students by telling them they know less about the world than chimps.
Candice: Agreed.
Works Cited
"Confidence Intervals." Rice University Web Calendar. Web. 15 Oct. 2009. <http://www.ruf.rice.edu/~lane/stat_sim/conf_interval/index.html>.
Debunking Myths About the Third World. TED, 2005. Television.
"Fated to Love You." Wikipedia, the free encyclopedia. Web. 15 Oct. 2009. <http://en.wikipedia.org/wiki/Fated_to_Love_You>.
Gapminder.org - For a fact based world view. Web. 15 Oct. 2009. <http://www.gapminder.org/>.
"Japanese Drama Season - Summer 2005." DramaWiki. Web. 15 Oct. 2009. <http://wiki.d-addicts.com/Japanese_Drama_Season_-_Summer_2005>.
"List of US daytime soap opera ratings." Wikipedia, the free encyclopedia. Web. 15 Oct. 2009. <http://en.wikipedia.org/wiki/List_of_US_daytime_soap_opera_ratings>.
"Policy on Informing Users of Data Quality and Methodology." Statistics Canada: Canada's national statistical agency / Statistique Canada : Organisme statistique national du Canada. Web. 15 Oct. 2009. <http://www.statcan.gc.ca/about-apercu/policy-politique/info_user-usager-eng.htm>.
Vaughan, Liwen. Statistical Methods for the Information Professional: A Practical, Painless Approach to Understanding, Using, and Interpreting Statistics (ASIST Monograph Series). Wylie: Information Today, 2001. Print.