How Can Samples Describe Populations?
Introduction
A facet of modern society is the vast amount of information and knowledge that is available, communicated and accumulated. Adding to this knowledge base through research requires the information to be organised, simplified and summarised if it is to be useful.
The scientist's goal, in general, is to investigate and describe the implications of findings for a given problem or hypothesis. When phenomena are not of the natural sciences but of a sociological character, there is debate over what serves as validation of a hypothesis. It is therefore imperative for any investigation into social phenomena to consider the research methodologies used to explore the subject matter and the ramifications of the subsequent results.
Social sciences concentrate on the interaction of people and communities in relation to the infrastructure and environment that affects them. The main information-gathering tools used in the field are surveys/questionnaires and interviews. The broad scope of the social sciences means that an investigation could involve anything from a very large number of subjects down to a very few, depending on the particular study. What is common to all cases, though, is the need for the collected data to be accurate through being representative and reflective of the total population under investigation.
Representative means that, when investigating social phenomena, the data collected should mirror the views of the whole population. Unless the investigation is focused on small populations, it is not usually possible to survey the entire population because there are too many subjects. This leads the scientist into a dilemma: how is it possible to be completely representative in a survey without including the whole population? To fulfil this rudimentary and salient criterion, sampling techniques have been developed and employed.
Number of Samples
When using samples to represent the view of a designated population, it is apparent that the data acquired from a single member of the population is very unlikely to support any conclusions. The law of large numbers suggests that the greater the number of samples, the more accurate the data will be. It is therefore better to include as many samples as possible, but how many samples are sufficient to justify findings?
Sampling procedures that are probability based have quantifiable measures, such as precision and confidence intervals, that are useful in ascertaining the nature and character of the parameter under question in relation to the sample size. [1]
The limits on the number of samples are imposed by availability, operational and practical factors, along with the underlying nature of the study; it is clear that a very focused study, such as one of the behavioural patterns of a select minority, needs far fewer samples than a survey monitoring the public's opinion on politics. These aspects collectively determine the number of samples used in an investigation.
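The interplay between desired precision and sample size can be made concrete with the standard formula for estimating a population proportion from a simple random sample, n = z²p(1−p)/e². The sketch below is illustrative only; the function name and the figures chosen are hypothetical, not taken from the sources cited.

```python
import math

def sample_size_for_proportion(margin_of_error, confidence_z=1.96, p=0.5):
    """Minimum sample size to estimate a population proportion.

    Uses n = z^2 * p * (1 - p) / e^2, the standard formula for a
    simple random sample from a large population. p = 0.5 is the
    most conservative (largest sample) assumption.
    """
    n = (confidence_z ** 2) * p * (1 - p) / margin_of_error ** 2
    return math.ceil(n)

# For a 95% confidence level and a +/-5% margin of error:
print(sample_size_for_proportion(0.05))  # 385
```

Note how tightening the margin of error from 5% to 3% roughly triples the required sample, which is one concrete form of the cost/precision trade-off discussed above.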
Sampling Process
The population used in sampling refers to the group of people that the research will most affect. The results of the investigation will therefore generalise to the population under scrutiny; this is sometimes referred to as the theoretical population. For example, in researching the attitudes of smokers to other smokers, the theoretical population would be all people who smoke, and the results of such a study would be considered general to the smoking population. This case is of particular relevance in demonstrating the need for sampling, since the sheer number of smokers makes the theoretical population impossible to survey in full. A sample is needed to represent the smoking population.
It is necessary to estimate the number of subjects that can be accessed; these are the subjects that can potentially partake in the study, referred to as the study population. A complete list is made of the study population; this is known as the sampling frame. [2]
Finally, the sample is taken from the sampling frame. Selecting the sample is an important step in preserving the quality, integrity and, most importantly, the representativeness of the sample in relation to the theoretical population. The sample is the group of people selected to be in the study. Figure 1 below outlines these ideas.
Figure 1. Summary of general sampling procedure [2]
At this point, it should be clear that each step in Figure 1 has the potential to introduce systematic error or bias. However, the thrust of this essay is the final step of selecting the sample.
Types of Sampling
The aim of the sampling process is to draw a sample that is a true representation of the theoretical population, which yields results that are both relevant and accurate. Samples can be selected in one of two broad categories: probability or non-probability samples.
Probability Sampling
A probability sampling method is a class of sampling that utilises some form of random selection. To attain a random selection, some mechanism ensures that each subject in the sampling frame has the same probability of being chosen as every other subject. Random selection can be as simple as picking a name out of a hat or choosing the short straw; however, software algorithms are often used for this process. [2]
There are four main probability-sampling techniques:
-Simple Random Sampling: the approach in which each case has an equal chance of being included in the sample. Simple random sampling is not the most statistically efficient method of sampling: by chance, subgroups may not be proportionally represented, rendering the data unrepresentative. [2]
-Systematic Random Sampling: this technique involves taking every kth member of the sampling frame (e.g. if k = 4, take the 4th, 8th, 12th, etc.). The number of samples decided upon, along with the sampling frame size, dictates the value of k. A random number between 1 and k is selected first; this is the first member of the sample and the starting point, and the remaining samples are taken from the sampling frame by repeatedly adding k to this initial value.
It is essential that the sampling frame be randomly ordered, at least with respect to the characteristics being measured. This method is employed because it is relatively simple and straightforward; it also requires the generation of only one random number.
-Stratified Random Sampling: involves dividing the sampling frame into homogeneous subgroups (using characteristics that have significance in determining the study outcome, such as demographic qualities) and then taking a simple random sample in each subgroup in proportion to the theoretical population.
Stratified sampling offers several advantages over simple random sampling. Firstly, it assures representation not only of the overall population but also of key subgroups. If a subgroup is extremely small, it can be oversampled to ensure it appears in sufficient numbers; in the analysis of quantitative data, the effect of the oversampling can then be corrected by weighting. Sampling each stratum in proportion to the theoretical population is called proportionate stratified random sampling; using weightings not in proportion to the population is disproportionate stratified random sampling. [3]
Also, stratified random sampling has greater statistical precision than simple random sampling when the groups are homogeneous, because the variability within groups is then lower than the variability of the population as a whole; stratified sampling exploits this fact.
-Cluster Random Sampling: in this scheme, the theoretical population is divided into 'clusters' using an arbitrary variable such as postcode or month of birth. The sample is then taken by surveying every member of a number of randomly chosen clusters. The clusters must therefore be small enough that every member of a chosen cluster can be included in the study. In contrast to stratified random sampling, the clusters should be heterogeneous; ideally, each should be a scaled-down representation of the theoretical population.
Random cluster sampling is used when there is a clear logistical advantage, for example when data collection is confined to a localised area. However, it remains the case that the clusters must be representative of the theoretical population if the method is to be effective. [1]
One of the most important features of probability based sampling is that once the sample is chosen, the investigator has no subjective choice in whom to include in the study; this has been predetermined by the random selection process.
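As a rough illustration (not drawn from the sources cited), the four probability-based selection mechanisms described above can be sketched in a few lines of Python over a hypothetical sampling frame of 100 subjects; all names and sizes are invented for the example.

```python
import random

population = [f"subject_{i:03d}" for i in range(100)]  # hypothetical sampling frame

# Simple random sampling: every subject has an equal chance of selection.
simple = random.sample(population, k=10)

# Systematic random sampling: a random starting point, then every k-th subject.
k = len(population) // 10
start = random.randrange(k)              # random start in [0, k)
systematic = population[start::k]

# Stratified random sampling: sample each subgroup in proportion to its size.
strata = {"smoker": population[:30], "non_smoker": population[30:]}
stratified = []
for stratum, members in strata.items():
    n = round(10 * len(members) / len(population))   # proportional allocation
    stratified.extend(random.sample(members, n))

# Cluster random sampling: pick whole clusters at random, survey every member.
clusters = [population[i:i + 10] for i in range(0, 100, 10)]
chosen = random.sample(clusters, k=2)
cluster_sample = [subject for cluster in chosen for subject in cluster]
```

Note how in each case the investigator's only input is the design; which individuals end up in the sample is decided entirely by the random mechanism, which is the defining property of this class of methods.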
Non-Probability Sampling
Non-probability based sampling is a class of sampling that does not use the rigour of statistics and seeks other approaches to select a sample. The difference between non-probability and probability sampling is that the former does not involve random selection. [2]
Accidental, haphazard and convenience sampling are all names for the same form of sampling, which simply draws the sample from whichever subjects are available. Such methods include asking for volunteers and 'on the street' surveys. [2]
The problem with 'accidental' samples is that there is no way of ensuring that they are representative of the populations to which the analysis will be generalised. Another method of non-probability based sampling is purposive sampling, in which there is a well-defined objective concerning the 'type' of sample sought; subjective judgment can be used to target specific demographics or other social constructs. There are five categories of this form of sampling:
-Modal Instance Sampling: this method seeks to register the opinions of the modal, or 'typical', case of the investigation in terms of demographics or other factors. Modal sampling is likely to be unrepresentative of the theoretical population as a whole; the only time this method is appropriate is in informal contexts.
-Expert Sampling: the investigation is conducted by assembling experts in the field under question. This method is clearly an efficient way of eliciting the views and experiences of people of known and demonstrable expertise, and can serve as a good instrument for validating sample results and conclusions reached by other sampling methods.
-Quota Sampling: this method involves dividing the theoretical population into groups by defining attributes (e.g. age, gender) and setting quotas that must be fulfilled within each group whilst conducting the investigation. If the quotas reflect the proportions of the theoretical population in terms of the defining attribute, the sampling is termed proportional quota sampling; conversely, when this is not the case, the sampling is classed as disproportionate quota sampling. Quota sampling relies on prior knowledge of the theoretical population and is the most representative of the non-probability based methods. This technique is similar to stratified sampling and shares many of its advantages.
-Heterogeneity Sampling: the purpose of this sampling scheme is to represent all views; to generate diverse opinion. In effect, the objective is to sample ideas, not people. Clearly, in order to get all of the ideas, and especially the more unusual ones, a broad and diverse range of participants needs to be included. Heterogeneity sampling is almost the opposite of modal instance sampling. [2]
-Snowball Sampling: in contrast to quota sampling, this method relies on identifying a suitable participant (in the targeted population) and then asking this member to recommend others suitable for the investigation. Although this clearly reduces the diversity of the sample, the method is deployed in situations where it may be difficult to locate or communicate with certain social groups. [2]
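The quota mechanism described above can be sketched as follows; the group names, quota figures and function are hypothetical, chosen only to illustrate the idea that subjects are accepted as they become available until each group's quota is full.

```python
# Quota sampling sketch: subjects are taken as they become available,
# but only until each predefined quota is filled. All values hypothetical.
quotas = {"18-34": 4, "35-54": 4, "55+": 2}   # e.g. proportional to population
counts = {group: 0 for group in quotas}
sample = []

def try_recruit(subject_id, age_group):
    """Accept a willing subject only if their group's quota is not yet full."""
    if counts[age_group] < quotas[age_group]:
        counts[age_group] += 1
        sample.append((subject_id, age_group))
        return True
    return False  # quota full: this subject is turned away
```

In use, each willing passer-by in an 'on the street' survey is offered a place only while their group's quota remains open; no randomisation is involved, which is precisely why representativeness depends on the quotas having been set from accurate prior knowledge of the population.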
Probability Versus Non-Probability Based Methods
There are clearly situations where different sampling methods are favoured due to practical, resource and objective factors. In a more academic context, though, the representativeness of a sample in relation to the theoretical population can be independently examined. From the approaches discussed thus far, it should be evident that quota sampling represents the theoretical population most accurately among the non-probability based methods.
Quota and probability based sampling both make use of a model of the theoretical population. The model itself needs to be accurate, and as such requires integrity in the information used to construct it. Essentially, both methods should represent the theoretical population properly, given that the models are valid and the number of samples is sufficiently high.
The issue of data integrity has thus been shown to be dependent upon models and assumptions. The probability based approach gives the researcher the opportunity to quantify the precision and accuracy of the data; this is advantageous to the investigation in two main ways:
* It is possible to compare independent data sets and scale any relevant quantitative data accordingly, provided both sets of data were sampled using probability based methods. This is invaluable if research is to be replicated or compared.
* There is confidence in extrapolating and inferring quantitative effects from data that has been collected with known statistical characteristics, such as the stated precision. This can yield greater insight in the analysis of the collected data.
The randomisation process that probability based sampling uses ensures that bias and subjectivity in selecting the sample are removed from the investigator. The major drawback of probability sampling is that it is time consuming to organise and execute; consequently, it is more expensive to conduct than non-probability based methods. There are methods that can simplify the probability based sampling process, like stratified random sampling, but even these are likely to be costlier than quota sampling.
The most appealing aspect of non-probability based sampling is the cost saving; it is generally less structured and less time consuming than probability sampling. The downside is the bias that can be introduced at a practical level, even when using quota sampling: by definition, the researcher obtains the samples that are available, which could too easily be 'easy-to-approach' people. When results are subjective in nature, reflecting mood and opinion, the integrity of the data collected can be called into question. However, there are situations in social research where snowball, expert and heterogeneity sampling are cost efficient, resourceful and wholly appropriate. For example, market research on the consumer habits of a very niche product could involve very targeted research, and snowball sampling would be a cheap and effective method; the insights generated could be of greater value than those from probability-based techniques.
Conclusion
If sampling is involved in research, it is necessary to ensure that the sample used is appropriate. A case in point can be drawn from recent domestic politics. The Tory leadership contest between Iain Duncan Smith and Kenneth Clarke was bitterly fought. It emerged during the contest that an opinion poll produced by Smith's team showed him ahead and gaining momentum in his campaign. Smith's team had deliberately canvassed opinion in constituencies known to be supportive. This sampling obviously misrepresented the population, and the inferred results were designed to influence potential voters. The ensuing media focus was an embarrassment to Smith's cause. Whilst this shows that it is possible to identify such (unethical) practice, what it really illustrates is the importance of information quality, in terms of integrity, and the consequences of biased sampling.
The two classes of sampling, probability and non-probability based, have been discussed and it is unambiguous that probability based methods offer advantages in the precision and representativeness that can be achieved. However, the use of quota sampling does not strictly imply that the data collected will be unrepresentative; it is simply more likely to be so.
Therefore, care is needed when using quota sampling. Factors such as location, time (of survey) and the model's assumptions need to be thoroughly thought through, because such issues can influence results.
There is also support for multistage sampling methods, in which a number of techniques (both probability and non-probability based) are employed. Such methods are often used when the theoretical population is very large and several stages are needed to define the sample. [2]
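A minimal sketch of one such multistage design, assuming a purely hypothetical frame of districts and households: clusters (districts) are drawn at random first, then a simple random sample of households is taken within each chosen district.

```python
import random

# Hypothetical two-stage frame: 20 districts of 50 households each.
frame = {f"district_{d}": [f"household_{d}_{h}" for h in range(50)]
         for d in range(20)}

# Stage 1: cluster sampling - draw 5 districts at random.
stage_one = random.sample(list(frame), k=5)

# Stage 2: simple random sampling - 10 households within each chosen district.
stage_two = [hh for district in stage_one
             for hh in random.sample(frame[district], k=10)]

print(len(stage_two))  # prints 50
```

The design keeps the logistical advantage of clustering (fieldwork in only 5 districts) while the second random stage avoids having to survey every household in each cluster.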
The number of samples used is also a very important consideration. Certain sampling techniques will be more suitable than others: methods found to be highly successful in ethnographic studies may not be at all appropriate for operational research. It is the duty of the investigating party to ensure that the number of samples is sufficient and the sampling technique suitable; most important of all, the content and objective of the research, given its context and scope, should guide such decisions.
Research methodology is always a compromise between options, and choices are frequently determined by the availability of resources [3]. Therefore, a sampling technique needs to be chosen on the context and resource issues that are unique to the study. This will lead to representative data that can be analysed, and conclusions drawn, with conviction.
References
[1] - Zeller, Richard A. & Carmines, Edward G. (1978),
"Statistical Analysis of Social Data", Rand McNally College Publishing Company
[2] - Cornell University teaching material (2001)
http://trochim.human.cornell.edu/kb/sampnon.htm
[3] - 'Sampling Theory & Practice' (2001)
www.149.170.199.144/redesigns/sampling.htm
[4] - John Gill and Phil Johnson (1997),
"Research Methods for Managers", 2nd Edition, Paul Chapman Publishing
Bibliography
[5] - 'Sampling Populations' (2001),
http://www.rvc.ac.uk/EpiVetNet/manual/Chapter4/index_copy(1).htm
[6] - 'Types of Sampling', University of Texas (2001),
http://www.ma.utexas.edu/users/parker/sampling/srs.htm