Data to be used and collected
-
Primary data - data collected by the researcher themselves.
-
Secondary data - data collected by others to be "re-used" by the researcher.
The data I will use is secondary data, as it can be obtained at a fraction of the cost, time and inconvenience of primary data collection.
However, there are limitations to this, as well as practical problems that may be encountered (some have already been mentioned above):
- LACK OF AVAILABILITY - since the data has already been collected and given, you cannot be sure whether or not bias has been avoided. Also, since there is no access to the boys' side, you cannot be sure how the data has been dealt with.
- BIAS - some pupils may have altered their measurements in order to influence the result of the investigation. However, such values should show up when plotting the graphs, as anomalies lying far from the general pattern. These anomalies will then be discarded.
- INACCURATE DATA - again, some pupils may have altered their measurements, and/or young pupils may not know how to measure themselves properly, which would lead to inaccurate data. However, to reduce the chances of this, I have gone to one of the younger classes and taken their measurements myself.
Sample size
I have divided my sample into three main categories: the highest, middle and lowest classes.
Therefore, my sample consists of the following (for each class there will be 20 pupils, 10 boys and 10 girls, giving a total of 60 results):
1) year 4 girls/boys
2) year 7 girls/boys
3) year 11 girls/boys
Choosing all the classes, boys and girls, would be impractical, since it would mean analysing more than 100 different results and doing more calculations than necessary. Therefore, choosing the highest, lowest and middle classes seems sensible.
The measurements taken will be of the thumb, wrist, neck and waist. Three measurements will be taken of both the thumb and the waist (top, middle and bottom), and I will use the average of each.
Unfortunately, there are only 10 Year 4 girls in total, so I will not be able to carry out my sampling method on them, since my sample size is already 10 from each class. The same situation has arisen with the Year 4 and Year 11 boys.
Sampling method
-
Simple random sampling is when a group of subjects (a sample) is chosen from a larger group (a population). Each subject from the population is chosen randomly and entirely by chance, such that each subject has the same probability of being chosen at any stage during the sampling process.
-
Systematic sampling is the selection of every kth element from a sampling frame, where k, the sampling interval, is calculated as:
k = population size (N) / sample size (n)
Using this procedure, each element in the population has a known and equal probability of selection.
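The systematic procedure above can be sketched in Python as follows. This is an illustration only, not part of the plan: the function name and the numbered data sets are hypothetical. When N is not a multiple of n, rounding k down would leave the last few items unselectable, so this sketch uses the circular variant (a random start anywhere in the frame, wrapping round to the beginning), which keeps every item's chance of selection equal.

```python
import random

def systematic_sample(frame, n, start=None):
    """Circular systematic sample: every k-th item of `frame`, wrapping round."""
    N = len(frame)
    if not 1 <= n <= N:
        raise ValueError("sample size must be between 1 and the population size")
    k = N // n                       # sampling interval k = N / n, rounded down
    if start is None:
        start = random.randrange(N)  # random starting point anywhere in the frame
    # Step through the frame k places at a time, wrapping back to the start.
    return [frame[(start + i * k) % N] for i in range(n)]

data_sets = list(range(1, 23))       # hypothetical: data sets numbered 1-22
print(systematic_sample(data_sets, 10, start=0))  # → [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
```

Passing `start` fixes the starting point so the result can be checked by hand; leaving it out picks a random start, as the method requires.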
-
Stratified random sampling is when a random sample of specified size is drawn from each stratum of a population.
There may often be factors which divide up the population into sub-populations (groups / strata), and we may expect the measurement of interest to vary among the different sub-populations. This has to be accounted for when we select a sample from the population, so that the sample obtained is representative of the population. This is done by stratified sampling.
I do not think stratified sampling is needed, since the sample size chosen is not large enough to need dividing into sub-groups, and it is also more time-consuming than the other methods.
In systematic sampling, the researcher must ensure that the chosen sampling interval does not hide a pattern, as any pattern would threaten randomness. A random starting point must also be selected. If these points are not taken into account, bias may be introduced.
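As an illustration of drawing a random sample of a specified size from each stratum, here is a minimal Python sketch. The strata names and their data-set numbers are hypothetical; Python's `random.sample` is used to draw each stratum's sample without replacement.

```python
import random

def stratified_sample(strata, sizes):
    """Draw a simple random sample of the given size from each stratum."""
    sample = {}
    for name, members in strata.items():
        # random.sample picks without replacement, so no item is chosen twice
        sample[name] = random.sample(members, sizes[name])
    return sample

# Hypothetical strata: data-set numbers for two groups.
strata = {"girls": list(range(1, 23)), "boys": list(range(1, 19))}
print(stratified_sample(strata, {"girls": 10, "boys": 10}))
```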
- Thus, I have chosen simple random sampling, as I find it the most convenient method and sufficient for this type of investigation.
I will do this using the random function on the calculator (Ran#); entering a number before it (e.g. 10Ran#) gives a random number up to that value. For example:
if I have 10 sets of data, I will enter 10 followed by the random function. This will give a random number from 1 to 10, which tells me which data set to use for the first of my five values. (Decimal numbers will be rounded to the nearest whole number; if a result rounds to 0, or repeats a data set already chosen, I will ignore it and generate a new number.)
How this will be done is given in an example below:
-
For Year 7 girls, 22 sets of data are given; I need 10, since that is my required sample size.
1) 22Ran# = 8.12 [8] my first value will be of my 8th data set.
2) 22Ran# = 6.97 [7] my second value will be of my 7th data set…..
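The same selection can be sketched in Python, using `random.sample` in place of the calculator's Ran# (a swapped-in tool, not the method described above). It samples without replacement, so no data set is picked twice, and every number from 1 to 22 is equally likely. Assuming Ran# gives a value from 0 up to (but not including) 1, rounding 22 × Ran# to the nearest whole number can give 0 and makes 22 about half as likely as the other numbers; this sketch avoids both. The function name is hypothetical.

```python
import random

def simple_random_sample(num_data_sets, n, seed=None):
    """Choose n different data-set numbers from 1..num_data_sets.

    Every data set is equally likely to be chosen, and sampling is
    without replacement, so no data set can be picked twice.
    """
    rng = random.Random(seed)  # optional seed makes a selection reproducible
    chosen = rng.sample(range(1, num_data_sets + 1), n)
    return sorted(chosen)

# Year 7 girls: 22 data sets given, sample size 10.
print(simple_random_sample(22, 10))
```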
Graphs
I intend to use a variety of graphs to show the distribution of the data, ranging from histograms to scatter graphs.
Histogram - a representation of a frequency distribution by means of bars, whose widths represent class intervals and whose areas are proportional to the corresponding frequencies.
Since the data is continuous, it makes sense to use a histogram; it will also make it easier to compare distributions and to estimate the mean.
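Because the bar areas (not heights) are proportional to the frequencies, each bar's height is the frequency density, i.e. frequency ÷ class width. A minimal Python sketch, using made-up class intervals for illustration only:

```python
def frequency_densities(classes):
    """classes: list of (lower, upper, frequency). Returns the bar height for each class."""
    # Bar height = frequency / class width, so bar area = frequency.
    return [freq / (upper - lower) for lower, upper, freq in classes]

# Hypothetical wrist measurements (cm) grouped into unequal classes.
classes = [(12, 14, 4), (14, 15, 6), (15, 18, 9)]
print(frequency_densities(classes))  # → [2.0, 6.0, 3.0]
```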
Cumulative frequency diagram - will help analyse the data by giving an estimate of the median and, if needed, the inter-quartile range, which shows how spread out (or consistent) the middle half of the data is.
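Reading the median and quartiles off a cumulative frequency diagram can be sketched in Python as below. This assumes the values are spread evenly within each class (the same as joining the plotted points with straight lines) and reads the median at half the total frequency; a hand-drawn smooth curve may give slightly different readings. The function names and data are hypothetical.

```python
from itertools import accumulate

def cumulative_frequencies(classes):
    """Running totals of frequency; classes is a list of (lower, upper, freq)."""
    return list(accumulate(freq for _, _, freq in classes))

def read_off(classes, position):
    """Estimate the value at a cumulative-frequency position by linear interpolation."""
    below = 0  # cumulative frequency up to the start of the current class
    for lower, upper, freq in classes:
        if freq and below + freq >= position:
            return lower + (position - below) / freq * (upper - lower)
        below += freq
    raise ValueError("position is beyond the total frequency")

classes = [(12, 14, 4), (14, 15, 6), (15, 18, 10)]  # made-up data, total 20
n = cumulative_frequencies(classes)[-1]
median = read_off(classes, n / 2)
iqr = read_off(classes, 3 * n / 4) - read_off(classes, n / 4)
print(cumulative_frequencies(classes))  # → [4, 10, 20]
print(median)                           # → 15.0
print(iqr)
```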
Pie Chart - creates a visual representation of data as a proportion of a whole. Each sector is proportional to the quantity it represents. It is very useful for comparisons, as a comparison can be made instantly just by looking at the diagram.
Moreover, with the use of this graph, the mode can easily be determined.
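Since each sector is proportional to the quantity it represents, its angle is frequency ÷ total × 360°. A short Python sketch with made-up class frequencies:

```python
def pie_angles(frequencies):
    """Angle of each sector in degrees: frequency / total x 360."""
    total = sum(frequencies.values())
    # Multiply before dividing so whole-number answers stay exact.
    return {label: freq * 360 / total for label, freq in frequencies.items()}

# Hypothetical class frequencies, for illustration only.
print(pie_angles({"12-14 cm": 4, "14-15 cm": 6, "15-18 cm": 10}))
# → {'12-14 cm': 72.0, '14-15 cm': 108.0, '15-18 cm': 180.0}
```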
Frequency Diagram - used to summarise and display the distribution of a data set graphically. It is useful for comparing distributions and identifying the modal and least common classes.
Scatter graphs - I will be using this graph, as it is a good way of illustrating two sets of data and establishing whether or not there is a relationship between them, and of what type (in this case thumb/wrist, wrist/neck, etc.).
Furthermore, by drawing a line of best fit, a gradient can be found, which can then be compared with the relationship the theory predicts to show how close to the truth the theory is.
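The plan describes drawing the line of best fit on the graph; as a cross-check, the least-squares method (a standard technique, swapped in here rather than taken from the plan) calculates the gradient and intercept directly. A minimal sketch with hypothetical paired measurements:

```python
def best_fit_line(xs, ys):
    """Least-squares line of best fit y = m*x + c; returns (gradient m, intercept c)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means, and sum of squared x deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x  # the line passes through (mean_x, mean_y)
    return gradient, intercept

# Hypothetical paired measurements (cm), for illustration only.
print(best_fit_line([1, 2, 3, 4], [1.5, 3, 4.5, 6]))
```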
Calculations
To prove or disprove the theory I will be carrying out numerous tasks and various calculations.
This will involve analysing, comparing and interpreting graphs, e.g. finding the gradient, correlation and so on.
From all this, I hope to be able to determine whether or not there is a relationship between the body parts in accordance with the theory.
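The plan does not say which measure of correlation will be used; one common choice (an assumption here) is Pearson's product-moment correlation coefficient r, which runs from -1 (perfect negative) to +1 (perfect positive), with values near 0 meaning little linear relationship. A minimal sketch:

```python
import math

def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired measurements, for illustration only.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```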
The gradient {for the scatter graph(s)} will be calculated as:
$$\text{Gradient} = \frac{\text{difference in } y}{\text{difference in } x} = \frac{\Delta y}{\Delta x}$$
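As a worked example of the gradient formula, this Python sketch takes two points read off a line of best fit and divides the difference in y by the difference in x. The points are hypothetical.

```python
def gradient(p1, p2):
    """Gradient between two points (x, y): difference in y / difference in x."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("the two points need different x-values")
    return (y2 - y1) / (x2 - x1)

# Two hypothetical points read off a line of best fit.
print(gradient((14, 30), (16, 33)))  # → 1.5
```

Choosing two points far apart on the line reduces the effect of small reading errors.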
I will also be working out the modal class (and the least common class), the median and the mean. The modal class is the group with the highest frequency, and the least common class is the one with the lowest frequency.
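Finding the modal and least common classes from a frequency table can be sketched as follows. The class labels and frequencies are made up, and the tie rule (the first class listed wins) is a choice of this sketch, not of the plan.

```python
def modal_and_least_common(frequencies):
    """Return (modal class, least common class) from a {class: frequency} table.

    If classes tie, max/min return whichever tied class comes first in the table.
    """
    modal = max(frequencies, key=frequencies.get)
    least = min(frequencies, key=frequencies.get)
    return modal, least

# Hypothetical frequency table, for illustration only.
print(modal_and_least_common({"12-14 cm": 4, "14-15 cm": 6, "15-18 cm": 10}))
# → ('15-18 cm', '12-14 cm')
```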
The median {for the cumulative frequency diagram} will be calculated as:
The mean {for the histogram} will be calculated using the following formula:
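$$\bar{x} = \frac{\sum fx}{\sum f}$$
where f is the frequency of each class and x is the class midpoint (so, for grouped data, this gives an estimate of the mean).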
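The grouped-data mean can be sketched in Python as below, using each class midpoint as x. The classes and frequencies are made up for illustration.

```python
def grouped_mean(classes):
    """Estimate the mean from grouped data: sum(f*x) / sum(f), x = class midpoint."""
    sum_fx = sum(freq * (lower + upper) / 2 for lower, upper, freq in classes)
    sum_f = sum(freq for _, _, freq in classes)
    return sum_fx / sum_f

# Hypothetical classes (lower cm, upper cm, frequency).
print(grouped_mean([(12, 14, 4), (14, 15, 6), (15, 18, 10)]))  # → 15.2
```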
Another calculation I will be doing is the percentage error (including the average percentage error), which will help me decide how close to reality the theory is, as any error will lead to inaccurate data and conclusions. By using percentage error, I will be able to determine how close the actual measurements came to the accepted amount.
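To work out the percentage error, I will use the following formula:
$$\text{Percentage error} = \frac{|\text{actual value} - \text{accepted value}|}{\text{accepted value}} \times 100\%$$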
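A minimal Python sketch of percentage error and average percentage error, assuming the usual definition |actual - accepted| ÷ accepted × 100, where "accepted" is the value the theory predicts and "actual" is the measured value. The function names and measurements are hypothetical.

```python
def percentage_error(actual, accepted):
    """Percentage error = |actual - accepted| / accepted x 100."""
    return abs(actual - accepted) / accepted * 100

def average_percentage_error(pairs):
    """Mean of the percentage errors for a list of (actual, accepted) pairs."""
    errors = [percentage_error(actual, accepted) for actual, accepted in pairs]
    return sum(errors) / len(errors)

# Made-up measurements (cm), for illustration only.
print(percentage_error(15, 16))                        # → 6.25
print(average_percentage_error([(15, 16), (20, 16)]))  # → 15.625
```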
Plan
The aim is to see whether or not the theory is true.
- To do this, I will use the statistical methods explained above, which, as said before, will involve interpreting, analysing and comparing graphs, as well as carrying out calculations.
- I have already stated my aim and hypothesis as well as my selections i.e. choosing simple random sampling, specific graphs etc.
- I will now carry out my sampling method in order to get my required data and then plot it on various graphs. Calculations will then be done [e.g. gradient, mean, mode etc.] in order to see how valid the theory is.