Pupil Data
Introduction
In this investigation I will examine several variables in a data set of 100 males and 80 females. This is secondary data, as I did not collect it myself, but I will treat it as accurate because it was given to me by my teacher.
Most of the data I will be working with is numerical (quantitative) rather than qualitative (non-numerical). Numerical data is either discrete or continuous. Discrete data can only take certain separate values, usually found by counting; an example in the data I am using is shoe size. The yes/no fields such as left-handed and can roll tongue are categories that have been stored as number codes. Continuous data is found by measuring and can take any value within a range. Examples of this in the data I am using are height, weight, arm-span, etc.
My Hypothesis
For this investigation I have three hypotheses, which are:
1) There is a strong positive correlation between arm-span and height in both males and females. I have based this on the Vitruvian theory, which says that a person's arm-span is about equal to their height.
2) Females are generally shorter than males. I have based this hypothesis on my observations.
3) The weight of males is more spread out than females. I have also based this hypothesis on my own observations.
Sampling
To investigate my hypotheses I will need to take a sample from the data for the 100 males and 80 females. I am sampling carefully to ensure my sample is unbiased and representative of the whole school.
Stratified Sampling
The method that I will use is stratified sampling. I have chosen it because the male and female populations in the data I have been given are not the same size, and this method ensures that each group is fairly represented in the sample. To take a stratified sample you divide the population into categories (strata), e.g. by age or gender. My strata are the two genders. A random sample is then chosen from each category, with a size proportional to the size of that category.
Random Sampling
There are several other types of sampling I could have used, but each has disadvantages. One of them is random sampling. Every record is given a number, numbers are then picked at random using a calculator or computer, and the records whose numbers come up form the sample. I did not use this method on the whole data set because the two populations are not the same size (100 males and 80 females), and a single random sample would not guarantee that males and females appear in the right proportions.
Systematic Sampling
Another type of sampling I could have used is systematic sampling, which is very quick and simple to do. All the records are listed, a starting point is picked at random, and then every nth record is selected. The problem with this method is that the sample is unrepresentative if there is a pattern in the list. I cannot take this chance, as the data I am using is secondary data and may contain a pattern that I do not know about.
Cluster Sampling
Cluster sampling is another method I could have used. This type of sampling is used when the population is divided into groups (clusters): some clusters are chosen at random and the sample is taken from them. To reduce the chance of the sample being unrepresentative, a large number of small clusters is used. I cannot use this method as the data I am using is not divided into clusters.
Quota Sampling
I could also have used quota sampling. In quota sampling, instructions are given about how many (the quota) of each section of the population are to be sampled. This type of sampling is used in market research and can be biased.
Convenience Sampling
The final type of sampling I could have used is convenience sampling. This type of sampling is very simple, as the most convenient sample is chosen. For example, from 100 people a person might choose the first 10 or the last 10. I did not choose this type of sampling because the sample would be biased and unrepresentative.
Sample Size
I am now going to carry out the stratified sampling. Altogether there are 180 pupils (100 males + 80 females), and the total sample size I am going to use is 40. I divide 40 by 180, which gives 0.2 recurring (40/180 = 0.222...), or 22.2% as a percentage. I then work out 22.2% of each population and round to the nearest whole number: 22.2% of 100 males is 22.2, which rounds to 22 males, and 22.2% of 80 females is 17.8, which rounds to 18 females. These are the sample sizes I am going to use from each population, and together they add up to the 40 I wanted (22 males + 18 females = 40).
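The proportional calculation above can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function name `stratified_allocation` and the dictionary keys are made up for the example and are not part of the coursework.

```python
def stratified_allocation(strata_sizes, total_sample):
    """Share a total sample between strata in proportion to their sizes.

    Each share is rounded to the nearest whole number, as in the essay.
    With other numbers the rounded shares might not add up exactly to
    the total, so the sum should always be checked.
    """
    population = sum(strata_sizes.values())
    return {
        name: round(total_sample * size / population)
        for name, size in strata_sizes.items()
    }


allocation = stratified_allocation({"males": 100, "females": 80}, 40)
print(allocation)                 # {'males': 22, 'females': 18}
print(sum(allocation.values()))   # 40
```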
Now that I know how many pupils to take from each population, I need to work out which pupils to use. To do this I will use a calculator to produce random numbers, and I will pick out the pupils whose reference numbers match them. I cannot do this in my head, as people are not good at choosing numbers at random.
To produce random numbers on a calculator (this only works on scientific calculators), you first type in the number of people in the population, which is 100 for males and 80 for females. Then press the shift button followed by the decimal point button. The display should show the population number and then Ran# at the top right-hand corner, e.g. 100 Ran#. Press the equals button for a random number, then press it again for another, and keep going until you have as many random numbers as you need. Here are the rules for picking the random numbers:
* Ignore the decimal part, e.g. 18.9 becomes 18 (do not round to the nearest whole number)
* Ignore numbers that have been repeated
I will do the random sampling separately for males and females. As I need a sample of 22 males, I type in 100, press shift, then the decimal point, and then press the equals button until I have 22 different random numbers. I do the same for females, but type in 80 instead of 100 and carry on until I have 18 different numbers, as I only want 18 females.
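The calculator method can be imitated on a computer. The sketch below is an approximation, not the calculator's own algorithm: `random.random()` stands in for the Ran# key, the function name `pick_reference_numbers` is hypothetical, and skipping 0 is an assumption made because the reference numbers appear to start at 1.

```python
import random


def pick_reference_numbers(population_size, sample_size, rng):
    """Imitate 'population x Ran#' on a calculator.

    Each draw is multiplied by the population size and the decimal part
    is dropped (no rounding). Repeats are ignored. A result of 0 is also
    skipped, on the assumption that no pupil has reference number 0.
    Note that the top number itself (100 or 80) can never come up,
    because the random value is always below 1.
    """
    chosen = []
    while len(chosen) < sample_size:
        number = int(population_size * rng.random())  # drop the decimals
        if number == 0 or number in chosen:
            continue  # ignore invalid or repeated numbers
        chosen.append(number)
    return chosen


# Fixed seeds are used only so the example is repeatable.
male_refs = pick_reference_numbers(100, 22, random.Random(1))
female_refs = pick_reference_numbers(80, 18, random.Random(2))
print(sorted(male_refs))
print(sorted(female_refs))
```

Because the loop keeps drawing until it has enough different numbers, it may need more than 22 (or 18) draws, just as the equals button may need extra presses.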
Sampled Data
From the random numbers I have got, I have created this table to show the data of the people who have been sampled. The fields that concern my hypothesis have been highlighted so they are easier to spot.
Males
Random Number
Left Handed
Wears Glasses
Date of Birth
Can Swim
Can roll Tongue
Shoe Size
Height
Waist
Arm span
Hand span
Head Circumference
Weight
025
0
7/01/68
8.5
73
077
67
25
55
78
096
0
0
03/05/68
0
7
64
65
58
21
57
57
008
0
0
7/01/68
0
7
70
68
71
20
56
45
029
0
0
27/03/68
8
68
74
65
21
57
52
087
0
2/12/67
0
8
70
93
69
20
57
75
022
0
0
24/04/68
0
9
78
70
79
20
50
53
009
0
0
1/08/68
1
83
86
71
22
58
65
017
0
0
04/01/68
6.5
56
68
57
9
55
47
010
0
08/02/68
0
7
73
72
71
21
56
55
034
0
0
25/10/67
0
6
63
71
67
20
56
54
093
0
30/07/68
6
66
75
60
9
54
40
048
0
0
29/11/67
9
77
97
75
22
57
70
033
0
0
1/03/68
7
68
67
71
20
56
47
013
0
0
09/01/66
8
69
69
71
22
57
60
020
0
0
7/02/68
7
65
75
66
9
57
58
061
0
0
04/03/68
0
8.5
75
66
75
22
50
53
075
0
0
21/05/68
5
53
67
55
8
52
40
039
0
22/03/68
0
8
79
74
72
20
57
62
007
0
0
3/01/68
0
82
78
87
22
56
82
066
0
0
02/11/67
6.5
61
77
63
9
55
45
038
0
05/04/68
8
76
78
73
8
55
62
003
0
25/08/68
0
0
9
64
79
60
21
55
56
Females
Random Number
Left Handed
Wears Glasses
Date of Birth
Can Swim
Can roll Tongue
Shoe Size
Height
Waist
Arm span
Hand span
Head Circumference
Weight
019
0
0
2/11/67
3.5
49
67
57
7
53
50
056
0
0
25/09/67
6
67
70
64
21
55
65
018
0
0
20/03/68
0
0
5
68
58
56
5
54
47
022
0
09/02/68
3.5
55
55
52
20
55
45
040
0
23/01/68
4
61
60
54
7
53
47
052
0
0
3/09/67
7
69
60
70
9
56
60
007
0
1/11/67
4
61
64
56
9
56
52
017
0
0
09/06/68
7
64
60
66
7
53
52
078
0
0
23/11/67
7
66
70
69
20
56
59
012
0
0
8/06/68
5
62
67
63
9
56
55
048
0
0
03/11/67
0
4.5
60
60
60
5
57
54
042
0
3/10/67
0
4
62
59
57
7
56
50
013
0
24/04/68
0
5
63
65
60
8
54
42
062
0
0
28/06/68
0
5.5
61
59
63
7
53
49
077
0
0
8/10/67
4
55
57
54
20
55
43
033
0
0
23/12/67
7
70
66
73
21
57
55
059
0
0
0/01/68
5
58
70
52
7
56
56
015
0
0
03/10/67
4.5
59
61
51
5
56
53
To test my hypothesis 1:
I will draw 2 scatter diagrams, each with one axis for arm span and the other for height. I am drawing 2 so that I can check my hypothesis for females as well as males. If my hypothesis is correct there should be a very strong positive correlation between height and arm span for both males and females. I will also compare their product moment correlation coefficients, r, as this measure does not depend on the scale of the axes on the scatter diagrams. In addition I will use Spearman's rank correlation coefficient, which measures the strength of agreement between two sets of data, to see how closely arm span and height agree in males and in females. If my hypothesis is correct then I expect the Spearman's rank value to be very close to +1 for both males and females.
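As a rough illustration of Spearman's rank correlation, the sketch below ranks each list (tied values share the average of their ranks) and then applies the product moment formula to the ranks, which is a common way of handling ties. The function names are made up for this example, and the five height and arm-span pairs are only there to show the call; ranks do not change if the same constant is added to every value.

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Product moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman_rank(xs, ys):
    """Spearman's rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Illustrative call with five height / arm-span pairs.
heights = [73, 64, 70, 68, 70]
arm_spans = [67, 58, 71, 65, 69]
print(round(spearman_rank(heights, arm_spans), 3))
```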
To test my hypothesis 2:
I will draw a cumulative frequency diagram and plot one line on it for males and one for females. I will then compare them. If my hypothesis is correct it should look like this:
To test my hypothesis 3:
I will arrange the weights of the males in order and the weights of the females in order. I will then find the median, lower quartile and upper quartile for both males and females. Next, I will draw box plots and compare them. If my hypothesis is correct the male box plot will be wider than the female box plot. It should look something like this:
I will also use standard deviation, as this measure of spread does not depend on the median or quartiles used in the box plot. Whereas the box plot shows the spread of the data about the median, standard deviation takes account of all the data.
Hypothesis 1
To test my first hypothesis, that there is a strong positive correlation between arm span and height in both males and females, I will draw a scatter diagram for males and one for females, each with a line of best fit. The y-axis is going to be height and the x-axis is going to be arm span. Here are the heights and arm spans of the males and females:
Males
Random Number   Height   Arm span
025             173      167
096             164      158
008             170      171
029             168      165
087             170      169
022             178      179
009             183      171
017             156      157
010             173      171
034             163      167
093             166      160
048             177      175
033             168      171
013             169      171
020             165      166
061             175      175
075             153      155
039             179      172
007             182      187
066             161      163
038             176      173
003             164      160
Females
Random Number   Height   Arm span
019             149      157
056             167      164
018             168      156
022             155      152
040             161      154
052             169      170
007             161      156
017             164      166
078             166      169
012             162      163
048             160      160
042             162      157
013             163      160
062             161      163
077             155      154
033             170      173
059             158      152
015             159      151
I am now going to plot the scatter diagram for males.
As you can see, there is a strong positive correlation between height and arm-span in males. The r value is also close to 1, which supports this. I used Excel to work out the r value to save time.
I am now going to see if it is the same in females.
In females there seems to be only a weak correlation between height and arm-span. The r value is closer to 0 than to 1, which also shows that the correlation between arm-span and height is weak. This means that a female's arm-span does not reflect her height as closely. This could be because of mistakes in the data, as it is secondary data. On the graph I have highlighted 2 points which could have been recorded wrongly. These points could also have affected my r value.
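The r values can be cross-checked without Excel. The sketch below applies the product moment formula to the sampled heights and arm spans; the list names are made up for the example, and the values assume a leading 1 on every measurement (r is not changed by adding the same constant to every value, so this assumption does not affect the result). The essay does not quote its Excel figures, so whatever this prints is only a cross-check and may differ from them.

```python
import math


def pearson_r(xs, ys):
    """Product moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Sampled data in the order of the table of heights and arm spans.
male_height = [173, 164, 170, 168, 170, 178, 183, 156, 173, 163, 166,
               177, 168, 169, 165, 175, 153, 179, 182, 161, 176, 164]
male_arm = [167, 158, 171, 165, 169, 179, 171, 157, 171, 167, 160,
            175, 171, 171, 166, 175, 155, 172, 187, 163, 173, 160]
female_height = [149, 167, 168, 155, 161, 169, 161, 164, 166,
                 162, 160, 162, 163, 161, 155, 170, 158, 159]
female_arm = [157, 164, 156, 152, 154, 170, 156, 166, 169,
              163, 160, 157, 160, 163, 154, 173, 152, 151]

r_male = pearson_r(male_arm, male_height)
r_female = pearson_r(female_arm, female_height)
print("males:  ", round(r_male, 3))
print("females:", round(r_female, 3))
```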
Hypothesis 2
I am now going to test my second hypothesis, which is that females are generally shorter than males, by drawing a cumulative frequency diagram for males and females. To check if my hypothesis is right I will compare the female line on the diagram with the male line. If the males are really taller than the females, the female line should rise earlier and be steeper than the male line.
Before I can draw the diagram I have to work out the cumulative frequency for both males and females:
Males
Height    Frequency   Cumulative Frequency
146-150   0           0
151-155   1           1
156-160   1           2
161-165   5           7
166-170   6           13
171-175   3           16
176-180   4           20
181-185   2           22
Females
Height    Frequency   Cumulative Frequency
146-150   1           1
151-155   2           3
156-160   3           6
161-165   7           13
166-170   5           18
171-175   0           18
176-180   0           18
181-185   0           18
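The cumulative frequencies can be checked by counting how many sampled heights fall at or below the top of each class. The sketch below is one way to do that; the function name is hypothetical, and the heights are the sampled values written with a leading 1 so that they match classes running from 146 to 185.

```python
def cumulative_frequencies(heights, class_upper_bounds):
    """For each class, count the heights up to and including its upper bound."""
    return [sum(1 for h in heights if h <= upper)
            for upper in class_upper_bounds]


# Upper bounds of the classes 146-150, 151-155, ..., 181-185.
upper_bounds = [150, 155, 160, 165, 170, 175, 180, 185]

male_heights = [173, 164, 170, 168, 170, 178, 183, 156, 173, 163, 166,
                177, 168, 169, 165, 175, 153, 179, 182, 161, 176, 164]
female_heights = [149, 167, 168, 155, 161, 169, 161, 164, 166,
                  162, 160, 162, 163, 161, 155, 170, 158, 159]

male_cf = cumulative_frequencies(male_heights, upper_bounds)
female_cf = cumulative_frequencies(female_heights, upper_bounds)
print(male_cf)    # [0, 1, 2, 7, 13, 16, 20, 22]
print(female_cf)  # [1, 3, 6, 13, 18, 18, 18, 18]
```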
This graph supports my hypothesis, as the female line is steeper than the male line and reaches its total at a lower height.
Hypothesis 3
I am now going to test my final hypothesis: the weights of males are more spread out than the weights of females. I am going to test this by drawing box plots of the weights for both males and females. To do this I must find the median, lower quartile and upper quartile of the male weights and of the female weights, which I can only do once the weights are in numerical order. I am only going to use half of my sample, which means 11 males and 9 females, because there are fewer numbers to deal with. I picked the 11 males and the 9 females at random using a calculator.
Here are the weights:
Males
Reference   Weight
096         57
008         45
029         52
009         65
017         47
010         55
013         60
061         53
039         62
066         45
038         62
Females
Reference   Weight
019         50
018         47
040         47
007         52
017         52
078         59
013         42
062         49
077         43
I now have to put the weights in order so I can work out the median, lower quartile and upper quartile.
Males
45, 45, 47, 52, 53, 55, 57, 60, 62, 62, 65
The median is: 55
The lower-quartile is: 47
The upper-quartile is: 62
Females
42, 43, 47, 47, 49, 50, 52, 52, 59
The median is: 49
The lower-quartile is: (43+47)/2=45
The upper-quartile is: 52
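The quartiles above are consistent with taking the median of the lower half and the median of the upper half of the ordered list, leaving out the middle value when the number of values is odd. The sketch below assumes that convention (other textbooks and spreadsheet functions use slightly different rules), and the function names are made up for the example.

```python
def median(sorted_values):
    """Median of a list that is already in ascending order."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartiles(values):
    """Return (lower quartile, median, upper quartile).

    The quartiles are the medians of the lower and upper halves; the
    middle value is left out of both halves when the count is odd.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(data), median(upper)


male_weights = [57, 45, 52, 65, 47, 55, 60, 53, 62, 45, 62]
female_weights = [50, 47, 47, 52, 52, 59, 42, 49, 43]

print(quartiles(male_weights))    # (47, 55, 62)
print(quartiles(female_weights))  # (45.0, 49, 52.0)
```

The interquartile range is the upper quartile minus the lower quartile, which gives 15 for the males and 7 for the females.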
From this information I am now going to draw a box plot diagram.
This diagram supports my hypothesis, as the male box plot is wider than the female box plot: the male interquartile range is 62 - 47 = 15, while the female interquartile range is 52 - 45 = 7.
I am also going to test this hypothesis using standard deviation. I am using standard deviation because it takes account of all the data, unlike the box plot, which is built only from the median, the quartiles and the extreme values.
Standard Deviation for males
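To work out standard deviation you need to use this formula:
$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$
$\sigma$ means 'standard deviation'
$\sum$ means 'the sum of'
$\bar{x}$ means 'the mean'
$n$ means 'the number of values'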
I am now going to show how to use this formula, as simply as possible.
I first have to work out the mean of the numbers 45, 45, 47, 52, 53, 55, 57, 60, 62, 62, 65, which is 54.82 (to 2 decimal places). I then subtract the mean from each of the numbers in turn and square the result.
x     $(x - \bar{x})^2$
45    96.4324
45    96.4324
47    61.1524
52    7.9524
53    3.3124
55    0.0324
57    4.7524
60    26.8324
62    51.5524
62    51.5524
65    103.6324
I now add up all the results, which gives 503.6364. I divide this by the number of values, which is 11: 503.6364 divided by 11 is 45.78512727. I then take the square root, which gives 6.766471. This is the standard deviation for males. I have worked it out by hand for males to show how it is done, but for the females I am going to use Excel to work out the standard deviation to save time.
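The same steps (find the mean, square each difference from the mean, add them up, divide by the number of values, take the square root) can be sketched in Python. The function name is made up for the example. Dividing by n rather than n - 1 is the population form of the standard deviation, which is what the hand calculation above uses.

```python
import math


def population_sd(values):
    """Standard deviation using the 'divide by n' (population) formula."""
    n = len(values)
    mean = sum(values) / n
    squared_differences = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_differences) / n)


male_weights = [45, 45, 47, 52, 53, 55, 57, 60, 62, 62, 65]
female_weights = [42, 43, 47, 47, 49, 50, 52, 52, 59]

print(round(population_sd(male_weights), 3))    # 6.766
print(round(population_sd(female_weights), 3))  # 4.853
```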
Standard Deviation for females
I am now going to work out the standard deviation for females in Excel. The standard deviation for females is 4.853407.
The standard deviation also supports my hypothesis, because the male standard deviation (6.77) is bigger than the female standard deviation (4.85). This means my hypothesis, that the weights of males are more spread out than the weights of females, is supported by my sample.
Conclusion
In my coursework I have found evidence for all my hypotheses except the second half of the 1st hypothesis, which was that there is a strong positive correlation between arm span and height in females. For males my hypothesis was correct, but not for females. This might have been because my sample happened to include some females who had a big difference between their arm span and height.
I also found an error in the data I was given. The data was for one school year, so all the pupils should have been born in either 1967 or 1968, but one of the people in the data was born in 1938. His reference number is 051. He was not in my sample. If there was one mistake in this data there could be more that I do not know about, and these might have affected my results. Overall I do not think my results have been badly affected and I am pleased with what I have got. Some other things that could have affected my results are:
*