To work out the mean of my data for each of my populations I had to divide Σxƒ by Σƒ, so inserting SUM=E20/D20 into Excel calculated the mean.
I chose this formula because my data is in the form of a frequency table.
To obtain the variance of my data for each population I had to divide Σx²ƒ by Σƒ and then subtract the mean squared. To work out the variance of both populations in Excel I inserted SUM=(F20/D20)-C22^2
To work out the standard deviation of each population I had to take the square root of the variance, first for the heights of females and then for the heights of males.
Females: σ = √ 9.0864 = 3.014
Males: σ = √ 9.13 = 3.022
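The same frequency-table calculation can be sketched outside Excel. The Python below is only an illustration: the heights and frequencies are made-up values, not the survey data, and the function name `frequency_table_stats` is hypothetical. It assumes that cells D20, E20 and F20 in the spreadsheet hold Σƒ, Σxƒ and Σx²ƒ respectively, which is what the Excel formulas above suggest.

```python
import math

# Hypothetical frequency table (illustrative values only, not the survey data).
heights = [60, 62, 64, 66, 68]
frequencies = [2, 3, 5, 3, 2]

def frequency_table_stats(values, freqs):
    """Return (mean, variance, standard deviation) for a frequency table."""
    sum_f = sum(freqs)                                        # Σƒ
    sum_xf = sum(x * f for x, f in zip(values, freqs))        # Σxƒ
    sum_x2f = sum(x * x * f for x, f in zip(values, freqs))   # Σx²ƒ
    mean = sum_xf / sum_f                                     # Σxƒ / Σƒ
    variance = sum_x2f / sum_f - mean ** 2                    # Σx²ƒ / Σƒ minus mean squared
    return mean, variance, math.sqrt(variance)

mean, variance, std_dev = frequency_table_stats(heights, frequencies)
print(mean, round(variance, 4), round(std_dev, 4))
```

Replacing the two lists with the real height values and frequencies should reproduce the mean, variance and standard deviation found in the spreadsheet.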
I then had to work out confidence intervals at two levels for both populations and compare them. Using the means, variances and standard deviations already calculated, I can enter them into a new formula for working out the confidence interval. These intervals estimate the range of values most likely to contain the population mean. First I had to work out the standard error, but to make the interval more accurate I had to change σ to $\sigma_{n-1}$. This gives a sample statistic that is free from bias.
For females:

$$\sigma_{n-1}^2 = \frac{n}{n-1} S^2 = \frac{50 \times 9.0864}{49} = 9.272$$

For males:

$$\sigma_{n-1}^2 = \frac{n}{n-1} S^2 = \frac{50 \times 9.13}{49} = 9.316$$
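The adjustment from S² to the unbiased estimate can be checked in a few lines of Python. This is a minimal sketch using the sample variances and sample size quoted above; the function name `unbiased_variance` is just an illustrative choice.

```python
def unbiased_variance(sample_variance, n):
    """Scale the sample variance S² by n / (n - 1) to remove the bias."""
    return n / (n - 1) * sample_variance

females = unbiased_variance(9.0864, 50)
males = unbiased_variance(9.13, 50)

print(round(females, 3), round(males, 3))  # 9.272 9.316
```

Because n = 50 is fairly large, the factor 50/49 only changes the variance slightly, but it is still worth applying when estimating a population value from a sample.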
These new values can be entered into the confidence interval formula. The quantity $\sigma_{n-1}/\sqrt{n}$ takes the place of the standard error and gives a better estimate. There are still pieces of information that are unknown, so I must use normal distribution theory to give the z value. This relies on the Central Limit Theorem, which shows that in large enough samples the distribution of the sample mean approximates to a normal curve. Because the curve is symmetrical, I can use the table of percentage points of the normal distribution to find the z value that satisfies $P(Z \le z) = p$, where Z is the normally distributed random variable with mean equal to zero and variance equal to one.
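The z values read from the percentage points table can also be obtained from Python's standard library. The sketch below assumes a two-sided interval, so a confidence level c corresponds to the cumulative probability (1 + c) / 2; the helper name `z_value` is hypothetical.

```python
from statistics import NormalDist

def z_value(confidence_level):
    """Return z such that the central region of N(0, 1) has the given probability."""
    cumulative_probability = (1 + confidence_level) / 2   # e.g. 0.99 -> 0.995
    return NormalDist(mu=0, sigma=1).inv_cdf(cumulative_probability)

z_99 = z_value(0.99)
z_90 = z_value(0.90)

print(round(z_99, 4), round(z_90, 4))  # 2.5758 1.6449
```

These agree with the table values 2.5758 and 1.6449 used for the 99% and 90% intervals.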
Central Limit Theorem
If the sample size is large enough then the distribution of the sample mean is approximately Normal, irrespective of the distribution of the parent population. The mean of the distribution of the sample mean is approximately equal to the mean of the parent population. The variance of the distribution of the sample mean is approximately the variance of the parent population divided by the sample size.
These approximations get closer as the sample size gets bigger. These important results are known as the Central Limit Theorem.
Symbolically, if $X \sim \text{(unknown)}(\mu, \sigma^2)$ then $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$, provided that n is sufficiently large. (A good rule of thumb is n ≥ 30.)
Using the Central Limit Theorem enables me to make a prediction about the distribution of the sample mean even if I don’t know the distribution of the parent population. Provided the sample is large enough, I can be confident that the mean of the sample is close to the population mean.
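The Central Limit Theorem can be illustrated with a small simulation. The sketch below is not part of the investigation itself: it assumes an exponential parent population (mean 1, variance 1), chosen only because it is clearly skewed and not Normal, and repeatedly draws samples of size 50 to see how the sample means behave.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 50        # same n as the samples in this investigation
NUM_SAMPLES = 10_000    # number of repeated samples (arbitrary choice)

# Parent population: exponential with rate 1, so mean = 1 and variance = 1.
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(SAMPLE_SIZE))
    for _ in range(NUM_SAMPLES)
]

mean_of_means = statistics.fmean(sample_means)
variance_of_means = statistics.pvariance(sample_means)

# The theorem predicts a mean near 1 and a variance near 1 / 50 = 0.02.
print("mean of sample means:", round(mean_of_means, 3))
print("variance of sample means:", round(variance_of_means, 4))
```

The mean of the sample means should come out close to the parent mean, and their variance close to the parent variance divided by the sample size, even though the parent population is far from Normal.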
A 99% confidence interval for the height of females aged 16-18

$$\bar{x} = 64.56, \quad S^2 = 9.0864, \quad n = 50, \quad \Phi^{-1}(0.995) = 2.5758$$

$$\sigma_{n-1}^2 = \frac{n}{n-1} S^2 = \frac{50 \times 9.0864}{49} = 9.272$$

$$\begin{aligned}
\mu &= \bar{x} \pm z \times \frac{\sigma_{n-1}}{\sqrt{n}} \\
&= 64.56 \pm 2.5758 \times \frac{\sqrt{9.272}}{\sqrt{50}} \\
&= 64.56 \pm 2.5758 \times \frac{3.045}{7.071} \\
&= 64.56 \pm 2.5758 \times 0.431 \\
&= 64.56 \pm 1.110
\end{aligned}$$

μ is in the interval [ 63.45 , 65.67 ]

$$63.45 < \mu < 65.67$$
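The worked calculation above can be wrapped in a small Python function so that the same steps can be reused for each interval. This is a sketch under the same assumptions as the hand calculation (large sample, z taken from the normal table); the function name `confidence_interval` is illustrative only. It keeps full precision at each step, so the margin may differ from the hand-rounded value in the third decimal place.

```python
import math

def confidence_interval(sample_mean, sample_variance, n, z):
    """Return (lower, upper) limits for the population mean."""
    unbiased_variance = n / (n - 1) * sample_variance    # unbiased variance estimate
    standard_error = math.sqrt(unbiased_variance / n)    # sigma_(n-1) / sqrt(n)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin

# Females aged 16-18, 99% level, using the values quoted above.
lower, upper = confidence_interval(64.56, 9.0864, 50, 2.5758)
print(f"[{lower:.2f}, {upper:.2f}]")  # [63.45, 65.67]
```

Calling the function with z = 1.6449, or with the male mean and variance, gives the other three intervals in the same way.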
A 90% confidence interval for the height of females aged 16-18

$$\bar{x} = 64.56, \quad S^2 = 9.0864, \quad n = 50, \quad \Phi^{-1}(0.95) = 1.6449$$

$$\sigma_{n-1}^2 = \frac{n}{n-1} S^2 = \frac{50 \times 9.0864}{49} = 9.272$$

$$\begin{aligned}
\mu &= \bar{x} \pm z \times \frac{\sigma_{n-1}}{\sqrt{n}} \\
&= 64.56 \pm 1.6449 \times \frac{\sqrt{9.272}}{\sqrt{50}} \\
&= 64.56 \pm 1.6449 \times \frac{3.045}{7.071} \\
&= 64.56 \pm 1.6449 \times 0.431 \\
&= 64.56 \pm 0.709
\end{aligned}$$

μ is in the interval [ 63.85 , 65.27 ]

$$63.85 < \mu < 65.27$$
A 99% confidence interval for the height of males aged 16-18

$$\bar{x} = 70.1, \quad S^2 = 9.13, \quad n = 50, \quad \Phi^{-1}(0.995) = 2.5758$$

$$\sigma_{n-1}^2 = \frac{n}{n-1} S^2 = \frac{50 \times 9.13}{49} = 9.316$$

$$\begin{aligned}
\mu &= \bar{x} \pm z \times \frac{\sigma_{n-1}}{\sqrt{n}} \\
&= 70.1 \pm 2.5758 \times \frac{\sqrt{9.316}}{\sqrt{50}} \\
&= 70.1 \pm 2.5758 \times \frac{3.052}{7.071} \\
&= 70.1 \pm 2.5758 \times 0.432 \\
&= 70.1 \pm 1.113
\end{aligned}$$

μ is in the interval [ 68.99 , 71.21 ]

$$68.99 < \mu < 71.21$$
A 90% confidence interval for the height of males aged 16-18

$$\bar{x} = 70.1, \quad S^2 = 9.13, \quad n = 50, \quad \Phi^{-1}(0.95) = 1.6449$$

$$\sigma_{n-1}^2 = \frac{n}{n-1} S^2 = \frac{50 \times 9.13}{49} = 9.316$$

$$\begin{aligned}
\mu &= \bar{x} \pm z \times \frac{\sigma_{n-1}}{\sqrt{n}} \\
&= 70.1 \pm 1.6449 \times \frac{\sqrt{9.316}}{\sqrt{50}} \\
&= 70.1 \pm 1.6449 \times \frac{3.052}{7.071} \\
&= 70.1 \pm 1.6449 \times 0.432 \\
&= 70.1 \pm 0.711
\end{aligned}$$

μ is in the interval [ 69.39 , 70.81 ]

$$69.39 < \mu < 70.81$$
Comparing confidence intervals
Below I have presented my confidence intervals graphically:
This one shows the confidence intervals of 90% and 99% for the heights of females aged 16-18 years old.
This one shows the confidence intervals of 90% and 99% for the heights of males aged 16-18 years old.
As you can see from these two diagrams, the higher the confidence level, the wider the interval, and the more confident I am that the population mean will lie between the two values. However, the lower the confidence level, the narrower the interval, and the less confident I am that the population mean will lie between the two values.
When you compare the male and female confidence intervals graphically you can see that the female heights are concentrated on the left side of the scale whereas the male heights are situated on the right side of the scale. Even the 99% intervals do not overlap, which suggests that males are, on average, taller than females.
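The comparison of the four intervals can also be made numerically. The short Python sketch below takes the interval limits calculated earlier and checks their widths and whether the male and female intervals overlap; the helper names `width` and `overlaps` are illustrative only.

```python
# Interval limits taken from the calculations above.
intervals = {
    ("female", 99): (63.45, 65.67),
    ("female", 90): (63.85, 65.27),
    ("male", 99): (68.99, 71.21),
    ("male", 90): (69.39, 70.81),
}

def width(interval):
    """Distance between the lower and upper limits."""
    return interval[1] - interval[0]

def overlaps(a, b):
    """True if the two intervals share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]

for (group, level), limits in intervals.items():
    print(group, f"{level}%", "width =", round(width(limits), 2))

# Compare the widest (99%) intervals for the two populations.
gap = intervals[("male", 99)][0] - intervals[("female", 99)][1]
print("99% intervals overlap:", overlaps(intervals[("female", 99)], intervals[("male", 99)]))
print("gap between them:", round(gap, 2))
```

For each population the 99% interval is wider than the 90% interval, and there is a clear gap between the top of the female 99% interval and the bottom of the male 99% interval.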
Conclusion
When I relate the evidence I have obtained from the confidence intervals and from the mean and variance values of both populations to my hypothesis, I find that the heights of males are, on average, greater than the heights of females.
I calculated two levels of confidence for both males and females: 90% and 99%.
My data was not collected at random. However, I tried to make sure that I wasn’t biased when I collected it: I did not pay much attention to a person’s height, and the majority of the people I asked were sitting down at a table. I also didn’t collect my data somewhere such as a modelling studio, where the majority of the population would be over 6ft.
If I had had more time I would have extended my problem by widening the age range, as I only managed to obtain the heights of males and females aged 16-18. I could have increased this by including younger and older populations.
I could have concentrated my data collection on children aged 10-15 to compare the heights at each age from 10 up to 15. I could also have compared the heights of boys and girls as age increases from 10 to 15, to see whether girls increase in height more gradually than boys, who might shoot up at a certain age, or whether it occurs the other way round.
If I had decided to keep to the data I had collected, I could have extended it by investigating whether taller people have bigger shoe sizes and shorter people have smaller shoe sizes.
Formulae and definitions
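Variance

$$S^2 = \frac{\sum x^2 f}{\sum f} - \bar{x}^2$$

Mean

$$\bar{x} = \frac{\sum x f}{\sum f}$$

Standard error (s.e.)

$$\text{s.e.} = \frac{\sigma}{\sqrt{n}}$$

Unbiased estimate

$$\sigma_{n-1}^2 = \frac{n}{n-1} S^2$$

Confidence intervals (C.I.)

$$\mu = \bar{x} \pm z \times \frac{\sigma_{n-1}}{\sqrt{n}}$$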