Data Handling Coursework

Mamoona Mohsin


Introduction:

I have been provided with a data set of 1183 pupils, recording each pupil’s height, weight, gender, year group, eye colour, age, hair colour, handedness (left or right), favourite music, favourite sport, favourite colour and several other variables. I am going to consider weight, height, year group and gender. I will set up three hypotheses about the relationships between these variables and test them using different statistical techniques.

Sampling:

Stratified sampling:

I will take a 10% sample of the pupils using stratified random sampling.

Total pupils in school = 1183

10% of 1183 ≈ 118 pupils

Sample size for each year group = (pupils in the year group / total pupils in school) x total sample size (118)
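The allocation formula can be checked with a short Python sketch. The function name `stratified_allocation` and the year-group sizes in the example are hypothetical (the actual year-group counts are not listed here). Each share is rounded to the nearest whole pupil, so in general the rounded sizes may not add up to exactly 118.

```python
from typing import Dict


def stratified_allocation(group_sizes: Dict[str, int], sample_size: int) -> Dict[str, int]:
    """Split sample_size across groups in proportion to each group's size.

    Each allocation is (pupils in group / total pupils) x sample size,
    rounded to the nearest whole pupil.
    """
    total = sum(group_sizes.values())
    if total == 0:
        raise ValueError("group sizes must not all be zero")
    return {
        group: round(size / total * sample_size)
        for group, size in group_sizes.items()
    }


# Hypothetical example: two year groups of 100 and 300 pupils, sample of 40.
print(stratified_allocation({"Year A": 100, "Year B": 300}, 40))  # {'Year A': 10, 'Year B': 30}
```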

Random and systematic sampling:

I randomly chose the 8th pupil as a starting point and then selected every 10th pupil after that (systematic sampling) to complete the sample of 118 pupils.
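A minimal Python sketch of systematic sampling, assuming the pupils are numbered 1 to 1183 in the order they appear in the data set (that ordering is an assumption; the start of 8 and the interval of 10 come from the text). `systematic_sample` is a hypothetical helper name. Applied to all 1183 pupils, this start and interval select exactly 118 pupils, the last being pupil 1178.

```python
def systematic_sample(items: list, start: int, step: int) -> list:
    """Return every `step`-th item, beginning at 1-based position `start`."""
    if start < 1 or step < 1:
        raise ValueError("start and step must be positive")
    # Python lists are 0-based, so position `start` is index start - 1.
    return items[start - 1::step]


# Pupils numbered 1 to 1183; start at the 8th pupil and take every 10th.
pupil_numbers = list(range(1, 1184))
sample = systematic_sample(pupil_numbers, start=8, step=10)
print(len(sample), sample[:3], sample[-1])  # 118 [8, 18, 28] 1178
```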

Data:

One weight value is missing from the data, possibly because the person collecting the data did not weigh that pupil.

Hypothesis 1:

As pupils’ height increases, their weight increases. The scatter graph of height against weight shows a positive correlation, which suggests that taller pupils tend to be heavier.

Next I will use Excel to find Pearson’s product moment correlation coefficient, which measures how strong the linear correlation between the height and weight of the pupils is. Its value lies between -1 and +1. A value of -1 means perfect negative correlation: all the points lie exactly on a straight line sloping downwards. A value of +1 means perfect positive correlation: all the points lie exactly on a straight line sloping upwards. A value close to 0 means there is little or no linear correlation.
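The coefficient that Excel reports (for example through its CORREL or PEARSON worksheet functions) can be reproduced with a short Python sketch. `pearson_r` is a hypothetical name; the sketch uses r = Sxy / √(Sxx × Syy) and skips any pair containing a missing value (None), such as the pupil whose weight was not recorded.

```python
import math
from typing import Optional, Sequence


def pearson_r(xs: Sequence[Optional[float]], ys: Sequence[Optional[float]]) -> float:
    """Pearson's product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy).

    Pairs where either value is missing (None) are skipped, e.g. a pupil
    whose weight was not recorded.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    s_xx = sum((x - mean_x) ** 2 for x, _ in pairs)
    s_yy = sum((y - mean_y) ** 2 for _, y in pairs)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when one variable does not vary")
    return s_xy / math.sqrt(s_xx * s_yy)


# Points exactly on a straight line give r = +1 or r = -1.
print(pearson_r([1, 2, 3, None], [2, 4, 6, 8]))  # 1.0 (the pair with None is skipped)
print(pearson_r([1, 2, 3], [6, 4, 2]))           # -1.0
```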

For the scatter diagram shown above, Pearson’s correlation coefficient is 0.158817, which indicates only a weak positive correlation. I will therefore calculate the correlation between height and weight separately for different year groups.

The scatter diagram of height against weight for year 8 pupils shows a very weak negative correlation, so for year 8 there is almost no linear relationship between height and weight.

Pearson’s product moment correlation coefficient for year 8 pupils is -0.07896. This value is negative but very close to 0, so in this sample the weight of year 8 pupils does not tend to increase as their height increases.

The scatter diagram for year 11 pupils shows a positive linear correlation: as the height of year 11 pupils increases, their weight tends to increase. Pearson’s correlation coefficient for year 11 pupils is 0.246493, which is still weak but shows a stronger correlation than for year 8 pupils.
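Splitting the sample by year group and finding r for each group can be sketched in Python as follows. The record layout `(year group, height, weight)`, the helper names and the toy numbers are all hypothetical and are not the coursework data; pupils with a missing weight are left out of their group.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical record layout: (year group, height in cm, weight in kg or None).
Record = Tuple[int, float, Optional[float]]


def pearson_r(pairs: List[Tuple[float, float]]) -> float:
    """Pearson's product moment correlation coefficient for (x, y) pairs."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    s_xx = sum((x - mean_x) ** 2 for x, _ in pairs)
    s_yy = sum((y - mean_y) ** 2 for _, y in pairs)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when one variable does not vary")
    return s_xy / math.sqrt(s_xx * s_yy)


def correlation_by_year(records: Iterable[Record]) -> Dict[int, float]:
    """Group pupils by year group and compute r separately for each group."""
    groups: Dict[int, List[Tuple[float, float]]] = defaultdict(list)
    for year, height, weight in records:
        if weight is not None:  # leave out pupils whose weight is missing
            groups[year].append((height, weight))
    return {year: pearson_r(pairs) for year, pairs in sorted(groups.items())}


# Made-up records for illustration only (not the coursework data).
toy = [(8, 150, 40), (8, 160, 35), (8, 170, 30),
       (11, 160, 50), (11, 170, 60), (11, 180, None), (11, 190, 80)]
print(correlation_by_year(toy))
```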

Hypothesis 2:

Boys in the school are heavier than girls.

Female weights (kg):

n = 59

36, 36, 38, 38, 40, 40, 41, 42, 42, 42, 44, 44, 45, 45, 45, 45, 45, 45, 45, 46, 46, 46, 47, 47, 47, 48, 48, 48, 48, 48, 49, 50, 50, 50, 50, 50, 50, 51, 51, 51, 52, 52, 52, 52, 52, 54, 54, 54, 54, 55, 55, 56, 57, 58, 59, 60, 60, 74, 140

Male weights (kg):

n = 59, but the weight of one year 7 male pupil is missing, so only 58 values are listed.

26, 32, 35, 38, 38, 40, 40, 42, 42, 43, 43, 43, 43, 44, 45, 45, 45, 48, 49, 49, 50, 50, 50, 50, 50, 50, 50, 50, 51, 53, 54, 54, 56, 56, 57, 57, 57, 58, 59, 59, 60, 60, 61, 63, 63, 64, 64, 64, 65, 66, 68, 68, 70, 72, 72, 75, 76, 92

I will draw a back-to-back stem and leaf diagram to compare the weights of girls and boys for this hypothesis.

Grouping the weights into classes of equal width (10 kg) gives the following classes, from which the modal class can be found.

The diagram makes it easy to see which weights fall into which classes:

Key: 20|6 = 26 kg

[Back-to-back stem and leaf diagram of girls’ and boys’ weights (columns: leaf | stem | leaf)]
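A back-to-back stem and leaf diagram can be generated from the two weight lists with a Python sketch like the one below. Following the key (20|6 = 26), each stem is the tens part of the weight and each leaf is the units digit, so every stem also marks a 10 kg class, and the stem with the most leaves gives the modal class. The helper names and the layout (girls on the left, boys on the right) are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List

# Weights (kg) copied from the lists above.
FEMALE = [36, 36, 38, 38, 40, 40, 41, 42, 42, 42, 44, 44, 45, 45, 45, 45, 45, 45,
          45, 46, 46, 46, 47, 47, 47, 48, 48, 48, 48, 48, 49, 50, 50, 50, 50, 50,
          50, 51, 51, 51, 52, 52, 52, 52, 52, 54, 54, 54, 54, 55, 55, 56, 57, 58,
          59, 60, 60, 74, 140]
MALE = [26, 32, 35, 38, 38, 40, 40, 42, 42, 43, 43, 43, 43, 44, 45, 45, 45, 48,
        49, 49, 50, 50, 50, 50, 50, 50, 50, 50, 51, 53, 54, 54, 56, 56, 57, 57,
        57, 58, 59, 59, 60, 60, 61, 63, 63, 64, 64, 64, 65, 66, 68, 68, 70, 72,
        72, 75, 76, 92]


def stems_and_leaves(values: List[int]) -> Dict[int, List[int]]:
    """Map each stem (tens, as in the key 20|6 = 26) to its sorted leaves (units)."""
    table: Dict[int, List[int]] = defaultdict(list)
    for value in sorted(values):
        table[(value // 10) * 10].append(value % 10)
    return dict(table)


def back_to_back(left: List[int], right: List[int]) -> str:
    """Left leaves read outwards from the stem; right leaves read left to right."""
    left_table, right_table = stems_and_leaves(left), stems_and_leaves(right)
    all_stems = set(left_table) | set(right_table)
    rows = []
    for stem in range(min(all_stems), max(all_stems) + 10, 10):
        left_leaves = "".join(str(d) for d in reversed(left_table.get(stem, [])))
        right_leaves = "".join(str(d) for d in right_table.get(stem, []))
        rows.append(f"{left_leaves:>30} | {stem:>3} | {right_leaves}")
    return "\n".join(rows)


def modal_class(values: List[int]) -> int:
    """Lower bound of the 10 kg class that contains the most values."""
    table = stems_and_leaves(values)
    return max(table, key=lambda stem: len(table[stem]))


print(back_to_back(FEMALE, MALE))
print("Girls' modal class starts at", modal_class(FEMALE), "kg")
print("Boys' modal class starts at", modal_class(MALE), "kg")
```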

There are several different averages I can use to test my hypothesis that boys weigh more than girls.

The modal class is the class that contains the most data values.

The mode is the most common value in the data.

The mean shares the total out evenly: it is the sum of all the values divided by the number of values.

Mean (ā) = total weight (∑fa) / no. of values (n)

The median is the middle value when the data are arranged in order. If there are two middle values, the median is halfway between them.

Range = largest value – smallest value

The table above shows that all the averages for boys are greater than those for girls, which supports my hypothesis that boys in the school are heavier than girls.

The range for girls is greater than for boys (104 kg compared with 66 kg). This is probably caused by an anomalous value: one girl’s weight is recorded as 140 kg.
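The averages and ranges discussed above can be computed directly from the two weight lists. This Python sketch uses a hypothetical helper `summarise`; `statistics.multimode` (Python 3.8 or later) returns every mode, in case of a tie. The mean is simply the total divided by n, which matches ∑fa / n when each listed value is counted once.

```python
import statistics
from typing import Dict, List, Union

# Weights (kg) copied from the lists above.
FEMALE = [36, 36, 38, 38, 40, 40, 41, 42, 42, 42, 44, 44, 45, 45, 45, 45, 45, 45,
          45, 46, 46, 46, 47, 47, 47, 48, 48, 48, 48, 48, 49, 50, 50, 50, 50, 50,
          50, 51, 51, 51, 52, 52, 52, 52, 52, 54, 54, 54, 54, 55, 55, 56, 57, 58,
          59, 60, 60, 74, 140]
MALE = [26, 32, 35, 38, 38, 40, 40, 42, 42, 43, 43, 43, 43, 44, 45, 45, 45, 48,
        49, 49, 50, 50, 50, 50, 50, 50, 50, 50, 51, 53, 54, 54, 56, 56, 57, 57,
        57, 58, 59, 59, 60, 60, 61, 63, 63, 64, 64, 64, 65, 66, 68, 68, 70, 72,
        72, 75, 76, 92]


def summarise(weights: List[int]) -> Dict[str, Union[float, List[int]]]:
    """Mean, median, mode(s) and range of a list of weights."""
    return {
        "mean": sum(weights) / len(weights),     # total weight / number of values
        "median": statistics.median(weights),    # middle value, or midpoint of two
        "modes": statistics.multimode(weights),  # most common value(s)
        "range": max(weights) - min(weights),    # largest value - smallest value
    }


for label, data in (("Girls", FEMALE), ("Boys", MALE)):
    print(label, summarise(data))
```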

Next I will find the variance and standard deviation to compare how spread out the weights of boys and girls are. The standard deviation (s) is the square root of the variance (s²).
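A minimal Python sketch of the variance and standard deviation, using s² = ∑a²/n − ā², which divides by n. This is an assumption about which form the coursework uses: some textbooks and calculators divide by n − 1 instead (Python's `statistics.variance`). The helper name `variance_sd` and the example values are hypothetical; the same function could be applied to the girls' and boys' weight lists.

```python
import math
import statistics
from typing import List, Tuple


def variance_sd(values: List[float]) -> Tuple[float, float]:
    """Return (s^2, s) using s^2 = sum(a^2)/n - mean^2 (divides by n, not n - 1)."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mean = sum(values) / n
    variance = sum(a * a for a in values) / n - mean ** 2
    # Floating-point rounding can make a zero variance come out as a tiny
    # negative number, so clamp at 0 before taking the square root.
    variance = max(variance, 0.0)
    return variance, math.sqrt(variance)


example = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance_sd(example))           # (4.0, 2.0)
print(statistics.pvariance(example))  # same definition, from the standard library
```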

68% of the values lie within 1 standard ...