Mayfield High School

GCSE Statistics Coursework

Introduction:

Mayfield High School is a fictional secondary school whose students have all been surveyed about their bodies, habits, likes and dislikes. My task is to test my hypotheses using a variety of statistical techniques and to analyse my findings.

The data I have been provided with is secondary data: data that was previously gathered by someone else and has been published or otherwise made accessible so that others can use it. It is therefore not primary data, which is data collected by the researcher (me) specifically for this project.

Hypotheses:

To work out a person's BMI, we take their weight in kilograms and divide it by the square of their height in metres.
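As a quick check of this formula, the calculation can be sketched in Python. This is only an illustration: the function name bmi is my own choice, and the example values are a weight of 65 kg and a height of 1.65 m.

```python
def bmi(weight_kg, height_m):
    """Body Mass Index: weight in kilograms divided by height in metres squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


# Example values (assumed for illustration): 65 kg and 1.65 m.
example_bmi = bmi(65, 1.65)
print(round(example_bmi, 1))  # 23.9, which rounds to 24
```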

I travel 3km to get to school. My height is 1.65m and my weight is 65kg. Therefore, my BMI (Body Mass Index) is approximately 24 (65 / 1.65² = 23.9). My friend, who travels 0.5km to get to school, has a BMI of 28. This has given me my hypotheses:

i) Students who have to travel further to get to school will have a higher BMI compared to those who don't have to travel as far

Students who live further from school are very likely to have a longer journey home than those who live closer. During a bus or car journey, it is likely that the student will eat snacks. When they get home, the chances are that they will watch TV, eat dinner, do homework and play on the computer. It is highly unlikely that they will get round to exercising.

ii) Students who travel further to get to school would be taller than those who live closer would.

I expect those who live closer to the school to have a lower BMI. For their BMI to be lower, I expect them to be taller or lighter.

iii) Students who live closer to the school are lighter than those who live further away are.

The reasoning is the same as for hypothesis ii.

Pre-test:

To ensure that I am working with suitable data, and not a set of random numbers, I will carry out a pre-test. This involves testing an obvious hypothesis. In this case, the hypothesis being tested will be "as height increases, weight increases."

I think that the hypothesis above is true because, for the vast majority of people, weight correlates with height. Taller people tend to be heavier and shorter people tend to be lighter.

The sample will be a systematic sample because this is a simple and quick method of obtaining a sample. I want to obtain a sample of size 30 from a population of 1183. I have chosen a sample size of 30 because it will contain enough data to show any trend but will be small enough to manipulate and analyse.

1183 / 30 ≈ 39

Starting at a random point, I will collect every 39th piece of data. I will avoid bias in my sampling by using Microsoft Excel to randomise the order of the data before a sample is taken.

To study the strength of the relationship in the data, I will use Spearman's Rank Correlation Coefficient. The coefficient has a value of 1 when there is perfect agreement between the two rankings and a value of -1 when there is perfect disagreement between them. When there is no agreement between the ranks, the coefficient has a value of 0.
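The systematic sampling described above could be sketched in Python as follows. This is a minimal illustration and not the Excel method actually used: the function name systematic_sample is hypothetical, the population is represented simply by the row numbers 1 to 1183, and the interval is rounded down so that the last pick never runs past the end of the list.

```python
import random


def systematic_sample(population, sample_size, rng=None):
    """Pick every k-th item after a random start, where k = population size // sample size."""
    rng = rng or random.Random()
    interval = len(population) // sample_size  # 1183 // 30 = 39
    start = rng.randrange(interval)            # random starting point in the first interval
    return [population[start + i * interval] for i in range(sample_size)]


# Row numbers stand in for the real records (an assumption for illustration).
rows = list(range(1, 1184))
sample = systematic_sample(rows, 30, random.Random(1))
print(len(sample))  # 30
```

Fixing the seed, as in random.Random(1), only makes the example repeatable; in practice the starting point would be left random.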

Sample: See next page

Scatter Diagram: See page 8

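The sum of all of the squared differences (Σd²) is 2176.5. To find Spearman's rank correlation coefficient, we must use the formula below:

rs = 1 - (6 x Σd²) / (n x (n² - 1))

where d is the difference between the two ranks in each pair and n is the number of pairs (here n = 30).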

Calculations:

6 x 2176.5 = 13059

30 x (30² - 1) = 26970

13059 / 26970 = 0.484

1 - 0.484 = 0.516
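The arithmetic above can be checked with a short Python sketch. The function name spearman_from_d_squared is my own; it takes the sum of squared rank differences and the number of pairs and applies the same steps as the calculation.

```python
def spearman_from_d_squared(sum_d_squared, n):
    """Spearman's rank coefficient from the sum of squared rank differences."""
    numerator = 6 * sum_d_squared   # 6 x 2176.5 = 13059
    denominator = n * (n ** 2 - 1)  # 30 x (30^2 - 1) = 26970
    return 1 - numerator / denominator


rs = spearman_from_d_squared(2176.5, 30)
print(round(rs, 3))  # 0.516
```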

Analysis of Pre-test:

The Spearman's rank correlation coefficient is 0.516. I can conclude that there is agreement between the ranks. In this case, there are 30 pairs of data. The critical value for rs at 1% significance is 0.4251. The value, 0.516, exceeds 0.4251.

The critical value is the value that must be equalled or exceeded for the correlation to be accepted as statistically significant.

Therefore, it can be said that there is a statistically significant positive correlation between the two variables. This supports my view from the scatter diagram that there is a reasonable relationship showing that weight increases as height increases.

I now accept that the data I will be working with is reliable enough to allow me to continue with my investigation.

In the sample that I have taken, there are no outliers that require discussion, exclusion or inclusion.

Systematic Sample

Height (m)  Rank (H)  Weight (kg)  Rank (W)  Difference  Difference²
1.94        1         80           1         0           0
1.74        3         70           2         1           1
1.55        19.5      67           3         16.5        272.25
1.8         2         66           4         2           4
1.55        19.5      64           5         14.5        210.25
1.68        8.5       59           6.5       2           4
1.65        10.5      59           6.5       4           16
1.7         6         55           8         2           4
1.72        4         54           9.5       5.5         30.25
1.52        23        54           9.5       13.5        182.25
1.68        8.5       52           11        2.5         6.25
1.56        17        50           12.5      4.5         20.25
1.41        28        50           12.5      15.5        240.25
1.62        13        49           14        1           1
1.7         6         48           16        10          100
1.55        19.5      48           16        3.5         12.25
1.54        22        48           16        6           36
1.47        25        47           18        7           49
1.65        10.5      45           20        9.5         90.25
1.61        15        45           20        5           25
1.46        26        45           20        6           36
1.6         16        43           22        6           36
1.7         6         42           23        17          289
1.43        27        41           24        3           9
1.62        13        40           25.5      12.5        156.25
1.4         29        40           25.5      3.5         12.25
1.62        13        38           27.5      14.5        210.25
1.51        24        38           27.5      3.5         12.25
1.32        30        35           29        1           1
1.55        19.5      32           30        10.5        110.25
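The ranking in the table can be reproduced with a short Python sketch. It takes the 30 height and weight pairs from the systematic sample, gives rank 1 to the largest value, and gives tied values the average of the ranks they share. The function name average_ranks is my own, and the coefficient is found with the same simple formula used in the pre-test, which is not corrected for ties.

```python
# (height in m, weight in kg) pairs from the systematic sample
sample = [
    (1.94, 80), (1.74, 70), (1.55, 67), (1.8, 66), (1.55, 64), (1.68, 59),
    (1.65, 59), (1.7, 55), (1.72, 54), (1.52, 54), (1.68, 52), (1.56, 50),
    (1.41, 50), (1.62, 49), (1.7, 48), (1.55, 48), (1.54, 48), (1.47, 47),
    (1.65, 45), (1.61, 45), (1.46, 45), (1.6, 43), (1.7, 42), (1.43, 41),
    (1.62, 40), (1.4, 40), (1.62, 38), (1.51, 38), (1.32, 35), (1.55, 32),
]


def average_ranks(values):
    """Rank from largest (rank 1) to smallest; tied values share the mean rank."""
    ordered = sorted(values, reverse=True)
    rank_of = {}
    for value in set(values):
        first_position = ordered.index(value) + 1  # best rank in the tied group
        tie_count = ordered.count(value)
        rank_of[value] = first_position + (tie_count - 1) / 2
    return [rank_of[value] for value in values]


height_ranks = average_ranks([h for h, _ in sample])
weight_ranks = average_ranks([w for _, w in sample])

sum_d_squared = sum((rh - rw) ** 2 for rh, rw in zip(height_ranks, weight_ranks))
n = len(sample)
rs = 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))

print(sum_d_squared)  # 2176.5
print(round(rs, 3))   # 0.516
```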

On my scatter graph, I have drawn a line of best fit to try and estimate the relationship between people's heights and weights.

To help me identify any positive or negative correlation, I will split the graph into four and see where the points lie.

Most of my data lies in the top right-hand quarter of my graph. This suggests that there is moderate positive correlation.

The graph cuts the y-axis at approximately -25; this is the y-intercept.

The mean point is (1.225, 35.5).

M = (y - 35.5) / (x - 1.225)

M = 8 / 0.21 = 38.1

Therefore:

38.1 = (y - 35.5) / (x - 1.225)

Therefore,

38.1x - 46.7 (3SF) = y - 35.5

38.1x = y + 11.2

The equation of my line of best fit is:

38.1x = y + 11.2, or equivalently y = 38.1x - 11.2
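The working above can be checked with a short Python sketch. It uses the gradient 8 / 0.21 and the point (1.225, 35.5) from the working; the function names are my own, and the gradient is kept unrounded until the end, so the later decimal places may differ slightly from the hand calculation.

```python
def line_through_point(gradient, point):
    """Return (gradient, intercept) of the line with this gradient through the point."""
    x0, y0 = point
    intercept = y0 - gradient * x0
    return gradient, intercept


gradient, intercept = line_through_point(8 / 0.21, (1.225, 35.5))


def predict_weight(height_m):
    """Weight in kg estimated from height in m using the line y = Mx + c."""
    return gradient * height_m + intercept


print(round(gradient, 1), round(intercept, 1))  # 38.1 -11.2
```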

Quality of Data

Within my systematic sample, I have used an appropriate method to try to identify outliers. Outliers are observations that lie far away from the rest of the data; they are usually produced by recording or entry errors. The method to identify outliers is described below.

- The position of the lower quartile (Q1) in the ordered data is found using the formula: (n + 1) x 0.25

- The position of the upper quartile (Q3) in the ordered data is found using the formula: (n + 1) x 0.75

- The interquartile range is found by subtracting the lower quartile from the upper quartile.

- The value for the interquartile range (IQR) is multiplied by 1.5.

- The value calculated is subtracted from the lower quartile to give the limit below which values seem too low.

- The value of IQR x 1.5 is added to the upper quartile to find ...
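The steps listed above can be sketched in Python. This is an illustration under stated assumptions: the weights used are made-up values and not my sample, the function names are my own, a quartile position that falls between two data values is handled by linear interpolation, and any value above the upper limit is treated as too high.

```python
def quartile(ordered, fraction):
    """Value at position (n + 1) x fraction in ordered data, counting from 1."""
    n = len(ordered)
    position = (n + 1) * fraction
    whole = int(position)
    remainder = position - whole
    whole = min(max(whole, 1), n)     # keep the position inside the data
    following = min(whole + 1, n)
    low_value = ordered[whole - 1]
    high_value = ordered[following - 1]
    return low_value + remainder * (high_value - low_value)


def outlier_limits(values):
    """Return (lower limit, upper limit) using 1.5 x the interquartile range."""
    ordered = sorted(values)
    q1 = quartile(ordered, 0.25)
    q3 = quartile(ordered, 0.75)
    step = 1.5 * (q3 - q1)
    return q1 - step, q3 + step


# Hypothetical weights in kg, chosen only to show the method.
weights = [35, 40, 42, 45, 47, 48, 50, 52, 54, 55, 95]
low, high = outlier_limits(weights)
outliers = [w for w in weights if w < low or w > high]
print(low, high, outliers)  # 24.0 72.0 [95]
```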
