
# Statistics Coursework


Tesheen Moosa

Introduction

I have been asked to examine the students’ attendance figures for all year groups (7, 8, 9, 10 and 11) at Hamilton Community College. I will investigate whether the age of the students affects their attendance at school, and whether attendance in turn affects their learning and exam results. To start my research, the school gave me the attendance figures for every year group for the 2003 – 2004 academic year. I will first reduce the amount of data that I have to process by using stratified sampling, so that I only work with an amount of data that I am comfortable processing: 20% of the attendance figures from each year. I will use a scientific calculator to select the attendance figures at random, so that the new set of data is not biased by my own conscious decisions. I will then collate the new set of data in frequency tables (to display the frequency distributions), so that it is easy to interpret and analyse.

Secondly, after collating the data, I will display the new set of data in graphs, diagrams and charts so that it is easier to compare and study the figures. From these I will calculate the central tendency of each year group (mean and median) and its dispersion, by finding the quartiles (lower quartile, upper quartile and interquartile range). Using the interquartile range also ensures that the figures I compare come from the middle 50% of the data, which is not distorted by extreme values.
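The measures named above can be sketched in a few lines of Python. This is only an illustration: the seven values are the Year 7 attendance figures at positions 1, 8, 14, 15, 16, 24 and 26 of the numbered list, not the full sample, and the quartile rule used by `statistics.quantiles` (its default "exclusive" method) is an assumption that may differ slightly from a hand calculation.

```python
import statistics

# Illustrative subset only: Year 7 attendance figures (%) at positions
# 1, 8, 14, 15, 16, 24 and 26 of the numbered list, already in order.
attendance = [47.47, 72.22, 76.98, 78.57, 78.57, 82.54, 83.33]

# Central tendency.
mean = statistics.mean(attendance)
median = statistics.median(attendance)

# Dispersion: quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
# The default "exclusive" method is one of several quartile conventions.
lower_quartile, _, upper_quartile = statistics.quantiles(attendance, n=4)
interquartile_range = upper_quartile - lower_quartile

print(f"mean = {mean:.2f}, median = {median:.2f}")
print(f"LQ = {lower_quartile:.2f}, UQ = {upper_quartile:.2f}, "
      f"IQR = {interquartile_range:.2f}")
```

The low first value (47.47) pulls the mean below the median, which is why the median and interquartile range are useful alongside the mean.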

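| No. | Attendance (%) | No. | Attendance (%) | No. | Attendance (%) | No. | Attendance (%) | No. | Attendance (%) |
|---|---|---|---|---|---|---|---|---|---|
| | 79.47 | 81 | 87.11 | 122 | 92.63 | 163 | 95.79 | 204 | 98.16 |
| 41 | 79.47 | 82 | 87.37 | 123 | 92.89 | 164 | 95.79 | 205 | 98.42 |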

Stratified Sampling

As you can see, the full set of data is too large and would take a huge amount of time to process and analyse, so I have decided to make it much smaller using stratified sampling. Stratified sampling is used when there is a large amount of data to be processed. The data is first divided into categories (in this case, each attendance figure belongs to its year group), and a random sample is then chosen from each category (I used a scientific calculator). The size of each sample is in proportion to the size of its category within the whole data set. The proportion must be the same for all categories (in this case I have chosen 20% of the attendance figures from each year), so that the investigation is fair.

To make the data smaller and easier to process, I stratified it by putting the attendance figures in order within each year group. I then used the RANDOM button on the scientific calculator to pick a smaller set of data from the original. Below is how I did it:

Firstly, I counted the number of percentage figures in each year group and entered the count for one particular year group into the calculator. This number is important because it tells the calculator how big the range of numbers is, e.g. 0 – 100 or 0 – 1000, so the calculator will not give a number higher than the maximum entered.

I then used the 2nd function button to enter the RANDOM (Ran#) mode. The calculator screen should then show the number of figures for that year group followed by ‘Ran#’.

E.g. there are 246 percentage figures for Year 9 (which also means that there are 246 students in Year 9).

So, I enter the number 246 into the calculator and use the 2nd function button to select the RANDOM button (Ran#).

Therefore, the calculator screen should show the information below:

246Ran#

After that, all I did was press the equals button (=) repeatedly, recording the number that came up on the screen each time.

I ignored any number that had already come up once and wrote down the next number that the calculator produced instead.

I only recorded as many numbers as I needed. Since I am only using 20% of the original data, I worked out 20% of the original amount of data.

E.g. in Year 9 there are 246 students (amount of data) and I only want 20% of that amount, so I work out 20% of 246.

20% x 246 = 49.2

Because ‘49.2’ is not a whole number, I have to round it. I cannot use 49.2 because I am taking 20% of 246 students (this is discrete quantitative data) and you cannot have 49.2 students. I did not round down to 49, because the sample would then be slightly smaller than 20% of the year group, so I rounded up to 50 instead. That way the sample is never less than 20%.
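The same rounding rule can be applied to every year group at once. The sketch below is a minimal illustration in Python; the function name `sample_size` and the dictionary layout are my own, and integer arithmetic is used for the round-up so that an exact answer such as 47 is not pushed up by a floating-point error.

```python
# Number of attendance figures (students) in each year group.
YEAR_GROUP_SIZES = {
    "Year 7": 179,
    "Year 8": 235,
    "Year 9": 246,
    "Year 10": 242,
    "Year 11": 243,
}

def sample_size(group_size, percent=20):
    """Return percent% of group_size, rounded up to a whole student."""
    # Adding 99 before the integer division by 100 rounds any fraction up.
    return (group_size * percent + 99) // 100

sample_sizes = {year: sample_size(n) for year, n in YEAR_GROUP_SIZES.items()}

for year, size in sample_sizes.items():
    print(f"{year}: 20% of {YEAR_GROUP_SIZES[year]} -> sample of {size}")
```

This gives samples of 36, 47, 50, 49 and 49 for Years 7 to 11, matching the amounts used for each year's list of random numbers.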

So, I will record 50 different random numbers from the calculator and use them with the numbered list of Year 9 attendance percentage figures to get the new set of data.
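The calculator procedure can also be sketched as a short Python program. This is a stand-in rather than the calculator's own method: `rng.randint(1, N)` is assumed to play the role of reading a whole-number position from the `246Ran#` display, and the function name and seed are made up for the illustration. A repeat is ignored and the next number is taken instead, as described above.

```python
import random

def draw_sample_positions(population_size, sample_size, rng):
    """Pick distinct positions between 1 and population_size at random."""
    chosen = set()
    while len(chosen) < sample_size:
        # Assumption: one press of '=' gives one position from 1 to N.
        position = rng.randint(1, population_size)
        if position in chosen:
            continue  # a repeated number: ignore it and take the next one
        chosen.add(position)
    return sorted(chosen)

# A fixed seed is used only so that the example can be re-run.
rng = random.Random(2004)
year9_positions = draw_sample_positions(246, 50, rng)
print(year9_positions)
```

Each position is then looked up in the numbered list of attendance figures for that year to give the sampled percentages.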

Below are the lists of random numbers produced by the calculator for each year.

Year 7 (20% x 179 = 35.8 (36))

 1, 8, 14, 15, 16, 24, 26, 29, 34, 36, 37, 45, 56, 64, 66, 67, 68, 71, 73, 83, 87, 89, 103, 112, 123, 127, 136, 143, 153, 158, 162, 163, 164, 170, 171, 179

Repeated numbers: 56 and 103

Numbers replacing repeated numbers: 153 and 16

Year 8 (20% x 235 = 47)

 14, 15, 17, 20, 22, 28, 33, 35, 39, 41, 46, 47, 54, 60, 61, 63, 64, 67, 69, 75, 76, 78, 79, 81, 88, 90, 91, 93, 94, 96, 97, 120, 122, 128, 140, 141, 145, 176, 180, 192, 196, 201, 205, 222, 223, 220, 233

Repeated numbers: 90, 90 and 14

Numbers replacing repeated numbers: 141, 176 and 94

Year 9 (20% x 246 = 49.2 (50))

 5, 8, 10, 15, 18, 25, 27, 28, 34, 35, 37, 40, 43, 44, 45, 50, 53, 56, 59, 60, 72, 79, 95, 96, 100, 129, 130, 140, 145, 148, 151, 153, 158, 164, 166, 167, 168, 169, 178, 181, 183, 193, 195, 202, 213, 215, 234, 240, 241, 242

Repeated numbers: 50, 79, 100, 27 and 166

Numbers replacing repeated numbers: 164, 130, 59, 28 and 151

Year 10 (20% x 242 = 48.4 (49))

 3, 4, 8, 12, 16, 22, 24, 36, 44, 46, 49, 52, 58, 62, 76, 79, 83, 85, 91, 95, 98, 107, 114, 117, 121, 126, 127, 128, 136, 144, 145, 146, 153, 157, 158, 162, 173, 177, 185, 187, 195, 211, 221, 225, 231, 234, 240, 241, 242

Repeated numbers: 225, 4, 145, 79 and 146

Numbers replacing repeated numbers: 24, 83, 22, 85 and 114

Year 11 (20% x 243 = 48.6 (49))

 4, 5, 6, 7, 30, 32, 33, 35, 41, 46, 52, 64, 69, 70, 71, 75, 77, 80, 87, 92, 95, 96, 102, 105, 109, 118, 120, 121, 122, 123, 131, 132, 136, 138, 140, 150, 151, 165, 173, 180, 188, 189, 203, 206, 208, 213, 215, 237, 238

Repeated numbers: 203, 80 and 203

Numbers replacing repeated numbers: 46, 118 and 180

Below are the attendance figures that I was given for all the students at Hamilton Community College. The highlighted cells with bold numbers are the new set of data (the stratified sample).

Year 7

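| No. | Attendance (%) | No. | Attendance (%) | No. | Attendance (%) | No. | Attendance (%) |
|---|---|---|---|---|---|---|---|
| 1 | 47.47 | 46 | 87.3 | 91 | 93.39 | 136 | 96.56 |
| 2 | 56.52 | 47 | 87.83 | 92 | 93.65 | 137 | 96.83 |
| 3 | 59.79 | 48 | 87.83 | 93 | 93.92 | 138 | 96.83 |
| 4 | 61.64 | 49 | 87.83 | 94 | 93.92 | 139 | 97.09 |
| 5 | 65.87 | 50 | 88.1 | 95 | 93.92 | 140 | 97.11 |
| 6 | 66.14 | 51 | 88.1 | 96 | 93.92 | 141 | 97.14 |
| 7 | 67.36 | 52 | 88.36 | 97 | 93.92 | 142 | 97.24 |
| 8 | 72.22 | 53 | 88.36 | 98 | 94.18 | 143 | 97.35 |
| 9 | 74.6 | 54 | 88.48 | 99 | 94.18 | 144 | 97.35 |
| 10 | 74.6 | 55 | 88.62 | 100 | 94.44 | 145 | 97.35 |
| 11 | 75 | 56 | 88.67 | 101 | 94.64 | 146 | 97.62 |
| 12 | 75.66 | 57 | 88.89 | 102 | 94.67 | 147 | 97.62 |
| 13 | 76.05 | 58 | 89.09 | 103 | 94.68 | 148 | 97.88 |
| 14 | 76.98 | 59 | 89.15 | 104 | 94.71 | 149 | 97.88 |
| 15 | 78.57 | 60 | 89.15 | 105 | 94.97 | 150 | 97.88 |
| 16 | 78.57 | 61 | 89.17 | 106 | 94.97 | 151 | 97.88 |
| 17 | 80.16 | 62 | 89.68 | 107 | 94.97 | 152 | 98.15 |
| 18 | 80.95 | 63 | 89.68 | 108 | 95 | 153 | 98.15 |
| 19 | 81.48 | 64 | 89.68 | 109 | 95.24 | 154 | 98.15 |
| 20 | 81.75 | 65 | 89.68 | 110 | 95.24 | 155 | 98.15 |
| 21 | 81.87 | 66 | 89.68 | 111 | 95.49 | 156 | 98.16 |
| 22 | 82.01 | 67 | 89.93 | 112 | 95.5 | 157 | 98.41 |
| 23 | 82.54 | 68 | 89.95 | 113 | 95.5 | 158 | 98.41 |
| 24 | 82.54 | 69 | 90.21 | 114 | 95.77 | 159 | 98.41 |
| 25 | 82.8 | 70 | 90.21 | 115 | 95.77 | 160 | 98.52 |
| 26 | 83.33 | 71 | 90.48 | 116 | 95.77 | 161 | 98.68 |
| 27 | 83.33 | 72 | 90.48 | 117 | 95.77 | 162 | 98.68 |
| 28 | 84.04 | 73 | 90.52 | 118 | 95.77 | 163 | 98.94 |
| 29 | 84.39 | 74 | 90.74 | 119 | 95.77 | 164 | 98.94 |
| 30 | 84.66 | 75 | 90.74 | 120 | 95.77 | 165 | 98.94 |
| 31 | 85.92 | 76 | 90.74 | 121 | 96.03 | 166 | 98.99 |
| 32 | 85.09 | 77 | 90.76 | 122 | 96.03 | 167 | 99.21 |
| 33 | 85.19 | 78 | 91.53 | 123 | 96.03 | 168 | 99.47 |
| 34 | 85.45 | 79 | 91.54 | 124 | 96.12 | 169 | 99.47 |
| 35 | 85.71 | 80 | 92.06 | 125 | 96.3 | 170 | 99.47 |
| 36 | 85.98 | 81 | 92.06 | 126 | 96.3 | 171 | 99.47 |
| 37 | 86.24 | 82 | 92.31 | 127 | 96.3 | 172 | 99.74 |
| 38 | 86.24 | 83 | 92.33 | 128 | 96.3 | 173 | 99.74 |
| 39 | 86.51 | 84 | 92.59 | 129 | 96.3 | 174 | 100 |
| 40 | 86.51 | 85 | 92.59 | 130 | 96.34 | 175 | 100 |
| 41 | 86.77 | 86 | 92.59 | 131 | 96.56 | 176 | 100 |
| 42 | 86.96 | 87 | 92.86 | 132 | 96.56 | 177 | 100 |
| 43 | 87.04 | 88 | 92.94 | 133 | 96.56 | 178 | 100 |
| 44 | 87.04 | 89 | 93.12 | 134 | 96.56 | 179 | 100 |
| 45 | 87.3 | 90 | 93.15 | 135 | 96.56 | | |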

Year 8

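| No. | Attendance (%) | No. | Attendance (%) | No. | Attendance (%) | No. | Attendance (%) | No. | Attendance (%) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 16.14 | 48 | 84.66 | 95 | 90.21 | 142 | 94.97 | 189 | 97.62 |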

Conclusion

As you can see, my hypothesis here has been proven right. However, I do believe that there are other factors that could affect the students’ attendance, for example the environment they live in.

Hypothesis 2

There is a relationship between the attendance of the students and their exam results. Students who come to school often or every day to learn tend to improve and achieve much better exam results than those who do not.

Fortunately, for this hypothesis, even though I do not have solid proof, I noticed while calculating Spearman’s rank correlation coefficient that the highest value-added scores belong to the students with a full 100% attendance figure.
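For reference, a Spearman’s rank correlation calculation can be sketched in Python as below. This is not the working from this coursework: the attendance and value-added numbers are made up purely for illustration, and tied values (such as several students on 100%) are given the average of the ranks they share, which is one common convention.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1 upwards; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks

def pearson(xs, ys):
    """Product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def spearman_rho(xs, ys):
    """Spearman's coefficient: the correlation between the two sets of ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Hypothetical data, for illustration only.
attendance = [100.0, 98.4, 95.2, 90.1, 82.5, 100.0]
value_added = [12.0, 9.5, 7.0, 8.0, 3.5, 11.0]
print(f"rho = {spearman_rho(attendance, value_added):.3f}")
```

A value close to +1 means that higher attendance goes with higher value-added scores, and a value close to 0 means there is little relationship between the two rankings.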

Evaluation

In conclusion to this investigation, one thing that I really learned is that calculating standard deviation takes a lot of patience. If I had the chance to do this coursework again, I would like to use the full data to make the results more precise, and possibly use other graphs and diagrams to display the data. I would also like to link all of the results together in a much clearer way.
