Maths Statistics Coursework - Mayfield School Data


Louise Bishop

5A

Task Description: Specifying the Problem and Plan

Specify and discuss the Hypotheses

I was interested in how age might affect a number of factors, so I chose to investigate the age variable. I decided to investigate the following hypotheses:

How does age affect height? You get taller as you get older.
How does age affect weight? You get heavier as you get older.
How does age affect IQ? You get more intelligent as you get older.

I hope to be able to draw firm conclusions from these three hypotheses. I will be particularly interested in the conclusion to the third hypothesis, to see whether you do get more intelligent with age. As I thought about this, I realised that there is a subtle difference between IQ and exam results, so I decided to investigate an extra hypothesis:

How does IQ affect exam results? A high IQ guarantees good exam performance.

I think this statement will be particularly interesting, as I do not think that a high IQ necessarily leads to a good exam result. People with a very high IQ may well expect to get good exam results, but would this trend be consistent? My final hypothesis investigates the merits of a high or low IQ:

Does a high IQ suggest strength in a particular subject? People with a high IQ tend to be mathematicians or at least score well in mathematics.

This will be an interesting hypothesis to investigate. I am not sure how I will investigate it at this stage, but I hope to have formulated some ideas by the time I come to process the data for this hypothesis.

I will use numerical as well as graphical methods to analyse the data. I will calculate the mean using the formula mean = Σx ÷ n (or Σfx ÷ Σf for grouped data). I will try to establish the median and interquartile range (IQR) using cumulative frequency graphs. I may also calculate the standard deviation where appropriate, to see how closely the data are clustered around the mean.
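The numerical measures listed above can be sketched in Python with the standard `statistics` module. This is a minimal illustration, assuming the raw (ungrouped) values are available; the `summarise` helper and the heights are hypothetical, made up for illustration rather than taken from the Mayfield data. Quartile conventions vary: `statistics.quantiles` defaults to the (n + 1)p position method, so its quartiles may differ slightly from those read off a cumulative frequency graph.

```python
import statistics


def summarise(values):
    """Mean, median, quartiles, IQR and standard deviation of raw data.

    `summarise` is a hypothetical helper written for this illustration.
    """
    data = sorted(values)
    # quantiles(n=4) returns the three cut points Q1, Q2 and Q3; its default
    # 'exclusive' method uses the (n + 1)p position convention.
    q1, _, q3 = statistics.quantiles(data, n=4)
    return {
        "mean": statistics.mean(data),      # sum of values / number of values
        "median": statistics.median(data),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        # pstdev divides by n, matching sqrt(sum((x - mean)^2) / n).
        "sd": statistics.pstdev(data),
    }


# Hypothetical heights in cm, for illustration only.
heights = [150, 152, 155, 158, 160, 163, 170]
print(summarise(heights))
```

The population standard deviation (`pstdev`) is used because it matches the divide-by-n formula; `statistics.stdev` would divide by n - 1 instead.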

During this investigation, I will use scatter graphs to be able to ‘see’ any correlation between variables. As mentioned above, I will use cumulative frequency graphs, as well as histograms and box-and-whisker plots. I will also try to use a stem-and-leaf diagram to compare two distributions.
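A scatter graph shows correlation visually; Pearson's product-moment correlation coefficient r is one common way to put a number on it (r near +1 means strong positive correlation, near 0 little linear correlation, near -1 strong negative correlation). The plan does not mention r, so this is an optional addition, sketched on the assumption that the data come as two equal-length lists of paired values. The `pearson_r` helper and the IQ and score values are hypothetical, for illustration only.

```python
import math


def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient for paired data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the two means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable does not vary")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical (IQ, exam score) pairs, for illustration only.
iq = [95, 100, 104, 110, 118]
score = [48, 55, 53, 62, 70]
print(round(pearson_r(iq, score), 3))
```

r only measures straight-line correlation, so it complements the scatter graph rather than replacing it.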

For this project, I have decided to concentrate on numerical data, as I believe statistical techniques and calculations can be applied more easily and more accurately to numbers. Hopefully this will lead to my conclusions being as meaningful and accurate as possible, due to the precise use of the techniques.

Gathering Information (Formulating the Sample)

Year 7 boys = 151/1183 x 60 = 7.66 = Sample size of 8

Year 8 boys = 145/1183 x 60 = 7.35 = Sample size of 7

Year 9 boys = 118/1183 x 60 = 5.98 = Sample size of 6

Year 10 boys = 106/1183 x 60 = 5.38 = Sample size of 6 (rounded up so that the boys’ sample adds up to 31, as explained below)

Year 11 boys = 84/1183 x 60 = 4.26 = Sample size of 4

When added together before rounding, the stratified samples for the boys’ year groups add up to 30.63, which rounds to 31; but because each year group is rounded separately, the actual sample sizes only add up to 30. Therefore, to decide which size to take (30 or 31), I worked out what the overall sample for the boys should be.

Boys = 604/1183 x 60 = 30.63 = Sample size of 31.

This means the sample size of the boys should add up to 31. To overcome this problem, I rounded up the sample size for the Year 10 boys since, of the values that had been rounded down, it had the largest decimal part and so was the closest to the next integer. I now had to calculate the stratified sample for the 29 girls left to make up my sample.

Year 7 girls = 131/1183 x 60 = 6.64 = Sample size of 7

Year 8 girls = 125/1183 x 60 = 6.34 = Sample size of 6

Year 9 girls = 143/1183 x 60 = 7.25 = Sample size of 7

Year 10 girls = 94/1183 x 60 = 4.77 = Sample size of 5

Year 11 girls = 86/1183 x 60 = 4.36 = Sample size of 4
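The rounding rule used for the boys (round each year group, then, if the total falls short, round up the year group closest to the next integer) is essentially the largest remainder method. A minimal Python sketch using the population counts given above; the `allocate` and `round_half_up` helpers are hypothetical names, and ordinary rounding is assumed to be "round half up".

```python
import math

TOTAL_POPULATION = 1183
SAMPLE_SIZE = 60

# Population counts for Years 7, 8, 9, 10 and 11, from the calculations above.
BOYS = [151, 145, 118, 106, 84]
GIRLS = [131, 125, 143, 94, 86]


def round_half_up(x):
    return math.floor(x + 0.5)


def allocate(counts, target):
    """Share `target` places between strata in proportion to their size.

    Each stratum's raw share is count / TOTAL_POPULATION * SAMPLE_SIZE. After
    ordinary rounding, strata are nudged up or down one at a time until the
    sizes add up to `target`.
    """
    raw = [c / TOTAL_POPULATION * SAMPLE_SIZE for c in counts]
    sizes = [round_half_up(r) for r in raw]
    while sum(sizes) < target:
        # Among strata that were rounded down, round up the one with the
        # largest decimal part (the one closest to the next integer).
        i = max((k for k in range(len(raw)) if sizes[k] < raw[k]),
                key=lambda k: raw[k] - sizes[k])
        sizes[i] += 1
    while sum(sizes) > target:
        # Among strata that were rounded up, round down the one that was
        # pushed up the furthest (the smallest decimal part).
        i = max((k for k in range(len(raw)) if sizes[k] > raw[k]),
                key=lambda k: sizes[k] - raw[k])
        sizes[i] -= 1
    return sizes


# Boys' share of the 60 places, rounded; the girls take the remainder.
boys_target = round_half_up(sum(BOYS) / TOTAL_POPULATION * SAMPLE_SIZE)  # 31
girls_target = SAMPLE_SIZE - boys_target                                 # 29

print("Boys: ", allocate(BOYS, boys_target))    # [8, 7, 6, 6, 4]
print("Girls:", allocate(GIRLS, girls_target))  # [7, 6, 7, 5, 4]
```

For the boys, ordinary rounding gives 30, so the loop rounds up the Year 10 boys (decimal part 0.38, larger than 0.35 for Year 8 and 0.26 for Year 11); the girls already total 29 and need no adjustment.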

From these calculations, I am able to draw up a table to show the numbers of pupils in my sample of 60.

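Table to show members of the population to make up the stratified sample of 60

Year group   Boys   Girls   Total
Year 7          8       7      15
Year 8          7       6      13
Year 9          6       7      13
Year 10         6       5      11
Year 11         4       4       8
Total          31      29      60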

From the above table, I will be able to formulate my sample. Within each group, I will select pupils at random, taking the number shown above for each year group and gender until I have the correct number of members from each group of the population to make up my stratified sample of 60.
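Choosing pupils at random inside each stratum can be sketched with Python's `random` module. This is an illustration only: the pupil ID format and the `stratified_sample` helper are hypothetical, and a real run would use the actual pupil lists from the data set. `random.Random.sample` chooses without replacement, so no pupil can be picked twice.

```python
import random


def stratified_sample(pupils, quotas, seed=None):
    """Choose pupils at random within each stratum.

    pupils -- dict mapping a stratum, e.g. (7, "Boy"), to a list of pupil IDs
    quotas -- dict mapping the same strata to the number of pupils to choose
    """
    rng = random.Random(seed)  # a fixed seed makes the selection repeatable
    chosen = {}
    for stratum, size in quotas.items():
        # sample() draws without replacement within this stratum.
        chosen[stratum] = rng.sample(pupils[stratum], size)
    return chosen


# Hypothetical pupil ID lists for two strata; the real lists would come from
# the Mayfield data set. The quotas are from the calculations above.
pupils = {
    (7, "Boy"): [f"Y7B{n:03d}" for n in range(1, 152)],   # 151 Year 7 boys
    (7, "Girl"): [f"Y7G{n:03d}" for n in range(1, 132)],  # 131 Year 7 girls
}
quotas = {(7, "Boy"): 8, (7, "Girl"): 7}
sample = stratified_sample(pupils, quotas, seed=1)
for stratum, ids in sample.items():
    print(stratum, ids)
```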

This should be a fair sample, as each group of the population is allocated sample places in proportion to its size. I hope that, by taking these steps, I have done all I can to reduce bias, although there is a grey area around the Year 10 boys, as explained above.

There are still, however, limitations to the sample. It may be too stratified: rounding at each level causes discrepancies, as shown by the differences in the boys’ sample and by the differences between the total for each year group in the sample and the theoretical number worked out directly on the calculator.

Total Year 7 members in sample (using addition): 15

Year 7 = 282/1183 x 60 = 14.30 = Sample size should total 14

Here we can see that the addition and the stratified sample differ, but I do not think that this will pose much of a problem during the investigation or affect any conclusions drawn.
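The same mismatch between adding the boys’ and girls’ sample sizes and allocating a year group directly can be checked for every year group at once. A minimal sketch using the population counts and sample sizes from the calculations above; "round half up" is assumed, and `compare_year_groups` is a hypothetical helper name.

```python
import math

TOTAL_POPULATION = 1183
SAMPLE_SIZE = 60

# (boys, girls) population counts per year group, from the calculations above.
POPULATION = {7: (151, 131), 8: (145, 125), 9: (118, 143), 10: (106, 94), 11: (84, 86)}
# (boys, girls) sample sizes per year group, after the Year 10 adjustment.
SAMPLE = {7: (8, 7), 8: (7, 6), 9: (6, 7), 10: (6, 5), 11: (4, 4)}


def compare_year_groups():
    """Return {year: (size by addition, raw direct share, rounded direct share)}."""
    result = {}
    for year, (boys, girls) in POPULATION.items():
        direct = (boys + girls) / TOTAL_POPULATION * SAMPLE_SIZE
        result[year] = (sum(SAMPLE[year]), direct, math.floor(direct + 0.5))
    return result


for year, (added, direct, rounded) in compare_year_groups().items():
    flag = "" if added == rounded else "  <- differs"
    print(f"Year {year}: by addition {added}, direct {direct:.2f} -> {rounded}{flag}")
```

On these figures Year 7 gives 15 by addition against 14 directly, and Years 8, 10 and 11 also differ by one place, although both methods still total 60.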

A bigger problem is the size of the sample. At only 60 pupils (about 5% of the population of 1183), it is a very small sample and, despite the stratification, it probably does not represent the population well enough and is likely to introduce some bias. However, it is the best that can be done given the time constraints and the sheer logistics of working with a large sample.

Discussing and justifying the techniques that will be used

When looking at age and height, I will construct a histogram for the heights of the Year 7 pupils in my sample and analyse its shape. I will then do the same for the Year 10 pupils and compare the two distributions, which will enable me to draw conclusions on how age affects height. I will then construct a cumulative frequency curve for height, which will give me an alternative view of the spread of the distribution; I would also like to look at the IQR and overall range, to see whether a large growth spurt takes place over the years. I will also use the histograms to construct frequency polygons, so that the distributions are easier to compare.
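If the height classes have unequal widths, a histogram plots frequency density (frequency divided by class width) rather than frequency. A minimal sketch of the calculation behind the histogram and frequency polygon, assuming each class is stored as a hypothetical (lower, upper, frequency) tuple; the class boundaries and frequencies are made up for illustration.

```python
def frequency_density(classes):
    """Return (lower, upper, density) for each (lower, upper, frequency) class.

    Density = frequency / class width; this is the bar height on a histogram.
    """
    return [(lower, upper, freq / (upper - lower)) for lower, upper, freq in classes]


def polygon_points(classes):
    """Return (midpoint, density) points for a frequency polygon.

    Joining these points with straight lines passes through the midpoints of
    the tops of the histogram bars. Using density keeps the polygon consistent
    with the histogram when class widths differ (with equal widths, plain
    frequency gives the same shape).
    """
    return [((lower + upper) / 2, density)
            for lower, upper, density in frequency_density(classes)]


# Hypothetical Year 7 height classes in cm: (lower, upper, frequency).
year7 = [(130, 140, 2), (140, 150, 5), (150, 170, 6)]
print(frequency_density(year7))
print(polygon_points(year7))
```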

I will also construct a cumulative frequency curve for weight and apply the same techniques that I used for the height graph. This will enable me to see whether children go through a phase of being short and fat or tall and skinny. To make this comparison, I will look at the shapes of the two cumulative frequency curves and at the spread of the data (the ranges), and I should also be able to estimate the mean and find the median and modal classes from the grouped frequency tables for height and weight.
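From a grouped frequency table the mean can only be estimated (using class midpoints), while the modal class and an interpolated median can be found directly. A minimal sketch, assuming the same hypothetical (lower, upper, frequency) layout with classes in increasing order; the weights are made up for illustration. The interpolated median approximates the value read at n/2 from a cumulative frequency curve.

```python
def grouped_summary(classes):
    """Estimated mean, modal class and median from a grouped frequency table.

    classes -- list of (lower, upper, frequency), in increasing order
    """
    n = sum(freq for _, _, freq in classes)
    # Estimated mean = sum(f * midpoint) / sum(f); only an estimate, because
    # the exact values inside each class are unknown.
    mean = sum(freq * (lower + upper) / 2 for lower, upper, freq in classes) / n
    # Modal class = class with the highest frequency density (simply the
    # highest frequency when all classes have the same width).
    modal = max(classes, key=lambda c: c[2] / (c[1] - c[0]))
    modal_class = modal[:2]
    # Median: find the class containing the (n/2)th value, then interpolate
    # linearly within that class.
    half = n / 2
    cumulative = 0
    for lower, upper, freq in classes:
        if cumulative + freq >= half:
            median = lower + (half - cumulative) / freq * (upper - lower)
            break
        cumulative += freq
    return mean, modal_class, median


# Hypothetical weight classes in kg: (lower, upper, frequency).
weights = [(30, 40, 4), (40, 50, 10), (50, 60, 6)]
print(grouped_summary(weights))
```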

To look at age, height and weight together, I will work out the ‘density’ of each member of my sample, in the form ‘cm/kg’ (height divided by weight). I will then work out the mean for each year group and plot a line graph. This technique will enable me to discover any trends in height and weight as people get older, which I think would be interesting.
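The per-pupil ratio and the year-group means described above can be sketched as follows, assuming each pupil is a hypothetical (year group, height in cm, weight in kg) record; the three pupils are made up for illustration. Note the design choice: this averages each pupil's own ratio, which is not the same as dividing a year group's mean height by its mean weight.

```python
from collections import defaultdict


def mean_ratio_by_year(pupils):
    """Mean height-to-weight ratio (cm/kg) for each year group.

    pupils -- iterable of (year_group, height_cm, weight_kg) records
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for year, height_cm, weight_kg in pupils:
        totals[year] += height_cm / weight_kg  # this pupil's cm/kg 'density'
        counts[year] += 1
    # Sorted by year so the results can be plotted in order on a line graph.
    return {year: totals[year] / counts[year] for year in sorted(totals)}


# Hypothetical pupils, for illustration only.
pupils = [(7, 150, 40), (7, 140, 35), (10, 170, 60)]
print(mean_ratio_by_year(pupils))
```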

To look at IQ and SATS results, I will construct a simple scatter graph to show any correlation or link between the two at a glance. I will then construct a cumulative frequency graph for IQ, which will show whether IQ varies widely or is clustered closely around the mean. With these results, I will be able to infer a similar relationship between IQ and SATS results, depending on the correlation shown in the first scatter graph. I do this because I cannot directly compare age and SATS results, as the ...