There is a weak positive correlation between height and weight for both girls and boys at Mayfield High School.

Maths Coursework

For my coursework, I will need to collect and use data on a number of students. I can obtain this data from a database provided by the exam board about Mayfield High School students. Mayfield is a fictitious school, but the data is based on a real school, so I can treat the database as reliable enough to draw conclusions from.

The database contains information about 1183 students. As this many records would be very difficult to use and analyse, I will sample from the total to provide a much smaller but still representative set of data to work with.

Hypotheses

1. There is a weak positive correlation between height and weight for both girls and boys at Mayfield High School.

I will begin to assess this first hypothesis by taking a small fraction of the total number of pupils in the school and investigating the relationship between their heights and weights.

I will take a stratified sample of 47 pupils, as 4% of the total number of 1183 is 47.32, which rounds to 47. This should be a suitable number to sample and work with: it is not so large that it becomes unmanageable, and not so small that it fails to give an accurate picture.

This is achieved by taking the number of pupils in a group (for example, the boys or the girls in one year group), dividing it by the total number of pupils throughout the whole school, and then multiplying this by the total number that I want to sample.
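The stratified sampling calculation above can be sketched in Python. This is only an illustration: the function name `stratum_sample_size` and the group size of 150 are made up, and only the totals of 1183 and 47 come from the text.

```python
def stratum_sample_size(stratum_count, population, sample_size):
    """Pupils to draw from one group, rounded to the nearest whole pupil."""
    return round(stratum_count / population * sample_size)


POPULATION = 1183  # total pupils in the database (from the text)
SAMPLE_SIZE = 47   # about 4% of 1183 (from the text)

# Hypothetical group size, for illustration only: not taken from the database.
year7_boys = 150

print(stratum_sample_size(year7_boys, POPULATION, SAMPLE_SIZE))
```

Because each group is rounded separately, the group totals may add up to one more or one less than 47, so the final sum is worth checking.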

To decide which pupils will be selected from each year group, a random number generator will be used to make the method of selection unbiased, giving every pupil the same chance of being selected. This can be done either on a calculator or in Microsoft Excel.

It is not possible to choose a random sample using personal judgement. Although random samples will give results containing errors, these can be predicted or allowed for. The type of error introduced with judgemental sampling is unpredictable and corrections cannot be made. This type of unpredictable error is called bias. This is why I will not sample using personal judgement and will use a random number generator instead.

When I take my samples on a calculator, the random number appears as a three-digit number with a decimal point in front; the decimal point will be ignored. However, a calculator does not let me set a range for the numbers to be chosen from, whereas Excel does: for example, it can choose a random number between 1 and 30. This would be less time consuming than using a calculator. Therefore, for this hypothesis I will use the random number generator in Excel to select which students I will use.
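The idea of choosing random numbers within a set range can be sketched in Python, whose `random` module stands in here for the Excel random number generator. The function name `pick_pupils`, the group size of 30 and the choice of 5 pupils are assumptions for illustration only.

```python
import random


def pick_pupils(group_size, how_many, seed=None):
    """Pick `how_many` different pupil numbers from 1 to `group_size`."""
    rng = random.Random(seed)
    # sample() never repeats a number, so no pupil is chosen twice.
    return sorted(rng.sample(range(1, group_size + 1), how_many))


# Hypothetical example: choose 5 pupils from a group numbered 1 to 30.
chosen = pick_pupils(30, 5)
print(chosen)
```

Every pupil number in the range has the same chance of being picked, which is what makes the selection unbiased.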

Having got my samples, I will plot the points on a scatter graph, with both the boys and girls on the same graph, so it will be easier to make a comparison between the two correlations. I think that the scatter graph is the best method to use to analyse this hypothesis because if there is a relationship between height and weight amongst the girls and boys, it will be easy to spot. I will then use the product-moment correlation coefficient (PMCC) to find the strength of each correlation and compare the two.
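As a rough sketch of how the PMCC could be worked out, the Python below uses the standard formula r = Sxy / sqrt(Sxx × Syy). The function name `pmcc` and the heights and weights are made up for illustration and are not taken from the Mayfield database.

```python
from math import sqrt


def pmcc(xs, ys):
    """Product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of the deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up heights (m) and weights (kg), for illustration only.
heights = [1.50, 1.55, 1.62, 1.70, 1.75]
weights = [45, 52, 50, 60, 58]

print(round(pmcc(heights, weights), 2))
```

A value close to 1 means a strong positive correlation, a value close to 0 means little or no correlation, and a value close to -1 means a strong negative correlation.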

2. The spread of data for height between the lower and upper quartiles will not be skewed either way for year 7 girls compared to year 11 girls.

In order to investigate the hypothesis above, I will begin by taking a sample of about 28 students from each of the two year groups, using a random number generator. I believe that this sample is of an adequate size: more than 28 students could be very time consuming to analyse, while fewer than 28 could be too small to give an accurate reflection of the year groups being tested.

To choose which pupils I will be sampling, I will use the random number generator, as it will sample without any bias.

Having selected the 28 students from each of the year groups, I will present my data in two box plots comparing the spreads of data for the year groups being studied. Before this, however, I will present the data in a cumulative frequency curve and use it to find the values of Q1, Q2 and Q3. I will use a cumulative frequency graph before constructing the box plots, as I believe it is the easiest way to read off the quartiles.

In order to draw these box plots comparing the heights of pupils from year 7 and year 11, I will work out:

The lowest value- this will be the lowest height from each year group.

The greatest value- this will be the greatest height from each year group.

The lowest and greatest values will provide the whiskers on the box plots.

The positions of the quartiles on the cumulative frequency axis will be worked out by the following formulas, where n is the sample size of 28:

Q1- n/4 = 28/4 = 7

Q2- n/2 = 28/2 = 14

Q3- 3n/4 = 84/4 = 21

Having worked out the quartiles, I will construct the box plots on the same scale, so that comparing them is easier. Box plots also show the spread of data well, and the interquartile range is a better measure of spread than the range, as the range can be greatly affected by a few extreme values. The median is also largely unaffected by outliers or extreme values.
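The values needed for a box plot can be sketched in Python as below. This is a simplification: it takes the value at position n/4, n/2 and 3n/4 in the sorted list rather than reading from a cumulative frequency curve, and it rounds the position up when n is not a multiple of 4. The function name `five_number_summary` and the heights are made up for illustration.

```python
from math import ceil


def five_number_summary(heights):
    """Lowest value, Q1, Q2, Q3 and greatest value of a list of heights."""
    data = sorted(heights)
    n = len(data)

    def value_at(position):
        # Positions count from 1, but Python lists count from 0.
        return data[ceil(position) - 1]

    return {
        "lowest": data[0],
        "Q1": value_at(n / 4),
        "Q2": value_at(n / 2),
        "Q3": value_at(3 * n / 4),
        "greatest": data[-1],
    }


# Made-up sample of 28 heights in cm (140, 141, ..., 167), for illustration only.
sample_heights = list(range(140, 168))

summary = five_number_summary(sample_heights)
iqr = summary["Q3"] - summary["Q1"]               # spread of the middle half
full_range = summary["greatest"] - summary["lowest"]

print(summary)
print("Interquartile range:", iqr, "Range:", full_range)
```

Changing only the greatest height to an extreme value would change the range but leave the interquartile range the same, which is why the interquartile range is the more reliable measure of spread.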

3. There is a higher concentration of closely related BMIs for boys in year 8 compared to boys in year 11.

I will start testing this hypothesis by first taking a selection of each year group for sampling. I will achieve this by taking a sample of 28 boys from each year group. In order to take an unbiased sample, I will use the random number generator in Microsoft Excel to choose which pupils I will be using to analyse my hypothesis.

Having selected my samples, I will extract their heights ...