Mohammed Patel, 10G1

GCSE STATISTICS COURSEWORK

PLAN

The aim of this coursework is to find out whether there is a relationship between the height and weight of students of different genders in Year 11, and to see whether students who are taller, older, or who watch too much T.V. are generally heavier or lighter. I am doing this investigation because I have a brother in Year 10 who is short, heavy and watches too much T.V., and I am intrigued to see whether female students in Year 11 are like that too, or whether students are heavier or lighter if they are shorter or watch too much T.V.

I will gather the required information from the Mayfield High School data book provided by the teacher. I could collect the data from the internet, but it might not have been updated for some time, so it might be out of date. I could not go to the school office because it does not hold information on height and weight. I could do a survey, but this is very time consuming and disruptive to the school. If I were not doing this investigation, I would probably have investigated something else, for example whether students who watch too much T.V. have a higher I.Q., but because of the shortage of time and the way the school timetables are set out, I decided that I would not have the time needed to complete that task properly.

Some factors may cause me problems with these hypotheses. Firstly, some data in the booklet might be wrong or missing (e.g. someone might be recorded as weighing 200 kg) and I would not know how much they really weigh. Also, looking at the time allotted to me for completing the investigation, I thought the best choice was my second option, which was to use the Mayfield data book to gather the information. My first option was to measure students in Year 10, but because of the shortage of time my maths teacher said it was not a good idea to start that so late. I will use secondary data because it is easier to work with, as it has already been collected. However, secondary data might not be reliable, and some of it might be missing; to overcome this problem I can reject any such data. I will also use continuous data so that I can work out averages.

These are my hypotheses:-

Tall students are heavier than short students.
Older students are heavier than younger students.
Students who watch more T.V. are heavier than students who watch less.

To test my first hypothesis (i.e. tall students are heavier than short students) I will use a sample. I have 1500 pieces of data to work with. A 10% sample would be 150 students, which is simply too large for me and would take a lot of time. A 5% sample is 75 students, which is also too large. A sample of about 60 will be right, so that is what I will use.

For the first hypothesis, I could not use a convenience sample because of the time allotted to me. Even if I did, I would not be able to tell the difference between Year 7, Year 8 and Year 9 students, as they wear the same colour of jumper, so it would be very difficult for me to collect the data. I could use a systematic sample, for example taking every fifth student I came across in the data book, but this can be biased if low or high values occur in a regular pattern. I could use a quota sample, but that would not be random because I would be the one choosing the students from the data book, which may bias or skew my results and make them less reliable. I could also use a cluster sample, but this may also be biased because I may end up with only one type of student in a group, e.g. a whole group of Year 7 students. Although this is very unlikely, it is not a risk worth taking because my results would end up skewed.
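The risk with a systematic sample can be shown with a small sketch. The weights below are made up for illustration (they are not from the Mayfield data book), and the names `weights` and `systematic_sample` are my own. The list is arranged so that every fifth entry is high, which is the kind of regular pattern that could bias the sample.

```python
# Hypothetical list of 20 weights (kg), NOT Mayfield data.
# Every fifth entry happens to be a high value.
weights = [40, 42, 41, 43, 70] * 4


def systematic_sample(data, step, start=0):
    """Take every `step`-th item, beginning at index `start`."""
    return data[start::step]


def mean(values):
    return sum(values) / len(values)


population_mean = mean(weights)

# Starting on the fifth entry picks up only the high values.
biased_sample = systematic_sample(weights, 5, start=4)
biased_mean = mean(biased_sample)

print(population_mean, biased_sample, biased_mean)
```

In this made-up list the systematic sample contains only the 70 kg entries, so its mean is far above the mean of the whole list.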

So, in my opinion, the best sampling methods to use are a stratified sample for hypotheses 2 and 3 and a random sample for hypothesis 1. This way my results will be more reliable, as they are less likely to be biased or skewed. To take a random sample I will use the random number button on my calculator and then round the decimal number that is given to the nearest whole number. For example, I would round 37.323 down to 37 and round 37.545 up to 38. I may face a problem here: if I want to pick a random sample from a total of 112, for example, I am not as likely to get 112 as any of the other numbers, because it is the maximum and only values from 111.5 upwards round to it. To solve this problem, if I get a number under 0.5 or over 111.5 I will call it 112.
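The rounding rule above can be sketched in a few lines. This is only an illustration: it assumes the calculator's random number has been scaled to lie between 0 and the total, and the function names `to_student_number` and `pick_sample` are my own. Skipping repeated numbers is also my assumption, as the plan does not say what to do with them.

```python
import math
import random


def to_student_number(x, total):
    """Round x (between 0 and total) to the nearest whole number.

    A result of 0 is relabelled as `total`, so the two half-width
    ranges (under 0.5 and over total - 0.5) together give `total`
    the same chance as every other number.
    """
    n = math.floor(x + 0.5)  # halves round up
    return total if n == 0 else n


def pick_sample(total, size, seed=None):
    """Pick `size` different student numbers from 1..total."""
    if size > total:
        raise ValueError("sample size cannot exceed the total")
    rng = random.Random(seed)
    chosen = set()
    while len(chosen) < size:
        x = rng.random() * total  # stands in for the calculator button
        chosen.add(to_student_number(x, total))  # repeats are ignored
    return sorted(chosen)


sample = pick_sample(112, 60, seed=1)
print(sample)
```

With a total of 112, a value such as 0.2 or 111.8 becomes student 112, while 37.323 becomes 37 and 37.545 becomes 38.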

So, back to my method for supporting my first hypothesis: I will take my sample of 60 students from the secondary data and plot them onto a graph. I will use them as a control group. Then I will take another sample of 60 Year 7 students from the secondary source provided, that is, the Mayfield school database, and plot that data onto a new graph. On each graph I will find the mean point, draw the line of best fit through it, find the equation of that line, and then use it to predict values by both interpolation and extrapolation. Then I will calculate Spearman's rank correlation coefficient to find out how strong the correlation between height and weight is. If the two correlations are similar, I will continue supporting my other hypotheses using the Mayfield data.
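As a rough sketch of the calculations in this step, the code below finds a line of best fit and Spearman's rank correlation coefficient. It makes some assumptions: the line is found by least squares (which always passes through the mean point) rather than being drawn by eye, the heights and weights are made up and not from the Mayfield data, and the formula 1 − 6Σd² / (n(n² − 1)) is exact only when there are no tied values.

```python
def mean(values):
    return sum(values) / len(values)


def best_fit_line(xs, ys):
    """Least-squares line y = m*x + c, which passes through the mean point."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    m = sxy / sxx
    c = my - m * mx
    return m, c


def ranks(values):
    """Rank from 1 (smallest) upwards; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rank correlation: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical heights (m) and weights (kg), NOT Mayfield data.
heights = [1.50, 1.55, 1.60, 1.65, 1.70]
weights = [42, 40, 50, 48, 60]

m, c = best_fit_line(heights, weights)
predicted = m * 1.62 + c  # interpolation: 1.62 m lies inside the data range
r_s = spearman(heights, weights)
print(m, c, predicted, r_s)
```

A value of the coefficient close to 1 would suggest strong positive correlation. Predicting a weight for a height outside the range of the data (extrapolation) uses the same equation but is less trustworthy.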

For my second hypothesis (i.e. older students are heavier than younger students) I will first take a stratified sample of students from each year, selecting them from the Mayfield data book provided. As the students from all years and of both sexes are already jumbled up, there would be no point in me trying to take a random sample. Instead I could just go down the list and pick, say, the first 15 Year 7 students I come across. After doing this I will put all of the data together and find the mean and standard deviation for each year group. That way, it will tell me whether older students are heavier or lighter and how far the data is spread out from the mean. I have chosen standard deviation because it is an accurate measure of how far the data is spread from the mean.
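The mean and standard deviation for each year group could be worked out as in the sketch below. The weights are invented for illustration and are not taken from the Mayfield data book. I have also assumed the standard deviation formula that divides by n; the name `mean_and_sd` is my own.

```python
import math


def mean_and_sd(values):
    """Return the mean and standard deviation sqrt(sum((x - mean)^2) / n)."""
    m = sum(values) / len(values)
    variance = sum((x - m) ** 2 for x in values) / len(values)
    return m, math.sqrt(variance)


# Hypothetical weights in kg for three year groups, NOT Mayfield data.
weights_by_year = {
    7: [38, 40, 42, 44, 46],
    9: [45, 48, 50, 52, 55],
    11: [50, 54, 58, 62, 66],
}

summary = {year: mean_and_sd(w) for year, w in weights_by_year.items()}

for year, (m, sd) in summary.items():
    print(f"Year {year}: mean = {m:.1f} kg, standard deviation = {sd:.2f} kg")
```

Comparing the means year by year would show whether weight rises with age, and the standard deviations would show which year group has the most spread-out weights.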

For my third hypothesis (i.e. girls are lighter than boys) I will take another stratified sample of girls and boys from each year group, going down the Mayfield data book and again picking out, for example, the first 7 girls I come across. This way my data is unlikely to be biased. After doing this I will organise the data into a two-way table, grouping the weights of the girls and boys into class intervals. After this I will work out the frequency density of the data and draw a histogram for both the male and the female data. From this I will draw a distribution curve and see which way more of the data lies. From my frequency density table I will also work out the cumulative frequency and draw a cumulative frequency graph. From the cumulative frequency graph I will find the upper quartile, the lower quartile and the interquartile range. I will also use the cumulative frequency graph to draw a box and whisker diagram and compare the weights of boys and girls.
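The calculations behind the histogram and the cumulative frequency graph can be sketched as follows. The grouped weights are made up and are not from the Mayfield data. I have assumed that the quartiles are read at one quarter and three quarters of the total frequency, with straight lines joining the points of the cumulative frequency graph. The function names are my own.

```python
# Hypothetical grouped weights (kg) as (lower bound, upper bound, frequency).
# These are NOT Mayfield data.
classes = [(30, 40, 4), (40, 45, 6), (45, 50, 8), (50, 60, 10), (60, 80, 2)]


def frequency_densities(classes):
    """Frequency density = frequency / class width (the histogram bar height)."""
    return [f / (hi - lo) for lo, hi, f in classes]


def cumulative_frequencies(classes):
    """Running total of frequency, paired with each upper class boundary."""
    total, points = 0, []
    for lo, hi, f in classes:
        total += f
        points.append((hi, total))
    return points


def estimate_value(classes, position):
    """Read off the weight at a cumulative frequency by linear interpolation."""
    below = 0
    for lo, hi, f in classes:
        if f > 0 and position <= below + f:
            return lo + (position - below) / f * (hi - lo)
        below += f
    raise ValueError("position is beyond the total frequency")


n = sum(f for _, _, f in classes)
lower_quartile = estimate_value(classes, n / 4)
upper_quartile = estimate_value(classes, 3 * n / 4)
interquartile_range = upper_quartile - lower_quartile

print(frequency_densities(classes))
print(cumulative_frequencies(classes))
print(lower_quartile, upper_quartile, interquartile_range)
```

Running the same calculation separately on the boys' and girls' weights would give the quartiles needed for the two box and whisker diagrams.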