Statistics Coursework: Mayfield High School

In this coursework I will be looking at data collected from students at a fictitious school, Mayfield High School. For each pupil I was given information such as age, IQ, weight and height. From this data I had to come up with a line of enquiry to explore. I will use statistical presentation methods such as graphs, together with various calculations, to test my hypothesis. I am going to look at the height and weight of students at the school to see whether an increase in height is reflected in a person's weight, and vice versa.

The data I am going to be looking at is secondary data. One advantage of this is that it saves the time needed to collect the data (provided the data is correct). Furthermore, secondary data gives me access to data that I could not otherwise have obtained. However, a disadvantage is that it may not be accurate and could have parts missing. The data could also be biased, meaning that not every student had an equal chance of being chosen. Because I did not collect the data myself, I would have no way of knowing whether it is biased, for example if the person collecting it changed it to their benefit. The data may also be out of date.

I could have used primary data. Unlike secondary data, it would come directly from the population and I would know how it was obtained, so I could be more confident that it is unbiased and accurate. It can also give the researcher a more realistic view of the topic under consideration. However, there are also problems with using this kind of data. Firstly, it is time consuming to collect, as the whole population is quite large. Also, it is quite difficult to record a large population, and this may lead to loss of data or miscalculations by the researcher.

Pilot Study

A pilot study is a preliminary test to see if there is a line of enquiry worth investigating further. It is normally small in comparison with the main experiment and can therefore provide only limited information on the sources and size of variation of response measures. I am carrying out a pilot study to see whether I can investigate my line of enquiry further. It will also enable me to assess whether or not the hypothesis is realistic and workable. I will be testing the following hypothesis in my pilot study:

The taller the student, the heavier the student will be.

I think this because the taller a person is, the more muscle and fat their body contains. Therefore, I would expect a person who is 6 ft tall to be heavier than a person who is 5 ft tall. However, this will not always be the case, because a person could weigh a lot without being very tall. There are 1183 students at Mayfield High School. The table shows the number of male and female students in each year group.

For my pilot study I must consider each of the 10 groups in the school (i.e. Year 7 Girls, Year 7 Boys, Year 8 Girls etc.) to test my initial hypothesis. To do this I will need a sample from each group, because using the whole population would take too much time and may be difficult. I will take a stratified random sample of 100 students for my pilot study. This is because it includes students from every year group and is a fair proportion of the whole population of 1183 students. Also, as each year would be represented fairly, the sample should be accurate and reliable, and therefore my findings should also be reliable.

A stratified sample takes a proportional number from each group (stratum) in the population so that each group is fairly represented. This is necessary when producing graphs or statistical calculations on more than one section of the population together, because taking the whole population into consideration is time consuming and quite difficult. Furthermore, a stratified sample can provide greater precision than a simple random sample of the same size. The sample is calculated like this:

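$$\text{No. from each stratum in sample} = \frac{\text{No. of students in stratum}}{\text{Total no. of students in school}} \times \text{sample size}$$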
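As a rough illustration, the proportional calculation can be sketched in Python. The function name `stratum_sample_size` is my own (hypothetical), and rounding to the nearest whole student is an assumption. The Year 7 figures (151 boys and 131 girls out of 1183 students) are the ones quoted in this coursework.

```python
def stratum_sample_size(stratum_size, population_size, sample_size):
    """Proportional share of the sample for one stratum.

    Assumption: the share is rounded to the nearest whole student.
    """
    return round(stratum_size / population_size * sample_size)


# Year 7 has 151 boys + 131 girls out of 1183 students in the school.
year7_total = 151 + 131
year7_sample = stratum_sample_size(year7_total, 1183, 100)

# The same formula splits the Year 7 sample between boys and girls.
year7_boys = stratum_sample_size(151, year7_total, year7_sample)
year7_girls = stratum_sample_size(131, year7_total, year7_sample)

print(year7_sample, year7_boys, year7_girls)  # 24 13 11
```

Because each stratum is rounded separately, the rounded numbers will not always add up to exactly the intended sample size, so the total is worth checking by hand.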

I also need to carry out a random sample. Every person should have an equal chance of being chosen for my sample, to make it fair and avoid bias. A quick way of doing this is to give each student a number, which has already been done for me on the Mayfield High School spreadsheet.

The stratified samples I took from each year are as follows:

So, 24 (Year 7) + 23 (Year 8) + 22 (Year 9) + 17 (Year 10) + 14 (Year 11) = 100 students

I have chosen to include both boys and girls in my sample. I first calculated the number of students I need from each year group, and then how many girls and boys I need within that number. For example, for Year 7 I need a sample of 24 students. In Year 7 there are 151 boys and 131 girls, which is 282 students in total. I calculated how many boys and girls I would need from the year by:

$$\text{Boys} = \frac{151}{282} \times 24 \approx 12.85 \rightarrow 13 \qquad \text{Girls} = \frac{131}{282} \times 24 \approx 11.15 \rightarrow 11$$

Therefore, 13 boys and 11 girls will be in the Year 7 sample. The following table shows how many girls and boys there will be from each year:

Now I need to take a random sample of the number of pupils shown in the table above for each year group. Every boy or girl in each year group has to have an equal chance of being selected. The website from which the random integers were generated is:

I had to enter how many random integers I required and then give the range of values that the integers had to lie between. For example, for the Year 7 sample I needed 24 random integers between 1 and 283.
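The same kind of draw can be sketched in Python as an alternative to the website. This is only an illustration using Python's standard `random` module, not the generator I actually used. The function name `draw_student_numbers` and the `seed` parameter are my own (hypothetical). The sketch also assumes that no student number should be picked twice, which a website drawing with repetition would not guarantee.

```python
import random


def draw_student_numbers(how_many, lowest, highest, seed=None):
    """Draw distinct random student numbers between lowest and highest (inclusive).

    Assumption: sampling is without replacement, so no number repeats.
    The optional seed only makes the draw repeatable for checking.
    """
    rng = random.Random(seed)
    return rng.sample(range(lowest, highest + 1), how_many)


# Year 7 example from the text: 24 random integers between 1 and 283.
year7_numbers = draw_student_numbers(24, 1, 283, seed=1)
print(sorted(year7_numbers))
```

Every number in the range has the same chance of being drawn, which is what makes the sample random within the year group.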

After carrying out the random samples these are the student numbers I collected:

Year 7:

Boys: 164, 206, 189, 279, 228, 170, 143, 150, 210, 177, 197, 133, 275

Girls: 118, 31, 120, 5, 68, 62, 95, 124, 129, 46, 90

Year 8:

Boys: 432, 545, 485, 519, 553, 450, 438, 513, 531, 471, 484, 462

Girls: 376, 400, 357, 379, 343, 328, 290, 317, 336, 322, 324

Year 9:

Boys: 703, 705, 756, 714, 749, 789, 702, 760, 739, 799

Girls: 656, 584, 674, 689, 669, 659, 676, 658, 601, 684, 786, 810

Year 10:

Boys: 915, 977, 1009, 1013, 936, 952, 959, 1013, 968

Girls: 815, 885, 854, 908, 886, 846, 821, 877

Year 11:

Boys: 1114, 1039, 1037, 1111, 1120, 1182, 1037

Girls: 1059, 1093, 1056, 1061, 1033, 1034, 1076

I will now create a scatter graph, which is a graph of plotted points that shows the relationship between two sets of data. In my pilot study, each dot represents one person's weight plotted against their height. As well as displaying the data, I will work out the correlation coefficient, because this will give me the correlation between height and weight. The correlation indicates the strength and direction of a linear relationship between two random variables, in this case height and weight. I expect the correlation to be positive, which would give me confidence in investigating my hypothesis further. The product-moment correlation coefficient (PMCC) is a more accurate method for comparing correlation. This is because it gives me one number for each sample, so I can compare between the samples for each year. It uses the mean of each set of data and looks at the distance of each point from the mean. The formula for the correlation coefficient, ‘r’, is

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$$

(where $\bar{x}$ and $\bar{y}$ are the means of the x and y values respectively)

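The value of ‘r’ indicates the strength and direction of the correlation. It is always between –1 and 1: values close to 1 show strong positive correlation, values close to –1 show strong negative correlation, and values close to 0 show little or no linear correlation.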
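To show how a coefficient of this kind is worked out from the means, here is a minimal Python sketch of the product-moment form of ‘r’. The function name `correlation_coefficient` is my own (hypothetical). The heights and weights are made-up illustrative values, not data from the Mayfield sample.

```python
from math import sqrt


def correlation_coefficient(xs, ys):
    """Product-moment correlation coefficient r for paired values."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of products and squares of the distances from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when all x or all y values are equal")
    return sxy / sqrt(sxx * syy)


# Hypothetical heights (m) and weights (kg), for illustration only.
heights = [1.50, 1.55, 1.60, 1.65, 1.70]
weights = [45, 48, 52, 55, 61]
r = correlation_coefficient(heights, weights)
print(round(r, 2))
```

In this made-up example the weights rise steadily with the heights, so ‘r’ comes out close to 1. Points lying exactly on a rising straight line would give exactly 1.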

The scatter graph should show that there is some correlation between the height and weight of the students. The software Autograph worked out the correlation coefficient for me, as well as the equation of the line of best fit. The graphs below give an idea of what the different types of correlation look like.