I will be testing the following hypothesis in my pilot study: The taller the student the heavier the student will be


Statistics Coursework: Mayfield High School

In this coursework I will be looking at data collected from students at a fictional school, Mayfield High School.  I was given information about each pupil, such as age, IQ, weight and height. From this data I had to come up with a line of enquiry to explore.  I will be using statistical presentation methods such as graphs, together with various calculations, to test my hypothesis. I am going to look at the height and weight of students at the school to see whether an increase in height is reflected in an increase in weight, and vice versa.

The data I am going to be looking at is secondary data.  The advantages of this are that it saves time in collecting the data, provided the data is correct, and that it gives me access to data that I could not otherwise have obtained.  However, a disadvantage is that it may not be accurate and could have parts missing. The data could also be biased, meaning that each student did not have an equal chance of being chosen. Furthermore, because I did not collect the data myself, I have no way of knowing whether it is biased, for example whether the person collecting it changed it to their benefit. The data may also be out of date.

I could have used primary data. Unlike secondary data, I would collect it myself directly from the population, so I would know how it was obtained and could check that it is accurate and less likely to be biased. It can also give the researcher a more realistic view of the topic under consideration. However, there are also problems with using this kind of data.  Firstly, collecting it is time consuming as the whole population is quite large. Also, it is quite difficult to record data for a large population, and this may lead to loss of data or miscalculations by the researcher.

Pilot Study

A pilot study is a preliminary test to see if there is a line of enquiry worth investigating further.  It is normally small in comparison with the main investigation and can therefore provide only limited information on the sources and size of variation of response measures. I am carrying out a pilot study to see whether I can investigate this line of enquiry further. It will also enable me to assess whether or not the hypothesis is realistic and workable.  I will be testing the following hypothesis in my pilot study:

  • The taller the student the heavier the student will be

I think this because the taller a person is, the more muscle and fat their body is likely to contain. Therefore, I expect a person who is 6ft tall to be heavier than a person who is 5ft tall. However, this will not always be the case, because a person could weigh a lot without being very tall. There are 1183 students at Mayfield High School.   The table shows the number of students in each year group for males and females.

For my pilot study I must consider each of the 10 groups in the school (i.e. Year 7 Girls, Year 7 Boys, Year 8 Girls etc.) to test my initial hypothesis.  To do this I will need a sample from each group, because using the whole population would take too much time and may be difficult. I will take a stratified random sample of 100 students for my pilot study. This means that each year group contributes a number of students in proportion to its size within the whole population of 1183 students. As each year is represented fairly, the sample should be reliable and therefore my findings should also be reliable.

A stratified sample takes a proportional number from each group in the population so that each group is fairly represented.  This is necessary when producing graphs or statistical calculations on more than one section of the population together, because taking the whole population into consideration is time consuming and quite difficult. Furthermore, when the groups differ from one another, a stratified sample can provide greater precision than a simple random sample of the same size.  The sample is calculated like this:

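\[ \text{No. from each stratum in sample} = \frac{\text{No. of students in the stratum}}{\text{Total no. of students in the school}} \times \text{Sample size} \]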
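The proportional calculation can be sketched in Python as a quick check. This is only an illustration, not the method used in the coursework: the function name `stratum_sample_size` is hypothetical, and the only figures used are the ones quoted in this essay (151 boys and 131 girls in Year 7, 1183 students in the school, a sample of 100).

```python
def stratum_sample_size(stratum_size, population_size, sample_size):
    """Proportional allocation: the stratum's share of the population,
    scaled up to the sample size and rounded to a whole number of students."""
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    return round(stratum_size / population_size * sample_size)


# Year 7 has 151 boys + 131 girls = 282 of the 1183 students in the school.
year7 = stratum_sample_size(151 + 131, 1183, 100)
print(year7)  # 24
```

Because each stratum is rounded separately, the rounded numbers for all strata will not always add up to exactly the intended sample size, so the total should be checked afterwards.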

I also need to select the students within each group at random. Every person should have an equal chance of being chosen for the sample to make it fair and avoid bias.  A quick way of doing this is to give each student a number, which is already done for me on the Mayfield High School spreadsheet.

The stratified samples I took from each year are as follows:

So, 24 + 23 + 22 + 17 + 14 = 100 students

I have chosen to include boys and girls in my sample. I calculated the number of students I will need from each year group. I then calculated how many girls and boys from that sample I will need. So for example, for year 7 I need a sample of 24 students. In year 7 there are 151 boys and 131 girls. I calculated how many boys and girls I would need from the year by:
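A worked version of this split, assuming the same proportional formula is applied within the year group (151 + 131 = 282 students in Year 7, sample of 24):

```latex
\[ \text{Boys} = \frac{151}{282} \times 24 \approx 12.85 \approx 13 \]
\[ \text{Girls} = \frac{131}{282} \times 24 \approx 11.15 \approx 11 \]
```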

Therefore, 13 boys and 11 girls will be in the sample of year 7. The following table represents how many girls and boys there will be from each year:

Now I need to take a random sample of the number of pupils shown in the table above for each year group.  There has to be an equal chance for every boy or girl from each year group to be selected. The website from which the random integers were generated is:

I had to enter how many random numbers I required and then give the values that the integers had to lie between. For example, I needed 24 random integers for the year 7 sample and the numbers had to be between 1 and 283.
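The same kind of draw can be sketched with Python's standard `random` module instead of a website. This is an alternative shown for illustration, not the generator used here: the function name `draw_student_numbers` and the seed are hypothetical, and `random.sample` draws without repeats, which may differ from how the website behaves.

```python
import random


def draw_student_numbers(low, high, count, seed=None):
    """Draw `count` distinct integers between low and high (inclusive),
    giving every student number in the range an equal chance."""
    rng = random.Random(seed)  # a seed only makes the draw repeatable
    return sorted(rng.sample(range(low, high + 1), count))


# 24 random student numbers between 1 and 283, as in the Year 7 example.
year7_numbers = draw_student_numbers(1, 283, 24, seed=1)
print(year7_numbers)
```

Drawing without repeats means no student can be picked twice, so the sample always contains the intended number of different students.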

After carrying out the random samples these are the student numbers I collected:

Year 7:

Boys: 164, 206, 189, 279, 228, 170, 143, 150, 210, 177, 197, 133, 275

Girls: 118, 31, 120, 5, 68, 62, 95, 124, 129, 46, 90

Year 8:

Boys: 432, 545, 485, 519, 553, 450, 438, 513, 531, 471, 484, 462

Girls: 376, 400, 357, 379, 343, 328, 290, 317, 336, 322, 324

Year 9:

Boys: 703, 705, 756, 714, 749, 789, 702, 760, 739, 799

Girls: 656, 584, 674, 689, 669, 659, 676, 658, 601, 684, 786, 810

Year 10:

Boys: 915, 977, 1009, 1013, 936, 952, 959, 1013, 968

Girls: 815, 885, 854, 908, 886, 846, 821, 877

Year 11:

Boys: 1114, 1039, 1037, 1111, 1120, 1182, 1037

Girls: 1059, 1093, 1056, 1061, 1033, 1034, 1076

I will now create a scatter graph, which is a graph of plotted points that shows the relationship between two sets of data. In my pilot study, each point represents one person's weight plotted against their height. I will also work out the correlation coefficient, because it measures the correlation between height and weight. The correlation indicates the strength and direction of a linear relationship between two variables, in this case height and weight. I expect the correlation to be positive, which would give me confidence in investigating my hypothesis further. The Product Moment Correlation Coefficient (PMCC) is a more precise way to compare correlation than judging a graph by eye. This is because it gives me one number for each sample, so I can compare between the samples or between years. It uses the mean of each set of data and looks at the distance of each point from the mean.  The formula for this coefficient, which is also known as the Correlation Coefficient or ‘r’, is


        

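\[ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}} \]

(Where \(\bar{x}\) and \(\bar{y}\) are the means of the x and y values respectively)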

The value of ‘r’ determines correlation.  It is always between –1 and 1.
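As a rough check on a value produced by software, the mean-based correlation coefficient described above can be computed directly. The sketch below assumes the standard product-moment form of r. The function name `pearson_r` is hypothetical, and the heights and weights are made-up illustrative values, not data from the Mayfield spreadsheet.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation coefficient r from each point's distance to the means."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when all x or all y values are equal")
    return s_xy / sqrt(s_xx * s_yy)


# Made-up example values (heights in cm, weights in kg), for illustration only.
heights_cm = [150, 155, 160, 165, 170]
weights_kg = [42, 48, 47, 55, 60]
r = pearson_r(heights_cm, weights_kg)
print(round(r, 2))
```

A value of r close to 1 indicates strong positive correlation, a value close to –1 strong negative correlation, and a value close to 0 little or no linear correlation.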

The scatter graph should show that there is some correlation between the height and weight of the students. Furthermore, the software Autograph worked out the correlation coefficient for me and also the equation of the line of best fit. The graphs below will give you an idea of what the types of correlation look like.

...
