Testing 3 Hypotheses on Pupils' Height and Weight


Jane Hird 10D

Statistics Coursework

Introduction

For this project we were given data about an imaginary high school. Data was given for every student and included eye colour, weight, IQ, exam results and many other categories. Looking at the data, I decided to base my investigation on height and weight, because they give me several different factors to test. I will be looking for relationships in the data given. I will be using secondary data, because I was given the data and did not collect it myself. One advantage of secondary data is that it saves time, because you don't have to go round collecting it. However, because it is secondary data, I need to check all of it to make sure that it is reliable and correct, which means checking that it matches realistic measurements. I will treat the data as incorrect if a height is above 2.5 metres or a weight is over 70 kg. I will also pick out only the data that I need for my investigation, and I will look for outliers to make sure that my data is accurate. Finally, I will use different graphs to compare the data.

Hypotheses

  • I expect that there will be a positive correlation between weight and height. I predict this because taller students usually weigh more.
  • I also predict that the older students will be taller and weigh more. I think this because students usually grow taller and heavier as they get older.
  • I also predict that the boys will be taller and weigh more than the girls. I predict this because, on average, boys usually weigh more than girls.

Describing the Data

The data that I have been given is secondary data, which means that it has been collected by other people. The opposite of secondary data is primary data, which is data that you collect yourself. Secondary data has both advantages and disadvantages. One advantage is that you don't have to collect the data yourself, which saves time. A disadvantage is that you have to check through all of the data to make sure that it is correct and reliable. The main problem with secondary data is that it can be unreliable, because you don't know where it has come from.

After looking at the data I was given, I checked what sort of data I had. I found that my two main variables, weight and height, were continuous data. Continuous data is measured, so it can take any value within a range, e.g. 6.47 or 9.865. However, I found that the year group was discrete data. Discrete data can only take particular separate values, e.g. 7 or 9: a pupil can be in Year 7 or Year 8, but not in Year 7.5.

Data Validation

During data validation I made sure that all of my data was correct by removing the outliers from the database. To do this I had to find the median, the upper quartile and the lower quartile. I ended up with these figures:

Once I had these, I could work out which values were outliers. To find the weights that were too low, I used this formula:

=IF(G3<24,"outlier","")

This formula labels any weight below 24 kg as an outlier. After I filled it down the column, every outlier had "outlier" next to it, so I picked these out and deleted them.

=IF(G3>72,"outlier","")

I used this formula to mark every weight over 72 kg as an outlier, and then I did the same for the height column.
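The essay does not say how the cut-offs of 24 and 72 were worked out from the quartiles. A common rule at this level treats a value as an outlier if it is more than 1.5 × the interquartile range below the lower quartile or above the upper quartile. The Python sketch below assumes that rule. The function names and the sample weights are hypothetical and are not taken from the school data set. Different quartile methods, including the one a spreadsheet uses, can give slightly different fences.

```python
import statistics

def iqr_fences(values, k=1.5):
    """Return (lower, upper) outlier cut-offs using the assumed k * IQR rule."""
    # quantiles(n=4) returns [Q1, median, Q3]; the default method is 'exclusive'
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def split_outliers(values, k=1.5):
    """Separate values inside the fences from values outside them."""
    lower, upper = iqr_fences(values, k)
    kept = [v for v in values if lower <= v <= upper]
    outliers = [v for v in values if v < lower or v > upper]
    return kept, outliers

# Hypothetical weights in kg, for illustration only
weights = [38, 41, 45, 47, 50, 52, 55, 58, 60, 95]
kept, outliers = split_outliers(weights)
print(outliers)  # [95]
```

Here Q1 = 44 and Q3 = 58.5, so the fences are 22.25 and 80.25. Only 95 falls outside them. This is the same idea as the two `IF` formulas, with the thresholds calculated instead of typed in.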

This is an example of an outlier:

Sampling

Random Sampling

Random sampling is when every member of the population has an equal chance of being selected. You can do this by using a random number table or by getting a computer to choose the numbers. This sort of sampling suits this project, because I am working on a computer and can easily generate random numbers.
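As a rough illustration of letting a computer choose, here is a minimal Python sketch that uses `random.sample`, which picks items without replacement, each equally likely. The pupil ID numbers 1 to 300 and the sample size of 30 are hypothetical, because the essay does not give the population size.

```python
import random

def simple_random_sample(records, size, seed=None):
    """Choose `size` records so that every record has the same chance; none is picked twice."""
    rng = random.Random(seed)  # a fixed seed makes the sample repeatable
    return rng.sample(records, size)

pupils = list(range(1, 301))  # hypothetical pupil ID numbers
sample = simple_random_sample(pupils, 30, seed=1)
print(sample)
```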


Stratified Sampling

Stratified sampling splits the data into groups and then samples from each group in proportion to its size, so that every group is fairly represented. If I used this method, I would not need to sort the data into groups myself, because it is already divided into categories such as year group and gender.
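The proportional idea can be sketched in Python as follows, assuming each pupil record is a tuple of (pupil ID, year group, gender). That record layout and the generated population are hypothetical. Each group's share of the sample is rounded to a whole number of pupils, so with other group sizes the total can differ from the target by one or two.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, total_size, seed=None):
    """Proportional stratified sample: each group's share of the sample
    matches its share of the population, rounded to whole records."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)
    sample = []
    for members in groups.values():
        size = round(total_size * len(members) / len(records))
        sample.extend(rng.sample(members, size))  # random pick within the group
    return sample

# Hypothetical population of 300 pupils: (pupil_id, year_group, gender)
pupils = [(i, 7 + i % 5, "M" if i % 2 else "F") for i in range(300)]
sample = stratified_sample(pupils, key=lambda p: (p[1], p[2]), total_size=30, seed=2)
```

In this made-up population, each of the 10 year-group and gender groups holds 30 pupils, so each one contributes 3 pupils to the sample of 30.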

Systematic Sampling

Systematic sampling chooses items at regular intervals from an unsorted list, e.g. every 20th record. This method would suit my project, because my data is unsorted.
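A minimal Python sketch of taking every 20th record is shown below. The random starting point within the first 20 records is a common addition and an assumption here, as are the hypothetical pupil IDs.

```python
import random

def systematic_sample(records, step, seed=None):
    """Pick a random start among the first `step` records, then every `step`-th record after it."""
    rng = random.Random(seed)
    start = rng.randrange(step)  # 0 .. step-1
    return records[start::step]

pupils = list(range(1, 301))  # hypothetical pupil ID numbers
sample = systematic_sample(pupils, step=20, seed=3)
```

With 300 records and a step of 20, the sample always contains 15 pupils, whichever start is chosen.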

Cluster Sampling

Cluster Sampling involves putting numbers into groups ...
