# Perform a statistical enquiry that will either prove or disprove my hypothesis.

Statistical Enquiry

Aim: To perform a statistical enquiry that will either prove or disprove my hypothesis.

Hypothesis: The higher a person's IQ, the higher their SAT results.

Method: In this statistical enquiry, I aim to find out whether people with a higher IQ get better results in English. To perform the enquiry I needed data about the IQ of students. I have a database of the pupils of Mayfield High School, a fictional school based on data from real schools, which includes each pupil's IQ. Because it is based on real schools, the data should be reliable and accurate enough for me to draw a conclusion from the enquiry. There are 1183 pupils at Mayfield High School, and I decided to study about 10% of this population, which is roughly 100 pupils. These hundred pupils will be my sample. Since there are different strata in my data, I had to find out how many boys and girls from each year group to include in my sample of one hundred. For that I used stratified sampling.

## Stratified Sampling

The pupils of Mayfield High School are the population, that is, the group being studied. Since there are 1183 pupils at Mayfield High School, it would be very impractical for me to study and compare all of the data. I have to take a sample, a smaller group of people drawn from the population. I have decided to take 10% of the population as my sample.

In the Mayfield High School database the data is divided into different strata. Strata are distinct, non-overlapping subsets of the population. In this case the data has been divided by year group and then by gender. It looks something like this.

The sample you choose for an investigation should be representative of the population. It should take account of variation in the characteristics of the population, and these variations should be represented in the sample in the same ratios as in the total population. For example, if the total population had twice as many boys as girls, my sample should include twice as many boys as girls. Taking a sample in this way is called stratified sampling.

A stratified sample is one that reflects the way the original data is distributed. I have to make sure that all of the strata in my sample are proportional to the original data; in other words, the sample has to be stratified.

I stratified my data by finding the number of pupils in each stratum as a fraction of the total number of pupils, then multiplying by the total sample size. For my particular database I had to do it like this:

So from each year I need:

Now I have found out how many pupils I need to take from each stratum for my sample to be proportional to the original database. Next I have to take those pupils from the main database and create a new one containing the hundred pupils I decided to investigate. For that I decided to use random sampling.
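As an illustration, the proportional calculation described in this section could be sketched in Python. The stratum sizes below are made-up numbers chosen only to show the arithmetic; they are not the real Mayfield figures, and the function name `stratified_allocation` is just a label for this example.

```python
# Hypothetical stratum sizes (NOT the real Mayfield figures), keyed by (year, gender).
stratum_sizes = {
    ("Year 7", "boys"): 150,
    ("Year 7", "girls"): 130,
    ("Year 8", "boys"): 140,
    ("Year 8", "girls"): 120,
    ("Year 9", "boys"): 110,
    ("Year 9", "girls"): 100,
    ("Year 10", "boys"): 130,
    ("Year 10", "girls"): 120,
}


def stratified_allocation(sizes, sample_size):
    """Return how many pupils to take from each stratum."""
    total = sum(sizes.values())
    # (pupils in stratum / total pupils) * sample size, rounded to a whole pupil
    return {
        stratum: round(count * sample_size / total)
        for stratum, count in sizes.items()
    }


allocation = stratified_allocation(stratum_sizes, 100)
for stratum, number_needed in allocation.items():
    print(stratum, number_needed)
```

With real data the rounded numbers may add up to one or two more or fewer than the intended sample size, so the total should be checked and adjusted by hand if necessary.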

## Random Sampling

When you are trying to make a statistical observation about a population, you want to make sure that your data is as accurate as possible. For that to be possible you need to ensure that the samples you take are random, that is to say, that every member of the population has the same chance of being chosen. To achieve that we use random sampling. For example, imagine you have a bag full of balls, some red and some blue, and you need to take 5 balls from the bag. A random sample would be if you put your hand in without looking and picked the balls out. Every ball, red or blue, would have the same chance of being picked.

That sounds simple, but when you are dealing with large populations the process is far from simple. There are many ways to perform random sampling on a database. I decided to do my own random sampling using Excel on my computer.

I used different formulas to get the random numbers. I did the random number sampling for each stratum in my database, that is to say, for each year I did one for boys and one for girls separately. This is how I did it.

Each formula in Excel had its own purpose. This is how it was possible for me to take those random samples.

Using this method ensured that my sampling would be as unbiased as possible. The samples I got were completely random. Everything depended on which number between 0 and 1 was chosen, and that choice was made by the computer. As a result, any number between 1 and the number of pupils in that stratum had an equal chance of being chosen.
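The exact Excel formulas are not reproduced here, so the following is only a sketch of the same idea in Python, assuming the method scales a random number between 0 and 1 up to a position between 1 and the number of pupils in the stratum. The function name `pick_random_pupils` and the stratum size of 150 are hypothetical.

```python
import random


def pick_random_pupils(stratum_size, how_many, rng=random):
    """Pick `how_many` distinct positions between 1 and `stratum_size`."""
    if how_many > stratum_size:
        raise ValueError("cannot pick more pupils than the stratum contains")
    chosen = []
    while len(chosen) < how_many:
        fraction = rng.random()                      # random number in [0, 1)
        position = int(fraction * stratum_size) + 1  # scaled to 1..stratum_size
        if position not in chosen:                   # skip repeats so nobody is picked twice
            chosen.append(position)
    return chosen


rng = random.Random(2024)  # fixed seed only so the example can be repeated
picks = pick_random_pupils(stratum_size=150, how_many=15, rng=rng)
print(sorted(picks))
```

Python's standard library can also do this in one step with `random.sample(range(1, 151), 15)`, which likewise gives every position the same chance of being chosen.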

This method gave me all the random numbers I needed. Now I knew which samples to take from the database, and I had to create a new database with my chosen samples. I looked at the numbers Excel chose and found them in the main database. Since all of the students were numbered from 1 to 1183, I could find the pupils I needed easily. To make things more convenient, I sorted my data by the year of the students and their gender. I could then find the necessary information and copy it to a new database of my chosen sample.

When I had copied all of my hundred samples into the new database, I had the necessary information to complete this statistical enquiry: the IQ and English SAT result of each person.

Now I can prove or disprove my hypothesis. However, before I am able to do this I must first produce the summary statistics of my data.

## Summary statistics

The aim of summary statistics is to replace a huge, indigestible mass of numbers (the data) with just one or two numbers that together convey most of the information. This is what I have tried to do here by working out the averages of the data. I am trying to condense my sample data into a few numbers that represent the sample. My summary statistics show the averages for the sample from Mayfield High School. Since the sample is supposed to represent the entire population of the school, the summary statistics indicate what is average for that school. Considering this, any result I get concerning the relationship between IQ and SAT results applies only to the averages of the school.
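To show what condensing data into a few numbers can look like, here is a small sketch using Python's built-in `statistics` module. The IQ scores are invented for the example and are not taken from my Mayfield sample.

```python
import statistics

# Hypothetical IQ scores for illustration only (not the Mayfield sample).
iq_scores = [92, 100, 104, 104, 108, 110, 97, 104, 115, 101]

mean_iq = statistics.mean(iq_scores)        # sum of the values divided by how many there are
median_iq = statistics.median(iq_scores)    # middle value once the data is sorted
mode_iq = statistics.mode(iq_scores)        # most common value
range_iq = max(iq_scores) - min(iq_scores)  # spread between highest and lowest

print("mean:", mean_iq)
print("median:", median_iq)
print("mode:", mode_iq)
print("range:", range_iq)
```

Ten separate scores are reduced to four numbers that describe where the data is centred and how spread out it is.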

### Frequency

The first thing I decided to do was make a frequency table of my data. A frequency table of both the pupils' IQ and their SAT results will tell me how often a certain value occurs. This will be very useful later on, when I have to work out the averages of my data.
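A frequency table for a small set of discrete values can be sketched with Python's `collections.Counter`. The SAT levels below are made up purely to show how the counting works.

```python
from collections import Counter

# Hypothetical English SAT levels for illustration only.
sat_levels = [4, 5, 3, 4, 4, 5, 6, 3, 4, 5]

# Counter tallies how many times each value occurs.
frequency = Counter(sat_levels)

print("Level  Frequency")
for level in sorted(frequency):
    print(f"{level:>5}  {frequency[level]:>9}")
```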

My data is quantitative and discrete. ‘Quantitative’ means that it consists of numerical data, and ‘discrete’ means that the possible values can be put into a list. However, for IQ the possible values are too numerous for me to put them into a simple frequency table. Instead, I will put them into a grouped frequency table.

### Grouped frequency table for my IQ

We use grouped frequency tables when the quantitative data has a wide range of values. It makes more sense to group sets of data together. A frequency table is a display of a frequency distribution. A frequency distribution shows the number of times each particular value occurs in a set of data.

For my IQ frequency table, I am going to put my data into groupings and find the frequency for each grouping. We call those groupings class intervals.

Frequency table

This is a grouped frequency table for my IQ. It shows me how many values in my data fell into each class interval. From this I can see that the largest number of people had an IQ between 100 and 109. I also included cumulative frequency in the table. Cumulative frequency is the running total of the frequencies up to the end of each class interval.
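The grouping and the running total can be sketched in Python as follows. This assumes whole-number IQ scores and class intervals ten wide (such as 100 to 109); the scores themselves are invented for the example, and `grouped_frequency` is just an illustrative name.

```python
# Hypothetical IQ scores for illustration only (not the Mayfield sample).
iq_scores = [84, 91, 95, 98, 100, 102, 103, 105, 107, 109, 110, 112, 118, 121]


def grouped_frequency(values, width):
    """Return rows of (lower, upper, frequency, cumulative frequency)."""
    counts = {}
    for value in values:
        lower = (value // width) * width  # start of the class interval
        counts[lower] = counts.get(lower, 0) + 1

    rows = []
    running_total = 0
    # Walk through every interval in order, including any empty ones.
    for lower in range(min(counts), max(counts) + width, width):
        frequency = counts.get(lower, 0)
        running_total += frequency  # cumulative frequency so far
        rows.append((lower, lower + width - 1, frequency, running_total))
    return rows


rows = grouped_frequency(iq_scores, width=10)
print("Interval   Frequency  Cumulative")
for lower, upper, frequency, cumulative in rows:
    print(f"{lower}-{upper:<6} {frequency:>9}  {cumulative:>10}")
```

The final cumulative frequency always equals the number of values in the data, which is a quick way to check that nothing has been missed.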

Cumulative frequency will be useful when I have to find the interquartile range.

The frequency tables can also be displayed graphically. One of the ways ...