The items for the sample should be selected to be representative of the known characteristics of the whole population. For example, if there were twice as many boys as girls in Mayfield High School, the sample should also contain twice as many boys as girls. If there are no known characteristics of the population, then the sample is chosen so that any one item is as likely to be picked as any other. This is called random sampling.
Random sampling is the technique that I will be using in this investigation. Random sampling means that every member of the population has exactly the same chance of being selected. A random sample is obtained by giving each member of a group a number; the sample is then selected using random numbers taken from a table, or generated by a calculator or computer.
There are two ways of obtaining random numbers. One is by using a random number table, and the other is by means of a calculator.
Random number tables
Random number tables consist of digits from 0-9. They are in random order. The digits may be arranged in rows or columns to present the data better, but a basic one looks like this:
Random digits can be obtained by reading across or down the grid of numbers; a random starting point must be chosen.
If a random sample of 5 cards needs to be obtained from the suit of clubs, the cards would first need to be numbered. There are 13 cards in each suit, therefore the cards would be numbered 1 to 13.
To obtain the 5 random numbers from the list of 13, two-figure numbers between 01 and 13 are needed. If the random starting point is the fourth figure in the second row, reading across in groups of two (because two-figure numbers are needed) gives:
84,39,74,52,26,36,32,03,24,74,00,55.......
All duplicates, and any numbers outside the range 01 to 13 (such as 00 or anything larger than 13), must be discarded; this process continues until 5 numbers between 01 and 13 have been found. This method is too lengthy to be practical; luckily, there is another, faster way in which the 5 numbers can be obtained. This is to take the first five numbers and turn them into decimals. In doing this, the numbers obtained are:
0.84, 0.39, 0.74, 0.52, 0.26
They must then be multiplied by the total number of cards being sampled from. The random numbers then become:
0.84 x 13 = 10.92 rounded = 11
0.39 x 13 = 5.07 rounded = 5
0.74 x 13 = 9.62 rounded = 10
0.52 x 13 = 6.76 rounded = 7
0.26 x 13 = 3.38 rounded = 3
As the numbers being dealt with are all whole, i.e. there are no half cards or bits of cards in a suit, the numbers are rounded, leaving the random samples that were being looked for at the beginning.
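Both ways of reading the table can be sketched in Python. This is only an illustration: the function names are my own, and the list holds the two-figure numbers quoted in the worked example.

```python
# Two-figure numbers read from the table in the worked example.
TABLE_NUMBERS = [84, 39, 74, 52, 26, 36, 32, 3, 24, 74, 0, 55]


def pick_by_discarding(numbers, population_size, sample_size):
    """Keep numbers from 1 to population_size, skipping duplicates."""
    chosen = []
    for number in numbers:
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
        if len(chosen) == sample_size:
            break
    return chosen  # may be shorter than sample_size if the list runs out


def pick_by_scaling(numbers, population_size, sample_size):
    """Turn each two-figure number into a decimal, multiply and round."""
    return [round(number / 100 * population_size)
            for number in numbers[:sample_size]]


print(pick_by_discarding(TABLE_NUMBERS, 13, 5))  # [3]
print(pick_by_scaling(TABLE_NUMBERS, 13, 5))     # [11, 5, 10, 7, 3]
```

With the twelve numbers quoted, discarding finds only one usable card number (3), which shows why it is slow. Scaling gives the five numbers straight away, but rounding can still produce 0 or a repeated number (for example 0.03 x 13 = 0.39, which rounds to 0), so those would have to be redrawn.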
This can be a long and laborious process. Thankfully, there is a much shorter way of producing the results: using a calculator. I will be using this process for the duration of the project. Most calculators now have a key that produces random numbers between 0 and 1; pressing this key five times produces the desired five random numbers. These are then multiplied by the total number of items in the data, giving the random sample.
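The calculator method can be imitated on a computer. The sketch below assumes that Python's random.random() behaves like the calculator's random key (a decimal from 0 up to, but not including, 1); the function name calculator_sample is my own.

```python
import random


def calculator_sample(population_size, sample_size, rng=random):
    """Mimic pressing a calculator's random key until enough numbers are found."""
    chosen = []
    while len(chosen) < sample_size:
        decimal = rng.random()  # like the random key: 0 <= decimal < 1
        number = round(decimal * population_size)
        # Redraw if rounding gives 0 or a number that was already picked.
        if number >= 1 and number not in chosen:
            chosen.append(number)
    return chosen


# A fixed seed is used only so that the run can be repeated.
print(calculator_sample(13, 5, random.Random(1)))
```

The sample size must not be larger than the population size, otherwise the loop can never finish.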
In order to investigate some of the lines of inquiry, the data must first be sorted and then a representative sample taken from it using the random number technique. I sorted the data in the database it was provided in by year group and gender so that it is in order, which will give a more accurate result.
Now that this has been done, I can start taking pupils from the list in the database. There are 1183 pupils and I have numbered them from 1 to 1183. About 60 pupils overall will be needed to give an adequate sample for this process. Before the random numbers can be collected, though, the number of boys and girls to take from each year needs to be determined and, as I said before, it must be proportional to the full number in the school so that the results are accurate.
In the above table, I have produced the exact numbers of pupils of both sexes to be sampled from every year in the school. The process involved was simply to divide the number of pupils of the specific gender in each year by the total number of pupils, then multiply by 60, the number of pupils whose data is needed overall. This gave a number with decimal places; it was rounded to give a whole number of pupils so that the random process can be performed.
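The proportional calculation can be sketched as follows. The group sizes in the example are hypothetical and are not the figures from the table; the function name allocate_sample is also my own.

```python
def allocate_sample(group_sizes, sample_size):
    """Share sample_size between groups in proportion to their sizes."""
    total = sum(group_sizes.values())
    return {group: round(size / total * sample_size)
            for group, size in group_sizes.items()}


# Hypothetical group sizes for illustration only, NOT the school's real figures.
example_groups = {
    ("Year 7", "boys"): 150,
    ("Year 7", "girls"): 130,
    ("Year 8", "boys"): 145,
    ("Year 8", "girls"): 125,
}

for group, quota in allocate_sample(example_groups, 60).items():
    print(group, quota)
```

Because each quota is rounded separately, the quotas will not always add up to exactly the sample size wanted, so the total is worth checking.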
As in the example, the random number process will be performed by giving each pupil a number, but each year group has been split into two categories: the boys and the girls. Each pupil will be given a number according to the number of pupils of their gender in the year. Multiplying that number of pupils by a random number then gives the number of a particular pupil within the boundaries that have been set.
The random numbers have been split into two categories for each year group, the boys and the girls. This not only provides the information for each group but also opens up a greater number of lines of enquiry, so that hypotheses can compare boys with girls in height, weight and so on. If I were to get any random number twice, I would have to take a different one.
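Picking pupils group by group can be sketched as below. Note that this swaps in Python's random.sample in place of multiplying and rounding, because random.sample never gives the same number twice. The group sizes and quotas are hypothetical, and draw_pupil_numbers is my own name.

```python
import random


def draw_pupil_numbers(group_size, quota, rng=random):
    """Pick `quota` different pupil numbers from 1 to group_size."""
    # random.sample never repeats a number, so no redraws are needed.
    return sorted(rng.sample(range(1, group_size + 1), quota))


# Hypothetical (group size, quota) pairs, for illustration only.
groups = {
    ("Year 7", "boys"): (150, 8),
    ("Year 7", "girls"): (130, 7),
}

rng = random.Random(2024)  # fixed seed so the run can be repeated
for group, (size, quota) in groups.items():
    print(group, draw_pupil_numbers(size, quota, rng))
```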
Standard Deviation
For the next section of my coursework, I will be using a measure called standard deviation. When a group of numbers is averaged, the average may be unrepresentative of the group, i.e. one number may be so far out that it shifts the average by an unfair amount. Standard deviation can be found as follows:
- Determine n, the number of units of what is being studied, e.g. students' scores.
- Calculate the mean.
- Subtract the mean from each score to give its deviation.
- Square each deviation to give the squared deviation.
- The standard deviation is then: square root of (sum of squared deviations / (n - 1))
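This can be written as a formula:
$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$
s = the standard deviation
n = number of units, e.g. the number of heights
X = each unit, e.g. a height
$\bar{X}$ = the mean of the units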
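The listed steps can be followed one at a time in Python. This is a minimal sketch: the function name is my own and the three heights are hypothetical values chosen only to make the arithmetic easy to follow.

```python
import math


def sample_standard_deviation(scores):
    """Follow the listed steps, dividing by n - 1."""
    n = len(scores)                                   # number of units
    mean = sum(scores) / n                            # the mean
    deviations = [score - mean for score in scores]   # score minus mean
    squared = [d ** 2 for d in deviations]            # squared deviations
    return math.sqrt(sum(squared) / (n - 1))


# Hypothetical heights in metres, for illustration only.
print(round(sample_standard_deviation([1.50, 1.60, 1.70]), 3))  # 0.1
```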
At the bottom of the first column is the sum of the scores (30) and at the bottom of the second column is the sum of the squared scores (220). We need these quantities and the number of scores (5) to calculate the standard deviation.
To calculate the standard deviation for a sample, the formula would look like this:
$$s = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}}$$
The standard deviation of the numbers 2, 4, 6, 8, 10 is 3.162. This is the case when the five scores are considered a sample. A different result would be given if the numbers were treated as a whole population rather than a sample, but I will not be using that formula in this investigation.
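The figure of 3.162 can be checked from the two column totals. The sketch assumes the five scores are 2, 4, 6, 8 and 10, which is consistent with a sum of 30 and a sum of squares of 220.

```python
import math

scores = [2, 4, 6, 8, 10]  # assumed five scores (sum 30, sum of squares 220)

n = len(scores)
sum_of_scores = sum(scores)                            # 30
sum_of_squares = sum(score ** 2 for score in scores)   # 220

# Sample form: divide by n - 1.
sample_sd = math.sqrt((sum_of_squares - sum_of_scores ** 2 / n) / (n - 1))
# Population form: divide by n instead.
population_sd = math.sqrt((sum_of_squares - sum_of_scores ** 2 / n) / n)

print(round(sample_sd, 3))       # 3.162
print(round(population_sd, 3))   # 2.828
```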
The faster way of using this formula would be to use a calculator, as I will be in this investigation.
An example of using this formula is a business looking for new employees. On the advert it puts: Average wage £35.00 an hour. The company does not, however, mention that the wages of the three current employees are £1, £3 and, for the top brass, £100 an hour.
The average is about 35, but the man earning 100 is pulling it out of proportion. If the standard deviation were calculated, the answer would be 46.2 (dividing by n; dividing by n - 1, as in the sample formula, gives 56.6). This is far too high: a high standard deviation shows a wide dispersion in the data. If this were to happen in the graphs, I would remove the troublesome piece of data.
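The wage example can be checked with Python's statistics module, which has one function that divides by n (pstdev) and one that divides by n - 1 (stdev).

```python
import statistics

wages = [1, 3, 100]  # pounds per hour, from the wage example

print(round(statistics.mean(wages), 2))     # 34.67
print(round(statistics.pstdev(wages), 1))   # 46.2  (divides by n)
print(round(statistics.stdev(wages), 1))    # 56.6  (divides by n - 1)
```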
Prediction 1- I predict that boys in Mayfield High School are generally taller than girls.
This prediction does not need a great deal of research to test. For this prediction, I will need bar charts. The average heights of boys and girls in each class will be placed next to each other, with the overall average at the end. The class average shall be taken from the sampled boys and girls in the year, averaged, and placed on the bar next to the opposite gender. The overall result will be taken by a) showing how many bars are higher for each gender and b) taking the overall average of all the samples of each year.
Averages
Average height of boys in year 7 = (1.50 + 1.67 + 1.45 + 1.49 + 1.52 + 1.65 + 1.54 + 1.55)/8
=1.55m sd: 0.072
Average height for Girls in year 7= (1.80 + 1.62 + 1.60 + 1.51 + 1.42 +1.56 + 1.54)/7
= 1.58m sd: 0.11
Average height for boys in year 8 = (1.75 + 1.83 + 1.72 + 1.60 + 1.82 + 1.90 + 1.71)/7
= 1.76m sd: 0.091
Average height for girls in year 8 = (1.74 + 1.69 + 1.72 + 1.54 + 1.52)/5
= 1.64m sd: 0.093
Average height for boys in year 9 = (1.54 + 1.78 + 1.77 + 1.65 + 1.32 + 1.70)/6
= 1.63m sd: 0.16
Average height for girls in year 9 = (1.6 + 1.75+ 1.76 + 1.65 + 1.65 + 1.59 + 1.62 + 1.58)/8
= 1.65m sd: 0.065
Average height for boys in year 10 = (1.72 + 1.57 + 1.75 + 1.50 + 1.62 + 1.73)/6
= 1.65m sd: 0.092
Average height for girls in year 10 = (1.55 + 1.55 + 1.53 + 1.55 + 1.72)/5
= 1.58m sd: 0.070
Average height for boys in year 11 = (1.8 + 1.67 + 1.61 + 1.8)/4
= 1.72m sd: 0.083
Average height for girls in year 11 = (1.63 + 1.62 + 1.56 + 1.75)/4
=1.64m sd: 0.069
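The averages and standard deviations can be recomputed from the listed heights as a check. The sketch below prints both forms of the standard deviation; for these lists, dividing by n gives figures such as 0.072 for the year 7 boys, while dividing by n - 1 gives slightly larger values.

```python
import statistics

# Sampled heights in metres, copied from the lists in this section.
heights = {
    ("Year 7", "boys"): [1.50, 1.67, 1.45, 1.49, 1.52, 1.65, 1.54, 1.55],
    ("Year 7", "girls"): [1.80, 1.62, 1.60, 1.51, 1.42, 1.56, 1.54],
    ("Year 8", "boys"): [1.75, 1.83, 1.72, 1.60, 1.82, 1.90, 1.71],
    ("Year 8", "girls"): [1.74, 1.69, 1.72, 1.54, 1.52],
    ("Year 9", "boys"): [1.54, 1.78, 1.77, 1.65, 1.32, 1.70],
    ("Year 9", "girls"): [1.6, 1.75, 1.76, 1.65, 1.65, 1.59, 1.62, 1.58],
    ("Year 10", "boys"): [1.72, 1.57, 1.75, 1.50, 1.62, 1.73],
    ("Year 10", "girls"): [1.55, 1.55, 1.53, 1.55, 1.72],
    ("Year 11", "boys"): [1.8, 1.67, 1.61, 1.8],
    ("Year 11", "girls"): [1.63, 1.62, 1.56, 1.75],
}

for (year, gender), values in heights.items():
    mean = statistics.mean(values)
    sd_n = statistics.pstdev(values)          # divides by n
    sd_n_minus_1 = statistics.stdev(values)   # divides by n - 1
    print(f"{year} {gender}: mean {mean:.2f}m, "
          f"sd {sd_n:.3f} (n) or {sd_n_minus_1:.3f} (n - 1)")
```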
As the graph shows, the boys in the 3rd to 5th quarter are more than in the 1st and 2nd.
Prediction 2- I predict that children in Mayfield who live further from the school will weigh less than those who live closer, because the journey gives them more exercise.
This prediction can be tested by taking the children from each year and plotting their weight against the distance they travel to school. In this scatter graph, a negative correlation will be needed to support the prediction. This is because the further from the school they live, the lighter they are predicted to be.
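One way of putting a number on the correlation is the correlation coefficient, which lies between -1 and 1 and is negative when one quantity falls as the other rises. The sketch below is only an illustration: the distances and weights are hypothetical, not taken from the database, and the function name is my own.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for illustration only, NOT taken from the database.
distance_km = [0.5, 1.0, 2.0, 3.5, 5.0]
weight_kg = [58, 55, 54, 50, 47]

r = correlation(distance_km, weight_kg)
print(round(r, 2))  # negative for these made-up values
```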