I have now found out how many people to pick from each stratum. To pick them I will have to use a mathematical method.
I am going to pick the students at random, using stratified random sampling. I used stratified sampling to work out the sample size for each stratum, rather than carrying out the whole investigation with simple random sampling, because the results from simple random sampling might not give a true comparison of the data and might not be representative of the population. For example, simple random sampling for males might give results from only one group, such as year 7 boys, instead of from a range of different year groups. The outcomes of simple random sampling could therefore be inaccurate, biased results which do not reflect the true population. Stratified sampling gives a better comparison of the data, from which accurate conclusions can be made.
I numbered the year 7 girls from 001 to 131, so that each random number identifies a student whose IQ and KS2 results I can look up. I used the Ran# button on the calculator.
I pressed 131 Ran# = and did this 10 times.
Once I had the generated number from the calculator, I did a few things with it:
- I ignored the digits after the decimal point. For example, if the calculator gave 122.348, I wrote the number down as the 122nd person.
- If the same number was generated twice, I pressed Ran# again to ensure that I got 10 different people.
- Each stratum has a different number of students, so before pressing Ran# I typed in the total for that group. The number of students I pick also depends on the size of the stratum. For example, if there are more boys than girls in year 7, then I will pick more year 7 boys than year 7 girls. This reflects the make-up of the population and helps me to compare each year group accurately, so my data is representative of the population.
- If the random number led to a person with missing data, I picked another person, to keep my results consistent and accurate.
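The calculator steps in the list above can be sketched in Python. This is only an illustration, not the method actually used: the function name `pick_sample` and its parameters are made up for the example, `random.random()` stands in for the Ran# button, and pressing again on a result of 0 is an assumption, since nobody is numbered 000. The missing-data check is not modelled.

```python
import random


def pick_sample(population_size, sample_size, seed=None):
    """Pick `sample_size` different student numbers, mimicking '131 Ran# ='."""
    if sample_size >= population_size:
        raise ValueError("sample_size must be smaller than population_size")
    rng = random.Random(seed)
    chosen = []
    while len(chosen) < sample_size:
        generated = population_size * rng.random()  # like typing 131 Ran# =
        number = int(generated)                     # ignore digits after the point
        if number == 0:
            continue  # assumption: no student is numbered 000, so press again
        if number in chosen:
            continue  # same number twice, so press again
        chosen.append(number)
    return chosen


# Ten different year 7 girls out of the 131 that were numbered.
year_7_girls = pick_sample(131, 10, seed=1)
print(year_7_girls)
```

Because the digits after the point are dropped, this procedure can only give numbers up to 130, so the student numbered 131 would never be picked.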
All Boys All Girls
Yr 7 Boys Yr 7 Girls
Yr 8 Boys Yr 8 Girls
Yr 9 Boys Yr 9 Girls
I will repeat this process for each stratum of students to ensure my results are fair.
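The idea of picking more students from the larger groups can be sketched as a proportional allocation. The stratum sizes and the total sample of 60 below are made up purely for illustration (they are not the real school figures), and `allocate_sample` is a hypothetical helper name.

```python
def allocate_sample(stratum_sizes, total_sample):
    """Share a total sample between strata in proportion to their sizes."""
    population = sum(stratum_sizes.values())
    return {
        name: round(size / population * total_sample)
        for name, size in stratum_sizes.items()
    }


# Hypothetical stratum sizes, for illustration only.
hypothetical_sizes = {
    "Yr 7 boys": 150, "Yr 7 girls": 130,
    "Yr 8 boys": 145, "Yr 8 girls": 125,
    "Yr 9 boys": 120, "Yr 9 girls": 140,
}

allocation = allocate_sample(hypothetical_sizes, 60)
for name, count in allocation.items():
    print(name, count)
```

With other figures the rounding may make the total slightly more or less than the target, so the counts would need checking by hand.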
Now that I have collected my data, I will present it as raw data.
I have decided to examine these hypotheses:
“Boys are more intelligent than girls”
“The higher the IQ the higher the average Key Stage 2 result”
I have chosen the hypothesis “Boys are more intelligent than girls” because it will enable me to find summary statistics such as the mean, median, mode, range and standard deviation. These statistics will allow me to compare the genders directly. For example, if the boys have a higher mean than the girls, then this suggests that the boys are more intelligent than the girls, which would support my first hypothesis. I am going to use IQ scores to measure intelligence because IQ is used worldwide. IQ will be the independent variable, because IQ usually dictates the Key Stage 2 result.
I chose the second hypothesis because it will allow me to carry out a bivariate analysis, producing scatter diagrams and lines of best fit.
I will analyse my first hypothesis, “Boys are more intelligent than girls”, using the following methods:
I will draw cumulative frequency (C.F) curves to enable me to find the median, the upper and lower quartiles and the interquartile range (IQR). Drawing the C.F curve will also help me to draw the box plot. I am going to work out the mean, median and mode to compare the genders. I will produce a stem and leaf diagram to group the data, make it easier to read and give me a visual idea of the spread of the data. I will draw a box-and-whisker diagram to show the minimum and maximum values, the upper and lower quartiles and the medians for my data, as well as showing the spread and indicating skewness. I am also going to use standard deviation to measure the spread of the data about the mean.
For my second hypothesis, “The higher the IQ the higher the average Key Stage 2 result”, I will draw scatter diagrams to determine whether there is correlation between IQ and KS2 results. I will look for any association between KS2 results and IQ and include a line of best fit to enable me to make predictions. I will find the equation of the line of best fit, write up my results, and calculate the PMCC to measure the strength of the correlation on my scatter diagram.
The boys have a lower range, which shows that the boys have less spread in their results and are concentrated in the higher IQs, ranging from a lowest IQ of 91 to a highest of 112. This suggests that the boys are more intelligent than the girls, which supports my prediction.
The girls have a higher range, which suggests that the girls have more spread in their results and have mixed IQs, ranging from a low of 11 to a high of 132. This suggests that the girls are not as intelligent as the boys, which supports my prediction.
A scatter diagram is used to determine whether there is correlation between two characteristics. If we know that there is good correlation between two characteristics, we can sometimes use one to predict the other, particularly if one characteristic is easy to measure and the other isn't.
The main advantage of a stem and leaf plot is that the data are grouped and all the original data are shown, too.
Box plots show outliers and indicate skewness and symmetry.
All boys All girls
Yr 7 boys Yr 7 girls
I now feel I should make comparisons between the year 7 and year 9 students. I feel this will give me information on whether different years achieve better standards, and better results may indicate cleverer students, perhaps due to better teaching. I will use standard deviation to find out how much the deviation of height and weight from the mean changes as you progress from year 7 to year 11. From doing this investigation I have established that better and clearer results are obtained when boys and girls are separated.
Product Moment Correlation Coefficient (PMCC)
I will use PMCC to determine how strong the correlation is on the scatter diagram.
Boys and girls may have their own trends so I will draw them individually. This will ensure that I can make detailed and accurate comments on each graph individually.
I am very pleased with the outcome of this investigation, as I have been able to support or reject my hypotheses with sufficient evidence. To extend this investigation I could have looked at the Key Stage 4 results of years 10 and 11 to see whether IQ is also closely linked with these results. I could have used larger sample sizes and more year groups for better accuracy and reliability.
Conclusion
After completing every aspect I felt was necessary to find relationships between height and weight, I feel I have conducted a successful investigation.
I felt the plan I constructed made my investigation make sense, and so allowed me to get good results that I could compare with each other to find relationships and significances within my height and weight data for years 7 and 11. I think it was vital that I used random sampling to pick the right number of students for each year, because this was the only way I could be sure my results would be fair and unbiased. But even with random sampling I could not fully trust any results or conclusions I obtained with different mathematical methods, because those results came from a random sample of the year and so may only be down to chance. This means I could never say that what I found out applied to the whole year, but even so I think I got a rough idea of what was happening.
I then decided to see whether there is any relationship between the heights and weights of the year 7 students. I hypothesised that the taller you are, the more you are likely to weigh. This seemed fairly obvious in most cases, but I still had to test it, so I drew scatter graphs for height and weight and looked for correlation. For my first scatter graph I used boys and girls together. My results indicated that there was positive correlation, but it was not very strong. This meant that my hypothesis was generally correct, but not for all students. I then hypothesised that separating boys and girls would produce stronger correlation, and my results supported this new hypothesis, because for boys and for girls I found stronger correlation than there had been before. Even with these results I cannot be sure that they are accurate, because they come from a random sample of students and so may be down to chance.
For year 11 I again hypothesised that the taller you are, the more you are likely to weigh. I thought that if this was the case for students in year 7 then it definitely would be the case for year 11 students, because there is going to be a much wider set of results for year 11 students than for year 7 students. This time I decided not to draw scatter graphs for the mixed group but only for the boys and girls separately, because last time I found that considering boys and girls separately gave stronger correlation, so doing the mixed group would be a waste of time. For year 11 boys I found that there was strong positive correlation, meaning that for boys my hypothesis was correct, and that it is more accurate for year 11 boys than for year 7 boys. For girls there was weaker correlation than for the boys, but stronger than for the year 7 girls, meaning that for girls in year 11 my hypothesis was partially correct, but more accurate for year 11 girls than for year 7 girls.
For the frequency density of year 7 students I found, for height and weight, that I had to combine the lowest intervals together and the highest intervals together in order to make their frequencies large enough to plot on a histogram. This indicates that, before any intervals were combined, the most popular intervals were in the middle of my data.
I have recognised that there is an outlier within the results, so if my sampling picks it I will not record it, to keep my results consistent and representative of the total population.
Then I used frequency density so that I could compare boys against girls for height and weight. Again I found that there was very little difference between boys and girls for height and weight. For the median, lower quartile and upper quartile there was never a difference of more than three, indicating that for height and weight boys and girls are very closely matched when it comes to measures such as their interquartile range.
For the averages of the year 11 students I found that there was much more difference between boys and girls for height and weight. This indicates that for year 11 there is a much greater variety of results compared to year 7 students.
For the standard deviations of my year 7 and year 11 results, I was sure that year 7 students would have smaller standard deviations than year 11 students, because by year 11 students are at very different stages of growth and development, unless the chance of my sample produced unlikely results. After working out all the standard deviations I found that year 7 boys had smaller standard deviations than year 11 boys. For girls, I found that year 7 girls had a smaller standard deviation for height than year 11 girls. However, this was not the case for weight. I think this is because of the chance of my random sample, and I am confident that with another sample of girls, the year 7 girls would have a smaller standard deviation for weight than the year 11 girls.
Overall I feel I have conducted a very successful investigation which has tested the accuracy of most of my hypotheses. However, there have been some unexpected results which I am sure are down to the chance of the random sample, so with another sample I may get different results. This shows I cannot be sure any of my results are completely accurate; they may all be down to the chance of my sample. If I were to do this investigation again I would change very little, especially not my plan, which I feel played a very important role in the success of my investigation.
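n = 31

Mean:
\[
\bar{x} = \frac{\sum x}{n} = \frac{3150}{31} = 101.613 \text{ (to 3 d.p.)}
\]

Standard Deviation:
\[
\sigma = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2}
       = \sqrt{\frac{321992}{31} - 101.613^2}
       = \sqrt{\frac{321992}{31} - 10325.20177}
       = 7.85 \text{ (to 2 d.p.)}
\]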
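The working above can be checked with a few lines of Python. The totals n = 31, ∑x = 3150 and ∑x² = 321992 are taken from the working; the variable names are my own. This sketch uses the unrounded mean rather than 101.613, which gives the same answer to 2 decimal places.

```python
from math import sqrt

n = 31                  # number of students in the sample
sum_x = 3150            # total of the IQ scores
sum_x_squared = 321992  # total of the squared IQ scores

mean = sum_x / n
# Standard deviation: square root of (mean of the squares minus square of the mean).
standard_deviation = sqrt(sum_x_squared / n - mean ** 2)

print(round(mean, 3))
print(round(standard_deviation, 2))
```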
The C.F curve will enable me to compare IQs on the same graph, and will help me to answer the following probability question:
“What is the probability a boy or girl has an IQ over 100?”
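One way to estimate this probability from a sample is as a relative frequency: the number of students with an IQ over 100 divided by the number of students in the sample. Reading the cumulative frequency at 100 from the C.F curve gives the same count. The scores below are made up for illustration and are not my real data, and `probability_over` is a hypothetical helper name.

```python
def probability_over(scores, threshold):
    """Estimate P(score > threshold) as a relative frequency in the sample."""
    over = sum(1 for score in scores if score > threshold)
    return over / len(scores)


# Made-up IQ scores, for illustration only.
made_up_iqs = [91, 97, 97, 100, 100, 103, 107, 107, 110, 112]

estimate = probability_over(made_up_iqs, 100)
print(estimate)
```

A score of exactly 100 is not counted, because the question asks for an IQ over 100.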
I will now draw a stem and leaf diagram to group the data, and make it easier to read and to give me a visual idea about the spread of the data.
From my stem and leaf diagram I can see that there are three modes, which are 97, 100 and 107.
“The higher the IQ the higher the average Key Stage 2 result”
I will examine this hypothesis by drawing scatter diagrams as bi-variate data, finding the line of best fit and calculating the strength of the correlation by working out the PMCC.
I will now draw a scatter diagram for all pupils on the same graph.
I will put IQ on the x axis because it is the independent variable: the IQ is what controls the Key Stage 2 grade.
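The PMCC and the equation of the line of best fit can be worked out as in the sketch below, with IQ as x and the Key Stage 2 result as y. The five pairs of values are made up for illustration and are not my real data. The function names are my own, and the line of best fit here is the least squares regression line, which is an assumption about how the line would be drawn.

```python
from math import sqrt


def pmcc(xs, ys):
    """Product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def line_of_best_fit(xs, ys):
    """Return (gradient, intercept) of the least squares line y = mx + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x  # the line passes through the mean point
    return gradient, intercept


# Made-up values, for illustration only.
iq = [90, 95, 100, 105, 110]
ks2 = [3, 4, 4, 5, 5]

r = pmcc(iq, ks2)
gradient, intercept = line_of_best_fit(iq, ks2)
print(round(r, 3), round(gradient, 3), round(intercept, 3))
```

A value of r close to 1 means strong positive correlation, close to -1 means strong negative correlation, and close to 0 means little or no linear correlation.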