Systematic Sampling
This is a much simpler method of sampling, which I will use for my second hypothesis. It involves selecting every tenth individual from the data. I will do this by counting through the records and taking every tenth one.
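The counting procedure can be sketched in Python as follows. This is only an illustration: the function name `systematic_sample`, its parameters and the numbered population are assumptions made for the example and are not taken from the investigation's data.

```python
def systematic_sample(records, step=10, start=None):
    """Return every `step`-th record, counting from the first record."""
    if start is None:
        start = step - 1  # the tenth record sits at index 9
    return records[start::step]

# Hypothetical population: 50 records numbered 1 to 50.
population = list(range(1, 51))
sample = systematic_sample(population)
print(sample)  # [10, 20, 30, 40, 50]
```

Passing a different `start` would begin the count from another record, which is one way to avoid always picking the same positions in the list.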
Invalid Data
Some of the data collected contained invalid information. To deal with this I simply decided that it would be better for my results if I did not include such records in my sample. The IQ data for Ahmed Nolan and James Lewis was missing, so they were not included in my sample. There were also many unrealistic data cells. For example, Amy Brandward of Year 10 had a recorded height of 4 metres 65 centimetres, which is impossible. Danny Nicholas of Year 11 had a recorded weight of 14 kg, which is also physically impossible. These records were easy to deal with as they were simply left out of my sample, yet they do demonstrate that there are exceptions in all sets of data.
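A check of this kind could be sketched in Python as below. The plausibility limits, the field names and the function `problems` are assumptions made for illustration. Only the height of 4.65 m, the weight of 14 kg and the missing IQ come from the text; every other value is an invented placeholder.

```python
# Plausibility limits are assumptions chosen for this example only.
HEIGHT_RANGE_M = (1.0, 2.2)
WEIGHT_RANGE_KG = (25, 150)

def problems(record):
    """Return a list of reasons why a record should be left out."""
    found = []
    if record.get("iq") is None:
        found.append("missing IQ")
    height = record.get("height_m")
    if height is not None and not HEIGHT_RANGE_M[0] <= height <= HEIGHT_RANGE_M[1]:
        found.append("implausible height")
    weight = record.get("weight_kg")
    if weight is not None and not WEIGHT_RANGE_KG[0] <= weight <= WEIGHT_RANGE_KG[1]:
        found.append("implausible weight")
    return found

# Placeholder records: IQ 100, height 1.60 m and weight 50 kg are invented.
records = [
    {"name": "Amy Brandward", "iq": 100, "height_m": 4.65, "weight_kg": 50},
    {"name": "Danny Nicholas", "iq": 100, "height_m": 1.60, "weight_kg": 14},
    {"name": "Ahmed Nolan", "iq": None, "height_m": 1.60, "weight_kg": 50},
    {"name": "Example Student", "iq": 100, "height_m": 1.60, "weight_kg": 50},
]

# Keep only the records with no problems.
clean = [r for r in records if not problems(r)]
print([r["name"] for r in clean])  # ['Example Student']
```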
Analysis
Hypothesis 1
To investigate my first hypothesis, as my data is continuous, I have decided to use a scatter diagram to try to discover a relationship between the average number of hours of TV watched per week and the students' IQ. The scatter diagram can be used to see whether there are any possible links or relationships between the two variables.
If the points lie close together, a trend line (a line of best fit) can be drawn to show whether a correlation is evident. There are three types of correlation: positive, negative and no correlation.
Positive Correlation
This is when one variable increases as the other increases, so the points lie close to an upward-sloping straight line.
Negative Correlation
This is when one variable decreases as the other increases, so the points lie close to a downward-sloping straight line.
No Correlation
This is when the points are scattered randomly with no clear pattern.
The following points will be plotted on a scatter plot.
The graph shows a weak positive correlation, meaning that the more hours of TV watched per week on average, the higher the IQ tends to be.
The product moment correlation coefficient can be used to tell us how strong the correlation between two variables is.
If there is a perfect positive correlation (in other words the points all lie on a straight line that goes up from left to right), then r = 1.
If there is a perfect negative correlation, then r = -1.
If there is no correlation, then r = 0.
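As an illustration, the product moment correlation coefficient can be calculated from its standard definition, r = Sxy / sqrt(Sxx * Syy). The sketch below is a minimal Python version. The function name `pearson_r` and the small data sets are assumptions made for the example, not the sample used in this investigation.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data lying exactly on straight lines.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0  (perfect positive)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0 (perfect negative)
```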
The product moment coefficient for the scatter graph above is
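This means there is a weak positive correlation.
This diagram disagrees with my hypothesis, as it suggests that a greater number of hours of TV watched goes with a higher IQ. However, because the correlation is weak, and because a correlation does not show that one thing causes another, it does not prove that watching TV increases your IQ.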
I also decided to use a frequency polygon as further evidence to test my hypothesis.
This is mainly to find a relationship between the variables. The two things that I will focus on when analysing the frequency polygons are the peaks and the spread.
This is the frequency table for the students.
We can see from this that the modal group for IQ, the one with the highest frequency, is 100 < IQ <= 105, meaning that the most common IQ group among the Year 10 students is 101 to 105.
The peaks of the average time watching TV per week and the students' IQ are quite far apart, suggesting that TV is not an important factor when studying a student's IQ. Also, the spread of students with a certain IQ shows only a weak correlation with the number of hours spent watching TV.
This diagram agrees with my hypothesis to a certain extent. It points out that too little or too much TV is bad for a student, as it can affect their IQ. The graph suggests that the median time spent watching TV is the best for a student.
Thirdly, to find the median of both variables and the modal group, I will use a stem and leaf diagram. Stem and leaf diagrams are slightly similar to bar graphs, except that the number of leaves on each stem plays the role of the bar. They are useful for commenting on the distribution.
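A stem and leaf diagram, together with the median and modal group read from it, can be sketched in Python as below. The function name `stem_and_leaf` and the list `hours` are assumptions made for the example. The values are invented and are not the sample used in this investigation.

```python
from collections import defaultdict
from statistics import median

def stem_and_leaf(values):
    """Group whole numbers by their tens digit (stem) and units digit (leaf)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)

# Hypothetical hours of TV watched per week.
hours = [8, 12, 15, 21, 24, 25, 26, 29, 33, 41]
diagram = stem_and_leaf(hours)
for stem, leaves in diagram.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))

# The modal group is the stem with the most leaves.
modal_stem = max(diagram, key=lambda s: len(diagram[s]))
print("Modal group:", f"{modal_stem * 10}-{modal_stem * 10 + 9}")  # 20-29
print("Median:", median(hours))  # 24.5
```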
Stem and Leaf Diagram of Average number of hours watching TV per week.
MEDIAN=25.5 Hours per Week
MODAL GROUP=20-29 Hours per week
Stem and Leaf Diagram of IQ of the above Students
MEDIAN IQ=100.5
MODAL GROUP=101-110 IQ
From these diagrams we can see that the stem and leaf diagrams agree with the frequency polygon. They suggest that the average number of hours of TV watched per week is related to IQ, as the modal group for TV watched is 20-29 hours per week and the modal group for IQ is 101-110.
Overall it is possible to say that this diagram does not agree with my hypothesis and that an average number of hours of TV watched is the most promising for a student's IQ.
From the analysis of the data for my first hypothesis it is possible to say that my hypothesis was correct to a certain extent.
Hypothesis 2
To investigate my second hypothesis I have decided to use scatter graphs to try to discover whether the Year 10s or the Year 11s have a greater spread of weight. The scatter graphs show the spread of the Year 10 and Year 11 students' BMI.
The graph shows that there is a very large spread of the students' BMI across the scatter diagram.
From this graph it is possible to say that the spread is not as large as for the Year 10s.
I also worked out the product moment correlation coefficient for each of the two graphs.
For Year 10 the product moment correlation coefficient was 0.39767886.
For Year 11 the product moment correlation coefficient was 0.424072703.
This shows a slightly stronger correlation for the Year 11s, which suggests that their points lie closer to the line of best fit and so their weights are less spread out. This disagrees with my hypothesis, as it suggests that the Year 10s have a greater spread of weight.
Secondly, I calculated the average BMI of the two year groups. BMI (Body Mass Index) is calculated by dividing weight in kilograms by the square of height in metres: BMI = weight / height^2.
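The calculation can be sketched in Python as follows. The function names `bmi` and `mean_bmi` and the three example students are assumptions made for illustration. The weights and heights are invented and are not taken from the sample.

```python
def bmi(weight_kg, height_m):
    """Body Mass Index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2

def mean_bmi(students):
    """Average BMI of a list of (weight_kg, height_m) pairs."""
    values = [bmi(w, h) for w, h in students]
    return sum(values) / len(values)

# Hypothetical students as (weight in kg, height in m).
students = [(50, 1.60), (60, 1.70), (45, 1.50)]
print(round(bmi(50, 1.60), 2))  # 19.53
print(round(mean_bmi(students), 2))
```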
YEAR 10 YEAR 11
AVERAGE BMI FOR BOTH YEARS
Year 10= 20.53
Year 11= 19.34
This calculation goes against my hypothesis, as it shows that the average BMI for Year 10 is greater than the average BMI for Year 11. Although an average does not measure spread directly, it is in line with the scatter graphs, which suggested that Year 10 also have a greater spread of weight.
Another calculation is the range of BMI for each year. To calculate this I must work out the maximum and minimum BMI for each year group and then take away the minimum from the maximum to give the range.
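As a short illustration, the range can be worked out in Python as below. The function name `value_range` and the BMI values are assumptions made for the example, not the figures from this investigation.

```python
def value_range(values):
    """Range of a data set: the maximum minus the minimum."""
    return max(values) - min(values)

# Hypothetical BMI values for one year group.
example_bmis = [16.0, 19.5, 22.0, 31.0]
print(value_range(example_bmis))  # 15.0
```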
The maximum AND minimum BMI for the year 10s was
The maximum AND minimum BMI for the year 11s was
This shows that there is a greater range, or spread, of BMI for the Year 10 students than for the Year 11 students. This calculation disagrees with my hypothesis, as do the other diagrams.
Thirdly, I used a histogram to show the spread of weight for both year groups. Histograms have bars whose widths are in proportion to the class widths of the groups of data they represent, so the bars can have different widths. The area of each bar represents the frequency.
This was the data used to draw the histograms.
The vertical scale of a histogram is always labelled Frequency Density. To draw a histogram you must calculate the frequency densities using the formula
Frequency density = Frequency / Class Width.
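The formula can be applied to a grouped frequency table as in the Python sketch below. The function name `frequency_densities` and the three weight classes are assumptions made for illustration and are not the table used for the histograms.

```python
def frequency_densities(classes):
    """Frequency density = frequency / class width, for (lower, upper, frequency) rows."""
    return [freq / (upper - lower) for lower, upper, freq in classes]

# Hypothetical weight classes in kg with unequal widths.
classes = [(40, 50, 5), (50, 55, 6), (55, 70, 3)]
print(frequency_densities(classes))  # [0.5, 1.2, 0.2]
```

Here the middle class has the tallest bar even though its frequency is only slightly higher than the first, because its class width is half as large.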
These were the two histograms. (Next Page)
The Year 10 histogram shows a spread of weight forming four bars. The Year 11 histogram shows a spread of weight forming three bars. This suggests that the weights are more widely distributed for the Year 10s than for the Year 11s. This agrees with all the other diagrams and disagrees with my hypothesis.
I also decided to use a frequency polygon as further evidence to test my hypothesis.
The spread of pupils in Year 10 and Year 11 will provide information on which year's weight is more spread out.
This was the frequency table from which the polygon was drawn.
The diagram shows that the Year 11 BMI ranges from 14 kg/m^2 to 27 kg/m^2, while the Year 10 BMI ranges from 16 kg/m^2 to 31 kg/m^2. This shows that the Year 10 spread of BMI is greater than that of the Year 11s, which once again goes against my hypothesis and agrees with the rest of the diagrams.
Also, the modal group for the Year 11s is 18-19 kg/m^2, while for the Year 10s it is 20-21 kg/m^2. This means the Year 10s have a greater BMI. The polygon also shows that the Year 10s' BMI frequencies are fairly evenly shared out between all the groups.
Hypothesis 3
To investigate my third hypothesis I used a cumulative frequency graph. Cumulative frequency graphs are effective at showing how many results fall at or below a certain value. The cumulative frequency for a group is obtained by adding its frequency to the frequencies of all the previous groups.
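The running total can be sketched in Python as below, using `itertools.accumulate` from the standard library. The IQ groups and frequencies are assumptions made for illustration and are not the results recorded in this investigation.

```python
from itertools import accumulate

# Hypothetical upper class boundaries for IQ and the frequency of each group.
upper_bounds = [80, 90, 100, 110, 120]
frequencies = [2, 5, 9, 6, 3]

# Each cumulative frequency is the sum of this group and all previous groups.
cumulative = list(accumulate(frequencies))
print(cumulative)  # [2, 7, 16, 22, 25]

# Points to plot: (upper boundary, cumulative frequency).
points = list(zip(upper_bounds, cumulative))
print(points)
```

The last cumulative frequency equals the total number of students, and the median is read off the graph at half of that total.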
The results were recorded in a cumulative frequency table.
These were the results for the males at Key Stage 4.
These were the results for the females at Key Stage 4.
Here are the graphs. (Next page)
The cumulative frequency graphs show that, when readings are taken at the median, the IQ for males is much higher than the IQ for females. However, when readings are taken at other places, such as 20, the IQ for females is higher. This is a contrasting diagram, as it both agrees and disagrees with my hypothesis.
I also used stem and leaf diagrams to show the distribution of IQ for males and females across the two year groups. Stem and leaf diagrams are slightly similar to bar graphs, except that the number of leaves on each stem plays the role of the bar. They are useful for commenting on the distribution, and they will also help me find the median and modal group for each gender.
Stem and Leaf Diagram of IQ for Key Stage Four Male Students.
MALE
25 Students
Modal Group = 100-110
Median = 100.5
FEMALE
22 Students
Modal Group=100-110
Median=96.5
From the results we can see that my hypothesis was correct, as the median IQ for males is 4 points higher than for females. Also, there are many more students within the modal group 100-110 for males than for females, as the females' IQs are spread over a wider range. This agrees with the cumulative frequency graph in suggesting that males have a higher IQ than females, and so agrees with my hypothesis.
I also decided to use bar charts, as they are simple yet effective. A bar chart is a visual display used to compare the amounts or frequency of occurrence of different characteristics of data.
I did three bar charts. One to show the number of male and female students with a certain IQ in Year 10, one chart to show the number of male and female students with a certain IQ in Year 11 and finally a chart to show the overall number of male and female students with a certain IQ in Key Stage 4.
This shows that males and females are equally spread concerning the number of students with a certain IQ.
This also shows that males and females are equally spread concerning the number of students with a certain IQ.
Overall, this diagram does not agree with the stem and leaf diagrams and the cumulative frequency graphs, as it shows a similar spread of IQ for each gender. This diagram does not agree with my hypothesis.
Conclusion
Hypotheses
- I am going to investigate the relationship between the number of hours of TV watched and the students' IQ in Year 10.
- I am going to investigate whether students in Year 11 have a greater spread of weight (BMI) than students in Year 10.
- I am going to investigate the relationship between gender and IQ for students at Key Stage 4.
For my first hypothesis, I have decided to reject it because the data analysis provided evidence that the more hours of TV you watch, the greater your IQ. Evidence for this was provided by a weak positive correlation on the scatter diagram. The frequency polygon and the stem and leaf diagrams agreed with each other, as they indicated that the median number of hours of TV watched per week is the most effective for a good IQ.
For my second hypothesis, I have once again decided to reject it because all the data analysis techniques and calculations used went against it. The clear result is that the Year 10s have a greater BMI, and a greater spread of BMI, than the Year 11s. I believe this because the product moment correlation coefficient was stronger for the Year 11 scatter graph than for the Year 10 scatter graph. Furthermore, the range calculation for the Year 10 and Year 11 BMI showed the Year 10s having a larger range, meaning that they had a larger spread. In addition to this, the average BMI for the Year 10s was greater than for the Year 11s. Finally, the histograms showed that the Year 10s had a much greater spread of weight than the Year 11s, as the Year 10 histogram had more bars than the Year 11 histogram.
Finally, for my third hypothesis, I accepted that the hypothesis was correct because the data showed that the males generally had a higher IQ. For example, the stem and leaf diagrams showed that the median IQ for males was 4 points higher than for females (100.5 compared with 96.5). Also, the cumulative frequency graphs showed, when readings were taken, that males had a higher IQ. The readings were taken at the median and the interquartile range, and the average of the two was slightly higher for males, although the bar charts showed a large spread of IQ for both males and females.
Evaluation
Determining sample size is a very important issue because samples that are too large may waste time, resources and money, while samples that are too small may lead to inaccurate results. In many cases, we can easily determine the minimum sample size needed to estimate a process using a specific formula. Overall, I believe that the sample size for my coursework was correct because Excel, in which I calculated the sample size, uses this formula. All I had to do was use the Excel sample function, which used the formula to calculate the sample size. This ensured that the sample size was neither too big nor too small.
For the sampling, I decided to use two methods so that I was able to compare, for future reference, which was more effective. After the investigation I believe that, of the two sampling methods I employed, random and systematic sampling, random sampling is considerably more effective because of the disadvantages of systematic sampling. The main drawback of systematic sampling is that it is only representative if the population is arranged in a random way, and not in a way that could result in bias, such as high and low values grouped close together. Random sampling is effective, but to ensure that it is random and that bias cannot occur it should be repeated a number of times. Due to lack of time I was unable to repeat the sampling a number of times. Overall, I believe that the sampling methods were appropriate, although if I were to redo the investigation I would decide against using systematic sampling.
I believe that it is not possible to be certain that the data was unbiased, because we do not know the collection techniques, so we cannot be 100% sure how the data was collected or whether it was collected in a biased way. However, we can be sure that bias did not arise from the choice of people asked, because the whole school was asked the same questions.
Overall, to improve the investigation, I would not have used systematic sampling for one of my hypotheses.
To investigate the problem further, I believe that I could have collected data from my own school, processed it and analysed whether or not it agreed with my hypotheses. My collection techniques would have been unbiased, and I would have used the same techniques as in this investigation so that I could compare the overall results of Mayfield School with those of Sir William Borlase Grammar School. This would have provided interesting data to analyse and more data with which to test my hypotheses.