I have not used a systematic sample because it would not suit my second and third hypotheses. There are too few students in each group for it to work well: it would be almost like picking the first 50 results rather than making a fairer selection.
The next four pages show the gender, the average number of hours of TV watched per week and the total Key Stage Two exam score for all of Mayfield’s Year 11 students.
From the data in the spreadsheets above, I used random sampling to select 50 different girls and 50 different boys, and I use this sample for all three parts of my hypothesis.
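As a rough illustration of this sampling step, the sketch below draws 50 girls and 50 boys at random, without replacement. The records are made-up placeholders rather than the Mayfield data, and the names `students` and `sample_by_gender` are assumptions for this example only.

```python
import random

def sample_by_gender(students, per_group=50, seed=None):
    """Randomly pick per_group girls and per_group boys, without replacement."""
    rng = random.Random(seed)
    girls = [s for s in students if s["gender"] == "F"]
    boys = [s for s in students if s["gender"] == "M"]
    return rng.sample(girls, per_group), rng.sample(boys, per_group)

# Placeholder records standing in for the Year 11 spreadsheet (hypothetical values).
students = [{"id": i, "gender": "F" if i % 2 == 0 else "M"} for i in range(170)]

girls_sample, boys_sample = sample_by_gender(students, per_group=50, seed=1)
print(len(girls_sample), len(boys_sample))
```

Because `random.sample` never picks the same record twice, every student in a group has the same chance of being chosen, which is the fairness a systematic sample of so few students would not give.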
TV watching and exam results are negatively correlated.
For this part of my hypothesis I hope to show that students who watch more TV receive lower grades. Below are just the Key Stage Two exam totals and the average hours of TV watched per week for the 100 Year 11 students I sampled.
[Table: average hours of TV watched per week (TV) and total Key Stage Two exam results (Results) for the 100 sampled students]
I will plot these results on a scatter graph.
This scatter graph shows no correlation and so does not support my theory. Instead of showing that the more TV a student watches in a week, the lower their exam results, it shows no relationship. Thus, I reject this part of my hypothesis. There was one anomalous result, which is circled on the graph. I cannot say it is incorrect, as it is a genuine piece of data. However, a person who watches 83 hours of TV a week is rare and does not seem to represent the whole population, so I will exclude this result and replace it.
Correlation Coefficient
For this part of my hypothesis I decided it was important to find the correlation coefficient, r, to measure how strong the relationship in my results was, even though my theory had not been supported.
The formula for the correlation coefficient is:
$$r = \frac{S_{xy}}{\sqrt{S_{xx} \times S_{yy}}}$$
The next pages show all my working for this coefficient.
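Thus:
$$S_{xy} = \sum xy - \frac{\sum x \times \sum y}{n} = 26260 - \frac{2153 \times 1214}{100} = 122.58$$
$$S_{xx} = \sum x^2 - \frac{\sum x \times \sum x}{n} = 62677 - \frac{2153 \times 2153}{100} = 16322.91$$
$$S_{yy} = \sum y^2 - \frac{\sum y \times \sum y}{n} = 15152 - \frac{1214 \times 1214}{100} = 414.04$$
So overall:
$$r = \frac{S_{xy}}{\sqrt{S_{xx} \times S_{yy}}} = \frac{122.58}{\sqrt{16322.91 \times 414.04}} = 0.047 \text{ (3 d.p.)}$$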
The correlation coefficient works out as 0.047, correct to 3 decimal places. A value this close to zero indicates a very weak positive correlation. So, as well as the scatter graph showing no clear correlation, the calculation confirms that there is only a very weak relationship in the data used for this part of the hypothesis.
The next two pages show the data with the replaced result. The boys’ data is unchanged, but in the girls’ data the one replaced result is shown enlarged. (Student 19 of the girls was replaced by student 39 of the girls.)
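As a check on the arithmetic, the short sketch below recomputes the three sums of squares and r from the totals quoted in the working. It assumes that x is the hours of TV watched and y is the exam result, which the totals suggest but the working does not state; the variable names are my own.

```python
import math

n = 100
# Totals quoted in the working (assumed: x = TV hours, y = exam result).
sum_x, sum_y = 2153, 1214
sum_xy, sum_x2, sum_y2 = 26260, 62677, 15152

s_xy = sum_xy - sum_x * sum_y / n
s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
r = s_xy / math.sqrt(s_xx * s_yy)

print(round(s_xy, 2), round(s_xx, 2), round(s_yy, 2), round(r, 3))
```

The rounded values agree with the hand calculation: 122.58, 16322.91, 414.04 and r = 0.047.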
Girls have better overall exam results than boys
For this second part of my hypothesis I will try to show that girls have better exam results than boys overall. Below are two tables showing the girls’ and the boys’ Key Stage Two exam totals. I have stratified my data by gender, as the two groups need to be compared.
Girls’ Exam Results:
Total of all the Key Stage Two Results = 603
Mean of all the Key Stage Two Results = 12.06
Median = 12 Mode = 12 & 13 Range = 9
Boys’ Exam Results:
Total of all the Key Stage Two Results = 610
Mean of all the Key Stage Two Results = 12.2
Median = 12 Mode = 12 Range = 9
Before drawing the histograms to show my results, I need to create frequency tables. Below are the frequency tables for both the girls and the boys.
(ER = Exam Results)
Girls:
Boys:
At first I planned to compare the class widths of the histograms, but I realised this would not help me test this part of my hypothesis. Hence I decided to keep the class widths the same in both histograms, and I will compare the differences between them instead. I have also drawn the results on the same axes so that the two sets of data are easier to compare.
Looking closely, boys have slightly better exam results overall. The first bar on the graph covers the lowest possible exam results, and only boys fall into this low-scoring category. In the next two bars the girls have the higher frequency densities, so more girls than boys achieved these results. Only in the last bar, for totals of more than 15, do boys outnumber girls. The histograms therefore partly support this part of the hypothesis: more girls than boys achieved good results, and only a few boys scored higher than the girls. Knowing that more girls had better exam results in two of the histogram intervals, I would partly accept this part of the hypothesis. However, it is important to note that the girls’ exam results totalled 603 whereas the boys’ totalled 610, a difference of 7. Thus the results do not support my theory.
Overall, only part of my hypothesis was supported. The girls’ histogram did show more high results than the boys’. Unfortunately, this was contradicted by the totals of the results. Regrettably, the other two parts were also not supported. Despite this, I still suspect that the results are unreliable and that my original hypothesis may be correct. I feel this firstly because I based my hypothesis on a small sample of 50 girls and 50 boys from a fictional school. Although the figures may be real for some students, a sample this small cannot represent students all around the world. If I were to improve this coursework, I would use primary data so that I could fully trust the results I am using. I would also enlarge the population I research to obtain a much wider range of reliable results. A disadvantage, though, is that this would be very time consuming, and it still would not be accurate enough to represent the whole country, let alone the world.
*** For the results of the entire hypothesis, it is important to remember that I cannot be sure whether my hypothesis is correct or incorrect overall, as the results were based on a small amount of secondary data from a fictional school, the Mayfield Database. The sample could easily, by coincidence, contain the kind of data that makes the hypothesis look wrong or right. As mentioned before, one piece of data looked doubtful, so I ignored it and replaced it with another for the last two parts of my hypothesis.
Boys watch more TV than girls
In this part of my hypothesis I will investigate whether boys tend to watch more TV than girls. Below are tables for boys and girls showing the average number of hours of TV watched in a week. I have included stem and leaf diagrams and box plots, as well as a population pyramid, to present my results. As a reminder, these numbers are from the same 50 randomly selected girls and 50 randomly selected boys as in the earlier charts.
Average hours of TV watched in a week by girls:
Total of average hours of TV watched in a week = 1115
Mean of average hours of TV watched in a week = 22.3
Median = 20.5 Mode = 10, 20 & 30 Range = 49
Stem and Leaf Diagram:
Stem | Leaf
0 | 3 5 6 6 7 7
1 | 0 0 0 0 1 1 2 4 4 4 5 5 5 7 9
2 | 0 0 0 0 1 1 2 4 5 5 6 7 8 8 8
3 | 0 0 0 0 5 6 6 8
4 | 0 0 0 2
5 | 0 3
Key: 1 | 0 means 10 hours
Lower Quartile → 11.5
Upper Quartile → 30
Average hours of TV watched in a week by boys:
Total of average hours of TV watched in a week = 1005
Mean of average hours of TV watched in a week = 20.1
Median = 17 Mode = 20 Range = 49
Stem and Leaf Diagram:
Stem | Leaf
0 | 1 4 7 7 9
1 | 0 0 0 0 2 2 4 4 4 4 4 4 5 5 5 6 6 6 7 7 7 8
2 | 0 0 0 0 0 0 0 1 3 4 5 5 8
3 | 0 0 0 0 5
4 | 0 0 8 8
5 | 0
Key: 1 | 0 means 10 hours
Lower Quartile → 14
Upper Quartile → 25
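The boys' summary figures can be checked directly from the stem and leaf diagram. The sketch below rebuilds the 50 values from the stems and leaves and recomputes the total, mean, median, mode and range with Python's standard `statistics` module. The name `boys_rows` is my own, and the quartiles are left out because different quartile conventions can give slightly different answers.

```python
import statistics

# Boys' stem and leaf rows from the diagram: stem (tens) -> leaves (units).
boys_rows = {
    0: [1, 4, 7, 7, 9],
    1: [0, 0, 0, 0, 2, 2, 4, 4, 4, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8],
    2: [0, 0, 0, 0, 0, 0, 0, 1, 3, 4, 5, 5, 8],
    3: [0, 0, 0, 0, 5],
    4: [0, 0, 8, 8],
    5: [0],
}
# Turn each stem/leaf pair back into a number of hours, e.g. stem 1, leaf 4 -> 14.
hours = sorted(10 * stem + leaf for stem, leaves in boys_rows.items() for leaf in leaves)

total = sum(hours)
mean = total / len(hours)
median = statistics.median(hours)
mode = statistics.mode(hours)
value_range = max(hours) - min(hours)

print(len(hours), total, mean, median, mode, value_range)
```

The recomputed figures match those quoted for the boys: 50 values, a total of 1005, a mean of 20.1, a median of 17, a mode of 20 and a range of 49.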
Population Pyramid Chart:
(A = Average hours of TV watched in a week)
Looking at the box plots, both distributions are positively skewed. The box plots also show that the girls had a wider interquartile range than the boys. From both the box plots and the population pyramid, I can see that this part of my hypothesis is not supported, as overall it is the girls who watch more TV than the boys. The total of the average hours of TV watched in a week is 1115 for the girls but 1005 for the boys, so between them the 50 girls watched 110 hours more TV than the 50 boys. On average, the girls watched 22.3 hours of TV a week while the boys watched only 20.1 hours. Hence I reject this part of my hypothesis also, as it does not support my theory.
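The comparison above can be sketched numerically from the quartiles and totals already quoted. The skew check used here (the upper quartile lying further from the median than the lower quartile does) is one simple rule of thumb and is my assumption, not necessarily how the skew was judged from the box plots; the names `summaries` and `describe` are also my own.

```python
# Quartile summaries and totals quoted in the text (hours of TV per week).
summaries = {
    "girls": {"lq": 11.5, "median": 20.5, "uq": 30, "total": 1115},
    "boys": {"lq": 14, "median": 17, "uq": 25, "total": 1005},
}

def describe(s):
    """Return the interquartile range and a rough skew label for one group."""
    iqr = s["uq"] - s["lq"]
    # Rule of thumb: positive skew if the upper quartile is further from the median.
    upper_gap = s["uq"] - s["median"]
    lower_gap = s["median"] - s["lq"]
    skew = "positive" if upper_gap > lower_gap else "not positive"
    return iqr, skew

for name, s in summaries.items():
    print(name, describe(s))

difference = summaries["girls"]["total"] - summaries["boys"]["total"]
print(difference)
```

On these figures the girls' interquartile range is 18.5 hours against 11 for the boys, both groups come out as positively skewed, and the difference in totals is 110 hours.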