Data Handling Project
Planning
I intend to investigate the relationship between the number of hours of TV watched per week by students and their KS2 maths results. I think the more TV a student watches the less successful they will be.
Hence, I expect a negative correlation, i.e. as the number of hours of TV watched increases, the maths results will tend to decrease.
Firstly, I will retrieve the relevant data, i.e. gender, hours of TV watched and maths results, from the spreadsheet provided. As there are on average 200 students in each year, years 7-11, there are roughly 1000 students in total, and it would be difficult to analyse such a large data set. Therefore, I will pick one of the five year groups randomly and base my investigation on the selected year group.
I will sort the year group into two sub-groups according to gender. I will then apply the method of systematic sampling to the data. This will make the sample more representative. I have randomly selected Year 9.
As there are 261 students in Year 9 and I intend to have a sample of 30 students, I will select every 8th student and randomly eliminate two, thus leaving me with a sample of 30 students.
261 ÷ 30 = 8.7
261 ÷ 8 ≈ 32
32 − 2 = 30
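The sampling step can be sketched in Python. This is only an illustration under stated assumptions: the student numbers 1 to 261, the function name `systematic_sample` and the fixed `seed` are hypothetical and do not come from the spreadsheet.

```python
import random

def systematic_sample(population, step, target_size, seed=None):
    """Take every `step`-th item, then randomly drop extras to reach `target_size`."""
    rng = random.Random(seed)
    # Every step-th student: positions step, 2*step, ... (counting from 1)
    picked = population[step - 1::step]
    # Randomly eliminate any surplus students
    surplus = len(picked) - target_size
    if surplus > 0:
        drop = set(rng.sample(range(len(picked)), surplus))
        picked = [s for i, s in enumerate(picked) if i not in drop]
    return picked

students = list(range(1, 262))  # hypothetical student numbers 1..261
sample = systematic_sample(students, step=8, target_size=30, seed=1)
print(len(students[7::8]), len(sample))  # 32 30
```

Taking every 8th student gives 32 students, so two are removed at random to leave 30.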
The collected sample of 30 students is the raw data. This needs to be arranged into a frequency distribution, where like quantities are counted and displayed by writing down how many of each type there are, i.e. their frequencies.
I will use bar charts, as these are used for discrete data, to analyse the data about KS2 maths results comparing the results for males and females.
The average number of hours of TV watched per week, being continuous data, will be analysed by recording the results using histograms.
Year 9 Females

Hours of TV watched p/week | Tally | Frequency | Midpoint
0 ≤ t ≤ 7 | II | 2 | 3.5
8 ≤ t ≤ 14 | IIII | 4 | 11
15 ≤ t ≤ 21 | IIII | 5 | 18
22 ≤ t ≤ 28 | III | 3 | 25
29 ≤ t ≤ 35 | | 0 | 32
36 ≤ t ≤ 42 | I | 1 | 39

KS2 maths results | Tally | Frequency
3 | IIII | 4
4 | IIII I | 6
5 | IIII | 5
6 | | 0
Year 9 Males

Hours of TV watched p/week | Tally | Frequency | Midpoint
0 ≤ t ≤ 7 | I | 1 | 3.5
8 ≤ t ≤ 14 | IIII | 5 | 11
15 ≤ t ≤ 21 | IIII | 5 | 18
22 ≤ t ≤ 28 | I | 1 | 25
29 ≤ t ≤ 35 | II | 2 | 32
36 ≤ t ≤ 42 | I | 1 | 39

KS2 maths results | Tally | Frequency
3 | | 0
4 | IIII I | 6
5 | IIII III | 8
6 | I | 1
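Grouping raw values into class intervals can be checked with a short Python sketch. It assumes the 15 male values are those listed in the stem and leaf diagram for Year 9 males; the function name `frequency_table` is hypothetical.

```python
# Male hours of TV per week, as read from the stem and leaf diagram (assumed raw data)
male_hours = [4, 8, 10, 10, 10.5, 12, 20, 20, 20, 20, 21, 22, 30, 30, 42]
intervals = [(0, 7), (8, 14), (15, 21), (22, 28), (29, 35), (36, 42)]

def frequency_table(values, intervals):
    """Count how many values fall in each class interval and give its midpoint."""
    rows = []
    for low, high in intervals:
        freq = sum(1 for v in values if low <= v <= high)
        midpoint = (low + high) / 2
        rows.append((low, high, freq, midpoint))
    return rows

for low, high, freq, mid in frequency_table(male_hours, intervals):
    print(f"{low} <= t <= {high}: frequency {freq}, midpoint {mid}")
```

With these values the frequencies come out as 1, 5, 5, 1, 2, 1, which add up to the 15 males in the sample.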
Bar charts of results
The modal KS2 maths result for the males in my sample was higher than the modal result for females for this particular year.
The evidence from the sample suggests males, on average, scored higher in their KS2 maths results than females, for this particular year.
In order to support the above statement I will compare the mean, mode, median and range of the KS2 maths results for males and females.
Mean maths results
Mean maths result for females = 4.07
Mean maths result for males = 4.67
Mode maths results
Mode maths result for females = 4
Mode maths result for males = 5
Median maths results
Median maths result for females = 4
Median maths result for males = 5
Range of maths results
Range of maths results for females = 2
Range of maths results for males = 2
I have summarised these results in a table:

Maths results | Mean | Mode | Median | Range
Females | 4.07 | 4 | 4 | 2
Males | 4.67 | 5 | 5 | 2
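These four measures can be recomputed from the KS2 frequency tables. The following is a minimal Python sketch, assuming the frequencies for levels 3 to 6 are as given in the frequency tables; the function name `summarise` is hypothetical.

```python
def summarise(freq):
    """Return (mean, mode, median, range) for a {level: frequency} table."""
    # Expand the table back into a sorted list of individual results
    values = sorted(level for level, n in freq.items() for _ in range(n))
    total = len(values)
    mean = sum(values) / total
    mode = max(freq, key=freq.get)  # level with the highest frequency
    mid = total // 2
    median = values[mid] if total % 2 else (values[mid - 1] + values[mid]) / 2
    return round(mean, 2), mode, median, values[-1] - values[0]

females = {3: 4, 4: 6, 5: 5, 6: 0}
males = {3: 0, 4: 6, 5: 8, 6: 1}
print(summarise(females))  # (4.07, 4, 4, 2)  since 61 / 15 = 4.0666...
print(summarise(males))    # (4.67, 5, 5, 2)  since 70 / 15 = 4.6666...
```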
Stem and leaf diagrams

Year 9 Females
Stem | Leaf | Frequency
0 | 1, 6 | 2
1 | 0, 4, 4, 6, 7, 7, 8 | 7
2 | 1, 1, 2, 4, 4 | 5
3 | 9 | 1
4 | |

Year 9 Males
Stem | Leaf | Frequency
0 | 4, 8 | 2
1 | 0, 0, 0.5, 2 | 4
2 | 0, 0, 0, 0, 1, 2 | 6
3 | 0, 0 | 2
4 | 2 | 1
Averages

Hours of TV watched (hrs) | Mean | Modal class interval | Median | Range
Females | 18 | 10-20 | 17 | 38
Males | 19 | 20-30 | 20 | 38
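The averages for hours of TV watched can be checked with Python's `statistics` module. This is a sketch only, assuming the raw values are the 15 female and 15 male entries of the stem and leaf diagrams; the helper `modal_stem` is hypothetical and groups values by stem (width 10).

```python
from statistics import mean, median

# Hours of TV per week, as read from the stem and leaf diagrams (assumed raw data)
female_hours = [1, 6, 10, 14, 14, 16, 17, 17, 18, 21, 21, 22, 24, 24, 39]
male_hours = [4, 8, 10, 10, 10.5, 12, 20, 20, 20, 20, 21, 22, 30, 30, 42]

def modal_stem(values, width=10):
    """Return the (low, high) class of the given width that holds the most values."""
    counts = {}
    for v in values:
        low = int(v // width) * width
        counts[low] = counts.get(low, 0) + 1
    low = max(counts, key=counts.get)
    return (low, low + width)

for name, hours in (("Females", female_hours), ("Males", male_hours)):
    print(name, round(mean(hours)), modal_stem(hours),
          median(hours), max(hours) - min(hours))
```

The means are rounded to the nearest hour here, so a mean of 17.6 is reported as 18.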
From the Year 9 sample, the mean, modal class interval and median were higher for males than for females. The difference between males and females in these measures was small for my Year 9 sample.
The modal class interval shows that males typically watched more hours of TV yet scored higher grades in their maths results. The range for both males and females for the hours of TV watched was the same. This refutes my original hypothesis for my sample of students from Year 9.
My frequency polygon shows that males watched more TV yet scored higher than females in the same year group. However, there are points on the graph that are common to both genders. In particular, I would point out that the same number of students from either gender watched 18 hours of TV to achieve the same results.
Scatter diagram for males and females:
From my results I have drawn scatter diagrams for males and females separately, SD1 and SD2 respectively, for my selected year group, Year 9. I found it difficult to find any correlation and hence I decided to draw them collectively to see if I could establish any correlation.
Mixed population scatter diagrams:
Even at this stage, I find it difficult to establish any strong correlation. However, I have tried to draw a line of best fit as shown on the graph, SD3. The line drawn shows a positive correlation with a small gradient, indicating only a weak relationship between the hours of TV watched and the results obtained.
The positive gradient implies positive correlation. This could be because I am only analysing a small group from a particular year and hence the results could be slightly biased.
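A line of best fit drawn by eye can be compared with a calculated one. The Python sketch below uses the least squares method, which is an assumption and not necessarily how the line on SD3 was drawn. The paired values are made-up placeholders for illustration only, because the paired sample data is not listed in this project.

```python
def best_fit(xs, ys):
    """Least squares line y = gradient * x + intercept, plus the correlation r."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x
    r = sxy / (sxx * syy) ** 0.5  # between -1 and 1; same sign as the gradient
    return gradient, intercept, r

# Placeholder pairs (hours of TV, KS2 level): NOT the real sample data
hours = [4, 10, 12, 20, 21, 30]
levels = [4, 4, 5, 5, 4, 5]
gradient, intercept, r = best_fit(hours, levels)
print(f"gradient {gradient:.3f}, intercept {intercept:.2f}, r {r:.2f}")
```

A value of r close to 0 would indicate a weak correlation, whatever the sign of the gradient.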
Cumulative frequency graphs can be used to compare different sets of data. It is easier to compare the results if the graphs are drawn on the same axes.
The cumulative frequency axis can be divided into 100 percentage points. The halfway point gives the median, the upper quartile is at the 75% point and the lower quartile is at the 25% point. The quartiles are particularly useful for finding the central 50% of the distribution. The spread of this central 50% is known as the interquartile range.
Interquartile range = upper quartile - lower quartile
Interquartile range is an important measure of spread. It shows how widely the data is spread. If the interquartile range is small, the middle half of the distribution is bunched together.
Using my frequency distribution tables I will now draw up cumulative frequency tables for hours of TV watched and KS2 maths result, as shown below:
Cumulative frequency

Hours of TV watched (up to and including) | Females | Males | Mixed
7 | 2 | 1 | 3
14 | 6 | 6 | 12
21 | 11 | 11 | 22
28 | 14 | 12 | 26
35 | 14 | 14 | 28
42 | 15 | 15 | 30

KS2 maths results (up to and including) | Females | Males | Mixed
3 | 4 | 0 | 4
4 | 10 | 6 | 16
5 | 15 | 14 | 29
6 | 15 | 15 | 30
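The cumulative frequencies are running totals of the frequency tables, which can be sketched in Python with `itertools.accumulate`. The frequencies are assumed to be those in the Year 9 frequency tables; the variable names are hypothetical.

```python
from itertools import accumulate

# Hours of TV: frequencies for the intervals ending at 7, 14, 21, 28, 35, 42
female_tv = [2, 4, 5, 3, 0, 1]
male_tv = [1, 5, 5, 1, 2, 1]
# KS2 maths results: frequencies for levels 3, 4, 5, 6
female_ks2 = [4, 6, 5, 0]
male_ks2 = [0, 6, 8, 1]

def cumulative(females, males):
    """Running totals for females, males and both genders combined."""
    f_cum = list(accumulate(females))
    m_cum = list(accumulate(males))
    mixed = [f + m for f, m in zip(f_cum, m_cum)]
    return f_cum, m_cum, mixed

print(cumulative(female_tv, male_tv))
# ([2, 6, 11, 14, 14, 15], [1, 6, 11, 12, 14, 15], [3, 12, 22, 26, 28, 30])
print(cumulative(female_ks2, male_ks2))
# ([4, 10, 15, 15], [0, 6, 14, 15], [4, 16, 29, 30])
```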
Hours of TV watched | Median | Lower Quartile | Upper Quartile | Interquartile Range
Mixed | 16 | 11 | 22 | 11
Males | 15.5 | 11 | 22 | 11
Females | 16.25 | 10 | 21.5 | 11.5

KS2 maths results | Median | Lower Quartile | Upper Quartile | Interquartile Range
Mixed | 3 | 3 | 4 | 1
Males | 4 | 3 | 4 | 1
Females | 3 | 3 | 4 | 1
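Quartiles can also be estimated by calculation instead of reading them from the graph. The Python sketch below assumes the cumulative frequency points for the mixed sample are joined by straight lines starting at (0, 0), so the results will only approximate values read by eye from a hand-drawn curve. The function name `value_at` is hypothetical.

```python
def value_at(target, bounds, cum):
    """Hours at which the cumulative frequency polygon reaches `target`."""
    prev_x, prev_c = 0, 0  # the polygon is assumed to start at (0, 0)
    for x, c in zip(bounds, cum):
        if c >= target:
            # Linear interpolation between the two neighbouring points
            return prev_x + (x - prev_x) * (target - prev_c) / (c - prev_c)
        prev_x, prev_c = x, c
    raise ValueError("target is larger than the total frequency")

bounds = [7, 14, 21, 28, 35, 42]      # upper ends of the class intervals
mixed_cum = [3, 12, 22, 26, 28, 30]   # mixed cumulative frequencies
total = mixed_cum[-1]

lower_quartile = value_at(0.25 * total, bounds, mixed_cum)
median = value_at(0.50 * total, bounds, mixed_cum)
upper_quartile = value_at(0.75 * total, bounds, mixed_cum)
# Interquartile range = upper quartile - lower quartile
print(lower_quartile, median, upper_quartile, upper_quartile - lower_quartile)
```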
From my cumulative frequency curves (GRAPH1) for the number of hours of TV watched, the values for the median and interquartile range for all three curves are very close to each other, i.e. the three distributions differ very little.
The cumulative frequency curves for hours of TV watched (GRAPH1), for males and females, have a number of common points. Between these points, the same percentage of each gender watched that range of hours of TV.
Although the number of hours of TV watched by males is only marginally higher than that by females, the percentage of males watching 21 to 28 hours of TV is:
(12 − 11) ÷ 15 × 100 = 6.7%
and females is:
(14 − 11) ÷ 15 × 100 = 20%
From my cumulative frequency graph GRAPH2 I can calculate the percentage of boys achieving above level 4, i.e.
(15 − 6.25) ÷ 15 × 100 = 58.3%
whereas
(15 − 10) ÷ 15 × 100 = 33.3%
of females achieved above level 4.
Only
(15 − 14) ÷ 15 × 100 = 6.7%
of the males achieved a level 6.
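Each of these percentages is the difference between two cumulative frequencies divided by the group size of 15. A minimal Python sketch, where the function name `percent_between` is hypothetical and the cumulative values are the ones used in the calculations above:

```python
def percent_between(cum_low, cum_high, total):
    """Percentage of the group lying between two cumulative frequencies."""
    return (cum_high - cum_low) / total * 100

print(round(percent_between(11, 12, 15), 1))    # males watching 21 to 28 hours: 6.7
print(round(percent_between(11, 14, 15), 1))    # females watching 21 to 28 hours: 20.0
print(round(percent_between(6.25, 15, 15), 1))  # males, using the reading 6.25: 58.3
print(round(percent_between(10, 15, 15), 1))    # females, using the value 10: 33.3
print(round(percent_between(14, 15, 15), 1))    # males achieving level 6: 6.7
```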
Review
I started with a hypothesis stating that the more time spent watching TV, the less successful students would be in their KS2 maths results. Though this is a logical line of enquiry, my analysis of the selected data refutes this hypothesis.
It was difficult to establish any strong correlation from the scatter diagrams. They show a positive correlation, but the gradient of the line of best fit is small, indicating a weak positive correlation. This could be because my data is secondary and my sample is small and drawn from a single year group. Hence, my results may be slightly biased.
Because my analysis of my originally selected group, Year 9, refuted my hypothesis, I extended my project and drew a scatter diagram for a random sample of 60 students from the whole school.
Apart from a couple of results at the top of the level 5 column, which are also quite dispersed, I think I have a negative correlation for the overall data. The gradient in this case is more distinctly negative than the gradient for the first set of data was positive. The two are also opposite to each other, i.e. the gradient of the line of best fit for Year 9 was positive, while the gradient for the overall group is negative, which supports my hypothesis.