I then drew stem-and-leaf diagrams for each column so that I could order the data, get a rough idea of the skew, and find the median, the upper and lower quartiles and the mode:
[Stem-and-leaf diagrams: TV, IQ and Weight]
From the stem-and-leaf diagrams I can see that both TV and weight seem to be positively skewed, which may suggest a correlation because they have roughly the same shape of distribution. I can also see that in both the weight and TV data there is a single value that is a lot bigger than the rest (an outlier). There is also an outlier in the IQ data, but it is not as far from the rest as in the other two. The IQ data seems to be negatively skewed, but only slightly. Its shape also seems to be closer to that of the TV data, which may mean that TV and IQ have a stronger correlation than TV and weight, which is the opposite of what I hypothesized.
Using the stem-and-leaf diagrams I found the median, the mode, the upper quartile and the lower quartile of each set of data:
I then found the inter-quartile range by subtracting the lower quartile from the upper quartile:
TV: 20-10=10
IQ: 105-98.5=6.5
Weight: 53-45=8
Using this information, I then drew a box plot for each set of data to get a clear view of how the data was spread:
See graph paper 1 attached.
From the box plots I can see that the IQ data is positively skewed. It also has quite a small range compared to the other two data groups. From its box plot the weight data could be either positively or negatively skewed, and it has a very large range, the biggest of the three. The TV data is almost symmetrical, but it has a very big range even though most of the data is otherwise spread quite evenly. The TV and weight box plots seem to be more similar in shape than the IQ and TV box plots. The range is so large for both TV and weight because of outliers. I then identified the outliers by finding all values more than 1.5 times the IQR below the lower quartile or above the upper quartile:
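TV: 1.5 x IQR = 1.5 x 10 = 15. Lower limit: LQ - 15 = 10 - 15 = -5, so there are no low outliers. Upper limit: UQ + 15 = 20 + 15 = 35, so 40, 50 and 90 are outliers.
IQ: 1.5 x IQR = 1.5 x 6.5 = 9.75. Lower limit: LQ - 9.75 = 98.5 - 9.75 = 88.75, so 86 is an outlier. Upper limit: UQ + 9.75 = 105 + 9.75 = 114.75, so 117 and 131 are outliers.
Weight: 1.5 x IQR = 1.5 x 8 = 12. Lower limit: LQ - 12 = 45 - 12 = 33, so there are no low outliers. Upper limit: UQ + 12 = 53 + 12 = 65, so 67, 70, 73, 80 and 140 are outliers.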
From this I can see that weight has the most outliers, so it is the least accurate and the most spread out. This may make it difficult to compare with TV because the outliers will affect the calculations.
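The outlier check above can be sketched in Python. This is only a minimal sketch: the function names `outlier_fences` and `find_outliers` are assumptions made for illustration, the quartiles are the ones quoted in this report, and the value 100 in the usage line is a hypothetical non-outlier added so that the filter has something to reject.

```python
def outlier_fences(lower_quartile, upper_quartile, multiplier=1.5):
    """Return (lower_limit, upper_limit) using the 1.5 x IQR rule."""
    iqr = upper_quartile - lower_quartile
    margin = multiplier * iqr
    return lower_quartile - margin, upper_quartile + margin


def find_outliers(values, lower_quartile, upper_quartile):
    """Return the values that fall outside the 1.5 x IQR limits."""
    low, high = outlier_fences(lower_quartile, upper_quartile)
    return [v for v in values if v < low or v > high]


# (lower quartile, upper quartile) for each data set, as quoted in the report.
quartiles = {
    "TV": (10, 20),
    "IQ": (98.5, 105),
    "Weight": (45, 53),
}

fences = {name: outlier_fences(lq, uq) for name, (lq, uq) in quartiles.items()}

for name, (low, high) in fences.items():
    print(f"{name}: values below {low} or above {high} are outliers")

# Illustrative check on a few IQ values (100 is a hypothetical ordinary value).
print(find_outliers([86, 100, 117, 131], 98.5, 105))
```

Applying `find_outliers` to the full list of 40 values for each column would give the complete list of outliers in one step.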
I then drew histograms for each set of data to see the shape of each distribution clearly. To draw the histograms I needed to find the frequency density, which is the frequency divided by the class width. I chose classes with equal widths of 10, except for the last class in each set, because of the outliers:
TV:
IQ:
Weight:
Using the information from these tables I then drew up the histograms:
See graph paper 2 and 3 attached.
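The frequency density calculation used for the histograms can be sketched as follows. The class boundaries and frequencies below are hypothetical placeholders, not the values from the tables in this report; they only show how a wider final class gets a much lower density. The function name `frequency_density` is also an assumption made for illustration.

```python
def frequency_density(frequency, class_width):
    """Frequency density = frequency divided by class width."""
    return frequency / class_width


# HYPOTHETICAL classes: (lower bound, upper bound, frequency).
# The first three classes have width 10; the last is wider to hold outliers.
# The frequencies add up to 40 only to match the sample size in the report.
classes = [
    (0, 10, 8),
    (10, 20, 15),
    (20, 30, 10),
    (30, 100, 7),
]

densities = [frequency_density(freq, upper - lower) for lower, upper, freq in classes]

for (lower, upper, freq), density in zip(classes, densities):
    print(f"{lower}-{upper}: frequency {freq}, frequency density {density}")
```

On a histogram the bar heights are the frequency densities, so the area of each bar is proportional to the frequency of its class.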
From the histograms I can see that it seems like the TV and weight data are both positively skewed and the IQ data seems to be negatively skewed, which is what I could see from the stem-and-leaf diagrams. This might suggest that there is a link between TV and weight. The IQ and TV histograms are closer to the same shape though so this might also mean there could be a link between them.
I then calculated the skew of each data group to check whether what I had observed was correct and whether the box plots gave an accurate representation. I did this using the formula 3 x (mean - median) / standard deviation:
TV: 3(18.7-15)/14.78=0.751
IQ: 3(102.05-100.5)/7.65=0.608
Weight: 3(53.53-50)/16.44=0.644
This tells me that all of the data groups are quite strongly positively skewed. The TV data has the strongest positive skew and the IQ data has the weakest. The TV skew is also quite far from the other two, so this may mean that there is not as much of a connection between either TV and weight or TV and IQ as I thought there might be.
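The skew calculation above can be reproduced with a short Python sketch. The function name `median_skewness` is an assumption made for illustration; the means, medians and standard deviations are the ones used in the three calculations above.

```python
def median_skewness(mean, median, standard_deviation):
    """Skew measured as 3 x (mean - median) / standard deviation."""
    return 3 * (mean - median) / standard_deviation


# (mean, median, standard deviation) as used in the report's calculations.
summary = {
    "TV": (18.7, 15, 14.78),
    "IQ": (102.05, 100.5, 7.65),
    "Weight": (53.53, 50, 16.44),
}

skew = {
    name: round(median_skewness(mean, median, sd), 3)
    for name, (mean, median, sd) in summary.items()
}

for name, value in skew.items():
    print(f"{name}: skew = {value}")
```

A positive result means the mean is above the median (positive skew), and a negative result would mean the mean is below the median (negative skew).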
I found the mean of each set of data by adding up all the values and dividing by the total number of values, so that I could work out the standard deviation:
TV: 748/40=18.7
IQ: 4082/40=102.05
Weight: 2141/40=53.525
Then I calculated the standard deviation to find out how far the data was spread on average from the mean:
Using the formula √(Σx²/n - x̄²), where x̄ is the mean:
TV: √(22730/40 - 18.7²) = 14.78
IQ: √(418908/40 - 102.05²) = 7.65
Weight: √(125403/40 - 53.525²) = 16.44
From this I can see that the weight data is the most spread out and the IQ data is the least spread out. The TV data is only slightly less spread out than the weight data. The more spread out the data is, the less reliable it is. The weight and TV standard deviations are much closer to each other than either is to the IQ standard deviation, which may mean that there is a link between TV and weight, which is what I hypothesized.
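The standard deviation calculation can be checked with a short Python sketch. It assumes the formula is the square root of (sum of squares divided by n) minus the mean squared, which is the version that divides by n rather than n - 1. The function name `standard_deviation` is an assumption made for illustration; the sums of squares, the means and n = 40 are taken from this report.

```python
import math


def standard_deviation(sum_of_squares, n, mean):
    """Standard deviation as sqrt(sum of x squared / n - mean squared)."""
    return math.sqrt(sum_of_squares / n - mean ** 2)


n = 40  # number of values in each data set

# (sum of x squared, mean) for each data set, as quoted in the report.
totals = {
    "TV": (22730, 18.7),
    "IQ": (418908, 102.05),
    "Weight": (125403, 53.525),
}

sd = {
    name: round(standard_deviation(sum_sq, n, mean), 2)
    for name, (sum_sq, mean) in totals.items()
}

for name, value in sd.items():
    print(f"{name}: standard deviation = {value}")
```

The results are 14.78 for TV, 7.65 for IQ and 16.44 for weight.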
I then created a table showing all these results and my previous results about the median, mode, LQ, UQ and IQR for easy reference and drawing of box plots:
I then drew scatter graphs to see if there was any correlation between the data:
See graph paper 3 and 4
From the scatter graphs I can see that the sets of data I am comparing seem to have almost no correlation at all. The points for both TV and weight and TV and IQ seem to be placed almost randomly. There seems to be a very slight positive correlation between TV and weight, and a very slight negative correlation between TV and IQ, which is what I predicted, though both are very weak.
I then calculated Spearman’s rank correlation coefficient to check the correlation that I had found on the scatter graphs:
Using the formula 1 - 6Σd²/(n(n²-1)):
TV and weight: 1-(6x10360.5)/(40(40²-1))=0.028
From this I can see that there is almost no correlation between the two sets of data. There is only an extremely weak positive correlation, which means that the higher the weight, the more TV is watched, but this is still a very weak correlation.
TV and IQ: 1-(6x11495.5)/(40(40²-1))= -0.078
From this I can see that there is almost no correlation at all between the two sets of data. There is an extremely weak negative correlation. Although this correlation is slightly stronger than the correlation between TV and weight, it is still very weak. It is a negative correlation between IQ and TV, which means that the higher the IQ, the less TV is watched, which is what I hypothesized.
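Both Spearman's rank calculations can be reproduced with a short Python sketch. The function name `spearman_from_sum_d_squared` is an assumption made for illustration; the two values of Σd² and n = 40 are taken from the calculations above. This form of the formula is only exact when there are no tied ranks, so the results should be read as approximate if any values were tied.

```python
def spearman_from_sum_d_squared(sum_d_squared, n):
    """Spearman's rank coefficient: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))


n = 40  # number of paired values

# Sum of squared rank differences for each comparison, as quoted in the report.
tv_weight = round(spearman_from_sum_d_squared(10360.5, n), 3)
tv_iq = round(spearman_from_sum_d_squared(11495.5, n), 3)

print(f"TV and weight: {tv_weight}")
print(f"TV and IQ: {tv_iq}")
```

The coefficient always lies between -1 and 1, and values this close to 0 indicate almost no correlation in either direction.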
Conclusion
From my results I have found that there is only a very slight positive correlation (0.028) between weight and the amount of TV watched. I have also found a very slight negative correlation (-0.078) between IQ and amount of TV watched. My original hypothesis that the more TV people watch the more they weigh is not proven by the data I have collected because I did not find any strong links between the two sets of data. My other hypothesis that the higher someone’s IQ, the less TV they watch was also not proven by my results because again there was no strong evidence of any link between the two sets of data.
There are many other factors which could affect my results. For instance, a person with a high IQ who watches a lot of TV might be watching informative programmes, so the results depend on what type of TV is watched as well as the hours spent watching. Also, someone may watch a lot of TV but also play a lot of sport, which might help them to stay fitter and weigh less, so the data is very inconclusive. I might have been able to get more conclusive results if I had used a larger sample, but because I am working alone with a limited amount of time I could only use a fairly small sample. The data I used was also secondary data which I did not collect myself. It may have been inaccurate or even false, because there were some ridiculous quantities within the sample that might not have been correct. To get more accurate results I would collect the data myself, although this would take a long time if I wanted to collect a large enough sample to make the investigation worthwhile.
I think it would be interesting to investigate the link between IQ and Key Stage 3 results and the link between amount of TV watched and Key Stage 3 results, to see whether TV helps pupils to get better marks or stops them. I think it would also be interesting to investigate the link between weight and height, because I think there should be quite a strong positive correlation between the two.