Question order and bias –
The order in which questions are asked can influence a respondent’s reply.
Contrast:
- Do you intend to be an organ donor?
- Did you know that dozens of people die each year because there are not enough organ donors?
with:
- Did you know that dozens of people die each year because there are not enough organ donors?
- Do you intend to be an organ donor?
I will avoid being biased. These are the things I will need to take into consideration:
a) lack of a good sampling frame:
- Using the telephone directory misses all those who do not have a telephone or whose number is ex-directory.
b) non-response by some of the chosen units:
- The enquiry may not have been understood; for example, a questionnaire may be badly designed. Questionnaires should be clear, specific, unambiguous and easily understood. Questions in opinion surveys should be worded neutrally, to avoid bias caused by pointing towards a particular response.
c) bias introduced by the person conducting the survey:
- The interviewer may avoid asking questions of those who do not appear to be co-operative,
- The interviewer’s style of questioning may influence the response.
To begin with I will devise a questionnaire for Dunraven students and run a pilot to check that my questions are understandable and will give me the necessary information. Before using the questionnaire it is essential to make sure that it ‘works’. Are there any ambiguous questions? Are there closed questions that cause trouble because a possibility has been overlooked? The pilot study uses the entire questionnaire with a small number of people, who need not be chosen in any scientific way. The aim is simply to find and overcome any difficulties before the real questionnaire is used.
Questionnaire:
- What is your gender?
Male Female
- What is your natural hair colour?
Black Brown Blonde Ginger Other (please state)…….….
- What year group are you in?
Seven Eight Nine Ten Eleven
- What is your English key stage 2 result?
0 1 2 3 4 5 6
- What is your Science key stage 2 result?
0 1 2 3 4 5 6
- What is your Maths key stage 2 result?
0 1 2 3 4 5 6
- What is your IQ?
………………………………
- Do you watch television?
Yes No
- On average how many hours of T.V do you watch in a week?
…………………………………
- What is your height in metres?
Height h (m)        Frequency
1.00 < h < 1.10     3
1.10 < h < 1.20     0
1.20 < h < 1.30     12
1.30 < h < 1.40     20
1.40 < h < 1.50     126
1.50 < h < 1.60     297
1.60 < h < 1.70     228
1.70 < h < 1.80     98
1.80 < h < 1.90     10
1.90 < h < 2.00     0
This is a very good example of a cumulative frequency graph of frequency against height. I have used the results obtained from the questionnaire I designed. The results seem to be accurate, and I will use the data to investigate the hypothesis. When I assessed the pilot questionnaires I made no alterations, because the questions were answered as I had forecast. The final questionnaire was therefore given out to 70 pupils.
P.T.O for collected data.
To make my results table more convenient, I have added an extra column to the data: the mean of the key stage 2 results. I did this by adding each person’s Maths, English and Science results and dividing the total by 3. Excel allowed me to do this much more quickly by entering a formula.
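The extra column described above can be sketched in Python as an alternative to the Excel formula. This is only an illustration: the function name and the example pupil are hypothetical and are not taken from the collected data.

```python
def mean_ks2_level(english, science, maths):
    """Return the mean of one pupil's three key stage 2 levels."""
    return (english + science + maths) / 3


# Hypothetical pupil with levels 4, 5 and 3 (not from the real results table).
print(mean_ks2_level(4, 5, 3))  # 4.0
```

Applying the same function to every row of the results table would reproduce the added column.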
Tally Charts and frequency distributions
The data are discrete since all the results are whole numbers. The raw results are easy to read off, but it is not easy to see which results are most common. A tally chart provides a simple summary.
The tally chart is constructed on a single ‘pass’ through the data. For each score a vertical stroke is entered on the appropriate row, with a diagonal stroke being used to complete each group of five strokes. This is much easier than going through the data counting the number of occurrences of one English score and then repeating this for each individual score.
- Counting the tallies is made easy by using the ‘five-bar gates’.
- If the tallies are equally spaced then the chart provides a useful graphical representation of the data.
The tally count for each outcome is called the frequency of that outcome. The set of outcomes with their corresponding frequencies is called a frequency distribution, which can be displayed in a frequency table.
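A tally chart and its frequency distribution can be sketched in a few lines of Python. The list of levels below is made up for illustration, and the text rendering of a five-bar gate as `||||/` is an assumption of this sketch, not part of the original charts.

```python
from collections import Counter


def frequency_distribution(scores):
    """Count every outcome in a single pass and return {outcome: frequency}."""
    return dict(sorted(Counter(scores).items()))


def tally_marks(count):
    """Draw a count as five-bar gates, e.g. 7 becomes '||||/ ||'."""
    gates, rest = divmod(count, 5)
    groups = ["||||/"] * gates
    if rest:
        groups.append("|" * rest)
    return " ".join(groups)


# Hypothetical English levels for twelve pupils (illustration only).
levels = [4, 3, 4, 5, 4, 3, 2, 4, 5, 4, 4, 6]
for level, frequency in frequency_distribution(levels).items():
    print(level, tally_marks(frequency), frequency)
```

Each printed row shows an outcome, its tally and its frequency, which is the layout of a frequency table.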
Table of results
To present this data, I have used a bar chart. I found that it is easy to compare the data this way; here we can see that there is a great difference between the number of people who got a level 4 (an average level) in English and Maths and the number who got a level 3. More people got a level 4 in English than in Maths; however, more people got a level 3 in Maths than in English. About the same number of people got a level 2 or a level 6.
We can see that there seems to be more of a relationship between Science and English than between Maths and English. In the comparison of English and Maths, the bars are similar at every SATs level apart from level 3. This partially supports my hypothesis so far.
The SATs levels are more evenly distributed here. There is not much difference between the Science and Maths bars, which further supports my hypothesis: people who are good at Maths are more likely to be good at Science as well, rather than at English.
Does the average number of hours of TV watched per week affect a person’s weight?
Is an increase in the amount of television watched by pupils at Dunraven reflected in their weight? For example, does the heaviest person in my results watch the most TV? What are the most common numbers of hours of TV watched, and are the weights of those pupils concentrated around the same value?
To get my data and graphs I am going to sort the results by the number of hours of TV watched, in decreasing order, so that I can then draw my graphs. I am going to keep each person’s gender in my data selection so that I can use it to make a more precise and accurate reading of my results in my conclusion and analysis of data. The graphs that I am going to draw are:
· Scatter Diagram to show the relationship.
The results from this table and graph show me that there is no real relationship of the kind predicted in my hypothesis, in which the heaviest person watches the most television. This is shown by the fact that the heaviest person watches only 20 hours of TV and the person who watches the most TV weighs only 58 kg. Because it is very hard to draw an accurate trend line by eye, I will calculate where to draw a regression line.
Linear correlation and regression lines: if all the points in the scatter diagram seem to lie near a straight line, we say that there is linear correlation between x and y. The line itself is called the regression line, and I found it difficult to estimate its position accurately.
Drawing a regression line ‘by eye’
The mean point is $(\bar{x}, \bar{y})$, where $\bar{x} = \frac{\sum x}{n}$ and $\bar{y} = \frac{\sum y}{n}$.
I will then draw a line of best fit, ensuring that it passes through $(\bar{x}, \bar{y})$.
$\bar{x} = 1819/100 = 18.19$
$\bar{y} = 5445/100 = 54.45$
Therefore, $(\bar{x}, \bar{y}) \approx (18.2, 54.5)$
(Look above at graph for the drawn regression line)
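The mean point used above can be checked with a short Python sketch. The second function fits a least-squares line, which is a swapped-in technique rather than the ‘by eye’ method used here; it is included because a least-squares line always passes through the mean point. The five data pairs are hypothetical, as the full data set is not reproduced in this section.

```python
def mean_point(total_x, total_y, n):
    """Return (x-bar, y-bar) from the column totals and the sample size."""
    return total_x / n, total_y / n


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares regression line of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar  # forces the line through the mean point
    return slope, intercept


# Totals quoted in the text: sum of x = 1819, sum of y = 5445, n = 100.
print(mean_point(1819, 5445, 100))

# Hypothetical hours of TV and weights in kg (illustration only).
hours = [5, 10, 15, 20, 25]
weights = [50, 52, 55, 53, 60]
print(least_squares_line(hours, weights))
```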
However, if I look at my scatter diagram I can see that, ignoring the anomaly, there does seem to be some weak negative correlation. Looking at the scatter diagram more closely, I can see that this weak correlation does not come from the heaviest person watching the most TV; rather, people who watch less TV tend to be of average weight. This seems to apply up to about 20 hours of TV a week, above which the results spread up and down. I think this signifies that when the hours of TV increase, some people’s weight moves up or down from the original concentrated area (average weight).
There was one anomaly in my data: a male who watched 100 hours of TV a week and weighed 58 kg. I think this is an anomaly because it is practically impossible to watch this much TV in a week when the person has to attend school and complete homework in the evening. Also, the next person down in my sample watches only 48 hours of TV, so there is a gap of 100 − 48 = 52 hours.
The conclusion of my analysis of the data is that my hypothesis is not supported. However, from this research and these graphs I think it is evident that the less TV a person watches, the more likely that person is to be of average weight.
To take this matter further, I will now see whether I get similar results when testing the same hypotheses with secondary data, taken from the Mayfield High School Project.
To collect the data I chose to first use stratified sampling to find out how many people from each year group I should use to get a fair representation. I wanted a sample of 40 pupils, as this is a large enough group to analyse but is not too big to manage. Then I used random sampling by using the random number button on my calculator to pick the correct number of pupils from each year level.
To analyse the data I used a number of methods:
Analysis
Sampling:
To choose my sample I have decided to use stratified sampling because this gives a fairer representation of the population. To do this I will divide the number of pupils in the year level by the total number of pupils, and then multiply by the number of people I want in my sample, 40, because this is a large enough sample without being too big to handle. This is secondary data, because I did not collect the information myself; I am using it because it was an easily accessible data source.
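There were 1183 pupils altogether.
Yr7: 282/1183 × 40 = 9.5, taken as 9
Yr8: 270/1183 × 40 = 9.1, taken as 9
Yr9: 261/1183 × 40 = 8.8, taken as 9
Yr10: 200/1183 × 40 = 6.8, taken as 7
Yr11: 170/1183 × 40 = 5.7, taken as 6
Total: 40
Yr7 is taken as 9 rather than 10 so that the sample totals exactly 40.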
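The allocation above can be sketched in Python. Simple rounding of each share would give Yr7 ten pupils and a total of 41, so this sketch assumes a largest-remainder rule: round every share down, then give the spare places to the year groups with the largest fractional parts. That rule is an assumption of this example, not a method stated in the write-up, although it reproduces the same figures.

```python
from math import floor


def stratified_allocation(strata_sizes, sample_size):
    """Share sample_size across strata in proportion to their sizes."""
    total = sum(strata_sizes.values())
    exact = {name: size * sample_size / total for name, size in strata_sizes.items()}
    allocation = {name: floor(share) for name, share in exact.items()}
    spare = sample_size - sum(allocation.values())
    # Hand the remaining places to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda name: exact[name] - allocation[name], reverse=True)
    for name in by_remainder[:spare]:
        allocation[name] += 1
    return allocation


year_sizes = {"Yr7": 282, "Yr8": 270, "Yr9": 261, "Yr10": 200, "Yr11": 170}
print(stratified_allocation(year_sizes, 40))
# {'Yr7': 9, 'Yr8': 9, 'Yr9': 9, 'Yr10': 7, 'Yr11': 6}
```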
Now that I knew how many pupils I needed from each year level, I used random sampling to pick the correct number of pupils from each year level. I did this by using the random number button on my calculator.
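The calculator step can be imitated in Python with the standard `random` module. This is a sketch under the assumption that the pupils in a year group are numbered 1 to N on the roll; the function name and the optional seed are illustrative only.

```python
import random


def pick_pupils(year_roll, how_many, seed=None):
    """Pick how_many distinct pupil numbers from 1..year_roll at random."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return sorted(rng.sample(range(1, year_roll + 1), how_many))


# Nine pupils from the 282 in Year 7; the numbers differ on every run
# unless a seed is supplied.
print(pick_pupils(282, 9))
```

Sampling without replacement means no pupil can be picked twice, which a calculator's random number button does not guarantee on its own.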
I then drew stem-and-leaf diagrams for each column so that I could order the data and I could get a rough idea of the skew and so that I could find the median, the upper quartiles, the lower quartiles and the mode:
[Stem-and-leaf diagrams: TV, IQ and Weight]
From the stem-and-leaf diagrams I can see that both TV and weight seem to be positively skewed which may suggest a correlation because they have roughly the same shape of distribution. Also I can see that in both the weight and TV data there is one single data value that is quite a lot bigger than the rest (outliers) and there is also an outlier in the IQ data but it is not as far away as in the other two. From the stem-and-leaf diagrams I can also see that the IQ data seems to be negatively skewed, but only slightly. And it also seems to be closer to the same shape as the TV data which may mean that TV and IQ have a stronger correlation which is the opposite of what I hypothesized.
Using the stem-and-leaf diagrams I found the median, the mode, the upper quartile and the lower quartile of each set of data:
I then found the inter-quartile range by subtracting the lower quartile from the upper quartile:
TV: 20-10 =10
IQ: 105-98.5 =6.5
Weight: 53-45=8
I then drew up a box plot for each set of data to get a clear view of how the data was spread using this information:
See graph paper 1 attached.
From the box plots I can see that the IQ data is positively skewed. It also has quite a small range compared to the other two data groups. The weight data could be either positively skewed or negatively skewed from the box plot and it has a very large range, the biggest out of the three. The TV data is almost symmetrical but it has a very big range, even though most of the data is otherwise spread quite evenly. The TV and weight seem to be of a more similar shape than the IQ and TV. The range is so large for both TV and weight because of outliers. I then identified the outliers by finding all values more than 1.5 times the IQR below the lower quartile or above the upper quartile:
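TV: 1.5 × IQR = 1.5 × 10 = 15; lower limit = 10 − 15 = −5 (no small outliers); upper limit = 20 + 15 = 35 (outliers: 40, 50 and 90)
IQ: 1.5 × IQR = 1.5 × 6.5 = 9.75; lower limit = 98.5 − 9.75 = 88.75 (outlier: 86); upper limit = 105 + 9.75 = 114.75 (outliers: 117 and 131)
Weight: 1.5 × IQR = 1.5 × 8 = 12; lower limit = 45 − 12 = 33 (no small outliers); upper limit = 53 + 12 = 65 (outliers: 67, 70, 73, 80 and 140)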
From this I can see that weight has the most outliers so it is the least accurate and the most spread out. This may make it difficult to compare it with TV because the outliers will affect the calculations.
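The outlier limits above follow the same rule for each data set, which can be sketched as two small Python functions. The function names are illustrative, and the value 100 in the example list is a made-up ordinary value added alongside the IQ outliers quoted in the text.

```python
def outlier_limits(lower_quartile, upper_quartile):
    """Return (lower limit, upper limit) using the 1.5 x IQR rule."""
    step = 1.5 * (upper_quartile - lower_quartile)
    return lower_quartile - step, upper_quartile + step


def find_outliers(values, lower_quartile, upper_quartile):
    """Return the values that fall outside the 1.5 x IQR limits."""
    low, high = outlier_limits(lower_quartile, upper_quartile)
    return [v for v in values if v < low or v > high]


print(outlier_limits(10, 20))      # TV:     (-5.0, 35.0)
print(outlier_limits(98.5, 105))   # IQ:     (88.75, 114.75)
print(outlier_limits(45, 53))      # Weight: (33.0, 65.0)
print(find_outliers([86, 100, 117, 131], 98.5, 105))  # [86, 117, 131]
```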
I then drew up histograms for each set of data to see the shape of the distribution clearly. To draw the histograms I needed to find the frequency density, which is the frequency divided by the class width. I chose classes of equal width, 10, except for the last class in each set, which is wider because of the outliers:
TV:
IQ:
Weight:
Using the information from these tables I then drew up the histograms:
See graph paper 2 and 3 attached.
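The frequency density calculation described above, frequency divided by class width, can be sketched as follows. The class boundaries and frequencies are hypothetical, chosen only to show equal classes of width 10 followed by one wider final class; they are not the values from the tables.

```python
def frequency_densities(class_bounds, frequencies):
    """Return frequency / class width for each (lower, upper) class."""
    return [freq / (upper - lower) for (lower, upper), freq in zip(class_bounds, frequencies)]


# Hypothetical TV classes (hours per week) and frequencies for 40 pupils.
bounds = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 100)]
freqs = [8, 15, 10, 4, 3]
for (lower, upper), density in zip(bounds, frequency_densities(bounds, freqs)):
    print(f"{lower}-{upper}: {density:.2f}")
```

Because the last class is six times wider than the others, its bar is drawn much lower than its raw frequency would suggest, so the area of each bar still represents the frequency.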
From the histograms I can see that it seems like the TV and weight data are both positively skewed and the IQ data seems to be negatively skewed, which is what I could see from the stem-and-leaf diagrams. This might suggest that there is a link between TV and weight. The IQ and TV histograms are closer to the same shape though so this might also mean there could be a link between them.
I then calculated the skew of each data group to check whether what I had observed was correct. I did this by calculating 3(mean − median)/standard deviation, to see if the box plots gave an accurate representation:
TV: 3(18.7-15)/14.78=0.751
IQ: 3(102.05-100.5)/7.65=0.608
Weight: 3(53.53-50)/16.44=0.644
This tells me that all of the data groups are quite positively skewed. It tells me that the TV data group has the strongest positive skew and the IQ data has the weakest, though they are all quite strong. The TV data has the highest skew and it is quite far from the other data sets so this may mean that there is not as much of a connection between either TV and weight or TV and IQ as I thought there may be.
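The skew calculation used above, 3(mean − median)/standard deviation, can be checked with a short Python sketch. The function name is illustrative; the inputs are the means, medians and standard deviations quoted in the text.

```python
def pearson_skew(mean, median, std_dev):
    """Skewness measured as 3(mean - median) / standard deviation."""
    return 3 * (mean - median) / std_dev


print(round(pearson_skew(18.7, 15, 14.78), 3))      # TV:     0.751
print(round(pearson_skew(102.05, 100.5, 7.65), 3))  # IQ:     0.608
print(round(pearson_skew(53.53, 50, 16.44), 3))     # Weight: 0.644
```

A positive value means the mean lies above the median, which is what a long tail of large values produces.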
I then found the mean of each set of data by adding up all the values and dividing by the total number of values, so that I could work out the standard deviation:
TV: 748/40=18.7
IQ: 4082/40=102.05
Weight: 2141/40=53.525
Then I calculated the standard deviation to find out how far the data was spread on average from the mean:
Using the formula $\sigma = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2}$:
TV: $\sqrt{22730/40 - 18.7^2} = 14.78$
IQ: $\sqrt{418908/40 - 102.05^2} = 7.65$
Weight: $\sqrt{125403/40 - 53.525^2} = 16.44$
From this I can see that the weight data is the most spread out and the IQ data is the least spread out. The TV data is only slightly less spread out than the weight data. The more spread out the data is, the less reliable it is. The weight and TV have roughly the same standard deviation, or at least closer than the IQ’s standard deviation, which may mean that there is a link between TV and weight which is what I hypothesized.
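The standard deviation can be recomputed from the sum of squares, the sample size and the mean, which is a useful check on the arithmetic. This sketch assumes the formula is the square root of (sum of x² divided by n, minus the mean squared); the sums of squares are those quoted in the text and the means come from the mean calculation (18.7, 102.05 and 53.525).

```python
from math import sqrt


def std_dev_from_sums(sum_of_squares, n, mean):
    """Population standard deviation from sum(x^2), n and the mean."""
    return sqrt(sum_of_squares / n - mean ** 2)


print(round(std_dev_from_sums(22730, 40, 18.7), 2))     # TV
print(round(std_dev_from_sums(418908, 40, 102.05), 2))  # IQ
print(round(std_dev_from_sums(125403, 40, 53.525), 2))  # Weight
```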
I then created a table showing all these results and my previous results about the median, mode, LQ, UQ and IQR for easy reference and drawing of box plots:
I then drew scatter graphs to see if there was any correlation between the data:
See graph paper 3 and 4
From the scatter graphs I can see that the two sets of data I am comparing seem to have almost no correlation at all. Both TV and weight and TV and IQ seem to be pretty randomly placed. There seems to be a very slight positive correlation between the TV and weight, and a very slight negative correlation to TV and IQ which is what I predicted, though it is very weak.
Then I did Spearman’s Rank to check the correlation that I had found on the scatter graphs:
Using the formula $r_s = 1 - \frac{6\sum d^2}{n(n^2-1)}$:
TV and weight: $r_s = 1 - \frac{6 \times 10360.5}{40(40^2-1)} = 0.028$
From this I can see that there is almost no correlation between the two sets of data. There is only an extremely weak positive correlation, which means that the higher the weight, the more TV is watched, but this is still a very weak correlation.
TV and IQ: $r_s = 1 - \frac{6 \times 11495.5}{40(40^2-1)} = -0.078$
From this I can see that there is almost no correlation at all between the two sets of data. There is an extremely weak negative correlation. Although this correlation is slightly stronger than the correlation between TV and weight it is still very weak. It is a negative correlation between IQ and TV so it means that the higher the IQ the less TV is watched which is what I hypothesized.
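Spearman’s rank coefficient can be sketched in Python in two parts: the formula applied to a known sum of squared rank differences, and a helper that ranks raw data first. The helper gives tied values the average of the ranks they share, which is an assumption of this sketch; the exact ranking used for the coursework data is not shown in this section.

```python
def spearman_from_d_squared(sum_d_squared, n):
    """Spearman's rank coefficient from the sum of squared rank differences."""
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's coefficient for paired data using the d-squared formula."""
    d_squared = sum((a - b) ** 2 for a, b in zip(average_ranks(xs), average_ranks(ys)))
    return spearman_from_d_squared(d_squared, len(xs))


print(round(spearman_from_d_squared(10360.5, 40), 3))  # TV and weight: 0.028
print(round(spearman_from_d_squared(11495.5, 40), 3))  # TV and IQ:    -0.078
```

The d-squared formula is exact only when there are no tied ranks; with ties it is an approximation.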
Conclusion
From my results I have found that there is only a very slight positive correlation (0.028) between weight and the amount of TV watched. I have also found a very slight negative correlation (-0.078) between IQ and amount of TV watched. My original hypothesis that the more TV people watch the more they weigh is not proven by the data I have collected because I did not find any strong links between the two sets of data. My other hypothesis that the higher someone’s IQ, the less TV they watch was also not proven by my results because again there was no strong evidence of any link between the two sets of data.
There are many other factors that affect my results. For instance, a person with a high IQ who watches a lot of TV might be watching informative programmes, so the result depends on the type of TV watched as well as the hours spent watching. Also, someone may watch a lot of TV but also play a lot of sport, which might help them to stay fitter and weigh less, so the data is very inconclusive. I might have got more conclusive results if I had been able to use a larger sample, but because I am working alone with a limited amount of time I could only use a fairly small sample. Also, the data I used was secondary data, which I did not collect myself. The data may have been inaccurate or even false, because there were some ridiculous quantities within the sample that might not have been correct. To get more accurate results I would collect the data myself, although this would take a long time if I wanted a large enough sample to make the investigation worthwhile.
I think it would be interesting to investigate the link between IQ and key stage 3 results and the link between amount of TV watched and key stage 3 results because it would be interesting to see if TV helps pupils to get better marks or stops them. I think it would also be interesting to investigate the link between weight and height because I think there should be quite strong positive correlation between the two.