Statistics Coursework

I am going to use the data from Mayfield High School to investigate the following hypotheses.

Hypothesis

1st Hypothesis - For my first hypothesis I will investigate the relationship between the number of hours of TV watched per week by the pupils and their IQ. I am going to use the columns "IQ" and "Average number of hours TV watched per week" taken from the Mayfield High datasheet. I think that there will be a relationship between them and will attempt to reveal it.

2nd Hypothesis - For my second hypothesis I will investigate the relationship between "Average number of TV hours watched per week" and "weight (kg)". I think that there will not be any major relationship between them, as they will not affect each other greatly.

I will present my analysis and results in graphs and tables, and explain the results using the correlation shown by the graphs and the arrangement of the figures. I will select a number of pupils to base my data on and will use random sampling to ascertain the correct number of male and female pupils needed to make the investigation fair.

Stratified Sampling

I do not want to use all of the data in the database for my analysis, so I will need to take a sample of the pupils in the school.
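As a rough illustration of how a stratified sample can be worked out, the sketch below shares a sample between groups in proportion to their size and then picks pupils at random within each group. The pupil counts, the sample size of 40 and the function names are made-up placeholders, not figures or methods taken from the Mayfield data.

```python
import random

def stratified_allocation(strata_sizes, sample_size):
    """Share sample_size across strata in proportion to each stratum's size."""
    total = sum(strata_sizes.values())
    # Exact (possibly fractional) share for each stratum.
    exact = {name: sample_size * n / total for name, n in strata_sizes.items()}
    # Start from the rounded-down shares.
    alloc = {name: int(share) for name, share in exact.items()}
    # Give any leftover places to the strata with the largest remainders.
    leftover = sample_size - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda name: exact[name] - alloc[name], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc

def stratified_sample(strata, sample_size, seed=None):
    """Randomly pick the allocated number of members from each stratum."""
    rng = random.Random(seed)
    sizes = {name: len(members) for name, members in strata.items()}
    alloc = stratified_allocation(sizes, sample_size)
    return {name: rng.sample(members, alloc[name]) for name, members in strata.items()}

# Hypothetical school: 120 male and 80 female pupils, sample of 40.
pupils = {
    "male": [f"M{i}" for i in range(120)],
    "female": [f"F{i}" for i in range(80)],
}
sample = stratified_sample(pupils, 40, seed=1)
print({name: len(chosen) for name, chosen in sample.items()})
```

With these placeholder counts the males make up 120/200 of the school, so they receive 24 of the 40 places and the females receive 16.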


There is, however, one area where the data is concentrated and the gradient is very steep, between 95 and 105. The TV hours graph is much smoother and the data is less spread. The number of hours increases steadily to a certain point, then the graph goes flat until the end. This means that there is an anomalous result somewhere. I know that there can only be 1 or 2 anomalous results because the point where the graph goes flat is at about 38 and there are only 39 sets of data in the graph. I will now look at the box plots to compare the two cumulative frequency graphs.

Box plots for cumulative frequency graphs of IQ and number of TV hours watched for females:

The box plots for these graphs show me that the IQ data has a much larger range and that it is quite evenly spread. I can see this because the interquartile range is quite large and the median is centrally placed. There may be a few exceptions, as 1 pupil is likely to have a very low IQ, which is why the lowest value is so low. The TV hours data seems to be much more concentrated and the values are generally lower. This suggests that there is no relationship between them, as each set of data is grouped in its own area.
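The values a box plot is drawn from (lowest value, lower quartile, median, upper quartile, highest value) can also be calculated directly, which gives a check on what is read off the cumulative frequency graph. The sketch below is only an illustration: the TV hours are invented, and the 1.5 x interquartile range rule for flagging anomalous results is a common convention, not necessarily the method used in this investigation.

```python
import statistics

def five_number_summary(values):
    """Return the five values a box plot is drawn from."""
    data = sorted(values)
    # "inclusive" treats the data as the whole group being described.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {"min": data[0], "q1": q1, "median": median, "q3": q3, "max": data[-1]}

def flag_anomalies(values, k=1.5):
    """List values lying more than k interquartile ranges outside the quartiles."""
    summary = five_number_summary(values)
    iqr = summary["q3"] - summary["q1"]
    low = summary["q1"] - k * iqr
    high = summary["q3"] + k * iqr
    return [v for v in values if v < low or v > high]

# Invented weekly TV hours, with one suspiciously large value.
tv_hours = [10, 12, 14, 15, 16, 18, 20, 90]
print(five_number_summary(tv_hours))
print(flag_anomalies(tv_hours))
```

A single very large value stretches the range but hardly moves the quartiles, which is why the interquartile range is a safer measure of spread when anomalous results are suspected.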


The cumulative frequency graphs and box plots again proved my hypothesis incorrect: the similarities between the box plots of the two sets of data showed that there was no relationship and explained why the scatter graphs showed a straight line. Both the male and female samples showed that my hypothesis was incorrect. Although some anomalous results created a slight negative correlation in both, it was obvious that the hypothesis was still wrong.

Hypothesis 2: My second hypothesis was proved correct. The scatter graphs showed that there was absolutely no correlation, which means no relationship. Although the male graphs did show a negative correlation, this was shown to be caused by a few anomalous results, first by the cumulative frequency graph and later by the inconsistency with the female sample. The female scatter graph showed a near-horizontal trend line, which was what I needed to prove my hypothesis. The similarities in the cumulative frequency graphs and box plots further proved my hypothesis was correct.

Evaluation

The investigation went quite well. Although my first hypothesis was incorrect, it showed that careful analysis of data is needed before drawing conclusions. When I next do an investigation into data I will use histograms to aid my analysis, as they are useful when looking for relationships between two sets of data, as the cumulative frequency graphs are. I could have made the cumulative frequency graphs a little better, as the program I used did not put a scale on the x axis but only the length of the range.
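One way to put a number on how strong a correlation is, rather than judging it from the scatter graph by eye, is the product-moment correlation coefficient together with the gradient of the least-squares trend line. The sketch below is an assumption about how this could be done, not the method used in this investigation, and the data in it is invented purely to show how one anomalous result can create a negative correlation in otherwise flat data.

```python
from math import sqrt

def _sums(xs, ys):
    """Return the means and the sums of squares and products about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return mean_x, mean_y, sxy, sxx, syy

def pearson_r(xs, ys):
    """Correlation coefficient: +1 and -1 are perfect lines, 0 is no linear trend."""
    _, _, sxy, sxx, syy = _sums(xs, ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable never changes")
    return sxy / sqrt(sxx * syy)

def trend_line(xs, ys):
    """Gradient and intercept of the least-squares line of best fit."""
    mean_x, mean_y, sxy, sxx, _ = _sums(xs, ys)
    if sxx == 0:
        raise ValueError("trend line is undefined when x never changes")
    gradient = sxy / sxx
    return gradient, mean_y - gradient * mean_x

# Invented data: weights that do not depend on TV hours...
tv_hours = [5, 10, 15, 20, 25, 30]
weight_kg = [50, 52, 50, 52, 50, 52]
print(pearson_r(tv_hours, weight_kg), trend_line(tv_hours, weight_kg))

# ...and the same data with one anomalous pupil added.
print(pearson_r(tv_hours + [60], weight_kg + [35]))
```

Comparing the coefficient with and without the suspect point shows whether a few anomalous results are responsible for an apparent trend.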

