Maths Statistics Coursework

An Investigation into the Relationship between Various Statistics for a School Census

Aims/Hypotheses

  • I am going to find out whether, on average, boys in year 9 are taller than girls in year 9. I predict that they will be, because on average males are taller than females.
  • I am going to find out whether, on average, children in year 7 have smaller feet than children in year 11. I predict that they will, because your feet become larger as you grow older.
  • I am going to find out whether there is a correlation between the circumference of a child’s wrist and the circumference of their thumb. I predict that there will be a weak positive correlation between the two, because if you have large wrists you are likely to have large thumbs as well.

My Plan

I am going to use only secondary data for this investigation, because collecting primary data myself would be far too time-consuming and I would have to design a questionnaire to do it. All of my data will be obtained from a website, www.censusatschool.ntu.ac.uk, which is an international survey of school children. I have used the random data selector provided by the site to select some sample data (about 2500 records) and transferred this to a spreadsheet. Since my data is all secondary, I do not have a questionnaire, so there is no need for me to do any pilot tests to check that one works. I will be using 50 pieces of data from the sample to test each hypothesis.

After I have selected the 50 records that I am going to study from each set of data, I will tabulate them and find the median and the inter-quartile range so that I can draw box plots. The box plots will help me to identify outliers, which are pieces of data that lie too far from the rest and could unfairly distort my results. I will remove these outliers from my data table so that my investigation is more reliable, and then use the remaining data to test each hypothesis as set out in the following paragraphs.
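The outlier check described above could be sketched in Python as follows. This is only an illustration under stated assumptions: the plan does not say how far is "too far", so the common 1.5 × inter-quartile-range rule is assumed here, and both the function name `remove_outliers` and the height values are hypothetical rather than taken from the census data.

```python
import statistics

def remove_outliers(values, k=1.5):
    """Keep only values within k * IQR of the lower and upper quartiles.

    The k = 1.5 cut-off is an assumed convention, not a rule from the plan.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4)  # the three quartiles
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [v for v in values if lower_fence <= v <= upper_fence]

# Hypothetical heights in cm; 210 is an obviously suspicious entry.
heights_cm = [150, 152, 155, 158, 160, 161, 163, 165, 168, 210]
cleaned = remove_outliers(heights_cm)
```

The quartiles found here are the same values that mark the ends of the box on a box plot, so the fences correspond to the longest whiskers the plot would allow.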

For my first hypothesis, I will divide the height values into class intervals of varying widths and use these groups to draw a pair of frequency density histograms and probability distribution curves. I will also calculate the mean of both the raw and the grouped data for boys and for girls, to show quantitatively whether I was correct in thinking that on average year 9 boys are taller than year 9 girls.
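As a minimal sketch of the grouped-data calculations mentioned above: frequency density is the frequency divided by the class width, and the grouped mean uses the midpoint of each class as an estimate for every value in it. The class intervals and frequencies below are hypothetical, as are the function names.

```python
# Each entry is ((lower bound, upper bound), frequency); values are invented.
height_groups = [
    ((140, 150), 4),
    ((150, 155), 10),
    ((155, 160), 14),
    ((160, 170), 16),
    ((170, 190), 6),
]

def frequency_densities(groups):
    """Frequency density = frequency / class width (the histogram bar height)."""
    return [freq / (upper - lower) for (lower, upper), freq in groups]

def grouped_mean(groups):
    """Estimate the mean by treating every value as its class midpoint."""
    total = sum(freq for _bounds, freq in groups)
    weighted = sum(((lower + upper) / 2) * freq for (lower, upper), freq in groups)
    return weighted / total

densities = frequency_densities(height_groups)
estimated_mean = grouped_mean(height_groups)
```

Because the grouped mean is only an estimate, comparing it with the mean of the raw data gives a rough idea of how much information the grouping has lost.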

For my second hypothesis, I will divide the foot size values into groups and use these groups to draw a pair of comparative pie charts. I will also calculate the mean of both the raw and the grouped data for year 7 and for year 11, to show quantitatively whether I was correct in thinking that on average year 7 students have smaller feet than year 11 students.
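A rough sketch of the arithmetic behind a pair of comparative pie charts is given below. It assumes the usual convention that each sector angle is proportional to its frequency and that the areas of the two circles are proportional to the two sample totals, so the radii scale with the square root of the totals. The frequencies and function names are hypothetical.

```python
import math

def sector_angles(frequencies):
    """Angle in degrees for each group: 360 * frequency / total."""
    total = sum(frequencies)
    return [360 * f / total for f in frequencies]

def comparative_radius(base_radius, base_total, other_total):
    """Radius of the second chart so that area is proportional to sample size."""
    return base_radius * math.sqrt(other_total / base_total)

# Invented foot-size group frequencies after outliers have been removed.
year7_freqs = [12, 24, 12]   # total 48
year11_freqs = [5, 20, 25]   # total 50

year7_angles = sector_angles(year7_freqs)
year11_radius = comparative_radius(5.0, sum(year7_freqs), sum(year11_freqs))
```

If both samples still contain the same number of records after outliers are removed, the two radii are equal and the charts differ only in their sector angles.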

For my third hypothesis, I will first draw a scatter diagram of wrist circumference against thumb circumference. I will then calculate the mean, median, mode and range of each set of data, together with the inter-quartile range and the standard deviation, to show the spread of the data. With these results I will calculate the Product Moment Correlation Coefficient of the data to determine whether there is a correlation and whether it is significant. If there is a correlation, I will be able to draw a line of regression on my graph and analyse the relationship between the two measurements.
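The Product Moment Correlation Coefficient and the regression line mentioned above could be computed along these lines. This is a sketch only: it assumes the least-squares line of thumb circumference on wrist circumference is the one wanted, and the measurements and function names are hypothetical. It does not test significance, which would need a table of critical values.

```python
import math

def _sums_of_squares(xs, ys):
    """Return mean_x, mean_y, Sxx, Syy and Sxy for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return mean_x, mean_y, sxx, syy, sxy

def pmcc(xs, ys):
    """Product Moment Correlation Coefficient: r = Sxy / sqrt(Sxx * Syy)."""
    _mx, _my, sxx, syy, sxy = _sums_of_squares(xs, ys)
    return sxy / math.sqrt(sxx * syy)

def regression_line(xs, ys):
    """Least-squares line y = slope * x + intercept (y regressed on x)."""
    mean_x, mean_y, sxx, _syy, sxy = _sums_of_squares(xs, ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented measurements in millimetres, for illustration only.
wrist_mm = [140, 150, 155, 160, 170]
thumb_mm = [55, 60, 58, 65, 70]

r = pmcc(wrist_mm, thumb_mm)
slope, intercept = regression_line(wrist_mm, thumb_mm)
```

The regression line always passes through the point made by the two means, which is a useful check when drawing it by hand on the scatter diagram.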

When I have analysed the data and proved or disproved my hypotheses, I will write a conclusion to my investigation and discuss its limitations, along with possible extensions and improvements that could be made in the future.

Sampling Methods

I am going to use the systematic method of sampling, as the data in my spreadsheet is already in a random order and systematic sampling is a fair way of choosing which pieces of data I use. I will select every 2nd record, starting from the top of my spreadsheet and working downwards, until I have 50 pieces of data.
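The systematic selection described above could be sketched as follows. It assumes the spreadsheet rows have been loaded into a list, and that "every 2nd record" means the 2nd, 4th, 6th rows and so on; the text does not say which row the count starts from, so the starting position is left as a parameter. The function name and the stand-in records are hypothetical.

```python
def systematic_sample(records, step=2, size=50, start=1):
    """Take every `step`-th record from index `start` until `size` are chosen.

    start=1 means the 2nd row is picked first (an assumption about the plan).
    """
    return records[start::step][:size]

# Stand-in for about 2500 spreadsheet rows, numbered from 1.
all_records = list(range(1, 2501))
sample = systematic_sample(all_records)
```

With a step of 2, the 50 records all come from the first 100 rows, so the method relies on the rows already being in random order, as stated above.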

Other sampling methods that I could have used include simple random sampling and stratified random sampling. For simple random sampling, I could have assigned a number to every record and then used a random number generator to select records, ignoring any repeated numbers. For stratified random sampling, I could have used a field such as gender or Year to split the overall data into groups, and selected the same number of records from each group.
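For comparison, the two alternative methods could be sketched like this. The sketch assumes each record is a dictionary with a field to group by, and it follows the description above in taking the same number of records from every group. The field names, record layout and function names are hypothetical.

```python
import random

def simple_random_sample(records, size, seed=None):
    """Pick `size` records at random without repeats."""
    rng = random.Random(seed)
    return rng.sample(records, size)  # sample() never picks a record twice

def stratified_sample(records, key, per_group, seed=None):
    """Split records into groups by `key`, then sample equally from each group."""
    rng = random.Random(seed)
    groups = {}
    for record in records:
        groups.setdefault(key(record), []).append(record)
    return {name: rng.sample(members, per_group)
            for name, members in groups.items()}

# Invented records: 100 pupils with alternating gender labels.
pupils = [{"id": i, "gender": "M" if i % 2 else "F"} for i in range(100)]

random_ten = simple_random_sample(pupils, 10, seed=1)
by_gender = stratified_sample(pupils, key=lambda p: p["gender"], per_group=5, seed=1)
```

Passing a seed makes the selection repeatable, which helps if the sample needs to be checked later.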

Sampling the Data

As stated above I am going to use the systematic method of sampling, and I will choose every 2nd record working ...
