DATA HANDLING COURSEWORK

In this data handling coursework, I will investigate the relationship between the heights and weights of pupils at Mayfield High School.

Mayfield High School is a fictitious high school with 1183 students. The information I have received from the Edexcel website is, however, based on a real school.

To investigate height and weight at Mayfield, I will need the following categories from the data provided:

  1. Height
  2. Weight
  3. Year Group
  4. Gender

I will split the coursework into 3 lines of enquiry. These are:

  1. Relationship between the heights and weights of students without considering any factors.
  2. Relationship between the heights and weights of students considering age.
  3. Relationship between the heights and weights of students considering gender.

I am investigating the relationship between the heights and weights of students because both variables are quantitative, so it is sensible to look for a relationship between them.

I am also investigating how age and gender affect height and weight, and whether taking these factors into account affects the accuracy of my samples.

Because taking these factors into account increases the accuracy of my analysis, I will be able to use smaller sample sizes when investigating the data in strata.

I will use stratified sampling because it takes into account students of every year group and gender in the school, giving each pupil as equal a chance of selection as possible, so that the analysis is as accurate and reliable as possible. There is no need to take a simple random sample as well. A simple random sample can be biased, because it does not ensure that each year group or gender is fairly represented. This would reduce the reliability of two of my lines of enquiry, the relationship between height and weight when considering age and when considering gender, which would not be as accurate as they would be with a stratified sample.

I will take a 5% sample from each stratum, as this is large enough to allow reliable conclusions but not so large that the sample becomes difficult to analyse. For Year 11, however, I will take a 10% sample, because a 5% sample would be too small to give a reliable and accurate conclusion. The strata contain different numbers of students, so the chance of a particular year group or gender being selected varies: for example, Year 7 has more students than Year 11, so a pupil picked at random from the whole school is more likely to be in Year 7 than in Year 11.

When I take the sample, I will split the data into strata by year group and gender, then use the random number function on my calculator to select students at random within each stratum.
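
As a rough illustration of the sampling procedure described above, the steps could be sketched in Python as follows. This is only a sketch under stated assumptions: the pupil records, the stratum sizes and the function name stratified_sample are made up for the example and are not the Mayfield data, and Python's random module stands in for the calculator's random function.

```python
import random

def stratified_sample(pupils, fraction, seed=None):
    """Pick round(fraction * stratum size) pupils at random from each stratum.

    A stratum is one (year group, gender) combination. At least one pupil
    is taken from every stratum so that none is left out.
    """
    rng = random.Random(seed)

    # Step 1: split the pupils into strata by year group and gender.
    strata = {}
    for pupil in pupils:
        key = (pupil["year"], pupil["gender"])
        strata.setdefault(key, []).append(pupil)

    # Step 2: choose pupils at random, without repeats, within each stratum.
    sample = {}
    for key, members in strata.items():
        k = max(1, round(fraction * len(members)))
        sample[key] = rng.sample(members, k)
    return sample

# Hypothetical register: the stratum sizes are invented for illustration.
pupils = [
    {"id": f"{year}-{gender}-{i}", "year": year, "gender": gender}
    for year, gender, size in [(7, "M", 40), (7, "F", 60), (11, "M", 20)]
    for i in range(size)
]

sample = stratified_sample(pupils, 0.05, seed=1)
sizes = {key: len(chosen) for key, chosen in sample.items()}
print(sizes)
```

A different fraction for one year group, such as 10% for Year 11, could be handled by calling the function separately on that year's pupils.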

To my knowledge, the more factors that are taken into consideration, the stronger the correlation will be. Therefore I will first draw scatter graphs that take gender and age into account, and then a scatter graph that excludes all factors. Both will use the same sample data to provide a reliable comparison between the graphs. I will minimize bias as far as possible by taking samples of as equal a size as possible from each year group and gender, excluding none of the strata. This will provide reliable conclusions for the whole school. Keeping the sample sizes of all strata as equal as possible also allows me to compare the strata effectively, since it is only logical to compare samples of similar size, as the accuracy of a sample varies with its size.

I will round my calculated heights and weights to 0 d.p. because it is illogical to give results that are more precise than the data they are based on, although this will make my predictions less exact.

I will use scatter graphs to analyse the type of correlation there is between height and weight for each year group and gender. I will use these graphs to predict what the weight or height of a student would be.
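
One way to put a number on the correlation, and to make a prediction from a scatter graph, is to calculate the correlation coefficient and a line of best fit. The sketch below assumes a least-squares line of best fit, which is my choice for the example rather than something fixed by the coursework, and the heights (in cm) and weights (in kg) are invented values, not sample data from Mayfield.

```python
from math import sqrt

def correlation_and_best_fit(heights, weights):
    """Return (r, slope, intercept) for weight plotted against height."""
    n = len(heights)
    mean_h = sum(heights) / n
    mean_w = sum(weights) / n

    # Sums of squared deviations and of cross products about the means.
    s_hh = sum((h - mean_h) ** 2 for h in heights)
    s_ww = sum((w - mean_w) ** 2 for w in weights)
    s_hw = sum((h - mean_h) * (w - mean_w) for h, w in zip(heights, weights))

    r = s_hw / sqrt(s_hh * s_ww)           # correlation coefficient
    slope = s_hw / s_hh                    # gradient of the line of best fit
    intercept = mean_w - slope * mean_h    # the line passes through the mean point
    return r, slope, intercept

# Hypothetical sample, for illustration only.
heights = [150, 160, 170, 180]   # cm
weights = [40, 50, 55, 65]       # kg

r, slope, intercept = correlation_and_best_fit(heights, weights)

# Predicted weight of a pupil who is 165 cm tall, rounded to 0 d.p.
predicted_weight = round(slope * 165 + intercept)
print(round(r, 2), slope, intercept, predicted_weight)
```

A value of r close to 1 indicates a strong positive correlation, so the same function could be run on each stratum and on the whole sample to compare how strong the relationship is.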

I will use cumulative frequency graphs to make comparative, generalised statements about the heights and weights of students across all of the strata. Cumulative frequency graphs allow me to estimate the percentage of students within a given range.
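
The arithmetic behind a cumulative frequency graph can be sketched as below. The class boundaries and frequencies are invented for the example and are not taken from the Mayfield data.

```python
from itertools import accumulate

# Hypothetical grouped heights: upper class boundaries (cm) and frequencies.
upper_bounds = [140, 150, 160, 170, 180]
frequencies = [3, 9, 20, 14, 4]

# Running total of the frequencies gives the cumulative frequencies.
cumulative = list(accumulate(frequencies))
total = cumulative[-1]

# Percentage of pupils at or below each upper class boundary.
percent_below = {
    bound: 100 * cum / total for bound, cum in zip(upper_bounds, cumulative)
}

# Percentage of pupils with heights between 150 cm and 170 cm.
between_150_and_170 = percent_below[170] - percent_below[150]
print(cumulative, between_150_and_170)
```

Plotting the cumulative frequencies against the upper class boundaries gives the curve from which the median and quartiles can also be read.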

I will use box plots to show how dispersed, that is, how varied, the data is. This will allow me to see clear relationships between the sample strata.

I will use measures of spread to compare the sample data split by these factors with the same sample when all factors are excluded.
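
The five values needed to draw a box plot, together with the interquartile range, can be worked out as in the sketch below. It assumes the quartiles are taken as the medians of the lower and upper halves of the ordered data, which is one common convention (other methods give slightly different quartiles), and the weights are invented for the example.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for at least two values."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower_half = data[:half]
    upper_half = data[half + n % 2:]   # leave out the middle value when n is odd
    return data[0], median(lower_half), median(data), median(upper_half), data[-1]

# Hypothetical weights in kg for one stratum, for illustration only.
weights = [42, 45, 48, 50, 52, 55, 60, 72]

lowest, q1, med, q3, highest = five_number_summary(weights)
iqr = q3 - q1   # interquartile range: the spread of the middle half of the data
print(lowest, q1, med, q3, highest, iqr)
```

A larger interquartile range means the middle half of the data is more spread out, so comparing this value between strata shows which groups are more varied.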

First line of enquiry

My hypothesis for the first line of enquiry is that the taller a person is, the more they weigh. I will pick a sample of 60 pupils. This is because I think that 60 pupils will be enough to represent the population and it is ...
