Edexcel GCSE Statistics Coursework


Edexcel GCSE Statistics 1389

PLANNING SHEET – MAYFIELD HIGH

Student Name: Anya Sweilam                           Class: 11H3



This investigation is based on the students of Mayfield High School, a fictitious school. There are 1182 students at Mayfield, and the data about them is recorded under 13 categories. I will be investigating the relationship between height and weight and how these statistics differ between females and males. I have chosen to look at height and weight mainly because they are numerical, continuous data, which will allow me to produce a more detailed analysis. For example, had I chosen to look at eye colour and hair colour, which are categorical data, my analysis would be limited and my investigation would be less precise.

My aim in this investigation is to find out whether there is a correlation between height and weight, and whether this varies between genders. I believe that as a student's height increases, their weight will also increase; I therefore expect a scatter graph of weight against height to show a positive trend.

The height and weight of a person are affected by their age and gender. I expect that in Years 7-9 girls will generally be taller than boys, because girls tend to grow faster than boys during the early stages of development. Boys do, however, eventually grow taller, so by Years 10-11 I expect boys to be taller than girls more often than the reverse. The same pattern holds for adults aged 20 and above. As for weight, boys are generally heavier than girls because of differences in body structure, so I predict that my results will show boys weighing more than girls. The data, however, does not record external factors that may influence the results, such as the students' diet or how much exercise they do. These affect a student's weight regardless of their height and gender.

Listed below are my hypotheses:

  1. Females are generally shorter and weigh less than males.

I will use a scatter graph to show whether the two sets of data are linked, and will draw a line of best fit to show the trend of the data and allow me to make predictions. A scatter graph shows the type of correlation between two variables and how strong it is; this is useful because it shows whether the two data sets are related and whether the relationship is linear.

I will then calculate the cumulative frequencies of the data and from these produce two cumulative frequency graphs, which will allow me to compare the heights and weights of males and females directly. They will also allow me to find the median, lower quartile, upper quartile and interquartile range, which I will use to draw box plots. The box plots will help me to spot any outliers or errors and to assess the spread of the data. Box plots are important because they show whether a distribution is positively skewed, negatively skewed or symmetrical. I expect them to show that the females' heights and weights are more positively skewed than the males', meaning that more of the values are concentrated at the lower end, which would support the hypothesis that females are generally shorter and weigh less than males.

I will also calculate Spearman's rank correlation coefficient, which will show me how strong the correlation in the data is, whether it is positive or negative, and whether there is any correlation at all. Spearman's rank is helpful because it works on the ranks of the data rather than on the values themselves, so it does not require the data to be normally distributed, and it is quick and easy to calculate.
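The quartile, box plot and Spearman's rank calculations described above could be checked with a short script such as the one below. It is a minimal sketch, assuming Python 3.8 or later (for `statistics.quantiles`); the function names and the example values are hypothetical and are not taken from the Mayfield data. Quartile conventions vary between textbooks, so the "exclusive" method used here (positions (n + 1)/4 and 3(n + 1)/4) may differ slightly from values read off a cumulative frequency graph. The 1.5 × IQR outlier rule and the d² formula for Spearman's rank are the usual textbook versions; when there are tied ranks the d² formula is only an approximation.

```python
import statistics


def box_plot_summary(data):
    """Quartiles, IQR, 1.5 x IQR outliers and a quartile-based skew label."""
    values = sorted(data)
    # 'exclusive' puts the quartiles at positions (n + 1)/4 and 3(n + 1)/4,
    # interpolating between neighbouring values where necessary.
    lq, median, uq = statistics.quantiles(values, n=4, method="exclusive")
    iqr = uq - lq
    low_fence, high_fence = lq - 1.5 * iqr, uq + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    # Compare the two halves of the box: a longer upper half means positive skew.
    if uq - median > median - lq:
        skew = "positive"
    elif uq - median < median - lq:
        skew = "negative"
    else:
        skew = "symmetrical"
    return {"min": values[0], "lq": lq, "median": median, "uq": uq,
            "max": values[-1], "iqr": iqr, "outliers": outliers, "skew": skew}


def average_ranks(values):
    """Rank from smallest (rank 1) upwards; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend over a run of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        i = j + 1
    return ranks


def spearman_rs(xs, ys):
    """Spearman's r_s = 1 - 6 * sum(d^2) / (n(n^2 - 1)), where d is the rank difference."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(average_ranks(xs), average_ranks(ys)))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Made-up values for illustration only (not taken from the Mayfield data).
weights = [38, 42, 45, 47, 50, 52, 55, 58, 90]
print(box_plot_summary(weights))
heights = [1.52, 1.60, 1.65, 1.70, 1.58]
print(spearman_rs(heights, [45, 50, 56, 60, 52]))
```

With the made-up weights, the sketch reports a lower quartile of 43.5 kg, a median of 50 kg and an upper quartile of 56.5 kg, and flags 90 kg as an outlier; the made-up heights and weights give r_s = 0.9, a strong positive correlation.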

  2. A person’s height and weight will be normally distributed.

I will first calculate the frequency density for each class of my grouped data; from this I will draw histograms of height and weight for both females and males. Histograms are helpful because they show the shape of the distribution of a large set of data. Next I will draw a normal distribution curve, and I will also calculate the standard deviation to measure the spread of the data. If my data are approximately normally distributed, about 68% of the values should lie within one standard deviation of the mean, about 95% within two standard deviations and almost all (about 99.7%) within three.
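The histogram and standard deviation steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the class intervals and values are invented, the function names are hypothetical, and the standard deviation uses the divide-by-n formula (`statistics.pstdev`) usually taught at GCSE; `statistics.stdev` would give the divide-by-(n - 1) sample version instead. Frequency density is frequency divided by class width, which is what makes histograms with unequal class widths comparable.

```python
import statistics


def frequency_densities(classes):
    """Frequency density = frequency / class width, for each (lower, upper, frequency) class."""
    result = []
    for lower, upper, freq in classes:
        width = upper - lower
        if width <= 0:
            raise ValueError(f"class {lower}-{upper} has a non-positive width")
        result.append((lower, upper, freq / width))
    return result


def sd_coverage(data):
    """Mean, standard deviation and the share of values within 1, 2 and 3 SDs of the mean."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)  # divides by n; statistics.stdev divides by n - 1
    coverage = {k: sum(abs(x - mean) <= k * sd for x in data) / len(data) for k in (1, 2, 3)}
    return mean, sd, coverage


# Made-up grouped heights in cm, as (lower bound, upper bound, frequency).
heights = [(140, 150, 4), (150, 155, 6), (155, 160, 9), (160, 170, 8), (170, 190, 3)]
for lower, upper, fd in frequency_densities(heights):
    print(f"{lower}-{upper} cm: frequency density {fd:.2f}")

mean, sd, coverage = sd_coverage([2, 4, 4, 4, 5, 5, 7, 9])
print(mean, sd, coverage)
# For roughly normal data, compare coverage with about 0.68, 0.95 and 0.997.
```

For real height or weight data, coverage figures close to 0.68, 0.95 and 0.997 would be consistent with a normal distribution; the small invented list here is only meant to show the arithmetic.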

For this investigation I will need the heights and weights of 30 girls and 30 boys. The statistics for Mayfield High School have already been collected, so they are secondary data; because I did not collect them myself, they may be unreliable. I should therefore evaluate the validity and reliability of the information by critically examining how it was gathered, analysed and presented. I will also compare my results and findings with any published tables or graphs available, which will allow me to assess how consistent and dependable my results are and to spot any anomalous results.

Instead of taking a sample, I could use a census. A census takes the whole population into account, so it is completely accurate for that population and free of sampling bias. This method, however, is time-consuming, and handling so much data leaves more room for errors and anomalies. I will therefore use stratified sampling, choosing the number of students from each stratum in proportion to its size so that the sample is representative of the whole population. Stratified sampling usually gives greater precision than simple random sampling, so a smaller sample is often enough, which avoids wasting time. It also ensures better coverage of the population than simple random sampling, although it is more complex to organise and analyse. I will use a sample of 60, as this is a manageable amount of data in which to analyse patterns.


The population: 1182

The sample size: 60
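The usual stratified sampling calculation is (stratum size ÷ population size) × sample size, rounded to a whole number. A minimal sketch of it follows, assuming the strata are year groups; the year-group sizes are hypothetical placeholders chosen only so that they add up to the stated population of 1182, and the function name is illustrative. Because rounding each stratum separately can leave the total at 59 or 61, this sketch uses the largest-remainder method to keep the total at exactly 60; that rounding rule is a design choice, not part of the original plan.

```python
def stratified_allocation(strata, sample_size):
    """Share sample_size across strata in proportion to each stratum's size.

    Every stratum first gets floor(size * sample_size / population) places; any
    places left over go to the strata with the largest remainders (the
    largest-remainder method), so the allocations always add up to sample_size.
    """
    population = sum(strata.values())
    if population <= 0:
        raise ValueError("population must be positive")
    allocation, remainders = {}, []
    for name, size in strata.items():
        share, remainder = divmod(size * sample_size, population)  # exact integer arithmetic
        allocation[name] = share
        remainders.append((remainder, name))
    leftover = sample_size - sum(allocation.values())
    # Largest remainders first; sorting is stable, so ties keep the strata's order.
    for _, name in sorted(remainders, key=lambda r: r[0], reverse=True)[:leftover]:
        allocation[name] += 1
    return allocation


# Hypothetical year-group sizes chosen only to add up to the stated population of 1182.
year_groups = {"Year 7": 282, "Year 8": 270, "Year 9": 261, "Year 10": 200, "Year 11": 169}
print(stratified_allocation(year_groups, 60))
```

With these placeholder sizes the sketch allocates 14, 14, 13, 10 and 9 students to Years 7 to 11. The same function could be run separately on the boys and on the girls with a sample size of 30 each.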

At first glance, there are several anomalies in the statistics, and these show that the data is not entirely reliable: many values are missing, and in some cases there appear to be two people with the same name. I will exclude these rows, as they may be errors, to improve the precision and accuracy of my investigation. The only problems I can foresee are those within the secondary data itself. As I did not collect the statistics myself, ...
