Mayfield High Statistics Coursework

Mohammed Israr 10x1

Introduction

I am going to carry out a statistical investigation using the fictitious data of Mayfield High School, which is designed to represent a real school. I will use various techniques that I have recently learnt and studied to produce a successful and efficient piece of coursework.

Mayfield High is a fictitious school that consists of 1183 male and female students in Years 7 to 11. The data given to me on these students includes height, weight, eye colour, hair colour, gender, favourite TV programme and favourite type of music.

Data

Data is made up of a collection of variables. Each variable can be described, numbered and measured.

  • Data that can only be described in words is qualitative. Such data is organised into categories, such as make of car, colour of hair, etc.
  • Data which is given numerical values, such as shoe size or height, is quantitative. This type of data can be sorted into two categories:
      - Discrete data can only take certain values, usually whole numbers, but may include fractions (e.g. shoe sizes).
      - Continuous data can take any value within a range and is measurable (e.g. height, weight, temperature, etc).

For my studies I will use quantitative data, which usually involves more complex graphs and analysis. Quantitative data is usually grouped. Grouped data can be better than raw data because it lets me produce clearer graphs such as histograms, cumulative frequency graphs and frequency polygons.

One line of enquiry I have chosen to research is height and weight. I have chosen these because I believe that pupils' height and weight are affected by their age and gender, especially as they are teenagers. This will help me create a hypothesis.

Using this data I will create two hypotheses, which I will state later. These hypotheses will be based on the information I have been given about the pupils, including height and weight.

I will use a variety of statistical analyses and techniques (which I will talk about later) that will help me achieve a more accurate and unbiased piece of coursework.

Tasks

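Stratified sampling - When you are working with a large number of values, you may only want to survey a representative sample. Stratified sampling lets me work out how many students to take from each group so that the sample is unbiased. The formula used when sampling is:

Number sampled from a group = (number of pupils in the group ÷ total number of pupils) × total sample size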

An example of stratified sampling is shown below. Suppose there are 1000 students in a school and I want to take a sample of fifty to represent the school:

 

The first table shows how many pupils make up each year. The second table samples each year using the above formula to get a fair representation of the school.
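The calculation behind the second table can be sketched in a few lines of Python. This is only an illustration: the year-group sizes below are hypothetical (chosen to add up to 1000), and the function name `stratified_sample_sizes` is my own. Each year's share of the sample is its share of the school, multiplied by the sample size of fifty.

```python
def stratified_sample_sizes(strata, sample_size):
    """Share out sample_size between groups in proportion to their size."""
    total = sum(strata.values())
    # (pupils in group / total pupils) * sample size, rounded to whole pupils
    return {name: round(count * sample_size / total)
            for name, count in strata.items()}


# Hypothetical year-group sizes for a school of 1000 pupils (not real data)
year_groups = {
    "Year 7": 240,
    "Year 8": 220,
    "Year 9": 200,
    "Year 10": 180,
    "Year 11": 160,
}

sample = stratified_sample_sizes(year_groups, 50)
for year, size in sample.items():
    print(year, size)
```

With real figures the rounded values may not add up to exactly fifty, so one group's number may need adjusting by one pupil.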

Cumulative Frequency and C.M graphs  

The cumulative frequency is obtained by adding up the frequencies as you go along, to give a 'running total'. Cumulative frequency diagrams are useful for finding and representing statistics such as the upper quartile (the value at 75% of the total cumulative frequency), the lower quartile (the value at 25% of the total cumulative frequency), the range (the largest value minus the smallest value), and the maximum and minimum values.

Drawing a cumulative frequency diagram

The table shows the lengths (in cm) of 32 cucumbers.  

Before drawing the cumulative frequency diagram, we need to work out the cumulative frequencies. This is done by adding the frequencies in turn.
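The running total can be worked out with a short Python sketch. The class boundaries and frequencies here are made up for illustration (they are not the cucumber table from the original example), although they do add up to 32. The positions used to read off the quartiles and median follow from the total.

```python
from itertools import accumulate

# Hypothetical grouped data: upper class boundaries (cm) and frequencies
upper_boundaries = [24, 28, 32, 36, 40]
frequencies = [3, 7, 12, 6, 4]

# Add the frequencies in turn to get the running total
cumulative = list(accumulate(frequencies))

total = cumulative[-1]
lower_quartile_position = total / 4      # 25% of the total
median_position = total / 2              # 50% of the total
upper_quartile_position = 3 * total / 4  # 75% of the total

# Each point on the diagram is (upper class boundary, cumulative frequency)
points = list(zip(upper_boundaries, cumulative))
print(points)
```

Each cumulative frequency is plotted against the upper boundary of its class, and the quartiles are read across from the positions on the y-axis.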

  

Scatter Diagrams

A scatter diagram is a type of diagram used to show the relationship between data items that have two numeric properties. One property is represented along the x-axis and the other along the y-axis, so each item is represented by a single point. Scatter diagrams can be separated into three categories: positive correlation, where there is a positive trend in the data; negative correlation, where there is a negative trend; and no correlation, where there is no trend and the points are spread randomly on the diagram.

Correlation Coefficient

A correlation coefficient is a number between -1 and 1 which measures the degree to which two variables are linearly related. If there is a perfect linear relationship with positive slope between the two variables, the correlation coefficient is 1. With positive correlation, when one variable has a high (low) value, the other tends to as well. If there is a perfect linear relationship with negative slope, the correlation coefficient is -1. With negative correlation, when one variable has a high (low) value, the other tends to have a low (high) value. A correlation coefficient of 0 means that there is no linear relationship between the variables.

There are a number of different correlation coefficients that might be appropriate depending on the kinds of variables being studied. For my hypothesis I have decided to count a coefficient between 0.6 and 1.0 as a positive correlation.
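As an illustration, the sketch below works out one common choice, Pearson's product-moment correlation coefficient. The text does not say which coefficient is used, so this choice is an assumption, and the heights and weights are invented values rather than Mayfield data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of the deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Invented heights (cm) and weights (kg) for five pupils
heights = [150, 155, 160, 165, 170]
weights = [45, 50, 52, 58, 60]

r = pearson_r(heights, weights)
# Threshold taken from the text: 0.6 to 1.0 counts as a positive correlation
is_positive = 0.6 <= r <= 1.0
print(round(r, 3), is_positive)
```

For these invented values the coefficient comes out close to 1, so it falls inside the 0.6 to 1.0 band.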

Histograms and frequency polygons

A histogram is a special bar chart in which the area of each bar, rather than its height, represents the frequency. The classes go along the x-axis, and each bar is as wide as its class (histograms are usually made from grouped data). The frequency density goes on the y-axis, so the frequency of a class can be found by multiplying the frequency density by the class width. Below is an example of a histogram.

 

If we are going to draw a histogram to represent the data, we first need to find the class boundaries. In this case they are 5, 11, 16 and 18. The class widths are therefore 6, 5 and 2.

The area of a histogram represents the frequency.

The areas of our bars should therefore be 6, 15 and 4.  

This information can be marked on the grid.
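The bar heights for this example can be checked with a short Python sketch. It uses the class boundaries (5, 11, 16 and 18) and the bar areas, i.e. frequencies (6, 15 and 4), given above; the variable names are my own. The height of each bar is the frequency density, which is the frequency divided by the class width.

```python
# Class boundaries and frequencies taken from the example above
boundaries = [5, 11, 16, 18]
frequencies = [6, 15, 4]

# Class width = upper boundary - lower boundary
widths = [upper - lower for lower, upper in zip(boundaries, boundaries[1:])]

# Frequency density (bar height) = frequency / class width
densities = [f / w for f, w in zip(frequencies, widths)]

# Check: bar height * bar width gives back the frequency (the area)
areas = [d * w for d, w in zip(densities, widths)]

print(widths)     # [6, 5, 2]
print(densities)  # [1.0, 3.0, 2.0]
```

So the three bars would be drawn with heights 1, 3 and 2 on the frequency density axis.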

A frequency polygon is an easy way of comparing two sets of data frequencies as they can be drawn on the same graph. This means that they are an easy way ...
