Maths Data Handling

Introduction

The line of enquiry that I have chosen is ‘The relationship between height and weight’. To investigate it, I am using secondary data (acquired from the internet), which avoids the bias and unfairness that can arise when primary data is collected through questionnaires. The data describes a fictitious school, Mayfield High School, but it was obtained from a real school. This is useful because five age groups are covered in the whole investigation. However, age groups below the age of 11 or over the age of 16 will not be considered, as they fall outside the age range of Mayfield High School.

There are 1183 pupils in Mayfield High School and I will be using the following pieces of data on each pupil: year group, age, gender, height and weight. This gives a total of 5915 data points to work from (1183 × 5). This is too many to handle, so I will use a sample of 100 pupils. Since I will be using stratified sampling, I will need to know how many boys and girls there are in each year. The table below shows the exact figures.

I will need this table throughout my investigation so that I can construct a stratified sample, because it tells me how many girls and boys there are in each year. This will enable me to construct a fair sample, in which the number of pupils from each group is proportionate to the actual number of pupils in that group in the school. For example, in a sample of 100, I would need 11 Year 7 girls, because Year 7 girls make up 11% of the whole population.
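The proportional calculation described above can be sketched in Python. This is only a rough illustration: the function name `stratum_quota` is my own, and the only figures used are ones quoted in this investigation (1183 pupils in total, 131 Year 7 girls and a sample of 100).

```python
def stratum_quota(stratum_size: int, population_size: int, sample_size: int) -> int:
    """Number of pupils to take from one group, in proportion to its size."""
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    # Share of the population in this group, scaled up to the sample size
    return round(stratum_size / population_size * sample_size)


# Figures quoted in this investigation: 131 Year 7 girls out of 1183 pupils
year7_girls = stratum_quota(131, 1183, 100)
print(year7_girls)
```

Because each group is rounded separately, the quotas may not always add up to exactly the sample size, so the total should be checked once every group has been worked out.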

I will be considering many factors throughout my investigation such as age and gender. This will allow me to build up my line of enquiry, and make certain hypotheses along the way that I will study and interpret with graphs, averages and measures of spread. This will include techniques varying from frequency tables to cumulative frequency and from the mean to standard deviation.

There will be certain limitations throughout the investigation that I will explain. This will include bias and other factors such as age that will not be considered in parts of this investigation.

‘The Relationship between Height and Weight’

Hypotheses

There is a relationship between Height and Weight.
The relationship between Height and Weight is affected by gender.
Most boys will be taller and heavier than girls.

Predictions

I predict that the results will show that there is a relationship between height and weight. This will be shown if the histograms and frequency polygons for height and for weight, drawn for boys and girls, look fairly similar. If this is true, the histograms and frequency polygons will suggest that a person who is very tall is likely to be very heavy, whereas a person who is short is likely to be light. I also predict that most boys in my sample will have a greater height and weight than the girls. This means that I predict that my results will show that the relationship between height and weight is affected by gender.

Plan

As stated in the introduction, my line of enquiry is:

‘The relationship between height and weight’.

For the first part of the investigation I will use a stratified sample of the whole population of Mayfield High School. This is because the sample will then represent the whole school, so I will be able to draw conclusions about the real population as well as about the sample. Also, stratified sampling removes the bias that could come from an uneven mix of gender and age, so I will be able to find clear results relating to the whole population of Mayfield High School. Once I have worked out how many pupils are needed from each group, I will use random sampling within each group, so that each student in that group has an equal chance of being selected for my sample. As well as sampling, to make this a fair investigation, I am using secondary data. This is so that I do not have to worry about writing questionnaires that could prove to be biased and unfair. I also know that the secondary data is based on a real school, under a fictitious name, meaning the data that I am using is not made up.

The table in the introduction shows that 51% of the school is boys and 49% of the school is girls. I have chosen 100 pupils for my sample, so I do not have to do any working out as the percentages are already out of a hundred. This means that in my sample there will be 51 boys and 49 girls. I still need to create my sample so that the number of pupils from each year in my sample is proportionate to the number of pupils in each year in the whole school. The table below shows the distribution of pupils in my sample.

This stratified sample creates a small version of the whole school. If I did not use a stratified sample, bias would occur in my investigation. This is because the school is growing each year, so it is likely that Year 7 will contain a large number of pupils. This would affect my investigation because of the relationship between age, height and weight, which will be studied later in the project. The proportionate sampling removes this bias and makes sure that the students from different age groups are proportionately represented in my sample. The pupils that go in my sample will be selected through random sampling, using the ‘random’ button on my calculator. For example, if I am selecting Year 7 girls for my sample, I need 11 girls. I will press the random button on my calculator and then multiply the result by 131, as that is how many Year 7 girls there are in the population. The number that shows up on the calculator screen, rounded to a whole number, will be the pupil number (shown in Microsoft Excel) of the pupil selected for my sample. I will do this eleven times for Year 7 girls. I will repeat this method for each year group and gender.
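The calculator method above can be imitated in Python as a minimal sketch. The function name `pick_pupil_numbers` is hypothetical, and two details are my own assumptions rather than part of the method as described: the random number is turned into a whole pupil number from 1 to 131 by truncating and adding 1, and a number that comes up twice is simply drawn again.

```python
import random


def pick_pupil_numbers(stratum_size, quota, seed=None):
    """Pick `quota` different pupil numbers from 1..stratum_size."""
    if quota > stratum_size:
        raise ValueError("cannot pick more pupils than the group contains")
    rng = random.Random(seed)
    chosen = set()
    while len(chosen) < quota:
        # Like the calculator: a random number in [0, 1) times the group size.
        # Assumption: truncate and add 1 to get a pupil number in 1..stratum_size.
        number = int(rng.random() * stratum_size) + 1
        # Assumption: a repeated number is ignored, so no pupil is picked twice.
        chosen.add(number)
    return sorted(chosen)


# Year 7 girls: 11 pupils needed from a group of 131
year7_girl_numbers = pick_pupil_numbers(131, 11)
print(year7_girl_numbers)
```

Passing a value for `seed` makes the same pupil numbers come out each time, which is useful if the selection needs to be checked later.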

After the sampling, I will construct frequency tables. I am dealing with continuous data and there will be 100 students in my sample, so it will be better to use class intervals in the frequency tables. I will be able to find the mean from the grouped data in the frequency tables, but this will only be an estimate, because it uses the midpoint of each class interval rather than the exact values, and I will also round it to one decimal place. I will also find the modal class interval from the frequency tables, as it is the class interval with the highest frequency.
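The grouped-data calculations above can be sketched in Python. The function names are my own, and the frequency table in the example is hypothetical and for illustration only; it is not the data from this investigation.

```python
def grouped_mean(table):
    """Estimate the mean from (lower, upper, frequency) rows using midpoints."""
    total_f = sum(f for _, _, f in table)
    # Each class contributes midpoint x frequency to the sum of fx
    total_fx = sum((lower + upper) / 2 * f for lower, upper, f in table)
    return total_fx / total_f


def modal_class(table):
    """Return the (lower, upper) bounds of the class with the highest frequency."""
    lower, upper, _ = max(table, key=lambda row: row[2])
    return lower, upper


# Hypothetical frequency table (illustrative only, NOT the investigation's data)
example = [(30, 40, 2), (40, 50, 5), (50, 60, 3)]
print(round(grouped_mean(example), 1))  # 46.0
print(modal_class(example))             # (40, 50)
```

If two classes share the highest frequency, `modal_class` returns only the first of them, so a tie would need to be reported by hand.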

The data in the frequency tables will be presented as histograms and frequency polygons. This is because the data is continuous, so a histogram is a useful diagram for interpreting the data graphically and will also help me see how height and weight are affected by gender. The same applies to the frequency polygons, as I will draw two frequency polygons on each graph, one for boys and one for girls. This will be done for height and for weight. Once they are drawn, I will be able to compare the heights and weights of girls and boys, allowing me to see who is generally taller and who is generally heavier.

In this first part of the investigation I will produce back-to-back stem and leaf diagrams as well as histograms and frequency polygons. This will allow me not only to find the median, but also to compare the data for the boys’ and girls’ height and weight. I will find the range while creating the stem and leaf diagrams, as I will know what the highest and lowest heights and weights are for boys and girls.
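A stem and leaf diagram, together with the median and range read from it, can be sketched in Python as follows. The function name `stem_and_leaf` is my own, it handles one side of a back-to-back diagram only, and the heights in the example are hypothetical values used purely for illustration.

```python
from collections import defaultdict
from statistics import median


def stem_and_leaf(values):
    """Group whole-number values into stems (tens) and sorted leaves (units)."""
    rows = defaultdict(list)
    for value in sorted(values):
        rows[value // 10].append(value % 10)
    return dict(rows)


# Hypothetical heights in cm (illustrative only, NOT the investigation's data)
heights = [148, 152, 155, 161, 163, 170]
diagram = stem_and_leaf(heights)
for stem, leaves in diagram.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))

print("median:", median(heights))
print("range:", max(heights) - min(heights))
```

For a back-to-back diagram, the same function could be run once for the boys and once for the girls, with the two sets of leaves written on either side of the shared stems.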

Once I have all the information that I need and have created comparative histograms, frequency polygons and stem and leaf diagrams, I will put the results into a table. This will make it easier for me to see the different averages for the girls and boys, so that I can compare the data and draw firm conclusions.

Limitations

Although Mayfield High School has many students that can be used in the investigation, there is no data on people who are not students at Mayfield High School. This means that the different lifestyles that affect height and weight will not be considered in depth within the investigation. However, there will be some variation, because the way that the students of Mayfield High live their lives cannot be controlled. This means that some students will have different diets and different metabolic rates, which will affect the results.

Girls’ Weight

Mean = ∑fx ÷ ∑f = 2465 ÷ 49 = 50.3 kg (to 1 d.p.)
Modal Class Interval = 40 ≤ w < 50