Height and weight of pupils

Statistics Coursework

Introduction

The data that we are going to use is secondary data that has been collected from a high school. It is made up of a collection of qualitative and quantitative variables, and includes both discrete and continuous data. The data is in the form of a spreadsheet, which makes it easy to identify the information needed for the investigation. I am using secondary data because it is easier than collecting primary data myself, which would take a very long time. Using secondary data should not affect the results of my investigation as long as I omit any obvious mistakes in the given data. Such mistakes could affect the accuracy of the investigation. They could occur if, for example, the person typing up the data made a typing error, or if a person in the population misunderstood a question or answered incorrectly. The line of enquiry that I have chosen is the relationship between the height and weight of different pupils in the high school. To investigate this, I will need the following data about the members of the population…

Year group – discrete quantitative data
Gender – discrete qualitative data
Height – continuous quantitative data
Weight – continuous quantitative data

I will delete the rest of the data, as it is not needed in my investigation. I will then sort the relevant data into groups by year group and gender so that I can carry out the investigation in a logical way.

Hypothesis

The hypothesis that I have chosen to use is…

As students in the school get older, their height and weight will increase. Boys will also be taller and heavier than girls.

Samples

In my investigation I will be using a sample of the data, because it would take an unreasonable amount of time to use the data collected from every student. A sample will still give me a reasonably accurate result and is more practical. I will be using quota sampling so that there are equal numbers of people in each group sampled, which makes the groups fair to compare. The groups I will be using are based on the year group (age) and gender of students. Within each of these groups, I will use systematic sampling so that the sampling process is quick and easy. I have chosen not to use stratified sampling because, even though the groups would be in proportion to each other, there would be different numbers of people in each group, which would make the groups harder to compare. If I were to use stratified sampling with a 10% sample from each group, these are the numbers of people that would be obtained…

Year 7: boys 15, girls 13
Year 9: boys 12, girls 14
Year 11: boys 8, girls 9
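The stratified sample sizes above come from taking the same percentage of every group and rounding to a whole number of pupils. The sketch below shows that calculation. The group sizes in it are hypothetical round numbers, because the coursework does not state how many pupils are in each group. The function name `stratified_sizes` is likewise only illustrative.

```python
def stratified_sizes(group_sizes, percent):
    """Pupils to sample from each group: percent% of the group, to the nearest pupil."""
    return {group: round(size * percent / 100) for group, size in group_sizes.items()}

# HYPOTHETICAL group sizes, used only to illustrate the arithmetic;
# they are not taken from the school's spreadsheet.
group_sizes = {
    ("Year 7", "boys"): 150, ("Year 7", "girls"): 130,
    ("Year 9", "boys"): 120, ("Year 9", "girls"): 140,
    ("Year 11", "boys"): 80, ("Year 11", "girls"): 90,
}

sizes = stratified_sizes(group_sizes, 10)
for (year, gender), n in sizes.items():
    print(f"{year} {gender}: {n}")
```

Because each group keeps the same proportion, larger groups contribute more pupils. This is why the group sizes in a stratified sample are unequal.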

I have also chosen not to use simple random sampling, as this would take longer than systematic sampling and would therefore be less practical.
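To make the difference between the two methods concrete, here is a minimal sketch. Systematic sampling takes every k-th pupil from a list. Simple random sampling chooses pupils at random without replacement. The list of 160 pupil numbers and the starting position are hypothetical. In practice the starting point is often chosen at random from within the first k pupils.

```python
import random

def systematic_sample(rows, k, start=0):
    """Every k-th row, beginning at 0-based index `start`."""
    return rows[start::k]

def simple_random_sample(rows, n, seed=None):
    """n rows chosen at random without replacement."""
    rng = random.Random(seed)
    return rng.sample(rows, n)

pupils = list(range(1, 161))                       # hypothetical pupil numbers 1..160
every_8th = systematic_sample(pupils, 8, start=7)  # pupils 8, 16, 24, ..., 160
print(len(every_8th), every_8th[:3])               # 20 [8, 16, 24]

random_20 = simple_random_sample(pupils, 20, seed=1)
print(len(random_20))                              # 20
```

Both methods give 20 pupils here. The systematic version only needs a single slice of the list, which is why it is quicker to do by hand with a spreadsheet.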

How many people?

I will be using data from students in years 7, 9 and 11, and have chosen to use the data of 20 students from each year group. This will give a sample of approximately 10% of each group. Leaving out years 8 and 10 creates a large gap between the year groups, but I think that these groups will still give a reasonably accurate result. Using every year group would give a more reliable result, but it would take longer, and using just three year groups still covers a sufficient spread of ages.

To save time when selecting the data for my investigation, I worked in a group of three, and we each chose one year group to select the necessary data from. My results should therefore be almost as accurate as they would be if I had sorted all the data myself. However, they may be slightly more biased, as I do not know how accurately the other members of my group sorted the data. As I have chosen to use systematic sampling, I have selected every 8th person in my year group’s data. The other members of my group used every 7th and every 9th person in their allocated year groups’ data, which gives a varied sample of the data overall. If I find any gaps or obvious mistakes in the data that I am using, I will use a set of boundaries to help eliminate obviously incorrect data. The boundaries that I have chosen are as follows…

Boys’ height:
Girls’ height:
Boys’ weight: no less than 50 kg and no more than 80 kg
Girls’ weight: no less than 45 kg and no more than 75 kg

I have chosen these boundaries because I estimate that any data outside them is very unlikely to be correct. I will delete such data from the spreadsheet to make my results as accurate as possible. These mistakes may occur if there was a typing error when the data was first entered into the spreadsheet. The person giving the data may also have misunderstood a question, or given their answer in the wrong units. Where this has happened, the boundaries will allow me to eliminate some of these results and so improve the accuracy of my results.
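Applying the weight boundaries amounts to a simple filter on each pupil’s record. The sketch below assumes each record is a dictionary with hypothetical keys `gender`, `height` and `weight`, and it treats both ends of each boundary as inclusive ("no less than", "no more than"). The height boundaries are left blank in the coursework, so no height boundaries are applied here. The first two records use values that the Evaluation lists as mistakes. The third record is invented for illustration.

```python
# Weight boundaries from the coursework, in kg, inclusive at both ends.
# Height boundaries are not stated, so none are applied.
WEIGHT_BOUNDS = {"boy": (50, 80), "girl": (45, 75)}

def within_bounds(pupil):
    """True if the pupil's weight lies inside the boundaries for their gender."""
    low, high = WEIGHT_BOUNDS[pupil["gender"]]
    return low <= pupil["weight"] <= high

def apply_boundaries(pupils):
    """Split the records into (kept, removed) lists."""
    kept = [p for p in pupils if within_bounds(p)]
    removed = [p for p in pupils if not within_bounds(p)]
    return kept, removed

pupils = [
    {"gender": "girl", "height": 1.64, "weight": 140},  # listed as a mistake in the Evaluation
    {"gender": "boy", "height": 1.35, "weight": 29},    # listed as a mistake in the Evaluation
    {"gender": "girl", "height": 1.60, "weight": 52},   # hypothetical record
]
kept, removed = apply_boundaries(pupils)
print(len(kept), len(removed))  # 1 2
```

Keeping the removed records in a separate list, rather than discarding them, makes it easy to report them later, as the Evaluation does.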

Plan

I will collect the secondary data, as it is already available in the spreadsheet.
I will then sort the data in the spreadsheet into the form that I want by deleting any data that I do not need, leaving only the heights, weights, genders and year groups of pupils in years 7, 9 and 11. This will let me draw graphs and make calculations on the computer to save time.
I will plot scatter diagrams of height against weight for the following groups…

Year 7 boys
Year 7 girls
Year 9 boys
Year 9 girls
Year 11 boys
Year 11 girls

I will then calculate the mean point (mean height, mean weight) for each of the scatter diagrams and, if appropriate, draw a line of best fit through it. This will show whether there is a relationship between the height and weight of the pupils, so that I can comment on it.

I will work out the equation of each line of best fit in the form y = mx + c and compare the gradients and intercepts across the scatter diagrams. This will let me comment further on the relationship between the height and weight of pupils.
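The mean point and the equation y = mx + c can be checked with a few lines of code. The sketch below is not the method used in the coursework. It assumes a line drawn by eye through the mean point, with a second point read off that line. The gradient is then m = (y2 − y1)/(x2 − x1) and the intercept is c = y1 − m·x1. For comparison, it also includes a least-squares line. This is a different, calculated line of best fit that always passes through the mean point. All heights, weights and the second point are hypothetical.

```python
def mean_point(xs, ys):
    """(mean of xs, mean of ys): the point a line of best fit is drawn through."""
    if not xs or len(xs) != len(ys):
        raise ValueError("xs and ys must be non-empty and the same length")
    return sum(xs) / len(xs), sum(ys) / len(ys)

def line_through(p1, p2):
    """Gradient m and intercept c of the line y = mx + c through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("the two points must have different x values")
    m = (y2 - y1) / (x2 - x1)
    return m, y1 - m * x1

def least_squares_line(xs, ys):
    """Least-squares line of best fit (an alternative to drawing by eye)."""
    mean_x, mean_y = mean_point(xs, ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    return m, mean_y - m * mean_x

# Hypothetical data: x = height in m, y = weight in kg.
heights = [1.50, 1.55, 1.60, 1.62, 1.68]
weights = [42, 45, 47, 52, 54]

mean = mean_point(heights, weights)
m, c = line_through(mean, (1.69, 56))  # (1.69, 56): hypothetical point read off a hand-drawn line
print(f"mean point ({mean[0]:.2f} m, {mean[1]:.1f} kg), y = {m:.1f}x + ({c:.1f})")
```

Here the mean point is (1.59 m, 48 kg), and the hand-drawn line has a gradient of about 80 kg per metre, or 0.8 kg per centimetre. The intercept, about −79.2 kg, describes a height of 0 m, far outside the data. Gradients are therefore usually more meaningful to compare between groups than intercepts.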
Next, I will calculate Spearman’s rank correlation coefficient for each of the 6 sets of data. This gives a numerical value for the strength of the correlation, so that I can compare the relationship between the heights and weights of the pupils more precisely than by looking at the scatter diagrams alone.
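Spearman’s rank correlation coefficient ranks each variable separately and then measures how well the two sets of ranks agree. The sketch below gives tied values the average of the ranks they share. It then computes the coefficient as the correlation of the ranks, which also works when there are ties. When there are no ties, this equals the familiar formula r_s = 1 − 6Σd²/(n(n² − 1)), which is included for comparison. Ranking from smallest to largest or from largest to smallest gives the same coefficient, provided both variables are ranked the same way. The data are hypothetical.

```python
def ranks(values):
    """Rank from 1 (smallest) upwards; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of any run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rank correlation: the correlation between the two sets of ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("all values in one variable are equal")
    return sxy / (sxx * syy) ** 0.5

def spearman_no_ties(x, y):
    """Shortcut formula 1 - 6*sum(d^2) / (n(n^2 - 1)); only exact when there are no ties."""
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    n = len(x)
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

heights = [1.50, 1.55, 1.60, 1.62, 1.68]  # hypothetical, in m
weights = [45, 42, 47, 54, 52]            # hypothetical, in kg
print(round(spearman(heights, weights), 2))  # 0.8
```

A value near +1 means that taller pupils are consistently heavier. A value near 0, like the 0.12 reported later for year 9 boys, means that the rank orders hardly agree.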
I will then draw box and whisker plots of the data so that I can further compare the boys’ and girls’ data from each year group. These will show me the distribution of each set of data, and I will comment on this. I have chosen not to use cumulative frequency because, even though it would also show the distribution of the data, a box and whisker plot shows the median, range, upper and lower quartiles and interquartile range in a simpler way that is easier to understand. A box and whisker plot also makes it easy to identify any outliers, and the diagrams will be easier to compare.
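A box and whisker plot is drawn from five values: minimum, lower quartile, median, upper quartile and maximum. The range and interquartile range follow from these. The sketch below uses Python's `statistics.quantiles` with `method="exclusive"`, which places the quartiles at the (n + 1)/4 and 3(n + 1)/4 positions and interpolates when those positions fall between two values. Other textbooks and spreadsheet functions use slightly different quartile rules, so the results may differ a little from hand calculations. The weights are hypothetical.

```python
import statistics

def five_number_summary(data):
    """Values needed to draw a box and whisker plot, plus range and IQR."""
    values = sorted(data)
    q1, median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return {
        "minimum": values[0],
        "lower quartile": q1,
        "median": median,
        "upper quartile": q3,
        "maximum": values[-1],
        "range": values[-1] - values[0],
        "interquartile range": q3 - q1,
    }

summary = five_number_summary([40, 43, 45, 47, 49, 52, 60])  # hypothetical weights in kg
for name, value in summary.items():
    print(f"{name}: {value}")
```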
After drawing the box and whisker plots, I will calculate the outliers and delete the values that are outliers. I will then compare the diagrams again and comment on their skew, range and median.
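The coursework does not say which rule it uses to decide what counts as an outlier. A common convention, assumed here, treats a value as an outlier if it lies more than 1.5 interquartile ranges below the lower quartile or above the upper quartile. The sketch below applies that rule using the same quartile method as the box plot summary. The weights are hypothetical.

```python
import statistics

def find_outliers(data, k=1.5):
    """Values more than k * IQR below the lower quartile or above the upper quartile."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="exclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]

weights = [46, 47, 48, 48, 49, 50, 65]  # hypothetical weights in kg
print(find_outliers(weights))           # [65]
```

Here the quartiles are 47 kg and 50 kg, so the fences are 42.5 kg and 54.5 kg, and only 65 kg falls outside them.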
I will then be able to draw accurate conclusions about my findings.

Conclusion

For my project, I have investigated the following hypothesis…

As students in the school get older, their height and weight will increase. Boys will also be taller and heavier than girls.

‘As students in the school get older, their height and weight will increase.’ This part of my hypothesis was generally true. This was shown on my scatter diagrams and box and whisker plots: as the age of the students increased, the data appeared further up the scales on the diagrams. For example, the median height of year 7 boys was 1.55 m, whereas the median height of year 11 boys was 1.67 m. The median weight of year 9 girls was 47 kg, whereas the median weight of year 11 girls was 48 kg. This is most likely because people grow as they get older, which makes them taller and heavier. The data for year 9 showed no correlation or only weak correlation: Spearman’s rank correlation coefficient was 0.12 for the boys and 0.46 for the girls. Many pupils have a growth spurt around year 9, which would affect the relationship between height and weight in year 9 pupils and could explain this weak correlation. This is shown on the scatter diagrams for year 9.

‘Boys will also be taller and heavier than girls’. This part of my hypothesis was also mostly true. On the box and whisker diagrams, the boys’ diagrams appeared further up the scale than the girls’ diagrams. The following results also show this…

The girls’ weights…

median: 48 kg
lower quartile: 46 kg
upper quartile: 54 kg
interquartile range: 8 kg

The boys’ weights…

median: 58 kg
lower quartile: 51 kg
upper quartile: 68 kg
interquartile range: 17 kg

This shows that, on average, the boys were heavier than the girls. The following results also show that the boys were taller than the girls…

The girls’ heights…

median: 1.63 m
lower quartile: 1.61 m
upper quartile: 1.68 m
interquartile range: 0.07 m

The boys’ heights…

median: 1.67 m
lower quartile: 1.63 m
upper quartile: 1.79 m
interquartile range: 0.16 m

These results support the second part of my hypothesis. In addition, the scatter diagram of height against weight for year 11 boys shows a strong positive correlation: Spearman’s rank correlation coefficient for year 11 boys is 0.72, which indicates a fairly strong positive correlation. Therefore, taller boys also tend to be heavier. This is probably partly because boys are generally more muscular. Since the boys were mostly taller than the girls, this correlation suggests that they would also tend to be heavier than the girls.

Overall, my results have supported my hypothesis. As students in the school get older, their height and weight increase.

Evaluation

Overall, I think that my investigation went well, although I came across some problems in some parts. There were several obvious mistakes in the data, which I left out of my investigation. These were as follows…

year 7 girls: height 1.64 m, weight 140 kg
year 7 boys: height 1.35 m, weight 29 kg
year 9 boys: height 1.64 m, weight 35 kg
year 9 boys: height 1.83 m, weight 57 kg

I also removed the following data after I had drawn my first scatter diagrams, as it was clearly out of line with the rest of the data.

year 9 boys: height 1.44 m, weight 49 kg
year 11 girls: height 1.54 m, weight 65 kg

I changed the form of the following data so that it matched the rest of the data…

year 11 girls: changed a weight of 51.8 kg to 52 kg, so that it was to the nearest kilogram like the rest of the weights
year 9 girls: changed a height of 1.6 m to 1.60 m, so that it was to two decimal places like the rest of the heights

Removing and correcting these values improved the accuracy of my results. These errors were a problem because, if any had been missed, they would have affected the results of my investigation, especially the averages for each group of data. I had set boundaries to eliminate mistakes within my data. However, the two people I shared the sampling with did not use the same boundaries as me, and this may have reduced the accuracy of my results.

I also had a time limit, so I only had time to work out a few values from the data before my deadline. If I had had longer to complete my investigation, I would have calculated more statistics, such as the standard deviation. I would also have looked at the Body Mass Index (BMI) of each of the pupils and compared it to a BMI chart.

I think that systematic sampling was the best method to use, as it meant that I could quickly and easily take a sample from each group of the data and use it to investigate my hypothesis. It also gave me reliable results, as the sample was spread evenly through the data. However, because I shared the task of selecting the sample with two other people, they could have made mistakes when collecting their samples without my knowing, which could have affected my results slightly. To improve the accuracy of my investigation, I could have collected all the data myself to make sure that no mistakes were made, but this would have made it harder to keep to my deadline.

As I shared the work with two other people, I did not draw all the box and whisker diagrams myself. This meant that the scales were different on some of the diagrams, making them harder to compare. This only slightly affected my results, as I could still easily use the values (for example, the medians) from each of the box and whisker diagrams to compare them. To avoid this problem, I could have set scales for everybody to use on their box and whisker diagrams.
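The two extra calculations suggested here are both short. BMI is weight in kilograms divided by the square of height in metres. The standard deviation measures how spread out a set of values is around its mean. The sketch below uses hypothetical pupils and a hypothetical helper, `bmi`. One caution applies: for children and teenagers, BMI is normally compared with age- and sex-specific centile charts rather than the adult BMI categories.

```python
import statistics

def bmi(weight_kg, height_m):
    """Body Mass Index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2

# Hypothetical (height in m, weight in kg) pairs for one group of pupils.
pupils = [(1.60, 52), (1.55, 47), (1.68, 56), (1.62, 50)]
bmis = [bmi(w, h) for h, w in pupils]
weights = [w for _, w in pupils]

print(round(bmi(52, 1.60), 1))  # 20.3
# pstdev treats the data as a whole population (divides by n);
# stdev treats it as a sample (divides by n - 1) and gives a slightly larger value.
print(statistics.pstdev(weights), statistics.stdev(weights))
```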
If I had had more time to complete my investigation, I could have used a larger sample, which would include more data and improve the accuracy of my results. I could also have used other values from the data. To save time, I chose to use only the data from years 7, 9 and 11, and deleted the rest of the data. To improve the accuracy of my results, I could have used the data from years 8 and 10 as well, but this would have taken longer. I could also have used all of the data instead of taking a sample, but this would also have taken a very long time, and I would not have been able to do it within the time limit that I was given. Overall, I feel that my results are reliable and accurate, and generally my investigation went well.