I have also chosen not to use simple random sampling as this would take longer than systematic sampling and therefore would not be as practical and easy.
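As an illustration, systematic sampling can be sketched in a few lines of Python. This is only a minimal sketch: the function name `systematic_sample`, the year group size of 160 and the starting position are assumptions made for the example, not values taken from the spreadsheet.

```python
def systematic_sample(records, step, start=0):
    """Return every `step`-th record, beginning at index `start`."""
    if step < 1:
        raise ValueError("step must be at least 1")
    return records[start::step]


# Hypothetical year group of 160 pupils, identified only by row number.
year_group = list(range(1, 161))

# Take every 8th pupil, starting with the 8th row (index 7).
sample = systematic_sample(year_group, step=8, start=7)

print(len(sample))  # 20
print(sample[:3])   # [8, 16, 24]
```

Because the rows are taken at a fixed interval, there is no need to generate and look up a random number for each pupil, which is why this method is quicker than simple random sampling.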
How many people?
I will be using data from students in years 7, 9 and 11, and have chosen to use the data of 20 students from each year group. This will give approximately a 10% sample in each group. Skipping years 8 and 10 leaves gaps between the year groups, but I think that these groups will still give a reasonably accurate result. Using every year group would give a more reliable result, but it would take longer, and three year groups spaced two years apart still show a sufficient increase in age.
To save time when selecting the data for my investigation, I worked in a group of 3. We each chose one year group to select the necessary data from. This means that my results should be almost as accurate as they would be if I had sorted all the data myself, but they may be slightly more biased as I do not know how accurately the other members of my group sorted the data. As I have chosen to use systematic sampling, I have used every 8th person in my year group’s data. The other members of my group used every 7th and 9th person in their allocated year groups’ data. This means that I have a varied sample of the data, making my results more accurate. If I find that there are any gaps or obvious mistakes in the data that I am using, I have set boundaries to help eliminate any obviously incorrect data. The boundaries that I have chosen are as follows…
Boys’ height:
Girls’ height:
Boys’ weight: no less than 50kg; no more than 80kg
Girls’ weight: no less than 45kg; no more than 75kg
I have chosen these boundaries as I estimated that any data outside them would be very unlikely to be correct. I will delete this data from the spreadsheet to ensure that my results are as accurate as possible. These mistakes may occur if there was a typing error when the data was first entered into the spreadsheet. The person giving the data may also have misunderstood a question, or given their answer in the wrong units. By using these boundaries I will be able to eliminate some of these results, therefore improving the accuracy of my results.
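The weight boundaries above could be applied automatically, as in the sketch below. The record layout, the labels "boy" and "girl" and the two plausible records are assumptions made for illustration. The two rejected records are the year 7 values listed in the evaluation. The height boundaries are left out because their values are not given above.

```python
# Weight boundaries in kg, as listed above.
# Height boundaries are omitted because their values are not stated.
WEIGHT_LIMITS_KG = {"boy": (50, 80), "girl": (45, 75)}


def within_weight_limits(gender, weight_kg):
    """Return True if the weight lies inside the boundaries for that gender."""
    low, high = WEIGHT_LIMITS_KG[gender]
    return low <= weight_kg <= high


# Assumed record layout: (gender, height in m, weight in kg).
records = [
    ("girl", 1.60, 52),   # hypothetical, plausible
    ("girl", 1.64, 140),  # year 7 girl from the evaluation, too heavy
    ("boy", 1.70, 63),    # hypothetical, plausible
    ("boy", 1.35, 29),    # year 7 boy from the evaluation, too light
]

kept = [r for r in records if within_weight_limits(r[0], r[2])]
print(kept)  # [('girl', 1.6, 52), ('boy', 1.7, 63)]
```

Checking the boundaries in this way only works if everybody sampling the data uses the same limits.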
Plan
- I will collect the primary data as it is already written in the spreadsheet.
- I will then sort the data in the spreadsheet into the form that I want. I will do this by deleting any of the data that I do not need, leaving me with only the heights, weights, genders and year groups of pupils in years 7, 9 and 11. This is so that I can use this data to draw graphs and make calculations on the computer to save time.
- I will plot the following scatter diagrams…
- Year 7 boys
- Year 7 girls
- Year 9 boys
- Year 9 girls
- Year 11 boys
- Year 11 girls
- I will then calculate an average point for each of the scatter diagrams and add a line of best fit (if necessary). This will allow me to see if there is a relationship between the height and weight of the pupils and comment on this.
- I will work out the equation of this line using the equation y=mx+c and compare the gradients and intercepts of each of the scatter diagrams. I will then be able to comment on this and see further the relationships between height and weight of pupils.
- Next, I will calculate Spearman’s rank correlation coefficient for each of the 6 sets of data. This will give me a numerical value of the correlation of the data meaning that I can further compare the relationship between the heights and weights of the pupils, making my results more accurate.
- I will then draw box and whisker plots of the data so that I can further compare the boys’ and girls’ data from each year group. This will show me the distribution of each of the sets of data and I will comment on this. I have chosen not to use cumulative frequency as, even though this would also show the distribution of the data, a box and whisker plot shows the median, range, upper and lower quartiles and interquartile range in a simpler way that is easier to understand. A box and whisker plot also allows any outliers to be identified, making my results more accurate. This also means that the diagrams will be easier to compare.
- After drawing the box and whisker plots, I will calculate the outliers and delete the values that have caused them. I will then compare the diagrams again and comment on the skew of the diagrams, the range and the median.
- I will then be able to draw accurate conclusions about my findings.
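The average point and line of best fit in the plan can be sketched as follows. This sketch uses a least-squares line, which is an assumption: it is one common way of getting a line through the average point, not necessarily how the line was drawn on the scatter diagrams. The heights and weights are made-up values for illustration only.

```python
def mean_point(xs, ys):
    """Return the average point (mean of x, mean of y)."""
    return sum(xs) / len(xs), sum(ys) / len(ys)


def least_squares_line(xs, ys):
    """Return gradient m and intercept c of the line y = mx + c."""
    x_bar, y_bar = mean_point(xs, ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    m = s_xy / s_xx
    c = y_bar - m * x_bar  # forces the line through the average point
    return m, c


# Hypothetical sample: heights in metres, weights in kilograms.
heights_m = [1.50, 1.55, 1.60, 1.65, 1.70]
weights_kg = [42, 45, 50, 53, 60]

x_bar, y_bar = mean_point(heights_m, weights_kg)
m, c = least_squares_line(heights_m, weights_kg)
print(f"average point: ({x_bar:.2f}, {y_bar:.1f})")
print(f"y = {m:.1f}x + {c:.1f}")
```

The gradient m is the extra weight in kilograms for each extra metre of height, so comparing gradients between the six diagrams shows which group's weight rises fastest with height.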
Conclusion
For my project, I have investigated the following hypothesis…
As students in the school get older, their height and weight will increase. Boys will also be taller and heavier than girls.
‘As students in the school get older, their height and weight will increase’. This part of my hypothesis was generally true. This was shown on my scatter diagrams and box and whisker plots. As the age of the students increased, the data appeared further up the scales on the diagrams. For example, the median height of year 7 boys was 1.55m, whereas the median height of year 11 boys was 1.67m. The median weight of year 9 girls was 47kg, whereas the median weight of year 11 girls was 48kg. This is because people grow as they get older, which makes them taller and heavier. The data for year 9 showed no correlation, or only weak correlation: Spearman’s rank correlation coefficient was 0.12 for the boys and 0.46 for the girls. In real life, many people have a growth spurt around year 9, which would affect the relationship between height and weight in year 9 pupils and may explain why the correlation is weak. This is shown on the scatter diagrams for year 9.
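For reference, Spearman's rank correlation coefficient can be calculated as in the sketch below, using the formula 1 - 6Σd² / (n(n² - 1)), where d is the difference between the two ranks of each pupil and n is the number of pupils. The six heights and weights are made-up values, not the year 9 data. The formula is exact only when there are no tied values, although the ranking function gives tied values the mean of their ranks.

```python
def ranks(values):
    """Rank values from 1 (smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover a run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = shared
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rank correlation using 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical sample of six pupils.
heights_m = [1.52, 1.60, 1.65, 1.58, 1.70, 1.63]
weights_kg = [45, 50, 48, 47, 60, 55]

print(round(spearman(heights_m, weights_kg), 2))  # 0.83
```

A value close to 1 means a strong positive correlation and a value close to 0 means little or no correlation, which is how the year 9 values of 0.12 and 0.46 are interpreted above.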
‘Boys will also be taller and heavier than girls’. This part of my hypothesis was also mostly true. On the box and whisker diagrams, the boys’ diagrams appeared further up the scale than the girls’ diagrams. The following results also show this…
The girls’…
- lower quartile is 46 kg
- upper quartile is 54 kg
- interquartile range is 8 kg
The boys’…
- lower quartile is 51 kg
- upper quartile is 68 kg
- interquartile range is 17 kg
This shows that the boys are heavier than the girls. These results also show that the boys are taller than the girls…
The girls’…
- lower quartile is 1.61m
- upper quartile is 1.68m
- interquartile range is 0.07m
The boys’…
- lower quartile is 1.63m
- upper quartile is 1.79m
- interquartile range is 0.16m
These results support the second part of my hypothesis. Also, the scatter diagram of height against weight for boys in year 11 shows a strong positive correlation. The value of Spearman’s rank correlation coefficient for year 11 boys is 0.72, which shows a fairly strong positive correlation. Therefore, as the boys get taller, they also get heavier. In real life, this would probably be because boys are generally more muscular, and since the boys have been shown to be mostly taller than the girls, the correlation suggests that the boys will also be heavier than the girls.
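The quartiles listed above can also be used to find the limits beyond which a value counts as an outlier. The sketch below assumes the common rule that an outlier lies more than 1.5 times the interquartile range below the lower quartile or above the upper quartile. The text does not state which rule was used, so this is an assumption.

```python
def outlier_fences(lower_quartile, upper_quartile):
    """Return (low, high) limits using the assumed 1.5 x IQR rule."""
    iqr = upper_quartile - lower_quartile
    return lower_quartile - 1.5 * iqr, upper_quartile + 1.5 * iqr


# Weight quartiles in kg, taken from the lists above.
girls_low, girls_high = outlier_fences(46, 54)
boys_low, boys_high = outlier_fences(51, 68)

print(girls_low, girls_high)  # 34.0 66.0
print(boys_low, boys_high)    # 25.5 93.5
```

Because the boys' interquartile range (17 kg) is about twice the girls' (8 kg), the boys' limits are much further apart, so a weight that counts as an outlier for the girls may not be one for the boys.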
Overall, my results have shown that my hypothesis was correct. Therefore, as students in the school get older, their height and weight will increase.
Evaluation
Overall, I think that my investigation went well, although I came across some problems during some parts. There were several obvious mistakes in the data, which I left out of my investigation results. These were as follows…
- year 7 girls: height 1.64m, weight 140kg
- year 7 boys: height 1.35m, weight 29kg
- year 9 boys: height 1.64m, weight 35kg
- year 9 boys: height 1.83m, weight 57kg
I also removed the following data after I had drawn my first scatter diagrams as they were obviously unlike the rest of the data.
- year 9 boys: height 1.44m, weight 49kg
- year 11 girls: height 1.54m, weight 65kg
I changed the form of the following data as it was not in the correct form…
- year 11 girls: changed weight of 51.8kg to 52kg so it was to the nearest kilogram like the rest of the weights
- year 9 girls: changed height of 1.6m to 1.60m so that it was to two decimal places like the rest of the heights
Correcting these values improved the accuracy of my results. Mistakes like these are a problem because any that were missed would affect the results of my investigation, especially the averages of groups of data. I had set boundaries to avoid mistakes within my data, but the two other people I shared the sampling with did not use the same boundaries as me, and this may have decreased the accuracy of my results.
I also had a time limit, so I only had time to work out a few values from the data before my deadline. If I had had longer to complete my investigation, I would have worked out more things, such as standard deviation, and looked at the Body Mass Index (BMI) of each of the pupils, then compared it to a BMI chart.
I think that the sampling method I used, systematic sampling, was the best method to use as it meant that I could quickly and easily take a sample from each group of the data and use it to investigate my hypothesis. It also gave me accurate results as the sample was spread fairly evenly across the data. To save time, I shared the task of selecting a sample with two other people. This could have affected my results slightly, as they could have made a mistake when collecting their sample without me knowing. To improve the accuracy of my investigation, I could have collected all the data myself to ensure that no mistakes were made, but this would have made it harder to keep to my deadline.
As I shared the work with two other people, I did not draw all the box and whisker diagrams myself. This meant that the scales were different on some of the diagrams, making them harder to compare. This would only slightly affect my results as I could still easily use the values (for example the medians) from each of the box and whisker diagrams to compare them. To avoid this problem, we could have agreed on scales for everybody to use on their box and whisker diagrams.
If I had more time to complete my investigation, I could have used a larger sample, therefore including more data and improving the accuracy of my results. I also could have used other values. To save time, I chose to only use the data from years 7, 9 and 11, and I deleted the rest of the data. To improve the accuracy of my results, I could have used the data from years 8 and 10 as well, but this would have taken longer. I could also have chosen to use all of the data instead of taking a sample, but this would also have taken a very long time and I would not have been able to do it within the time limit that I was given. Overall, I feel that my results are reliable and accurate, and generally my investigation went well.