The red results are for the three practices and are not needed.
The problem is that 14 of the 26 results in the first column have been estimated in cm, so to improve the experiment I made sure the pupils were told clearly to estimate in mm.
I then got this data:
Some are still estimating in cm, but not as many as in the trial data.
Question 1) Is there any link between estimating the length of a straight line and mathematical ability?
Null hypothesis: There is a relationship between estimating the length of a straight line and mathematical ability.
Alternative hypothesis: There is no relationship between estimating the length of a straight line and mathematical ability.
I now have all the data on estimating a straight line. To measure pupils' mathematical ability I am going to use their end of year 9 exam mark, which I can obtain from the school database. I am going to use the results for the estimation of the straight line, not the non-straight line.
I do not need every pupil's results, so I will take a smaller sample. There are 86 boys in year 10, split across 5 sets. Set five did not sit the same exam at the end of year nine so cannot be sampled.
The method of sampling I am going to use is percentage / quota stratified random sampling. The sample chosen is:
Set 1: 25 pupils, 50% quota is 26/2 = 13, so 13 results needed
Set 2: 24 pupils, 50% quota is 24/2 = 12, so 12 results needed
Set 3: 18 pupils, 50% quota is 18/2 = 9, so 9 results needed
Set 4: 17 pupils, 50% quota is 18/2 = 9 (to avoid halves), so 9 results needed
In total there are 43 results needed.
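As a quick check of the quotas above, here is a small Python sketch. It assumes the set sizes as listed (25, 24, 18 and 17 pupils) and that a 50% quota on an odd-sized set is rounded up to avoid halves, as described for Set 4. The names `set_sizes` and `quota` are only for illustration.

```python
import math

# Set sizes as listed above (set 5 is excluded from sampling).
set_sizes = {"Set 1": 25, "Set 2": 24, "Set 3": 18, "Set 4": 17}

def quota(size, fraction=0.5):
    """Return the number of results needed, rounding up to avoid halves."""
    return math.ceil(size * fraction)

quotas = {name: quota(size) for name, size in set_sizes.items()}
total_needed = sum(quotas.values())

for name, needed in quotas.items():
    print(name, "needs", needed, "results")
print("Total results needed:", total_needed)
```

Under these assumptions the quotas come to 13, 12, 9 and 9, which matches the total of 43 results.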
Each set is listed alphabetically in the School Sims database. These entries were numbered and the sample names were then picked using the random number button on a calculator.
For set 1, pressing RND on the calculator and then multiplying by 26 gave 13.346, which was rounded up to give the 14th name, so the 14th boy was the first to be selected. This was repeated until the 13 results needed for set 1 had been selected.
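The calculator method can be imitated in Python. This is only a sketch: the function name `pick_sample` is made up for illustration, and skipping a name that has already been picked is my assumption, since the method above does not say what to do with repeats.

```python
import random

def pick_sample(list_size, results_needed, rng=random):
    """Pick distinct positions (counting from 1) in an alphabetical list.

    Mirrors the calculator method: RND gives a number from 0 up to (but not
    including) 1, which is multiplied by the list size and moved up to the
    next whole number.
    """
    chosen = []
    while len(chosen) < results_needed:
        value = rng.random() * list_size   # e.g. 13.346 when list_size is 26
        position = int(value) + 1          # 13.346 -> 14th name
        if position not in chosen:         # assumption: ignore repeated names
            chosen.append(position)
    return chosen

# Example for set 1, using 26 as the multiplier as in the text above.
set1_positions = pick_sample(list_size=26, results_needed=13)
print(set1_positions)
```

Each run gives a different list of 13 positions, which is the point of random sampling.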
43 results were collected. However, 2 boys were ill and missed the exam, and 1 boy was away for nearly a whole term so his data may not be fully representative. These results were taken out.
The remaining 40 results were then put into a table in Excel:
Negative values are unusual in scatter graphs, so the differences will be used as positive values (the size of the error, ignoring whether it was an overestimate or an underestimate).
I'm going to draw a scatter graph because it shows the correlation between the two sets of data. The data from Excel was put into Autograph.
From the results I got this scatter graph:
I put in the line of best fit, which runs through the mean point (the blue point); this is called the centroid in Autograph. All the information from the graph was copied and is placed below.
Number of points, n: 40
Mean, x: 27.8
Mean, y: 56.28
Standard Deviation, x: 19.4
Standard Deviation, y: 19.55
Correlation Coeff, r: -0.3979
Spearman's Ranking Coeff: -0.3781
y-on-x Regression Line: y=-0.4009 x+67.42
x-on-y Regression Line: x=-0.3949 y+50.02
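The two regression lines can be checked from the other figures in this list. The sketch below uses the standard results that the y-on-x gradient is r multiplied by (SD of y / SD of x), the x-on-y gradient is r multiplied by (SD of x / SD of y), and both lines pass through the mean point. The variable names are my own.

```python
# Summary statistics copied from the Autograph output above.
mean_x, mean_y = 27.8, 56.28
sd_x, sd_y = 19.4, 19.55
r = -0.3979

# y-on-x line: gradient = r * sd_y / sd_x, and the line passes through the mean point.
slope_yx = r * sd_y / sd_x
intercept_yx = mean_y - slope_yx * mean_x

# x-on-y line: gradient = r * sd_x / sd_y, again through the mean point.
slope_xy = r * sd_x / sd_y
intercept_xy = mean_x - slope_xy * mean_y

print(f"y = {slope_yx:.4f}x + {intercept_yx:.2f}")
print(f"x = {slope_xy:.4f}y + {intercept_xy:.2f}")
```

The values agree with the Autograph equations apart from small differences in the last decimal place, which come from the rounding of the figures quoted above.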
The correlation coefficient is -0.3979, which shows a low degree of negative correlation between the two variables. Looking at the graph, those who are good at maths generally achieved smaller differences in estimation and hence were more accurate. However, the points are not all close to the line of best fit, so this statement is only a rough guide. To a limited degree this supports my null hypothesis.
If you wanted to roughly estimate the exam result of someone who estimated the line with a difference of 43mm, you could do so using the regression line of y dependent on x. This scatter graph shows the y dependent on x regression line.
Y on X Regression Line: y=-0.4009x+67.42
Line of best fit: y=-0.4009x+67.42.
Both the line of best fit and the y on x regression line have the same equation.
The 67.42 shows where the line crosses the exam mark axis and the -0.4 gives the gradient of the line.
Substituting x = 43 into the equation y = -0.4009x + 67.42 gives y = 50.1813.
So if a pupil estimated the line with a difference of 43mm from the actual length, a rough estimate of his exam result is 50%.
If you wanted to roughly estimate the difference in the straight line estimate of someone who got 59% in the year 9 end of year exam, you could do so using the regression line of x dependent on y.
This scatter graph shows the x dependent on y regression line.
x-on-y regression Line: x=-0.3949y+50.02
If someone got 59% in his exam then he may have estimated the straight line with a difference of about 27mm.
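Both predictions can be reproduced with a short Python sketch that uses the two regression equations quoted above. The function names are my own and are only for illustration.

```python
def predict_exam_mark(difference_mm):
    """y-on-x line: exam mark (%) predicted from the estimation difference (mm)."""
    return -0.4009 * difference_mm + 67.42

def predict_difference(exam_mark):
    """x-on-y line: estimation difference (mm) predicted from the exam mark (%)."""
    return -0.3949 * exam_mark + 50.02

print(round(predict_exam_mark(43), 4))   # 50.1813
print(round(predict_difference(59), 4))  # 26.7209
```

Note that each line should only be used in its own direction: the y-on-x line to predict an exam mark, and the x-on-y line to predict an estimation difference.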
However, these predictions are not very accurate as the correlation is not very strong.
These results relate to my sample investigation. For the reasons stated at the beginning of my project I think that these results will relate both to the study population and the target population. However, within these populations more variation in the results could be expected.
2) Does the estimation of a non-straight line improve after practice?
Null hypothesis: Practice improves the estimation of a non-straight line.
Alternative hypothesis: Practice doesn't improve the estimation of a non-straight line.
Sample:
I got 86 results for boys and 52 results for girls. There are more than 52 girls in the year, but only 52 could do the experiment: the high school wouldn't allow the test to be carried out in lesson time, so the investigation was carried out at lunch and some girls couldn't or didn't want to attend. I then used percentage / quota stratified random sampling on this data (as in the previous question) and obtained 45 results for boys and 45 results for girls, giving 90 sets of data.
This sample will be used for questions 3 and 4 as well.
The data I obtained was this:
Graphs to show results before practice using a histogram, a box and whisker diagram and +/- 3 standard deviations:
The box and whisker diagram relates to the raw data and shows that the lowest amount off actual is about -200 and the highest is about 200. It shows that the median is about -30 and that the middle 50% of the results lie between about -80 and 50. The histogram relates to grouped data. It shows that the results are widely scattered and that more people underestimated than overestimated. The +/- 3 standard deviations diagram shows the mean is around -20.
All information from the graph was copied from autograph and placed below.
Raw Data Statistics:
Number in sample, n: 90
Mean, x: -18.9667
Standard Deviation, x: 90.1252
Range, x: 378
Lower Quartile: -83.75
Median: -28.5
Upper Quartile: 49
Semi I.Q. Range: 66.375
Grouped Data Statistics:
Total Frequency, n: 90
Mean, x: -19.5556
Standard Deviation, x: 91.1154
Modal Class: -80-
Lower Quartile: -85.4545
Median: -36
Upper Quartile: 51.6667
Semi I.Q. Range: 68.5606
Graphs to show results after practice using a histogram, a box and whisker diagram and +/- 3 standard deviations:
The box and whisker diagram shows that the lowest amount off actual is about -130 and the highest is about 160. The median is about 0 and the middle 50% of the results lie between about -40 and 30. The histogram shows that most of the data is near 0 and roughly fits a normal distribution. The +/- 3 standard deviations diagram shows the mean is about 0.
All information from the graph was copied from autograph and placed below.
Raw Data Statistics:
Number in sample, n: 90
Mean, x: -1.75556
Standard Deviation, x: 44.5741
Range, x: 260
Lower Quartile: -36.5
Median: -1.5
Upper Quartile: 29
Semi I.Q. Range: 32.75
Grouped Data Statistics:
Total Frequency, n: 90
Mean, x: -6.66667
Standard Deviation, x: 44.6219
Modal Class: -40-
Lower Quartile: -37.1429
Median: -11.4286
Upper Quartile: 22.7273
Semi I.Q. Range: 29.9351.
Comparing the two sets of graphs and their statistics, there is a clear improvement in the estimates after practice.
Summary statistics for the two sets of data:
Before practice:
Mean, x: -18.9667
Range, x: 378
Median: -28.5
After practice:
Mean, x: -1.75556
Range, x: 260
Median: -1.5
After practice the mean and median are both much closer to 0 and the range is smaller, so the estimates are both more accurate and less spread out. This shows a definite improvement in results after practice and supports my null hypothesis.
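The comparison can also be made on the spread of the results, not just the averages. The sketch below uses only the raw data statistics copied from Autograph above; the dictionary and function names are my own.

```python
# Figures copied from the Autograph raw data statistics above.
before = {"mean": -18.9667, "sd": 90.1252, "range": 378,
          "lower_q": -83.75, "median": -28.5, "upper_q": 49}
after = {"mean": -1.75556, "sd": 44.5741, "range": 260,
         "lower_q": -36.5, "median": -1.5, "upper_q": 29}

def semi_iqr(stats):
    """Semi-interquartile range: half the gap between the quartiles."""
    return (stats["upper_q"] - stats["lower_q"]) / 2

for label, stats in (("Before", before), ("After", after)):
    print(label, "practice:",
          "distance of mean from 0 =", abs(stats["mean"]),
          "| semi-IQR =", semi_iqr(stats),
          "| SD =", stats["sd"])

# How the spread after practice compares with the spread before.
sd_ratio = after["sd"] / before["sd"]
print("SD after as a fraction of SD before:", round(sd_ratio, 2))
```

The semi-interquartile ranges agree with the Autograph values (66.375 and 32.75), and the standard deviation after practice is roughly half of what it was before.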
3) Does a 14/15 year old's ability to estimate the length of a straight line fit a normal distribution?
Null hypothesis: A 14/15 year old's ability to estimate a straight line fits a normal distribution.
Alternative hypothesis: A 14/15 year old's ability to estimate a straight line doesn't fit a normal distribution.
The sample and data used are the same as in question 2. Here are the results from estimating the straight line:
A normal distribution looks like this:
In a normal distribution only a few people score much lower or much higher than everyone else, and most people are in the middle with similar ability.
Here is a histogram of the results with a normal distribution curve, to show how well the results fit a normal distribution:
The histogram's class intervals were changed slightly to make the histogram fit the normal distribution curve more closely. The table and statistics from the graph were copied from Autograph and are placed below:
Table of Values of Histogram:
Class Int.        Mid. Int. (x)   Class Width   Freq.   Cum. Freq.   Freq. Density
-100 ≤ x < -60    -80             40            6       6            0.15
-60 ≤ x < -40     -50             20            7       13           0.35
-40 ≤ x < -20     -30             20            14      27           0.7
-20 ≤ x < 20      0               40            35      62           0.875
20 ≤ x < 40       30              20            15      77           0.75
40 ≤ x < 60       50              20            8       85           0.4
60 ≤ x < 100      80              40            5       90           0.125
Raw Data Statistics:
Number in sample, n: 90
Mean, x: 2.3
Standard Deviation, x: 39.1866
Range, x: 186
Lower Quartile: -26.5
Median: 10
Upper Quartile: 25.25
Semi I.Q. Range: 25.875
Grouped Data Statistics:
Total Frequency, n: 90
Mean, x: 0
Standard Deviation, x: 38.5861
Modal Class: -20-
Lower Quartile: -26.4286
Median: 0.571429
Upper Quartile: 27.3333
Semi I.Q. Range: 26.881
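The frequency densities in the table and the grouped mean and standard deviation can be checked with a short Python sketch. It assumes that the grouped statistics use the mid-point of each class and that the standard deviation divides by n (not n - 1), which is what reproduces the Autograph figure. The variable names are my own.

```python
import math

# (lower bound, upper bound, frequency) for each class in the histogram table.
classes = [(-100, -60, 6), (-60, -40, 7), (-40, -20, 14), (-20, 20, 35),
           (20, 40, 15), (40, 60, 8), (60, 100, 5)]

total = sum(freq for _, _, freq in classes)

# Frequency density = frequency / class width.
densities = [freq / (upper - lower) for lower, upper, freq in classes]

# Grouped mean and standard deviation use the mid-point of each class.
mids = [(lower + upper) / 2 for lower, upper, _ in classes]
grouped_mean = sum(m * f for m, (_, _, f) in zip(mids, classes)) / total
grouped_var = sum(f * (m - grouped_mean) ** 2
                  for m, (_, _, f) in zip(mids, classes)) / total
grouped_sd = math.sqrt(grouped_var)

print(total, densities)
print(grouped_mean, round(grouped_sd, 4))
```

This gives a total frequency of 90, a grouped mean of 0 and a grouped standard deviation of about 38.59, in line with the grouped data statistics above.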
In a normal distribution, a fixed share of the results lies within each number of standard deviations of the mean. About 68% of results lie within +/- 1 standard deviation, about 95% lie within +/- 2 standard deviations and about 99.7% (nearly all) lie within +/- 3 standard deviations. I am going to show the share of my results within each of these ranges by using probability by area overlaid on a histogram.
Within +/- 1 standard deviation (between -37 and 41) there are 69% of the results. This is very close to 68% and shows a strong similarity between the results and a normal distribution.
Here is another histogram, this time showing the results within +/- 2 standard deviations:
Within +/- 2 standard deviations (between -76 and 80) there are 93% of the results. This is very close to 95% and shows a very strong similarity between the results and a normal distribution. As you can see, 100% of the results lie within +/- 3 standard deviations, which is also what a normal distribution would lead you to expect.
My results are very close to a normal distribution, as shown by the share of results within 1, 2 and 3 standard deviations of the mean, and this strongly supports my null hypothesis.
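The standard deviation boundaries and the shares a normal distribution would give can be worked out in Python. This sketch uses the raw mean and standard deviation quoted above, and the standard result that the share within k standard deviations is erf(k / √2). The function name `normal_share_within` is my own.

```python
import math

mean, sd = 2.3, 39.1866   # raw data statistics quoted above

def normal_share_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    low, high = mean - k * sd, mean + k * sd
    print(f"+/- {k} SD: {low:.1f} to {high:.1f}, "
          f"expected share {normal_share_within(k):.1%}")
```

The expected shares come out at roughly 68%, 95% and 99.7%, which can be set against the 69%, 93% and 100% found in my results.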
4) Is a 14/15 year old's ability to estimate a straight line more accurate than their ability to estimate a non-straight line?
Null hypothesis: A 14/15 year old's ability to estimate a straight line is more accurate than their ability to estimate a non-straight line.
Alternative hypothesis: A 14/15 year old's ability to estimate a straight line is less accurate than their ability to estimate a non-straight line.
The sample used will be the same as in questions 2 and 3. The data I will use is the distance from actual for the straight line and the distance from actual for the non-straight line before practice (before practice because the pupils did not have any practice before estimating the straight line).
Here is the data I will use, copied from Excel: