My hypothesis is that the top 3 sets (A, B and C) estimate both the size of the angles and the lengths of the lines better than the middle 3 sets (D, E and F) and the bottom 3 sets (G, H and I). I think this is correct because the top sets are supposed to be smarter, so they should be able to estimate closer to the correct amount.

There are 185 pieces of data in the whole population.

77 are in the top sets.

70 are in the middle sets.

38 are in the bottom sets.

Overall, there is too much data to use all of it, so I need to choose a sample. I need to make sure my sample size is appropriate. A sample of 10 would be too small to represent my population and a sample of 100 would be too large. I will choose a sample size of 40, as that is not too small and not too large. It is just over 20% of my population (40 out of 185 is about 21.6%), so it is enough to get a representative amount of data.

There are not the same number of people in the top, middle and bottom sets, so I cannot take the same number of people from each group. Instead, each group has to make up the same proportion of the sample as it does of the population. I worked this out in the table below:
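The proportional split can also be checked with a short calculation. This is only a sketch: the set sizes and the sample size of 40 come from the text above, while the group names and the function name `stratified_counts` are made up for illustration. It assumes each share is simply rounded to the nearest whole person.

```python
# Set sizes and sample size are taken from the text above.
population = {"top": 77, "middle": 70, "bottom": 38}
sample_size = 40


def stratified_counts(population, sample_size):
    """Share the sample between the groups in proportion to their size."""
    total = sum(population.values())
    # Exact (decimal) share for each group: group size / total x sample size.
    exact = {name: size * sample_size / total for name, size in population.items()}
    # Assumption: round each share to the nearest whole person.
    counts = {name: round(share) for name, share in exact.items()}
    return exact, counts


exact, counts = stratified_counts(population, sample_size)
total = sum(population.values())
for name, size in population.items():
    print(f"{name}: {size}/{total} x {sample_size} = {exact[name]:.2f} -> {counts[name]}")
print("total sampled:", sum(counts.values()))
```

Rounding each share separately does not always add back up to the sample size, so the total is printed as a check.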

Now that I know how many people I am going to take from each set, I need a random way of picking people so that everyone has a fair chance of being picked. I knew of the random key on the calculator and decided that it was the fairest way of picking the data, but first I wanted to see how the random key worked.

When I kept pressing the random key on the calculator again and again, I noticed two things. Firstly, every number n I got was in the range 0 ≤ n < 1, and secondly, every number had at most 3 decimal places. This means I can get any number from 0.000 to 0.999, so there are 1,000 possible random numbers (including 0).

As the random numbers are decimals, I need a way of turning them into whole numbers. There are two ways I could do this:

I could truncate the numbers.
I could round the numbers.

Truncating would mean taking the whole-number part and cutting off the decimal places. I realised that this will not be very useful as I will never get the highest number.

For example, if there are 12 numbers, the highest random number is 0.999, and 0.999 × 12 = 11.988, which counts as 11 if I am truncating. This makes it impossible to get 12, so it will not be fair because not all of the data has a chance of being picked.
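The truncation problem can be checked by going through every possible random number. This is a minimal sketch that assumes the calculator only ever shows 0.000, 0.001, ..., 0.999; the name `truncate_pick` is made up for illustration. Each random number is stored as a whole number of thousandths so that the arithmetic stays exact.

```python
# Assumption: the random key gives k/1000 for k = 0, 1, ..., 999.
THOUSANDTHS = range(1000)


def truncate_pick(k, n):
    """Multiply the random number k/1000 by n and cut off the decimal places."""
    return (k * n) // 1000


# Every whole number that truncating can produce when choosing from 12.
picks = {truncate_pick(k, 12) for k in THOUSANDTHS}

print("possible results:", sorted(picks))
print("largest result (from 0.999):", truncate_pick(999, 12))
print("can 12 ever be picked?", 12 in picks)
```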

That leaves me with the other method, rounding. Does that give every number the same chance of being picked? I tested this for the top sets sample, where I have to choose 17 people out of 77. There are 1,000 random numbers, so I want to see if all 77 numbers have the same chance of being picked. 1000/77 = 12.987…, which is not a whole number, so not all 77 numbers can have exactly the same chance of being picked. That means that if the number of people I am choosing from is not a factor of 1000, the method is not entirely fair. This is shown in the table below:

The numbers from 1 to 76 each have 12 or 13 random numbers assigned to them, so it is relatively fair. However, 77 has only 6 random numbers assigned to it (0.994 to 0.999), because 0 takes 7 (0.000 to 0.006). We do not need 0 as one of our results, so if the number rounds to 0 we can just say it is 77. Then 77 has 13 random numbers assigned to it, which makes it fairer.
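The counts for the rounding method can be checked in the same way, by going through all 1,000 possible random numbers. This is a sketch under two assumptions: the calculator shows only 0.000 to 0.999, and a result ending in exactly .5 is rounded up. The names `round_pick` and `counts_by_rounding` are made up for illustration.

```python
from collections import Counter

# Assumption: the random key gives k/1000 for k = 0, 1, ..., 999.
THOUSANDTHS = range(1000)


def round_pick(k, n):
    """Round (k / 1000) * n to the nearest whole number, with halves rounded up.

    Whole-number arithmetic is used so that no floating-point error creeps in.
    """
    return (2 * k * n + 1000) // 2000


def counts_by_rounding(n, zero_becomes_n=False):
    """Count how many of the 1,000 random numbers lead to each whole number."""
    counts = Counter()
    for k in THOUSANDTHS:
        pick = round_pick(k, n)
        if zero_becomes_n and pick == 0:
            pick = n  # treat a result of 0 as n, as described in the text
        counts[pick] += 1
    return counts


plain = counts_by_rounding(77)
fixed = counts_by_rounding(77, zero_becomes_n=True)

print("random numbers giving 0:", plain[0])
print("random numbers giving 77:", plain[77])
print("random numbers giving 77 once 0 counts as 77:", fixed[77])
print("counts for 1 to 76:", sorted({plain[m] for m in range(1, 77)}))
```

Changing 77 to 70 or 38 gives the same check for the middle and bottom sets.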

Top Sets Sample

This is my chosen sample of 17 people from the top sets.

Middle Sets Sample

1000/70=14.29 which is not a whole ...