Mathematics Coursework – Statistics.

S1 Task A: Measurements.

Aim.

My aim is that, within the limits of a small-scale survey, I will collect sample data from a population and, by using estimation techniques, estimate the population’s parameters (such as the mean and the variance). My population is smarties, and my sample will be the individual weights of randomly chosen smarties. I chose weight because I expect it to vary noticeably from one smartie to the next, which I hope will make for an interesting investigation. The sample size is constrained in two directions: it must be fairly small, because my aim specifies a small-scale survey, but it must also be large enough to give accurate estimates of the population parameters.

To help me decide on the size of my sample, I have looked at the Central Limit Theorem, which states that:

If the sample size is large enough, the distribution of the sample mean is approximately Normal.
The variance of the distribution of the sample mean is equal to the variance of the parent population divided by the sample size.

The Central Limit Theorem allows predictions to be made about the distribution of the sample mean without any knowledge of the distribution of the parent population, as long as the sample is large enough. For this reason, the sample size will be set at fifty, which I consider large enough for the distribution of its mean to be approximately Normal (according to the Central Limit Theorem). It should not be larger, because the aim of this investigation is to carry out a “small-scale survey”.
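The two statements of the Central Limit Theorem above can be checked with a small simulation. This is only a sketch: the parent population used here (an exponential distribution with mean 1 and variance 1) and the number of repeats are assumptions chosen for illustration, not anything measured in this investigation.

```python
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is repeatable
SAMPLE_SIZE = 50         # the sample size chosen in this investigation
REPEATS = 2000           # number of simulated samples (arbitrary assumption)


def draw_sample_mean():
    """Mean of one random sample from a hypothetical, clearly non-Normal parent.

    The parent is exponential with mean 1 and variance 1.
    """
    return statistics.fmean(rng.expovariate(1.0) for _ in range(SAMPLE_SIZE))


sample_means = [draw_sample_mean() for _ in range(REPEATS)]

mean_of_means = statistics.fmean(sample_means)
variance_of_means = statistics.pvariance(sample_means)

print(f"mean of sample means:     {mean_of_means:.3f} (parent mean = 1)")
print(f"variance of sample means: {variance_of_means:.4f} "
      f"(parent variance / n = {1 / SAMPLE_SIZE:.4f})")
```

If the theorem holds, the variance of the simulated sample means should come out close to 1/50 = 0.02, even though the parent population is heavily skewed.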

The sample.

The sample will consist of the weights of fifty smarties. To be a “good” sample, the results must be valid and free from bias, which means that the smarties must be collected randomly. The Central Limit Theorem only applies to random samples; when it does apply, the sample mean is approximately Normally distributed, which allows me to make predictions about the parent population.

I have decided to collect my sample data in a group, in order to lower the costs of the investigation and to make the data collection more accurate. Five tubes of smarties will be bought, each from a different shop, and ten smarties will be selected at random from each tube to be used in the survey. This should produce a sample that I am fairly confident will be random.
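The selection of ten smarties from each of five tubes could be done with random numbers rather than by hand. The sketch below is one possible way to do it; the labels and the figure of 35 smarties per tube are assumptions made up for the example.

```python
import random

rng = random.Random(2024)  # seeded only so the example is repeatable

# Hypothetical tubes: each smartie is labelled (tube number, position in tube).
# 35 smarties per tube is an assumption for illustration.
tubes = {tube: [(tube, i) for i in range(35)] for tube in range(1, 6)}

selected = []
for tube, smarties in tubes.items():
    # sample() picks without replacement, so no smartie is chosen twice
    selected.extend(rng.sample(smarties, 10))

print(f"{len(selected)} smarties selected from {len(tubes)} tubes")
```

Choosing by random number removes the risk of someone unconsciously picking the larger or more colourful smarties.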

Calculations.

In this investigation I will need to calculate a few things in order to come to my conclusion. This is a quick plan of what I will be calculating.

The mean, standard deviation and variance of the sample.
These will be used to estimate the variance and standard deviation of the parent population of smarties.
This, in turn, will be used to estimate the standard error (the standard deviation of the sample mean distribution).
The standard error will then be used, along with the mean of the sample, to create confidence intervals for the mean of the parent population of smarties.
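The four steps of this plan can be sketched as a single function. The function name, the dictionary keys and the example weights are all hypothetical; in particular, the weights are invented for illustration and are not the data collected in this investigation.

```python
import math
import statistics


def estimate_population(weights, k):
    """Follow the planned steps for a list of weights in grams.

    k is the number of standard errors for the chosen confidence
    level (for example 1.96 for 95%).
    """
    n = len(weights)
    sample_mean = statistics.fmean(weights)
    # Sample variance: mean of the squares minus the square of the mean
    sample_variance = statistics.fmean(w * w for w in weights) - sample_mean ** 2
    # Unbiased estimate of the population variance (multiply by n / (n - 1))
    population_variance = sample_variance * n / (n - 1)
    population_sd = math.sqrt(population_variance)
    # Standard error: standard deviation of the sample mean
    standard_error = population_sd / math.sqrt(n)
    interval = (sample_mean - k * standard_error,
                sample_mean + k * standard_error)
    return {
        "sample_mean": sample_mean,
        "sample_variance": sample_variance,
        "population_variance": population_variance,
        "population_sd": population_sd,
        "standard_error": standard_error,
        "interval": interval,
    }


# Hypothetical weights, NOT the data collected in this investigation
example_weights = [0.95, 1.01, 0.98, 1.03, 0.92, 0.99, 1.00, 0.97]
result = estimate_population(example_weights, k=1.96)
for name, value in result.items():
    print(name, value)
```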

Accuracy of measurements.

My group will randomly select the required smarties from the packets and weigh them one at a time on an electronic balance. The balance will be “reset” to zero after each measurement, which will reduce the chance of any inaccuracies that might arise from small pieces of smartie being left on the balance.

The balance we have used has a high degree of precision, as the measurements are given in grams to three decimal places. However, if the differences in the weights of smarties are too small to be detected on this balance, either a more precise balance must be found or a survey of something with a higher variance must be carried out.

Results (sample data).

There is a table of the sample I have collected in my group.

Although not necessary, I thought it would be useful to display my sample data on a stem and leaf diagram. The lowest value in the sample is 0.867g and the highest is 1.110g, so the range is 0.243g.

Sample Parameters.

Mean.

The mean is obtained by dividing the total weight of the fifty smarties by fifty.

Variance.

The formula for variance states that you take the ‘Mean of the squares minus the square of the mean’.

Standard Deviation.

The standard deviation is found by taking the square root of the variance.

Population Parameters.

Estimate of the Mean of the population of smarties.

The sample mean is an unbiased estimator; that is, the mean of its distribution is equal to the mean of the parent population. For this reason it can be used as an estimator for the mean of the population of smarties. As the mean of my sample is 0.976g, my estimate of the mean of the population of smarties is also 0.976g.

Estimate of the Variance of the population of smarties.

The variance of the sample is a biased estimator. A biased estimator is one for which the mean of its distribution is not equal to the population value it is estimating. Therefore it must be converted to an unbiased estimator by multiplying the sample variance by n/(n − 1), where n is the number of smarties in the sample (here 50/49).

You can see that the estimate of the variance of the population is slightly larger than the variance of the sample. This is because the sample variance measures the spread of the values about the sample mean rather than about the population mean. The values in a sample are, on average, closer to their own mean than to the population mean, so the sample variance tends to underestimate the population variance.
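The bias of the sample variance, and the effect of the n/(n − 1) correction, can be seen exactly on a population small enough to list every possible sample. The population of four values below is hypothetical and chosen only to keep the arithmetic simple.

```python
import itertools
import statistics

population = [1, 2, 3, 4]   # hypothetical population, small enough to enumerate
n = 2                        # sample size

true_variance = statistics.pvariance(population)

# Every possible ordered sample of size n, drawn with replacement
samples = list(itertools.product(population, repeat=n))

# Average of the (biased) sample variance over all possible samples
mean_biased = statistics.fmean(statistics.pvariance(s) for s in samples)

# Applying the n / (n - 1) correction removes the bias
mean_corrected = mean_biased * n / (n - 1)

print(f"population variance:             {true_variance}")   # 1.25
print(f"average sample variance:         {mean_biased}")     # 0.625
print(f"average after n/(n-1) correction: {mean_corrected}") # 1.25
```

On average the uncorrected sample variance is too small by exactly the factor (n − 1)/n, which is why multiplying by n/(n − 1) gives an unbiased estimate.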

Estimate of the Standard Deviation of the population of smarties.

Standard Error.

The standard error is the standard deviation of the sample mean. According to the Central Limit Theorem, the variance of the sample mean is the population variance divided by the sample size, so the standard error can be calculated by taking the square root of this quantity. This can be demonstrated algebraically:

s.e. = √(σ² / n) = σ / √n

The standard error calculated above is quite small. This means that the variance of the sample mean is low, which shows that one can be quite confident that the actual mean of the population is close to 0.976g. However, this is not a very “mathematical” or “user-friendly” way of showing how confident one is about the accuracy of the estimate. This is where confidence intervals come in.
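As a check on the size of the standard error, it can be recomputed from the figures quoted in the conclusion of this report. This is a sketch that assumes the estimated population standard deviation of 0.0547g and the sample size of fifty; the variable names are my own.

```python
import math

n = 50                   # sample size used in this investigation
population_sd = 0.0547   # estimated standard deviation in grams (from the conclusion)

# Standard error = standard deviation of the sample mean = sigma / sqrt(n)
standard_error = population_sd / math.sqrt(n)

print(f"standard error = {standard_error:.5f} g")
```

Under these assumptions the standard error is a little under 0.008g, which is small compared with the mean weight of 0.976g.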

Confidence Intervals Background.

To calculate how confident one is about the estimate of the population mean, one can use confidence intervals. These tell you how confident (as a percentage) you can be that the mean of the population falls within a given range. How they work is explained in the following.

Example Sample Mean Distribution:

According to the Central Limit Theorem, the sample mean is approximately Normally distributed. The mean of the sample mean (the centre of the curve) is equal to the population mean. The shaded area in the diagram shows the population mean ± 1 standard error. According to the tables for the Normal function, this comprises about 68% of the area under the curve. This means that there is a 68% chance that the mean of the sample is within one standard error of the mean of the population. This probability can be written algebraically as an inequality:

P(μ − s.e. < x̄ < μ + s.e.) ≈ 0.68

However, as μ is not known when sampling, the above inequality cannot be used directly, because it is not known which number the standard error should be added to or subtracted from. So the inequality is rearranged into this:

P(x̄ − s.e. < μ < x̄ + s.e.) ≈ 0.68

This shows that the probability that the population mean is within 1 standard error of the sample mean is 68%. In other words you can be 68% confident that the population mean is within 1 s.e. of the sample mean.
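The 68% figure can be confirmed without printed tables. The sketch below uses the standard Normal distribution from Python's statistics module; the helper name area_within is my own.

```python
from statistics import NormalDist

standard_normal = NormalDist()   # mean 0, standard deviation 1


def area_within(k):
    """Area under the standard Normal curve between -k and +k."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


print(f"area within 1 standard error: {area_within(1):.4f}")
```

The result is approximately 0.6827, which rounds to the 68% used above.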

This idea can be used to calculate the confidence intervals that allow you to be 90%, 95% and 99% sure of the range where the population mean is found.

90% Confidence Interval.

To work out a 90% confidence interval, you must work out how many standard errors either side of the mean contain 90% of the area under the curve (shown by the 0.9 in the shaded area above, as the area under the whole curve is equal to 1). The table of the Normal function shows areas to the left of points on the x-axis. This means that to work out the z score (the number of standard errors), you must calculate the total area to the left of “z”, which for 90% is 0.9 + 0.05 = 0.95, and look that up in the table to find the z score. This then allows you to calculate the confidence interval.
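The table lookup described above can also be done by computer. This is a minimal sketch using the inverse of the Normal function; the function name z_score is hypothetical.

```python
from statistics import NormalDist


def z_score(confidence):
    """Number of standard errors either side of the mean that contain
    the given proportion of the area under the Normal curve."""
    # Area to the left of z = central area + one tail
    area_to_left = confidence + (1 - confidence) / 2
    return NormalDist().inv_cdf(area_to_left)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_score(level):.3f}")
```

For 90% the area to the left is 0.95, which gives a z score of about 1.645.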

The formula for the confidence interval, where k is the z score, is:

x̄ − k × s.e. < μ < x̄ + k × s.e.

The k value for 90% is 1.645, which I will use in the formula to find the confidence interval.

In words, this means that you can be 90% confident that the mean weight of the population lies between 0.963g and 0.989g.

95% Confidence Interval.

The k value for 95% is 1.960, which I will use in the formula to find the confidence interval.

This means that you can be 95% confident that the population mean is between 0.961g and 0.991g. This is a larger range than that of the 90% confidence interval, because to be more confident, the possible range must increase.

99% Confidence Interval.

The k value for 99% is 2.576, which I will use in the formula to find the confidence interval.

This means that you can be 99% confident that the population mean is between 0.956g and 0.996g.
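The three intervals can be reproduced from the figures in this report. The sketch below assumes the sample mean of 0.976g, the estimated standard deviation of 0.0547g from the conclusion, and the usual table values of k; the value 2.576 for 99% is the standard table value and is an assumption here.

```python
import math

sample_mean = 0.976      # grams
population_sd = 0.0547   # grams, estimate quoted in the conclusion
n = 50

standard_error = population_sd / math.sqrt(n)

# k values (z scores); 2.576 is the standard table value for 99%
k_values = {"90%": 1.645, "95%": 1.960, "99%": 2.576}

intervals = {}
for level, k in k_values.items():
    lower = sample_mean - k * standard_error
    upper = sample_mean + k * standard_error
    intervals[level] = (round(lower, 3), round(upper, 3))
    print(f"{level}: {lower:.3f}g < mean < {upper:.3f}g")
```

Each step up in confidence uses a larger k, so the interval widens while staying centred on the sample mean.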

Validation.

I must test my confidence intervals by checking whether the mean of another sample falls within them, which will show whether my investigation has been successful. For this quick test my group collected another ten results from two further packets of smarties, using the same method of data collection as before.

The data we collected for the second (test) sample:

Mean.

Looking back at my confidence intervals, this new sample mean of 0.967g falls within each of the intervals I calculated.
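The comparison can be written out as a short check. This sketch uses the interval limits and the test sample mean of 0.967g quoted in the conclusion; the variable names are my own.

```python
# Confidence intervals for the population mean, in grams (from this report)
intervals = {
    "90%": (0.963, 0.989),
    "95%": (0.961, 0.991),
    "99%": (0.956, 0.996),
}
test_mean = 0.967   # mean of the second (test) sample, in grams

checks = {level: low < test_mean < high
          for level, (low, high) in intervals.items()}

for level, inside in checks.items():
    print(f"{level} interval contains the test mean: {inside}")
```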

Conclusion.

Using estimation techniques based on the Central Limit Theorem, I have estimated the following population parameters for the weights of smarties (to three significant figures):

Mean = 0.976g

Variance = 0.00292

Standard Deviation = 0.0547g

Using the Confidence Interval technique I have produced the following confidence intervals:

90% = 0.963g < μ < 0.989g

95% = 0.961g < μ < 0.991g

99% = 0.956g < μ < 0.996g

Also, my second sample is consistent with my confidence intervals, as its mean of 0.967g falls within each of the ranges that I calculated.

I suppose there is no real conclusion, and I could not have set myself a hypothesis, as the investigation did not really call for one. The limiting factors meant I could not carry out a more detailed investigation. For example, if I had based my sample on how many smarties of each colour there are per tube, I would have needed a sample of about fifty tubes of smarties, which would have been impractical because of the time and cost involved.

Limitations.

The size of the sample was small, so the calculations that relied upon the data collected are inaccurate to some extent. Compared with the number of smarties and packets of smarties in the world, a sample of fifty is completely dwarfed. To be more accurate, a larger sample must be collected.

The sample might have been a “fluke”: I might have got all the big smarties, or all the small ones. However, there is not much that can be done to eliminate this possibility apart from weighing every single smartie, which is extremely impractical.

The smarties gathered were from my group’s immediate area. Even though they were taken from different shops and different packets, they do not necessarily represent all the smarties in the world, only the ones in my area.

The results may be unreliable because the company that produces smarties may be changing, or may have changed, the mean weight setting for the smarties. They may be trying to slowly lower the weight while keeping the price the same. This could mean that the actual population parameters are somewhat different to the ones estimated here, although this seems unlikely.

I found that the investigation was quite limited from the start. By choosing to weigh smarties from random packets, I could not analyse the number of smarties per packet or the mean weight per packet. To collect a sample for those types of investigation I would have needed about fifty packets of smarties, which would have been expensive, and weighing every smartie from fifty packets would have been very tedious and impractical.

Possible Extension.

A statistical analysis of entire tubes of smarties could be carried out. The actual weight of the smarties could be compared to the price on the tube to determine whether the manufacturers are lying about how much smartie there is in their packets. Similar investigations could also look at the number of smarties per packet, the average weight of a packet, and so on.

Smarties of different colours could also be weighed to find out whether there are any differences between them, or the number of smarties of each colour in different packets could be counted. But yet again, an investigation like this would be harder to carry out, as you would need at least fifty packets of smarties even for a ‘small scale’ investigation.

Also, a larger sample could be taken to estimate the mean and variance much more accurately.

Lastly, I could have extended my confidence interval calculations. For example, I could have worked out the sample size needed to be 99% confident that the population mean lies within ± 0.001g of the sample mean. This would have shown that I would need a massive sample, of roughly 20,000 smarties, to get that much confidence in such a small interval.
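The sample size for this extension follows from requiring k × σ/√n to be no more than the chosen margin, which rearranges to n ≥ (k × σ / margin)². The sketch below assumes the estimated standard deviation of 0.0547g from the conclusion; the variable names are my own.

```python
import math
from statistics import NormalDist

population_sd = 0.0547   # grams, estimate quoted in the conclusion
margin = 0.001           # grams, desired half-width of the interval
confidence = 0.99

# z score with (1 + confidence) / 2 of the area to its left
k = NormalDist().inv_cdf((1 + confidence) / 2)

# Smallest whole number n with k * sd / sqrt(n) <= margin
required_n = math.ceil((k * population_sd / margin) ** 2)

print(f"sample size needed: {required_n}")
```

Because the margin appears squared in the denominator, halving the margin would need four times as many smarties.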
