Modelling Procedures
The first thing I did with the data was to plot a scatter graph of the results. To do this I used Microsoft Excel 2002, as I felt that it would plot the points more accurately than I could by hand and that it would select a suitable scale for the size of graph that I chose. Using a computer program also made it easier to adjust the graph to the right size and shape.
The graph I produced can be seen on a separate page after this one. At my request, the computer has put calories on the x-axis and fat on the y-axis, because we are measuring fat against calorie content, not calories against fat content. It then selected a suitable scale, which is where the computer has an advantage over the human hand because the scale is so easy to change, and plotted a point for each food where its two values meet. The points are plotted as they are; I have not used any non-parametric methods, for the simple reason that the scatter of points is roughly elliptical.
Analysis
Although it would have been relatively easy to use a computer program to find the product moment correlation coefficient, I decided to calculate it manually and then use Microsoft Excel 2002 to check the result. My table of working can be seen below (2 pages on), where my method is clearly shown. It shows the steps I had to put the data through before I got to the final result, which I confirmed using Excel.
Using the formula:
$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$$
The eventual result was that:
r = 0.88622
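The manual calculation can also be sketched in a few lines of Python. This is only an illustration of the formula, not the table of working itself: the function name pmcc and the sample values are assumptions made up for the example, and they are not the 50 foods used in this investigation.

```python
from math import sqrt

def pmcc(xs, ys):
    """Product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    sum_x, sum_y = sum(xs), sum(ys)
    # Summary statistics in the form usually used in a hand-worked table
    sxx = sum(x * x for x in xs) - sum_x ** 2 / n
    syy = sum(y * y for y in ys) - sum_y ** 2 / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)

# Hypothetical values for illustration only, NOT the data from this investigation
calories = [50, 120, 250, 400, 550]   # kcal per 100 g
fat = [0.5, 2.0, 10.0, 22.0, 35.0]    # g per 100 g

r = pmcc(calories, fat)
print(round(r, 5))
```

Running the same function on the real table of calories and fat should reproduce the value of r found by hand, which is a useful second check alongside Excel.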
We can do a one-tail 1% significance test to see whether this value is sufficiently high to show good correlation. Our hypotheses are:
H0: There is no correlation
H1: There is a positive correlation
For a one-tail test at the 1% level with n = 50, the critical value is 0.3281.
As my value for r, 0.88622, is greater than 0.3281, we reject H0 and accept the alternative hypothesis H1.
This shows that there is significant evidence of positive correlation.
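The decision rule used in this test can be written out as a short Python sketch. The values of r, n and the critical value are the ones quoted above; the function names are assumptions chosen for the example. The second function converts r into the equivalent t statistic on n - 2 degrees of freedom, which is a standard alternative way of carrying out the same test and is not part of my original working.

```python
from math import sqrt

def reject_h0(r, critical_value):
    """One-tail test for positive correlation: reject H0 if r exceeds the critical value."""
    return r > critical_value

def t_statistic(r, n):
    """Equivalent t statistic, t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)

r = 0.88622              # value of r found in the analysis
n = 50                   # number of foods in the sample
critical_value = 0.3281  # tabulated one-tail 1% critical value for n = 50

decision = reject_h0(r, critical_value)
t = t_statistic(r, n)
print("Reject H0:", decision)
print("t statistic:", round(t, 2))
```

A large t statistic corresponds to the same conclusion as r being well above the tabulated critical value.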
Interpretation
The conclusion that we can come to from this result is that there is a definite positive correlation between the number of calories in a food and the amount of fat in the food.
This suggests that a large proportion of the calories in food come from fats and that fat is a very large contributor to the energy in food. However, we can also tell that fat is not the only substance that makes up the calories, because we can see at the bottom of the graph that many of the foods there have some calories but no fat at all. Although this could be partly explained by the fact that I changed amounts too small to measure to 0, these amounts would be far too small to affect the results. In any case, some foods in the sample genuinely had no fat in them at all, so clearly some foods’ energy comes largely from other substances.
However, there is something that can throw the previous conclusion into doubt. There is one result which goes against what I had concluded before, which was that a presence of fat means that there has to be calorie content. Black pepper has a fat content of 3.3g per 100g and yet it has no energy in it at all. Despite this, I still believe that fat does make up large proportions of calories, and that the reason for this apparent anomaly is that the type of fat in black pepper is different to the usual type of fat, and that this fat does not contribute to calorie levels.
The fact that the product moment correlation coefficient is so high, at approximately 0.88622, shows that the correlation between the amount of energy in a food and the amount of fat is very strong. People who are trying to cut down on fats would therefore be wise to choose foods which, although they still have some calories, have a lower fat content for that number of calories.
Accuracy and Refinements
There are some possible sources of error in this investigation. In the source of information itself, the values for calories were rounded to no decimal places and the values for fat were rounded to one decimal place. When either value is very low, the relative error can be quite high, which means that for some of the lower values the results could be quite inaccurate. Although this would not affect the final result too much, it does make a difference, and if I were to try to improve the quality of the work, I would find a source of data with more precise figures.
Another place where accuracy is lost is in doing the calculations in Excel. All the values from Excel are rounded to two decimal places, so when they are multiplied together and divided by each other, more accuracy is lost, which may give me an inaccurate answer. Although I judged that two decimal places were enough this time, if I were to repeat the investigation I might use more precise values to get a better final result.
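One way to judge how much this rounding matters is to calculate r twice, once at full precision and once with every intermediate figure rounded to two decimal places, and compare the answers. The sketch below does this in Python. It assumes that the rounded figures really are reused in later steps, and the data are hypothetical values invented for the example, not the foods from this investigation.

```python
from math import sqrt

def pmcc(xs, ys, places=None):
    """Return r = Sxy / sqrt(Sxx * Syy).

    If places is given, every intermediate value is rounded to that many
    decimal places, imitating a table of working that reuses rounded figures.
    """
    def rnd(v):
        return v if places is None else round(v, places)

    n = len(xs)
    mean_x = rnd(sum(xs) / n)
    mean_y = rnd(sum(ys) / n)
    sxx = rnd(sum(rnd((x - mean_x) ** 2) for x in xs))
    syy = rnd(sum(rnd((y - mean_y) ** 2) for y in ys))
    sxy = rnd(sum(rnd((x - mean_x) * (y - mean_y)) for x, y in zip(xs, ys)))
    return rnd(sxy / rnd(sqrt(rnd(sxx * syy))))

# Hypothetical values for illustration only, NOT the data from this investigation
calories = [52, 121, 247, 403, 548, 61]   # kcal per 100 g
fat = [0.4, 2.1, 9.8, 22.3, 34.9, 0.0]    # g per 100 g

r_full = pmcc(calories, fat)
r_2dp = pmcc(calories, fat, places=2)
print("full precision:", r_full)
print("rounded to 2 d.p.:", r_2dp)
print("difference:", abs(r_full - r_2dp))
```

If the difference is much smaller than the gap between r and the critical value, the rounding cannot change the outcome of the significance test.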
I could have taken my data from more than one source. This would have meant I could check the values that the sources provided against each other and see whether they agreed. If they did not, I could have checked the foodstuff itself by going out, buying the food and reading the values on the back of the pack. Also, as foods can frequently change their contents, the data in a book can become outdated after a short time and provide an inaccurate set of data for investigations such as this. The book I used was printed in 2001, and although data this old may be acceptable for someone on a diet who only needs to know rough values, for my investigation I would have liked more accurate and up-to-date values.
The other dilemma I met when collecting my data was that I was not really sure whether my data should be classed as elliptical or whether it was more of an exponential curve. I think I might have got a better correlation if I had decided to use Spearman’s Rank method as opposed to Pearson’s Product Moment Correlation Coefficient. However, I decided that the best thing to do was to assume that the results were elliptical, because they were close enough to being that way, with only a couple of outliers towards the higher calorie and fat end.
Overall, I think my investigation was quite accurate, but could have been more accurate using data to more decimal places, and using more decimal places in calculations. It might have been more interesting to try and use Spearman’s Rank Order Correlation instead to see whether the results were similar, and if I were to repeat the investigation, and the results looked similar, it would certainly be something that I would consider.
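As a rough idea of what the Spearman’s Rank comparison would involve, the sketch below ranks each variable (giving tied values the average of their ranks) and then applies the product moment formula to the ranks. The function names and the data are assumptions made up for the example; the values are deliberately curved to show how the two coefficients can differ, and they are not the data from this investigation.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every value tied with the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j share ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pmcc(xs, ys):
    """Product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

def spearman(xs, ys):
    """Spearman's rank correlation: the PMCC of the ranks (this form copes with ties)."""
    return pmcc(average_ranks(xs), average_ranks(ys))

# Hypothetical curved data for illustration only, NOT the data from this investigation
calories = [50, 120, 250, 400, 550, 900]
fat = [0.0, 0.0, 3.0, 9.0, 30.0, 100.0]

print("Pearson:", pmcc(calories, fat))
print("Spearman:", spearman(calories, fat))
```

Spearman’s coefficient measures how consistently one variable increases with the other rather than how close the points are to a straight line, so a higher value would not by itself mean the linear correlation was stronger.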