Modelling procedures
Following the flow chart and inspecting the scatter diagram suggests that there is a correlation between the melting points and boiling points of the elements, and that Pearson’s Product Moment Correlation Coefficient (PPMCC) is the most appropriate technique to use. This is because both the x (melting point) and y (boiling point) variables are random: neither is controlled, and each can take any value within a given range. The points on the scatter diagram also form a roughly elliptical shape, which is consistent with an underlying bivariate normal distribution. Boiling point increases as melting point increases, so the correlation appears to be positive and a line of best fit can be drawn. Once the PPMCC has been calculated, a hypothesis test can be carried out to test whether or not the apparent correlation is significant.
Pearson’s Product Moment Correlation Coefficient
This is a technique used to measure the linear correlation between two variables that are jointly normally distributed; the value calculated always lies between –1 and 1 and is denoted by the letter r. A value of 1 means there is perfect positive correlation and –1 means perfect negative correlation. From the scatter graph there appears to be a positive correlation, so the value of r should be positive and fairly close to 1. This technique takes into account the number of items of data and the spread within the data. The formula used is:
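Assuming the standard definition in terms of the quantities Sxy, Sxx and Syy described in this section, the coefficient can be written as:

```latex
r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}}
```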
Sxy is the sample covariance and can be calculated using:
where x is the melting point and y is the boiling point. A spreadsheet has been used to calculate each total.
Sxx is a measure of the total square spread of the x values (melting points):
Syy is a measure of the total square spread of y values (boiling points):
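In one common convention, assumed here because it works directly from the column totals that a spreadsheet produces, the three quantities can be written as follows, where n is the number of data pairs (50 in this sample):

```latex
S_{xy} = \sum xy - \frac{\sum x \sum y}{n}, \qquad
S_{xx} = \sum x^{2} - \frac{\left(\sum x\right)^{2}}{n}, \qquad
S_{yy} = \sum y^{2} - \frac{\left(\sum y\right)^{2}}{n}
```

Some texts divide each of these quantities by n. That factor cancels in the ratio for r, so the value of the coefficient is the same under either convention.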
r can then be calculated:
Since the correlation coefficient is 0.879, which is close to 1, there is a strong positive linear correlation between the melting points and boiling points of the elements in this sample.
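The spreadsheet calculation can be sketched in a few lines of Python. This is a minimal illustration that assumes the column-total form of Sxy, Sxx and Syy. The function name pmcc and the example values are hypothetical and are not the data used in this investigation.

```python
import math


def pmcc(xs, ys):
    """Return Pearson's r for paired data, working from column totals."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 pairs")
    n = len(xs)

    # Column totals, as a spreadsheet would produce them.
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)

    # Assumed convention: Sxy = sum(xy) - sum(x)sum(y)/n, and similarly for Sxx, Syy.
    s_xy = sum_xy - sum_x * sum_y / n
    s_xx = sum_xx - sum_x ** 2 / n
    s_yy = sum_yy - sum_y ** 2 / n

    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when one variable has no spread")
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical values for illustration only (not the element data).
example_x = [1.0, 2.0, 3.0, 4.0]
example_y = [2.0, 4.0, 6.0, 8.0]

if __name__ == "__main__":
    print(pmcc(example_x, example_y))
```

Applying the same function to the 50 melting points and boiling points would give a way of checking the value of r obtained from the spreadsheet.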
Hypothesis Testing
To test whether the level of correlation calculated between the two variables is significant, a hypothesis test will be carried out. Since the sample size was fairly large (50 pairs of data), the significance level used can be quite small, so I will test the calculated value of r at the 1% level of significance. The level of correlation within the parent population is denoted by ρ, and the calculated value of r can be used as an estimate for ρ.
H0: ρ = 0
The null hypothesis is that there is no correlation between the melting points and boiling points of elements.
H1: ρ > 0
The alternative hypothesis is that there will be a positive correlation between the melting points and boiling points of elements.
Significance level: 1%
This is a one-tailed test because the alternative hypothesis states the direction in which correlation is expected. From statistical tables, the critical value of r for a sample size of 50 at the 1% significance level is 0.3281. This means that if there were no correlation between the two variables in the population, there would be only a 1% chance of obtaining a sample correlation coefficient equal to or higher than this critical value.
Since 0.879 > 0.3281, the null hypothesis can be rejected at the 1% level and the alternative hypothesis accepted. It can be concluded that there is a significant positive correlation between the melting points and boiling points of the chemical elements.
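The tabulated critical value can be cross-checked with a short sketch. Under the null hypothesis, and assuming a bivariate normal population, r is related to Student's t with n − 2 degrees of freedom by t = r√(n − 2)/√(1 − r²), which rearranges to r = t/√(t² + n − 2). The t value of 2.4066 (one-tailed 1%, 48 degrees of freedom) is an assumption taken from standard t tables, and the function names are hypothetical.

```python
import math


def critical_r(t_crit, n):
    """Convert a critical t value (n - 2 d.f.) into a critical value of r."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)


def is_significant(r, n, t_crit):
    """One-tailed test of H0: rho = 0 against H1: rho > 0."""
    return r > critical_r(t_crit, n)


# Assumed table value: upper 1% point of Student's t with 48 degrees of freedom.
T_CRIT_1PCT_DF48 = 2.4066

if __name__ == "__main__":
    print(round(critical_r(T_CRIT_1PCT_DF48, 50), 4))
    print(is_significant(0.879, 50, T_CRIT_1PCT_DF48))
```

With these inputs the critical value agrees with the tabulated 0.3281 to four decimal places, and the sample value of 0.879 lies well inside the rejection region.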
Conclusion
From calculating the PPMCC it can be seen that there is a strong positive linear correlation between the melting points and boiling points of elements in this sample, and this correlation is significant at the 1% level. This is very strong evidence to support the original prediction that those elements with high melting points will also have high boiling points and those with low melting points also have low boiling points. It is useful to know the correlation between melting point and boiling point so that if one of the values is known about an element then the other can be predicted fairly accurately.
The value of r calculated describes only the 50 pairs of data in the sample, so it does not necessarily apply to the rest of the population. However, because the total population is fairly small, 50 pairs of data is quite a large proportion of the total number of pairs available, and the correlation was strong and significant at the 1% level. It is therefore likely that the calculated value of r is a reasonable estimate of the correlation between melting point and boiling point in the whole population. A sample of 50 pairs was probably large enough to be confident that a correlation exists, but it would have been better to use all of the available melting point and boiling point data to find the actual population correlation coefficient, ρ.
Accuracy of data
There are limitations in the availability of data for some elements, because not every melting point and boiling point is accurately known, so these elements could not be used in the sample. Most of the values given on the data sheets were to 4 significant figures, but not all, so the data available was more precise for some elements than for others. Some of the melting points and boiling points appear to be estimates because they were given to the nearest 100 K; these will not be very accurate, and this is likely to have affected the final level of correlation calculated. The data sheets on the internet from which the values were taken were a few years old, so more up-to-date and more reliable information may be available. In some cases there are also variations in the melting and boiling points of the same element, for example carbon, because the element can exist in different forms, so the correlation coefficient will vary depending on which form the data was for. Despite these sources of error, the sample was chosen randomly, so it is likely to be fairly representative of the total population, and the calculated correlation was significant.
The main limitation of the findings is that although there is a positive correlation between the two variables, this does not show that one causes the other: a high melting point does not cause a high boiling point, so there must be some other factor in the structure of each element that determines both properties.
To improve the quality of the results, a larger sample consisting of more elements should be used, and the melting and boiling points could be taken from more reliable sources. This would make the calculated value of r more reliable. If a smaller significance level were used, a significant result would give even stronger evidence that there really is a positive correlation between melting and boiling points. So that predictions could be made about one variable when the other is known, a least squares regression line would need to be calculated to express the linear relationship algebraically.
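As a sketch of that final step, the least squares regression line of y on x has the form y = a + bx, with gradient b = Sxy/Sxx and intercept a = ȳ − b x̄. The code below assumes the column-total form of Sxy and Sxx. The function name and the example values are hypothetical, and no regression line was calculated in this investigation.

```python
def regression_line(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x of y on x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 pairs")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)

    # Assumed convention: Sxy = sum(xy) - sum(x)sum(y)/n, Sxx = sum(x^2) - sum(x)^2/n.
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    if s_xx == 0:
        raise ValueError("gradient is undefined when x has no spread")

    b = s_xy / s_xx                   # gradient
    a = sum_y / n - b * sum_x / n     # intercept: line passes through the mean point
    return a, b


if __name__ == "__main__":
    # Hypothetical values for illustration only (not the element data).
    a, b = regression_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
    print(a, b)
```

A line fitted in this way to the melting point (x) and boiling point (y) data would be suitable for predicting a boiling point from a known melting point within the range of the sample, but not for predicting in the opposite direction.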