NAPIER UNIVERSITY
SCHOOL OF MATHEMATICS AND STATISTICS
MODULE MA32808
APPLIED STATISTICS
MULTIPLE REGRESSION COURSEWORK
Student: Nicolas LEGRAIS 07007619
Assessor: Phillip DARBY
Moderator: Dr Sandra BONELLIE
1/ In order to obtain an equation to predict the quality of the product, we used a model with all the variables.
Equation to predict the quality of the product, ignoring the variable shift:
Qualprod= -10,354+0,041*Temp1+0,002*Temp2+0,671*Recycle+0,620*Qualraw
This model has an R-Square value of 0,952 (95,2% of the variation in Qualprod is explained by the variables), but this equation is not the one to keep because too many of its variables have a high Sig. value. (Appendix Q1)
Prediction of the mean quality of the product (with a 95% confidence interval) if the following settings were used:
a/ Temp1=200 Temp2=300 Recycle=4% Qualraw=15
Prediction of the mean quality: PRE_1=10,4458
Mean confidence interval: [LMCI_1 ; UMCI_1]=[6,6514 ; 14,2402]
b/ Temp1=200 Temp2=300 Recycle=14% Qualraw=15
Prediction of the mean quality: PRE_1=17,1519
Mean confidence interval: [LMCI_1 ; UMCI_1]=[6,2760 ; 28,0278]
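As a quick check of the two point predictions, the settings can be substituted into the full-model equation by hand. The sketch below does this in Python; it assumes only the rounded coefficients printed in the equation above (written with decimal points instead of decimal commas), and the function name predict_qualprod is a hypothetical helper, not something taken from the SPSS output. The confidence intervals cannot be reproduced this way because they require the original data.

```python
# Rounded coefficients of the full model, copied from the equation in the report.
INTERCEPT = -10.354
COEFFS = {"Temp1": 0.041, "Temp2": 0.002, "Recycle": 0.671, "Qualraw": 0.620}


def predict_qualprod(settings):
    """Point prediction of Qualprod from the rounded full-model equation."""
    return INTERCEPT + sum(COEFFS[name] * value for name, value in settings.items())


# Settings a/ and b/ from the report (Recycle is entered in percent).
setting_a = {"Temp1": 200, "Temp2": 300, "Recycle": 4, "Qualraw": 15}
setting_b = {"Temp1": 200, "Temp2": 300, "Recycle": 14, "Qualraw": 15}

print(round(predict_qualprod(setting_a), 2))  # about 10.43
print(round(predict_qualprod(setting_b), 2))  # about 17.14
```

These values are close to the PRE_1 values 10,4458 and 17,1519; the small differences presumably come from the coefficients being rounded to three decimals in the equation.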
- We saw in 1/ that we can’t accept the initial model with all the variables because too many of them had a high Sig. value. We therefore used different approaches to variable selection in order to obtain the final equation.
We used stepwise regression, backward elimination and forward selection. Each approach gave the same R-Square value of 0,952 and a better equation than the initial model. In each approach the variable Temp2 was left out of the final model, so omitting Temp2 must have had the least effect on the explanatory power of the model. (Appendix Q1a)
We can conclude that the variable Temp2 isn’t significant in the model. All of these approaches gave us the same final equation with significant variables (Sig.<0,05).
The final equation is:
Qualprod= -9,726+0,620*Qualraw+0,041*Temp1+0,642*Recycle
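The idea that the dropped variable is the one whose omission costs the least explanatory power can be illustrated with a small sketch. It uses synthetic data only (the coursework data set is not reproduced here), and the names r_squared, r_squared_drops, x1, x2 and x3 are hypothetical. It refits the model with each predictor left out in turn and reports the loss in R-Square. When a single variable is removed, ranking by this loss gives the same order as ranking by the F-to-remove statistic, so it mimics one step of backward elimination rather than the full SPSS procedure.

```python
import numpy as np


def r_squared(X, y):
    """R-Square of an ordinary least squares fit of y on X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)


def r_squared_drops(X, y, names):
    """Loss in R-Square when each predictor is omitted in turn."""
    full = r_squared(X, y)
    return {
        name: full - r_squared(np.delete(X, j, axis=1), y)
        for j, name in enumerate(names)
    }


# Synthetic illustration: x3 has no effect on y by construction.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = 2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.5, size=100)

drops = r_squared_drops(X, y, ["x1", "x2", "x3"])
weakest = min(drops, key=drops.get)  # candidate for removal
print(drops)
print(weakest)
```

In this synthetic example the candidate for removal is the predictor that was built to have no effect, which plays the role that Temp2 plays in the report.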
- Note that the different approaches to variable selection resulted in the same model, with the independent variables QUALRAW, TEMP1 and RECYCLE. We now have to look at the correlations between these variables, that is, investigate multicollinearity in the data set. To do that we used the variance inflation factors (VIF).
It’s generally suggested that if a VIF is 10 or more, the regression coefficients are poorly estimated due to multicollinearity. In our case the VIF is smaller than 10 for all variables: 1,228 for RECYCLE, 4,378 for TEMP1 and 4,841 for QUALRAW. Thus we can say that we don’t have a problem of high correlation between the variables. (Appendix Q1b)
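For reference, the VIF of a predictor is 1/(1 - R-Square), where R-Square comes from regressing that predictor on the other predictors. The sketch below computes it with NumPy on synthetic data only; the function name vif and the variables a, b and c are hypothetical and are not the coursework variables. It is meant to show how the rule of 10 behaves, not to reproduce the values in Appendix Q1b.

```python
import numpy as np


def vif(X):
    """Variance inflation factor of each column of X (n rows, p predictors)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    factors = []
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress predictor j on an intercept plus the remaining predictors.
        A = np.column_stack([np.ones(n), others])
        beta, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ beta
        centered = target - target.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)


# Synthetic illustration: c is almost a copy of a, b is unrelated to both.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.1 * rng.normal(size=200)

vif_independent = vif(np.column_stack([a, b]))     # both close to 1
vif_collinear = vif(np.column_stack([a, b, c]))    # a and c well above 10
print(vif_independent)
print(vif_collinear)
```

Values close to 1 indicate a predictor that is nearly uncorrelated with the others, while the near-duplicate pair exceeds the threshold of 10 quoted above.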