Applied Statistics

NAPIER UNIVERSITY

SCHOOL OF MATHEMATICS AND STATISTICS

MODULE MA32808

APPLIED STATISTICS

MULTIPLE REGRESSION COURSEWORK

Student: Nicolas LEGRAIS       07007619

Assessor: Phillip DARBY

Moderator: Dr Sandra BONELLIE


 

1/ In order to obtain an equation to predict the quality of the product, we used a model with all the variables.

Equation to predict the quality of the product, ignoring the variable shift:

Qualprod= -10,354+0,041*Temp1+0,002*Temp2+0,671*Recycle+0,620*Qualraw

This model has an R-Square value of 0,952 (95,2% of the variation in product quality is explained by the variables), but this equation is not satisfactory because too many of its variables have a high Sig. value, i.e. are not significant. (Appendix Q1)

Prediction of the mean quality of the product (with a 95% confidence interval) if the following settings were used:

a/ Temp1=200 Temp2=300 Recycle=4% Qualraw=15

Prediction of the mean quality: PRE_1=10,4458

Mean confidence interval: [LMCI_1 ; UMCI_1]=[6,6514 ; 14,2402]

b/ Temp1=200 Temp2=300 Recycle=14% Qualraw=15

Prediction of the mean quality: PRE_1=17,1519

Mean confidence interval: [LMCI_1 ; UMCI_1]=[6,2760 ; 28,0278]
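
The two point predictions above can be checked by hand from the full equation. The following is a minimal Python sketch, assuming the rounded coefficients printed in the equation; the name predict_qualprod is hypothetical and is not part of the SPSS output. Because the coefficients are rounded to three decimals, the results agree with PRE_1 only to about one decimal place. The confidence intervals cannot be reproduced this way, since they require the original data.

```python
# Coefficients of the full model as printed in the report (rounded to 3 decimals).
INTERCEPT = -10.354
COEF = {"Temp1": 0.041, "Temp2": 0.002, "Recycle": 0.671, "Qualraw": 0.620}


def predict_qualprod(settings):
    """Return the predicted mean Qualprod for a dict of process settings.

    Hypothetical helper: it simply evaluates the fitted linear equation.
    """
    return INTERCEPT + sum(COEF[name] * settings[name] for name in COEF)


# Recycle is entered in percentage points (4 means 4%), as in the report.
setting_a = {"Temp1": 200, "Temp2": 300, "Recycle": 4, "Qualraw": 15}
setting_b = {"Temp1": 200, "Temp2": 300, "Recycle": 14, "Qualraw": 15}

pred_a = predict_qualprod(setting_a)  # close to the SPSS value 10,4458
pred_b = predict_qualprod(setting_b)  # close to the SPSS value 17,1519
print(round(pred_a, 2), round(pred_b, 2))
```

The small gap between these hand calculations and the saved PRE_1 values comes only from the rounding of the coefficients.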

  1. We saw in 1/ that we cannot accept the full model because too many of its variables have a high Sig. value. We therefore used different approaches to variable selection in order to obtain the final equation.

We used Stepwise regression, Backward elimination and Forward selection. Each approach gave the same R-Square value of 0,952 as the full model, but with a better equation, and each approach dropped the variable Temp2: omitting Temp2 has the least effect on the explanatory power of the model. (Appendix Q1a)

We can conclude that the variable Temp2 is not significant in the model. All of these approaches gave us the same final equation, in which every remaining variable is significant (Sig.<0,05).

The final equation is:

Qualprod= - 9,726+ 0,620* Qualraw + 0,041* Temp1+ 0,642* Recycle
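
As a quick illustration of the final equation, it can be evaluated at the same two settings used for the predictions in 1/ (Temp2 is no longer needed). This is a sketch only, assuming the rounded coefficients printed above; the helper name predict_final is hypothetical, and no confidence interval can be derived from the equation alone.

```python
# Final (reduced) model as printed in the report; Temp2 has been dropped.
FINAL_INTERCEPT = -9.726
FINAL_COEF = {"Qualraw": 0.620, "Temp1": 0.041, "Recycle": 0.642}


def predict_final(qualraw, temp1, recycle):
    """Evaluate the final equation (hypothetical helper, rounded coefficients)."""
    return (FINAL_INTERCEPT
            + FINAL_COEF["Qualraw"] * qualraw
            + FINAL_COEF["Temp1"] * temp1
            + FINAL_COEF["Recycle"] * recycle)


# Settings a/ and b/ from question 1, with Recycle in percentage points.
final_a = predict_final(qualraw=15, temp1=200, recycle=4)
final_b = predict_final(qualraw=15, temp1=200, recycle=14)
print(round(final_a, 3), round(final_b, 3))
```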

 

  1. Note that the different approaches to variable selection resulted in the same model, with QUALRAW, TEMP1 and RECYCLE as the independent variables. We now have to look at the correlations between these variables, i.e. investigate multicollinearity in the data set. To do that we used the variance inflation factor, VIF.

It is generally suggested that if the VIF is 10 or more, the regression coefficients are poorly estimated due to multicollinearity. In our case the VIF is smaller than 10 for all variables: 1,228 for RECYCLE, 4,378 for TEMP1 and 4,841 for QUALRAW. Thus we can say that we do not have a problem of high correlation between the independent variables. (Appendix Q1b)
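
To make the VIF values easier to interpret, recall the standard definition VIF_j = 1/(1 - R_j²), where R_j² is the R-Square obtained when variable j is regressed on the other independent variables. The short Python sketch below inverts this formula for the three VIF values reported in Appendix Q1b; the function name is hypothetical, and the calculation uses only the reported VIFs, not the original data.

```python
# VIF values as reported in the text (Appendix Q1b).
VIF = {"RECYCLE": 1.228, "TEMP1": 4.378, "QUALRAW": 4.841}


def r_squared_from_vif(vif):
    """Invert VIF = 1 / (1 - R^2) to get the R^2 of one predictor on the others."""
    return 1.0 - 1.0 / vif


# Share of each predictor's variance explained by the other two predictors.
implied_r2 = {name: r_squared_from_vif(v) for name, v in VIF.items()}

# The usual rule-of-thumb cut-off VIF = 10 corresponds to R^2 = 0.9.
threshold_r2 = r_squared_from_vif(10.0)

for name, r2 in implied_r2.items():
    print(f"{name}: R^2 = {r2:.3f} (cut-off {threshold_r2:.1f})")
```

Under this reading, all three implied R-Square values lie below the 0.9 that corresponds to a VIF of 10, with QUALRAW and TEMP1 noticeably closer to it than RECYCLE.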


We can conclude that these variables (QUALRAW, TEMP1 and RECYCLE) have no adverse effect on the regression coefficients, because the correlations between them are low enough and they are all significant. As there is no strong relationship among them, these variables do not break the underlying assumption of independence of the independent variables that must hold in linear regression.

  1. Note that we have a significant and strong relationship between these three variables and the quality of the product, because each variable has a Sig. value smaller than 0,05 and the R-Square value is 0,952. Furthermore ...