(b)
(c)
Figure 1: Scatter plots of expenditure per student in a district against each independent variable. Note: only (b) shows signs of a trend, so this independent variable is expected to have the most explanatory power.
Table 1 below contains the correlation statistics of the variables. The correlation between the number of schools in the district and the Student/Teacher ratio is high, so there may be a multicollinearity problem. A possible explanation is that a district with more schools also has more teachers, which affects the Student/Teacher ratio. I choose to ignore the multicollinearity, taking into account that the R-square of the model will be high. Table 2 below contains the means, standard deviations, and minimum and maximum observations of all the variables.
Table 1: Correlation Matrix
This table contains the correlation matrix of the explanatory variables.
Table 2: Summary Statistics
This table contains the descriptive statistics of the variables that will be used to construct the regression model later.
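The link between a high pairwise correlation and multicollinearity can be illustrated with a short sketch. It computes a Pearson correlation and the variance inflation factor (VIF) implied by that correlation alone, 1 / (1 - r^2). This is only a lower bound for the VIF in a model with three regressors, and the input values are hypothetical toy numbers, not the district data used in this report.

```python
import math

def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_from_pairwise(r):
    """VIF implied by a single pairwise correlation r: 1 / (1 - r^2)."""
    return 1.0 / (1.0 - r ** 2)

# Hypothetical toy values (assumption), NOT the data behind Table 1.
schools = [3, 5, 8, 12, 20, 25]
ratio = [14.0, 15.5, 15.0, 17.0, 18.5, 19.0]

r = pearson(schools, ratio)
print(f"correlation = {r:.3f}, implied VIF = {vif_from_pairwise(r):.2f}")
```

A VIF well above 1 means the standard error of the affected coefficient is inflated relative to the case of uncorrelated regressors.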
Model Estimation
The results of the regression can be found in Tables 3 and 4 below. Two of the explanatory variables are significant at the 99% confidence level, and one explanatory variable (MATHSCORE) is significant at the 90% confidence level. The explanatory variables are also jointly significant, as shown by the high value of the F-statistic (for the test of the null hypothesis that all coefficients except the constant are simultaneously zero). The model equation is as follows:
EXPENPUPIL = 9341.56 + 34.71*NUMBEROFSCHOOLS - 312.213*STUDENRATIO - 12.39529*MATHSCORE
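To make the fitted equation concrete, the sketch below evaluates it for a single district. The coefficients are copied from the equation above; the function name and the input values are hypothetical and chosen only for illustration.

```python
# Coefficients copied from the fitted equation in the text.
INTERCEPT = 9341.56
B_SCHOOLS = 34.71
B_RATIO = -312.213
B_MATH = -12.39529

def predict_expenpupil(number_of_schools, student_ratio, math_score):
    """Fitted expenditure per pupil for one district (hypothetical helper)."""
    return (INTERCEPT
            + B_SCHOOLS * number_of_schools
            + B_RATIO * student_ratio
            + B_MATH * math_score)

# Hypothetical district (assumption): 10 schools, ratio of 20, math score of 50.
base = predict_expenpupil(10, 20.0, 50.0)
one_more_student = predict_expenpupil(10, 21.0, 50.0)
print(round(base, 2), round(one_more_student - base, 3))
```

The difference between the two predictions is the STUDENRATIO coefficient: holding the other variables fixed, one more student per teacher lowers fitted expenditure per pupil by about 312.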
As we can see, the standard errors of the coefficients are quite high, which may be because of the correlation between the explanatory variables. The model also shows heteroscedasticity, as shown by the White test: the F-statistic has a p-value of 0.00, which means that the null hypothesis of homoscedastic errors is rejected. As a result, we run a second regression with HAC standard errors, so that inference on the coefficients remains valid under heteroscedasticity.
Table 3
Below are two tables showing the parameters of the linear regression of the data. The important values displayed are the R-square, the number of observations, and, for each X variable, the coefficients, t-values and p-values.
Table 4
R-Square    Adjusted R-Square    Observations
0.469       0.468                1001
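As a rough consistency check on these figures, the sketch below recomputes the adjusted R-square from the reported R-square, the number of observations, and the three regressors in the model equation. It also computes the overall F-statistic implied by the classical (homoscedastic) formula. The function names are hypothetical, and the F-value is only an approximation based on the rounded R-square; the value reported in the regression output is the one to rely on.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-square for n observations and k regressors (excluding the constant)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

def overall_f(r2, n, k):
    """Classical F-statistic for H0: all k slope coefficients are zero."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))

# Values taken from Table 4 and the model equation (three regressors).
r2, n, k = 0.469, 1001, 3
print(f"adjusted R-square = {adjusted_r2(r2, n, k):.4f}")
print(f"implied overall F = {overall_f(r2, n, k):.1f}")
```

With the reported values this gives an adjusted R-square of about 0.467, in line with the reported 0.468 once rounding of the R-square is allowed for.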
We replace the standard errors with the HAC standard errors. The White and HAC standard errors are shown in Table 5 below:
Table 5
This table shows the HAC standard errors and the White standard errors.
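The difference between conventional and heteroscedasticity-robust standard errors can be sketched for the simplest case. The example below assumes a regression with a single regressor and computes the conventional standard error of the slope next to the White (HC0) standard error. It is not the HAC estimator used in this report: HAC additionally weights autocovariance terms and reduces to the White form when no lags are included. The data are hypothetical.

```python
import math

def simple_ols(x, y):
    """OLS of y on a constant and one regressor; returns (intercept, slope, residuals)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [b - intercept - slope * a for a, b in zip(x, y)]
    return intercept, slope, resid

def slope_standard_errors(x, resid):
    """Return (conventional SE, White HC0 SE) of the slope coefficient."""
    n = len(x)
    mx = sum(x) / n
    dev = [a - mx for a in x]
    sxx = sum(d * d for d in dev)
    sigma2 = sum(e * e for e in resid) / (n - 2)  # homoscedastic error variance
    conventional = math.sqrt(sigma2 / sxx)
    # White: each squared residual is weighted by its own squared deviation in x.
    white = math.sqrt(sum(d * d * e * e for d, e in zip(dev, resid)) / sxx ** 2)
    return conventional, white

# Hypothetical data (assumption) whose scatter widens as x grows.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.8, 6.5, 7.2, 11.0, 10.5, 16.0, 13.5]
_, slope, resid = simple_ols(x, y)
print(slope, slope_standard_errors(x, resid))
```

The coefficient estimate itself is unchanged; only the standard error, and therefore the t-value and p-value, differs between the two columns.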
CLRM Assumption Analysis
To further improve the model, we will now check the classical linear regression model assumptions:
Average Value of the Errors is zero
Our OLS regression model includes a constant, so this assumption is not violated.
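Why the constant matters can be shown with a small sketch: when a regression includes a constant, the OLS residuals average to zero by construction, whereas a regression forced through the origin gives no such guarantee. The example assumes a single regressor and uses hypothetical values.

```python
def residuals_with_constant(x, y):
    """Residuals from OLS of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return [b - intercept - slope * a for a, b in zip(x, y)]

def residuals_through_origin(x, y):
    """Residuals from OLS of y on x alone, with no constant."""
    slope = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    return [b - slope * a for a, b in zip(x, y)]

def mean(values):
    return sum(values) / len(values)

# Hypothetical values (assumption), for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 6.0, 9.0]
print(mean(residuals_with_constant(x, y)))   # zero up to floating-point error
print(mean(residuals_through_origin(x, y)))  # generally not zero
```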
Homoscedasticity
The assumption of homoscedasticity is tested with White's test. The p-value of 0.00 in the White test leads us to reject this assumption. We therefore re-estimated the model with HAC standard errors and replaced the conventional standard errors of the coefficients with them, which makes our inference robust to heteroscedasticity.
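The mechanics of White's test can be sketched as follows, under simplifying assumptions. The example uses a single regressor, so the auxiliary regression of the squared residuals has only the terms 1, x and x^2, and it uses the LM form of the test (n times the auxiliary R-square) rather than the F form reported above. With two auxiliary slope terms the statistic is compared with a chi-square distribution with 2 degrees of freedom, whose upper-tail probability is exp(-LM / 2). For the three-regressor model in this report the auxiliary regression would also contain the other squares and cross-products. The data and function names are hypothetical.

```python
import math

def solve(a, b):
    """Solve a linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - tail) / m[r][r]
    return beta

def ols(columns, y):
    """OLS via the normal equations; columns must include a constant column.
    Returns (coefficients, residuals, R-square)."""
    k = len(columns)
    xtx = [[sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(a * b for a, b in zip(columns[i], y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(beta[j] * columns[j][i] for j in range(k)) for i in range(len(y))]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    ybar = sum(y) / len(y)
    r2 = 1.0 - sum(e * e for e in resid) / sum((yi - ybar) ** 2 for yi in y)
    return beta, resid, r2

def white_test(x, y):
    """LM form of White's test for one regressor; returns (LM, p-value)."""
    n = len(x)
    ones = [1.0] * n
    _, resid, _ = ols([ones, x], y)
    squared = [e * e for e in resid]
    _, _, r2_aux = ols([ones, x, [v * v for v in x]], squared)
    lm = n * r2_aux
    return lm, math.exp(-lm / 2.0)  # chi-square(2) upper-tail probability

# Hypothetical data (assumption): the error spread grows in proportion to x.
x = [float(i) for i in range(1, 41)]
cycle = [-1.0, 0.5, 1.0, -0.5]
y = [10.0 + 2.0 * xi + xi * cycle[i % 4] for i, xi in enumerate(x)]
lm, p_value = white_test(x, y)
print(lm, p_value)
```

A small p-value, as in the test reported here, leads to rejection of the null hypothesis of homoscedastic errors.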
Covariance between error terms is zero
As our data is cross-sectional, we do not have an issue of covariance between error terms.
The error is not correlated with regressors
The error term in the regression captures the variation which is not explained by the model. I assume this assumption is not violated, as I do not have any data or theory that suggests otherwise.
The Disturbances are normally distributed
We do not treat a violation of this assumption as a problem: because we have a large sample size, a violation of this assumption is not consequential.
Joint Hypothesis Test (Wald Test)
The Wald test is used to test the joint hypothesis that our last two coefficients are zero, which is our null hypothesis. The result of the test is an F-value of 389 and a probability of 0.00. This means that, if the last two coefficients were both zero, an F-value this large would be extremely unlikely. So we reject our null hypothesis. The conclusion is that at least one of the last two coefficients is different from zero.
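The form of such a joint test can be sketched with the classical F-statistic for q restrictions, which compares the R-square of the full model with that of the restricted model in which the tested coefficients are set to zero. This version assumes homoscedastic errors, so it need not reproduce the reported F-value of 389. The restricted R-square below is a hypothetical placeholder, not a result from this report, and the function name is an assumption.

```python
def restriction_f(r2_unrestricted, r2_restricted, n, k, q):
    """Classical F-statistic for q zero restrictions in a model with
    k regressors (excluding the constant) and n observations."""
    numerator = (r2_unrestricted - r2_restricted) / q
    denominator = (1.0 - r2_unrestricted) / (n - k - 1)
    return numerator / denominator

# n, k and the unrestricted R-square come from the report;
# the restricted R-square of 0.10 is a hypothetical placeholder (assumption).
f_value = restriction_f(0.469, 0.10, n=1001, k=3, q=2)
print(f"F = {f_value:.1f}")
```

With 2 and 997 degrees of freedom the 5% critical value is roughly 3.0, so an F-value in the hundreds leads to a clear rejection of the null hypothesis that both coefficients are zero.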