# Expenditure per Student in High Schools: Estimation Using Cross-sectional Regression Analysis

Introduction

The purpose of this essay is to find a model that explains expenditure per student in high schools. The model addresses the question: what factors determine the expenditure per student in a district? I use cross-sectional regression to estimate a linear (first-order) relationship between the dependent variable and the explanatory variables it depends on.

The essay is structured as follows. First, I describe the data and its summary statistics. Then I estimate the regression and test the CLRM assumptions. Next, I interpret the regression results and perform a joint hypothesis test on the regression coefficients. Finally, I conclude.

Data and Summary Statistics

The data come from a survey of high schools in the US, covering different districts across different counties. The data set includes 1001 observations on expenditure per student in a district (EXPENPUPIL), the number of schools in the district (NUMBEROFSCHOOLS), the student/teacher ratio (STUDENTRATIO) and the mean score of students in tenth grade (MATHSCORE).

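 Variable Mean Median Maximum Minimum Std. Dev. Skewness Kurtosis
 EXPENPUPIL (missing) (missing) (missing) 28 1318.941 1.343219 7.872708
 NUMBEROFSCHOOLS 2.96404 1 26 1 5.12354 3.37953 14.0852
 STUDENTRATIO 13.243 12.9 21.9 4.8 3.15131 0.205833 2.51656
 MATHSCORE 45.5145 46 58 29 4.07971 -0.466202 3.70439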

Table 2: Summary statistics

This table contains the descriptive statistics of the variables that are used to construct the regression model later.

Model Estimation

The results of the regression are reported in Tables 3 and 4 below. Two of the explanatory variables are significant at the 1% level, and one explanatory variable (MATHSCORE) is significant at the 10% level. The explanatory variables are also jointly significant, as shown by the high value of the F-statistic (for the test of the null hypothesis that all coefficients except the intercept are simultaneously zero). The estimated model is:

EXPENPUPIL = 9341.56 + 34.71*NUMBEROFSCHOOLS - 312.213*STUDENTRATIO - 12.39529*MATHSCORE
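As a worked illustration of how the fitted equation is read, the short sketch below plugs the sample means from the summary statistics into it and then changes the student/teacher ratio by one unit. The coefficients are those reported above; the helper name `predict_expenditure` is hypothetical and not part of the original analysis.

```python
# Coefficients as reported in the fitted equation.
COEF = {
    "const": 9341.56,
    "NUMBEROFSCHOOLS": 34.71,
    "STUDENTRATIO": -312.213,
    "MATHSCORE": -12.39529,
}


def predict_expenditure(number_of_schools, student_ratio, math_score):
    """Fitted EXPENPUPIL for one district (hypothetical helper)."""
    return (
        COEF["const"]
        + COEF["NUMBEROFSCHOOLS"] * number_of_schools
        + COEF["STUDENTRATIO"] * student_ratio
        + COEF["MATHSCORE"] * math_score
    )


# Evaluate at the sample means reported in the summary statistics.
at_means = predict_expenditure(2.96404, 13.243, 45.5145)
print(round(at_means, 2))  # 4745.64

# Effect of one more student per teacher, other regressors held fixed.
delta = predict_expenditure(2.96404, 14.243, 45.5145) - at_means
print(round(delta, 3))  # -312.213
```

Because an OLS regression with an intercept passes through the sample means, the first value should be close to the sample mean of EXPENPUPIL, up to rounding in the reported coefficients. The second value simply recovers the STUDENTRATIO coefficient, which is the sense in which each coefficient is a partial effect.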

The standard errors of the coefficients are quite high, which may be due to correlation between the explanatory variables. The model also shows heteroscedasticity, as indicated by the White test: the F-statistic has a p-value of 0.00, so the null hypothesis of homoscedastic errors is rejected. As a result, I run a second regression with HAC standard errors. This leaves the coefficient estimates unchanged but makes the standard errors robust to heteroscedasticity, so that inference on the coefficients remains valid.
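To make the idea of robust standard errors concrete, the sketch below computes OLS coefficients together with conventional and White (HC0) heteroscedasticity-consistent standard errors. This is a simplified stand-in: the essay uses HAC errors, whereas HC0 corrects for heteroscedasticity only. The data are synthetic and the function name `ols_with_white_se` is hypothetical; neither comes from the original analysis.

```python
import numpy as np


def ols_with_white_se(X, y):
    """OLS coefficients with conventional and White (HC0) standard errors.

    X must already contain a column of ones for the intercept.
    """
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta

    # Conventional covariance s^2 (X'X)^-1, valid only under homoscedasticity.
    s2 = resid @ resid / (n - k)
    se_conventional = np.sqrt(np.diag(s2 * XtX_inv))

    # White (HC0) sandwich: (X'X)^-1 [sum_i e_i^2 x_i x_i'] (X'X)^-1.
    meat = (X * resid[:, None] ** 2).T @ X
    se_white = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return beta, se_conventional, se_white


# Synthetic data (an assumption for illustration, not the survey data):
# the error spread grows with the regressor, so errors are heteroscedastic.
rng = np.random.default_rng(0)
n = 1001
x = rng.uniform(5.0, 22.0, size=n)
errors = rng.normal(0.0, 1.0, size=n) * 50.0 * x
y = 9000.0 - 300.0 * x + errors
X = np.column_stack([np.ones(n), x])

beta, se_conv, se_white = ols_with_white_se(X, y)
print("coefficients:   ", beta)
print("conventional SE:", se_conv)
print("White (HC0) SE: ", se_white)
```

The coefficient vector is the same whichever covariance estimator is used; only the standard errors, and therefore the t-statistics and p-values, change.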

Table 3

Below are two tables showing the parameters of the linear regression. They report the important values: R-squared, the number of observations and, for each explanatory variable, the coefficient, t-value and p-value.

 Variable Coefficients Std. Error t-Statistic P-Value
 C 9341.56 380.183 24.5713 0

Conclusion

Covariance between error terms is zero

Because the data are cross-sectional, I do not expect covariance between the error terms to be an issue.

The error is not correlated with regressors

The error term in the regression captures the variation in the dependent variable that is not explained by the model. I assume that this assumption is not violated, as I have no data or theory that suggests otherwise.

The disturbances are normally distributed

I do not test for a violation of this assumption. With a large sample (1001 observations), a violation is not consequential, because the test statistics are approximately valid in large samples even when the disturbances are not normal.
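Although the essay relies on the large-sample argument rather than a formal test, normality of the residuals could be checked with a Jarque-Bera statistic, which combines skewness S and kurtosis K as n/6 * (S^2 + (K - 3)^2 / 4). The sketch below is only an illustration: the helper `jarque_bera` is hypothetical and the residuals are simulated, not taken from the regression in this essay.

```python
import math
import random


def jarque_bera(values):
    """Return (JB statistic, skewness, kurtosis, p-value) for a sample."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Under normality JB is asymptotically chi-square with 2 degrees of
    # freedom, whose upper-tail probability is exactly exp(-x / 2).
    p_value = math.exp(-jb / 2.0)
    return jb, skewness, kurtosis, p_value


# Simulated residuals (an assumption for illustration only).
rng = random.Random(0)
residuals = [rng.gauss(0.0, 1000.0) for _ in range(1001)]

jb, s, k, p = jarque_bera(residuals)
print(f"skewness={s:.3f} kurtosis={k:.3f} JB={jb:.3f} p-value={p:.3f}")
```

A small p-value would indicate non-normal residuals, but with roughly a thousand observations this mainly affects the exact small-sample distribution of the test statistics, not their large-sample validity.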

Joint Hypothesis Test (Wald Test)

The Wald test is used to test the joint hypothesis that the last two coefficients are zero, which is the null hypothesis. The test gives an F-value of 389 with a p-value of 0.00. This means that, if the last two coefficients were both zero, an F-value this large would be observed with probability close to zero. We therefore reject the null hypothesis and conclude that at least one of the last two coefficients is non-zero.
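The mechanics of such a joint test can be sketched by comparing the residual sum of squares of the full model with that of a restricted model in which the last two regressors are dropped. The sketch below uses the classical F form, which assumes homoscedastic errors; a test based on HAC errors would use the robust covariance matrix instead. The data are synthetic and the helper names `ssr` and `f_test_exclusion` are hypothetical.

```python
import numpy as np


def ssr(X, y):
    """Sum of squared residuals from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def f_test_exclusion(X_unrestricted, X_restricted, y):
    """F statistic for H0: coefficients on the dropped columns are all zero."""
    n, k = X_unrestricted.shape
    q = k - X_restricted.shape[1]  # number of restrictions
    ssr_u = ssr(X_unrestricted, y)
    ssr_r = ssr(X_restricted, y)
    return ((ssr_r - ssr_u) / q) / (ssr_u / (n - k))


# Synthetic data (an assumption for illustration, not the survey data).
rng = np.random.default_rng(1)
n = 1001
schools = rng.integers(1, 27, size=n).astype(float)
ratio = rng.uniform(5.0, 22.0, size=n)
score = rng.normal(45.0, 4.0, size=n)
noise = rng.normal(0.0, 1000.0, size=n)
y = 9000.0 + 35.0 * schools - 300.0 * ratio - 12.0 * score + noise

ones = np.ones(n)
X_full = np.column_stack([ones, schools, ratio, score])
X_restricted = np.column_stack([ones, schools])  # last two coefficients set to zero

F = f_test_exclusion(X_full, X_restricted, y)
print(f"F = {F:.1f} with (2, {n - 4}) degrees of freedom")
```

A large F value relative to the critical value of the F(2, n - 4) distribution leads to rejection of the null hypothesis that both coefficients are zero.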

