Linear regressions.

Problem 1

You have estimated the linear regression model

y_t = a + b_1 x_{1t} + b_2 x_{2t} + b_3 x_{3t} + e_t

using annual data for the period 1960-94. Explain briefly how you would construct a test of the model’s forecasting performance using additional data for the period 1995-98.

Solution.

First, estimate the model on the 1960-94 data to obtain estimates of the coefficients a, b_1, b_2 and b_3.

Then use the estimated model and the observed values of the regressors to forecast y for 1995-98, together with the standard errors of the forecasts. This gives a prediction interval for each of the four years.

Finally, check whether the actual values fall inside these intervals, and how far they lie from the point forecasts. Actual values that fall outside the intervals indicate poor forecasting performance.
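One common way to turn this comparison into a single formal test is Chow's predictive failure test. The sketch below assumes the model is also re-estimated on the pooled 1960-98 sample. The symbols RSS_1 (residual sum of squares for 1960-94), RSS_P (residual sum of squares for the pooled sample), n_1 = 35, n_2 = 4 and k = 4 are notation introduced here for illustration and do not appear in the problem.

```latex
F = \frac{(RSS_P - RSS_1)/n_2}{RSS_1/(n_1 - k)}
  \sim F(n_2,\; n_1 - k) = F(4,\; 31)
  \quad \text{under } H_0 \text{ (no predictive failure)}
```

A value of F above the 5% critical value of F(4, 31) would suggest that the model estimated on 1960-94 does not forecast 1995-98 adequately.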

Problem 2

Describe briefly how you would test whether the OLS residuals from the linear regression model

Y_t = a + b X_t + u_t

are serially correlated. Outline how you would modify the specification of your model, or the estimation procedure, if your test showed significant serial correlation.

Solution

Estimate the original regression by OLS and obtain the residuals:

e_t = Y_t - \hat{a} - \hat{b} X_t

Then regress e_t on its own lag:

e_t = ρ e_{t-1} + z_t

If the estimate of ρ is significantly different from zero (for example, on a t-test), the residuals are serially correlated.

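If this is the case, the estimation procedure can be modified as follows. Instead of Y_t and X_t, use the quasi-differenced variables (Y_t - ρ Y_{t-1}) and (X_t - ρ X_{t-1}), with ρ replaced by its estimate, and estimate the regression

Y_t - ρ Y_{t-1} = a(1 - ρ) + b (X_t - ρ X_{t-1}) + z_t

The error z_t in the transformed equation is serially uncorrelated, so OLS is appropriate. Alternatively, the specification can be changed by adding lagged values of Y and X as regressors.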
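The two steps above (estimating ρ from the residuals, then quasi-differencing the data) can be sketched in a few lines of Python. This is a minimal illustration only: the function names and the residual and data values are hypothetical, chosen to make the arithmetic easy to follow.

```python
def lag1_autocorrelation(resid):
    """OLS slope from regressing e_t on e_{t-1} (no intercept)."""
    num = sum(resid[t] * resid[t - 1] for t in range(1, len(resid)))
    den = sum(resid[t - 1] ** 2 for t in range(1, len(resid)))
    return num / den


def quasi_difference(series, rho):
    """Return series[t] - rho * series[t-1]; the first observation is lost."""
    return [series[t] - rho * series[t - 1] for t in range(1, len(series))]


# Hypothetical OLS residuals (not taken from the problem).
resid = [0.5, 0.4, 0.3, -0.1, -0.3, -0.2, 0.1, 0.3]
rho_hat = lag1_autocorrelation(resid)

# Hypothetical data for Y and X, transformed with the estimated rho.
y = [2.0, 2.3, 2.9, 3.1, 3.0, 3.4, 3.9, 4.2]
x = [1.0, 1.2, 1.5, 1.7, 1.8, 2.0, 2.2, 2.4]
y_star = quasi_difference(y, rho_hat)
x_star = quasi_difference(x, rho_hat)

print(f"rho_hat = {rho_hat:.3f}")
print(f"transformed sample size = {len(y_star)}")
```

Regressing y_star on x_star with a constant then gives estimates of a(1 - ρ) and b. In practice the significance of ρ would be judged with a t-test or a standard diagnostic such as the Durbin-Watson statistic.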

Problem 3

Part B

An investigator analysing consumers' expenditure in the UK, using quarterly data over the period 1979-1997, estimated the following two models.

Model A

D4C_t = 0.0083 + 0.558 D4C_{t-1} + 0.241 D4C_{t-2} + 0.037 D4C_{t-3} - 0.220 D4C_{t-4}

(0.0026) (0.096) (0.116) (0.125) (0.103)

+ 0.208 D4Y_{t-1} - 0.124 D4Y_{t-2} + 0.016 D4Y_{t-3} - 0.172 D4Y_{t-4}

(0.120) (0.123) (0.110) (0.101)

R^2 = 0.8333, SSR = 0.0082684, DW = 1.42

Data period 1979Q2-1997Q4 (75 observations)

Model B

D4C_t = 0.0068 + 0.602 D4C_{t-1} + 0.401 D4C_{t-2} + 0.133 D4C_{t-3} - 0.378 D4C_{t-4}

(0.0021) (0.082) (0.092) (0.096) (0.082)

R^2 = 0.8132, SSR = 0.0092674, DW = 1.37

Data period 1979Q2-1997Q4 (75 observations)

In these equations D4C_t and D4Y_t are, respectively, the four-quarter changes in the logarithms of real consumers' expenditure and real disposable income, so that D4C_t = C_t - C_{t-4}, where C_t is the logarithm of real consumers' expenditure. R^2 is the coefficient of determination, SSR is the sum of squared OLS residuals, and DW is the Durbin-Watson statistic. Figures in parentheses are standard errors. All hypothesis tests should be carried out at the 5% significance level.

(i) Test the hypothesis that the four quarter change in the logarithm of consumption is unaffected by any lagged variables. Why would an economist be interested in this hypothesis?

(ii) Test the hypothesis that the four quarter change in the logarithm of consumption is unaffected by lagged changes in income.

The investigator then re-estimated Model A over two distinct sub-periods, and obtained the following values of the sum of squared OLS residuals (SSR):

Data period 1979Q2-1990Q4 (47 observations) SSR = 0.0065239

Data period 1991Q1-1997Q4 (28 observations) SSR = 0.0010334

(iii) Test the hypothesis that the coefficients of Model A are constant over the period 1979-1997 against the alternative that there is a structural break after 1990Q4.

Solution

Let β1 denote the constant in the regression, β2-β5 the coefficients on D4C_{t-1} to D4C_{t-4}, and β6-β9 the coefficients on D4Y_{t-1} to D4Y_{t-4}.

(i)

The test is an F-test of the joint hypothesis H0: β2 = β3 = … = β9 = 0.

The F-statistic is calculated from the R^2 of Model A, with n = 75 observations and k = 9 estimated coefficients:

F = (R^2/(k-1)) / ((1-R^2)/(n-k)) = (0.8333/8) / (0.1667/66) ≈ 41.2

The critical value is F0.95(k-1, n-k) = F0.95(8, 66) ≈ 2.1.

Since 41.2 > 2.1, the null hypothesis that all the slope coefficients are zero is rejected, i.e. the hypothesis that the four-quarter change in the logarithm of consumption is unaffected by any lagged variables is rejected.

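An economist would be interested in this hypothesis because it shows whether consumption growth in the current period depends on past consumption and income. If it does not, lagged information is of no help in predicting consumption growth, as the random-walk version of the permanent income hypothesis implies.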
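The F-statistic for part (i) can be reproduced from the reported R^2 with a short calculation. The sketch below takes n = 75 observations and k = 9 estimated coefficients (the constant plus eight lags) from the Model A results; the function name is hypothetical.

```python
def f_stat_from_r2(r2, n_obs, n_params):
    """Overall-significance F-statistic for a regression with an intercept.

    n_params counts every estimated coefficient, including the constant.
    Returns (F, numerator df, denominator df).
    """
    df_num = n_params - 1
    df_den = n_obs - n_params
    f_value = (r2 / df_num) / ((1.0 - r2) / df_den)
    return f_value, df_num, df_den


# Model A: R^2 = 0.8333, 75 observations, 9 coefficients.
f_value, df_num, df_den = f_stat_from_r2(0.8333, 75, 9)
print(f"F({df_num}, {df_den}) = {f_value:.2f}")
```

The result is then compared with the 5% critical value of the F-distribution with the same degrees of freedom, taken from tables.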

(ii)

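The hypothesis to be tested is H0: β6 = … = β9 = 0.

The F-statistic compares Model A (the unrestricted regression) with Model B (the restricted regression, in which β6 = … = β9 = 0), where q = 4 is the number of restrictions:

F = ((SSR_R - SSR_UR)/q) / (SSR_UR/(n-k)) = ((0.0092674 - 0.0082684)/4) / (0.0082684/66) ≈ 1.99

The critical value is F0.95(q, n-k) = F0.95(4, 66) ≈ 2.51.

Since 1.99 < 2.51, the hypothesis is not rejected: lagged changes in income do not have a significant effect on the four-quarter change in the logarithm of consumption.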
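The comparison of the restricted and unrestricted regressions in part (ii) can be checked numerically. This is a sketch using the SSR values reported for Models A and B, with n = 75 and k = 9 for the unrestricted model; the function and argument names are hypothetical.

```python
def f_stat_restrictions(ssr_restricted, ssr_unrestricted,
                        n_restrictions, n_obs, n_params_unrestricted):
    """F-statistic for q linear restrictions, from the two residual sums of squares.

    Returns (F, numerator df, denominator df).
    """
    df_den = n_obs - n_params_unrestricted
    numerator = (ssr_restricted - ssr_unrestricted) / n_restrictions
    denominator = ssr_unrestricted / df_den
    return numerator / denominator, n_restrictions, df_den


# Model B (restricted) against Model A (unrestricted): 4 income lags excluded.
f_income, q, df_income = f_stat_restrictions(0.0092674, 0.0082684, 4, 75, 9)
print(f"F({q}, {df_income}) = {f_income:.2f}")
```

A statistic below the 5% critical value of F(4, 66) means the exclusion of the lagged income terms is not rejected.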

(iii)

This calls for a Chow breakpoint test.

The F-statistic is

F = ((SSR_R - SSR_UR)/k) / (SSR_UR/(n+m-2k))

where SSR_R is the SSR from Model A estimated over the whole period (the restricted regression; the restriction is that the coefficients are equal in the two sub-periods, i.e. there is no structural break), k = 9 is the number of coefficients, and n = 47 and m = 28 are the numbers of observations in the two sub-periods.

SSR_UR = SSR_A1 + SSR_A2, where A1 and A2 denote Model A estimated on the first and second sub-periods. So SSR_UR is the SSR of the unrestricted regression, i.e. the regression with different coefficients in the two sub-periods.

SSR_UR = 0.0065239 + 0.0010334 = 0.0075573

F = ((0.0082684 - 0.0075573)/9) / (0.0075573/57) ≈ 0.60

The critical value is F0.95(k, n+m-2k) = F0.95(9, 57) ≈ 2.05.

Since 0.60 < 2.05, the hypothesis that the coefficients are equal in the two sub-periods is not rejected.
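The Chow statistic for part (iii) can be computed directly from the three reported SSR values. The sketch below assumes k = 9 coefficients and sub-period sizes of 47 and 28 observations, as given in the question; the function name is hypothetical.

```python
def chow_f_stat(ssr_pooled, ssr_first, ssr_second, n_first, n_second, n_params):
    """Chow breakpoint F-statistic from pooled and split-sample SSRs.

    Returns (F, numerator df, denominator df).
    """
    ssr_split = ssr_first + ssr_second           # unrestricted SSR
    df_den = n_first + n_second - 2 * n_params   # n + m - 2k
    f_value = ((ssr_pooled - ssr_split) / n_params) / (ssr_split / df_den)
    return f_value, n_params, df_den


# Model A: whole period against 1979Q2-1990Q4 and 1991Q1-1997Q4.
f_chow, df_num, df_den = chow_f_stat(0.0082684, 0.0065239, 0.0010334, 47, 28, 9)
print(f"F({df_num}, {df_den}) = {f_chow:.3f}")
```

A statistic well below 1 means the split-sample regressions fit hardly better than the pooled one, which is consistent with no structural break.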

Problem 4

An investigator trying to forecast employment growth in the UK estimates the following simple dynamic relationship between employment and output using quarterly data over the period 1960Q3-1997Q1 (147 observations)

DEMP_t = -0.000015 - 0.0686 DGDP_t - 0.0462 DGDP_{t-1} - 0.0195 DGDP_{t-2}

(0.0006) (0.0202) (0.0205) (0.0211)

- 0.5405 DEMP_{t-1} - 0.1794 DEMP_{t-2}

(0.0832) (0.0825)

R^2 = 0.4766, σ = 0.00359, RSS = 0.0018166, AR(5) = 0.1057

where DEMP_t is the rate of growth of employment in period t and DGDP_t is the rate of growth of real GDP in period t. Figures in parentheses are standard errors, R^2 is the coefficient of determination, σ is the equation standard error, RSS is the residual sum of squares, and AR(5) is the probability value associated with a diagnostic test of serial correlation up to 5th order.

The investigator then re-estimates this model over two subperiods, with the following results

Sample 1960Q3-1990Q4

DEMP_t = -0.000121 - 0.0771 DGDP_t - 0.0419 DGDP_{t-1} - 0.0302 DGDP_{t-2}

(0.0007) (0.0197) (0.0205) (0.0207)

- 0.5404 DEMP_{t-1} - 0.2099 DEMP_{t-2}

(0.0917) (0.0908)

R^2 = 0.5176, σ = 0.00337, RSS = 0.0013208, AR(5) = 0.1541

Sample 1991Q1-1997Q1

DEMP_t = -0.001612 - 0.1616 DGDP_t - 0.0072 DGDP_{t-1} - 0.3058 DGDP_{t-2}

(0.0050) (0.1826) (0.1831) (0.1595)

- 0.7415 DEMP_{t-1} - 0.1220 DEMP_{t-2}

(0.2414) (0.2462)

R^2 = 0.4932, σ = 0.00439, RSS = 0.0003655, AR(5) = 0.0837

(i) Using the results given above, test the hypothesis that the relationship between employment growth and output growth was structurally stable over the period 1960Q3-1997Q1.

The investigator now defines a dummy variable D1, taking the value 0 for the period 1960Q3-1990Q4 and 1 for the period 1991Q1-1997Q1. He includes this variable in a further regression, with the following results:

Sample 1960Q3-1997Q1

DEMP_t = -0.000146 - 0.0696 DGDP_t - 0.0449 DGDP_{t-1} - 0.0183 DGDP_{t-2}

(0.0008) (0.0205) (0.0209) (0.0215)

- 0.5411 DEMP_{t-1} - 0.1805 DEMP_{t-2} + 0.000280 D1

(0.0834) (0.0829) (0.000131)

R^2 = 0.4932, σ = 0.00354, RSS = 0.0017592, AR(5) = 0.0937

(ii) Using this equation, test the hypothesis that the intercept term in the regression equation is subject to a significant structural break after 1990Q4.

(iii) Test whether including this dummy variable provides an adequate model of the relationship between employment growth and output growth over the whole period.

(iv) Why is the Durbin-Watson test statistic not reported in these regressions?

Solution

(i)

This calls for a Chow test.

The F-statistic is

F = ((SSR_R - SSR_UR)/k) / (SSR_UR/(n+m-2k))

where SSR_R is the SSR from the whole-sample model, SSR_R = 0.0018166.

SSR_UR = SSR_1 + SSR_2, where 1 and 2 denote the models estimated on the first and second sub-periods. So SSR_UR is the SSR of the unrestricted regression, i.e. the regression with different coefficients in the two sub-periods.

SSR_UR = 0.0013208 + 0.0003655 = 0.0016863

(k = 6 is the number of regressors in the equation including the constant, n = 122 is the number of observations in the first sub-period, m = 25 is the number in the second sub-period, and n+m = 147 is the total number of observations.)

F = ((0.0018166 - 0.0016863)/6) / (0.0016863/135) ≈ 1.74

The critical value is F0.95(k, n+m-2k) = F0.95(6, 135) ≈ 2.17.

Since 1.74 < 2.17, the hypothesis that the coefficients are equal in the two sub-periods is not rejected, so the relationship appears structurally stable.
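The sub-period sample sizes are not stated in the question, so they have to be counted from the quarterly dates before the Chow statistic can be formed. The sketch below does both steps. The helper name is hypothetical, and k = 6 counts the constant plus five slope coefficients.

```python
def quarters_between(start_year, start_q, end_year, end_q):
    """Number of quarterly observations from start to end, inclusive."""
    return (end_year - start_year) * 4 + (end_q - start_q) + 1


n_first = quarters_between(1960, 3, 1990, 4)    # 1960Q3-1990Q4
n_second = quarters_between(1991, 1, 1997, 1)   # 1991Q1-1997Q1

k = 6                                  # constant + 5 slope coefficients
ssr_pooled = 0.0018166                 # RSS, whole sample
ssr_split = 0.0013208 + 0.0003655      # sum of the two sub-period RSS values

df_den = n_first + n_second - 2 * k
f_chow = ((ssr_pooled - ssr_split) / k) / (ssr_split / df_den)

print(f"n = {n_first}, m = {n_second}, total = {n_first + n_second}")
print(f"F({k}, {df_den}) = {f_chow:.2f}")
```

The counts add up to the 147 observations reported for the whole sample, which is a useful check on the degrees of freedom.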

(ii)

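This amounts to testing the significance of the coefficient on D1, using the t-statistic:

t = 0.000280/0.000131 ≈ 2.137

The critical value is the two-tailed 5% critical value of the t-distribution with n-k = 147-7 = 140 degrees of freedom, which is approximately 1.98.

Since 2.137 > 1.98, the coefficient on D1 is significant: the null hypothesis of no break in the intercept is rejected, so the intercept is subject to a significant structural break after 1990Q4.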

(iii)

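The dummy-variable model allows only the intercept to change after 1990Q4. Its adequacy can be tested against the split-sample regressions, which allow all six coefficients to change, using an F-test of 5 restrictions (the slope coefficients are equal in the two sub-periods):

F = ((0.0017592 - 0.0016863)/5) / (0.0016863/135) ≈ 1.17

The critical value is F0.95(5, 135) ≈ 2.28. Since 1.17 < 2.28, the restrictions are not rejected, so the intercept dummy is sufficient to capture the break. The AR(5) probability value of 0.0937 also exceeds 0.05, so there is no significant serial correlation in the residuals. The R^2 of 0.4932 is only slightly higher than that of the model without the dummy variable (0.4766), but a modest R^2 does not by itself make the model inadequate.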

(iv)

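The Durbin-Watson test is not valid when the regressors include lagged values of the dependent variable, as they do here (DEMP_{t-1} and DEMP_{t-2}): in that case the statistic is biased towards 2. It also detects only first-order autocorrelation, whereas with quarterly data higher-order autocorrelation is a concern. The AR(5) test, which covers serial correlation up to 5th order, is reported instead.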

Problem 5

Student A undertakes an econometrics project using a survey dataset consisting of a cross-section of 100 observations. When he estimates his model, he finds that the coefficient in which he is particularly interested has the value he expected on the basis of economic theory, but the t-ratio is only 1. His supervisor tells him that he must collect more data, and that he should aim to obtain a t-ratio of 2.

(a) A fellow-student, B, suggests to A that he can increase the size of his dataset, while avoiding the effort involved in collecting more data, just by duplicating each of his existing observations the appropriate number of times. Explain why a dataset which is extended in this way fails to satisfy the assumptions of the Gauss-Markov theorem, and identify the assumption which will not hold.

(b) Suppose A follows B’s advice, and constructs an extended dataset which consists of a number of copies of observation 1, followed by a number of copies of observation 2, and so on. Explain which standard diagnostic test is likely to reveal his deception.

Solution

(a)

The dataset obtained in this way fails to satisfy the assumptions of the Gauss-Markov theorem because the errors for duplicated observations are identical, so they are perfectly correlated rather than uncorrelated, as the theorem requires. More precisely, the condition E[ε_t ε_s] = 0 for t ≠ s fails whenever t and s index two copies of the same observation.

(b) In this case the errors are first-order autocorrelated (and probably correlated at higher orders too), because most consecutive observations are copies of each other. So the deception is likely to be revealed by the Durbin-Watson test, which will show strong positive autocorrelation in the residuals (i.e. DW will be close to 0).
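The effect of duplication on the Durbin-Watson statistic can be illustrated with a small calculation. Duplicating every observation leaves the OLS coefficients unchanged, so the residuals of the extended dataset are simply the original residuals repeated. The residual values and function names below are hypothetical.

```python
def durbin_watson(resid):
    """Durbin-Watson statistic: sum of squared first differences over sum of squares."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e ** 2 for e in resid)
    return num / den


def duplicate_each(values, copies):
    """Repeat each value `copies` times, keeping the copies adjacent."""
    return [v for v in values for _ in range(copies)]


# Hypothetical residuals from the original cross-section (not from the problem).
resid = [0.3, -0.5, 0.2, 0.4, -0.6, 0.2]
copies = 4

dw_original = durbin_watson(resid)
dw_duplicated = durbin_watson(duplicate_each(resid, copies))

print(f"DW, original data:   {dw_original:.3f}")
print(f"DW, duplicated data: {dw_duplicated:.3f}")
```

Within each block of copies the first differences are zero while the sum of squares is multiplied by the number of copies, so the statistic is divided by that number and moves towards 0.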
