Breusch-Pagan and Hausman Interpretation and Execution in Panel Data Models

Jonathan Anderson

Join Date: Aug 2021

Posts: 4
#1

06 Aug 2021, 01:40

Hi Guys,

I am running a series of regressions on a panel of 36 countries over a 40-year period to determine the nature of the relationship between democracy (measured by an index ranging from 0 to 1) and inequality (Gini coefficient). The relationship is modelled through a second-order polynomial (an inverted U-shaped curve).

For a simple linear regression without controls (xtreg), the Breusch-Pagan test indicates that POLS is unreliable and the Hausman test indicates that FE is preferred to RE.

For my understanding, I have the following questions:
1). Where heteroskedasticity is detected by the BP test and POLS cannot be relied upon, how does RE improve on this? Is the assumption of constant variance relaxed, or is the bias in the standard errors corrected for? If so, why is it still necessary to deal with the heteroskedasticity, e.g., through the use of robust standard errors?
2). Also, in this regression setup, what is the most appropriate way to incorporate robust standard errors?
3). Where endogeneity issues mean that FE is preferred to RE, how should this endogeneity be dealt with?
4). Moreover, I have seen it suggested that heteroskedasticity and endogeneity issues be dealt with before rerunning the BP and Hausman tests. Is this the correct order, i.e. run BP and Hausman, then correct for issues, then rerun BP and Hausman to check most appropriate model?

I also have the following general questions:
i). In country panel data, does the FE model have the same effect as adding a full set of country dummies to control for time-invariant, country-specific effects?
ii). Is it standard procedure to redo the BP and Hausman test each time a control variable is added to the model, as it is my understanding that adding controls affects the outcome of these tests?

Any and all help is very much appreciated.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#2

06 Aug 2021, 03:12

Jonathan:
welcome to this forum.
0) as you have a T>N panel dataset, you're taking -xtreg- to its limit (see also -xtregar- and -xtgls-);
1) The BP test you meant (-estat hettest-) does not work after -xtreg-. The other BP test (-xttest0-) aims at detecting evidence of a group-wise effect after -xtreg,re- (which has nothing to do with epsilon heteroskedasticity). Hence, neither the -fe- nor the -re- specification has any magical effect on heteroskedasticity, which should be addressed with non-default standard errors (by the way, the -robust- and -vce(cluster clusterid)- options deal with heteroskedasticity and/or autocorrelation).
2a)

Code:

xtreg y x1 x2...xn, re robust

.
2b)

Code:

xtreg y x1 x2...xn, re vce(cluster clusterid)

.
3) your statement is not correct. The -fe- estimator wipes out time-invariant variables (and therefore also the unobserved heterogeneity related to them). However, if the endogeneity relates to time-varying predictors, there's nothing magic that -fe- can do, and you have to switch to -xtivreg,fe-.
4) once detected, heteroskedasticity and/or autocorrelation should be addressed via non-default standard errors. Interestingly, while -xttest0- supports non-default standard errors, -hausman- does not. Hence, you should switch to the community-contributed module -xtoverid- (just type -search xtoverid- to spot and install it).

Other questions:
i) yes, but there's no gain in doing that (practice purposes aside). Go with -xtreg,fe- instead;
ii) yes, your belief is correct.
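
As a minimal sketch of point i), assuming the -nlswork- dataset that is also used later in this thread and -areg- as a convenient way to absorb the panel dummies, the two fits below should return the same slope coefficients on age and its square:

Code:

* load the example dataset and declare the panel structure
use "https://www.stata-press.com/data/r16/nlswork.dta", clear
xtset idcode year

* within (fixed-effects) estimator
xtreg ln_wage c.age##c.age, fe

* same model with a full set of idcode dummies absorbed rather than displayed
areg ln_wage c.age##c.age, absorb(idcode)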

Kind regards,
Carlo
(Stata 19.0)
Jonathan Anderson

Join Date: Aug 2021

Posts: 4
#3

06 Aug 2021, 05:28

Thank you Carlo, it seems you have been the saviour of many a thesis on this forum.

- In response to 1) are you saying that neither -estat hettest- nor -xttest0- is appropriate for detecting heteroskedasticity of the errors in this case? If neither is, what is the appropriate course of action?
- In response to 2) how should I decide between -robust- and -vce(cluster clusterid)-?
- In response to 3) how can I test whether the endogeneity relates to time-varying predictors, and thus whether I would need to use the instrumental variable approach?
- In response to 4) are the results of -xtoverid- interpreted in the same manner as those of a standard Hausman test?

Once again thank you and any advice is welcomed.

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#4

06 Aug 2021, 07:15

Jonathan:
1) the community-contributed module -xttest3- is the answer, as you can see from the following toy example (in which evidence of heteroskedasticity creeps up):

Code:

use "https://www.stata-press.com/data/r16/nlswork.dta"
. xtreg ln_wage c.age##c.age, fe

Fixed-effects (within) regression               Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-sq:                                           Obs per group:
     within  = 0.1087                                         min =          1
     between = 0.1006                                         avg =        6.1
     overall = 0.0865                                         max =         15

                                                F(2,23798)        =    1451.88
corr(u_i, Xb)  = 0.0440                         Prob > F          =     0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0539076   .0028078    19.20   0.000     .0484041    .0594112
             |
 c.age#c.age |  -.0005973   .0000465   -12.84   0.000    -.0006885   -.0005061
             |
       _cons |    .639913   .0408906    15.65   0.000     .5597649    .7200611
-------------+----------------------------------------------------------------
     sigma_u |   .4039153
     sigma_e |  .30245467
         rho |  .64073314   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(4709, 23798) = 8.74                 Prob > F = 0.0000

. xttest3

Modified Wald test for groupwise heteroskedasticity
in fixed effect regression model

H0: sigma(i)^2 = sigma^2 for all i

chi2 (4710)  =  4.4e+35
Prob>chi2 =      0.0000


.

In this case you should invoke robust or clustered standard errors (in reply to 2): under -xtreg- you can choose either one as you like, as they both produce cluster-robust standard errors):

Code:

. xtreg ln_wage c.age##c.age, fe rob

Fixed-effects (within) regression               Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-sq:                                           Obs per group:
     within  = 0.1087                                         min =          1
     between = 0.1006                                         avg =        6.1
     overall = 0.0865                                         max =         15

                                                F(2,4709)         =     507.42
corr(u_i, Xb)  = 0.0440                         Prob > F          =     0.0000

                             (Std. Err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0539076    .004307    12.52   0.000     .0454638    .0623515
             |
 c.age#c.age |  -.0005973    .000072    -8.30   0.000    -.0007384   -.0004562
             |
       _cons |    .639913   .0624195    10.25   0.000     .5175415    .7622845
-------------+----------------------------------------------------------------
     sigma_u |   .4039153
     sigma_e |  .30245467
         rho |  .64073314   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. xtreg ln_wage c.age##c.age, fe vce(cluster idcode )

Fixed-effects (within) regression               Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-sq:                                           Obs per group:
     within  = 0.1087                                         min =          1
     between = 0.1006                                         avg =        6.1
     overall = 0.0865                                         max =         15

                                                F(2,4709)         =     507.42
corr(u_i, Xb)  = 0.0440                         Prob > F          =     0.0000

                             (Std. Err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0539076    .004307    12.52   0.000     .0454638    .0623515
             |
 c.age#c.age |  -.0005973    .000072    -8.30   0.000    -.0007384   -.0004562
             |
       _cons |    .639913   .0624195    10.25   0.000     .5175415    .7622845
-------------+----------------------------------------------------------------
     sigma_u |   .4039153
     sigma_e |  .30245467
         rho |  .64073314   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. 
*as expected, results are identical*

3) you cannot test endogeneity as such, but you can test for model misspecification that, in turn, may well be explained by endogeneity. The procedure actually tests the correctness of the functional form of the regressand (something along the lines of -linktest-; see the related entry in the Stata .pdf manual):

Code:

. xtreg ln_wage c.age##c.age fitted sq_fitted , fe vce(cluster idcode)
note: c.age#c.age omitted because of collinearity

Fixed-effects (within) regression               Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-sq:                                           Obs per group:
     within  = 0.1105                                         min =          1
     between = 0.1029                                         avg =        6.1
     overall = 0.0882                                         max =         15

                                                F(3,4709)         =     355.44
corr(u_i, Xb)  = 0.0411                         Prob > F          =     0.0000

                             (Std. Err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0184474    .004408     4.19   0.000     .0098057    .0270891
             |
 c.age#c.age |          0  (omitted)
             |
      fitted |   6.920927   1.152074     6.01   0.000     4.662324    9.179531
   sq_fitted |  -2.079755   .4060541    -5.12   0.000    -2.875811   -1.283699
       _cons |  -4.586115   .8935105    -5.13   0.000    -6.337813   -2.834416
-------------+----------------------------------------------------------------
     sigma_u |  .40319282
     sigma_e |  .30215883
         rho |  .64035936   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. 
*as both -fitted- and -sq_fitted- reach statistical significance, the model is clearly misspecified*
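
The commands that created -fitted- and -sq_fitted- are not shown above. A minimal sketch, assuming they are the linear prediction from the baseline -fe- regression and its square (an assumption that would be consistent with c.age#c.age being dropped for collinearity):

Code:

* load the example dataset and declare the panel structure
use "https://www.stata-press.com/data/r16/nlswork.dta", clear
xtset idcode year

* assumption: -fitted- is the linear prediction (xb) from the baseline -fe- model
quietly xtreg ln_wage c.age##c.age, fe
predict double fitted, xb
generate double sq_fitted = fitted^2

* augmented regression: significant added terms hint at misspecification
xtreg ln_wage c.age##c.age fitted sq_fitted, fe vce(cluster idcode)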

4) yes. The null of the community-contributed module -xtoverid- is that -re- is the way to go. Therefore, when the null is rejected you should switch to -fe- specification.
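
A minimal sketch of the two checks on the -re- specification mentioned in this thread, again on the -nlswork- toy data. It assumes that -xtoverid- is installed from SSC (it may in turn need -ivreg2- and -ranktest-) and, as a precaution, creates the squared term by hand (-age2- is a hypothetical name) in case -xtoverid- does not accept factor-variable notation:

Code:

* ssc install xtoverid   // community-contributed; run once if not yet installed
use "https://www.stata-press.com/data/r16/nlswork.dta", clear
xtset idcode year

* squared term built by hand (assumption: avoids factor-variable notation)
generate double age2 = age^2

* Breusch-Pagan LM test for a group-wise (random) effect
xtreg ln_wage age age2, re
xttest0

* -re- vs -fe- with cluster-robust standard errors; rejecting the null favours -fe-
xtreg ln_wage age age2, re vce(cluster idcode)
xtoverid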

Kind regards,
Carlo
(Stata 19.0)

Jonathan Anderson

Join Date: Aug 2021

Posts: 4
#5

06 Aug 2021, 09:29

Brilliant, and thank you for the example code.

1) With respect to heteroskedasticity, where do data transformations come into the picture? Is it advised to use transformations prior to using robust standard errors and then using BP test to see if heteroskedasticity persists, or not? If transformations do not improve heteroskedasticity then should I revert back to untransformed data and use this with robust standard errors or stick with the transformed data?
2) Presumably after accounting for heteroskedasticity and endogeneity issues it may be the case that they persist anyhow. I recall seeing you suggest that this is not necessarily a bad thing and that it may be an inherent characteristic of the data that should be embraced and explained in the write-up. Is this the case? And if so then is it possible to yield any meaningful insight from the regression results given that the standard errors are likely to be biased?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#6

06 Aug 2021, 10:42

Jonathan:
1) heteroskedasticity remains (because it is a matter of the residual distribution) even after invoking cluster-robust standard errors. The relevant difference is that non-default standard errors take heteroskedasticity into account, whereas default standard errors do not. Hence, there's no gain in repeating -xttest3- after imposing non-default standard errors. It might be that a data transformation fixes heteroskedasticity and model misspecification, but this should not be taken for granted. As an aside, please note that -estat hettest- won't work after -xtreg-.
2) as far as heteroskedasticity is concerned, please see 1). Endogeneity should be dealt with via suitable instruments (if, hopefully, available); otherwise all the estimates are unreliable.
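
A minimal sketch of the instrumental-variable route with -xtivreg,fe-, on the -nlswork- toy data. The choice of -tenure- as the endogenous regressor and of -union- and -south- as instruments is purely illustrative (an assumption, not a recommendation for your democracy/inequality model):

Code:

use "https://www.stata-press.com/data/r16/nlswork.dta", clear
xtset idcode year

* tenure treated as endogenous; union and south used as (assumed) instruments
xtivreg ln_wage age c.age#c.age not_smsa (tenure = union south), fe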

Kind regards,
Carlo
(Stata 19.0)