Hausman Test result

Anuradha Saikia

Join Date: Aug 2020

Posts: 153
#1

20 Sep 2020, 12:18

This is the output of my Hausman test. What does it mean that the model fitted on these data does not meet the asymptotic assumptions?
Joro Kolev

Join Date: Aug 2018

Posts: 3047
#2

20 Sep 2020, 14:56

It means what the error message says. The statistic is supposed to be asymptotically Chi-squared distributed (a non-negative random variable), but the calculated statistic is negative.
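
For reference, a sketch of the statistic in question, in generic textbook notation that is not taken from this thread: $\hat\beta_{FE}$ and $\hat\beta_{RE}$ are the estimated coefficient vectors on the time-varying regressors, and $\hat V_{FE}$, $\hat V_{RE}$ are their estimated covariance matrices.

```latex
H = (\hat\beta_{FE} - \hat\beta_{RE})'
    \left[ \hat V_{FE} - \hat V_{RE} \right]^{-1}
    (\hat\beta_{FE} - \hat\beta_{RE})
```

If RE is efficient under the null, $\hat V_{FE} - \hat V_{RE}$ is positive semidefinite asymptotically, so $H \ge 0$. In a finite sample the estimated difference need not be positive definite, and that is how a negative value can arise.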
Anuradha Saikia

Join Date: Aug 2020

Posts: 153
#3

20 Sep 2020, 22:35

There is an issue here, then. How do I take care of this problem?
Is using xtoverid the solution?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17678
#4

21 Sep 2020, 01:09

Anuradha:
Yes, the community-contributed command -xtoverid- is a good alternative. Checking the -re- specification alone is enough: if the -xtoverid- outcome reaches statistical significance, you should switch to the -fe- specification.
Please remember that it does not support -fvvarlist- notation.
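
A minimal sketch of that workflow, assuming the nlswork example data and a hand-made dummy named black (both are illustrative choices, not part of the original question). I believe -xtoverid- also relies on -ivreg2- being installed, and its behaviour may depend on the installed versions, so please check the help file:

```stata
* Sketch only: the dataset and the variable name "black" are illustrative assumptions
* ssc install xtoverid        // community-contributed; install once
webuse nlswork, clear
xtset idcode

* -xtoverid- does not accept factor-variable notation, so build the dummy by hand
generate byte black = (race == 2) if !missing(race)

* Random-effects model with cluster-robust standard errors
xtreg ln_wage age ttl_exp tenure black grade, re vce(cluster idcode)

* Test of the overidentifying restrictions (RE vs. FE); rejection favours -fe-
xtoverid
```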

Kind regards,
Carlo
(Stata 19.0)
Anuradha Saikia

Join Date: Aug 2020

Posts: 153
#5

21 Sep 2020, 02:02

Okay, Sir Carlo Lazzaro. How do I report it in the methodology part of my paper? The Hausman test did not work, and it's a standard test. I can't write that I used xtoverid, as it's only a Stata command.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17678
#6

21 Sep 2020, 02:09

Anuradha:
The -xtoverid- help file gives the full reference for this community-contributed command, which you can include in your research report/paper:

Schaffer, M.E., Stillman, S. 2010. xtoverid: Stata module to calculate tests of overidentifying restrictions after xtreg, xtivreg, xtivreg2 and xthtaylor
http://ideas.repec.org/c/boc/bocode/s456779.html

Kind regards,
Carlo
(Stata 19.0)
Joro Kolev

Join Date: Aug 2018

Posts: 3047
#7

21 Sep 2020, 03:12

Apart from citing the user-written command -xtoverid- as Carlo explained above (user-written commands should be cited; they are research like any other), you can also find in the help file, and cite, the key reference on which -xtoverid- is based.

Arellano, M. 1993. On the testing of correlated effects with panel data. Journal of Econometrics, Vol. 59, Nos. 1-2, pp. 87-97.
Anuradha Saikia

Join Date: Aug 2020

Posts: 153
#8

21 Sep 2020, 04:22

Wow. Thank you so much, Joro Kolev and Carlo Lazzaro.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2121
#9

21 Sep 2020, 05:51

The xtoverid is a bit of a black box. I like using the Mundlak approach, where one includes the time averages of all time-varying variables, estimates the equation by random effects, and tests the time averages. This reproduces the fixed effects estimates on all time-varying variables. Plus, one can see which time averages are important. I cover the general unbalanced case in my 2019 Journal of Econometrics paper on correlated random effects models.
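
The steps described above can be sketched as follows, using the nlswork example data. The flag s and the -bar suffix are illustrative names. Computing the averages over complete cases only, and clustering the standard errors, are assumptions made for this sketch:

```stata
* Sketch only: "s" and the "bar" suffix are illustrative names
webuse nlswork, clear
xtset idcode

* Flag complete cases so the averages use only observations that enter the regression
generate byte s = !missing(ln_wage, age, ttl_exp, tenure, race, grade)

* Time averages of the time-varying regressors, by individual
foreach v of varlist age ttl_exp tenure {
    egen `v'bar = mean(`v') if s, by(idcode)
}

* Random effects with the time averages added (Mundlak / correlated random effects)
xtreg ln_wage age ttl_exp tenure agebar ttl_expbar tenurebar 2.race grade if s, re vce(cluster idcode)

* Joint test of the time averages: a regression-based alternative to the Hausman test
test agebar ttl_expbar tenurebar
```

A small p-value from the joint test points towards fixed effects. The individual coefficients on the averages show which variables drive the rejection.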

Joro Kolev

Join Date: Aug 2018
Posts: 3047

#10

21 Sep 2020, 06:27

Indeed, Professor Wooldridge, this would be the easier approach here. I was about to propose it to Anuradha Saikia, but after I tried it, I remembered that one does not achieve exact equivalence for unbalanced panels.

What I tried is another version of Mundlak's approach, which I explain in the attached paper (I submitted it to the Stata Journal in 2008; I think I got a referee who did not understand the issue, and I was too young to care about fighting the powers that be, as I had more fun stuff to do back in those days). This other version just estimates the equation that you describe by OLS. The slopes on the time averages then show the difference between the Fixed Effects and the Between estimators, and the slopes on the time-varying covariates have to be equal to the Fixed Effects estimates (but only in balanced panels, so I got quite some difference). So a test of joint significance of the slopes on the time averages is a Hausman test of the FE vs. BE model.

Anyway, even if we go with your version of Mundlak's approach, we still get some slight differences for unbalanced panels:

Code:

.  webuse nlswork, clear
(National Longitudinal Survey.  Young Women 14-26 years of age in 1968)

. xtset idcode
       panel variable:  idcode (unbalanced)

.  xtreg ln_w  age ttl_exp tenure 2.race grade, fe
note: 2.race omitted because of collinearity
note: grade omitted because of collinearity

Fixed-effects (within) regression               Number of obs     =     28,099
Group variable: idcode                          Number of groups  =      4,697

R-sq:                                           Obs per group:
     within  = 0.1443                                         min =          1
     between = 0.2745                                         avg =        6.0
     overall = 0.1924                                         max =         15

                                                F(3,23399)        =    1315.26
corr(u_i, Xb)  = 0.1651                         Prob > F          =     0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0030427   .0008644    -3.52   0.000    -.0047369   -.0013484
     ttl_exp |    .029036   .0014505    20.02   0.000      .026193     .031879
      tenure |   .0116574   .0009249    12.60   0.000     .0098444    .0134704
             |
        race |
      black  |          0  (omitted)
       grade |          0  (omitted)
       _cons |   1.547951   .0181798    85.15   0.000     1.512317    1.583584
-------------+----------------------------------------------------------------
     sigma_u |   .3751722
     sigma_e |  .29556813
         rho |  .61703248   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(4696, 23399) = 7.64                 Prob > F = 0.0000

. qui for var age ttl_exp tenure: egen meanX = mean(X), by(idcode)

.  xtreg ln_w  age ttl_exp tenure mean* 2.race grade, re

Random-effects GLS regression                   Number of obs     =     28,099
Group variable: idcode                          Number of groups  =      4,697

R-sq:                                           Obs per group:
     within  = 0.1443                                         min =          1
     between = 0.4329                                         avg =        6.0
     overall = 0.3250                                         max =         15

                                                Wald chi2(8)      =    7538.32
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0030268   .0008614    -3.51   0.000    -.0047152   -.0013385
     ttl_exp |   .0290337   .0014457    20.08   0.000     .0262003    .0318672
      tenure |   .0116424   .0009222    12.62   0.000     .0098349      .01345
     meanage |  -.0026319     .00142    -1.85   0.064    -.0054151    .0001513
 meanttl_exp |  -.0008391   .0025701    -0.33   0.744    -.0058764    .0041982
  meantenure |   .0165731   .0024676     6.72   0.000     .0117366    .0214095
             |
        race |
      black  |   -.062727   .0103071    -6.09   0.000    -.0829286   -.0425254
       grade |   .0701835   .0020152    34.83   0.000     .0662339    .0741332
       _cons |    .709563   .0346377    20.49   0.000     .6416744    .7774516
-------------+----------------------------------------------------------------
     sigma_u |  .27539065
     sigma_e |  .29556813
         rho |  .46470444   (fraction of variance due to u_i)
------------------------------------------------------------------------------

So what we see above is that the estimates for unbalanced panels are a bit different, and this is hard to explain to a novice (asymptotic results do not hold exactly in finite samples).

For my interpretation of Mundlak's approach, the differences were even harder to explain:

Code:

. reg ln_w  age ttl_exp tenure mean* 2.race grade, noheader
------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   -.002911   .0011444    -2.54   0.011    -.0051542   -.0006679
     ttl_exp |    .028964    .001921    15.08   0.000     .0251989    .0327292
      tenure |   .0115805   .0012274     9.44   0.000     .0091748    .0139861
     meanage |  -.0022723   .0013232    -1.72   0.086    -.0048658    .0003212
 meanttl_exp |  -.0023782   .0022615    -1.05   0.293    -.0068108    .0020545
  meantenure |   .0171518   .0017178     9.98   0.000     .0137848    .0205188
             |
        race |
      black  |  -.0806084   .0052841   -15.25   0.000    -.0909655   -.0702513
       grade |   .0708902   .0011003    64.43   0.000     .0687336    .0730468
       _cons |   .7061531   .0197196    35.81   0.000     .6675017    .7448045
------------------------------------------------------------------------------
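
The joint test of the time averages described in this post could be run along the following lines. This is a sketch, not the listing above: it assumes the averages are computed over complete cases only (flag s and the -bar suffix are illustrative names) and uses cluster-robust standard errors, neither of which the listing above does:

```stata
* Sketch only: "s" and the "bar" suffix are illustrative names
webuse nlswork, clear

* Complete-case flag, so the averages are taken over the estimation sample
generate byte s = !missing(ln_wage, age, ttl_exp, tenure, race, grade)
foreach v of varlist age ttl_exp tenure {
    egen `v'bar = mean(`v') if s, by(idcode)
}

* Pooled OLS with the time averages added; standard errors clustered by individual
regress ln_wage age ttl_exp tenure agebar ttl_expbar tenurebar 2.race grade if s, vce(cluster idcode)

* Joint significance of the slopes on the time averages
test agebar ttl_expbar tenurebar
```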

Attached file: regressionbasedHausman2.pdf


Eric de Souza

Join Date: Mar 2014
Posts: 587

#11

21 Sep 2020, 08:01

Joro Kolev: with the following I get exactly the same coefficients with FE and with Correlated Random Effects.
I think that you have forgotten to select only the "complete" observations: those for which no value is missing for any of the variables concerned.
This is done with the selection indicator s.

Code:

webuse nlswork
xtset idcode

xtreg ln_w  age ttl_exp tenure 2.race grade, fe

gen s = (ln_wage != .) & (age != .) & (ttl_exp != .) & (tenure != .) & (race != .) & (grade != .)
egen agebar = mean(age) if s, by(idcode)
egen ttl_expbar = mean(ttl_exp) if s, by(idcode)
egen tenurebar = mean(tenure) if s, by(idcode)
egen racebar = mean(race) if s, by(idcode)
egen gradebar = mean(grade) if s, by(idcode)

xtreg ln_wage  age ttl_exp tenure 2.race grade agebar ttl_expbar tenurebar racebar gradebar, re

On Edit:

Code:

egen racebar = mean(race) if s, by(idcode)

should be

Code:

egen racebar = mean(2.race) if s, by(idcode)

Last edited by Eric de Souza; 21 Sep 2020, 08:46.


Jeff Wooldridge

Join Date: Apr 2014

Posts: 2121
#12

21 Sep 2020, 08:05

Thank you, Eric. I was about to write the same thing. I emphasize this point in my paper. I admit that it tripped me up for several years. Also important is that, if the model includes time dummies or any aggregate time variables, their time averages must also be included.

In fact, computing the time averages from a different set of observations does not, in general, even give a consistent estimator when the complete-cases FE estimator is consistent.

Eric de Souza

Join Date: Mar 2014
Posts: 587

#13

21 Sep 2020, 10:02

Addition to #11 (posted by me):
Since 2.race and grade are time-constant, racebar and gradebar can be dropped and, in fact, are dropped by Stata. It was because racebar was not dropped that I realised my mistake and edited my previous post (#11).
I have also added time dummies to illustrate the point made by Jeff Wooldridge (#12).

Code:

* Keep a log of the session
log using CRE_unbalanced_panel.log, replace
webuse nlswork, clear
* Keep four survey years and create one dummy per year (year1-year4)
keep if inlist(year, 68, 69, 70, 71)
tab year, gen(year)
xtset idcode

* Fixed effects with time dummies (year1 is the base)
xtreg ln_wage age ttl_exp tenure 2.race grade year2-year4, fe

* Selection indicator: 1 for complete cases, 0 otherwise
gen s = !missing(ln_wage, age, ttl_exp, tenure, race, grade)
* Time averages over complete cases only, including those of the time dummies
egen agebar = mean(age) if s, by(idcode)
egen ttl_expbar = mean(ttl_exp) if s, by(idcode)
egen tenurebar = mean(tenure) if s, by(idcode)
egen year1bar = mean(year1) if s, by(idcode)
egen year2bar = mean(year2) if s, by(idcode)
egen year3bar = mean(year3) if s, by(idcode)
egen year4bar = mean(year4) if s, by(idcode)

* Correlated random effects: coefficients on the time-varying variables should match FE
xtreg ln_wage age ttl_exp tenure 2.race grade year2-year4 agebar ttl_expbar tenurebar year2bar year3bar year4bar, re

clear
log close

Last edited by Eric de Souza; 21 Sep 2020, 10:39.


Anuradha Saikia

Join Date: Aug 2020

Posts: 153
#14

21 Sep 2020, 23:03

Thank you, Professor Jeff Wooldridge and Eric de Souza, for enlightening me about new ways of looking at the problem. I am quite a novice and still learning. Prof. Wooldridge, could you share the paper you mentioned?
In my case I have a strongly balanced panel, though. Everything goes smoothly until the Hausman test, and I guess the iteration is not done properly in the command.

Raymond Zhang

Join Date: Jan 2021
Posts: 349

#15

21 Jan 2021, 01:40

@Jeff Wooldridge @Eric de Souza @Joro Kolev I added the "if e(sample)" qualifier to @Eric de Souza's code to ensure that the same observations are used in xtreg, fe robust and xtreg, re robust, and I also get exactly the same coefficients with FE and with Correlated Random Effects. Adding "if e(sample)" may be a simpler way to guarantee the complete observations, as @Jeff Wooldridge said. This also tells us that the observations used for FE and CRE should be exactly the same.

Code:

webuse nlswork
xtset idcode
xtreg ln_w  age ttl_exp tenure 2.race grade, fe r
Fixed-effects (within) regression               Number of obs     =     28,099
Group variable: idcode                          Number of groups  =      4,697

R-sq:                                           Obs per group:
     within  = 0.1443                                         min =          1
     between = 0.2745                                         avg =        6.0
     overall = 0.1924                                         max =         15

                                                F(3,4696)         =     544.06
corr(u_i, Xb)  = 0.1651                         Prob > F          =     0.0000

                             (Std. Err. adjusted for 4,697 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |     -0.003      0.001    -2.35   0.019       -0.006      -0.001
     ttl_exp |      0.029      0.002    12.72   0.000        0.025       0.034
      tenure |      0.012      0.001     7.93   0.000        0.009       0.015
             |
        race |
      black  |      0.000  (omitted)
       grade |      0.000  (omitted)
       _cons |      1.548      0.027    56.78   0.000        1.494       1.601
-------------+----------------------------------------------------------------
     sigma_u |   .3751722
     sigma_e |  .29556813
         rho |  .61703248   (fraction of variance due to u_i)
------------------------------------------------------------------------------

egen agebar = mean(age) , by(idcode)
egen ttl_expbar = mean(ttl_exp) , by(idcode)
egen tenurebar = mean(tenure) , by(idcode)
egen racebar = mean(race) , by(idcode)
egen gradebar = mean(grade) , by(idcode)

xtreg ln_wage  age ttl_exp tenure 2.race grade agebar ttl_expbar tenurebar racebar gradebar if e(sample), re r
Random-effects GLS regression                   Number of obs     =     28,099
Group variable: idcode                          Number of groups  =      4,697

R-sq:                                           Obs per group:
     within  = 0.1443                                         min =          1
     between = 0.4339                                         avg =        6.0
     overall = 0.3252                                         max =         15

                                                Wald chi2(9)      =    4529.55
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

                             (Std. Err. adjusted for 4,697 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |     -0.003      0.001    -2.34   0.019       -0.006      -0.000
     ttl_exp |      0.029      0.002    12.72   0.000        0.025       0.034
      tenure |      0.012      0.001     7.92   0.000        0.009       0.015
             |
        race |
      black  |     -0.114      0.025    -4.50   0.000       -0.163      -0.064
       grade |      0.070      0.002    31.31   0.000        0.066       0.075
      agebar |     -0.003      0.002    -1.57   0.116       -0.006       0.001
  ttl_expbar |     -0.001      0.003    -0.28   0.777       -0.007       0.005
   tenurebar |      0.017      0.003     6.12   0.000        0.011       0.022
     racebar |      0.053      0.024     2.22   0.027        0.006       0.099
    gradebar |      0.000  (omitted)
       _cons |      0.655      0.044    14.86   0.000        0.569       0.742
-------------+----------------------------------------------------------------
     sigma_u |  .27510497
     sigma_e |  .29556813
         rho |   .4641881   (fraction of variance due to u_i)
------------------------------------------------------------------------------
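
One caveat about the listing above: the averages are created before the "if e(sample)" restriction is applied, so they are computed from every non-missing value of each variable, not from the estimation sample. The output is also shown to three decimals only, which cannot confirm exact equality. A variant that restricts the averages themselves and compares the coefficients at higher precision might look like this. The names insample, fe and cre are illustrative, and the sketch has not been checked against the output above:

```stata
* Sketch only: "insample", "fe" and "cre" are illustrative names
webuse nlswork, clear
xtset idcode

* Fixed effects first, then record which observations were used
xtreg ln_wage age ttl_exp tenure 2.race grade, fe vce(cluster idcode)
generate byte insample = e(sample)
estimates store fe

* Time averages computed over the FE estimation sample only
foreach v of varlist age ttl_exp tenure {
    egen `v'bar = mean(`v') if insample, by(idcode)
}

* Correlated random effects on the same observations
xtreg ln_wage age ttl_exp tenure 2.race grade agebar ttl_expbar tenurebar if insample, re vce(cluster idcode)
estimates store cre

* Side-by-side comparison with more digits than the default display
estimates table fe cre, keep(age ttl_exp tenure) b(%12.8f)
```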

Best regards.

Raymond Zhang
Stata 17.0,MP
