
  • Panel regression: Confusion about FE vs RE estimation

    I would like to better understand the bias in the coefficient estimates of panel regressions when the RE assumption is violated. For this purpose, I consider the following setup:
    • The dependent variable is Yit = b1 Xt + uit.
    • Xt is a macro variable that only changes over time t but not across subjects i.
    • The disturbance uit = ai + vit is strictly exogenous with respect to Xt. It contains a subject-specific fixed effect ai and an orthogonal white-noise disturbance term vit.
    • Furthermore, there is a subject-specific variable Zit = c1 ai + wit, which is correlated with the subject-specific fixed effects.
    For the situation described above, I would like to analyze under what circumstances the panel regression Yit = b1 Xt + b2 Zit + ai + vit can be estimated with the RE estimator and when it needs to be estimated with the FE estimator in order to obtain consistent coefficient estimates.
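
    Written out with the parameter values used in the simulation code below (b1 = 2, c1 = 0.5, and the standard deviations passed to rnormal()), the setup reads as follows. The symbol \sigma_w for the standard deviation of wit is introduced here only for illustration; it corresponds to the local macro sigma_z in the code.

    ```latex
    \begin{align*}
    Y_{it} &= b_1 X_t + u_{it}, & u_{it} &= a_i + v_{it}, \\
    Z_{it} &= c_1 a_i + w_{it}, \\
    X_t &\sim N(2, 1), \quad a_i \sim N(0, 1), \quad v_{it} \sim N(0, 1), \quad w_{it} \sim N(0, \sigma_w^2), \\
    b_1 &= 2, \quad c_1 = 0.5, \quad \sigma_w = 0.05 .
    \end{align*}
    ```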

    To study this, I perform a single simulation run as follows:

    Code:
    cls 
    clear all
    
    
    * Panel setting
      local N = 50  
      local T = 100
      
    * Setting for sigma_z
      local sigma_z = 0.05  
      
    * Span the panel
      tempfile Data_N
      set obs `N'
      gen ID = _n
      gen a_i = rnormal(0, 1)   // fixed effects a_i
      save "`Data_N'"
      drop _all
      set obs `T'
      gen t = _n
      gen X_t = rnormal(2, 1)  // c0=2, sigma(epsilon)=1
      cross using "`Data_N'"
      order ID t
      xtset ID t, generic
      
    * Compute Y_it
      gen Y_it = 2*X_t + a_i + rnormal(0,1)
      
    * Generate subject characteristic Z_it
      gen Z_it = 0.5*a_i + rnormal(0, `sigma_z')
      
    * Panel regression & Hausman test
      xtreg Y_it X_t Z_it, fe
      estimates store FE
      xtreg Y_it X_t Z_it, re
      estimates store RE
      predict a_i_hat, u
      predict resid, ue
      hausman FE RE, sigmamore
      corr X_t Z_it a_i_hat resid
    What confuses me is that the Hausman test rejects the null hypothesis of the RE assumption at any conventional level, yet the RE estimate of b2 is much closer to the true value of 2 than the FE estimate. Put differently, the RE estimator looks less biased than the FE estimator in this case, but the Hausman test clearly favors FE estimation. Why is this? In such a situation in practice, does it really pay off to rely on FE estimation? Am I missing something?

    Thanks a lot for your thoughts and insights on this.


  • #2
    Ingo:
    I find it difficult to reply helpfully without seeing what Stata gave you back (as the FAQ recommends posters show).
    That said, your code assumes that there is no heteroskedasticity, autocorrelation, or cross-panel correlation.
    More substantively, since you decided to deal with a T>N panel dataset, -xtreg- is not the way to go, as it was conceived for short panel datasets (i.e., those with N>T).
    Take a look at -xtgls- and -xtregar-.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Thanks, Carlo, for your thoughts. I had assumed that a code block that can be run directly in Stata would be sufficient. However, here are the results from one simulation run with N=500 and T=100, so that N>T:

      Code:
      
      . clear all
      
      . 
      . 
      . * Panel setting
      .   local N = 500  
      
      .   local T = 100
      
      .   
      . * Setting for sigma_z
      .   local sigma_z = 0.05  
      
      .   
      . * Span the panel
      .   tempfile Data_N
      
      .   set obs `N'
      number of observations (_N) was 0, now 500
      
      .   gen ID = _n
      
      .   gen a_i = rnormal(0, 1)   // fixed effects a_i
      
      .   save "`Data_N'"
      file C:\Users\INGO~1.BRO\AppData\Local\Temp\ST_6b94_000001.tmp saved
      
      .   drop _all
      
      .   set obs `T'
      number of observations (_N) was 0, now 100
      
      .   gen t = _n
      
      .   gen X_t = rnormal(2, 1)  // c0=2, sigma(epsilon)=1
      
      .   cross using "`Data_N'"
      
      .   order ID t
      
      .   xtset ID t, generic
             panel variable:  ID (strongly balanced)
              time variable:  t, 1 to 100
                      delta:  1 unit
      
      .   
      . * Compute Y_it
      .   gen Y_it = 2*X_t + a_i + rnormal(0,1)
      
      .   
      . * Generate subject characteristic Z_it
      .   gen Z_it = 0.5*a_i + rnormal(0, `sigma_z')
      
      .   
      . * Panel regression & Hausman test
      .   xtreg Y_it X_t Z_it, fe
      
      Fixed-effects (within) regression               Number of obs     =     50,000
      Group variable: ID                              Number of groups  =        500
      
      R-sq:                                           Obs per group:
           within  = 0.8090                                         min =        100
           between = 0.9882                                         avg =      100.0
           overall = 0.6751                                         max =        100
      
                                                      F(2,49498)        =  104835.26
      corr(u_i, Xb)  = -0.0071                        Prob > F          =     0.0000
      
      ------------------------------------------------------------------------------
              Y_it |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               X_t |   1.996059   .0043592   457.90   0.000     1.987515    2.004603
              Z_it |  -.0298832   .0897779    -0.33   0.739    -.2058491    .1460826
             _cons |   .0026067   .0097796     0.27   0.790    -.0165614    .0217748
      -------------+----------------------------------------------------------------
           sigma_u |  1.0041056
           sigma_e |  .99609093
               rho |  .50400688   (fraction of variance due to u_i)
      ------------------------------------------------------------------------------
      F test that all u_i=0: F(499, 49498) = 2.19                  Prob > F = 0.0000
      
      .   estimates store FE
      
      .   xtreg Y_it X_t Z_it, re
      
      Random-effects GLS regression                   Number of obs     =     50,000
      Group variable: ID                              Number of groups  =        500
      
      R-sq:                                           Obs per group:
           within  = 0.8071                                         min =        100
           between = 0.9882                                         avg =      100.0
           overall = 0.8360                                         max =        100
      
                                                      Wald chi2(2)      =  248320.19
      corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
      
      ------------------------------------------------------------------------------
              Y_it |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               X_t |   1.995729   .0043813   455.51   0.000     1.987141    2.004316
              Z_it |   1.985372   .0098336   201.90   0.000     1.966098    2.004645
             _cons |   .0038504   .0099983     0.39   0.700     -.015746    .0234468
      -------------+----------------------------------------------------------------
           sigma_u |  .04075747
           sigma_e |  .99609093
               rho |  .00167144   (fraction of variance due to u_i)
      ------------------------------------------------------------------------------
      
      .   estimates store RE
      
      .   predict a_i_hat, u
      
      .   predict resid, ue
      
      .   hausman FE RE, sigmamore
      
      Note: the rank of the differenced variance matrix (1) does not equal the number of coefficients being tested (2); be sure this is what you expect, or there may be problems computing the test.  Examine the output of your estimators for
              anything unexpected and possibly consider scaling your variables so that the coefficients are on a similar scale.
      
                       ---- Coefficients ----
                   |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
                   |       FE           RE         Difference          S.E.
      -------------+----------------------------------------------------------------
               X_t |    1.996059     1.995729        .0003308        .0000147
              Z_it |   -.0298832     1.985372       -2.015255        .0896963
      ------------------------------------------------------------------------------
                                 b = consistent under Ho and Ha; obtained from xtreg
                  B = inconsistent under Ha, efficient under Ho; obtained from xtreg
      
          Test:  Ho:  difference in coefficients not systematic
      
                        chi2(1) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                                =      504.79
                      Prob>chi2 =      0.0000
                      (V_b-V_B is not positive definite)
      
      .   corr X_t Z_it a_i_hat resid
      (obs=50,000)
      
                   |      X_t     Z_it  a_i_hat    resid
      -------------+------------------------------------
               X_t |   1.0000
              Z_it |   0.0003   1.0000
           a_i_hat |   0.0000   0.1091   1.0000
             resid |   0.0000   0.0017   0.1078   1.0000
      The true coefficient on the explanatory variable Zit is 2. Therefore, the RE estimate is much closer to the true value than the FE estimate. Yet the Hausman test rejects the null hypothesis of the RE assumption and, hence, I'm supposed to favor the FE estimator because Zit is correlated with the fixed effect. Is this a general issue of the Hausman test?



      • #4
        Ingo:
        some comments about your post:
        1) With T=100 and N=500, yours is a panel that is long in both N and T; hence, -xtreg- is not the way to go. An example of a short panel would be N=500, T=8;
        2) the properties of -hausman- are asymptotic; hence, it frequently gives back unexpected results. In your case, as Stata reports
        (V_b-V_B is not positive definite)
        the results of the -hausman- test are not that reliable.
        3) the correlation between the panel-wise effect and the vector of regressors is really low in your -fe- specification.
        4) you may give it a shot with the community-contributed programme -xtoverid- from SSC (just type -search xtoverid- to find and install it).
        After installing -xtoverid- (and its prerequisite community-contributed companions), you can code as follows:
        Code:
        xtreg Y_it X_t Z_it, re
        xtoverid
        If -xtoverid- rejects the null, you should go -fe-.
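
        A possible installation sequence is sketched below. It assumes internet access to SSC and that the prerequisite companions are -ivreg2- and -ranktest-, which should be confirmed against the -xtoverid- help file.

        ```stata
        * Sketch: install -xtoverid- and its assumed prerequisites from SSC
        ssc install ranktest, replace
        ssc install ivreg2, replace
        ssc install xtoverid, replace

        * Confirm that Stata can locate the installed command
        which xtoverid
        ```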
        Kind regards,
        Carlo
        (Stata 19.0)
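
        On the question raised in #3, one point can be read directly off the simulation code: Y_it is generated as 2*X_t + a_i + noise, with no separate Z_it term, so the coefficient on Z_it implied by the data-generating process is 0 once a_i is held fixed. Because Z_it is approximately 0.5*a_i, an estimator that does not remove a_i can load a_i onto Z_it with a slope near 1/0.5 = 2. The sketch below is one hedged way to check this reading: it rebuilds a panel like the one in #3 and splits Z_it into a between part and a within part. The seed, the tempfile name, and the variable names Z_bar and Z_within are illustrative assumptions, not part of the original posts.

        ```stata
        * Sketch only: rebuild a panel like the one in #3 (N=500, T=100)
        clear all
        set seed 12345                           // arbitrary seed, chosen for illustration
        set obs 500
        gen ID = _n
        gen a_i = rnormal(0, 1)                  // subject-specific effect
        tempfile ids
        save "`ids'"
        drop _all
        set obs 100
        gen t = _n
        gen X_t = rnormal(2, 1)                  // macro variable, varies over t only
        cross using "`ids'"
        gen Y_it = 2*X_t + a_i + rnormal(0, 1)   // note: no separate Z_it term
        gen Z_it = 0.5*a_i + rnormal(0, 0.05)

        * Split Z_it into its subject mean (between part) and the deviation (within part)
        bysort ID: egen Z_bar = mean(Z_it)
        gen Z_within = Z_it - Z_bar

        * Infeasible benchmark: control for the true a_i (possible only in a simulation)
        regress Y_it X_t Z_it a_i

        * Separate slopes for the within and between parts of Z_it
        regress Y_it X_t Z_within Z_bar, vce(cluster ID)
        ```

        If this reading is right, the first regression should put a coefficient near 0 on Z_it and near 1 on a_i, while the second should give a Z_within coefficient close to the FE estimate and a Z_bar coefficient near 2. Under that reading, the Hausman rejection and the FE estimate near 0 are consistent with each other.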
