How can I estimate dummy variables in a fixed effect model?

Guest

22 Nov 2019, 15:35

Hello everyone. I have run into an issue related to dummy variables. I have a set of 6 independent variables (X1 X2 X3 X4 X5 X6, where X5 and X6 are 0/1 dummy variables) and 7 control variables (X7 X8 X9 X10 X11 X12 i.SECTOR_*, where X7 and i.SECTOR_* are dummy variables). To test the -re- estimator against the -fe- estimator, I ran the following in Stata:

Code:

xtreg Y X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 i.SECTOR_*, fe

Code:

estimates store fixe

Code:

xtreg Y X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 i.SECTOR_*, re

Code:

  hausman fixe

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |      fixe          .          Difference          S.E.
-------------+----------------------------------------------------------------
          X1 |    .0230283     .0449401       -.0219117        .0095658
          X2 |     .054124     .0405121        .0136119        .0214008
          X3 |    1.183001      1.18501        -.002009        .0754256
          X4 |    .0469244     .0380146        .0089098        .0086543
          X5 |   -.0094629     .0576902       -.0671531        .0189781
          X6 |   -.0161048    -.0065753       -.0095295        .0054429
          X8 |   -.0282489    -.0660878        .0378389        .0159534
          X9 |   -.1262577    -.1517426        .0254849        .0118437
         X10 |   -.0206716     .0442456       -.0649172        .0168754
         X11 |    .0103989    -.0010335        .0114324               .
         X12 |    .2898366     .0341556         .255681        .0524921
------------------------------------------------------------------------------
                           b = consistent under Ho and Ha; obtained from xtreg
            B = inconsistent under Ha, efficient under Ho; obtained from xtreg

    Test:  Ho:  difference in coefficients not systematic

                 chi2(11) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =       37.59
                Prob>chi2 =      0.0001
                (V_b-V_B is not positive definite)

As the results show, I got the message "Prob>chi2 = 0.0001 (V_b-V_B is not positive definite)". From what I have read, this means I cannot trust the Hausman test results.
I looked for threads on the same issue and found that -xtoverid- is one possible solution. However, I got a strange error (O: operator invalid). After digging in, I found that -xtoverid- is an older program that does not accept factor variables. So I tried the following syntax:

Code:

 xi: xtreg Y X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 i.SECTOR, re

Code:

R-sq:                                           Obs per group:
     within  = 0.0859                                         min =          6
     between = 0.4142                                         avg =        6.0
     overall = 0.3839                                         max =          6

                                                Wald chi2(17)     =      36.35
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0041

------------------------------------------------------------------------------
           Y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          X1 |   .0449401   .0382549     1.17   0.240    -.0300382    .1199183
          X2 |   .0405121   .0418403     0.97   0.333    -.0414934    .1225176
          X3 |    1.18501   .4796841     2.47   0.013     .2448461    2.125173
          X4 |   .0380146     .05113     0.74   0.457    -.0621984    .1382276
          X5 |   .0576902   .0285335     2.02   0.043     .0017656    .1136149
          X6 |  -.0065753   .0186427    -0.35   0.724    -.0431142    .0299637
          X7 |  -.0187848   .0427348    -0.44   0.660    -.1025436    .0649739
          X8 |  -.0660878   .0495238    -1.33   0.182    -.1631526     .030977
          X9 |  -.1517426   .0766785    -1.98   0.048    -.3020298   -.0014555
         X10 |   .0442456   .0143496     3.08   0.002      .016121    .0723703
         X11 |  -.0010335   .0101295    -0.10   0.919     -.020887    .0188201
         X12 |   .0341556   .0342669     1.00   0.319    -.0330064    .1013175
  _ISECTOR_2 |  -.0382053   .0673129    -0.57   0.570    -.1701362    .0937257
  _ISECTOR_3 |  -.0565498   .0641123    -0.88   0.378    -.1822076     .069108
  _ISECTOR_4 |  -.0879594    .069313    -1.27   0.204    -.2238103    .0478915
  _ISECTOR_5 |  -.0798642   .0711193    -1.12   0.261    -.2192554    .0595269
  _ISECTOR_6 |   .1036839    .079567     1.30   0.193    -.0522644    .2596323
       _cons |  -.5061066   .3213306    -1.58   0.115    -1.135903    .1236898
-------------+----------------------------------------------------------------
     sigma_u |  .10463913
     sigma_e |  .04052402
         rho |  .86957944   (fraction of variance due to u_i)
------------------------------------------------------------------------------

Code:

. xtoverid

Test of overidentifying restrictions: fixed vs random effects
Cross-section time-series model: xtreg re  
Sargan-Hansen statistic  33.244  Chi-sq(11)   P-value = 0.0005

I am not sure whether what I have done is correct. According to -xtoverid-, I should go with the -fe- estimator, right? If so, how can I obtain estimates for my dummy variables (because -fe- drops the dummies)? I hope someone can help me see what I have missed.

Edit: I have another question. Why, when I run -fe-, are only X7 and i.SECTOR_* dropped, and not X5 and X6?

Here are the results:

Code:

   R-sq:                                           Obs per group:
     within  = 0.1784                                         min =          6
     between = 0.1564                                         avg =        6.0
     overall = 0.1313                                         max =          6

                                                F(11,179)         =       3.53
corr(u_i, Xb)  = -0.8687                        Prob > F          =     0.0002

------------------------------------------------------------------------------
           Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          X1 |   .0230283   .0394328     0.58   0.560    -.0547845    .1008412
          X2 |    .054124   .0469958     1.15   0.251    -.0386131    .1468611
          X3 |   1.183001   .4855778     2.44   0.016     .2248073    2.141194
          X4 |   .0469244   .0518573     0.90   0.367    -.0554058    .1492546
          X5 |  -.0094629   .0342685    -0.28   0.783    -.0770851    .0581593
          X6 |  -.0161048    .019421    -0.83   0.408    -.0544283    .0222187
          X7 |          0  (omitted)
          X8 |  -.0282489   .0520299    -0.54   0.588    -.1309199    .0744221
          X9 |  -.1262577   .0775878    -1.63   0.105    -.2793622    .0268467
         X10 |  -.0206716   .0221515    -0.93   0.352    -.0643833    .0230401
         X11 |   .0103989   .0100433     1.04   0.302    -.0094196    .0302174
         X12 |   .2898366   .0626869     4.62   0.000     .1661363     .413537
  1.SECTOR_6 |          0  (omitted)
  1.SECTOR_3 |          0  (omitted)
  1.SECTOR_2 |          0  (omitted)
  1.SECTOR_1 |          0  (omitted)
  1.SECTOR_4 |          0  (omitted)
  1.SECTOR_5 |          0  (omitted)
       _cons |  -.1309836    .395794    -0.33   0.741    -.9120061     .650039
-------------+----------------------------------------------------------------
     sigma_u |  .24279272
     sigma_e |  .04052402
         rho |   .9728968   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(37, 179) = 30.10                    Prob > F = 0.0000

Thank you in advance.

Last edited by sladmin; 06 Aug 2020, 05:03. Reason: anonymize original poster

Tags: fixed effects, panel data

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#2

23 Nov 2019, 07:18

Guest:
1) As expected, the -fe- estimator wipes out all time-invariant predictors. For instance, if a given -panelid- does not change sector during the observed timespan, the sector predictors are omitted from the estimation and there is no fix for that (this is well explained in any decent panel data econometrics textbook, like https://www.stata.com/bookstore/micr...metrics-stata/).
2) -hausman- works asymptotically. Hence, it often throws the message you reported. You were wise to use -xtoverid-, which points you to the -fe- specification.
3) Under -xtreg, re- Stata probably omits predictors due to perfect collinearity (but Stata should have warned you about that).

Kind regards,
Carlo
(Stata 19.0)
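A minimal sketch of points 1) and 2) above, not taken from the thread. -xtsum- reports the within-panel standard deviation of each variable, which shows why -fe- keeps X5 and X6 (they change within at least some panels) but omits X7 and the sector dummies. The -sigmamore- option of -hausman- bases both covariance matrices on the disturbance variance from the efficient estimator, which often (though not always) avoids the "not positive definite" message. The stored name rand is illustrative, and the data are assumed to be -xtset- already.

Code:

* assumption: the data are already -xtset-
* a within Std. Dev. of 0 means the variable is time-invariant, so -fe- omits it
xtsum X5 X6 X7 SECTOR_*

xtreg Y X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 i.SECTOR_*, fe
estimates store fixe
xtreg Y X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 i.SECTOR_*, re
estimates store rand

* sigmamore: both covariance matrices use the disturbance variance of the -re- model
hausman fixe rand, sigmamore

If the message persists with -sigmamore-, the -xtoverid- route used in the first post remains the alternative.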
Guest
#3

23 Nov 2019, 10:02

Thank you, Carlo Lazzaro, for your quick reply. Absolutely right! I forgot that each firm stays in the same sector throughout the analysis, so sector is a time-invariant predictor and is therefore omitted in -fe-. My issue is that -xtoverid- points to the -fe- specification, where the variable "Sector" is omitted. To obtain estimates for this kind of variable, daniel klein points out in a thread on the same issue <https://www.statalist.org/forums/for...-effects-model> that we need to use a "hybrid model", also called a "between-within model".

1- My first question: do you think a "hybrid model" would be hard to manage, given that my knowledge of econometrics is not that strong?

2- Do you think I should drop the sector (industry) variable even though it is an important control variable?

3- Last but not least, if I decide to go with the hybrid model, how can I test for heteroskedasticity, serial correlation, collinearity, model misspecification and endogeneity with this kind of model, given that those tests are very important in my report? Are there Stata commands for this kind of model like the ones for -fe-?
Thank you so much for any helpful advice.

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#4

23 Nov 2019, 11:01

Guest:
1) See also: https://blog.stata.com/2015/10/29/fixed-effects-or-random-effects-the-mundlak-approach/. Yes, this approach is more difficult than the ones you are used to.
2) No, you should live with them: the fact that they were omitted by the -fe- machinery is unfortunate, but expected and unavoidable.
3) As far as I know, you have to manage those issues yourself (this is almost always the case with panel data regression models, due to the lack of built-in postestimation commands).

Kind regards,
Carlo
(Stata 19.0)
Guest
#5

23 Nov 2019, 11:41

Thank you so much, Carlo Lazzaro, for your feedback. Can you suggest some good references on how to carry out those tests under hybrid models? Or can I rely on the ones for random effects (-re-), since I have just figured out that "hybrid" models are estimated with -xtreg, re-? Please correct me if I am wrong.
Guest
#6

23 Nov 2019, 11:57

Carlo Lazzaro, I have tried the Mundlak approach with the following:

Code:

* panel mean of X1; repeated in the same way for X2 through X11
by i, sort: egen x1_between = mean(X1)

Code:

xtreg Y X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 x1_between x2_between x3_between x4_between x5_between x6_between x7_between x8_between x9_between x10_between x11_between i.SECTOR_*, vce(robust)

Code:

test x1_between x2_between x3_between x4_between x5_between x6_between x7_between x8_between x9_between x10_between x11_between

Code:

 ( 1)  x1_between = 0
 ( 2)  x2_between = 0
 ( 3)  x3_between = 0
 ( 4)  x4_between = 0
 ( 5)  x5_between = 0
 ( 6)  x6_between = 0
 ( 7)  x7_between = 0
 ( 8)  x8_between = 0
 ( 9)  x9_between = 0
 (10)  x10_between = 0
 (11)  x11_between = 0

           chi2( 11) =   45.20
         Prob > chi2 =    0.0000

We reject H0. This suggests that the fixed effects model is appropriate, the same result as with -xtoverid-.

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#7

23 Nov 2019, 12:05

Guest:
1) For model misspecification, you can -test- whether the squared fitted values are significant (I think I have shown you an example in one of the previous threads you started).
2) Heteroskedasticity and autocorrelation: use the -cluster()- option for standard errors if you suspect that one or both can bias your regression results. Please note that, to work as expected, the number of clusters should be large enough (say, at least 15-20).
3) Endogeneity can be avoided by knowing the data generating process. Besides, a misspecified model might be affected by endogeneity, too.
4) Quasi-extreme multicollinearity can be suspected by taking a look at the 95% CIs. If they look weird (usually too wide), take a look at the -estat vce, corr- matrix.

Kind regards,
Carlo
(Stata 19.0)
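Points 1), 2) and 4) above can be sketched roughly as follows. This is an illustration under assumptions rather than the exact example referred to in the post: the panel identifier is taken to be i (the name used in the poster's -egen- line), and fitted and sq_fitted are hypothetical variable names. The -fe- output in the first post reports F(37, 179) for the test that all u_i=0, which implies 38 panels, so the number of clusters is above the 15-20 mentioned.

Code:

* 2) cluster-robust standard errors; panel identifier assumed to be i
xtreg Y X1 X2 X3 X4 X5 X6 X8 X9 X10 X11 X12, fe vce(cluster i)

* 4) correlation matrix of the coefficient estimates
estat vce, corr

* 1) misspecification check: regress Y on the fitted values and their square
predict double fitted, xb
generate double sq_fitted = fitted^2
xtreg Y fitted sq_fitted, fe vce(cluster i)
* a significant squared term hints at a misspecified functional form
test sq_fitted

X7 and the sector dummies are left out of the -fe- command here only because -fe- omits them anyway.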
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#8

23 Nov 2019, 12:09

Guest:
then go -fe-.

Kind regards,
Carlo
(Stata 19.0)
Guest
#9

23 Nov 2019, 12:30

But if I go -fe-, my time-invariant "sector" variable will be omitted. So the only solution would be to go with the "hybrid" model, whose very name scares me. Thank you so much, Carlo Lazzaro. I am grateful to you. Your responses are (as always) helpful.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#10

23 Nov 2019, 14:28

Guest:
that's the way panel data regression goes.
The -fe- specification allows weak endogeneity, but at the cost of not estimating the time-invariant predictors you are interested in. Conversely, the -re- specification estimates time-invariant coefficients, but assumes no correlation between the vector of regressors and both components of the error (even though that assumption cannot be taken for granted).
You went -mundlak-, but the -test- you performed seems to point you to -fe- again.
Being back to square one, I would wonder whether a different set of predictors could make sense in your regression model.

Kind regards,
Carlo
(Stata 19.0)
Eric de Souza

Join Date: Mar 2014

Posts: 587
#11

24 Nov 2019, 07:04

@Carlo in #10: No, it does not point to the fe model. All it does is reject the original re model (without the "between" variables added), and you are left with the Mundlak (or CRE) formulation with the "between" variables included.
Also, the fe specification requires strict exogeneity because of the presence of the (unobserved) individual-specific effects.
Moreover, with the Mundlak formulation the original poster has estimates of the coefficients for the time-invariant variables.
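In symbols, the distinction drawn here can be sketched as follows. The notation is an assumption, not taken from the thread: x_{it} collects the time-varying regressors, \bar{x}_i their panel means (the "between" variables), and z_i the time-invariant regressors such as X7 and the sector dummies.

\[
y_{it} = \alpha + x_{it}'\beta + \bar{x}_i'\gamma + z_i'\delta + u_i + \varepsilon_{it},
\qquad
\bar{x}_i = \frac{1}{T}\sum_{t=1}^{T} x_{it}
\]
\[
H_0:\ \gamma = 0
\]

The traditional re model imposes \gamma = 0, so rejecting H_0 rejects only that restriction. The augmented equation, including \delta for the time-invariant variables, remains estimable.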
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#12

24 Nov 2019, 07:55

Eric:
see however https://www.statalist.org/forums/for...interpretation

Kind regards,
Carlo
(Stata 19.0)
Eric de Souza

Join Date: Mar 2014

Posts: 587
#13

24 Nov 2019, 08:55

Carlo, I had a look. It doesn't contradict what I said. The test results posted above under #6 lead to a rejection of the RE model in its traditional formulation. The coefficient estimates obtained using the Mundlak approach are the same for the time-varying variables as those obtained by the FE model, but the Mundlak approach also provides estimates for the coefficients of the time-invariant variables, which the FE model does not. In this sense it does not reduce to the FE model, if by FE model one understands the within estimator.
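The comparison described here could be checked with something like the sketch below. It is an illustration under assumptions, not the poster's code: the panel identifier is taken to be i, the names m_X1 ... m_X12, within and mundlak are hypothetical, and panel means are built only for the regressors that -fe- did not omit (X7 and the sector dummies are time-invariant, so their means would duplicate them). The panel is balanced according to the output above (6 observations per group), which is the setting in which the two sets of coefficients are expected to coincide.

Code:

* time-varying regressors (those that -fe- kept in the first post)
local tv X1 X2 X3 X4 X5 X6 X8 X9 X10 X11 X12

* build the panel mean of each time-varying regressor
local means
foreach v of local tv {
    by i, sort: egen double m_`v' = mean(`v')
    local means `means' m_`v'
}

* within (FE) estimator
xtreg Y `tv', fe
estimates store within

* Mundlak / CRE formulation: -re- with the panel means added
xtreg Y `tv' `means' X7 i.SECTOR_*, re vce(robust)
estimates store mundlak

* joint test of the panel means: rejection rejects the traditional -re- model
test `means'

* coefficients on the time-varying regressors, side by side
estimates table within mundlak, keep(`tv') b(%9.4f) se

The standard errors in the two columns will differ because only the second model uses vce(robust); the point of the table is the coefficient columns.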
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#14

24 Nov 2019, 09:02

Eric:
thanks.
Enlightening.

Kind regards,
Carlo
(Stata 19.0)
Guest
#15

24 Nov 2019, 09:35

Thank you, Eric de Souza, for your reply. So, are you saying that the Mundlak approach does not reject -re-?