  • How can I estimate dummy variables in a fixed effect model?

    Hello everyone. I have run into an issue with dummy variables. I have 6 independent variables (X1 X2 X3 X4 X5 X6, where X5 and X6 are 0/1 dummies) and 7 control variables (X7 X8 X9 X10 X11 X12 i.SECTOR_*, where X7 and i.SECTOR_* are dummies). To test the -re- estimator against the -fe- estimator, I ran the following in Stata:

    Code:
    xtreg Y X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 i.SECTOR_*, fe
    Code:
    estimates store fixe
    Code:
    xtreg Y X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 i.SECTOR_*, re
    Code:
      hausman fixe
    
                     ---- Coefficients ----
                 |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
                 |      fixe          .          Difference          S.E.
    -------------+----------------------------------------------------------------
              X1 |    .0230283     .0449401       -.0219117        .0095658
              X2 |     .054124     .0405121        .0136119        .0214008
              X3 |    1.183001      1.18501        -.002009        .0754256
              X4 |    .0469244     .0380146        .0089098        .0086543
              X5 |   -.0094629     .0576902       -.0671531        .0189781
              X6 |   -.0161048    -.0065753       -.0095295        .0054429
              X8 |   -.0282489    -.0660878        .0378389        .0159534
              X9 |   -.1262577    -.1517426        .0254849        .0118437
             X10 |   -.0206716     .0442456       -.0649172        .0168754
             X11 |    .0103989    -.0010335        .0114324               .
             X12 |    .2898366     .0341556         .255681        .0524921
    ------------------------------------------------------------------------------
                               b = consistent under Ho and Ha; obtained from xtreg
                B = inconsistent under Ha, efficient under Ho; obtained from xtreg
    
        Test:  Ho:  difference in coefficients not systematic
    
                     chi2(11) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                              =       37.59
                    Prob>chi2 =      0.0001
                    (V_b-V_B is not positive definite)
    As the results show, the output reports "Prob>chi2 = 0.0001" together with the note "(V_b-V_B is not positive definite)". From what I have read, this means I cannot trust the Hausman test results.
    I looked for threads on the same issue and found that -xtoverid- is one possible solution. However, I got an odd error (O: operator invalid). After digging in, I found that -xtoverid- is an older program that does not accept factor variables. So I tried the following syntax:

    Code:
     xi: xtreg Y X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 i.SECTOR, re
    Code:
    R-sq:                                           Obs per group:
         within  = 0.0859                                         min =          6
         between = 0.4142                                         avg =        6.0
         overall = 0.3839                                         max =          6
    
                                                    Wald chi2(17)     =      36.35
    corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0041
    
    ------------------------------------------------------------------------------
               Y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              X1 |   .0449401   .0382549     1.17   0.240    -.0300382    .1199183
              X2 |   .0405121   .0418403     0.97   0.333    -.0414934    .1225176
              X3 |    1.18501   .4796841     2.47   0.013     .2448461    2.125173
              X4 |   .0380146     .05113     0.74   0.457    -.0621984    .1382276
              X5 |   .0576902   .0285335     2.02   0.043     .0017656    .1136149
              X6 |  -.0065753   .0186427    -0.35   0.724    -.0431142    .0299637
              X7 |  -.0187848   .0427348    -0.44   0.660    -.1025436    .0649739
              X8 |  -.0660878   .0495238    -1.33   0.182    -.1631526     .030977
              X9 |  -.1517426   .0766785    -1.98   0.048    -.3020298   -.0014555
             X10 |   .0442456   .0143496     3.08   0.002      .016121    .0723703
             X11 |  -.0010335   .0101295    -0.10   0.919     -.020887    .0188201
             X12 |   .0341556   .0342669     1.00   0.319    -.0330064    .1013175
      _ISECTOR_2 |  -.0382053   .0673129    -0.57   0.570    -.1701362    .0937257
      _ISECTOR_3 |  -.0565498   .0641123    -0.88   0.378    -.1822076     .069108
      _ISECTOR_4 |  -.0879594    .069313    -1.27   0.204    -.2238103    .0478915
      _ISECTOR_5 |  -.0798642   .0711193    -1.12   0.261    -.2192554    .0595269
      _ISECTOR_6 |   .1036839    .079567     1.30   0.193    -.0522644    .2596323
           _cons |  -.5061066   .3213306    -1.58   0.115    -1.135903    .1236898
    -------------+----------------------------------------------------------------
         sigma_u |  .10463913
         sigma_e |  .04052402
             rho |  .86957944   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------

    Code:
    . xtoverid
    
    Test of overidentifying restrictions: fixed vs random effects
    Cross-section time-series model: xtreg re  
    Sargan-Hansen statistic  33.244  Chi-sq(11)   P-value = 0.0005
    I'm not sure whether what I have done is correct. According to -xtoverid-, I should go with the -fe- estimator, right? If so, how can I estimate the coefficients on my dummy variables, given that the dummies are dropped under -fe-? I hope someone can help me see what I have missed.

    Edit: I have another question: why, when I run -fe-, are only X7 and i.SECTOR dropped, and not X5 and X6?

    here are the results:

    Code:
       R-sq:                                           Obs per group:
         within  = 0.1784                                         min =          6
         between = 0.1564                                         avg =        6.0
         overall = 0.1313                                         max =          6
    
                                                    F(11,179)         =       3.53
    corr(u_i, Xb)  = -0.8687                        Prob > F          =     0.0002
    
    ------------------------------------------------------------------------------
               Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              X1 |   .0230283   .0394328     0.58   0.560    -.0547845    .1008412
              X2 |    .054124   .0469958     1.15   0.251    -.0386131    .1468611
              X3 |   1.183001   .4855778     2.44   0.016     .2248073    2.141194
              X4 |   .0469244   .0518573     0.90   0.367    -.0554058    .1492546
              X5 |  -.0094629   .0342685    -0.28   0.783    -.0770851    .0581593
              X6 |  -.0161048    .019421    -0.83   0.408    -.0544283    .0222187
              X7 |          0  (omitted)
              X8 |  -.0282489   .0520299    -0.54   0.588    -.1309199    .0744221
              X9 |  -.1262577   .0775878    -1.63   0.105    -.2793622    .0268467
             X10 |  -.0206716   .0221515    -0.93   0.352    -.0643833    .0230401
             X11 |   .0103989   .0100433     1.04   0.302    -.0094196    .0302174
             X12 |   .2898366   .0626869     4.62   0.000     .1661363     .413537
      1.SECTOR_6 |          0  (omitted)
      1.SECTOR_3 |          0  (omitted)
      1.SECTOR_2 |          0  (omitted)
      1.SECTOR_1 |          0  (omitted)
      1.SECTOR_4 |          0  (omitted)
      1.SECTOR_5 |          0  (omitted)
           _cons |  -.1309836    .395794    -0.33   0.741    -.9120061     .650039
    -------------+----------------------------------------------------------------
         sigma_u |  .24279272
         sigma_e |  .04052402
             rho |   .9728968   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    F test that all u_i=0: F(37, 179) = 30.10                    Prob > F = 0.0000
    Thank you in advance.

    Last edited by sladmin; 06 Aug 2020, 06:03. Reason: anonymize original poster

  • #2
    Guest:
    1) as expected, the -fe- estimator wipes out all the time-invariant predictors. For instance, if a given -panelid- does not change sector during the observed timespan, the sector predictors will be omitted from the calculation, and there is no fix for that (this is well explained in any decent panel data econometrics textbook, like https://www.stata.com/bookstore/micr...metrics-stata/).
    2) -hausman- works asymptotically. Hence, it often throws the message you reported. You were wise to use -xtoverid-, which points you to the -fe- specification.
    3) under -xtreg, re- Stata probably omits predictors because of perfect collinearity (but Stata should have warned you about that).
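
    The two routes in 2) can be sketched as follows. This is a minimal sketch, not a definitive recipe. It assumes the data are already -xtset-, that sector is stored in a single categorical variable SECTOR (as in the -xi:- call in #1), and that -xtoverid- has been installed from SSC. The -sigmamore- option of -hausman- bases both covariance matrices on the disturbance variance estimated from the efficient (-re-) model, which makes the "not positive definite" note much less likely, though it does not guarantee a different conclusion.

    Code:
    * regressor list copied from post #1
    local xvars X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12
    
    * route 1: -hausman- with a common disturbance-variance estimate
    quietly xtreg Y `xvars' i.SECTOR, fe
    estimates store fixe
    quietly xtreg Y `xvars' i.SECTOR, re
    estimates store rand
    hausman fixe rand, sigmamore
    
    * route 2: -xtoverid- after -re-
    * -xi:- expands i.SECTOR into _ISECTOR_* dummies, because -xtoverid-
    * does not accept factor-variable notation
    * (one-time install, if needed: ssc install xtoverid)
    xi: xtreg Y `xvars' i.SECTOR, re
    xtoverid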
    Last edited by sladmin; 06 Aug 2020, 06:04. Reason: anonymize original poster
    Kind regards,
    Carlo
    (StataNow 18.5)

    • #3
      Thank you Carlo Lazzaro for your quick reply. Absolutely right! I forgot that each firm stays in the same sector throughout the analysis, so sector is a time-invariant predictor and is therefore omitted under -fe-. My issue is that -xtoverid- points to the -fe- specification, where the variable "Sector" is omitted. To obtain estimates for this kind of variable, daniel klein points out in a thread on the same issue <https://www.statalist.org/forums/for...-effects-model> that we need to use a "hybrid model", also called a "between-within model".

      1- My first question is: do you think that working with a "hybrid model" will be hard to manage, given that my knowledge of econometrics is not that strong?

      2- Do you think I should drop the sector (industry) variable, even though it is an important control variable?

      3- Last but not least, if I decide to go with the hybrid model, how can I test for heteroskedasticity, serial correlation, collinearity, model misspecification and endogeneity with this kind of model? Those tests are very important in my report. Are there any Stata commands for this kind of model like the ones available after -fe-?
      Thank you so much for any helpful advice.
      Last edited by sladmin; 06 Aug 2020, 06:04. Reason: anonymize original poster

      • #4
        Guest:
        1) see also: https://blog.stata.com/2015/10/29/fixed-effects-or-random-effects-the-mundlak-approach/. Yes, this approach is more difficult than the ones you're used to.
        2) No, you should live with them: the fact that they were omitted by the -fe- machinery is unfortunate, but expected and unavoidable.
        3) As far as I know, you should manage those issues yourself (but this is almost always the case with panel data regression models, due to the lack of built-in postestimation commands).
        Last edited by sladmin; 06 Aug 2020, 06:04. Reason: anonymize original poster
        Kind regards,
        Carlo
        (StataNow 18.5)

        • #5
          Thank you so much Carlo Lazzaro for your feedback. Can you suggest some good references on how to run those tests under hybrid models? Or can I rely on the guidance for random effects (-re-), since I have just figured out that "hybrid" models are estimated with -xtreg, re-? Please correct me if I am wrong.

          • #6
            Carlo Lazzaro I have tried the Mundlak approach with the following:

            Code:
             by i, sort: egen x1_between = mean( X1)
            Code:
            xtreg Y X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 x1_between x2_between x3_between x4_between x5_between x6_between x7_between x8_between x9_between x10_between x11_between i.SECTOR_*, vce(robust)
            Code:
            test x1_between x2_between x3_between x4_between x5_between x6_between x7_between x8_between x9_between x10_between x11_between
            Code:
             ( 1)  x1_between = 0
             ( 2)  x2_between = 0
             ( 3)  x3_between = 0
             ( 4)  x4_between = 0
             ( 5)  x5_between = 0
             ( 6)  x6_between = 0
             ( 7)  x7_between = 0
             ( 8)  x8_between = 0
             ( 9)  x9_between = 0
             (10)  x10_between = 0
             (11)  x11_between = 0
            
                       chi2( 11) =   45.20
                     Prob > chi2 =    0.0000
            We reject H0. This suggests that the fixed effects model is appropriate, the same result as with -xtoverid-.

            • #7
              Guest:
              1) for model misspecification you can -test- whether the squared fitted values are significant (I think I've shown you an example in one of the previous threads you started).
              2) heteroskedasticity and autocorrelation: use the -cluster()- option for standard errors if you suspect that one or both can bias your regression results. Please note that, for this to work as expected, the number of clusters should be large enough (say, at least 15-20).
              3) endogeneity can be avoided by knowing the data generating process. Besides, a misspecified model might be affected by endogeneity, too.
              4) quasi-extreme multicollinearity can be suspected by taking a look at the 95% CIs. If they look weird (usually too wide), take a look at the -estat vce, corr- matrix.
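
              The checks in 1), 2) and 4) could be run roughly as follows. This is a sketch under assumptions: the panel identifier is called i (as in the -by i, sort:- line in #6), the data are already -xtset-, and yhat and yhat2 are hypothetical names for newly created variables. The misspecification check is written in a linktest style, regressing Y on the fitted values and their square.

              Code:
              * time-varying regressors from post #1 (X7 and sector are omitted by -fe- anyway)
              local xvars X1 X2 X3 X4 X5 X6 X8 X9 X10 X11 X12
              
              * 2) standard errors robust to heteroskedasticity and within-panel serial correlation
              xtreg Y `xvars', fe vce(cluster i)
              
              * 4) correlation matrix of the coefficient estimates
              estat vce, corr
              
              * 1) squared fitted values: a significant yhat2 hints at misspecification
              predict double yhat, xb
              generate double yhat2 = yhat^2
              xtreg Y yhat yhat2, fe vce(cluster i)
              test yhat2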
              Last edited by sladmin; 06 Aug 2020, 06:04. Reason: anonymize original poster
              Kind regards,
              Carlo
              (StataNow 18.5)

              • #8
                Guest:
                then go -fe-.
                Last edited by sladmin; 06 Aug 2020, 06:04. Reason: anonymize original poster
                Kind regards,
                Carlo
                (StataNow 18.5)

                • #9
                  But if I go -fe-, my time-invariant "sector" variable will be omitted. So the only solution would be to go with the "hybrid" model, whose name alone scares me. Thank you so much Carlo Lazzaro. I'm grateful to you. Your responses are (as always) helpful.

                  • #10
                    Guest:
                    that's the way panel data regression goes.
                    The -fe- specification allows weak endogeneity, but at the cost of not estimating the time-invariant predictors you're interested in. Conversely, the -re- specification estimates time-invariant coefficients, but assumes no correlation between the vector of regressors and either component of the error (even though that assumption cannot be taken for granted).
                    You went -mundlak-, but the -test- you performed seems to point you to -fe- again.
                    Being back to square one, I would wonder whether a different set of predictors could make sense in your regression model.
                    Last edited by sladmin; 06 Aug 2020, 06:04. Reason: anonymize original poster
                    Kind regards,
                    Carlo
                    (StataNow 18.5)

                    • #11
                      @Carlo in #10: No, it does not point to the fe model. All it does is reject the original re model (without the "between" variables added) and you are left with the Mundlak (or CRE) formulation with the "between" variables included.
                      Also, the fe specification requires strict exogeneity because of the presence of the (unobserved) individual-specific effects.
                      Moreover, with the Mundlak formulation Alexis has estimates of the coefficients for the time-invariant variables.
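
                      In symbols, the Mundlak (CRE) formulation can be written as below. The notation is an assumption made for illustration and does not come from the thread: x_{it} collects the time-varying regressors, \bar{x}_i their panel means, and z_i the time-invariant regressors such as the sector dummies.

                      \[
                      y_{it} = \alpha + x_{it}\beta + \bar{x}_i\gamma + z_i\delta + u_i + \varepsilon_{it},
                      \qquad \bar{x}_i = \frac{1}{T}\sum_{t=1}^{T} x_{it}
                      \]

                      The -test- in #6 is a test of H_0: \gamma = 0. Rejecting it says that the traditional re model, which imposes \gamma = 0, is inadequate. The model with \bar{x}_i included remains estimable, and \delta is estimated along with \beta.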

                      • #12
                        Eric:
                        see however https://www.statalist.org/forums/for...interpretation
                        Kind regards,
                        Carlo
                        (StataNow 18.5)

                        • #13
                          Carlo, I had a look. It doesn't contradict what I said. The test results posted above under #6 lead to a rejection of the RE model in its traditional formulation. The coefficient estimates obtained using the Mundlak approach are the same for the time-varying variables as those obtained from the FE model, but the Mundlak approach also provides estimates for the coefficients of the time-invariant variables, which the FE model does not. In this sense it does not reduce to the FE model, if by FE model one understands the within estimator.
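
                          One way to check this on the data is to fit both models and put the coefficients side by side. The sketch below makes several assumptions: the panel identifier is i (as in #6), the data are already -xtset-, sector is a single categorical variable SECTOR, no regressor has missing values (so the panel means match the estimation sample), and m_X1, m_X2, ... are hypothetical names for the panel means. X7 and SECTOR get no mean variable because, judging by the -fe- output in #1, they do not vary within panel.

                          Code:
                          * time-varying regressors only
                          local tv X1 X2 X3 X4 X5 X6 X8 X9 X10 X11 X12
                          
                          * build the panel mean of each time-varying regressor
                          local means
                          foreach v of local tv {
                              bysort i: egen double m_`v' = mean(`v')
                              local means `means' m_`v'
                          }
                          
                          * within (FE) estimator
                          quietly xtreg Y `tv', fe
                          estimates store fe_fit
                          
                          * Mundlak / CRE: -re- with the panel means and the time-invariant variables
                          quietly xtreg Y `tv' `means' X7 i.SECTOR, re
                          estimates store cre_fit
                          
                          * coefficients on the time-varying regressors, side by side
                          estimates table fe_fit cre_fit, keep(`tv') b(%9.6f) se
                          
                          * joint test of the panel means, as in #6
                          estimates restore cre_fit
                          test `means'

                          In a balanced panel such as this one (6 observations per group), the two coefficient columns should agree for the time-varying regressors, while only cre_fit reports coefficients for X7 and the sector dummies.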

                          • #14
                            Eric:
                            thanks.
                            Enlightening.
                            Kind regards,
                            Carlo
                            (StataNow 18.5)

                            • #15
                              Thank you Eric de Souza for your reply. So, are you saying that the Mundlak approach does not reject -re-?
