  • XTSEQREG: new Stata command for sequential / two-stage (GMM) estimation of linear panel models

    Dear Statalisters,

    I have just released a new Stata command for the estimation of linear panel data models. The xtseqreg command implements the two-stage estimation procedure described in my working paper with Claudia Schwarz for linear (dynamic) panel data models with time-invariant regressors. In that paper, we suggest running a first-stage regression of the dependent variable on the time-varying regressors only, and then regressing the first-stage residuals on the time-invariant regressors in a second stage. Instruments can be used at both stages, and efficient estimation can be achieved with two-step GMM. At the second stage, the usual standard errors are invalid because they ignore the first-stage estimation error, so they need to be corrected. Providing this analytical standard-error correction is the main purpose of the new command. For full details about the methodology and its benefits, please have a look at the paper.
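
    As a rough sketch of the two stages in symbols: the notation below is an assumption made here for illustration (one lag of the dependent variable, constants omitted), and the paper remains the reference for the exact model and conditions. Here x_it collects the time-varying regressors and f_i the time-invariant ones.

    ```latex
    \begin{align*}
    \text{Model:}\quad   & y_{it} = \lambda y_{i,t-1} + x_{it}'\beta + f_i'\gamma + \alpha_i + u_{it} \\
    \text{Stage 1:}\quad & y_{it} = \lambda y_{i,t-1} + x_{it}'\beta + e_{it},
                           \qquad e_{it} = f_i'\gamma + \alpha_i + u_{it} \\
    \text{Stage 2:}\quad & y_{it} - \hat{\lambda} y_{i,t-1} - x_{it}'\hat{\beta} = f_i'\gamma + v_{it},
                           \qquad v_{it} = \alpha_i + u_{it} - (\hat{\lambda} - \lambda) y_{i,t-1} - x_{it}'(\hat{\beta} - \beta)
    \end{align*}
    ```

    The second-stage error v_it contains the first-stage estimation error, which is the reason the conventional second-stage standard errors need a correction.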

    Yet, the new command itself is much more flexible because it can also be used for IV/GMM estimation of a single stage only. It then mimics (part of) the behavior of existing commands for instrumental variable and GMM estimation of linear panel data models, in particular xtdpd and xtabond2 in the context of dynamic models. In part, the other commands achieve things that my command cannot deliver, but mine also adds some flexibility that the others do not offer. However, I want to emphasize that it is not my intention to introduce this new command as a competitor for the existing ones. The re-implementation of these GMM estimators was simply a necessary requirement to achieve the above-mentioned standard-error correction.

    The new command is currently only available for installation from my own website and not yet from SSC:
    Code:
    . net install xtseqreg, from(http://www.kripfganz.de/stata/)
    After the installation, detailed documentation of the syntax and available options can be found in the help files:
    Code:
    . help xtseqreg
    . help xtseqreg postestimation
    As always, comments and suggestions are welcome and highly appreciated.

    Here is a brief example for a two-stage estimation of a dynamic Mincer equation. At the first stage, the log-wages are regressed on the time-varying regressors. The estimator is a two-step difference-GMM estimator (Arellano/Bond) with collapsed GMM-type instruments for the 2 lags of the dependent variable, standard instruments for the strictly exogenous regressors, and Windmeijer-corrected robust standard errors.
    Code:
    . xtseqreg L(0/2).lwage exp exp2 occ ind union, gmmiv(L.lwage, model(difference) lagrange(1 4) collapse) iv(exp exp2 occ ind union, difference model(difference)) twostep vce(robust)
    
    Group variable: id                           Number of obs         =      2975
    Time variable: t                             Number of groups      =       595
    
                                                 Obs per group:    min =         5
                                                                   avg =         5
                                                                   max =         5
    
                                                 Number of instruments =        10
    
                                         (Std. Err. adjusted for clustering on id)
    ------------------------------------------------------------------------------
                 |              WC-Robust
           lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           lwage |
             L1. |    .365887   .1722314     2.12   0.034     .0283197    .7034543
             L2. |   .1009276   .0732219     1.38   0.168    -.0425848    .2444399
                 |
             exp |   .0501576   .0282205     1.78   0.076    -.0051536    .1054688
            exp2 |   -.000206    .000148    -1.39   0.164     -.000496     .000084
             occ |  -.0428486   .0283624    -1.51   0.131    -.0984379    .0127406
             ind |   .0481791   .0305408     1.58   0.115    -.0116798     .108038
           union |    .006991   .0288093     0.24   0.808    -.0494742    .0634562
           _cons |   2.737719   1.088102     2.52   0.012     .6050775     4.87036
    ------------------------------------------------------------------------------
    With the following syntax, we can then run a second-stage instrumental-variables regression of the first-stage residuals on some time-invariant regressors. The first-stage results are automatically taken from the previous estimation. Just as an illustration, ed is assumed to be endogenous and instrumented with occ.
    Code:
    . xtseqreg lwage (L(1/2).lwage exp exp2 occ ind union) ed fem blk, iv(occ fem blk, model(level)) vce(robust)
    
    Group variable: id                           Number of obs         =      2975
    Time variable: t                             Number of groups      =       595
    
    ------------------------------------------------------------------------------
    Equation _first                              Equation _second
    Number of obs         =      2975            Number of obs         =      2975
    Number of groups      =       595            Number of groups      =       595
    
    Obs per group:    min =         5            Obs per group:    min =         5
                      avg =         5                              avg =         5
                      max =         5                              max =         5
    
    Number of instruments =        10            Number of instruments =         4
    
                                         (Std. Err. adjusted for clustering on id)
    ------------------------------------------------------------------------------
                 |               Robust
           lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    _first       |
           lwage |
             L1. |    .365887   .1722314     2.12   0.034     .0283197    .7034543
             L2. |   .1009276   .0732219     1.38   0.168    -.0425848    .2444399
                 |
             exp |   .0501576   .0282205     1.78   0.076    -.0051536    .1054688
            exp2 |   -.000206    .000148    -1.39   0.164     -.000496     .000084
             occ |  -.0428486   .0283624    -1.51   0.131    -.0984379    .0127406
             ind |   .0481791   .0305408     1.58   0.115    -.0116798     .108038
           union |    .006991   .0288093     0.24   0.808    -.0494742    .0634562
           _cons |   2.737719   1.088102     2.52   0.012     .6050775     4.87036
    -------------+----------------------------------------------------------------
    _second      |
              ed |   .0634885   .0348497     1.82   0.068    -.0048158    .1317927
             fem |  -.0967082   .0575629    -1.68   0.093    -.2095295     .016113
             blk |  -.1531252   .1010073    -1.52   0.130     -.351096    .0448456
           _cons |  -.7936727   .4419754    -1.80   0.073    -1.659929    .0725831
    ------------------------------------------------------------------------------
    Instead of specifying both stages one after the other, with some more complicated syntax the same results can also be obtained with a single command line:
    Code:
    . xtseqreg lwage (L(1/2).lwage exp exp2 occ ind union) ed fem blk, gmmiv(L.lwage, model(difference) lagrange(1 4) collapse equation(#1)) iv(exp exp2 occ ind union, difference model(difference) equation(#1)) iv(occ fem blk, model(level) equation(#2)) twostep vce(robust) both
    As a postestimation command, estat overid provides Hansen's J-test for the validity of the overidentifying restrictions for both stages. (In the current example, the second stage is exactly identified.)
    Code:
    . estat overid
    
    Hansen's J-test for equation _first                    chi2(2)     =    0.2935
    H0: overidentifying restrictions are valid             Prob > chi2 =    0.8635
    
    Hansen's J-test for equation _second                   chi2(0)     =    0.0000
    note: coefficients are exactly identified              Prob > chi2 =         .
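
    The degrees of freedom of the J-test equal the number of instruments minus the number of estimated coefficients. As a small sanity check, the following sketch recomputes them from the numbers in the output tables above (typed in by hand, not read from stored results) using Stata's built-in chi2tail() function. The last line should agree with the reported Prob > chi2 for the first stage.

    ```stata
    * First stage: 10 instruments, 8 coefficients (including _cons)
    display 10 - 8
    * Second stage: 4 instruments, 4 coefficients, hence exactly identified
    display 4 - 4
    * Upper-tail p-value of J = 0.2935 with 2 degrees of freedom
    display %6.4f chi2tail(2, 0.2935)
    ```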
    The following command line exactly replicates the above results for the first stage with xtabond2, including Hansen's J-test:
    Code:
    . xtabond2 L(0/2).lwage exp exp2 occ ind union, gmmstyle(L.lwage, equation(diff) laglimits(1 4) collapse) ivstyle(exp exp2 occ ind union, equation(diff)) twostep robust
    Notice that the reported results for Hansen's J-test would differ between xtseqreg and xtabond2 if the one-step GMM estimator were used (the above example without the option twostep). For this test, xtabond2 silently still computes the two-step estimator, whereas xtseqreg evaluates the moment functions at the one-step estimates while still using an optimal weighting matrix (the one that would have been used in a second step).

    Finally, also notice that xtabond2 might in some situations report an incorrect number of instruments because it does not always detect a linear relationship between instruments specified for the first-differenced and those for the levels model. This can happen in particular with time dummies if they are specified for both the first-differenced and the levels model (which is something that actually should not be done). In the following example (output omitted), xtabond2 reports 17 instruments, while xtseqreg obtains the identical result but correctly reports only 13 instruments. (This can happen when xtabond2 is used with either the option h(1) or h(2). It does not happen with the default option h(3). The default weighting matrix of xtseqreg corresponds to h(2) of xtabond2.) This has an important consequence because the degrees of freedom used for Hansen's J-test depend on the number of instruments. Hence, the reported J-test by xtabond2 might be misleading.
    Code:
    . xtseqreg L(0/2).lwage exp2 occ ind union tdum4-tdum7, gmmiv(L.lwage, model(difference) lagrange(1 4) collapse) iv(exp2 occ ind union, difference model(difference)) iv(tdum4-tdum7, model(diff)) iv(tdum4-tdum7, model(level)) twostep vce(robust)
    . xtabond2 L(0/2).lwage exp2 occ ind union tdum4-tdum7, gmmstyle(L.lwage, equation(diff) laglimits(1 4) collapse) ivstyle(exp2 occ ind union, equation(diff)) ivstyle(tdum4-tdum7, equation(diff)) ivstyle(tdum4-tdum7, equation(level)) twostep robust h(2)
    You can read more about this last observation in another Statalist topic: System GMM - Time Dummies.

    Reference:
    • Kripfganz, S., and C. Schwarz (2015). Estimation of linear dynamic panel data models with time-invariant regressors. ECB Working Paper 1838, European Central Bank.
    https://twitter.com/Kripfganz

  • #2
    I need to add that I have used the following data set to generate the above examples:
    Code:
    . webuse psidextract
    Also, my statement about the default weighting matrix used by xtseqreg in the last paragraph was wrong. (It is easy to get lost with all the available options.) The default is the same as with xtabond2. In the last example above, for the two commands to be equivalent, the option wmatrix(independent) needs to be added to the xtseqreg command line. (In this particular example, however, the estimates remain the same.)
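
    For concreteness, the corrected xtseqreg command line for that last example might look as follows. This is a sketch that simply appends wmatrix(independent) to the original option list; the help file is the authority on the exact form of the option.

    ```stata
    webuse psidextract, clear
    * Same specification as in the opening post, plus wmatrix(independent)
    * so that the weighting matrix matches h(2) of xtabond2
    xtseqreg L(0/2).lwage exp2 occ ind union tdum4-tdum7, gmmiv(L.lwage, model(difference) lagrange(1 4) collapse) iv(exp2 occ ind union, difference model(difference)) iv(tdum4-tdum7, model(diff)) iv(tdum4-tdum7, model(level)) twostep vce(robust) wmatrix(independent)
    ```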

    • #3
      There is already a first update available:
      Code:
      adoupdate xtseqreg, update
      As I have experienced that many (or at least some) people seem to struggle with the correct specification of time-fixed effects in estimation commands for (dynamic) panel data models and motivated by the discussion mentioned at the very end of my opening post, I have added the teffects option to my xtseqreg command. This option adds time-fixed effects to the model and makes sure that the correct number of dummy variables is added as well as the correct type and number of corresponding instruments.
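
      A minimal sketch of how the option might be used, assuming teffects is a plain flag that replaces the manually listed time dummies and their iv() options. Whether this reproduces the earlier tdum4-tdum7 example exactly has not been checked here; please consult the help file.

      ```stata
      webuse psidextract, clear
      * No tdum variables and no iv() options for them: teffects is meant
      * to add the time dummies and their instruments automatically
      xtseqreg L(0/2).lwage exp2 occ ind union, gmmiv(L.lwage, model(difference) lagrange(1 4) collapse) iv(exp2 occ ind union, difference model(difference)) teffects twostep vce(robust)
      ```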

      • #4
        Another update is available that adds the Arellano-Bond test for absence of serial correlation in the first-differenced errors as a postestimation command. To continue the example from above:
        Code:
        . webuse psidextract
        
        . quietly xtseqreg L(0/2).lwage exp exp2 occ ind union, gmmiv(L.lwage, model(difference) lagrange(1 4) collapse) iv(exp exp2 occ ind union, difference model(difference)) twostep vce(robust)
        
        . estat serial, ar(1/3)
        
        Arellano-Bond test for autocorrelation of the first-differenced residuals
        H0: no autocorrelation of order 1:     z =   -3.3576   Prob > |z|  =    0.0008
        H0: no autocorrelation of order 2:     z =   -0.4852   Prob > |z|  =    0.6275
        H0: no autocorrelation of order 3:     z =    0.2946   Prob > |z|  =    0.7683

        • #5
          Dear Sebastian,

          Thanks a lot for this very helpful thread.
          I am one of those people who struggle with the correct specification of time-fixed effects. I am trying to replicate the xtabond2 estimation with a different data set. However, I struggle to specify the tdum variables and so cannot run the estimation.
          I basically duplicated your code with my own variables, and Stata does not recognize the tdum variables. I noticed that the psidextract data set has tdum1-tdum7 in its list of variables. How can I generate or specify the time-fixed effects?

          Thanks for your help.
          BR

          Nadia


          • #6
            Nadia Oue,
            Welcome to Statalist. Could you please show us the command lines that you have typed in Stata as well as Stata's output when you type xtset. Otherwise, it is hard to give specific advice. (Please see the FAQ of this forum, in particular Section 12: http://www.statalist.org/forums/help#stata)

            • #7
              Dear Sebastian,

              Thanks a lot for your quick reply. Thanks for the FAQ link.

              Here is my xtset output:

              xtset
              panel variable: ccode (unbalanced)
              time variable: year, 1946 to 2016
              delta: 1 unit


              And my code:

              Code:
              xtabond2 L(0/2).fh_polity2 Quantity2 wdi_gdpcapcur eu_demd3dens wdi_imigs al_ethnic ross_oil_prod tdum4-tdum7, gmmstyle(L.fh_polity2, equation(diff) lag limits (1 4) collapse) ivstyle (Quantity2 wdi_gdpcapcur eu_demd3dens wdi_imigs al_ethnic ross_oil_prod, difference equation(diff)) ivstyle (tdum4-tdum7, equation(diff)) ivstyle (tdum4-tdum7, equation(level)) twostep robust h(2)
              Many thanks for your kind assistance

              Nadia
              Last edited by Nadia Oue; 21 Mar 2017, 07:02.


              • #8
                Thanks for the additional information.

                If you do not have time dummies yet in your data set, you can generate them with the following command:
                Code:
                tabulate year, generate(tdum)
                Please note two further remarks:
                1. Your time span ranges from 1946 to 2016. This is a rather large time dimension and the GMM estimators as implemented by xtabond2, xtdpd, and xtseqreg are usually not appropriate for such "large T" circumstances.
                2. When you read the last paragraph of my opening post again and follow the link after the example there, please note in particular the recommendation NOT to use instruments for the time dummies in both the first-differenced and the levels equation.


                • #9
                  Dear Sebastian,
                  Thanks again for your kind assistance.
                  1. I took a smaller subset of my data (1990-2016).
                  2. I introduced the time dummies in the first-differenced equation only.
                  Now I obtain "No observations r(2000)"

                  Code:
                  xtabond2 L(0/2).fh_polity2 Quantity Quantity2 wdi_gdpcapcur eu_demd3dens wdi_imigs al_ethnic ross_oil_prod, gmmstyle (L.fh_polity2, equation(diff) lag limits(1 4) collapse) ivstyle ( Quantity Quantity2 wdi_gdpcapcur eu_demd3dens wdi_imigs al_ethnic ross_oil_prod, difference equation(diff)) ivstyle (tdum4-tdum27, equation(diff)) ivstyle (equation(level)) twostep robust h(2)
                  I have two additional questions: how do I determine the lag limits and the lags of the time dummies in this case?
                  Many thanks for your help

                  Br,
                  Nadia


                  • #10
                    Did you really specify both of the following options?
                    Code:
                    ivstyle(tdum4-tdum27, equation(diff)) ivstyle(equation(level))
                    The second one should result in an error because no variables are specified. You should remove it if not needed. In addition, you also need to specify the time dummies explicitly as independent variables before the first comma in the command line.
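
                    A sketch of what the corrected command line might look like, using the variable names from post #9. Two assumptions are made here: the lag option is written as laglimits(1 4), and the difference suboption is left out of ivstyle() because, as far as I can tell, it belongs to the iv() option of xtseqreg rather than to xtabond2. Whether the instrument choice suits your application is a separate question.

                    ```stata
                    * Run from a do-file because of the /// line continuations.
                    * Time dummies appear as regressors before the comma, and the
                    * empty ivstyle(equation(level)) option is removed.
                    xtabond2 L(0/2).fh_polity2 Quantity Quantity2 wdi_gdpcapcur eu_demd3dens ///
                        wdi_imigs al_ethnic ross_oil_prod tdum4-tdum27,                      ///
                        gmmstyle(L.fh_polity2, equation(diff) laglimits(1 4) collapse)       ///
                        ivstyle(Quantity Quantity2 wdi_gdpcapcur eu_demd3dens wdi_imigs      ///
                            al_ethnic ross_oil_prod, equation(diff))                         ///
                        ivstyle(tdum4-tdum27, equation(diff))                                ///
                        twostep robust h(2)
                    ```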

                    It is hard to say why you get the r(2000) error message. Could you please report the Stata output of the tabulate command that you used to generate the time dummies? Also, does the xtabond2 command produce proper output without the time dummies?

                    The question about the selection of lag limits is less trivial to answer. I recommend that you have a look at David Roodman's paper on How to do xtabond2. The time dummies should not be lagged.

                    • #11
                      Dear Sebastian,

                      Thanks.

                      1. The r(2000) error was due to my choice of variables, which I have corrected.

                      Now I get proper output (even for the longer time span 1946-2016).

                      2. On the time dummies, I meant to ask what guided your choice of tdum4-tdum7. (I understand tdum7, but why start from tdum4?)



                      • #12
                        If you have time periods 1 to 7, the first two time dummies (tdum1 and tdum2) cannot be included because the first two periods are removed from the estimation sample due to the two lags of the dependent variable in the above example. The third time dummy (tdum3) cannot be included because otherwise there would be perfect collinearity of all time dummies together with the regression intercept ("dummy trap"). Hence, only tdum4 to tdum7 can be used. (Of course, instead of tdum3 any other time dummy could be excluded, which only changes the reference period.)
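
                        The argument can be checked directly in the psidextract data. The following is a minimal sketch that relies on the tdum1-tdum7 variables shipped with that data set; the variable name tdsum is made up for this illustration.

                        ```stata
                        webuse psidextract, clear
                        xtset id t
                        * Two lags of lwage are only available from period 3 onwards
                        tabulate t if !missing(L2.lwage)
                        * tdum1 and tdum2 are zero throughout that sample
                        summarize tdum1 tdum2 if !missing(L2.lwage)
                        * tdum3-tdum7 sum to one in every remaining observation,
                        * so together they replicate the constant (dummy trap)
                        egen tdsum = rowtotal(tdum3-tdum7)
                        summarize tdsum if !missing(L2.lwage)
                        ```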

                        • #13
                          Dear Sebastian,
                          Many thanks for your help throughout. This is the first time I am using xtabond2, and you have been very helpful.
                          May I ask you to have a look at my code and tell me whether it looks good? Thanks.

                          My dependent variable is democracy (fh_polity2). My regressors are lagged democracy and energy consumption, and my control variables are GDP per capita, oil production, and ethnic fragmentation.

                          panel variable: ccode (unbalanced)
                          time variable: year, 1990 to 2016
                          delta: 1 unit


                          my code is:
                          Code:
                          xtabond2 L(0/2).fh_polity2 Ener Ener2 wdi_gdpcapcur al_ethnic ross_oil_prod tdum4-tdum27, gmmstyle(L.fh_polity2, equation(diff) lag(1 .) collapse) ivstyle(Ener Ener2 wdi_gdpcapcur ross_oil_prod al_ethnic, equation(diff)) ivstyle(tdum4-tdum27, equation(level)) twostep robust h(2)
                          Many thanks.

                          BR,
                          Nadia


                          • #14
                            Indeed, your specification looks good, provided that you can assume that your regressors and control variables (besides the lagged dependent variable) are strictly exogenous. Whether that is a good assumption depends on your underlying economic theory. In addition, you would need to check the usual specification tests (Arellano-Bond test, Hansen test).

                            If you have further questions that are specific to the xtabond2 command, I recommend starting a new Statalist topic because this topic is primarily about the new xtseqreg command.

                            • #15
                              xtseqreg has been updated to version 1.1.2.
                              Code:
                              adoupdate xtseqreg, update
                              In combination with my other new command xtdpdgmm, the xtseqreg command can now also be used for two-stage estimation based on a first-stage Ahn-Schmidt GMM estimator with nonlinear moment conditions. Continuing the example from the beginning of this topic based on the psidextract data set:
                              Code:
                              . xtdpdgmm L(0/2).lwage exp exp2 occ ind union, gmmiv(L.lwage, model(difference) lagrange(1 4) collapse) iv(exp exp2 occ ind union, difference model(difference)) twostep vce(robust) noserial aux
                              . xtseqreg lwage (L(1/2).lwage exp exp2 occ ind union) ed fem blk, first(, copy) iv(occ fem blk, model(level)) vce(robust)
                              With this new version, the Arellano-Bond test statistic (estat serial) after two-step robust estimation might now slightly differ from previous versions (and the one reported by xtabond2 or xtdpd) because xtseqreg now fully accounts for the finite-sample Windmeijer correction in the computation of this test statistic (while other commands do not). Postestimation statistics now further include the possibility to compute difference-in-Hansen tests and generalized Hausman tests. The help file and the Statalist topic on the xtdpdgmm command provide further information on these tests. (xtseqreg and xtdpdgmm produce identical results for one-stage GMM estimation with linear moment conditions only.)

                              Thanks to Kit Baum, this version of xtseqreg is now also available for installation from SSC:
                              Code:
                              ssc install xtseqreg
                              The SSC version of this command might see less frequent updates than the version on my own website. Yet, some users may not be able to install the package directly from my website due to corporate firewall restrictions, etc.