  • You would not normally run two separate regressions for the effects above and below the threshold. Just combine everything in a single regression:
    xtdpdgmm L(0/1).Y X1*X2_h X1*X2_l X3 X4, model(diff) collapse gmm(Y X3 X4, lag(2 4)) gmm(X1*X2_h X1*X2_l, lag(1 7)) gmm(Y X3 X4, lag(1 1) diff model(level)) gmm(X1*X2_h X1*X2_l, lag(0 0) diff model(level)) vce(r, dc) overid twostep


    • I tried the following command:
      xtdpdgmm L(0/1).Y X1*X2_h X1*X2_l X3 X4, model(diff) collapse gmm(Y X3 X4, lag(2 4)) gmm(X1*X2_h X1*X2_l, lag(1 7)) gmm(Y X3 X4, lag(1 1) diff model(level)) gmm(X1*X2_h X1*X2_l, lag(0 0) diff model(level)) vce(r, dc) overid twostep
      However, it gives this error:
      no observations

      In this case, should I replace the missing values in the newly generated threshold variables with zero? As follows:

      gen X2_h = X2 if X2 > 0.32
      replace X2_h = 0 if X2_h == .

      gen X2_l = X2 if X2 <= 0.32
      replace X2_l = 0 if X2_l == .


      • Originally posted by Sarah Magd
        should I replace the missing values in the newly generated threshold variables with zero?
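
        A likely reason for the "no observations" error is that, by construction, every observation is missing in either X2_h or X2_l, so no observation has complete data for both. A minimal sketch with simulated data follows. The variable X2 and the threshold 0.32 are taken from the post; the data, the seed, and the names nmiss, X2_h0, and X2_l0 are made up for illustration.

        ```stata
        * hypothetical data: 100 draws of X2, one of them genuinely missing
        clear
        set seed 12345
        set obs 100
        generate X2 = runiform()
        replace X2 = . in 1

        * threshold variables as generated in the post (before replacing missings)
        generate X2_h = X2 if X2 > 0.32
        generate X2_l = X2 if X2 <= 0.32

        * count missing values per row: every row has at least one
        egen nmiss = rowmiss(X2_h X2_l)
        tabulate nmiss

        * zero outside the respective regime, but still missing where X2 is missing
        generate X2_h0 = cond(X2 > 0.32, X2, 0) if !missing(X2)
        generate X2_l0 = cond(X2 <= 0.32, X2, 0) if !missing(X2)
        count if !missing(X2_h0, X2_l0)
        ```

        Note that replace X2_h = 0 if X2_h == . would also set X2_h to zero for observations where X2 itself is missing; the if !missing(X2) qualifier in the sketch avoids that.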


        • I estimate the Cobb-Douglas production function in a static form as follows:
          GDP per capita = Capital formation per capita + energy consumption per capita + inflation + trade openness + financial development
          My sample is 13 years for 27 countries.
          - I am using fixed-effects regression with robust standard errors and panel-corrected standard errors with fixed effects. Both regressions give the expected results for my variable of interest (i.e., financial development). However, since the energy consumption variable is endogenous (due to reverse causality), I should use a model that corrects the potential bias from this endogeneity. As I mentioned in #424, I can use the two-step GMM estimator to control for the endogeneity. Nevertheless, financial development (my main variable) is insignificant or counterintuitive in this regression.

          - Given my sample size and the static specification, which estimator would be the most relevant to control for the endogeneity?


          • A general answer is that lots of things can happen to your estimates when you change the underlying assumptions (e.g., one variable is treated as endogenous instead of exogenous). Instrumental variables estimators (including GMM) may help to alleviate the endogeneity problem, but they might create other problems. For example, standard errors might become quite large if instruments are relatively weak. Especially when you have a relatively small sample size, the differences between estimators might appear large because the coefficients are not estimated very precisely.

            I would recommend changing the estimator as little as necessary when you make different assumptions, to get the best possible comparison. Say, you start with a fixed-effects estimator:
            xtreg Y X1 X2 X3, fe vce(robust)
            Note that you can replicate this regression with xtdpdgmm as follows:
            xtdpdgmm Y X1 X2 X3, model(mdev) iv(X1 X2 X3, norescale) small vce(robust)
            Then you assume that X1 is endogenous and you want to instrument it in the typical GMM style:
            xtdpdgmm Y X1 X2 X3, model(mdev) iv(X2 X3, norescale) gmm(X1, lag(2 8) collapse model(diff)) twostep small vce(robust, dc)
            Notice that I have left the instruments for X2 X3 in the same format as for the traditional fixed-effects regression. This way, you can best compare the results.
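
            As a sketch of this stepwise comparison on a public data set: the example below uses abdata, with n, w, and k standing in for Y and the regressors. Treating k as endogenous is an assumption made purely for illustration, and the stored-estimate names are arbitrary.

            ```stata
            webuse abdata, clear

            * baseline: conventional fixed-effects estimator
            xtreg n w k, fe vce(robust)
            estimates store fe

            * the same estimator expressed through xtdpdgmm
            xtdpdgmm n w k, model(mdev) iv(w k, norescale) small vce(robust)
            estimates store fe_gmm

            * k treated as endogenous (illustrative assumption); w keeps its fixed-effects instruments
            xtdpdgmm n w k, model(mdev) iv(w, norescale) gmm(k, lag(2 8) collapse model(diff)) twostep small vce(robust, dc)
            estimates store gmm_endog

            * coefficients and standard errors side by side
            estimates table fe fe_gmm gmm_endog, se
            ```

            Because only the instruments for the variable assumed endogenous change between the last two regressions, differences in the estimates can be attributed to that one assumption.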


            • Dear Prof. Sebastian Kripfganz
              - We have a static panel regression with a relatively small T (T = 13, with 30 cross-section units), an endogenous variable (due to reverse causality), and fixed effects.
              OLS with fixed effects and robust standard errors is used first to obtain baseline results. As far as I understand, the two-step system GMM estimator can then be used to control for the endogeneity problem. Does the two-step system GMM estimator address any other statistical issue in the case of a static specification (e.g., efficiency, consistency, omitted variable bias, or serial correlation)?


              • An estimator is either consistent or not. The GMM estimator is consistent if all the moment conditions/instruments are valid (and there are sufficiently many instruments available to estimate all coefficients).

                Efficiency is a relative concept. Among different GMM estimators, the asymptotically efficient estimator uses all non-redundant moment conditions/instruments and an optimal weighting matrix (as the two-step estimator does). If feasible, other estimators (such as a maximum likelihood estimator) might be more efficient in the sense that they achieve a smaller asymptotic variance.

                Omitted variables are a source of endogeneity. If appropriate instruments are available (which are uncorrelated with the omitted variables), then GMM can deal with this problem.

                Serial correlation may or may not be a problem. If all regressors are strictly exogenous, serial correlation can be accounted for by using an optimal weighting matrix and panel-robust standard errors. Sometimes, serial correlation can be an indication of omitted dynamics (which could be an omitted lagged dependent variable or omitted lags of the regressors). In that case, an omitted variables problem could arise.


                • Thanks a lot for the constructive and organized reply.

                  As far as I understood, for the case of static regression with fixed effects:
                  - The two-step system GMM estimator can control the endogeneity problem resulting from either reverse causality or omitted variable bias (assuming that appropriate instruments are available).
                  - The two-step system GMM estimator is relatively more efficient than the one-step system GMM estimator because it accounts for the extra variance coming from the unobserved fixed effects
                  - The validity of the two-step GMM estimator is tested by the Hansen test. If it is insignificant, we can conclude that the results obtained by this estimator are consistent and the GMM can deal with the problem of omitted variable bias.
                  - Given the existence of endogenous regressors, serial correlation would still affect the first admissible lag for the instruments. Therefore, for the Arellano-Bond test for autocorrelation of the first-differenced residuals, if H0: no autocorrelation of order 2 is not rejected, then this can be an indication that there are neither omitted dynamics nor omitted lags of the regressors.

                  Am I right?


                  • In general, this is correct.
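
                    The tests mentioned in this exchange can be obtained with the postestimation commands of xtdpdgmm. A minimal sketch on abdata follows; the dynamic specification and the instrument choices are arbitrary and serve only as an illustration.

                    ```stata
                    webuse abdata, clear

                    * two-step difference GMM with collapsed instruments (illustrative choices)
                    xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) twostep vce(robust) overid

                    * Arellano-Bond test for serial correlation in the first-differenced residuals
                    estat serial, ar(1/3)

                    * Hansen test of the overidentifying restrictions
                    estat overid
                    ```

                    Failing to reject either null hypothesis does not prove that the instruments are valid; it only means the data provide no evidence against the specification.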


                    • The previous update of xtdpdgmm introduced doubly-corrected (DC) misspecification-robust standard errors for the one-step, two-step, and iterated GMM estimator with linear moment conditions. A new update is now available which also supports DC standard errors - vce(robust, dc) - with these estimators for models with nonlinear moment conditions - nl(). With thanks to Kit Baum, this latest version 2.4.2 is now also available on SSC.
                      ado update xtdpdgmm, update
                      As a minor addition, the new version also supports use of the identity matrix as an initial weighting matrix - wmatrix(identity) - although use of this option would hardly ever be recommended in practice.


                      • Yet another update is now available:
                        net install xtdpdgmm, from(
                        Version 2.5.0 of xtdpdgmm allows you to estimate the model with the nonlinear moment conditions recently proposed by Chudik and Pesaran (2022). The respective command option is nl(predetermined). The name of this option reflects the fact that these nonlinear moment conditions are only valid if all of the right-hand side variables are predetermined (or strictly exogenous). Similar to the Ahn and Schmidt (1995) nonlinear moment conditions, a crucial assumption is that the idiosyncratic error term is serially uncorrelated. However, the Ahn-Schmidt moment conditions do not require the regressors to be predetermined. On the other hand, the Chudik-Pesaran moment conditions relax some assumptions about the initial observations; see the Remarks section in the xtdpdgmm help file for more details.

                        In either case, the nonlinear moment conditions help with identification when the dependent variable is highly persistent. They become redundant when the additional Blundell and Bond (1998) instruments for the model in levels are added. In Monte-Carlo simulations, the Chudik-Pesaran estimator performs quite well.

                        This latest version also comes with the new option center, which centers the moments in the optimal weighting matrix around their mean. This is asymptotically irrelevant but might improve the finite-sample performance.

                        Here is an example of the Chudik-Pesaran estimator with centered weighting matrix:
                        . webuse abdata
                        . xtdpdgmm L(0/1).n w k, gmm(L.n w k, diff lag(1 4) collapse) model(diff) nl(predetermined) twostep center vce(robust)
                        Generalized method of moments estimation
                        Fitting full model:
                        Step 1:
                        initial:       f(b) =  6.9079499
                        alternative:   f(b) =  1.8818777
                        rescale:       f(b) =  .03794837
                        Iteration 0:   f(b) =  .03794837  
                        Iteration 1:   f(b) =  .00207619  
                        Iteration 2:   f(b) =  .00183829  
                        Iteration 3:   f(b) =  .00183771  
                        Iteration 4:   f(b) =  .00183766  
                        Step 2:
                        Iteration 0:   f(b) =   .9265277  
                        Iteration 1:   f(b) =  .72579345  
                        Iteration 2:   f(b) =  .58900414  
                        Iteration 3:   f(b) =  .53498361  
                        Iteration 4:   f(b) =  .53034607  
                        Iteration 5:   f(b) =  .51523719  
                        Iteration 6:   f(b) =  .50824095  
                        Iteration 7:   f(b) =  .50752705  
                        Iteration 8:   f(b) =  .50736446  
                        Iteration 9:   f(b) =  .50733335  
                        Iteration 10:  f(b) =  .50732691  
                        Iteration 11:  f(b) =  .50732563  
                        Iteration 12:  f(b) =  .50732537  
                        Iteration 13:  f(b) =  .50732532  
                        Group variable: id                           Number of obs         =       891
                        Time variable: year                          Number of groups      =       140
                        Moment conditions:     linear =      13      Obs per group:    min =         6
                                            nonlinear =       6                        avg =  6.364286
                                                total =      19                        max =         8
                                                           (Std. err. adjusted for 140 clusters in id)
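                        ------------------------------------------------------------------------------
                                     |              WC-Robust
                                   n | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
                        -------------+----------------------------------------------------------------
                                   n |
                                 L1. |   .4433475   .0635349     6.98   0.000     .3188213    .5678737
                                   w |  -.7539702   .0810252    -9.31   0.000    -.9127767   -.5951636
                                   k |   .3452531   .0555308     6.22   0.000     .2364147    .4540914
                               _cons |   3.103905   .2695798    11.51   0.000     2.575538    3.632272
                        ------------------------------------------------------------------------------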
                        Instruments corresponding to the linear moment conditions:
                         1, model(diff):
                           L1.D.L.n L2.D.L.n L3.D.L.n L4.D.L.n L1.D.w L2.D.w L3.D.w L4.D.w L1.D.k
                           L2.D.k L3.D.k L4.D.k
                         2, model(level):
                        Notice that the estimator uses differenced gmm() instruments for the first-differenced model.
                        • I am in update mood. Version 2.5.1 of xtdpdgmm comes with the following small improvements:
                          1. With the new suboption model(mean), instruments can now be specified for the model in within-group means. This is essentially the "between model" with averaged observations for each group. It might be useful for implementing a GMM version of the Hausman and Taylor (1981) estimator, as discussed by Arellano and Bover (1995).
                          2. A collinearity check has been added for the independent variables. In some cases, this circumvents non-convergence of the numerical optimization algorithm.
                          3. When the option nolevel is specified, groups with just a single observation (in levels) are now removed from the estimation sample. This affects the reported number of groups/observations and can have a small effect on standard errors and test statistics (but not coefficient estimates).
                          • Dear Sebastian Kripfganz

                            I updated xtdpdgmm to the newest version 2.5.1. However, I now cannot replicate models that I ran before the update, and I don't know whether this is a bug or intentionally implemented this way.
                            With my older version (must have been from May/June 2022), I was able to run
                            . xtdpdgmm log_co2emipercap L.log_co2emipercap log_gdppercap log_gdppercapsq nuclearshare hydroshare windshare solarshare othershare energyintensity if year>=2000&year<=2017, gmm(L.log_co2emipercap, model(difference) collapse) iv(log_gdppercap log_gdppercapsq nuclearshare hydroshare windshare solarshare othershare energyintensity, difference) nolevel nl(noser) small overid vce(robust)
                            After the update it says that nl() and nolevel cannot be combined:
                            options nl() and nolevel may not be combined

                            Furthermore, I wanted to replicate your suggestion from above
                            Originally posted by Sebastian Kripfganz
                            Then you assume that X1 is endogenous and you want to instrument it in the typical GMM style:
                            xtdpdgmm Y X1 X2 X3, model(mdev) iv(X2 X3, norescale) gmm(X1, lag(2 8) collapse model(diff)) twostep small vce(robust, dc)
                            Notice that I have left the instruments for X2 X3 in the same format as for the traditional fixed-effects regression. This way, you can best compare the results.
                            I get the following error
                            . xtdpdgmm log_co2emipercap log_gdppercap log_gdppercapsq nuclearshare hydroshare windshare solarshare othershare energyintensity if year>=2000&year<=2017, model(mdev) gmm(log_gdppercap log_gdppercapsq, lag(2 8) collapse model(diff)) iv(nuclearshare hydroshare windshare solarshare othershare energyintensity, norescale) twostep vce(robust, dc) small
                                             hash1():  3300  argument out of range
                                  asarray_contains():     -  function returned error
                            xtdpdgmm_init_ivvars_rescale():     -  function returned error
                                             <istmt>:     -  function returned error
                            What did I miss?
                            Last edited by Simon Rottler; 25 Jul 2022, 03:22.


                            • Simon Rottler

                              The moment conditions created by option nl(noserial) are a function of the level errors. Hence, this option is not compatible with the nolevel option. You could consider it a bug in the command's previous version that xtdpdgmm did not prevent you from running your code nevertheless.

                              The error message you received in your second example is puzzling. It looks like a bug but I was not able to replicate it with other data sets. I have checked the program's code but do not see how this could have happened. If you are able and willing to send me your data set by e-mail, I might be able to find out what's wrong.
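
                              Regarding the first point, here is a sketch of a specification that keeps the nonlinear moment conditions by dropping nolevel and instead declaring the instruments for the first-differenced model explicitly. It uses abdata, and the lag range is an arbitrary choice for illustration.

                              ```stata
                              webuse abdata, clear

                              * nl(noserial) needs the level errors, so nolevel is omitted;
                              * model(diff) assigns the gmm() instruments to the first-differenced model
                              xtdpdgmm L(0/1).n w k, model(diff) gmm(L.n w k, lag(1 3) collapse) nl(noserial) twostep vce(robust)
                              ```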


                              • This week's update of the xtdpdgmm package to version 2.6.0 brings a new command, xtdpdgmmfe, which serves as a wrapper for xtdpdgmm with simplified syntax. Instead of specifying all the instruments manually, this wrapper command does it for you based on a set of assumptions you input.
                                • With option lags(), you can specify the autoregressive order of the model. By default, a dynamic model with 1 lag of the dependent variable is estimated.
                                • With options exogenous(), predetermined(), and endogenous(), you need to classify the regressors accordingly.
                                • Dummies for time effects can be added in the familiar way with option teffects.
                                • With the familiar option collapse and the new option curtail(), you can easily reduce the number of instruments using collapsing or curtailing. The latter sets a maximum lag order for the instruments.
                                • Option orthogonal allows you to request orthogonal deviations instead of first differences. (Note: For strictly exogenous variables, this will typically add instruments for the model in deviations from within-group means and for the model in forward-orthogonal deviations, while for predetermined and endogenous variables instruments are only available for the model in forward-orthogonal deviations. Also importantly, orthogonal automatically reduces the maximum lag order specified with option curtail() by 1 lag to ensure that the number of instruments stays the same with and without orthogonal deviations. The reason is that with orthogonal deviations, the minimum lag that is valid as an instrument is lower by 1 as well.)
                                • With option serial(), you can allow for serially correlated idiosyncratic errors up to the specified order. This will affect the minimum lag order of instruments for predetermined and endogenous variables, and possibly the availability of nonlinear moment conditions. By default, serially uncorrelated idiosyncratic errors are assumed.
                                • With option iid, you can add a homoskedasticity assumption in addition to serially uncorrelated idiosyncratic errors. This might enable additional linear or nonlinear moment conditions.
                                • Option initdev is less intuitive. It assumes that the deviations of the initial observations from their long-run means are uncorrelated with the idiosyncratic errors. This relaxes the slightly stronger default assumption that initial observations and group-specific effects (not their deviations) must be each uncorrelated with the idiosyncratic errors. Under the default assumption, lagged levels can be used as instruments for the first-differenced/forward-orthogonally transformed model. Under the initdev assumption, only lagged first differences or backward-orthogonally transformed variables can be used as instruments. It also affects the type of nonlinear moment conditions that might be available.
                                • With option stationary, additional first-differenced instruments become available for the level model. Nonlinear moment conditions become redundant.
                                • If nonlinear moment conditions are undesired irrespective of the assumptions, option nonl can be specified.
                                • In contrast to xtdpdgmm, the default estimator with xtdpdgmmfe is the iterated GMM estimator (igmm). Alternatively, the onestep, twostep, or continuously-updating GMM estimator (cugmm) can be requested.
                                • By default, xtdpdgmmfe displays the respective xtdpdgmm command line used to estimate the model. This allows you to fine-tune your estimator using xtdpdgmm, which offers additional specialist options, and to see which options are implied by your chosen assumptions. If this feature is undesired, display of the command line can be prevented with option nocmdline.
                                • Because the model is actually estimated by xtdpdgmm, the usual postestimation commands are available.
                                See the help file for details:
                                help xtdpdgmmfe
                                Here are some examples of conventional estimators, assuming that the regressors are predetermined. The examples also show how the xtdpdgmmfe syntax translates into the xtdpdgmm syntax:
                                . webuse abdata
                                1. Anderson and Hsiao (1981) "difference IV" estimators with lagged levels or lagged differences as instruments:
                                . xtdpdgmmfe n w k, predetermined(w k) collapse curtail(1) nonl teffects onestep
                                  xtdpdgmm L(0/1).n w k , model(difference) gmmiv(L.n w k, lagrange(1 .)) collapse curtail(1) teffects nolevel onestep
                                . xtdpdgmmfe n w k, predetermined(w k) initdev collapse curtail(1) nonl teffects onestep
                                  xtdpdgmm L(0/1).n w k , model(difference) gmmiv(L.n w k, lagrange(1 .) difference) collapse curtail(1) teffects nolevel onestep
                                2. Arellano and Bond (1991) one-step "difference GMM" estimator with curtailed instruments:
                                . xtdpdgmmfe n w k, predetermined(w k) curtail(3) nonl teffects onestep
                                  xtdpdgmm L(0/1).n w k , model(difference) gmmiv(L.n w k, lagrange(1 .)) curtail(3) teffects nolevel onestep
                                3. Arellano and Bover (1995) one-step "forward-orthogonal GMM" estimator with curtailed instruments:
                                . xtdpdgmmfe n w k, predetermined(w k) curtail(3) orthogonal nonl teffects onestep
                                  xtdpdgmm L(0/1).n w k , model(fodev) gmmiv(L.n w k, lagrange(0 .)) curtail(2) teffects nolevel onestep
                                4. Hayakawa, Qi, and Breitung (2019) "backward/forward-orthogonal IV" estimator:
                                . xtdpdgmmfe n w k, predetermined(w k) initdev collapse curtail(1) orthogonal nonl teffects onestep
                                  xtdpdgmm L(0/1).n w k , model(fodev) gmmiv(L.n w k, lagrange(0 .) bodev) collapse curtail(0) teffects nolevel onestep
                                5. Blundell and Bond (1998) two-step "system GMM" estimator with curtailed/collapsed instruments and doubly-corrected robust standard errors:
                                . xtdpdgmmfe n w k, predetermined(w k) stationary collapse curtail(3) teffects twostep vce(robust, dc)
                                  xtdpdgmm L(0/1).n w k , model(difference) gmmiv(L.n w k, lagrange(1 .)) gmmiv(L.n w k, lagrange(0 0) difference model(level)) collapse curtail(3) teffects twostep vce(robust, dc)
                                6. Ahn and Schmidt (1995) two-step GMM estimator (with curtailed instruments and doubly-corrected robust standard errors) using nonlinear moment conditions valid under serially uncorrelated idiosyncratic errors without or with homoskedasticity:
                                . xtdpdgmmfe n w k, predetermined(w k) curtail(3) teffects twostep vce(robust, dc)
                                  xtdpdgmm L(0/1).n w k , model(difference) gmmiv(L.n w k, lagrange(1 .)) nl(noserial) curtail(3) teffects twostep vce(robust, dc)
                                . xtdpdgmmfe n w k, predetermined(w k) iid curtail(3) teffects twostep vce(robust, dc)
                                  xtdpdgmm L(0/1).n w k , model(difference) gmmiv(L.n w k, lagrange(1 .)) nl(iid) curtail(3) teffects twostep vce(robust, dc)
                                7. Chudik and Pesaran (2022) iterated GMM estimator (with collapsed instruments, centered weighting matrix, and doubly-corrected robust standard errors) using nonlinear moment conditions valid under serially uncorrelated idiosyncratic errors (and no endogenous regressors):
                                . xtdpdgmmfe n w k, predetermined(w k) initdev collapse teffects igmm vce(robust, dc) center
                                  xtdpdgmm L(0/1).n w k , model(difference) gmmiv(L.n w k, lagrange(1 .) difference) nl(predetermined) collapse teffects nolevel igmm vce(robust, dc) center
                                8. Finally, a replication of a fixed-effects estimator in a static model (necessarily with strictly exogenous regressors):
                                . xtdpdgmmfe n w k, lags(0) exogenous(w k) collapse curtail(1) orthogonal teffects onestep norescale
                                  xtdpdgmm n w k , model(mdev) gmmiv(w k, lagrange(0 0)) collapse curtail(0) teffects nolevel onestep norescale
                                . xtreg n w k i.year, fe
                                To install the latest version of the package, type the following in Stata's command window:
                                net install xtdpdgmm, from( replace
                                Suggested citation if you find this package useful in your work:
