XTDPDGMM: new Stata command for efficient GMM estimation of linear (dynamic) panel models with nonlinear moment conditions

Taka Sakamoto

Join Date: Dec 2014
Posts: 88

#571

23 Jul 2023, 05:00

Dear Professor Kripfganz:

I would like to ask you to teach me how to apply your xtdpdgmm correctly. I thank you in advance for your kind help. I would like to implement a system-GMM model like the following:

Code:

xtdpdgmm lpgrow smei3 labutilpcgr mfppwt rknapcgr hfcegrow l.lrgdpopc if id~=13,gmm(lpgrow  labutilpcgr mfppwt rknapcgr hfcegrow lrgdpopc, lag(2 2) collapse model(diff)) gmm(lpgrow labutilpcgr mfppwt  rknapcgr hfcegrow lrgdpopc, lag(1 1) diff collapse model(level)) iv(smei3,diff model(level)) two vce(cl id) small

And I get the results like this:

Code:

. xtdpdgmm lpgrow smei3 labutilpcgr mfppwt rknapcgr hfcegrow l.lrgdpopc if id~=13,gmm(lpgrow  labutilpcgr mfppwt rknap
> cgr hfcegrow lrgdpopc, lag(2 2) collapse model(diff)) gmm(lpgrow labutilpcgr mfppwt  rknapcgr hfcegrow lrgdpopc, lag
> (1 1) diff collapse model(level)) iv(smei3,d model(level)) two vce(cl id) small

Generalized method of moments estimation

Fitting full model:
Step 1         f(b) =    .162683
Step 2         f(b) =  .16926059

Group variable: id                           Number of obs         =       514
Time variable: year                          Number of groups      =        20

Moment conditions:     linear =      14      Obs per group:    min =        12
                    nonlinear =       0                        avg =      25.7
                        total =      14                        max =        28

                                    (Std. err. adjusted for 20 clusters in id)
------------------------------------------------------------------------------
             |              WC-Robust
      lpgrow | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       smei3 |   .0991957    .057027     1.74   0.098    -.0201631    .2185545
 labutilpcgr |  -.3546281   .1410527    -2.51   0.021    -.6498549   -.0594013
      mfppwt |   .6901644   .0872634     7.91   0.000     .5075199    .8728089
    rknapcgr |   .1629855   .0893485     1.82   0.084     -.024023    .3499939
    hfcegrow |   .0271975   .0837204     0.32   0.749    -.1480314    .2024264
             |
    lrgdpopc |
         L1. |  -.7435579   .2764233    -2.69   0.015    -1.322118   -.1649973
             |
       _cons |   8.620034   3.041985     2.83   0.011     2.253086    14.98698
------------------------------------------------------------------------------
Instruments corresponding to the linear moment conditions:
 1, model(diff):
   L2.lpgrow L2.labutilpcgr L2.mfppwt L2.rknapcgr L2.hfcegrow L2.lrgdpopc
 2, model(level):
   L1.D.lpgrow L1.D.labutilpcgr L1.D.mfppwt L1.D.rknapcgr L1.D.hfcegrow
   L1.D.lrgdpopc
 3, model(level):
   D.smei3
 4, model(level):
   _cons

1. Am I specifying the instrumentation correctly? I'm particularly not confident about specifying the lags in level and difference equations. I would like to make sure I'm doing the right thing.

2. What if I replace the first independent variable "smei3" with "sme" which is a dummy variable? Do I need to change anything inside "iv()"? A difference of a dummy variable is obviously wrong.

3. Is there anything that you notice I'm doing wrong? I would appreciate any suggestion.

Thank you for your generous help.

Best wishes,

Taka

If the results are hard to see, here is a picture that shows the results.

Doc14.docx


Last edited by Taka Sakamoto; 23 Jul 2023, 05:06.

Comment

Sebastian Kripfganz

Join Date: May 2014

Posts: 2571
#572

23 Jul 2023, 07:16

1. I cannot see any obvious problem with your specification. However, it is of some concern that you only have 20 groups. It is difficult to obtain reliable results with such a small cross-sectional sample size. If you nevertheless want to do a GMM estimation, you might want to stick to the one-step estimator, which does not require estimation of the optimal weighting matrix. The one-step estimator will be asymptotically inefficient, but with N=20 you are very far away from asymptopia anyway.

2. If your dummy variable varies over time, then you can in principle leave everything as it is. However, there is some risk that lags and/or first differences of dummy variables can be weak instruments. If the dummy variable is time invariant, then you can obviously not first difference it. You might then have to adopt the assumption that this variable is uncorrelated with both the idiosyncratic and the group-specific error component, which might or might not be acceptable, depending on your research question. In this case, you could include the dummy variable without the difference option for the level model.

3. The main concern is about the sample size; see point 1.
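
A minimal sketch of points 1 and 2, not the original poster's model: it assumes xtdpdgmm is installed from SSC and uses Stata's abdata example dataset. The dummy grp is hypothetical and is generated only to have a time-invariant regressor to work with. The command uses the one-step estimator and adds the dummy in levels as an instrument for the level model.

Code:

* Assumption: xtdpdgmm is installed (ssc install xtdpdgmm); abdata is an example dataset, not the poster's data
webuse abdata, clear
xtset id year
* Hypothetical time-invariant dummy, for illustration only
generate byte grp = mod(id, 2)
* One-step system GMM with collapsed instruments;
* grp enters the level model undifferenced, which assumes it is uncorrelated with the group-specific effects
xtdpdgmm L(0/1).n w k grp, collapse ///
    gmm(n, lag(2 2) model(diff)) gmm(w k, lag(1 1) model(diff)) ///
    gmm(n, lag(1 1) diff model(level)) gmm(w k, lag(0 0) diff model(level)) ///
    iv(grp, model(level)) onestep vce(cluster id) small

Whether the assumption behind iv(grp, model(level)) is acceptable depends on the application; the sketch only shows where the option goes.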

https://www.kripfganz.de/stata/
Comment
Taka Sakamoto

Join Date: Dec 2014

Posts: 88
#573

23 Jul 2023, 15:20

Thank you so much for your response and explanation. May I ask two questions?

1. Re: iv():
What should determine whether to use iv(), and whether to specify differences or levels in it? Also, is it correct that iv() does not always have to be specified?

2. Re: How different are the two-step and one-step estimators, and what should determine which one to use?

Thank you for your generous and kind help.

Taka
Comment
Taka Sakamoto

Join Date: Dec 2014

Posts: 88
#574

23 Jul 2023, 17:11

3. How many cross-sections are desirable? Also, when researchers find that their number of cross-sections is too small, which other estimators do they apply?

Thank you again.

Taka
Comment
Sebastian Kripfganz

Join Date: May 2014

Posts: 2571
#575

24 Jul 2023, 05:23

iv(x) is equivalent to gmm(x, lag(0 0) collapse). You then need to justify (based on assumptions you make) whether differenced or undifferenced variables are valid instruments (for the level model or the differenced model); this follows the arguments set out in the seminal papers by Arellano and Bond (1991) and Blundell and Bond (1998), among others.
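
A small sketch of this equivalence, assuming xtdpdgmm is installed from SSC and using the abdata example dataset rather than the data discussed in this thread. The stored-estimate names with_iv and with_gmm are arbitrary. If the equivalence holds as stated, both runs should list the same instruments and report the same coefficients.

Code:

* Assumption: xtdpdgmm is installed (ssc install xtdpdgmm); abdata is used for illustration only
webuse abdata, clear
xtset id year
* Variant A: w and k supplied through iv()
xtdpdgmm L(0/1).n w k, model(diff) gmm(n, lag(2 4) collapse) iv(w k) vce(robust)
estimates store with_iv
* Variant B: the same instruments written out with gmm()
xtdpdgmm L(0/1).n w k, model(diff) gmm(n, lag(2 4) collapse) gmm(w k, lag(0 0) collapse) vce(robust)
estimates store with_gmm
* Side-by-side comparison of coefficients and standard errors
estimates table with_iv with_gmm, se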

Two-step estimators use the same instruments as one-step estimators but rely on an estimate of the optimal weighting matrix. This yields asymptotically efficient estimates. If the instruments are valid, both one-step and two-step estimators are consistent. The inefficiency of the one-step estimator might be less of a problem in small samples, where it can be difficult to estimate the optimal weighting matrix, especially when the number of instruments is large. If you have a large sample size, then you should go for the efficiency gains of the two-step estimator. With small samples, it is not clear whether this would actually improve the estimates.
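
To see how much the choice matters in a given sample, one option is to run both estimators on the same instrument set and compare them. The following is a sketch under assumptions: xtdpdgmm is installed from SSC, and the abdata example dataset stands in for real data. The lag ranges are illustrative, not a recommendation.

Code:

* Assumption: xtdpdgmm is installed (ssc install xtdpdgmm); abdata is used for illustration only
webuse abdata, clear
xtset id year
* One-step estimator: no estimated optimal weighting matrix
xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 3)) gmm(w k, lag(1 2)) onestep vce(robust)
estimates store onestep
* Two-step estimator: same instruments, estimated optimal weighting matrix
xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 3)) gmm(w k, lag(1 2)) twostep vce(robust)
estimates store twostep
* Compare coefficients and standard errors across the two estimators
estimates table onestep twostep, se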

I cannot provide a general answer on the "desirable" number of cross sections. More is better. With 50 or even 100 cross sections, the finite-sample performance of the estimators could still be unsatisfying, but this depends on a lot of other data characteristics. With small N, you just need to keep your model as simple as possible, and impose some strong assumptions where this might be okay (e.g. assume that your regressors are exogenous). A simple IV estimator using a minimal number of collapsed instruments might do the job. But the truth is: If your sample is small, you just cannot really expect precise and robust estimates, no matter which estimator you use.

https://www.kripfganz.de/stata/
Comment
Taka Sakamoto

Join Date: Dec 2014

Posts: 88
#576

24 Jul 2023, 15:06

Thank you so much for your explanation. It's very helpful. I have one more elementary question and I would be grateful if you could teach me:

For the level model (gmm(..., model(level))), do you always have to specify the diff option, or does it depend on your assumptions? I ask because gmm(..., model(level)) produces instruments in levels unless "diff" is specified, which gives me the impression that instruments in levels are also admissible, given that "diff" is optional.

Thank you so much for your generous help.

Taka
Comment
Sarah Magd

Join Date: Feb 2022

Posts: 60
#577

25 Jul 2023, 05:13

Dear Prof. Sebastian Kripfganz
We estimate our Sys-GMM model with the following code:

Code:

xtabond2 L(0/1).GDP Labor Capital Financial_Development Temperature, gmmstyle(L.GDP L.Labor L.Capital L.Financial_Development , lag(1 3)) ivstyle(Temperature) robust twostep

With this sys-GMM specification, the estimated coefficient of Financial_Development is positive and insignificant. Also, the estimated coefficient of the lagged dependent variable is 0.9 and is statistically significant.

However, when we estimate the Diff-GMM with the following code:

Code:

xtabond2 L(0/1).GDP Labor Capital Financial_Development Temperature, gmmstyle(L.GDP L.Labor L.Capital L.Financial_Development , lag(1 3)) ivstyle(Temperature) robust twostep noleveleq

With this Diff-GMM specification, the estimated coefficient of Financial_Development is positive and significant. Also, the estimated coefficient of the lagged dependent variable is 0.2 and statistically insignificant.

The Diff-GMM gives the result we expected for our main variable (i.e., Financial_Development), but sys-GMM does not. However, in the literature relevant to our research question, we find that most papers use system GMM. Also, please note that the fixed-effects regression results are consistent with Diff-GMM.

Code:

xtreg GDP Labor Capital Financial_Development Temperature, fe r

We have two questions:
- Is it normal to have different results for Sys-GMM and Diff-GMM? Is there a reason behind this?
- Do we need to consider other specifications for the system GMM to get similar results to Diff-GMM? For example, are there any other specifications from xtdpdgmm that could improve our sys-GMM results?

Last edited by Sarah Magd; 25 Jul 2023, 05:23.
Comment
Sebastian Kripfganz

Join Date: May 2014

Posts: 2571
#578

25 Jul 2023, 08:30

Taka Sakamoto
Instruments for the level model must be uncorrelated with the unobserved group-specific effects, no matter whether you specify them with gmmiv() or iv(). Without option diff, this requires that the levels of those instruments themselves are uncorrelated with those unobserved effects, which is akin to a "random-effects" assumption. With option diff, the first-differenced instruments need to be uncorrelated with the unobserved effects, which is a weaker requirement but still needs to be justified (see the seminal paper by Blundell and Bond (1998)).
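
The two cases can be put side by side in a short sketch. It assumes xtdpdgmm is installed from SSC and uses the abdata example dataset, so the variables n, w and k are placeholders for the ones discussed above. Only the instruments for w and k in the level model differ between the two commands.

Code:

* Assumption: xtdpdgmm is installed (ssc install xtdpdgmm); abdata is used for illustration only
webuse abdata, clear
xtset id year
* Case 1: first-differenced instruments for the level model (option diff)
xtdpdgmm L(0/1).n w k, collapse ///
    gmm(n, lag(2 4) model(diff)) gmm(w k, lag(1 3) model(diff)) ///
    gmm(n, lag(1 1) diff model(level)) gmm(w k, lag(0 0) diff model(level)) ///
    twostep vce(robust)
* Case 2: w and k in levels as instruments for the level model (no diff),
* which requires w and k themselves to be uncorrelated with the unobserved effects
xtdpdgmm L(0/1).n w k, collapse ///
    gmm(n, lag(2 4) model(diff)) gmm(w k, lag(1 3) model(diff)) ///
    gmm(n, lag(1 1) diff model(level)) gmm(w k, lag(0 0) model(level)) ///
    twostep vce(robust)

The lagged dependent variable keeps option diff in both cases, because its level is correlated with the group-specific effect by construction.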

Sarah Magd
This topic is about the xtdpdgmm command. It would be better to start a different topic if you have a question about a different command, such as xtabond2. Some general comments:
System GMM requires stronger assumptions about the initial observations than difference GMM. In a macroeconomic context, the additional assumption is quite likely to be violated due to the heterogeneous development of the countries. Unfortunately, in empirical practice, little effort is often made to justify the extra assumption for system GMM. Just because the relevant literature used system GMM does not mean that it is really justified.

If there is a lot of persistence in the data, which again is quite likely with macroeconomic data, then difference GMM might suffer from a weak-instruments problem and the coefficient of the lagged dependent variable can be severely downward biased. This would be consistent with the difference in estimates between the difference and system GMM estimator, but the first point above could also explain that difference if the additional assumption for system GMM is violated. Also, even if both estimators are consistent, in small samples they can have a large sampling variation, which could lead to the different estimates you observed.

If the true data-generating process is dynamic, then estimating a static fixed-effects model yields biased estimates. So, it could be a coincidence that the bias of the static fixed-effects estimator is similar to the bias of the difference GMM estimator.

In order to reduce the weak-instruments problem of the difference GMM estimator, without imposing the stronger assumption for system GMM, a good solution can be to use the difference GMM estimator with added nonlinear moment conditions; see xtdpdgmm option nl(noserial).
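
A rough sketch of such a specification, assuming xtdpdgmm is installed from SSC and using the abdata example dataset instead of the macroeconomic data discussed here. The lag ranges are illustrative, and the postestimation checks are included only to show where they would go.

Code:

* Assumption: xtdpdgmm is installed (ssc install xtdpdgmm); abdata is used for illustration only
webuse abdata, clear
xtset id year
* Difference GMM with collapsed instruments plus nonlinear moment conditions
xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) ///
    nl(noserial) twostep vce(robust)
* Overidentification test and test for serial correlation in the first-differenced errors
estat overid
estat serial, ar(1/3)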

https://www.kripfganz.de/stata/
Comment
Sarah Magd

Join Date: Feb 2022

Posts: 60
#579

25 Jul 2023, 10:07

Thanks Prof. Sebastian Kripfganz
I have tried the following code:

Code:

xtdpdgmm L(0/1).GDP Labor Capital Financial_Development Temperature, model(diff) collapse gmm(GDP Labor Capital , lag(2 4)) gmm(Financial_Development, lag(1 2)) gmm(Temperature, lag(. .)) two vce(r) overid nl(noserial)

So I consider Financial_Development as a predetermined variable, and Temperature as an exogenous variable. However, when I run this code, I get this error:
xtdpdgmm_init_nl(): 3498 touse variable for model 'diff' required
<istmt>: - function returned error
r(3498);

Could you please help me figure out the problem?
Comment
Sebastian Kripfganz

Join Date: May 2014

Posts: 2571
#580

25 Jul 2023, 10:21

That's an error message you should not see. Would it be possible for you to send me your data set by e-mail, so that I can replicate the problem?

https://www.kripfganz.de/stata/
Comment
Taka Sakamoto

Join Date: Dec 2014

Posts: 88
#581

25 Jul 2023, 16:38

Thank you. What happens if I do not use the iv() option? I have read your 2019 slides and see that you sometimes do not use the iv() option.

Thank you for your help.
Comment
Taka Sakamoto

Join Date: Dec 2014

Posts: 88
#582

25 Jul 2023, 17:17

Sorry, I have one more question. The variable that goes in iv() cannot and should not be correlated with the dependent variable?
Comment
Sebastian Kripfganz

Join Date: May 2014

Posts: 2571
#583

26 Jul 2023, 02:50

The variable that you specify in iv() (or gmmiv()) should not be correlated with the error term. In other words, if it is excluded from your regression specification, it should not have a direct effect on the dependent variable after controlling for any indirect effects through the included regressors. This is the standard requirement for a valid instrument.

As mentioned earlier, iv() is just a special case of gmmiv(). If the relevant instruments are already specified with gmmiv(), then there is often no need to use iv(). In some cases, e.g. for dummy variables, the iv() option is easier to use.

https://www.kripfganz.de/stata/
Comment

Taka Sakamoto

Join Date: Dec 2014
Posts: 88

#584

26 Jul 2023, 15:58

Thank you. Could you tell me what's happening in the following estimation? I enter the command:

Code:

xtdpdgmm gdpgrow sme inflation gfcfgrow hfcegrow tradeopen l.lrgdpopc ,gmm(gdpgrow inflation  gfcfgrow hfcegrow lrgdpopc , lag(2 2) collapse model(diff)) gmm(gdpgrow inflation  gfcfgrow hfcegrow lrgdpopc, lag(1 1) diff collapse model(level)) iv(sme,model(level))  one vce(cl id) small overid

I get the following results:

Code:

. xtdpdgmm gdpgrow sme inflation gfcfgrow hfcegrow tradeopen l.lrgdpopc ,gmm(gdpgrow inflation  gfcfgrow hfcegr
> ow lrgdpopc , lag(2 2) collapse model(diff)) gmm(gdpgrow inflation  gfcfgrow hfcegrow lrgdpopc, lag(1 1) diff
>  collapse model(level)) iv(sme,model(level))  one vce(cl id) small overid

Generalized method of moments estimation

Fitting full model:
Step 1         f(b) =   3.170343

Fitting reduced model 1:
Step 1         f(b) =  8.867e-16

Fitting reduced model 2:
Step 1         f(b) =  1.413e-14

Fitting reduced model 3:
Step 1         f(b) =  3.1380076

Group variable: id                           Number of obs         =       919
Time variable: year                          Number of groups      =        21

Moment conditions:     linear =      12      Obs per group:    min =         6
                    nonlinear =       0                        avg =   43.7619
                        total =      12                        max =        46

                                    (Std. err. adjusted for 21 clusters in id)
------------------------------------------------------------------------------
             |               Robust
     gdpgrow | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         sme |   1.371398    .415195     3.30   0.004     .5053168     2.23748
   inflation |  -.2012583   .0656044    -3.07   0.006    -.3381068   -.0644099
    gfcfgrow |   .1756431   .0365951     4.80   0.000     .0993071    .2519791
    hfcegrow |   .6537494   .2542447     2.57   0.018     .1234043    1.184095
   tradeopen |  -.0440692   .0166601    -2.65   0.016    -.0788215   -.0093169
             |
    lrgdpopc |
         L1. |  -.3924132   1.467164    -0.27   0.792    -3.452864    2.668037
             |
       _cons |   7.086845   15.30321     0.46   0.648     -24.8351    39.00879
------------------------------------------------------------------------------
Instruments corresponding to the linear moment conditions:
 1, model(diff):
   L2.gdpgrow L2.inflation L2.gfcfgrow L2.hfcegrow L2.lrgdpopc
 2, model(level):
   L1.D.gdpgrow L1.D.inflation L1.D.gfcfgrow L1.D.hfcegrow L1.D.lrgdpopc
 3, model(level):
   sme
 4, model(level):
   _cons

When I remove "iv(sme,model(level))" from the command I get the results:

Code:

. xtdpdgmm gdpgrow sme inflation gfcfgrow hfcegrow tradeopen l.lrgdpopc ,gmm(gdpgrow inflation  gfcfgrow hfcegr
> ow lrgdpopc , lag(2 2) collapse model(diff)) gmm(gdpgrow inflation  gfcfgrow hfcegrow lrgdpopc, lag(1 1) diff
>  collapse model(level))  one vce(cl id) small overid

Generalized method of moments estimation

Fitting full model:
Step 1         f(b) =  3.1380076

Group variable: id                           Number of obs         =       919
Time variable: year                          Number of groups      =        21

Moment conditions:     linear =      11      Obs per group:    min =         6
                    nonlinear =       0                        avg =   43.7619
                        total =      11                        max =        46

                                    (Std. err. adjusted for 21 clusters in id)
------------------------------------------------------------------------------
             |               Robust
     gdpgrow | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         sme |    -.64922   6.394653    -0.10   0.920    -13.98823    12.68979
   inflation |  -.1814501   .1103874    -1.64   0.116    -.4117143    .0488141
    gfcfgrow |    .181537   .0435933     4.16   0.000     .0906029    .2724712
    hfcegrow |   .6699812   .2849286     2.35   0.029     .0756305    1.264332
   tradeopen |  -.0526704   .0459819    -1.15   0.266    -.1485869    .0432461
             |
    lrgdpopc |
         L1. |   .1360665     2.8836     0.05   0.963    -5.879018    6.151151
             |
       _cons |    3.10732   25.29937     0.12   0.903    -49.66625    55.88089
------------------------------------------------------------------------------
Instruments corresponding to the linear moment conditions:
 1, model(diff):
   L2.gdpgrow L2.inflation L2.gfcfgrow L2.hfcegrow L2.lrgdpopc
 2, model(level):
   L1.D.gdpgrow L1.D.inflation L1.D.gfcfgrow L1.D.hfcegrow L1.D.lrgdpopc
 3, model(level):
   _cons

"sme" is an invariant dummy variable, but the same results happen when I use a continuous version of "sme".

I am sorry I have taken time from you. And I appreciate your generous, kind help.

Many thanks.

TS

Comment

Sebastian Kripfganz

Join Date: May 2014

Posts: 2571
#585

27 Jul 2023, 02:08

By removing your instrument for sme, the coefficient of that regressor might be poorly identified. Not surprisingly, the standard errors of that coefficient estimate become huge. It (unsuccessfully) tries to borrow some identification strength from the other instruments, which then also slightly inflates the other standard errors.

The coefficient of your lagged dependent variable also appears to be poorly identified. It would probably require additional lags as instruments. However, this would increase the number of instruments, which can cause further trouble.
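
One way to explore this trade-off is to widen the lag range in small steps and watch the number of moment conditions in the output header. This sketch assumes xtdpdgmm is installed from SSC and uses the abdata example dataset, not the data from the post above. The lag ranges are arbitrary.

Code:

* Assumption: xtdpdgmm is installed (ssc install xtdpdgmm); abdata is used for illustration only
webuse abdata, clear
xtset id year
* Narrow instrument set
xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 3)) gmm(w k, lag(1 2)) twostep vce(robust)
estat overid
* Wider instrument set: more lags may help identification but add moment conditions
xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 5)) gmm(w k, lag(1 4)) twostep vce(robust)
estat overid

With few groups, the instrument count in the second run should stay well below the number of groups; otherwise the overidentification test itself becomes unreliable.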

https://www.kripfganz.de/stata/
Comment
