
  • If the variables w k are strictly exogenous (with respect to the idiosyncratic error component), then any serial error correlation does not affect their validity (or that of any of their lags) as instruments. If there is serial error correlation due to the omission of relevant lags of w k as regressors, e.g. due to delayed direct effects of L2.(w k), then w k would not be strictly exogenous in the first place in a model that omits those lags. Thus, saying that w k are strictly exogenous is effectively also a statement about the correct specification of the model dynamics.

    In this regard, I wonder what your motivation is for including L.(w k) as regressors instead of w k. Sometimes, people do this to avert simultaneous feedback from the dependent variable. In that case, however, L.(w k) may no longer be endogenous, but they cannot be strictly exogenous either. At best, they would be predetermined (weakly exogenous). For predetermined variables, serial error correlation does matter for the validity of the instruments. Probably even more importantly, simply lagging the regressors for this purpose typically creates model misspecification, which then puts the whole analysis in jeopardy. (A schematic illustration of the two assumptions follows below.)
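
    Purely as an illustration of how the two assumptions translate into xtdpdgmm instrument specifications (y, w, k are placeholder variable names and the lag ranges are arbitrary; this is a sketch, not a recommended specification):

    Code:
    * strictly exogenous w k: contemporaneous values (lag 0) are valid instruments for the first-differenced model
    xtdpdgmm L(0/1).y w k, model(diff) collapse gmm(y, lag(2 4)) gmm(w k, lag(0 3)) vce(robust)
    * predetermined (weakly exogenous) w k: only values dated t-1 and earlier are valid
    xtdpdgmm L(0/1).y w k, model(diff) collapse gmm(y, lag(2 4)) gmm(w k, lag(1 3)) vce(robust)
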
    https://twitter.com/Kripfganz

    Comment


    • Dear Professor @Sebastian Kripfganz


      Understood. This is helpful. Thank you!

      Comment


      • Dear Sebastian,

        I am going to use a micro dataset for an upcoming study. However, this dataset consists of random samples for each year; essentially, it is a pooled dataset rather than panel data. Moreover, I suspect an endogeneity issue between the dependent and independent variables in the model I am aiming to estimate. Additionally, the dataset encompasses roughly 100,000 units per year, spanning seven years.

        Given that the xtdpdgmm command is designed for linear (dynamic) panel data, do you recommend it for analyzing a pooled dataset?

        Comment


          • What you are describing is a data set with repeated cross sections. xtdpdgmm requires the data to be declared as panel data; in particular, a panel identifier variable needs to be declared with xtset. This may not be possible with the type of data you have.
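
            For what it is worth, a minimal sketch of the panel declaration that xtdpdgmm relies on (id and year are placeholder variable names); with repeated cross sections there is no unit identifier that could be declared this way:

            Code:
            xtset id year
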
          https://twitter.com/Kripfganz

          Comment


            • What you are describing is a data set with repeated cross sections. xtdpdgmm requires the data to be declared as panel data; in particular, a panel identifier variable needs to be declared with xtset. This may not be possible with the type of data you have.
            Thank you very much for the quick response.

            Comment


            • Dear Prof. Sebastian Kripfganz

              1) Can we use the sys-GMM estimator with a sample that has 28 countries and 20 years? Is this considered a big T, or can we still use sys-GMM?
              2) When we define the lagged dependent variable as predetermined, its estimated coefficient is 0.542. However, when we specify the variable as endogenous, its magnitude becomes 0.745. Does the magnitude of the lagged dependent variable have to be close to 1?

              Could you please guide us on these two points?


              Thanks

              Comment


                1. I would call this a small N, moderately small T sample. You probably do not need to be concerned much with asymptotic efficiency; it might thus be a good idea to use the one-step instead of the two-step estimator, to avoid having to estimate the optimal weighting matrix. Also, use the available options (collapsing and lag restrictions) to limit the number of instruments; see the sketch after this list. You could still use the system GMM estimator if you can theoretically justify its assumptions. With such a data set, testing these assumptions empirically is challenging and probably not very reliable.
                2. From the outset, we do not know what the true value of the coefficient of the lagged dependent variable is; that is why we are estimating it. There can be different reasons for the observed differences: (i) sampling variability due to the small data set; (ii) endogeneity of the lagged dependent variable (due to neglected serial correlation in the error term) such that the model treating it as predetermined is misspecified; (iii) weak instruments when treating the lagged dependent variable as endogenous, to name a few.
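
                A minimal sketch, purely for illustration, of a system GMM specification with collapsed and lag-restricted instruments (y and x are placeholder variable names and the lag ranges are arbitrary; the estimator is one-step because the twostep option is not specified):

                Code:
                xtdpdgmm L(0/1).y x, model(diff) collapse gmm(y, lag(2 4)) gmm(x, lag(1 3)) ///
                    gmm(y, lag(1 1) diff model(level)) gmm(x, lag(0 0) diff model(level)) vce(robust)
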
                https://twitter.com/Kripfganz

                Comment


                • Dear Prof. Sebastian Kripfganz

                  Thanks for your constructive replies.

                  1. Are there any issues if we restrict our sample to 28 countries and 13 years? We use a one-step system GMM estimator to estimate our model with this sample. Could you please let us know if we still have any issues with this setup?
                  2. Given this sample, can we use the diff-GMM for robustness checks? Or would you recommend another estimator for robustness?

                  Comment


                    1. N=28 is still small; therefore, my previous comments still apply.
                    2. Yes, you can (and probably should) use a diff-GMM estimator as a robustness check (again, preferably one-step only); a sketch follows below.
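
                    Using the same placeholder names and arbitrary lag ranges as in the earlier sketch, the corresponding one-step difference GMM robustness check would simply drop the level-model instruments:

                    Code:
                    xtdpdgmm L(0/1).y x, model(diff) collapse gmm(y, lag(2 4)) gmm(x, lag(1 3)) vce(robust)
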
                    https://twitter.com/Kripfganz

                    Comment


                    • Dear Prof. Sebastian Kripfganz

                      Thanks for your constructive replies.

                      Does the specification of the system GMM have to be the same as the specification of the diff-GMM? For example, if we use lag(1 3) in the system GMM, do we have to specify the same range of lags in the diff-GMM? Or can the two estimators have different specifications for the range of lags?

                      Comment


                      • The lags for those instruments that refer to the first-differenced model should be the same for the two estimators; otherwise, the results become less easy to compare.
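
                        Concretely, with the placeholder names from the earlier sketches, the gmm() specifications for the first-differenced model would carry identical lag() ranges in both commands, and only the system GMM adds the level-model instruments:

                        Code:
                        xtdpdgmm L(0/1).y x, model(diff) collapse gmm(y, lag(2 4)) gmm(x, lag(1 3)) vce(robust)
                        xtdpdgmm L(0/1).y x, model(diff) collapse gmm(y, lag(2 4)) gmm(x, lag(1 3)) gmm(y, lag(1 1) diff model(level)) gmm(x, lag(0 0) diff model(level)) vce(robust)
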
                        https://twitter.com/Kripfganz

                        Comment


                        • A new update is available for xtdpdgmm on my personal website. Version 2.6.6 fixes a few bugs in the postestimation command estat serialpm.

                          Code:
                          net install xtdpdgmm, from(https://www.kripfganz.de/stata) replace
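
                          After updating, a minimal usage sketch of the affected postestimation command, run directly after an xtdpdgmm estimation:

                          Code:
                          estat serialpm
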
                          https://twitter.com/Kripfganz

                          Comment


                          • Dear Sebastian, I have some cross-sectional (categorical) data collected from a questionnaire in 2021, which I would like to integrate into a longitudinal dataset collected at different points in time over a period of 6 years, from 2015 to 2020. Given that the sample is the same for both data collection methods, and that my categorical data (institutional support, corporate governance) are dynamic rather than static, I would like to know whether integrating them into my panel data is feasible.

                            Comment


                            • Dear Professor @Sebastian Kripfganz


                              I have a quick question regarding a difference GMM model. The output below comes after successfully reproducing the results of the difference GMM model from xtdpdgmm with xtivreg2, in order to access the instrument diagnostics available for the latter. In general, the diagnostics look fine. The Arellano-Bond autocorrelation test of the residuals looks fine as well: statistically significant AR(1) but statistically insignificant higher-order autocorrelation in the residuals. However, both statistics for the weak identification test are quite low in magnitude, and to complicate things, the "Stock-Yogo weak ID test critical values" are <not available>. My questions are:

                              1) Is this a matter for concern given the low values of the statistics for the Weak identification test?
                              2) Is there anything to be done to obtain valid "Stock-Yogo weak ID test critical values"?
                              3) Do you find these diagnostics concerning?
                              4) Is there anything to be done at all?

                              Thank you in advance!

                              HTML Code:
                              Underidentification test (Kleibergen-Paap rk LM statistic):             98.401
                                                                                 Chi-sq(14) P-val =   0.0000
                              ------------------------------------------------------------------------------
                              Weak identification test (Cragg-Donald Wald F statistic):                1.345
                                                       (Kleibergen-Paap rk Wald F statistic):          1.879
                              Stock-Yogo weak ID test critical values:                       <not available>
                              ------------------------------------------------------------------------------
                              Hansen J statistic (overidentification test of all instruments):        15.589
                                                                                 Chi-sq(12) P-val =   0.1780
                              -endog- option:
                              Endogeneity test of endogenous regressors:                              17.543
                                                                                 Chi-sq(3) P-val =    0.0004

                              Comment


                              • Dear Professor @Sebastian Kripfganz

                                As a follow-up, I ran the weakiv test after ivreg2 (ssc install weakiv) and obtained the diagnostics below for the same model. Can I conclude that the instruments are strong enough, despite the low magnitude of the weak identification test statistics that ivreg2 reports by default?

                                HTML Code:
                                ----------------------------------------
                                 Test |       Statistic         p-value
                                ------+---------------------------------
                                  CLR | stat(.)   =   137.16     0.0000
                                    K | chi2(32)  =    99.89     0.0000
                                    J | chi2(13)  =    42.29     0.0000
                                  K-J |        <n.a.>            0.0000
                                   AR | chi2(44)  =   142.18     0.0000
                                ------+---------------------------------
                                 Wald | chi2(32)  =   146.58     0.0000
                                ----------------------------------------
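
                                For completeness, a minimal sketch of the sequence described above: weakiv is installed from SSC and then invoked as a postestimation command on the ivreg2/xtivreg2 results still in memory.

                                Code:
                                ssc install weakiv
                                weakiv
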

                                Comment
