XTDPDGMM: new Stata command for efficient GMM estimation of linear (dynamic) panel models with nonlinear moment conditions

alessandro gst

Join Date: Jun 2019

Posts: 62
#76

25 Sep 2019, 13:17

Dear Sebastian,
thank you for writing this very good command! I have a question regarding the implementation of system GMM. I would like to estimate a model with both GMM instruments and exogenous variables, or more generally with regular IV instruments. However, I am having some trouble replicating the xtabond2 results.
I fear the syntax for system GMM is not completely clear. I would like to estimate a model similar to this one:

HTML Code:

xtabond2 L(0/1).n w k ys*, gmm(L.n w) iv(k ys*) robust noconstant

I have tried to write the xtdpdgmm equivalent, but results do not match.

HTML Code:

xtdpdgmm L(0/1).n w k ys*, gmm(L.n w, model(difference)) gmm(L.n w, d l(0 0)) iv(k ys*, d model(d)) iv(k ys*, l(0 0)) nocons vce(r)

What am I doing wrong?

thank you very much in advance for your help

Best
Comment
Sebastian Kripfganz

Join Date: May 2014

Posts: 2569
#77

26 Sep 2019, 02:56

The reason for the discrepancy is that xtabond2 does something counterintuitive when you specify the iv() option without the eq() suboption. To replicate the xtdpdgmm results, you need to specify

Code:

xtabond2 L(0/1).n w k ys*, gmm(L.n w) iv(k ys*, eq(d)) iv(k ys*, eq(l)) robust noconstant

Note that the xtabond2 options iv(k ys*, eq(d)) iv(k ys*, eq(l)) are not the same as iv(k ys*). The former create the separate moment conditions
\[E [\Delta \mathbf{Z}_i' \Delta \mathbf{e}_i] = \mathbf{0} ,\quad E [\mathbf{Z}_i' \mathbf{e}_i] = \mathbf{0}\]
while the latter creates combined moment conditions
\[E [\Delta \mathbf{Z}_i' \Delta \mathbf{e}_i + \mathbf{Z}_i' \mathbf{e}_i] = \mathbf{0}\]

I would argue that almost nobody intentionally wants to create these combined moment conditions (although they are technically valid) and the danger of the xtabond2 implementation is that users might believe they are doing the former while actually doing the latter. (To be fair, it is not a bug but a documented feature of xtabond2.)

xtdpdgmm always creates the moment conditions separately for the transformed and untransformed model.
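The difference can be inspected directly by running both xtabond2 variants side by side. This is only a sketch: it assumes the abdata example dataset (the variable names n, w, k, ys suggest it, but the thread does not say so), uses ys alone instead of the ys* wildcard, and requires the community-contributed xtabond2 to be installed.

```stata
* Sketch under assumptions: abdata is the dataset, xtabond2 is installed
* (ssc install xtabond2). The single variable ys replaces the wildcard ys*.
webuse abdata, clear

* iv() without eq(): one combined moment condition per instrument
xtabond2 L(0/1).n w k ys, gmm(L.n w) iv(k ys) robust noconstant

* iv() split by equation: separate moment conditions for the
* first-differenced and the level equation
xtabond2 L(0/1).n w k ys, gmm(L.n w) iv(k ys, eq(diff)) iv(k ys, eq(level)) robust noconstant
```

If the explanation above applies, the second call should report a larger number of instruments than the first, because each variable in iv() then contributes one moment condition per equation rather than a single combined one.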

https://www.kripfganz.de/stata/
2 likes
Comment
alessandro gst

Join Date: Jun 2019

Posts: 62
#78

26 Sep 2019, 03:33

Thank you very much for explaining this, it is much clearer now!

I have only one further question, which arises because I may have made a mistake when writing the code. When specifying the level moment conditions for the iv() instruments, I did not difference them in the code of my previous post.

HTML Code:

iv(k ys*, d model(d)) iv(k ys*, l(0 0))

Can you confirm that this was a mistake and that the correct code for these moment conditions would use the differenced instruments?

HTML Code:

iv(k ys*, d model(d)) iv(k ys*, d l(0 0))

thanks again for your help
Comment
Sebastian Kripfganz

Join Date: May 2014

Posts: 2569
#79

26 Sep 2019, 03:48

What is accurate depends on your assumptions. With xtdpdgmm, the option iv(k ys*) creates standard untransformed instruments for the untransformed level model. This requires those instruments to be uncorrelated with both the idiosyncratic errors and the unobserved unit-specific heterogeneity. The option iv(k ys*, d) creates first-differenced instruments for the level model. It still requires the assumption that the (first-differenced) instruments are uncorrelated with both error components, but assuming that the first differences are uncorrelated with the unit-specific heterogeneity might be easier to justify than for the untransformed instruments. (This is essentially the original idea behind the system GMM approach.)
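In symbols, the two options impose different moment conditions on the level model. The notation here is assumed for illustration and is not taken from the xtdpdgmm documentation: \(z_{it}\) is an instrument, \(\alpha_i\) the unit-specific heterogeneity, and \(\varepsilon_{it}\) the idiosyncratic error.
\[E[z_{it} (\alpha_i + \varepsilon_{it})] = 0 \quad \text{with iv(k ys*)}, \qquad E[\Delta z_{it} (\alpha_i + \varepsilon_{it})] = 0 \quad \text{with iv(k ys*, d)}\]
For the second condition, the part involving the heterogeneity is \(E[\Delta z_{it} \alpha_i] = E[z_{it} \alpha_i] - E[z_{i,t-1} \alpha_i]\), which is zero whenever the correlation between the instrument and the heterogeneity is constant over time, even if that correlation itself is not zero.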

https://www.kripfganz.de/stata/
Comment
Prateek Bedi

Join Date: Sep 2018

Posts: 199
#80

26 Sep 2019, 12:33

Hello,

I would like to know if we can run difference GMM (Arellano and Bond, 1991) using xtdpdgmm. If yes, what changes should we make to an existing command that runs system GMM through xtdpdgmm? Below is the sample command I am using to run system GMM.

Code:

xtdpdgmm CashHoldings L.CashHoldings Size Leverage, teffects twostep vce(cluster CompanyID) gmmiv(L.CashHoldings, lag(1 1) model(fodev)) gmmiv(Leverage, lag(1 4) collapse model(fodev)) iv(Size, model(level))

Thanks!

Last edited by Prateek Bedi; 26 Sep 2019, 12:35.
Comment
Sebastian Kripfganz

Join Date: May 2014

Posts: 2569
#81

27 Sep 2019, 03:36

Besides iv(Size, model(level)), all instruments of your specification already refer to the transformed model (forward-orthogonal deviations in your case). To obtain the Arellano-Bond estimator, you need to replace model(fodev) by model(diff), and remove the instrument for the level model (or replace it by an appropriate instrument for the first-differenced model).

To obtain the Arellano-Bond estimator in a strict sense, you would also need to specify the time dummies for the first-differenced model. The teffects option always specifies those instruments for the level model. You could replace the teffects option by iv(year*, diff model(diff)), assuming that your time identifier variable is called year, and also specify year* in the list of independent variables.

Also, in the original Arellano-Bond paper, collapsing is not used and the number of lags is not restricted, thus lag(1 .) would be the respective suboption.
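Put together, the changes described above might look roughly as follows. This is a sketch, not a tested specification: the dummy prefix yr_ is hypothetical, iv(Size, diff) assumes that Size is strictly exogenous (drop it otherwise), and noconstant is added on the assumption that no moment condition for the level model, including the one for the constant, should remain.

```stata
* Sketch only; run from a do-file because of the /// line continuations.
* Assumption: the time identifier is called year; yr_ is a hypothetical prefix.
tabulate year, generate(yr_)

* model(diff) as a global option applies to all gmmiv() and iv() below.
* Stata may drop some of the dummies because of collinearity.
xtdpdgmm CashHoldings L.CashHoldings Size Leverage yr_*, model(diff) ///
    twostep vce(cluster CompanyID) noconstant                       ///
    gmmiv(L.CashHoldings, lag(1 .))                                 ///
    gmmiv(Leverage, lag(1 .))                                       ///
    iv(Size, diff)                                                  ///
    iv(yr_*, diff)
```

Compared with the original command, teffects and iv(Size, model(level)) are gone, collapse is removed, and the lag ranges are left open-ended.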

You can find an example on slide 17 of my presentation at this year's London Stata conference:
Kripfganz, S. (2019). Generalized method of moments estimation of linear dynamic panel data models. Proceedings of the 2019 London Stata Conference.

https://www.kripfganz.de/stata/
Comment
alessandro gst

Join Date: Jun 2019

Posts: 62
#82

14 Oct 2019, 13:49

Dear Prof. Kripfganz,
I have an unbalanced panel with N=96 and T=24, and I am trying to estimate an iterated GMM model because it appears preferable to the one-step and two-step estimators, being robust to misspecification of the initial weighting matrix. Of course, since the data are unbalanced, the model should use forward-orthogonal deviations.

I have a series of questions:
1) is it possible to estimate a difference (rather than system) iterated GMM?
2) Assuming an affirmative answer to the first question and that the specifications of lag length hold, is the syntax below a correct implementation of a difference GMM with forward orthogonal deviations?

Code:

xtdpdgmm L(0/1).y l.x $controls_lag, model(fodev) collapse gmm(y, lag(2 4)) gmm(l.x $controls_lag, lag(1 3)) igmm vce(r)

3) Should I be concerned about the number of iterations? I have tried the code above on my data and there are tens of thousands of iterations.
4) Does the vce(r) option report the Windmeijer-corrected standard errors in the iterated version?
5) Do you advise following the Kiviet procedure also to determine lag length and model specification with iterated GMM?
6) Finally, an issue is whether I fully understand the assumptions I am making by using the lags. Does setting lag limits of lag(2 4) mean that I am using only values that are two to four time periods ahead to calculate the forward-orthogonal deviations? Also, can you confirm that I am violating the assumption of the regressor being uncorrelated with the error when I use a lagged regressor (l.x) and a lag range that includes 1 (lag(1 3))?

I thank you in advance for your helpfulness. It is really a great command.
Comment
Sebastian Kripfganz

Join Date: May 2014

Posts: 2569
#83

14 Oct 2019, 14:59

1) Yes. Just specify only those instruments that refer to the differenced model together with the igmm option.
2) With forward-orthogonal deviations, assuming that the idiosyncratic error term is not serially correlated, the first lag of the dependent variable is already a valid instrument: gmm(y, lag(1 4)). Similarly, if your lagged controls are exogenous or predetermined, you can use the contemporaneous terms as well: gmm(l.x $controls_lag, lag(0 3)). (This argument does not apply with first differences instead of forward-orthogonal deviations.) Other than that, the specification appears correct.
3) Tens of thousands of iterations does not sound good. It might indicate that there is a high degree of collinearity among your instruments, which makes it difficult to estimate the optimal weighting matrix. You could check whether removing some of the controls solves this problem. Alternatively, you could limit the number of iterations with the igmmiterate() option. After a few dozen iterations, the additional improvements should typically no longer be large. When the iterated GMM estimator fails to converge, you might also consider sticking to the two-step estimator.
4) Yes, but it only accounts for the previous step (i.e. as if the last step is the result from a two-step estimator).
5) You could use the iterated GMM estimator also in the model specification stage. I do not have a strong opinion on that matter.
6) No, the forward-orthogonal deviations are always computed as deviations from all future observations. The lag limits refer to the instruments. lag(2 4) means that the second to fourth lag of the specified variable are used as instruments. gmm(l.x, lag(1 3)) implies that the first to third lag of the lag of x are used as instruments (i.e. the second to fourth lag of x). These lags need to be uncorrelated with the transformed error term (i.e. the forward-orthogonally deviated error term or the first-differenced error term, depending on your model() specification). This depends on the assumptions you make about x. See slide 67 of my London Stata Conference presentation for forward-orthogonal deviations and slide 11 for first differences.
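Applying points 2) and 3) to the command from the question gives something like the sketch below. The iteration cap of 50 is an arbitrary illustrative value, and the lag ranges are only valid under the assumptions stated in 2): no serial correlation in the idiosyncratic error, and lagged controls that are exogenous or predetermined.

```stata
* Sketch: difference-type iterated GMM with forward-orthogonal deviations.
* y, x and $controls_lag are the names used in the question.
* igmmiterate(50) is an arbitrary cap chosen for illustration.
xtdpdgmm L(0/1).y l.x $controls_lag, model(fodev) collapse ///
    gmm(y, lag(1 4))                                       ///
    gmm(l.x $controls_lag, lag(0 3))                       ///
    igmm igmmiterate(50) vce(r)

* Overidentification test for the instruments used above
estat overid
```

The /// continuations require running this from a do-file. If the estimator still hits the iteration cap, replacing igmm with twostep is the fallback mentioned in 3).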

https://www.kripfganz.de/stata/
2 likes
Comment
Tim Grünebaum

Join Date: Aug 2014

Posts: 49
#84

16 Oct 2019, 08:05

Hello again,

I have a rather technical question I could not answer myself on my way to specify a correct model:
As we know Blundell and Bond (1998) suggested to improve the Arellano and Bond (1991) DiffGMM estimator by adding new moment conditions such that differences of lagged endogenous variables may serve as instruments for lagged endogenous variables, yielding SysGMM.
So I wonder if it would be appropriate to only use these new instruments and drop the original ones from Arellano and Bond. I never saw this in practice before.
This comes from my model rejecting H0 of valid instruments only for the Arellano Bond instruments (thus L.D.y are valid for L.y but L2.y are not valid for D.y).
I tested this using your estat overid and estat overid, diff postestimation commands to identify the problematic set of instruments. The model "works" neither with DiffGMM nor SysGMM as described in your (well written) Stata Conference presentation. So I tried something new...

So can I estimate something like this

Code:

xtdpdgmm L(0/1).n w k, collapse gmm(w k, lag(1 3)) gmm(n, lag(1 1) diff model(level)) gmm(w k, lag(0 0) diff model(level)) two vce(r)

instead of this SysGMM from slide 36 in your presentation or any DiffGMM model?

Code:

xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) gmm(n, lag(1 1) diff model(level)) gmm(w k, lag(0 0) diff model(level)) two vce(r)

For the first model the moment conditions are valid, but only in this case... I also tried the nl(noserial) and nl(iid) options if available/appropriate and this did not solve the rejection of the Hansen test.

Furthermore, what does the rejection of the typical DiffGMM moment conditions tell me about my model and the endogenous variables?
Comment
Sebastian Kripfganz

Join Date: May 2014

Posts: 2569
#85

16 Oct 2019, 08:33

In theory, you can run such a "level GMM" estimation but it is always much less efficient than the system GMM estimation.

The more problematic aspect is the validity of the instruments: If L2.y is not a valid instrument in the first-differenced model, then it is hard to believe that LD.y is a valid instrument in the level model. The validity of L2.y fails if the first-differenced error term has second-order serial correlation. But then it cannot be serially uncorrelated in levels. If the level error term is serially correlated, then LD.y is no longer a valid instrument for the level model.
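A stylized case illustrates the argument. Assume, purely for illustration, a model without further regressors, \(y_{it} = \lambda y_{i,t-1} + \alpha_i + e_{it}\), with an MA(1) error \(e_{it} = v_{it} + \theta v_{i,t-1}\), where \(v_{it}\) is i.i.d. with variance \(\sigma_v^2\) and uncorrelated with \(\alpha_i\). Since \(y_{i,t-2}\) contains \(v_{i,t-2}\) but no later shocks, the moment condition for the first-differenced model becomes
\[E[y_{i,t-2} \Delta e_{it}] = E[y_{i,t-2} e_{it}] - E[y_{i,t-2} e_{i,t-1}] = 0 - \theta \sigma_v^2 = -\theta \sigma_v^2\]
For the level model, using \(\Delta y_{i,t-1} = \lambda \Delta y_{i,t-2} + \Delta e_{i,t-1}\),
\[E[\Delta y_{i,t-1} (\alpha_i + e_{it})] = E[\Delta y_{i,t-1} \alpha_i] + E[e_{i,t-1} e_{it}] = E[\Delta y_{i,t-1} \alpha_i] + \theta \sigma_v^2\]
Even if the initial-conditions assumption \(E[\Delta y_{i,t-1} \alpha_i] = 0\) holds, both expressions are zero only when \(\theta = 0\), so in this example the same serial correlation invalidates both instruments.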

There are at least two explanations for your empirical observations:
It could be that either the overidentification test for the first-differenced model resulted in a type-1 error (i.e. rejecting the null even though it is true) or the overidentification test for the level model resulted in a type-2 error (i.e. not rejecting the null even though it is not true).

More importantly, if you use a difference-in-Hansen test to test the validity of the instruments for the level model after a system GMM estimation, then this test assumes that all the other instruments are valid (in particular all those instruments for the first-differenced model). In other words, if you initially reject the difference GMM estimation based on the respective Hansen test, then you cannot use the difference-in-Hansen test in a system GMM estimation that in part uses these invalid instruments from the first-differenced model.
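The testing order this implies can be sketched as follows. The commands reuse the specification quoted in the question; the abdata dataset is an assumption, and nocons in the first step is added on the assumption that no moment condition for the level model should remain in a pure difference GMM estimation.

```stata
* Sketch under assumptions: abdata as the dataset; nocons in step 1 is an assumption.
webuse abdata, clear

* Step 1: difference GMM only. Judge the instruments for the
* first-differenced model on their own.
xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) nocons two vce(r)
estat serial
estat overid

* Step 2: only if step 1 is not rejected, add the instruments for the
* level model and examine them with the difference-in-Hansen test.
xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) gmm(n, lag(1 1) diff model(level)) gmm(w k, lag(0 0) diff model(level)) two vce(r)
estat overid, difference
```

If the Hansen test in step 1 already rejects, the difference-in-Hansen output from step 2 is not informative, because it takes the step 1 instruments as valid.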

https://www.kripfganz.de/stata/
Comment
Tim Grünebaum

Join Date: Aug 2014

Posts: 49
#86

16 Oct 2019, 09:02

You're probably right. If the DiffGMM instruments already fail, then going further does not make much sense. Only if they were valid could we improve efficiency.
So I suppose that GMM cannot overcome the endogeneity problems in such a case. :-(
I think one could only go for deeper lags in the instruments but this does not help, neither do igmm, collapse or fodev. Unfortunately external instruments are not available.
Comment
Sebastian Kripfganz

Join Date: May 2014

Posts: 2569
#87

16 Oct 2019, 09:08

Another way of dealing with this problem could be to add further lags of the dependent variable, lags of the other independent variables, and even interaction terms to the regression model. The reason for this is that invalidity of the instruments could be a consequence of omitted regressors. Once you add these additional regressors to the model, the same instruments that were invalid before might become valid.
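As a sketch of this suggestion, the difference GMM specification from the earlier posts could be extended with a second lag of the dependent variable and first lags of the other regressors. The dataset (abdata), the choice of which lags to add, and the nocons option are assumptions made for illustration.

```stata
* Sketch: richer dynamics with an unchanged instrument set.
* Assumptions: abdata as the dataset; the added lags are illustrative.
webuse abdata, clear
xtdpdgmm L(0/2).n L(0/1).(w k), model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) nocons two vce(r)

* Check whether the instruments now pass the tests
estat serial
estat overid
```

Comparing the test results with those from the model without the extra lags indicates whether omitted dynamics were the source of the rejection.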

https://www.kripfganz.de/stata/
Comment
Tim Grünebaum

Join Date: Aug 2014

Posts: 49
#88

17 Oct 2019, 03:41

These are some good thoughts. Omitted variables might indeed exist in my analysis since my dependent variable is firm return on assets, a very noisy measure which is hard to predict.
Another approach that looks promising in my application might be to build subsamples for different groups (in my case industries), which should be similar to using interactions to account for varying coefficients. Different subgroups might display different correlations among the variables.
1 like
Comment
Tim Grünebaum

Join Date: Aug 2014

Posts: 49
#89

17 Oct 2019, 05:21

Sorry for double posting.
My Hausman test of nl(iid) against nl(noserial) yielded an error. Do you have any advice?

Code:

. estat hausman iid
                 *:  3200  conformability error
  xtdpdgmm_score():     -  function returned error
         <istmt>:     -  function returned error
Comment
Sebastian Kripfganz

Join Date: May 2014

Posts: 2569
#90

17 Oct 2019, 06:12

That should not happen and certainly looks like a bug in the program. It is difficult to identify the problem without a reproducible example. Would it be possible for you to send me the data and code you used by e-mail?

https://www.kripfganz.de/stata/
Comment
