The title of this post is deliberately provocative: it echoes slides and tweets by Sebastian Kripfganz: “How not to do xtabond2.”
In fact, I only chose that title to at least subliminally flag the downsides of oppositional discourse. I’m not interested in an argument. I haven’t used Sebastian’s xtdpdgmm much, but at a glance it looks excellent. Sebastian’s writings show that he understands the mathematical substance very well, while his obvious attention to detail indicates that he is a good coder. As far as I can see, xtdpdgmm does most of what xtabond2 does, plus more, thanks to its nonlinear framework. It has a more modern syntax. (xtabond2 just turned 17 years old!) And all his benchmarking against other programs surely adds reliability. So xtdpdgmm looks like a great tool for the Stata community.
About the only thing xtabond2 may have going for it is that it runs faster, because it sticks to linear models with analytical solutions, thereby avoiding iterative search. Sometimes speed matters. A lot of times it doesn’t. And even if it does, you might also look at Stata’s built-in commands, xtabond, xtdpd, and xtdpdsys. (That said, I generally distrust dynamic panel regressions using internal/lagged instruments: this, this, this, this.)
Sebastian has for several years been throwing darts at xtabond2. Some of the criticisms are valid. But I believe the majority of the situations he highlights are rare and/or more ambiguous than they may at first appear, making words like “bug” and “incorrect” seem falsely objective, even tendentious. Users who do not follow all of the technical details may not appreciate the ambiguity and rarity of most of the situations Sebastian posits, and may thus worry overmuch about the validity of results they’ve obtained from xtabond2. I’ve heard from users with just such concerns. There is a place for publicly criticizing the work of others: I often replicate, reanalyze, and challenge existing literature. But when the substantive stakes are low, doing so risks looking like tearing down others’ work in order to draw attention to one’s own.
Below I comment on the six items in Sebastian’s recent post, “A collection of bugs in xtabond2, xtabond, xtdpd, xtdpdsys, gmm,” as they relate to xtabond2.
Short version:
1. Yes, I agree that’s a bug. Rare situation. Fixed.
2(a). Yes, that’s a bug, and not rare: it will happen, for example, when time dummies are included with a syntax such as “i.year.” It does not affect estimation results, but it typically decreases by 1 or 2 the degrees of freedom assumed for the Sargan/Hansen tests of overall instrument validity. That bias happens to be conservative: it makes the tests harder to pass. The bug appeared in the last major version of xtabond2, 3.6.3 of September 2015. It is fixed.
2(b). Part of a larger issue that no program handles perfectly (which doesn’t mean xtdpdgmm doesn’t sometimes do better).
2(c). Rare and strange example.
3. Doesn’t mention xtabond2.
4. As an assertion of a bug, ambiguous. And “rare” and of “minor relevance” according to Sebastian.
5. Also “rare” and similar to 2(b).
6. Doesn’t mention xtabond2.
Fuller discussion
1. Sebastian correctly identified a bug in xtabond2: when using the orthogonal deviations transform while also including an mz suboption in an iv() option, if there are observations that are incomplete only in the variables in the iv() option, then the sample will be incorrectly restricted for purposes of constructing instruments. But I daresay the combination of orthogonal and iv(, mz) is rare. At any rate, I have fixed it.
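For concreteness, here is a hypothetical example of the affected combination, of my own construction rather than from Sebastian’s post; the variable choices are arbitrary, using the abdata data set that appears elsewhere in this post:
Code:
webuse abdata, clear
* Hypothetical illustration (mine, not Sebastian's): the forward orthogonal
* deviations transform together with an iv() option carrying the mz suboption,
* which substitutes zeros for missing values in the instruments. Before
* version 3.7.1, observations incomplete only in w or k could incorrectly
* shrink the sample used to construct the instruments.
xtabond2 n L.n w k, gmm(L.n) iv(w k, mz) orthogonal robust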
2(a). I introduced a bug in version 3.6.3 of 30 September 2015: xtabond2 was sometimes counting dropped regressors, such as a time dummy for a period outside the estimation sample, when computing the degrees of freedom for the Sargan and Hansen tests. This did not affect estimation results, but it imparted a conservative bias to those tests: slight if the decrease was from, say, 37 to 35, greater if the decrease was from, say, 5 to 3. This is fixed.
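Schematically, a command of the affected kind would look something like the following; this is an assumed example of mine, not one from Sebastian’s post:
Code:
webuse abdata, clear
* Assumed illustration: with factor-variable time dummies, the dummies for
* years outside the final estimation sample are dropped as regressors.
* Versions 3.6.3 through 3.7.0 could still count such dropped terms when
* computing the Sargan/Hansen degrees of freedom.
xtabond2 n L.n w i.year, gmm(L.n) iv(w i.year) robust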
(See below for 2(b).)
2(c) points out that in performing a difference-in-Sargan/Hansen test for a subset of instruments that together provide the only identifying information for some explanator, xtabond2 does not adjust for the fact that in the comparison regression without those instruments, the explanator is not identified and is effectively dropped. Dropping the explanator leaves one fewer parameter to estimate. In particular, it raises by 1 the degrees of freedom for the magnitude of the minimized moments (the J statistic) in that comparison regression: with m moments and k parameters, the full regression’s J has m − k degrees of freedom, while after removing two instruments and the orphaned explanator, the comparison J has (m − 2) − (k − 1). Thus it decreases by 1 the degrees of freedom for the difference between the full regression’s J and the instrument-starved regression’s J. And xtabond2 does not account for this shift when it goes to compute the significance of the difference-in-J.
Actually, ivreg2 doesn’t either. In the following example, both xtabond2 and ivreg2 report a difference test for the instruments k and ys with 2 degrees of freedom where Sebastian reasonably argues it should be 1:
Code:
webuse abdata, clear
replace w = 0 if _n<500
replace k = 0 if _n<500
replace ys = 0 if _n<500
replace rec = 0 if _n>=500
ivreg2 n (w = rec k ys), nocons orthog(k ys)
xtabond2 n w, iv(rec, eq(lev)) iv(k ys, eq(level)) h(1) nocons
How concerned should we be about this behavior in xtabond2? I think not very. The circumstances under which it can occur seem rare and strange. Not obvious in Sebastian’s example is that the instrumented variable of interest, ed, as well as its instruments, fem and blk, are constant over time. It would be pointless to include fixed regressors in Difference GMM because they get differenced out. It is more meaningful, but still I think somewhat unusual, to include them in System GMM, as in the example; System GMM retains an equation in levels, where fixed regressors and their instruments survive. I believe it is unusual because the spirit of System GMM is to treat fixed effects as present but unobserved, for including them as observed regressors in a short dynamic panel induces dynamic panel bias. What I believe is quite rare is to treat them as endogenous: think of a cross-country regression in which the dummy for Africa is instrumented with other fixed factors. Indeed, the careful Kripfganz and Schwarz (2019) study of how to estimate coefficients on fixed variables in dynamic panels starts by assuming the fixed variables are exogenous.
As noted, in Sebastian's example, ed, fem, and blk disappear from the difference equation. They appear in the levels equation—along with little else, for the other instruments are restricted to the difference equation. As a result, when fem and blk are dropped as instruments in order to perform the difference-in-Sargan/Hansen test, ed is zero only where the remaining instruments are non-zero (in the difference equation), and non-zero where the remaining instruments are zero (in the levels equation). It is perfectly unidentified. But to get to that point requires a strange construction: a short dynamic panel containing a fixed endogenous variable with fixed instruments.
4. Sebastian points out that a specification such as gmm(x, lag(2 2)) can produce different results from gmm(L.x, lag(1 1)) even though the two seem to mean the same thing, “although I believe this is a rare phenomenon.” This happens when a panel is stored with gaps. Suppose a factory has rows in a data set for 1978 and 1980 but not 1979. (That’s perhaps the rare bit.) To compute gmm(x, lag(2 2)) for the factory for 1980, xtabond2 simply grabs the value of x for 1978, as desired. But to compute gmm(L.x, lag(1 1)), xtabond2 first computes and stores L.x for all available observations (just as it would if the specification requested D.x); but since 1979 is not in the data set for this factory, the value of L.x for 1979, i.e., the value of x for 1978, goes into a black hole. Then when xtabond2 goes to find the once-lag of L.x for 1980, in obedience to the lag(1 1) specification, it comes up empty-handed, and produces a missing value.
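To see the black hole in miniature, here is a tiny constructed panel of my own (not from Sebastian’s post); only the lag mechanics matter:
Code:
clear
input id year x
1 1978 5
1 1980 7
1 1981 8
end
xtset id year
gen Lx  = L.x   // missing in 1980: there is no 1979 row to draw the value from
gen L2x = L2.x  // but in 1980, L2.x still retrieves the 1978 value of x
list, sepby(id)
Lagging Lx once more, as the lag(1 1) specification directs, then yields a missing value for 1980 even though x is observed for 1978.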
While recognizing the downside, I favor the status quo. It arguably constitutes a literal interpretation of a user’s orders, and there’s something to be said for software doing exactly what it’s told. gmm(L.x, lag(1 1)) arguably tells the computer to temporarily add L.x to the existing data set (again, in analogy with D.x), then shift it forward. That’s what xtabond2 does.
Put this another way. I could end this counterintuitive behavior by making xtabond2 first execute the tsfill command, which adds rows to a data set in order to eliminate gaps. Then the 1979 observation of L.x would not fall into a black hole, because there would be a data row waiting to hold it. But doing that would generate a new counterintuitive behavior. The following two estimates would produce different results:
Code:
gen Llrfdi = L.lrfdi
xtabond2 L(0/1).lrfdi, gmm(Llrfdi, lag(1 3) eq(diff))
xtabond2 L(0/1).lrfdi, gmm(L.lrfdi, lag(1 3) eq(diff))
The first would reproduce xtabond2’s current (“incorrect”) behavior. The second would produce the evidently “correct” behavior. Yet the pairing could then be held up as an internal contradiction and labeled a “bug.” I copied that example from a message I sent Sebastian in 2018. In response, he wrote that adding tsfill “is probably not worth implementing…in xtabond2…given the minor relevance of this issue.” In his recent post, Sebastian provides further evidence of the ambiguity of this situation, showing that various built-in Stata programs handle it differently.
Items 2(b) and 5 have to do with assessing collinearity among instruments, especially as it relates to the degrees of freedom assumed in overidentification tests. In my view, this is a more complicated business than is implied by binaristic words such as “correct” and “bug.”
To explain why, I will start by demonstrating a “bug” in the degrees-of-freedom calculation in xtdpdgmm. A common situation in dynamic panels is for the variables to be plausibly I(1) (like random walks), with the period-to-period changes being small relative to the magnitudes of the variables. The instruments that characterize Difference GMM are based on lagged observations of such variables. Where the lags are unavailable, a zero is imputed. As a result, Difference GMM instrument matrices can be rather ill-conditioned, with the crucial identifying variation—the period-to-period changes—being small relative to the total variation in the matrix. A period-to-period change from 999 to 1000 is small in a matrix whose entries are mostly 0 or near 1000. This can lead to numerical imprecision and instability, including in the calculation of how many columns of the instrument matrix are linearly independent. And that is an input to the degrees of freedom (df) for the overidentification tests. In turn, changing the df changes the p value.
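The scale dependence is easy to see directly. Here is a minimal Mata sketch, my own illustration rather than anything from either program, of how adding a large shared level inflates a matrix’s condition number even though the independent variation is unchanged:
Code:
mata:
// My illustration: columns of independent standard-normal noise are well-
// conditioned; adding a large common level makes them look nearly collinear
// to the computer, though the identifying variation is identical.
X = rnormal(100, 5, 0, 1)
cond(X)          // modest condition number
cond(1000 :+ X)  // far larger: the variation is now tiny relative to the level
end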
The following code constructs a 100x10 (wide, short) panel with a single variable y whose initial value and period-to-period changes are standard Gaussian. It runs one-step Difference GMM on the panel. Then it repeats the exercise after adding a fixed amount to all y observations, starting at 1000 and rising to 100,000. Since the regressors are differenced while the instruments remain in levels, this has the effect of leaving the regressors unchanged while adding a shared constant to all the instruments. As this constant increases, xtdpdgmm perceives the instrument matrix as more and more collinear, and reduces the degrees of freedom in the reported Sargan overidentification test, from 35 to 7. Yet the number of degrees of freedom in the true data generation process (DGP) does not change. So one could say that the results reported by xtdpdgmm are “incorrect” and the program has a “bug.”
But I would not say that. A superficial reason is that xtabond2 behaves the same way (ahem). Before going deeper, I’ll show you the code and the resulting graph. The horizontal axis is that fixed increment added to the entire data set. It is labeled “E[y]” since it determines the average value of y. If xtdpdgmm and xtabond2 worked perfectly, all the curves would be perfectly horizontal. Instead, for both, the df’s decline and the test statistics and p values are volatile:
Code:
scalar N = 100 // panel width
scalar T = 10  // panel depth
set seed 234908720
drop _all
set obs `=N'
gen id = _n
gen double _y = rnormal()
expand `=T'
sort id
gen t = mod(_n-1, T)+1
xtset id t
replace _y = L._y + rnormal() if t > 1  // for each id, make y a random walk
gen double y = .
cap mat drop plot
forvalues Ey=0e4(.1e4)10e4 {
    replace y = _y + `Ey'
    xtdpdgmm y L.y, gmm(L.y) nocons m(d)
    estat overid
    mat _plot = r(chi2_J), r(df_J), r(p_J)
    xtabond2 y L.y, gmm(L.y) nolevel
    mat plot = nullmat(plot) \ `Ey', _plot, e(sargan), e(sar_df), e(sarganp)
}
mat colnames plot = Ey xtdpdgmm_chi2 xtdpdgmm_df xtdpdgmm_p xtabond2_chi2 xtabond2_df xtabond2_p
svmat plot, names(col)
set scheme s1color
line xtdpdgmm_chi2 xtdpdgmm_df xtabond2_chi2 xtabond2_df Ey, ///
    lpat(solid solid dash dash) lstyle(p1 p2 p1 p2) || ///
    line xtdpdgmm_p xtabond2_p Ey, lpat(solid dash) lstyle(p3 p3) yaxis(2) ///
    ytitle("chi2, df") ytitle(p, axis(2)) xtitle(E[y]) legend(cols(3) order(1 2 5 3 4 6))
More to the point, I don’t call this a bug because I understand that the cause runs deep: one can only expect so much perfection from software. (Not that there isn't room for improvement, which Sebastian might explore.) In this case, the general-purpose tools xtdpdgmm and xtabond2 don’t “know” the true DGP and so cannot always be certain of its df even after examining the data. This is particularly the case in these GMM estimators, with their propensity to generate many similar instruments. Instruments such as successive observations of employment at a factory may appear nearly collinear to the computer, even when humans who understand the process that generated them know the variables to be partially independent. On the other hand, variables that are truly collinear may appear inexactly collinear to the computer, because of rounding to finite precision. So software must check whether instruments are nearly multicollinear, using a definition of “nearly” that may or may not do a good job in any given situation of balancing Type I and Type II errors.
Once we recognize that computing the number of independent instruments, thus the df for the overidentification test, is an art, we enter the non-binary realm of design tradeoffs. xtabond2 never actually analyzes the entire instrument matrix for collinearity. One reason is that this could be computationally expensive if observations and instruments are numerous. Another is that it would be prohibitively difficult in xtabond2’s space-saving mode—a feature added in 2005 when computers had less memory. In this mode, the entire instrument matrix never exists in memory all at once. Rather, xtabond2 estimates the rank of the instrument matrix Z as the rank of Z’HZ (see How to Do xtabond2 for definitions), which as the graph shows doesn’t always work. Finally, there is the concern about reproducibility: as xtabond2 has been used by so many for so long, I hesitate to change its logic in a way that will affect results, when the considerations in favor are ambiguous.
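To see why estimating the rank of Z from Z’HZ can fail, note that forming a cross-product roughly squares a matrix’s condition number, which can push nearly collinear columns below the rank tolerance. Here is a minimal Mata sketch, my own and with H = I for simplicity (this is not xtabond2’s actual code):
Code:
mata:
// My illustration with H = I: Z has full column rank, and rank() applied to
// Z finds it; applied to the cross-product Z'Z, whose condition number is
// roughly the square of Z's, it can report fewer independent columns.
Z = 1e8 :+ rnormal(1000, 5, 0, 1)
rank(Z)        // 5
rank(Z' * Z)   // typically fewer than 5
end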
In general we should not be surprised when working with these large instrument collections if the df or even the estimation results shift in response to seemingly innocuous specification changes.
Case in point: Sebastian’s item 5 (also described as a “rare phenomenon”). There, duplicating a variable in an instrument list changes the results. I could make xtabond2 check for duplicate names. But what if the same variable is added twice under different names? Then I’m sliding down the slippery slope toward checking the full instrument matrix for multicollinearity, which I just argued against. And if I merely added a check for duplicate names, then including the same variable twice as an instrument could produce different results depending on whether the two copies had the same or different names. That could then be called a “bug.” Moreover, in this case, xtabond2 warns that in order to perform two-step GMM it is “inverting” a non-invertible matrix, which is a sign that the “optimal” and “efficient” two-step GMM results actually can’t be determined. Jitteriness in the results should be interpreted in this light.
You can install the latest version of xtabond2, which addresses items 1 and 2(a), with
Code:
net install xtabond2, replace from(https://raw.github.com/droodman/xtabond2/v3.7.1)
It should be available on SSC any day now, after which you can also do
Code:
ssc install xtabond2, replace