In my posts here on Statalist, I frequently refer to some bugs in popular commands for the GMM estimation of linear dynamic panel data models. I decided to compile a list of them to keep track of them and to have them all in a single place for reference. For brevity, I do not show output here, but all examples are replicable using publicly available data sets. The code for each example is listed further below, after the version information.
1. Forward-orthogonal deviations in xtabond2:
The following two specifications should yield identical results because collapsing GMM-type instruments is equivalent to using standard instruments. However, the results differ, and the estimates from the second specification are incorrect.
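The instrument matrices show why the two specifications should coincide. The sketch below uses generic notation of my own, not notation from the xtabond2 documentation. Take individual i and a variable x instrumented with lags a to b:

- Uncollapsed GMM-type instruments have a separate column for every combination of time period and lag.
- Collapsing adds up the columns that belong to the same lag, which leaves one column per lag.

Write \(\tilde{x}_{is}\) for \(x_{is}\) with unavailable values set to zero. Let \(t_1, \ldots, T_i\) be the periods of the estimated equation for individual i. The collapsed matrix is then:

```latex
\[
Z_i^{\text{collapsed}} =
\begin{pmatrix}
\tilde{x}_{i,t_1-a} & \tilde{x}_{i,t_1-a-1} & \cdots & \tilde{x}_{i,t_1-b} \\
\vdots              & \vdots                &        & \vdots              \\
\tilde{x}_{i,T_i-a} & \tilde{x}_{i,T_i-a-1} & \cdots & \tilde{x}_{i,T_i-b}
\end{pmatrix},
\qquad
\tilde{x}_{is} =
\begin{cases}
x_{is}, & \text{if } x_{is} \text{ is available},\\
0,      & \text{otherwise}.
\end{cases}
\]
```

Each column is simply one lag of x with missing values replaced by zeros. Standard instruments L(a/b).x with xtabond2's mz suboption produce the same matrix. Both specifications therefore use the same instruments and should give the same estimates.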
It is hard to come up with a complete list of circumstances under which xtabond2 with option orthogonal produces incorrect results. I recommend always double-checking the results, for example with my xtdpdgmm command.
Notice that the lags in the xtabond2 command lines are shifted by one time period compared to xtdpdgmm. This is a "feature" of xtabond2 that can easily lead to confusion: the second lag of an instrument for the forward-orthogonally transformed model in xtabond2 is actually the first lag of that variable. The official xtdpd command with option fodeviation applies the same one-period shift as xtabond2.
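The forward-orthogonal deviations transformation explains why the first lag is already a valid instrument in the transformed model. The sketch below makes some simplifying assumptions:

- The model is a simple AR(1) panel model without further regressors.
- The idiosyncratic errors \(u_{it}\) are serially uncorrelated, and uncorrelated with \(\alpha_i\) and with the initial observations.
- \(T_i\) denotes the last period of individual i.

```latex
\[
y_{it} = \rho\, y_{i,t-1} + \alpha_i + u_{it}
\]
\[
x_{it}^{*} = \sqrt{\frac{T_i - t}{T_i - t + 1}}
\left( x_{it} - \frac{1}{T_i - t} \sum_{s=t+1}^{T_i} x_{is} \right),
\qquad t = 1, \ldots, T_i - 1
\]
\[
u_{it}^{*} \text{ depends only on } u_{it}, u_{i,t+1}, \ldots, u_{iT_i}
\quad \Longrightarrow \quad
E\left[ y_{i,t-1} \, u_{it}^{*} \right] = 0
\]
\[
\Delta u_{it} = u_{it} - u_{i,t-1}
\quad \Longrightarrow \quad
E\left[ y_{i,t-2} \, \Delta u_{it} \right] = 0,
\qquad
E\left[ y_{i,t-1} \, \Delta u_{it} \right] \neq 0
\]
```

The earliest valid lag of the dependent variable therefore differs between the two transformations:

- In the forward-orthogonally transformed model, it is \(y_{i,t-1}\).
- In first differences, it is \(y_{i,t-2}\).

xtdpdgmm with model(fodev) states the first case directly as lag(1 3). xtabond2 with option orthogonal, like xtdpd with fodeviation, expresses the same instruments as lag(2 4).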
2. Overidentification tests in xtabond2:
(a) When coefficients in the xtabond2 output are displayed as "omitted" or "empty", the degrees of freedom of the Sargan/Hansen overidentification tests are incorrect. Consequently, the p-values are also incorrect (too small). The problem is that xtabond2 treats the omitted coefficients as if they were estimated and reduces the degrees of freedom accordingly. This happens frequently when time dummies (or other dummy variables) are specified with factor-variable notation. The following specifications yield identical coefficient estimates, but the degrees of freedom and p-values of the overidentification tests are incorrect for the second specification.
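Until this is fixed, the p-value can be recomputed by hand. The Hansen test has degrees of freedom equal to the number of instruments minus the number of coefficients that are actually estimated. Below is a minimal sketch using Stata's chi2tail() function. It assumes that the reported instrument count is correct and that the only problem is the omitted coefficients. All local macro names and numbers are hypothetical placeholders; replace them with the values from your own xtabond2 output. Run it as a do-file because of the inline comments.

```stata
* Hypothetical values: replace them with the numbers from your own xtabond2 output
local hansen   = 12.5   // reported Hansen J statistic
local ninst    = 20     // reported number of instruments
local ncoef    = 10     // coefficients listed, including "omitted"/"empty" ones
local nomitted = 2      // coefficients marked as "omitted" or "empty"

* degrees of freedom as described above (omitted coefficients subtracted)
local df_xtabond2 = `ninst' - `ncoef'
* correct degrees of freedom: only the actually estimated coefficients count
local df_correct  = `ninst' - (`ncoef' - `nomitted')

display "df = `df_xtabond2', p = " chi2tail(`df_xtabond2', `hansen')
display "df = `df_correct', p = " chi2tail(`df_correct', `hansen')
```

For a given test statistic, the chi-squared upper-tail probability increases with the degrees of freedom. The corrected p-value is therefore larger, which is consistent with the reported p-values being too small.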
(b) In some situations with a non-default weighting matrix, i.e. h(1) or h(2), xtabond2 reports an instrument count that is too large because it does not detect the perfect multicollinearity among the instruments. The following two specifications yield identical estimates. In the second specification, the additional time dummies as instruments for the level model are redundant because they are perfectly collinear with the time-dummy instruments for the first-differenced model. Yet, xtabond2 does not recognize this redundancy and reports 3 instruments too many in the second specification. As a consequence, the degrees of freedom of the overidentification tests are too large by this amount in the second specification, and hence the p-values are too large as well.
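What matters for the degrees of freedom is the number of linearly independent instruments, i.e. the rank of the instrument matrix, not its number of columns. Below is a toy illustration in Mata with a made-up 4 x 3 matrix Z whose third column is the sum of the first two. It is not the instrument matrix of the specifications above.

```stata
mata:
// Made-up instrument matrix: column 3 = column 1 + column 2
Z = (1, 0, 1 \ 0, 1, 1 \ 1, 0, 1 \ 0, 1, 1)
cols(Z)    // number of instrument columns: 3
rank(Z)    // number of linearly independent instruments: 2
end
```

Failing to detect such a linear dependence has the same effect as counting columns instead of the rank. The instrument count, and with it the degrees of freedom, is overstated by the number of redundant columns.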
(c) In the following situation, xtabond2 reports too many degrees of freedom for the Difference-in-Sargan/Hansen test of iv(fem blk), and therefore a p-value that is too large. The correct degrees of freedom are 1 instead of 2. The reason is that after removing the instruments iv(fem blk) from the model, the coefficient of ed is no longer identified.
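The degrees-of-freedom arithmetic can be written out as follows:

- Let L be the number of instruments and K the number of coefficients in the full model.
- The Difference-in-Hansen statistic compares the full model with the model without the tested instruments, and its degrees of freedom are the difference of the two models' degrees of freedom.
- Dropping fem and blk removes the only source of identification for the coefficient of ed, so the restricted model has K - 1 estimable coefficients.

```latex
\[
\begin{aligned}
\text{full model:}                           \quad & df_{\text{full}} = L - K, \\
\text{without } \texttt{iv(fem blk)}:       \quad & df_{\text{restricted}} = (L - 2) - (K - 1), \\
\text{Difference-in-Hansen test:}            \quad & df_{\text{full}} - df_{\text{restricted}} = 1 .
\end{aligned}
\]
```

Suppose the restricted model is counted as still having K coefficients. The difference then becomes (L - K) - (L - 2 - K) = 2, which matches the number reported by xtabond2. Consider a hypothetical test statistic of 3.84: the p-value would be about 0.050 with 1 degree of freedom, but about 0.147 with 2 degrees of freedom.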
Compare with xtdpdgmm, which reports the correct degrees of freedom and p-value.
3. Time dummies in xtabond, xtdpd, and xtdpdsys:
In the following specification, the official xtdpd command drops the time dummy for the year 1984 from the list of regressors due to alleged collinearity. However, there actually is no such collinearity problem. Because xtabond and xtdpdsys are just wrappers for xtdpd, the same problem is present in those two commands (not explicitly shown here).
For the same model specification, xtabond2 and xtdpdgmm correctly do not drop that time dummy.
4. Unbalanced panels in xtabond2, xtabond, xtdpd, xtdpdsys, and gmm:
The following example shows a problem that can occur in some cases of unbalanced panels, although I believe this is a rare phenomenon. The data set used here can be downloaded from the JAE Data Archive. The two specifications should be equivalent, because lags 2 to 4 of lrfdi are the same variables as lags 1 to 3 of L.lrfdi, but the results of the second xtabond2 specification are incorrect.
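The equivalence rests on an identity of Stata's time-series operators: lag k+1 of a variable is lag k of its first lag. Lags are defined on the xtset time variable, so the identity also holds in unbalanced panels. Below is a quick check. It uses the abdata data set only because it is readily available via webuse; it is not the data set from the JAE Data Archive. The variable names Ln, a1-a3, and b1-b3 are made up for this illustration.

```stata
webuse abdata, clear                  // unbalanced panel, already xtset
generate double Ln = L.n              // first lag of n
forvalues j = 1/3 {
    local k = `j' + 1
    generate double a`j' = L`k'.n     // lag j+1 of n
    generate double b`j' = L`j'.Ln    // lag j of L.n
    assert a`j' == b`j'               // also holds where both are missing
}
```

The instrument values coincide observation by observation, so the two specifications use identical instruments. Any discrepancy in the estimates therefore cannot stem from the data themselves.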
A similar issue arises with xtdpd (and thus also with xtabond and xtdpdsys). Interestingly, the following two estimates differ not only from each other but also from the xtabond2 results.
To maximize confusion, the official gmm command yields yet another set of results, different from all of those before.
Compare with the corresponding results from xtdpdgmm, which are identical in both specifications and also equal those of the first xtabond2 specification.
5. Collinearity among instruments in xtabond2 two-step estimation:
The following example is again a rare phenomenon, and I could not really replicate it with a simpler model. What happens here is that the two-step estimation results change when a redundant instrument (wks_ed4) is added to the second specification. Note that the total number of instruments reported by xtabond2 remains unchanged.
Note that this problem does not occur with the one-step estimator. The following two results are identical despite the added instrument.
6. Option diffvars() in xtabond:
Option diffvars() of the official xtabond command adds strictly exogenous regressors to the first-differenced model, together with the respective standard instruments. However, in the regression output those regressors appear as if they were added to the untransformed level model. In the second specification of the following example, the estimated coefficients are correct but predictions with the postestimation command predict would be incorrect.
This problem has existed since Stata 10. Prior to that, xtabond reported the results for the first-differenced model, not the level model. The diffvars() option should have been removed with that change.
Version information:
Stata version 16.1, update level 29 Sep 2020
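Code for item 1 (forward-orthogonal deviations), xtabond2:
```stata
webuse abdata, clear
* collapsed GMM-type instruments for w
xtabond2 L(0/1).n w, orthogonal gmm(n, lag(2 4)) gmm(w, lag(1 3) collapse) nolevel robust nodiffsargan
* the same instruments for w as standard instruments; these estimates are incorrect
xtabond2 L(0/1).n w, orthogonal gmm(n, lag(2 4)) iv(L(1/3).w, passthru mz) nolevel robust nodiffsargan
```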
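Code for item 1, xtdpdgmm (lags shifted by one period relative to xtabond2):
```stata
* collapsed GMM-type instruments for w
xtdpdgmm L(0/1).n w, model(fodev) gmm(n, lag(1 3)) gmm(w, lag(0 2) collapse) nocons vce(robust)
* standard instruments for w
xtdpdgmm L(0/1).n w, model(fodev) gmm(n, lag(1 3)) iv(w, lag(0 2)) nocons vce(robust)
```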
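Code for item 2(a):
```stata
webuse abdata, clear
* time dummies as individual variables
xtabond2 L(0/1).n yr1978-yr1984, iv(yr1978-yr1984, eq(level)) gmm(n, lag(2 4) eq(diff)) robust
* time dummies in factor-variable notation: incorrect df and p-values of the overidentification tests
xtabond2 L(0/1).n i.year, iv(i.year, eq(level)) gmm(n, lag(2 4) eq(diff)) robust
```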
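Code for item 2(b):
```stata
webuse abdata, clear
keep if year > 1977 & year < 1983
xtabond2 L(0/1).n yr1980-yr1982, h(1) iv(yr1980-yr1982, eq(diff)) gmm(n, lag(2 4) eq(diff)) robust
* redundant time-dummy instruments for the level model: instrument count too large by 3
xtabond2 L(0/1).n yr1980-yr1982, h(1) iv(yr1980-yr1982, eq(diff)) iv(yr1980-yr1982, eq(level)) gmm(n, lag(2 4) eq(diff)) robust
```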
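Code for item 2(c), xtabond2:
```stata
webuse psidextract, clear
* Difference-in-Hansen test of iv(fem blk): 2 instead of 1 degrees of freedom
xtabond2 L(0/1).lwage ed, gmm(lwage, lag(2 4) eq(diff)) iv(fem blk, eq(level)) twostep robust
```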
Code for item 2(c), xtdpdgmm:
```stata
xtdpdgmm L(0/1).lwage ed, gmm(lwage, lag(2 4) model(diff)) iv(fem blk, model(level)) twostep vce(robust) overid
estat overid, difference
```
Code for item 3, xtdpd:
```stata
webuse abdata, clear
* drops the time dummy yr1984 due to alleged collinearity
xtdpd L(0/1).n yr1978-yr1984, dgmm(n, lag(2 4)) liv(yr1978-yr1984) vce(robust)
```
Code for item 3, xtabond2 and xtdpdgmm (neither drops yr1984):
```stata
xtabond2 L(0/1).n yr1978-yr1984, h(2) gmm(n, lag(2 4) eq(diff)) iv(yr1978-yr1984, eq(level)) robust
xtdpdgmm L(0/1).n, teffects wmat(ind) gmm(n, lag(2 4) model(diff)) iv(yr1978-yr1984, model(level)) vce(robust)
```
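Code for item 4, xtabond2 (data set from the JAE Data Archive):
```stata
use data_us.dta, clear
egen id = group(codeim ind)
xtset id year
xtabond2 L(0/1).lrfdi, gmm(lrfdi, lag(2 4) eq(diff)) nocons robust
* should be equivalent to the first specification, but these results are incorrect
xtabond2 L(0/1).lrfdi, gmm(L.lrfdi, lag(1 3) eq(diff)) nocons robust
```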
Code for item 4, xtdpd:
```stata
xtdpd L(0/1).lrfdi, dgmm(lrfdi, lag(2 4)) nocons vce(robust)
xtdpd L(0/1).lrfdi, dgmm(L.lrfdi, lag(1 3)) nocons vce(robust)
```
Code for item 4, gmm:
```stata
gmm (D.lrfdi - {b} * LD.lrfdi), xtinst(lrfdi, lag(2/4)) inst(, nocons) winit(xt D) onestep vce(robust)
gmm (D.lrfdi - {b} * LD.lrfdi), xtinst(L.lrfdi, lag(1/3)) inst(, nocons) winit(xt D) onestep vce(robust)
```
Code for item 4, xtdpdgmm:
```stata
xtdpdgmm L(0/1).lrfdi, gmm(lrfdi, lag(2 4) model(diff)) vce(robust)
xtdpdgmm L(0/1).lrfdi, gmm(L.lrfdi, lag(1 3) model(diff)) vce(robust)
```
Code for item 5, two-step estimation:
```stata
webuse psidextract, clear
forvalues i = 4/17 {
    gen wks_ed`i' = c.wks#`i'.ed
}
xtabond2 L(0/1).lwage wks union wks_ed5-wks_ed17, twostep iv(LD.(wks_ed5-wks_ed17), mz eq(level)) gmm(wks_ed4-wks_ed17, lag(2 3) collapse eq(diff)) gmm(L.lwage wks, lag(1 .) eq(diff)) iv(L.union, passthru eq(diff)) gmm(L.lwage wks, lag(0 0) eq(level)) iv(D.union, eq(level)) nodiffsargan
* as above, but iv() also includes the redundant instrument LD.wks_ed4; the two-step results change
xtabond2 L(0/1).lwage wks union wks_ed5-wks_ed17, twostep iv(LD.(wks_ed4-wks_ed17), mz eq(level)) gmm(wks_ed4-wks_ed17, lag(2 3) collapse eq(diff)) gmm(L.lwage wks, lag(1 .) eq(diff)) iv(L.union, passthru eq(diff)) gmm(L.lwage wks, lag(0 0) eq(level)) iv(D.union, eq(level)) nodiffsargan
```
Code for item 5, one-step estimation (identical results):
```stata
xtabond2 L(0/1).lwage wks union wks_ed5-wks_ed17, iv(LD.(wks_ed5-wks_ed17), mz eq(level)) gmm(wks_ed4-wks_ed17, lag(2 3) collapse eq(diff)) gmm(L.lwage wks, lag(1 .) eq(diff)) iv(L.union, passthru eq(diff)) gmm(L.lwage wks, lag(0 0) eq(level)) iv(D.union, eq(level)) nodiffsargan
xtabond2 L(0/1).lwage wks union wks_ed5-wks_ed17, iv(LD.(wks_ed4-wks_ed17), mz eq(level)) gmm(wks_ed4-wks_ed17, lag(2 3) collapse eq(diff)) gmm(L.lwage wks, lag(1 .) eq(diff)) iv(L.union, passthru eq(diff)) gmm(L.lwage wks, lag(0 0) eq(level)) iv(D.union, eq(level)) nodiffsargan
```
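Code for item 6:
```stata
webuse abdata, clear
xtabond n w, nocons vce(robust)
predict xb1
* same model with D.w added via diffvars(): correct coefficients, incorrect predictions
xtabond n, diffvars(D.w) nocons vce(robust)
predict xb2
summarize xb1 xb2
```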
Version information, output of which:
```text
. which xtabond2
c:\ado\plus\x\xtabond2.ado
*! xtabond2 3.6.3 30 September 2015
*! Copyright (C) 2015 David Roodman
. which xtabond
C:\Program Files\Stata16\ado\base\x\xtabond.ado
*! version 4.2.0 21jun2018
. which xtdpd
C:\Program Files\Stata16\ado\base\x\xtdpd.ado
*! version 1.6.0 21jun2018
. which gmm
C:\Program Files\Stata16\ado\base\g\gmm.ado
*! version 2.2.0 30nov2018
. which xtdpdgmm
c:\ado\plus\x\xtdpdgmm.ado
*! version 2.3.1 08oct2020
*! Sebastian Kripfganz, www.kripfganz.de
```
Some of these bugs I have mentioned already in my 2019 London Stata Conference presentation:
- Kripfganz, S. (2019). Generalized method of moments estimation of linear dynamic panel data models. Proceedings of the 2019 London Stata Conference.