Dear Statalisters,
I have just released a new Stata command for the estimation of linear panel data models. The main purpose of the xtseqreg command is to implement the two-stage estimation procedure described in my working paper with Claudia Schwarz for linear (dynamic) panel data models with time-invariant regressors. In that paper, we suggest first regressing the dependent variable on the time-varying regressors only, and then regressing the first-stage residuals on the time-invariant regressors in a second stage. Instruments can be used at both stages, and efficient estimation can be achieved with two-step GMM. At the second stage, the usual standard errors are invalid because they ignore the first-stage estimation error, so they need to be corrected. Providing the corresponding analytical standard-error correction is the central feature of this new command. For full details about the methodology and its benefits, please have a look at the paper.
Yet, the new command itself is much more flexible because it can also be used for IV/GMM estimation of a single stage only. It then mimics (part of) the behavior of existing commands for instrumental variable and GMM estimation of linear panel data models, in particular xtdpd and xtabond2 in the context of dynamic models. The other commands can do some things that my command cannot, but mine also adds some flexibility that the others do not offer. However, I want to emphasize that I do not intend this new command as a competitor to the existing ones. Re-implementing these GMM estimators was simply necessary to obtain the above-mentioned standard-error correction.
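As a rough illustration of the two-stage idea described above (not of how xtseqreg itself is implemented), the following do-file sketch simulates a static panel, uses the fixed-effects within estimator as a stand-in for the first stage, and then regresses the first-stage residuals on a time-invariant regressor. All variable names and parameter values are made up for the illustration, and no instruments are used. The second-stage standard errors printed by regress are the uncorrected ones that the text warns about.

```stata
* Simulated panel: 500 units, 6 periods (all names and values are hypothetical)
clear
set seed 12345
set obs 500
generate id = _n
* f: time-invariant regressor; a: unobserved unit effect, independent of f
generate f = rnormal()
generate a = rnormal()
expand 6
bysort id: generate t = _n
xtset id t
* x: time-varying regressor, correlated with the unit effect
generate x = rnormal() + 0.5*a
generate y = 1.0*x + 0.5*f + a + rnormal()

* Stage 1: time-varying regressors only (within estimator as a stand-in)
xtreg y x, fe
generate double res = y - _b[x]*x

* Stage 2: first-stage residuals on the time-invariant regressor
* (these standard errors ignore the first-stage estimation error)
regress res f, vce(cluster id)
```

With this design the second-stage coefficient on f should be close to the simulated value of 0.5, but its standard error would still need the kind of correction that xtseqreg provides.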
The new command is currently only available for installation from my own website and not yet from SSC:
After the installation, detailed documentation of the syntax and available options can be found in the help files:
As always, comments and suggestions are welcome and highly appreciated.
Here is a brief example for a two-stage estimation of a dynamic Mincer equation. At the first stage, the log-wages are regressed on the time-varying regressors. The estimator is a two-step difference-GMM estimator (Arellano/Bond) with collapsed GMM-type instruments for the 2 lags of the dependent variable, standard instruments for the strictly exogenous regressors, and Windmeijer-corrected robust standard errors.
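The post does not say which dataset the example uses. The variable names and the panel dimensions (595 individuals) look consistent with the psidextract example data available through webuse, so a plausible setup, under that assumption only, might be:

```stata
* Assumption: the example data are Stata's psidextract (not stated in the post)
webuse psidextract, clear
xtset id t

* Assumption: the time dummies tdum1-tdum7 used in the last example
* are created from the time variable like this
tabulate t, generate(tdum)
```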
With the following syntax, we can then run a second-stage instrumental-variables regression of the first-stage residuals on some time-invariant regressors. The first-stage results are automatically taken from the previous estimation. Just as an illustration, ed is assumed to be endogenous and instrumented with occ.
Instead of specifying both stages one after the other, with some more complicated syntax the same results can also be obtained with a single command line:
As a postestimation command, estat overid provides Hansen's J-test for the validity of the overidentifying restrictions for both stages. (In the current example, the second stage is exactly identified.)
The following command line exactly replicates the above results for the first stage with xtabond2, including Hansen's J-test:
Notice that the results reported for Hansen's J-test would differ between xtseqreg and xtabond2 if the one-step GMM estimator were used (the above example without the option twostep). For this purpose, xtabond2 silently still computes the two-step estimator, whereas xtseqreg evaluates the moment functions at the one-step estimates while still using an optimal weighting matrix (the one that would have been used in a second step).
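To see the difference described here, one could run the first-stage specification from this post with the option twostep dropped in both commands and compare the reported J-tests. The do-file sketch below only reuses options that appear elsewhere in this post. It assumes that both commands are installed and the data are already in memory.

```stata
* One-step difference GMM with xtseqreg (first-stage example without twostep)
xtseqreg L(0/2).lwage exp exp2 occ ind union, ///
    gmmiv(L.lwage, model(difference) lagrange(1 4) collapse) ///
    iv(exp exp2 occ ind union, difference model(difference)) ///
    vce(robust)
estat overid

* Same specification with xtabond2, again without twostep;
* Hansen's J-test is part of its regular output
xtabond2 L(0/2).lwage exp exp2 occ ind union, ///
    gmmstyle(L.lwage, equation(diff) laglimits(1 4) collapse) ///
    ivstyle(exp exp2 occ ind union, equation(diff)) ///
    robust
```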
Finally, also notice that xtabond2 might in some situations report an incorrect number of instruments because it does not always detect linear dependence between the instruments specified for the first-differenced model and those specified for the levels model. This can happen in particular with time dummies if they are specified for both the first-differenced and the levels model (which is something that actually should not be done). In the following example (output omitted), xtabond2 reports 17 instruments, while xtseqreg obtains identical estimates but correctly reports only 13 instruments. (This can happen when xtabond2 is used with either the option h(1) or h(2). It does not happen with the default option h(3). The default weighting matrix of xtseqreg corresponds to h(2) of xtabond2.) This has an important consequence because the degrees of freedom used for Hansen's J-test depend on the number of instruments. Hence, the J-test reported by xtabond2 might be misleading.
You can read more about this last observation in another Statalist topic: System GMM - Time Dummies.
Code:
. net install xtseqreg, from(http://www.kripfganz.de/stata/)
Code:
. help xtseqreg
. help xtseqreg postestimation
Code:
. xtseqreg L(0/2).lwage exp exp2 occ ind union, gmmiv(L.lwage, model(difference) lagrange(1 4) collapse) iv(exp exp2 occ ind union, difference model(difference)) twostep vce(robust)
Group variable: id                           Number of obs         =      2975
Time variable: t                             Number of groups      =       595
                                             Obs per group:    min =         5
                                                               avg =         5
                                                               max =         5
                                             Number of instruments =        10
                                     (Std. Err. adjusted for clustering on id)
------------------------------------------------------------------------------
             |              WC-Robust
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lwage |
         L1. |    .365887   .1722314     2.12   0.034     .0283197    .7034543
         L2. |   .1009276   .0732219     1.38   0.168    -.0425848    .2444399
             |
         exp |   .0501576   .0282205     1.78   0.076    -.0051536    .1054688
        exp2 |   -.000206    .000148    -1.39   0.164     -.000496     .000084
         occ |  -.0428486   .0283624    -1.51   0.131    -.0984379    .0127406
         ind |   .0481791   .0305408     1.58   0.115    -.0116798     .108038
       union |    .006991   .0288093     0.24   0.808    -.0494742    .0634562
       _cons |   2.737719   1.088102     2.52   0.012     .6050775     4.87036
------------------------------------------------------------------------------
Code:
. xtseqreg lwage (L(1/2).lwage exp exp2 occ ind union) ed fem blk, iv(occ fem blk, model(level)) vce(robust)
Group variable: id                           Number of obs         =      2975
Time variable: t                             Number of groups      =       595
------------------------------------------------------------------------------
Equation _first                        Equation _second
Number of obs         =      2975      Number of obs         =      2975
Number of groups      =       595      Number of groups      =       595
Obs per group:    min =         5      Obs per group:    min =         5
                  avg =         5                        avg =         5
                  max =         5                        max =         5
Number of instruments =        10      Number of instruments =         4
                                     (Std. Err. adjusted for clustering on id)
------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
_first       |
       lwage |
         L1. |    .365887   .1722314     2.12   0.034     .0283197    .7034543
         L2. |   .1009276   .0732219     1.38   0.168    -.0425848    .2444399
             |
         exp |   .0501576   .0282205     1.78   0.076    -.0051536    .1054688
        exp2 |   -.000206    .000148    -1.39   0.164     -.000496     .000084
         occ |  -.0428486   .0283624    -1.51   0.131    -.0984379    .0127406
         ind |   .0481791   .0305408     1.58   0.115    -.0116798     .108038
       union |    .006991   .0288093     0.24   0.808    -.0494742    .0634562
       _cons |   2.737719   1.088102     2.52   0.012     .6050775     4.87036
-------------+----------------------------------------------------------------
_second      |
          ed |   .0634885   .0348497     1.82   0.068    -.0048158    .1317927
         fem |  -.0967082   .0575629    -1.68   0.093    -.2095295     .016113
         blk |  -.1531252   .1010073    -1.52   0.130     -.351096    .0448456
       _cons |  -.7936727   .4419754    -1.80   0.073    -1.659929    .0725831
------------------------------------------------------------------------------
Code:
. xtseqreg lwage (L(1/2).lwage exp exp2 occ ind union) ed fem blk, gmmiv(L.lwage, model(difference) lagrange(1 4) collapse equation(#1)) iv(exp exp2 occ ind union, difference model(difference) equation(#1)) iv(occ fem blk, model(level) equation(#2)) twostep vce(robust) both
Code:
. estat overid
Hansen's J-test for equation _first                    chi2(2)     =    0.2935
H0: overidentifying restrictions are valid             Prob > chi2 =    0.8635
Hansen's J-test for equation _second                   chi2(0)     =    0.0000
note: coefficients are exactly identified              Prob > chi2 =         .
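The degrees of freedom and the p-value of the first-stage J-test can be checked by hand. The sketch below assumes that the degrees of freedom equal the number of instruments minus the number of estimated coefficients, which matches the reported values (10 instruments, 8 coefficients including the constant, chi2(2)). It uses only Stata's built-in chi2tail() function.

```stata
* Equation _first: 10 instruments - 8 coefficients = 2 degrees of freedom
display 10 - 8

* Upper-tail chi-squared probability of the reported statistic 0.2935
display %6.4f chi2tail(2, 0.2935)
```

The second display command returns 0.8635, in line with the Prob > chi2 reported by estat overid. For the exactly identified second stage there are zero degrees of freedom, so no p-value can be computed.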
Code:
. xtabond2 L(0/2).lwage exp exp2 occ ind union, gmmstyle(L.lwage, equation(diff) laglimits(1 4) collapse) ivstyle(exp exp2 occ ind union, equation(diff)) twostep robust
Code:
. xtseqreg L(0/2).lwage exp2 occ ind union tdum4-tdum7, gmmiv(L.lwage, model(difference) lagrange(1 4) collapse) iv(exp2 occ ind union, difference model(difference)) iv(tdum4-tdum7, model(diff)) iv(tdum4-tdum7, model(level)) twostep vce(robust)
. xtabond2 L(0/2).lwage exp2 occ ind union tdum4-tdum7, gmmstyle(L.lwage, equation(diff) laglimits(1 4) collapse) ivstyle(exp2 occ ind union, equation(diff)) ivstyle(tdum4-tdum7, equation(diff)) ivstyle(tdum4-tdum7, equation(level)) twostep robust h(2)
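To illustrate why the instrument count matters for the J-test, the sketch below compares the degrees of freedom and p-values implied by 13 and by 17 instruments. The coefficient count of 11 is my reading of the command line (2 lags, 4 regressors, 4 time dummies, and a constant), and the J statistic of 5 is a purely hypothetical value, since the output of this example is omitted in the post.

```stata
* Assumed number of coefficients: 2 lags + 4 regressors + 4 time dummies + constant
local k = 2 + 4 + 4 + 1

* Degrees of freedom implied by each reported instrument count
display 13 - `k'
display 17 - `k'

* The same hypothetical J statistic leads to very different p-values
local J = 5
display %6.4f chi2tail(13 - `k', `J')
display %6.4f chi2tail(17 - `k', `J')
```

With these assumed numbers, the p-value is 0.0821 with 2 degrees of freedom but 0.5438 with 6, so an overstated instrument count can make the overidentifying restrictions look more acceptable than they are.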
Reference:
- Kripfganz, S., and C. Schwarz (2015). Estimation of linear dynamic panel data models with time-invariant regressors. ECB Working Paper 1838, European Central Bank.