XTDPDGMM: new Stata command for efficient GMM estimation of linear (dynamic) panel models with nonlinear moment conditions

Prateek Bedi

Join Date: Sep 2018

Posts: 199
#106

27 Dec 2019, 22:43

Originally posted by Sebastian Kripfganz

I cannot replicate your problem. With the following example, predict gives me exactly the same predicted values as when I calculate them manually:

Code:

. webuse abdata
. xtdpdgmm L(0/1).n w k, gmm(L.n w k, l(0 3) c m(fod)) gmm(L.n w k, l(0 0) d c m(level)) two vce(r) teffects
. predict yhat if e(sample)
. gen yhat_manual = _b[L1.n] * L1.n + _b[w] * w + _b[k] * k + _b[1978.year] * 1978.year + _b[1979.year] * 1979.year + _b[1980.year] * 1980.year + _b[1981.year] * 1981.year + _b[1982.year] * 1982.year + _b[1983.year] * 1983.year + _b[1984.year] * 1984.year + _b[_cons] if e(sample)

Code:

. sum yhat yhat_manual

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        yhat |        891    1.043574    1.426621  -2.375833   5.083724
 yhat_manual |        891    1.043574    1.426621  -2.375833   5.083724

Did you make sure that the coefficients of the year dummies are only added to the predictions of the respective year?

In relation to the command you mention, I have a few doubts.

1. The lag length starts from 0 in both instances. Is there any special reason for it? What difference would it make if we start it from 1 instead? When we use 'fodev', what should be the starting point for lag length ideally?
2. I notice that you use gmm() twice whereas gmmiv() has not been used at all. As gmmiv() is used to specify endogenous variables, should I assume there were no endogenous variables in your model?
3. In the second gmm(), the lag length starts and ends at 0. What is the significance of this when lag length starts and ends at the same point?

Thanks!
Prateek Bedi

Join Date: Sep 2018

Posts: 199
#107

28 Dec 2019, 04:37

Originally posted by Prateek Bedi

In relation to the command you mention, I have a few doubts.

1. The lag length starts from 0 in both instances. Is there any special reason for it? What difference would it make if we start it from 1 instead? When we use 'fodev', what should be the starting point for lag length ideally?
2. I notice that you use gmm() twice whereas gmmiv() has not been used at all. As gmmiv() is used to specify endogenous variables, should I assume there were no endogenous variables in your model?
3. In the second gmm(), the lag length starts and ends at 0. What is the significance of this when lag length starts and ends at the same point?

Thanks!

Please ignore point #2. It is irrelevant. Instead, I just wanted to know why gmm() is used twice in the same command.
Sebastian Kripfganz

Join Date: May 2014

Posts: 2575
#108

30 Dec 2019, 05:57

The first gmm() option (which is an abbreviation for gmmiv()) refers to the GMM-style instruments for the model in forward-orthogonal deviations, m(fod), while the second gmm() option refers to the instruments for the model in levels, m(level).

Please have a look at my 2019 London Stata Conference presentation. Slide 67 tells you the admissible lags under forward-orthogonal deviations. For strictly exogenous regressors (which I implicitly assumed here for w and k), any lag can be used. For predetermined regressors (L.n), lag 0 is the first admissible lag. In my specification, I assumed indeed that there are no endogenous variables (with respect to the idiosyncratic error term) in the model.

For the level model, slide 31 tells you the additionally available instruments. For strictly exogenous and predetermined regressors, lag 0 of the first-differenced instruments is usually used (because it has the strongest correlation with the instrumented variables compared to any other lag). It is common practice not to use multiple lags as instruments for the level model (based on the idea that further lags would be redundant if all available lags were used for the transformed model).
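
For readers who find the abbreviations hard to parse, the example command from earlier in the thread can be written with the option names spelled out. This is only a sketch, assuming that xtdpdgmm is installed and that l(), c, d, m(), two and vce(r) abbreviate lag(), collapse, difference, model(), twostep and vce(robust). The `///` line continuations only work in a do-file, not at the interactive prompt.

```stata
* Sketch only; requires the community-contributed xtdpdgmm package.
webuse abdata, clear

xtdpdgmm L(0/1).n w k,                                          ///
    gmmiv(L.n w k, lag(0 3) collapse model(fodev))              /// instruments for the model in forward-orthogonal deviations
    gmmiv(L.n w k, lag(0 0) difference collapse model(level))   /// first-differenced lag-0 instruments for the level model
    twostep vce(robust) teffects
```

The first gmmiv() uses lags 0 to 3 because L.n is treated as predetermined and w and k as strictly exogenous; the second uses only lag 0 of the differenced instruments, following the common practice described above.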

https://www.kripfganz.de/stata/
Prateek Bedi

Join Date: Sep 2018

Posts: 199
#109

06 Jan 2020, 07:10

Thanks a lot, Prof. Sebastian, for your helpful guidance once again! I really appreciate it!
alessandro gst

Join Date: Jun 2019

Posts: 62
#110

05 Feb 2020, 04:08

Hello, I have a question about how to treat unit root data. I am currently estimating a GMM model in differences like the following one

Code:

xtdpdgmm L(0/1).n w k ys*, model(difference) gmm(L.n w) iv(k ys*, d) nocons vce(r)

My issue is that the key independent variable of interest (w) has a unit root, but d.w is stationary.
My question is: am I correct to run the model with w in levels, given that model(difference) differences the data automatically, or should I run the following model instead?

Code:

xtdpdgmm L(0/1).n d.w k ys*, model(difference) gmm(L.n d.w) iv(k ys*, d) nocons vce(r)

Thanks a lot for your help!
Sebastian Kripfganz

Join Date: May 2014

Posts: 2575
#111

05 Feb 2020, 10:49

If your underlying theory requires w to enter in levels, then you should not transform this variable. Instead, it might be a good idea to also add L.w as another regressor. The data can then speak for itself, i.e. if the estimated coefficients of w and L.w are about the same with opposite signs, then this would be equivalent to directly estimating the model with D.w.

In any case, lagged differences of w might be weak instruments.
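
The suggestion to let the data speak can be sketched as follows. This is an illustration only, not the poster's actual specification: it uses the abdata example data, drops ys, treats w as predetermined, and the lag range lag(1 4) with collapsed instruments is an arbitrary assumption.

```stata
* Sketch only; requires xtdpdgmm. lag(1 4) and collapse are illustrative assumptions.
webuse abdata, clear

* w enters in levels together with its first lag
xtdpdgmm L(0/1).n L(0/1).w k, model(difference) collapse ///
    gmm(L.n w, lag(1 4)) iv(k, difference) nocons vce(robust)

* Wald test of H0: coefficient on w + coefficient on L.w = 0,
* i.e. only the first difference D.w matters
test w + L.w = 0
```

If the test does not reject, the restricted model with D.w as the regressor would be consistent with the data; if it rejects, w in levels carries information beyond its first difference.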

https://www.kripfganz.de/stata/
alessandro gst

Join Date: Jun 2019

Posts: 62
#112

11 Feb 2020, 04:57

Hello, I have a doubt about my model, and I was wondering whether you could help me check it for mistakes. I aim to run equivalent models using the one-step, two-step, and iterated difference GMM estimators, to make sure that my estimates do not depend on the estimation method. I am using xtabond2 for the one-step and two-step estimation and xtdpdgmm for the iterated model.

The models I am estimating are the following:

Code:

* the one step GMM
xi: xtabond2 y l.y l.x $controls_lag yeardum*, gmm(y x $controls, lag(2 5) collapse) iv(yeardum*) noleveleq small noconstant robust

* the two step GMM
xi: xtabond2 y l.y l.x $controls_lag yeardum*, gmm(y x $controls, lag(2 5) collapse) iv(yeardum*) noleveleq small two noconstant robust

* Iterated
xi: xtdpdgmm L(0/1).y l.x $controls, model(diff) collapse gmm(y x $controls, lag(2 5) collapse) igmm vce(r) small noconstant teffects igmmiterate(100)

I am not sure that these model specifications are equivalent because, while the one-step and two-step GMM have 1627 observations, the iterated model uses 1723. Is this normal, or am I misspecifying the model without realizing it?

Thank you a lot in advance for your help. I would be more than grateful for any guidance you could provide

Best
alessandro gst

Join Date: Jun 2019

Posts: 62
#113

11 Feb 2020, 04:59

* Correction to my previous post: the command I am using for the iterated model is the following

Code:

xi: xtdpdgmm L(0/1).y l.x $controls_lag, model(diff) collapse gmm(y x $controls, lag(2 5) collapse) igmm vce(r) small noconstant teffects igmmiterate(100)
Sebastian Kripfganz

Join Date: May 2014

Posts: 2575
#114

11 Feb 2020, 05:05

The model specifications appear to be equivalent. xtabond2 reports the number of observations for the first-differenced model while xtdpdgmm always reports the number of observations for the untransformed levels model. There is 1 observation less per group in the first-differenced model.
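
As a quick arithmetic check of this explanation: the gap between the two reported counts is 1723 - 1627 = 96, which would match the number of groups if every group loses exactly one observation to first differencing. The second line below is a sketch that assumes xtdpdgmm stores the number of groups in e(N_g); `ereturn list` after estimation shows what is actually stored.

```stata
* Observation counts reported in the thread
display 1723 - 1627

* After an xtdpdgmm fit: level-model observations minus one per group.
* e(N_g) is assumed here to hold the number of groups; check -ereturn list-.
display e(N) - e(N_g)
```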

https://www.kripfganz.de/stata/
alessandro gst

Join Date: Jun 2019

Posts: 62
#115

11 Feb 2020, 05:13

Great, thank you very much for this super quick reply and for the explanation.
alessandro gst

Join Date: Jun 2019

Posts: 62
#116

11 Feb 2020, 08:19

Hello,
I have a question about the use of time fixed effects in the xtdpdgmm command. I am running the following one step difference GMM.

Code:

xtdpdgmm L(0/1).y l.x $controls_lag yeardum*, model(difference) gmm(y x $controls, lag(2 5)) iv(yeardum*) collapse nocons vce(r) one

* which provides identical estimates to the xtabond2 estimation, even though standard errors are slightly different.
xi: xtabond2 y l.y l.x $controls_lag yeardum*, gmm(y x $controls, lag(2 5) collapse) iv(yeardum*) noleveleq small noconstant robust

However, if I remove the year dummies from the model and use the teffects option instead, my estimates change significantly.
Why is this the case, in your opinion? Am I misusing teffects?

Code:

xtdpdgmm L(0/1).y l.x $controls_lag, model(difference) gmm(y x $controls, lag(2 5)) collapse nocons vce(r) one teffects

I am attaching a photo of the results of the 3 models calculated in the specified order. (The 4th row is the autoregressive parameter.)

Thank you very much in advance for taking the time to answer these very applied questions. I ask them because I want to move from xtabond2 to xtdpdgmm in my research, and I want to make sure that I am doing it correctly.
Sebastian Kripfganz

Join Date: May 2014

Posts: 2575
#117

11 Feb 2020, 08:34

Standard errors seem to be different because you did not specify the small option with xtdpdgmm.

The teffects option always creates instruments for the time dummies in the levels model even if you specify model(difference)! While there is nothing wrong with that given that time dummies are exogenous, it may sometimes be preferable to specify those instruments for the differenced model, in particular if it is your aim to estimate the model with the one-step difference-GMM estimator.
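
The difference between the two ways of handling time effects can be sketched with the abdata example data. This is an illustration under assumptions, not the poster's model: it assumes abdata ships year dummies named yr1978 to yr1984, and the regressors, the lag range lag(2 4), and the small option are chosen only for the example.

```stata
* Sketch only; requires xtdpdgmm. Assumes abdata contains year dummies yr1978-yr1984.
webuse abdata, clear

* (a) teffects: the time dummies are instrumented in the level model
xtdpdgmm L(0/1).n w k, model(difference) collapse ///
    gmm(n w k, lag(2 4)) nocons vce(robust) small onestep teffects

* (b) explicit dummies, instrumented in the first-differenced model
xtdpdgmm L(0/1).n w k yr1978-yr1984, model(difference) collapse ///
    gmm(n w k, lag(2 4)) iv(yr1978-yr1984, difference) ///
    nocons vce(robust) small onestep
```

Variant (b) keeps all moment conditions in the differenced model, which is closer to a pure one-step difference-GMM estimator; variant (a) adds level-model moment conditions for the time dummies, so the two sets of estimates need not coincide.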

https://www.kripfganz.de/stata/

Prateek Bedi

Join Date: Sep 2018
Posts: 199

#118

27 Mar 2020, 04:13

Hello,

I am working on corporate cash holdings using data for 1696 firms over the period 2001-16. I ran two regressions with the following commands and outputs (only the lag ranges differ between the two regressions; the rest of the command is the same). In the first regression, the variable 'Dividend2' (a dummy variable that takes the value 1 when the firm pays a dividend in a particular year) is omitted, and in the second regression the constant is omitted. In fact, one or the other of these two is omitted whenever I run these commands with different lag ranges. I am not able to figure out why only one of them (Dividend2 or the constant) gets reported while the other is omitted.

Code:

xtdpdgmm CashHoldings2 L.CashHoldings2 Size1 Leverage1 Liquidity2 Profitability4 GrowthPotential2 OperatingCashflow Dividend2 CapitalExpenditure1 CashFlowVol15years WPromoterSharesin1 c.SIR2#c.L.CashHoldings2 if ExcessCashDummy2==1, teffects twostep vce(cluster CompanyID) gmmiv(L.CashHoldings2, lag(1 8) coll  model(fodev)) gmmiv(Leverage1 Liquidity2 GrowthPotential2 Dividend2 CapitalExpenditure1 c.SIR2#c.L.CashHoldings2, lag(1 6) coll model(fodev)) iv( Year* Size1 Profitability4  WPromoterSharesin1 CashFlowVol15years OperatingCashflow, model(level)) nofootnote

Code:

Group variable: CompanyID                    Number of obs         =      3067
Time variable: Year                          Number of groups      =       630

Moment conditions:     linear =      64      Obs per group:    min =         1
                    nonlinear =       0                        avg =  4.868254
                        total =      64                        max =        14

                                       (Std. Err. adjusted for 630 clusters in CompanyID)
-----------------------------------------------------------------------------------------
                        |              WC-Robust
          CashHoldings2 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
------------------------+----------------------------------------------------------------
          CashHoldings2 |
                    L1. |   .9202039   .2110783     4.36   0.000      .506498     1.33391
                        |
                  Size1 |   -.000045   .0000642    -0.70   0.483    -.0001709    .0000808
              Leverage1 |   .0051676   .0012412     4.16   0.000     .0027349    .0076002
             Liquidity2 |  -.0015128   .0012105    -1.25   0.211    -.0038853    .0008598
         Profitability4 |   .0116191   .0023033     5.04   0.000     .0071047    .0161335
GrowthPotential2TobinsQ |  -.0009329   .0003077    -3.03   0.002     -.001536   -.0003297
      OperatingCashflow |   .0048039   .0008251     5.82   0.000     .0031867    .0064211
              Dividend2 |          0  (omitted)
    CapitalExpenditure1 |  -.0071506   .0028852    -2.48   0.013    -.0128055   -.0014958
     CashFlowVol15years |   .0178048   .0019959     8.92   0.000      .013893    .0217166
     WPromoterSharesin1 |  -.0013006   .0004333    -3.00   0.003    -.0021498   -.0004514
                        |
c.SIR2#cL.CashHoldings2 |  -5.289581   3.051242    -1.73   0.083    -11.26991    .6907433
                        |
                   Year |
                  2003  |   .0000116   .0002457     0.05   0.962      -.00047    .0004932
                  2004  |   .0000302   .0002628     0.11   0.909    -.0004849    .0005453
                  2005  |   .0006248   .0003621     1.73   0.084     -.000085    .0013345
                  2006  |    .000956   .0003532     2.71   0.007     .0002637    .0016483
                  2007  |   .0011733   .0003632     3.23   0.001     .0004615    .0018851
                  2008  |   .0005805   .0002797     2.08   0.038     .0000323    .0011286
                  2009  |   .0002752   .0002446     1.13   0.260    -.0002041    .0007545
                  2010  |   .0007216   .0004453     1.62   0.105    -.0001511    .0015943
                  2011  |   .0000444   .0003437     0.13   0.897    -.0006292    .0007179
                  2012  |   .0001928   .0002958     0.65   0.515     -.000387    .0007727
                  2013  |  -.0004563   .0002832    -1.61   0.107    -.0010113    .0000987
                  2014  |   .0003354   .0003359     1.00   0.318     -.000323    .0009938
                  2015  |   .0010732   .0004351     2.47   0.014     .0002205    .0019259
                  2016  |   .0014033   .0004847     2.90   0.004     .0004533    .0023532
                        |
                  _cons |   .0006629   .0008362     0.79   0.428    -.0009761    .0023018
-----------------------------------------------------------------------------------------

Code:

xtdpdgmm CashHoldings2 L.CashHoldings2 Size1 Leverage1 Liquidity2 Profitability4 GrowthPotential2 OperatingCashflow Dividend2 CapitalExpenditure1 CashFlowVol15years WPromoterSharesin1 c.SIR2#c.L.CashHoldings2 if ExcessCashDummy2==1, teffects twostep vce(cluster CompanyID) gmmiv(L.CashHoldings2, lag(1 1) coll  model(fodev)) gmmiv(Leverage1 Liquidity2 GrowthPotential2 Dividend2 CapitalExpenditure1 c.SIR2#c.L.CashHoldings2, lag(1 6) coll model(fodev)) iv( Year* Size1 Profitability4  WPromoterSharesin1 CashFlowVol15years OperatingCashflow, model(level)) nofootnote

Code:

Group variable: CompanyID                    Number of obs         =      3067
Time variable: Year                          Number of groups      =       630

Moment conditions:     linear =      57      Obs per group:    min =         1
                    nonlinear =       0                        avg =  4.868254
                        total =      57                        max =        14

                                       (Std. Err. adjusted for 630 clusters in CompanyID)
-----------------------------------------------------------------------------------------
                        |              WC-Robust
          CashHoldings2 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
------------------------+----------------------------------------------------------------
          CashHoldings2 |
                    L1. |   .9731027   .2012846     4.83   0.000     .5785921    1.367613
                        |
                  Size1 |  -.0000497   .0000688    -0.72   0.470    -.0001845     .000085
              Leverage1 |    .004433   .0013046     3.40   0.001      .001876    .0069899
             Liquidity2 |  -.0010752    .001213    -0.89   0.375    -.0034527    .0013023
         Profitability4 |   .0097248   .0025618     3.80   0.000     .0047037     .014746
GrowthPotential2TobinsQ |  -.0007632   .0003483    -2.19   0.028    -.0014458   -.0000806
      OperatingCashflow |   .0045713   .0008541     5.35   0.000     .0028974    .0062453
              Dividend2 |   .0008311   .0008739     0.95   0.342    -.0008817    .0025439
    CapitalExpenditure1 |  -.0056351   .0031341    -1.80   0.072    -.0117778    .0005076
     CashFlowVol15years |    .017407   .0020669     8.42   0.000     .0133559     .021458
     WPromoterSharesin1 |   -.001282   .0004315    -2.97   0.003    -.0021278   -.0004363
                        |
c.SIR2#cL.CashHoldings2 |  -5.954595   2.924771    -2.04   0.042    -11.68704   -.2221496
                        |
                   Year |
                  2003  |  -.0000296   .0002446    -0.12   0.904     -.000509    .0004497
                  2004  |  -.0000637    .000269    -0.24   0.813    -.0005909    .0004635
                  2005  |   .0004414   .0003817     1.16   0.248    -.0003069    .0011896
                  2006  |   .0007825   .0003749     2.09   0.037     .0000476    .0015173
                  2007  |   .0009998   .0003916     2.55   0.011     .0002323    .0017674
                  2008  |   .0004919   .0002916     1.69   0.092    -.0000796    .0010633
                  2009  |   .0001564   .0002715     0.58   0.565    -.0003757    .0006885
                  2010  |   .0005933   .0004476     1.33   0.185    -.0002839    .0014706
                  2011  |  -.0000429   .0003644    -0.12   0.906    -.0007572    .0006714
                  2012  |   .0001835   .0003093     0.59   0.553    -.0004227    .0007898
                  2013  |  -.0004763   .0002945    -1.62   0.106    -.0010536     .000101
                  2014  |   .0002245   .0003674     0.61   0.541    -.0004956    .0009446
                  2015  |   .0008169   .0004961     1.65   0.100    -.0001555    .0017892
                  2016  |   .0011492   .0005454     2.11   0.035     .0000803    .0022181
                        |
                  _cons |          0  (omitted)
-----------------------------------------------------------------------------------------

Thanks!


Sebastian Kripfganz

Join Date: May 2014

Posts: 2575
#119

27 Mar 2020, 06:06

My guess is that there is a problem of perfect collinearity (of Dividend2?) with the time dummies. This is not currently checked by the program. You might have to specify the time dummies manually instead of using the teffects option (with dummies from 2004 to 2016 only), although the Dividend2 coefficient in that case might still be difficult to interpret because it essentially would just capture the omitted time effect.
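
One informal way to look into this guess is to inspect how Dividend2 varies within the estimation sample. The sketch below uses the variable names from the post above and is not a check performed by xtdpdgmm itself; it only shows whether the dummy is constant overall or within each year, which is what perfect collinearity with the constant or the time dummies would require.

```stata
* Run directly after the xtdpdgmm command so that e(sample) marks the estimation sample.
* Shows whether Dividend2 varies within each year of that sample.
tabulate Year Dividend2 if e(sample)

* Overall variation of the dummy in the estimation sample
summarize Dividend2 if e(sample)
```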

https://www.kripfganz.de/stata/
Prateek Bedi

Join Date: Sep 2018

Posts: 199
#120

01 Apr 2020, 03:50

Originally posted by Sebastian Kripfganz

My guess is that there is a problem of perfect collinearity (of Dividend2?) with the time dummies. This is not currently checked by the program. You might have to specify the time dummies manually instead of using the teffects option (with dummies from 2004 to 2016 only), although the Dividend2 coefficient in that case might still be difficult to interpret because it essentially would just capture the omitted time effect.

You are right, Prof. Kripfganz. Thanks a lot for your valuable response!