XTDPDGMM: new Stata command for efficient GMM estimation of linear (dynamic) panel models with nonlinear moment conditions

Prateek Bedi

Join Date: Sep 2018
Posts: 199

#121

07 Apr 2020, 07:21

Hello

I ran the following regression to model the accounting performance of Indian firms for the period 2001-16. Two independent variables, Leverage1 and CurrentRatio, are endogenous on account of simultaneity. The other three independent variables, Size2, lnAgeofthefirm and SalesGrowth, are exogenous. Although all independent variables are statistically significant, the AR(2) and Sargan-Hansen tests both reject their null hypotheses. I have tried many combinations of lag lengths, with and without the collapse option, but the results still fail both diagnostic tests. Is there anything we can change in the model so that the AR(2) and Sargan-Hansen tests are passed?

Code:

xtdpdgmm Profitability4 L.Profitability4 Size2 lnAgeofthefirm Leverage1 CurrentRatio SalesGrowth , t
> effects twostep vce(cluster CompanyID) gmmiv(L.Profitability4, lag(1 2)  model(fodev)) gmmiv(Leverag
> e1 CurrentRatio, lag(1 1) model(fodev)) iv(Size2 lnAgeofthefirm SalesGrowth , model(level)) nofootno
> te

Generalized method of moments estimation

Fitting full model:
Step 1         f(b) =  .00024728
Step 2         f(b) =  .05463668

Group variable: CompanyID                    Number of obs         =     20574
Time variable: Year                          Number of groups      =      1657

Moment conditions:     linear =      71      Obs per group:    min =         1
                    nonlinear =       0                        avg =  12.41642
                        total =      71                        max =        15

                            (Std. Err. adjusted for 1,657 clusters in CompanyID)
--------------------------------------------------------------------------------
               |              WC-Robust
Profitability4 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
Profitability4 |
           L1. |   .6668446   .0262416    25.41   0.000     .6154121    .7182772
               |
         Size2 |    .001875   .0005509     3.40   0.001     .0007952    .0029548
lnAgeofthefirm |   .0024152   .0011865     2.04   0.042     .0000897    .0047406
     Leverage1 |   .0495562   .0120445     4.11   0.000     .0259494     .073163
  CurrentRatio |  -.0006555   .0002972    -2.21   0.027    -.0012379   -.0000731
   SalesGrowth |   .0357161   .0018874    18.92   0.000     .0320168    .0394154
               |
          Year |
         2003  |    .002406   .0020818     1.16   0.248    -.0016743    .0064863
         2004  |   .0069989   .0020466     3.42   0.001     .0029877    .0110102
         2005  |    .004757   .0020531     2.32   0.021     .0007329    .0087811
         2006  |   .0086493   .0020496     4.22   0.000     .0046321    .0126664
         2007  |   .0050735   .0020601     2.46   0.014     .0010357    .0091112
         2008  |   .0046124    .002133     2.16   0.031     .0004317    .0087931
         2009  |  -.0105933   .0021875    -4.84   0.000    -.0148807   -.0063059
         2010  |   .0152441    .002063     7.39   0.000     .0112006    .0192876
         2011  |   -.000839   .0021657    -0.39   0.698    -.0050836    .0034057
         2012  |  -.0072511   .0021387    -3.39   0.001    -.0114428   -.0030594
         2013  |  -.0026715   .0021785    -1.23   0.220    -.0069412    .0015982
         2014  |  -.0036378   .0022557    -1.61   0.107    -.0080589    .0007834
         2015  |  -.0026539   .0022484    -1.18   0.238    -.0070607    .0017528
         2016  |    .003732   .0023898     1.56   0.118    -.0009518    .0084159
               |
         _cons |  -.0172578   .0060395    -2.86   0.004     -.029095   -.0054206
--------------------------------------------------------------------------------

. estat serial

Arellano-Bond test for autocorrelation of the first-differenced residuals
H0: no autocorrelation of order 1:     z =  -16.8996   Prob > |z|  =    0.0000
H0: no autocorrelation of order 2:     z =    2.3835   Prob > |z|  =    0.0171

. estat overid

Sargan-Hansen test of the overidentifying restrictions
H0: overidentifying restrictions are valid

2-step moment functions, 2-step weighting matrix       chi2(50)    =   90.5330
                                                       Prob > chi2 =    0.0004

2-step moment functions, 3-step weighting matrix       chi2(50)    =   90.0441
                                                       Prob > chi2 =    0.0004

Thanks!

Sebastian Kripfganz

Join Date: May 2014

Posts: 2575
#122

07 Apr 2020, 11:10

You are assuming that Size2, lnAgeofthefirm and SalesGrowth are strictly exogenous and, in particular, also uncorrelated with the unobserved company-specific effects. This is a strong assumption.

Another explanation might be that your model is dynamically misspecified. Further lags of the dependent variable and/or the independent variables might have predictive power when added as regressors. This could help to deal with the serial correlation (and thus also with the overidentification tests). See slides 90 onwards of my 2019 London Stata Conference presentation.
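
As a minimal sketch of what adding further lags could look like for the model in #121 (an illustration, not a recommended specification): the command below adds the second lag of Profitability4 and the first lags of Leverage1 and CurrentRatio as regressors, while leaving the instrument options from #121 unchanged. Which lags to add, and whether the instrument sets should change along with them, are assumptions that need to be checked against the data. The /// line continuation requires running this from a do-file.

Code:

```stata
* Sketch only: regressor lags chosen for illustration; instruments copied from #121
xtdpdgmm Profitability4 L(1/2).Profitability4 Size2 lnAgeofthefirm ///
    L(0/1).Leverage1 L(0/1).CurrentRatio SalesGrowth, ///
    teffects twostep vce(cluster CompanyID) ///
    gmmiv(L.Profitability4, lag(1 2) model(fodev)) ///
    gmmiv(Leverage1 CurrentRatio, lag(1 1) model(fodev)) ///
    iv(Size2 lnAgeofthefirm SalesGrowth, model(level)) nofootnote

* Re-run the diagnostics to see whether they improve
estat serial
estat overid
```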

https://www.kripfganz.de/stata/

Prateek Bedi

Join Date: Sep 2018

Posts: 199
#123

08 Apr 2020, 08:49

Thanks for your quick reply, Prof. Kripfganz. I went through your slides.

1. On slide 90, you mention that higher-order lags of the dependent variable (or of the independent variables) may have predictive power and that including them may help resolve autocorrelation. However, I have read in some papers (e.g. Flannery and Hankins, 2013) that instruments in system GMM are invalidated by the presence of second-order serial correlation. In this regard, is it fine to include higher-order lags of the dependent variable as regressors in the model? I have not seen a system GMM regression in empirical research that includes higher-order lags (second and onward) of the dependent variable as regressors.

2. Even when I include the second lag of the dependent variable as a regressor in my model, it turns out to be insignificant. Should we still include it in the model?

3. I treated the three regressors Size2, lnAgeofthefirm and SalesGrowth as endogenous by moving them into a gmmiv() option. This also did not improve the diagnostic tests. Whenever we are not sure whether a variable is endogenous or exogenous, is it safer to treat it as endogenous?

4. Should I follow the sequential model selection process you describe in your slides? If yes, is there a relevant text available on it? I am not able to understand it fully as of now.

Thanks!

Reference:

Flannery, M. and Hankins, K., 2013. Estimating dynamic panel models in corporate finance. Journal of Corporate Finance, 19, pp.1-19.

Last edited by Prateek Bedi; 08 Apr 2020, 08:52.

Sebastian Kripfganz

Join Date: May 2014

Posts: 2575
#124

08 Apr 2020, 09:20

1. The invalidity of some instruments might be caused precisely by the omission of these variables as regressors from the regression model. If these variables are not included as regressors, they end up in the error term whenever they have a nonzero coefficient. Once they are part of the error term, we can no longer use them as instruments because they would naturally be correlated with the error term. Conversely, adding those variables (lags of existing variables) as regressors removes them from the error term and thus could help to improve the validity of the instruments. The discussion in my presentation slides is motivated by a recent paper by Jan Kiviet:
Kiviet, J. (2020). Microeconometric dynamic panel data methods: Model specification and selection issues. Econometrics and Statistics 13, 16-45. https://doi.org/10.1016/j.ecosta.2019.08.003

2. If it helps to improve the AR and Hansen tests, it might be worth including it even if it is statistically insignificant. If not, then there is probably no point in including it.

3. As often, it depends. If the instruments are strong enough, then treating the variables as endogenous might be the safer approach. If the instruments are weak, then treating the variables as endogenous might do more harm than good.

4. Please see the reference given under point 1.

Prateek Bedi

Join Date: Sep 2018

Posts: 199
#125

08 Apr 2020, 12:31

Thanks a lot, Prof. Kripfganz, for clearing my doubts! I can relate to most of your points. Here are some follow-up questions:

1. Including lags of the independent variables is fine. But if including the second (and further) lags of the dependent variable as regressors in dynamic panel regressions is not prohibited, then why do some papers state that the absence of second-order serial correlation is required for the results to be valid? I have not seen any model that has the second lag (or beyond) of the dependent variable as a regressor. So far, I thought that including the second lag of the dependent variable as a regressor was not allowed in dynamic panel regressions.

2. Further, if we include the second lag of the dependent variable in our model, doesn't the AR(2) test become redundant, since the researcher already believes that the dependent variable follows a dynamic data-generating process of order 2, i.e. an AR(2) process (which is why the second lag was included as a regressor in the first place)?

3. I understand that lags of the existing independent variables (such as Size, Leverage, Current Ratio etc. in my case) may influence my dependent variable. However, including them makes it harder to justify my model specification on the basis of the existing literature, since most earlier papers include only contemporaneous values of these variables as regressors. In such a case, do we build our own theoretical arguments to justify including lags of the independent variables?

4. Could you please shed some light on the difference between weak and strong instruments?

Thanks a lot!

Sebastian Kripfganz

Join Date: May 2014

Posts: 2575
#126

08 Apr 2020, 12:52

1. It is true that the vast majority of empirical papers include only 1 lag of the dependent variable as a regressor. It is hard to speculate about the reasons. The absence of second-order serial correlation (of the first-differenced error term) is only relevant for the validity of the instruments. The first lag of the first-differenced dependent variable is endogenous by construction. The second lag of the first-differenced dependent variable is exogenous if there is no second-order serial correlation and endogenous otherwise. In the latter case, we can still continue by using appropriate instruments (e.g. higher-order lags if there is no higher-order serial correlation).

2. No, as indicated in 1., we still need to find valid instruments for both the first and the second lagged dependent variable and this depends on the degree of serial correlation. Adding a second lag of the dependent variable as a regressor does not guarantee that the error term no longer has second-order serial correlation.

3. If you deviate from the existing literature, you can either justify this based on the arguments put forward in Kiviet (2020) or by finding theoretical arguments why lags should have a direct effect in the model.

4. Weak instruments bias the coefficient estimates and result in large standard errors. They also lead to imprecise estimates of the optimal weighting matrix which negatively affects two-step estimation and overidentification tests. You can find a large literature on weak instruments.
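
Regarding point 2, a short sketch of how the degree of serial correlation could be checked after re-estimating the model with the second lag included. The ar() option of estat serial is used here to request tests of orders 1 to 3; without it, orders 1 and 2 are reported, as in the output in #121. The choice of order 3 as the upper bound is only an illustration.

Code:

```stata
* Run after xtdpdgmm: Arellano-Bond tests on the first-differenced residuals,
* orders 1 to 3 (the upper bound of 3 is an arbitrary choice for this sketch)
estat serial, ar(1/3)
```

If the order-2 test still rejects, instruments that rely on the absence of second-order serial correlation remain questionable, even though the second lag is now a regressor.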

Prateek Bedi

Join Date: Sep 2018

Posts: 199
#127

09 Apr 2020, 13:46

Thanks a lot, Prof. Sebastian for your insightful response! Please take care!

Prateek Bedi

Join Date: Sep 2018

Posts: 199
#128

13 Apr 2020, 12:16

Hi,

I understand that the Sargan-Hansen test of instrument validity may reject for multiple reasons, including omitted explanatory variables. However, assuming that our model is correctly specified and the Sargan-Hansen test still rejects, is there a way to know which instruments are causing the problem? For instance, if we have 3 endogenous variables (lagged Y, X1 and X2) for which we are using instruments, how do we figure out which endogenous variable is causing the Sargan-Hansen test to show significant p-values?

I observed that xtabond2 reports Sargan-Hansen tests separately for different endogenous variables. Is there a similar mechanism in xtdpdgmm as well?

Sebastian Kripfganz

Join Date: May 2014

Posts: 2575
#129

13 Apr 2020, 12:24

Yes, these are Difference-in-Hansen tests or incremental Hansen tests. Please check out slides 48 to 52 of my 2019 London Stata Conference presentation.
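
For reference, a minimal sketch of how these tests can be requested in xtdpdgmm, using the model from #121 as the example. The overid option asks the command to also fit the reduced models at estimation time, and estat overid, difference then reports the incremental tests. The instrument sets are taken as they were specified in #121; this is an illustration of the syntax, not a statement about which grouping is appropriate. The /// line continuation requires running this from a do-file.

Code:

```stata
* Same specification as in #121, with the overid option added
xtdpdgmm Profitability4 L.Profitability4 Size2 lnAgeofthefirm Leverage1 CurrentRatio SalesGrowth, ///
    teffects twostep vce(cluster CompanyID) ///
    gmmiv(L.Profitability4, lag(1 2) model(fodev)) ///
    gmmiv(Leverage1 CurrentRatio, lag(1 1) model(fodev)) ///
    iv(Size2 lnAgeofthefirm SalesGrowth, model(level)) nofootnote overid

* Incremental (difference-in-Hansen) tests for the instrument sets
estat overid, difference
```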

Prateek Bedi

Join Date: Sep 2018
Posts: 199

#130

14 Apr 2020, 04:11

Thanks Prof. Kripfganz. I read these slides.

1. I tried calculating the difference-in-Hansen tests for my model. However, I received the following error.

Code:

. xtdpdgmm Profitability4 L.Profitability4 Size2 AgeoftheFirm Leverage1 CurrentRatio SalesGro CapitalE
> xpenditure2 WPromoterSharesin1 AD_Totalremuneration , teffects twostep vce(cluster CompanyID) gmmiv(
> L.Profitability4 , lag(1 10) coll model(fodev)) gmmiv(Leverage1, lag(1 4)  model(fodev)) gmmiv(Curre
> ntRatio, lag(1 4)  model(fodev)) gmmiv(CapitalExpenditure2 , lag(1 4)  model(fodev)) gmmiv(AD_Totalr
> emuneration, lag(1 14)  model(fodev)) iv(Size2 AgeoftheFirm SalesGrowth WPromoterSharesin1, model(le
> vel)) nofootnote overid

Generalized method of moments estimation

Fitting full model:
Step 1         f(b) =  .00059667
Step 2         f(b) =  .22160483

Fitting reduced model 1:
Step 1         f(b) =   .2036076

Fitting reduced model 2:
Step 1         f(b) =  .17650799

Fitting reduced model 3:
Step 1         f(b) =  .17006182

Fitting reduced model 4:
Step 1         f(b) =  .18061305

Fitting reduced model 5:
Step 1         f(b) =  .14821727

Fitting reduced model 6:
Step 1         f(b) =    .221365

Fitting reduced model 7:
Step 1         f(b) =  .20598217
              uniqrows():  3001  expected 1 arguments but received 2
              xtdpdgmm():     -  function returned error
                 <istmt>:     -  function returned error
r(3001);

2. Further, you mention on slide 48: "Incremental overidentifications tests are only meaningful if the reduced model already passed the overidentification test." Could you please elaborate on the significance of this statement, preferably with the help of an example?

Thanks!

Last edited by Prateek Bedi; 14 Apr 2020, 04:15.

Sebastian Kripfganz

Join Date: May 2014

Posts: 2575
#131

14 Apr 2020, 05:17

1. That error should not have happened. Could you please type

Code:

which xtdpdgmm

in Stata and tell me which version of the command you have installed?

2. Suppose you have 2 instruments Z1 and Z2. The incremental overidentification test is comparing 2 versions of the model, one with instruments Z1 and Z2, and another one with instrument Z1 but not Z2 (or the other way round). The null hypothesis to be tested is that Z2 (which is excluded in the second version of the model) is a valid instrument. But since Z1 is still part of both model versions, the test only makes sense if Z1 is a valid instrument. Otherwise the estimates from both model versions will be inconsistent and the test useless. In other words, you would first need to establish that Z1 is a valid instrument.

Prateek Bedi

Join Date: Sep 2018

Posts: 199
#132

14 Apr 2020, 06:18

Thanks Prof. Kripfganz. Here is the output I got after typing which xtdpdgmm

Code:

. which xtdpdgmm
c:\ado\plus\x\xtdpdgmm.ado
*! version 2.2.1 20aug2019
*! Sebastian Kripfganz, www.kripfganz.de

Sebastian Kripfganz

Join Date: May 2014

Posts: 2575
#133

14 Apr 2020, 06:32

The latest version is 2.2.3, which has a few bugs fixed. May I ask you to update the command and check whether the error message still occurs?

Code:

net install xtdpdgmm, from(http://www.kripfganz.de/stata/) replace

I also recommend regularly checking for updates of community-contributed commands by typing

Code:

adoupdate

in Stata's command window.

Prateek Bedi

Join Date: Sep 2018
Posts: 199

#134

14 Apr 2020, 07:47

I updated the command, Prof. Kripfganz. However, the error still persists:

Code:

. xtdpdgmm Profitability4 L.Profitability4 Size2 AgeoftheFirm Leverage1 CurrentRatio SalesGro CapitalE
> xpenditure2 WPromoterSharesin1 AD_Totalremuneration , teffects twostep vce(cluster CompanyID) gmmiv(
> L.Profitability4 , lag(1 10) coll model(fodev)) gmmiv(Leverage1, lag(1 4)  model(fodev)) gmmiv(Curre
> ntRatio, lag(1 4)  model(fodev)) gmmiv(CapitalExpenditure2 , lag(1 4)  model(fodev)) gmmiv(AD_Totalr
> emuneration, lag(1 14)  model(fodev)) iv(Size2 AgeoftheFirm SalesGrowth WPromoterSharesin1, model(le
> vel)) overid

Generalized method of moments estimation

Fitting full model:
Step 1         f(b) =  .00059667
Step 2         f(b) =  .22160483

Fitting reduced model 1:
Step 1         f(b) =   .2036076

Fitting reduced model 2:
Step 1         f(b) =  .17650799

Fitting reduced model 3:
Step 1         f(b) =  .17006182

Fitting reduced model 4:
Step 1         f(b) =  .18061305

Fitting reduced model 5:
Step 1         f(b) =  .14821727

Fitting reduced model 6:
Step 1         f(b) =    .221365

Fitting reduced model 7:
Step 1         f(b) =  .20598217
              uniqrows():  3001  expected 1 arguments but received 2
              xtdpdgmm():     -  function returned error
                 <istmt>:     -  function returned error
r(3001);

I hope there's nothing wrong with my command. Further, I have a query. As you mentioned in #131, the incremental overidentification test compares 2 versions of the model, one with instruments Z1 and Z2, and another one with instrument Z1 but not Z2. However, since I run only one model with all the instruments, how does the software decide which instruments to keep and which ones to drop? Moreover, how do we read the output of the difference-in-Hansen test that you show on slide 52 of your presentation?

Sebastian Kripfganz

Join Date: May 2014

Posts: 2575
#135

14 Apr 2020, 08:24

Would it be possible for you to send me your data set by e-mail so that I can locate where exactly the problem is?

The command separates your instruments into different sets, one for each gmm() or iv() option. It then internally reestimates the model, leaving out one of those instrument sets at a time. The table you can see on slide 52 shows the Hansen test for the model with Z1 but without the respective Z2 in the first column and the difference to the Hansen test for the full model with Z1 and Z2 in the second column. The test in the first column should not reject the null hypothesis. Otherwise, the test in the second column is meaningless as explained before. The test in the second column is the test for the validity of Z2 given that Z1 is valid.
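
To make the arithmetic behind the two columns concrete, here is a small sketch in Stata. It assumes that the difference statistic is simply the full-model Hansen statistic minus the reduced-model statistic, with the degrees of freedom differenced in the same way, as the description above suggests. The full-model values are the chi2(50) = 90.5330 reported in #121; the reduced-model values are made-up numbers for illustration only, chosen so that the reduced model is not rejected at conventional levels.

Code:

```stata
* Full model with Z1 and Z2: Hansen statistic and degrees of freedom (from #121)
scalar J_full  = 90.5330
scalar df_full = 50

* Reduced model with Z1 only: hypothetical values, not from any estimation
scalar J_red  = 45.0
scalar df_red = 40

* First column: p-value of the Hansen test for the reduced model
display chi2tail(df_red, J_red)

* Second column: difference statistic and its p-value (validity of Z2 given Z1)
display J_full - J_red
display chi2tail(df_full - df_red, J_full - J_red)
```

The second p-value is only worth reading if the first one does not reject, which is the ordering of the two columns described above.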
