XTDPDGMM: new Stata command for efficient GMM estimation of linear (dynamic) panel models with nonlinear moment conditions

Sebastian Kripfganz

Join Date: May 2014

Posts: 2575
#676

13 Jan 2025, 07:40

As long as you do not change the second equation, my earlier statement about X_it being predetermined still stands.

Once X_it becomes a direct function of Y_it, as in your amended second equation, X_it becomes endogenous.

https://www.kripfganz.de/stata/
Nursena Sagir

Join Date: Jan 2022

Posts: 25
#677

13 Jan 2025, 08:52

Dear Sebastian,

Is there a reference that would help me understand this better and explain why I can treat X_it as predetermined? I had difficulty explaining it in the methods section of my paper. Alternatively, could you elaborate on your reasoning?

Best regards,
Nursena
Sebastian Kripfganz

Join Date: May 2014

Posts: 2575
#678

14 Jan 2025, 02:05

Any standard econometrics textbook should cover systems of simultaneous equations and regressor endogeneity.

A variable is predetermined if it is a function of (i.e., determined by) previous periods' shocks to the equation of interest (but not current or future periods' shocks).
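In moment-condition form, the distinction can be sketched as follows. The symbols are notation assumed here rather than taken from the thread: X_{it} is the regressor and u_{is} is the shock to the equation of interest in period s.

```latex
% Assumed notation: X_{it} is the regressor, u_{is} the shock to the equation of interest
\begin{aligned}
\text{strictly exogenous:} \quad & E[X_{it}\, u_{is}] = 0 \quad \text{for all } s, t \\
\text{predetermined:}      \quad & E[X_{it}\, u_{is}] = 0 \quad \text{for all } s \ge t,
                                   \text{ but possibly } E[X_{it}\, u_{is}] \ne 0 \text{ for } s < t \\
\text{endogenous:}         \quad & E[X_{it}\, u_{is}] = 0 \quad \text{only for } s > t,
                                   \text{ so } E[X_{it}\, u_{it}] \ne 0 \text{ is allowed}
\end{aligned}
```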

https://www.kripfganz.de/stata/
Nicu Sprincean

Join Date: Nov 2016

Posts: 47
#679

07 Feb 2025, 06:30

Hi Sebastian,

I have a question regarding a model specification where I include the second lag of the dependent variable to deal with serial correlation. The model takes the following form:

Code:

xtdpdgmm DV l(1/2).DV l.IV l.(controls), gmm(l(1/2).DV, lag(0 0) collapse m(fodev)) gmm(l.IV, lag(0 3) collapse m(fodev)) iv(l.(controls),d m(level)) teffects two vce(robust) nocons

I assume that l.DV and l.IV are both predetermined (all right-hand-side variables enter the model with a one-year lag for economic reasons) and that all controls are strictly exogenous. I am not sure whether

Code:

gmm(l(1/2).DV, lag(0 0) collapse m(fodev))

and

Code:

iv(l.(controls),d m(level))

are correctly specified.

Thank you in advance for your response!

Last edited by Nicu Sprincean; 07 Feb 2025, 06:51.
Sebastian Kripfganz

Join Date: May 2014

Posts: 2575
#680

07 Feb 2025, 07:30

There is nothing wrong with these instruments per se. It is just a bit unusual that you are using different lag orders to instrument the lagged dependent variables and the independent variable. You should make sure that this can be justified; otherwise it looks like cherry picking a model specification that delivers the nicest results.

For the controls, it is also unusual to not specify instruments for the transformed model; e.g., iv(l.(controls), m(fodev)).

As a technical comment, note that gmm(l(1/2).DV, lag(0 0) collapse m(fodev)) is equivalent to iv(l(1/2).DV, m(fodev)).
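The equivalence can be checked with a small sketch. The dataset (abdata) and the specification below are illustrative assumptions, not the model from #679; only the two ways of writing the instruments for the lagged dependent variables differ between the runs.

```stata
* Sketch only: abdata and this specification are illustrative assumptions,
* not the model from #679. Requires xtdpdgmm to be installed.
webuse abdata, clear

* (a) lagged dependent variables as GMM-style instruments, lag 0 only, collapsed
xtdpdgmm L(0/2).n w k, gmm(L(1/2).n, lag(0 0) collapse model(fodev)) ///
    gmm(w k, lag(1 3) collapse model(fodev)) twostep vce(robust)
estimates store gmm_style

* (b) the same instruments written as standard instruments
xtdpdgmm L(0/2).n w k, iv(L(1/2).n, model(fodev)) ///
    gmm(w k, lag(1 3) collapse model(fodev)) twostep vce(robust)
estimates store iv_style

* The two columns should coincide if the two specifications are equivalent
estimates table gmm_style iv_style, b(%10.6f) se(%10.6f)
```

Writing the instruments with iv() is the more compact of the two forms.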

https://www.kripfganz.de/stata/
Matej Korinek

Join Date: Mar 2025

Posts: 1
#681

02 Apr 2025, 09:16

Dear Professor Kripfganz,

I am just wondering about the following. I went through your 2019 London Stata Conference presentation very carefully. I am not sure how to exactly interpret the Incremental overidentification test. In particular, I have the following output:

Sargan-Hansen (difference) test of the overidentifying restrictions
H0: (additional) overidentifying restrictions are valid

2-step weighting matrix from full model

                  | Excluding                   | Difference
Moment conditions |       chi2     df         p |        chi2     df         p
------------------+-----------------------------+-----------------------------
  1, model(fodev) |   249.0901    235    0.2521 |      5.7804     13    0.9538
  2, model(fodev) |   171.4524    131    0.0102 |     83.4181    117    0.9919
  3, model(level) |   244.7790    235    0.3172 |     10.0916     13    0.6864
  4, model(level) |   170.3794    131    0.0118 |     84.4912    117    0.9897
     model(fodev) |   151.3709    118    0.0208 |    103.4997    130    0.9581
     model(level) |   151.3709    118    0.0208 |    103.4997    130    0.9581

Sargan-Hansen test of the overidentifying restrictions
H0: overidentifying restrictions are valid

2-step moment functions, 2-step weighting matrix       chi2(248)   =  254.8706
                                                       Prob > chi2 =    0.3686

2-step moment functions, 3-step weighting matrix       chi2(248)   =  265.6096
                                                       Prob > chi2 =    0.2111

For example, the first row tells me the Hansen J statistic for the full set of moment conditions excluding 13 of them (I have 13 collapsed lags of the dependent variable), and then the same statistic for just those 13 conditions. In particular, I do not understand the last two rows. What do they mean, and how should I interpret them? I know that the "Difference" column should give the Hansen J statistic for just the level-equation moments (or the fodev moments, depending on the row), but which model does the test with 118 degrees of freedom in the "Excluding" column represent? I am trying to understand that particular reduced model because it never passes in my models.

That leads me to my second question. I always pass the "Difference" test comfortably, with p-values above 0.8, but I frequently do not pass the test for the reduced model in the "Excluding" column. Is that a problem? How is that possible?

Thank you for your time

Matěj Kořínek
Sebastian Kripfganz

Join Date: May 2014

Posts: 2575
#682

03 Apr 2025, 04:44

The first column, labelled "Excluding", provides an overidentification test for a model without the instruments from the respective row. If this test rejects the null hypothesis, then this indicates that even without those instruments the model might be misspecified. In this case, adding the respective instruments would not help, even if the additional ones were all valid, because there are still some invalid instruments. Thus, testing those additional instruments would not be feasible. This is because the column labelled "Difference" effectively compares the Hansen test for the full model with all instruments to the initial model from the "Excluding" column. But if this initial model is misspecified, then the "Difference" test would compare the full model to a misspecified model. Consequently, if both models are similarly misspecified, the "Difference" test might not reject. But this could be misleading. Therefore, looking at the "Difference" test really only makes sense when the "Excluding" test is successfully passed.

The last two rows in your table are basically a combination of rows 1-2 and rows 3-4, respectively. For example, in the last row the "Excluding" test jointly excludes all the instruments for the level model from rows 3 and 4, and the "Difference" test compares the full model to this model with those excluded instruments.

While I understand why the degrees of freedom in the last two rows are identical - this is because there is an equal number of instruments for the level model and the transformed model - I am a bit puzzled about the numerically identical values of the test statistic. This looks a bit odd and appears to be a bug!
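As a worked illustration of how the two columns relate, the numbers in row 1 of the table in #681 can be reproduced by hand: the "Difference" statistic is the full-model statistic (with the 2-step weighting matrix) minus the "Excluding" statistic, and the degrees of freedom are subtracted in the same way. The scalar names below are made up for this sketch.

```stata
* Numbers copied from row 1 of the table in #681 and from the full-model test
scalar J_full  = 254.8706   // chi2(248), 2-step weighting matrix
scalar df_full = 248
scalar J_excl  = 249.0901   // "Excluding" column, row 1
scalar df_excl = 235

* "Difference" column: full model minus the model without the row-1 instruments
scalar J_diff  = J_full - J_excl
scalar df_diff = df_full - df_excl
scalar p_diff  = chi2tail(df_diff, J_diff)

display "Difference: chi2(" df_diff ") = " %8.4f J_diff ",  p = " %6.4f p_diff
```

The same subtraction applies to every other row of the table, up to rounding in the last digit.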

https://www.kripfganz.de/stata/
Sebastian Kripfganz

Join Date: May 2014

Posts: 2575
#683

03 Apr 2025, 05:05

An important update to version 2.6.9 is now available for xtdpdgmm from my personal website, which fixes the bug just mentioned in the previous post, where some of the Difference-in-Hansen test statistics obtained after running xtdpdgmm with option overid had been incorrect. (The source of the bug was an incorrect selection of the relevant moments when combining all instruments for the level model or a transformed model.)

Code:

net install xtdpdgmm, from(http://www.kripfganz.de/stata/) replace

https://www.kripfganz.de/stata/

Arkangel Cordero

Join Date: Apr 2020
Posts: 32

#684

03 Apr 2025, 18:12

Dear Professor @Sebastian Kripfganz

I hope you are well.

I have what I hope is a quick set of questions. I am experimenting with the four models below (Models I through IV).

Code:

webuse abdata, clear

/* Balancing the panels for simplicity */
keep if year>=1977&year<=1982
by id: keep if _N==6

/* Model I: xtdpdgmm NATIVE syntax for GMM-Style instruments */
xtdpdgmm L(0/1).n wage emp k yr1979 - yr1982, model(difference) gmm(L.(n), lag(1 2) model(difference)) iv(wage emp k yr1979 - yr1982, model(difference))  twostep nocons vce(cluster id)
estimate store m1_xtdpdgmm




/* Model II: xtdpdgmm with GMM-Style instruments calculated "by hand" */
* Generate GMM-Style instruments "by hand": lags 2-3 of n, interacted with the year dummies
foreach var of varlist n {
    display "`var'"
    forvalues lag = 2(1)3 {
        display `lag'

        * Lagged level, with missing values replaced by zero
        capture drop il`lag'`var'
        gen il`lag'`var' = L`lag'.`var'
        replace il`lag'`var' = 0 if il`lag'`var' == .

        * One instrument per year: year dummy times the lagged level
        foreach year of varlist yr1977 - yr1982 {
            capture drop i`year'l`lag'`var'
            gen i`year'l`lag'`var' = `year' * il`lag'`var'
            replace i`year'l`lag'`var' = 0 if i`year'l`lag'`var' == .
            *replace i`year'l`lag'`var' = . if year == 1977
        }
    }
}

findname, all(@==0)
drop `r(varlist)'

xtdpdgmm L(0/1).n wage emp k yr1979 - yr1982, model(difference) iv(iyr* wage emp k yr1979 - yr1982, model(difference))  twostep nocons vce(cluster id)
estimate store m2_xtdpdgmm




/* Model III: Stata's "gmm" command with the GMM-Style instruments calculated by hand ORTHOGONALIZED relative to the fixed-effects */
* Orthogonalize instruments calculated "by hand" relative to the fixed-effects
foreach var of varlist iy* wage emp k yr1979 - yr1982 {    
    capture drop  orth_`var'    
    gen orth_`var' = `var'
    bysort id (year): replace orth_`var' = 0 if _n == 2
    bysort id (year):  replace orth_`var'  = orth_`var' - F1.orth_`var' if _n != _N    
    bysort id (year): replace orth_`var'  = . if _n == 1    
}

gmm (eq1: n  - {n:  L.n wage emp k yr1979 yr1980 yr1981 yr1982}), ///
    instruments(orth_*, noconstant) ///
    winitial(unadjusted) vce(cluster id) twostep
estimate store m1_gmm


/* Model IV:  Stata's "gmm" command with the GMM-Style instruments calculated "by hand" but NOT orthogonalized with respect to the fixed-effects */
gmm (D.n - {xb: LD.n D.wage D.emp D.k D.yr1979 D.yr1980 D.yr1981 D.yr1982}), ///
    instruments(iyr* wage emp k yr1979 - yr1982, noconstant) ///
    winitial(unadjusted) vce(cluster id) twostep
estimate store m2_gmm

esttab  m1_xtdpdgmm m2_xtdpdgmm m1_gmm m2_gmm ,  b(7) se(7) order(L.n LD.n wage D.wage emp D.emp k D.k)

With results:

HTML Code:

----------------------------------------------------------------------------
                      (1)             (2)             (3)             (4)  
                        n               n                                  
----------------------------------------------------------------------------
main                                                                        
L.n            -0.1244813      -0.1244813      -0.1244813                  
              (0.3169631)     (0.3169631)     (0.2429797)                  

LD.n                                                           -0.1492370  
                                                              (0.2419743)  

wage           -0.0294276      -0.0294276      -0.0294276                  
              (0.0170682)     (0.0170682)     (0.0160661)                  

D.wage                                                         -0.0299815  
                                                              (0.0163513)  

emp             0.0144419       0.0144419       0.0144419*                  
              (0.0092611)     (0.0092611)     (0.0071777)                  

D.emp                                                           0.0142584*  
                                                              (0.0072503)  

k               1.0777604**     1.0777604**     1.0777604***                
              (0.3357940)     (0.3357940)     (0.2396408)                  

D.k                                                             1.1037843***
                                                              (0.2389169)  

yr1979         -0.0256686*     -0.0256686*     -0.0256686*                  
              (0.0121993)     (0.0121993)     (0.0123034)                  

yr1980         -0.0271234      -0.0271234      -0.0271234                  
              (0.0151974)     (0.0151974)     (0.0155296)                  

yr1981         -0.0024327      -0.0024327      -0.0024327                  
              (0.0345831)     (0.0345831)     (0.0299392)                  

yr1982          0.0534354       0.0534354       0.0534354                  
              (0.0579830)     (0.0579830)     (0.0523127)                  

D.yr1979                                                       -0.0260374*  
                                                              (0.0125335)  

D.yr1980                                                       -0.0267815  
                                                              (0.0158829)  

D.yr1981                                                       -0.0002548  
                                                              (0.0303609)  

D.yr1982                                                        0.0568577  
                                                              (0.0532091)  
----------------------------------------------------------------------------
N                     690             690             690             552  
----------------------------------------------------------------------------  
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001

I have three sets of questions that I was hoping you could provide some guidance on. Please note that all of them pertain to the "classic" first-difference GMM models à la Arellano & Bond (1991).

1) My first question is about the way that xtdpdgmm orthogonalizes the instruments with respect to the unit-level fixed-effects. Previously you mentioned that the key is that the sum of the instruments within panel be equal to “0”.

a. My understanding of the above statement is that if the within-unit (i.e., within-panel) sum of an instrument is equal to “0”, then its within-unit mean is also equal to “0”. In that case, the instrument is already in deviations from its within-unit mean (which is “0”) and is therefore orthogonal to the unit fixed effects. Is this interpretation accurate?

b. When orthogonalizing each instrument with respect to the unit-fixed effects, it seems that xtdpdgmm simply subtracts from each value the value at the next time period within each unit(panel). Is that correct?

c. If so, it appears that for each instrument:

*The value of an instrument for the last time-period within each unit is left intact because we don’t have anything to subtract from it. Is that correct?

*The value of an instrument for the first time-period within each unit is set to missing to avoid the “dummy-variable trap”. Is that accurate?

*The value of an instrument for the second time-period within each unit is set to “0” before subtracting the value for the subsequent time period. This is done because we have set the first period to missing and, hence, to ensure that the within unit sum is “0”. Is that correct?

2) Can you please provide some insight as to why xtdpdgmm chooses to orthogonalize the instruments rather than taking first-differences in the equation to be estimated? I noticed that the sample size is larger because of this. Is that part of the reason? I’m just curious.

3) Finally, as you can see, models I through III above, all estimate the same coefficients. Do you have any insights as to why the “un-orthogonalized” instruments produce different coefficient estimates in the last model (Model IV) when they produce the correct estimates with xtdpdgmm in Model II?

Thank you in advance for any insights.

Last edited by Arkangel Cordero; 03 Apr 2025, 19:07.


Sebastian Kripfganz

Join Date: May 2014

Posts: 2575
#685

04 Apr 2025, 08:17

Thank you for this well-designed replication example.

I will start with question 3. The reason why your final results differ from the previous ones is that the unadjusted initial weighting matrix does not account for the first-order serial correlation in the first-differenced equation. You would need to use winitial(xt D) for this purpose. However, the gmm command only allows this option in combination with xtinstruments(). You can trick the gmm command into delivering the desired results by supplying an xt-instrument full of zeros:

Code:

gen zeros = 0
gmm (D.n - {xb: LD.n D.wage D.emp D.k D.yr1979 D.yr1980 D.yr1981 D.yr1982}), ///
    instruments(iyr* wage emp k yr1979 - yr1982, noconstant) ///
    xtinstruments(zeros, lags(0/0)) winitial(xt D) vce(cluster id) twostep
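The first-order serial correlation in the first-differenced equation can be sketched as follows, under the assumption (made here for illustration) that the level errors u_{it} are serially uncorrelated with constant variance \sigma_u^2. As far as I understand the options, winitial(unadjusted) ignores this structure, whereas winitial(xt D) builds the initial weighting matrix from the tridiagonal pattern below.

```latex
% Assumption: u_{it} serially uncorrelated with variance \sigma_u^2
E[\Delta u_{it}\, \Delta u_{is}] =
\begin{cases}
 2\sigma_u^2 & \text{if } s = t, \\
 -\sigma_u^2 & \text{if } |s - t| = 1, \\
 0           & \text{otherwise,}
\end{cases}
\qquad\text{so that}\qquad
\operatorname{Var}(\Delta u_i) = \sigma_u^2
\begin{pmatrix}
 2  & -1     &        &    \\
 -1 & 2      & \ddots &    \\
    & \ddots & \ddots & -1 \\
    &        & -1     & 2
\end{pmatrix}.
```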

Regarding your other questions:
1.a. You could think about it this way, yes.
1.b/c. What you have done in your manual construction of the instruments is correct.
2. The rationale behind this approach is that there is no need to estimate a "system" of equations, when you think about the system GMM estimator. Of course, the Arellano-Bond estimator only has one equation in first differences, but the approach taken by xtdpdgmm is that all transformations are special cases of the system approach, which can be recast as a conventional estimator for the equation in levels with appropriately orthogonalized instruments. This simplifies the command's architecture substantially and makes it straightforward to implement any other type of transformation (e.g., forward-orthogonal deviations). The sample size is larger because it refers to the level model, not the first-differenced one. It is true, though, that effectively one observation is still lost due to the orthogonalization. Consequently, it is a fair question whether this should be reflected in the reported number of observations.
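The recasting of a first-difference moment condition as a level moment condition can be sketched with a short piece of algebra. The notation is assumed here and not taken from the thread or the slides: z_{it} is an instrument for the first-differenced equation in periods t = 2, ..., T, u_{it} is the level error, and \alpha_i is the unit fixed effect.

```latex
\begin{aligned}
\sum_{t=2}^{T} z_{it}\, \Delta u_{it}
  &= \sum_{t=2}^{T} z_{it}\, u_{it} - \sum_{t=1}^{T-1} z_{i,t+1}\, u_{it}
   = \sum_{t=1}^{T} \tilde{z}_{it}\, u_{it}, \\[4pt]
\tilde{z}_{it} &=
\begin{cases}
 -z_{i2}            & t = 1, \\
 z_{it} - z_{i,t+1} & 2 \le t \le T-1, \\
 z_{iT}             & t = T,
\end{cases}
\qquad
\sum_{t=1}^{T} \tilde{z}_{it} = -z_{i2} + (z_{i2} - z_{iT}) + z_{iT} = 0 .
\end{aligned}
```

Because the transformed instrument sums to zero within each unit, \sum_t \tilde{z}_{it} (\alpha_i + u_{it}) = \sum_t \tilde{z}_{it} u_{it}, so the fixed effect drops out of the level moment condition without the equation itself being differenced.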

https://www.kripfganz.de/stata/
Arkangel Cordero

Join Date: Apr 2020

Posts: 32
#686

04 Apr 2025, 11:00

Dear Professor @Sebastian Kripfganz,

Thank you for your insights. I have two follow-up questions.

1) The trick of getting the gmm command to account for first-order serial correlation in the first-differenced equation is really cool. My question is: why was this not necessary in Model 3 in my original post, #684?

2) Also referring to Model 3 in #684: when orthogonalizing the instruments relative to the unit fixed effects, do we force the first observation of each panel to be missing a) to avoid the dummy variable trap, or b) because our estimator is really a first-difference model and we are accounting for the loss of the first observation?

Thank you again!
Sebastian Kripfganz

Join Date: May 2014

Posts: 2575
#687

06 Apr 2025, 03:41

1) Model 3 was implemented in terms of the level equation with serially uncorrelated idiosyncratic errors. This is another benefit of orthogonalizing the instruments instead of transforming the model itself: You can use all the conventional procedures, including conventional weighting matrices.

2) I have never really thought about an interpretation of the specific form of the orthogonalized instruments. Their construction is merely mechanical; see slide 33 of my 2019 London Stata Conference presentation. I guess interpretation (b) makes more sense.
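The mechanical nature of the construction can be checked on a toy example. This is a sketch with made-up data for a single panel unit, and the variable names are hypothetical. It applies the same steps as the manual construction in #684 (zero out the first usable period, then subtract the next period's value everywhere except in the last period) and compares the first-difference moment with the level moment.

```stata
* Toy check with made-up data: one panel unit, T = 6, arbitrary z and u
clear
set seed 12345
set obs 6
gen t = _n
tsset t
gen double z = rnormal()    // instrument for the first-differenced equation
gen double u = rnormal()    // stand-in for the level errors

* Transformed instrument: set the first period to zero, then subtract the next
* period's value everywhere except in the last period
gen double ztilde = z
replace ztilde = 0 if t == 1
replace ztilde = ztilde - F.ztilde if t < _N

* Moment in first differences versus moment in levels
gen double m_diff  = z * D.u        // missing at t = 1
gen double m_level = ztilde * u
quietly summarize m_diff
scalar s_diff = r(sum)
quietly summarize m_level
scalar s_level = r(sum)
quietly summarize ztilde
scalar s_ztilde = r(sum)

display "sum z*D.u = " s_diff "   sum ztilde*u = " s_level "   sum ztilde = " s_ztilde
```

The first two sums should agree and the third should be zero up to rounding. In #684 the zeroing is applied to the second observation, presumably because the first observation is already lost to the lag of n.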

https://www.kripfganz.de/stata/
Arkangel Cordero

Join Date: Apr 2020

Posts: 32
#688

07 Apr 2025, 17:24

Dear Professor @Sebastian Kripfganz,

Thank you for your always valuable insights!