xtdpdgmm for Two-Step Diff-GMM and SYS-GMM

Paul Allard

Join Date: Oct 2024

Posts: 6
#1

22 Oct 2024, 13:54

Hi everyone,

I'm trying to estimate the effect of a certain kind of public spending on the growth rate of real GDP per capita. I want to use the xtdpdgmm command for two-step difference GMM and two-step system GMM, but I'm not sure I fully understand the syntax.

Let me first explain how I've arranged the dataset; otherwise, the syntax I'm going to show won't be clear. I have a balanced panel of 35 countries (unfortunately N is not particularly large) and 25 years, and I use five-year non-overlapping averages, resulting in 5 observations per country for the periods 1990-1994, 1995-1999, ..., 2010-2014.

The equation to estimate is:

y_it - y_i,t-x = (β₁ - 1) y_i,t-x + β₂ h_i,t-x + β₃ x_it + α_i + δ_t + u_it

where t = 1994, 1999, 2004, 2009, 2014 and x = 5, except for the first period, where x = 4. Further, h_it is assumed to be predetermined, and x_it is the mean of x_i from time t-x+1 to time t. For instance, at t = 1999, x_i,1999 denotes the average of x_i from 1995 to 1999.
For each country i, my dataset in Stata has 5 rows (none with missing values), and the first row for country i has the following columns:
i) the dependent variable (y_i,1994 - y_i,1990) is named gdp_growth
ii) the AR part, y_i,1990, is named gdp_lag
iii) the predetermined variable, h_i,1990, is named school_lag
iv) the control, x_i,1994, is named fiscal

Finally, I create year dummies (years*).
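
The data arrangement described above could be built along the following lines. This is only a sketch under assumptions: it supposes a hypothetical annual panel with variables id, year (1990-2014), y, h and x, none of which are named in the thread, and it has not been run against the actual data.

```stata
* Sketch only: assumes a hypothetical annual panel with variables
* id, year (1990-2014), y, h, x. These names do not come from the thread.
gen period = floor((year - 1990)/5) + 1          // 1 = 1990-1994, ..., 5 = 2010-2014
bysort id period (year): gen y_end = y[_N]       // y in the last year of each period
bysort id period (year): gen h_end = h[_N]       // h in the last year of each period
bysort id (year): gen y0 = y[1]                  // 1990 value, used for period 1
bysort id (year): gen h0 = h[1]

* One row per country and period; fiscal is the within-period mean of x
collapse (mean) fiscal = x (first) y_end h_end y0 h0, by(id period)
xtset id period

gen gdp_lag    = cond(period == 1, y0, L.y_end)  // y_1990, y_1994, ..., y_2009
gen school_lag = cond(period == 1, h0, L.h_end)  // h_1990, h_1994, ..., h_2009
gen gdp_growth = y_end - gdp_lag                 // y_1994 - y_1990, y_1999 - y_1994, ...
tabulate period, generate(years)                 // creates years1 ... years5
```

Declaring the panel on the period index (rather than the calendar year) means that the lag operators in gmm() count five-year periods, which is what the lag ranges in the commands below presuppose.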

For the Two-Step Diff-GMM (and collapsing the instruments),

xtdpdgmm gdp_growth gdp_lag school_lag fiscal years*, model(diff) collapse gmm( fiscal, lag(2 3)) gmm(gdp_lag, lag(1 2)) gmm(school_lag years*, lag(0 0) ) nocons two vce(r) nolog

where:
i) the first difference of fiscal (Δx_it) is instrumented by lags 2 and 3 of the level of x_it;
ii) gdp_lag (y_i,t-1) is endogenous, and in the first-differenced equation it is instrumented by its lag 1 and lag 2 levels (which correspond to the lag 2 and lag 3 levels of y_it);
iii) school_lag is predetermined and, since it enters the equation at lag 1, it is effectively exogenous there; in the first-differenced equation, the first difference of school_lag should serve as its own instrument;
iv) for the time dummies, I guess they should be added in the same way.
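
For reference, the moment conditions implied by the gmm() terms as they are coded above can be written out as follows. This is a sketch under assumed notation: t indexes the consecutive five-year periods (so t-1 is the previous period), d_jt stands for the j-th year dummy, and the sums over t reflect the collapse option. Whether each condition actually holds depends on the exogeneity assumptions stated in the list.

```latex
% First-differenced equation in period notation (fixed effect \alpha_i removed)
\begin{align*}
\Delta (y_{it} - y_{i,t-1}) &= (\beta_1 - 1)\,\Delta y_{i,t-1} + \beta_2\,\Delta h_{i,t-1}
                              + \beta_3\,\Delta x_{it} + \Delta\delta_t + \Delta u_{it}
\end{align*}

% Collapsed moment conditions corresponding to each gmm() term
\begin{align*}
E\Big[\sum_t x_{i,t-s}\,\Delta u_{it}\Big] &= 0, \quad s = 2, 3
    && \text{gmm(fiscal, lag(2 3))} \\
E\Big[\sum_t y_{i,t-s}\,\Delta u_{it}\Big] &= 0, \quad s = 2, 3
    && \text{gmm(gdp\_lag, lag(1 2))} \\
E\Big[\sum_t h_{i,t-1}\,\Delta u_{it}\Big] &= 0
    && \text{gmm(school\_lag, lag(0 0))} \\
E\Big[\sum_t d_{jt}\,\Delta u_{it}\Big] &= 0
    && \text{gmm(years*, lag(0 0))}
\end{align*}
```

Note that the last two conditions use the instruments in levels, not in first differences, because no differencing is requested for them in the command.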

In this case, only one year dummy is dropped (I was expecting three year dummies to be dropped).

For the Two-Step SYS-GMM (and collapsing instruments),

xtdpdgmm gdp_growth gdp_lag school_lag fiscal years*, model(diff) collapse gmm( fiscal, lag(2 3)) gmm(gdp_lag, lag(1 2)) gmm(school_lag years*, lag(0 0) ) gmm( fiscal , lag(1 1) diff model(level)) gmm(gdp_lag school_lag years*, lag(0 0) diff model(level)) nocons two vce(r)

where:
i) the lag 1 first difference of fiscal (Δx_i,t-1) is used as an instrument for x_it in the level equation;
ii) the first differences of gdp_lag (Δy_i,t-1) and school_lag (Δh_i,t-1) are used as instruments for y_i,t-1 and h_i,t-1, respectively;
iii) for the time dummies, I guess they should be added in the same way.

However, for the SYS-GMM estimator, none of the year dummies are dropped.

I suspect there is something wrong in my coding, and maybe the way I've arranged the dataset is problematic.

Thanks so much for your help in advance!
Sebastian Kripfganz

Join Date: May 2014

Posts: 2593
#2

23 Oct 2024, 03:18

As far as I can tell, the command specifications are in line with your explanations. Why would you expect 3 time dummies to be dropped by the diff-GMM estimator? One time dummy needs to be dropped because one observation is effectively lost due to first differencing; this is not the case for system GMM.

https://www.kripfganz.de/stata/
Paul Allard

Join Date: Oct 2024

Posts: 6
#3

23 Oct 2024, 03:43

Thank you very much, Sebastian, for your reply. If I understand diff-GMM correctly, the first available instrument in the first-differenced equation for my endogenous variable, Δx_it, is at time t=3, because at t=3 I use x_i,t-2 as an instrument for Δx_it. The same should apply to the first-differenced equation in SYS-GMM. As a result, time effectively starts at t=3 and we lose two observations. To avoid perfect collinearity among the time dummies, the time dummy for t=3 (for instance) would then be dropped. Perhaps the issue lies in how I've organized my dataset.

Last edited by Paul Allard; 23 Oct 2024, 03:48.
Sebastian Kripfganz

Join Date: May 2014

Posts: 2593
#4

23 Oct 2024, 03:51

The first available instrument for school_lag and the time dummies is t=1. The missing observations for other instruments are internally replaced by 0.
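
To illustrate the zero-filling, here is a sketch of part of the collapsed instrument block for one country in the first-differenced equation. It rests on assumptions: periods are indexed t = 1, ..., 5, the differenced equation has rows t = 2, ..., 5, and only the columns for L2.fiscal, L3.fiscal and school_lag are shown. The exact internal layout used by xtdpdgmm may differ.

```latex
% Rows: differenced equation at t = 2, 3, 4, 5
% Columns: L2.fiscal = x_{i,t-2}, L3.fiscal = x_{i,t-3}, school_lag = h_{i,t-1}
% Lags that would reach before t = 1 are unavailable and entered as 0
\[
Z_i^{\text{diff}} =
\begin{pmatrix}
0      & 0      & h_{i1} \\
x_{i1} & 0      & h_{i2} \\
x_{i2} & x_{i1} & h_{i3} \\
x_{i3} & x_{i2} & h_{i4}
\end{pmatrix}
\]
```

Every row keeps its regressors and at least one nonmissing instrument, so no additional period is dropped just because the fiscal lags are not yet available.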

https://www.kripfganz.de/stata/
Paul Allard

Join Date: Oct 2024

Posts: 6
#5

23 Oct 2024, 04:04

Are you saying that at time t=1, gdp_growth (y_it - y_i,t-1) is regressed only on school_lag (h_i,t-1) and the time dummies? And that at time t=2 the regression changes only because a new instrument becomes available, namely L.gdp_lag (the lag 2 level of y_it), while fiscal (x_it) is still set to 0?
Do you think this is the correct approach, or should I rearrange the dataset so that my estimation starts from time t=3? Thanks!
Sebastian Kripfganz

Join Date: May 2014

Posts: 2593
#6

23 Oct 2024, 05:56

No, the regressors are all nonmissing in all periods. Just some of the instruments are missing in some periods and are effectively not used. This is the GMM idea: You will have more instruments available the larger t becomes. I think what you are doing so far is the right approach.

https://www.kripfganz.de/stata/
Paul Allard

Join Date: Oct 2024

Posts: 6
#7

23 Oct 2024, 06:03

I see, thanks for your time!
Paul Allard

Join Date: Oct 2024

Posts: 6
#8

23 Oct 2024, 07:41

Dear Sebastian, apologies for bothering you again, but as I review the instruments I used in the two-step difference and two-step system GMM, something seems a bit off to me. When I run the following code:

xtdpdgmm gdp_growth gdp_lag school_lag fiscal years*, model(diff) collapse gmm( fiscal, lag(2 3)) gmm(gdp_lag, lag(1 2)) gmm(school_lag years*, lag(0 0) ) nocons two vce(r)

I get the following list of instruments:

[
Instruments corresponding to the linear moment conditions:
1, model(diff):
L2.fiscal L3.fiscal
2, model(diff):
L1.gdp_lag L2.gdp_lag
3, model(diff):
school_lag years2 years3 years4 years5
]

However, for the time dummies and school_lag, I was expecting to use their first-differenced values as instruments. I believe the solution to this issue is to specify:

xtdpdgmm gdp_growth gdp_lag school_lag fiscal years*, model(diff) collapse gmm( fiscal, lag(2 3)) gmm(gdp_lag, lag(1 2)) gmm(d.school_lag d.years*, lag(0 0) ) nocons two vce(r)

Similarly, for the System GMM, when I run the following code:

xtdpdgmm gdp_growth gdp_lag school_lag fiscal years*, model(diff) collapse gmm( fiscal, lag(2 3)) gmm(gdp_lag, lag(1 2)) gmm(school_lag years*, lag(0 0) ) gmm( fiscal , lag(1 1) diff model(level)) gmm(gdp_lag school_lag years*, lag(0 0) diff model(level)) nocons two vce(r)

I get this:
[
1, model(diff):
L2.fiscal L3.fiscal
2, model(diff):
L1.gdp_lag L2.gdp_lag
3, model(diff):
school_lag years2 years3 years4
4, model(level):
L1.D.fiscal
5, model(level):
D.gdp_lag D.school D.years4 D.years5
]

Again, to fix this, I believe I should type:

xtdpdgmm gdp_growth gdp_lag school_lag fiscal years*, model(diff) collapse gmm( fiscal, lag(2 3)) gmm(gdp_lag, lag(1 2)) gmm(d.school_lag d.years*, lag(0 0) ) gmm( fiscal , lag(1 1) diff model(level)) gmm(gdp_lag school_lag years*, lag(0 0) diff model(level)) nocons two vce(r)

Finally, I've rejected the null hypothesis of no second-order autocorrelation, which suggests that the instruments in levels may not be exogenous. Would you recommend increasing the autoregressive order of my model by adding y_i,t-2? If so, how should I appropriately instrument y_i,t-2?

Thank you very much!
Sebastian Kripfganz

Join Date: May 2014

Posts: 2593
#9

24 Oct 2024, 03:21

You can either specify the differenced instruments with the d. operator, as you have done in your amended command line, or you can specify the diff suboption, as you have done for the additional system GMM instruments. Both ways are equivalent.
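
As a sketch of the two equivalent spellings for the difference GMM command from post #8 (same variable names, do-file syntax where /// continues a line; not run here):

```stata
* Option 1: time-series operator inside the instrument list
xtdpdgmm gdp_growth gdp_lag school_lag fiscal years*, model(diff) collapse ///
    gmm(fiscal, lag(2 3)) gmm(gdp_lag, lag(1 2)) ///
    gmm(d.school_lag d.years*, lag(0 0)) nocons two vce(r)

* Option 2: diff suboption applied to the same instrument list
xtdpdgmm gdp_growth gdp_lag school_lag fiscal years*, model(diff) collapse ///
    gmm(fiscal, lag(2 3)) gmm(gdp_lag, lag(1 2)) ///
    gmm(school_lag years*, lag(0 0) diff) nocons two vce(r)
```

In both cases the instrument list printed below the output should show D.school_lag and the differenced year dummies for that gmm() term, which is a quick way to check that the intended instruments are in use.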

Increasing the autoregressive order by adding a second lag could be a reasonable approach for dealing with serially correlated errors, although it means that you are losing one time period. You can also try to add lags of the regressors instead. These additional lags are instrumented with the same instruments you already have. You might simply want to increase the lag order for the instruments; e.g., lag(2 4) instead of lag(2 3).

Alternatively, instead of adding further lags as regressors, you could also start instrumenting with deeper lags; e.g., lag(3 4) instead of lag(2 3), assuming that there is no higher-order serial correlation in the errors.
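
A sketch of the deeper-lag variant, followed by the serial correlation and overidentification tests. The change from lag(2 3) to lag(3 4) for fiscal follows the suggestion above; shifting gdp_lag from lag(1 2) to lag(2 3) is my own assumption by analogy, and with T=5 it is worth checking that enough instruments remain for identification.

```stata
* Sketch only (do-file syntax; /// continues a line). Variable names as in post #1.
* Instruments start one lag deeper than before:
*   fiscal:  lag(3 4) instead of lag(2 3)
*   gdp_lag: lag(2 3) instead of lag(1 2)   (assumption, by analogy)
xtdpdgmm gdp_growth gdp_lag school_lag fiscal years*, model(diff) collapse ///
    gmm(fiscal, lag(3 4)) gmm(gdp_lag, lag(2 3)) ///
    gmm(school_lag years*, lag(0 0) diff) nocons two vce(r)

estat serial, ar(1/2)   // Arellano-Bond tests for first- and second-order serial correlation
estat overid            // overidentification test for the instrument set
```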

https://www.kripfganz.de/stata/
Paul Allard

Join Date: Oct 2024

Posts: 6
#10

01 Nov 2024, 15:15

Dear Sebastian,

I apologize for reopening this discussion here, but there are some inconsistencies between xtdpdgmm and xtabond2, as well as questions about the model selection process that I'd appreciate your insights on. I'll outline all my questions in this post to avoid further follow-ups:

Q1: Model Selection Process
I’m working with a panel dataset of 42 countries (I managed to expand the dataset) observed over 25 years (1990-2014), resulting in T=5 (five-year non-overlapping averages) and N=42. For my dependent variable, I also have data from 1985 to 1989, allowing me to structure the dataset so that each country has 5 complete rows, including the lagged dependent variable.

In developing the model, I followed your presentation from the London Stata conference and Kiviet (2019, Econometrics and Statistics) for model selection criteria. In the baseline model (where I vary only the types of fiscal variables, without changing the total number of regressors), the one-step and two-step diff GMM estimators (always using corrected standard errors for both estimators) produce nearly identical point estimates.

However, the two-step diff GMM estimator consistently yields a higher p-value for the test of first-order autocorrelation (which exists by construction), around 0.005 to 0.045, compared with the one-step diff GMM (where it is always < 0.01). This seems tolerable, given the 0.05 threshold.

When I expand the baseline model by adding, say, population growth to the fiscal variables—thus increasing both regressors and instruments—the point estimates between the estimators diverge significantly. The one-step diff GMM maintains similar statistical significance and point estimates to the baseline model, suggesting it might be the more reliable choice. My understanding is that in small samples (N=42), estimation of the weighting matrix in two-step GMM can be problematic, and one-step diff GMM may be preferred. Could you confirm if this interpretation is correct and, if possible, point me to a reference on this?

A coding-related question: how are the standard errors calculated in the following commands?

xtdpdgmm gdp_growth gdp_lag school_lag fiscal i.year, model(diff) gmm( fiscal, lag(2 2)) gmm(school_lag, lag(1 1) ) gmm(gdp_lag, lag(1 .)) gmm(i.year, lag(0 0) model(diff) ) nocons vce(r)

xtabond2 gdp_growth gdp_lag school_lag fiscal i.year, gmm(fiscal, lag(2 2)) gmm(school_lag,lag(1 1)) gmm(gdp_lag, lag(1 .)) iv(i.year) noleveleq robust small

Although the point estimates match, the robust standard errors differ. Which is correct?

Q2: Curtailing and Collapsing Instruments

The point estimates are highly sensitive to how I curtail and/or collapse the instruments. I followed the guideline of keeping the number of instruments (L) below the number of countries (N), i.e., L < N, and I also ensured that they conform to the rule of thumb you and Kiviet discussed in your presentation. The only lagged term is the dependent variable (lagged once). Autocorrelation issues appear resolved by using second-order and higher lags of the dependent variable (in levels) as instruments. I avoided adding further lags of the dependent variable to the regression specification because of the limited sample size (T=5). In any case, my model specification is validated by the Andrews and Lu (2001) test. Do you see any concerns with the asymmetric curtailing of instruments, as shown in the example code?

Q3: Two Step SYS-GMM
With a relatively small sample size (N=42), and assuming my model selection and instrument management approach are sound, adding the additional moment conditions from SYS-GMM leads to instrument proliferation (53 instruments), and using the collapse option significantly alters my point estimates. Thus, I'm not sure I can consistently test the additional moment conditions for the level equation. Furthermore, the mild stationarity assumption needed for growth regressions is questionable; macroeconomic theory suggests that the initial distance from the steady state is likely correlated with country fixed effects (e.g., institutions). Since the difference-in-Hansen tests for my one-step diff GMM support the exogeneity assumptions, would you advise proceeding with the one-step diff GMM with robust standard errors?

Thank you very much for your time!

Last edited by Paul Allard; 01 Nov 2024, 15:51.