Questions about System GMM in Stata 18 for Panel Data Analysis

Shogo Yano


22 Dec 2024, 10:49

I am using translation to write this question, so I apologize if anything is unclear.

I want to conduct an analysis using System GMM (Blundell and Bond, 1998), but due to the complexity of specifying commands, I have several questions about its implementation in Stata 18. I would appreciate guidance from someone experienced with this method.

Data:
Panel data covering 12 years per firm (2008–2019)

Data from 2004–2007 is also available for lagged variables (more on lags below).

Variables:
Dependent variable: Continuous variable At

Independent variables:
Lagged dependent variable: At-1

Quadratic term of At-1

Lagged variable: Bt-1

Quadratic term of Bt-1

Lagged variable: Ct-1

Lagged variable: Dt-1

Dummy variable: E (0 for years before a specific point, 1 after)

Lagged variable: Ft-1

Interaction term: E × Ft-1

Methodology:
System GMM (Blundell and Bond, 1998)

Treat all variables as endogenous

Use lags t-2, t-3, and t-4 as instruments

The coefficients of interest are those on E, Ft-1, and the interaction term E × Ft-1.

Command Draft:
Below is the draft command I have constructed with initial assistance from resources like ChatGPT:

xtabond2 At L.At-1 L.At-1^2 L.Bt-1 L.Bt-1^2 L.Ct-1 L.Dt-1 i.E L.Ft-1 c.E#c.Ft-1 if analysis_period == 1, gmm(At L.At-1 L.At-1^2 L.Bt-1 L.Bt-1^2 L.Ct-1 L.Dt-1 L.Ft-1, lag(2 4) collapse) iv(i.E i.year, eq(level)) twostep robust

I have included data from 2004–2007 specifically for lagged variables. However, the primary analysis focuses only on the period from 2008 to 2019. Therefore, I specified the analysis period using the following command:

gen analysis_period = year >= 2008 & year <= 2019
I have four questions about this command:

1. Selection of Lagged Variables:
If I want to treat all variables as endogenous and enter them as lagged terms, is it correct to prefix the endogenous variables with L.?

My dataset includes variables like At, At-1, Bt-1, Ct-1 all within the same row.

Should I use:
xtabond2 At L.At (let Stata automatically handle lags)

or

xtabond2 At At-1 (use pre-existing lagged variables from the dataset)?

If I mistakenly write L.At-1 in the regression, does it mean Stata uses the 2-period lag instead of a 1-period lag? Similarly, does L.Bt-1 refer to a 2-period lag?

If my understanding is correct, my current dataset already includes pre-processed lagged variables such as At, At-1, Bt-1, Ct-1 in the same row. In this case, would it be better to restructure the dataset so that only current-period values (At, Bt, Ct, etc.) are included in the same row, and let Stata handle the lags automatically using L.At, L.Bt, etc.?
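
One way to check this would be a sketch like the following. It assumes hypothetical variable names firm_id, A, and A_lag1 (the dataset's actual names are not shown in this post) and that A_lag1 was meant to hold the one-period lag of A:

```stata
* Hypothetical variable names, for illustration only: firm_id, year, A, A_lag1
* Declare the panel structure so that the L. operator works within each firm
xtset firm_id year

* Count rows where the pre-built lag differs from Stata's own one-period lag
count if A_lag1 != L.A & !missing(A_lag1, L.A)

* Count rows where lagging the pre-built lag differs from the two-period lag of A
count if L.A_lag1 != L2.A & !missing(L.A_lag1, L2.A)
```

If both counts come back as zero, then writing L.A_lag1 in the regression would amount to using the two-period lag of A rather than the one-period lag.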

2. Specification of gmm() and iv():
In the gmm() option, should I prefix all endogenous variables with L.?

In the iv() option, should exogenous variables be specified directly without L.?

What is the difference between including eq(level) in iv() versus omitting it?

3. Final Specification (robust vs twostep):
What is the difference between:
robust twostep small

and

twostep robust?

How does this affect the estimation results?

4. Inclusion of Time Dummies (i.year):
When using dynamic panel data models, is it correct to include i.year as an explanatory variable in the regression?

I would greatly appreciate detailed explanations for these points. Thank you in advance for your assistance!

Last edited by Shogo Yano; 22 Dec 2024, 11:02.