I am using a translation tool to write this question, so I apologize if anything is unclear.
I want to conduct an analysis using System GMM (Blundell and Bond, 1998), but because specifying the command is complex, I have several questions about implementing it in Stata 18. I would appreciate guidance from anyone experienced with this method.
Data:
- Panel data covering 12 years per firm (2008–2019)
- Data from 2004–2007 is also available for lagged variables (more on lags below).
- Dependent variable: continuous variable At
- Independent variables:
- Lagged dependent variable: At-1
- Quadratic term of At-1
- Lagged variable: Bt-1
- Quadratic term of Bt-1
- Lagged variable: Ct-1
- Lagged variable: Dt-1
- Dummy variable: E (0 for years before a specific point, 1 after)
- Lagged variable: Ft-1
- Interaction term: E × Ft-1
Estimation approach:
- System GMM (Blundell and Bond, 1998)
- Treat all variables as endogenous
- Use lags t-2, t-3, and t-4 as instruments
Command Draft:
Below is the draft command I put together with initial help from tools such as ChatGPT:
xtabond2 At L.At-1 L.At-1^2 L.Bt-1 L.Bt-1^2 L.Ct-1 L.Dt-1 i.E L.Ft-1 c.E#c.Ft-1 if analysis_period == 1, gmm(At L.At-1 L.At-1^2 L.Bt-1 L.Bt-1^2 L.Ct-1 L.Dt-1 L.Ft-1, lag(2 4) collapse) iv(i.E i.year, eq(level)) twostep robust
I included data from 2004–2007 only so that the lagged variables can be constructed; the main analysis covers 2008–2019. I therefore restrict the estimation sample with the following indicator:
gen analysis_period = year >= 2008 & year <= 2019
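A minimal sketch of this indicator on a toy panel, assuming the data are in long format and declared with xtset. The names firm_id and A are hypothetical placeholders. The sketch also checks, for generate only, that restricting rows with an if-qualifier does not stop L. from reading the 2007 values; whether xtabond2 builds its GMM instruments from the excluded 2004–2007 rows in the same way is worth confirming in its help file.

```stata
* Toy panel: 2 hypothetical firms observed 2004-2019
clear
set obs 2
generate firm_id = _n
expand 16
bysort firm_id: generate year = 2003 + _n
xtset firm_id year
generate double A = 100*firm_id + year   // deterministic values, easy to check

* The indicator from the post, and an equivalent form using inrange()
generate byte analysis_period  = year >= 2008 & year <= 2019
generate byte analysis_period2 = inrange(year, 2008, 2019)

* Only 2008-2019 rows receive a value, but for 2008 L.A is still read
* from the 2007 row, which lies outside the analysis period
generate double LA = L.A if analysis_period == 1

list firm_id year A LA if inrange(year, 2007, 2009), sepby(firm_id)
```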
I have four questions about this command:
1. Selection of Lagged Variables:
- If I want to treat all variables as endogenous and enter them as lags, is it correct to prefix them with L.?
- My dataset includes variables like At, At-1, Bt-1, Ct-1 all within the same row.
- Should I use:
  - xtabond2 At L.At (let Stata create the lags automatically), or
  - xtabond2 At At-1 (use the pre-built lagged variables from the dataset)?
- If I write L.At-1 by mistake, does Stata use the two-period lag instead of the one-period lag? Similarly, does L.Bt-1 refer to a two-period lag?
- If so, given that my dataset already contains pre-computed lagged variables such as At-1, Bt-1, and Ct-1 in the same row as At, would it be better to restructure it so that each row holds only current-period values (At, Bt, Ct, etc.) and let Stata create the lags with L.At, L.Bt, etc.?
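To see what the L. operator does to a pre-built lag column, here is a sketch on a toy panel. The names firm_id, A, A_1, and LA_sq are hypothetical; A_1 stands in for the "At-1" column, because Stata variable names cannot contain a hyphen. It only shows the mechanics: L.A and A_1 coincide, while L.A_1 is the two-period lag L2.A. One practical difference is that L. works from the xtset time variable, so it returns missing when the previous year is absent, whereas a pre-built column behaves however it was originally constructed.

```stata
* Toy panel: 2 hypothetical firms observed 2004-2019
clear
set obs 2
generate firm_id = _n
expand 16
bysort firm_id: generate year = 2003 + _n
xtset firm_id year
generate double A = 100*firm_id + year   // deterministic values, easy to check

* A_1 plays the role of a pre-built "At-1" column
generate double A_1 = L.A

* Same one-period lag either way
assert A_1 == L.A
* L. applied to a pre-built lag shifts it once more: L.A_1 is A(t-2), i.e. L2.A
assert L.A_1 == L2.A

* Squared one-period lag, built from the current-period variable
generate double LA_sq = (L.A)^2

list firm_id year A A_1 LA_sq if firm_id == 1 & year <= 2007
```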
2. Specification of gmm() and iv():
- In the gmm() option, should I prefix all endogenous variables with L.?
- In the iv() option, should exogenous variables be specified directly without L.?
- What is the difference between including eq(level) in iv() versus omitting it?
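A syntax sketch of how the draft could look once the dataset holds only current-period variables. It assumes the user-written xtabond2 package from SSC, random toy data, and hypothetical names (firm_id, A, B, C, D, F, E, LA_sq, LB_sq, E_LF, and a switch year of 2012). It mirrors the variable lists of the draft and is not a recommendation of which variables to instrument or which lags to use. Run it as a do-file, since /// continuation lines only work there.

```stata
* Requires the user-written command (one-time): ssc install xtabond2
* Toy panel with hypothetical names; all values are random
clear
set seed 2024
set obs 100
generate firm_id = _n
expand 16
bysort firm_id: generate year = 2003 + _n
xtset firm_id year
foreach v in A B C D F {
    generate double `v' = rnormal()
}
generate byte   E     = year >= 2012          // hypothetical switch year
generate double LA_sq = (L.A)^2
generate double LB_sq = (L.B)^2
generate double E_LF  = E * L.F
generate byte   analysis_period = inrange(year, 2008, 2019)

* gmm(): lag(2 4) applies lags 2-4 to each variable exactly as listed, so in
* the differenced equation the instruments are, for example,
*   A   -> A(t-2), A(t-3), A(t-4)
*   L.B -> B(t-3), B(t-4), B(t-5)
* iv(E, eq(level)): E is a standard instrument in the levels equation only;
*   leaving out eq() uses it in both the differenced and levels equations
* E_LF is a regressor but appears in neither list, as in the draft
xtabond2 A L.A LA_sq L.B LB_sq L.C L.D E L.F E_LF if analysis_period == 1, ///
    gmm(A LA_sq L.B LB_sq L.C L.D L.F, lag(2 4) collapse)                  ///
    iv(E, eq(level)) twostep robust
```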
3. Final Specification (twostep robust vs. twostep robust small):
- What is the difference between:
  - robust twostep small
  - twostep robust
- How does this affect the estimation results?
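Stata options can be written in any order, so the only real difference between the two lines is small. According to the xtabond2 documentation, robust combined with twostep requests Windmeijer's finite-sample correction to the two-step standard errors, and small reports t and F statistics (with a small-sample covariance correction) instead of z and Wald chi2. A sketch for checking this on random toy data, assuming xtabond2 is installed; names are hypothetical and the model is deliberately tiny.

```stata
* Requires xtabond2 (ssc install xtabond2); toy panel, hypothetical names
clear
set seed 2024
set obs 100
generate firm_id = _n
expand 16
bysort firm_id: generate year = 2003 + _n
xtset firm_id year
generate double A = rnormal()
generate double B = rnormal()

* (1) two-step GMM with robust (Windmeijer-corrected) SEs; z and chi2 tests
xtabond2 A L.A L.B, gmm(A L.B, lag(2 4) collapse) twostep robust
matrix b_tr = e(b)
estimates store tr

* (2) identical except for small: t and F tests
xtabond2 A L.A L.B, gmm(A L.B, lag(2 4) collapse) twostep robust small
matrix b_trs = e(b)
estimates store trs

* Coefficients should match; standard errors and test statistics may differ
estimates table tr trs, b se
display "max relative difference in coefficients: " mreldif(b_tr, b_trs)
```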
4. Inclusion of Time Dummies (i.year):
- When using dynamic panel data models, is it correct to include i.year as an explanatory variable in the regression?
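Note that the draft command lists i.year only inside iv(), not among the regressors. One common pattern with xtabond2 is to enter year dummies both as regressors and as iv-style instruments; the sketch below shows that syntax on random toy data. It assumes xtabond2 is installed, uses hypothetical names (firm_id, A, B, yr2009 to yr2019), and builds the dummies by hand with 2008 as the base year rather than relying on factor-variable support. Whether to restrict them to eq(level), as the draft does, is a separate modelling choice.

```stata
* Requires xtabond2 (ssc install xtabond2); toy panel, hypothetical names
clear
set seed 2024
set obs 100
generate firm_id = _n
expand 16
bysort firm_id: generate year = 2003 + _n
xtset firm_id year
generate double A = rnormal()
generate double B = rnormal()
generate byte analysis_period = inrange(year, 2008, 2019)

* Year dummies for 2009-2019; 2008 is the omitted base year
forvalues y = 2009/2019 {
    generate byte yr`y' = (year == `y')
}

* Dummies enter as regressors and as iv-style instruments (both equations)
xtabond2 A L.A L.B yr* if analysis_period == 1, ///
    gmm(A L.B, lag(2 4) collapse) iv(yr*) twostep robust
```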