  • Questions about System GMM in Stata 18 for Panel Data Analysis

    I am using translation to write this question, so I apologize if anything is unclear.

    I want to conduct an analysis using System GMM (Blundell and Bond, 1998), but due to the complexity of specifying commands, I have several questions about its implementation in Stata 18. I would appreciate guidance from someone experienced with this method.

    Data:
    • Panel data covering 12 years per firm (2008–2019)
    • Data from 2004–2007 is also available for lagged variables (more on lags below).
    Variables:
    • Dependent variable: Continuous variable At
    • Independent variables:
      • Lagged dependent variable: At-1
      • Quadratic term of At-1
      • Lagged variable: Bt-1
      • Quadratic term of Bt-1
      • Lagged variable: Ct-1
      • Lagged variable: Dt-1
      • Dummy variable: E (0 for years before a specific point, 1 after)
      • Lagged variable: Ft-1
      • Interaction term: E × Ft-1
    Methodology:
    • System GMM (Blundell and Bond, 1998)
    • Treat all variables as endogenous
    • Use lags t-2, t-3, and t-4 as instruments
    The coefficients of interest are those on E, Ft-1, and the interaction term E × Ft-1.

    Command Draft:
    Below is the draft command I have constructed with initial assistance from resources like ChatGPT:

    xtabond2 At L.At-1 L.At-1^2 L.Bt-1 L.Bt-1^2 L.Ct-1 L.Dt-1 i.E L.Ft-1 c.E#c.Ft-1 if analysis_period == 1, gmm(At L.At-1 L.At-1^2 L.Bt-1 L.Bt-1^2 L.Ct-1 L.Dt-1 L.Ft-1, lag(2 4) collapse) iv(i.E i.year, eq(level)) twostep robust

    I have included data from 2004–2007 specifically for lagged variables. However, the primary analysis focuses only on the period from 2008 to 2019. Therefore, I specified the analysis period using the following command:

    gen analysis_period = year >= 2008 & year <= 2019
    I have four questions about this command:


    1. Selection of Lagged Variables:
    • If I want to use all variables as endogenous with their lagged terms, is it correct to prefix endogenous variables with L.?
    • My dataset includes variables like At, At-1, Bt-1, Ct-1 all within the same row.
    • Should I use:
      1. xtabond2 At L.At (let Stata automatically handle lags)
      2. xtabond2 At At-1 (use pre-existing lagged variables from the dataset)
    • If I mistakenly write L.At-1 in the regression, does it mean Stata uses the 2-period lag instead of a 1-period lag? Similarly, does L.Bt-1 refer to a 2-period lag?
    • If my understanding is correct, the pre-computed lags (At-1, Bt-1, Ct-1) that already sit in the same row as At are the source of the problem. In that case, would it be better to restructure the dataset so that each row holds only current-period values (At, Bt, Ct, etc.) and let Stata construct the lags automatically using L.At, L.Bt, etc.?
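
To make the lag question concrete, here is a minimal sketch on made-up data. It assumes a toy panel of 2 firms over 5 years; the names firm, year, A, A_lag1, LA_lag1, and A_lag2 are hypothetical and do not come from the actual dataset (Stata variable names cannot contain a hyphen, so A_lag1 stands in for a stored "At-1" column). It only checks what the L. operator does to a column that is already lagged; it says nothing about the right GMM specification.

```stata
* Toy panel: 2 firms x 5 years (hypothetical data, for illustration only)
clear
set obs 10
gen firm = ceil(_n/5)
gen year = 2003 + _n - (firm - 1)*5   // 2004..2008 within each firm
gen A = _n                            // arbitrary values

xtset firm year

gen A_lag1  = L.A        // stands in for a pre-built "At-1" column
gen LA_lag1 = L.A_lag1   // L. applied to the already-lagged column
gen A_lag2  = L2.A       // explicit two-period lag of A

* If L. on a stored lag gives a two-period lag, these two columns match
assert LA_lag1 == A_lag2

list firm year A A_lag1 LA_lag1 A_lag2, sepby(firm)
```

If the assert passes, L. applied to a stored one-period lag behaves like L2. on the current-period variable, which is the situation described in the question above.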


    2. Specification of gmm() and iv():
    • In the gmm() option, should I prefix all endogenous variables with L.?
    • In the iv() option, should exogenous variables be specified directly without L.?
    • What is the difference between including eq(level) in iv() versus omitting it?


    3. Final Specification (robust vs twostep):
    • What is the difference between:
      1. robust twostep small
      2. twostep robust
    • How does this affect the estimation results?


    4. Inclusion of Time Dummies (i.year):
    • When using dynamic panel data models, is it correct to include i.year as an explanatory variable in the regression?
    I would greatly appreciate detailed explanations for these points. Thank you in advance for your assistance!
    Last edited by Shogo Yano; 22 Dec 2024, 11:02.