  • xtdpdgmm for Two-Step Diff-GMM and SYS-GMM

    Hi everyone,

    I'm trying to estimate the effect of public spending of a certain kind on the real GDP per capita growth rate. I'm trying to use the xtdpdgmm command to perform the Two-Step Diff and Two-Step Sys GMM, but I'm not entirely sure I understand the syntax fully.

    Let me first explain how the dataset is arranged; otherwise the syntax below won't be clear. I have a balanced panel of 35 countries (unfortunately, N is not particularly large) over 25 years, and I use five-year non-overlapping averages, which gives 5 observations per country for the periods 1990-1994, 1995-1999, ..., 2010-2014.

    The equation to estimate is:

    y_{it} - y_{i,t-x} = (\beta_1 - 1) y_{i,t-x} + \beta_2 h_{i,t-x} + \beta_3 x_{it} + \alpha_i + \delta_t + u_{it}

    where t = 1994, 1999, 2004, 2009, 2014 and x = 5, except for the first period, where x = 4. Further, h_{it} is assumed to be predetermined, and x_{it} is the mean of x_i from time t-x+1 to time t. For instance, at t = 1999, x_{i,1999} denotes the average of x_i from 1995 to 1999.
    For each country i, my dataset in Stata has 5 rows (all complete). The first row for country i has the following columns:
    i) the dependent variable, y_{i,1994} - y_{i,1990}, named gdp_growth
    ii) the AR part, y_{i,1990}, named gdp_lag
    iii) the predetermined variable, h_{i,1990}, named school_lag
    iv) the control, x_{i,1994}, named fiscal

    Finally, I create year dummies (years*)
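The data arrangement described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the annual variables `country`, `year`, `lgdp`, `school`, and `fiscal_raw` are hypothetical names that do not appear in the thread, and the original poster may have built the five-year panel differently.

```stata
* Hypothetical annual panel, 1990-2014:
*   country (numeric id), year, lgdp (output measure y), school (h), fiscal_raw (x)

* Period index: 1 = 1990-1994, ..., 5 = 2010-2014
gen period = floor((year - 1990) / 5) + 1

* Values in the first year of each period (needed for period 1, where x = 4)
bysort country period (year): gen y_first = lgdp[1]
bysort country period (year): gen h_first = school[1]

* Five-year average of the fiscal variable within each period
bysort country period: egen fiscal = mean(fiscal_raw)

* Keep one row per country and period: the end-of-period year (1994, ..., 2014)
bysort country period (year): keep if _n == _N
xtset country period

* Initial values: 1990 for the first period, previous end-of-period year otherwise
gen gdp_lag    = cond(period == 1, y_first, L.lgdp)
gen school_lag = cond(period == 1, h_first, L.school)
gen gdp_growth = lgdp - gdp_lag

* Period dummies years1, ..., years5
tabulate period, generate(years)
```

With this layout every country has five complete rows, so no observation is lost to the lagged regressors before the estimator takes first differences.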

    For the Two-Step Diff-GMM (and collapsing the instruments),

    xtdpdgmm gdp_growth gdp_lag school_lag fiscal years*, model(diff) collapse gmm(fiscal, lag(2 3)) gmm(gdp_lag, lag(1 2)) gmm(school_lag years*, lag(0 0)) nocons two vce(r) nolog

    where:
    i) the first difference of fiscal (Δx_{it}) is instrumented by lags 2 and 3 of the level of x_{it};
    ii) gdp_lag (y_{i,t-1}) is endogenous, and in the first-differenced equation it is instrumented by its lag 1 and lag 2 levels (which correspond to lags 2 and 3 of the level of y_{it});
    iii) school_lag is predetermined and, since it enters the equation at lag 1, it is effectively exogenous; in the first-differenced equation, the first difference of school_lag should therefore instrument itself;
    iv) the time dummies, I assume, should be added in the same way.

    In this case, only one year dummy is dropped (I was expecting three year dummies to be dropped).

    For the Two-Step SYS-GMM (and collapsing instruments),

    xtdpdgmm gdp_growth gdp_lag school_lag fiscal years*, model(diff) collapse gmm(fiscal, lag(2 3)) gmm(gdp_lag, lag(1 2)) gmm(school_lag years*, lag(0 0)) gmm(fiscal, lag(1 1) diff model(level)) gmm(gdp_lag school_lag years*, lag(0 0) diff model(level)) nocons two vce(r)

    where:
    i) the lagged first difference of fiscal (Δx_{i,t-1}) is used as an instrument for x_{it} in the level equation;
    ii) the first differences of gdp_lag (Δy_{i,t-1}) and school_lag (Δh_{i,t-1}) are used as instruments for y_{i,t-1} and h_{i,t-1}, respectively;
    iii) the time dummies, I assume, should be added in the same way.

    However, for the SYS-GMM estimator, none of the year dummies are dropped.

    I suspect there is something wrong in my coding, and maybe the way I've arranged the dataset is problematic.

    Thanks so much for your help in advance!

  • #2
    As far as I can tell, the command specifications are in line with your explanations. Why would you expect 3 time dummies to be dropped by the diff-GMM estimator? One time dummy needs to be dropped because one observation is effectively lost due to first differencing; this is not the case for system GMM.
    https://twitter.com/Kripfganz
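One way to see the point about the dropped dummy is to write the model in levels and in first differences. This is a sketch that assumes the five periods are indexed s = 1, ..., 5 (an index introduced here for illustration, not used in the thread) and ignores that the first period spans four rather than five years:

```latex
\begin{align*}
\text{level:}\quad
  & y_{is} - y_{i,s-1} = (\beta_1 - 1)\, y_{i,s-1} + \beta_2 h_{i,s-1}
    + \beta_3 x_{is} + \alpha_i + \delta_s + u_{is},
  & s &= 1,\dots,5 \\
\text{differenced:}\quad
  & \Delta (y_{is} - y_{i,s-1}) = (\beta_1 - 1)\, \Delta y_{i,s-1} + \beta_2 \Delta h_{i,s-1}
    + \beta_3 \Delta x_{is} + \Delta \delta_s + \Delta u_{is},
  & s &= 2,\dots,5
\end{align*}
```

The differenced equation only exists for four periods, so the five year dummies cannot all be identified from it and one is dropped. System GMM adds the level equation, which is available in all five periods, so with nocons all five dummies can be retained.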

    • #3
      Thank you very much, Sebastian, for your reply. If I understand the diff-GMM correctly, the first available instrument in the first-differenced equation for my endogenous variable, Δxit, is at time t=3, because at t=3, I use xi,t-2 as an instrument for Δxit. The same should apply to SYS-GMM for the first-differenced equation. As a result, time effectively starts at t=3 and we lose two observations. To avoid perfect collinearity among time dummies, the time dummy for t=3 (for instance) would be dropped. Perhaps the issue lies in how I’ve organized my dataset.
      Last edited by Paul Allard; 23 Oct 2024, 04:48.

      • #4
        The first available instrument for school_lag and the time dummies is t=1. The missing observations for other instruments are internally replaced by 0.
        https://twitter.com/Kripfganz

        • #5
          Are you saying that at time t=1, gdp_growth (y_{it} - y_{i,t-1}) is regressed only on school_lag (h_{i,t-1}) and the time dummies, and that at time t=2 the regression changes only because a new instrument becomes available, namely L.gdp_lag (the lag 2 level of y_{it}), while fiscal (x_{it}) is still set to 0?
          Do you think this is the correct approach, or should I re-arrange the dataset so that my estimations start from time t=3? Thanks!

          • #6
            No, the regressors are all nonmissing in all periods. Just some of the instruments are missing in some periods and are effectively not used. This is the GMM idea: You will have more instruments available the larger t becomes. I think what you are doing so far is the right approach.
            https://twitter.com/Kripfganz
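To illustrate how missing instruments are replaced by zeros, here is a sketch of the collapsed instrument block that gmm(fiscal, lag(2 3)) would contribute to the first-differenced equation for country i. It assumes periods indexed s = 1, ..., 5 (an index introduced here for illustration), with columns corresponding to L2.fiscal and L3.fiscal and rows to the differenced equations at s = 2, ..., 5:

```latex
Z_i^{\text{fiscal}} =
\begin{pmatrix}
0      & 0      \\
x_{i1} & 0      \\
x_{i2} & x_{i1} \\
x_{i3} & x_{i2}
\end{pmatrix}
\qquad
\begin{matrix}
s = 2 \\ s = 3 \\ s = 4 \\ s = 5
\end{matrix}
```

The regressors are observed in every row; only the instrument entries that would require data before s = 1 are set to zero, so those moment conditions simply draw on fewer periods.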

            • #7
              I see, thanks for your time!

              • #8
                Dear Sebastian, Apologies for bothering you again, but as I review the instruments I used in the Two-Step Difference and Two-Step SYS GMM, something seems a bit off to me. When I run the following code:


                xtdpdgmm gdp_growth gdp_lag school_lag fiscal years*, model(diff) collapse gmm(fiscal, lag(2 3)) gmm(gdp_lag, lag(1 2)) gmm(school_lag years*, lag(0 0)) nocons two vce(r)

                I get the following list of instruments:

                [
                Instruments corresponding to the linear moment conditions:
                1, model(diff):
                L2.fiscal L3.fiscal
                2, model(diff):
                L1.gdp_lag L2.gdp_lag
                3, model(diff):
                school_lag years2 years3 years4 years5
                ]

                However, for the time dummies and school_lag, I was expecting to use their first-differenced values as instruments. I believe the solution to this issue is to specify:

                xtdpdgmm gdp_growth gdp_lag school_lag fiscal years*, model(diff) collapse gmm(fiscal, lag(2 3)) gmm(gdp_lag, lag(1 2)) gmm(d.school_lag d.years*, lag(0 0)) nocons two vce(r)

                Similarly, for the System GMM, when I run the following code:

                xtdpdgmm gdp_growth gdp_lag school_lag fiscal years*, model(diff) collapse gmm(fiscal, lag(2 3)) gmm(gdp_lag, lag(1 2)) gmm(school_lag years*, lag(0 0)) gmm(fiscal, lag(1 1) diff model(level)) gmm(gdp_lag school_lag years*, lag(0 0) diff model(level)) nocons two vce(r)

                I get this:
                [
                1, model(diff):
                L2.fiscal L3.fiscal
                2, model(diff):
                L1.gdp_lag L2.gdp_lag
                3, model(diff):
                school_lag years2 years3 years4
                4, model(level):
                L1.D.fiscal
                5, model(level):
                D.gdp_lag D.school_lag D.years4 D.years5
                ]


                Again, I assume the way to fix this is to type:

                xtdpdgmm gdp_growth gdp_lag school_lag fiscal years*, model(diff) collapse gmm(fiscal, lag(2 3)) gmm(gdp_lag, lag(1 2)) gmm(d.school_lag d.years*, lag(0 0)) gmm(fiscal, lag(1 1) diff model(level)) gmm(gdp_lag school_lag years*, lag(0 0) diff model(level)) nocons two vce(r)

                Finally, I’ve rejected the null hypothesis of no second-order autocorrelation, which suggests that the instruments in levels may not be exogenous. Would you recommend increasing the autoregressive order of my model by adding y_{i,t-2}? If so, how should I appropriately instrument y_{i,t-2}?

                Thank you very much!

                • #9
                  You can either specify the differenced instruments with the d. operator, as you have done in your amended command line, or you can specify the diff suboption, as you have done for the additional system GMM instruments. Both ways are equivalent.

                  Increasing the autoregressive order by adding a second lag could be a reasonable approach for dealing with serially correlated errors, although it means that you are losing one time period. You can also try to add lags of the regressors instead. These additional lags are instrumented with the same instruments you already have. You might simply want to increase the lag order for the instruments; e.g., lag(2 4) instead of lag(2 3).

                  Alternatively, instead of adding further lags as regressors, you could also start instrumenting with deeper lags; e.g., lag(3 4) instead of lag(2 3), assuming that there is no higher-order serial correlation in the errors.
                  https://twitter.com/Kripfganz
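The options discussed in this reply can be sketched as follows, reusing the commands from the thread. The deeper-lag variant in (c) is only one possible reading of the suggestion (shifting the gdp_lag instruments by one lag as well is an assumption), and the `///` line continuations assume the commands are run from a do-file.

```stata
* (a) Differenced instruments via the d. operator, as in the amended command
xtdpdgmm gdp_growth gdp_lag school_lag fiscal years*, model(diff) collapse ///
    gmm(fiscal, lag(2 3)) gmm(gdp_lag, lag(1 2)) ///
    gmm(d.school_lag d.years*, lag(0 0)) nocons two vce(r)

* (b) The same instruments requested through the diff suboption instead
xtdpdgmm gdp_growth gdp_lag school_lag fiscal years*, model(diff) collapse ///
    gmm(fiscal, lag(2 3)) gmm(gdp_lag, lag(1 2)) ///
    gmm(school_lag years*, lag(0 0) diff) nocons two vce(r)

* (c) Starting with deeper lags rather than adding lagged regressors
*     (assumption: both endogenous instrument sets are shifted by one lag)
xtdpdgmm gdp_growth gdp_lag school_lag fiscal years*, model(diff) collapse ///
    gmm(fiscal, lag(3 4)) gmm(gdp_lag, lag(2 3)) ///
    gmm(school_lag years*, lag(0 0) diff) nocons two vce(r)

* Arellano-Bond serial correlation tests after the last estimation
estat serial
```

With only five periods, each additional lag removes usable moment conditions from the early periods, so variant (c) trades robustness to serial correlation against instrument strength.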

                  • #10
                    Dear Sebastian,

                    I apologize for reopening this discussion here, but there are some inconsistencies between xtdpdgmm and xtabond2, as well as questions about the model selection process that I'd appreciate your insights on. I'll outline all my questions in this post to avoid further follow-ups:


                    Q1: Model Selection Process
                    I’m working with a panel dataset of 42 countries (I managed to expand the dataset) observed over 25 years (1990-2014), resulting in T=5 (five-year non-overlapping averages) and N=42. For my dependent variable, I also have data from 1985 to 1989, allowing me to structure the dataset so that each country has 5 complete rows, including the lagged dependent variable.

                    In developing the model, I followed your presentation from the London Stata conference and Kiviet (2019, Econometrics and Statistics) for model selection criteria. In the baseline model (where I vary only the types of fiscal variables, without changing the total number of regressors), the one-step and two-step diff GMM estimators (always using corrected standard errors for both estimators) produce nearly identical point estimates.

                    However, the two-step diff GMM estimator consistently yields a higher p-value for the first-order autocorrelation test (first-order autocorrelation exists by construction), around 0.005 to 0.045, compared with the one-step diff GMM (where it is always below 0.01). This seems tolerable, given the 0.05 threshold.

                    When I expand the baseline model by adding, say, population growth to the fiscal variables (thus increasing both regressors and instruments), the point estimates of the two estimators diverge significantly. The one-step diff GMM maintains statistical significance and point estimates similar to those of the baseline model, suggesting it might be the more reliable choice. My understanding is that in small samples (N=42), estimation of the weighting matrix in two-step GMM can be problematic, and one-step diff GMM may be preferred. Could you confirm whether this interpretation is correct and, if possible, point me to a reference on this?

                    A coding-related question: how are the standard errors calculated in the following commands?

                    xtdpdgmm gdp_growth gdp_lag school_lag fiscal i.year, model(diff) gmm(fiscal, lag(2 2)) gmm(school_lag, lag(1 1)) gmm(gdp_lag, lag(1 .)) gmm(i.year, lag(0 0) model(diff)) nocons vce(r)

                    xtabond2 gdp_growth gdp_lag school_lag fiscal i.year, gmm(fiscal, lag(2 2)) gmm(school_lag, lag(1 1)) gmm(gdp_lag, lag(1 .)) iv(i.year) noleveleq robust small

                    Although the point estimates match, the robust standard errors differ. Which is correct?

                    Q2: Curtailing and Collapsing Instruments

                    The point estimates are highly sensitive to how I curtail and/or collapse the instruments. I keep the number of instruments (L) below the number of countries (N), i.e., L < N, and I also made sure the instrument count conforms to the rule of thumb that you and Kiviet discussed in your presentation. The only lagged regressor is the dependent variable (lagged once). The autocorrelation issues appear to be resolved by using second-order and higher lags of the dependent variable (in levels) as instruments. I avoided adding further lags of the dependent variable to the regression specification because of the limited sample size (T=5). In any case, my model specification is supported by the Andrews and Lu (2001) test. Do you see any concerns with the asymmetric curtailing of instruments, as shown in the example code?

                    Q3: Two Step SYS-GMM
                    With a relatively small sample size (N=42), and assuming my model selection and instrument management approach are sound, adding the additional moment conditions from SYS-GMM leads to instrument proliferation (53 instruments), and using the collapse option significantly alters my point estimates. Thus, I'm not sure I can consistently test the additional moment conditions for the level equation. Furthermore, the mild stationarity assumption needed for growth regressions is questionable; macroeconomic theory suggests that the initial distance from the steady state is likely correlated with country fixed effects (e.g., institutions). Since the difference-in-Hansen tests for my one-step diff GMM support the exogeneity assumptions, would you advise proceeding with the one-step diff GMM with robust standard errors?

                    Thank you very much for your time!
                    Last edited by Paul Allard; 01 Nov 2024, 16:51.
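For reference, the specification checks mentioned in Q2 and Q3 could be run along the following lines. This is a sketch rather than the poster's actual workflow: the first command is the diff-GMM command from Q1 with `twostep` and `overid` added, the collapsed alternative and the stored-estimate name `curtailed` are hypothetical, and the `///` line continuations assume a do-file.

```stata
* Candidate A: curtailed instrument set from Q1, two-step, with the overid
* option so that incremental overidentification tests can be reported
xtdpdgmm gdp_growth gdp_lag school_lag fiscal i.year, model(diff) ///
    gmm(fiscal, lag(2 2)) gmm(school_lag, lag(1 1)) gmm(gdp_lag, lag(1 .)) ///
    gmm(i.year, lag(0 0) model(diff)) nocons twostep vce(robust) overid
estat overid                // overall overidentification (Hansen) test
estat overid, difference    // incremental (difference-in-Hansen) tests
estimates store curtailed

* Candidate B (hypothetical): the same instruments, collapsed
xtdpdgmm gdp_growth gdp_lag school_lag fiscal i.year, model(diff) collapse ///
    gmm(fiscal, lag(2 2)) gmm(school_lag, lag(1 1)) gmm(gdp_lag, lag(1 .)) ///
    gmm(i.year, lag(0 0) model(diff)) nocons twostep vce(robust) overid
estat overid

* Andrews-Lu model and moment selection criteria: current model vs. candidate A
estat mmsc curtailed
```

Comparing the two candidates side by side makes the sensitivity to collapsing explicit, which is the concern raised in Q2.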
