  • How to conduct GMM-style regression analysis

    Hi.

    I know this question has been asked numerous times. I have read Roodman and Blundell and searched the Stata forum for answers, but I am still confused about the model, perhaps because I don't have an econometrics background. I am writing down my command and my questions here in the hope that someone is kind enough to explain.



    So I am using two-step system GMM for my model.


    xtabond2 Y L.Y X controls i.Year, gmm(L.Y) iv(controls i.Year, equation(level)) nodiffsargan twostep robust orthogonal small

    Here are the questions:

    1) What sort of variable can we put in gmm()? Endogenous to what? In this case, I am using the first lag of my dependent variable. Does that make sense?
    2) Can we put the control variables and i.Year in iv()? Again, what are these variables endogenous to?

    Also, is this a correct way to use xtabond2? I realise this may look like an assignment-type question, which is not what this forum is for, but I am really trying to understand the mechanics of the model in simple terms for my research. I know that theory has to play a big role in identifying endogenous variables; I just want to understand the basics of the model.

    Any comments and feedback would be appreciated.










  • #2
    1) There can be many reasons why a variable is endogenous, e.g. simultaneity (i.e. feedback from the dependent variable to the regressors). In panel data, a common issue is the potential correlation of the regressors with the unobserved group-specific effects. For the lagged dependent variable, this correlation exists by construction of the model, which is why it is usually instrumented with GMM-style instruments.

    2) Any variables you put in your iv() option for the level equation are assumed to be uncorrelated with the unobserved group-specific effects (and exogenous with regard to any other unobserved variables). This is comparable to a "random effects" assumption for these variables.

    If your control variables are strictly exogenous and uncorrelated with the unobserved group-specific effects, then your specification could be correct.
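    To make the two options concrete, here is a minimal sketch rather than a recommended specification. It uses the Arellano-Bond example data that can be loaded with webuse abdata, not the original poster's data, so n, w, k and the yr* year dummies are only stand-ins for Y, the controls and i.Year. Treating w and k as strictly exogenous and uncorrelated with the firm effects is purely an illustrative assumption. xtabond2 is community-contributed and has to be installed from SSC first.

```stata
* One-time install of the community-contributed command (uncomment if needed):
* ssc install xtabond2

webuse abdata, clear      // Arellano-Bond employment panel (firm id, year)
xtset id year

* L.n is correlated with the firm effects by construction -> gmmstyle().
* w, k and the year dummies are ASSUMED strictly exogenous and uncorrelated
* with the firm effects -> ivstyle() for the level equation only.
xtabond2 n L.n w k yr*, gmmstyle(L.n) ivstyle(w k yr*, equation(level)) ///
    twostep robust orthogonal small nodiffsargan

display "Number of instruments: " e(j)
```

    If a control is suspected to be correlated with the group-specific effects, it no longer belongs in ivstyle(..., equation(level)) under this reading; where it goes instead depends on what you are willing to assume about it.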

    More on the GMM estimation of linear dynamic panel data models in Stata:
    https://www.kripfganz.de/stata/



    • #3
      Many thanks for your valuable feedback.



      • #4
        Hello! I'm currently running GMM estimations in Stata, but I keep getting both a very high number of instruments (100-200+) and an even higher Wald chi2 statistic (1.71e+08).

        Below is the code that I ran and its following output:

        xtdpdsys PFAGDP LNGDPPcap SDGDum LNGDPPcapxSDGDum DependencyRatio Inflation CMReturns PopGrowth LFParticRate, lags(1) twostep artests(2)
        System dynamic panel-data estimation Number of obs = 603
        Group variable: Country Number of groups = 35
        Time variable: Year
        Obs per group:
        min = 1
        avg = 17.22857
        max = 20

        Number of instruments = 218 Wald chi2(9) = 1.71e+08
        Prob > chi2 = 0.0000
        Two-step results
        ----------------------------------------------------------------------------------
        PFAGDP | Coef. Std. Err. z P>|z| [95% Conf. Interval]
        -----------------+----------------------------------------------------------------
        PFAGDP |
        L1. | .9695742 .003495 277.42 0.000 .9627241 .9764244
        |
        LNGDPPcap | 35.98271 1.9354 18.59 0.000 32.18939 39.77602
        SDGDum | 16.02427 5.916328 2.71 0.007 4.428483 27.62006
        LNGDPPcapxSDGDum | -3.615439 1.266003 -2.86 0.004 -6.096759 -1.134119
        DependencyRatio | 3.062755 .2688706 11.39 0.000 2.535778 3.589732
        Inflation | -.1901946 .0337445 -5.64 0.000 -.2563326 -.1240565
        CMReturns | 11.93403 .2071649 57.61 0.000 11.528 12.34007
        PopGrowth | -2.534041 .3077528 -8.23 0.000 -3.137226 -1.930857
        LFParticRate | -.3305291 .0430299 -7.68 0.000 -.4148661 -.246192
        _cons | -152.4423 8.000879 -19.05 0.000 -168.1238 -136.7609
        ----------------------------------------------------------------------------------
        Warning: gmm two-step standard errors are biased; robust standard
        errors are recommended.
        Instruments for differenced equation
        GMM-type: L(2/.).PFAGDP
        Standard: D.LNGDPPcap D.SDGDum D.LNGDPPcapxSDGDum D.DependencyRatio
        D.Inflation D.CMReturns D.PopGrowth D.LFParticRate
        Instruments for level equation
        GMM-type: LD.PFAGDP
        Standard: _cons
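        The count of 218 can be reproduced from the instrument list above with a little arithmetic. This is only a sketch under one assumption that the output does not state directly: the longest panels span T = 21 years, so that 20 observations remain once the first lag of PFAGDP is formed (the reported "max = 20"). The local macro names are made up for the illustration.

```stata
* Assumption: longest panels cover T = 21 years ("max = 20" after one lag).
local T = 21

* Differenced equation, GMM-type L(2/.).PFAGDP: the equation for period t
* (t = 3, ..., T) can use t - 2 lagged levels, i.e. 1 + 2 + ... + (T - 2).
local gmm_diff = (`T' - 2) * (`T' - 1) / 2

* Level equation, GMM-type LD.PFAGDP: one lagged difference per period.
local gmm_level = `T' - 2

* Standard instruments: 8 differenced regressors plus the constant.
local standard = 8 + 1

local total = `gmm_diff' + `gmm_level' + `standard'
display "GMM diff = `gmm_diff', GMM level = `gmm_level', standard = `standard', total = `total'"
```

        Under that assumption the pieces are 190 + 19 + 9 = 218, so almost all instruments come from the unrestricted lags of the dependent variable in the differenced equation, and that block grows quadratically with T.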

        When I tried running the following code, I was able to lower the number of instruments and the Wald chi2 statistic, but both are still relatively high:
        xtdpdsys PFAGDP LNGDPPcap SDGDum LNGDPPcapxSDGDum DependencyRatio Inflation CMReturns PopGrowth LFParticRate, lags(1) maxldep(1) maxlags(1) pre(LNGDPPcap, lagstruct(1,1)) artests(2)

        note: LNGDPPcap dropped because of collinearity

        System dynamic panel-data estimation Number of obs = 602
        Group variable: Country Number of groups = 35
        Time variable: Year
        Obs per group:
        min = 1
        avg = 17.2
        max = 20

        Number of instruments = 85 Wald chi2(10) = 5043.31
        Prob > chi2 = 0.0000
        One-step results
        ----------------------------------------------------------------------------------
        PFAGDP | Coef. Std. Err. z P>|z| [95% Conf. Interval]
        -----------------+----------------------------------------------------------------
        PFAGDP |
        L1. | 1.000401 .0301854 33.14 0.000 .9412387 1.059563
        |
        LNGDPPcap |
        --. | 19.03887 27.25736 0.70 0.485 -34.38458 72.46231
        L1. | -13.77005 26.78573 -0.51 0.607 -66.26912 38.72902
        |
        SDGDum | -8.781053 47.83211 -0.18 0.854 -102.5303 84.96817
        LNGDPPcapxSDGDum | 1.667439 10.34791 0.16 0.872 -18.61409 21.94897
        DependencyRatio | -1.056532 1.346547 -0.78 0.433 -3.695715 1.582651
        Inflation | -.1494649 .2983697 -0.50 0.616 -.7342587 .4353289
        CMReturns | 9.951293 2.3603 4.22 0.000 5.32519 14.5774
        PopGrowth | -1.691797 1.676526 -1.01 0.313 -4.977727 1.594133
        LFParticRate | .0269194 .3792645 0.07 0.943 -.7164253 .7702641
        _cons | -18.29174 33.14363 -0.55 0.581 -83.25207 46.66859
        ----------------------------------------------------------------------------------
        Instruments for differenced equation
        GMM-type: L(2/2).PFAGDP L(1/1).L.LNGDPPcap
        Standard: D.LNGDPPcap D.SDGDum D.LNGDPPcapxSDGDum D.DependencyRatio
        D.Inflation D.CMReturns D.PopGrowth D.LFParticRate
        Instruments for level equation
        GMM-type: LD.PFAGDP LD.LNGDPPcap
        Standard: _cons

        Any ideas on how to fix this, and on how to specify that only two variables (the lag of the dependent variable PFAGDP, and LNGDPPcap) are used as instruments?

        Thank you so much!
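        For reference, one common way to shrink the instrument count is to restrict the lag depth and collapse the GMM-style instruments. As far as I know xtdpdsys has no collapse option, so the sketch below swaps in xtabond2 (community-contributed, from SSC) and the abdata example data. The variables n, w, k and yr* are stand-ins, not the variables from this post, and the exogeneity of w and k is assumed only for the illustration.

```stata
* ssc install xtabond2    // one-time install, uncomment if needed
webuse abdata, clear
xtset id year

* Full instrument set for the lagged dependent variable
xtabond2 n L.n w k yr*, gmmstyle(L.n) ivstyle(w k yr*, equation(level)) ///
    twostep robust small nodiffsargan
scalar j_full = e(j)

* Same model, but one lag only and collapsed GMM-style instruments
xtabond2 n L.n w k yr*, gmmstyle(L.n, laglimits(1 1) collapse) ///
    ivstyle(w k yr*, equation(level)) twostep robust small nodiffsargan
scalar j_small = e(j)

display "instruments: full = " j_full ", restricted = " j_small
```

        Fewer instruments is not automatically better: with a very small instrument set the model can become weakly identified, so the restricted results should still be checked with the usual overidentification and serial-correlation tests.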

