  • Dear Professor Kripfganz:

    I would like to ask how to apply your xtdpdgmm command correctly. Thank you in advance for your kind help. I would like to estimate a system-GMM model like the following:

    Code:
    xtdpdgmm lpgrow smei3 labutilpcgr mfppwt rknapcgr hfcegrow l.lrgdpopc if id~=13,gmm(lpgrow  labutilpcgr mfppwt rknapcgr hfcegrow lrgdpopc, lag(2 2) collapse model(diff)) gmm(lpgrow labutilpcgr mfppwt  rknapcgr hfcegrow lrgdpopc, lag(1 1) diff collapse model(level)) iv(smei3,diff model(level)) two vce(cl id) small
    And I get the results like this:

    Code:
    . xtdpdgmm lpgrow smei3 labutilpcgr mfppwt rknapcgr hfcegrow l.lrgdpopc if id~=13,gmm(lpgrow  labutilpcgr mfppwt rknap
    > cgr hfcegrow lrgdpopc, lag(2 2) collapse model(diff)) gmm(lpgrow labutilpcgr mfppwt  rknapcgr hfcegrow lrgdpopc, lag
    > (1 1) diff collapse model(level)) iv(smei3,d model(level)) two vce(cl id) small
    
    Generalized method of moments estimation
    
    Fitting full model:
    Step 1         f(b) =    .162683
    Step 2         f(b) =  .16926059
    
    Group variable: id                           Number of obs         =       514
    Time variable: year                          Number of groups      =        20
    
    Moment conditions:     linear =      14      Obs per group:    min =        12
                        nonlinear =       0                        avg =      25.7
                            total =      14                        max =        28
    
                                        (Std. err. adjusted for 20 clusters in id)
    ------------------------------------------------------------------------------
                 |              WC-Robust
          lpgrow | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
           smei3 |   .0991957    .057027     1.74   0.098    -.0201631    .2185545
     labutilpcgr |  -.3546281   .1410527    -2.51   0.021    -.6498549   -.0594013
          mfppwt |   .6901644   .0872634     7.91   0.000     .5075199    .8728089
        rknapcgr |   .1629855   .0893485     1.82   0.084     -.024023    .3499939
        hfcegrow |   .0271975   .0837204     0.32   0.749    -.1480314    .2024264
                 |
        lrgdpopc |
             L1. |  -.7435579   .2764233    -2.69   0.015    -1.322118   -.1649973
                 |
           _cons |   8.620034   3.041985     2.83   0.011     2.253086    14.98698
    ------------------------------------------------------------------------------
    Instruments corresponding to the linear moment conditions:
     1, model(diff):
       L2.lpgrow L2.labutilpcgr L2.mfppwt L2.rknapcgr L2.hfcegrow L2.lrgdpopc
     2, model(level):
       L1.D.lpgrow L1.D.labutilpcgr L1.D.mfppwt L1.D.rknapcgr L1.D.hfcegrow
       L1.D.lrgdpopc
     3, model(level):
       D.smei3
     4, model(level):
       _cons
    1. Am I specifying the instrumentation correctly? I'm particularly not confident about specifying the lags in level and difference equations. I would like to make sure I'm doing the right thing.

    2. What if I replace the first independent variable "smei3" with "sme" which is a dummy variable? Do I need to change anything inside "iv()"? A difference of a dummy variable is obviously wrong.

    3. Is there anything that you notice I'm doing wrong? I would appreciate any suggestion.

    Thank you for your generous help.

    Best wishes,

    Taka

    If the results are hard to see, here is a picture that shows the results.

    Doc14.docx
    Last edited by Taka Sakamoto; 23 Jul 2023, 06:06.


    • 1. I cannot see any obvious problem with your specification. However, it is of some concern that you only have 20 groups. It is difficult to obtain reliable results with such a small cross-sectional sample size. If you nevertheless want to do a GMM estimation, you might want to stick to the one-step estimator, which does not require estimation of the optimal weighting matrix. The one-step estimator will be asymptotically inefficient, but with N=20 you are very far away from asymptopia anyway.

      2. If your dummy variable varies over time, then you can in principle leave everything as it is. However, there is some risk that lags and/or first differences of dummy variables can be weak instruments. If the dummy variable is time invariant, then you can obviously not first difference it. You might then have to adopt the assumption that this variable is uncorrelated with both the idiosyncratic and the group-specific error component, which might or might not be acceptable, depending on your research question. In this case, you could include the dummy variable without the difference option for the level model.

      3. The main concern is about the sample size; see point 1.
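
      A minimal sketch of points 1 and 2, not a recommendation for any particular data set. It assumes the community-contributed xtdpdgmm package is installed and uses Stata's abdata example data (variables n, w, k, ind, id, year) rather than the data in this thread. The dummy ind1 is a hypothetical stand-in for a time-invariant regressor, and treating w and k as endogenous is likewise only an assumption for illustration.

      ```stata
      * ssc install xtdpdgmm    (run once if the package is not yet installed)
      webuse abdata, clear
      xtset id year

      * Hypothetical dummy, assumed constant within firm (illustration only)
      generate byte ind1 = (ind == 1)

      * One-step system GMM with collapsed instruments. The dummy enters the
      * level model in levels (no diff suboption), which requires it to be
      * uncorrelated with the unobserved group-specific effects.
      xtdpdgmm L(0/1).n w k ind1, ///
          gmm(n w k, lag(2 2) collapse model(diff)) ///
          gmm(n w k, lag(1 1) diff collapse model(level)) ///
          iv(ind1, model(level)) onestep vce(cluster id) small
      ```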
      https://twitter.com/Kripfganz


      • Thank you so much for your response and explanation. May I ask two questions?

        1. Re: iv():
        What determines whether to use iv(), and whether to specify differences or levels in it? Also, is it correct that iv() does not always have to be specified?

        2. Re: one-step versus two-step: How different are they, and what should determine which one to use?

        Thank you for your generous and kind help.

        Taka


        • 3. How many cross-sections are desirable? Also, when researchers find that they have too few cross-sections, what other estimators do they apply?

          Thank you again.

          Taka


          • iv(x) is equivalent to gmm(x, lag(0 0) collapse). You then need to justify (based on assumptions you make) whether differenced or undifferenced variables are valid instruments (for the level model or the differenced model); this follows the arguments set out in the seminal papers by Arellano and Bond (1991) and Blundell and Bond (1998), among others.
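
            The equivalence stated above can be checked with a short sketch. It assumes the xtdpdgmm package is installed and uses Stata's abdata example data; treating w and k as strictly exogenous is purely an assumption for illustration. Under these assumptions, both commands should list the same instruments and return the same estimates.

            ```stata
            * ssc install xtdpdgmm    (run once if the package is not yet installed)
            webuse abdata, clear
            xtset id year

            * (a) iv(): untransformed w and k as instruments for the differenced model
            xtdpdgmm L(0/1).n w k, model(diff) ///
                gmm(n, lag(2 4) collapse) iv(w k)

            * (b) the same instruments written as collapsed GMM-type instruments at lag 0
            xtdpdgmm L(0/1).n w k, model(diff) ///
                gmm(n, lag(2 4) collapse) gmm(w k, lag(0 0) collapse)
            ```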

            Two-step estimators use the same instruments as one-step estimators but rely on an estimate of the optimal weighting matrix. This yields asymptotically efficient estimates. If the instruments are valid, both one-step and two-step estimators are consistent. The inefficiency of the one-step estimator might be less of a problem in small samples, where it can be difficult to estimate the optimal weighting matrix, especially when the number of instruments is large. If you have a large sample size, then you should go for the efficiency gains of the two-step estimator. With small samples, it is not clear whether this would actually improve the estimates.
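
            One way to see how much the choice matters in a given sample is to run both estimators on the same instrument set and put the results side by side. This is only a sketch: it assumes the xtdpdgmm package is installed, uses Stata's abdata example data, and the lag choices are illustrative rather than recommended. The stored names onestep and twostep are arbitrary labels.

            ```stata
            * ssc install xtdpdgmm    (run once if the package is not yet installed)
            webuse abdata, clear
            xtset id year

            * One-step estimator: no estimate of the optimal weighting matrix needed
            xtdpdgmm L(0/1).n w k, model(diff) collapse ///
                gmm(n, lag(2 4)) gmm(w k, lag(1 3)) onestep vce(robust)
            estimates store onestep

            * Two-step estimator: same instruments, estimated optimal weighting matrix
            xtdpdgmm L(0/1).n w k, model(diff) collapse ///
                gmm(n, lag(2 4)) gmm(w k, lag(1 3)) twostep vce(robust)
            estimates store twostep

            * Coefficients and standard errors side by side
            estimates table onestep twostep, se
            ```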

            I cannot provide a general answer on the "desirable" number of cross sections. More is better. With 50 or even 100 cross sections, the finite-sample performance of the estimators could still be unsatisfying, but this depends on a lot of other data characteristics. With small N, you just need to keep your model as simple as possible, and impose some strong assumptions where this might be okay (e.g. assume that your regressors are exogenous). A simple IV estimator using a minimal number of collapsed instruments might do the job. But the truth is: If your sample is small, you just cannot really expect precise and robust estimates, no matter which estimator you use.
            https://twitter.com/Kripfganz


            • Thank you so much for your explanation. It's very helpful. I have one more elementary question and I would be grateful if you could teach me:

              In the gmm() option for the level equation ("gmm(..., model(level))"), do you always specify the diff option, under all circumstances? Or does it depend on your assumptions? I ask because gmm(..., model(level)) produces instruments in levels unless you specify "diff", which gives me the impression that instruments in levels are also possible, given that "diff" is optional.

              Thank you so much for your generous help.

              Taka


              • Dear Prof. Sebastian Kripfganz
                We estimate our Sys-GMM model with the following code:
                Code:
                xtabond2 L(0/1).GDP Labor Capital Financial_Development Temperature, gmmstyle(L.GDP L.Labor L.Capital L.Financial_Development , lag(1 3)) ivstyle(Temperature) robust twostep
                This sys-GMM estimation gives a positive and insignificant coefficient for Financial_Development. Also, the estimated coefficient of the lagged dependent variable is 0.9 and statistically significant.


                However, when we estimate the Diff-GMM with the following code:
                Code:
                xtabond2 L(0/1).GDP Labor Capital Financial_Development Temperature, gmmstyle(L.GDP L.Labor L.Capital L.Financial_Development , lag(1 3)) ivstyle(Temperature) robust twostep noleveleq
                This Diff-GMM estimation gives a positive and significant coefficient for Financial_Development. Also, the estimated coefficient of the lagged dependent variable is 0.2 and statistically insignificant.


                The Diff-GMM gives the result we expected for our main variable (i.e., Financial_Development), but the sys-GMM does not. However, in the literature relevant to our RQ, most papers use system GMM. Also, please note that the fixed-effects regression results are consistent with the Diff-GMM results.
                Code:
                xtreg GDP Labor Capital Financial_Development Temperature, fe r

                We have two questions:
                - Is it normal to have different results for Sys-GMM and Diff-GMM? Is there a reason behind this?
                - Do we need to consider other specifications for the system GMM to get similar results to Diff-GMM? For example, are there any other specifications from xtdpdgmm that could improve our sys-GMM results?
                Last edited by Sarah Magd; 25 Jul 2023, 06:23.


                • Taka Sakamoto
                  Instruments for the level model must be uncorrelated with the unobserved group-specific effects, no matter whether you specify them with gmmiv() or iv(). Without option diff, this requires that the levels of those instruments themselves are uncorrelated with those unobserved effects, which is akin to a "random-effects" assumption. With option diff, the first-differenced instruments need to be uncorrelated with the unobserved effects, which is a weaker requirement but still needs to be justified (see the seminal paper by Blundell and Bond (1998)).
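
                  The two cases can be written side by side as a sketch. It assumes the xtdpdgmm package is installed and uses Stata's abdata example data; whether either assumption holds for w and k in those data is not claimed here, the point is only the syntax.

                  ```stata
                  * ssc install xtdpdgmm    (run once if the package is not yet installed)
                  webuse abdata, clear
                  xtset id year

                  * (a) With diff: first-differenced instruments for the level model,
                  *     which must be uncorrelated with the unobserved effects
                  xtdpdgmm L(0/1).n w k, ///
                      gmm(n w k, lag(2 3) collapse model(diff)) ///
                      gmm(n w k, lag(1 1) diff collapse model(level)) ///
                      twostep vce(robust)

                  * (b) Without diff: w and k in levels as instruments for the level model,
                  *     which requires w and k themselves to be uncorrelated with the
                  *     unobserved effects (the "random-effects"-type assumption)
                  xtdpdgmm L(0/1).n w k, ///
                      gmm(n w k, lag(2 3) collapse model(diff)) ///
                      gmm(w k, lag(0 0) collapse model(level)) ///
                      twostep vce(robust)
                  ```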

                  Sarah Magd
                  This topic is about the xtdpdgmm command. It would be better to start a different topic if you have a question about a different command, such as xtabond2. Some general comments:
                  • System GMM requires stronger assumptions about the initial observations than difference GMM. In a macroeconomic context, the additional assumption is quite likely to be violated due to the heterogeneous development of the countries. Unfortunately, in empirical practice there is often not much effort made to justify the extra assumption for system GMM. Just because the relevant literature used system GMM, this does not mean that it really is justified.
                  • If there is a lot of persistence in the data, which again is quite likely with macroeconomic data, then difference GMM might suffer from a weak-instruments problem and the coefficient of the lagged dependent variable can be severely downward biased. This would be consistent with the difference in estimates between the difference and system GMM estimator, but the first point above could also explain that difference if the additional assumption for system GMM is violated. Also, even if both estimators are consistent, in small samples they can have a large sampling variation, which could lead to the different estimates you observed.
                  • If the true data-generating process is dynamic, then estimating a static fixed-effects model yields biased estimates. So, it could be a coincidence that the bias of the static fixed-effects estimator is similar to the bias of the difference GMM estimator.
                  • In order to reduce the weak-instruments problem of the difference GMM estimator, without imposing the stronger assumption for system GMM, a good solution can be to use the difference GMM estimator with added nonlinear moment conditions; see xtdpdgmm option nl(noserial).
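
                  A minimal sketch of the last point, assuming the xtdpdgmm package is installed and using Stata's abdata example data. The lag ranges and the treatment of w and k as predetermined are illustrative assumptions, not a recommended specification.

                  ```stata
                  * ssc install xtdpdgmm    (run once if the package is not yet installed)
                  webuse abdata, clear
                  xtset id year

                  * Difference GMM with collapsed instruments plus the nonlinear moment
                  * conditions that are valid under absence of serial correlation
                  xtdpdgmm L(0/1).n w k, model(diff) collapse ///
                      gmm(n, lag(2 4)) gmm(w k, lag(1 3)) ///
                      nl(noserial) twostep vce(robust)

                  * Postestimation checks: serial correlation and overidentifying restrictions
                  estat serial, ar(1/3)
                  estat overid
                  ```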
                  https://twitter.com/Kripfganz


                  • Thanks Prof. Sebastian Kripfganz
                    I have tried the following code:
                    Code:
                    xtdpdgmm L(0/1).GDP Labor Capital Financial_Development Temperature, model(diff) collapse gmm(GDP Labor Capital , lag(2 4)) gmm(Financial_Development, lag(1 2)) gmm(Temperature, lag(. .))  two vce(r) overid nl(noserial)
                    So I treat Financial_Development as a predetermined variable and Temperature as an exogenous variable. However, when I run this code, I get this error:
                    xtdpdgmm_init_nl(): 3498 touse variable for model 'diff' required
                    <istmt>: - function returned error
                    r(3498);


                    Could you please help me figure out the problem?


                    • That's an error message you should not see. Would it be possible for you to send me your data set by e-mail, so that I can replicate the problem?
                      https://twitter.com/Kripfganz


                        • Thank you. What happens if I do not use the iv() option? I have read your 2019 slides and see that you sometimes do not use it.

                        Thank you for your help.


                          • Sorry, I have one more question. Is it correct that the variable that goes in iv() should not be correlated with the dependent variable?


                          • The variable that you specify in iv() (or gmmiv()) should not be correlated with the error term. In other words, if it is excluded from your regression specification, it should not have a direct effect on the dependent variable after controlling for any indirect effects through the included regressors. This is the standard requirement for a valid instrument.

                            As mentioned earlier, iv() is just a special case of gmmiv(). If the relevant instruments are already specified with gmmiv(), then there is often no need to use iv(). In some cases, e.g. for dummy variables, the iv() option is easier to use.
                            https://twitter.com/Kripfganz


                            • Thank you. Could you tell me what's happening in the following estimation? I enter the command:

                              Code:
                              xtdpdgmm gdpgrow sme inflation gfcfgrow hfcegrow tradeopen l.lrgdpopc ,gmm(gdpgrow inflation  gfcfgrow hfcegrow lrgdpopc , lag(2 2) collapse model(diff)) gmm(gdpgrow inflation  gfcfgrow hfcegrow lrgdpopc, lag(1 1) diff collapse model(level)) iv(sme,model(level))  one vce(cl id) small overid
                              I get the following results:

                              Code:
                              . xtdpdgmm gdpgrow sme inflation gfcfgrow hfcegrow tradeopen l.lrgdpopc ,gmm(gdpgrow inflation  gfcfgrow hfcegr
                              > ow lrgdpopc , lag(2 2) collapse model(diff)) gmm(gdpgrow inflation  gfcfgrow hfcegrow lrgdpopc, lag(1 1) diff
                              >  collapse model(level)) iv(sme,model(level))  one vce(cl id) small overid
                              
                              Generalized method of moments estimation
                              
                              Fitting full model:
                              Step 1         f(b) =   3.170343
                              
                              Fitting reduced model 1:
                              Step 1         f(b) =  8.867e-16
                              
                              Fitting reduced model 2:
                              Step 1         f(b) =  1.413e-14
                              
                              Fitting reduced model 3:
                              Step 1         f(b) =  3.1380076
                              
                              Group variable: id                           Number of obs         =       919
                              Time variable: year                          Number of groups      =        21
                              
                              Moment conditions:     linear =      12      Obs per group:    min =         6
                                                  nonlinear =       0                        avg =   43.7619
                                                      total =      12                        max =        46
                              
                                                                  (Std. err. adjusted for 21 clusters in id)
                              ------------------------------------------------------------------------------
                                           |               Robust
                                   gdpgrow | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                              -------------+----------------------------------------------------------------
                                       sme |   1.371398    .415195     3.30   0.004     .5053168     2.23748
                                 inflation |  -.2012583   .0656044    -3.07   0.006    -.3381068   -.0644099
                                  gfcfgrow |   .1756431   .0365951     4.80   0.000     .0993071    .2519791
                                  hfcegrow |   .6537494   .2542447     2.57   0.018     .1234043    1.184095
                                 tradeopen |  -.0440692   .0166601    -2.65   0.016    -.0788215   -.0093169
                                           |
                                  lrgdpopc |
                                       L1. |  -.3924132   1.467164    -0.27   0.792    -3.452864    2.668037
                                           |
                                     _cons |   7.086845   15.30321     0.46   0.648     -24.8351    39.00879
                              ------------------------------------------------------------------------------
                              Instruments corresponding to the linear moment conditions:
                               1, model(diff):
                                 L2.gdpgrow L2.inflation L2.gfcfgrow L2.hfcegrow L2.lrgdpopc
                               2, model(level):
                                 L1.D.gdpgrow L1.D.inflation L1.D.gfcfgrow L1.D.hfcegrow L1.D.lrgdpopc
                               3, model(level):
                                 sme
                               4, model(level):
                                 _cons
                              When I remove "iv(sme,model(level))" from the command I get the results:

                              Code:
                              . xtdpdgmm gdpgrow sme inflation gfcfgrow hfcegrow tradeopen l.lrgdpopc ,gmm(gdpgrow inflation  gfcfgrow hfcegr
                              > ow lrgdpopc , lag(2 2) collapse model(diff)) gmm(gdpgrow inflation  gfcfgrow hfcegrow lrgdpopc, lag(1 1) diff
                              >  collapse model(level))  one vce(cl id) small overid
                              
                              Generalized method of moments estimation
                              
                              Fitting full model:
                              Step 1         f(b) =  3.1380076
                              
                              Group variable: id                           Number of obs         =       919
                              Time variable: year                          Number of groups      =        21
                              
                              Moment conditions:     linear =      11      Obs per group:    min =         6
                                                  nonlinear =       0                        avg =   43.7619
                                                      total =      11                        max =        46
                              
                                                                  (Std. err. adjusted for 21 clusters in id)
                              ------------------------------------------------------------------------------
                                           |               Robust
                                   gdpgrow | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                              -------------+----------------------------------------------------------------
                                       sme |    -.64922   6.394653    -0.10   0.920    -13.98823    12.68979
                                 inflation |  -.1814501   .1103874    -1.64   0.116    -.4117143    .0488141
                                  gfcfgrow |    .181537   .0435933     4.16   0.000     .0906029    .2724712
                                  hfcegrow |   .6699812   .2849286     2.35   0.029     .0756305    1.264332
                                 tradeopen |  -.0526704   .0459819    -1.15   0.266    -.1485869    .0432461
                                           |
                                  lrgdpopc |
                                       L1. |   .1360665     2.8836     0.05   0.963    -5.879018    6.151151
                                           |
                                     _cons |    3.10732   25.29937     0.12   0.903    -49.66625    55.88089
                              ------------------------------------------------------------------------------
                              Instruments corresponding to the linear moment conditions:
                               1, model(diff):
                                 L2.gdpgrow L2.inflation L2.gfcfgrow L2.hfcegrow L2.lrgdpopc
                               2, model(level):
                                 L1.D.gdpgrow L1.D.inflation L1.D.gfcfgrow L1.D.hfcegrow L1.D.lrgdpopc
                               3, model(level):
                                 _cons
                              "sme" is a time-invariant dummy variable, but the same pattern occurs when I use a continuous version of "sme".

                              I am sorry for taking up your time, and I appreciate your generous, kind help.

                              Many thanks.

                              TS


                              • By removing your instrument for sme, the coefficient of that regressor might be poorly identified. Not surprisingly, the standard error of that coefficient estimate becomes huge. The estimator (unsuccessfully) tries to borrow some identification strength from the other instruments, which then also slightly inflates the other standard errors.

                                The coefficient of your lagged dependent variable also appears to be poorly identified. It would probably require additional lags as instruments. This, however, would increase the number of instruments, which can cause further trouble.
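
                                The trade-off can be explored with a sketch like the following, which assumes the xtdpdgmm package is installed and uses Stata's abdata example data rather than the data in this thread. The second command widens the lag range for the dependent variable only; comparing the "Moment conditions" lines in the two output headers shows how the instrument count grows.

                                ```stata
                                * ssc install xtdpdgmm    (run once if the package is not yet installed)
                                webuse abdata, clear
                                xtset id year

                                * Baseline: one collapsed lag per variable for the differenced model
                                xtdpdgmm L(0/1).n w k, ///
                                    gmm(n w k, lag(2 2) collapse model(diff)) ///
                                    gmm(n w k, lag(1 1) diff collapse model(level)) ///
                                    onestep vce(cluster id) small

                                * Lags 2 to 4 of the dependent variable as collapsed instruments,
                                * which adds up to two moment conditions relative to the baseline
                                xtdpdgmm L(0/1).n w k, ///
                                    gmm(n, lag(2 4) collapse model(diff)) ///
                                    gmm(w k, lag(2 2) collapse model(diff)) ///
                                    gmm(n w k, lag(1 1) diff collapse model(level)) ///
                                    onestep vce(cluster id) small
                                ```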
                                https://twitter.com/Kripfganz
