  • Inconsistent results with different way of specifying lags in -xtdpd-

    Dear Statalist users,

    I'm having trouble specifying which lags to use for the GMM-type instruments in -xtdpd-. Specifically, when I have an unbalanced panel, dgmmiv(L(2/6).L.mvalue), dgmmiv(mvalue, l(3 7)), and dgmmiv(L.mvalue, l(2 6)) all yield different results. I believe this is due to how Stata drops missing values, but since the code for -xtdpd- is not accessible I have no way of knowing exactly what is happening. Does anyone know whether these are supposed to be different lag specifications? To me, lag(lag(y, 1), k) is lag(y, k+1), but maybe it's just a notational issue. I searched the -xtdpd- documentation but did not find a clear answer to my question.

    Below are the code and output of a minimal working example, which also shows that the results are consistent when the panel is balanced.

    Thank you in advance!

    Code:
    . webuse grunfeld, clear
    
    . gen rnnb=uniform()
    
    . drop if rnnb>0.8
    (42 observations deleted)
    
    . drop rnnb
    
    . xtdpd l(0/2).mvalue l(0/1).(invest) k, dgmmiv(L(2/6).L.mvalue) div(l(0/1).(invest) kstock)
    
    Dynamic panel-data estimation                   Number of obs     =         84
    Group variable: company                         Number of groups  =         10
    Time variable: year
                                                    Obs per group:
                                                                  min =          3
                                                                  avg =        8.4
                                                                  max =         12
    
    Number of instruments =     50                  Wald chi2(5)      =     199.23
                                                    Prob > chi2       =     0.0000
    One-step results
    ------------------------------------------------------------------------------
          mvalue | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
          mvalue |
             L1. |   .5393142    .115033     4.69   0.000     .3138536    .7647747
             L2. |   .2656282    .119435     2.22   0.026     .0315398    .4997165
                 |
          invest |
             --. |   2.321183   .4867436     4.77   0.000     1.367183    3.275183
             L1. |   -3.58443   .5904505    -6.07   0.000    -4.741692   -2.427169
                 |
          kstock |    1.87165   .2296433     8.15   0.000     1.421557    2.321742
           _cons |  -165.8201   93.87971    -1.77   0.077     -349.821    18.18073
    ------------------------------------------------------------------------------
    Instruments for differenced equation
            GMM-type: L(2/.).L3.mvalue L(2/.).L4.mvalue L(2/.).L5.mvalue
                      L(2/.).L6.mvalue L(2/.).L7.mvalue
            Standard: D.invest LD.invest D.kstock
    Instruments for level equation
            Standard: _cons
    
    . xtdpd l(0/2).mvalue l(0/1).(invest) k, dgmmiv(L.mvalue, l(2 6)) div(l(0/1).(invest) kstock)
    
    Dynamic panel-data estimation                   Number of obs     =         84
    Group variable: company                         Number of groups  =         10
    Time variable: year
                                                    Obs per group:
                                                                  min =          3
                                                                  avg =        8.4
                                                                  max =         12
    
    Number of instruments =     57                  Wald chi2(5)      =     187.28
                                                    Prob > chi2       =     0.0000
    One-step results
    ------------------------------------------------------------------------------
          mvalue | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
          mvalue |
             L1. |   .5947108    .112416     5.29   0.000     .3743794    .8150421
             L2. |   .1653495   .1140856     1.45   0.147    -.0582542    .3889532
                 |
          invest |
             --. |   2.097107   .4800003     4.37   0.000     1.156324    3.037891
             L1. |  -3.333221   .5872146    -5.68   0.000    -4.484141   -2.182302
                 |
          kstock |   1.677237   .2219305     7.56   0.000     1.242261    2.112212
           _cons |  -84.92306   90.64134    -0.94   0.349    -262.5768     92.7307
    ------------------------------------------------------------------------------
    Instruments for differenced equation
            GMM-type: L(2/6).L.mvalue
            Standard: D.invest LD.invest D.kstock
    Instruments for level equation
            Standard: _cons
    
    . xtdpd l(0/2).mvalue l(0/1).(invest) k, dgmmiv(mvalue, l(3 7)) div(l(0/1).(invest) kstock)
    
    Dynamic panel-data estimation                   Number of obs     =         84
    Group variable: company                         Number of groups  =         10
    Time variable: year
                                                    Obs per group:
                                                                  min =          3
                                                                  avg =        8.4
                                                                  max =         12
    
    Number of instruments =     57                  Wald chi2(5)      =     187.90
                                                    Prob > chi2       =     0.0000
    One-step results
    ------------------------------------------------------------------------------
          mvalue | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
          mvalue |
             L1. |   .5911101   .1124216     5.26   0.000     .3707678    .8114523
             L2. |   .1616966   .1134604     1.43   0.154    -.0606817    .3840749
                 |
          invest |
             --. |   2.098092   .4793739     4.38   0.000     1.158537    3.037648
             L1. |  -3.321696   .5855994    -5.67   0.000     -4.46945   -2.173943
                 |
          kstock |   1.687715   .2221867     7.60   0.000     1.252237    2.123193
           _cons |  -83.42153   90.32579    -0.92   0.356    -260.4568    93.61377
    ------------------------------------------------------------------------------
    Instruments for differenced equation
            GMM-type: L(3/7).mvalue
            Standard: D.invest LD.invest D.kstock
    Instruments for level equation
            Standard: _cons
    
    
    
    . webuse grunfeld, clear
    
    . xtdpd l(0/2).mvalue l(0/1).(invest) k, dgmmiv(mvalue, l(3 7)) div(l(0/1).(invest) kstock)
    
    Dynamic panel-data estimation                   Number of obs     =        180
    Group variable: company                         Number of groups  =         10
    Time variable: year
                                                    Obs per group:
                                                                  min =         18
                                                                  avg =         18
                                                                  max =         18
    
    Number of instruments =     79                  Wald chi2(5)      =     236.16
                                                    Prob > chi2       =     0.0000
    One-step results
    ------------------------------------------------------------------------------
          mvalue | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
          mvalue |
             L1. |   .2401437   .0700625     3.43   0.001     .1028237    .3774638
             L2. |  -.1160122   .0586921    -1.98   0.048    -.2310467   -.0009778
                 |
          invest |
             --. |   3.993119   .3330965    11.99   0.000     3.340262    4.645976
             L1. |  -2.694518    .526062    -5.12   0.000     -3.72558   -1.663455
                 |
          kstock |  -.2895378   .1637345    -1.77   0.077    -.6104515    .0313759
           _cons |   826.0328   82.84173     9.97   0.000      663.666    988.3996
    ------------------------------------------------------------------------------
    Instruments for differenced equation
            GMM-type: L(3/7).mvalue
            Standard: D.invest LD.invest D.kstock
    Instruments for level equation
            Standard: _cons
    
    . xtdpd l(0/2).mvalue l(0/1).(invest) k, dgmmiv(L.mvalue, l(2 6)) div(l(0/1).(invest) kstock)
    
    Dynamic panel-data estimation                   Number of obs     =        180
    Group variable: company                         Number of groups  =         10
    Time variable: year
                                                    Obs per group:
                                                                  min =         18
                                                                  avg =         18
                                                                  max =         18
    
    Number of instruments =     79                  Wald chi2(5)      =     236.16
                                                    Prob > chi2       =     0.0000
    One-step results
    ------------------------------------------------------------------------------
          mvalue | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
          mvalue |
             L1. |   .2401437   .0700625     3.43   0.001     .1028237    .3774638
             L2. |  -.1160122   .0586921    -1.98   0.048    -.2310467   -.0009778
                 |
          invest |
             --. |   3.993119   .3330965    11.99   0.000     3.340262    4.645976
             L1. |  -2.694518    .526062    -5.12   0.000     -3.72558   -1.663455
                 |
          kstock |  -.2895378   .1637345    -1.77   0.077    -.6104515    .0313759
           _cons |   826.0328   82.84173     9.97   0.000      663.666    988.3996
    ------------------------------------------------------------------------------
    Instruments for differenced equation
            GMM-type: L(2/6).L.mvalue
            Standard: D.invest LD.invest D.kstock
    Instruments for level equation
            Standard: _cons

  • #2
    This seems to be a known issue with unbalanced panels (see point 4 in [1]), and indeed -xtdpdgmm- gives consistent results. I would still like to understand why this is happening.

    [1] https://www.statalist.org/forums/for...d-xtdpdsys-gmm


    • #3
      It is hard to reverse-engineer what is happening under the hood of xtdpd. I do not have anything to add to the information provided in the linked thread.
      https://twitter.com/Kripfganz


      • #4
        Coincidentally, the Stata 18 update released today claims to have a fix for this problem:
        22. xtabond, xtdpd and xtdpdsys, when gaps were present in a subset of
        panels, did not account for the gaps when constructing GMM-style lagged
        instruments in those panels for time periods up to and including the
        gap, leading to incorrect results. Note that because prior lagged
        values were substituted for the desired lagged values in these cases,
        results remained asymptotically valid under commonly used assumptions.
        This has been fixed.
        However, a quick check with the example in the linked thread shows that the inconsistency in results is still there.
        https://twitter.com/Kripfganz


        • #5
          Hi Kevin,

          Good question. As you note, there is a discrepancy between some different, but mathematically equivalent, lag specifications in -xtdpd- and -xtabond- when panels are unbalanced. -xtdpd- takes the variable(s) you specify in -dgmmiv()- "as is", and then applies the lags you specify in -lagrange()- (read "lag range", not "Lagrange"). If there is a value missing in the variable you give it, as a result of lagging, Stata does not go back to the unlagged variable to retrieve those missing values and then reapply the lag structure in combination with the lags in -lagrange()-.

          Similarly, a variable L.Ly will not be the same as L2.y if Ly is a generated variable equal to L.y and there are gaps in the data. Suppose there is no time period 11. Then the value of y in time period 10 is not the first lag of anything, so it is lost in the first-lagging process and will never appear again in Ly or in any lag of Ly. But it will appear if we compute second lags directly and it is the second lag of something.
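
          To make the gap case concrete, here is a minimal sketch with made-up data (a single panel in which period 11 is absent). The data values and the variable names Ly, LLy, and L2y are assumptions chosen only for illustration. It compares the first lag of a generated lag with a second lag computed directly from y:

          ```stata
          clear
          input id t y
          1  8 10
          1  9 20
          1 10 30
          1 12 50
          end
          xtset id t

          generate Ly  = L.y     // generated first lag; missing at t = 12 because t = 11 is absent
          generate LLy = L.Ly    // first lag of the generated variable
          generate L2y = L2.y    // second lag computed directly from y

          list t y Ly LLy L2y, noobs
          ```

          At t = 12, L2y picks up the value of y from t = 10 (30), whereas LLy is missing, because Ly has no observation at t = 11 to lag from. At t = 10, where no gap is involved, the two agree.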

          The downside to -xtdpd-'s handling of lags is that it creates discrepancies between, for example, dgmmiv(y, lagrange(2 4)) and dgmmiv(L.y, lagrange(1 3)) when panels are unbalanced, as you point out.

          On the other hand, if Ly is a generated variable equal to L.y, this approach preserves the equivalence between dgmmiv(Ly, lag(1 3)) and dgmmiv(L.y, lag(1 3)).
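
          One way to check this equivalence yourself is sketched below, using the Grunfeld data with randomly dropped observations as in the original post. The seed, the variable name Lmv, and the matrix names are arbitrary choices of mine, and the lag range (1 3) simply mirrors the example in the previous sentence:

          ```stata
          webuse grunfeld, clear
          set seed 12345                    // arbitrary seed, only for reproducibility
          drop if runiform() > 0.8          // create gaps, as in the original post
          generate Lmv = L.mvalue           // hypothetical name for the generated lag

          // instruments built from the generated variable
          xtdpd l(0/2).mvalue l(0/1).invest kstock, dgmmiv(Lmv, lagrange(1 3)) div(l(0/1).invest kstock)
          matrix b_gen = e(b)

          // instruments built from the operator-lagged variable
          xtdpd l(0/2).mvalue l(0/1).invest kstock, dgmmiv(L.mvalue, lagrange(1 3)) div(l(0/1).invest kstock)
          matrix b_op = e(b)

          display mreldif(b_gen, b_op)      // relative difference between the two coefficient vectors
          ```

          If -xtdpd- indeed takes L.mvalue "as is", the displayed relative difference should be zero or negligibly small. I have not reported output here, since it depends on the random draw.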

          Reasonable people can disagree on which approach makes more sense. David Roodman has discussed this question here in point 4.

          Sebastian mentioned there was a fix to -xtdpd- released yesterday that did not fix this inconsistency. That fix addressed some discrepancies between -xtdpd- and other packages that were caused by a bug in the construction of the instruments in some cases. The fix was not meant to change -xtdpd-'s approach to handling lags of already-lagged variables, so the discrepancies in this thread will still exist.


          • #6
            Thanks for the clarification! But then shouldn't the number of observations reported by -xtdpd- differ between dgmmiv(y, lagrange(2 4)) and dgmmiv(L.y, lagrange(1 3)) when panels are unbalanced? In your example, the latter should have fewer observations reported as it loses the second lag of y in period 10.
            Last edited by Kevin Michael Frick; 23 May 2024, 15:36. Reason: typos


            • #7
              Another good question. The estimation sample (and thus the number of observations) is determined by the available observations in the main equation, not by the available observations in the instruments. For panels missing data from a given period, unbalanced panels are handled by "filling in zeros in columns where missing data are required" in the instrument matrix, as the documentation puts it. So more missing lagged values in the instruments will not change the estimation sample; they mean that the instrument matrix contains more zeros and, sometimes, fewer instruments (instruments that are zero everywhere will be dropped).
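
              A quick way to see this is to store e(N) after each specification and compare, as in the sketch below. The seed and the scalar names are arbitrary assumptions for illustration; the two instrument specifications are the ones from the original post:

              ```stata
              webuse grunfeld, clear
              set seed 12345                    // arbitrary seed, only for reproducibility
              drop if runiform() > 0.8          // create gaps, as in the original post

              xtdpd l(0/2).mvalue l(0/1).invest kstock, dgmmiv(mvalue, lagrange(3 7)) div(l(0/1).invest kstock)
              scalar N_direct = e(N)            // observations used with lags 3-7 of mvalue

              xtdpd l(0/2).mvalue l(0/1).invest kstock, dgmmiv(L.mvalue, lagrange(2 6)) div(l(0/1).invest kstock)
              scalar N_lagged = e(N)            // observations used with lags 2-6 of L.mvalue

              display "N (mvalue, lags 3-7):   " N_direct
              display "N (L.mvalue, lags 2-6): " N_lagged
              ```

              In the output posted in #1, both specifications report 84 observations, while the coefficient estimates differ.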

              By the way, there is an undocumented option diffsmp which will construct the estimation sample on the basis of the differenced equation, if that is of interest.
