
  • reghdfe dkraay

    I am having trouble with the syntax for reghdfe. I would like to run a regression using reghdfe that includes time dummies and fixed effects dummies, with Driscoll-Kraay standard errors. I have installed avar and ftools, but I am having trouble understanding the sequence of commands for this advanced option as it is reported in the documentation. Any tips would be appreciated!

  • #2
    In reghdfe, you absorb the fixed effects. You can get away with absorbing one level and entering the other using dummies, but the program was designed in essence to do away with the dummies.

    Code:
    webuse grunfeld
    * Driscoll-Kraay with 3 lags (bandwidth = 4)
    xtscc invest mvalue kstock i.time, fe lag(3)
    reghdfe invest mvalue kstock, a(company time) vce(cluster year, dkraay(4))
    Results:

    Code:
    . xtscc invest mvalue kstock i.time, fe lag(3)
    
    Regression with Driscoll-Kraay standard errors   Number of obs     =       200
    Method: Fixed-effects regression                 Number of groups  =        10
    Group variable (i): company                      F( 21,    19)     =      5.43
    maximum lag: 3                                   Prob > F          =    0.0002
                                                     within R-squared  =    0.7985
    
    ------------------------------------------------------------------------------
                 |             Drisc/Kraay
          invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          mvalue |   .1177158   .0232015     5.07   0.000     .0691546    .1662771
          kstock |   .3579163   .0590943     6.06   0.000     .2342304    .4816021
                 |
            time |
              1  |          0  (empty)
              2  |  -19.19741   8.488871    -2.26   0.036    -36.96482   -1.429996
              3  |  -40.69001   14.56118    -2.79   0.012     -71.1669   -10.21311
              4  |   -39.2264   4.902671    -8.00   0.000    -49.48781     -28.965
              5  |  -69.47029   9.047162    -7.68   0.000    -88.40622   -50.53436
              6  |  -44.23507   10.14195    -4.36   0.000    -65.46243   -23.00772
              7  |  -18.80446   9.721364    -1.93   0.068    -39.15151    1.542587
              8  |  -21.13979   8.346246    -2.53   0.020    -38.60869   -3.670898
              9  |  -42.97762   9.762677    -4.40   0.000    -63.41113    -22.5441
             10  |  -43.09876   10.06541    -4.28   0.000    -64.16591   -22.03161
             11  |  -55.68303   11.88382    -4.69   0.000    -80.55615   -30.80992
             12  |  -31.16928   13.34887    -2.33   0.031    -59.10879   -3.229775
             13  |  -39.39223   14.15797    -2.78   0.012    -69.02521   -9.759257
             14  |  -43.71651   16.42908    -2.66   0.015    -78.10298   -9.330047
             15  |   -73.4951   18.45404    -3.98   0.001    -112.1198   -34.87035
             16  |  -75.89611   19.65125    -3.86   0.001    -117.0267   -34.76556
             17  |   -62.4809   21.81853    -2.86   0.010    -108.1476    -16.8142
             18  |  -64.63233   25.13413    -2.57   0.019    -117.2387   -12.02599
             19  |  -67.71796   30.68896    -2.21   0.040    -131.9507   -3.485235
             20  |  -93.52622   34.46222    -2.71   0.014    -165.6565   -21.39595
                 |
           _cons |  -32.83631   15.91935    -2.06   0.053     -66.1559     .483282
    ------------------------------------------------------------------------------
    
    
    . reghdfe invest mvalue kstock, a(company time) vce(cluster year, dkraay(4))
    (converged in 3 iterations)
    
    HDFE Linear regression                            Number of obs   =        200
    Absorbing 2 HDFE groups                           F(   2,     19) =      43.51
    Statistics robust to heteroskedasticity           Prob > F        =     0.0000
     (dkraay=4)                                       R-squared       =     0.9517
     (panel=company time=year)                        Adj R-squared   =     0.9428
                                                      Within R-sq.    =     0.7201
    Number of clusters (year)    =         20         Root MSE        =    51.8782
    
                                      (Std. Err. adjusted for 20 clusters in year)
    ------------------------------------------------------------------------------
                 |               Robust
          invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          mvalue |   .1177158    .022576     5.21   0.000     .0704638    .1649679
          kstock |   .3579163   .0575012     6.22   0.000     .2375649    .4782676
    ------------------------------------------------------------------------------
    
    Absorbed degrees of freedom:
    ---------------------------------------------------------------+
     Absorbed FE |  Num. Coefs.  =   Categories  -   Redundant     | 
    -------------+-------------------------------------------------|
         company |           10              10              0     | 
            time |            0              20             20 *   | 
    ---------------------------------------------------------------+
    * = fixed effect nested within cluster; treated as redundant for DoF computation



    • #3
      Thanks so much for your response, Andrew. My plan was to absorb the time dummies and then enter the fixed effects explicitly in the command (I'd like to estimate them and use them for other purposes afterward). Regardless, Stata tells me that after running something like

      Code:
      reghdfe depvar i.person, absorb(i.time) vce(cluster person, dkraay(4))
      that the vce option of dkraay isn't supported. Any idea as to why this might be happening? Thanks!



      • #4
        I have not updated my version of reghdfe for a while, so that is why it works for me. In any case, you need to cluster on the time variable and not the panel variable.

        Code:
        reghdfe depvar i.person, absorb(time) vce(cluster time, dkraay(4))
        *equivalent to
        reghdfe depvar i.person, absorb(time) vce(robust, dkraay(4))
        My guess is that with the introduction of ivreghdfe (from SSC), Sergio Correia discontinued this option for reghdfe.

        Code:
        ssc install ivreghdfe
        ivreghdfe depvar i.person, absorb(time) dkraay(4)
        Last edited by Andrew Musau; 12 Apr 2019, 08:34.



        • #5
          Andrew, great catch. The ivreghdfe documentation says, "As seen in the table below, ivreghdfe is recommended if you want to run IV/LIML/GMM2S regressions with fixed effects, or run OLS regressions with advanced standard errors (HAC, Kiefer, etc.)."

          So I have tried the following:
          (I was able to find documentation on how to trigger the old version of reghdfe)
          Code:
          reghdfe depvar i.person, absorb(time) old vce(robust, dkraay(4))
          Code:
          ivreghdfe depvar i.person, absorb(time) dkraay(4)
          and
          Code:
          xtscc depvar i.time i.person,  lag(4)
          and I get close, but not exact, standard errors between the three regressions (I am not sure what would cause the difference here).



          • #6
            "and I get close, but not exact, standard errors between the three regressions (I am not sure what would cause the difference here)."
            Two points:

            (1) As stated in #2, a bandwidth of K equals K-1 lags, e.g., bandwidth=4 equals lag=3.

            (2) Because vce(robust, dkraay()) implies clustering on the year variable, and you absorb year fixed effects (i.e. cluster = fixed effects), reghdfe adjusts the degrees of freedom. xtscc does not do this, hence the discrepancy. You can get the same estimates by specifying the option dof(none), but the correct degrees of freedom are the default reghdfe degrees of freedom.

            Code:
            *SAME RESULTS but wrong DoF calculation
            xtscc depvar i.time i.person, lag(4)
            reghdfe depvar i.person, absorb(time) old vce(robust, dkraay(5)) dof(none)
            
            *CORRECT DoF calculation
            reghdfe depvar i.person, absorb(time) old vce(robust, dkraay(5))
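
            The lag-to-bandwidth mapping in point (1) can be kept in one place with a local macro. This is only a sketch: depvar, person, and time are the placeholder names used above, the local names lags and bw are illustrative, and the old option with dkraay() is assumed to behave as shown earlier in this thread. Run the lines together (for example from a do-file) so the locals stay defined.

            ```stata
            * Choose the number of lags once; the bandwidth is lags + 1
            local lags = 4
            local bw = `lags' + 1

            * xtscc takes the lag count; dkraay() takes the bandwidth
            xtscc depvar i.time i.person, lag(`lags')
            reghdfe depvar i.person, absorb(time) old vce(robust, dkraay(`bw'))
            ```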



            • #7
              This is extremely helpful. Thank you again so much for your responses and insight.



              • #8
                Hi Andrew Musau, I was hoping to follow up about the degrees of freedom correction.

                The only reasons I absorb time effects with the old version of -reghdfe- are:
                1) it is not essential that I export them and use them for analysis later, like it is for the fixed effects I am estimating, and
                2) I have to absorb something when I use the older version of this command (noabsorb is not an option in the older version, but I need to use the older version to use DK errors).

                Otherwise, I would not absorb the time effects. Why does -reghdfe- adjust the dof in this case when I do absorb them, and is that really the "correct" thing to do in the context of my model?

                Again, your help is much appreciated!

                Vicki



                • #9
                  "Why does -reghdfe- adjust the dof in this case when I do absorb them, and is that really the "correct" thing to do in the context of my model?"
                  Excluding first differencing, you can think about 2 equivalent ways of estimating the fixed effects model:

                  1. Within group estimator
                  2. Least squares dummy variables estimator (LSDV)

                  For your xtscc command in #5, by explicitly including panel and time dummies, you are implementing LSDV.

                  xtscc depvar i.time i.person
                  On the other hand, reghdfe implements the within estimator, which takes deviations from the individual means to eliminate the fixed effects. In the process, you lose 1 degree of freedom for each fixed effect eliminated, hence the adjustment.

                  "Otherwise, I would not absorb the time effects."
                  There is no issue with reporting the LSDV (unadjusted) standard errors, but to be technically correct, you should refer to the estimator as LSDV and not as the within estimator.
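
                  A minimal sketch of the equivalence between the two estimators, using only built-in commands on the Grunfeld data. Default standard errors are used here rather than Driscoll-Kraay, so this only illustrates that LSDV and the within estimator agree on the slope coefficients.

                  ```stata
                  webuse grunfeld, clear
                  xtset company year

                  * LSDV: firm dummies entered explicitly
                  regress invest mvalue kstock i.company

                  * Within estimator: firm effects removed by demeaning
                  xtreg invest mvalue kstock, fe
                  ```

                  The coefficients on mvalue and kstock should be the same in both outputs; only the treatment of the firm effects differs.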



                  • #10
                    When I use -reghdfe- and -xtscc-, the df_r, or 'residual degrees of freedom', are the same between the two models (the number of time periods minus 3). However, when I use a command like -newey2- to produce Newey-West standard errors in a panel setting, it gives me the number of degrees of freedom that I would normally expect: (NT - N - K + 1).
                    Why is it that the DK standard errors produce much fewer degrees of freedom? (Since when using DK, I am effectively clustering on the time variable, so this becomes the effective sample size in my regression?).
                    Thanks again.



                    • #11
                      "Why is it that the DK standard errors produce much fewer degrees of freedom? (Since when using DK, I am effectively clustering on the time variable, so this becomes the effective sample size in my regression?)."
                      As you rightly point out, Driscoll-Kraay standard errors are an example of clustered standard errors where the clustering is on the time dimension. The denominator degrees of freedom equals the number of clusters minus one (in the absence of strata). So, if T=20, your df_r=19.
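
                      As a rough check on the Grunfeld data, the number of clusters can be counted directly. tabulate leaves the number of distinct values in r(r). The xtscc lines assume the SSC package is installed and that it stores e(df_r), as the discussion in #10 suggests; the local names are illustrative.

                      ```stata
                      webuse grunfeld, clear

                      * Count the time clusters; df_r is expected to be clusters - 1
                      quietly tabulate year
                      local G = r(r)
                      local dfr = `G' - 1
                      display "clusters = `G', expected df_r = `dfr'"

                      * Compare with what xtscc stores after estimation
                      quietly xtscc invest mvalue kstock, fe lag(3)
                      display "e(df_r) = " e(df_r)
                      ```

                      With 20 years in the data, the expected value is 19, which matches the F( 21, 19) reported by xtscc in #2.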


                      "However, when I use a command like -newey2- to produce newey-west standard errors in a panel setting, it gives me the number of the degrees of freedom that I would normally expect: (NT - N - K + 1)."
                      Newey and West's variance estimate is an extension of the White (1980) procedure, and thus not clustered standard errors. See Cameron and Miller's paper for this.

                      Reference
                      White, Halbert. 1980. “A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity.” Econometrica 48(4): 817-838.



                      • #12
                        Thanks Andrew. I think I am good on this topic, with the exception of how many degrees of freedom should be in the numerator.
                        I specifically include person dummies so that I can use them for significance testing after estimation. In a world where I just use xtscc and I don't absorb anything, and include time dummies and fixed effects dummies in my regression, how many degrees of freedom should I actually have in the numerator?
                        Your help on this thread has been greatly informative and appreciated!



                        • #13
                          Vicki, the numerator degrees of freedom equal the number of regressors, not including the constant. This will vary depending on which estimator you use. For example, the LSDV estimator will always have more numerator degrees of freedom than the within estimator, because the former explicitly includes the dummies. At the end of the day, your coefficients and, largely, your standard errors are the same no matter which estimation technique you choose, so your dof will mainly reflect the estimation method.

                          "I just use xtscc and I don't absorb anything, and include time dummies and fixed effects dummies in my regression, how many degrees of freedom should I actually have in the numerator"
                          Here dof = no. of regressors + (N-1) + (T-1). In the Grunfeld dataset below, N=10, T=20, and I include 2 regressors, so dof = 2 + (10-1) + (20-1) = 30, as reported.

                          Code:
                          . clear
                          
                          . webuse grunfeld
                          
                          . xtscc invest mvalue kstock i.company i.time
                          
                          Regression with Driscoll-Kraay standard errors   Number of obs     =       200
                          Method: Pooled OLS                               Number of groups  =        10
                          Group variable (i): company                      F( 30,    19)     =    431.25
                          maximum lag: 2                                   Prob > F          =    0.0000
                                                                           R-squared         =    0.9517
                                                                           Root MSE          =   51.7245
                          
                          ------------------------------------------------------------------------------
                                       |             Drisc/Kraay
                                invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                          -------------+----------------------------------------------------------------
                                mvalue |   .1177158   .0227522     5.17   0.000     .0700949    .1653368
                                kstock |   .3579163   .0621352     5.76   0.000     .2278659    .4879667
                                       |
                               company |
                                    1  |          0  (empty)
                                    2  |   207.0542   67.22298     3.08   0.006     66.35489    347.7535
                                    3  |  -135.2308   39.82943    -3.40   0.003    -218.5948   -51.86686
                                    4  |    95.3538   76.39059     1.25   0.227    -64.53355    255.2412
                                    5  |  -5.438636   68.63581    -0.08   0.938     -149.095    138.2178
                                    6  |   102.8886   78.72205     1.31   0.207    -61.87854    267.6557
                                    7  |   51.46657   79.51457     0.65   0.525    -114.9593    217.8925
                                    8  |   67.49048    70.1142     0.96   0.348    -79.26023    214.2412
                                    9  |   30.21752   76.98813     0.39   0.699    -130.9205    191.3555
                                   10  |   126.8371   92.22243     1.38   0.185    -66.18669    319.8608
                                       |
                                  time |
                                    1  |          0  (empty)
                                    2  |  -19.19741   8.342587    -2.30   0.033    -36.65864   -1.736172
                                    3  |  -40.69001   14.35567    -2.83   0.011    -70.73677   -10.64324
                                    4  |   -39.2264   5.163533    -7.60   0.000     -50.0338   -28.41901
                                    5  |  -69.47029   9.213538    -7.54   0.000    -88.75445   -50.18613
                                    6  |  -44.23507   10.28293    -4.30   0.000    -65.75749   -22.71266
                                    7  |  -18.80446   9.986999    -1.88   0.075    -39.70749    2.098567
                                    8  |  -21.13979   8.850416    -2.39   0.027    -39.66393   -2.615659
                                    9  |  -42.97762   10.26774    -4.19   0.001    -64.46824   -21.48699
                                   10  |  -43.09876   10.54431    -4.09   0.001    -65.16827   -21.02926
                                   11  |  -55.68303   12.31747    -4.52   0.000     -81.4638   -29.90226
                                   12  |  -31.16928   13.80389    -2.26   0.036    -60.06116     -2.2774
                                   13  |  -39.39223    15.0295    -2.62   0.017    -70.84933   -7.935135
                                   14  |  -43.71651   17.42692    -2.51   0.021    -80.19147   -7.241555
                                   15  |   -73.4951    19.5738    -3.75   0.001    -114.4635   -32.52666
                                   16  |  -75.89611   20.85696    -3.64   0.002    -119.5502   -32.24199
                                   17  |   -62.4809   23.10804    -2.70   0.014    -110.8466   -14.11521
                                   18  |  -64.63233   26.63911    -2.43   0.025    -120.3886   -8.876025
                                   19  |  -67.71796   32.44659    -2.09   0.051    -135.6294    .1935267
                                   20  |  -93.52622    36.5366    -2.56   0.019    -169.9982   -17.05424
                                       |
                                 _cons |  -86.90019    78.1847    -1.11   0.280    -250.5426    76.74227
                          ------------------------------------------------------------------------------
                          So it is not confusing if you state that "the model is estimated using the xtscc command in Stata including firm and time dummies" as it will be apparent that these dummies are included in the calculation of the degrees of freedom.
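
                          The arithmetic above can also be reproduced from the data. This sketch counts N and T with tabulate and adds the K = 2 slope regressors; the local names are illustrative.

                          ```stata
                          webuse grunfeld, clear

                          * N = number of firms, T = number of time periods
                          quietly tabulate company
                          local N = r(r)
                          quietly tabulate year
                          local T = r(r)
                          local K = 2

                          * Numerator dof for LSDV with firm and time dummies
                          local dfm = `K' + (`N' - 1) + (`T' - 1)
                          display "numerator dof = `dfm'"
                          ```

                          With N=10 and T=20 this gives 30, the first argument of F( 30, 19) in the xtscc output.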



                          • #14
                            I think I understand this now!

                            I am planning to use reghdfe. In my results, xtscc and reghdfe use the same df_r (just the number of time periods - 1). xtscc has (N-1) + (T-1) df_m and reghdfe reports (N-1) df_m (due, I think I have realized now, to the adjustment made from the absorption of the time effects), but this is okay as it does not impact my standard errors (I think). However, it is not possible to estimate anything other than a pooled OLS or an -fe- (within estimator) regression with xtscc. Thus, I plan to use reghdfe because, despite the adjustment to the numerator degrees of freedom due to the absorbed time effects, I do not want to pool in this context, because I think that assumes that there is no heterogeneity in the fixed effects (when that is indeed what I am trying to test). So LSDV estimation using reghdfe with vce(robust, dkraay(5)) produces the DK standard errors I need while not pooling.

                            Thanks, again, for all of your support and insight into this question!



                            • #15
                              I do not completely understand

                              "However, it is not possible to estimate anything other than a pooled OLS or an -fe- (within estimator) regression with xtscc. Thus, I plan to use reghdfe because despite the adjustment to the numerator degrees of freedom due to the absorbed time effects, I do not want to pool in this context"
                              While reghdfe is the most efficient way to do the estimation, you can estimate either pooled OLS or the fixed effects model using xtscc. The only difference is that time dummies must be included explicitly in xtscc, because it can only implement the one-way within estimator. Absorbing the fixed effects has the same impact as including all the dummies less one; the only difference is that the latter is computationally demanding. So by including firm and time dummies, or by absorbing them, you are acknowledging that there is heterogeneity, and you can no longer call your model "pooled" (except if you say that you are pooling time in the one-way firm fixed effects model). Here is the full range of estimators:

                              Code:
                              webuse grunfeld, clear
                              
                              *Pooled OLS DK standard errors
                              xtscc invest mvalue kstock, lag(3)
                              
                              *One-way (firm) fixed effects DK standard errors
                              xtscc invest mvalue kstock i.company, lag(3)
                              xtscc invest mvalue kstock, fe lag(3)
                              reghdfe invest mvalue kstock, a(company) vce(cluster year, dkraay(4)) old
                              
                              *One-way (time) fixed effects DK standard errors
                              xtscc invest mvalue kstock i.year, lag(3)
                              reghdfe invest mvalue kstock, a(year) vce(cluster year, dkraay(4)) old
                              
                              *Two-way fixed effects DK standard errors
                              xtscc invest mvalue kstock i.company i.year, lag(3)
                              xtscc invest mvalue kstock i.year, fe lag(3)
                              reghdfe invest mvalue kstock, a(company year) vce(cluster year, dkraay(4)) old
                              reghdfe invest mvalue kstock i.year, a(company) vce(cluster year, dkraay(4)) old
                              reghdfe invest mvalue kstock i.company, a(year) vce(cluster year, dkraay(4)) old
                              Last edited by Andrew Musau; 11 Jul 2019, 10:02.
