
  • Problem with reghdfe dropping periods

    I am running a fixed-effects regression with two-way clustered standard errors using reghdfe in StataNow 18.5. My unbalanced panel has T=14 and N=409.
    When I check how many observations from each year are used in the regression, 2020-2022 are not included, and the regression output does not explain why. I have almost no data for 2020, but 2021 and 2022 should be just like the other periods, and I have checked the observations with the code below.
    . bysort year: count
    . reghdfe ln_homeless_nonvet_per10000_1 nonvet_black_rate nonvet_income median_rent_coc L1.own_vacancy_rate_coc L1.rent_vacancy_rate_coc nonvet_pov_rate L1.nonvet_ue_rate ssi_coc own_burden_rate_coc rent_burden_rate_coc L2.own_hpc L2.rent_hpc, absorb(coc_num year) vce(cluster coc_num year)
    . gen included = e(sample)
    . tab year if included
    . bysort year: count
    -> year = 2010
    -> year = 2011
    -> year = 2012
    -> year = 2013
    -> year = 2014
    -> year = 2015
    -> year = 2016
    -> year = 2017
    -> year = 2018
    -> year = 2019
    -> year = 2022
    -> year = 2023
    . reghdfe ln_homeless_nonvet_per10000_1 nonvet_black_rate nonvet_income median_rent_coc L1.own_vacancy_rate_coc L1.re
    > nt_vacancy_rate_coc nonvet_pov_rate L1.nonvet_ue_rate ssi_coc own_burden_rate_coc rent_burden_rate_coc L2.own_hpc L
    > 2.rent_hpc, absorb(coc_num) vce(cluster coc_num year)
    (dropped 2 singleton observations)
    (MWFE estimator converged in 1 iterations)
    HDFE Linear regression                            Number of obs   =      3,229
    Absorbing 1 HDFE group                            F(  12,      8) =       7.64
    Statistics robust to heteroskedasticity           Prob > F        =     0.0038
                                                      R-squared       =     0.9463
                                                      Adj R-squared   =     0.9393
    Number of clusters (coc_num) =        361         Within R-sq.    =     0.1273
    Number of clusters (year)    =          9         Root MSE        =     0.2471
                                        (Std. err. adjusted for 9 clusters in coc_num year)
                          |               Robust
    ln_homeless_nonvet_~1 | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
        nonvet_black_rate |   .5034405   .2295248     2.19   0.060    -.0258447    1.032726
            nonvet_income |   .0005253   .0002601     2.02   0.078    -.0000745    .0011252
          median_rent_coc |   1.99e-06   9.68e-07     2.05   0.074    -2.47e-07    4.22e-06
     own_vacancy_rate_coc |
                      L1. |   1.239503    2.30195     0.54   0.605    -4.068803     6.54781
    rent_vacancy_rate_coc |
                      L1. |   .3716792   .3719027     1.00   0.347      -.48593    1.229288
          nonvet_pov_rate |   .6896438   .5059999     1.36   0.210     -.477194    1.856482
           nonvet_ue_rate |
                      L1. |   3.195935   .8627162     3.70   0.006     1.206507    5.185362
                  ssi_coc |  -1.47e-06   3.58e-06    -0.41   0.692    -9.73e-06    6.79e-06
      own_burden_rate_coc |  -.1589565   .3308741    -0.48   0.644    -.9219535    .6040405
     rent_burden_rate_coc |   .3420483   .1330725     2.57   0.033     .0351825    .6489141
                  own_hpc |
                      L2. |   .3028142   .1597655     1.90   0.095    -.0656058    .6712341
                 rent_hpc |
                      L2. |  -.5586364   .2167202    -2.58   0.033    -1.058394   -.0588787
                    _cons |   2.932302   .1263993    23.20   0.000     2.640824    3.223779
    Absorbed degrees of freedom:
     Absorbed FE | Categories  - Redundant  = Num. Coefs |
         coc_num |       361         361           0    *|
    * = FE nested within cluster; treated as redundant for DoF computation
    . gen included = e(sample)
    . tab year if included
           year |      Freq.     Percent        Cum.
           2012 |        356       11.03       11.03
           2013 |        358       11.09       22.11
           2014 |        359       11.12       33.23
           2015 |        361       11.18       44.41
           2016 |        360       11.15       55.56
           2017 |        361       11.18       66.74
           2018 |        361       11.18       77.92
           2019 |        358       11.09       89.01
           2023 |        355       10.99      100.00
          Total |      3,229      100.00
    Thanks in advance!
    Last edited by Ella Li; Yesterday, 22:08.

  • #2
    reghdfe is a community-contributed command (FAQ Advice #12).

    When I check how many observations from each year are used in the regression, 2020-2022 are not included, and the regression output does not explain why.
    You have lagged regressors in your regression. With a balanced panel, a variable lagged by two periods costs you the first two cross-sections. With gaps in the panel you lose more, because the lag is also missing for the periods that follow each gap. Consider the following, where the data have a hole (2020 and 2021 are missing):

    clear
    input float year value
    2010 1
    2011 2
    2012 3
    2013 4
    2014 5
    2015 6
    2016 7
    2017 8
    2018 9
    2019 10
    2022 11
    2023 12
    end
    * declare the time variable so that the lag operator respects the gap
    tsset year
    gen L2value = L2.value

    . l, sep(0)
         | year   value   L2value |
      1. | 2010       1         . |
      2. | 2011       2         . |
      3. | 2012       3         1 |
      4. | 2013       4         2 |
      5. | 2014       5         3 |
      6. | 2015       6         4 |
      7. | 2016       7         5 |
      8. | 2017       8         6 |
      9. | 2018       9         7 |
     10. | 2019      10         8 |
     11. | 2022      11         . |
     12. | 2023      12         . |
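
    The same logic carries over to a panel with both once- and twice-lagged regressors. The following is a minimal sketch on made-up data, not the poster's dataset: the names id, x, z and usable are hypothetical, and it assumes that only 2020 is missing from an otherwise complete 2010-2023 panel. An observation can enter the estimation sample only if every lagged regressor is nonmissing, so a single missing year removes the two years that follow it as well.

    ```stata
    * Toy panel (hypothetical): 3 units observed 2010-2023, with 2020 absent
    clear
    set obs 3
    gen id = _n
    expand 14
    bysort id: gen year = 2009 + _n
    drop if year == 2020
    gen x = runiform()
    gen z = runiform()
    xtset id year

    * usable = 1 only when both the first lag of x and the second lag of z exist
    gen byte usable = !missing(L1.x, L2.z)

    * years with usable == 0 would be dropped from a regression on L1.x and L2.z
    tab year usable
    ```

    Under these assumptions, 2010 and 2011 are lost to the lags themselves, 2021 is lost because L1 needs 2020, and 2022 is lost because L2 needs 2020, which leaves 2012-2019 and 2023. On real data, the analogous check after xtset would be to tabulate year for observations where all lagged regressors are nonmissing.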

    Number of clusters (coc_num) = 361
    Number of clusters (year) = 9
    (Std. err. adjusted for 9 clusters in coc_num year)
    Nine clusters are too few for cluster-robust inference to be reliable, so do not cluster on year. Additionally, there is usually no good reason to double-cluster on both the panel identifier and year in the first place. These days, it seems that people do so simply because the software allows it. Instead, cluster on the panel identifier only.
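
    As a sketch of that suggestion, the command from the original post could be rewritten as below, with vce() naming coc_num alone. This assumes the data are already declared as a panel (for example with xtset coc_num year) so that the L1. and L2. operators work, and it keeps absorb(coc_num year) as written in the post's first command, although the log shown absorbs coc_num only.

    ```stata
    * Same specification as in the original post, clustered on the panel
    * identifier only (do-file syntax; /// continues the command)
    reghdfe ln_homeless_nonvet_per10000_1 nonvet_black_rate nonvet_income ///
        median_rent_coc L1.own_vacancy_rate_coc L1.rent_vacancy_rate_coc ///
        nonvet_pov_rate L1.nonvet_ue_rate ssi_coc own_burden_rate_coc ///
        rent_burden_rate_coc L2.own_hpc L2.rent_hpc, ///
        absorb(coc_num year) vce(cluster coc_num)
    ```

    The header should then report a single cluster dimension, with the number of clusters equal to the number of coc_num groups in the estimation sample.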

