  • vce(cluster) does not work in fixed effects when there are only two observations per group

    I am working with a balanced individual-year panel dataset in which every individual is observed for exactly two years. I am trying to run

    Code:
     xtreg y x, fe vce(cluster id)
    But I get only the coefficients; the standard errors are not produced.

    Things work fine when I use random effects or OLS to estimate the model, so I guess that to cluster the standard error at the group level at least three observations are needed for each group when using within estimator. But why? Or maybe my guess is wrong. Thanks in advance for your explanation.

  • #2
    so I guess that to cluster the standard error at the group level at least three observations are needed for each group when using within estimator.
    No. There must be something peculiar with your data. Here is a counterexample.

    Code:
    webuse grunfeld, clear
    keep if time<3
    xtreg invest mvalue, fe cluster(company)
    Results:

    Code:
    . xtreg invest mvalue, fe cluster(company)
    
    Fixed-effects (within) regression               Number of obs     =         20
    Group variable: company                         Number of groups  =         10
    
    R-sq:                                           Obs per group:
         within  = 0.4230                                         min =          2
         between = 0.7746                                         avg =        2.0
         overall = 0.7503                                         max =          2
    
                                                    F(1,9)            =       8.54
    corr(u_i, Xb)  = 0.5577                         Prob > F          =     0.0170
    
                                   (Std. Err. adjusted for 10 clusters in company)
    ------------------------------------------------------------------------------
                 |               Robust
          invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          mvalue |   .0567196   .0194114     2.92   0.017     .0128079    .1006313
           _cons |   36.49626   17.34454     2.10   0.065    -2.739813    75.73234
    -------------+----------------------------------------------------------------
         sigma_u |  72.169136
         sigma_e |  30.053113
             rho |  .85221652   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
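
    A sketch of why two observations per group are enough, using the standard sandwich form of the cluster-robust variance for the within estimator. This is not necessarily Stata's exact implementation: the scalar c stands for a finite-sample correction left unspecified here, and \ddot{X}_i, \hat{e}_i, G and k are notation introduced for this note (demeaned regressors and residuals of group i, number of groups, number of slope coefficients).

    \[
    \hat{V}(\hat{\beta}) = c \, (\ddot{X}'\ddot{X})^{-1} \Big( \sum_{i=1}^{G} \ddot{X}_i' \hat{e}_i \hat{e}_i' \ddot{X}_i \Big) (\ddot{X}'\ddot{X})^{-1},
    \qquad \ddot{x}_{it} = x_{it} - \bar{x}_i .
    \]

    With T = 2, \ddot{x}_{i2} = -\ddot{x}_{i1} = (x_{i2} - x_{i1})/2, so each group still contributes one score vector \ddot{X}_i' \hat{e}_i that is generally nonzero. Because the scores sum to zero across groups, the middle matrix has rank at most min(k, G-1); what can bind is the number of clusters relative to k, not the number of periods per group.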
    Last edited by Andrew Musau; 18 Mar 2020, 03:50.



    • #3
      This is a subsample of the sample I am working with. I don't see anything unusual in it, but clustered standard errors are still not produced.

      Code:
      clear
      input double pid float yearcode double lifesatisfaction float(ln_y lag_ln_y lag2_ln_y lead_ln_y)
      110006101 3 3 10.661173  10.84843  9.910132  10.90412
      110006101 4 4  10.90412 10.661173  10.84843  11.13896
      110009102 3 4 10.355473    10.242  9.972481  10.23279
      110009102 4 4  10.23279 10.355473    10.242  11.25785
      110009106 3 5 10.355473    10.242  9.972481  10.23279
      110009106 4 1  10.23279 10.355473    10.242  11.25785
      110009107 3 3 10.355473    10.242  9.972481  10.23279
      110009107 4 3  10.23279 10.355473    10.242  11.25785
      110026101 3 5 11.700234 11.027735  10.91509 11.289782
      110026101 4 5 11.289782 11.700234 11.027735   12.1021
      110026102 3 4 11.700234 11.027735  10.91509 11.289782
      110026102 4 4 11.289782 11.700234 11.027735   12.1021
      110026104 3 4 11.700234 11.027735  10.91509 11.289782
      110026104 4 3 11.289782 11.700234 11.027735   12.1021
      110030101 3 5 11.184422 11.184422 10.596635 11.472103
      110030101 4 5 11.472103 11.184422 11.184422 11.532728
      110031101 3 4  8.188689  9.680344  9.680344 10.806355
      110031101 4 4 10.806355  8.188689  9.680344 10.933107
      110035101 3 4 11.066638 10.576432 10.453534  8.804875
      110035101 4 3  8.804875 11.066638 10.576432 11.289782
      end
      What I run is:

      Code:
      xtset pid yearcode
      
      xtreg lifesatisfaction ln_y lag_ln_y lag2_ln_y lead_ln_y , fe vce(cluster pid)
      One important piece of information: in the original sample, each individual is followed for 5 years. My covariates include current income, its one-period lead, and its one- and two-period lags. Therefore, only the observations in years 3 and 4 have all of these variables nonmissing. The clustering fails when I add lag2_ln_y (the two-period lag of ln_y) to the regression, regardless of whether I run the regression directly on the original sample or first keep only the observations in years 3 and 4.
      Last edited by Xinchen Dai; 18 Mar 2020, 04:33.



      • #4
        Stata is unable to calculate the cluster-robust standard errors for some reason. With your sample data in #3, I note that there is no such problem, but I cannot guess what happens with your full sample.
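
        To narrow this down on the full sample, a few checks after estimation may help. This is only a sketch that reuses the variable names from #3; it inspects stored results and does not by itself explain the missing standard errors.

        Code:
        xtset pid yearcode
        xtreg lifesatisfaction ln_y lag_ln_y lag2_ln_y lead_ln_y, fe vce(cluster pid)
        * number of clusters and of groups actually used in estimation
        display e(N_clust)
        display e(N_g)
        * panel pattern within the estimation sample
        xtdescribe if e(sample)
        * inspect the estimated variance matrix directly
        matrix list e(V)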

        EDIT: I would also recommend using time-series operators rather than creating the lags and leads yourself.

        Code:
        xtreg lifesatisfaction ln_y l(1/2).(ln_y) f.ln_y , fe vce(cluster pid)
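
        The same structure (two usable periods, two lags and one lead) can be tried on the public Grunfeld data. This is a hedged sketch, not a reproduction of your data: the choice of periods 3 and 4 is only meant to mirror your yearcode values. The full panel stays in memory so that the operators can look up values outside the estimation sample.

        Code:
        webuse grunfeld, clear
        * declare the panel explicitly; lags and leads follow the time variable year
        xtset company year
        * if restricts the estimation sample only; L. and F. still read neighbouring years
        xtreg invest mvalue L(1/2).mvalue F.mvalue if inrange(time, 3, 4), fe vce(cluster company)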
        Last edited by Andrew Musau; 18 Mar 2020, 05:00.



        • #5
          Xinchen:
          I cannot replicate your problem:
          Code:
          . xtset pid yearcode
                 panel variable:  pid (strongly balanced)
                  time variable:  yearcode, 3 to 4
                          delta:  1 unit
          
          . xtreg lifesatisfaction ln_y lag_ln_y lag2_ln_y lead_ln_y , fe vce(cluster pid)
          
          Fixed-effects (within) regression               Number of obs     =         20
          Group variable: pid                             Number of groups  =         10
          
          R-sq:                                           Obs per group:
               within  = 0.2728                                         min =          2
               between = 0.2256                                         avg =        2.0
               overall = 0.0071                                         max =          2
          
                                                          F(4,9)            =       1.84
          corr(u_i, Xb)  = -0.7092                        Prob > F          =     0.2058
          
                                             (Std. Err. adjusted for 10 clusters in pid)
          ------------------------------------------------------------------------------
                       |               Robust
          lifesatisf~n |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                  ln_y |  -.3212684   .4603154    -0.70   0.503    -1.362574    .7200373
              lag_ln_y |  -.3609143     .65565    -0.55   0.595    -1.844098    1.122269
             lag2_ln_y |   .6864056   .5923998     1.16   0.276    -.6536959    2.026507
             lead_ln_y |  -.8215082   .4954296    -1.66   0.132    -1.942248    .2992314
                 _cons |   13.04777   16.60944     0.79   0.452     -24.5254    50.62094
          -------------+----------------------------------------------------------------
               sigma_u |  1.1342179
               sigma_e |  1.0730268
                   rho |  .52770159   (fraction of variance due to u_i)
          ------------------------------------------------------------------------------
          
          .
          PS: I agree in full with Andrew's helpful reply.
          Kind regards,
          Carlo
          (StataNow 18.5)



          • #6
            Hello Carlo,

            I was using Stata 14.0 SE, and then I noticed that you are using Stata 16.0 SE. I don't have Stata 16, so I switched to Stata 15.1. Everything works fine now. This is weird, and it makes me a bit embarrassed. Thanks to both you and Andrew for taking the time to help me.



            • #7
              Xinchen:
              I cannot replicate your problem with Stata 14.2 either:
              Code:
              . input double pid float yearcode double lifesatisfaction float(ln_y lag_ln_y lag2_ln_y lead_ln_y)
              
                          pid   yearcode  lifesati~n       ln_y   lag_ln_y  lag2_ln_y  lead_ln_y
                1. 110006101 3 3 10.661173  10.84843  9.910132  10.90412
                2. 110006101 4 4  10.90412 10.661173  10.84843  11.13896
                3. 110009102 3 4 10.355473    10.242  9.972481  10.23279
                4. 110009102 4 4  10.23279 10.355473    10.242  11.25785
                5. 110009106 3 5 10.355473    10.242  9.972481  10.23279
                6. 110009106 4 1  10.23279 10.355473    10.242  11.25785
                7. 110009107 3 3 10.355473    10.242  9.972481  10.23279
                8. 110009107 4 3  10.23279 10.355473    10.242  11.25785
                9. 110026101 3 5 11.700234 11.027735  10.91509 11.289782
               10. 110026101 4 5 11.289782 11.700234 11.027735   12.1021
               11. 110026102 3 4 11.700234 11.027735  10.91509 11.289782
               12. 110026102 4 4 11.289782 11.700234 11.027735   12.1021
               13. 110026104 3 4 11.700234 11.027735  10.91509 11.289782
               14. 110026104 4 3 11.289782 11.700234 11.027735   12.1021
               15. 110030101 3 5 11.184422 11.184422 10.596635 11.472103
               16. 110030101 4 5 11.472103 11.184422 11.184422 11.532728
               17. 110031101 3 4  8.188689  9.680344  9.680344 10.806355
               18. 110031101 4 4 10.806355  8.188689  9.680344 10.933107
               19. 110035101 3 4 11.066638 10.576432 10.453534  8.804875
               20. 110035101 4 3  8.804875 11.066638 10.576432 11.289782
               21. end
              
              .
              . xtset pid yearcode
                     panel variable:  pid (strongly balanced)
                      time variable:  yearcode, 3 to 4
                              delta:  1 unit
              
              .
              . xtreg lifesatisfaction ln_y lag_ln_y lag2_ln_y lead_ln_y , fe vce(cluster pid)
              
              Fixed-effects (within) regression               Number of obs     =         20
              Group variable: pid                             Number of groups  =         10
              
              R-sq:                                           Obs per group:
                   within  = 0.2728                                         min =          2
                   between = 0.2256                                         avg =        2.0
                   overall = 0.0071                                         max =          2
              
                                                              F(4,9)            =       1.84
              corr(u_i, Xb)  = -0.7092                        Prob > F          =     0.2058
              
                                                 (Std. Err. adjusted for 10 clusters in pid)
              ------------------------------------------------------------------------------
                           |               Robust
              lifesatisf~n |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                      ln_y |  -.3212684   .4603154    -0.70   0.503    -1.362574    .7200373
                  lag_ln_y |  -.3609143     .65565    -0.55   0.595    -1.844098    1.122269
                 lag2_ln_y |   .6864056   .5923998     1.16   0.276    -.6536959    2.026507
                 lead_ln_y |  -.8215082   .4954296    -1.66   0.132    -1.942248    .2992314
                     _cons |   13.04777   16.60944     0.79   0.452     -24.5254    50.62094
              -------------+----------------------------------------------------------------
                   sigma_u |  1.1342179
                   sigma_e |  1.0730268
                       rho |  .52770159   (fraction of variance due to u_i)
              ------------------------------------------------------------------------------
              
              .
              As an aside, please note that the FAQ kindly requests that posters specify up front which Stata release they are working with. Thanks.
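
              For reference, the release and update level can be read off within Stata itself. A minimal sketch using built-in commands (update query needs an internet connection):

              Code:
              * edition, release and license details
              about
              * release number and build date of the running executable
              display c(stata_version)
              display c(born_date)
              * compare the installation against the latest available update
              update query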
              Kind regards,
              Carlo
              (StataNow 18.5)

