  • Difference between vce(robust) and vce(cluster ind) in a cross-sectional model

    Dear Statalisters,

    I have a question about using vce(robust) versus vce(cluster ind) in my cross-sectional model. My dataset is small, with 116 observations.
    The variable I am interested in is the dummy variable, and it is significant only in the last model. The standard errors differ noticeably between vce(robust) and vce(cluster ind).
    Is it correct to use vce(cluster ind) in this model? I would also like to understand why the results change so much.
    Below are the commands and output for the three models: plain OLS, vce(robust), and vce(cluster ind).
    Thank you so much.

    Best regards,
    Ha Nguyen
    Code:
    
    . reg Y dummy X1 size cash age2 analyst mbratio
    
          Source |       SS           df       MS      Number of obs   =       116
    -------------+----------------------------------   F(7, 108)       =      2.43
           Model |  393.729615         7  56.2470879   Prob > F        =    0.0237
        Residual |  2499.90356       108  23.1472552   R-squared       =    0.1361
    -------------+----------------------------------   Adj R-squared   =    0.0801
           Total |  2893.63317       115  25.1620276   Root MSE        =    4.8112
    
    ------------------------------------------------------------------------------
               Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           dummy |  -1.449351   .9828503    -1.47   0.143     -3.39753    .4988292
              X1 |   .0673247    .028259     2.38   0.019     .0113104     .123339
            size |  -.8122102    .400049    -2.03   0.045    -1.605177   -.0192437
            cash |  -15.73149   8.188105    -1.92   0.057    -31.96174    .4987515
            age2 |   .9146657   .4920407     1.86   0.066    -.0606444    1.889976
         analyst |   .0638981   .6261884     0.10   0.919    -1.177316    1.305112
         mbratio |  -.2066817   .2757556    -0.75   0.455    -.7532772    .3399137
           _cons |   14.04019   8.693411     1.62   0.109    -3.191661    31.27204
    ------------------------------------------------------------------------------
    
    . reg Y dummy X1 size cash age2 analyst mbratio ,robust
    
    Linear regression                               Number of obs     =        116
                                                    F(7, 108)         =       1.84
                                                    Prob > F          =     0.0870
                                                    R-squared         =     0.1361
                                                    Root MSE          =     4.8112
    
    ------------------------------------------------------------------------------
                 |               Robust
               Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           dummy |  -1.449351   1.007996    -1.44   0.153    -3.447374    .5486733
              X1 |   .0673247   .0248537     2.71   0.008     .0180604     .116589
            size |  -.8122102   .3700793    -2.19   0.030    -1.545772   -.0786489
            cash |  -15.73149   9.472885    -1.66   0.100     -34.5084     3.04541
            age2 |   .9146657    .615406     1.49   0.140    -.3051757    2.134507
         analyst |   .0638981   .3762059     0.17   0.865    -.6818073    .8096035
         mbratio |  -.2066817   .1959103    -1.05   0.294    -.5950099    .1816464
           _cons |   14.04019   7.764501     1.81   0.073    -1.350401    29.43078
    ------------------------------------------------------------------------------
    
    . reg Y dummy X1 size cash age2 analyst mbratio, vce(cluster ind)
    
    Linear regression                               Number of obs     =        116
                                                    F(7, 13)          =      27.17
                                                    Prob > F          =     0.0000
                                                    R-squared         =     0.1361
                                                    Root MSE          =     4.8112
    
                                       (Std. Err. adjusted for 14 clusters in ind)
    ------------------------------------------------------------------------------
                 |               Robust
               Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           dummy |  -1.449351   .6598815    -2.20   0.047    -2.874938   -.0237631
              X1 |   .0673247    .021427     3.14   0.008     .0210345    .1136149
            size |  -.8122102   .2410903    -3.37   0.005    -1.333054   -.2913664
            cash |  -15.73149   3.602049    -4.37   0.001    -23.51325    -7.94974
            age2 |   .9146657   .3727299     2.45   0.029     .1094317      1.7199
         analyst |   .0638981    .221543     0.29   0.778    -.4147165    .5425127
         mbratio |  -.2066817   .1792627    -1.15   0.270    -.5939553    .1805919
           _cons |   14.04019   4.305863     3.26   0.006     4.737936    23.34244
    ------------------------------------------------------------------------------
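A quick check on the three tables above: the coefficient on dummy is identical in every run (-1.449351), so the change in significance comes entirely from the standard error and, in the clustered run, from the smaller degrees of freedom. The t statistics and degrees of freedom can be reproduced by hand from the reported numbers (k = 8 coefficients including the constant, G = 14 clusters as reported in the output):

```latex
\begin{aligned}
t_{\text{OLS}}     &= \frac{-1.449351}{0.9828503} \approx -1.47, & \text{df} &= N - k = 116 - 8 = 108\\
t_{\text{robust}}  &= \frac{-1.449351}{1.007996}  \approx -1.44, & \text{df} &= N - k = 108\\
t_{\text{cluster}} &= \frac{-1.449351}{0.6598815} \approx -2.20, & \text{df} &= G - 1 = 14 - 1 = 13
\end{aligned}
```

The confidence intervals imply two-sided 5% critical values of about 1.982 (108 df) and 2.160 (13 df). The clustered run clears the 5% threshold (p = 0.047) only because its standard error is roughly a third smaller than the robust one.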

  • #2
    Clustered SEs are often larger than robust SEs, which makes me wonder: what is the "ind" variable you use? If ind identifies each individual, so that every observation is its own group, then clustering on it does not really make sense. In any case, the decision whether to cluster should be based on how your data are grouped. These papers may give you good insight for that decision:
    http://cameron.econ.ucdavis.edu/rese...5_February.pdf
    https://arxiv.org/abs/1710.02926
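Why clustered standard errors are usually, but not always, larger can be made concrete with Moulton's approximation, stated here for roughly equal cluster sizes. Here rho_x is the within-cluster correlation of a regressor, rho_u is the within-cluster correlation of the error, and the comparison is with the default (iid) OLS variance. The product rho_x * rho_u = 0.1 below is a purely hypothetical value chosen for illustration; only N = 116 and G = 14 come from the output above.

```latex
\frac{\operatorname{Var}(\hat\beta_j)}{\operatorname{Var}_{\text{default OLS}}(\hat\beta_j)}
  \approx 1 + \rho_{x_j}\,\rho_u\,(\bar N_g - 1),
\qquad \bar N_g = \frac{116}{14} \approx 8.29

% hypothetical rho_x * rho_u = 0.1:
1 + 0.1 \times (8.29 - 1) \approx 1.73,
\qquad \sqrt{1.73} \approx 1.31 \ \text{(SE ratio)}
```

If rho_x * rho_u is close to zero or negative, the factor is at or below 1, so clustering need not inflate the standard errors.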
    Last edited by Felix Stips; 20 May 2021, 05:34.



    • #3
      Dear Felix,
      Thank you so much for your reply.
      I cluster by industry, using 2-digit NAICS codes. I have 13 industries in total.
      Last edited by Nguyen Ha; 20 May 2021, 06:02.



      • #4
        You do not have enough industries to cluster at the industry level. As for your question, the -robust- option in regress implements Huber-White standard errors and is equivalent to clustering at the observation level.

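        Code:
        * Huber-White (heteroskedasticity-robust) standard errors
        reg Y dummy X1 size cash age2 analyst mbratio, vce(robust)
        * Give each observation its own cluster ID
        gen obs_no = _n
        * Clustering on obs_no gives the same standard errors as vce(robust)
        reg Y dummy X1 size cash age2 analyst mbratio, vce(cluster obs_no)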
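A self-contained way to check the equivalence is to run both estimators on Stata's bundled auto data. This is purely an illustration, not the poster's data; the regression of price on mpg and weight and the names obs_no, V_rob, V_cl, df_rob and df_cl are arbitrary choices for this sketch. The variance matrices should coincide because, with one observation per cluster (G = N), the small-sample factor regress applies under clustering, G/(G-1) * (N-1)/(N-k), reduces to N/(N-k), which is the factor vce(robust) uses.

```stata
* Illustration on Stata's bundled auto data, not the poster's data set
sysuse auto, clear

* Heteroskedasticity-robust (Huber-White) variance matrix
regress price mpg weight, vce(robust)
matrix V_rob = e(V)
scalar df_rob = e(df_r)

* One cluster per observation
generate long obs_no = _n
regress price mpg weight, vce(cluster obs_no)
matrix V_cl = e(V)
scalar df_cl = e(df_r)

* Maximum relative difference between the two variance matrices (about zero)
display "max relative difference in e(V): " mreldif(V_rob, V_cl)
* Residual df used for t tests: N - k under robust, G - 1 under clustering
display "df robust = " df_rob "   df cluster = " df_cl
```

The standard errors match, but the residual degrees of freedom differ (N - k for vce(robust), G - 1 for vce(cluster)). P-values and confidence intervals can therefore differ slightly.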



        • #5
          Dear Andrew,
          Thank you so much for your reply. Could you explain why my model does not have enough industries, and how many would be sufficient?
          In this case, is it wrong to use vce(cluster ind)?
          Thank you.




          • #6
            When you cluster your standard errors, you allow observations within a cluster to be correlated but assume they are uncorrelated with observations in other clusters. Effectively, each cluster counts as one observation. The justification for cluster-robust standard errors relies on the number of clusters \(G \rightarrow \infty\), so a commonly used rule of thumb is that you need at least 30 clusters. In your case, 14 industries are simply too few.
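To see why the number of clusters, rather than the number of observations, is what matters, it helps to write out the cluster-robust variance that regress computes with vce(cluster). The notation is assumed here: G clusters, N observations, k coefficients, and X_g and u_g the regressors and OLS residuals of cluster g.

```latex
\widehat{V}_{\text{clu}}(\hat\beta)
  = \frac{G}{G-1}\,\frac{N-1}{N-k}\,
    (X'X)^{-1}\Bigl(\sum_{g=1}^{G} X_g'\hat u_g\,\hat u_g' X_g\Bigr)(X'X)^{-1},
\qquad \text{t and F tests use } G-1 \text{ denominator df}
```

The middle sum has only G terms, one per cluster. With G = 14 it is therefore estimated from 14 pieces of information, however many observations each cluster holds. This is also why the clustered output above reports F(7, 13): the denominator degrees of freedom are G - 1 = 13.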
