Difference between vce(robust) and vce(cluster ind) in cross-sectional model

Nguyen Ha

Join Date: Apr 2021
Posts: 3

20 May 2021, 02:54

Dear Statalisters,

I have a question about using vce(robust) versus vce(cluster ind) in my cross-sectional model. My dataset is small, with 116 observations.
The variable I am concerned about is the dummy variable, and it is significant only in the last case. The standard errors differ depending on whether I use vce(robust) or vce(cluster ind).
I would like to ask whether it is right to use vce(cluster ind) in this model, and I hope to understand why the results change so much.
I have attached the code for three models: OLS, vce(robust), and vce(cluster ind).
Thank you so much.

Best regards,
Ha Nguyen

Code:


. reg Y dummy X1 size cash age2 analyst mbratio

      Source |       SS           df       MS      Number of obs   =       116
-------------+----------------------------------   F(7, 108)       =      2.43
       Model |  393.729615         7  56.2470879   Prob > F        =    0.0237
    Residual |  2499.90356       108  23.1472552   R-squared       =    0.1361
-------------+----------------------------------   Adj R-squared   =    0.0801
       Total |  2893.63317       115  25.1620276   Root MSE        =    4.8112

------------------------------------------------------------------------------
           Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dummy |  -1.449351   .9828503    -1.47   0.143     -3.39753    .4988292
          X1 |   .0673247    .028259     2.38   0.019     .0113104     .123339
        size |  -.8122102    .400049    -2.03   0.045    -1.605177   -.0192437
        cash |  -15.73149   8.188105    -1.92   0.057    -31.96174    .4987515
        age2 |   .9146657   .4920407     1.86   0.066    -.0606444    1.889976
     analyst |   .0638981   .6261884     0.10   0.919    -1.177316    1.305112
     mbratio |  -.2066817   .2757556    -0.75   0.455    -.7532772    .3399137
       _cons |   14.04019   8.693411     1.62   0.109    -3.191661    31.27204
------------------------------------------------------------------------------


. reg Y dummy X1 size cash age2 analyst mbratio ,robust

Linear regression                               Number of obs     =        116
                                                F(7, 108)         =       1.84
                                                Prob > F          =     0.0870
                                                R-squared         =     0.1361
                                                Root MSE          =     4.8112

------------------------------------------------------------------------------
             |               Robust
           Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dummy |  -1.449351   1.007996    -1.44   0.153    -3.447374    .5486733
          X1 |   .0673247   .0248537     2.71   0.008     .0180604     .116589
        size |  -.8122102   .3700793    -2.19   0.030    -1.545772   -.0786489
        cash |  -15.73149   9.472885    -1.66   0.100     -34.5084     3.04541
        age2 |   .9146657    .615406     1.49   0.140    -.3051757    2.134507
     analyst |   .0638981   .3762059     0.17   0.865    -.6818073    .8096035
     mbratio |  -.2066817   .1959103    -1.05   0.294    -.5950099    .1816464
       _cons |   14.04019   7.764501     1.81   0.073    -1.350401    29.43078
------------------------------------------------------------------------------

. reg Y dummy X1 size cash age2 analyst mbratio , vce(cluster ind)

Linear regression                               Number of obs     =        116
                                                F(7, 13)          =      27.17
                                                Prob > F          =     0.0000
                                                R-squared         =     0.1361
                                                Root MSE          =     4.8112

                                   (Std. Err. adjusted for 14 clusters in ind)
------------------------------------------------------------------------------
             |               Robust
           Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dummy |  -1.449351   .6598815    -2.20   0.047    -2.874938   -.0237631
          X1 |   .0673247    .021427     3.14   0.008     .0210345    .1136149
        size |  -.8122102   .2410903    -3.37   0.005    -1.333054   -.2913664
        cash |  -15.73149   3.602049    -4.37   0.001    -23.51325    -7.94974
        age2 |   .9146657   .3727299     2.45   0.029     .1094317      1.7199
     analyst |   .0638981    .221543     0.29   0.778    -.4147165    .5425127
     mbratio |  -.2066817   .1792627    -1.15   0.270    -.5939553    .1805919
       _cons |   14.04019   4.305863     3.26   0.006     4.737936    23.34244
------------------------------------------------------------------------------


Felix Stips

Join Date: Nov 2014

Posts: 110
#2

20 May 2021, 05:29

Clustered SEs are often larger than robust SEs, which makes me wonder: what is the "ind" variable you use? If ind identifies each individual observation, then clustering on it does not really make sense. In any case, the decision whether to cluster should be based on how your data are grouped. These references might give you good insight for that decision:
http://cameron.econ.ucdavis.edu/rese...5_February.pdf
https://arxiv.org/abs/1710.02926

Last edited by Felix Stips; 20 May 2021, 05:34.
Nguyen Ha

Join Date: Apr 2021

Posts: 3
#3

20 May 2021, 05:48

Dear Felix,
Thank you so much for your reply.
I cluster them by industry using a 2-digit NAICS code. I have 13 industries in total.
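
The clustered output in post #1 reports 14 clusters in ind, so it may be worth counting the distinct values of the cluster variable directly. The following is a minimal sketch on Stata's shipped auto dataset, with rep78 standing in for ind (an assumption for illustration; the thread's data are not available). It counts distinct non-missing cluster identifiers and then tabulates cluster sizes.

```stata
* Illustrative only: rep78 stands in for the industry identifier (ind in this thread)
sysuse auto, clear

* Tag one observation per distinct non-missing value of the cluster variable
egen byte first_in_cluster = tag(rep78)

* The number of tagged observations is the number of clusters
count if first_in_cluster
display "Number of clusters: " r(N)

* Cluster sizes, with missing identifiers shown explicitly
tabulate rep78, missing
```

With the thread's data, replacing rep78 with ind would show whether an unexpected code accounts for a cluster count that differs from the expected number of industries.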

Last edited by Nguyen Ha; 20 May 2021, 06:02.
Andrew Musau

Join Date: Oct 2014

Posts: 10058
#4

20 May 2021, 06:12

You do not have enough industries to cluster at the industry level. As for your question, the -robust- option in regress implements Huber-White standard errors and is equivalent to clustering at the observation level.

Code:

reg Y dummy X1 size cash age2 analyst mbratio, robust
gen obs_no = _n
reg Y dummy X1 size cash age2 analyst mbratio, cluster(obs_no)
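
The same comparison can be sketched on Stata's shipped auto dataset, so it runs without the thread's data. The variables price, mpg, and weight and the scalar and matrix names are illustrative assumptions. The sketch stores the variance matrix and residual degrees of freedom from each run and compares them.

```stata
* Illustrative check on the auto dataset (not the data from this thread)
sysuse auto, clear

* Heteroskedasticity-robust (Huber-White) variance
regress price mpg weight, vce(robust)
matrix V_robust = e(V)
scalar df_robust = e(df_r)

* Cluster on a unique observation identifier
generate long obs_no = _n
regress price mpg weight, vce(cluster obs_no)
matrix V_cluster = e(V)
scalar df_cluster = e(df_r)

* Relative difference between the two variance matrices
display "mreldif(V_robust, V_cluster) = " mreldif(V_robust, V_cluster)

* Degrees of freedom used for t tests and confidence intervals
display "df, robust:  " df_robust
display "df, cluster: " df_cluster
```

The standard errors should agree, but the degrees of freedom differ: the robust run uses observations minus parameters, while the clustered run uses clusters minus one, so p-values and confidence intervals can differ slightly between the two.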
Nguyen Ha

Join Date: Apr 2021

Posts: 3
#5

20 May 2021, 08:09

Dear Andrew,
Thank you so much for your reply. Could I ask why the model doesn't have enough industries? How many would be sufficient?
In this case, is it wrong to use vce(cluster ind)?
Thank you.

Andrew Musau

Join Date: Oct 2014

Posts: 10058
#6

20 May 2021, 08:29

When you cluster your standard errors, you assume that observations within a cluster may be correlated but are uncorrelated with observations in other clusters. Effectively, each cluster is an observation. The cluster-robust justification relies on the number of clusters \(N \rightarrow \infty\), so one rule of thumb often used is that you need at least 30 clusters. In your case, 14 industries are just too few.
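
To see where the number of clusters enters the reported inference, the sketch below recomputes the p-value for dummy from the clustered output in post #1. It uses a t distribution with 13 degrees of freedom (clusters minus one, matching the F(7, 13) line), then compares the 5% critical value with the one for the 108 degrees of freedom used in the unclustered runs. The coefficient and standard error are copied from that output; the scalar names are illustrative. It does not address whether the clustered standard errors themselves are reliable with so few clusters.

```stata
* t statistic for dummy under vce(cluster ind), numbers copied from post #1
scalar t_dummy = -1.449351 / .6598815

* Two-sided p-value with 14 - 1 = 13 degrees of freedom
scalar p_cluster = 2 * ttail(13, abs(t_dummy))
display "p-value with 13 df: " %6.4f p_cluster

* Two-sided 5% critical values: 13 df (clustered) versus 108 df (unclustered)
scalar crit_13  = invttail(13, 0.025)
scalar crit_108 = invttail(108, 0.025)
display "critical t, 13 df:  " %6.4f crit_13
display "critical t, 108 df: " %6.4f crit_108
```

The p-value should match the 0.047 shown for dummy in the clustered table. The larger critical value at 13 degrees of freedom shows that Stata already makes the test more conservative for a given standard error, so the new significance comes from the smaller clustered standard error, which is the quantity that few clusters make unreliable.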