Difference between robust and non-robust?

Dan Su

Join Date: Mar 2017

Posts: 29
#1

22 Aug 2017, 10:57

Hi statisticians!

Can anyone explain when we should use the robust option, and for what kinds of models? In which cases will the robust and non-robust standard errors not differ much? Thanks so much!

I know that SAS has the empirical option. Does anyone know which option or package in R gives the robust results? Thanks a ton!
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#2

22 Aug 2017, 11:58

The robust variance estimator is robust to heteroscedasticity. It should be used when heteroscedasticity is, or is likely to be, present.
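To make the arithmetic concrete, here is a minimal sketch in plain Python (not Stata) of the conventional and the heteroscedasticity-robust standard error for the slope of a one-regressor OLS fit. The helper name `slope_standard_errors` and the toy data are assumptions made up for illustration, and the n/(n-2) scaling is the usual HC1 small-sample factor; this is not a reproduction of any package's internal code.

```python
import math


def slope_standard_errors(x, y):
    """Return (slope, conventional SE, HC1 robust SE) for y = a + b*x + e.

    Hypothetical helper, for illustration only.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]                      # centered regressor
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Conventional estimator: one common error variance for all observations.
    s2 = sum(e * e for e in resid) / (n - 2)
    se_conventional = math.sqrt(s2 / sxx)

    # Robust (sandwich) estimator: each observation keeps its own squared residual.
    meat = sum((d * e) ** 2 for d, e in zip(dx, resid))
    se_robust = math.sqrt(meat / sxx ** 2 * n / (n - 2))

    return slope, se_conventional, se_robust


# Toy data: the errors are larger where x is far from its mean (heteroscedastic).
x = [-3, -2, -1, 0, 1, 2, 3]
u = [3, -2, -1, 0, -1, -2, 3]
y = [1 + 2 * xi + ui for xi, ui in zip(x, u)]

slope, se_conv, se_rob = slope_standard_errors(x, y)
print(f"slope={slope:.3f}  conventional SE={se_conv:.4f}  robust SE={se_rob:.4f}")
```

On this toy data the slope is 2, the conventional standard error is about 0.447, and the robust one is about 0.592. The conventional formula understates the uncertainty here because the largest errors sit at the x values that have the most leverage on the slope.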

In some commands (-xtreg, fe- and -xtpoisson, fe- come to mind; there may be others I'm not thinking of off the top of my head), specifying -vce(robust)- leads to the cluster robust variance estimator. This one, in addition to being robust to heteroscedasticity, is also robust to correlation of errors within the specified clusters (the panel variable when invoked automatically by the command itself) and to serial correlation. It should be used when these are present or suspected, and when the number of clusters is large enough for it to be valid. As a practical matter, most real-world panel data have these problems, and it is easier to deal with them pre-emptively by specifying -robust- than it is to try to test for their presence. So the cluster robust variance estimator is generally a good idea in any panel data analysis with a sufficient number of clusters. There is no universal agreement about the minimum number of clusters needed. I have seen rules of thumb suggesting a minimum of 10, or of 25, or of 50, in order for the cluster robust variance estimator to actually be an improvement over the ordinary variance estimator.
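The difference between the heteroscedasticity-only and the cluster robust estimator can be sketched in a few lines of plain Python (not Stata). The helper `cluster_robust_slope_se`, the toy data, and the finite-sample factor G/(G-1) * (N-1)/(N-K) are assumptions chosen for illustration; the sketch uses pooled OLS with one regressor and does not reproduce what -xtreg, fe- does internally.

```python
import math
from collections import defaultdict


def cluster_robust_slope_se(x, y, cluster, small_sample=True):
    """Cluster-robust SE of the slope in pooled OLS y = a + b*x + e.

    Hypothetical helper, for illustration only.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Sum the scores (centered x times residual) within each cluster first, so
    # residuals that move together inside a cluster are not treated as independent.
    score = defaultdict(float)
    for d, e, g in zip(dx, resid, cluster):
        score[g] += d * e
    n_clusters = len(score)
    variance = sum(s * s for s in score.values()) / sxx ** 2

    if small_sample:
        # Assumed finite-sample factor G/(G-1) * (N-1)/(N-K), with K = 2 here.
        variance *= n_clusters / (n_clusters - 1) * (n - 1) / (n - 2)
    return math.sqrt(variance)


# Toy panel: 4 firms, 2 observations each, with a shock shared within each firm.
x = [-3, -3, -1, -1, 1, 1, 3, 3]
firm = [1, 1, 2, 2, 3, 3, 4, 4]
shock = {1: 1, 2: -1, 3: -1, 4: 1}
y = [1 + 2 * xi + shock[g] for xi, g in zip(x, firm)]

# Treating every observation as its own cluster gives the heteroscedasticity-only SE.
se_het = cluster_robust_slope_se(x, y, list(range(len(x))), small_sample=False)
se_clu = cluster_robust_slope_se(x, y, firm, small_sample=False)
print(f"heteroscedasticity-only SE={se_het:.4f}  cluster robust SE={se_clu:.4f}")
```

Without the small-sample factor the two standard errors are about 0.158 and 0.224: because the errors are perfectly correlated within each firm, ignoring the clustering understates the standard error by a factor of sqrt(2). Four clusters are used only to keep the arithmetic visible; that is far below any of the rules of thumb mentioned above.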

I do not use R and cannot answer the second question. But there are others on the Forum who use both Stata and R and might respond to that.

Last edited by Clyde Schechter; 22 Aug 2017, 12:03.
Enrique Pinzon (StataCorp)

StataCorp Employee

Join Date: Jan 2015

Posts: 215
#3

23 Aug 2017, 07:52

Hello Dan,

This blog post might also be helpful

http://blog.stata.com/2016/08/30/two...andard-errors/
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#4

23 Aug 2017, 09:05

Thanks, Enrique. That was very informative. I wasn't aware of that.
Dan Su

Join Date: Mar 2017

Posts: 29
#5

06 Sep 2017, 08:36

Thank you everyone!!! It's really helpful!!!
Philip Gigliotti

Join Date: Nov 2016

Posts: 118
#6

06 Sep 2017, 08:55

I personally always use cluster robust options when analyzing panel data. This is standard in my field (econ and public policy). While I understand that it may be technically permissible in some situations not to use these options, reviewers would always question it, and I've never seen a paper that tried to make an excuse for not using them.
Linh Nguyen

Join Date: Nov 2017

Posts: 85
#7

06 Jun 2021, 08:58

Originally posted by Clyde Schechter:

The robust variance estimator is robust to heteroscedasticity. It should be used when heteroscedasticity is, or is likely to be, present.

In some commands (-xtreg, fe- and -xtpoisson, fe- come to mind; there may be others I'm not thinking of off the top of my head), specifying -vce(robust)- leads to the cluster robust variance estimator. This one, in addition to being robust to heteroscedasticity, is also robust to correlation of errors within the specified clusters (the panel variable when invoked automatically by the command itself) and to serial correlation. It should be used when these are present or suspected, and when the number of clusters is large enough for it to be valid. As a practical matter, most real-world panel data have these problems, and it is easier to deal with them pre-emptively by specifying -robust- than it is to try to test for their presence. So the cluster robust variance estimator is generally a good idea in any panel data analysis with a sufficient number of clusters. There is no universal agreement about the minimum number of clusters needed. I have seen rules of thumb suggesting a minimum of 10, or of 25, or of 50, in order for the cluster robust variance estimator to actually be an improvement over the ordinary variance estimator.

I do not use R and cannot answer the second question. But there are others on the Forum who use both Stata and R and might respond to that.

Hi Schechter,

Is there any problem if we use -xtreg, fe vce(cluster id)- when there is no heteroscedasticity or autocorrelation in our data?

--------------------
(Stata 15.1 MP)
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#8

06 Jun 2021, 15:55

As long as you have enough clusters, it should not be a problem.
Linh Nguyen

Join Date: Nov 2017

Posts: 85
#9

22 Jul 2021, 19:21

Originally posted by Clyde Schechter:

As long as you have enough clusters, it should not be a problem.

My panel data cover more than 200 firms over 10 years, so I will have more than 200 clusters. That is OK, isn't it?

In addition, could you please give me some theoretical background so that I can answer if a reviewer asks why it should not be a problem when I have enough clusters?

Thanks in advance!

--------------------
(Stata 15.1 MP)
Joro Kolev

Join Date: Aug 2018

Posts: 3047
#10

23 Jul 2021, 07:51

The robust standard errors are consistent whether you have heteroskedasticity or not. The cluster-robust standard errors are consistent whether you have cluster correlation as you have specified, or only heteroskedasticity, or no cluster correlation and no heteroskedasticity at all.
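A small simulation can illustrate this point for the heteroskedasticity-robust case. The sketch below is plain Python using only the standard library; the helper `se_pair`, the seed, the sample size, and the simulated data are all assumptions made for illustration, not part of any Stata command.

```python
import math
import random


def se_pair(x, y):
    """Return (conventional SE, HC1 robust SE) of the OLS slope. Hypothetical helper."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    resid = [(yi - y_bar) - slope * d for d, yi in zip(dx, y)]
    conventional = math.sqrt(sum(e * e for e in resid) / (n - 2) / sxx)
    robust = math.sqrt(sum((d * e) ** 2 for d, e in zip(dx, resid)) / sxx ** 2 * n / (n - 2))
    return conventional, robust


random.seed(12345)  # arbitrary seed, for reproducibility only
n = 5000
x = [random.uniform(0, 10) for _ in range(n)]

# Same error standard deviation everywhere (homoskedastic).
y_homo = [1 + 2 * xi + random.gauss(0, 1) for xi in x]
# Error standard deviation grows with the distance of x from 5 (heteroskedastic).
y_hetero = [1 + 2 * xi + random.gauss(0, 0.5 * abs(xi - 5)) for xi in x]

conv_h, rob_h = se_pair(x, y_homo)
conv_x, rob_x = se_pair(x, y_hetero)
print(f"homoskedastic:   robust / conventional = {rob_h / conv_h:.3f}")
print(f"heteroskedastic: robust / conventional = {rob_x / conv_x:.3f}")
```

With homoskedastic errors the two standard errors should come out nearly equal, so using the robust one costs little in a large sample. With heteroskedastic errors the ratio should be clearly above 1, which is the case where the conventional standard error is misleading.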

For an accessible theoretical background, see this paper: Cameron, A. Colin, and Douglas L. Miller. "A Practitioner’s Guide to Cluster-Robust Inference." Journal of Human Resources 50, no. 2 (2015): 317-372.

Also note that in Stata, -xtreg, fe vce(cluster id)- is equivalent to -xtreg, fe robust- when id is the panel variable. In other words, Stata does not let you compute standard errors that are consistent under heteroskedasticity only in the xtreg suite; it automatically switches to cluster-robust even if you have specified only robust.

Originally posted by Linh Nguyen:

My panel data cover more than 200 firms over 10 years, so I will have more than 200 clusters. That is OK, isn't it?

In addition, could you please give me some theoretical background so that I can answer if a reviewer asks why it should not be a problem when I have enough clusters?

Thanks in advance!
Linh Nguyen

Join Date: Nov 2017

Posts: 85
#11

03 Aug 2021, 19:08

Originally posted by Joro Kolev:

The robust standard errors are consistent whether you have heteroskedasticity or not. The cluster-robust standard errors are consistent whether you have cluster correlation as you have specified, or only heteroskedasticity, or no cluster correlation and no heteroskedasticity at all.

For an accessible theoretical background, see this paper: Cameron, A. Colin, and Douglas L. Miller. "A Practitioner’s Guide to Cluster-Robust Inference." Journal of Human Resources 50, no. 2 (2015): 317-372.

Also note that in Stata, -xtreg, fe vce(cluster id)- is equivalent to -xtreg, fe robust- when id is the panel variable. In other words, Stata does not let you compute standard errors that are consistent under heteroskedasticity only in the xtreg suite; it automatically switches to cluster-robust even if you have specified only robust.

Thanks so much Joro

--------------------
(Stata 15.1 MP)