Does STATA use robust standard errors for logistic regression?

John Gustavsson

Join Date: Sep 2018

Posts: 14
#1

10 Feb 2020, 07:40

Hi,

The title says it all really. Since logistic regression by its nature is heteroskedastic, does stata use robust standard errors automatically or does one need to add that specifically (like with OLS regression when one would add "robust" as an option at the end)?

It never quite occurred to me that STATA might not use robust standard errors since it's quite clearly necessary for logistic regression.

Nick Cox

Join Date: Mar 2014
Posts: 35429

10 Feb 2020, 07:51

Statistics is full of things "quite clearly necessary" to some of its practitioners but not all. The distribution of the response is not identical to the sampling distributions of the parameters, and more can be said.

You can answer your own question in various ways, e.g. by experiment or by looking at documentation. Here I show by experiment that robust standard errors are not the default. The help also explains that robust SEs are optional.

Code:

. sysuse auto, clear
(1978 Automobile Data)

. logit foreign weight

Iteration 0:   log likelihood =  -45.03321  
Iteration 1:   log likelihood = -30.669507  
Iteration 2:   log likelihood = -29.068209  
Iteration 3:   log likelihood = -29.054005  
Iteration 4:   log likelihood = -29.054002  
Iteration 5:   log likelihood = -29.054002  

Logistic regression                             Number of obs     =         74
                                                LR chi2(1)        =      31.96
                                                Prob > chi2       =     0.0000
Log likelihood = -29.054002                     Pseudo R2         =     0.3548

------------------------------------------------------------------------------
     foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |  -.0025874   .0006094    -4.25   0.000    -.0037817    -.001393
       _cons |   6.282599   1.603967     3.92   0.000     3.138882    9.426316
------------------------------------------------------------------------------

. logit foreign weight, vce(robust)

Iteration 0:   log pseudolikelihood =  -45.03321  
Iteration 1:   log pseudolikelihood = -30.669507  
Iteration 2:   log pseudolikelihood = -29.068209  
Iteration 3:   log pseudolikelihood = -29.054005  
Iteration 4:   log pseudolikelihood = -29.054002  
Iteration 5:   log pseudolikelihood = -29.054002  

Logistic regression                             Number of obs     =         74
                                                Wald chi2(1)      =      19.29
                                                Prob > chi2       =     0.0000
Log pseudolikelihood = -29.054002               Pseudo R2         =     0.3548

------------------------------------------------------------------------------
             |               Robust
     foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |  -.0025874   .0005892    -4.39   0.000    -.0037421   -.0014327
       _cons |   6.282599   1.603905     3.92   0.000     3.139004    9.426195
------------------------------------------------------------------------------

https://www.statalist.org/forums/help#spelling also applies.
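
To put the two sets of standard errors from the experiment above in one table, something along these lines should work. It is only a sketch: the stored-estimate names m_default and m_robust are made up for illustration, and the display formats are a matter of taste.

```stata
sysuse auto, clear

* Fit the same logit twice: default (model-based) and robust standard errors
quietly logit foreign weight
estimates store m_default

quietly logit foreign weight, vce(robust)
estimates store m_robust

* Coefficients are identical; only the standard errors differ
estimates table m_default m_robust, b(%10.7f) se(%10.7f)
```

The point estimates come from the same maximization in both fits, so any difference in the table is purely a difference in how the variance is estimated.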

John Gustavsson

Join Date: Sep 2018

Posts: 14
#3

10 Feb 2020, 10:00

So basically just running a normal logit regression is useless? What are the standard errors you get then? Are they just the same as OLS, no adjustment for the heteroscedasticity that inevitably occurs? It just seems so strange to me.

Stephen Jenkins

Join Date: Apr 2014

Posts: 1424
#4

10 Feb 2020, 10:23

Stata fits logit models using the standard maximum likelihood estimator, which takes account of the binary nature of the observed outcome variable. It is presumably the latter that leads you to your remark about inevitable heteroskedasticity. I think you're on the wrong track and recommend having a look at the manual entry, following it through to the References and also the Methods and formulas section. (See also any standard stats/econometrics textbook.) This will likely also explain how Stata (and other good software) estimates the "right" standard errors for the non-robust case, and how you can also obtain robust standard errors if you wish.
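
For readers who want the two variance estimators side by side, here is a sketch in standard textbook notation. The symbols $x_i$, $\beta$ and $p_i$ are introduced here for illustration and do not come from the posts in this thread. The logit model assumes $y_i \mid x_i$ is Bernoulli with probability $p_i$, so the variance of the response changes with $x_i$ by construction, and that is already built into the likelihood:

```latex
\begin{align*}
p_i &= \Pr(y_i = 1 \mid x_i) = \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)},
\qquad \operatorname{Var}(y_i \mid x_i) = p_i (1 - p_i) \\
\widehat{V}_{\text{ML}} &= \Bigl( \sum_{i=1}^{n} \hat{p}_i (1 - \hat{p}_i)\, x_i x_i' \Bigr)^{-1} \\
\widehat{V}_{\text{robust}} &= \widehat{V}_{\text{ML}}
  \Bigl( \sum_{i=1}^{n} (y_i - \hat{p}_i)^2\, x_i x_i' \Bigr)
  \widehat{V}_{\text{ML}}
\end{align*}
```

If the model is correctly specified, $E[(y_i - p_i)^2 \mid x_i] = p_i(1 - p_i)$, so the middle term of the robust version estimates the inverse of $\widehat{V}_{\text{ML}}$ and the two estimators agree in large samples. Software may also apply a finite-sample scaling factor to the robust version. This is consistent with the experiment earlier in the thread, where the two standard errors for weight are close (.0006094 versus .0005892).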

Nick Cox

Join Date: Mar 2014

Posts: 35429
#5

10 Feb 2020, 10:27

"So basically just running a normal logit regression is useless?"

Stephen Jenkins has already addressed this, but a short answer is No, not least because there is no unanimity on this even among experts.

Eric de Souza

Join Date: Mar 2014

Posts: 587
#6

10 Feb 2020, 10:32

If by "logistic regression" you mean the linear probability model, then you are right in saying that the residuals are "by nature heteroscedastic". If you are referring to the logit model, the residuals are not "by nature" heteroscedastic. The logit model completely specifies the distribution.
Added on edit: If you are referring to the linear probability model, you should correct for heteroscedasticity with the robust option. Otherwise, you should use the heteroscedastic probit model (-hetprobit- in Stata).

Last edited by Eric de Souza; 10 Feb 2020, 10:39.
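
As a rough illustration of the two alternatives mentioned in the post above, using the same auto data as the earlier experiment: the heteroskedastic probit command is hetprobit in current Stata, and putting weight in het() is an arbitrary choice made only so the example is complete, not a modelling recommendation.

```stata
sysuse auto, clear

* Linear probability model: OLS on the binary outcome,
* with heteroskedasticity-robust standard errors
regress foreign weight, vce(robust)

* Heteroskedastic probit: het() names the variables assumed to drive
* the error variance (weight here is an illustrative assumption)
hetprobit foreign weight, het(weight)
```

On a dataset this small the heteroskedastic probit may be slow to converge or weakly identified, so treat the second command as a template rather than a result to interpret.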