  • Does STATA use robust standard errors for logistic regression?

    Hi,

    The title says it all really. Since logistic regression by its nature is heteroskedastic, does stata use robust standard errors automatically or does one need to add that specifically (like with OLS regression when one would add "robust" as an option at the end)?

    It never quite occurred to me that STATA might not use robust standard errors since it's quite clearly necessary for logistic regression.

  • #2
    Statistics is full of things "quite clearly necessary" to some of its practitioners but not all. The distribution of the response is not identical to the sampling distributions of the parameters, and more can be said.

    You can answer your own question in various ways, e.g. by experiment or by looking at documentation. Here I show by experiment that robust standard errors are not the default. The help also explains that robust SEs are optional.

    Code:
    . sysuse auto, clear
    (1978 Automobile Data)
    
    . logit foreign weight
    
    Iteration 0:   log likelihood =  -45.03321  
    Iteration 1:   log likelihood = -30.669507  
    Iteration 2:   log likelihood = -29.068209  
    Iteration 3:   log likelihood = -29.054005  
    Iteration 4:   log likelihood = -29.054002  
    Iteration 5:   log likelihood = -29.054002  
    
    Logistic regression                             Number of obs     =         74
                                                    LR chi2(1)        =      31.96
                                                    Prob > chi2       =     0.0000
    Log likelihood = -29.054002                     Pseudo R2         =     0.3548
    
    ------------------------------------------------------------------------------
         foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          weight |  -.0025874   .0006094    -4.25   0.000    -.0037817    -.001393
           _cons |   6.282599   1.603967     3.92   0.000     3.138882    9.426316
    ------------------------------------------------------------------------------
    
    . logit foreign weight, vce(robust)
    
    Iteration 0:   log pseudolikelihood =  -45.03321  
    Iteration 1:   log pseudolikelihood = -30.669507  
    Iteration 2:   log pseudolikelihood = -29.068209  
    Iteration 3:   log pseudolikelihood = -29.054005  
    Iteration 4:   log pseudolikelihood = -29.054002  
    Iteration 5:   log pseudolikelihood = -29.054002  
    
    Logistic regression                             Number of obs     =         74
                                                    Wald chi2(1)      =      19.29
                                                    Prob > chi2       =     0.0000
    Log pseudolikelihood = -29.054002               Pseudo R2         =     0.3548
    
    ------------------------------------------------------------------------------
                 |               Robust
         foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          weight |  -.0025874   .0005892    -4.39   0.000    -.0037421   -.0014327
           _cons |   6.282599   1.603905     3.92   0.000     3.139004    9.426195
    ------------------------------------------------------------------------------
    https://www.statalist.org/forums/help#spelling also applies.
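
    To put the two sets of standard errors from the experiment above side by side, one option is to store each fit and tabulate them together. This is only a sketch: the names m_default and m_robust are arbitrary labels chosen for illustration, and the display formats are a matter of taste.

    ```stata
    sysuse auto, clear

    * Default (model-based) standard errors
    quietly logit foreign weight
    estimates store m_default

    * Same model, same coefficients, robust (sandwich) standard errors
    quietly logit foreign weight, vce(robust)
    estimates store m_robust

    * Coefficients with standard errors underneath, one column per fit
    estimates table m_default m_robust, b(%10.7f) se(%10.7f)
    ```

    The coefficients should be identical across the two columns; only the standard errors differ, as in the output above (.0006094 versus .0005892 for weight).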


    • #3
      So basically just running a normal logit regression is useless? What are the standard errors you get then? Are they just the same as OLS, no adjustment for the heteroscedasticity that inevitably occurs? It just seems so strange to me.


      • #4
        Stata fits logit models using the standard maximum likelihood estimator, which takes account of the binary nature of the observed outcome variable. It is presumably the binary outcome that leads you to your remark about inevitable heteroskedasticity. I think you're on the wrong track and recommend having a look at the manual entry, following it through to the References and also the Methods and formulas section. (See also any standard stats/econometrics textbook.) This will likely also explain how Stata (and other good software) estimates the "right" standard errors for the non-robust case, and how you may also obtain robust standard errors if you wish.
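
        As a rough textbook-style sketch of the distinction (the notation here is an assumption of this note, not taken from the manual, and finite-sample adjustment factors are omitted): write p_i for the fitted probability and x_i for the covariate vector including the constant. The default standard errors come from the inverse information matrix, whose weights p_i(1 - p_i) already reflect the Bernoulli variance of each observation, so they do not assume constant variance. The robust version wraps that matrix around the outer product of the scores.

        ```latex
        % Logit log likelihood, with p_i = \Pr(y_i = 1 \mid x_i) = 1 / (1 + e^{-x_i'\beta})
        \ell(\beta) = \sum_{i=1}^{n} \left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right]

        % Default (model-based) variance: inverse of the information matrix
        \widehat{V}_{\mathrm{ML}} = \left( \sum_{i=1}^{n} \hat{p}_i (1 - \hat{p}_i) \, x_i x_i' \right)^{-1}

        % Robust (sandwich) variance: the same "bread" around the score outer product
        \widehat{V}_{\mathrm{robust}} = \widehat{V}_{\mathrm{ML}}
          \left( \sum_{i=1}^{n} (y_i - \hat{p}_i)^2 \, x_i x_i' \right)
          \widehat{V}_{\mathrm{ML}}
        ```

        If the logit model is correctly specified, the middle term has the same expectation as the information matrix, so the two estimators target the same quantity.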


        • #5
          "So basically just running a normal logit regression is useless?"
          Stephen Jenkins has already addressed this, but a short answer is No, not least because there is no unanimity on this even among experts.


          • #6
            If by "logistic regression" you mean the linear probability model, then you are right in saying that the residuals are "by nature heteroscedastic". If you are referring to the logit model, the residuals are not "by nature" heteroscedastic. The logit model completely specifies the distribution.
            Added on edit: If you are referring to the linear probability model, you should correct for heteroscedasticity with the robust option. Otherwise, you should use the heteroscedastic probit model (-hetprobit- in Stata).
            Last edited by Eric de Souza; 10 Feb 2020, 10:39.
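
            For readers who want to see what the two suggestions look like in practice, here is a minimal syntax sketch using the auto data from #2. It assumes the heteroskedastic probit command is hetprobit, as in current Stata; the choice of mpg as the variance covariate in het() is arbitrary and purely illustrative, and the model may not converge cleanly on so small a data set.

            ```stata
            sysuse auto, clear

            * Linear probability model: OLS on the binary outcome,
            * with heteroskedasticity-robust standard errors
            regress foreign weight, vce(robust)

            * Heteroskedastic probit: het() lists the variables assumed to
            * drive the error variance (mpg is an illustrative assumption)
            hetprobit foreign weight, het(mpg)
            ```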
