  • Probit with vce(robust)

    Hi
    I have read contrary opinions about using the vce(robust) option in probit models. I am still not sure whether it makes sense to use the robust option for probit/logit models. I would appreciate it if you could answer this question.

  • #2
    Dave Giles has a good blog post on this:
    https://davegiles.blogspot.com/2013/...nonlinear.html
    Associate Professor of Finance and Economics
    University of Illinois
    www.julianreif.com

    • #3
      My thinking has evolved on this, and I think it makes some sense to use robust standard errors for basically every estimation problem. The reason is that we know all models are misspecified. If we realistically assume that probit and logit are approximations to the truth, then we want to perform inference that allows misspecification. That is what vce(robust) does in a probit or logit. We know the distribution is Bernoulli; we just don't know whether we have the correct functional form. We can act as if we have the correct model for computing average marginal (partial) effects, but we probably should obtain standard errors that allow the model to be wrong.

      Having said that, there is no sense in which vce(robust) is somehow accounting for heteroskedasticity in the latent error, say e, in y* = xb + e. If e is heteroskedastic, then the correct model is not the usual probit or logit, but a more general version. One can estimate the more general version. Or, just use usual logit/probit as approximations and obtain robust standard errors.
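
      To make that concrete, here is a minimal sketch using the auto data shipped with Stata (the covariates are illustrative only): a probit treated as an approximation with vce(robust), average marginal effects via margins, and hetprobit as one candidate for the "more general version" that models the heteroskedasticity directly.

      Code:
      sysuse auto, clear

      * probit as an approximation; vce(robust) allows the model to be wrong
      probit foreign price mpg weight, vce(robust)

      * average marginal (partial) effects, using the same robust VCE
      margins, dydx(*)

      * one candidate "more general version": latent scale modeled as exp(length*g)
      hetprobit foreign price mpg weight, het(length) vce(robust)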

      • #4
        An alternative perspective casts the probit estimation problem as GMM rather than ML, in the spirit of this underappreciated (IMHO) paper by Avery, Hansen, and Hotz: https://www.jstor.org/stable/2526113.

        Presumably there would be little debate about using appropriate robust standard errors in GMM estimation. How much efficiency would be sacrificed by using GMM instead of ML is not obvious to me.

        Code:
        cap preserve
        cap drop _all

        sysuse auto

        loc rhs="price mpg weight length"

        * ML probit with conventional (oim) standard errors
        qui probit foreign `rhs'
        probit

        * ML probit with robust standard errors
        qui probit foreign `rhs', vce(robust)
        probit

        * just-identified GMM: moments E[(foreign - normal(xb))*z] = 0, z = rhs vars and _cons
        qui gmm (foreign-normal({xb:`rhs' _cons})), vce(robust) instr(`rhs') igmm
        gmm

        cap restore
        Results
        Code:
        . qui probit foreign `rhs'
        
        . probit
        
        Probit regression                                       Number of obs =     74
                                                                LR chi2(4)    =  56.15
                                                                Prob > chi2   = 0.0000
        Log likelihood = -16.95753                              Pseudo R2     = 0.6234
        
        ------------------------------------------------------------------------------
             foreign | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
        -------------+----------------------------------------------------------------
               price |   .0005327   .0001674     3.18   0.001     .0002047    .0008608
                 mpg |  -.0702474   .0566022    -1.24   0.215    -.1811857    .0406909
              weight |   -.004612   .0017089    -2.70   0.007    -.0079614   -.0012627
              length |   .0298633   .0481359     0.62   0.535    -.0644813    .1242079
               _cons |   4.827757   5.976915     0.81   0.419    -6.886781    16.54229
        ------------------------------------------------------------------------------
        
        .
        . qui probit foreign `rhs', vce(robust)
        
        . probit
        
        Probit regression                                       Number of obs =     74
                                                                Wald chi2(4)  =  25.43
                                                                Prob > chi2   = 0.0000
        Log pseudolikelihood = -16.95753                        Pseudo R2     = 0.6234
        
        ------------------------------------------------------------------------------
                     |               Robust
             foreign | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
        -------------+----------------------------------------------------------------
               price |   .0005327   .0001216     4.38   0.000     .0002944    .0007711
                 mpg |  -.0702474   .0540592    -1.30   0.194    -.1762014    .0357067
              weight |   -.004612   .0012393    -3.72   0.000     -.007041    -.002183
              length |   .0298633   .0450489     0.66   0.507    -.0584309    .1181575
               _cons |   4.827757   6.432887     0.75   0.453    -7.780469    17.43598
        ------------------------------------------------------------------------------
        
        .
        . qui gmm (foreign-normal({xb:`rhs' _cons})), vce(robust) instr(`rhs') igmm
        
        . gmm
        
        GMM estimation
        
        Number of parameters =   5
        Number of moments    =   5
        Initial weight matrix: Unadjusted                 Number of obs   =         74
        GMM weight matrix:     Robust
        
        ------------------------------------------------------------------------------
                     |               Robust
                     | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
        -------------+----------------------------------------------------------------
               price |   .0005188   .0001355     3.83   0.000     .0002533    .0007844
                 mpg |  -.0655702   .0526607    -1.25   0.213    -.1687834    .0376429
              weight |  -.0043788   .0012219    -3.58   0.000    -.0067736    -.001984
              length |   .0234059   .0468664     0.50   0.617    -.0684505    .1152623
               _cons |   5.356118   6.922758     0.77   0.439    -8.212238    18.92447
        ------------------------------------------------------------------------------
        Instruments for equation 1: price mpg weight length _cons

        • #5
          John: I think in the Avery et al. paper, they're interested in cases where time is a dimension and they use overidentifying restrictions that can lead to more efficiency in the presence of unmodeled serial correlation. A similar issue is when using logit or probit with panel data. Then one will use vce(cluster id) -- not primarily because one thinks the probit model is misspecified but because of the serial correlation. For cross-sectional problems with no overidentification, I'm not sure why one would use GMM. If the model is wrong, then every set of moment conditions identifies new parameters. It seems MLE is the way to go here. So then we're back to deciding how much discomfort we have in admitting the model is misspecified -- otherwise, we wouldn't use vce(robust).
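
          As a minimal sketch of that panel case (using the standard union extract of the NLS data available via webuse; the covariates are illustrative only), clustering on the person identifier handles the within-person serial correlation, whatever one believes about the probit functional form:

          Code:
          webuse union, clear

          * pooled probit with standard errors clustered on the panel identifier
          probit union age grade south, vce(cluster idcode)

          * average marginal effects with the cluster-robust VCE
          margins, dydx(*)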

          • #6
            Thanks for the clarification, Jeff. It's a good thing that models are rarely misspecified :-)

            • #7
              Originally posted by Jeff Wooldridge:
              Having said that, there is no sense in which vce(robust) is somehow accounting for heteroskedasticity in the latent error, say e, in y* = xb + e. If e is heteroskedastic, then the correct model is not the usual probit or logit, but a more general version. One can estimate the more general version. Or, just use usual logit/probit as approximations and obtain robust standard errors.
              As Jeff says, in ML the robust option is not heteroskedasticity-robust in the linear-regression sense; rather, it is robust to the regularity conditions failing, and hence to the information matrix equality failing, which would make the oim estimator of the variance inappropriate. Jeff Wooldridge, when you say "more general version", do you mean the heteroskedastic probit? Even when modeling the heteroskedasticity, we may still want robust standard errors, since we remain unsure whether we have the model for the probability function right or the function for the heteroskedasticity right.
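
              To spell out what the robust variance is doing in ML terms, a rough sketch (not the manual's exact formulas): with log-likelihood contributions $\ell_i(\theta)$ and scores $s_i(\theta)$, the robust estimator is the sandwich
              \[
              \widehat{\operatorname{Var}}(\hat\theta) = \hat A^{-1}\hat B\hat A^{-1},
              \qquad
              \hat A = -\sum_i \nabla^2_\theta \ell_i(\hat\theta),
              \qquad
              \hat B = \sum_i s_i(\hat\theta)\, s_i(\hat\theta)',
              \]
              whereas the information matrix equality, which holds under correct specification, makes $\hat A$ and $\hat B$ estimate the same matrix, so the sandwich collapses to the oim variance $\hat A^{-1}$.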
              Alfonso Sanchez-Penalver

              • #8
                This comment by Alfonso Sánchez-Peñalver prompts the following questions.

                As Jeff Wooldridge has advocated elsewhere, when seeking to estimate the conditional mean of an outcome y measured on [0, infinity), a leading strategy is to use Poisson regression with robust std. errors. The one key requirement for consistency is that the functional form of E[y|x] is correctly specified as exp(x*b).

                My questions:

                (1) Does the same logic extend to estimation of the conditional mean of an outcome y measured in {0, 1} by using probit regression with robust standard errors?

                (2) If so, is the one key requirement that the functional form of E[y|x] is correctly specified as PHI(x*b), where PHI is the N(0,1) CDF?

                (3) If so, what are the implications (if any) of heteroskedasticity of u = g(x)*v with v ~ N(0,1) in the latent-outcome model y* = xb + u, where y = 1(y* > 0)?


                For me it's (3) that makes things tricky. It's one thing to assume or assert (a) that the conditional mean of a binary outcome y is PHI(x*b) without assuming that y is defined via a latent-variable threshold-crossing model, and a different thing to assume (b) that y arises from the threshold-crossing model in (3), in which case the conditional mean of y is no longer PHI(x*b) but rather PHI(x*b / g(x)).

                Assumption (a) seems more in the spirit of "estimate the conditional mean of a non-negative outcome using Poisson regression with robust standard errors". But my instinct is also that by invoking the first-moment-only assumption (a) one might sacrifice the ability to interpret E[y|x] as Pr(y=1|x), which is presumably legitimate under assumption (b).

                I may have strayed far off the trail here, but these issues have flummoxed me for years. Thanks in advance for any clarifications and insights.
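
                For concreteness, a minimal sketch of the analogy in question (1) (auto data; the outcome and covariate choices are illustrative only): Poisson regression with robust standard errors for a nonnegative outcome, and a probit-link GLM with robust standard errors for a binary outcome, where in each case the maintained assumption is the conditional-mean specification.

                Code:
                sysuse auto, clear

                * nonnegative outcome: E[y|x] = exp(x*b), robust (quasi-ML) standard errors
                poisson price mpg weight, vce(robust)

                * binary outcome: E[y|x] = PHI(x*b), robust standard errors
                glm foreign mpg weight, family(binomial) link(probit) vce(robust)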

                • #9
                  Hi John Mullahy. With regard to your question (3), notice that with heteroskedasticity of the form you describe, the probability becomes Phi(x*b / g(z)), allowing z and x to differ. The multiplicative heteroskedasticity scales the bs. You were never really estimating the betas of the latent variable, but rather the betas divided by the overall (constant) scale (standard deviation); call them deltas. If the scale is not constant across observations, then your delta estimator is inconsistent. But it would also be inconsistent if the specification of g(z) is wrong, which is why I said in my previous message that you should still use robust standard errors. I think this answers your (2) as well.
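
                  The scaling is immediate from the threshold-crossing form (a sketch in the thread's notation): with $y^* = xb + g(z)v$, $v \sim N(0,1)$, and $g(z) > 0$,
                  \[
                  \Pr(y=1 \mid x, z) = \Pr\big(xb + g(z)v > 0\big) = \Pr\big(v > -xb/g(z)\big) = \Phi\big(xb/g(z)\big),
                  \]
                  so with $g(z) = \exp(z\gamma)$ this is the heteroskedastic probit, and a constant $g$ just reproduces the usual normalization.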

                  The reason the Poisson estimator is consistent as long as the specification of the mean is correct is that the Poisson distribution is fully determined by its mean. With probit, or logit for that matter, you need two parameters: a mean and a standard deviation. The normalization of the variance in both is valid under homoskedasticity, but if the latent variable is heteroskedastic the normalization is not valid, because it would need to be observation/case specific.

                  Now, having said all that, notice once more that g(z) scales all the bs. Depending on the type of data you are using, or the analysis you are doing, you may have heteroskedasticity across cases and/or across occasions (panel, grouped data...). Another way to capture unobserved heterogeneity in the deltas is through random parameters. Normally that heterogeneity is modeled across cases, so if the heteroskedasticity is also across cases there is likely to be an identification problem, since random parameters are a very general approach that encompasses any heterogeneity at that level, including scale. If, however, the heteroskedasticity is across occasions, it should be identifiable as long as the random parameters are modeled at the case level. I also think that the identification problem depends on how many of the deltas you model as random and how many you model as fixed, because the heteroskedasticity scales all parameters, random and fixed.

                  I hope this clarifies your thoughts a bit, and I hope I haven't confused you more.
                  Alfonso Sanchez-Penalver

                  • #10
                    Thanks very much, Alfonso.
