  • Probit with vce(robust)

    Hi
    I have read contrary opinions about using the vce(robust) option in probit models. I am still not sure whether it makes sense to use the robust option for probit/logit models. I would appreciate it if you could answer this question.

  • #2
    Dave Giles has a good blog post on this:
    https://davegiles.blogspot.com/2013/...nonlinear.html
    Associate Professor of Finance and Economics
    University of Illinois
    www.julianreif.com

    • #3
      My thinking has evolved on this, and I think it makes some sense to use robust standard errors for basically every estimation problem. The reason is that we know all models are misspecified. If we realistically assume that probit and logit are approximations to the truth, then we want to perform inference that allows misspecification. That is what vce(robust) does in a probit or logit. We know the distribution is Bernoulli; we just don't know whether we have the correct functional form. We can act as if we have the correct model for computing average marginal (partial) effects, but we probably should obtain standard errors that allow the model to be wrong.

      Having said that, there is no sense in which vce(robust) is somehow accounting for heteroskedasticity in the latent error, say e, in y* = xb + e. If e is heteroskedastic, then the correct model is not the usual probit or logit, but a more general version. One can estimate the more general version. Or, just use usual logit/probit as approximations and obtain robust standard errors.
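
      To make that concrete, here is a minimal sketch using the auto data shipped with Stata (the covariates are illustrative only): a probit treated as an approximation with vce(robust), average marginal effects via margins, and hetprobit as one candidate for the "more general version" that models the heteroskedasticity directly.

      Code:
      sysuse auto, clear

      * probit as an approximation; vce(robust) allows the model to be wrong
      probit foreign price mpg weight, vce(robust)

      * average marginal (partial) effects, using the same robust VCE
      margins, dydx(*)

      * one candidate "more general version": latent scale modeled as exp(length*g)
      hetprobit foreign price mpg weight, het(length) vce(robust)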

      • #4
        An alternative perspective casts the probit estimation problem as GMM rather than ML, in the spirit of this underappreciated (IMHO) paper by Avery, Hansen, and Hotz: https://www.jstor.org/stable/2526113.

        Presumably there would be little debate about using appropriate robust standard errors in GMM estimation. How much efficiency would be sacrificed by using GMM instead of ML is not obvious to me.

        Code:
        cap preserve
        cap drop _all

        sysuse auto

        loc rhs="price mpg weight length"

        * ML probit with conventional (oim) standard errors
        qui probit foreign `rhs'
        probit

        * ML probit with robust standard errors
        qui probit foreign `rhs', vce(robust)
        probit

        * just-identified GMM: moments E[(foreign - normal(xb))*z] = 0, z = rhs vars and _cons
        qui gmm (foreign-normal({xb:`rhs' _cons})), vce(robust) instr(`rhs') igmm
        gmm

        cap restore
        Results
        Code:
        . qui probit foreign `rhs'
        
        . probit
        
        Probit regression                                       Number of obs =     74
                                                                LR chi2(4)    =  56.15
                                                                Prob > chi2   = 0.0000
        Log likelihood = -16.95753                              Pseudo R2     = 0.6234
        
        ------------------------------------------------------------------------------
             foreign | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
        -------------+----------------------------------------------------------------
               price |   .0005327   .0001674     3.18   0.001     .0002047    .0008608
                 mpg |  -.0702474   .0566022    -1.24   0.215    -.1811857    .0406909
              weight |   -.004612   .0017089    -2.70   0.007    -.0079614   -.0012627
              length |   .0298633   .0481359     0.62   0.535    -.0644813    .1242079
               _cons |   4.827757   5.976915     0.81   0.419    -6.886781    16.54229
        ------------------------------------------------------------------------------
        
        .
        . qui probit foreign `rhs', vce(robust)
        
        . probit
        
        Probit regression                                       Number of obs =     74
                                                                Wald chi2(4)  =  25.43
                                                                Prob > chi2   = 0.0000
        Log pseudolikelihood = -16.95753                        Pseudo R2     = 0.6234
        
        ------------------------------------------------------------------------------
                     |               Robust
             foreign | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
        -------------+----------------------------------------------------------------
               price |   .0005327   .0001216     4.38   0.000     .0002944    .0007711
                 mpg |  -.0702474   .0540592    -1.30   0.194    -.1762014    .0357067
              weight |   -.004612   .0012393    -3.72   0.000     -.007041    -.002183
              length |   .0298633   .0450489     0.66   0.507    -.0584309    .1181575
               _cons |   4.827757   6.432887     0.75   0.453    -7.780469    17.43598
        ------------------------------------------------------------------------------
        
        .
        . qui gmm (foreign-normal({xb:`rhs' _cons})), vce(robust) instr(`rhs') igmm
        
        . gmm
        
        GMM estimation
        
        Number of parameters =   5
        Number of moments    =   5
        Initial weight matrix: Unadjusted                 Number of obs   =         74
        GMM weight matrix:     Robust
        
        ------------------------------------------------------------------------------
                     |               Robust
                     | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
        -------------+----------------------------------------------------------------
               price |   .0005188   .0001355     3.83   0.000     .0002533    .0007844
                 mpg |  -.0655702   .0526607    -1.25   0.213    -.1687834    .0376429
              weight |  -.0043788   .0012219    -3.58   0.000    -.0067736    -.001984
              length |   .0234059   .0468664     0.50   0.617    -.0684505    .1152623
               _cons |   5.356118   6.922758     0.77   0.439    -8.212238    18.92447
        ------------------------------------------------------------------------------
        Instruments for equation 1: price mpg weight length _cons

        • #5
          John: I think in the Avery et al. paper, they're interested in cases where time is a dimension and they use overidentifying restrictions that can lead to more efficiency in the presence of unmodeled serial correlation. A similar issue is when using logit or probit with panel data. Then one will use vce(cluster id) -- not primarily because one thinks the probit model is misspecified but because of the serial correlation. For cross-sectional problems with no overidentification, I'm not sure why one would use GMM. If the model is wrong, then every set of moment conditions identifies new parameters. It seems MLE is the way to go here. So then we're back to deciding how much discomfort we have in admitting the model is misspecified -- otherwise, we wouldn't use vce(robust).
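
          As a minimal sketch of that panel case (using the standard union extract of the NLS data available via webuse; the covariates are illustrative only), clustering on the person identifier handles the within-person serial correlation, whatever one believes about the probit functional form:

          Code:
          webuse union, clear

          * pooled probit with standard errors clustered on the panel identifier
          probit union age grade south, vce(cluster idcode)

          * average marginal effects with the cluster-robust VCE
          margins, dydx(*)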

          • #6
            Thanks for the clarification, Jeff. It's a good thing that models are rarely misspecified :-)

            • #7
              Originally posted by Jeff Wooldridge:
              Having said that, there is no sense in which vce(robust) is somehow accounting for heteroskedasticity in the latent error, say e, in y* = xb + e. If e is heteroskedastic, then the correct model is not the usual probit or logit, but a more general version. One can estimate the more general version. Or, just use usual logit/probit as approximations and obtain robust standard errors.
              As Jeff says, in ML the robust option is not heteroskedasticity-robust in the linear-regression sense; rather, it is robust to the regularity conditions failing, and hence to the information matrix equality failing, which would make the oim estimator of the variance inappropriate. Jeff Wooldridge, when you say "more general version", do you mean the heteroskedastic probit? Even when modeling the heteroskedasticity, we may still want robust standard errors, since we remain unsure whether we have the model for the probability function right or the function for the heteroskedasticity right.
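
              To spell out what the robust variance is doing in ML terms, a rough sketch (not the manual's exact formulas): with log-likelihood contributions $\ell_i(\theta)$ and scores $s_i(\theta)$, the robust estimator is the sandwich
              \[
              \widehat{\operatorname{Var}}(\hat\theta) = \hat A^{-1}\hat B\hat A^{-1},
              \qquad
              \hat A = -\sum_i \nabla^2_\theta \ell_i(\hat\theta),
              \qquad
              \hat B = \sum_i s_i(\hat\theta)\, s_i(\hat\theta)',
              \]
              whereas the information matrix equality, which holds under correct specification, makes $\hat A$ and $\hat B$ estimate the same matrix, so the sandwich collapses to the oim variance $\hat A^{-1}$.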
              Alfonso Sanchez-Penalver

              • #8
                This comment by Alfonso Sánchez-Peñalver prompts the following questions.

                As Jeff Wooldridge has advocated elsewhere, when seeking to estimate the conditional mean of an outcome y measured on [0, infinity), a leading strategy is to use Poisson regression with robust std. errors. The one key requirement for consistency is that the functional form of E[y|x] is correctly specified as exp(x*b).

                My questions:

                (1) Does the same logic extend to estimation of the conditional mean of an outcome y measured in {0, 1} by using probit regression with robust standard errors?

                (2) If so, is the one key requirement that the functional form of E[y|x] is correctly specified as PHI(x*b), where PHI is the N(0,1) CDF?

                (3) If so, what are the implications (if any) of heteroskedasticity of u = g(x)*v with v ~ N(0,1) in the latent-outcome model y* = xb + u, where y = 1(y* > 0)?


                For me it's (3) that makes things tricky. It's one thing to assume or assert (a) that the conditional mean of a binary outcome y is PHI(x*b) without assuming that y is defined via a latent-variable threshold-crossing model, and a different thing to assume (b) that y arises from the threshold-crossing model in (3), in which case the conditional mean of y is no longer PHI(x*b) but rather PHI(x*b / g(x)).

                Assumption (a) seems more in the spirit of "estimate the conditional mean of a non-negative outcome using Poisson regression with robust standard errors". But my instinct is also that by invoking the first-moment-only assumption (a) one might sacrifice the ability to interpret E[y|x] as Pr(y=1|x), which is presumably legitimate under assumption (b).

                I may have strayed far off the trail here, but these issues have flummoxed me for years. Thanks in advance for any clarifications and insights.
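
                For concreteness, a minimal sketch of the analogy in question (1) (auto data; the outcome and covariate choices are illustrative only): Poisson regression with robust standard errors for a nonnegative outcome, and a probit-link GLM with robust standard errors for a binary outcome, where in each case the maintained assumption is the conditional-mean specification.

                Code:
                sysuse auto, clear

                * nonnegative outcome: E[y|x] = exp(x*b), robust (quasi-ML) standard errors
                poisson price mpg weight, vce(robust)

                * binary outcome: E[y|x] = PHI(x*b), robust standard errors
                glm foreign mpg weight, family(binomial) link(probit) vce(robust)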

                • #9
                  Hi John Mullahy. With regard to your question (3), notice that with heteroskedasticity of the form you describe, the probability becomes Phi(x*b / g(z)), allowing z and x to differ. The multiplicative heteroskedasticity scales the bs. You were never really estimating the betas of the latent variable, but rather the betas divided by the overall (constant) scale (standard deviation); call them deltas. If the scale is not constant across observations, then your delta estimator is inconsistent. But it would also be inconsistent if the specification of g(z) is wrong, which is why I said in my previous message that you should still use robust standard errors. I think this answers your (2) as well.
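
                  The scaling is immediate from the threshold-crossing form (a sketch in the thread's notation): with $y^* = xb + g(z)v$, $v \sim N(0,1)$, and $g(z) > 0$,
                  \[
                  \Pr(y=1 \mid x, z) = \Pr\big(xb + g(z)v > 0\big) = \Pr\big(v > -xb/g(z)\big) = \Phi\big(xb/g(z)\big),
                  \]
                  so with $g(z) = \exp(z\gamma)$ this is the heteroskedastic probit, and a constant $g$ just reproduces the usual normalization.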

                  The reason the Poisson estimator is consistent as long as the specification of the mean is correct is that the Poisson distribution is fully determined by its mean. With probit, or logit for that matter, you need two parameters: a mean and a standard deviation. The normalization of the variance in both is valid under homoskedasticity, but if the latent variable is heteroskedastic the normalization is not valid, because it would need to be observation/case specific.

                  Now, having said all that, notice once more that g(z) scales all the bs. Depending on the type of data you are using, or the analysis you are doing, you may have heteroskedasticity across cases and/or across occasions (panel, grouped data...). Another way to capture unobserved heterogeneity in the deltas is through random parameters. Normally that heterogeneity is modeled across cases, so if the heteroskedasticity is also across cases there is likely to be an identification problem, since random parameters are a very general approach that encompasses any heterogeneity at that level, including scale. If, however, the heteroskedasticity is across occasions, it should be identifiable as long as the random parameters are modeled at the case level. I also think that the identification problem depends on how many of the deltas you model as random and how many you model as fixed, because the heteroskedasticity scales all parameters, random and fixed.

                  I hope this clarifies your thoughts a bit, and I hope I haven't confused you more.
                  Alfonso Sanchez-Penalver

                  • #10
                    Thanks very much, Alfonso.
