Probit MLE

Zuhura Anne

Join Date: Aug 2016

Posts: 33
#1

28 Aug 2016, 12:24

Hello, my question is mostly about the econometrics, though I understand that this is a Stata forum.

This is my probit model:

Code:

\Pr[y_{ij} = 1] = \Phi(\beta_0 + \beta_1 \mathrm{Educ}_j + X_{ij}'\beta_2 + X_j'\beta_3 + \mu_{ij})

i represents individuals and j represents households.
I am not so sure how I should write out its MLE.

This is what I have :

Code:

L = \sum_{i=1}^{N} y_{ij} \log_e[\Phi(\beta_0 + \beta_1 \mathrm{Educ}_j + X_{ij}'\beta_2 + X_j'\beta_3)] + \sum_{i=1}^{N} (1 - y_{ij}) \log_e[1 - \Phi(\beta_0 + \beta_1 \mathrm{Educ}_j + X_{ij}'\beta_2 + X_j'\beta_3)]

Should I include another summation from j=1 to M?
Or can I leave it as it is?
Christos Makridis

Join Date: Nov 2014

Posts: 157
#2

28 Aug 2016, 15:00

Is i an individual and j some group or product? If so, then yes: just as in the closed-form logit expression exp(x_ij) / sum{exp(x_ij)}, you want to sum over j. If that's the only question, just check your likelihood against some online notes or a textbook; I think the likelihood is not displaying properly on the forum here.
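
To make the summation over j concrete, here is a sketch of the pooled probit log-likelihood with both sums written out. It assumes M households (the M from the question in #1) and N_j members in household j; N_j and the shorthand \eta_{ij} are notation introduced here, not taken from the original post. As in the likelihood in #1, the error \mu_{ij} does not appear inside \Phi.

```latex
\ell(\beta) = \sum_{j=1}^{M} \sum_{i=1}^{N_j}
  \Big\{ y_{ij} \log \Phi(\eta_{ij}) + (1 - y_{ij}) \log\big[ 1 - \Phi(\eta_{ij}) \big] \Big\},
\qquad
\eta_{ij} = \beta_0 + \beta_1 \mathrm{Educ}_j + X_{ij}'\beta_2 + X_j'\beta_3 .
```

If i is instead read as a running index over all individuals in the sample, with j simply recording which household person i belongs to, the single sum over i already covers every observation and no second sum is needed.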
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2121
#3

28 Aug 2016, 15:11

If this is for your research, you should just use the canned Stata probit command. It does not care that you have a data structure of individuals within households. But you should cluster at the household level, as the data set is very likely a cluster sample at the household level.

Code:

probit y x1 ... xk, cluster(household_id)

The explanatory variables can include those at the individual and household level. The standard errors and inference are robust to any kind of within-household correlation. If you want a reference, I discuss probit estimation using cluster samples in Chapter 20 of my book, Econometric Analysis of Cross Section and Panel Data, 2e, MIT Press, 2010.
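
As a minimal sketch of this advice, the following simulates individuals within households and fits the pooled probit with household-clustered standard errors. The data-generating process, the seed, and the variable names (educ, x, hh_effect) are all made up for illustration. Only the probit line mirrors the command above, written with the equivalent vce(cluster ...) spelling.

```stata
* Simulated data: 500 households with 4 members each (all values hypothetical)
clear
set seed 12345
set obs 500
generate household_id = _n
generate educ = rnormal()               // household-level regressor
generate hh_effect = rnormal()          // shared shock, induces within-household correlation
expand 4                                // 4 individuals per household
generate x = rnormal()                  // individual-level regressor
generate y = (0.5*educ + 0.3*x + hh_effect + rnormal()) > 0

* Pooled probit with standard errors clustered at the household level
probit y educ x, vce(cluster household_id)
```

The output header should report that the standard errors are adjusted for 500 clusters in household_id.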

If for some reason you are having to code this up -- say, as an exercise -- then I cannot help.
Zuhura Anne

Join Date: Aug 2016

Posts: 33
#4

28 Aug 2016, 18:34

Thank you so much. The chapter on using cluster samples has been really useful.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2121
#5

02 Sep 2016, 05:49

You're welcome!
Dimitriy V. Masterov

Join Date: Mar 2014

Posts: 609
#6

02 Sep 2016, 17:36

Jeff Wooldridge How do you square clustering in non-linear models with this? I thought that the MLE of the parameter vector is biased and inconsistent if the errors are allowed to be heteroskedastic across clusters and correlated within them since the index function coefficients are normalized by the standard deviation of the error?
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2121
#7

03 Sep 2016, 12:19

Originally posted by Dimitriy V. Masterov

Jeff Wooldridge How do you square clustering in non-linear models with this? I thought that the MLE of the parameter vector is biased and inconsistent if the errors are allowed to be heteroskedastic across clusters and correlated within them since the index function coefficients are normalized by the standard deviation of the error?

This is a common misunderstanding, especially with probit models. One is not allowing the variance in the latent variable model to be heteroskedastic. As you implied, this would entirely change the functional form of the response probability, so of course the usual probit MLE would be inconsistent for the coefficient vector. But clustering is first and foremost used to address within cluster correlation. If the probit model is correctly specified, one does not need to cluster for "heteroskedasticity" -- which, in the probit case, means a violation of the information matrix equality for each individual unit. But allowing that is harmless, too. There is no clear benefit from clustering to allow cluster correlation while imposing the information matrix equality for each i. At least no asymptotic benefit, and I know of no studies that have looked at the issue.

Even if one doubts that the probit model is correct, it can provide a good approximation to the average partial effects. In this case, clustering has the benefit of providing proper standard errors that allow the probit model to be misspecified. In fact, that is the purpose of the "robust" option with probit: to allow the probit model to be wrong while still producing proper standard errors.

I hope this helps.
Jeff
Dimitriy V. Masterov

Join Date: Mar 2014

Posts: 609
#8

06 Sep 2016, 13:08

Jeff Wooldridge Your first point makes a lot of sense to me: under homoskedasticity, it is not problematic if you allow the diagonal of the variance-covariance matrix to vary since you should get back that they are the same, at least asymptotically. Thanks for clarifying this.

The second paragraph makes less sense to me. Why would you care about consistently estimating the standard errors of inconsistent parameters?
Enrique Pinzon (StataCorp)

StataCorp Employee

Join Date: Jan 2015

Posts: 215
#9

06 Sep 2016, 13:35

Hello Dimitriy,

What is meant by this is that the probit likelihood is not the true likelihood but still provides a good approximation to the objects of interest. In these cases you obtain consistent estimates of the mean function, and of functions of the mean function such as the partial effects, but using the standard errors from the probit likelihood would give incorrect inference. This is what happens when you use poisson for a continuous outcome, or fracreg, which uses a probit likelihood although the outcome may be continuous in [0,1]. I wrote about this topic in a bit more detail in

http://blog.stata.com/2016/08/30/two...andard-errors/
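
A small sketch of the workflow described here: fit a probit that is deliberately misspecified, request robust standard errors, and then compute the average marginal effect. The simulated design (a logistic response probability, the seed, and the variable names) is an assumption chosen for illustration and does not come from the blog post.

```stata
* Simulated data in which the true response probability is logistic,
* so the probit fitted below is misspecified (all values hypothetical)
clear
set seed 2016
set obs 2000
generate x = rnormal()
generate y = runiform() < invlogit(0.5 + x)

* Probit with a robust (sandwich) variance estimator
probit y x, vce(robust)

* Average marginal effect of x; its delta-method standard error
* is built from the robust variance matrix
margins, dydx(x)
```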
Dimitriy V. Masterov

Join Date: Mar 2014

Posts: 609
#10

06 Sep 2016, 16:02

Enrique Pinzon (StataCorp) I understand the Poisson case, but the probit maximum likelihood estimator is not consistent in the presence of any kind of heteroscedasticity (or unmeasured heterogeneity or omitted variables, even if they are orthogonal to the included ones), and the sandwich estimator provides an appropriate asymptotic covariance matrix for an estimator that is biased in an unknown direction.
Alfonso Sánchez-Peñalver

Join Date: Mar 2014

Posts: 432
#11

06 Sep 2016, 16:14

Besides the appropriateness of the estimation of the standard errors, are we happy to assume that the choices made for members of the same household are independent of each other? Shouldn't he be also accounting for this somehow, at least with a random intercept if not slopes? He can use xtprobit, vce(robust) if he wants just a random intercept, since once he sets the panel var to j that's the same as clustering on j, and meprobit, vce(cluster j), if he wants to do random coefficients. Of course if he wants to model fixed effects, he can use the correlated random effects (CRE) model using probit. Jeff Wooldridge is an expert in this, so I would seek his advice.
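
The two commands mentioned above could be used roughly as follows. This is a sketch on simulated data: the household identifier j follows the notation in the post, while the data-generating process (a random intercept c and a random slope b), the seed, and the sample sizes are assumptions made for illustration. Mixed-effects probit models can be slow or hard to fit on real data, so treat this only as a syntax illustration.

```stata
* Simulated data: 300 households with 8 members each (all values hypothetical)
clear
set seed 12345
set obs 300
generate j = _n                         // household identifier
generate c = rnormal()                  // household random intercept
generate b = 0.3 + 0.7*rnormal()        // household-specific slope on x
expand 8
generate x = rnormal()
generate y = (b*x + c + rnormal()) > 0

* Random-intercept probit; with j as the panel variable,
* vce(robust) clusters on j
xtset j
xtprobit y x, re vce(robust)

* Random intercept plus a random coefficient on x
meprobit y x || j: x, vce(cluster j)
```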

Alfonso Sanchez-Penalver
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2121
#12

07 Sep 2016, 06:02

Originally posted by Dimitriy V. Masterov

Jeff Wooldridge Your first point makes a lot of sense to me: under homoskedasticity, it is not problematic if you allow the diagonal of the variance-covariance matrix to vary since you should get back that they are the same, at least asymptotically. Thanks for clarifying this.

The second paragraph makes less sense to me. Why would you care about consistently estimating the standard errors of inconsistent parameters?

Dimitriy: Although we don't like to admit it, I would argue that computing standard errors of "inconsistent" parameters, or average marginal effects, is something we do all the time. In fact, almost every time we do an empirical analysis. How can one really think any model is correctly specified? As Hal White showed, we are consistently estimating the parameters that provide the best approximation to the "truth." In the MLE case, we minimize the distance between our model and the true density where distance is the Kullback-Leibler distance.

Thus, one can make a case to always compute a sandwich estimator of the asymptotic variance that is valid whether or not the model is correctly specified. This is part of what Enrique is getting at. I don't push this as strongly as I should because people are still loath to admit their model might be wrong. But if we view the AMEs from, say, a probit model as at best an approximation to the true AMEs, we should at least do proper inference on those approximations.
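
The argument in these two paragraphs can be summarized in symbols. This is a sketch in generic notation introduced here, not notation from the thread: f is the assumed density, g the true one, and s_i and H_i are the score and Hessian of the log-likelihood for observation i.

```latex
\theta^{*} = \arg\max_{\theta} \; \mathrm{E}\big[ \log f(y \mid x; \theta) \big]
           = \arg\min_{\theta} \; \mathrm{E}\!\left[ \log \frac{g(y \mid x)}{f(y \mid x; \theta)} \right],
\qquad
\sqrt{N}\,\big( \hat{\theta} - \theta^{*} \big) \xrightarrow{d}
  \mathcal{N}\big( 0,\; A^{-1} B A^{-1} \big),
\quad
A = -\mathrm{E}\big[ H_i(\theta^{*}) \big],
\quad
B = \mathrm{E}\big[ s_i(\theta^{*})\, s_i(\theta^{*})' \big].
```

When the model is correctly specified, the information matrix equality gives A = B and the sandwich collapses to the usual A^{-1}. With cluster samples, the score s_i in B is replaced by the sum of the scores within each cluster.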
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2121
#13

07 Sep 2016, 06:05

Alfonso: One could account for the within-family correlation in estimation -- probably using a random effects probit specification -- but there is no need to do so for consistent estimation and proper inference. The tradeoff is that the pooled probit allows any within-cluster correlation while RE probit assumes a very specific structure. RE probit will be more efficient if the RE structure happens to be true; otherwise it is generally inconsistent.
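
One way to see why pooled probit remains consistent when an RE structure does hold is to integrate the household effect out of the response probability. The following is a sketch under the standard RE probit assumptions; the symbols c_j (household effect) and \sigma_c^2 (its variance) are introduced here for illustration.

```latex
y_{ij} = 1\big[ x_{ij}\beta + c_j + u_{ij} > 0 \big],
\quad c_j \mid x_j \sim \mathcal{N}(0, \sigma_c^2),
\quad u_{ij} \mid x_j, c_j \sim \mathcal{N}(0, 1)
\;\;\Longrightarrow\;\;
\Pr(y_{ij} = 1 \mid x_{ij}) = \Phi\!\left( \frac{x_{ij}\beta}{\sqrt{1 + \sigma_c^2}} \right).
```

The marginal response probability is still a probit, so pooled probit consistently estimates the scaled coefficients \beta / \sqrt{1 + \sigma_c^2}, and these are the quantities that determine the average partial effects.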
Alfonso Sánchez-Peñalver

Join Date: Mar 2014

Posts: 432
#14

07 Sep 2016, 08:05

Originally posted by Jeff Wooldridge

Alfonso: One could account for the within-family correlation in estimation -- probably using a random effects probit specification -- but there is no need to do so for consistent estimation and proper inference. The tradeoff is that the pooled probit allows any within-cluster correlation while RE probit assumes a very specific structure. RE probit will be more efficient if the RE structure happens to be true; otherwise it is generally inconsistent.

Hi Jeff. I always thought that when observations weren't independent the probit estimator would be inconsistent. I guess I was wrong. Why then the need for the assumption of independent observations? When learning the probit model, the first thing I learned was that the errors are independently and identically normally distributed with mean zero and standard deviation 1. As you can see, I'm now really confused about this.

The other question about whether to use a RE or a CRE estimator is whether there are unobserved household effects. To be concrete consider the choice of giving to charity, and that clearly the individual level of altruism is unobserved. One should expect the level of altruism to be pretty similar, or at least pretty related across household members, and quite different across households. When we have different values of the unobserved variable for each cluster, is pooled probit still consistent? Furthermore, altruism changes with age, as does the amount people give which clearly increases the probability of giving, so altruism is correlated with age, which is used as an explanatory variable. In this scenario clearly pooled probit is not consistent is it? One could further argue that the level of altruism changes the effect that higher income has on giving, and since households have different levels of altruism the effect of income will be different across households, which would imply that clearly the error term under pooled probit would be correlated with income, and thus pooled probit would also be inconsistent, wouldn't it?

So, to clarify: if the errors are correlated within clusters because of, for example, persistence in the choice within households, and the households' unobserved effects are all the same across households and uncorrelated with the explanatory variables, then pooled probit is consistent (as long as the model is otherwise correctly specified). Is my understanding now right?

Thanks!!!

Alfonso Sanchez-Penalver
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2121
#15

08 Sep 2016, 16:51

Alfonso: Probit with dependence across observations will be consistent essentially when OLS is. Regrettably, many instructors and authors use independence as a required assumption, and this unnecessarily limits the scope of applications. When one uses pooled probit -- or, for that matter, any pooled MLE -- one only needs the conditional distribution to be correctly specified: D(y(i,t)|x(i,t)) in the panel data case and D(y(c,h)|x(c,h)) in the cluster case, where c is the cluster and h is a unit within a cluster. Of course, one must adjust inference.

If a true RE probit model holds, then pooled probit is still consistent for the (scaled) parameters and average marginal effects, even if the household heterogeneity changes across all households. I agree that the CRE model is often preferred, but that has nothing to do with whether the household effect varies. It has to do with whether the covariates are correlated with the household effect. Anyway, the CRE model using the Mundlak-Chamberlain device can be estimated by pooled probit or RE probit, with pooled probit being more robust (because it is consistent for any within-household correlation structure).
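
A minimal sketch of the CRE approach estimated by pooled probit: add the household average of each individual-level covariate as an extra regressor (the Mundlak version of the device) and cluster at the household level. The simulated design, the seed, and the variable names (c, x, xbar) are assumptions made for illustration.

```stata
* Simulated data: 500 households with 4 members each (all values hypothetical)
clear
set seed 12345
set obs 500
generate household_id = _n
generate c = rnormal()                  // unobserved household effect
expand 4
generate x = 0.5*c + rnormal()          // covariate correlated with the household effect
generate y = (0.3*x + c + rnormal()) > 0

* Mundlak device: household mean of x enters as an additional regressor
egen xbar = mean(x), by(household_id)

* CRE probit estimated by pooled probit, clustered by household
probit y x xbar, vce(cluster household_id)
margins, dydx(x)
```

The same covariate list could be passed to xtprobit instead; per the post above, the pooled version is the more robust of the two.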

Pooled probit is never less robust than RE probit. In fact, it is more robust. If we use the CRE approach, pooled probit is still more robust.

I hope this helps. I have a fairly detailed discussion in my book and in various places online.

Best,
Jeff