Heckman + IV methods using cmp

Michael Thomson



19 Aug 2018, 22:37

I want to use the user-written cmp command to combine Heckman selection modelling with instrumental variables.

Could anyone verify that I've coded this correctly? In particular, should I use my selection variable (which indicates observations with no missing values) as the indicator for BOTH the first and second stages of the 2SLS, or should I just use $cmp_cont for the first stage?

The outcome is continuous but has missing data.
The endogenous variable is binary (but should be modelled as continuous to avoid the 'forbidden regression', if I understand correctly).
The instrument is an ordered categorical variable.

Code:

* cmp setup defines the globals $cmp_cont, $cmp_probit, etc.
cmp setup
cmp (outcome = endog_var `covariates') (selectvar = excludable_inst `covariates') (endog_var = i.instrument `covariates'), ind(selectvar $cmp_probit $cmp_cont)

I'm also confused about why cmp uses more observations than ivregress when I remove the selection equation. I wanted to double-check what effect my selection modelling was having on the estimates, but it doesn't make sense to me that cmp uses nearly all my observations while ivregress correctly acknowledges that roughly 25% of my dependent variable observations are missing. How could I use cmp to mirror what ivregress does?

Last edited by Michael Thomson; 19 Aug 2018, 22:59.
David Roodman


20 Aug 2018, 21:01

It's OK to model the endogenous variable as probit, if you think that's a good model for it. Doing so does not constitute a forbidden regression when all the equations are estimated simultaneously. But it is also OK to model it as linear: that is more robust to violations of the probit distributional assumption, but less efficient. I discuss this in my replication, with Jonathan Morduch, of Pitt & Khandker (1998).
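A minimal sketch of the two specifications, on simulated data. All variable names (y, d, x, z) and the data-generating process are hypothetical; it assumes cmp is installed from SSC and Stata 14 or later (for runiformint()). The only difference between the two cmp calls is the indicator for the endogenous regressor's equation.

```stata
* Hypothetical simulated data: binary endogenous regressor d,
* ordered categorical instrument z, continuous outcome y.
clear
set seed 12345
set obs 2000
gen z = runiformint(0, 3)                       // ordered categorical instrument
gen x = rnormal()                               // exogenous covariate
gen u = rnormal()                               // shared shock that makes d endogenous
gen d = (0.5*z + 0.3*x + u + rnormal() > 1)     // binary endogenous regressor
gen y = 1 + 2*d + 0.5*x + u + rnormal()         // continuous outcome

cmp setup                                       // defines $cmp_cont, $cmp_probit, ...

* (a) d modelled as probit: more efficient if the probit assumption holds
cmp (y = d x) (d = i.z x), indicators($cmp_cont $cmp_probit)

* (b) d modelled as linear: more robust to a wrong probit assumption, less efficient
cmp (y = d x) (d = i.z x), indicators($cmp_cont $cmp_cont)
```

Comparing the coefficient on d across (a) and (b) gives a rough sense of how much the distributional assumption matters for a given dataset.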

cmp allows different equations to have different samples. What then is the size of "the" sample? cmp counts any observation for which at least one equation is complete as part of the overall sample. Return values e(N1), e(N2), ... give sample sizes for each equation.
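To see these per-equation samples, one can compare ivregress with cmp on simulated data where roughly a quarter of the outcome is missing. This is a sketch with hypothetical variable names and data; it relies only on the e(N), e(N1) and e(N2) return values described above.

```stata
* Hypothetical simulated data with roughly 25% of the outcome missing
clear
set seed 2018
set obs 2000
gen z = runiformint(0, 3)                       // instrument
gen x = rnormal()                               // exogenous covariate
gen u = rnormal()                               // shared shock (endogeneity)
gen d = 0.5*z + 0.3*x + u + rnormal()           // endogenous regressor (linear)
gen y = 1 + 2*d + 0.5*x + u + rnormal()
replace y = . if runiform() < 0.25              // outcome missing for ~25% of rows

cmp setup

* ivregress uses only rows where every variable is non-missing
ivregress 2sls y x (d = i.z)
display "ivregress N:   " e(N)

* cmp fits each equation on the rows where that equation is complete
cmp (y = d x) (d = i.z x), indicators($cmp_cont $cmp_cont)
display "cmp overall N: " e(N)     // rows where at least one equation is complete
display "outcome eq N:  " e(N1)    // rows with y observed
display "first-stage N: " e(N2)    // all rows, since d, x and z are never missing
```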

If you want, you can restrict the sample of the instrumenting equation to selectvar*$cmp_cont (or selectvar*$cmp_probit if you model it as probit). But you don't have to. Again, it depends on what you think the right model is. If you think it's reasonable to assume that the same equation holds, with the same coefficients, for the full sample, then you don't need to restrict the sample for this equation.
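A sketch of restricting the instrumenting equation's sample, on simulated data where the outcome is observed only when a hypothetical selection dummy s equals 1. Building the indicator as a variable (ind_iv) is a choice of convenience and should be equivalent to writing selectvar*$cmp_cont inside indicators(). The last part drops the selection equation and restricts both equations to the selected rows, which should reproduce the estimation sample that ivregress uses. All names and the data-generating process are assumptions for illustration.

```stata
* Hypothetical simulated data with Heckman-style selection
clear
set seed 42
set obs 2000
gen z = runiformint(0, 3)                       // instrument for d
gen w = rnormal()                               // enters the selection equation only
gen x = rnormal()                               // exogenous covariate
gen u = rnormal()                               // shock shared by all equations
gen d = 0.5*z + 0.3*x + u + rnormal()           // endogenous regressor (linear)
gen byte s = (0.5 + w + 0.3*x + 0.5*u + rnormal() > 0)   // 1 = outcome observed
gen y = 1 + 2*d + 0.5*x + u + rnormal() if s    // y missing when s == 0

cmp setup
gen byte ind_iv = s * $cmp_cont                 // first stage only where s == 1

* Selection model with the first stage restricted to the selected sample
cmp (y = d x) (s = w x) (d = i.z x), indicators(s $cmp_probit ind_iv)

* Mirror ivregress: no selection equation, both equations on s == 1 rows
ivregress 2sls y x (d = i.z)
scalar N_iv = e(N)
cmp (y = d x) (d = i.z x), indicators($cmp_cont ind_iv)
display "ivregress N = " N_iv "   cmp N = " e(N)
```

Dropping ind_iv in favour of $cmp_cont in the selection model would instead let the first stage use all rows, which is the unrestricted choice described above.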

--David