I want to use the user-written cmp command to combine Heckman selection modelling with instrumental variables.
Could anyone verify I've coded this correctly? In particular, should I be using my selection variable (indicating all non-missing variables) as the "indicator" for BOTH the first and second stages of the 2SLS, or should I just be using $cmp_cont?
The outcome is continuous but has missing data.
The endogenous variable is binary (but, if I understand correctly, it should be modelled as continuous to avoid the 'forbidden regression').
The instrument is an ordered categorical variable.
I'm also confused about why cmp uses more observations than ivregress when I remove the selection equation. I wanted to double-check what effect my selection modelling was having on the estimates, but it doesn't make sense to me that cmp uses nearly all my observations while ivregress correctly acknowledges that roughly 25% of my dependent variable observations are missing. How could I use cmp to mirror what ivregress does?
Code:
cmp (outcome = endog_var `covariates') (selectvar = excludable_inst `covariates') (endog_var = i.instrument `covariates'), ind(selectvar $cmp_probit $cmp_cont)