2SLS with sample selection on STATA

Nina Yatas

Join Date: Aug 2017

Posts: 6
#1

05 Sep 2017, 09:39

Dear all,

I am running a 2SLS model in Stata while correcting for sample selection, so I am bootstrapping the whole process. However, when I run the ivregress 2sls y.... command with bootstrapping, the output does not report the first-stage regression results. Is there a way to get those results?

Thank you.
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#2

06 Sep 2017, 14:44

You didn't get a quick answer. You're most likely to get useful help if you follow the FAQ on asking questions - provide Stata code in code delimiters, readable Stata output, and sample data using dataex.

The only obvious thing is to specify first as an option. However, it appears Stata does not report the first stage when you bootstrap. If all you want is the betas, you can use a different option for the standard errors; the first option will then report the first stage, and the betas are the same as under bootstrap because the choice of standard errors does not change the point estimates. I guess you could also use the bootstrap routine with regress to get bootstrapped first-stage results.
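A minimal sketch of that suggestion, using Stata's bundled auto dataset as a stand-in (price, mpg, weight and displacement are placeholders, not the variables in this thread): the first call shows the first-stage coefficients with conventional standard errors, and the second bootstraps the first-stage regression on its own. The reps(200) value is an arbitrary choice for illustration.

```stata
sysuse auto, clear

* 2SLS with default standard errors: the -first- option prints the first stage.
* Point estimates are the same whichever vce() is used.
ivregress 2sls price weight (mpg = displacement), first

* Bootstrap standard errors for the first-stage regression, run separately.
* The first stage regresses the endogenous variable on all exogenous
* regressors plus the excluded instrument.
bootstrap, reps(200) seed(1010): regress mpg weight displacement
```

Using the same seed() for every bootstrapped equation draws the same resamples, provided the data are in the same order each time.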
Nina Yatas

Join Date: Aug 2017

Posts: 6
#3

24 Sep 2017, 07:41

Thank you Phil.

Yes, the "first" option does not give me the first-stage results in ivregress while bootstrapping.

I have tried running the model manually with the following:
regress H x1 x2 x3, vce(bootstrap)
predict Hhat, xb
regress W Hhat x1 x2, vce(bootstrap)

But the results are quite different from those I get when running the model with "ivregress".

So I am assuming there is no other way to get the first-stage results in Stata?
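One way to check where such a difference comes from, sketched here on Stata's bundled auto dataset (price, mpg, weight and displacement are stand-ins, not the variables in this thread). When the first stage includes every exogenous regressor plus the instrument, the manual second-stage coefficient should match ivregress 2sls. Only the standard errors should differ, because the manual second stage treats the fitted values as if they were observed data.

```stata
sysuse auto, clear

* Reference: 2SLS in one step
ivregress 2sls price weight (mpg = displacement)
scalar b_iv = _b[mpg]

* Manual two-step version with the same exogenous regressor and instrument
regress mpg weight displacement
predict double mpg_hat, xb
regress price weight mpg_hat

* Compare the two point estimates for the endogenous regressor
display "ivregress: " b_iv "   manual: " _b[mpg_hat]
```

If the coefficients themselves differ in your own data, the usual suspects are a different estimation sample or a different set of regressors in the two stages.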
FernandoRios

Join Date: Apr 2014

Posts: 2420
#4

24 Sep 2017, 11:34

Hi Nina,
So, if I understand correctly, you want to handle both sample selection and endogeneity. In the latest code you provided, I didn't see any indication of how you were introducing the selection term into the equation.
I believe there are at least two ways to do this. If you have access to Stata 15, the new extended regression command -eregress- allows you to address selection and endogeneity simultaneously.
The user-written command CMP also allows you to estimate the whole model, as long as you specify the selection and endogeneity models.

A more hands-on approach would be to write your own program so you can estimate the whole reduced-form set of equations using -ml-. This would be like doing the two-step procedure, but with the standard errors corrected automatically.

You can also use bootstrap to obtain the same thing, but I would say you will need to program the whole process and keep the same seed. So if you really want to see what happens in the auxiliary regressions, you can model them individually.
Hope this helps.
F
Nina Yatas

Join Date: Aug 2017

Posts: 6
#5

25 Sep 2017, 04:15

Hi Fernando,

I am sorry I have not been clear in my posts. I am aware of the CMP model and the eregress command (although I do not have access to Stata 15 yet).

I am running a 2SLS model as well as using CMP, and comparing results between the two models. I have to stick to that because these are corrections to submitted papers.
The correction required is mainly to account for selection in my models, which were already accounting for endogeneity using 2SLS and CMP.

As I have understood from other posts, in the 2SLS model I can do the following:

probit selection x1 x2 z1, vce(bootstrap)
predict double sstar, xb
generate double IMILLS = normalden(sstar)/normal(sstar)
ivregress 2sls wages x1 x2 IMILLS (health = z2), first vce(bootstrap)

But that is when Stata does not report the reduced-form health equation results.

I would be grateful if you could clarify what you mean by programming the whole process. Do you mean writing my own program that estimates each equation sequentially?

Thanks,
N.

FernandoRios

Join Date: Apr 2014
Posts: 2420

25 Sep 2017, 06:59

Hi Nina,
I see your problem better now. My suggestion is to program the whole system using ml (I am not sure how familiar you are with that process in Stata) or to "bootstrap" each section sequentially in Stata using the same seed.
For the first suggestion, this is what I would do:

Code:

program iv_sel
    args lnf xb1 g11 g12 lns1 zb2 g21 lns2 wb1
    qui {
        tempvar lnf1 lnf2 lnf3 mills
        ** Selection model (probit)
        gen double `lnf1' = ($ML_y3==1)*ln(normal(`wb1')) + ($ML_y3==0)*ln(1-normal(`wb1'))
        ** Inverse Mills ratio from the selection index
        gen double `mills' = normalden(`wb1')/normal(`wb1')
        ** First-stage regression for the endogenous variable
        gen double `lnf2' = ln(normalden($ML_y2, `zb2'+`g21'*`mills', exp(`lns2'))) if $ML_y3==1
        ** Main outcome model
        gen double `lnf3' = ln(normalden($ML_y1, `xb1'+`g11'*($ML_y2-(`zb2'+`g21'*`mills'))+`g12'*`mills', exp(`lns1'))) if $ML_y3==1
        ** Add up the log-likelihood contributions
        replace `lnf' = `lnf1'
        replace `lnf' = `lnf'+`lnf2'+`lnf3' if $ML_y3==1
    }
end

* This model could be estimated using commands like:
* Step 1: constrain the cross-equation terms to zero to get starting values
constraint 1 [g11]_cons = 0
constraint 2 [g12]_cons = 0
constraint 3 [g21]_cons = 0
ml model lf iv_sel (xb1: wages = x1 x2 imills health) (g11:) (g12:) (lns1:) (zb2: health = x1 x2 z2) (g21:) (lns2:) (sel: selection = x1 x2 z1), maximize constraints(1 2 3) missing
matrix b = e(b)
* Step 2: re-estimate the full model starting from the step-1 estimates
ml model lf iv_sel (xb1: wages = x1 x2 imills health) (g11:) (g12:) (lns1:) (zb2: health = x1 x2 z2) (g21:) (lns2:) (sel: selection = x1 x2 z1), maximize init(b, skip) vce(robust) missing
ml display

The main advantage of doing this is that it is easy to track how you are dealing with both endogeneity and selection. The drawback is that it can sometimes be hard to find good initial values to start the estimation. That is not a serious problem in itself, since "good" initial values can be obtained by running each step individually. The alternative, as I put in the code, is to first estimate the equations independently and use those estimates as initial values in the main program.
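For reference, the per-observation log likelihood that the iv_sel program evaluates can be written out as follows, as I read the code. The symbols are shorthand introduced here, not notation from the thread: s_i is selection, h_i is health, y_i is wages, w_i'γ is the selection index (wb1), z_i'π is zb2, x_i'β is xb1, σ_1 = exp(lns1) and σ_2 = exp(lns2).

```latex
\begin{aligned}
\lambda_i &= \frac{\phi(w_i'\gamma)}{\Phi(w_i'\gamma)}, \qquad
\hat h_i = z_i'\pi + g_{21}\lambda_i, \\
\ell_i &= s_i \ln \Phi(w_i'\gamma) + (1 - s_i)\ln\bigl(1 - \Phi(w_i'\gamma)\bigr) \\
&\quad + s_i \ln\left[\frac{1}{\sigma_2}\,
   \phi\!\left(\frac{h_i - \hat h_i}{\sigma_2}\right)\right] \\
&\quad + s_i \ln\left[\frac{1}{\sigma_1}\,
   \phi\!\left(\frac{y_i - x_i'\beta - g_{11}(h_i - \hat h_i) - g_{12}\lambda_i}{\sigma_1}\right)\right]
\end{aligned}
```

The first line of the log likelihood is the probit selection model, the second is the first-stage equation for the endogenous variable, and the third is the outcome equation with the first-stage residual and the inverse Mills ratio as control terms. The last two terms contribute only for selected observations.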

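The second option is to bootstrap the whole process using the same seed.
The selection model and the reduced-form first-stage model do not strictly need to be bootstrapped, since probit and heckman already report their own standard errors, but running them with the same seed keeps them comparable with the full system: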

Code:

bootstrap, seed(1010): probit selection x1 x2 z1
bootstrap, seed(1010): heckman health x1 x2 z2, select(selection = x1 x2 z1)

* Full system:
program bs2ssel, eclass
    * Selection equation and inverse Mills ratio
    probit selection x1 x2 z1
    capture drop sstar
    predict sstar, xb
    capture drop mill
    gen mill = normalden(sstar)/normal(sstar)
    * First stage on the selected sample
    reg health x1 x2 z2 mill if selection == 1
    capture drop h_hat
    predict h_hat
    * Second stage; bootstrap collects the coefficients of this last regression
    reg wages x1 x2 h_hat mill if selection == 1
end

bootstrap, seed(1010): bs2ssel
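As far as I can tell, bootstrapping bs2ssel this way reports only the coefficients of the last regression in the program. If you want the first-stage and second-stage results in a single bootstrap run, one possible variant is an rclass program that returns the coefficients of interest as scalars. The sketch below runs on simulated data; the data-generating process, the program name bs2ssel_all, the returned scalar names and reps(200) are all assumptions made for illustration.

```stata
clear all
set seed 1010
set obs 2000

* Simulated data (hypothetical coefficients, variable names as in the thread)
gen x1 = rnormal()
gen x2 = rnormal()
gen z1 = rnormal()
gen z2 = rnormal()
gen u  = rnormal()
gen selection = (0.5 + 0.5*x1 - 0.5*x2 + z1 + 0.5*u + rnormal() > 0)
gen health = 1 + x1 + x2 + z2 + u + rnormal()
gen wages  = 1 + x1 + x2 + 0.5*health + u + rnormal() if selection == 1

program define bs2ssel_all, rclass
    tempvar sstar mill h_hat
    * Selection equation and inverse Mills ratio
    probit selection x1 x2 z1
    predict double `sstar', xb
    gen double `mill' = normalden(`sstar')/normal(`sstar')
    * First stage: keep the instrument and Mills-ratio coefficients
    regress health x1 x2 z2 `mill' if selection == 1
    return scalar fs_z2   = _b[z2]
    return scalar fs_mill = _b[`mill']
    predict double `h_hat', xb
    * Second stage: keep the health and Mills-ratio coefficients
    regress wages x1 x2 `h_hat' `mill' if selection == 1
    return scalar ss_health = _b[`h_hat']
    return scalar ss_mill   = _b[`mill']
end

* One bootstrap run that reports both stages
bootstrap fs_z2=r(fs_z2) fs_mill=r(fs_mill) ///
          ss_health=r(ss_health) ss_mill=r(ss_mill), ///
          reps(200) seed(1010): bs2ssel_all
```

Because every step is re-estimated inside each replication, the reported standard errors reflect the estimation error in the Mills ratio and in the first-stage fitted values. Add more return scalar lines if you need the remaining coefficients.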

Hope this helps
