  • 2SLS with sample selection on STATA

    Dear all,

    I am running a 2SLS model in Stata while correcting for sample selection, so I am bootstrapping the whole process. However, when I run the ivregress 2sls y.... command with bootstrapping, the first-stage regression results are not reported. Is there a way to get those results?

    Thank you.

  • #2
    You didn't get a quick answer. You're most likely to get useful help if you follow the FAQ on asking questions - provide Stata code in code delimiters, readable Stata output, and sample data using dataex.

    The only obvious thing is to specify first as an option. However, it appears that Stata does not report the first stage when the standard errors are bootstrapped. If all you want is the coefficients, you can use a different option for the standard errors: with the first option you will get the same coefficients as with bootstrap, because bootstrapping changes only the standard errors. I guess you could also use the bootstrap prefix with regress to get the first-stage results.
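
    A minimal sketch of that idea, using the auto dataset shipped with Stata. The variables price, weight, mpg and displacement, and reps(200), are illustrative choices only and have nothing to do with the original poster's model:

    Code:
    sysuse auto, clear
    * 2SLS with conventional standard errors: the first option prints the first stage
    ivregress 2sls price weight (mpg = displacement), first
    * The same first-stage equation, bootstrapped on its own with regress
    bootstrap _b, reps(200) seed(1010): regress mpg weight displacement

    The first-stage coefficients are the same in both outputs; only the standard errors differ.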


    • #3
      Thank you Phil.

      Yes, the "first" option does not give me the first-stage results in ivregress when bootstrapping.

      I have tried running the model manually as follows:
      regress H x1 x2 x3, vce(bootstrap)
      predict Hhat, xb
      regress W Hhat x1 x2, vce(bootstrap)

      But the results are quite different from those I get with "ivregress".

      So I am assuming there is no other way to get the first-stage results in Stata?


      • #4
        Hi Nina,
        So if I understand correctly, you want to handle both sample selection and endogeneity. In the latest code you provided, I didn't see any indication of how you were introducing the selection term into the equation.
        I believe there are at least two ways to do this. If you have access to Stata 15, the new extended regression command -eregress- allows you to address selection and endogeneity simultaneously.
        The user-written command -cmp- also allows you to estimate the whole model, as long as you specify the selection and endogeneity equations.
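
        As a rough sketch of the -eregress- route only (Stata 15 or later). The simulated data, the variable names, and the assumption that health is observed for everyone while wages is observed only for the selected sample are all hypothetical, chosen just to make the example run:

        Code:
        * Simulated data for illustration only (hypothetical data-generating process)
        clear
        set seed 1010
        set obs 2000
        generate x1 = rnormal()
        generate x2 = rnormal()
        generate z1 = rnormal()
        generate z2 = rnormal()
        generate u = rnormal()
        generate v = 0.5*u + rnormal()
        generate selection = (0.5*x1 - 0.5*x2 + z1 + u > 0)
        generate health = 1 + x1 + x2 + z2 + v
        generate wages = 1 + x1 - x2 + 0.5*health + 0.5*v + 0.5*u + rnormal() if selection == 1
        * Outcome equation with an endogenous covariate and a probit selection equation
        eregress wages x1 x2, endogenous(health = x1 x2 z2) select(selection = x1 x2 z1)

        The output reports the wage equation, the health equation and the selection equation together, so the auxiliary equations are visible without any extra steps.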

        A more hands-on approach would be to write your own program so that you can estimate the whole reduced-form set of equations using -ml-. This would be like doing the two-step procedure, but with the standard errors corrected automatically.

        You can also use bootstrap to obtain the same result, but I would say you will need to program the whole process and keep the same seed. If you really want to see what happens in the auxiliary regressions, you can model them individually.
        Hope this helps.
        F


        • #5
          Hi Fernando,

          I am sorry I have not been clear in my posts. I am aware of the CMP model and the eregress command (although I do not have access to Stata 15 yet).

          I am running a 2SLS model as well as CMP, and comparing the results between the two. I have to stick to that because these are corrections to submitted papers.
          The main correction required is to account for selection in my models, which already account for endogeneity using 2SLS and CMP.

          As I have understood from other posts, in the 2SLS model I can do the following:

          probit selection x1 x2 z1, vce(bootstrap)
          predict double zg, xb
          generate double IMILLS = normalden(zg)/normal(zg)
          ivregress 2sls wages x1 x2 IMILLS (health = z2), first vce(bootstrap)

          But that is when Stata does not report the reduced-form health equation results.

          I would be grateful if you could clarify what you mean by programming the whole process. Do you mean writing my own program that estimates each equation sequentially?


          Thanks,
          N.


          • #6
            Hi Nina,
            I see your problem better now. My suggestion is to program the whole system using -ml- (I am not sure how familiar you are with that process in Stata), or to bootstrap each part sequentially in Stata using the same seed.
            For the first suggestion, this is what I would do:

            Code:
            * Joint log likelihood: probit selection, first stage, and outcome equation
            program define iv_sel
                args lnf xb1 g11 g12 lns1 zb2 g21 lns2 wb1
                quietly {
                    tempvar lnf1 lnf2 lnf3 mills
                    ** Selection model
                    gen double `lnf1' = ($ML_y3==1)*ln(normal(`wb1')) + ($ML_y3==0)*ln(1-normal(`wb1'))
                    ** First-stage regression for the endogenous variable
                    gen double `mills' = normalden(`wb1')/normal(`wb1')
                    gen double `lnf2' = ln(normalden($ML_y2, `zb2'+`g21'*`mills', exp(`lns2'))) if $ML_y3==1
                    ** Main outcome model; the first-stage residual enters with coefficient g11
                    gen double `lnf3' = ln(normalden($ML_y1, `xb1'+`g11'*($ML_y2-(`zb2'+`g21'*`mills'))+`g12'*`mills', exp(`lns1'))) if $ML_y3==1
                    ** Add up all log-likelihood contributions
                    replace `lnf' = `lnf1'
                    replace `lnf' = `lnf'+`lnf2'+`lnf3' if $ML_y3==1
                }
            end

            * This model could be estimated using commands like the following.
            * Step 1: constrain the correction terms to zero to obtain starting values
            constraint 1 [g11]_cons = 0
            constraint 2 [g12]_cons = 0
            constraint 3 [g21]_cons = 0
            ml model lf iv_sel (xb1:wages=x1 x2 imills health) (g11:) (g12:) (lns1:) (zb2:health=x1 x2  z2) (g21:) (lns2:) (sel:selection=x1 x2 z1), maximize constraints(1 2 3) missing
            matrix b = e(b)
            * Step 2: drop the constraints and start from the step 1 estimates
            ml model lf iv_sel (xb1:wages=x1 x2 imills health) (g11:) (g12:) (lns1:) (zb2:health=x1 x2  z2) (g21:) (lns2:) (sel:selection=x1 x2 z1), maximize init(b, skip) vce(robust) missing
            ml display

            The main advantage of doing this is that it is easy to track how you are dealing with both endogeneity and selection. The drawback is that it can sometimes be hard to find good initial values to start the estimation. That is not a problem in itself, since "good" initial values can be obtained by running each step individually. The alternative, as in the code, is to first estimate the equations independently to obtain the initial values that are later used in the main program.

            The second option is to bootstrap the whole process using the same seed.
            The selection model and the reduced-form first-stage equation do not need a wrapper program; built-in commands can be bootstrapped directly:
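            Code:
            * twostep makes heckman match the manual two-step procedure in the program below
            bootstrap, seed(1010): probit selection x1 x2 z1
            bootstrap, seed(1010): heckman health x1 x2 z2, select(selection = x1 x2 z1) twostep

            * Full system:
            program define bs2ssel, eclass
                * Selection equation and inverse Mills ratio
                probit selection x1 x2 z1
                capture drop sstar
                predict double sstar, xb
                capture drop mills
                gen double mills = normalden(sstar)/normal(sstar)
                * First stage, including the Mills ratio
                reg health x1 x2 z2 mills
                capture drop h_hat
                predict double h_hat
                * Second stage
                reg wages x1 x2 h_hat mills
            end

            * nodrop keeps observations outside e(sample), so the probit is re-estimated on the full sample
            bootstrap, seed(1010) nodrop: bs2ssel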
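
            Note that the program bs2ssel leaves only the final wage regression in e(b), so bootstrap reports only the second stage. A possible variant that also keeps the first-stage coefficients is sketched below. The program name bs2s_all, the equation labels first and second, the simulated data, and reps(200) are all illustrative assumptions:

            Code:
            * Simulated data for illustration only (hypothetical data-generating process)
            clear
            set seed 1010
            set obs 2000
            generate x1 = rnormal()
            generate x2 = rnormal()
            generate z1 = rnormal()
            generate z2 = rnormal()
            generate u = rnormal()
            generate v = 0.5*u + rnormal()
            generate selection = (0.5*x1 - 0.5*x2 + z1 + u > 0)
            generate health = 1 + x1 + x2 + z2 + v if selection == 1
            generate wages = 1 + x1 - x2 + 0.5*health + 0.5*v + 0.5*u + rnormal()

            capture program drop bs2s_all
            program define bs2s_all, eclass
                tempname b1 b2 b
                * Selection equation and inverse Mills ratio
                probit selection x1 x2 z1
                capture drop sstar
                predict double sstar, xb
                capture drop mills
                generate double mills = normalden(sstar)/normal(sstar)
                * First stage: store the coefficients under the equation label "first"
                regress health x1 x2 z2 mills
                matrix `b1' = e(b)
                matrix coleq `b1' = first
                capture drop h_hat
                predict double h_hat, xb
                * Second stage: store the coefficients under the equation label "second"
                regress wages x1 x2 h_hat mills
                matrix `b2' = e(b)
                matrix coleq `b2' = second
                * Post both coefficient vectors together so that bootstrap collects all of them
                matrix `b' = (`b1', `b2')
                ereturn post `b'
            end

            bootstrap _b, reps(200) seed(1010) nodrop: bs2s_all

            The bootstrap table then has one block of coefficients per stage, each with bootstrap standard errors from the same replications.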
            Hope this helps




