
  • ivprobit twostep show residuals

    Dear all,

    I am using ivprobit twostep to estimate the causal effect of x on y, using z as an instrument for the endogenous variable x.
    If I understand correctly, the two-step approach includes the residuals from the first-stage regression as a control variable in the second-stage regression.
    If I perform the two-step procedure by hand, the second-stage results include coefficients on both the residuals and the endogenous variable x. However, if I use the ivprobit twostep command, I get the same coefficient on x but no coefficient on the residuals.
    I would like to report the coefficient and the (correct) standard error of the residuals in my paper. Is there any way to display the coefficient and standard error of the residuals in the second stage when using the ivprobit twostep command in Stata?
    (If I compute the coefficient on the residuals in the second stage by hand, the standard errors are probably incorrect.)

    I hope I have explained my problem clearly. Thank you in advance for your help.
    Last edited by steffi sundi; 09 Oct 2024, 03:25.
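A minimal sketch of the control-function model behind the two-step idea described in the question, assuming one continuous endogenous regressor x, exogenous controls w (including a constant), an instrument z, and errors (u, v) that are jointly normal and independent of (z, w). The symbols w, π, ρ, σ_e and θ are introduced here for illustration only:

```latex
\begin{aligned}
x &= z\pi_1 + w\pi_2 + v, \\
y &= \mathbf{1}\left[\, x\beta + w\gamma + u > 0 \,\right], \qquad u = \rho v + e, \quad e \mid z, w, v \sim N(0, \sigma_e^2), \\
\Pr(y = 1 \mid x, w, v) &= \Phi\!\left( x\frac{\beta}{\sigma_e} + w\frac{\gamma}{\sigma_e} + \theta v \right), \qquad \theta = \frac{\rho}{\sigma_e}.
\end{aligned}
```

Replacing v with the first-stage residual and running a probit of y on x, w and that residual estimates the scaled coefficients β/σ_e, γ/σ_e and θ. Because the residual is itself estimated, the usual probit standard errors ignore first-stage sampling error; bootstrapping both steps together is one way to obtain valid standard errors.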

  • #2
    Why would you want to do that? The residual coefficient is a nuisance parameter in the two-step estimation, and that is why Stata does not report it. You could bootstrap the two-stage procedure, but reporting the residual coefficient would just look odd.

    Code:
    webuse laborsup, clear
    ivprobit fem_work fem_educ kids (other_inc = male_educ), twostep vce(bootstrap, seed(10092024))
    gen sample= e(sample)
    
    cap prog drop mybootstrap_prog
    prog mybootstrap_prog, eclass
    qui sureg (other_inc =  fem_educ kids male_educ) if sample
    *PREDICT RESIDUALS
    predict res, r
    *SECOND STAGE PROBIT
    probit fem_work other_inc fem_educ kids res if sample
    cap drop res
    end
    
    bootstrap _b, reps(50) nowarn nodots seed(10092024): mybootstrap_prog
    Results:

    Code:
    . ivprobit fem_work fem_educ kids (other_inc = male_educ), twostep vce(bootstrap, seed(10092024))
    (running ivprobit on estimation sample)
    
    Bootstrap replications (50): .........10.........20.........30.........40.........50 done
    
    Two-step probit with endogenous regressors        Number of obs   =        500
                                                      Wald chi2(3)    =     100.84
                                                      Prob > chi2     =     0.0000
    
    ------------------------------------------------------------------------------
                 |   Observed   Bootstrap                         Normal-based
                 | coefficient  std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
       other_inc |   -.058473   .0098039    -5.96   0.000    -.0776883   -.0392576
        fem_educ |    .227437   .0274044     8.30   0.000     .1737254    .2811485
            kids |  -.1961748   .0464118    -4.23   0.000    -.2871402   -.1052093
           _cons |   .3956061   .4222941     0.94   0.349    -.4320751    1.223287
    ------------------------------------------------------------------------------
    Endogenous: other_inc
    Exogenous:  fem_educ kids male_educ
    
    . bootstrap _b, reps(50) nowarn nodots seed(10092024): mybootstrap_prog
    
    Probit regression                                       Number of obs =    500
                                                            Replications  =     50
                                                            Wald chi2(4)  = 149.46
                                                            Prob > chi2   = 0.0000
    Log likelihood = -252.04529                             Pseudo R2     = 0.2687
    
    ------------------------------------------------------------------------------
                 |   Observed   Bootstrap                         Normal-based
        fem_work | coefficient  std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
       other_inc |   -.058473   .0098039    -5.96   0.000    -.0776883   -.0392576
        fem_educ |    .227437   .0274044     8.30   0.000     .1737254    .2811485
            kids |  -.1961748   .0464118    -4.23   0.000    -.2871402   -.1052093
             res |   .0240492   .0088876     2.71   0.007     .0066298    .0414685
           _cons |   .3956061   .4222941     0.94   0.349    -.4320751    1.223287
    ------------------------------------------------------------------------------



    • #3
      Thank you very much for your response! That helped a lot!
      Indeed, I was wondering whether the coefficient on the residuals in the second stage, and its significance, would in some sense "confirm" the endogeneity of my explanatory variable x. However, you're right that it is unusual to report coefficients on residuals! Thank you once again!



      • #4
        In addition to testing the null hypothesis that x is exogenous, there is another good reason to look at the coefficient on the control function (the residual): its sign tells you the direction of the endogeneity and acts as a logical check. For example, in a wage equation where education is thought to be endogenous, it seems likely that the unobserved factors that influence schooling are positively correlated with the unobserved factors that affect wages. The sign of the coefficient on the CF provides that information. If the sign were negative in this example, one should be suspicious and probably question the exogeneity of the instrumental variable. Typically, one has a hypothesis about which direction the endogeneity will go if it is present.
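A minimal sketch of that check on the laborsup example from #2, assuming the same specification; the variable name `vhat` is hypothetical. Under the null that other_inc is exogenous, the ordinary probit z statistic on the residual is a valid test, so no bootstrap is needed for the test itself (the bootstrap is still needed for a standard error that is valid when the null is false).

```stata
* Sketch: control-function test of exogeneity and its sign (laborsup example)
webuse laborsup, clear

* First stage: OLS of the endogenous regressor on all exogenous variables
quietly regress other_inc fem_educ kids male_educ
predict double vhat, residuals

* Second stage: probit with the first-stage residual as a control function
probit fem_work other_inc fem_educ kids vhat

* H0: coefficient on vhat = 0, i.e. other_inc is exogenous
test vhat

* The sign of the residual coefficient indicates the direction of the endogeneity
display "CF coefficient on vhat: " %9.6f _b[vhat]
```

Because a single-equation sureg reduces to OLS, `regress` yields the same residuals as the program in #2.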

        One other comment: the magnitudes of the coefficients don't tell you the magnitude of the effect. You should also compute the average partial effect using the margins command.

        Code:
        margins, predict(pr) dydx(*)
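If the standard error of the average partial effect should also reflect the first-stage estimation, one option is to compute the APE from the control-function probit inside a bootstrapped program, in the spirit of #2. This is a sketch assuming the same laborsup specification; the program name `cf_ape` and the variable `vhat` are hypothetical. `margins` treats `vhat` as a given regressor, so the derivative with respect to other_inc is averaged over the estimated residuals.

```stata
* Sketch: bootstrap the APE of other_inc from the control-function probit
webuse laborsup, clear

capture program drop cf_ape
program cf_ape, eclass
    capture drop vhat
    * First stage and control-function residual
    quietly regress other_inc fem_educ kids male_educ
    quietly predict double vhat, residuals
    * Second-stage probit with the residual as an extra regressor
    quietly probit fem_work other_inc fem_educ kids vhat
    * APE of other_inc on Pr(fem_work = 1), posted to e(b) for bootstrap
    quietly margins, dydx(other_inc) post
    drop vhat
end

bootstrap _b, reps(50) seed(10092024) nodots: cf_ape
```

Since the probit density is positive, the APE has the same sign as the coefficient on other_inc.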

