  • Manual 2SLS as simply as humanly possible

    Hi all,

    I have what I think is a very simple question but I have not seen a generalizable answer to this on StataList.

    If you want to run 2SLS, you can do so very easily with Procedure 1:

    Code:
    ivreg y1 x1 (x2 = z1 z2)
    But if I want to do so in two steps as below in Procedure 2, the standard errors will obviously be wrong.

    Code:
    reg x2 x1 z1 z2
    predict x2hat, xb
    reg y1 x1 x2hat
    Is there a quick (or less quick) way to force the standard errors in Procedure 2 to the correct values taken from Procedure 1? Similarly, what if the regression I wanted to complete manually was

    Code:
    ivreg y1 x1 (x2 = z1 z2), r

    Would that have a simple fix as well? Hope I'm not violating any StataList Guidelines here!
    Thanks so much in advance!

    Best,
    Chuck

  • #2
    You can bootstrap the procedure, and the resulting bootstrap standard errors should be similar to the estimated robust standard errors.



    • #3
      Hi Andrew, this is informative. I've heard of bootstrapping, but what is the procedure in Stata to do this for this simple example? Or are there at least good resources on this? Thank you for your time.

      Best,
      Chuck



      • #4
        Hi Andrew, this is informative. I've heard of bootstrapping, but what is the procedure in Stata to do this for this simple example? Or are there at least good resources on this? I suppose my question is really this:

        Is it sufficient for me to just change the last stage of my regression to

        Code:
        ivreg y1 x1 x2hat, vce(bootstrap, reps(1000))

        Thank you for your time.

        Best,
        Chuck



        • #5
          Chuck, that won't do it, for two reasons. The first is easy to fix: you want reg in place of ivreg. The second is fundamental: the command will not account for the first-step estimation, which means it will roughly give you the same standard errors as if you do

          Code:
          reg x2 x1 z1 z2
          predict x2hat
          reg y1 x1 x2hat, robust
          Unfortunately, adding "robust" does not account for the sampling error in the first-stage regression.
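For the plain (non-robust) case in #1, the second-stage OLS standard errors are off only by a scale factor, because regress computes the residual variance from x2hat rather than from the actual x2. A minimal sketch of the rescaling, assuming homoskedastic errors and using the hsng2 example data as a stand-in for y1, x1, x2, and z1 (the variable and scalar names here are illustrative, not from the original question). It is not meant to reproduce the robust standard errors:

```stata
* Illustrative only: hsng2 stands in for the poster's y1, x1, x2, z1
webuse hsng2, clear

* Procedure 2 by hand
regress hsngval pcturban faminc
predict double hsngval_hat, xb
regress rent pcturban hsngval_hat

* 2SLS residuals use the ACTUAL hsngval with the second-stage coefficients
generate double u2 = (rent - _b[_cons] - _b[pcturban]*pcturban - _b[hsngval_hat]*hsngval)^2
quietly summarize u2

* ivregress 2sls divides by N by default (N - k with the small option)
scalar adj = sqrt(r(sum)/e(N)) / e(rmse)
scalar se_hsngval  = _se[hsngval_hat]*adj
scalar se_pcturban = _se[pcturban]*adj
display "rescaled SEs: " se_hsngval "  " se_pcturban

* Procedure 1 for comparison (non-robust)
ivregress 2sls rent pcturban (hsngval = faminc)
```

The rescaled values should line up with the standard errors in the ivregress table; with the small option on ivregress, the divisor in adj would need to be e(df_r) instead of e(N).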

          What I've always done is write a short program that does both estimation steps, and then call the program in a bootstrap routine. You have to do both steps with each resampling of the data.
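A rough sketch of that approach, assuming the hsng2 example data in place of y1, x1, x2, and z1; the program name twostep and the returned scalar names are hypothetical choices for illustration:

```stata
* Illustrative only: hsng2 stands in for y1, x1, x2, z1; twostep is a hypothetical name
capture program drop twostep
program twostep, rclass
    * Step 1: first stage, re-estimated on every bootstrap sample
    regress hsngval pcturban faminc
    tempvar xhat
    predict double `xhat', xb
    * Step 2: second stage using the fitted values
    regress rent pcturban `xhat'
    return scalar b_hsngval  = _b[`xhat']
    return scalar b_pcturban = _b[pcturban]
end

webuse hsng2, clear
bootstrap b_hsngval=r(b_hsngval) b_pcturban=r(b_pcturban), reps(200) seed(12345): twostep
```

Because bootstrap reruns the whole program on each resample, the first-stage sampling error is reflected in the reported standard errors. The number of replications and the seed are arbitrary here.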



          • #6
            I should say that I'm not a fan of doing this "by hand," and I don't see why a researcher would do this unless you have some sort of split-sample situation.



            • #7
              You certainly can use the bootstrap prefix with ivregress. But #1 suggests that you want to run the IV regression in two stages. Here is how, replicating the results of ivregress. Also, as stated in #2, the bootstrap standard errors would be similar to those obtained using the -robust- option.

              Code:
              *BOOTSTRAPPING TWO-STAGE PROCEDURE
              cap prog drop pboot
              program pboot
              args depvar endog exog1 iv1
              regress `endog' `iv1' `exog1'
              tempvar res
              predict `res', res
              regress `depvar' `endog' `exog1' `res'
              end
              
              
              webuse hsng2, clear
              bootstrap _b[hsngval] _b[pcturban], reps(50) seed(02202024): pboot rent hsngval pcturban faminc
              
              *IVREGRESS WITH BOOTSTRAP SEs
              bootstrap, reps(50) seed(02202024): ivregress 2sls rent pcturban (hsngval = faminc)
              
              *ROBUST SEs
              ivregress 2sls rent pcturban (hsngval = faminc), r
              Results:

              Code:
              . *BOOTSTRAPPING TWO-STAGE PROCEDURE
              
              .
              . cap prog drop pboot
              
              .
              . program pboot
                1.
              . args depvar endog exog1 iv1
                2.
              . regress `endog' `iv1' `exog1'
                3.
              . tempvar res
                4.
              . predict `res', res
                5.
              . regress `depvar' `endog' `exog1' `res'
                6.
              . end
              
              .
              .
              .
              .
              .
              . webuse hsng2, clear
              (1980 Census housing data)
              
              .
              . bootstrap _b[hsngval] _b[pcturban], reps(50) seed(02202024): pboot rent hsngval pcturban faminc
              (running pboot on estimation sample)
              
              Bootstrap replications (50): .........10.........20.........30.........40.........50 done
              
              Bootstrap results                                           Number of obs = 50
                                                                          Replications  = 50
              
                    Command: pboot rent hsngval pcturban faminc
                      _bs_1: _b[hsngval]
                      _bs_2: _b[pcturban]
              
              ------------------------------------------------------------------------------
                           |   Observed   Bootstrap                         Normal-based
                           | coefficient  std. err.      z    P>|z|     [95% conf. interval]
              -------------+----------------------------------------------------------------
                     _bs_1 |   .0031938   .0007289     4.38   0.000     .0017653    .0046224
                     _bs_2 |  -.5064118   .5925213    -0.85   0.393    -1.667732    .6549086
              ------------------------------------------------------------------------------
              
              .
              .
              .
              . *IVREGRESS WITH BOOTSTRAP SEs
              
              .
              . bootstrap, reps(50) seed(02202024): ivregress 2sls rent pcturban (hsngval = faminc)
              (running ivregress on estimation sample)
              
              Bootstrap replications (50): .........10.........20.........30.........40.........50 done
              
              Instrumental variables 2SLS regression            Number of obs   =         50
                                                                Wald chi2(2)    =      47.89
                                                                Prob > chi2     =     0.0000
                                                                R-squared       =     0.2887
                                                                Root MSE        =     29.517
              
              ------------------------------------------------------------------------------
                           |   Observed   Bootstrap                         Normal-based
                      rent | coefficient  std. err.      z    P>|z|     [95% conf. interval]
              -------------+----------------------------------------------------------------
                   hsngval |   .0031938   .0007289     4.38   0.000     .0017653    .0046224
                  pcturban |  -.5064118   .5925213    -0.85   0.393    -1.667732    .6549086
                     _cons |   113.8143    18.8149     6.05   0.000      76.9378    150.6909
              ------------------------------------------------------------------------------
              Endogenous: hsngval
              Exogenous:  pcturban faminc
              
              .
              .
              .
              . *ROBUST SEs
              
              .
              . ivregress 2sls rent pcturban (hsngval = faminc), r
              
              Instrumental variables 2SLS regression            Number of obs   =         50
                                                                Wald chi2(2)    =      32.55
                                                                Prob > chi2     =     0.0000
                                                                R-squared       =     0.2887
                                                                Root MSE        =     29.517
              
              ------------------------------------------------------------------------------
                           |               Robust
                      rent | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
              -------------+----------------------------------------------------------------
                   hsngval |   .0031938    .000738     4.33   0.000     .0017474    .0046402
                  pcturban |  -.5064118   .5428297    -0.93   0.351    -1.570339    .5575149
                     _cons |   113.8143   21.62169     5.26   0.000     71.43659    156.1921
              ------------------------------------------------------------------------------
              Endogenous: hsngval
              Exogenous:  pcturban faminc
              
              .
              Depending on how many instruments you have, change the bootstrap program accordingly.
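One possible way to avoid rewriting the program for each instrument count is to parse the variable lists with syntax. This is only a sketch: pboot2 and its endog() and iv() options are hypothetical names, and i.region is used as an extra instrument purely for illustration.

```stata
* Illustrative only: pboot2 and its option names are hypothetical
capture program drop pboot2
program pboot2
    * depvar followed by exogenous regressors; endog() and iv() given as options
    syntax varlist(min=2 numeric), ENDog(varname numeric) IV(varlist numeric fv)
    gettoken depvar exog : varlist
    * First stage: endogenous regressor on all instruments and exogenous regressors
    regress `endog' `iv' `exog'
    tempvar res
    predict double `res', residuals
    * Second stage with the first-stage residual added as a control
    regress `depvar' `endog' `exog' `res'
end

webuse hsng2, clear
bootstrap _b[hsngval] _b[pcturban], reps(50) seed(02202024): pboot2 rent pcturban, endog(hsngval) iv(faminc i.region)
```

The point estimates should match ivregress 2sls rent pcturban (hsngval = faminc i.region); any number of instruments can then be passed in iv() without editing the program.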
              Last edited by Andrew Musau; 20 Feb 2024, 14:08.



              • #8
                Originally posted by Jeff Wooldridge
                I should say that I'm not a fan of doing this "by hand," and I don't see why a researcher would do this unless you have some sort of split-sample situation.
                Hi Jeff,

                Thanks for sharing your insights. I have two follow-up questions, and I would really appreciate any suggestions or comments.

                1. Calculation of (adjusted) R2 in 2SLS.
                It is recommended to use a software package with a 2SLS command rather than to carry out the two-step procedure explicitly (Wooldridge 2002). 2SLS software computes the 2SLS residuals rather than the residuals from the second-stage OLS regression. The difference is whether the residuals are calculated with the actual X or the predicted X. With the actual X, it is possible that RSS > TSS, in which case R2 is negative. (In case I haven't described the problem clearly, please see https://www.stata.com/support/faqs/s...least-squares/). My question is: under what conditions should we report R2 (or adjusted R2) for 2SLS results, and under what conditions is a negative R2 (or adjusted R2) acceptable? Can we simply not report R2 if it is negative?

                2. Doing 2SLS by hand.
                If we do 2SLS by hand, the R2 (or adjusted R2) is positive. Is it reasonable to report the R2 (or adjusted R2) from the two-stage OLS analysis? If so, I guess it is better to use the bootstrap to calculate the SE and R2 (or adjusted R2).
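To make the distinction in 1. and 2. concrete, here is a small sketch that computes both versions of R2 side by side. It assumes the hsng2 example data from earlier in the thread, and the variable and scalar names are illustrative:

```stata
* Illustrative only: hsng2 stands in for the data in question
webuse hsng2, clear

* R2 reported by ivregress: residuals use the actual endogenous regressor
ivregress 2sls rent pcturban (hsngval = faminc)
scalar r2_iv = e(r2)
predict double u, residuals
generate double u2 = u^2
quietly summarize u2
scalar rss = r(sum)
quietly summarize rent
scalar tss = r(Var)*(r(N) - 1)
display "1 - RSS/TSS with 2SLS residuals = " 1 - rss/tss "   e(r2) = " r2_iv

* R2 from the by-hand second stage: residuals use the fitted values instead
regress hsngval pcturban faminc
predict double hsngval_hat, xb
regress rent pcturban hsngval_hat
display "second-stage OLS R2 = " e(r2)
```

The first number is built from residuals that are not orthogonal to the regressors, which is why RSS can exceed TSS; the second is an ordinary OLS R2 and so cannot be negative.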


                Thank you for your guidance. I look forward to your reply.


                Best,
                Michael
