  • Instrument Exogeneity

    This is more of a theoretical question than a Stata-specific one.

    For a just-identified 2SLS model with one endogenous variable (x) and one instrumental variable (z), testing for instrument exogeneity is problematic: Cov(z,u)=0 holds by construction if u is the second-stage residual. Can one simply include a random variable as an additional instrument to achieve overidentification and then run the Sargan test on the new model? From some quick tests, including a random variable as an additional instrument does not appear to affect the weak-instruments test, since at least one of the instruments is still highly correlated with the endogenous variable. Nor does it affect the second-stage coefficient on the endogenous variable, since the random variable has no explanatory value. But including this random instrument does allow the Sargan overidentification test to be run to verify instrument exogeneity.

  • #2
    You have basically already given the answer yourself.

    Adding a random variable as an additional instrument does not help with testing for exogeneity of the initial instrument. The random instrument is not relevant and therefore does not actually overidentify the model. All the identification still comes from the initial instrument, with or without the random instrument.
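
    A minimal do-file sketch of what "not relevant" means in the first stage. The dataset (Stata's nlswork example data), the variable names, and the seed are assumptions chosen for illustration; the question itself names no data.

    Code:
    * Assumption: nlswork example data; rand is a hypothetical pure-noise instrument
    use https://www.stata-press.com/data/r18/nlswork, clear
    set seed 12345
    generate age2 = age*age
    generate rand = runiform()
    * First stage with the candidate instrument and the random one
    regress tenure wks_work rand age age2 birth_yr grade
    * Partial test of the random instrument alone
    test rand
    If rand carries no information about tenure, its first-stage coefficient should be statistically indistinguishable from zero in most draws, so it adds no identifying restriction of its own.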
    Last edited by Sebastian Kripfganz; 15 Jun 2024, 07:52.
    https://www.kripfganz.de/stata/


    • #3
      Sebastian,
      Consider the following example. First, I only consider one instrumental variable for the potentially endogenous variable, which allows me to test for relevance but not exogeneity.

      Code:
      . use https://www.stata-press.com/data/r18/nlswork
      (National Longitudinal Survey of Young Women, 14-24 years old in 1968)
      
      . gen age2=age*age
      (24 missing values generated)
      
      . ivreg2 ln_wage age age2 birth_yr grade (tenure =  wks_work )
      
      IV (2SLS) estimation
      --------------------
      
      Estimates efficient for homoskedasticity only
      Statistics consistent for homoskedasticity only
      
                                                            Number of obs =    27406
                                                            F(  5, 27400) =  1590.16
                                                            Prob > F      =   0.0000
      Total (centered) SS     =  6233.407363                Centered R2   =   0.0533
      Total (uncentered) SS   =  83644.71582                Uncentered R2 =   0.9295
      Residual SS             =   5900.97963                Root MSE      =     .464
      
      ------------------------------------------------------------------------------
           ln_wage | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
      -------------+----------------------------------------------------------------
            tenure |    .105388   .0033394    31.56   0.000     .0988429    .1119331
               age |   .0324732   .0038269     8.49   0.000     .0249726    .0399738
              age2 |  -.0007456   .0000624   -11.96   0.000    -.0008678   -.0006233
          birth_yr |  -.0097598   .0010498    -9.30   0.000    -.0118174   -.0077022
             grade |   .0695243   .0012912    53.84   0.000     .0669936     .072055
             _cons |   .6648034   .0794622     8.37   0.000     .5090602    .8205465
      ------------------------------------------------------------------------------
      Underidentification test (Anderson canon. corr. LM statistic):        1724.158
                                                         Chi-sq(1) P-val =    0.0000
      ------------------------------------------------------------------------------
      Weak identification test (Cragg-Donald Wald F statistic):             1839.507
      Stock-Yogo weak ID test critical values: 10% maximal IV size             16.38
                                               15% maximal IV size              8.96
                                               20% maximal IV size              6.66
                                               25% maximal IV size              5.53
      Source: Stock-Yogo (2005).  Reproduced by permission.
      ------------------------------------------------------------------------------
      Sargan statistic (overidentification test of all instruments):           0.000
                                                       (equation exactly identified)
      ------------------------------------------------------------------------------
      Instrumented:         tenure
      Included instruments: age age2 birth_yr grade
      Excluded instruments: wks_work
      ------------------------------------------------------------------------------
      Here the instrumental variable wks_work is relevant, but we cannot tell whether it is correlated with the second-stage error, so we cannot conclude that it is valid. So, let's include a second instrumental variable that we suspect is associated with tenure but not with wage, so that the system is overidentified and we can test the validity of the instruments.

      Code:
      . ivreg2 ln_wage age age2 birth_yr grade (tenure =  wks_work msp), endog(tenure)
      
      IV (2SLS) estimation
      --------------------
      
      Estimates efficient for homoskedasticity only
      Statistics consistent for homoskedasticity only
      
                                                            Number of obs =    27393
                                                            F(  5, 27387) =  1589.52
                                                            Prob > F      =   0.0000
      Total (centered) SS     =  6231.680517                Centered R2   =   0.0536
      Total (uncentered) SS   =  83612.93909                Uncentered R2 =   0.9295
      Residual SS             =  5897.852939                Root MSE      =     .464
      
      ------------------------------------------------------------------------------
           ln_wage | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
      -------------+----------------------------------------------------------------
            tenure |   .1053544   .0033395    31.55   0.000      .098809    .1118998
               age |   .0325098   .0038279     8.49   0.000     .0250074    .0400123
              age2 |  -.0007461   .0000624   -11.96   0.000    -.0008683   -.0006238
          birth_yr |  -.0097681     .00105    -9.30   0.000    -.0118261   -.0077101
             grade |   .0695431   .0012916    53.84   0.000     .0670117    .0720745
             _cons |   .6644631   .0794799     8.36   0.000     .5086855    .8202408
      ------------------------------------------------------------------------------
      Underidentification test (Anderson canon. corr. LM statistic):        1723.606
                                                         Chi-sq(2) P-val =    0.0000
      ------------------------------------------------------------------------------
      Weak identification test (Cragg-Donald Wald F statistic):              919.435
      Stock-Yogo weak ID test critical values: 10% maximal IV size             19.93
                                               15% maximal IV size             11.59
                                               20% maximal IV size              8.75
                                               25% maximal IV size              7.25
      Source: Stock-Yogo (2005).  Reproduced by permission.
      ------------------------------------------------------------------------------
      Sargan statistic (overidentification test of all instruments):           0.542
                                                         Chi-sq(1) P-val =    0.4616
      -endog- option:
      Endogeneity test of endogenous regressors:                             645.697
                                                         Chi-sq(1) P-val =    0.0000
      Regressors tested:    tenure
      ------------------------------------------------------------------------------
      Instrumented:         tenure
      Included instruments: age age2 birth_yr grade
      Excluded instruments: wks_work msp
      ------------------------------------------------------------------------------
      Now we see from the results a few things:
      1) the results are consistent with the just-identified first model.
      2) the Cragg-Donald Wald F statistic indicates that at least one of the instruments is relevant.
      3) the Sargan statistic indicates that none of the instruments are correlated with the residual.
      4) the endogeneity test confirms that tenure is endogenous.

      Thus, I would be inclined to conclude that the second model is appropriate.

      Digging a bit deeper, let's examine the first-stage model.

      Code:
      . reg tenure wks_work msp age age2 birth_yr grade
      
            Source |       SS           df       MS      Number of obs   =    27,393
      -------------+----------------------------------   F(6, 27386)     =   1593.59
             Model |   100381.69         6  16730.2817   Prob > F        =    0.0000
          Residual |  287512.189    27,386  10.4985098   R-squared       =    0.2588
      -------------+----------------------------------   Adj R-squared   =    0.2586
             Total |  387893.879    27,392  14.1608455   Root MSE        =    3.2401
      
      ------------------------------------------------------------------------------
            tenure | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
      -------------+----------------------------------------------------------------
          wks_work |   .0351499   .0008261    42.55   0.000     .0335307    .0367691
               msp |  -.0221594   .0409745    -0.54   0.589    -.1024715    .0581528
               age |   .1103038   .0266872     4.13   0.000     .0579955    .1626122
              age2 |   .0009966   .0004394     2.27   0.023     .0001354    .0018578
          birth_yr |   .0386993   .0070719     5.47   0.000     .0248379    .0525606
             grade |   .0747116   .0086691     8.62   0.000     .0577197    .0917035
             _cons |  -5.629259   .5120505   -10.99   0.000    -6.632904   -4.625614
      ------------------------------------------------------------------------------
      
      . test wks_work msp
      
       ( 1)  wks_work = 0
       ( 2)  msp = 0
      
             F(  2, 27386) =  919.44
                  Prob > F =    0.0000
      As can be seen, the F-statistic for instrument relevance is approximately equal to what was in the ivreg2 output and highly significant.

      But, msp actually does not add value as an instrument. However, it does not negatively impact the relevance of wks_work and it does allow the model to be overidentified so we could test for exogeneity.

      Code:
      . test msp
      
       ( 1)  msp = 0
      
             F(  1, 27386) =    0.29
                  Prob > F =    0.5886
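      Both first-stage checks (the joint F test on the excluded instruments and the test on msp alone) can also be read from ivreg2's own first-stage output. A sketch, assuming the ivreg2 package from SSC is installed and the same data as above:

      Code:
      use https://www.stata-press.com/data/r18/nlswork, clear
      generate age2 = age*age
      * first: also display the first-stage regression and its excluded-instrument F test
      ivreg2 ln_wage age age2 birth_yr grade (tenure = wks_work msp), first
      The first-stage coefficient table shows the individual test on msp, and the reported F test of excluded instruments should correspond to the joint test of wks_work and msp.
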
      Now, let's replace the second instrumental variable (msp) with a random variable and run a third model.

      Code:
      . gen rand=runiform()
      
      . ivreg2 ln_wage age age2 birth_yr grade (tenure =  wks_work rand), endog(tenure)
      
      IV (2SLS) estimation
      --------------------
      
      Estimates efficient for homoskedasticity only
      Statistics consistent for homoskedasticity only
      
                                                            Number of obs =    27406
                                                            F(  5, 27400) =  1592.16
                                                            Prob > F      =   0.0000
      Total (centered) SS     =  6233.407363                Centered R2   =   0.0545
      Total (uncentered) SS   =  83644.71582                Uncentered R2 =   0.9295
      Residual SS             =  5893.475086                Root MSE      =    .4637
      
      ------------------------------------------------------------------------------
           ln_wage | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
      -------------+----------------------------------------------------------------
            tenure |   .1052149   .0033319    31.58   0.000     .0986844    .1117454
               age |   .0325107   .0038243     8.50   0.000     .0250153    .0400062
              age2 |  -.0007455   .0000623   -11.96   0.000    -.0008676   -.0006233
          birth_yr |  -.0097409    .001049    -9.29   0.000    -.0117968    -.007685
             grade |    .069544   .0012902    53.90   0.000     .0670152    .0720728
             _cons |   .6629985   .0793874     8.35   0.000      .507402     .818595
      ------------------------------------------------------------------------------
      Underidentification test (Anderson canon. corr. LM statistic):        1729.666
                                                         Chi-sq(2) P-val =    0.0000
      ------------------------------------------------------------------------------
      Weak identification test (Cragg-Donald Wald F statistic):              922.856
      Stock-Yogo weak ID test critical values: 10% maximal IV size             19.93
                                               15% maximal IV size             11.59
                                               20% maximal IV size              8.75
                                               25% maximal IV size              7.25
      Source: Stock-Yogo (2005).  Reproduced by permission.
      ------------------------------------------------------------------------------
      Sargan statistic (overidentification test of all instruments):           0.845
                                                         Chi-sq(1) P-val =    0.3580
      -endog- option:
      Endogeneity test of endogenous regressors:                             645.638
                                                         Chi-sq(1) P-val =    0.0000
      Regressors tested:    tenure
      ------------------------------------------------------------------------------
      Instrumented:         tenure
      Included instruments: age age2 birth_yr grade
      Excluded instruments: wks_work rand
      ------------------------------------------------------------------------------
      
      .
      The results are consistent with the second model. I agree that msp is not a good instrumental variable, but using it still yields an appropriate Cragg-Donald Wald F statistic as well as a valid Sargan statistic for exogeneity. In theory, the random variable adds no information to the model, yet it makes the model overidentified, so that the validity of the first instrument (specifically its exogeneity, since relevance can be established in a just-identified system) can be tested.


      • #4
        I tend to disagree and would flip the argument around. The second instrument, msp, is apparently as good (or bad) as an independent random variable, which explains the similarity of the second and third results. msp is irrelevant in the first stage. Further indicators of that conclusion are that the underidentification test statistic is virtually unchanged when going from the first to the second specification, and that the weak-identification test statistic is roughly halved (which is solely due to the fact that the number of instruments appears in its denominator). There is no extra information coming from msp. The overidentification test has no power; it rejects the null hypothesis in 5% of the cases (assuming this is the significance level) irrespective of whether the null hypothesis is true or not. The p-value is just a random draw from the uniform distribution.
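
        Both points can be checked with a small do-file sketch. The first line redoes the halving arithmetic with the statistics reported above. The rest is a Monte Carlo under a hypothetical data-generating process in which the instrument z is invalid by construction; the program name, sample size, coefficients, and seed are all illustrative assumptions, and e(jp) is taken to be the overidentification p-value stored by ivreg2 (worth confirming with ereturn list).

        Code:
        * Halving: compare with the 919.435 reported for the second specification
        display 1839.507/2

        capture program drop sargan_sim
        program define sargan_sim, rclass
            drop _all
            set obs 1000
            generate double z    = rnormal()
            generate double rand = runiform()          // irrelevant instrument
            generate double u    = 0.5*z + rnormal()   // z is correlated with the error
            generate double x    = z + 0.5*u + rnormal()
            generate double y    = 1 + x + u           // true coefficient on x is 1
            quietly ivreg2 y (x = z rand)
            return scalar jp = e(jp)                   // Sargan p-value (assumed name)
            return scalar b  = _b[x]
        end

        set seed 12345
        simulate jp=r(jp) b=r(b), reps(500) nodots: sargan_sim
        generate byte reject = (jp < 0.05)
        summarize b reject
        In this design Cov(z,u) = 0.5 and Cov(z,x) = 1.25, so the 2SLS coefficient converges to 1 + 0.5/1.25 = 1.4 rather than 1. If the argument above is right, the mean of reject should nevertheless stay near the nominal 0.05.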
        https://www.kripfganz.de/stata/


        • #5
          @JJKovach, for the Cragg-Donald Wald F statistic, what is the threshold for considering the instrument valid?


          • #6
            Check out Stock & Yogo (2002) for a good discussion of weak instruments:

            https://www.nber.org/system/files/wo...0284/t0284.pdf
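
            In practice the comparison is made against the Stock-Yogo critical values that ivreg2 prints, which depend on the number of instruments and on the size distortion one is willing to tolerate. A sketch for the just-identified model from #3, assuming ivreg2 stores the weak-identification statistic in e(widstat) and using the 16.38 value (10% maximal IV size) from that output:

            Code:
            use https://www.stata-press.com/data/r18/nlswork, clear
            generate age2 = age*age
            quietly ivreg2 ln_wage age age2 birth_yr grade (tenure = wks_work)
            display "Cragg-Donald F = " e(widstat)
            * 16.38 is copied from the output in #3; it changes with the number of instruments
            display "Exceeds the 10% maximal IV size critical value: " (e(widstat) > 16.38)
            Note that this statistic speaks only to instrument strength (relevance); it says nothing about exogeneity.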
