
  • Testing Instrument Relevance following ivregress and first stage statistics

    I am testing instrument relevance after ivregress using the first option. Below is an abbreviated version of my model, to keep the output short. The F statistic I need would test the hypothesis that each of the two instruments is weak. The F statistic reported in the first-stage output is identical to the F statistic from OLS estimation of the endogenous variable on all of the exogenous (included and excluded) variables. Some documentation suggests that first stage output would be the necessary F statistic to test relevance, but this does not seem to be the case.

    If I'm correct, the F statistics must be computed from separate estimations of each excluded instrument on all of the exogenous variables.

    Any comment?

    Here is my abbreviated output demonstrating the identical F statistics:

    . */ abbreviated model for abbreviated output

    . */ suppose two endogenous variables and two instruments

    . . ivregress 2sls d.gtotpay d.(style_msa) (d.comp d.hmo_sim = d.frac_comp_state_gen d.frac_hmo_sim_state_gen) if cap_person!=1, vce(robust) first

    First-stage regressions
    -----------------------

    Number of obs = 1,692,863
    F(3, 1692859) = 2074.65
    Prob > F = 0.0000
    R-squared = 0.0077
    Adj R-squared = 0.0077
    Root MSE = 0.1574

    ----------------------------------------------------------------------------------------
    | Robust
    D.comp | Coefficient std. err. t P>|t| [95% conf. interval]
    -----------------------+----------------------------------------------------------------
    style_msa |
    D1. | .0001779 .0000108 16.51 0.000 .0001568 .000199
    |
    frac_comp_state_gen |
    D1. | .7236218 .0101824 71.07 0.000 .7036647 .7435789
    |
    frac_hmo_sim_state_gen |
    D1. | .0762913 .0029813 25.59 0.000 .070448 .0821346
    |
    _cons | -.0096025 .0003209 -29.93 0.000 -.0102313 -.0089736
    ----------------------------------------------------------------------------------------

    Number of obs = 1,692,863
    F(3, 1692859) = 1204.13
    Prob > F = 0.0000
    R-squared = 0.0182
    Adj R-squared = 0.0182
    Root MSE = 0.1307

    ----------------------------------------------------------------------------------------
    | Robust
    D.hmo_sim | Coefficient std. err. t P>|t| [95% conf. interval]
    -----------------------+----------------------------------------------------------------
    style_msa |
    D1. | -.0001198 .0000232 -5.16 0.000 -.0001653 -.0000743
    |
    frac_comp_state_gen |
    D1. | .2146812 .0058788 36.52 0.000 .2031589 .2262035
    |
    frac_hmo_sim_state_gen |
    D1. | .4678168 .0082113 56.97 0.000 .4517229 .4839107
    |
    _cons | .0034223 .0006378 5.37 0.000 .0021722 .0046723
    ----------------------------------------------------------------------------------------


    Instrumental-variables 2SLS regression Number of obs = 1,692,863
    Wald chi2(3) = 54.14
    Prob > chi2 = 0.0000
    R-squared = .
    Root MSE = 25296

    ------------------------------------------------------------------------------
    | Robust
    D.gtotpay | Coefficient std. err. z P>|z| [95% conf. interval]
    -------------+----------------------------------------------------------------
    comp |
    D1. | 1512.331 1372.195 1.10 0.270 -1177.121 4201.784
    |
    hmo_sim |
    D1. | -2098.223 818.5431 -2.56 0.010 -3702.538 -493.9079
    |
    style_msa |
    D1. | 11.55352 1.765345 6.54 0.000 8.093507 15.01353
    |
    _cons | -177.6334 53.6 -3.31 0.001 -282.6875 -72.57936
    ------------------------------------------------------------------------------
    Endogenous: D.comp D.hmo_sim
    Exogenous: D.style_msa D.frac_comp_state_gen D.frac_hmo_sim_state_gen

    . */ OLS estimation on the first endogenous variable

    . reg d.comp d.(style_msa frac_comp_state_gen frac_hmo_sim_state_gen) if cap_person!=1, vce(robust)

    Linear regression Number of obs = 1,692,863
    F(3, 1692859) = 2074.65
    Prob > F = 0.0000
    R-squared = 0.0077
    Root MSE = .15738

    ----------------------------------------------------------------------------------------
    | Robust
    D.comp | Coefficient std. err. t P>|t| [95% conf. interval]
    -----------------------+----------------------------------------------------------------
    style_msa |
    D1. | .0001779 .0000108 16.51 0.000 .0001568 .000199
    |
    frac_comp_state_gen |
    D1. | .7236218 .0101824 71.07 0.000 .7036647 .7435789
    |
    frac_hmo_sim_state_gen |
    D1. | .0762913 .0029813 25.59 0.000 .070448 .0821346
    |
    _cons | -.0096025 .0003209 -29.93 0.000 -.0102313 -.0089736
    ----------------------------------------------------------------------------------------

  • #2
    I'm having trouble understanding your question or concern. Could you elaborate on the sentence "Some documentation suggests that first stage output would be the necessary F statistic to test relevance, but this does not seem to be the case."? What documentation, specifically? Why doesn't it seem to be the case?

    Perhaps relatedly, why are you thinking it is necessary to do "separate estimations [regressions?] of each excluded instrument on all of the exogenous variables"?

    p.s. Please edit your post to place the code and (especially) results inside CODE delimiters, as described in FAQ 12.3.



    • #3
      Below I applied the delimiters as indicated in FAQ 12.3, but they don't seem to improve readability compared with what I originally provided by first copying to Notepad. I must be missing something about how to apply that function.

      I misstated my conclusion in the original post, as you quoted, and I apologize for that. Rather, I think it is necessary to estimate separate regressions, each with a different endogenous variable as the dependent variable.

      Stata base reference manual p. 1344 includes: "The column marked “F(4, 44)” is an F statistic for the joint significance of π2, π3, π4, and π5, the coefficients on the additional instruments. Its p-value is listed in the column marked “Prob > F”. If the F statistic is not significant, then the additional instruments have no significant explanatory power....."

      suggesting that the needed statistic is provided by the "first" option. Yet in my example below, the first-stage F statistic and the OLS F statistic are exactly the same in magnitude and in degrees of freedom, despite the OLS code including a variable in addition to the instruments indicated in the ivregress code. That is, F(3, 1692859) = 2074.65.




      [. */ abbreviated model for abbreviated output

      . */ suppose two endogenous variables and two instruments

      . . ivregress 2sls d.gtotpay d.(style_msa) (d.comp d.hmo_sim = d.frac_comp_state_gen d.frac_hmo_sim_state_gen) if cap_person!=1, vce(robust) first

      First-stage regressions
      -----------------------

      Number of obs = 1,692,863
      F(3, 1692859) = 2074.65
      Prob > F = 0.0000
      R-squared = 0.0077
      Adj R-squared = 0.0077
      Root MSE = 0.1574

      ----------------------------------------------------------------------------------------
      | Robust
      D.comp | Coefficient std. err. t P>|t| [95% conf. interval]
      -----------------------+----------------------------------------------------------------
      style_msa |
      D1. | .0001779 .0000108 16.51 0.000 .0001568 .000199
      |
      frac_comp_state_gen |
      D1. | .7236218 .0101824 71.07 0.000 .7036647 .7435789
      |
      frac_hmo_sim_state_gen |
      D1. | .0762913 .0029813 25.59 0.000 .070448 .0821346
      |
      _cons | -.0096025 .0003209 -29.93 0.000 -.0102313 -.0089736
      ----------------------------------------------------------------------------------------

      Number of obs = 1,692,863
      F(3, 1692859) = 1204.13
      Prob > F = 0.0000
      R-squared = 0.0182
      Adj R-squared = 0.0182
      Root MSE = 0.1307

      ----------------------------------------------------------------------------------------
      | Robust
      D.hmo_sim | Coefficient std. err. t P>|t| [95% conf. interval]
      -----------------------+----------------------------------------------------------------
      style_msa |
      D1. | -.0001198 .0000232 -5.16 0.000 -.0001653 -.0000743
      |
      frac_comp_state_gen |
      D1. | .2146812 .0058788 36.52 0.000 .2031589 .2262035
      |
      frac_hmo_sim_state_gen |
      D1. | .4678168 .0082113 56.97 0.000 .4517229 .4839107
      |
      _cons | .0034223 .0006378 5.37 0.000 .0021722 .0046723
      ----------------------------------------------------------------------------------------


      Instrumental-variables 2SLS regression Number of obs = 1,692,863
      Wald chi2(3) = 54.14
      Prob > chi2 = 0.0000
      R-squared = .
      Root MSE = 25296

      ------------------------------------------------------------------------------
      | Robust
      D.gtotpay | Coefficient std. err. z P>|z| [95% conf. interval]
      -------------+----------------------------------------------------------------
      comp |
      D1. | 1512.331 1372.195 1.10 0.270 -1177.121 4201.784
      |
      hmo_sim |
      D1. | -2098.223 818.5431 -2.56 0.010 -3702.538 -493.9079
      |
      style_msa |
      D1. | 11.55352 1.765345 6.54 0.000 8.093507 15.01353
      |
      _cons | -177.6334 53.6 -3.31 0.001 -282.6875 -72.57936
      ------------------------------------------------------------------------------
      Endogenous: D.comp D.hmo_sim
      Exogenous: D.style_msa D.frac_comp_state_gen D.frac_hmo_sim_state_gen

      . */ OLS estimation on the first endogenous variable

      . reg d.comp d.(style_msa frac_comp_state_gen frac_hmo_sim_state_gen) if cap_person!=1, vce(robust)

      Linear regression Number of obs = 1,692,863
      F(3, 1692859) = 2074.65
      Prob > F = 0.0000
      R-squared = 0.0077
      Root MSE = .15738

      ----------------------------------------------------------------------------------------
      | Robust
      D.comp | Coefficient std. err. t P>|t| [95% conf. interval]
      -----------------------+----------------------------------------------------------------
      style_msa |
      D1. | .0001779 .0000108 16.51 0.000 .0001568 .000199
      |
      frac_comp_state_gen |
      D1. | .7236218 .0101824 71.07 0.000 .7036647 .7435789
      |
      frac_hmo_sim_state_gen |
      D1. | .0762913 .0029813 25.59 0.000 .070448 .0821346
      |
      _cons | -.0096025 .0003209 -29.93 0.000 -.0102313 -.0089736
      ----------------------------------------------------------------------------------------CODE][/CODE]
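
One possible reading of the identical headers, offered as a hedged sketch rather than a definitive diagnosis: F(3, 1692859) has three numerator degrees of freedom, which matches the three regressors in each equation (the included exogenous D.style_msa plus the two excluded instruments). That suggests it is the overall model F, not a test of the excluded instruments alone, which would have two numerator degrees of freedom. The example below illustrates the distinction on hsng2, the dataset used in Stata's ivregress documentation; it is an assumption made only so the code can run without the original data. The commented lines at the end are an untested adaptation to the model above.

```stata
* Assumption: hsng2 (Stata's example housing data) stands in for the poster's data.
webuse hsng2, clear

* First stage by hand: endogenous regressor on ALL exogenous variables
* (included pcturban plus excluded instruments faminc and i.region).
regress hsngval pcturban faminc i.region

* The header F above is F(5, 44): a joint test of every slope, including pcturban.
* Testing only the excluded instruments gives an F with 4 numerator df.
testparm faminc i.region
display "F(" r(df) ", " r(df_r) ") = " r(F)

* Adapted to the model in this thread (untested sketch; needs the original data):
* regress d.comp d.(style_msa frac_comp_state_gen frac_hmo_sim_state_gen) if cap_person!=1, vce(robust)
* test D.frac_comp_state_gen D.frac_hmo_sim_state_gen
```

The point of the comparison is the numerator degrees of freedom: the regression header counts every slope, while the testparm result counts only the excluded instruments.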



      • #4
        I think I understand your question now. The Stata manual passage you cite is not referring to the F-stat for the whole first stage regression. It's referring to an F-stat displayed by the command
        Code:
        estat firststage
        You run that command after ivregress. (There seem to also be various community-contributed packages for weak instrument tests, with discussion elsewhere on Statalist, but I'm no expert on those.)
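
As a hedged illustration of that workflow, the sketch below reruns the housing-data example that the quoted manual passage appears to describe, then calls estat firststage. The hsng2 dataset and its variable names come from Stata's ivregress documentation, not from this thread, and are used only so the code can run as written. The commented lines at the end are an untested adaptation to the model in post #1; with two endogenous regressors, the all option asks for the per-equation first-stage statistics in addition to Shea's partial R-squared.

```stata
* Assumption: hsng2 stands in for the poster's data so the example can be run as is.
webuse hsng2, clear

* One endogenous regressor (hsngval), four excluded instruments (faminc, 3 region dummies).
ivregress 2sls rent pcturban (hsngval = faminc i.region)

* Reports the first-stage R-squared, partial R-squared, and the F statistic
* for the excluded instruments only.
estat firststage

* Untested adaptation to the model in post #1 (requires the original data):
* ivregress 2sls d.gtotpay d.style_msa (d.comp d.hmo_sim = d.frac_comp_state_gen d.frac_hmo_sim_state_gen) if cap_person!=1, vce(robust)
* estat firststage, all
```

Here the F column should carry four numerator degrees of freedom (the excluded instruments), in contrast to the header F printed by the first option, which also counts the included exogenous regressor.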

        As for the delimiters, you appear to have pasted your results in the wrong place. They should go between the open delimiter [CODE] and the close delimiter [/CODE]. Instead you've put them inside the open delimiter, between [ and CODE].
