  • Dummy Instrumental Variable?

    Dear Statalist Community,

    I am trying to run a 2SLS using ivreg2. My dependent variable (Y) is voteshare by municipality, and my endogenous variable (X) is number of workers in a municipality. My instrumental variable (Z), however, is a dummy for the risk of worker deaths: it equals 1 for municipalities with a higher risk of deaths and 0 for those with a lower risk. My data are cross-sectional, not panel: I have one period and multiple municipalities (N=6,497).

    I noticed that when I included a province dummy, the following regression wouldn't run:

    ivreg2 voteshare `controls' prov_dummy* (workers=deaths_dummy), cluster(agrarianzone) first

    I would get the error:
    "Warning - collinearities detected
    Vars dropped: prov_dummy87 deaths_dummy
    equation not identified; must have at least as many instruments
    not in the regression as there are instrumented variables"


    I believe this error may be occurring because I am possibly running a "forbidden regression", considering that my instrument is a dummy. I tried to read Angrist and Pischke's section on "Forbidden Regressions" in MHE, but I don't know if there is a way out of this problem without having to operationalize a new continuous instrument.

    Is there any way I could still use my dummy instrument? Is it possible to operationalize a dummy IV?

    Thank you so much for your precious help in advance.

    Best,
    Cat


  • #2
    It looks like the dummy is perfectly collinear with the province dummies. The problem is not that the IV is a dummy.
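
One way to check this kind of collinearity directly is to ask whether the instrument ever takes both values inside a single province. The lines below are only a minimal sketch on a hypothetical 12-observation toy dataset; the variable names province_no and deaths_dummy are borrowed from the thread, and everything else (the toy values, varies, one_per_prov) is made up for illustration.

```stata
* Toy data (hypothetical): 3 provinces with 4 municipalities each
clear
set obs 12
gen province_no = ceil(_n/4)
gen byte deaths_dummy = province_no == 2   // constant within each province

* Flag provinces in which the instrument takes more than one value
bysort province_no (deaths_dummy): gen byte varies = deaths_dummy[1] != deaths_dummy[_N]

* Keep one row per province and count the provinces with variation
egen byte one_per_prov = tag(province_no)
count if one_per_prov & varies   // 0 in this toy data
```

On real data the first four lines would be skipped. A count of zero would mean the instrument is an exact linear combination of the province dummies.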



    • #3
      Dear George,

      Thank you very much for your reply.
      Can I then assume that a "forbidden regression" only occurs when the endogenous variable is the dummy? And that the IV being a dummy wouldn't be a problem?
      If that is the case, how can I solve the collinearity problem?

      Once again, thank you for your help.
      Cat



      • #4
        Cat: You're using the phrase "forbidden regression" incorrectly. It has nothing to do with the nature of the endogenous variable or instrumental variable. What you're attempting is fine as far as functional form goes. It's just that, for whatever reason, your instrument is perfectly predicted by the province dummies. I find this peculiar because your IV varies by municipality, correct? How many provinces do you have?

        I can't help more unless you follow the FAQ and post a sample of the data, your Stata commands, and the output it gives you. I suspect the first stage gives a perfect fit, given the diagnostics you've shown.



        • #5
          Dear Jeff,

          Thank you so much for your help and for clarifying the meaning of the "forbidden regression". My IV varies by municipality, but there are some provinces in which the risk of disease is high (=1) throughout all municipalities, so maybe that is why I am getting the collinearity problem? I have 18 regions, 93 provinces, 775 agrarian zones, and 7286 municipalities.

          Please kindly find a sample of the data, summary, and the 2SLS regressions:

          list voteshare workers deaths_dummy ///
          > agrarianzone province_no municipalities in 1/20


               +--------------------------------------------------------------------+
               | votesh~e    workers   deaths~y   agrari~e   province_no   munici~s |
               |--------------------------------------------------------------------|
            1. |        0   .0124646          0          1   alessàndria         10 |
            2. | .0034317   .0124646          0          1   alessàndria          5 |
            3. |        0   .0124646          0          1   alessàndria          8 |
            4. |        0   .0124646          0          1   alessàndria          7 |
            5. |        0   .0124646          0          1   alessàndria         11 |
               |--------------------------------------------------------------------|
            6. | .0019455   .0124646          0          1   alessàndria          4 |
            7. | .0014706   .0124646          0          1   alessàndria          3 |
            8. | .0028599   .0124646          0          1   alessàndria          1 |
            9. |  .007984   .0124646          0          1   alessàndria          9 |
           10. | .0089552    .058588          0          2   alessàndria         16 |
               |--------------------------------------------------------------------|
           11. | .0053004    .058588          0          2   alessàndria         12 |
           12. |  .011879    .058588          0          2   alessàndria         20 |
           13. |     .026    .058588          0          2   alessàndria         14 |
           14. |  .007728    .058588          0          2   alessàndria         22 |
           15. | .0163522    .058588          0          2   alessàndria         15 |
               |--------------------------------------------------------------------|
           16. | .0043924    .058588          0          2   alessàndria         25 |
           17. | .0030372    .058588          0          2   alessàndria         17 |
           18. |        0    .058588          0          2   alessàndria         13 |
           19. | .0101954    .058588          0          2   alessàndria         29 |
           20. | .0080808    .058588          0          2   alessàndria         27 |
               +--------------------------------------------------------------------+


          . sum voteshare workers deaths_dummy ///
          > agrarianzone province_no municipalities


              Variable |        Obs        Mean    Std. Dev.       Min        Max
          -------------+---------------------------------------------------------
             voteshare |      6,497     .016998    .0346604          0   .7008849
               workers |      6,497    .0435834    .0360513          0   .2803251
          deaths_dummy |      6,497    .3898722    .4877586          0          1
          agrarianzone |      6,497    341.0008      233.87          1        775
           province_no |      6,497     45.2221    27.15982          1         93
          -------------+---------------------------------------------------------
          municipali~s |      6,497    3721.937    2112.855          1       7286


          The 2SLS without the fixed effects:

          . ivreg2 voteshare (workers=deaths_dummy), cluster(agrarianzone) first


          First-stage regressions
          -----------------------


          First-stage regression of workers:

          Statistics robust to heteroskedasticity and clustering on agrarianzone
          Number of obs = 6497
          Number of clusters (agrarianzone) = 749
          ------------------------------------------------------------------------------
                       |               Robust
               workers |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
          deaths_dummy |    .036656   .0030138    12.16   0.000     .0307479    .0425641
                 _cons |   .0292923   .0013988    20.94   0.000     .0265501    .0320345
          ------------------------------------------------------------------------------
          F test of excluded instruments:
          F( 1, 748) = 147.93
          Prob > F = 0.0000
          Sanderson-Windmeijer multivariate F test of excluded instruments:
          F( 1, 748) = 147.93
          Prob > F = 0.0000



          Summary results for first-stage regressions
          -------------------------------------------

          (Underid) (Weak id)
          Variable | F( 1, 748) P-val | SW Chi-sq( 1) P-val | SW F( 1, 748)
          workers | 147.93 0.0000 | 148.15 0.0000 | 147.93

          NB: first-stage test statistics cluster-robust

          Stock-Yogo weak ID F test critical values for single endogenous regressor:
          10% maximal IV size 16.38
          15% maximal IV size 8.96
          20% maximal IV size 6.66
          25% maximal IV size 5.53
          Source: Stock-Yogo (2005). Reproduced by permission.
          NB: Critical values are for i.i.d. errors only.

          Underidentification test
          Ho: matrix of reduced form coefficients has rank=K1-1 (underidentified)
          Ha: matrix has rank=K1 (identified)
          Kleibergen-Paap rk LM statistic Chi-sq(1)=108.04 P-val=0.0000

          Weak identification test
          Ho: equation is weakly identified
          Cragg-Donald Wald F statistic 2118.56
          Kleibergen-Paap Wald rk F statistic 147.93

          Stock-Yogo weak ID test critical values for K1=1 and L1=1:
          10% maximal IV size 16.38
          15% maximal IV size 8.96
          20% maximal IV size 6.66
          25% maximal IV size 5.53
          Source: Stock-Yogo (2005). Reproduced by permission.
          NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.

          Weak-instrument-robust inference
          Tests of joint significance of endogenous regressors B1 in main equation
          Ho: B1=0 and orthogonality conditions are valid
          Anderson-Rubin Wald test F(1,748)= 214.23 P-val=0.0000
          Anderson-Rubin Wald test Chi-sq(1)= 214.55 P-val=0.0000
          Stock-Wright LM S statistic Chi-sq(1)= 220.85 P-val=0.0000

          NB: Underidentification, weak identification and weak-identification-robust
          test statistics cluster-robust

          Number of clusters N_clust = 749
          Number of observations N = 6497
          Number of regressors K = 2
          Number of endogenous regressors K1 = 1
          Number of instruments L = 2
          Number of excluded instruments L1 = 1

          IV (2SLS) estimation
          --------------------

          Estimates efficient for homoskedasticity only
          Statistics robust to heteroskedasticity and clustering on agrarianzone

          Number of clusters (agrarianzone) = 749 Number of obs = 6497
          F( 1, 748) = 87.57
          Prob > F = 0.0000
          Total (centered) SS = 7.803919546 Centered R2 = -0.2508
          Total (uncentered) SS = 9.681105754 Uncentered R2 = -0.0083
          Residual SS = 9.76134978 Root MSE = .03876

          ------------------------------------------------------------------------------
                       |               Robust
             voteshare |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
               workers |   .6765329   .0722437     9.36   0.000     .5349379    .8181279
                 _cons |  -.0124877    .002614    -4.78   0.000     -.017611   -.0073643
          ------------------------------------------------------------------------------
          Underidentification test (Kleibergen-Paap rk LM statistic): 108.044
          Chi-sq(1) P-val = 0.0000
          ------------------------------------------------------------------------------
          Weak identification test (Cragg-Donald Wald F statistic): 2118.556
          (Kleibergen-Paap rk Wald F statistic): 147.927
          Stock-Yogo weak ID test critical values: 10% maximal IV size 16.38
          15% maximal IV size 8.96
          20% maximal IV size 6.66
          25% maximal IV size 5.53
          Source: Stock-Yogo (2005). Reproduced by permission.
          NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
          ------------------------------------------------------------------------------
          Hansen J statistic (overidentification test of all instruments): 0.000
          (equation exactly identified)
          ------------------------------------------------------------------------------
          Instrumented: workers
          Excluded instruments: deaths_dummy
          ------------------------------------------------------------------------------

          .
          end of do-file


          And the 2SLS regression with the province fixed effects, which will not run:

          . tabulate province_no, gen(prov_dummy)

          . ivreg2 voteshare prov_dummy* (workers=deaths_dummy), cluster(agrarianzone) first

          Warning - collinearities detected
          Vars dropped: prov_dummy87 deaths_dummy
          equation not identified; must have at least as many instruments
          not in the regression as there are instrumented variables
          r(481);

          end of do-file

          r(481);

          .

          Once again, thank you very much for your precious help.

          Best,
          Cat





          • #6
            I still can't tell what's happening. I don't often use ivreg2. I also wonder whether it's trying to put in all the province dummies, leading to a dummy variable trap.

            Try this first to see if you get a perfect fit between the IV and province dummies:

            Code:
            reg deaths_dummy i.province_no
            Try these, where you trick xtivreg into including province FEs:

            Code:
            xtset province_no
            xtivreg voteshare (workers = deaths_dummy), vce(cluster agrarianzone) first
            Or ivreghdfe and absorb using the province_no:

            Code:
            ivreghdfe voteshare (workers = deaths_dummy), absorb(agrarianzone) vce(cluster agrarianzone)
            Hopefully the first stage with xtivreg shows something.



            • #7
              Dear Jeff,

              Thank you so much for your time and help once again -- I really appreciate it.
              Is it possible that the problem is that although the IV varies by municipality, the fact that some provinces have the same value could be causing the problem?

              Here are the results of the code you suggested:





              Code:
              . reg deaths_dummy i.province_no
              
                    Source |       SS           df       MS      Number of obs   =     6,497
              -------------+----------------------------------   F(86, 6410)     =         .
                     Model |  1545.45359        86  17.9703906   Prob > F        =         .
                  Residual |           0     6,410           0   R-squared       =    1.0000
              -------------+----------------------------------   Adj R-squared   =    1.0000
                     Total |  1545.45359     6,496  .237908497   Root MSE        =         0
              
              ---------------------------------------------------------------------------------------
                       deaths_dummy |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
              ----------------------+----------------------------------------------------------------
                        province_no |
                       alessàndria  |         -1          .        .       .            .           .
                            ancona  |         -1          .        .       .            .           .
                             aosta  |         -1          .        .       .            .           .
                            arezzo  |         -1          .        .       .            .           .
                              asti  |         -1          .        .       .            .           .
                          avellino  |  -6.84e-14          .        .       .            .           .
                              bari  |  -6.72e-14          .        .       .            .           .
                           belluno  |         -1          .        .       .            .           .
                         benevento  |  -6.83e-14          .        .       .            .           .
                           bologna  |         -1          .        .       .            .           .
                           brèscia  |         -1          .        .       .            .           .
                          brìndisi  |  -6.80e-14          .        .       .            .           .
                           bèrgamo  |         -1          .        .       .            .           .
                     caltanissetta  |  -6.78e-14          .        .       .            .           .
                        campobasso  |  -6.86e-14          .        .       .            .           .
                         catanzaro  |  -6.88e-14          .        .       .            .           .
                           catània  |  -6.73e-14          .        .       .            .           .
                            chieti  |  -6.84e-14          .        .       .            .           .
                              como  |         -1          .        .       .            .           .
                           cosenza  |  -6.88e-14          .        .       .            .           .
                           cremona  |         -1          .        .       .            .           .
                          càgliari  |  -6.87e-14          .        .       .            .           .
                             cùneo  |         -1          .        .       .            .           .
                              enna  |  -6.78e-14          .        .       .            .           .
                           ferrara  |         -1          .        .       .            .           .
                           firenze  |         -1          .        .       .            .           .
                             forlì  |         -1          .        .       .            .           .
                    friuli (ùdine)  |         -1          .        .       .            .           .
                         frosinone  |  -6.81e-14          .        .       .            .           .
                            fòggia  |  -6.73e-14          .        .       .            .           .
                          grosseto  |         -1          .        .       .            .           .
                            gènova  |         -1          .        .       .            .           .
                           impèria  |         -1          .        .       .            .           .
                   iònio (táranto)  |  -6.79e-14          .        .       .            .           .
                         la spèzia  |         -1          .        .       .            .           .
                             lecce  |  -6.82e-14          .        .       .            .           .
                          littòria  |  -6.79e-14          .        .       .            .           .
                           livorno  |         -1          .        .       .            .           .
                             lucca  |         -1          .        .       .            .           .
                          macerata  |         -1          .        .       .            .           .
                   massa e carrara  |         -1          .        .       .            .           .
                            matera  |  -6.79e-14          .        .       .            .           .
                           messina  |  -6.80e-14          .        .       .            .           .
                            milano  |         -1          .        .       .            .           .
                           màntova  |         -1          .        .       .            .           .
                            mòdena  |         -1          .        .       .            .           .
                            napoli  |  -6.88e-14          .        .       .            .           .
                            novara  |         -1          .        .       .            .           .
                             nuoro  |  -6.82e-14          .        .       .            .           .
                            padova  |         -1          .        .       .            .           .
                           palermo  |  -6.78e-14          .        .       .            .           .
                             parma  |         -1          .        .       .            .           .
                             pavia  |         -1          .        .       .            .           .
                           perùgia  |         -1          .        .       .            .           .
                           pescara  |  -6.74e-14          .        .       .            .           .
                          piacenza  |         -1          .        .       .            .           .
                              pisa  |         -1          .        .       .            .           .
                           pistòia  |         -1          .        .       .            .           .
                           potenza  |  -6.83e-14          .        .       .            .           .
                   pèsaro e urbino  |         -1          .        .       .            .           .
                            ragusa  |  -6.78e-14          .        .       .            .           .
                           ravenna  |         -1          .        .       .            .           .
                             rieti  |  -6.76e-14          .        .       .            .           .
                              roma  |  -6.84e-14          .        .       .            .           .
                            rovigo  |         -1          .        .       .            .           .
                règgio di calàbria  |  -6.81e-14          .        .       .            .           .
                règgio nell emilia  |         -1          .        .       .            .           .
                           salerno  |  -6.87e-14          .        .       .            .           .
                            savona  |         -1          .        .       .            .           .
                             siena  |         -1          .        .       .            .           .
                          siracusa  |  -6.79e-14          .        .       .            .           .
                           sàssari  |  -6.77e-14          .        .       .            .           .
                           sòndrio  |         -1          .        .       .            .           .
                             terni  |         -1          .        .       .            .           .
                            torino  |         -1          .        .       .            .           .
                           treviso  |         -1          .        .       .            .           .
                           tràpani  |  -6.79e-14          .        .       .            .           .
                            tèramo  |  -6.73e-14          .        .       .            .           .
                            varese  |         -1          .        .       .            .           .
                           venèzia  |         -1          .        .       .            .           .
                          vercelli  |         -1          .        .       .            .           .
                            verona  |         -1          .        .       .            .           .
                          vincenza  |         -1          .        .       .            .           .
                           viterbo  |  -6.78e-14          .        .       .            .           .
                     àscoli piceno  |         -1          .        .       .            .           .
              áquila degli abruzzi  |  -6.84e-14          .        .       .            .           .
                                    |
                              _cons |          1          .        .       .            .           .
              ---------------------------------------------------------------------------------------
              
              . 
              end of do-file
              Code:
              . xtset province_no
                     panel variable:  province_no (unbalanced)
              
              . xtivreg voteshare (workers = deaths_dummy), vce(cluster agrarianzone) first
              panels are not nested within clusters
              r(498);
              
              end of do-file
              
              r(498);
              Code:
              . ivreghdfe voteshare (workers = deaths_dummy), absorb(agrarianzone) vce(cluster agrarianzone)
              (dropped 45 singleton observations)
              (MWFE estimator converged in 1 iterations)
              warning: -ranktest- error in calculating underidentification test statistics;
                       may be caused by collinearities
              warning: -ranktest- error in calculating weak identification test statistics;
                       may be caused by collinearities
              
              IV (2SLS) estimation
              --------------------
              
              Estimates efficient for homoskedasticity only
              Statistics robust to heteroskedasticity and clustering on agrarianzone
              
              Number of clusters (agrarianzone) =    704            Number of obs =     6452
                                                                    F(  1,   703) =        .
                                                                    Prob > F      =        .
              Total (centered) SS     =  5.002510873                Centered R2   =  -0.0010
              Total (uncentered) SS   =  5.002510873                Uncentered R2 =  -0.0010
              Residual SS             =  5.007604876                Root MSE      =   .02786
              
              ------------------------------------------------------------------------------
                           |               Robust
                 voteshare |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                   workers |  -1.043916          .        .       .            .           .
              ------------------------------------------------------------------------------
              Underidentification test (Kleibergen-Paap rk LM statistic):                  .
                                                                 Chi-sq(.) P-val =         .
              ------------------------------------------------------------------------------
              Weak identification test (Cragg-Donald Wald F statistic):                    .
                                       (Kleibergen-Paap rk Wald F statistic):              .
              Stock-Yogo weak ID test critical values: 10% maximal IV size             16.38
                                                       15% maximal IV size              8.96
                                                       20% maximal IV size              6.66
                                                       25% maximal IV size              5.53
              Source: Stock-Yogo (2005).  Reproduced by permission.
              NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
              ------------------------------------------------------------------------------
              Warning: estimated covariance matrix of moment conditions not of full rank.
                       overidentification statistic not reported, and standard errors and
                       model tests should be interpreted with caution.
              Possible causes:
                       number of clusters insufficient to calculate robust covariance matrix
                       singleton dummy variable (dummy with one 1 and N-1 0s or vice versa)
              partial option may address problem.
              ------------------------------------------------------------------------------
              Collinearities detected among instruments: 1 instrument(s) dropped
              Instrumented:         workers
              Excluded instruments: deaths_dummy
              Partialled-out:       _cons
                                    nb: total SS, model F and R2s are after partialling-out;
                                        any small-sample adjustments include partialled-out
                                        variables in regressor count K
              ------------------------------------------------------------------------------
              
              Absorbed degrees of freedom:
              ------------------------------------------------------+
                Absorbed FE | Categories  - Redundant  = Num. Coefs |
              --------------+---------------------------------------|
               agrarianzone |       704         704           0    *|
              ------------------------------------------------------+
              * = FE nested within cluster; treated as redundant for DoF computation
              
              . 
              end of do-file
              
              .
              Thank you so much!
              Best regards,
              Cat



              • #8
                The first regression shows that your instrument doesn't vary within province. That's the problem. You can't include province dummies when every municipality within a province has either a zero or a one. There's no usable variation left once you include province dummies. It makes me think the IV was constructed using province-level data, not data at the level of the municipality.
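
To see why no usable variation is left, it may help to do by hand what province dummies do: subtract the province mean from the instrument. The following is a rough sketch on hypothetical toy data, not the actual dataset; province_no and deaths_dummy follow the thread, while z_bar and z_within are names invented here.

```stata
* Toy data (hypothetical): 3 provinces with 4 municipalities each
clear
set obs 12
gen province_no = ceil(_n/4)
gen byte deaths_dummy = inlist(province_no, 1, 3)   // constant within each province

* Within-province demeaning, which is what province fixed effects do
bysort province_no: egen z_bar = mean(deaths_dummy)
gen z_within = deaths_dummy - z_bar

* If the minimum and maximum are both 0, the first stage has nothing to work with
summarize z_within
```

If the instrument differed across municipalities inside at least some provinces, z_within would be nonzero for those observations, and only those provinces would contribute to identification.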



                • #9
                  Dear Jeff,

                  Thank you so much for helping me identify the problem, this was really, really helpful.
                  I will look again into the data source to double-check the exact level at which the IV was constructed (this data was shared with me).

                  Can't thank you enough for your time!

                  Best,
                  Cat
