Instrumental variables with binary endogenous regressor

Devon Smith

Join Date: Sep 2021

Posts: 26
#16

06 Jul 2022, 19:14

..

Last edited by Devon Smith; 06 Jul 2022, 19:19.
Comment
Devon Smith

Join Date: Sep 2021

Posts: 26
#17

06 Jul 2022, 20:53

Jeff:

Just had a follow-up question:

If the homoskedasticity assumption in the structural equation does not hold, does using the fitted values as the instrument lead to more efficient estimates than using the actual instrument, Z, itself? I have a situation where my instrument also happens to be binary.
Comment
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2121
#18

09 Jul 2022, 19:28

Devon: There are no guarantees with heteroskedasticity. It could still be more efficient. Use robust standard errors in both cases. With binary Z you’re counting on variation in X to strengthen the IV.
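
A minimal sketch of the comparison discussed here, on simulated data. Every variable name and parameter value below is a hypothetical choice, not something taken from the thread. It estimates the same structural equation twice, once with the binary Z as the instrument and once with the probit fitted probability as the instrument, using robust standard errors in both cases. The fitted probability varies with X as well as Z, which is the extra variation the binary-Z case relies on.

```stata
* Simulated data; all names and parameter values are hypothetical
clear
set seed 12345
set obs 5000
generate x = rnormal()                        // exogenous control
generate z = runiform() < 0.5                 // binary instrument
generate v = rnormal()                        // first-stage error
generate u = (0.5*v + rnormal())*exp(0.3*x)   // heteroskedastic structural error, correlated with v
generate d = (0.8*z + 0.7*x + v > 0)          // binary endogenous regressor
generate y = 1 + 2*d + x + u

* (a) Z itself as the instrument
ivregress 2sls y x (d = z), vce(robust)
estimates store iv_z

* (b) probit fitted probability as the instrument (not plugged in as a regressor)
probit d z x
predict double phat, pr
ivregress 2sls y x (d = phat), vce(robust)
estimates store iv_phat

* Coefficient on d and its robust standard error, side by side
estimates table iv_z iv_phat, keep(d) b se
```

Which of the two standard errors comes out smaller depends on the design; under heteroskedasticity neither ordering is guaranteed.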
1 like
Comment
Devon Smith

Join Date: Sep 2021

Posts: 26
#19

13 Jul 2022, 20:31

Got it! Thanks, Jeff! Appreciate your help.

Best, Devon.
Comment
Mohammed Omran

Join Date: Jul 2020

Posts: 25
#20

30 Oct 2022, 17:12

You may want to check the following from Angrist & Krueger (2001):

We conclude our review of pitfalls with a discussion of functional form issues for both the first and second stages in two-stage least squares estimation. Researchers are sometimes tempted to use probit or logit to generate first-stage predicted values in applications with a dummy endogenous regressor. But this is not necessary and may even do some harm. In two-stage least squares, consistency of the second-stage estimates does not turn on getting the first-stage functional form right (Kelejian, 1971). So using a linear regression for the first-stage estimates generates consistent second-stage estimates even with a dummy endogenous variable. Moreover, using a nonlinear first stage to generate fitted values that are plugged directly into the second-stage equation does not generate consistent estimates unless the nonlinear model happens to be exactly right, a result which makes the dangers of misspecification high.

Angrist, J. D., & Krueger, A. B. (2001). Instrumental Variables and the Search for Identification: From Supply and Demand to Natural Experiments. Journal of Economic Perspectives, 15(4), 69–85. https://doi.org/10.1257/jep.15.4.69
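
To make the distinction in the quoted passage concrete, here is a rough sketch on simulated data (all names and parameter values are hypothetical). The latent first-stage error is deliberately skewed, so a probit first stage is misspecified. The first estimate plugs the probit fitted values directly into the second-stage regression, which is the practice the passage warns against; the second uses ordinary 2SLS with a linear first stage.

```stata
* Simulated data; all names and parameter values are hypothetical
clear
set seed 2001
set obs 20000
generate x = rnormal()                 // exogenous control
generate z = rnormal()                 // instrument
generate v = rchi2(1) - 1              // skewed latent error, so probit is the wrong model
generate u = 0.8*v + rnormal()         // structural error, correlated with v
generate d = (0.5*z + 0.5*x - v > 0)   // binary endogenous regressor
generate y = 1 + 2*d + x + u

* Plug-in approach: probit fitted values used as a regressor in a manual second stage
probit d z x
predict double phat, pr
regress y phat x

* 2SLS with a linear first stage
ivregress 2sls y x (d = z), vce(robust)
```

How far the plug-in coefficient drifts from the true value of 2 depends on the design, so no particular output is claimed here. Separately from consistency, the standard errors from the manual second stage are not valid, because they ignore that phat is estimated.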
Comment
Thaer Alhalabi

Join Date: Jan 2019

Posts: 4
#22

27 Dec 2023, 10:10

Hello Jeff, I have a question related to this thread. My ivreg2 output shows that my instruments are strong, but when I run the first step (-probit-), only one of the three instruments is significant. Can I still use the estimated probability as an instrument, or does this mean that the instruments are questionable?
Thanks
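
For readers with the same question: individual z-statistics and a joint test of the excluded instruments answer different questions, and the two can disagree when the instruments are correlated with each other. A hedged sketch of how one might look at both, on simulated data in which two of the instruments are highly correlated (all names and values are hypothetical, and this is not a reply from the thread's participants):

```stata
* Simulated data; all names and parameter values are hypothetical
clear
set seed 2023
set obs 3000
generate x  = rnormal()
generate z1 = rnormal()
generate z2 = z1 + 0.3*rnormal()   // highly correlated with z1
generate z3 = rnormal()
generate v  = rnormal()
generate d  = (0.3*z1 + 0.3*z2 + 0.1*z3 + 0.5*x + v > 0)
generate y  = 1 + 2*d + x + 0.5*v + rnormal()

* Probit first step: individual z-statistics, then a joint Wald test
probit d z1 z2 z3 x
test z1 z2 z3

* Linear first stage: joint F statistic for the excluded instruments
ivregress 2sls y x (d = z1 z2 z3), vce(robust)
estat firststage
```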
Comment
Michael Zuze

Join Date: Apr 2024

Posts: 12
#23

09 Jun 2024, 06:03

Originally posted by Jeff Wooldridge

Devon: There are no guarantees with heteroskedasticity. It could still be more efficient. Use robust standard errors in both cases. With binary Z you’re counting on variation in X to strengthen the IV.

Could you please assist with a situation where both the dependent and the endogenous variables (Y and D) are binary in a simultaneous equation model and there is reverse causality?

Y = B₀ + B₁D + B₂X + U
D = A₀ + A₁Y + A₂X + V
I was reading about a method suggested by Maddala (1983) in which both stages are estimated by probit ML. Can that be used here?
Comment
Michael Zuze

Join Date: Apr 2024

Posts: 12
#24

09 Jun 2024, 07:34

Joao Santos Silva, would you please assist?

I have a similar case and am trying to avoid the forbidden regression. I have two simultaneous equations, one for poverty and the other for informal employment, specified as follows:
Poor = B₀ + B₁ Informal employment + B₂X + B₃Z1 + U
Informal Employment = A₀ + A₁ Poor + A₂X + A₃Z2 + V, where both the dependent and the endogenous variables are binary in the two equations, the vector X contains the same exogenous variables in both, and Z1 and Z2 are instruments.

I was following Maddala (1983), who suggested estimating probit ML in the first and second stages; however, after reading Angrist, I understood that this is not valid and leads to the forbidden regression, and that I should use the LPM instead. Kindly assist me in working this out for my two simultaneous equations.

Thanks
Comment
Joao Santos Silva

Join Date: Apr 2014

Posts: 3000
#25

10 Jun 2024, 00:11

Dear Michael Zuze,

Maybe I am missing something, but I would say that the standard in this case is to simply use 2SLS and ignore the fact that the dependent variables are binary. Of course, you need to interpret the results with the necessary caution.

Best wishes,

Joao
Comment
Michael Zuze

Join Date: Apr 2024

Posts: 12
#26

10 Jun 2024, 03:05

Thanks, Joao Santos Silva. I have tried it as shown below. Initially I wanted to use biprobit, but the model did not converge, probably because I specified reverse causality between poor and informal employment, so each variable is the dependent variable in one equation and a regressor in the other.

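global y1 hhinformal // Informal sector employment, binary
global y2 poor // Poverty status, binary
global x1 hdmale i.hhage i.hheduc hhmarried hdsize tot_informal urban // Shared exogenous regressors
global z1 child_under6 m_hseduc // Included in the hhinformal equation; excluded instruments for hhinformal in the poor equation
global z2 large_firm // Included in the poor equation; excluded instrument for poor in the hhinformal equation

// Equation for hhinformal: poor is endogenous, instrumented by z2; the first option also reports the first stage for poor
ivregress 2sls $y1 ($y2 = $z2) $x1 $z1, first

// Equation for poor: hhinformal is endogenous, instrumented by z1; the first option also reports the first stage for hhinformal
ivregress 2sls $y2 ($y1 = $z1) $x1 $z2, first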
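
A possible variant of the two commands above, offered as a sketch rather than a recommendation from the thread's participants. It assumes the globals defined above are already in memory and that the dataset is loaded. It adds robust standard errors, in line with the advice earlier in the thread to use robust standard errors and because a linear probability model is heteroskedastic by construction, and it adds the standard postestimation checks for instrument strength and, where the equation is overidentified, for overidentifying restrictions.

```stata
* Assumes the globals y1, y2, x1, z1, z2 defined above and the data in memory

* Equation for hhinformal: poor instrumented by z2 (exactly identified)
ivregress 2sls $y1 $x1 $z1 ($y2 = $z2), vce(robust) first
estat firststage      // strength of z2 in the first stage for poor

* Equation for poor: hhinformal instrumented by z1 (two instruments, one endogenous regressor)
ivregress 2sls $y2 $x1 $z2 ($y1 = $z1), vce(robust) first
estat firststage      // strength of z1 in the first stage for hhinformal
estat overid          // overidentification test; available here because z1 has two variables
```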
Comment
