2sls regression with binary endogenous variable

Yarovoy Dima

Join Date: Jul 2015

Posts: 2
#1

14 Jul 2015, 12:18

Hi people,

I tried to find the answer on the internet, I really did.

I'm trying to run a 2SLS regression in Stata using the ivregress command. The problem is that my endogenous variable, the one used as the dependent variable in the first stage, is binary. As far as I understand, the fitted values of my endogenous variable, which are used in the second stage, will be continuous. So how do I convert the result of the first regression into a binary variable? I'm aware that I could do it manually in two steps, but is it possible to do it using ivregress or a similar command?

Thanks a lot for any help in advance.
Jorge Eduardo Perez Perez

Join Date: Mar 2014

Posts: 429
#2

14 Jul 2015, 12:37

This is the "forbidden regression". Google this and you'll find many resources on the proper approach in this case.

Jorge Eduardo Pérez Pérez
www.jorgeperezperez.com
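
One way to see why the plug-in approach is called "forbidden" is sketched below. The notation is illustrative and not from the thread: y is the outcome, x the binary endogenous regressor, z the vector of exogenous variables and instruments, and \Phi the standard normal CDF.

```latex
% Structural equation with an endogenous binary regressor x
\[ y = \beta_0 + \beta_1 x + u, \qquad \mathrm{E}[u \mid z] = 0 . \]

% 2SLS: linear first stage, \hat{x} = z'\hat{\pi}, residual \hat{v} = x - \hat{x}
\[ y = \beta_0 + \beta_1 \hat{x} + (u + \beta_1 \hat{v}),
   \qquad \sum_i \hat{x}_i \hat{v}_i = 0 \ \text{by the algebra of OLS.} \]

% Plug-in: probit first stage, \hat{p} = \Phi(z'\hat{\gamma}), residual \hat{r} = x - \hat{p}
\[ y = \beta_0 + \beta_1 \hat{p} + (u + \beta_1 \hat{r}),
   \qquad \sum_i \hat{p}_i \hat{r}_i \ \text{need not vanish.} \]
```

In the linear case the extra error term is orthogonal to the second-stage regressors by construction. With probit fitted values that orthogonality holds, even in large samples, only if the probit model is correctly specified, so regressing y on \hat{p} is generally inconsistent.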
Yarovoy Dima

Join Date: Jul 2015

Posts: 2
#3

14 Jul 2015, 13:28

Thank you for your answer. If possible, could you give a small example in Stata terms?
Joao Santos Silva

Join Date: Apr 2014

Posts: 3011
#4

14 Jul 2015, 15:01

Dear Yarovoy,

Just use plain 2SLS; it will be fine. As you say, the fitted values of the binary variable that are used in the second stage won't be binary, but that is how it should be. Check a good econometrics textbook (say, Wooldridge, Cameron & Trivedi, or Davidson & MacKinnon) for the details.

All the best,

Joao
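
A minimal sketch of what plain 2SLS with a binary endogenous regressor might look like in Stata. The data are simulated, and every variable name and coefficient below (y, x1, x2, x3, z1, z2, e) is an assumption made for illustration, not something taken from the thread.

```stata
* Illustrative do-file on simulated data; all names and numbers are assumptions
clear
set seed 12345
set obs 5000
generate z1 = rnormal()
generate z2 = rnormal()
generate x2 = rnormal()
generate x3 = rnormal()
* e enters both the x1 equation and the y equation, which makes x1 endogenous
generate e  = rnormal()
generate x1 = (0.8*z1 + 0.5*z2 + 0.3*x2 + e > 0)
generate y  = 1 + 2*x1 + 0.5*x2 - 0.5*x3 + e + rnormal()

* OLS for comparison: inconsistent for the coefficient on x1
regress y x1 x2 x3

* Plain 2SLS: the first stage is a linear regression of the binary x1
* on x2, x3, z1 and z2; its fitted values are not binary, and need not be
ivregress 2sls y x2 x3 (x1 = z1 z2), first vce(robust)

* First-stage diagnostics, to check instrument strength
estat firststage
```

The coefficient on x1 is still read as the effect of switching x1 from 0 to 1; nothing has to be converted back to a binary variable.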
Naveen Abedin

Join Date: May 2016

Posts: 2
#5

16 May 2016, 09:04

Hi everyone,

My problem is quite similar to Yarovoy's. Let me illustrate in a simplified manner:

My primary dependent variable is: Y
Independent variables are: X1, X2, X3. X1 however is an endogenous binary variable.
I have the instruments Z1 and Z2.

Following the IV-2SLS procedure, I ran:

probit X1 X2 X3 Z1 Z2

Then I obtained the fitted values, X1-hat.

X1-hat is no longer binary, but continuous.

Then I ran my second stage as:

reg Y X1-hat X2 X3

Can you please tell me if I have done this right? Also, since X1-hat is not binary, how do I interpret its coefficient? (Y-hat is continuous.)
Joao Santos Silva

Join Date: Apr 2014

Posts: 3011
#6

16 May 2016, 13:17

Dear Naveen,

I am afraid what you are doing is wrong; it is what is called a forbidden regression. What you have to do is as follows:

a) probit X1 X2 X3 Z1 Z2
b) predict X1_hat, p
c) ivregress 2sls Y X2 X3 (X1 = Z1 Z2 X1_hat)

All the best,

Joao
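
Steps a) to c) above can be tried end to end on simulated data, as in the sketch below. The data-generating process and its coefficients are assumptions chosen only so that the example runs; the three estimation commands follow the steps as posted.

```stata
* Illustrative do-file on simulated data; the DGP below is an assumption
clear
set seed 2016
set obs 5000
generate Z1 = rnormal()
generate Z2 = rnormal()
generate X2 = rnormal()
generate X3 = rnormal()
* e is shared by the X1 and Y equations, so X1 is endogenous
generate e  = rnormal()
generate X1 = (0.8*Z1 + 0.5*Z2 + 0.3*X2 + e > 0)
generate Y  = 1 + 2*X1 + 0.5*X2 - 0.5*X3 + e + rnormal()

* a) probit of the binary endogenous regressor on all exogenous variables
probit X1 X2 X3 Z1 Z2

* b) fitted probabilities (pr is the full name of the option abbreviated p)
predict X1_hat, pr

* c) 2SLS with the fitted probability used as an instrument, not as a regressor
ivregress 2sls Y X2 X3 (X1 = Z1 Z2 X1_hat), vce(robust)
```

The key difference from the forbidden regression is that X1 itself stays in the outcome equation and X1_hat only appears in the instrument list. A common textbook variant uses X1_hat as the only excluded instrument, that is, (X1 = X1_hat).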
Naveen Abedin

Join Date: May 2016

Posts: 2
#7

19 May 2016, 00:59

Thank you so much, Joao! That was an immense help!
Yaara Nussi

Join Date: Jun 2016

Posts: 8
#8

20 Jun 2016, 01:42

Hi!
I think I am dealing with a similar situation, and reading this post made me doubt the validity of my procedure. Could anyone advise?

I have a continuous dependent variable y and two binary endogenous variables x1 and x2, as well as additional exogenous variables $x. I also have four instruments: z1-z4.
I am trying to estimate each equation separately as part of a simultaneous equation system.

1. When using y as the dependent variable, I used 2SLS.
Reading Joao's reply made me realize this might not have been accurate.
Should I have used the procedure suggested by Joao twice (once for each binary endogenous variable) to predict x1_hat and x2_hat, and then used 2SLS?
Does this look OK?

a) probit x1 $x z1 z2 z3 z4
b) predict x1_hat, p
c) probit x2 $x z1 z2 z3 z4
d) predict x2_hat, p
e) ivregress 2sls y $x (x1= $x z1 z2 z3 z4 x1_hat) (x2= $x z1 z2 z3 z4 x2_hat)

Would this account for the two endogenous binary variables?

2. When using x1 (binary) as the dependent variable, I wanted to use IV Probit, but refrained from doing so since the implemented procedure is not applicable for binary endogenous variables (x2). The second endogenous variable in this case, y, is continuous, and thus I believe should be OK. For this case I have a different set of IVs z5-z8.
Is there a similar, manual procedure I could apply?

I tried using -cmp- for this purpose, but my model has many exogenous variables ($x) and it seems to prevent the procedure from converging.

Any suggestion is much appreciated.
Many thanks,
Yaara
Yaara Nussi

Join Date: Jun 2016

Posts: 8
#9

20 Jun 2016, 05:09

I realized this adaptation of Joao's suggestion is inaccurate, since the syntax for the instruments should be "(all instrumented variables = instrumental variables)".
This would make step e) ivregress 2sls y $x (x1 x2= z1 z2 z3 z4 x1_hat x2_hat).
Would steps a) - e) under this correction be the correct implementation of the manual run? I am using Stata SE 11.1.
Joao Santos Silva

Join Date: Apr 2014

Posts: 3011
#10

20 Jun 2016, 12:18

Dear Yaara,

When your dependent variable is continuous, the procedure that you describe is correct. Notice, however, that this is just a 2SLS estimator where you are using non-linear functions of the exogenous variables as instruments. The plain 2SLS you are using is also valid, but it is probably less efficient and the instruments may be weaker.

When your dependent variable is binary, things get much more complicated. Andrew Chesher has some papers on this, and one of the main results is that point identification may not be possible. You should read that literature carefully before proceeding.

Best regards,

Joao
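
A sketch of the corrected steps a) to e) with two binary endogenous regressors, next to plain 2SLS for comparison. The simulated data, the contents of $x (here two made-up variables w1 and w2) and all coefficients are assumptions for illustration only.

```stata
* Illustrative do-file on simulated data; names and numbers are assumptions
clear
set seed 2016
set obs 5000
forvalues j = 1/4 {
    generate z`j' = rnormal()
}
generate w1 = rnormal()
generate w2 = rnormal()
global x "w1 w2"
* a common shock makes x1 and x2 endogenous in the y equation
generate a  = rnormal()
generate x1 = (z1 + 0.5*z2 + 0.3*w1 + a + rnormal() > 0)
generate x2 = (z3 + 0.5*z4 - 0.3*w2 + a + rnormal() > 0)
generate y  = 1 + 1.5*x1 - x2 + 0.5*w1 + 0.5*w2 + a + rnormal()

* a)-d) one probit per binary endogenous regressor, then fitted probabilities
probit x1 $x z1 z2 z3 z4
predict x1_hat, pr
probit x2 $x z1 z2 z3 z4
predict x2_hat, pr

* e) both endogenous regressors on the left of "=", all instruments on the right
ivregress 2sls y $x (x1 x2 = z1 z2 z3 z4 x1_hat x2_hat), vce(robust)

* plain 2SLS for comparison: also valid, possibly with weaker instruments
ivregress 2sls y $x (x1 x2 = z1 z2 z3 z4), vce(robust)
estat firststage
```

Comparing the standard errors from the two ivregress calls gives a rough sense of how much the nonlinear instruments help in a given sample.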
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#11

20 Jun 2016, 15:37

You might also look at cmp and SEM/GSEM, which can estimate similar models.
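
For reference, a rough sketch of how the user-written cmp command might be called for a continuous outcome with a binary endogenous regressor. cmp is not part of official Stata and has to be installed (it may also ask for the ghk2 package). The simulated data and all variable names are assumptions, and the exact options should be checked against the cmp help file.

```stata
* cmp is user-written; install it once from SSC (requires internet access)
ssc install cmp
* cmp setup defines the $cmp_* globals used in the indicators() option
cmp setup

* Illustrative simulated data; names and numbers are assumptions
clear
set seed 2016
set obs 5000
generate Z1 = rnormal()
generate Z2 = rnormal()
generate X2 = rnormal()
generate X3 = rnormal()
generate e  = rnormal()
generate X1 = (0.8*Z1 + 0.5*Z2 + 0.3*X2 + e > 0)
generate Y  = 1 + 2*X1 + 0.5*X2 - 0.5*X3 + e + rnormal()

* Joint ML: a linear equation for Y and a probit equation for X1,
* with correlated errors across the two equations
cmp (Y = X1 X2 X3) (X1 = X2 X3 Z1 Z2), indicators($cmp_cont $cmp_probit)
```

Unlike 2SLS, this approach relies on the joint normality of the two error terms, so it trades robustness for efficiency.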
Yaara Nussi

Join Date: Jun 2016

Posts: 8
#12

27 Jun 2016, 05:51

I tried both 2SLS and the adaptation suggested by Joao for the continuous dependent variable and, using a newer version of Stata, have managed to run the IV probit process for the binary endogenous regressor using cmp. Thank you both for your input; it was very helpful!
Hyejin Cho

Join Date: Jun 2015

Posts: 19
#13

05 Jul 2016, 01:41

Hi! I also have a similar problem, but I am a little confused about when to include the inverse Mills ratio, as well as how to include an interaction term that involves the binary endogenous variable.

My variables are:
Y1 = truncated continuous dependent variable
FA = endogenous binary independent variable
X2~X5 = exogenous continuous independent variables
FAxX2 = interaction term between endogenous FA and exogenous X2
Z1~Z2 = instrumental variables

Based on the above suggestions, to avoid a forbidden regression I have done the following:
a) probit FA X2 X3 X4 X5 Z1 Z2
b) predict FA_hat, p
c) gen imr=normalden(FA_hat)/normal(FA_hat)
d) gen FA_hatxX2=FA_hat*X2
e) xtivregress Y1 X2 X3 (FA FAxX2 = Z1 Z2 Z1xX2 Z2xX2 FA_hat FA_hatxX2)

However, in my case I have both endogeneity and sample selection issues (since the dependent variable Y1 is truncated). In step e), when running xtivregress, should I also include the IMR, so that my equation becomes:
xtivregress Y1 X2 X3 imr (FA FAxX2 = Z1 Z2 Z1xX2 Z2xX2 FA_hat FA_hatxX2) ?

Second, as FAxX2 is an interaction variable involving the endogenous FA, is the above method the correct way to control for the endogeneity of the interaction?

Any help will be much appreciated! Thanks in advance!

Regards,

Hyejin Cho
Joao Santos Silva

Join Date: Apr 2014

Posts: 3011
#14

05 Jul 2016, 11:38

Dear Hyejin Cho,

Can you please let us know what Y1 is and why it is truncated?

Thanks,

Joao
Hyejin Cho

Join Date: Jun 2015

Posts: 19
#15

05 Jul 2016, 12:02

Hi Joao,
Y1 is the premium paid for an acquisition, calculated as the % difference between the offer price and the target's equity value 1 month prior to the acquisition. It is truncated because the premiums obtained from SDC include troubling outliers. Thus, following Officer (Journal of Financial Economics, 2003), the premium is truncated with a lower bound of 0% (which is an economically meaningful bound) and an upper bound of 200% (which is an arbitrary bound set by Officer). In my data, this represents truncation at approximately the 10% and 99% levels. I hope this explanation helps! Please let me know if there is anything else I can explain. Thank you!