2sls regression with binary endogenous variable

Joao Santos Silva

Join Date: Apr 2014

Posts: 2962
#16

05 Jul 2016, 14:01

Is this really truncation or is it censoring? Don't you observe the other variables when Y1 is an "outlier"?

Anyway, I probably won't be able to help much because when that trimming is done the problem gets really messy and it is difficult to get reasonably robust results.

Instead of trimming the data, I would use a method that is less sensitive to outliers, e.g., median regression.

Joao
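Below is a minimal sketch of the median-regression suggestion. It uses simulated data, so the variables y and x and the way the outliers are generated are assumptions for illustration, not the data discussed in this thread. The same model is fitted by OLS with regress and by median regression with qreg, which estimates the 0.5 quantile by default. The sketch deliberately ignores endogeneity. It only shows why a median-based estimator is less affected by a few extreme values of the outcome than trimming-free OLS.

```stata
* Median regression (qreg) versus OLS when some outcomes are gross outliers.
* Simulated data: y, x and the contamination scheme are hypothetical.
clear
set seed 12345
set obs 500
generate x = rnormal()
generate y = 1 + 2*x + rnormal()

* add large positive errors to about 20% of the observations with x > 1
replace y = y + 50 if x > 1 & runiform() < 0.2

* OLS: the conditional mean is pulled towards the outliers, tilting the slope
regress y x
scalar b_ols = _b[x]

* median regression (qreg estimates the 0.5 quantile by default)
qreg y x
scalar b_med = _b[x]

display "true slope = 2   OLS slope = " b_ols "   median-regression slope = " b_med
```

Other quantiles can be requested with qreg's quantile() option.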

Emad Shehata

Join Date: Oct 2014
Posts: 203

#17

05 Jul 2016, 17:19

This example may help show how to estimate 2SLS with a binary (probit) endogenous variable (y2).

Code:

* Example generated by -dataex-. To install: ssc install dataex
 clear all
input float y1 byte y2 float(x1 x2) int(x3 x4)
 89.1 1  96.7   101  12  28
 99.2 1  98.1 100.1  15  35
   99 0   100   100  17  37
  100 0 104.9  90.6  22  42
111.6 1 104.9  86.5  36  47
122.2 1 109.5  89.7  45  51
117.6 1 110.8  90.6  66  56
121.1 0 112.3  82.8  89  60
  136 1 109.3  70.1  99  65
154.2 0 105.3  65.4 118  69
153.6 0 101.7  61.3 134  74
158.5 1  95.4  62.5 151  78
140.6 0  96.4  63.6 167  83
136.2 0  97.6  52.6 184  87
  168 1 102.4  59.7 200  92
154.3 1 101.6  59.5 217  96
  149 1 103.8  61.3 233 101
 end

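* step 1: probit of the binary endogenous y2 on all exogenous variables
 local model probit
 local nX y2 x1 x2 _cons
 `model' y2 x1 x2 x3 x4
* linear index (xb) from the probit
 predict yh2 , xb
* step 2: OLS of y1 on the fitted index and the exogenous regressors
 reg y1 yh2 x1 x2
 matrix B=e(b)'
* data matrices: Y (outcome), Z (regressors with the observed y2), X (instruments)
 mkmat y1 , matrix(Y)
 gen x0=1
 mkmat y2 x1 x2 x0 , matrix(Z)
 mkmat x1 x2 x3 x4 x0 , matrix(X)
* W projects onto the column space of X; M = Z'WZ
 matrix W=X*invsym(X'*X)*X'
 matrix M=Z'*W*Z
* matrix M=Z'*Z
* residuals computed with the observed y2 and the step-2 coefficients
 matrix E=Y-Z*B
 matrix Sig2=E'*E
 matrix Sig2=Sig2/e(df_r)
 matrix Cov=Sig2*invsym(M'*M)
 
* post the coefficients and covariance matrix so that -ereturn display- can show them
 matrix B=B'
 local N = _N
 local DF =e(df_r)
 matrix colnames Cov = `nX'
 matrix rownames Cov = `nX'
 matrix colnames B   = `nX'
 ereturn post B Cov , dep(y1) obs(`N') dof(`DF')

* 2SLS (Probit) Binary (Y2) Endogenous Variables
 ereturn display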

* wrong model
 ivregress 2sls y1 x1 x2 (yh2 = x3 x4) , small

* wrong model if (Y2 = Binary Probit Endogenous Variables)
 ivregress 2sls y1 x1 x2 (y2 = x3 x4) , small

Output:

* 2SLS (Probit) Binary (Y2) Endogenous Variables
. ereturn display
------------------------------------------------------------------------------
          y1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          y2 |   3.479495   15.93427     0.22   0.831     -30.9444    37.90339
          x1 |   .4633677   .0474345     9.77   0.000     .3608917    .5658438
          x2 |  -1.393883   .0589887   -23.63   0.000    -1.521321   -1.266446
       _cons |   187.7333          .        .       .            .           .
------------------------------------------------------------------------------

* wrong model
. ivregress 2sls y1 x1 x2 (yh2 = x3 x4) , small


Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =      17
-------------+------------------------------           F(  3,    13) =   27.26
       Model |  8327.93832     3  2775.97944           Prob > F      =  0.0000
    Residual |  1324.02019    13  101.847707           R-squared     =  0.8628
-------------+------------------------------           Adj R-squared =  0.8312
       Total |  9651.95851    16  603.247407           Root MSE      =  10.092

------------------------------------------------------------------------------
          y1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         yh2 |   3.479495   4.192867     0.83   0.422    -5.578642    12.53763
          x1 |   .4633677   .4872688     0.95   0.359    -.5893125    1.516048
          x2 |  -1.393883   .1556052    -8.96   0.000    -1.730048   -1.057719
       _cons |   187.7333   49.70744     3.78   0.002     80.34689    295.1197
------------------------------------------------------------------------------
Instrumented:  yh2
Instruments:   x1 x2 x3 x4

. * wrong model if (Y2 = Binary Probit Endogenous Variables)
. ivregress 2sls y1 x1 x2 (y2 = x3 x4) , small

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =      17
-------------+------------------------------           F(  3,    13) =   28.96
       Model |  8406.05065     3  2802.01688           Prob > F      =  0.0000
    Residual |  1245.90786    13  95.8390663           R-squared     =  0.8709
-------------+------------------------------           Adj R-squared =  0.8411
       Total |  9651.95851    16  603.247407           Root MSE      =  9.7897

------------------------------------------------------------------------------
          y1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          y2 |   10.62683   12.43814     0.85   0.408    -16.24413    37.49779
          x1 |    .471081   .4716861     1.00   0.336    -.5479348    1.490097
          x2 |  -1.405856   .1545482    -9.10   0.000    -1.739737   -1.071975
       _cons |    182.531   47.69067     3.83   0.002     79.50159    285.5604
------------------------------------------------------------------------------
Instrumented:  y2
Instruments:   x1 x2 x3 x4

Last edited by Emad Shehata; 05 Jul 2016, 18:14.

Emad A. Shehata
Professor (PhD Economics)
Agricultural Research Center - Agricultural Economics Research Institute - Egypt
IDEAS: http://ideas.repec.org/f/psh494.html
EconPapers: http://econpapers.repec.org/RAS/psh494.htm
Google Scholar: http://scholar.google.com/citations?...r=cOXvc94AAAAJ


Hyejin Cho

Join Date: Jun 2015

Posts: 19
#18

05 Jul 2016, 21:06

Hi Joao,
Yes, it is a case of truncation rather than censoring. However, the continuous premium (non-truncated Y1) can also be used. If the continuous variable is used, is it possible to resolve the two issues explained earlier?

Emad, thanks so much for the code! Sorry, I'm quite a beginner with Stata... is there an example dataset that I can look at to better understand the commands and the results? (e.g., gsem_union3.dta on stata-press.com)
Thank you!
Joao Santos Silva

Join Date: Apr 2014

Posts: 2962
#19

06 Jul 2016, 13:46

Dear Hyejin Cho,

I am not sure if I really understand the situation, but my general advice would be not to trim the data and use suitably robust methods, e.g., based on median regression.

Best wishes,

Joao

Emad Shehata

Join Date: Oct 2014
Posts: 203

#20

06 Jul 2016, 16:26

Code:

* Example generated by -dataex-. To install: ssc install dataex
 clear all
input float y1 byte y2 float(x1 x2) int(x3 x4)
 89.1 1  96.7   101  12  28
 99.2 1  98.1 100.1  15  35
   99 0   100   100  17  37
  100 0 104.9  90.6  22  42
111.6 1 104.9  86.5  36  47
122.2 1 109.5  89.7  45  51
117.6 1 110.8  90.6  66  56
121.1 0 112.3  82.8  89  60
  136 1 109.3  70.1  99  65
154.2 0 105.3  65.4 118  69
153.6 0 101.7  61.3 134  74
158.5 1  95.4  62.5 151  78
140.6 0  96.4  63.6 167  83
136.2 0  97.6  52.6 184  87
  168 1 102.4  59.7 200  92
154.3 1 101.6  59.5 217  96
  149 1 103.8  61.3 233 101
 end

 cdsimeq (y1 x1 x2) (y2 x3 x4)

Output:

.  cdsimeq (y1 x1 x2) (y2 x3 x4)

                        NOW THE FIRST STAGE REGRESSIONS

      Source |       SS       df       MS              Number of obs =      17
-------------+------------------------------           F(  4,    12) =   19.45
       Model |  8362.32848     4  2090.58212           Prob > F      =  0.0000
    Residual |  1289.63003    12  107.469169           R-squared     =  0.8664
-------------+------------------------------           Adj R-squared =  0.8218
       Total |  9651.95851    16  603.247407           Root MSE      =  10.367

------------------------------------------------------------------------------
          y1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |    .206956   .6997125     0.30   0.772    -1.317587    1.731499
          x2 |  -.9104515   .4918427    -1.85   0.089    -1.982085    .1611815
          x3 |  -.1575674   .3570724    -0.44   0.667    -.9355613    .6204265
          x4 |   .8675963   1.325321     0.65   0.525     -2.02003    3.755223
       _cons |   138.7174   68.04336     2.04   0.064    -9.536307    286.9712
------------------------------------------------------------------------------

Iteration 0:   log likelihood = -11.517405
Iteration 1:   log likelihood =  -9.980019
Iteration 2:   log likelihood = -9.9494714
Iteration 3:   log likelihood = -9.9493851

Probit regression                                 Number of obs   =         17
                                                  LR chi2(4)      =       3.14
                                                  Prob > chi2     =     0.5353
Log likelihood = -9.9493851                       Pseudo R2       =     0.1361

------------------------------------------------------------------------------
          y2 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .0062123   .0823161     0.08   0.940    -.1551244     .167549
          x2 |   .1040173   .0689807     1.51   0.132    -.0311824    .2392169
          x3 |   .0121413   .0452971     0.27   0.789    -.0766393    .1009219
          x4 |   .0353014   .1673367     0.21   0.833    -.2926725    .3632753
       _cons |  -11.88578   9.627492    -1.23   0.217    -30.75531     6.98376
------------------------------------------------------------------------------

              NOW THE SECOND STAGE REGRESSIONS WITH INSTRUMENTS


      Source |       SS       df       MS              Number of obs =      17
-------------+------------------------------           F(  3,    13) =   27.26
       Model |  8327.93832     3  2775.97944           Prob > F      =  0.0000
    Residual |  1324.02019    13  101.847707           R-squared     =  0.8628
-------------+------------------------------           Adj R-squared =  0.8312
       Total |  9651.95851    16  603.247407           Root MSE      =  10.092

------------------------------------------------------------------------------
          y1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        I_y2 |   3.479495   4.192867     0.83   0.422    -5.578642    12.53763
          x1 |   .4633677   .4872688     0.95   0.359    -.5893125    1.516048
          x2 |  -1.393883   .1556052    -8.96   0.000    -1.730048   -1.057719
       _cons |   187.7333   49.70744     3.78   0.002     80.34689    295.1197
------------------------------------------------------------------------------

Iteration 0:   log likelihood = -11.517405
Iteration 1:   log likelihood =  -10.04216
Iteration 2:   log likelihood = -10.020231
Iteration 3:   log likelihood =   -10.0202

Probit regression                                 Number of obs   =         17
                                                  LR chi2(3)      =       2.99
                                                  Prob > chi2     =     0.3925
Log likelihood =   -10.0202                       Pseudo R2       =     0.1300

------------------------------------------------------------------------------
          y2 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        I_y1 |  -.1174181   .0741252    -1.58   0.113    -.2627008    .0278646
          x3 |   -.017184   .0387496    -0.44   0.657    -.0931318    .0587639
          x4 |    .173977   .1773915     0.98   0.327     -.173704     .521658
       _cons |   6.082584   4.687326     1.30   0.194    -3.104405    15.26957
------------------------------------------------------------------------------

         NOW THE SECOND STAGE REGRESSIONS WITH CORRECTED STANDARD ERRORS


------------------------------------------------------------------------------
          y1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        I_y2 |   3.479495   4.640983     0.75   0.467    -6.546738    13.50573
          x1 |   .4633677   .5177683     0.89   0.387    -.6552027    1.581938
          x2 |  -1.393883   .1685833    -8.27   0.000    -1.758085   -1.029681
       _cons |   187.7333   52.88488     3.55   0.004     73.48245    301.9841
------------------------------------------------------------------------------
------------------------------------------------------------------------------
          y2 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        I_y1 |  -.1174181   .1029972    -1.14   0.254    -.3192888    .0844526
          x3 |   -.017184   .0553249    -0.31   0.756    -.1256189    .0912509
          x4 |    .173977   .2499454     0.70   0.486    -.3159069    .6638609
       _cons |   6.082584   6.699306     0.91   0.364    -7.047814    19.21298
------------------------------------------------------------------------------


ashish

Join Date: Apr 2014

Posts: 10
#21

10 Nov 2016, 00:54

Hi Joao,
I have a similar problem with a different variable structure. I would like to know the Stata code for implementing my model.

1. My dependent variable is binary (adolescent completed class 10 or more).
2. My endogenous regressor is being married before age 18 (which is also binary).
3. I would like to know what modification I have to make to the ivregress command for a binary dependent variable and a binary endogenous variable.

Any help will be appreciated.
ashish

Join Date: Apr 2014

Posts: 10
#22

10 Nov 2016, 05:46

Hi Joao,
I have a similar problem with a different variable structure. I would like to know the Stata code for implementing my model.

1. My dependent variable is binary (adolescent completed class 10 or more).
2. My endogenous regressor is being married before age 18 (which is also binary).
3. I would like to know what modification I have to make to the ivregress command for a binary dependent variable and a binary endogenous variable.

Any help will be appreciated.
Malika Jurazoda

Join Date: Nov 2016

Posts: 1
#23

14 Nov 2016, 09:00

Dear stata users,

Please advise me on this issue.

I have panel data (2 years), a binary dependent variable and many other binary independent variables. In addition, I have instruments.
Panel probit and IV probit can each be estimated separately.

But how can I do both together?
anusree paul

Join Date: Jan 2017

Posts: 3
#24

12 Mar 2018, 02:14

Dear Stata users,
I am working on a problem similar to the one Hyejin Cho described in an earlier post in this thread. I am applying a panel control-function (CF) approach in international trade.
There is a recent paper by Murtazashvili & Wooldridge (2016) in the Journal of Econometrics, 'A control function approach to estimating switching regression models with endogenous explanatory variables and endogenous switching', which describes the CF methodology in a panel setting. Following that methodology, I am trying to implement the following.
Variables:
Y = continuous dependent variable (e.g., bilateral trade flow)
X1 = binary endogenous explanatory variable
X2 = continuous exogenous explanatory variable
X3 = binary exogenous explanatory variable
Z1, Z2 = instruments
Based on the suggestions in this thread and the methodology described in Murtazashvili & Wooldridge (2016), I am doing the following:
a) probit X1 X2 X3 Z1 Z2 mean_X2 mean_X3 mean_Z1 mean_Z2

b) predict X1_hat, xb

c) gen generalised_residual_X1 = X1*(normalden(X1_hat)/normal(X1_hat)) - (1-X1)*(normalden(X1_hat)/(1-normal(X1_hat)))

d) xtivreg Y X2 X3 mean_X2 mean_X3 generalised_residual_X1 (X1 = Z1 Z2 mean_Z1 mean_Z2)

Would steps a) to d) be correct? I am using Stata 14.
Any help and suggestions are much appreciated.
Thank you in advance.
Regards,
Anusree
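For step c), the usual probit generalized residual is d*phi(xb)/Phi(xb) - (1-d)*phi(xb)/(1-Phi(xb)), where xb is the estimated linear index rather than the predicted probability. The following is a minimal cross-sectional sketch, assuming simulated data and hypothetical variable names (y, d, x, z, xb_d, gres). The Mundlak time averages and the panel structure used in the paper are left out. The last line is a simple control-function regression whose reported standard errors ignore the first-stage estimation, so they are only indicative; bootstrapping both steps is one common remedy.

```stata
* Probit generalized residual and a simple control-function regression.
* Cross-sectional, simulated data; all variable names are hypothetical.
clear
set seed 2016
set obs 2000
generate x = rnormal()
generate z = rnormal()
generate u = rnormal()
* v is correlated with u, which makes the binary regressor d endogenous in y
generate v = 0.6*u + rnormal()
generate d = (0.5*x + z + v > 0)
generate y = 1 + 2*d + x + u

* first stage: probit of d on the exogenous variable and the instrument
probit d x z
* the generalized residual needs the linear index, so use the xb option
predict double xb_d, xb

* gres = phi(xb)/Phi(xb) if d==1 and -phi(xb)/(1-Phi(xb)) if d==0;
* normal(-xb) equals 1-normal(xb) and is more accurate in the tails
generate double gres = cond(d == 1, normalden(xb_d)/normal(xb_d), -normalden(xb_d)/normal(-xb_d))

* second step: include gres as an extra regressor (control function);
* these standard errors ignore that gres is itself estimated
regress y d x gres
```

Because the probit includes a constant, its first-order condition implies that the generalized residuals average to (approximately) zero, which is a quick check that the formula was typed correctly.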
NJ JAIN

Join Date: Mar 2019

Posts: 6
#25

14 May 2020, 07:33

Originally posted by ashish

Hi Joao,
I have a similar problem with a different variable structure. I would like to know the Stata code for implementing my model.

1. My dependent variable is binary (adolescent completed class 10 or more).
2. My endogenous regressor is being married before age 18 (which is also binary).
3. I would like to know what modification I have to make to the ivregress command for a binary dependent variable and a binary endogenous variable.

Any help will be appreciated.

Hi, can anyone please elaborate on this? I am also having this issue. Thanks.
Vik Bahr

Join Date: Jun 2020

Posts: 1
#26

04 Jun 2020, 06:15

Originally posted by Joao Santos Silva

Dear Naveen,

I am afraid what you are doing is wrong; it is what is called a forbidden regression. What you have to do is as follows:

a) probit X1 X2 X3 Z1 Z2
b) predict X1_hat, p
c) ivregress 2sls Y X2 X3 (X1 = Z1 Z2 X1_hat)

All the best,

Joao

Dear Joao,

Thanks for your comment; it is really helpful. However, I am confused about using X1_hat as an instrument in part (c). Shouldn't we use only the predicted value from the probit as the instrument? Or is it necessary to add Z1 and Z2 as instruments as well?
Joao Santos Silva

Join Date: Apr 2014

Posts: 2962
#27

04 Jun 2020, 09:21

Dear Vik Bahr,

You do not have to use Z1 and Z2, but you can; in general it does not hurt to use them, and you get a test of the overidentifying restrictions (although it is likely to have low power in this context).

Best wishes,

Joao
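Here is a minimal sketch of the three steps a) to c) quoted above, run on simulated data. The data-generating process and coefficient values are assumptions; the variable names follow the thread. With Z1, Z2 and X1_hat all used as instruments for the single endogenous regressor X1, the model is overidentified, and estat overid reports the overidentification tests mentioned in the reply. The second ivregress shows the just-identified alternative that uses X1_hat alone.

```stata
* Probit fitted probability as an instrument for a binary endogenous regressor.
* Simulated data; the data-generating process is hypothetical.
clear
set seed 2020
set obs 1000
generate X2 = rnormal()
generate X3 = runiform() < 0.5
generate Z1 = rnormal()
generate Z2 = rnormal()
generate u  = rnormal()
* v is correlated with u, so X1 is endogenous in the Y equation
generate v  = 0.5*u + rnormal()
generate X1 = (0.3*X2 - 0.4*X3 + 0.8*Z1 + 0.5*Z2 + v > 0)
generate Y  = 1 + 2*X1 + X2 + 0.5*X3 + u

* a) probit of X1 on all exogenous variables and instruments
probit X1 X2 X3 Z1 Z2
* b) fitted probability
predict double X1_hat, pr
* c) 2SLS with X1_hat, Z1 and Z2 as excluded instruments for X1
ivregress 2sls Y X2 X3 (X1 = Z1 Z2 X1_hat)
* two overidentifying restrictions: Sargan and Basmann tests
estat overid

* just-identified alternative: X1_hat as the only excluded instrument
ivregress 2sls Y X2 X3 (X1 = X1_hat)
```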
Ziwen Bu

Join Date: Jan 2019

Posts: 1
#28

02 Jul 2020, 18:09

Originally posted by Joao Santos Silva

Dear Naveen,

I am afraid what you are doing is wrong; it is what is called a forbidden regression. What you have to do is as follows:

a) probit X1 X2 X3 Z1 Z2
b) predict X1_hat, p
c) ivregress 2sls Y X2 X3 (X1 = Z1 Z2 X1_hat)

All the best,

Joao

Hi Joao,

If I want to run an underidentification test (Kleibergen-Paap rk LM statistic) and a weak-instrument test (Kleibergen-Paap rk Wald F statistic), is there a way to do these tests based on the code you provided? Many thanks.

Regards,
Ziwen
Joao Santos Silva

Join Date: Apr 2014

Posts: 2962
#29

03 Jul 2020, 02:36

Dear Ziwen Bu,

I believe you get the results of those tests if you use ivreg2 instead of ivregress in the last line.

Best wishes,

Joao
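The following sketch applies that suggestion to simulated data. The data-generating process is an assumption, and the variable names follow the thread. ivreg2 is community-contributed (ssc install ivreg2; recent versions also require ranktest, ssc install ranktest). With the robust option, its output header reports the Kleibergen-Paap rk LM underidentification test and the Kleibergen-Paap rk Wald F weak-identification statistic. The e(idstat), e(idp) and e(widstat) names used below are, to our understanding, where ivreg2 saves these results; check its help file if they differ in your version.

```stata
* Kleibergen-Paap tests via the community-contributed ivreg2.
* One-time installation: ssc install ivreg2  and  ssc install ranktest
* Simulated data; the data-generating process is hypothetical.
clear
set seed 2020
set obs 1000
generate X2 = rnormal()
generate X3 = runiform() < 0.5
generate Z1 = rnormal()
generate Z2 = rnormal()
generate u  = rnormal()
generate v  = 0.5*u + rnormal()
generate X1 = (0.3*X2 - 0.4*X3 + 0.8*Z1 + 0.5*Z2 + v > 0)
generate Y  = 1 + 2*X1 + X2 + 0.5*X3 + u

* probit first stage and fitted probability, as in steps a) and b)
probit X1 X2 X3 Z1 Z2
predict double X1_hat, pr

* last step with ivreg2; robust makes it report Kleibergen-Paap statistics
ivreg2 Y X2 X3 (X1 = Z1 Z2 X1_hat), robust

* underidentification (rk LM) and weak identification (rk Wald F)
display "KP rk LM = " e(idstat) "  (p = " e(idp) ")   KP rk Wald F = " e(widstat)
```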
Deboshmita Brahma

Join Date: Jun 2022

Posts: 6
#30

11 Jun 2022, 00:13

Joao Santos Silva
Sir, I am working with a variable (Y) that follows a Tweedie compound Poisson distribution. I have an endogenous variable X in the model, which is categorical, and an instrumental variable Z.
The control variables are P, Q and R.
What would be the correct approach to deal with the endogeneity of X using the instrument Z, given that Y is compound Poisson?
I would be extremely grateful if you could offer some suggestions, Sir.