  • #16
    Is this really truncation or is it censoring? Don't you observe the other variables when Y1 is an "outlier"?

    Anyway, I probably won't be able to help much because when that trimming is done the problem gets really messy and it is difficult to get reasonably robust results.

    Instead of trimming the data, I would use a method that is less sensitive to outliers, e.g., median regression.

    Joao
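    A minimal sketch of the median-regression alternative, using Stata's shipped auto data as a stand-in (price, weight and length are placeholders, not the variables discussed in this thread). qreg estimates the median, i.e. the 0.5 quantile, unless the quantile() option asks for another one:

```stata
* Stand-in data shipped with Stata; replace with your own variables
sysuse auto, clear

* OLS: the conditional mean can be pulled around by a few extreme values of price
regress price weight length

* Median regression: qreg fits the 0.5 quantile by default
qreg price weight length
```

    Comparing the two coefficient tables shows how much the estimates depend on the tails of the outcome, without discarding any observations.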



    • #17
      This example may help show how to estimate 2SLS with a binary (probit) endogenous variable (Y2).

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
       clear all
      input float y1 byte y2 float(x1 x2) int(x3 x4)
       89.1 1  96.7   101  12  28
       99.2 1  98.1 100.1  15  35
         99 0   100   100  17  37
        100 0 104.9  90.6  22  42
      111.6 1 104.9  86.5  36  47
      122.2 1 109.5  89.7  45  51
      117.6 1 110.8  90.6  66  56
      121.1 0 112.3  82.8  89  60
        136 1 109.3  70.1  99  65
      154.2 0 105.3  65.4 118  69
      153.6 0 101.7  61.3 134  74
      158.5 1  95.4  62.5 151  78
      140.6 0  96.4  63.6 167  83
      136.2 0  97.6  52.6 184  87
        168 1 102.4  59.7 200  92
      154.3 1 101.6  59.5 217  96
        149 1 103.8  61.3 233 101
       end
      
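      local model probit
      local nX y2 x1 x2 _cons
      * first stage: probit of the binary endogenous y2 on all exogenous variables
      `model' y2 x1 x2 x3 x4
      * linear index (xb) of the probit
      predict yh2 , xb
      * second stage: OLS of y1 on the fitted index and the exogenous regressors
      reg y1 yh2 x1 x2
      matrix B=e(b)'
      * data matrices: Y = outcome, Z = regressors with the observed y2, X = instruments
      mkmat y1 , matrix(Y)
      gen x0=1
      mkmat y2 x1 x2 x0 , matrix(Z)
      mkmat x1 x2 x3 x4 x0 , matrix(X)
      * W projects onto the column space of X
      matrix W=X*invsym(X'*X)*X'
      matrix M=Z'*W*Z
      * matrix M=Z'*Z
      * residuals evaluated at the observed y2, then the error variance
      matrix E=Y-Z*B
      matrix Sig2=E'*E
      matrix Sig2=Sig2/e(df_r)
      matrix Cov=Sig2*invsym(M'*M)

      * post the coefficients and covariance matrix as estimation results
      matrix B=B'
      local N = _N
      local DF =e(df_r)
      matrix colnames Cov = `nX'
      matrix rownames Cov = `nX'
      matrix colnames B   = `nX'
      ereturn post B Cov , dep(y1) obs(`N') dof(`DF')

      * 2SLS (Probit) Binary (Y2) Endogenous Variables
      ereturn display

      * wrong model
      ivregress 2sls y1 x1 x2 (yh2 = x3 x4) , small

      * wrong model if (Y2 = Binary Probit Endogenous Variables)
      ivregress 2sls y1 x1 x2 (y2 = x3 x4) , small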

      Output:
      * 2SLS (Probit) Binary (Y2) Endogenous Variables
      . ereturn display
      ------------------------------------------------------------------------------
                y1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
                y2 |   3.479495   15.93427     0.22   0.831     -30.9444    37.90339
                x1 |   .4633677   .0474345     9.77   0.000     .3608917    .5658438
                x2 |  -1.393883   .0589887   -23.63   0.000    -1.521321   -1.266446
             _cons |   187.7333          .        .       .            .           .
      ------------------------------------------------------------------------------
      
      * wrong model
      . ivregress 2sls y1 x1 x2 (yh2 = x3 x4) , small
      
      
      Instrumental variables (2SLS) regression
      
            Source |       SS       df       MS              Number of obs =      17
      -------------+------------------------------           F(  3,    13) =   27.26
             Model |  8327.93832     3  2775.97944           Prob > F      =  0.0000
          Residual |  1324.02019    13  101.847707           R-squared     =  0.8628
      -------------+------------------------------           Adj R-squared =  0.8312
             Total |  9651.95851    16  603.247407           Root MSE      =  10.092
      
      ------------------------------------------------------------------------------
                y1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               yh2 |   3.479495   4.192867     0.83   0.422    -5.578642    12.53763
                x1 |   .4633677   .4872688     0.95   0.359    -.5893125    1.516048
                x2 |  -1.393883   .1556052    -8.96   0.000    -1.730048   -1.057719
             _cons |   187.7333   49.70744     3.78   0.002     80.34689    295.1197
      ------------------------------------------------------------------------------
      Instrumented:  yh2
      Instruments:   x1 x2 x3 x4
      
      . * wrong model if (Y2 = Binary Probit Endogenous Variables)
      . ivregress 2sls y1 x1 x2 (y2 = x3 x4) , small
      
      Instrumental variables (2SLS) regression
      
            Source |       SS       df       MS              Number of obs =      17
      -------------+------------------------------           F(  3,    13) =   28.96
             Model |  8406.05065     3  2802.01688           Prob > F      =  0.0000
          Residual |  1245.90786    13  95.8390663           R-squared     =  0.8709
      -------------+------------------------------           Adj R-squared =  0.8411
             Total |  9651.95851    16  603.247407           Root MSE      =  9.7897
      
      ------------------------------------------------------------------------------
                y1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
                y2 |   10.62683   12.43814     0.85   0.408    -16.24413    37.49779
                x1 |    .471081   .4716861     1.00   0.336    -.5479348    1.490097
                x2 |  -1.405856   .1545482    -9.10   0.000    -1.739737   -1.071975
             _cons |    182.531   47.69067     3.83   0.002     79.50159    285.5604
      ------------------------------------------------------------------------------
      Instrumented:  y2
      Instruments:   x1 x2 x3 x4
      Last edited by Emad Shehata; 05 Jul 2016, 18:14.
      Emad A. Shehata
      Professor (PhD Economics)
      Agricultural Research Center - Agricultural Economics Research Institute - Egypt
      Email: [email protected]
      IDEAS: http://ideas.repec.org/f/psh494.html
      EconPapers: http://econpapers.repec.org/RAS/psh494.htm
      Google Scholar: http://scholar.google.com/citations?...r=cOXvc94AAAAJ



      • #18
        Hi Joao,
        Yes, it is a case of truncation rather than censoring. However, the continuous premium (non-truncated Y1) can also be used. If the continuous variable is used, is it possible to resolve the two issues explained earlier?

        Emad, thanks so much for the code! Sorry, I'm quite a beginner with Stata. Is there an example dataset that I can look at to better understand the commands and the results (e.g., gsem_union3.dta on stata-press.com)?
        Thank you!



        • #19
          Dear Hyejin Cho,

          I am not sure if I really understand the situation, but my general advice would be not to trim the data and use suitably robust methods, e.g., based on median regression.

          Best wishes,

          Joao



          • #20
            Code:
            * Example generated by -dataex-. To install: ssc install dataex
             clear all
            input float y1 byte y2 float(x1 x2) int(x3 x4)
             89.1 1  96.7   101  12  28
             99.2 1  98.1 100.1  15  35
               99 0   100   100  17  37
              100 0 104.9  90.6  22  42
            111.6 1 104.9  86.5  36  47
            122.2 1 109.5  89.7  45  51
            117.6 1 110.8  90.6  66  56
            121.1 0 112.3  82.8  89  60
              136 1 109.3  70.1  99  65
            154.2 0 105.3  65.4 118  69
            153.6 0 101.7  61.3 134  74
            158.5 1  95.4  62.5 151  78
            140.6 0  96.4  63.6 167  83
            136.2 0  97.6  52.6 184  87
              168 1 102.4  59.7 200  92
            154.3 1 101.6  59.5 217  96
              149 1 103.8  61.3 233 101
             end
            
             cdsimeq (y1 x1 x2) (y2 x3 x4)

            Output:
            .  cdsimeq (y1 x1 x2) (y2 x3 x4)
            
                                    NOW THE FIRST STAGE REGRESSIONS
            
                  Source |       SS       df       MS              Number of obs =      17
            -------------+------------------------------           F(  4,    12) =   19.45
                   Model |  8362.32848     4  2090.58212           Prob > F      =  0.0000
                Residual |  1289.63003    12  107.469169           R-squared     =  0.8664
            -------------+------------------------------           Adj R-squared =  0.8218
                   Total |  9651.95851    16  603.247407           Root MSE      =  10.367
            
            ------------------------------------------------------------------------------
                      y1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                      x1 |    .206956   .6997125     0.30   0.772    -1.317587    1.731499
                      x2 |  -.9104515   .4918427    -1.85   0.089    -1.982085    .1611815
                      x3 |  -.1575674   .3570724    -0.44   0.667    -.9355613    .6204265
                      x4 |   .8675963   1.325321     0.65   0.525     -2.02003    3.755223
                   _cons |   138.7174   68.04336     2.04   0.064    -9.536307    286.9712
            ------------------------------------------------------------------------------
            
            Iteration 0:   log likelihood = -11.517405
            Iteration 1:   log likelihood =  -9.980019
            Iteration 2:   log likelihood = -9.9494714
            Iteration 3:   log likelihood = -9.9493851
            
            Probit regression                                 Number of obs   =         17
                                                              LR chi2(4)      =       3.14
                                                              Prob > chi2     =     0.5353
            Log likelihood = -9.9493851                       Pseudo R2       =     0.1361
            
            ------------------------------------------------------------------------------
                      y2 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                      x1 |   .0062123   .0823161     0.08   0.940    -.1551244     .167549
                      x2 |   .1040173   .0689807     1.51   0.132    -.0311824    .2392169
                      x3 |   .0121413   .0452971     0.27   0.789    -.0766393    .1009219
                      x4 |   .0353014   .1673367     0.21   0.833    -.2926725    .3632753
                   _cons |  -11.88578   9.627492    -1.23   0.217    -30.75531     6.98376
            ------------------------------------------------------------------------------
            
                          NOW THE SECOND STAGE REGRESSIONS WITH INSTRUMENTS
            
            
                  Source |       SS       df       MS              Number of obs =      17
            -------------+------------------------------           F(  3,    13) =   27.26
                   Model |  8327.93832     3  2775.97944           Prob > F      =  0.0000
                Residual |  1324.02019    13  101.847707           R-squared     =  0.8628
            -------------+------------------------------           Adj R-squared =  0.8312
                   Total |  9651.95851    16  603.247407           Root MSE      =  10.092
            
            ------------------------------------------------------------------------------
                      y1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                    I_y2 |   3.479495   4.192867     0.83   0.422    -5.578642    12.53763
                      x1 |   .4633677   .4872688     0.95   0.359    -.5893125    1.516048
                      x2 |  -1.393883   .1556052    -8.96   0.000    -1.730048   -1.057719
                   _cons |   187.7333   49.70744     3.78   0.002     80.34689    295.1197
            ------------------------------------------------------------------------------
            
            Iteration 0:   log likelihood = -11.517405
            Iteration 1:   log likelihood =  -10.04216
            Iteration 2:   log likelihood = -10.020231
            Iteration 3:   log likelihood =   -10.0202
            
            Probit regression                                 Number of obs   =         17
                                                              LR chi2(3)      =       2.99
                                                              Prob > chi2     =     0.3925
            Log likelihood =   -10.0202                       Pseudo R2       =     0.1300
            
            ------------------------------------------------------------------------------
                      y2 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                    I_y1 |  -.1174181   .0741252    -1.58   0.113    -.2627008    .0278646
                      x3 |   -.017184   .0387496    -0.44   0.657    -.0931318    .0587639
                      x4 |    .173977   .1773915     0.98   0.327     -.173704     .521658
                   _cons |   6.082584   4.687326     1.30   0.194    -3.104405    15.26957
            ------------------------------------------------------------------------------
            
                     NOW THE SECOND STAGE REGRESSIONS WITH CORRECTED STANDARD ERRORS
            
            
            ------------------------------------------------------------------------------
                      y1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                    I_y2 |   3.479495   4.640983     0.75   0.467    -6.546738    13.50573
                      x1 |   .4633677   .5177683     0.89   0.387    -.6552027    1.581938
                      x2 |  -1.393883   .1685833    -8.27   0.000    -1.758085   -1.029681
                   _cons |   187.7333   52.88488     3.55   0.004     73.48245    301.9841
            ------------------------------------------------------------------------------
            ------------------------------------------------------------------------------
                      y2 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                    I_y1 |  -.1174181   .1029972    -1.14   0.254    -.3192888    .0844526
                      x3 |   -.017184   .0553249    -0.31   0.756    -.1256189    .0912509
                      x4 |    .173977   .2499454     0.70   0.486    -.3159069    .6638609
                   _cons |   6.082584   6.699306     0.91   0.364    -7.047814    19.21298
            ------------------------------------------------------------------------------


            • #21
              Hi Joao,
              I have a similar problem with a different variable structure. I would like to know the Stata code for implementing my model:

              1. My dependent variable is binary (adolescent completed class 10 or more).
              2. My endogenous regressor is married before age 18 (which is also binary).
              3. I would like to know what modification I have to make to the ivregress command for a binary dependent and endogenous variable.

              Any help will be appreciated.




                • #23
                  Dear Stata users,

                  Please advise me on this issue.

                  I have panel data (2 years), a binary dependent variable and many other binary independent variables. In addition, I have instruments.
                  There is a separate panel probit estimator and a separate IV probit estimator.

                  But how can I do both at once?



                  • #24
                    Dear Stata users,
                    I am working on a problem similar to the one Hyejin Cho described earlier in this thread. I am applying a panel control function (CF) approach in international trade.
                    There is a recent paper by Murtazashvili & Wooldridge (2016), published in the Journal of Econometrics, on a ‘control function approach to estimating switching regression models with endogenous explanatory variables and endogenous switching’. It describes the CF methodology in a panel setting. Following that methodology, I am trying to implement the following.
                    Variables that I have taken:
                    Y = continuous dependent variable (say, bilateral trade flow)
                    X1 = binary endogenous explanatory variable
                    X2 = continuous exogenous explanatory variable
                    X3 = binary exogenous explanatory variable
                    Z1, Z2 = instruments
                    Based on the suggestions given in this thread and the methodology described in Murtazashvili & Wooldridge (2016), I am doing the following:
                    1. probit X1 X2 X3 Z1 Z2 mean_X2 mean_X3 mean_Z1 mean_Z2
                    2. predict X1_hat
                    3. gen generalised_residual_X1= X1*[normalden(X1_hat)/normal(X1_hat)] -(1-X1) *[-normalden(X1_hat)/(1-normal(X1_hat))]
                    4. xtivreg Y X1 X2 X3 mean_X2 mean_X3 generalised_residual_X1 (X1= X2 X3 Z1 Z2 mean_X2 mean_X3 mean_Z1 mean_Z2 generalised_residual_X1)
                    Would steps 1 to 4 be correct? I am using Stata 14.
                    Any help and suggestions are much appreciated.
                    Thank you in advance.
                    Regards,
                    Anusree
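                    For reference, the standard generalized residual from a probit of X1 is X1*normalden(xb)/normal(xb) - (1-X1)*normalden(xb)/(1-normal(xb)), where xb is the linear index. Note that predict after probit returns the probability by default, so the index has to be requested with the xb option. A minimal sketch on Stata's auto data, with foreign standing in for X1 and weight, length and turn as placeholder probit regressors (not the trade variables above), computes it by hand and checks it against predict's score option, which for probit gives the derivative of the log likelihood with respect to xb:

```stata
* Stand-in data: foreign plays the role of the binary endogenous X1
sysuse auto, clear
probit foreign weight length turn

* Linear index xb (the default after probit would be the probability)
predict double x1_xb, xb

* Generalized residual by hand; 1 - normal(xb) is written as normal(-xb)
* because the two are equal and the second is numerically more stable
gen double gr_manual = foreign*normalden(x1_xb)/normal(x1_xb) - (1-foreign)*normalden(x1_xb)/normal(-x1_xb)

* For probit, the score with respect to xb is the same quantity
predict double gr_score, score

summarize gr_manual gr_score
```

                    The two variables should agree up to rounding, which is a quick way to check a hand-coded residual before using it in a second step.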



                    • #25
                      Originally posted by ashish
                      Hi Joao,
                      I have a similar problem with a different variable structure. I would like to know the Stata code for implementing my model:

                      1. My dependent variable is binary (adolescent completed class 10 or more).
                      2. My endogenous regressor is married before age 18 (which is also binary).
                      3. I would like to know what modification I have to make to the ivregress command for a binary dependent and endogenous variable.

                      Any help will be appreciated.

                      Hi, can anyone please elaborate on this? I am also having this issue. Thanks.



                      • #26
                        Originally posted by Joao Santos Silva
                        Dear Naveen,

                        I am afraid what you are doing is wrong; it is what is called a forbidden regression. What you have to do is as follows:

                        a) probit X1 X2 X3 Z1 Z2
                        b) predict X1_hat, p
                        c) ivregress 2sls Y X2 X3 (X1 = Z1 Z2 X1_hat)

                        All the best,

                        Joao
                        Dear Joao,

                        Thanks for your comment; it is really helpful. However, I am confused about the instruments in step (c). Shouldn't we use only the predicted value from the probit as the instrument? Or is it necessary to add Z1 and Z2 as instruments as well?
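                        The quoted steps (a) to (c) can be tried on the example data posted in #17, reading y1 as Y, y2 as the binary endogenous X1, x1 and x2 as the exogenous X2 and X3, and x3 and x4 as the instruments Z1 and Z2 (this mapping is an assumption made for illustration). A minimal sketch of both variants, with the fitted probability as the only excluded instrument and with Z1 and Z2 kept as well:

```stata
clear all
input float y1 byte y2 float(x1 x2) int(x3 x4)
 89.1 1  96.7   101  12  28
 99.2 1  98.1 100.1  15  35
   99 0   100   100  17  37
  100 0 104.9  90.6  22  42
111.6 1 104.9  86.5  36  47
122.2 1 109.5  89.7  45  51
117.6 1 110.8  90.6  66  56
121.1 0 112.3  82.8  89  60
  136 1 109.3  70.1  99  65
154.2 0 105.3  65.4 118  69
153.6 0 101.7  61.3 134  74
158.5 1  95.4  62.5 151  78
140.6 0  96.4  63.6 167  83
136.2 0  97.6  52.6 184  87
  168 1 102.4  59.7 200  92
154.3 1 101.6  59.5 217  96
  149 1 103.8  61.3 233 101
end

* (a) probit of the binary endogenous regressor on all exogenous variables
probit y2 x1 x2 x3 x4

* (b) fitted probability (pr), not the linear index
predict double y2_hat, pr

* (c) 2SLS with the fitted probability as the only excluded instrument
ivregress 2sls y1 x1 x2 (y2 = y2_hat), small

* (c) again, also keeping x3 and x4 as instruments (overidentified)
ivregress 2sls y1 x1 x2 (y2 = x3 x4 y2_hat), small

* test of overidentifying restrictions, available only in the second variant
estat overid
```

                        In both variants y2 itself, not y2_hat, appears as the regressor; the fitted probability only enters the instrument list, which is what distinguishes this procedure from plugging the probit prediction directly into the second stage.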



                        • #27
                          Dear Vik Bahr,

                          You do not have to use Z1 and Z2, but you can; in general it does not hurt to include them, and doing so gives you a test of overidentifying restrictions (although it is likely to have low power in this context).

                          Best wishes,

                          Joao



                          • #28
                            Originally posted by Joao Santos Silva
                            Dear Naveen,

                            I am afraid what you are doing is wrong; it is what is called a forbidden regression. What you have to do is as follows:

                            a) probit X1 X2 X3 Z1 Z2
                            b) predict X1_hat, p
                            c) ivregress 2sls Y X2 X3 (X1 = Z1 Z2 X1_hat)

                            All the best,

                            Joao
                            Hi Joao,

                            If I want to run an underidentification test (Kleibergen-Paap rk LM statistic) and a weak-instrument test (Kleibergen-Paap rk Wald F statistic), is there a way to do those tests based on the code you have provided? Many thanks.

                            Regards,
                            Ziwen



                            • #29
                              Dear Ziwen Bu

                              I believe you get the results of those tests if, in the last line, you use ivreg2 instead of ivregress.

                              Best wishes,

                              Joao
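                              A minimal sketch of that last line, assuming the user-written ivreg2 and ranktest packages from SSC are installed. With the robust option, ivreg2 reports the Kleibergen-Paap rk LM (underidentification) and rk Wald F (weak identification) statistics and stores them in e(idstat) and e(widstat). Stata's auto data are used as a stand-in: foreign plays X1, weight the exogenous regressor, and length and turn the instruments Z1 and Z2.

```stata
* One-time installation of the user-written packages (uncomment if needed):
* ssc install ranktest
* ssc install ivreg2

sysuse auto, clear

* (a) and (b): probit first step and fitted probability
probit foreign weight length turn
predict double foreign_hat, pr

* (c) with ivreg2; robust makes the identification statistics Kleibergen-Paap
ivreg2 mpg weight (foreign = length turn foreign_hat), robust

* stored identification statistics
display "KP rk LM (underidentification):     " e(idstat)
display "KP rk Wald F (weak identification): " e(widstat)
```

                              The same ivreg2 output also includes the Hansen J overidentification test when more than one excluded instrument is used.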



                              • #30
                                Joao Santos Silva
                                Sir, I am working with a variable (Y) that follows a Tweedie compound Poisson distribution. I have an endogenous categorical variable X in the model and an instrumental variable Z.
                                The control variables are P, Q and R.
                                What would be the correct approach to instrumenting X, given that Y is compound Poisson?
                                I would be extremely grateful if you could kindly offer some suggestions, Sir.
