  • Two-step IV method with binary dependent variable

    Dear Statalist,

    I estimate a logit model where the dependent variable is a dummy and the predictor also is a binary variable that is likely endogenous (simultaneity problem).

    1) I attempt to perform IV estimation: first, I run a logit model of my endogenous variable (binary) on one excluded instrument (which takes values 1, 2, 3, 4 and 5) and control variables (age, age squared, living location (rural or urban), gender...). Second, I estimate a logit model of the dummy dependent variable on the fitted probabilities, which replace the endogenous regressor.

    Question: Is that correct?

    2) I also use the two-step estimator: first, I estimate a logit or probit model of the binary endogenous regressor on the excluded instrument and control variables. Then I run ivregress 2sls of the dummy dependent variable, using the fitted probabilities as the excluded instrument and adding the control variables.

    Question: Is that correct in my case? Or can I just use the ivreg2 command in Stata to perform the IV estimation?

    Many thanks in advance

    Steph

  • #2
    This is not as simple as it looks - your instrument has measurement error among other things. Why not look at ivprobit instead of programming this yourself? You should also look at cmp (user written) and GSEM which can do this kind of model.

    You'll generally get a better response if you follow the FAQ on asking questions - provide Stata data in code delimiters, Stata output, and sample data using dataex.



    • #3
      The first proposed method does not estimate anything interesting -- at least not that anyone has shown. It is an example of a "forbidden regression," where one tries to incorrectly extend 2SLS to a nonlinear model. As I tell my students: A method that plugs in fitted values into nonlinear second stages should be assumed inconsistent unless you prove otherwise.

      The second method doesn't make much logical sense. If you are going to acknowledge the discreteness of the endogenous explanatory variable, it seems odd to then use a linear model for the main variable, y1. If y1 were continuous, so that a linear model could reasonably represent a conditional mean, then the method would be fine. In fact, it's a method I cover in Section 21.4 in my 2010 MIT Press book. (Incidentally, there isn't "measurement error" in the instrument. It's estimation error, or sampling error, which goes away as N gets large. That's much different than measurement error, which is a population, not a sampling, issue.)
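For the continuous-y1 case mentioned above, the procedure can be sketched roughly as follows. This is a minimal illustration on simulated data, not code from the thread: the variable names, sample size, and coefficients are all hypothetical. The probit fitted probabilities serve as the excluded instrument for y2; they are not plugged in as a regressor.

```stata
* Hypothetical simulated data; names and coefficients are illustrative only
clear
set seed 12345
set obs 2000
generate x  = rnormal()
generate z  = runiformint(1, 5)                        // instrument taking values 1..5
generate u  = rnormal()                                // unobservable causing endogeneity
generate y2 = (-1.5 + 0.5*z + 0.3*x + u > 0)           // binary endogenous regressor
generate y1 = 1 + 0.8*y2 + 0.5*x + 0.6*u + rnormal()   // continuous outcome

* Step 1: probit of y2 on the instrument and the controls
probit y2 z x
predict double phat, pr

* Step 2: 2SLS with phat as the excluded instrument for y2
ivregress 2sls y1 x (y2 = phat), vce(robust)
```

Because phat enters only as an instrument, the second stage is still ordinary 2SLS, which is what separates this from the plug-in approach in the first proposed method.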

      But with a binary y1 and binary y2, you should use two methods.

      1. A standard linear model estimated by 2SLS. This is what Angrist and Pischke propose in "Mostly Harmless Econometrics."

      2. Use the so-called "biprobit" model, where y1 and y2 are modeled as probits. This is a joint maximum likelihood procedure. You should compute the average marginal effect from the biprobit and compare it with the 2SLS estimate.
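The comparison of the two methods might look roughly like this. It is a sketch on simulated data, so every name and number in it is an assumption made for illustration. Writing y2 as `i.y2` in the first equation is a choice made here so that `margins` treats it as a discrete change from 0 to 1, and `predict(pmarg1)` requests the marginal probability Pr(y1 = 1).

```stata
* Hypothetical simulated data; names and coefficients are illustrative only
clear
set seed 2017
set obs 5000
generate x  = rnormal()
generate z  = runiformint(1, 5)
generate e2 = rnormal()
generate e1 = 0.5*e2 + sqrt(1 - 0.5^2)*rnormal()   // corr(e1, e2) = 0.5
generate y2 = (-1.5 + 0.5*z + 0.3*x + e2 > 0)      // binary endogenous regressor
generate y1 = (-0.5 + 0.7*y2 + 0.4*x + e1 > 0)     // binary outcome

* 1. Linear probability model estimated by 2SLS
ivregress 2sls y1 x (y2 = z), vce(robust)
scalar b_2sls = _b[y2]

* 2. Bivariate probit estimated by joint maximum likelihood
biprobit (y1 = i.y2 x) (y2 = z x)

* Average marginal effect of y2 on Pr(y1 = 1)
margins, dydx(y2) predict(pmarg1)
display "2SLS coefficient on y2: " b_2sls
```

Both numbers are on the probability scale, so the 2SLS coefficient on y2 and the biprobit average marginal effect can be placed side by side.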

      JW



      • #4
        Dear Statalist, thank you for your posts and recommendations. I appreciate your replies very much.

        However, I estimate the two models 2SLS and "biprobit"

        Code:
        * y1: binary dependent variable
        * y2: binary endogenous variable
        * z: instrumental variable
        * x1-x11: set of control variables
        
        *Standard linear model estimated by 2SLS
        ivreg2 y1 x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 (y2 = z), r
        
        *biprobit
        biprobit (y1 y2 x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11) (y2 z x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11)
        
        *To compute the average marginal effect from the biprobit I use the following command
        mfx compute, force

        However, when comparing the average marginal effects from biprobit with the 2SLS estimates, I find that the coefficients from the two models differ. For instance, variables x3, x4, and x9 are negative in the 2SLS estimates but have positive average marginal effects from the biprobit.

        In this situation what is better for me to do?



        [Attached images: Results -1.png, Results -2.png, Results -3.png, Results -4.png]



        • #5
          The correct commands that I use

          Code:
          * y1: binary dependent variable
          * y2: binary endogenous variable
          * z: instrumental variable
          * x1-x11: set of control variables
          
          
          *Standard linear model estimated by 2SLS
          ivreg2 y1 x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 (y2 = z), first
          
          *biprobit
          biprobit (y1 y2 x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11) (y2 z x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11)
          
          *I compute the average marginal effect from the biprobit using the following
          mfx compute, force
          Last edited by Steph Ki; 23 Mar 2017, 12:06.



          • #6
            I don't think that -mfx- command is doing what you want. It is impossible for the average marginal effect to be positive when the coefficients are negative. I'm wondering if the command you used is somehow taking into account the equation for y2. I'm suspicious because a marginal effect is being reported for z, and z does not appear in the main equation of interest. What I see from the coefficient estimates between 2SLS and biprobit is consistent. I would use the -margins- command, as it's much more recent.
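One way to check the suspicion about the y2 equation is to request the prediction explicitly. After `biprobit`, the default prediction is the joint probability Pr(y1 = 1, y2 = 1), which depends on z through the second equation, whereas `pmarg1` is the marginal probability Pr(y1 = 1), which does not. The sketch below uses simulated data with hypothetical names and coefficients. Whether `mfx` relied on the default prediction in the original run is an assumption that cannot be confirmed from the thread.

```stata
* Hypothetical simulated data; names and coefficients are illustrative only
clear
set seed 42
set obs 3000
generate x  = rnormal()
generate z  = runiformint(1, 5)
generate e2 = rnormal()
generate e1 = 0.5*e2 + sqrt(1 - 0.5^2)*rnormal()
generate y2 = (-1.5 + 0.5*z + 0.3*x + e2 > 0)
generate y1 = (-0.5 + 0.7*y2 + 0.4*x + e1 > 0)

biprobit (y1 = i.y2 x) (y2 = z x)

* Effects on Pr(y1 = 1): only regressors in the y1 equation matter here
margins, dydx(y2 x) predict(pmarg1)

* Effects on Pr(y1 = 1, y2 = 1): z now enters through the y2 equation
margins, dydx(x z) predict(p11)
```

If the effects reported earlier resemble the second table more than the first, the sign differences relative to 2SLS would have a simple explanation.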



            • #7
              Actually, I have my doubts about the margins command, too. In the past, I've done the calculation by hand. The downside is having to use something like bootstrapping to get a standard error.



              • #8
                Dear JW, thank you very much for your answer and suggestions. I have read something about computing the average marginal effect:
                Code:
                *biprobit
                biprobit (y1 y2 x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11) (y2 z x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11)
                mfx
                
                replace y2=0
                predict double adjpredy2_0
                replace y2=1
                predict double adjpredy2_1
                gen double mey2= adjpredy2_1 - adjpredy2_0
                
                sum adjpredy2_1 adjpredy2_0 mey2 if e(sample)
                
                *And I get the following:

                    Variable |   Obs       Mean  Std. Dev.        Min        Max
                 adjpredy2_1 |   350   .2572922   .3242765   6.89e-14   .9762893
                 adjpredy2_0 |   350   .2831236   .3444418   6.89e-14    .985427
                        mey2 |   350  -.0258314     .02986  -.0828319   1.11e-16

                If the code is correct, do I have to use bootstrapping to get standard errors?
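A hand calculation with bootstrapped standard errors could be sketched as below. Two caveats apply. First, `predict` after `biprobit` defaults to the joint probability Pr(y1 = 1, y2 = 1), so this sketch requests `pmarg1` explicitly to get Pr(y1 = 1). Second, the program name `ame_biprobit`, the simulated data, and the number of replications are all hypothetical choices made for illustration.

```stata
* Hypothetical simulated data; names and coefficients are illustrative only
clear
set seed 99
set obs 2000
generate x  = rnormal()
generate z  = runiformint(1, 5)
generate e2 = rnormal()
generate e1 = 0.5*e2 + sqrt(1 - 0.5^2)*rnormal()
generate y2 = (-1.5 + 0.5*z + 0.3*x + e2 > 0)
generate y1 = (-0.5 + 0.7*y2 + 0.4*x + e1 > 0)

capture program drop ame_biprobit
program define ame_biprobit, rclass
    tempvar y2orig p1 p0 diff
    quietly biprobit (y1 = y2 x) (y2 = z x)
    quietly generate byte `y2orig' = y2        // keep the observed values
    quietly replace y2 = 1
    quietly predict double `p1' if e(sample), pmarg1
    quietly replace y2 = 0
    quietly predict double `p0' if e(sample), pmarg1
    quietly replace y2 = `y2orig'              // put the data back
    quietly generate double `diff' = `p1' - `p0'
    summarize `diff', meanonly
    return scalar ame = r(mean)
end

* Refit the model and recompute the effect in every bootstrap sample
bootstrap ame = r(ame), reps(100) seed(1): ame_biprobit
```

Refitting the model inside the program matters: the bootstrap standard error then reflects the estimation error in the biprobit coefficients, not just the variation of the predicted differences across observations.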

                However, I have one more question regarding the standard linear model estimated by 2SLS (in my previous post). With the robust option, I get a higher value of the Kleibergen-Paap rk Wald F statistic. Is that correct, and can I rely on it to assess the relevance of the instrument?
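For reference, with a single endogenous regressor the robust first-stage F statistic on the excluded instrument can be inspected with official commands, as sketched below on simulated data (all names and coefficients are hypothetical). The sketch uses `ivregress` rather than the user-written `ivreg2`. To the best of my understanding, the Kleibergen-Paap rk Wald F reported by `ivreg2` with the robust option corresponds to this robust first-stage F when there is one endogenous regressor, possibly up to a degrees-of-freedom adjustment.

```stata
* Hypothetical simulated data; names and coefficients are illustrative only
clear
set seed 7
set obs 3000
generate x  = rnormal()
generate z  = runiformint(1, 5)
generate e2 = rnormal()
generate e1 = 0.5*e2 + sqrt(1 - 0.5^2)*rnormal()
generate y2 = (-1.5 + 0.5*z + 0.3*x + e2 > 0)
generate y1 = (-0.5 + 0.7*y2 + 0.4*x + e1 > 0)

* 2SLS with heteroskedasticity-robust standard errors
ivregress 2sls y1 x (y2 = z), vce(robust)
estat firststage

* The same check by hand: robust F test on the excluded instrument
regress y2 z x, vce(robust)
test z
```

The robust and non-robust first-stage statistics can differ in either direction, so a higher value under the robust option is not in itself a sign of a mistake.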

                Thank you very much



                • #9
                  Hi,

                  Can I seek guidance on the following two questions please.

                  Q1: For a problem involving binary dependent, binary endogenous and two binary instruments, can I use the below-mentioned approach? Please advise.

                  Step 1: Estimate the endogenous variable using the two binary instruments and other exogenous covariates.


                  eststo Probit_Gov1: probit CS4_govt CS23 CS22 i.TA10A Nchild_adult Income_person i.RO3 i.RO5 COPC i.HHEDUC I.ED6 CS10-CS12 i.CS8 CS5 i.ID11 ED7 i.ID13 i.STATE [fweight = FWT], vce(cluster IDHH)
                  predict Probit_Gov1

                  Step 2: Estimate the endogenous variable from the estimate in step 1 and other exogenous covariates.

                  eststo Probit_Gov2: probit CS4_govt Probit_Gov1 i.TA10A Nchild_adult Income_person i.RO3 i.RO5 COPC i.HHEDUC I.ED6 CS10-CS12 i.CS8 CS5 i.ID11 ED7 i.ID13 i.STATE [fweight = FWT], vce(cluster IDHH)
                  predict Probit_Gov2

                  Step 3: Use the estimate from step 2 as an instrument in IVProbit.

                  ivprobit TA10B_flag i.TA10A Nchild_adult Income_person i.RO3 i.RO5 COPC i.HHEDUC I.ED6 CS10-CS12 i.CS8 CS5 i.ID11 ED7 i.ID13 i.STATE ////
                  (CS4_govt = Probit_Gov1)[fweight = FWT], twostep

                  In the above code, TA10B_flag and CS4_govt are both binary while CS23, CS22 are the original binary instruments.


                  Q2: Does IV Probit take a very long time to execute?





                  • #10
                    Two questions regarding post #3 by Jeff Wooldridge:

                    (1) Is there any citable literature where you/he propose/s the two methods: (1) standard 2SLS and (2) bivariate probit model?
                    (2) To not mess it up: When comparing the estimates of the standard 2SLS and bivariate probit model, both estimates are interpreted as percentage points, right?



                    • #11
                      Originally posted by Kerstin Schmidt:
                      Two questions regarding post #3 by Jeff Wooldridge:

                      (1) Is there any citable literature where you/he propose/s the two methods: (1) standard 2SLS and (2) bivariate probit model?
                      (2) To not mess it up: When comparing the estimates of the standard 2SLS and bivariate probit model, both estimates are interpreted as percentage points, right?

                      Hello, Kerstin:

                      I just recently discovered you had the same problem as me. I was wondering if you found the studies you were looking for here? Please if you could help me, thanks.



                      • #12
                        I discuss this in Section 15.7.3 in my 2010 MIT Press book. Example 15.4 gives an example, and I provide a few citations to other papers where 2SLS and biprobit are compared. I don't regularly check my gmail account and so I missed Kerstin's query.



                        • #13
                          Thank you Jeff Wooldridge!!!



                          • #14
                            Hey! I am really sorry to disturb you, but by any chance do you know how to do an IV biprobit regression when your dependent variable and endogenous variable are the same?

                            Thank you in advance,
                            Dounia
                            Last edited by Dounia Ouederni; 29 Apr 2024, 15:40.
