  • Stata unnecessarily drops collinearity

    Dear Statalist,

    I have an issue running ivregress 2sls. As a simplified version, assume I have two endogenous regressors (x1, x2) and two instruments (z1, z2). No collinearities exist within X or within Z, but x1 + x2 + z1 = z2. That is, there is one collinearity in X and Z taken together.

    As best as I can understand from the formulas and from using other regression programs/methods, it is still legal to run such a regression. For instance, if I run 2SLS by hand (regressing the x's on the z's and using the predicted values for a second stage) I don't run into any problems. Correct me if I'm wrong here, however.
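
    A minimal sketch of the by-hand procedure described above, written in Python with NumPy rather than Stata so it can be checked independently. The data-generating process, coefficients, and seed are illustrative assumptions, not the poster's data. It shows only that both stages are computable when x1 + x2 + z1 = z2; it says nothing about whether z2 is a valid instrument.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical data: e is the structural error, z1 an ordinary instrument.
e = rng.normal(size=n)
z1 = rng.normal(size=n)
x1 = 0.7 * z1 + 0.8 * e + rng.normal(size=n)   # endogenous (depends on e)
x2 = -0.4 * z1 + 0.5 * e + rng.normal(size=n)  # endogenous (depends on e)
z2 = x1 + x2 + z1                              # the collinearity across X and Z
y = 1.0 * x1 + 2.0 * x2 + e

X = np.column_stack([x1, x2])
Z = np.column_stack([z1, z2])

# No collinearity within X or within Z, but one across [X, Z] together.
rank_X = np.linalg.matrix_rank(X)
rank_Z = np.linalg.matrix_rank(Z)
rank_XZ = np.linalg.matrix_rank(np.column_stack([X, Z]))

# First stage: regress each x on the z's (no constant), keep fitted values.
first_stage, *_ = np.linalg.lstsq(Z, X, rcond=None)
X_hat = Z @ first_stage

# Second stage: regress y on the fitted values.
beta_2sls, *_ = np.linalg.lstsq(X_hat, y, rcond=None)

print("ranks of X, Z, [X Z]:", rank_X, rank_Z, rank_XZ)
print("by-hand 2SLS coefficients:", beta_2sls)
```

    The second stage goes through because the fitted values still span two dimensions. Note that second-stage standard errors computed this way are not the correct 2SLS standard errors.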

    However, Stata cannot run this regression. It detects the collinearity and drops z2, which leaves only one instrument for two endogenous regressors, so the regression becomes impossible.

    I'd like to know either
    a) If Stata should not run this regression, why not?
    b) If this regression is fine, how can I make Stata run it using standard commands like ivregress?

    Thanks,
    Cory

  • #2
    Cory,

    I think ivregress is being a bit too conservative in refusing to run this regression. You can see this by using the very old-fashioned syntax for IV estimation using the regress command:

    regress y x1 x2 (z1 z2), nocons

    This is equivalent to

    ivregress 2sls y (x1 x2 = z1 z2), nocons

    If you try it on your example, you'll see that regress will run it quite happily, whereas ivregress will complain. FYI, ivreg2 (available from the SSC archive via ssc install ivreg2) will also run it without complaining.

    But ... your assumptions about exogeneity seem implausible, and this IV estimation looks misspecified to me.

    You have 4 variables. The two regressors x1 and x2 are endogenous, so say E(x1*e) = a and E(x2*e) = b, a and b nonzero. Say z1 is exogenous, so E(z1*e)=0. Also, so we can dispense with a constant, say everything is zero mean, i.e., E(x1)=E(x2)=E(z1)=E(z2)=E(e)=0.

    Since x1 + x2 + z1 = z2,

    E(z2*e) = E[ (x1 + x2 + z1)*e ] = E(x1*e) + E(x2*e) + E(z1*e) = a + b + 0 = a+b

    and there's no reason that you've given for us to think that a+b=0.

    So only if you are really lucky, or really clever in your choice of instruments, will z2 be exogenous. Since you're using an instrument that most likely is invalid by construction, your coefficient estimates will be inconsistent. Maybe I am missing something here, but I don't think so....
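
    The argument above can be checked with a small simulation. This is a sketch in Python with NumPy under assumed values: a = 0.8, b = 0.5, the first-stage coefficients, the true coefficients (1, 2), and the seed are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000

e = rng.normal(size=n)    # structural error, mean 0 and variance 1
z1 = rng.normal(size=n)   # exogenous: drawn independently of e

# Endogenous regressors. Because Var(e) = 1, the coefficient on e
# equals the moment: E(x1*e) = a and E(x2*e) = b.
a, b = 0.8, 0.5
x1 = 0.7 * z1 + a * e + rng.normal(size=n)
x2 = -0.4 * z1 + b * e + rng.normal(size=n)
z2 = x1 + x2 + z1         # instrument defined by the identity

beta_true = np.array([1.0, 2.0])
y = beta_true[0] * x1 + beta_true[1] * x2 + e

# Sample analogues of E(z1*e) and E(z2*e).
m_z1e = np.mean(z1 * e)   # should be close to 0
m_z2e = np.mean(z2 * e)   # should be close to a + b

# Just-identified IV without a constant (equal to 2SLS here):
# solve (Z'X) beta = Z'y.
X = np.column_stack([x1, x2])
Z = np.column_stack([z1, z2])
beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)

print("mean(z1*e), mean(z2*e):", m_z1e, m_z2e)
print("true beta:", beta_true, "IV estimate:", beta_iv)
```

    With these assumed values the sample moment for z2 settles near a + b = 1.3 rather than 0, and the IV estimates stay away from the true coefficients even with a large sample, which is what inconsistency means here.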

    --Mark


    • #3
      Dear Mark,

      Thanks for helping me work through this. Your suggestion on ivreg2 was spot on and fixes the technical problem. To Stata employees, I'd recommend fixing this issue in ivregress since it appears to be something of a bug.

      Your point about the relationship between the biases is well taken and something I've considered. I agree it's an issue. However, I do not in fact have full say over the regressions I am running, so it will likely be left as is for now.

      Very much appreciate all your help!

      Cory


      • #4
        If Cory really wants to proceed, the option perfect will keep ivregress from throwing an error.

        For the discussed example, Cory could type

        Code:
        . ivregress 2sls y (x1 x2 = z1 z2), perfect

        Mark clearly stated that the parameters of this model are (almost certainly) not identified if z2 = x1 + x2 + z1 and x1 and x2 are endogenous.

        ivregress throws an error to protect Cory from doing something that is almost certainly not a good idea. The perfect option will allow Cory to proceed, if Cory has one of the rare cases in which this z2 is a valid instrument.

        --David
