  • Multicollinearity in Probit Regression with Square and Cube terms of explanatory variable

    Hi,
    I am running a probit regression (the dependent variable is 0 or 1) and need to include the square and cube of an explanatory variable, because I expect a curvilinear relationship with two inflection points. Multicollinearity is high in this scenario. I have read some links saying that multicollinearity is not an issue here because the model demands it (http://www.statalist.org/forums/foru...=1483003842560, and here, http://statisticalhorizons.com/multicollinearity ), but do these arguments also apply to probit regressions? How do I solve the problem?

    Thank you,

  • #2
    Multicollinearity, to a greater or lesser extent, is expected whenever polynomial regression is carried out. It is not a problem and it needs no solution. Just remember that when you examine the effect of the explanatory variable, you need to look at all of those coefficients (the linear, quadratic, and cubic) jointly. None of them separately has any meaning in a polynomial model.
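
    To see why this collinearity is baked in, here is a quick pure-Python sketch (with made-up data, not from this thread) showing the pairwise correlations among x, x², and x³ when x does not change sign:

```python
# Illustrative only: correlations among polynomial terms of a positive x.
def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    sa = sum((u - ma) ** 2 for u in a) ** 0.5
    sb = sum((v - mb) ** 2 for v in b) ** 0.5
    return cov / (sa * sb)

x = [i / 10 for i in range(1, 101)]      # 0.1, 0.2, ..., 10.0
x2 = [v ** 2 for v in x]
x3 = [v ** 3 for v in x]

# Each pairwise correlation is close to 1: severe "multicollinearity"
# arises purely from including the polynomial terms.
print(pearson(x, x2), pearson(x, x3), pearson(x2, x3))
```

    All three correlations come out above 0.9 here, which is exactly the expected-and-harmless collinearity described above.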

    That said, you can somewhat reduce the collinearity among the terms of the polynomial by centering the explanatory variable around its mean (or some other convenient value near the center of the data). This may reduce the standard errors of the individual coefficients a bit, which may make you feel more comfortable with the results, although, as already noted, it is the joint effect of all three terms that matters, and those statistics are not affected by centering.
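
    As a rough illustration of that reduction (again with invented, right-skewed data, not data from this thread), mean-centering visibly lowers the correlation between the linear and quadratic terms:

```python
# Illustrative only: mean-centering a skewed x lowers corr(x, x^2).
def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    sa = sum((u - ma) ** 2 for u in a) ** 0.5
    sb = sum((v - mb) ** 2 for v in b) ** 0.5
    return cov / (sa * sb)

x = [(i / 100) ** 2 for i in range(1, 101)]   # right-skewed values in (0, 1]
m = sum(x) / len(x)
xc = [v - m for v in x]                        # mean-centered copy

raw = pearson(x, [v ** 2 for v in x])          # close to 1
cen = pearson(xc, [v ** 2 for v in xc])        # noticeably lower
print(raw, cen)
```

    The reduction changes none of the model's predictions or joint tests, which is why (as the next post explains) it is cosmetic in this setting.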



    • #3
      Phil Bromiley sent me a private forum message to point out some errors and unclarity in my response in #2. I thank him for noticing these problems, and here I attempt to correct them.

      First, there is no real benefit to centering the variables in this situation. While it is true that the standard errors of the individual coefficients may decrease, this is of no importance because the individual coefficients themselves are of no importance. Whether the variables are centered or not, the model will give exactly the same predicted values, and any joint tests of the linear, quadratic, and cubic terms will come out identically. (In other contexts where multicollinearity is an issue, narrower standard errors can be helpful, but not in this polynomial context; here the change is purely cosmetic.)

      Second, I misspoke in referring to centering around a central point in the data. I had a different situation in mind. When fitting a quadratic model, it is sometimes useful to center the x variable at or near the axis of symmetry of the parabola. If that axis of symmetry is located at or near a useful reference value, the model can be more easily understood when framed this way. Moreover, centering around the axis of symmetry does eliminate the collinearity between the linear and quadratic terms (although, again, this has no substantive effect: it changes neither the model predictions nor the joint tests of the terms). Conceivably, a similar gain in interpretability might be had by centering around a critical point of the cubic relationship, though this is less likely to be useful than in the quadratic case. But, again, doing so does not substantively change the model.
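
      The quadratic re-parameterization described here can be written out concretely. With hypothetical coefficients (not estimates from any real model), centering x at the axis of symmetry c = -b1/(2·b2) absorbs the linear term without changing a single prediction:

```python
# Hypothetical coefficients for a quadratic b0 + b1*x + b2*x^2 (invented).
b0, b1, b2 = 2.0, -6.0, 1.5
c = -b1 / (2 * b2)              # axis of symmetry: here x = 2.0
a = b0 + b1 * c + b2 * c ** 2   # fitted value at the vertex: here -4.0

for x in [-1.0, 0.0, 2.0, 5.5]:
    orig = b0 + b1 * x + b2 * x ** 2
    cent = a + b2 * (x - c) ** 2          # no linear term after centering at c
    assert abs(orig - cent) < 1e-9        # identical predictions everywhere
print("centering at the vertex removes the linear term, changes nothing else")
```

      This is purely a change of framing: the same parabola, expressed around a possibly more interpretable reference point.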

      So, what I said about centering in #2 was a mixture of the incorrect, the confusing, and the irrelevant. I apologize for whatever confusion I have caused Suja or others reading this thread.

      To summarize briefly what I should have said:

      1. Multicollinearity among the linear, quadratic, and cubic terms is expected and is not a problem at all. It ain't broke; don't try to fix it.

      2. In interpreting your model, you should not attempt to separately interpret the linear, quadratic, and cubic terms. Any significance tests done should be joint significance tests of all three terms. Model predictions should be calculated including all three terms even if one or more of them is not statistically significant in its own right.
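
      Point 2 is usually operationalized as a joint Wald test of the three coefficients (roughly what Stata's test command computes after probit). A minimal numeric sketch, where the coefficient vector and its covariance matrix are invented purely for illustration:

```python
import numpy as np

# Invented numbers for illustration only (not from any fitted model):
b = np.array([1.00, -0.25, 0.02])           # linear, quadratic, cubic estimates
V = np.array([[ 0.0900, -0.0200,  0.0010],  # their covariance matrix
              [-0.0200,  0.0100, -0.0008],
              [ 0.0010, -0.0008,  0.0001]])

# Wald statistic for H0: all three coefficients are zero; ~ chi-square(3)
W = float(b @ np.linalg.solve(V, b))
print(W > 7.815)   # True: 7.815 is the chi-square(3) 5% critical value
```

      The single statistic W tests all three terms at once, which is the appropriate question in a polynomial model; the individual z-statistics are not.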



      • #4
        Thank you very much for your detailed reply, and apologies for my delayed response. I have one more (related) question. In an OLS regression with the linear, quadratic, and cubic terms of an explanatory variable, say x, I can find the two inflection points by differentiating the equation with respect to x and setting the derivative equal to zero (assuming the other parameters are unaffected by x). That gives the two values of x at which the behavior of my dependent variable y changes. But how do I do this for a probit regression?

        Thank you,
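
        The derivative-root calculation described in the question, for the linear index b1·x + b2·x² + b3·x³, is just the quadratic formula (coefficients below are invented for illustration):

```python
import math

# Invented coefficients for the cubic index b1*x + b2*x^2 + b3*x^3:
b1, b2, b3 = 6.0, -4.5, 1.0

# Set the derivative b1 + 2*b2*x + 3*b3*x^2 equal to zero and solve.
A, B, C = 3 * b3, 2 * b2, b1
disc = B * B - 4 * A * C
assert disc > 0, "fewer than two real turning points"
r1 = (-B - math.sqrt(disc)) / (2 * A)
r2 = (-B + math.sqrt(disc)) / (2 * A)
print(r1, r2)   # prints 1.0 2.0: the two x values where the cubic turns
```

        One relevant mathematical fact: in a probit model the predicted probability is Φ(index), so dP/dx = φ(index)·(b1 + 2·b2·x + 3·b3·x²), and since the normal density φ is strictly positive, the turning points of the predicted probability occur at exactly the same x values as those of the linear index.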
