  • High Multicollinearity due to Dummy Variables (VIF > 15)

    Hi,

    In my regression model, I have introduced 5 dummy variables to control for the effect of the 6 different sets used in the experiment (Set A/B/C/D/E/F).
    But I am getting high VIFs (>15) for these 5 control dummies and for a few other control variables. However, I am getting low VIFs (<3.0) for the
    remaining independent variables, including my variable of interest.
    If I drop these control variables to avoid the multicollinearity problem, ovtest becomes significant. What should I do?

    Is it appropriate to ignore the VIFs for the dummy variables and report the findings?
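
    For concreteness, here is a minimal sketch of the kind of setup I mean (y, x, and set are placeholder names, not my actual variables):

        tabulate set, generate(setdum)    // creates the indicators setdum1-setdum6
        regress y x setdum2-setdum6       // setdum1 omitted as the base category
        estat vif                         // VIFs > 15 for the set dummies
        estat ovtest                      // the RESET test that turns significant when the dummies are dropped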


    Regards-
    - Abhishek


  • #2
    Here is a good summary article on VIF by Paul Allison which may be of interest to you. If variables are highly collinear, Stata drops them anyway. The article describes the situations in which you are safe in the presence of multicollinearity. I think situation '1' applies to you. http://www.statisticalhorizons.com/multicollinearity
    Roman



    • #3
      Originally posted by Roman Mostazir View Post
      Here is a good summary article on VIF by Paul Allison which may be of interest to you. If variables are highly collinear, Stata drops them anyway. The article describes the situations in which you are safe in the presence of multicollinearity. I think situation '1' applies to you. http://www.statisticalhorizons.com/multicollinearity


      Thank you for sharing this link.

      The author mentions that he observed a high VIF (5.26) for dummy variables, but in my case it is more than 15. Will it still be appropriate to ignore it, since 15 >> 5.26?



      • #4
        Actually I wonder if situation 3 applies: if you have a 6-category variable and create 5 dummies from it, then sure, you will have high multicollinearity among those dummies.
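
        A quick simulation makes the point; everything here is made up for illustration. VIFs for dummies built from a single categorical variable blow up when the omitted base category holds only a small share of cases, even though nothing is wrong with the model:

            clear
            set seed 12345
            set obs 5000
            generate u = runiform()
            generate set = 1 + irecode(u, .01, .208, .406, .604, .802)   // category 1 gets ~1% of cases
            tabulate set, generate(d)     // indicator variables d1-d6
            generate y = rnormal()        // outcome unrelated to set
            regress y d2-d6               // d1, the rare category, is the omitted base
            estat vif                     // VIFs well above 15, purely by construction
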
        -------------------------------------------
        Richard Williams, Notre Dame Dept of Sociology
        StataNow Version: 19.5 MP (2 processor)

        EMAIL: [email protected]
        WWW: https://www3.nd.edu/~rwilliam



        • #5
          I'm going to go out on a limb here, but in my opinion, far too much attention is paid to multicollinearity. I would go beyond Allison's recommendations and say that multicollinearity is just not a problem except when it's obviously a problem.

          There are a couple of potential problems multicollinearity can cause. One of them, which requires extremely high levels of multicollinearity, is that the variance matrix becomes nearly singular and the calculations become unstable. Stata already checks for that before proceeding, and will eliminate one or more of the offending variables before doing the regression. So the user doesn't really have to think about that. The other is with degrees of collinearity that make it difficult to distinguish the effects of the different variables from each other. In extreme cases of this, you will find obviously unreasonable standard errors for those variables--and this is when it is obviously a problem.
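
          To illustrate the first point, here is a toy example using the auto dataset shipped with Stata; weight2 is an artificial exact copy of weight, created only so that Stata's safeguard has something to catch:

              sysuse auto, clear
              generate weight2 = 2*weight        // perfectly collinear with weight
              regress price mpg weight weight2   // Stata notes: weight2 omitted because of collinearity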

          But if the key variables of interest in your research question exhibit sensible standard errors, then, in my opinion, you have no problem, no matter what the VIF's look like. In fact, in that situation I wouldn't even bother calculating VIF.

          If one of your key variables has a suspiciously high standard error, then you need to investigate the causes of that: but VIF doesn't really contribute anything to that. It may be that your key variable is nearly collinear with some of the variables you are using to adjust for confounding. In that case, you may need to rethink your model and eliminate some redundant variables. On the other hand, this could also be a clue that one of those "confounders" actually mediates the effect of interest--that would be a very interesting finding if true and warrants investigating in an appropriate mediation model. But really, a model that is well thought-out before you start analysis (and, better still, before you gather data) will rarely surprise you in this way.

          Finally, what if you are in the situation where a key variable and a variable you are merely adjusting for, but that is independently known to be an important correlate of the outcome, are sufficiently collinear that they are jointly significant predictors of the outcome, but neither one is separately? Well, then the only fair conclusion is that your data do not provide sufficient information to disentangle their effects distinctly. It may be tempting to drop the covariate so as to capture a "significant" result for your key variable: but that would be misleading, presenting a biased result. Sometimes a data set doesn't answer a particular question clearly. In these circumstances you need either more data, or perhaps different data: a sampling design that breaks the association between the key variable and the confounder (I'm thinking of matched pairs, stratified sampling, or something like that).
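
          In Stata the joint test is a one-liner after the regression (x_key, x_confounder, and the controls c1 and c2 are hypothetical names):

              regress y x_key x_confounder c1 c2
              test x_key x_confounder    // joint F test: can be significant even when neither individual t test is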



          • #6
            Originally posted by Clyde Schechter View Post
            I'm going to go out on a limb here, but in my opinion, far too much attention is paid to multicollinearity. I would go beyond Allison's recommendations and say that multicollinearity is just not a problem except when it's obviously a problem.

            Respected sir, I was having a tough time due to the high-VIF problem, but your comment brought some relief! Thank you for your valuable comment on the forum.

