  • Testing for collinearity and multicollinearity in a conditional logistic regression model with imputed data

    Dear Stata forum,

    I have imputed a data set consisting of continuous and binary variables, and I am fitting a conditional logistic regression model whose independent variables are associated with the recurrence of TB infection (recurrence being my dependent variable). I believe that some of these variables are highly correlated, e.g. interruption of drug treatment and reaction to medication. When I search online for methods of detecting collinearity and multicollinearity, papers suggest using the VIF, the condition index, and/or an unexpected direction of association between the outcome and an explanatory variable as an important sign of collinearity (http://www.nature.com/bdj/journal/v199/n7/full/4812743a.html). Using the last recommendation, I believe I have detected collinearity, but I cannot use the VIF or the condition index with multiply imputed data. Is there a better approach to assessing a conditional logistic regression model for collinear variables when working with multiply imputed data?

    Many thanks for your help


  • #2
    What prevents you from using VIF? I suppose it would be biased downward for variables that have imputed values, but it should still give you a feel for things. It's just a diagnostic, descriptive tool anyway, so it's not like you need to worry about exact p-values or anything like that.

    P.S. The fact that it's biased downward should basically inform your interpretation: if your VIF is high even with MI, then you know you have problems; if your VIF is borderline, then you may still have problems. But if the problems are really bad, VIF should pick them up even with MI.
    Last edited by ben earnhart; 24 Nov 2015, 16:26.



    • #3
      Many thanks for your answer. The problem with vif in Stata is that with imputed data I get the error message 'vif not appropriate after regress, nocons; use option uncentered to get uncentered VIFs'. Running vif, uncentered gives an 'invalid syntax' error, and I get no joy with estat vif either.



      • #4
        Oh. Maybe somebody will chime in as to why VIF is verboten (it's only approximate with MI, but I don't see why it should be made impossible). Do you know the long-hand way to get VIF? See https://en.wikipedia.org/wiki/Variance_inflation_factor
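
        For anyone who wants the long-hand route mentioned above, here is a minimal sketch in Python/NumPy rather than Stata: regress each predictor on all the others and take VIF_j = 1 / (1 - R²_j). The variables x1, x2, x3 are illustrative, not from this thread.

        ```python
        import numpy as np

        def vif(X):
            """Long-hand VIF: for each column j, regress it on the other
            columns (plus a constant) and return 1 / (1 - R^2_j)."""
            n, k = X.shape
            out = []
            for j in range(k):
                y = X[:, j]
                others = np.delete(X, j, axis=1)
                Z = np.column_stack([np.ones(n), others])  # constant term
                beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
                resid = y - Z @ beta
                r2 = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
                out.append(1.0 / (1.0 - r2))
            return np.array(out)

        rng = np.random.default_rng(0)
        x1 = rng.normal(size=200)
        x2 = x1 + 0.1 * rng.normal(size=200)   # nearly collinear with x1
        x3 = rng.normal(size=200)              # independent
        print(vif(np.column_stack([x1, x2, x3])))
        ```

        With data like this you should see large VIFs for x1 and x2 and a VIF near 1 for x3, which is the pattern the diagnostic is meant to flag.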

        Oh, and I think your error has to do with using the nocons option in your regression. Remove that option from the regression and see if it works.
        Last edited by ben earnhart; 24 Nov 2015, 17:14.



        • #5
          Originally posted by ben earnhart
          Oh. Maybe somebody will chime in as to why VIF is verboten (it's only approximate with MI, but I don't see why it should be made impossible).
          Just a guess, but I do not believe StataCorp makes it particularly easy to do things that lack a sound theoretical foundation, especially where multiple imputation is concerned. I like this strategy, by the way.

          Anyway, a few years ago I wrote a program, mivif (SSC), that implements what I think might be a good approximation to a VIF in multiply imputed datasets, namely treating R-squared as a point estimate that needs to be combined using Rubin's rules. From the help file:

          mivif runs the regression of x1 on all other x in each imputed dataset and calculates the mean of the z-transformed R-squares as R2_MI. The VIF reported for each variable is calculated as 1/(1 - R2_MI).
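
          For intuition, the pooling rule quoted above can be sketched in a few lines of Python. I am reading "z-transformed R-squares" as Fisher's z applied to R = sqrt(R²); that reading is my assumption, not a quote from the mivif help file, so check the program itself for the exact rule.

          ```python
          import math

          def mi_vif(r2_list):
              """Pool auxiliary-regression R^2 values from several imputed
              datasets: z-transform, average, back-transform, then
              VIF = 1 / (1 - R2_MI)."""
              zs = [math.atanh(math.sqrt(r2)) for r2 in r2_list]
              zbar = sum(zs) / len(zs)
              r2_mi = math.tanh(zbar) ** 2
              return 1.0 / (1.0 - r2_mi)

          # e.g. R^2 from the same auxiliary regression in five imputed datasets
          print(mi_vif([0.62, 0.60, 0.65, 0.63, 0.61]))
          ```

          If every imputed dataset gives the same R², the pooled VIF reduces to the ordinary 1/(1 - R²), as you would hope.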
          Best
          Daniel



          • #6
            Dear Daniel and Ben, many thanks for your help. The mivif program was very helpful!
