  • Multicollinearity: VIFs okay, condition number not

    Dear Statalist users,
    I am going to estimate an SEM with several covariates and wanted to check for possible collinearity problems. I used the collin command (by Phil Ender) and got the following output (all log* variables are the exposure variables of interest; the rest are covariates):
    Code:
     collin logAs logCo logK_ug logCd logCs logCu logHg logMn logMo logPb logSe logZn logMg_ug logNa_ug Maternal_age Maternal_edu Parity MADHD_SS TOTAL_Fish_Intake if PP_ADHD_SS!=.
    
      Collinearity Diagnostics
    
                            SQRT                   R-
      Variable      VIF     VIF    Tolerance    Squared
    ----------------------------------------------------
         logAs      1.47    1.21    0.6819      0.3181
         logCo      1.22    1.10    0.8215      0.1785
       logK_ug      2.07    1.44    0.4840      0.5160
         logCd      1.25    1.12    0.8012      0.1988
         logCs      1.25    1.12    0.7969      0.2031
         logCu      1.37    1.17    0.7289      0.2711
         logHg      1.76    1.33    0.5696      0.4304
         logMn      1.26    1.12    0.7909      0.2091
         logMo      1.16    1.08    0.8641      0.1359
         logPb      1.26    1.12    0.7918      0.2082
         logSe      1.32    1.15    0.7556      0.2444
         logZn      1.59    1.26    0.6303      0.3697
      logMg_ug      1.68    1.30    0.5949      0.4051
      logNa_ug      1.47    1.21    0.6809      0.3191
    Maternal_age      1.28    1.13    0.7817      0.2183
    Maternal_edu      1.18    1.08    0.8497      0.1503
        Parity      1.23    1.11    0.8146      0.1854
      MADHD_SS      1.05    1.02    0.9552      0.0448
    TOTAL_Fish_Intake      1.15    1.07    0.8732      0.1268
    ----------------------------------------------------
      Mean VIF      1.37
    
                               Cond
            Eigenval          Index
    ---------------------------------
        1    15.8980          1.0000
        2     1.3086          3.4855
        3     0.8965          4.2111
        4     0.6585          4.9134
        5     0.3864          6.4144
        6     0.2504          7.9683
        7     0.1637          9.8558
        8     0.1233         11.3536
        9     0.0937         13.0223
        10     0.0875         13.4817
        11     0.0528         17.3523
        12     0.0306         22.7821
        13     0.0281         23.7791
        14     0.0199         28.2944
        15     0.0011        118.0879
        16     0.0005        183.6459
        17     0.0002        261.0906
        18     0.0001        491.0595
        19     0.0000        834.7926
        20     0.0000        946.2107
    ---------------------------------
     Condition Number       946.2107
     Eigenvalues & Cond Index computed from scaled raw sscp (w/ intercept)
     Det(correlation matrix)    0.0425
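The VIF, tolerance, and R-squared columns in the table above are tied together by simple identities: tolerance is 1 - R-squared and VIF is 1/tolerance, where R-squared comes from regressing each variable on all the others. A minimal Python sketch that reproduces a few rows from the reported R-squared values (this is not part of the original Stata session, and the function names are only illustrative):

```python
# R-squared of each regressor on all the other regressors,
# copied from the collin output above (a subset of rows).
r_squared = {
    "logAs": 0.3181,
    "logK_ug": 0.5160,
    "logZn": 0.3697,
    "MADHD_SS": 0.0448,
}


def tolerance(r2):
    """Share of a regressor's variance not explained by the other regressors."""
    return 1.0 - r2


def vif(r2):
    """Variance inflation factor: the reciprocal of the tolerance."""
    return 1.0 / (1.0 - r2)


for name, r2 in r_squared.items():
    v = vif(r2)
    print(f"{name:>10}  tolerance={tolerance(r2):.4f}  VIF={v:.2f}  sqrt(VIF)={v ** 0.5:.2f}")
```

The SQRT VIF column is the factor by which a coefficient's standard error is inflated relative to the case of uncorrelated regressors. The largest value in the table, 1.44 for logK_ug, is modest by the usual rules of thumb.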
    As you can see, the VIFs do not indicate problems, but the condition number is very high, which seems to indicate collinearity problems. Computing the diagnostics without the intercept (the corr option), however, lowers the condition number to an acceptable level:

    Code:
     collin logAs logCo logK_ug logCd logCs logCu logHg logMn logMo logPb logSe logZn logMg_ug logNa_ug Maternal_age Maternal_edu Parity MADHD_SS TOTAL_Fish_Intake if PP_ADHD_SS!=., corr
    
                               Cond
            Eigenval          Index
    ---------------------------------
        1     2.9785          1.0000
        2     2.0193          1.2145
        3     1.5783          1.3737
        4     1.4891          1.4143
        5     1.2766          1.5275
        6     1.1341          1.6206
        7     1.0118          1.7158
        8     0.9726          1.7500
        9     0.8092          1.9186
        10     0.7914          1.9400
        11     0.7275          2.0233
        12     0.7138          2.0428
        13     0.6610          2.1228
        14     0.6043          2.2201
        15     0.5528          2.3212
        16     0.5022          2.4353
        17     0.4795          2.4924
        18     0.3994          2.7307
        19     0.2984          3.1591
    ---------------------------------
     Condition Number         3.1591
     Eigenvalues & Cond Index computed from deviation sscp (no intercept)
     Det(correlation matrix)    0.0425
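The two footnotes ("scaled raw sscp (w/ intercept)" versus "deviation sscp (no intercept)") suggest why the two condition numbers differ so much: with an intercept and uncentered data, any variable whose mean is far from zero relative to its spread is nearly collinear with the constant. The sketch below illustrates that effect in Python on hypothetical simulated data. It assumes, based on the footnotes, that the first convention scales each column (including the constant) to unit length and that the second works from the correlation matrix; `condition_indexes` is an illustrative helper, not collin's actual code.

```python
import numpy as np


def condition_indexes(X, intercept=True):
    """Condition indexes sqrt(largest eigenvalue / each eigenvalue).

    intercept=True : add a constant column, scale every column to unit
                     length, and use the cross-product matrix
                     (assumed meaning of "scaled raw sscp").
    intercept=False: use the correlation matrix of the columns
                     (assumed meaning of "deviation sscp").
    """
    X = np.asarray(X, dtype=float)
    if intercept:
        Z = np.column_stack([np.ones(len(X)), X])
        Z = Z / np.linalg.norm(Z, axis=0)  # unit-length columns
        M = Z.T @ Z
    else:
        M = np.corrcoef(X, rowvar=False)
    eig = np.sort(np.linalg.eigvalsh(M))[::-1]  # descending order
    return np.sqrt(eig[0] / eig)


# Hypothetical data: two independent regressors whose means are far
# from zero compared with their standard deviations.
rng = np.random.default_rng(0)
n = 782
x1 = rng.normal(loc=50.0, scale=1.0, size=n)
x2 = rng.normal(loc=-30.0, scale=0.5, size=n)
X = np.column_stack([x1, x2])

cn_raw = condition_indexes(X, intercept=True).max()
cn_centered = condition_indexes(X, intercept=False).max()
print(f"condition number with intercept, uncentered: {cn_raw:.1f}")
print(f"condition number from correlation matrix:    {cn_centered:.2f}")
```

The same formula can be checked against the output above: in the first table, sqrt(15.8980 / 1.3086) is about 3.4855, which matches the condition index reported for the second eigenvalue. A large uncentered condition number alongside small VIFs is therefore consistent with collinearity involving the constant rather than among the regressors themselves.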
    When I do the sem analysis, I get the following results, including model fit:
    Code:
    Endogenous variables
    
    Observed:  THYREOIDEA BARN_DNA_1606 PP_ADHD_SS
    
    Exogenous variables
    
    Observed:  logCd logCo logAs TOTAL_Fish_Intake Maternal_age Maternal_edu Parity MADHD_SS
    
    Fitting target model:
    
    Iteration 0:   log likelihood = -12697.629 
    Iteration 1:   log likelihood = -12697.629 
    
    Structural equation model                       Number of obs     =        782
    Estimation method  = ml
    Log likelihood     = -12697.629
    
    ----------------------------------------------------------------------------------------------------
                                       |                 OIM
                          Standardized |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -----------------------------------+----------------------------------------------------------------
    Structural                         |
      THYREOIDEA                       |
                                 logCd |  -.0407465   .0357893    -1.14   0.255    -.1108923    .0293992
                                 logCo |   .0390415     .03579     1.09   0.275    -.0311055    .1091886
                                 logAs |   .0590727   .0355923     1.66   0.097    -.0106869    .1288322
                                 _cons |   .1299296    .131326     0.99   0.322    -.1274646    .3873237
      ---------------------------------+----------------------------------------------------------------
      BARN_DNA_1606                    |
                                 logCd |   .0576503   .0357198     1.61   0.107    -.0123592    .1276599
                                 logCo |  -.0774175   .0356707    -2.17   0.030    -.1473308   -.0075042
                                 logAs |   -.011075   .0356121    -0.31   0.756    -.0808735    .0587235
                                 _cons |   1.471542   .1375627    10.70   0.000     1.201924     1.74116
      ---------------------------------+----------------------------------------------------------------
      PP_ADHD_SS                       |
                            THYREOIDEA |   -.015188   .0346469    -0.44   0.661    -.0830946    .0527187
                         BARN_DNA_1606 |  -.0171873   .0346506    -0.50   0.620    -.0851012    .0507266
                                 logCd |   .0038546   .0355215     0.11   0.914    -.0657664    .0734755
                                 logCo |   .0847461   .0350729     2.42   0.016     .0160044    .1534877
                                 logAs |   .0443859   .0359332     1.24   0.217     -.026042    .1148137
                     TOTAL_Fish_Intake |   .0196385   .0355246     0.55   0.580    -.0499885    .0892655
                          Maternal_age |  -.0692916   .0372963    -1.86   0.063     -.142391    .0038078
                          Maternal_edu |  -.1573542   .0361577    -4.35   0.000     -.228222   -.0864865
                                Parity |   .0574914   .0372399     1.54   0.123    -.0154975    .1304804
                              MADHD_SS |   .1504165   .0346772     4.34   0.000     .0824505    .2183825
                                 _cons |   1.307863    .288134     4.54   0.000     .7431304    1.872595
    -----------------------------------+----------------------------------------------------------------
                            mean(logCd)|  -2.218876   .0665337   -33.35   0.000     -2.34928   -2.088472
                            mean(logCo)|  -2.910331   .0818192   -35.57   0.000    -3.070693   -2.749968
                            mean(logAs)|   .4790394   .0377558    12.69   0.000     .4050395    .5530394
                mean(TOTAL_Fish_Intake)|   1.541774   .0529021    29.14   0.000     1.438088     1.64546
                     mean(Maternal_age)|    4.88355   .1285595    37.99   0.000     4.631578    5.135522
                     mean(Maternal_edu)|   2.773015   .0787109    35.23   0.000     2.618745    2.927286
                           mean(Parity)|    .723081   .0401631    18.00   0.000     .6443628    .8017992
                         mean(MADHD_SS)|    3.80549   .1026558    37.07   0.000     3.604289    4.006692
    -----------------------------------+----------------------------------------------------------------
                      var(e.THYREOIDEA)|   .9936511   .0056625                      .9826145    1.004812
                   var(e.BARN_DNA_1606)|   .9914212   .0065675                      .9786324    1.004377
                      var(e.PP_ADHD_SS)|   .9260464   .0180988                      .8912443    .9622076
                             var(logCd)|          1          .                             .           .
                             var(logCo)|          1          .                             .           .
                             var(logAs)|          1          .                             .           .
                 var(TOTAL_Fish_Intake)|          1          .                             .           .
                      var(Maternal_age)|          1          .                             .           .
                      var(Maternal_edu)|          1          .                             .           .
                            var(Parity)|          1          .                             .           .
                          var(MADHD_SS)|          1          .                             .           .
    -----------------------------------+----------------------------------------------------------------
      cov(e.THYREOIDEA,e.BARN_DNA_1606)|  -.0144893   .0357524    -0.41   0.685    -.0845627    .0555842
                       cov(logCd,logCo)|   .0967001   .0354255     2.73   0.006     .0272674    .1661329
                       cov(logCd,logAs)|    .017203   .0357493     0.48   0.630    -.0528644    .0872704
           cov(logCd,TOTAL_Fish_Intake)|    .086831   .0354903     2.45   0.014     .0172712    .1563907
                cov(logCd,Maternal_age)|   .0803381   .0355291     2.26   0.024     .0107023    .1499739
                cov(logCd,Maternal_edu)|   -.132176   .0351352    -3.76   0.000    -.2010397   -.0633123
                      cov(logCd,Parity)|    .145757   .0350002     4.16   0.000     .0771579    .2143561
                    cov(logCd,MADHD_SS)|  -.0195149   .0357463    -0.55   0.585    -.0895763    .0505466
                       cov(logCo,logAs)|    .014158   .0357528     0.40   0.692    -.0559162    .0842321
           cov(logCo,TOTAL_Fish_Intake)|   .0609902   .0356269     1.71   0.087    -.0088372    .1308177
                cov(logCo,Maternal_age)|   .0545553   .0356535     1.53   0.126    -.0153242    .1244349
                cov(logCo,Maternal_edu)|   .0246906   .0357381     0.69   0.490    -.0453548    .0947361
                      cov(logCo,Parity)|   .1501533   .0349537     4.30   0.000     .0816453    .2186612
                    cov(logCo,MADHD_SS)|   .0527544   .0356604     1.48   0.139    -.0171387    .1226475
           cov(logAs,TOTAL_Fish_Intake)|   .1952887   .0343961     5.68   0.000     .1278736    .2627039
                cov(logAs,Maternal_age)|   .1267407   .0351855     3.60   0.000     .0577783     .195703
                cov(logAs,Maternal_edu)|    .194003    .034414     5.64   0.000     .1265527    .2614532
                      cov(logAs,Parity)|  -.0288601   .0357301    -0.81   0.419    -.0988899    .0411697
                    cov(logAs,MADHD_SS)|  -.0406803   .0357007    -1.14   0.255    -.1106525    .0292918
    cov(TOTAL_Fish_Intake,Maternal_age)|   .1012197   .0353936     2.86   0.004     .0318496    .1705898
    cov(TOTAL_Fish_Intake,Maternal_edu)|   .1006808   .0353974     2.84   0.004     .0313031    .1700585
          cov(TOTAL_Fish_Intake,Parity)|   .0791338    .035536     2.23   0.026     .0094845     .148783
        cov(TOTAL_Fish_Intake,MADHD_SS)|  -.0542005   .0356549    -1.52   0.128    -.1240827    .0156818
         cov(Maternal_age,Maternal_edu)|   .1586736   .0348596     4.55   0.000       .09035    .2269971
               cov(Maternal_age,Parity)|   .3200286   .0320975     9.97   0.000     .2571188    .3829385
             cov(Maternal_age,MADHD_SS)|   -.092296   .0354553    -2.60   0.009    -.1617871   -.0228049
               cov(Maternal_edu,Parity)|  -.0491163   .0356737    -1.38   0.169    -.1190354    .0208028
             cov(Maternal_edu,MADHD_SS)|  -.1547283   .0349038    -4.43   0.000    -.2231385   -.0863181
                   cov(Parity,MADHD_SS)|   .0030596   .0357596     0.09   0.932    -.0670279    .0731471
    ----------------------------------------------------------------------------------------------------
    LR test of model vs. saturated: chi2(10)  =     10.16, Prob > chi2 = 0.4266
    
    estat gof, stats(all)
    
    
    ----------------------------------------------------------------------------
    Fit statistic        |      Value   Description
    ---------------------+------------------------------------------------------
    Likelihood ratio     |
             chi2_ms(10) |     10.160   model vs. saturated
                p > chi2 |      0.427
             chi2_bs(27) |     81.775   baseline vs. saturated
                p > chi2 |      0.000
    ---------------------+------------------------------------------------------
    Population error     |
                   RMSEA |      0.005   Root mean squared error of approximation
     90% CI, lower bound |      0.000
             upper bound |      0.039
                  pclose |      0.993   Probability RMSEA <= 0.05
    ---------------------+------------------------------------------------------
    Information criteria |
                     AIC |  25529.258   Akaike's information criterion
                     BIC |  25841.603   Bayesian information criterion
    ---------------------+------------------------------------------------------
    Baseline comparison  |
                     CFI |      0.997   Comparative fit index
                     TLI |      0.992   Tucker-Lewis index
    ---------------------+------------------------------------------------------
    Size of residuals    |
                    SRMR |      0.014   Standardized root mean squared residual
                      CD |      0.087   Coefficient of determination
    ----------------------------------------------------------------------------
    
    
    
    estat residuals
    
    
    Residuals of observed variables
    
      Mean residuals
    
                     | THYREOI~A  BARN~1606  PP_ADHD~S      logCd      logCo      logAs  T~ish_I~e  Materna~e
        -------------+----------------------------------------------------------------------------------------
                 raw |     0.000      0.000      0.000      0.000      0.000      0.000      0.000      0.000
        ------------------------------------------------------------------------------------------------------
    
                     | Materna~u     Parity   MADHD_SS
        -------------+---------------------------------
                 raw |     0.000      0.000      0.000
        -----------------------------------------------
    
      Covariance residuals
    
                     | THYREOI~A  BARN~1606  PP_ADHD~S      logCd      logCo      logAs  T~ish_I~e  Materna~e
        -------------+----------------------------------------------------------------------------------------
          THYREOIDEA |    -0.000                                                                             
        BARN_DN~1606 |     0.000      0.000                                                                  
          PP_ADHD_SS |     0.015     -0.010     -0.017                                                       
               logCd |     0.000      0.000      0.000      0.000                                            
               logCo |     0.000      0.000      0.000      0.000      0.000                                 
               logAs |    -0.000     -0.000      0.000      0.000      0.000      0.000                      
        TOTAL_Fish~e |     0.065     -0.111     -0.020      0.000      0.000      0.000      0.000           
        Maternal_age |     0.000      0.002     -0.000      0.000      0.000      0.000      0.000      0.000
        Maternal_edu |    -0.007      0.014      0.002      0.000      0.000      0.000      0.000      0.000
              Parity |     0.002      0.015     -0.005      0.000      0.000      0.000      0.000      0.000
            MADHD_SS |     0.018      0.008     -0.015      0.000      0.000      0.000      0.000      0.000
        ------------------------------------------------------------------------------------------------------
    
                     | Materna~u     Parity   MADHD_SS
        -------------+---------------------------------
        Maternal_edu |     0.000                      
              Parity |     0.000      0.000           
            MADHD_SS |     0.000      0.000      0.000
        -----------------------------------------------
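The fit indices in the estat gof table can be recomputed from the two chi-squared statistics and the sample size, which is a quick way to see what they measure. The sketch below uses common textbook definitions of RMSEA, CFI, and TLI; whether sem divides by N or N - 1 in the RMSEA is an assumption here (both give the same value at three decimals for these numbers).

```python
import math

# Values copied from the sem / estat gof output above.
chi2_ms, df_ms = 10.160, 10   # model vs. saturated
chi2_bs, df_bs = 81.775, 27   # baseline vs. saturated
n_obs = 782


def rmsea(chi2, df, n):
    """Root mean squared error of approximation (N - 1 version, an assumption)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_m, df_m, chi2_b, df_b):
    """Comparative fit index."""
    num = max(chi2_m - df_m, 0.0)
    den = max(chi2_b - df_b, chi2_m - df_m, 0.0)
    return 1.0 - num / den if den > 0 else 1.0


def tli(chi2_m, df_m, chi2_b, df_b):
    """Tucker-Lewis index."""
    ratio_b = chi2_b / df_b
    ratio_m = chi2_m / df_m
    return (ratio_b - ratio_m) / (ratio_b - 1.0)


print(f"RMSEA = {rmsea(chi2_ms, df_ms, n_obs):.3f}")
print(f"CFI   = {cfi(chi2_ms, df_ms, chi2_bs, df_bs):.3f}")
print(f"TLI   = {tli(chi2_ms, df_ms, chi2_bs, df_bs):.3f}")
```

All three indices depend only on how far each chi-squared statistic exceeds its degrees of freedom, so they describe how well the model reproduces the covariance matrix, not how precisely individual coefficients are estimated.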
    Do I have reason to be concerned about multicollinearity?

    Best regards,
    Kjell

  • #2
    You didn't get a quick answer. You'll increase your chances of a useful answer by following the FAQ on asking questions. You also need to write much shorter posts; you're asking us to puzzle through a lot of material just to help you. Knowing exactly what you ran would also be helpful. A set of variances all equal to 1 would worry me unless for some reason you've constrained them that way; however, you can't really constrain the variance of an observed variable sensibly. I'd also worry about the "residuals of observed variables" all being equal to 0. I'm not sure exactly what this means, but it seems troubling. I also wonder whether you have a lot of parameters that depend on very few observed variables.

    Dropping a constant to change the reported collinearity is not really a good idea; you're fundamentally changing the model.

    I am not sure you need SEM - do you have latent variables?

    • #3
      You should read the section on collinearity in Goldberger's econometrics text. Collinearity (as long as the model can be estimated) really just means you'll have more difficulty estimating the parameters accurately, so many people don't worry too much about it.
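The point about estimation accuracy can be made concrete with the standard result for ordinary least squares with an intercept, shown here as a sketch (the sem estimates above come from maximum likelihood, so this is an analogy rather than the exact formula). Here \sigma^2 is the error variance and R_j^2 is the R-squared from regressing x_j on the other regressors:

```latex
\operatorname{Var}(\hat\beta_j)
  = \frac{\sigma^2}{(1 - R_j^2)\,\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2}
  = \mathrm{VIF}_j \cdot \frac{\sigma^2}{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2},
\qquad
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
```

Collinearity enters only through the factor 1/(1 - R_j^2), so it widens the standard errors without biasing the estimates, and a larger sample or more variation in x_j offsets it in the same way.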
