Use and interpretation of collin (and colldiag)

Miguel A. Duran

Join Date: Apr 2014
Posts: 47

09 Mar 2015, 10:50

I am doing an OLS regression with quite a large number of variables. This is one of the specifications I am working with:

Code:

regress y xm xx mm $st $fes $fas $fis dquarter* dprimarysiccode*, cluster(firm)

Linear regression                                      Number of obs =    1549
                                                       F( 55,   106) =   12.72
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.4078
                                                       Root MSE      =  .23525

                                 (Std. Err. adjusted for 107 clusters in firm)
------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          xm |   .0350857   .0530344     0.66   0.510    -.0700602    .1402316
          xx |  -.1241785   .0472125    -2.63   0.010    -.2177819   -.0305751
          mm |  -.1093333   .0456007    -2.40   0.018    -.1997411   -.0189255
          zs |  -.0113633   .0092922    -1.22   0.224     -.029786    .0070593
         dur |  -.0655198    .039015    -1.68   0.096    -.1428707    .0118311
         dco |   .2530042   .0943562     2.68   0.009     .0659339    .4400745
         dan |   .2723561   .1063543     2.56   0.012     .0614983    .4832139
         dut |   .0419444   .0713666     0.59   0.558    -.0995468    .1834355
         dup |  -.0193242   .0347322    -0.56   0.579    -.0881841    .0495357
         dca |  -.0648063   .0478411    -1.35   0.178     -.159656    .0300433
         dse |  -.0399489   .0429487    -0.93   0.354    -.1250987     .045201
         dsy |  -.0358723   .0587616    -0.61   0.543     -.152373    .0806284
          yl |  -.0963299   .0432366    -2.23   0.028    -.1820506   -.0106091
         pur |  -.0504213   .0283916    -1.78   0.079    -.1067104    .0058677
         mat |   .0011013   .0010494     1.05   0.296    -.0009792    .0031819
         amt |   .0001559   .0000992     1.57   0.119    -.0000408    .0003525
         ycv |   .0734919   .0517773     1.42   0.159    -.0291617    .1761454
         pro |   .4624523   .4935774     0.94   0.351     -.516113    1.441018
          mb |  -.0337071   .0159716    -2.11   0.037    -.0653723    -.002042
          ta |   .0165495   .0940275     0.18   0.861    -.1698691    .2029682
         lev |    .531612   .0906072     5.87   0.000     .3519745    .7112496
          li |  -.0336324   .0163566    -2.06   0.042    -.0660609   -.0012039
          si |  -.0641883   .0194685    -3.30   0.001    -.1027865     -.02559
         unr |   .2487435   .0704326     3.53   0.001      .109104    .3883831
         spe |   .1181584   .0627558     1.88   0.062     -.006261    .2425779
   dquarter1 |   -.138636   .0757786    -1.83   0.070    -.2888745    .0116025
   dquarter2 |  -.1464862   .0738429    -1.98   0.050    -.2928868   -.0000855
   dquarter3 |  -.1559278   .0753829    -2.07   0.041    -.3053817   -.0064739
   dquarter4 |  -.1208113   .0738601    -1.64   0.105    -.2672462    .0256235
   dquarter5 |  -.1211856   .0741939    -1.63   0.105    -.2682822    .0259109
   dquarter6 |  -.1000479   .0762635    -1.31   0.192    -.2512476    .0511519
   dquarter7 |  -.0960139   .0767273    -1.25   0.214    -.2481333    .0561055
   dquarter8 |  -.0625375   .0779553    -0.80   0.424    -.2170914    .0920164
   dquarter9 |  -.0596899   .0758053    -0.79   0.433    -.2099812    .0906015
  dquarter10 |  -.0391873   .0774287    -0.51   0.614    -.1926973    .1143227
  dquarter11 |  -.0083287   .0788968    -0.11   0.916    -.1647492    .1480918
  dquarter12 |   .0267371     .08208     0.33   0.745    -.1359945    .1894687
  dquarter13 |   .0008081   .0787844     0.01   0.992    -.1553896    .1570058
  dquarter14 |  -.0341349   .0728937    -0.47   0.641    -.1786537    .1103839
  dquarter15 |  -.0422517   .0719317    -0.59   0.558    -.1848633    .1003598
  dquarter16 |  -.0721815   .0675334    -1.07   0.288     -.206073      .06171
  dquarter17 |  -.0450987   .0717747    -0.63   0.531    -.1873989    .0972015
  dquarter18 |  -.0837943   .0745701    -1.12   0.264    -.2316368    .0640482
  dquarter19 |  -.0872572   .0734328    -1.19   0.237    -.2328449    .0583306
  dquarter20 |   -.117614   .0707423    -1.66   0.099    -.2578675    .0226395
  dquarter21 |  -.0987901   .0715946    -1.38   0.171    -.2407335    .0431532
  dquarter22 |  -.0597826   .0693939    -0.86   0.391    -.1973626    .0777975
  dquarter23 |  -.0227661   .0628482    -0.36   0.718    -.1473688    .1018366
  dquarter24 |  (dropped)
  dquarter25 |   -.027545   .0914456    -0.30   0.764    -.2088448    .1537547
dprimarysi~1 |  -.0624944    .105257    -0.59   0.554    -.2711766    .1461879
dprimarysi~2 |  -.0201591   .0481707    -0.42   0.676    -.1156622    .0753441
dprimarysi~3 |   .0293575   .0502924     0.58   0.561    -.0703521    .1290672
dprimarysi~4 |   -.117912   .0627035    -1.88   0.063    -.2422278    .0064037
dprimarysi~5 |  -.0167383   .0669127    -0.25   0.803    -.1493992    .1159226
dprimarysi~6 |  (dropped)
dprimarysi~7 |   .0414913   .0712774     0.58   0.562    -.0998231    .1828057
       _cons |   .3202686   .1846133     1.73   0.086    -.0457453    .6862825
------------------------------------------------------------------------------

To test for multicollinearity using -collin-, I exclude the dependent variable and also dquarter* and dprimarysiccode*, i.e., the fixed effects. If I include the fixed effects, I get the following message:

Code:

corr(): matrix has zero or negative values on diagonal

Question 1. Is this correct?

Code:

 collin xm xx mm $st $fes $fas $fis
(obs=1549)

  Collinearity Diagnostics

                        SQRT                   R-
  Variable      VIF     VIF    Tolerance    Squared
----------------------------------------------------
        xm      3.29    1.81    0.3037      0.6963
        xx      3.03    1.74    0.3305      0.6695
        mm      3.19    1.79    0.3134      0.6866
        zs      3.06    1.75    0.3263      0.6737
       dur      1.37    1.17    0.7305      0.2695
       dco      5.59    2.36    0.1788      0.8212
       dan      7.46    2.73    0.1341      0.8659
       dut      2.22    1.49    0.4498      0.5502
       dup      1.13    1.06    0.8887      0.1113
       dca      1.21    1.10    0.8296      0.1704
       dse      2.05    1.43    0.4875      0.5125
       dsy      1.38    1.18    0.7233      0.2767
        yl      1.36    1.17    0.7340      0.2660
       pur      1.16    1.08    0.8612      0.1388
       mat      1.46    1.21    0.6852      0.3148
       amt      2.70    1.64    0.3708      0.6292
       ycv      1.10    1.05    0.9095      0.0905
       pro      2.09    1.44    0.4795      0.5205
        mb      1.76    1.33    0.5666      0.4334
        ta      1.41    1.19    0.7107      0.2893
       lev      1.82    1.35    0.5492      0.4508
        li      1.78    1.33    0.5612      0.4388
        si      3.37    1.83    0.2971      0.7029
       unr      6.12    2.47    0.1633      0.8367
       spe      4.05    2.01    0.2468      0.7532
----------------------------------------------------
  Mean VIF      2.61

                           Cond
        Eigenval          Index
---------------------------------
    1    12.6253          1.0000
    2     2.3336          2.3260
    3     1.7757          2.6664
    4     1.3940          3.0094
    5     1.1074          3.3765
    6     0.8993          3.7469
    7     0.8329          3.8935
    8     0.7367          4.1397
    9     0.6745          4.3263
   10     0.6550          4.3902
   11     0.5188          4.9333
   12     0.4313          5.4105
   13     0.4056          5.5790
   14     0.3061          6.4224
   15     0.2463          7.1600
   16     0.1891          8.1705
   17     0.1810          8.3517
   18     0.1605          8.8704
   19     0.1307          9.8284
   20     0.1118         10.6248
   21     0.0935         11.6230
   22     0.0750         12.9734
   23     0.0478         16.2546
   24     0.0392         17.9361
   25     0.0230         23.4378
   26     0.0058         46.5849
---------------------------------
 Condition Number        46.5849
 Eigenvalues & Cond Index computed from scaled raw sscp (w/ intercept)
 Det(correlation matrix)    0.0000

All the variables have a reasonable VIF. However, the condition number (46.58) seems too high. As far as I know, this could suggest a collinearity problem associated with the constant term.

Question 2. Is this correct?
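For reference, the columns of the -collin- output above are linked by the standard textbook definitions below. This is a sketch checked against the posted numbers, not a statement of how -collin- is coded internally. Here R_j^2 is the R-squared from regressing regressor j on the other regressors, lambda_j is the j-th eigenvalue in the table, and eta_j is the corresponding condition index (the symbol names are my own choice).

```latex
\[
\mathrm{VIF}_j \;=\; \frac{1}{1 - R_j^2} \;=\; \frac{1}{\mathrm{Tolerance}_j},
\qquad
\eta_j \;=\; \sqrt{\frac{\lambda_{\max}}{\lambda_j}}
\]
% Check for dan:   1 / (1 - 0.8659) = 1 / 0.1341 \approx 7.46
% Check for row 2: \sqrt{12.6253 / 2.3336} \approx 2.3260
```

The condition number is the largest condition index, i.e., the one for the smallest eigenvalue. The 26 eigenvalues in the table add up to 26 (25 regressors plus the intercept), which is what one would expect if each column was scaled to unit length before the eigenvalues were computed.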

To exclude the constant term from the analysis, I run -collin- with the option -corr-:

Code:

collin xm xx mm $st $fes $fas $fis, corr
(obs=1549)

  Collinearity Diagnostics

                        SQRT                   R-
  Variable      VIF     VIF    Tolerance    Squared
----------------------------------------------------
        xm      3.29    1.81    0.3037      0.6963
        xx      3.03    1.74    0.3305      0.6695
        mm      3.19    1.79    0.3134      0.6866
        zs      3.06    1.75    0.3263      0.6737
       dur      1.37    1.17    0.7305      0.2695
       dco      5.59    2.36    0.1788      0.8212
       dan      7.46    2.73    0.1341      0.8659
       dut      2.22    1.49    0.4498      0.5502
       dup      1.13    1.06    0.8887      0.1113
       dca      1.21    1.10    0.8296      0.1704
       dse      2.05    1.43    0.4875      0.5125
       dsy      1.38    1.18    0.7233      0.2767
        yl      1.36    1.17    0.7340      0.2660
       pur      1.16    1.08    0.8612      0.1388
       mat      1.46    1.21    0.6852      0.3148
       amt      2.70    1.64    0.3708      0.6292
       ycv      1.10    1.05    0.9095      0.0905
       pro      2.09    1.44    0.4795      0.5205
        mb      1.76    1.33    0.5666      0.4334
        ta      1.41    1.19    0.7107      0.2893
       lev      1.82    1.35    0.5492      0.4508
        li      1.78    1.33    0.5612      0.4388
        si      3.37    1.83    0.2971      0.7029
       unr      6.12    2.47    0.1633      0.8367
       spe      4.05    2.01    0.2468      0.7532
----------------------------------------------------
  Mean VIF      2.61

                           Cond
        Eigenval          Index
---------------------------------
    1     4.2709          1.0000
    2     3.3002          1.1376
    3     2.1975          1.3941
    4     1.5268          1.6725
    5     1.4322          1.7269
    6     1.3057          1.8086
    7     1.1789          1.9034
    8     1.1252          1.9482
    9     1.0541          2.0129
   10     0.8856          2.1961
   11     0.8616          2.2264
   12     0.8253          2.2748
   13     0.7241          2.4286
   14     0.6657          2.5329
   15     0.6308          2.6020
   16     0.5993          2.6694
   17     0.5105          2.8925
   18     0.4487          3.0853
   19     0.4239          3.1743
   20     0.3123          3.6980
   21     0.2173          4.4334
   22     0.1886          4.7592
   23     0.1394          5.5350
   24     0.0941          6.7352
   25     0.0815          7.2412
---------------------------------
 Condition Number         7.2412
 Eigenvalues & Cond Index computed from deviation sscp (no intercept)
 Det(correlation matrix)    0.0000

Thus, once the constant term is excluded, the condition number falls to 7.24 and the collinearity problem no longer seems to be present.

I have also used -colldiag- to analyze the conditioning matrix. However, since -fit- has to be run before -colldiag-, and -fit- cannot be used without a constant term, I cannot check the conditioning matrix without the constant term.

Question 3. Does anyone know a way to do this?
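One possible workaround that does not need -fit- is to compute the no-intercept condition indexes by hand from the correlation matrix, using only built-in commands. The sketch below assumes that the eigenvalues reported under "deviation sscp (no intercept)" are those of the correlation matrix, which is consistent with the 25 eigenvalues above adding up to 25. The auto dataset, its variables, and the matrix names C, evecs and evals are placeholders, not the data or names from this thread. The sketch gives eigenvalues and condition indexes only, not the variance proportions.

```stata
* Sketch: condition indexes without the constant, from the correlation matrix.
* auto.dta and the variable list are stand-ins for the real regressors.
sysuse auto, clear
quietly correlate mpg weight length turn
matrix C = r(C)

* symeigen stores eigenvectors in evecs and eigenvalues in evals (largest first)
matrix symeigen evecs evals = C

* condition index j = sqrt(largest eigenvalue / eigenvalue j)
local k = colsof(evals)
forvalues j = 1/`k' {
    display %5.0f `j' %12.4f evals[1,`j'] %12.4f sqrt(evals[1,1]/evals[1,`j'])
}
```

With the real data, the variable list after -correlate- would be the same one passed to -collin-, so the printed columns can be compared directly with the -collin, corr- table.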

[NOTE: I do not include the results from running -colldiag- because I would exceed the maximum number of characters.]

The results from running -colldiag- show that just two condition indexes are above 30 (their values are 79.0018 and 35.8414). The variance proportions associated with them that are above 0.5 correspond to some dquarter* variables (i.e., time fixed effects, whose values are just slightly above the threshold) and to the constant term.

Indeed, the variance proportion of the constant is almost 1 (0.9610).

Question 4. Would this provide additional evidence that, if I keep the constant term, a potential multicollinearity problem is present?

The variables xx, xm and mm are mutually exclusive dummies; in 87.5% of the observations, one of them equals 1. Something similar occurs with the variables dco and dan.

Question 5. Taking this into account, and given the results from -collin- and -colldiag-, would it be advisable to estimate the model without the constant term?
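To see why dummies like these can load on the constant, here is a small simulated illustration. It is only a sketch: d1, d2, d3, covered, u and y are hypothetical variables generated on the spot, and the 30/30/27.5 split is an arbitrary way of covering 87.5% of the observations. It uses the built-in -estat vif-, whose -uncentered- option is described in the Stata documentation as a way to detect collinearity of the regressors with the constant.

```stata
* Simulated illustration only; none of these variables come from the thread's data.
clear
set seed 2015
set obs 1000
generate double u = runiform()

* three mutually exclusive dummies that jointly cover about 87.5% of observations
generate byte d1 = u < 0.30
generate byte d2 = u >= 0.30 & u < 0.60
generate byte d3 = u >= 0.60 & u < 0.875

* the sum of the dummies is 1 for most observations, so it is close to the constant
generate byte covered = d1 + d2 + d3
summarize covered

* compare VIFs that ignore the constant with VIFs that take it into account
generate double y = rnormal()
regress y d1 d2 d3
estat vif
estat vif, uncentered
```

The same two -estat vif- calls could be run after the actual -regress- command to compare centered and uncentered VIFs on the real data.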

Thanks in advance for any help.

Roman Mostazir

Join Date: Apr 2014

Posts: 870
#2

09 Mar 2015, 19:02

Why are you bothered about multicollinearity when you have reasonable estimates of standard errors for all of your coefficients? I don't see anything in your model that flags multicollinearity as a worry. If there were high multicollinearity, Stata would have automatically flagged and dropped that variable from your model. I would rather spend time making the model more meaningful. It looks clumsy with too many variables.

Roman