  • Use and interpretation of collin (and colldiag)

    I am doing an OLS regression with quite a large number of variables. This is one of the specifications I am working with:
    Code:
    regress y xm xx mm $st $fes $fas $fis dquarter* dprimarysiccode*, cluster(firm)
    
    Linear regression                                      Number of obs =    1549
                                                           F( 55,   106) =   12.72
                                                           Prob > F      =  0.0000
                                                           R-squared     =  0.4078
                                                           Root MSE      =  .23525
    
                                     (Std. Err. adjusted for 107 clusters in firm)
    ------------------------------------------------------------------------------
                 |               Robust
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              xm |   .0350857   .0530344     0.66   0.510    -.0700602    .1402316
              xx |  -.1241785   .0472125    -2.63   0.010    -.2177819   -.0305751
              mm |  -.1093333   .0456007    -2.40   0.018    -.1997411   -.0189255
              zs |  -.0113633   .0092922    -1.22   0.224     -.029786    .0070593
             dur |  -.0655198    .039015    -1.68   0.096    -.1428707    .0118311
             dco |   .2530042   .0943562     2.68   0.009     .0659339    .4400745
             dan |   .2723561   .1063543     2.56   0.012     .0614983    .4832139
             dut |   .0419444   .0713666     0.59   0.558    -.0995468    .1834355
             dup |  -.0193242   .0347322    -0.56   0.579    -.0881841    .0495357
             dca |  -.0648063   .0478411    -1.35   0.178     -.159656    .0300433
             dse |  -.0399489   .0429487    -0.93   0.354    -.1250987     .045201
             dsy |  -.0358723   .0587616    -0.61   0.543     -.152373    .0806284
              yl |  -.0963299   .0432366    -2.23   0.028    -.1820506   -.0106091
             pur |  -.0504213   .0283916    -1.78   0.079    -.1067104    .0058677
             mat |   .0011013   .0010494     1.05   0.296    -.0009792    .0031819
             amt |   .0001559   .0000992     1.57   0.119    -.0000408    .0003525
             ycv |   .0734919   .0517773     1.42   0.159    -.0291617    .1761454
             pro |   .4624523   .4935774     0.94   0.351     -.516113    1.441018
              mb |  -.0337071   .0159716    -2.11   0.037    -.0653723    -.002042
              ta |   .0165495   .0940275     0.18   0.861    -.1698691    .2029682
             lev |    .531612   .0906072     5.87   0.000     .3519745    .7112496
              li |  -.0336324   .0163566    -2.06   0.042    -.0660609   -.0012039
              si |  -.0641883   .0194685    -3.30   0.001    -.1027865     -.02559
             unr |   .2487435   .0704326     3.53   0.001      .109104    .3883831
             spe |   .1181584   .0627558     1.88   0.062     -.006261    .2425779
       dquarter1 |   -.138636   .0757786    -1.83   0.070    -.2888745    .0116025
       dquarter2 |  -.1464862   .0738429    -1.98   0.050    -.2928868   -.0000855
       dquarter3 |  -.1559278   .0753829    -2.07   0.041    -.3053817   -.0064739
       dquarter4 |  -.1208113   .0738601    -1.64   0.105    -.2672462    .0256235
       dquarter5 |  -.1211856   .0741939    -1.63   0.105    -.2682822    .0259109
       dquarter6 |  -.1000479   .0762635    -1.31   0.192    -.2512476    .0511519
       dquarter7 |  -.0960139   .0767273    -1.25   0.214    -.2481333    .0561055
       dquarter8 |  -.0625375   .0779553    -0.80   0.424    -.2170914    .0920164
       dquarter9 |  -.0596899   .0758053    -0.79   0.433    -.2099812    .0906015
      dquarter10 |  -.0391873   .0774287    -0.51   0.614    -.1926973    .1143227
      dquarter11 |  -.0083287   .0788968    -0.11   0.916    -.1647492    .1480918
      dquarter12 |   .0267371     .08208     0.33   0.745    -.1359945    .1894687
      dquarter13 |   .0008081   .0787844     0.01   0.992    -.1553896    .1570058
      dquarter14 |  -.0341349   .0728937    -0.47   0.641    -.1786537    .1103839
      dquarter15 |  -.0422517   .0719317    -0.59   0.558    -.1848633    .1003598
      dquarter16 |  -.0721815   .0675334    -1.07   0.288     -.206073      .06171
      dquarter17 |  -.0450987   .0717747    -0.63   0.531    -.1873989    .0972015
      dquarter18 |  -.0837943   .0745701    -1.12   0.264    -.2316368    .0640482
      dquarter19 |  -.0872572   .0734328    -1.19   0.237    -.2328449    .0583306
      dquarter20 |   -.117614   .0707423    -1.66   0.099    -.2578675    .0226395
      dquarter21 |  -.0987901   .0715946    -1.38   0.171    -.2407335    .0431532
      dquarter22 |  -.0597826   .0693939    -0.86   0.391    -.1973626    .0777975
      dquarter23 |  -.0227661   .0628482    -0.36   0.718    -.1473688    .1018366
      dquarter24 |  (dropped)
      dquarter25 |   -.027545   .0914456    -0.30   0.764    -.2088448    .1537547
    dprimarysi~1 |  -.0624944    .105257    -0.59   0.554    -.2711766    .1461879
    dprimarysi~2 |  -.0201591   .0481707    -0.42   0.676    -.1156622    .0753441
    dprimarysi~3 |   .0293575   .0502924     0.58   0.561    -.0703521    .1290672
    dprimarysi~4 |   -.117912   .0627035    -1.88   0.063    -.2422278    .0064037
    dprimarysi~5 |  -.0167383   .0669127    -0.25   0.803    -.1493992    .1159226
    dprimarysi~6 |  (dropped)
    dprimarysi~7 |   .0414913   .0712774     0.58   0.562    -.0998231    .1828057
           _cons |   .3202686   .1846133     1.73   0.086    -.0457453    .6862825
    ------------------------------------------------------------------------------
    To test for multicollinearity using -collin-, I exclude the dependent variable and the fixed effects (dquarter* and dprimarysiccode*). Otherwise, I get the following message:
    Code:
    corr(): matrix has zero or negative values on diagonal

    Question 1. Is this correct?

    Code:
     collin xm xx mm $st $fes $fas $fis
    (obs=1549)
    
      Collinearity Diagnostics
    
                            SQRT                   R-
      Variable      VIF     VIF    Tolerance    Squared
    ----------------------------------------------------
            xm      3.29    1.81    0.3037      0.6963
            xx      3.03    1.74    0.3305      0.6695
            mm      3.19    1.79    0.3134      0.6866
            zs      3.06    1.75    0.3263      0.6737
           dur      1.37    1.17    0.7305      0.2695
           dco      5.59    2.36    0.1788      0.8212
           dan      7.46    2.73    0.1341      0.8659
           dut      2.22    1.49    0.4498      0.5502
           dup      1.13    1.06    0.8887      0.1113
           dca      1.21    1.10    0.8296      0.1704
           dse      2.05    1.43    0.4875      0.5125
           dsy      1.38    1.18    0.7233      0.2767
            yl      1.36    1.17    0.7340      0.2660
           pur      1.16    1.08    0.8612      0.1388
           mat      1.46    1.21    0.6852      0.3148
           amt      2.70    1.64    0.3708      0.6292
           ycv      1.10    1.05    0.9095      0.0905
           pro      2.09    1.44    0.4795      0.5205
            mb      1.76    1.33    0.5666      0.4334
            ta      1.41    1.19    0.7107      0.2893
           lev      1.82    1.35    0.5492      0.4508
            li      1.78    1.33    0.5612      0.4388
            si      3.37    1.83    0.2971      0.7029
           unr      6.12    2.47    0.1633      0.8367
           spe      4.05    2.01    0.2468      0.7532
    ----------------------------------------------------
      Mean VIF      2.61
    
                               Cond
            Eigenval          Index
    ---------------------------------
        1    12.6253          1.0000
        2     2.3336          2.3260
        3     1.7757          2.6664
        4     1.3940          3.0094
        5     1.1074          3.3765
        6     0.8993          3.7469
        7     0.8329          3.8935
        8     0.7367          4.1397
        9     0.6745          4.3263
       10     0.6550          4.3902
       11     0.5188          4.9333
       12     0.4313          5.4105
       13     0.4056          5.5790
       14     0.3061          6.4224
       15     0.2463          7.1600
       16     0.1891          8.1705
       17     0.1810          8.3517
       18     0.1605          8.8704
       19     0.1307          9.8284
       20     0.1118         10.6248
       21     0.0935         11.6230
       22     0.0750         12.9734
       23     0.0478         16.2546
       24     0.0392         17.9361
       25     0.0230         23.4378
       26     0.0058         46.5849
    ---------------------------------
     Condition Number        46.5849
     Eigenvalues & Cond Index computed from scaled raw sscp (w/ intercept)
     Det(correlation matrix)    0.0000
    All the variables have a reasonable VIF. However, the condition index (46.58) seems too high. As far as I know, this could suggest a collinearity problem involving the constant term.
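
For reference, the VIF and condition index columns in the output above can be reproduced by hand. The sketch below is an illustration in Python (it is not part of -collin-), using the standard definitions VIF = 1/(1 - R-squared) and condition index = sqrt(largest eigenvalue / eigenvalue). The function names are made up for this example, and the inputs are the rounded values printed above, so the last condition index comes out close to, but not exactly, 46.5849.

```python
import math

def vif_from_r_squared(r_squared):
    """VIF = 1 / (1 - R^2), with R^2 from regressing one predictor on the others."""
    return 1.0 / (1.0 - r_squared)

def condition_indexes(eigenvalues):
    """Condition index of each eigenvalue: sqrt(largest eigenvalue / eigenvalue)."""
    largest = max(eigenvalues)
    return [math.sqrt(largest / ev) for ev in eigenvalues]

# Values copied from the -collin- output above (already rounded to 4 decimals).
vif_dan = vif_from_r_squared(0.8659)                     # R-squared reported for dan
raw_sscp = condition_indexes([12.6253, 2.3336, 0.0058])  # eigenvalues 1, 2 and 26

print(round(vif_dan, 2))
print([round(ci, 2) for ci in raw_sscp])
```

Because the smallest eigenvalue is printed with only two significant digits, the condition number recomputed this way is only approximate.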

    Question 2. Is this correct?

    To exclude the constant term from the analysis, I run -collin- with the option -corr-:
    Code:
    collin xm xx mm $st $fes $fas $fis, corr
    (obs=1549)
    
      Collinearity Diagnostics
    
                            SQRT                   R-
      Variable      VIF     VIF    Tolerance    Squared
    ----------------------------------------------------
            xm      3.29    1.81    0.3037      0.6963
            xx      3.03    1.74    0.3305      0.6695
            mm      3.19    1.79    0.3134      0.6866
            zs      3.06    1.75    0.3263      0.6737
           dur      1.37    1.17    0.7305      0.2695
           dco      5.59    2.36    0.1788      0.8212
           dan      7.46    2.73    0.1341      0.8659
           dut      2.22    1.49    0.4498      0.5502
           dup      1.13    1.06    0.8887      0.1113
           dca      1.21    1.10    0.8296      0.1704
           dse      2.05    1.43    0.4875      0.5125
           dsy      1.38    1.18    0.7233      0.2767
            yl      1.36    1.17    0.7340      0.2660
           pur      1.16    1.08    0.8612      0.1388
           mat      1.46    1.21    0.6852      0.3148
           amt      2.70    1.64    0.3708      0.6292
           ycv      1.10    1.05    0.9095      0.0905
           pro      2.09    1.44    0.4795      0.5205
            mb      1.76    1.33    0.5666      0.4334
            ta      1.41    1.19    0.7107      0.2893
           lev      1.82    1.35    0.5492      0.4508
            li      1.78    1.33    0.5612      0.4388
            si      3.37    1.83    0.2971      0.7029
           unr      6.12    2.47    0.1633      0.8367
           spe      4.05    2.01    0.2468      0.7532
    ----------------------------------------------------
      Mean VIF      2.61
    
                               Cond
            Eigenval          Index
    ---------------------------------
        1     4.2709          1.0000
        2     3.3002          1.1376
        3     2.1975          1.3941
        4     1.5268          1.6725
        5     1.4322          1.7269
        6     1.3057          1.8086
        7     1.1789          1.9034
        8     1.1252          1.9482
        9     1.0541          2.0129
       10     0.8856          2.1961
       11     0.8616          2.2264
       12     0.8253          2.2748
       13     0.7241          2.4286
       14     0.6657          2.5329
       15     0.6308          2.6020
       16     0.5993          2.6694
       17     0.5105          2.8925
       18     0.4487          3.0853
       19     0.4239          3.1743
       20     0.3123          3.6980
       21     0.2173          4.4334
       22     0.1886          4.7592
       23     0.1394          5.5350
       24     0.0941          6.7352
       25     0.0815          7.2412
    ---------------------------------
     Condition Number         7.2412
     Eigenvalues & Cond Index computed from deviation sscp (no intercept)
     Det(correlation matrix)    0.0000
    Thus, once the constant term is excluded, the condition number falls to 7.24 and the collinearity problem no longer appears.
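
One way to see why the intercept matters: a predictor whose mean is far from zero relative to its spread is nearly parallel to the constant column, and that alone inflates a condition number computed from the scaled raw sscp, even when the predictors are not collinear with each other. The following is a minimal sketch on made-up data (not the data in this post), with a hypothetical helper name, for a design containing just a constant and one predictor.

```python
import math

def condition_number_with_constant(x):
    """Condition number of the design [1, x], with both columns scaled to unit length.

    Assumes x is neither constant nor all zeros.
    """
    n = len(x)
    # Cosine of the angle between the constant column and x.
    c = sum(x) / (math.sqrt(n) * math.sqrt(sum(v * v for v in x)))
    # The scaled cross-product matrix is [[1, c], [c, 1]];
    # its eigenvalues are 1 + |c| and 1 - |c|.
    return math.sqrt((1 + abs(c)) / (1 - abs(c)))

# Made-up predictor: mean 100, small spread.
x_raw = [98.0, 99.0, 100.0, 101.0, 102.0]
mean = sum(x_raw) / len(x_raw)
x_centered = [v - mean for v in x_raw]

cond_raw = condition_number_with_constant(x_raw)
cond_centered = condition_number_with_constant(x_centered)

print(round(cond_raw, 1), cond_centered)
```

Centering removes the part of x that lines up with the constant, which is in effect what working from the deviation sscp does. A large gap between the two condition numbers therefore points at the location of the variables rather than at near-dependence among them.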

    I have also used -colldiag- to analyze the conditioning matrix. However, since -fit- has to be run before -colldiag-, and -fit- cannot be used without a constant term, I cannot check the conditioning matrix without the constant term.

    Question 3. Does anyone know a way to do this?

    [NOTE: I do not include the results from running -colldiag- because I would exceed the maximum number of characters]

    The results from running -colldiag- show that just two condition indexes are above 30 (79.0018 and 35.8414). The variance proportions associated with them that exceed 0.5 correspond to some dquarter* variables (i.e., time fixed effects, whose proportions are only slightly above the threshold) and to the constant term.

    Indeed, the variance proportion of the constant is almost 1 (0.9610).

    Question 4. Would this provide additional evidence that a potential multicollinearity problem is present if I keep the constant term?

    The variables xx, xm and mm are mutually exclusive dummies; the observations in which one of them equals 1 account for 87.5% of the sample. Something similar holds for the variables dco and dan.

    Question 5. Taking this into account, and given the results from -collin- and -colldiag-, would it be advisable to estimate the model without the constant term?

    Thanks in advance for any help.

  • #2
    Why are you bothered about multicollinearity when you have reasonable estimates of standard errors for all of your coefficients? I don't see anything in your model that flags multicollinearity as a concern. If there were high multicollinearity, Stata would have automatically flagged and dropped that variable from your model. I would rather spend time making the model more meaningful. It looks clumsy with too many variables.
    Roman
