I am doing an OLS regression with quite a large number of variables. This is one of the specifications I am working with; its full -regress- output is reproduced further below.
To test for multicollinearity using -collin-, I exclude (besides the dependent variable) dquarter* and dprimarysiccode*, i.e., the fixed effects. Otherwise, -collin- stops with the error message "corr(): matrix has zero or negative values on diagonal".
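A likely explanation for the message -collin- gives when the fixed-effect dummies are included rests on one assumption: that dquarter* and dprimarysiccode* are complete sets of indicators. The regression output points that way, because dquarter24 and dprimarysiccode6 were dropped. If the assumption holds, each set sums to one in every observation $i$:

$$\sum_{q=1}^{25} \mathrm{dquarter}_{q,i} \;=\; 1 \;=\; \sum_{s=1}^{7} \mathrm{dprimarysiccode}_{s,i}.$$

With the constant included, the columns then have two exact linear dependencies, and the cross-product matrix is singular. This is consistent with -regress- dropping one dummy from each set.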
Question 1. Is this correct?
All the variables have reasonable VIFs (the largest is 7.46, for dan; the mean VIF is 2.61). However, the condition number (46.58) seems too high. As far as I know, this could suggest a collinearity problem involving the constant term.
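For reference, the condition indexes printed by -collin- are consistent with the usual Belsley definition. This is inferred from the output label "scaled raw sscp (w/ intercept)", not from the -collin- source. Scale every column of the regressor matrix $X$, including the column of ones, to unit length to get $\tilde X$. Take the eigenvalues $\lambda_1 \ge \dots \ge \lambda_p$ of $\tilde X'\tilde X$, and set

$$\eta_j = \sqrt{\lambda_1/\lambda_j}, \qquad \sum_{j=1}^{p}\lambda_j = \operatorname{tr}(\tilde X'\tilde X) = p.$$

Here $p = 26$ (25 regressors plus the constant), and the printed eigenvalues do sum to 26 up to rounding. The last index is

$$\eta_{26} = \sqrt{12.6253/0.0058} \approx 46.66.$$

This matches the reported 46.5849 once the rounding of the smallest eigenvalue is allowed for ($12.6253/46.5849^2 \approx 0.00582$).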
Question 2. Is this correct?
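To exclude the constant term from the analysis, I run -collin- with the -corr- option, which (according to its output) computes the eigenvalues from the deviation SSCP matrix, i.e., without the intercept.
Thus, with the constant term excluded, the condition number falls to 7.24 and the collinearity problem seems to disappear.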
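Under -corr-, the output label ("deviation sscp (no intercept)") indicates that the same calculation is applied to the centered, unit-length regressors, i.e., to their correlation matrix $R$. The trace of $R$ is 25, and the 25 printed eigenvalues again sum to 25 up to rounding. The last index is

$$\eta_{25} = \sqrt{\lambda_1/\lambda_{25}} = \sqrt{4.2709/0.0815} \approx 7.24,$$

in line with the reported 7.2412. Because the columns are centered, the column of ones plays no role in these indexes.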
I have also used -colldiag- to obtain the condition indexes and the associated variance-decomposition proportions. However, -colldiag- has to be run after -fit-, and -fit- cannot be used without a constant term, so I cannot obtain these diagnostics without the constant term.
Question 3. Does anyone know a way to do this?
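These diagnostics can also be computed directly from the data matrix, without any fitted model. The sketch below is plain Python (standard library only, not Stata), under stated assumptions. It is not a reimplementation of -collin- or -colldiag-. It follows the Belsley approach: unit-length column scaling, eigenvalues and eigenvectors of the cross-product matrix, condition indexes and variance-decomposition proportions. The functions `jacobi_eigen` and `belsley` and the three toy dummies are hypothetical. With `intercept=False` the columns are centered first, which is what the -collin-, corr output label ("deviation sscp (no intercept)") describes; that this matches -collin- exactly is an assumption.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def jacobi_eigen(a: Matrix, max_sweeps: int = 100) -> Tuple[List[float], Matrix]:
    """Eigenvalues and eigenvectors (columns of v) of a symmetric matrix,
    computed with cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] * a[i][j] for i in range(n) for j in range(n) if i != j)
        if off < 1e-24:  # off-diagonal mass is negligible: converged
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # rotation angle that zeroes a[p][q]
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def belsley(columns: List[List[float]], intercept: bool) -> Tuple[List[float], Matrix]:
    """Condition indexes (largest last) and variance-decomposition proportions.

    props[k][j] is the share of var(b_k) associated with component j.
    intercept=True : prepend a column of ones, no centering (scaled raw SSCP).
    intercept=False: center each column, no constant (correlation matrix).
    Assumes the columns have full rank (no exact collinearity).
    """
    cols = [list(map(float, c)) for c in columns]
    if intercept:
        cols = [[1.0] * len(cols[0])] + cols
    else:
        means = [sum(c) / len(c) for c in cols]
        cols = [[x - m for x in c] for c, m in zip(cols, means)]
    # scale every column to unit Euclidean length
    norms = [math.sqrt(sum(x * x for x in c)) for c in cols]
    cols = [[x / nrm for x in c] for c, nrm in zip(cols, norms)]
    k = len(cols)
    sscp = [[sum(x * y for x, y in zip(cols[i], cols[j])) for j in range(k)] for i in range(k)]
    lam, v = jacobi_eigen(sscp)
    order = sorted(range(k), key=lambda j: lam[j], reverse=True)
    lam = [lam[j] for j in order]
    v = [[row[j] for j in order] for row in v]
    cond = [math.sqrt(lam[0] / lj) for lj in lam]
    props = []
    for row in v:  # row = loadings of one coefficient on all components
        phi = [vkj * vkj / lj for vkj, lj in zip(row, lam)]
        total = sum(phi)
        props.append([f / total for f in phi])
    return cond, props


# Hypothetical toy data: three mutually exclusive dummies that together
# equal 1 in 7 of 8 observations (87.5%).
d1 = [1, 1, 1, 0, 0, 0, 0, 0]
d2 = [0, 0, 0, 1, 1, 0, 0, 0]
d3 = [0, 0, 0, 0, 0, 1, 1, 0]

cond_raw, props_raw = belsley([d1, d2, d3], intercept=True)
cond_dev, _ = belsley([d1, d2, d3], intercept=False)
print("with intercept:", [round(x, 3) for x in cond_raw])
print("centered, no intercept:", [round(x, 3) for x in cond_dev])
print("intercept's proportion on the last component:", round(props_raw[0][-1], 3))
```

For these toy dummies, the largest condition index is about 5.47 with the intercept and about 2.88 after centering. The Jacobi solver only keeps the sketch dependency-free; with a linear-algebra library, a singular value decomposition of the scaled matrix would be the usual choice.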
[NOTE: I do not include the results from running -colldiag- because they would exceed the maximum number of characters.]
The results from running -colldiag- show that only two condition indexes are above 30 (79.0018 and 35.8414). The variance-decomposition proportions above 0.5 associated with them correspond to some dquarter* variables (i.e., the time fixed effects, whose proportions are only slightly above the threshold) and to the constant term.
Indeed, the variance-decomposition proportion of the constant is almost 1 (0.9610).
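For reading the -colldiag- table: the formulas below follow the Belsley framework, and it is assumed (not checked against its code) that -colldiag- implements it. Let $\tilde X$ be the column-scaled regressors, with eigenvalues $\lambda_j$ and eigenvectors $v_j = (v_{1j},\dots,v_{pj})'$ of $\tilde X'\tilde X$. Then

$$\operatorname{var}(\hat\beta_k) \propto \sum_{j=1}^{p}\frac{v_{kj}^2}{\lambda_j}, \qquad \pi_{jk} = \frac{v_{kj}^2/\lambda_j}{\sum_{l=1}^{p} v_{kl}^2/\lambda_l},$$

so $\pi_{jk}$ is the share of the variance of the $k$-th coefficient associated with component $j$. The commonly cited rule of thumb flags a harmful near-dependency when two conditions hold together:

- a condition index is above about 30, and
- two or more coefficients have $\pi_{jk} > 0.5$ on that same component.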
Question 4. Does this provide additional evidence that keeping the constant term may lead to a multicollinearity problem?
The variables xx, xm and mm are mutually exclusive dummies that, taken together, equal 1 for 87.5% of the observations. Something similar holds for the variables dco and dan.
Question 5. Taking this into account, and given the results from -collin- and -colldiag-, would it be advisable to estimate the model without the constant term?
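As a stylized check on how much collinearity with the constant such dummies can create by themselves, take only the constant and $m$ mutually exclusive dummies with shares $p_1,\dots,p_m$. Let $s^2 = \sum_k p_k$, which is 0.875 for xx, xm and mm. After scaling each column to unit length (with the constant included), the dummies never overlap, so their cross-products are zero:

$$\tilde X'\tilde X = \begin{pmatrix} 1 & a^\top \\ a & I_m \end{pmatrix},\qquad a = (\sqrt{p_1},\dots,\sqrt{p_m})^\top,\qquad \lVert a\rVert = s.$$

Its eigenvalues are $1+s$, $1-s$, and $1$ (with multiplicity $m-1$), so the largest condition index is

$$\eta_{\max} = \sqrt{\frac{1+s}{1-s}} = \frac{1+s}{\sqrt{1-s^2}} = \frac{1+\sqrt{0.875}}{\sqrt{0.125}} \approx 5.47.$$

This ignores every other regressor, so it only indicates the contribution of these dummies taken alone.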
Thanks in advance for any help.
Code:
regress y xm xx mm $st $fes $fas $fis dquarter* dprimarysiccode*, cluster(firm)

Linear regression                                      Number of obs =    1549
                                                       F( 55,   106) =   12.72
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.4078
                                                       Root MSE      =  .23525

                               (Std. Err. adjusted for 107 clusters in firm)
------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          xm |   .0350857   .0530344     0.66   0.510    -.0700602    .1402316
          xx |  -.1241785   .0472125    -2.63   0.010    -.2177819   -.0305751
          mm |  -.1093333   .0456007    -2.40   0.018    -.1997411   -.0189255
          zs |  -.0113633   .0092922    -1.22   0.224     -.029786    .0070593
         dur |  -.0655198    .039015    -1.68   0.096    -.1428707    .0118311
         dco |   .2530042   .0943562     2.68   0.009     .0659339    .4400745
         dan |   .2723561   .1063543     2.56   0.012     .0614983    .4832139
         dut |   .0419444   .0713666     0.59   0.558    -.0995468    .1834355
         dup |  -.0193242   .0347322    -0.56   0.579    -.0881841    .0495357
         dca |  -.0648063   .0478411    -1.35   0.178     -.159656    .0300433
         dse |  -.0399489   .0429487    -0.93   0.354    -.1250987     .045201
         dsy |  -.0358723   .0587616    -0.61   0.543     -.152373    .0806284
          yl |  -.0963299   .0432366    -2.23   0.028    -.1820506   -.0106091
         pur |  -.0504213   .0283916    -1.78   0.079    -.1067104    .0058677
         mat |   .0011013   .0010494     1.05   0.296    -.0009792    .0031819
         amt |   .0001559   .0000992     1.57   0.119    -.0000408    .0003525
         ycv |   .0734919   .0517773     1.42   0.159    -.0291617    .1761454
         pro |   .4624523   .4935774     0.94   0.351     -.516113    1.441018
          mb |  -.0337071   .0159716    -2.11   0.037    -.0653723    -.002042
          ta |   .0165495   .0940275     0.18   0.861    -.1698691    .2029682
         lev |    .531612   .0906072     5.87   0.000     .3519745    .7112496
          li |  -.0336324   .0163566    -2.06   0.042    -.0660609   -.0012039
          si |  -.0641883   .0194685    -3.30   0.001    -.1027865     -.02559
         unr |   .2487435   .0704326     3.53   0.001      .109104    .3883831
         spe |   .1181584   .0627558     1.88   0.062     -.006261    .2425779
   dquarter1 |   -.138636   .0757786    -1.83   0.070    -.2888745    .0116025
   dquarter2 |  -.1464862   .0738429    -1.98   0.050    -.2928868   -.0000855
   dquarter3 |  -.1559278   .0753829    -2.07   0.041    -.3053817   -.0064739
   dquarter4 |  -.1208113   .0738601    -1.64   0.105    -.2672462    .0256235
   dquarter5 |  -.1211856   .0741939    -1.63   0.105    -.2682822    .0259109
   dquarter6 |  -.1000479   .0762635    -1.31   0.192    -.2512476    .0511519
   dquarter7 |  -.0960139   .0767273    -1.25   0.214    -.2481333    .0561055
   dquarter8 |  -.0625375   .0779553    -0.80   0.424    -.2170914    .0920164
   dquarter9 |  -.0596899   .0758053    -0.79   0.433    -.2099812    .0906015
  dquarter10 |  -.0391873   .0774287    -0.51   0.614    -.1926973    .1143227
  dquarter11 |  -.0083287   .0788968    -0.11   0.916    -.1647492    .1480918
  dquarter12 |   .0267371     .08208     0.33   0.745    -.1359945    .1894687
  dquarter13 |   .0008081   .0787844     0.01   0.992    -.1553896    .1570058
  dquarter14 |  -.0341349   .0728937    -0.47   0.641    -.1786537    .1103839
  dquarter15 |  -.0422517   .0719317    -0.59   0.558    -.1848633    .1003598
  dquarter16 |  -.0721815   .0675334    -1.07   0.288     -.206073      .06171
  dquarter17 |  -.0450987   .0717747    -0.63   0.531    -.1873989    .0972015
  dquarter18 |  -.0837943   .0745701    -1.12   0.264    -.2316368    .0640482
  dquarter19 |  -.0872572   .0734328    -1.19   0.237    -.2328449    .0583306
  dquarter20 |   -.117614   .0707423    -1.66   0.099    -.2578675    .0226395
  dquarter21 |  -.0987901   .0715946    -1.38   0.171    -.2407335    .0431532
  dquarter22 |  -.0597826   .0693939    -0.86   0.391    -.1973626    .0777975
  dquarter23 |  -.0227661   .0628482    -0.36   0.718    -.1473688    .1018366
  dquarter24 |  (dropped)
  dquarter25 |   -.027545   .0914456    -0.30   0.764    -.2088448    .1537547
dprimarysi~1 |  -.0624944    .105257    -0.59   0.554    -.2711766    .1461879
dprimarysi~2 |  -.0201591   .0481707    -0.42   0.676    -.1156622    .0753441
dprimarysi~3 |   .0293575   .0502924     0.58   0.561    -.0703521    .1290672
dprimarysi~4 |   -.117912   .0627035    -1.88   0.063    -.2422278    .0064037
dprimarysi~5 |  -.0167383   .0669127    -0.25   0.803    -.1493992    .1159226
dprimarysi~6 |  (dropped)
dprimarysi~7 |   .0414913   .0712774     0.58   0.562    -.0998231    .1828057
       _cons |   .3202686   .1846133     1.73   0.086    -.0457453    .6862825
------------------------------------------------------------------------------
Code:
corr(): matrix has zero or negative values on diagonal
Code:
collin xm xx mm $st $fes $fas $fis
(obs=1549)

  Collinearity Diagnostics

                        SQRT               R-
  Variable       VIF     VIF Tolerance     Squared
----------------------------------------------------
        xm      3.29    1.81    0.3037      0.6963
        xx      3.03    1.74    0.3305      0.6695
        mm      3.19    1.79    0.3134      0.6866
        zs      3.06    1.75    0.3263      0.6737
       dur      1.37    1.17    0.7305      0.2695
       dco      5.59    2.36    0.1788      0.8212
       dan      7.46    2.73    0.1341      0.8659
       dut      2.22    1.49    0.4498      0.5502
       dup      1.13    1.06    0.8887      0.1113
       dca      1.21    1.10    0.8296      0.1704
       dse      2.05    1.43    0.4875      0.5125
       dsy      1.38    1.18    0.7233      0.2767
        yl      1.36    1.17    0.7340      0.2660
       pur      1.16    1.08    0.8612      0.1388
       mat      1.46    1.21    0.6852      0.3148
       amt      2.70    1.64    0.3708      0.6292
       ycv      1.10    1.05    0.9095      0.0905
       pro      2.09    1.44    0.4795      0.5205
        mb      1.76    1.33    0.5666      0.4334
        ta      1.41    1.19    0.7107      0.2893
       lev      1.82    1.35    0.5492      0.4508
        li      1.78    1.33    0.5612      0.4388
        si      3.37    1.83    0.2971      0.7029
       unr      6.12    2.47    0.1633      0.8367
       spe      4.05    2.01    0.2468      0.7532
----------------------------------------------------
  Mean VIF      2.61

                            Cond
        Eigenval           Index
---------------------------------
    1    12.6253          1.0000
    2     2.3336          2.3260
    3     1.7757          2.6664
    4     1.3940          3.0094
    5     1.1074          3.3765
    6     0.8993          3.7469
    7     0.8329          3.8935
    8     0.7367          4.1397
    9     0.6745          4.3263
   10     0.6550          4.3902
   11     0.5188          4.9333
   12     0.4313          5.4105
   13     0.4056          5.5790
   14     0.3061          6.4224
   15     0.2463          7.1600
   16     0.1891          8.1705
   17     0.1810          8.3517
   18     0.1605          8.8704
   19     0.1307          9.8284
   20     0.1118         10.6248
   21     0.0935         11.6230
   22     0.0750         12.9734
   23     0.0478         16.2546
   24     0.0392         17.9361
   25     0.0230         23.4378
   26     0.0058         46.5849
---------------------------------
 Condition Number        46.5849
 Eigenvalues & Cond Index computed from scaled raw sscp (w/ intercept)
 Det(correlation matrix)    0.0000
Code:
collin xm xx mm $st $fes $fas $fis, corr
(obs=1549)

  Collinearity Diagnostics

                        SQRT               R-
  Variable       VIF     VIF Tolerance     Squared
----------------------------------------------------
        xm      3.29    1.81    0.3037      0.6963
        xx      3.03    1.74    0.3305      0.6695
        mm      3.19    1.79    0.3134      0.6866
        zs      3.06    1.75    0.3263      0.6737
       dur      1.37    1.17    0.7305      0.2695
       dco      5.59    2.36    0.1788      0.8212
       dan      7.46    2.73    0.1341      0.8659
       dut      2.22    1.49    0.4498      0.5502
       dup      1.13    1.06    0.8887      0.1113
       dca      1.21    1.10    0.8296      0.1704
       dse      2.05    1.43    0.4875      0.5125
       dsy      1.38    1.18    0.7233      0.2767
        yl      1.36    1.17    0.7340      0.2660
       pur      1.16    1.08    0.8612      0.1388
       mat      1.46    1.21    0.6852      0.3148
       amt      2.70    1.64    0.3708      0.6292
       ycv      1.10    1.05    0.9095      0.0905
       pro      2.09    1.44    0.4795      0.5205
        mb      1.76    1.33    0.5666      0.4334
        ta      1.41    1.19    0.7107      0.2893
       lev      1.82    1.35    0.5492      0.4508
        li      1.78    1.33    0.5612      0.4388
        si      3.37    1.83    0.2971      0.7029
       unr      6.12    2.47    0.1633      0.8367
       spe      4.05    2.01    0.2468      0.7532
----------------------------------------------------
  Mean VIF      2.61

                            Cond
        Eigenval           Index
---------------------------------
    1     4.2709          1.0000
    2     3.3002          1.1376
    3     2.1975          1.3941
    4     1.5268          1.6725
    5     1.4322          1.7269
    6     1.3057          1.8086
    7     1.1789          1.9034
    8     1.1252          1.9482
    9     1.0541          2.0129
   10     0.8856          2.1961
   11     0.8616          2.2264
   12     0.8253          2.2748
   13     0.7241          2.4286
   14     0.6657          2.5329
   15     0.6308          2.6020
   16     0.5993          2.6694
   17     0.5105          2.8925
   18     0.4487          3.0853
   19     0.4239          3.1743
   20     0.3123          3.6980
   21     0.2173          4.4334
   22     0.1886          4.7592
   23     0.1394          5.5350
   24     0.0941          6.7352
   25     0.0815          7.2412
---------------------------------
 Condition Number         7.2412
 Eigenvalues & Cond Index computed from deviation sscp (no intercept)
 Det(correlation matrix)    0.0000