  • Highly significant regression coefficients and low correlations

    Hi, I have a model with five independent variables (indepvars). Looking at the correlation matrix, some of the indepvars are highly correlated with the dependent variable (depvar). However, when running a set of regressions, the only indepvar that shows consistent statistical significance has a correlation coefficient of -0.02, i.e. the lowest correlation with the depvar in the whole set. What is the intuition for interpreting this? I would be grateful to hear your view.

    Best,

    Alex

  • #2
    Alex:
    without taking a look at your data, and with a bit of guesswork, I would say that what you experience comes from comparing pairwise correlations with the results of a multiple regression (or of separate simple regressions?).
    As you can see from the following toy example, significance (which is usually oversold) can come and go:
    Code:
    . sysuse auto.dta
    . pwcorr price mpg weight, sig
    
                 |    price      mpg   weight
    -------------+---------------------------
           price |   1.0000
                 |
                 |
             mpg |  -0.4686   1.0000
                 |   0.0000
                 |
          weight |   0.5386  -0.8072   1.0000
                 |   0.0000   0.0000
                 |
    
    . reg price mpg
    
          Source |       SS           df       MS      Number of obs   =        74
    -------------+----------------------------------   F(1, 72)        =     20.26
           Model |   139449474         1   139449474   Prob > F        =    0.0000
        Residual |   495615923        72  6883554.48   R-squared       =    0.2196
    -------------+----------------------------------   Adj R-squared   =    0.2087
           Total |   635065396        73  8699525.97   Root MSE        =    2623.7
    
    ------------------------------------------------------------------------------
           price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             mpg |  -238.8943   53.07669    -4.50   0.000    -344.7008   -133.0879
           _cons |   11253.06   1170.813     9.61   0.000     8919.088    13587.03
    ------------------------------------------------------------------------------
    
    . reg price weight
    
          Source |       SS           df       MS      Number of obs   =        74
    -------------+----------------------------------   F(1, 72)        =     29.42
           Model |   184233937         1   184233937   Prob > F        =    0.0000
        Residual |   450831459        72  6261548.04   R-squared       =    0.2901
    -------------+----------------------------------   Adj R-squared   =    0.2802
           Total |   635065396        73  8699525.97   Root MSE        =    2502.3
    
    ------------------------------------------------------------------------------
           price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          weight |   2.044063   .3768341     5.42   0.000     1.292857    2.795268
           _cons |  -6.707353    1174.43    -0.01   0.995     -2347.89    2334.475
    ------------------------------------------------------------------------------
    
    . reg price weight mpg
    
          Source |       SS           df       MS      Number of obs   =        74
    -------------+----------------------------------   F(2, 71)        =     14.74
           Model |   186321280         2  93160639.9   Prob > F        =    0.0000
        Residual |   448744116        71  6320339.67   R-squared       =    0.2934
    -------------+----------------------------------   Adj R-squared   =    0.2735
           Total |   635065396        73  8699525.97   Root MSE        =      2514
    
    ------------------------------------------------------------------------------
           price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          weight |   1.746559   .6413538     2.72   0.008      .467736    3.025382
             mpg |  -49.51222   86.15604    -0.57   0.567    -221.3025     122.278
           _cons |   1946.069    3597.05     0.54   0.590    -5226.245    9118.382
    ------------------------------------------------------------------------------
    
    .
    As an aside, simple regressions usually suffer from omitted-variable bias, which makes their results untrustworthy (despite their statistical significance!).
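    For instance, in the toy example above the bias in the simple regression of price on mpg is easy to reproduce by hand (a quick sketch based on the standard omitted-variable-bias algebra: the short-regression slope equals the long-regression slope plus the weight coefficient times the auxiliary slope of weight on mpg):
    Code:
    . quietly regress weight mpg
    . scalar delta = _b[mpg]                 // auxiliary slope of weight on mpg
    . quietly regress price weight mpg
    . display _b[mpg] + _b[weight]*delta     // reproduces -238.8943 from -regress price mpg-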
    If the above does not give you any useful hint, then as per the FAQ please post what you typed and what Stata gave you back. Thanks.
    Last edited by Carlo Lazzaro; 20 Feb 2018, 11:54.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Originally posted by Carlo Lazzaro
      Alex:
      without taking a look at your data, and with a bit of guesswork, I would say that what you experience comes from comparing pairwise correlations with the results of a multiple regression (or of separate simple regressions?).
      If the above does not give you any useful hint, then as per the FAQ please post what you typed and what Stata gave you back. Thanks.
      Hi Carlo:

      Commands:
      Code:
      mi estimate, post: xtreg depvar indepvar1 indepvar2 indepvar3 indepvar4 indepvar5, fe vce(robust)
      correlate depvar indepvar1 indepvar2 indepvar3 indepvar4 indepvar5
      Basically, the situation is that indepvar2 is the only significant coefficient in my model, while at the same time it has the lowest correlation with the depvar. I just wanted to think about what one can say in this instance, as it seems a bit counterintuitive: normally the significant explanatory variables have fairly high correlations with the dependent variable. Thoughts?

      Edit: the same is true if we look at the data without imputations.
      Edit2: the significance of indepvar2 persists through a variety of "stress tests" and changes to the model.
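      One way to see the gap (a small sketch on the auto data, standing in for my actual variables): pcorr reports each regressor's correlation with the depvar after partialling out the other regressors, which is what the regression t-tests track, whereas correlate reports the raw pairwise ones.
      Code:
      . sysuse auto.dta, clear
      . pwcorr price mpg weight     // marginal (pairwise) correlations
      . pcorr price mpg weight      // partial correlations, adjusting for the other regressor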
      Last edited by alex badalyan; 20 Feb 2018, 12:02.



      • #4
        Nothing unusual or surprising here. What it means is that after you adjust for the contributions of the other independent variables, indepvar2 ends up having the highest separate significance. These variables are evidently rather heavily correlated with each other, and so they compete with each other as explanations for the variance in your outcome. As it happens, indepvar2 turns out to be the winner of that competition. This sort of thing happens often.

        I think its counterintuitiveness reflects more on your intuitions than on the phenomenon itself. In fact, the way things shake out when a group of correlated variables enters a regression model is quite complicated. I think few statisticians or mathematicians would claim to have any intuition about what happens when you invert that covariance matrix and then multiply it by some other matrices. That is, I think one should abandon any attempt to have intuitions about these matters. There are occasional simple cases where you can readily see what goes on, but those are truly the exceptions.
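        For what it's worth, the pattern is easy to manufacture. Here is a minimal simulated sketch (the -0.49 is chosen so that cov(y, x2) = 0.8 - 0.49*1.64 is approximately zero by construction, yet the x2 coefficient is strongly significant once x1 is adjusted for):
        Code:
        . clear
        . set seed 12345
        . set obs 1000
        . generate x1 = rnormal()
        . generate x2 = 0.8*x1 + rnormal()        // x2 heavily correlated with x1
        . generate y = x1 - 0.49*x2 + rnormal()   // marginal corr(y, x2) ~ 0 by design
        . pwcorr y x2, sig                        // near-zero, insignificant pairwise correlation
        . regress y x1 x2                         // yet x2 is highly significant (t around -15)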



        • #5
          Alex:
          for the future, please post what Stata gave you back, too. Thanks.
          My second guess is that you are comparing correlations among the variables themselves with correlations among the estimated coefficients:

          Code:
          . use "C:\Program Files (x86)\Stata15\ado\base\a\auto.dta"
          (1978 Automobile Data)
          
          . correlate price mpg weight
          (obs=74)
          
                       |    price      mpg   weight
          -------------+---------------------------
                 price |   1.0000
                   mpg |  -0.4686   1.0000
                weight |   0.5386  -0.8072   1.0000
          
          
          . reg price weight mpg
          
                Source |       SS           df       MS      Number of obs   =        74
          -------------+----------------------------------   F(2, 71)        =     14.74
                 Model |   186321280         2  93160639.9   Prob > F        =    0.0000
              Residual |   448744116        71  6320339.67   R-squared       =    0.2934
          -------------+----------------------------------   Adj R-squared   =    0.2735
                 Total |   635065396        73  8699525.97   Root MSE        =      2514
          
          ------------------------------------------------------------------------------
                 price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                weight |   1.746559   .6413538     2.72   0.008      .467736    3.025382
                   mpg |  -49.51222   86.15604    -0.57   0.567    -221.3025     122.278
                 _cons |   1946.069    3597.05     0.54   0.590    -5226.245    9118.382
          ------------------------------------------------------------------------------
          
          . estat vce, corr
          
          Correlation matrix of coefficients of regress model
          
                  e(V) |   weight       mpg     _cons
          -------------+------------------------------
                weight |   1.0000                    
                   mpg |   0.8072    1.0000          
                 _cons |  -0.9501   -0.9447    1.0000
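          Note that the 0.8072 correlation between the weight and mpg coefficients is exactly the correlation between the variables themselves with the sign flipped, which is what collinearity does with two regressors. A related diagnostic (a small sketch) is the variance inflation factor:
          Code:
          . quietly regress price weight mpg
          . estat vif     // VIF = 1/(1 - r^2) = 2.87 for both regressors here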
          PS: crossed in cyberspace with Clyde's helpful reply.
          Kind regards,
          Carlo
          (Stata 19.0)



          • #6
            Quoting Clyde Schechter,
            That is, I think one should abandon any attempt to even have intuitions about these matters. There are occasional simple cases where you can readily see what goes on, but those are truly the exceptions.
            I tend to agree with this, but one article, published many years ago, explores this issue in some depth. See Robert A. Gordon, "Issues in Multiple Regression," American Journal of Sociology 73, no. 5 (Mar., 1968): 592-616. From the abstract:
            Four major ways in which these regression coefficients can be seriously misleading are discussed. Although warnings concerning multicollinearity are to be found in statistics texts, they are insufficiently informative to prevent the mistakes described here. This is because the problem is essentially one of substantive interpretation rather than one of mathematical statistics per se.
            Richard T. Campbell
            Emeritus Professor of Biostatistics and Sociology
            University of Illinois at Chicago

