  • #16
    Instead of taking age^2, I use the logarithm of all my variables.
    I think it has nearly the same effect, doesn't it?

    • #17
      Lisa:
      in a log-log model, what changes is the interpretation of the relationship between the coefficients and the dependent variable: the coefficients become elasticities.
      Kind regards,
      Carlo
      (StataNow 18.5)
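
      A minimal sketch of what such a log-log model looks like in Stata (the variable names spending and income are hypothetical):
      Code:
      generate ln_spending = ln(spending)
      generate ln_income   = ln(income)
      regress ln_spending ln_income
      * the coefficient on ln_income is an elasticity: the percent change
      * in spending associated with a 1% change in income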

      • #18
        Originally posted by lisa bäcker
        thank you Roman,
        but I don't understand how to deal with your suggestion:
        the variable age is of course the age of the customer,
        but age1 is the centered variable (age - mean_age), so I need to define a new variable (age1) to tell Stata to use the centered value, right?

        By centering "age" and creating a new variable, you are doing nothing but copying "age" on a different scale and giving it a different name, "age1", which Stata treats as a separate variable. It is like having x in the model twice as two different entities. This is what causes your multicollinearity, possibly with a VIF > 8. Why don't you just put age in the model together with age × income? Centered (scaled) age is needed only if you want a meaningful intercept term and zero does not occur among your observations. If you do want that, you can use centered age, but in that case the model will be: intercept + income + age1 (centered) + c.age1#c.income



        Roman
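
        In Stata syntax, the model Roman sketches would look something like this (depvar is a hypothetical outcome name):
        Code:
        summarize age, meanonly
        generate age1 = age - r(mean)      // centered age
        regress depvar c.age1##c.income    // expands to income + age1 + c.age1#c.income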

        • #19
          OK, thanks, Roman.
          But can I use interaction terms in a log-log model?
          Thanks.
          Kind regards,
          lisa

          • #20
            Lisa:
            yes, you can.
            Kind regards,
            Carlo
            (StataNow 18.5)
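
            For instance, a log-log model with an interaction might look like this (again with hypothetical variable names):
            Code:
            generate ln_y   = ln(y)
            generate ln_age = ln(age)
            generate ln_inc = ln(income)
            regress ln_y c.ln_age##c.ln_inc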

            • #21
              Thanks for this information.

              • #22
                I hope this is really the last question on this topic; I am sorry for all my questions.

                If I have two variables which are quite closely related, for example age and the number of years of the customer-bank relationship:
                How should I handle the combination of these variables? If I include just one of them, its coefficient is positive; if I include both, age stays positive and the other one turns negative. If I include an interaction term (length*age), the two variables have a positive sign again. Is an interaction term reasonable?

                The VIF is low and the correlation between the two is not too high either. But why does the sign change when I include both variables? Is this also a sign of multicollinearity, or is it reasonable?

                I would be happy to get a response.
                Thanks,
                Lisa
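
                For reference, the checks described above could be run like this (depvar is a hypothetical outcome, and length stands in for the years of the customer-bank relationship):
                Code:
                correlate age length             // pairwise correlation
                regress depvar age length        // both main effects
                estat vif                        // variance inflation factors
                regress depvar c.age##c.length   // adding the interaction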

                • #23
                  Lisa:
                  an interaction might be interesting, assuming it is meaningful in your research field. More broadly: which regression model is suggested by previous research on the same topic?
                  Kind regards,
                  Carlo
                  (StataNow 18.5)

                  • #24
                    There is not much research in this field, so it is really difficult for someone without experience in statistics to carry out such a study.
                    Therefore I am really thankful for your help.

                    • #25
                      Lisa:
                      focusing on the handful of published articles in your research field, which regression models are suggested most often?
                      Besides, it would be interesting to see what you typed and what Stata gave you back (as recommended by the FAQ).
                      Kind regards,
                      Carlo
                      (StataNow 18.5)

                      • #26
                        Originally posted by Carlo Lazzaro
                        Lisa:
                        Paul Allison (https://uk.sagepub.com/en-gb/eur/mul...ssion/book8989) suggests, on page 141, a correlation of 0.6 as a "let's start worrying about multicollinearity" threshold.

                        I took a look at that page. He actually talks about an R^2 above 0.6 when regressing one independent variable on all the others, corresponding to a tolerance of 0.4 and a VIF of 2.5. With only two predictors, an R^2 of 0.6 would correspond to a correlation of ±0.775. I am also looking for a threshold for the correlation between predictors, which would seem to me more informative when one wants to analyze a specific pair of predictors known to be highly associated.
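
                        For the record, the arithmetic behind these numbers:
                        Code:
                        display 1 - 0.6       // tolerance = 0.4
                        display 1/(1 - 0.6)   // VIF = 2.5
                        display sqrt(0.6)     // pairwise correlation ≈ 0.775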
