Thank you Roman,
but I don't understand how to apply your suggestion:
the variable age is, of course, the age of the customer,
but age1 is the centered variable (age - mean_age), so do I need to define a new variable (age1) to tell Stata to use the centered value, or not?
By centering "age" and creating a new variable you are doing nothing but copying "age" on a different scale and giving it a different name, "age1", which Stata treats as a separate variable. It is like having 'x' in the model twice as two different entities. This is what is causing your multicollinearity, possibly with a VIF above 8. Why don't you just put 'age' in the model together with 'age x income'? A scaled (centered) age is needed only if you want a meaningful intercept term and 'zero' does not occur in your observations. However, if you do want that, you can use the centered age, but in that case the model will be: intercept + income + age1 (centered) + c.age1#c.income
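As a minimal sketch of the two specifications described above (the outcome name y is a placeholder, and age/income stand for your actual variables):

* Option 1: raw age plus the interaction, using factor-variable notation
regress y c.age c.income c.age#c.income

* Option 2: center age first, then use the centered copy throughout
summarize age, meanonly
generate age1 = age - r(mean)    // age1 = age - mean(age)
regress y c.age1 c.income c.age1#c.income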
I hope this is really the last question about this topic; I am sorry for all my questions.
If I have two variables that are quite close, for example age and the number of years of the customer-bank relationship:
How do I need to handle the combination of these variables? If I include just one of them, it is positive; if I include both, age is positive and the other one is negative. If I include an interaction term (length*age), the two variables have a positive sign again. Is an interaction term reasonable?
The VIF is low and the correlation between the two is not too high either. But why does the sign change when I include both variables? Is this also a sign of multicollinearity, or is it reasonable?
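A hedged sketch of the checks being discussed, with y, age, and length as placeholder names for your dependent variable, customer age, and years of the customer-bank relationship:

pwcorr age length, sig            // pairwise correlation between the two predictors

regress y age length              // both main effects
estat vif                         // variance inflation factors after -regress-

regress y c.age##c.length         // main effects plus the age x length interaction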
Lisa:
an interaction might be interesting, assuming it is meaningful in your research field. More broadly: which regression model is suggested by previous research on the same topic?
There is not much research done in this field, so it is really difficult for someone without experience in statistics to carry out such research.
Therefore I am really thankful for your help.
Lisa:
focusing on the handful of published articles in your research field, which regression models are suggested most often?
Besides, it would be interesting to see what you typed and what Stata gave you back (as recommended by the FAQ).
I took a look at that page. He actually talks about an R^2 above 0.6 from regressing one independent variable on all the others, corresponding to a tolerance of 0.4 and a VIF of 2.5. With only two predictors, an R^2 of 0.6 would correspond to a correlation of (+/-) 0.775. I am also looking for a threshold for the correlation between predictors, which would seem more informative to me when one wants to analyze a specific pair of predictors that are known to be highly associated.
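With only two predictors the auxiliary R^2 is just the squared correlation between them, so VIF = 1/(1 - r^2). A quick check of the numbers quoted above, in Stata:

display "r^2 for r = 0.775 : " 0.775^2         // about 0.60
display "tolerance         : " 1 - 0.775^2     // about 0.40
display "VIF               : " 1/(1 - 0.775^2) // about 2.5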