Dear forum members,
I am working with an OLS model to make predictions about consumption.
The model has three explanatory variables. Its results seem sound, but two of the explanatory variables (A and B) are correlated (a multicollinearity problem), and, as far as I know, this kind of situation should be avoided.
Removing one of these variables (say B) doesn't change the R^2 very much (it goes down slightly). However, the previously well-behaved model starts to show signs of heteroskedasticity, and its error terms grow. It seems that B keeps these errors from misbehaving.
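To make the symptom concrete, here is a small simulated sketch of the kind of check I have been doing. All the data here are made up for illustration (A, B, C and the variance structure are my assumptions, not my real consumption data); the `bp_lm` helper is a hand-rolled Breusch-Pagan-style LM statistic, not from any library:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300

# Hypothetical data: A and B are correlated, and B's "extra" part
# scales with A, so dropping B pushes A-dependent noise into the errors.
A = rng.normal(10, 2, n)
B = 0.8 * A + rng.normal(0, 1, n) * A / 10
C = rng.normal(5, 1, n)
y = 2.0 + 1.2 * A + 0.5 * B + 0.8 * C + rng.normal(0, 1, n)

def bp_lm(y, X):
    """Breusch-Pagan-style LM statistic: regress squared OLS
    residuals on X and return n * R^2 (large => heteroskedastic)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e2 = (y - X @ beta) ** 2
    g, *_ = np.linalg.lstsq(X, e2, rcond=None)
    r2 = 1 - np.sum((e2 - X @ g) ** 2) / np.sum((e2 - e2.mean()) ** 2)
    return len(y) * r2

X_full = np.column_stack([np.ones(n), A, B, C])  # with B
X_drop = np.column_stack([np.ones(n), A, C])     # B removed
print(bp_lm(y, X_full), bp_lm(y, X_drop))
```

In this simulated setup the model that drops B tends to show a larger LM statistic, which is the pattern I see in my real data.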
My questions are:
1. How much should I care about a multicollinearity problem in a prediction model?
Some people quote Kutner et al. (Applied Linear Statistical Models) to argue that these problems may not be very serious when we are dealing with prediction models: "The fact that some or all predictor variables are correlated among themselves does not, in general, inhibit our ability to obtain a good fit nor does it tend to affect inferences about mean responses or predictions of new observations." What are your opinions about this statement?
2. Is there a way I can use B to prevent this heteroskedasticity problem (as some kind of weight, say) while at the same time avoiding its use as an explanatory variable?
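To clarify what I mean by "as some kind of weight": something like weighted least squares, where B enters only through the weights, not as a regressor. A minimal sketch with simulated data (everything here is hypothetical, and I am assuming the error variance grows roughly with B, which I have not verified in my real data):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical data: B is correlated with A, and the error
# variance is assumed proportional to (shifted) B.
A = rng.normal(10, 2, n)
B = 0.8 * A + rng.normal(0, 1, n)
s = np.sqrt(B - B.min() + 1)               # assumed error std. dev.
y = 3.0 + 1.5 * A + rng.normal(0, 1, n) * s

# OLS with A only: B is dropped as a regressor.
X = np.column_stack([np.ones(n), A])
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# WLS via transformed OLS: divide each row by the assumed error
# std. dev., so B shapes the weights but is not an explanatory variable.
w = 1.0 / s
beta_wls, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)

print(beta_ols, beta_wls)
```

Is this kind of approach legitimate, or does using B in the weights just smuggle it back into the model?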
Thank you very much for your comments.