Hi there,
I would like to code up a simulation to demonstrate the role of residuals in deciding how many polynomial terms to include. The idea is that, with the right number of terms, there should be no obvious relationship between the residuals and the X variable. I have started with the code below, but a quadratic term already seems to be enough, which is odd since the relationship between Y and X is cubic by construction. I would appreciate any advice on how to proceed.
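clear all
set obs 5000
set seed 1000
* Simulate X on [0, 1] and a cubic relationship with standard normal noise
generate X = runiform()
generate Y = 3*(X^3) + rnormal()
twoway scatter Y X
* Linear fit and its residuals
regress Y X
predict e, residuals
twoway scatter e X
generate X2 = X^2
generate X3 = X^3
* Quadratic fit and its residuals
regress Y X X2
predict e_b, residuals
twoway scatter e_b X
* Cubic fit and its residuals
regress Y X X2 X3
predict e_c, residuals
twoway scatter e_c X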
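One thing I have not ruled out is that the design itself hides the cubic term. On [0, 1], the best least-squares quadratic approximation to 3*X^3 is off by at most about 0.15, which is small next to noise with a standard deviation of 1. A rough sketch of a variant follows; the wider range for X, the lowess overlay, and the names e_q and e_c2 are my own assumptions, not part of the code above:

```stata
clear all
set obs 5000
set seed 1000
* Assumption: X spans [-2, 2] so the cubic shape is large relative to the noise
* (the two-argument runiform(a, b) needs Stata 14 or later)
generate X = runiform(-2, 2)
generate Y = 3*(X^3) + rnormal()

* Quadratic fit: the lowess curve through the residuals should show a pattern
regress Y c.X##c.X
predict e_q, residuals
twoway (scatter e_q X, msize(tiny)) (lowess e_q X), yline(0)

* Cubic fit: the lowess curve should now stay close to zero
regress Y c.X##c.X##c.X
predict e_c2, residuals
twoway (scatter e_c2 X, msize(tiny)) (lowess e_c2 X), yline(0)
```

Shrinking the noise instead, for example rnormal(0, 0.05), while keeping X on [0, 1] would be another way to test the same idea.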
Many thanks
Karen