  • help with simulation - plotting residuals to identify polynomials to be included in the model

    Hi there,
    I would like to code up a simulation to demonstrate the role of residuals in deciding how many polynomial terms to include. The idea is that, with the right number of terms included, there shouldn't be any obvious relationship between the residuals and the X variable. I have started with the following, but including a quadratic term already seems to be enough, which is odd since the relationship between Y and X is supposed to be cubic. I would appreciate any advice on how to proceed.

    clear all
    set obs 5000
    set seed 1000

    g X = runiform()
    g Y = 3*(X^3) + rnormal()
    twoway scatter Y X

    reg Y X
    predict e, residuals
    twoway scatter e X

    gen X2 = X^2
    gen X3 = X^3

    reg Y X X2
    predict e_b, residuals
    twoway scatter e_b X

    reg Y X X2 X3
    predict e_c, residuals
    twoway scatter e_c X

    Many thanks
    Karen

  • #2
    Code:
    clear all
    set obs 5000
    set seed 1000
    
    g X = rnormal()
    g Y = 1 + 2*X + .5*X^2 - 0.2*X^3 + rnormal()
    twoway scatter Y X
    
    reg Y X
    predict e, residuals
    twoway scatter e X
    
    gen X2 = X^2
    gen X3 = X^3
    
    reg Y X X2
    predict e_b, residuals
    twoway scatter e_b X
    
    reg Y X X2 X3
    predict e_c, residuals
    twoway scatter e_c X
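
    A minimal sketch of why the quadratic looked sufficient in the original setup, assuming the same X = runiform() design as in the first post (Ycubic and miss are hypothetical names introduced here). On [0, 1] the curve 3*X^3 is almost quadratic: the least-squares quadratic for x^3 on that interval is 1.5x^2 - 0.6x + 0.05, so the part of 3*X^3 that a quadratic cannot absorb stays within about 0.15 in absolute value, which is hard to see against noise with standard deviation 1. Drawing X from rnormal(), as in the code above, spreads X over a wider range where the cubic term matters.

    Code:
    clear all
    set obs 5000
    set seed 1000

    g X  = runiform()
    g X2 = X^2
    g Ycubic = 3*(X^3)            // noise-free cubic signal from the first post

    reg Ycubic X X2
    predict miss, residuals       // what a quadratic fit cannot capture
    summarize miss                // compare its spread with the noise SD of 1
    twoway line miss X, sort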


    • #3
      Another option is to look not at the residual scatter plot (too many points) but at smoothed lines through the residuals.

      Code:
      twoway lpoly e_b X || lpoly e_c X


      • #4
        rvpplot X

        replaces two lines of code (the predict and twoway scatter lines after each regression).
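
        A sketch of how this could look after each fit, assuming X, X2 and X3 exist as generated in the code above. The yline(0) option is an addition here that draws a zero reference line.

        Code:
        reg Y X X2
        rvpplot X, yline(0)       // residuals from the quadratic fit against X

        reg Y X X2 X3
        rvpplot X, yline(0)       // residuals from the cubic fit against X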


        • #5
          Thank you so much George, this is really helpful.


          • #6
            throw in -ovtest- as a kicker
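
            A sketch of that check, assuming the variables from the code above. estat ovtest runs the Ramsey RESET test after regress. A small p-value suggests the specification is missing something, so one would hope to see it reject for the quadratic fit and not for the cubic one.

            Code:
            reg Y X X2
            estat ovtest              // RESET test on the quadratic specification

            reg Y X X2 X3
            estat ovtest              // RESET test on the cubic specification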
