  • Saving betas and using to predict future values

    Statalist:

    I am trying to do the following.

    1. Take values of health, education, and labor variables for a sample of men at age 20, regress health_index at age 30 on them, and save the betas.
    2. Use these betas in a future equation to predict health at age 40.

    My code is below along with a sample of my data. I think that I am saving the betas correctly using "predict, xb" but I am not entirely sure. Also, I don't know how to retrieve these betas and use them later to predict health at age 40.

    Super thanks for any help on this.


    Code:
    *set the locals first so that `k' is defined before the loop below uses it
    local k = 4
    local distb1 "distp_cutoff`k'"
    local d = 0.85

    *step one: creating a variable for health and controls values at age 20
    foreach v of varlist health_index lincomeb`k'_1 lincomeb`k'_2 fsize`k'_1 fsize`k'_2  fsize_incomeb`k' father_figure`k' male black bweight hgc02 hgcgrandmoth never_married married02 afqtmom lincomeb02 fsize02 southmom14 liveparents14 siblmom rural14 overweight accident illness health_work everproblem  welfare unemployment foodstamps income posted employed financialstrain married_r economic_index college trade {
        gen `v'2=`v' if age==20
        bys idc: egen `v'20=max(`v'2)
        bys idc: carryforward `v'20, replace
    }

    *step two: regress health_index at age 30+ with values at age 20, predict and save xbs
    preserve
    keep if age>30 & all_male==1
    reg health_index lincomeb`k'_120 lincomeb`k'_220 fsize`k'_120 fsize`k'_220  fsize_incomeb`k'20 father_figure`k'20 black20 bweight20 hgc0220 hgcgrandmoth20 never_married20 married0220 afqtmom20 lincomeb0220 fsize0220 southmom1420 liveparents1420 siblmom20 rural1420 overweight20 accident20 illness20 health_work20 everproblem20  welfare20 unemployment20 foodstamps20 income20 posted20 employed20 financialstrain20 married_r20 economic_index20 college20 trade20
    predict predicthealth_30plus, xb

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(age health_index lincomeb4_120 married_r20)
     .         .         . .
    33  .1513923         . .
    29 .05537483         . .
     .         .         . .
     .         .         . .
    35 -.4958639         . .
     .         .         . .
     .         .         . .
     .         .         . .
    31 -.4118481         . .
    17  .3456535         . .
    13 -.4958639         . .
    21  .3456535         . .
    27  -.415097         . .
    29 -.4958639         . .
     .         .         . .
     .         .         . .
     .         .         . .
     .         .         . .
    13 -.4958639         . .
    33 -.4958639         . .
    31 -.4958639         . .
     .         .         . .
    17  .7807171         . .
    32         .         . .
     .         .         . .
     .         .         . .
    30         .         . .
    34         .         . .
     .         .         . .
    16  .6267617         . .
     .         .         . .
    36         .         . .
     .         .         . .
     .         .         . .
    27  .7150816         . .
     .         .         . .
    13 -.4958639         . .
     .         .         . .
    17 -.3903777         . .
    21  .7807171         . .
     .         .         . .
    25 -.4008482         . .
    23 .05537483         . .
    18 -.3760564  10.46894 0
    20 -.4008482  10.46894 0
    16  .3456535  10.46894 0
    12 -.4958639  10.46894 0
     .         .  10.46894 0
    14 -.4958639  10.46894 0
     .         .         . .
     .         .         . .
    13         .         . .
     .         .         . .
    17  .2604401         . .
     .         .         . .
     .         .         . .
    33 1.1192399         . .
     .         .         . .
    39 2.0166187         . .
     .         .         . .
    37  1.572919         . .
    35 2.0166187         . .
    21  .2604401         . .
     .         .  9.122956 0
     .         .  9.122956 0
    30 .08828463  9.122956 0
     .         .  9.122956 0
    36 2.0166187  9.122956 0
    34  .2184644  9.122956 0
    32 2.0166187  9.122956 0
     .         .  9.122956 0
    20  .2604401  9.122956 0
    16  .2604401  9.122956 0
     .         .  9.122956 0
    12         .  9.122956 0
     .         .  9.122956 0
     .         . 10.129276 0
    16  .9948865 10.129276 0
    12 2.0166187 10.129276 0
     .         . 10.129276 0
    24  .8136269 10.129276 0
     .         . 10.129276 0
    20 .45116645 10.129276 0
    30 1.2172137 10.129276 0
    28  .1012839 10.129276 0
    26  .8136269 10.129276 0
     .         . 10.129276 0
    21  -.388793         . .
     .         .         . .
     .         .         . .
    17  -.388793         . .
     .         .         . .
    13 -.4958639         . .
    23  -.415097         . .
    29  .2837221         . .
    27 -.4008482         . .
    25  -.415097         . .
    21  .4837536         . .
     .         .         . .
    end

  • #2
    Here's a simple example that shows how to use the unstandardized coefficients (bs) to create predicted values that are identical to the values created by the command predict. (Some sources use "betas" to refer to standardized regression coefficients.)

    For your purposes, you would want to use different variables (measured at age 40) in line 5 (the generate command), and you would almost certainly obtain different results than if you used the same variables (measured at age 30) that were used in the regression. That is, line 6 would yield a correlation less than 1.000 and line 7 would yield an error message indicating that the two variables are not identical. Other than for observing these differences, you would not use predict at all.

    Code:
    sysuse nlsw88, clear
    regress wage union grade age
    display _b[union] " " _b[grade] " " _b[age]
    predict yhat
    generate yhat2 = _b[_cons] + _b[union]*union + _b[grade]*grade + _b[age]*age
    correlate yhat yhat2
    assert yhat == yhat2
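
    As a minimal sketch of the same idea with different variables, the example below simulates its own data, so every variable name in it (educ20, educ30, health30, health40_hat) is hypothetical and not taken from the original poster's dataset. It fits the model on a predictor measured at age 20 and then plugs the age-30 measurement into the same coefficients:

    Code:
    clear
    set seed 12345
    set obs 200
    * hypothetical predictor measured at age 20 and again at age 30
    generate educ20 = rnormal()
    generate educ30 = educ20 + rnormal(0, 0.5)
    * hypothetical outcome measured at age 30
    generate health30 = 0.5*educ20 + rnormal()
    * fit the model: age-30 outcome on the age-20 predictor
    regress health30 educ20
    * apply the same coefficients to the age-30 predictor
    generate health40_hat = _b[_cons] + _b[educ20]*educ30
    summarize health40_hat
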
    David Radwin
    Senior Researcher, California Competes
    californiacompetes.org
    Pronouns: He/Him



    • #3
      You are probably overcomplicating things. The command -predict- can make out-of-sample predictions. Here is an example:

      Code:
      . sysuse auto, clear
      (1978 automobile data)
      
      . reg price mpg headroom if rep<4
      
            Source |       SS           df       MS      Number of obs   =        40
      -------------+----------------------------------   F(2, 37)        =     10.94
             Model |   170071376         2  85035688.2   Prob > F        =    0.0002
          Residual |   287537806        37  7771292.06   R-squared       =    0.3717
      -------------+----------------------------------   Adj R-squared   =    0.3377
             Total |   457609183        39  11733568.8   Root MSE        =    2787.7
      
      ------------------------------------------------------------------------------
             price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
      -------------+----------------------------------------------------------------
               mpg |   -559.563   121.8424    -4.59   0.000    -806.4392   -312.6868
          headroom |  -528.3248   532.1811    -0.99   0.327    -1606.626    549.9765
             _cons |    18784.8   3423.387     5.49   0.000     11848.35    25721.24
      ------------------------------------------------------------------------------
      
      . predict pricehat if rep>=4
      (option xb assumed; fitted values)
      (40 missing values generated)
      
      . summ pricehat
      
          Variable |        Obs        Mean    Std. dev.       Min        Max
      -------------+---------------------------------------------------------
          pricehat |         34    4159.781    3661.249  -5742.264   9365.938
      
      .
      So I estimated the regression over the sample where rep<4, and then predicted out of sample where rep>=4.



      • #4
        OK, predict can use different values than were used in the regression, but can it use different variables than those in the regression? The original poster asked about making predictions using data collected 10-20 years after the variables used to fit the regression model.
        David Radwin



        • #5
          Originally posted by David Radwin:
          OK, predict can use different values than were used in the regression, but can it use different variables than those in the regression? The original poster asked about making predictions using data collected 10-20 years after the variables used to fit the regression model.
          If you are using different variables, this is no longer the same regression model, so there is no sense in which these are "predictions from the regression model." Of course, if you want to construct predictions from a different model, you need to do it manually, as in the lines you showed above.

          And my comment was not addressed to you. I do not see any harm in showing manually what predict does, as you did above.

          My comment was to OP, whose question and code that he showed are both overcomplicated and messy. My guess was that OP does not know that -predict- is already designed to make out of sample predictions.
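
          If the coefficients are needed after other estimation commands have been run, one possible approach is to store the results first. The sketch below reuses the auto dataset from the earlier example; the names m_low, b_cons, b_mpg, b_head, pricehat_s, and pricehat_p are made up for illustration:

          Code:
          sysuse auto, clear
          regress price mpg headroom if rep78<4
          * keep the whole set of results in memory under a name
          estimates store m_low
          * or copy individual coefficients into scalars
          scalar b_cons = _b[_cons]
          scalar b_mpg = _b[mpg]
          scalar b_head = _b[headroom]
          * a later regression overwrites _b[], but not the stored copies
          regress weight length
          generate pricehat_s = b_cons + b_mpg*mpg + b_head*headroom
          * bring the first model back so _b[] and -predict- refer to it again
          estimates restore m_low
          predict pricehat_p, xb

          Scalars are convenient for a handful of coefficients; storing the full set of estimates avoids retyping every term when the model has many regressors.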



          • #6
            Thank you both for your help on this. You are both right. I didn't quite understand -predict-, but what I am trying to do is also along the lines of David's example. Thanks for the assistance on this.
