  • #16
    Concerning your syntax in #12:
    • Please use code tags around your syntax (see FAQ 12.3). Read the complete FAQ, including 12.5 (you posted a Word document as an attachment; I did not open it, for obvious reasons).
    • It seems that you did not understand my comments in #10 concerning the display command and the sum() function. Please read the help using
      Code:
      help display
      and
      Code:
      help sum()
      or better:
      Code:
      help sum
      .
    • To understand the use of frames, see
      Code:
      help frames
      (you probably should read the complete PDF manual entry).
    • Indent the commands enclosed in { } by a few spaces so that the structure of your program is easier to see.
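A minimal sketch of the distinction, using the auto data shipped with Stata. It assumes the point in #10 was the difference between the summarize command (abbreviated sum) and the sum() function. The frame name demo, the variable runsum and the local total are illustrative names only:

```stata
sysuse auto, clear

* -summarize- (abbreviation -sum-) is a command; with meanonly it stores r(N), r(sum), r(mean), ...
summarize price, meanonly
display "N = " r(N) ", total = " r(sum) ", mean = " r(mean)
local total = r(sum)

* sum() is a function: inside -generate- it returns a running sum, one value per observation
generate double runsum = sum(price)
display "running sum in the last observation: " runsum[_N]

* a frame is a second dataset in memory; each -frame post- call adds one row to it
capture frame drop demo
frame create demo n total
frame post demo (_N) (`total')
frame demo: list, noobs
```

The running sum in the last observation equals the total stored by summarize, which is why sum() is rarely what you want when you need a single statistic.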

    I am guessing that you want to replicate what crossfold (from SSC) does. Here is your code, substantially revised (and using only 5 folds, not 10). I calculate only the RMSE (biased) and the RMSE (unbiased):
    Code:
    cap frame change default
    cap frame drop results
    
    sysuse auto, clear
    keep price mpg headroom
    set seed 1234
    gen rand = uniform()
    egen split = cut(rand), group(5) // split data set into 5 folds, numbered 0 to 4
    fre split   // -fre- is from SSC (ssc install fre); -tabulate split- needs no installation
    
    frame create results fold n_t n rmse rmse_u
    forvalues i = 0/4 {
       * Fit the model using training set (split != `i')
       qui reg price mpg headroom if split != `i'
       local df_m = e(df_m)
       local n_t = e(N)
       
       * Calculate RMSE of unused group using coefficients of training set (split == `i')
       qui predict res_2 if split == `i', residuals
       qui replace res_2 = res_2^2   // square residuals
       qui sum res_2, meanonly
       local rmse = sqrt(r(mean))
       local rmse_u = sqrt(r(sum)/(r(N) - `df_m' - 1))
     
       * Save fold, n_t (n of training set), n (n of unused group), rmse and rmse_u (unbiased) in frame results:
       frame post results (`i') (`n_t') (r(N)) (`rmse') (`rmse_u')
       
       drop res_2  // drop squared residuals
    }
    
    frame results: list, noob
    Listing frame results shows the RMSE for each fold:
    Code:
    . frame results: list, noob
    
      +---------------------------------------+
      | fold   n_t    n       rmse     rmse_u |
      |---------------------------------------|
      |    0    60   14   3142.947   3545.723 |
      |    1    59   15   2467.412   2758.651 |
      |    2    59   15    3746.17   4188.346 |
      |    3    59   15   1884.271   2106.679 |
      |    4    59   15   1863.891   2083.893 |
      +---------------------------------------+
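The two RMSE columns correspond to the formulas used in the loop. Write $F_k$ for the held-out fold $k$ with $n_k$ observations (column n), $\hat{y}_j^{(-k)}$ for the prediction from the model fitted without fold $k$, and $p$ = e(df_m) (2 here: mpg and headroom):

$$
\mathrm{RMSE}_k = \sqrt{\frac{1}{n_k}\sum_{j \in F_k}\bigl(y_j - \hat{y}_j^{(-k)}\bigr)^2},
\qquad
\mathrm{RMSE}^{u}_k = \sqrt{\frac{1}{n_k - p - 1}\sum_{j \in F_k}\bigl(y_j - \hat{y}_j^{(-k)}\bigr)^2}
= \mathrm{RMSE}_k\sqrt{\frac{n_k}{n_k - p - 1}}
$$

For fold 0, for instance, $3142.947 \times \sqrt{14/11} \approx 3545.72$, which matches rmse_u in the list up to rounding.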
    To see whether the RMSE (biased, i.e. the square root of SS divided by N) computed this way is identical to the RMSE reported by crossfold (from SSC), recompute it for crossfold's last fold:
    Code:
    . * Use -crossfold- (from SSC):
    . set seed 1234
    
    . crossfold reg price mpg headroom
    
                 |      RMSE
    -------------+----------
            est1 |  3221.696
            est2 |  2333.683
            est3 |   3827.52
            est4 |  1795.796
            est5 |  1923.156
    
    .
    . * Calculate RMSE of last fold using method above (check if results are identical):
    . predict res_2 if !e(sample), residuals
    (60 missing values generated)
    
    . replace res_2 = res_2^2
    (14 real changes made)
    
    . qui sum res_2
    
    . di "RMSE of last fold: " sqrt(r(mean))
    RMSE of last fold: 1923.1562
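The same check can be extended to the other columns of the results frame (n_t, n and the unbiased RMSE). This is a minimal sketch that assumes crossfold (from SSC) is installed and, as the log above indicates, leaves its last regression as the active estimation results, so that e(sample) marks that fold's training observations. Because it starts from a fresh copy of the data, the fold assignment need not reproduce the log exactly:

```stata
* requires -crossfold- from SSC (install once with: ssc install crossfold)
sysuse auto, clear
set seed 1234
crossfold reg price mpg headroom

* assumption: the active e() results are crossfold's last fit, as in the log above
local df_m = e(df_m)
local n_t = e(N)
capture drop res_2
quietly predict double res_2 if !e(sample), residuals
quietly replace res_2 = res_2^2
quietly summarize res_2, meanonly
display "n_t = " `n_t' ", n = " r(N)
display "RMSE (biased):   " sqrt(r(mean))
display "RMSE (unbiased): " sqrt(r(sum)/(r(N) - `df_m' - 1))
```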



    • #17
      Dear Dirk,

      Thank you very much for your careful guidance in #16. I have tried to apply your code to my own data set (an example of my data is posted at the end of this message). Because the outcome, the dental caries score at age five (fiveyoddmfs), has a zero-inflated distribution, I fitted a zero-inflated negative binomial model (zinb) with SES10 as the independent variable.

      The model is fitted with: zinb fiveyoddmfs SES10, inflate(SES10) level(97.5)

      Unfortunately, after fitting this zinb model, predict does not allow the residuals option.

      To calculate the residuals in this case, should I first predict the fitted values (yhat) of fiveyoddmfs and then compute the residuals from them?

      Please give your advice on this matter.

      Sorry for taking up your time,

      I really appreciate your help.
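Following the idea in the question above, one way to get out-of-fold residuals after zinb is to predict the expected count for the held-out fold and subtract it from the observed outcome. The sketch below assumes the example data posted in this message (fiveyoddmfs, SES10) are in memory; rand, split, yhat and res_2 are illustrative names. The default predict statistic after zinb is the predicted number of events, and level(97.5) only changes the reported confidence intervals, so it is omitted. Using e(df_m) in the unbiased RMSE is an assumption carried over from the regress version: zinb also estimates inflation and dispersion parameters, so that correction is only approximate here, and zinb may fail to converge in small folds.

```stata
* assumes the example data set (fiveyoddmfs, SES10) is in memory
capture frame change default
capture frame drop results
capture drop rand split

set seed 1234
generate double rand = runiform()
egen split = cut(rand), group(5)          // 5 folds numbered 0 to 4

frame create results fold n_t n rmse rmse_u
forvalues i = 0/4 {
    * fit the model on the training folds (split != `i')
    quietly zinb fiveyoddmfs SES10 if split != `i', inflate(SES10)
    local df_m = e(df_m)
    local n_t = e(N)

    * predicted number of events (default -predict- statistic after -zinb-) for the held-out fold
    quietly predict double yhat if split == `i'

    * squared residual: observed minus predicted count (missing outcomes stay missing)
    quietly generate double res_2 = (fiveyoddmfs - yhat)^2 if split == `i'
    quietly summarize res_2, meanonly

    * post fold, training n, held-out n, biased and "unbiased" RMSE
    frame post results (`i') (`n_t') (r(N)) (sqrt(r(mean))) (sqrt(r(sum)/(r(N) - `df_m' - 1)))

    drop yhat res_2
}
frame results: list, noobs
```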




      [CODE]

      * Using Stata 18

      cap frame change default
      cap frame drop results


      set seed 1234
      gen rand = uniform()
      egen split = cut(rand), group(5) // split data set into 5 folds assigned value from 0 to 4
      tab split // -fre- did not work for me, so I used -tab- instead

      frame create results fold n_t n rmse rmse_u
      forvalues i = 0/4 {
         * Fit the model using training set (split != `i')
         zinb fiveyoddmfs SES10 if split != `i', inflate(SES10) level(97.5)
         local df_m = e(df_m)
         local n_t = e(N)

         * Calculate RMSE of unused group using coefficients of training set (split == `i')
         qui predict res_2_`i' if split == `i', residuals // option residuals is not allowed after fitting zinb
         qui replace res_2_`i' = (res_2_`i')^2 if split == `i' // square residuals
         qui sum res_2_`i' if split == `i', meanonly
         local rmse_`i' = sqrt(r(mean))
         local rmse_u_`i' = sqrt(r(sum)/(r(N) - `df_m' - 1))

         * Save fold, n_t (n of training set), n (n of unused group), rmse and rmse_u (unbiased) in frame results:
         frame post results (`i') (`n_t') (r(N)) (`rmse_`i'') (`rmse_u_`i'')

         drop res_2_`i' // drop squared residuals
      }


      * example data set

      clear
      input byte fiveyoddmfs float SES10
      . -11.331564
      . 8.1011715
      0 2.2212045
      0 7.33273
      0 -3.352976
      . 1.6334463
      . 1.828287
      . -6.747619
      0 6.203488
      . -2.489392
      . .
      . .5171063
      . 1.825728
      . .8692052
      6 -.48989475
      . 1.3699383
      . -2.1161172
      . 8.622786
      . .
      . .
      . 3.388796
      . -8.907704
      1 6.87164
      . .
      . -.56183946
      0 2.469089
      . -2.2184033
      . .
      2 4.276748
      0 .
      0 -4.126646
      . .
      . .
      . .
      . -4.3964844
      . .
      0 5.789186
      0 4.358637
      1 -3.786526
      0 .
      . .
      . -1.530157
      0 .
      . .
      . -9.897887
      0 5.268062
      . 8.049155
      0 -1.3490996
      . 3.330214
      . 4.95108
      . 4.0023966
      0 7.586362
      0 -.20663673
      . -.3952223
      0 3.967337
      0 -4.779833
      . -6.933777
      1 -3.7553964
      0 5.819356
      . -1.063974
      . .
      0 -5.627989
      . 8.175615
      . .
      . .
      . -4.2680497
      . .
      0 -.3013507
      . -5.338384
      0 .15812564
      . 4.143319
      0 -.05721521
      . -2.547206
      . .9384828
      . 5.160995
      . 5.309409
      . 1.2886283
      0 3.14623
      0 1.367786
      . 2.430058
      2 2.1759531
      0 2.853131
      0 -2.786535
      0 6.648635
      . .
      0 2.413101
      0 .6106424
      . .
      0 3.9899344
      11 -.6410409
      0 -2.212806
      . 5.004969
      0 .
      0 6.978215
      . 9.327445
      . .
      3 1.0532051
      . .
      . 1.762062
      0 -4.880984
      end
      Last edited by An Dao; 18 Feb 2024, 22:30.



      • #18
        I would also like to ask about the second (later) part of the code you suggested in #16, which applies crossfold (reproduced below).

        Do you mean that running "crossfold reg price mpg headroom" displays the RMSE for each of the 5 folds, with 1923.156 for the last fold, and that we should then check whether this figure equals the RMSE of the last fold from the loop in the first part of #16? Or should we compare the RMSE reported directly by crossfold with the one computed by the syntax that follows it?

        I am a bit confused about why the RMSE of the last fold from crossfold is 1923.156, whereas the last fold from the loop (as listed by "frame results: list, noob" in #16) is 1863.891.

        I'm sorry for these silly questions; I'm a beginner in machine learning, but I'm keen to learn.

        Appreciated.

        * The second part of the code in #16
        crossfold reg price mpg headroom

        . * Calculate RMSE of last fold using method above (check if results are identical):
        . predict res_2 if !e(sample), residuals
        (60 missing values generated)

        . replace res_2 = res_2^2
        (14 real changes made)

        . qui sum res_2

        . di "RMSE of last fold: " sqrt(r(mean))
        RMSE of last fold: 1923.1562
        [/CODE][/QUOTE]
        Last edited by An Dao; 18 Feb 2024, 23:12.
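The check in #16 compares crossfold's est5 (1923.156) with the RMSE recomputed right after crossfold (1923.1562), and these agree. The loop's fold 4 (1863.891) is a different held-out set: according to the logs, crossfold's last fold used 60 training and 14 held-out observations, whereas fold 4 of the loop has 59 and 15, so the two partitions cannot coincide. The sketch below cross-tabulates the loop's folds against crossfold's last training/held-out split. It assumes crossfold (from SSC) is installed, keeps the existing variables in memory, and leaves its last regression as the active results; cf_train is a hypothetical variable name.

```stata
* requires -crossfold- from SSC (install once with: ssc install crossfold)
sysuse auto, clear

* folds as created by the loop in #16
set seed 1234
generate rand = uniform()
egen split = cut(rand), group(5)

* crossfold assigns its own folds internally
set seed 1234
crossfold reg price mpg headroom

* e(sample) refers to crossfold's last fit: 1 = training, 0 = held out
generate byte cf_train = e(sample)
tabulate split cf_train
```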
