Concerning your syntax in #12:
The list command (frame results: list) at the end of the first code block below produces the output in the second block, which shows the RMSE for each fold.
The third block checks whether the RMSE (biased, i.e. the SS divided by N) from this method is identical to the RMSE calculated by crossfold (from SSC) for the last fold.
- Please use code tags around your syntax (see FAQ 12.3). Read the complete FAQ, also 12.5 (you did post a Word document as an attachment -- I did not open it for obvious reasons).
- It seems that you did not understand my comments in #10 concerning the display command and the sum() function. Please read the help using
Code:
help display
Code:
help sum()
Code:
help sum
- To understand the use of frames, see
Code:
help frames
- You should indent commands enclosed by { } by some spaces to better see the structure of your program.
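As a minimal sketch of the distinction behind the help files mentioned above (the variable name running is only an illustration, and the auto data set is used as an assumption for the example): sum() is a function that returns a running sum inside generate or replace, whereas sum is an abbreviation of the summarize command, which leaves its results in r().

```stata
sysuse auto, clear

* sum() is a function: it builds a running (cumulative) sum, row by row
gen double running = sum(price)

* sum (summarize) is a command: it stores results such as r(sum) and r(mean)
summarize price, meanonly

* display evaluates expressions, so stored results can be shown directly
display "Total via summarize:   " r(sum)
display "Last running sum():    " running[_N]
display "Mean via summarize:    " r(mean)
```

Both totals should agree, because the last value of a running sum over all observations is the overall sum.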
I am guessing that you want to replicate what crossfold (from SSC) does. Here is your code, substantially revised (and using only 5 folds, not 10). I am only calculating the RMSE (biased) and the RMSE (unbiased):
Code:
cap frame change default
cap frame drop results
sysuse auto, clear
keep price mpg headroom
set seed 1234
gen rand = uniform()
egen split = cut(rand), group(5)  // split data set into 5 folds assigned value from 0 to 4
fre split
frame create results fold n_t n rmse rmse_u
forvalues i = 0/4 {
    * Fit the model using training set (split != `i')
    qui reg price mpg headroom if split != `i'
    local df_m = e(df_m)
    local n_t = e(N)
    * Calculate RMSE of unused group using coefficients of training set (split == `i')
    qui predict res_2 if split == `i', residuals
    qui replace res_2 = res_2^2  // square residuals
    qui sum res_2, meanonly
    local rmse = sqrt(r(mean))
    local rmse_u = sqrt(r(sum)/(r(N) - `df_m' - 1))
    * Save fold, n_t (n of training set), n (n of unused group), rmse and rmse_u (unbiased) in frame results:
    frame post results (`i') (`n_t') (r(N)) (`rmse') (`rmse_u')
    drop res_2  // drop squared residuals
}
frame results: list, noob
Code:
. frame results: list, noob

  +---------------------------------------+
  | fold   n_t    n       rmse     rmse_u |
  |---------------------------------------|
  |    0    60   14   3142.947   3545.723 |
  |    1    59   15   2467.412   2758.651 |
  |    2    59   15    3746.17   4188.346 |
  |    3    59   15   1884.271   2106.679 |
  |    4    59   15   1863.891   2083.893 |
  +---------------------------------------+
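The two columns differ only in the divisor: rmse divides the sum of squared residuals by n, while rmse_u divides it by n - df_m - 1, so rmse_u = rmse * sqrt(n / (n - df_m - 1)). A small sketch that recomputes rmse_u for fold 0 from the listed values, assuming df_m = 2 because the model has two predictors (mpg and headroom):

```stata
* Values taken from the fold 0 row of the listing above
local n    = 14        // observations in the held-out fold
local df_m = 2         // assumption: two predictors, mpg and headroom
local rmse = 3142.947  // biased RMSE (SS divided by n)

* Rescale the biased RMSE to the unbiased one
display "rmse_u recomputed: " `rmse' * sqrt(`n' / (`n' - `df_m' - 1))
```

The result should match the rmse_u column for fold 0 up to rounding of the displayed rmse.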
Code:
. * Use -crossfold- (from SSC):
. set seed 1234

. crossfold reg price mpg headroom

             |      RMSE
-------------+----------
        est1 |  3221.696
        est2 |  2333.683
        est3 |   3827.52
        est4 |  1795.796
        est5 |  1923.156

.
. * Calculate RMSE of last fold using method above (check if results are identical):
. predict res_2 if !e(sample), residuals
(60 missing values generated)

. replace res_2 = res_2^2
(14 real changes made)

. qui sum res_2

. di "RMSE of last fold: " sqrt(r(mean))
RMSE of last fold: 1923.1562