Hello Statalist community,
I am working with a panel dataset of U.S. banks and estimating the following model for several profitability measures:
$$\text{profit}^{j}_{i,t} = \beta_t\,\text{profit}^{j}_{i,t-1} + \gamma_t \log(\text{assets}_{i,t-1}) + \epsilon^{j}_{i,t}$$
The model is estimated separately for each bank over a rolling window from t-5 to t, and for each profitability variable j. The profitability measures include:
- Net interest income
- Net operating income (net interest income + net other operating income)
- Profit before tax
- Net profit
- ROAA
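To make the target explicit, here is the quantity I believe I am after for bank i in the window ending at year t. This is a sketch under assumptions of mine: it uses the usual OLS degrees-of-freedom correction with k = 3 estimated parameters (dividing by n instead of n - k is the other common convention), it includes the constant that rangestat (reg) adds (b_cons in the code), and n_{i,t} denotes the number of usable observations in the window.

```latex
\[
\mathrm{RMSE}^{j}_{i,t}
  = \sqrt{\frac{1}{n_{i,t}-k}\sum_{s=t-5}^{t}\bigl(\hat{\epsilon}^{j}_{i,s\mid t}\bigr)^{2}},
\qquad
\hat{\epsilon}^{j}_{i,s\mid t}
  = \text{profit}^{j}_{i,s}
    - \hat{\alpha}_{t}
    - \hat{\beta}_{t}\,\text{profit}^{j}_{i,s-1}
    - \hat{\gamma}_{t}\log(\text{assets}_{i,s-1})
\]
```

Every residual in the sum uses the coefficients estimated on the window ending at t, so each window has its own set of six residuals.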
Code:
foreach var in NetInterestIncome NetOperatingIncome PretaxProfit NetProfit ROAA_w {
    * Run rolling regression for each bank
    rangestat (reg) `var' L_`var' L_Assets, interval(year -5 0) by(FitchID)

    * Compute residuals
    gen residuals_`var' = `var' - b_cons - b_L_`var'*L_`var' - b_L_Assets*L_Assets

    * Compute squared residuals
    gen sq_res_`var' = residuals_`var'^2

    * Compute RMSE for each bank
    bysort FitchID (year): egen mean_res_`var' = mean(sq_res_`var')
    gen RMSE_`var' = sqrt(mean_res_`var')

    * Drop intermediate variables to avoid conflicts in the next iteration
    drop mean_res_`var' sq_res_`var' residuals_`var'
    drop reg_nobs reg_r2 reg_adj_r2 b_L_`var' b_L_Assets b_cons se_L_`var' se_L_Assets se_cons
}
Question:
Is this the correct way to compute the RMSE for a rolling regression window of t-5 to t using rangestat? Are there any improvements or alternative approaches that would ensure the RMSE is estimated correctly?
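For comparison, here is a rough sketch of one alternative I can think of. It backs the window RMSE out of the R-squared and observation count that rangestat (reg) already returns, using RSS = (1 - R2) * TSS and TSS = (n - 1) * sd(y)^2. It assumes rangestat is installed from SSC and runs on simulated data; y, x1 and x2 are hypothetical stand-ins for `var', L_`var' and L_Assets, and the divisor n - 3 is my assumption about the intended degrees of freedom.

```stata
* Sketch only: simulated two-bank panel, 1994-2013
* Requires rangestat (ssc install rangestat)
clear
set seed 12345
set obs 40
gen long FitchID = ceil(_n/20)
bysort FitchID: gen int year = 1993 + _n
gen double x1 = rnormal()
gen double x2 = rnormal()
gen double y  = 1 + 0.5*x1 - 0.3*x2 + rnormal()

* Rolling OLS over years t-5..t within each bank
rangestat (reg) y x1 x2, interval(year -5 0) by(FitchID)

* SD of y over the same windows (creates y_sd); there are no
* missing values here, so both calls use identical samples
rangestat (sd) y, interval(year -5 0) by(FitchID)

* RMSE = sqrt(RSS / (n - k)) with k = 3 estimated parameters
gen double RMSE_y = sqrt((1 - reg_r2) * (reg_nobs - 1) * y_sd^2 / (reg_nobs - 3)) ///
    if reg_nobs > 3 & !missing(reg_nobs)

* Spot check against -regress- for bank 1, window ending in 2003
quietly regress y x1 x2 if FitchID == 1 & inrange(year, 1998, 2003)
assert reldif(RMSE_y, e(rmse)) < 1e-6 if FitchID == 1 & year == 2003
```

In real data the lagged regressors are missing in some years, so the dependent variable would first need to be set to missing wherever a regressor is missing; otherwise the standard deviation and the regression would be computed on different samples.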
(Attaching a snapshot of my dataset for reference.)
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long FitchID int year double NetInterestIncome float NetOperatingIncome double PretaxProfit float NetProfit double ROAA_w
856 1994 6.812e+09 6.123e+09 2.115e+09 1.863e+09 .85
856 1995 7.592e+09 7.242e+09 3.804e+09 2.356e+09 1.06
856 1996 8.376e+09 8.148e+09 4.383e+09 2.75e+09 1.17
856 1997 8.538e+09 7.70e+09 4.181e+09 2.634e+09 1.01
856 1998 9.329e+09 7.517e+09 2.684e+09 1.70e+09 .56
856 1999 1.0232e+10 9709000704 4.890e+09 3.079e+09 .98
856 2000 1.0898e+10 1.2263e+10 7.755e+09 4.923e+09 1.32
856 2001 1.3822e+10 1.3471e+10 8.083e+09 5.27e+09 1.17
856 2002 2.0715e+10 2.1064e+10 9.682e+09 6.356e+09 1.31
856 2003 2.0471e+10 2.1598e+10 1.1143e+10 7.919e+09 1.43
856 2004 2.2416e+10 2.4038e+10 1.3103e+10 9.413e+09 1.39
856 2005 2.1173e+10 2.4507e+10 1.2439e+10 8.83e+09 1.25
856 2006 2.3896e+10 2.486e+10 1.2487e+10 9.338e+09 .94
856 2007 3.0803e+10 2.8686e+10 4.070e+08 2.304e+09 .18
856 2008 3.6235e+10 2.6373e+10 -9.529e+09 -2.101e+09 -.17
856 2009 3.2182e+10 2.6253e+10 -7.951e+09 -2.794e+09 -.23
856 2010 3.0282e+10 2.6624e+10 9.859e+09 7.904e+09 .65
856 2011 4.3306e+10 3.1554e+10 1.3423e+10 1.0509e+10 .81
856 2012 4.2283e+10 3.4775e+10 1.5251e+10 1.1845e+10 .88
856 2013 4.0076e+10 3.2493e+10 2.0493e+10 1.43e+10 1.06
end