Hello everyone. I've written a Stata implementation of the Friedman, Hastie and Tibshirani (2010, Journal of Statistical Software) coordinate descent algorithm for elastic-net regression and its famous special cases: the lasso and ridge regression. The resulting command, elasticregress, is now available on SSC -- thanks to Kit Baum for the upload.
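You can install it in the usual way:

    ssc install elasticregress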
The command extends existing Stata lasso implementations, such as lars, by allowing the regularisation parameter to be supplied by the user or chosen by K-fold cross-validation. As a result it tends to have better out-of-sample fit. The plot below compares the performance of elasticregress, lars and OLS as the number of covariates increases. As is well known, OLS performs poorly once the number of covariates grows large relative to the sample size. lars has roughly constant performance as the number of covariates increases, while elasticregress becomes more accurate.
(The estimators are fitted on 1,000 observations. The true coefficients on the standard-normal covariates are drawn from a spike-and-slab distribution with a p = 0.2 chance of being non-zero. Each dot is a mean over 30 replications. Both elasticregress and lars are run with their respective lasso options.)
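To make that setup concrete, here is a minimal sketch of one replication with 100 covariates. It is illustrative rather than the exact simulation code -- in particular, the standard-normal slab is an assumption, and the help file documents the cross-validation options:

    clear
    set seed 1
    set obs 1000
    forvalues j = 1/100 {
        generate x`j' = rnormal()                        // standard-normal covariates
    }
    generate y = rnormal()                               // noise
    forvalues j = 1/100 {
        local b = cond(runiform() < 0.2, rnormal(), 0)   // spike-and-slab coefficient
        quietly replace y = y + `b' * x`j'
    }
    lassoregress y x*, numfolds(10)                      // lasso, lambda by 10-fold CV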
elasticregress tends to be a little faster than lars when estimating the lasso. elasticregress can also estimate the more general elastic-net regression, which regularises with both the L1 and L2 norms and is thus more robust to collinearity among the regressors -- when it does so, it can cross-validate both the regularisation parameter (lambda) and the mixing parameter (alpha).
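For example (a sketch -- the exact defaults and option names are documented in the help file):

    elasticregress y x*, alpha(0.5)   // elastic net with the mixing parameter fixed at 0.5
    elasticregress y x*               // cross-validates alpha as well when it is not supplied

Setting alpha strictly between 0 and 1 mixes the L1 and L2 penalties, which is what gives the elastic net its robustness to collinear regressors.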
Hopefully the help files are self-contained -- do let me know if they're not. If you find a bug, please open an issue on GitHub.