Dear All,
I am new to machine learning methods in Stata. I am currently working on a cross-sectional, country-level analysis with 72 observations, using two basic models: (1) simple OLS and (2) Heckman two-step. After estimation, I want to run post-estimation predictions for the non-selected countries.
I am wondering whether Stata's new machine learning features can be used in this context to get better predictions. I have two basic questions:
(1) Are any machine learning methods applicable to this analysis? I am thinking of using either "lassopack" with a large set of macroeconomic, political, and geographical variables, or "nearest neighbor" from the Stata/Python integration ML command. I am not sure which method is more meaningful for this analysis.
(2) Is there a minimum number of observations required in the training and testing data? In particular, with LASSO and about 20 covariates, should I be worried about the limited number of observations in the training data?
Any thoughts or suggestions will be highly appreciated.
Thanks in advance.