  • Machine learning with cross-sectional country-level data

    Dear All,

    I am new to machine learning methods in Stata. I am currently working on a cross-sectional, country-level analysis with 72 observations and two basic models: (1) simple OLS and (2) a Heckman two-step model. I then want to run post-estimation predictions for the non-selected countries.
    I am wondering whether Stata's new machine learning features could be used in this context to get better predictions. I have a few basic questions:

    (1) Is any machine learning method applicable to this analysis? I am considering either "lassopack", with a large set of macroeconomic, political, and geographical variables, or "nearest neighbor" through the ML commands available via Stata/Python integration. I am not sure which method is more meaningful for this analysis.

    (2) Is there a minimum number of observations required in the training and testing data? In particular, with LASSO and about 20 covariates, should I be worried about the limited number of observations in the training data?
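One common way to check whether 72 observations leave enough data for training, and to compare methods such as lasso and nearest neighbors on out-of-sample accuracy, is k-fold cross-validation. Below is a minimal plain-Python sketch, assuming 5 folds and Euclidean distance on covariates that have already been standardized. The helpers `kfold_indices`, `knn_predict`, and `cv_mse_knn` are hypothetical illustrations, not the API of Stata, lassopack, or any Python library.

```python
import math
import random


def kfold_indices(n, n_folds, seed=None):
    """Split observation indices 0..n-1 into n_folds (train, test) pairs.

    The first n % n_folds folds get one extra observation. If seed is given,
    the indices are shuffled first (usually advisable with real data).
    """
    order = list(range(n))
    if seed is not None:
        random.Random(seed).shuffle(order)
    base, extra = divmod(n, n_folds)
    folds, start = [], 0
    for f in range(n_folds):
        size = base + (1 if f < extra else 0)
        test = order[start:start + size]
        train = order[:start] + order[start + size:]
        folds.append((train, test))
        start += size
    return folds


def knn_predict(X_train, y_train, x, k):
    """Average outcome of the k training rows closest to x (Euclidean distance)."""
    nearest = sorted(
        ((math.dist(xi, x), yi) for xi, yi in zip(X_train, y_train)),
        key=lambda pair: pair[0],
    )[:k]
    return sum(yi for _, yi in nearest) / len(nearest)


def cv_mse_knn(X, y, n_folds, k, seed=None):
    """Mean squared prediction error of kNN regression under k-fold CV."""
    errors = []
    for train, test in kfold_indices(len(y), n_folds, seed):
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        for i in test:
            errors.append((y[i] - knn_predict(X_tr, y_tr, X[i], k)) ** 2)
    return sum(errors) / len(errors)


# With 72 countries and 5 folds, each model is fit on 57 or 58 observations.
folds = kfold_indices(72, 5)
print([len(test) for _, test in folds])    # -> [15, 15, 14, 14, 14]
print([len(train) for train, _ in folds])  # -> [57, 57, 58, 58, 58]
```

Running the same folds for each candidate model (OLS, lasso, kNN) and comparing the cross-validated errors is one way to judge which is more useful here; with so few observations, the estimates will vary noticeably with the fold assignment, so repeating with several seeds is worth considering.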

    Any thoughts or suggestions will be highly appreciated.
    Thanks in advance.


  • #2
    LASSO is designed for high-dimensional data, including settings where you have more variables than observations. I typically use it with panel data, but strictly speaking, there's no reason it couldn't be used with cross-sectional data. It will return sparse solutions (some coefficients set exactly to zero) either way.
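To illustrate what "sparse solutions" means, here is a minimal pure-Python sketch of the lasso fitted by cyclic coordinate descent with soft-thresholding. It assumes centered data, no intercept, and the objective (1/(2n)) * ||y - Xb||^2 + lam * ||b||_1; it is not the algorithm or scaling used by Stata's `lasso` or by lassopack, and `soft_threshold` and `lasso_cd` are hypothetical names. The toy data are made up for illustration.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.

    Assumes y and the columns of X are already centered (no intercept).
    """
    n, p = len(y), len(X[0])
    beta = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    resid = list(y)  # y - X b, with b = 0 initially
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Correlation of column j with the residual that excludes its own term
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            for i in range(n):
                resid[i] -= X[i][j] * delta
            beta[j] = new_bj
    return beta


# Toy design with two orthogonal covariates (illustrative numbers only)
X = [[1, 0], [0, 1], [-1, 0], [0, -1]]
y = [2, 3, -2, -3]
for lam in (0.0, 0.5, 1.2, 2.0):
    # lam = 0.0 gives [2.0, 3.0] (the OLS fit); lam = 2.0 gives [0.0, 0.0]
    print(lam, lasso_cd(X, y, lam))
```

As the penalty grows, the first coefficient hits exactly zero while the second is still nonzero, which is the variable-selection behavior that makes lasso attractive with many candidate covariates. In practice the penalty would be chosen by cross-validation or an information criterion rather than fixed by hand, and covariates would be standardized first so the penalty treats them on the same scale.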
