Machine learning with cross-sectional country level data

Doris Wang

Join Date: Jun 2018

Posts: 4
#1

30 Apr 2022, 14:09

Dear All,

I am new to machine learning methods in Stata. I am currently working on a cross-sectional, country-level analysis with 72 observations, using two basic models: (1) simple OLS and (2) Heckman two-step. I then want to run post-estimation predictions for the non-selected countries.
I am wondering whether Stata's new machine learning features can be used in this context to get better predictions. I have a few basic questions:

(1) Are any machine learning methods applicable to this analysis? I am thinking of using either "lassopack" with a large set of macroeconomic, political, and geographical variables, or "nearest neighbor" from a Stata/Python integration ML command. I am not sure which method is more meaningful for this analysis.

(2) Is there a minimum number of observations required in the training and testing data? Particularly with LASSO, if I have about 20 covariates, should I be worried about the limited number of observations in the training data?

Any thoughts or suggestions will be highly appreciated.
Thanks in advance.
Tags: machine learning
Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#2

30 Apr 2022, 15:51

LASSO is designed with high-dimensional data in mind, where you may have more variables than observations. I typically use it in a panel data structure, but strictly speaking, there's no reason why it couldn't be used in a cross-sectional setting. It'll return sparse solutions either way.
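
To illustrate what "sparse solutions" means at the scale described in #1 (72 observations, about 20 covariates), here is a minimal sketch in plain Python rather than Stata. Everything in it is an assumption made for illustration: the data are simulated, only the first three covariates truly matter, and the helper names (center, soft_threshold, lasso_cd) are hypothetical, not part of lassopack or any Stata command. It fits the LASSO by cyclic coordinate descent on centered data and counts how many coefficients end up exactly zero.

```python
import random

def center(values):
    """Subtract the mean so no intercept is needed."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def soft_threshold(z, g):
    """Shrink z toward zero by g; return exactly 0 when |z| <= g."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/(2n))*||y - X b||^2 + lam*||b||_1 by cyclic
    coordinate descent. Assumes y and the columns of X are centered."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X b, with b = 0 at the start
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of x_j with the partial residual (x_j's own part added back)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta

# Simulated data (an assumption, not the poster's data): n = 72, p = 20,
# and only the first three covariates have nonzero true coefficients.
random.seed(0)
n, p = 72, 20
cols = [center([random.gauss(0, 1) for _ in range(n)]) for _ in range(p)]
X = [[cols[j][i] for j in range(p)] for i in range(n)]
y = center([2.0 * X[i][0] - 1.5 * X[i][1] + 1.0 * X[i][2] + random.gauss(0, 1)
            for i in range(n)])

# Smallest penalty at which every coefficient is exactly zero
lam_max = max(abs(sum(X[i][j] * y[i] for i in range(n)) / n) for j in range(p))

beta_null = lasso_cd(X, y, 1.01 * lam_max)   # penalty above lam_max: empty model
beta_sparse = lasso_cd(X, y, 0.3)            # moderate penalty, chosen by hand here
n_selected = sum(1 for b in beta_sparse if b != 0.0)
print("covariates kept at lam = 0.3:", n_selected, "of", p)
```

The penalty of 0.3 is picked by hand purely for the demonstration. In practice the penalty would normally be chosen by cross-validation or an information criterion, for example with cvlasso from lassopack or Stata's built-in lasso linear, and with only 72 observations the selected set can be sensitive to how the folds are drawn.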