Conditional inference random forests in Stata

Pablo Brugarolas

Join Date: Mar 2019

Posts: 4
#1


23 Mar 2023, 06:31

Hi,

Is it possible to implement conditional inference random forests in Stata (as the R package fastcforest does, https://rdrr.io/github/nicolas-robet...cforest.html)?

I really appreciate any help you can provide.

Pablo
Tags: machine learning, random forest
Stephen Jenkins

Join Date: Apr 2014

Posts: 1426
#2

24 Mar 2023, 03:42

net describe st0587, from(http://www.stata-journal.com/software/sj20-1)

Does this do what you want? I discovered it in 2 seconds with -search forest- in Stata. And with a Google search on "stata random forest" I got a number of productive hits. Recommendation: some prior search before posting can be useful!

Good luck
Pablo Brugarolas

Join Date: Mar 2019

Posts: 4
#3

24 Mar 2023, 08:00

Thank you, Stephen!

That Google search and rforest were my first try.

But to my understanding, rforest and fastcforest do not produce binary splits by the same procedure. rforest chooses the feature, and the value of that feature, that minimizes the MSE in the resulting child nodes, whereas fastcforest splits based on a statistical test of the null hypothesis of independence between the dependent variable and each of the features. Is that correct?

In that case, is there any Stata package to implement conditional inference random forests?
Pablo Brugarolas

Join Date: Mar 2019

Posts: 4
#4

29 Mar 2023, 10:12

Hi all,

I wanted to follow up on my previous question about implementing conditional inference random forests in Stata. Stephen suggested using the command 'net describe st0587, from(http://www.stata-journal.com/software/sj20-1)', but it does not meet my specific needs. To the best of my knowledge, the R package fastcforest uses a different splitting procedure from the standard random forest algorithm. Does anyone know if there is a way to implement this in Stata, or if there is another package that provides this functionality?

I would really appreciate any insights or suggestions you may have. Thank you!
Daniel Schaefer

Join Date: Mar 2020

Posts: 810
#5

29 Mar 2023, 12:09

When people ask "is it possible to implement such and such algorithm in Stata", they usually mean "has someone already implemented this algorithm, and can I use it easily as a command?" The answer to the first question (can it be implemented) is yes, but it would be fairly difficult: you would have to rewrite the algorithm by hand, probably with some of the code in Mata, or even in C or C++ for the low-level multithreading. I don't know the answer to the second question (is there another package that provides this functionality), but my guess is that if rforest doesn't suit your needs, it may be a struggle to find an appropriate user-written command for this.

Minimizing the MSE is standard for this algorithm, correct? At least, that's what I remember from my CS coursework. In that case, you might struggle to find an equivalent for this (apparently idiosyncratic) R algorithm. One possibility might be to find a random forest package in a language with first-class functions (maybe Python) that lets you pass in a function defining the split behavior, and then define the hypothesis test yourself. I frankly don't know if that's possible with something like (e.g.) TensorFlow, scikit-learn, or Orange, but it might be worth a try.
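To make the difference between the two split rules concrete, here is a minimal Python sketch using only the standard library. It is not the code of rforest or fastcforest; all function names (mse_split, perm_pvalue, select_feature) are made up for illustration. The first function does the usual MSE-minimizing search for a cut point. The second part picks the split variable by testing independence between each feature and the outcome. I am assuming a Monte Carlo permutation test on the absolute Pearson correlation with a Bonferroni adjustment, which is a simplification of what conditional inference trees actually do, and the separate step that chooses the cut point within the selected variable is left out.

```python
import random


def sum_sq_dev(values):
    """Sum of squared deviations from the mean (n times the node's MSE)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def mse_split(x, y):
    """CART-style rule: the threshold on x that minimizes the total squared
    error of the two child nodes. Returns (threshold, total_sse)."""
    pairs = sorted(zip(x, y))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot cut between identical x values
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        sse = sum_sq_dev(left) + sum_sq_dev(right)
        if sse < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, sse)
    return best


def abs_corr(x, y):
    """Absolute Pearson correlation; 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def perm_pvalue(x, y, n_perm, rng):
    """Monte Carlo permutation p-value for H0: x and y are independent."""
    observed = abs_corr(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs_corr(x, y_perm) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def select_feature(features, y, alpha=0.05, n_perm=500, seed=0):
    """Conditional-inference-style rule (simplified): test every feature,
    take the smallest p-value, and split only if the Bonferroni-adjusted
    p-value is at most alpha. Returns (name or None, adjusted p-value)."""
    rng = random.Random(seed)
    pvals = {name: perm_pvalue(col, y, n_perm, rng)
             for name, col in features.items()}
    best = min(pvals, key=pvals.get)
    adjusted = min(1.0, pvals[best] * len(pvals))
    return (best, adjusted) if adjusted <= alpha else (None, adjusted)


# Toy data (simulated, purely illustrative): x1 drives y, x2 is noise.
demo_rng = random.Random(42)
x1 = [demo_rng.random() for _ in range(60)]
x2 = [demo_rng.random() for _ in range(60)]
y = [2 * a + demo_rng.gauss(0, 0.1) for a in x1]

print("MSE rule, cut point on x1:", mse_split(x1, y))
print("Test-based variable choice:", select_feature({"x1": x1, "x2": x2}, y))
```

The practical difference is that the MSE rule always finds some split, while the test-based rule can return no variable at all, which acts as a stopping criterion. To get an actual forest you would still need to grow such trees recursively on resampled data, which this sketch does not attempt.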
Pablo Brugarolas

Join Date: Mar 2019

Posts: 4
#6

31 Mar 2023, 17:27

Thank you for your suggestions, Daniel! I hope someone else has an alternative.