I noticed that my lasso results (using the "cv" selection method) change when the sortedby property of the dataset changes.
An example and the output it produces are shown in the two blocks labeled "Code:" further down in this post.
I hadn't thought enough about it, and I naively assumed that by setting the seed I'd get the same results regardless of -sortedby-. Having thought about it, I now assume that the sort order affects which observations end up in each of the k-fold groups used by the cross-validation. Is that right?
Finally, I assume that the seed choice determines the k-fold groups?
Thanks for helping me think through this.
Code:
set seed 1234
clear
set obs 1000
gen y = rnormal()
forv z = 1/500 {
    gen x`z' = rnormal() * y * .1
}
qui foreach k in x1 x2 {
    sort `k'
    isid `k'
    lasso linear y x*, selection("cv")
    noi di " names of all selected variables when sorted by `:sortedby' : `e(allvars_sel)' "
}
Code:
names of all selected variables when sorted by x1 : x57 x94 x160 x176 x198 x230 x300 x305
names of all selected variables when sorted by x2 : x57 x84 x94 x122 x157 x160 x176 x198 x206 x230 x287 x300 x305 x434 x491
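To make my k-fold assumption concrete, here is a minimal sketch of what I suspect is going on. It does not use lasso at all, and I don't know how lasso actually assigns folds internally; the variables id, fold_a and fold_b, the seed 42, and the choice of 10 folds are just my own illustration. It only shows that if fold numbers are drawn row by row from the random-number stream, then the same seed attaches the same sequence of fold numbers to different observations once the sort order changes.

```stata
* Illustration only: NOT lasso's documented fold-assignment method
clear
set seed 1234
set obs 1000
gen long id = _n
gen x1 = rnormal()
gen x2 = rnormal()

* Draw fold numbers 1..10 row by row while the data are sorted by x1
sort x1
set seed 42
gen byte fold_a = ceil(10 * runiform())

* Reset the SAME seed and draw again while the data are sorted by x2
sort x2
set seed 42
gen byte fold_b = ceil(10 * runiform())

* Same random sequence, but it lands on different observations
count if fold_a != fold_b
```

If that is the mechanism, I would expect the count to be well above zero even though the seed is identical for both draws. Note also that in my lasso example the seed is set only once, before the loop, so the second lasso call starts from a different point in the random-number stream than the first.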