
  • randomness in lasso

    I noticed that my lasso results (using the "cv" selection method) change when the -sortedby- property of the data changes.

    Here's an example:


    Code:
    set seed 1234
    clear
    set obs 1000
    
    gen y = rnormal()
    
    forv z = 1/500 {
        gen x`z' = rnormal() * y * .1
    }
    
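    qui foreach k in x1 x2 {
        * re-sort the data by a different variable on each pass
        sort `k'
        * confirm the sort key is unique, so the row order is fully determined
        isid `k'
        lasso linear y x*, selection("cv")
        noi di " names of all selected variables when sorted by `:sortedby' :  `e(allvars_sel)' "
    }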
    Here's the output from the code above:
    Code:
     names of all selected variables when sorted by x1 :  x57 x94 x160 x176 x198 x230 x300 x305
     names of all selected variables when sorted by x2 :  x57 x84 x94 x122 x157 x160 x176 x198 x206 x230 x287 x300 x305 x434 x491
    I hadn't thought enough about it, and I naively assumed that setting the seed would give me the same results regardless of -sortedby-. Having thought about it more, I now assume that the -sortedby- property affects which observations end up in each of the "k-fold" groups used by the cross-validation?

    Finally, I assume that the seed choice determines the k-fold groups?
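
    To make my guess concrete, here is a minimal sketch of the mechanism I have in mind. It is only an assumption on my part, not lasso's documented internals: the fold variables, the 5 folds, and the seed 42 are all made up for illustration. The idea is that if fold numbers are drawn row by row in the current order of the data, then the same seed produces the same sequence of draws, but a different sort order hands those draws to different observations.

    Code:
    clear
    set seed 1234
    set obs 10
    gen id = _n
    gen x1 = rnormal()
    gen x2 = rnormal()

    foreach k in x1 x2 {
        sort `k'
        * same seed before each assignment, so the sequence of draws is identical
        set seed 42
        * hypothetical fold assignment: one random draw per row, in current row order
        gen fold_`k' = ceil(5 * runiform())
    }

    * compare the fold each observation received under the two sort orders
    sort id
    list id fold_x1 fold_x2, noobs
    count if fold_x1 != fold_x2

    If that is roughly what happens inside lasso, then setting the seed pins down the sequence of random draws but not which observation receives which draw, and I would need both the same seed and the same sort order to reproduce the selected variables.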

    Thanks for helping me think through this.
