Lasso and cross validation: model selection

Rodrigo Badilla


15 Dec 2024, 18:11

I am starting to use lasso with cross-validation for model selection, to explain a dependent variable with a linear model, but I cannot understand why the coefficients in the selected model do not all have p-values below 0.05.

I followed the steps of the example posted at:

https://www.stata.com/features/overv...on-prediction/

Code:

sysuse auto, clear
splitsample, generate(sample) nsplit(2) rseed(1234)

lasso linear mpg i.foreign i.rep78 headroom weight turn gear_ratio price trunk length displacement if sample == 1, selection(bic)
estimates store bic

lasso linear mpg i.foreign i.rep78 headroom weight turn gear_ratio price trunk length displacement if sample == 1
estimates store cv

lasso linear mpg i.foreign i.rep78 headroom weight turn gear_ratio price trunk length displacement if sample == 1, selection(adaptive)
estimates store adaptive

lassocoef cv bic adaptive, sort(coef, standardized)


----------------------------------------------
             |    cv        bic      adaptive
-------------+--------------------------------
      weight |     x         x    
     5.rep78 |     x         x          x    
      length |     x         x          x    
  gear_ratio |     x                    x    
       price |     x                    x    
       _cons |     x         x          x    
----------------------------------------------
Legend:
  b - base level
  e - empty cell
  o - omitted
  x - estimated




lassogof cv bic adaptive, over(sample) postselection

Postselection coefficients
-------------------------------------------------------------
Name             sample |         MSE    R-squared        Obs
------------------------+------------------------------------
cv                      |
                      1 |    10.92984       0.7046         35
                      2 |    10.77016       0.6496         34
------------------------+------------------------------------
bic                     |
                      1 |    11.82234       0.6805         35
                      2 |     11.0608       0.6401         34
------------------------+------------------------------------
adaptive                |
                      1 |    10.98369       0.7032         35
                      2 |    10.56047       0.6564         34
-------------------------------------------------------------

* adaptive has the lowest MSE and the highest R-squared in sample 2

* I select adaptive as the best model:

. reg mpg length 5.rep78 gear_ratio price

      Source |       SS           df       MS      Number of obs   =        69
-------------+----------------------------------   F(4, 64)        =     38.57
       Model |  1654.03213         4  413.508033   Prob > F        =    0.0000
    Residual |  686.170766        64  10.7214182   R-squared       =    0.7068
-------------+----------------------------------   Adj R-squared   =    0.6885
       Total |   2340.2029        68  34.4147485   Root MSE        =    3.2744

------------------------------------------------------------------------------
         mpg | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      length |  -.1484211   .0270004    -5.50   0.000    -.2023607   -.0944816
     5.rep78 |   3.380391   1.163954     2.90   0.005     1.055125    5.705657
  gear_ratio |   1.558014   1.251733     1.24   0.218    -.9426098    4.058637
       price |  -.0002964   .0001545    -1.92   0.060    -.0006049    .0000122
       _cons |   45.84562   7.960131     5.76   0.000     29.94343    61.74781
------------------------------------------------------------------------------

Here gear_ratio was selected, but its p-value is 0.218. Is that too high for it to help explain mpg?

Am I missing a step or a concept in model selection with lasso and cross-validation?

I know that lasso does not use p-values to select the model, but should I remove gear_ratio from the final model?
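
One thing I notice is that the regression above uses all 69 observations, whereas the lasso models were fit on sample == 1 only (35 observations). Below is a sketch of what I could run instead. It assumes the stored estimates and the sample variable from the code above are still in memory, and I am not sure that dsregress is the right tool for my question; it is only my guess at how inference on gear_ratio might be set up.

Code:

* Postselection coefficients that lasso reports for the adaptive fit
* (these come from sample == 1 only)
lassocoef adaptive, display(coef, postselection)

* The same selected variables refit by OLS on the training half only;
* I would expect these coefficients to agree with the ones listed above
regress mpg length 5.rep78 gear_ratio price if sample == 1

* Assumption: if the goal is inference on gear_ratio rather than prediction,
* a double-selection lasso with the other covariates as controls
dsregress mpg gear_ratio, controls(i.foreign i.rep78 headroom weight turn price trunk length displacement)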

Any comments would be appreciated.

Thanks in advance
Rodrigo

Last edited by Rodrigo Badilla; 15 Dec 2024, 18:27.
