  • Variable selection with LASSO + fixed-effects model (alternative: ridge)

    Hello everyone,

    I hope you are well.

    I have a question about the LASSO and ridge methods.

    The objective of my study is to investigate the impact of several governance dimensions (board independence, CEO duality, governance index membership, board/committee size, etc.) on earnings management (measured by accruals).

    Part 1: LASSO

    Here is my baseline fixed-effects model:

    Code:
    xtreg ACC CFO2 DCFO CFOxDCFO CADln2 CADlnxCFO CADlnxDCFO CADlnxCFOxDCFO CEO CEOxCFO CEOxDCFO CEOxCFOxDCFO IND2 INDxCFO INDxDCFO INDxCFOxDCFO  AUDln2 AUDlnxCFO AUDlnxDCFO AUDlnxCFOxDCFO COGln2 COGlnxCFO COGlnxDCFO COGlnxCFOxDCFO IGE IGExCFO IGExDCFO IGExCFOxDCFO OWN2 OWNxCFO OWNxDCFO OWNxCFOxDCFO LEVIER2 LEVIERxCFO LEVIERxDCFO LEVIERxCFOxDCFO TAILLEln2 TAILxCFO TAILxDCFO  TAILxCFOxDCFO LITIGE LITIGExCFO LITIGExDCFO LITIGExCFOxDCFO, fe robust
    I could not decide a priori which of the governance variables (CAD, IND, etc.) to keep. However, when I checked the VIFs, they turned out to be very high, even after standardizing the variables (centering and scaling).

    I therefore opted for a variable selection method, the LASSO method.

    I used the following command:

    Code:
    rlasso ACC CFO2 DCFO CFOxDCFO CADln2 CADlnxCFO CADlnxDCFO CADlnxCFOxDCFO CEO CEOxCFO CEOxDCFO CEOxCFOxDCFO IND2 INDxCFO INDxDCFO INDxCFOxDCFO  AUDln2 AUDlnxCFO AUDlnxDCFO AUDlnxCFOxDCFO COGln2 COGlnxCFO COGlnxDCFO COGlnxCFOxDCFO IGE IGExCFO IGExDCFO IGExCFOxDCFO OWN2 OWNxCFO OWNxDCFO OWNxCFOxDCFO LEVIER2 LEVIERxCFO LEVIERxDCFO LEVIERxCFOxDCFO TAILLEln2 TAILxCFO TAILxDCFO  TAILxCFOxDCFO LITIGE LITIGExCFO LITIGExDCFO LITIGExCFOxDCFO, fe
    After this command, the selected variables are: CFO2 DCFO CEOxCFOxDCFO INDxCFO INDxCFOxDCFO IGExCFO IGExCFOxDCFO LEVIER2 TAILLEln2 TAILxCFO

    Next, I forced the inclusion of the lower-order terms that make up the selected triple interactions but were not themselves selected by LASSO (e.g. CEO).
    I then put the retained variables into my fixed-effects model and analyzed the results:

    Code:
    xtreg ACC CFO2 DCFO CFOxDCFO CEO CEOxCFO CEOxDCFO CEOxCFOxDCFO IND2 INDxCFO INDxDCFO INDxCFOxDCFO IGE IGExCFO IGExDCFO IGExCFOxDCFO LEVIER2 TAILLEln2 TAILxCFO, fe robust
    With this new specification, the VIFs are considerably lower, with a mean of 3.
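
    The forced inclusion described above could also be done inside the selection step itself. What follows is only a minimal sketch, assuming rlasso comes from the lassopack package and that its pnotpen() option leaves the listed variables unpenalized, so they are never dropped. The local macro names (allx, keep, sel, final) are hypothetical, the keep list is simply the set of terms forced by hand in the refit above, and the selected set may differ from the one reported here once those terms are unpenalized.

    ```stata
    * Sketch only: assumes lassopack is installed (ssc install lassopack)
    * and that the panel has already been declared with xtset.

    * Full candidate list, copied from the rlasso call above
    local allx CFO2 DCFO CFOxDCFO ///
        CADln2 CADlnxCFO CADlnxDCFO CADlnxCFOxDCFO ///
        CEO CEOxCFO CEOxDCFO CEOxCFOxDCFO ///
        IND2 INDxCFO INDxDCFO INDxCFOxDCFO ///
        AUDln2 AUDlnxCFO AUDlnxDCFO AUDlnxCFOxDCFO ///
        COGln2 COGlnxCFO COGlnxDCFO COGlnxCFOxDCFO ///
        IGE IGExCFO IGExDCFO IGExCFOxDCFO ///
        OWN2 OWNxCFO OWNxDCFO OWNxCFOxDCFO ///
        LEVIER2 LEVIERxCFO LEVIERxDCFO LEVIERxCFOxDCFO ///
        TAILLEln2 TAILxCFO TAILxDCFO TAILxCFOxDCFO ///
        LITIGE LITIGExCFO LITIGExDCFO LITIGExCFOxDCFO

    * Lower-order terms to keep unpenalized (the ones forced by hand above)
    local keep CFOxDCFO CEO CEOxCFO CEOxDCFO IND2 INDxDCFO IGE IGExDCFO

    * LASSO with fixed effects; variables in pnotpen() are not penalized
    rlasso ACC `allx', fe pnotpen(`keep')

    * Union of the forced and the selected regressors (duplicates dropped)
    local sel `e(selected)'
    local final : list keep | sel

    * Refit the fixed-effects model on that union
    xtreg ACC `final', fe vce(robust)
    ```

    Taking the union of the two lists before the refit means the sketch does not depend on whether e(selected) already contains the unpenalized variables.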

    Part 2: Ridge

    As a robustness check, I'd also like to use ridge regression to address the multicollinearity in my model.

    I used the following command:

    Code:
    ridgeregress ACC CFO2 DCFO CFOxDCFO CADln2 CADlnxCFO CADlnxDCFO CADlnxCFOxDCFO CEO CEOxCFO CEOxDCFO CEOxCFOxDCFO IND2 INDxCFO INDxDCFO INDxCFOxDCFO  AUDln2 AUDlnxCFO AUDlnxDCFO AUDlnxCFOxDCFO COGln2 COGlnxCFO COGlnxDCFO COGlnxCFOxDCFO IGE IGExCFO IGExDCFO IGExCFOxDCFO OWN2 OWNxCFO OWNxDCFO OWNxCFOxDCFO LEVIER2 LEVIERxCFO LEVIERxDCFO LEVIERxCFOxDCFO TAILLEln2 TAILxCFO TAILxDCFO  TAILxCFOxDCFO LITIGE LITIGExCFO LITIGExDCFO LITIGExCFOxDCFO
    And here are the results:

    Code:
    Ridge regression                       Number of observations     =      1,822
                                           R-squared                  =     0.5094
                                           alpha                      =     0.0000
                                           lambda                     =     0.0968
                                           Cross-validation MSE       =     0.0323
                                           Number of folds            =         10
                                           Number of lambda tested    =        100
    ---------------------------------------------------------------------------------
                ACC | Coefficient
    ----------------+----------------------------------------------------------------
               CFO2 |  -.4201313
               DCFO |   .0442865
           CFOxDCFO |  -.2361166
             CADln2 |  -.0085005
          CADlnxCFO |  -.0199712
         CADlnxDCFO |  -.0186018
     CADlnxCFOxDCFO |  -.1104472
                CEO |   .0232471
            CEOxCFO |   .0026195
           CEOxDCFO |  -.0896609
       CEOxCFOxDCFO |  -.5306899
               IND2 |   .0057495
            INDxCFO |  -.7414762
           INDxDCFO |  -.0783018
       INDxCFOxDCFO |  -.6897435
             AUDln2 |  -.0184643
          AUDlnxCFO |  -.0240426
         AUDlnxDCFO |  -.0593989
     AUDlnxCFOxDCFO |  -.1078209
             COGln2 |   .0148307
          COGlnxCFO |  -.0076942
         COGlnxDCFO |  -.0097863
     COGlnxCFOxDCFO |  -.0744067
                IGE |   .0032314
            IGExCFO |  -.1649624
           IGExDCFO |  -.0513084
       IGExCFOxDCFO |  -.2191058
               OWN2 |  -.0032892
            OWNxCFO |   .0702804
           OWNxDCFO |  -.0259848
       OWNxCFOxDCFO |  -.2782928
            LEVIER2 |  -.1592852
         LEVIERxCFO |  -.0819853
        LEVIERxDCFO |  -.0316133
    LEVIERxCFOxDCFO |   .0887082
          TAILLEln2 |   .0045048
           TAILxCFO |   .0483754
          TAILxDCFO |   .0191018
      TAILxCFOxDCFO |   .0965565
             LITIGE |   .0031396
         LITIGExCFO |  -.0606596
        LITIGExDCFO |  -.0048072
    LITIGExCFOxDCFO |   .1459947
              _cons |  -.0316815
    ---------------------------------------------------------------------------------
    I confess I don't know how to interpret these results or what to do with them.
    The output gives no p-values, so I cannot tell which variables are significant in my model.



    My questions are as follows:

    Is my LASSO procedure methodologically sound and correctly implemented?

    How should I use the ridge results? Is ridge relevant here, or is LASSO enough?

    Thank you very much for your answers,

    Loïc Dubois