Hello everyone,
I hope you are well.
I have a question about the LASSO and Ridge methods.
The objective of my study is to examine the impact of different governance dimensions (board independence, CEO duality, governance index membership, board/committee size, etc.) on earnings management (measured by accruals).
Part 1: LASSO
Here's my basic fixed-effect model:
Code:
xtreg ACC CFO2 DCFO CFOxDCFO CADln2 CADlnxCFO CADlnxDCFO CADlnxCFOxDCFO CEO CEOxCFO CEOxDCFO CEOxCFOxDCFO IND2 INDxCFO INDxDCFO INDxCFOxDCFO AUDln2 AUDlnxCFO AUDlnxDCFO AUDlnxCFOxDCFO COGln2 COGlnxCFO COGlnxDCFO COGlnxCFOxDCFO IGE IGExCFO IGExDCFO IGExCFOxDCFO OWN2 OWNxCFO OWNxDCFO OWNxCFOxDCFO LEVIER2 LEVIERxCFO LEVIERxDCFO LEVIERxCFOxDCFO TAILLEln2 TAILxCFO TAILxDCFO TAILxCFOxDCFO LITIGE LITIGExCFO LITIGExDCFO LITIGExCFOxDCFO, fe robust
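The `fe` option estimates the model on within-firm deviations from each firm's mean (the "within" transformation). This Python sketch (not Stata) shows that step on a hypothetical two-firm toy panel. The function name `within_transform` and the data are made up for illustration.

```python
import numpy as np

def within_transform(values, groups):
    """Subtract each group's (firm's) mean from its observations."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    out = np.empty_like(values)
    for g in np.unique(groups):
        mask = groups == g
        out[mask] = values[mask] - values[mask].mean(axis=0)
    return out

# Hypothetical toy panel: 2 firms observed over 3 years
firm = np.array([1, 1, 1, 2, 2, 2])
y = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
x = np.array([0.5, 1.0, 1.5, 5.0, 5.5, 6.0])

y_w = within_transform(y, firm)
x_w = within_transform(x, firm)
# OLS slope on the demeaned data = fixed-effects slope
beta_fe = (x_w @ y_w) / (x_w @ x_w)
print(beta_fe)
```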
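I was unable to make any a priori choice among the various governance variables (CAD, IND, etc.). However, when I checked the VIFs, they turned out to be very high (even after centering and standardizing the variables, etc.).
I therefore opted for a variable selection method: LASSO.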
I used the following command:
Code:
rlasso ACC CFO2 DCFO CFOxDCFO CADln2 CADlnxCFO CADlnxDCFO CADlnxCFOxDCFO CEO CEOxCFO CEOxDCFO CEOxCFOxDCFO IND2 INDxCFO INDxDCFO INDxCFOxDCFO AUDln2 AUDlnxCFO AUDlnxDCFO AUDlnxCFOxDCFO COGln2 COGlnxCFO COGlnxDCFO COGlnxCFOxDCFO IGE IGExCFO IGExDCFO IGExCFOxDCFO OWN2 OWNxCFO OWNxDCFO OWNxCFOxDCFO LEVIER2 LEVIERxCFO LEVIERxDCFO LEVIERxCFOxDCFO TAILLEln2 TAILxCFO TAILxDCFO TAILxCFOxDCFO LITIGE LITIGExCFO LITIGExDCFO LITIGExCFOxDCFO, fe
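LASSO adds an L1 penalty, λ·Σ|b_j|, to the least-squares objective. This penalty pushes some coefficients exactly to zero, and that is what performs the selection. The Python sketch below solves the problem by cyclic coordinate descent with soft-thresholding on simulated, standardized data. It is not the `rlasso` implementation.

Several things are assumptions:
- The penalty `lam` is fixed by hand. As I understand it, `rlasso` (lassopack) chooses it with its own data-driven "rigorous" rule.
- With `fe`, the same idea would be applied to within-transformed variables.
- The data and function names are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t; returns exactly 0 when |z| <= t."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.
    Assumes standardized columns of X and a centered y (no intercept)."""
    n, k = X.shape
    b = np.zeros(k)
    col_sq = (X ** 2).sum(axis=0) / n      # equals 1 for standardized columns
    resid = y.astype(float).copy()
    for _ in range(n_iter):
        for j in range(k):
            resid += X[:, j] * b[j]        # partial residual without regressor j
            rho = X[:, j] @ resid / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * b[j]        # put the updated regressor j back
    return b

# Hypothetical simulated data: only columns 0 and 3 truly matter
rng = np.random.default_rng(42)
n, k = 400, 6
X = rng.normal(size=(n, k))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = X @ np.array([2.0, 0.0, 0.0, -1.5, 0.0, 0.0]) + 0.5 * rng.normal(size=n)
y = y - y.mean()

b = lasso_cd(X, y, lam=0.2)
selected = np.flatnonzero(b)   # indices of the non-zero (selected) coefficients
print(selected, b.round(3))
```

The selected coefficients are also shrunk toward zero. This is one reason a common practice is to re-estimate the selected model by OLS ("post-LASSO").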
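Following the rlasso command, the selected variables are: CFO2 DCFO CEOxCFOxDCFO INDxCFO INDxCFOxDCFO IGExCFO IGExCFOxDCFO LEVIER2 TAILLEln2 TAILxCFO
Next, I forced the inclusion of certain variables that appear in the selected triple interactions but were not themselves selected by LASSO (e.g. CEO).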
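Forcing in the lower-order terms of a selected interaction is usually called the hierarchy (or marginality) principle. The bookkeeping can be sketched in Python as follows. `MAIN_EFFECT` is a hypothetical mapping built from the variable names in this post (e.g. TAIL → TAILLEln2). Splitting names on `x` assumes no main-effect name contains a lowercase x, which holds for the names here.

```python
from itertools import combinations

# Hypothetical mapping from each factor in the interaction names to its main-effect variable
MAIN_EFFECT = {"CFO": "CFO2", "DCFO": "DCFO", "CEO": "CEO", "IND": "IND2",
               "IGE": "IGE", "LEVIER": "LEVIER2", "TAIL": "TAILLEln2"}

def hierarchy_closure(selected, main_effect):
    """Add every main effect and lower-order interaction implied by the selected terms."""
    keep = list(selected)
    for name in selected:
        factors = name.split("x")            # "INDxCFOxDCFO" -> ["IND", "CFO", "DCFO"]
        if len(factors) < 2:
            continue                         # a main effect: nothing to add
        for f in factors:
            keep.append(main_effect[f])
        for k in range(2, len(factors)):     # lower-order interactions, original order kept
            for combo in combinations(factors, k):
                keep.append("x".join(combo))
    return list(dict.fromkeys(keep))         # de-duplicate, keep first-seen order

selected = ("CFO2 DCFO CEOxCFOxDCFO INDxCFO INDxCFOxDCFO IGExCFO "
            "IGExCFOxDCFO LEVIER2 TAILLEln2 TAILxCFO").split()
print(hierarchy_closure(selected, MAIN_EFFECT))
```

Applied to the ten variables selected by `rlasso`, it returns the same 18 variables that appear in the final `xtreg` specification of this post.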
I then included the retained variables in my fixed-effect model and analyzed the results:
Code:
xtreg ACC CFO2 DCFO CFOxDCFO CEO CEOxCFO CEOxDCFO CEOxCFOxDCFO IND2 INDxCFO INDxDCFO INDxCFOxDCFO IGE IGExCFO IGExDCFO IGExCFOxDCFO LEVIER2 TAILLEln2 TAILxCFO, fe robust
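For reference, the VIF of regressor j is 1/(1 − R²_j), where R²_j comes from regressing x_j on the other regressors. The Python sketch below computes it with NumPy on simulated data (hypothetical, not the poster's). It shows why a product of uncentered variables, such as an xCFO term, tends to have a very large VIF, and how centering before multiplying removes most of it. Centering mainly addresses collinearity between an interaction and its own components, not collinearity among distinct governance variables. In Stata, `estat vif` after `regress` reports these figures.

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R2_j), with R2_j from an OLS regression (with constant) of column j on the others."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        tss = ((target - target.mean()) ** 2).sum()
        r2 = 1.0 - (resid @ resid) / tss
        out[j] = 1.0 / (1.0 - r2)
    return out

# Hypothetical regressors with non-zero means, plus their product
rng = np.random.default_rng(0)
a = rng.normal(loc=5.0, size=500)
b = rng.normal(loc=5.0, size=500)
raw = np.column_stack([a, b, a * b])              # interaction of uncentered variables
ac, bc = a - a.mean(), b - b.mean()
centered = np.column_stack([ac, bc, ac * bc])     # center first, then multiply
print(vif(raw).round(1))       # large VIFs
print(vif(centered).round(1))  # close to 1
```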
With the LASSO-based specification, the VIF is considerably lower (mean VIF of 3).
Part 2: Ridge
For robustness, I'd also like to use the Ridge method to deal with the multicollinearity in my model.
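Ridge does not change the data, so it does not remove multicollinearity as such. Instead, it adds λ to the diagonal of X'X. This stabilizes the estimates and shrinks them toward zero, but never sets any of them exactly to zero, so unlike LASSO it does not select variables. Because the penalty is not scale invariant, regressors are usually standardized first.

A minimal NumPy sketch of the closed form on simulated, nearly collinear regressors follows. The data and λ values are hypothetical.

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge estimate b = (X'X + lam*I)^(-1) X'y.
    Assumes standardized X and centered y, so no intercept needs to be left unpenalized."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

# Hypothetical data: two almost perfectly collinear regressors
rng = np.random.default_rng(1)
n = 200
z = rng.normal(size=n)
X = np.column_stack([z + 0.01 * rng.normal(size=n),
                     z + 0.01 * rng.normal(size=n)])
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = X @ np.array([1.0, 1.0]) + rng.normal(size=n)
y = y - y.mean()

b_ols = ridge(X, y, 0.0)      # lam = 0 is ordinary least squares
b_ridge = ridge(X, y, 10.0)   # shrunk, but no coefficient is exactly zero
print(b_ols.round(3), b_ridge.round(3))
```

With collinear regressors, OLS can split the common effect erratically between the two coefficients. Ridge pulls them toward similar, smaller values.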
I used the following command:
Code:
ridgeregress ACC CFO2 DCFO CFOxDCFO CADln2 CADlnxCFO CADlnxDCFO CADlnxCFOxDCFO CEO CEOxCFO CEOxDCFO CEOxCFOxDCFO IND2 INDxCFO INDxDCFO INDxCFOxDCFO AUDln2 AUDlnxCFO AUDlnxDCFO AUDlnxCFOxDCFO COGln2 COGlnxCFO COGlnxDCFO COGlnxCFOxDCFO IGE IGExCFO IGExDCFO IGExCFOxDCFO OWN2 OWNxCFO OWNxDCFO OWNxCFOxDCFO LEVIER2 LEVIERxCFO LEVIERxDCFO LEVIERxCFOxDCFO TAILLEln2 TAILxCFO TAILxDCFO TAILxCFOxDCFO LITIGE LITIGExCFO LITIGExDCFO LITIGExCFOxDCFO
And here are the results:
Code:
Ridge regression                  Number of observations   =     1,822
                                  R-squared                =    0.5094
                                  alpha                    =    0.0000
                                  lambda                   =    0.0968
                                  Cross-validation MSE     =    0.0323
                                  Number of folds          =        10
                                  Number of lambda tested  =       100
---------------------------------------------------------------------------------
            ACC | Coefficient
----------------+----------------------------------------------------------------
           CFO2 |  -.4201313
           DCFO |   .0442865
       CFOxDCFO |  -.2361166
         CADln2 |  -.0085005
      CADlnxCFO |  -.0199712
     CADlnxDCFO |  -.0186018
 CADlnxCFOxDCFO |  -.1104472
            CEO |   .0232471
        CEOxCFO |   .0026195
       CEOxDCFO |  -.0896609
   CEOxCFOxDCFO |  -.5306899
           IND2 |   .0057495
        INDxCFO |  -.7414762
       INDxDCFO |  -.0783018
   INDxCFOxDCFO |  -.6897435
         AUDln2 |  -.0184643
      AUDlnxCFO |  -.0240426
     AUDlnxDCFO |  -.0593989
 AUDlnxCFOxDCFO |  -.1078209
         COGln2 |   .0148307
      COGlnxCFO |  -.0076942
     COGlnxDCFO |  -.0097863
 COGlnxCFOxDCFO |  -.0744067
            IGE |   .0032314
        IGExCFO |  -.1649624
       IGExDCFO |  -.0513084
   IGExCFOxDCFO |  -.2191058
           OWN2 |  -.0032892
        OWNxCFO |   .0702804
       OWNxDCFO |  -.0259848
   OWNxCFOxDCFO |  -.2782928
        LEVIER2 |  -.1592852
     LEVIERxCFO |  -.0819853
    LEVIERxDCFO |  -.0316133
LEVIERxCFOxDCFO |   .0887082
      TAILLEln2 |   .0045048
       TAILxCFO |   .0483754
      TAILxDCFO |   .0191018
  TAILxCFOxDCFO |   .0965565
         LITIGE |   .0031396
     LITIGExCFO |  -.0606596
    LITIGExDCFO |  -.0048072
LITIGExCFOxDCFO |   .1459947
          _cons |  -.0316815
---------------------------------------------------------------------------------
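The header indicates that λ was chosen by 10-fold cross-validation over 100 candidate values. `alpha = 0` presumably denotes a pure ridge penalty in an elastic-net parameterization; that reading is an assumption, since I have not checked the `ridgeregress` documentation.

Below is a minimal Python sketch of such a selection loop. The function `cv_ridge`, the fold assignment, the λ grid, and the data are all hypothetical. The λ scale depends on how a given command normalizes the loss, so values from this sketch are not comparable to the 0.0968 reported above.

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge on standardized X and centered y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cv_ridge(X, y, lambdas, n_folds=10, seed=0):
    """K-fold cross-validation: return the lambda with the lowest mean test MSE, and all MSEs."""
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds   # random fold labels 0..K-1
    mse = np.zeros(len(lambdas))
    for i, lam in enumerate(lambdas):
        fold_mse = []
        for f in range(n_folds):
            train, test = folds != f, folds == f
            # Standardize with training-fold statistics only, to avoid leakage
            mu, sd = X[train].mean(axis=0), X[train].std(axis=0)
            ybar = y[train].mean()
            b = ridge((X[train] - mu) / sd, y[train] - ybar, lam)
            pred = ybar + ((X[test] - mu) / sd) @ b
            fold_mse.append(np.mean((y[test] - pred) ** 2))
        mse[i] = np.mean(fold_mse)
    return lambdas[np.argmin(mse)], mse

# Hypothetical simulated data and a grid of 100 candidate lambdas
rng = np.random.default_rng(3)
X = rng.normal(size=(300, 5))
y = X @ np.array([0.5, -0.3, 0.0, 0.2, 0.0]) + rng.normal(size=300)
lambdas = np.logspace(-3, 3, 100)
best_lam, cv_mse = cv_ridge(X, y, lambdas)
print(best_lam, cv_mse.min())
```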
I confess I don't know how to interpret the Ridge results or what to do with them, since I don't have any p-values telling me which variables are significant in my model.
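One standard reason ridge output comes without p-values: assume the linear model y = Xβ + ε with E[ε | X] = 0. For any λ > 0 the ridge estimator is then biased toward zero, so the usual t-statistics do not have their textbook distribution.

```latex
\hat{\beta}_{\lambda} = (X^\top X + \lambda I)^{-1} X^\top y,
\qquad
\mathbb{E}\!\left[\hat{\beta}_{\lambda} \mid X\right] = (X^\top X + \lambda I)^{-1} X^\top X \, \beta
```

The expectation equals β when λ = 0 (OLS). For λ > 0 it differs from β unless β = 0.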
My questions are as follows:
Is my LASSO procedure empirically sound and correctly implemented?
How should I use the Ridge results? Is Ridge relevant here, or is LASSO enough?
Thank you very much for your answers,
Loïc Dubois