Hello,
In my research estimating the impact of training and education on wages, training may not be randomly assigned to employees. Assignment may be related to observed variables such as educational attainment, and possibly to some unobserved variables as well. I am therefore trying to use the Heckman correction (Stata's heckman command) to detect selection bias.
My question is: is the code below correct for my purposes? Do I need that many variables in the select() option (I included all the variables I have)?
If the results resemble those of an ordinary regression, how can I tell whether there is selection bias?
Thank you very much in advance!
Code:
heckman wages training_hrs i.high_qual, select (training_hrs i.high_qual i.illness_disability i.sex i.children i.general_health i.region i.age i.sector)