Dear members of the list,
I am carrying out research with panel data in which my dependent variable is overeducation. It has many 0s (corresponding to individuals in the right job, i.e. matched) and positive values (1, 2, 3, 4...) measuring the number of years of education that overeducated workers have above the years required for their jobs.
As you can imagine, this dependent variable is heavily skewed to the right.

The zero values are important to me, but so are the rest of the values. I could merge all the nonzero values into 1 and treat my dependent variable as binary, but this means losing information: the range from 1 to 6 captures different levels of mismatch, and this is valuable information.
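To make the information loss concrete, here is a minimal sketch of the binary recoding described above. The variable names (overed, overed_any) and the five toy observations are assumptions for illustration only, not taken from the actual data.

```stata
* Toy data; overed is a hypothetical name for years of overeducation
clear
input overed
0
0
3
1
6
end

* Binary indicator: 1 if overeducated by any amount, 0 if matched
generate byte overed_any = overed > 0 if !missing(overed)

* The cross-tabulation shows every positive level falling into one category
tabulate overed overed_any
```

After the recoding, a worker overeducated by 1 year and one overeducated by 6 years are indistinguishable.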
I have been told that I should transform my dependent variable using the inverse hyperbolic sine:
gen [new_dep_var] = asinh([old_dep_var])
The problem comes when interpreting the coefficients that result from this transformation. I am aware of a paper on this matter by Edward C. Norton. It offers a way to estimate the marginal effects of the variables in the model on the original scale of the dependent variable, but the routine seems quite complicated to me. Norton provides a loop for this purpose in the paper. Does anyone know whether such a loop has already been incorporated into Stata?
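For concreteness, the interpretation problem can be sketched as follows. This is not the routine from Norton's paper. It is only a naive back-transformation that ignores the error term and the panel structure, run on simulated data. All variable names (x1, overed, overed_ihs, xb_hat, overed_hat, me_x1) are hypothetical.

```stata
* Simulated toy data; all variable names here are hypothetical
clear
set seed 2718
set obs 500
generate double x1 = rnormal()
generate double overed = floor(max(0, 1 + x1 + rnormal()))

* Transform and fit a pooled linear model (panel structure ignored in this sketch)
generate double overed_ihs = asinh(overed)
regress overed_ihs x1, vce(robust)

* Linear prediction on the transformed scale
predict double xb_hat, xb

* Naive back-transformation: sinh() inverts asinh(), error term ignored
generate double overed_hat = sinh(xb_hat)

* Naive marginal effect of x1 in years: d sinh(xb)/dx1 = cosh(xb) * b1
generate double me_x1 = cosh(xb_hat) * _b[x1]
summarize overed overed_hat me_x1
```

Because cosh(xb) differs across observations, the effect in years is not a single number, which is why the coefficient from the transformed regression cannot be read directly as a change in years of overeducation.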
Besides, is there any other way of addressing the problem created by the particular distribution of this dependent variable? How would you deal with it? Any advice would be greatly welcome.
Thanks for your attention
Luis Ortiz