Hello!
I am currently running a median (quantile) regression on wages. My data has 6 subgroups, each with ~7,000 observations. Control variables include gender, year, education, tenure, a manager factor variable, and a manager-year interaction. There are ~450 controls in each subgroup model, mostly due to small-cell manager-year interactions.
Originally, I ran an OLS model using log wages with standard errors clustered at the employee level. I ran this regression one at a time for each subgroup.
Because there are large male outliers, I also ran a median regression with clustered standard errors per Parente and Silva (2014). I ran this regression one at a time for each subgroup.
This ran normally and gave the results I was expecting. I used log wages so that the two models could be compared directly. However, given that quantile regressions are non-parametric with regard to the relationship between X and Y, I decided to run the same regression using level wages as a robustness check. As before, I ran this regression one at a time for each of the 6 subgroups (the qreg2 command on wage, listed at the end of this post).
When I ran this code, 4 of the subgroups ran normally, but 2 did not run and returned the error message "Matrix Not Positive Definite." To confirm that the issue did not lie with the level wages variable itself, I also ran two further level-wage models (reg with clustered standard errors and qreg without them, also listed at the end of this post) for all 6 groups; each ran without an error message.
Is there anything specific to -qreg2- that would cause issues in the presence of large outliers, but not when they are compressed by taking natural logs? I am struggling to see how the log model and the level, non-cluster-robust models could run normally, but not the level, cluster-robust model.
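One diagnostic that may be worth running on the two failing subgroups is a count of observations per manager-year cell, together with a detailed summary of level wages. This is only a sketch: it assumes the data have already been restricted to one failing subgroup, it uses the variable names from the commands in this post (manager, year, wage), and cell_n is a hypothetical helper variable. It is not a confirmed explanation of the error.

```stata
* Assumes the data in memory are restricted to one failing subgroup.
* cell_n is a hypothetical helper variable, not part of the original models.
bysort manager year: gen cell_n = _N

* How many observations sit in very small manager-year cells?
tabulate cell_n if cell_n <= 5

* Detailed summary of level wages, to see how extreme the outliers are
summarize wage, detail
```

Comparing this output between a subgroup that fails and one that runs might show whether the failing subgroups have more singleton or near-empty cells, or more extreme wage values.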
Also, to clarify, despite having panel data, I am not running a fixed-effects or first-difference model, as we are interested in the overall effect of gender on wages, not the effect of gender on changes in wages.
Any assistance or tips on diagnostic tools is greatly appreciated.
Thanks!
Andy Hammond
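The commands referred to above, in the order they are described:
Code (OLS, log wages):
reg ln_wage gender i.manager##i.year ... tenure, cluster(employee_id)
Code (median regression, log wages):
xi: qreg2 ln_wage gender i.manager##i.year ... tenure, cluster(employee_id)
Code (median regression, level wages; returns the error in 2 of the 6 subgroups):
xi: qreg2 wage gender i.manager##i.year ... tenure, cluster(employee_id)
Code (level-wage checks; run without error in all 6 subgroups):
reg wage gender i.manager##i.year ... tenure, cluster(employee_id)
qreg wage gender i.manager##i.year ... tenure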