Dear All,
I am using the "oaxaca" command (ssc install oaxaca) in Stata 14.0 to understand the student test score gap between private and public schools (this is a secondary analysis at the end of the paper).
Up to the Oaxaca-Blinder step, I have always imputed the missing values (MV) in my explanatory variables with the mean of the variable in order to keep enough statistical power for the estimations. N goes from 2,480 students when MV are not imputed to 3,250 when MV are imputed, so the gap is substantial.
For the two-fold Oaxaca-Blinder decomposition, the use of categorical variables can bias the results because the results depend on the choice of the omitted category. Following Jann (2008), adding the option "categorical" to the "oaxaca" command, with the list of categorical variables, produces results that are not sensitive to the choice of the omitted category.
Everything works perfectly when I use the sample without imputing the MV, but this is not the sample of interest, as I lose many observations. However (and obviously), when I impute the MV with the mean of each variable (within each school type), the categorical variables are no longer categorical, and using the "categorical" option leads to a Stata error.
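To make the problem concrete, here is a minimal sketch of what the within-group mean imputation does to a dummy. It is written in Python rather than Stata, and the data, the variable names and the helper function are all made up for illustration; they are not from my actual dataset.

```python
from statistics import mean

# Hypothetical data: (school_type, dummy), where None marks a missing value.
rows = [
    ("private", 1), ("private", 0), ("private", 1), ("private", None),
    ("public", 0), ("public", 0), ("public", 1), ("public", None),
]

def impute_group_mean(rows):
    """Replace each missing value by the mean of the observed values
    within the same school type."""
    observed = {}
    for group, value in rows:
        if value is not None:
            observed.setdefault(group, []).append(value)
    group_means = {g: mean(v) for g, v in observed.items()}
    return [(g, group_means[g] if v is None else v) for g, v in rows]

imputed = impute_group_mean(rows)

# Values that are neither 0 nor 1 after imputation: the variable is
# no longer a proper dummy, which is what the "categorical" option needs.
non_binary = [v for _, v in imputed if v not in (0, 1)]
print(non_binary)
```

The imputed observations take a fractional value (the share of ones in their school type), so the variable stops being a 0/1 indicator.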
My question is therefore not so much about the command as about what you would do in this situation: could you run an Oaxaca-Blinder decomposition with the "oaxaca" command, treating the mean-imputed dummies as continuous variables, or do you necessarily have to stick to the smaller sample without imputing MV?
Thank you for any thoughts you could provide on this problem.