Oaxaca-Blinder decomposition, categorical option & MV imputation by the mean

Quent Dave

Join Date: Mar 2016

Posts: 15
#1


01 Jul 2020, 13:52

Dear All,

I am using the "oaxaca" command (ssc install oaxaca) in Stata 14.0 to understand the student test score gap between private and public schools (this is a secondary analysis at the end of the paper).

Up to the Oaxaca-Blinder step, I have always imputed the missing values (MV) in my explanatory variables with the mean of the variable in order to keep enough power for the estimations. N goes from 2,480 students when MV are not imputed to 3,250 when they are, so the difference is substantial.

For the two-fold Oaxaca-Blinder decomposition, the use of categorical variables can bias the results because they depend on the choice of the omitted category. Following Jann (2008), adding the "categorical" option to the "oaxaca" command, with the list of categorical variables, produces results that are not sensitive to the choice of the omitted category.
Everything works perfectly when I use the sample without imputing the MV, but this is not the sample of interest, as I lose many observations. However (and obviously), when I impute the MV with the mean of each variable (within each school type), the categorical variables are no longer categorical, and using the "categorical" option leads to a Stata error.
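
To make the problem concrete, here is a minimal sketch in Python (not Stata) of what mean imputation does to a 0/1 dummy. The variable and its values are hypothetical and only illustrate the arithmetic: the mean of a dummy is the share of ones among the observed cases, so every imputed observation receives a fractional value and the variable no longer codes a category.

```python
from statistics import mean

# Hypothetical 0/1 dummy with two missing values (None); illustrative data only
dummy = [1, 0, 0, 1, None, 0, None, 1]

# Mean of the observed values = share of ones among non-missing cases
observed = [v for v in dummy if v is not None]
fill = mean(observed)

# Mean imputation: replace each missing value with that share
imputed = [fill if v is None else v for v in dummy]

def is_binary(values):
    """Return True if every value is exactly 0 or 1."""
    return set(values) <= {0, 1}

print(is_binary(observed))  # the observed dummy is a valid 0/1 indicator
print(is_binary(imputed))   # the imputed one contains a fractional value
```

With a set of dummies coding one categorical variable, the same thing happens to each dummy, so the imputed rows no longer pick out exactly one category.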

My question is therefore not so much about the command as about what you would do in this situation: could you run an Oaxaca-Blinder decomposition with the "oaxaca" command, treating the mean-imputed dummies as continuous variables, or do you necessarily have to stick to the smaller sample without imputing the MV?

Thank you for any thoughts you could share on this issue.
FernandoRios

Join Date: Apr 2014

Posts: 2430
#2

01 Jul 2020, 19:41

Hi Dave
I think you need to rethink the imputation method.
Mean imputation, even for continuous variables, is not the best approach to handling missing data. You may want to look into the "mi" suite that Stata has for multiple imputation.
Once you do that, you may be able to work with the OB decomposition.
HTH
Fernando
Quent Dave

Join Date: Mar 2016

Posts: 15
#3

03 Jul 2020, 14:33

Hi Fernando,

Thank you for your answer. It is true that mean imputation might not be the best fit. I am going to look at the mi suite.