Performing Oaxaca-Blinder decomposition with different packages in Stata

Darius Griebenow

Join Date: Jun 2021

Posts: 3
#1

04 Jun 2021, 08:54

Hi all,

I apologize if these questions have already been answered elsewhere on the forum; I have read several threads but haven't found precise answers to all of them. I am building on a paper by Montecino and Epstein (2017), which combines RIF regressions with the Oaxaca-Blinder decomposition to study the effects of QE on income inequality in the US. In this case, the two groups are the panel at t=0 and the panel at t=1. Interestingly, the authors mention that they perform the decomposition algebraically (I am not sure how they calculate the SEs algebraically).

I have used three packages in Stata so far: oaxaca (Jann), oaxaca_rif (Rios-Avila) and rifhdreg (Rios-Avila). When using oaxaca, I first generated the RIF of my Y using rifvar (Rios-Avila) and then used it as the dependent variable in oaxaca. With oaxaca_rif, of course, the package does all of this in one step. For comparison, I also ran a regression for each time period using rifhdreg and combined the estimated coefficients with the covariate means in each period to calculate the endowments and coefficients effects by hand. However, the estimates from this algebraic approach do not come close to those produced by oaxaca or oaxaca_rif, which also differ from each other. Does anyone know what the reason for this might be?

My other question concerns categorical variables. My dataset has continuous variables that have been logged, dummies coded 0 for no and 1 for yes, and two categorical variables coded 0 1 2 and 1 2 3 4 5 7. I first used xi and specified that category 1 be omitted, which worked. However, when I list these variables in the categorical() option of oaxaca, to keep the choice of base category from affecting the results, I receive an error:
_error(): 3300 argument out of range
oaxaca_normalize(): - function returned error
<istmt>: - function returned error

In the case of oaxaca_rif, I see there is a normalize option, but I am not sure whether it applies to categorical variables (I could be mixing up my terminology with standardize).

Thanks in advance for any help, it is much appreciated. Apologies again if any of this has been asked already.

Best,
Darius

Link to the Montecino and Epstein paper: https://www.cepweb.org/wp-content/up...cino-paper.pdf

Last edited by Darius Griebenow; 04 Jun 2021, 09:23.
Tags: categorical, oaxaca, panel, RIF
FernandoRios

Join Date: Apr 2014

Posts: 2469
#2

04 Jun 2021, 11:53

Hi Darius
It seems to me that the difference between what you are getting with "oaxaca" and "oaxaca_rif" is due to sample size differences.
Because "oaxaca_rif" does everything in one run, it takes care of the sample selection correctly. If you instead create the RIF with rifvar first, small differences may appear (as you describe).
The differences with rifhdreg are more of a mystery to me. If you can post the code you are using (or shoot me an email for detailed troubleshooting), I'll be happy to see why the results are not matching.

Regarding normalize and categorical, I think they both work in the same way. I suspect that one of the options was kept from a previous version of -oaxaca- Jann had.

So, if you provide the code you are typing to compare across models, it would help to see where the differences are coming from.
HTH
Fernando
Darius Griebenow

Join Date: Jun 2021

Posts: 3
#3

04 Jun 2021, 16:00

Hi Fernando,
First of all, thanks a lot for the speedy reply. I've attached a do-file with the commands I used for this brief example. For simplicity, I compared the variable "OwnStock" across the two methods, with everything done at q=80.

For rifhdreg, I get coefficients of 0.54 and 0.62 at t=0 and t=1, respectively. Combined with means of 0.18 and 0.14 at t=0 and t=1, respectively, I calculate the coefficients part of the decomposition as (0.62-0.54)*0.18 = 0.0144 and the endowments part as (0.14-0.18)*0.54 = -0.0216.

In the oaxaca_rif output, however, I get the opposite signs: endowments are positive and coefficients are negative. I presume this has something to do with the "swap" option, which I then added at the end; with it the signs were right, but the numbers were still a bit off. I can provide data if that would make it easier for you to understand! Since I didn't use normalize or categorical in either method, I don't think those will make much of a difference. Thanks a lot again for taking the time.
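
For reference, the by-hand numbers above can be laid out as a threefold decomposition for the single covariate OwnStock. The symbols are introduced here for illustration only: \beta_t stands for the rifhdreg coefficient and \bar{x}_t for the mean in period t, using the rounded values quoted in this post. On that reading, the two products computed above leave out an interaction term, so they would not by themselves add up to the total contribution of the variable:

```latex
\begin{align*}
\underbrace{\bar{x}_1\beta_1 - \bar{x}_0\beta_0}_{\text{total}}
  &= \underbrace{(\bar{x}_1-\bar{x}_0)\,\beta_0}_{\text{endowments}}
   + \underbrace{(\beta_1-\beta_0)\,\bar{x}_0}_{\text{coefficients}}
   + \underbrace{(\bar{x}_1-\bar{x}_0)(\beta_1-\beta_0)}_{\text{interaction}} \\
0.14(0.62) - 0.18(0.54)
  &= (-0.04)(0.54) + (0.08)(0.18) + (-0.04)(0.08) \\
-0.0104 &= -0.0216 + 0.0144 - 0.0032
\end{align*}
```

A twofold decomposition, by contrast, folds the interaction into one of its two parts, so its explained and unexplained pieces need not equal the threefold endowments and coefficients terms.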
Have a great weekend,
Darius
Attached Files

Example.do (919 Bytes, 1 view)

FernandoRios

Join Date: Apr 2014

Posts: 2469
#4

04 Jun 2021, 19:39

I wonder if the differences you are observing are simply due to random errors.
Here is a small example (which you may be able to adapt to your case) that replicates the results using rifhdreg, oaxaca_rif and oaxaca.

Let me know if it works for you.

Code:

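use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta, clear
 drop if lnwage==.
 * constant, so the vector of means lines up with e(b), which ends in _cons
 gen cn=1
 * RIF of lnwage at the 80th percentile, computed separately by group
 egen rifvar = rifvar(lnwage), q(80) by(female)
 * group-specific RIF regressions; store coefficients as column vectors
 reg rifvar educ exper tenure if female==1
 matrix b1=e(b)'
 reg rifvar educ exper tenure if female==0
 matrix b0=e(b)'
 
 * group-specific covariate means (cn stands in for the constant)
 mean educ exper tenure  cn if female==1
 matrix x1=e(b)'
 mean educ exper tenure  cn if female==0
 matrix x0=e(b)'
 
 * copy the Stata matrices into Mata
 mata:
 b1=st_matrix("b1")
 b0=st_matrix("b0")
 x1=st_matrix("x1")
 x0=st_matrix("x0")
 end
 * coefficients side by side (female==1, female==0)
 mata:(b1,b0)
 * explained part, variable by variable: difference in means times b0
 mata:(x1:-x0):*b0
 * unexplained part, variable by variable: difference in coefficients times x1
 mata:(b1:-b0):*x1
 
 * the same decomposition with the packaged commands
 oaxaca_rif lnwage educ exper tenure, by(female) w(1) rif(q(80))
 oaxaca     rifvar educ exper tenure, by(female) w(1)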
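
In equation form, the two Mata products above correspond to a twofold decomposition that uses the female==0 coefficients as the reference. The notation is introduced here only for illustration: subscripts 1 and 0 refer to female==1 and female==0, and k indexes the regressors including the constant.

```latex
\begin{align*}
\text{explained}_k   &= (\bar{x}_{1k} - \bar{x}_{0k})\, b_{0k} \\
\text{unexplained}_k &= (b_{1k} - b_{0k})\, \bar{x}_{1k} \\
\sum_k \left(\text{explained}_k + \text{unexplained}_k\right)
  &= \bar{x}_1' b_1 - \bar{x}_0' b_0
\end{align*}
```

Because each regression includes a constant, the right-hand side of the last line equals the difference in mean RIF between female==1 and female==0. Note the direction of that difference: if I read the oaxaca help file correctly, the command by default reports the group with the lower by() value minus the other, so its signs may come out reversed relative to these products unless swap is specified.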


Darius Griebenow

Join Date: Jun 2021

Posts: 3
#5

05 Jun 2021, 03:20

Hi Fernando,

Some interesting findings. Again, to simplify and compare a single variable, I use OwnStock. In this case, the matrix calculation returns 0.008 for the coefficients part and -0.013 for the endowments part.

Using oaxaca together with the generated 80th-quantile RIF variable, I first had to add swap for the signs to at least match. Once I did that, the explained part for the variable was -0.026 and the unexplained part 0.014, so the direction is right but the magnitudes are larger. For oaxaca_rif, again after using swap, I get very comparable results to oaxaca: -0.027 for the explained part and 0.015 for the unexplained part.

Super interesting exercise. My only guess is that the w(1) setting for the counterfactual scenario could be influencing the results relative to the regression-plus-matrix calculation. It could also be, as you said, random errors. In any case, I am happy that the oaxaca and oaxaca_rif packages produce pretty much the same results. Your help is much appreciated!
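
On the w(1) guess: a small self-contained Mata check, using the rounded OwnStock numbers from post #3 rather than real output, suggests that the choice of reference coefficients alone can move the split noticeably while leaving the total unchanged. The names b0, b1, x0 and x1 are made up for this sketch, and which group oaxaca treats as group 1 (and hence which coefficients w(1) picks, with or without swap) should be checked against the help file.

```stata
mata:
// Rounded values quoted in post #3 (OwnStock, q = 80); illustration only
b0 = 0.54; b1 = 0.62      // RIF coefficients at t=0 and t=1
x0 = 0.18; x1 = 0.14      // covariate means at t=0 and t=1

// Twofold split with the t=0 coefficients as reference: (explained, unexplained)
(x1 - x0)*b0, (b1 - b0)*x1

// Twofold split with the t=1 coefficients as reference: (explained, unexplained)
(x1 - x0)*b1, (b1 - b0)*x0

// Total contribution of the variable; both splits above sum to this
x1*b1 - x0*b0
end
```

With these inputs the first split works out to -0.0216 and 0.0112, the second to -0.0248 and 0.0144, and both add up to -0.0104. So a gap of this kind between a hand calculation and a packaged result does not have to be random error; it can simply reflect a different reference group.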

Best,
Darius
