Oaxaca decomposition, Blinder-Oaxaca decomposition, manual syntax

Chiara Tasselli

Join Date: Feb 2021

Posts: 111
#1

08 Nov 2022, 03:29

Good morning to everyone,
I need to perform the Blinder-Oaxaca decomposition manually in Stata (that is, without using the command oaxaca depvar regressors).
Could someone kindly guide me through the manual procedure? Can you suggest any literature? I assume there is an established procedure that was used before the oaxaca command became available for Stata.

Many thanks in advance for your time
Tags: blinder-oaxaca decomposit, manual process, oaxaca decomposition
Andrew Musau

Join Date: Oct 2014

Posts: 10191
#2

08 Nov 2022, 04:23

I do not know why you would want to do this, but the formula and procedure are outlined here: https://en.wikipedia.org/wiki/Blinde..._decomposition
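
For readers who cannot follow the link, the twofold decomposition is usually written as below. This is only a sketch in generic notation: the group labels A and B and the symbols are assumptions made for illustration, not taken from the thread, and group A's coefficients are used as the reference. The first line relies on each regression including a constant, so that the fitted line passes through the group means.

```latex
% Group-wise linear models, g \in \{A, B\}; each X_g includes a constant:
%   Y_g = X_g' \beta_g + \varepsilon_g, \qquad E[\varepsilon_g] = 0
\begin{align}
\bar{Y}_A - \bar{Y}_B
  &= \bar{X}_A'\hat\beta_A - \bar{X}_B'\hat\beta_B \\
  &= \underbrace{(\bar{X}_A - \bar{X}_B)'\hat\beta_A}_{\text{explained}}
   + \underbrace{\bar{X}_B'(\hat\beta_A - \hat\beta_B)}_{\text{unexplained}}
\end{align}
```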

Hemanshu Kumar

Join Date: Mar 2015
Posts: 1396
#3

08 Nov 2022, 07:30

Chiara Tasselli, here is some code I created for pedagogical purposes a long while back. I present it as is, without warranties:

Code:

use wage2, clear

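* Group means
* (row 1 of r(table) holds the estimates; the columns are lwage and educ)
mean lwage educ if black == 0
mat whitemeans = r(table)
mean lwage educ if black == 1
mat blackmeans = r(table)

* Group-wise regressions
* (row 1 of r(table) holds the coefficients; the columns are educ and _cons)
regress lwage educ if black == 0
mat whitereg = r(table)
regress lwage educ if black == 1
mat blackreg = r(table)

* compute the Oaxaca-Blinder decomposition manually

* raw gaps (white minus black) in mean log wage, mean education, intercept and slope
local diff_y = whitemeans[1,1] - blackmeans[1,1]
local diff_x = whitemeans[1,2] - blackmeans[1,2]
local diff_cons = whitereg[1,2] - blackreg[1,2]
local diff_beta = whitereg[1,1] - blackreg[1,1]

* explained part: gap in mean education valued at the white slope
local diff_explained = whitereg[1,1]*`diff_x'
* unexplained part: gap in intercepts plus gap in slopes evaluated at black mean education
local diff_xreturn = blackmeans[1,2]*`diff_beta'
local diff_unexplained = `diff_cons' + `diff_xreturn'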

dis "Average log wage for whites is " whitemeans[1,1]
dis "Average log wage for blacks is " blackmeans[1,1]
dis "The overall difference in log wages `diff_y'"
dis "The overall explained difference is `diff_explained'"
dis "The overall unexplained difference is `diff_unexplained'"
dis "The difference due to intercept terms is `diff_cons'"
dis "The difference due to differential return to education is `diff_xreturn'"

* use the canned command "oaxaca"
* the first time, install this using "ssc install oaxaca"
* replicate the results above: using the white wage as the "true" wage (in the absence of discrimination)
oaxaca lwage educ, by(black) weight(1)

The output is:

Code:

. dis "Average log wage for whites is " whitemeans[1,1]
Average log wage for whites is 6.8164865

. dis "Average log wage for blacks is " blackmeans[1,1]
Average log wage for blacks is 6.5244342

. dis "The overall difference in log wages `diff_y'"
The overall difference in log wages .2920522740038383

. dis "The overall explained difference is `diff_explained'"
The overall explained difference is .0664727808480334

. dis "The overall unexplained difference is `diff_unexplained'"
The overall unexplained difference is .2255794931558051

. dis "The difference due to intercept terms is `diff_cons'"
The difference due to intercept terms is -.204512677425341

. dis "The difference due to differential return to education is `diff_xreturn'"
The difference due to differential return to education is .4300921705811461

.
. * use the canned command "oaxaca"
. * the first time, install this using "ssc install oaxaca"
. * replicate the results above: using the white wage as the "true" wage (in the absence of discriminati
> on)
. oaxaca lwage educ, by(black) weight(1)

Blinder-Oaxaca decomposition                               Number of obs = 935
                                                  Model           =     linear
Group 1: black = 0                                N of obs 1      =        815
Group 2: black = 1                                N of obs 2      =        120

------------------------------------------------------------------------------
       lwage | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
overall      |
     group_1 |   6.816486   .0144449   471.90   0.000     6.788175    6.844798
     group_2 |   6.524434   .0361108   180.68   0.000     6.453658     6.59521
  difference |   .2920523   .0388927     7.51   0.000      .215824    .3682806
   explained |   .0664728   .0123665     5.38   0.000      .042235    .0907106
 unexplained |   .2255795   .0395604     5.70   0.000     .1480426    .3031164
-------------+----------------------------------------------------------------
explained    |
        educ |   .0664728   .0123665     5.38   0.000      .042235    .0907106
-------------+----------------------------------------------------------------
unexplained  |
        educ |   .4300922   .2697041     1.59   0.111    -.0985182    .9587025
       _cons |  -.2045127   .2745468    -0.74   0.456    -.7426146    .3335892
------------------------------------------------------------------------------

The code uses the Wooldridge instructional dataset wage2.dta available from here: https://econpapers.repec.org/paper/bocbocins/wage2.htm
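
As a quick check on the output above, the pieces add up the way the code implies. The figures are copied from the displayed results (rounded to seven decimals); the symbols are shorthand introduced here for illustration, with w for whites and b for blacks, alpha for the intercept and beta for the coefficient on educ.

```latex
\begin{align}
\text{explained}
  &= \hat\beta_w(\bar{x}_w - \bar{x}_b) = 0.0664728 \\
\text{unexplained}
  &= (\hat\alpha_w - \hat\alpha_b) + \bar{x}_b(\hat\beta_w - \hat\beta_b)
   = -0.2045127 + 0.4300922 = 0.2255795 \\
\bar{y}_w - \bar{y}_b
  &= 6.8164865 - 6.5244342 = 0.2920523
   = 0.0664728 + 0.2255795
\end{align}
```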

Last edited by Hemanshu Kumar; 08 Nov 2022, 07:38.

Hemanshu Kumar

Join Date: Mar 2015

Posts: 1396
#4

08 Nov 2022, 07:34

Andrew Musau I don't know about OP, but I like to do these exercises in a pedagogical context -- I like my students to be able to do this stuff "by hand", avoiding completely canned commands.
Andrew Musau

Join Date: Oct 2014

Posts: 10191
#5

08 Nov 2022, 08:23

Originally posted by Hemanshu Kumar

Andrew Musau I don't know about OP, but I like to do these exercises in a pedagogical context -- I like my students to be able to do this stuff "by hand", avoiding completely canned commands.

For instructional purposes, that's fine if it aims to enhance understanding.

FernandoRios

Join Date: Apr 2014
Posts: 2469
#6

08 Nov 2022, 14:13

Hi Chiara,
Something else that may be "easy" to do, if you also want to give them an intro to Mata:
If you are doing this "live", it is also easier to show them where everything goes and how it changes.

Code:

ssc install frause
frause oaxaca, clear
drop if lnwage==.
gen fem_s=female==1
gen male_s=female==0

mata
// Females
y0 = st_data(.,"lnwage","fem_s")
x0 = st_data(.,"educ exper tenure age","fem_s"), J(rows(y0),1,1)
// males
y1 = st_data(.,"lnwage","male_s")
x1 = st_data(.,"educ exper tenure age","male_s"), J(rows(y1),1,1)
// Betas (OLS coefficients; the constant is the last element)
b0 = invsym(x0'*x0)*x0'y0
b1 = invsym(x1'*x1)*x1'y1
// mean Characteristics
mean_x0 = mean(x0)'
mean_x1 = mean(x1)'
// Raw aggregate difference in mean outcomes (males minus females)
mean(y1)-mean(y0)
// the same difference recovered from the betas
mean(x1*b1)-mean(x0*b0)
sum(mean_x1:*b1)-sum(mean_x0:*b0)

// Decomposition::Coefficients. Using mean_x1 as base
// total
sum( mean_x1:*(b1-b0) )
// detailed
mean_x1:*(b1-b0)

// Decomposition::Characteristics. Using b0 as base
// total
sum( (mean_x1-mean_x0):*b0 )
// detailed
(mean_x1-mean_x0):*b0
// The alternative decomposition uses mean_x0 as base for the coefficients and b1 for the characteristics
end
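
In matrix notation, the Mata steps above amount to roughly the following sketch. The subscripts 0 (females) and 1 (males) mirror the variable names in the code. The last line writes out the alternative decomposition mentioned in the final comment, and is an assumption about what was meant there.

```latex
\begin{align}
\hat b_g &= (X_g'X_g)^{-1}X_g'y_g, \qquad g \in \{0, 1\} \\
\bar{y}_1 - \bar{y}_0
  &= \underbrace{\bar{x}_1'(\hat b_1 - \hat b_0)}_{\text{coefficients}}
   + \underbrace{(\bar{x}_1 - \bar{x}_0)'\hat b_0}_{\text{characteristics}} \\
  &= \underbrace{\bar{x}_0'(\hat b_1 - \hat b_0)}_{\text{coefficients}}
   + \underbrace{(\bar{x}_1 - \bar{x}_0)'\hat b_1}_{\text{characteristics}}
\end{align}
```

Both splits sum to the same raw gap; they differ only in which group's means and coefficients serve as the base.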

HTH
