Analyzing Blinder-Oaxaca decomposition

Anthon Balthon

Join Date: Dec 2018

Posts: 7
#1

25 Dec 2018, 05:14

Hello,

I am fairly new to Stata but have to run a Blinder-Oaxaca decomposition. I have been reading the documentation and various sources on the Internet, but unfortunately I quickly felt lost. I also found this resource (https://www.stata-journal.com/sjpdf....iclenum=st0151) but have not yet been able to apply it to my research question.

In short, I want to explain the difference in the means of the dependent variable HAPPINESS between two pairs of groups {(2010, 2011), (2011, 2012)}, by investigating the effect of four independent variables on this dependent variable.

Dependent variable:
- HAPPINESS: ordinal variable with four levels (very unhappy, unhappy, happy, very happy)

Independent variables:
1. GENDER: nominal variable (1: male, 2: female)
2. AGE: scale from 0 to 100
3. EDUCATION: highest level of education as ordinal variable with three levels (low, middle, high)
4. EMPLOYED: boolean variable (1: is employed, 0: not employed)

Is there anyone who could give me some pointers to help me on the right track, such that I can run the analyses in Stata and interpret the results?

Sincerely,
Anthon

Anthon Balthon

Join Date: Dec 2018
Posts: 7

25 Dec 2018, 07:38

Hello,
I am replying to my own post because I want to clarify my data. Perhaps the current structure of the data prevents me from running the oaxaca analysis.

My dataset looks like the following:

ID	GENDER_10	AGE_10	EDUCATION_10	EMPLOYED_10	HAPPINESS_10	GENDER_11	AGE_11	EDUCATION_11	EMPLOYED_11	HAPPINESS_11	GENDER_12	AGE_12	EDUCATION_12	EMPLOYED_12	HAPPINESS_12
1	1	23	2	1	3	1	24	2	1	2
2	2	19	1	0	3	2	20	1	1	3
3						1	31	2	0	2	1	32	3	1	2
4						1	28	3	1	3	1	29	4	1	4

Here, the groups are defined by time of measurement, with the pairs: {(2010, 2011), (2011, 2012)}. There is no single variable that keeps track of the groups!
Second, the predictor and outcome variables are stored separately for each wave. The 2010 wave has its own predictors and outcome, and so do the 2011 and 2012 waves.

Therefore, I feel unsure how to perform the oaxaca command using the following syntax:

oaxaca depvar [indepvars] [if] [in] [weight], by(groupvar) swap
    detail[(dlist)] adjust(varlist) threefold[(reverse)] weight(# [# ...])
    pooled[(model_opts)] omega[(model_opts)] reference(name) split
    x1(names_and_values) x2(names_and_values) categorical(clist)
    svy[([vcetype] [, svy_options])] vce(vcetype) cluster(varname)
    fixed[(varlist)] [no]suest nose model1(model_opts) model2(model_opts)


David Benson

Join Date: Oct 2018
Posts: 489

25 Dec 2018, 20:58

Hi Anthon, and welcome to Statalist!

I haven't used the Oaxaca decomposition method (although there are a number of posts on this site about it). However, if the _10, _11, and _12 variables above are for 2010, 2011, and 2012, then you will probably need to reshape the data from wide format (how it is currently) to long. Hopefully some of this can help get you started.

You might also take a look at this post here

Also, it is far easier for others to help you if you use Stata's dataex command to share example data (to install it, type ssc install dataex):

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(id gender_10 age_10 education_10 employed_10 happiness_10 gender_11 age_11 education_11 employed_11 happiness_11 gender_12 age_12 education_12 employed_12 happiness_12)
1 1 23 2 1 3 1 24 2 1 2 .  . . . .
2 2 19 1 0 3 2 20 1 1 3 .  . . . .
3 .  . . . . 1 31 2 0 2 1 32 3 1 2
4 .  . . . . 1 28 3 1 3 1 29 4 1 4
end

Code:

reshape long gender_ age_ education_ employed_ happiness_, i(id) j(year)  // reshaping to long
rename *_ *   // this removes the "_" at the end of the variables (i.e. gender_ becomes gender)
recode year (10 = 2010) (11 = 2011) (12 = 2012)  // converting years to 2010, 2011, 2012.  Could have also done replace year = year + 2000
bysort id (year): gen wave=1 if gender[1] !=. & gender[2]!=.   // setting wave==1 if person has data for 2010 and 2011
bysort id (year): replace wave=2 if gender[2] !=. & gender[3]!=.
label var wave "1 if in (2010, 2011), 2 if in (2011, 2012)"

. list id year wave gender age education employed happiness, sepby(id) abbrev(12) noobs

  +--------------------------------------------------------------------+
  | id   year   wave   gender   age   education   employed   happiness |
  |--------------------------------------------------------------------|
  |  1   2010      1        1    23           2          1           3 |
  |  1   2011      1        1    24           2          1           2 |
  |  1   2012      1        .     .           .          .           . |
  |--------------------------------------------------------------------|
  |  2   2010      1        2    19           1          0           3 |
  |  2   2011      1        2    20           1          1           3 |
  |  2   2012      1        .     .           .          .           . |
  |--------------------------------------------------------------------|
  |  3   2010      2        .     .           .          .           . |
  |  3   2011      2        1    31           2          0           2 |
  |  3   2012      2        1    32           3          1           2 |
  |--------------------------------------------------------------------|
  |  4   2010      2        .     .           .          .           . |
  |  4   2011      2        1    28           3          1           3 |
  |  4   2012      2        1    29           4          1           4 |
  +--------------------------------------------------------------------+
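Once the data are in long form, one possible next step (an untested sketch, not a recommendation) is to restrict the sample to one pair of years with an if qualifier and use year itself as the group variable, since by() expects a variable that identifies two groups in the estimation sample. This assumes the user-written oaxaca command is installed (ssc install oaxaca) and that the variables are named as in the reshape above. Treating happiness as continuous here is an assumption made only for illustration, and only age and employed are included because gender and education would first have to be entered as indicator variables. cluster(id) is added because the same people can appear in both years of a pair.

```stata
* Sketch only: assumes the long data created above and -ssc install oaxaca-
* Pair (2010, 2011): year identifies the two groups within the restricted sample
oaxaca happiness age employed if inlist(year, 2010, 2011), by(year) cluster(id)

* Pair (2011, 2012)
oaxaca happiness age employed if inlist(year, 2011, 2012), by(year) cluster(id)
```

With this setup the wave variable is not strictly needed for the decomposition itself; it is mainly useful for checking which respondents contribute to which pair.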

Last edited by David Benson; 25 Dec 2018, 21:04.


Anthon Balthon

Join Date: Dec 2018

Posts: 7
#4

26 Dec 2018, 03:37

Hello David,
Thank you for your reply. I appreciate your comment!
I'm a bit confused about the variable "wave". What is the purpose of this variable, given that it is set to 1 if the person has data for 2010 and 2011? In that case, what would be the value of "wave" if the person only entered the study in 2011 and participated in both 2011 and 2012, but not in 2010?

Many thanks!

David Benson

Join Date: Oct 2018

Posts: 489
#5

26 Dec 2018, 13:04

Wave was supposed to help you capture the pairs {(2010, 2011), (2011, 2012)} that you mention above. Wave==1 for (2010, 2011), wave==2 for (2011, 2012).

Sven-Kristjan Bormann

Join Date: Jul 2018

Posts: 310
#6

26 Dec 2018, 13:34

I am a bit sceptical whether Blinder-Oaxaca decomposition is the right tool for your question. In the beginning, I would run something like ordered probit or ordered logit with happiness as your dependent variable and the remaining variables as explanatory variables + an indicator variable for the wave. If the parameter for the indicator is significant, then you can investigate further. It could also help to simply tabulate for the two groups and then look for remarkable differences. Or you could run two separate regressions and compare the parameter estimates.
Please note also that your dependent variable is ordinally scaled! The oaxaca command does not support decomposition for ordinally scaled variables, and you don't have enough levels to treat your happiness variable as continuous. Therefore, you should rather use other decomposition commands and methods, or think about a different approach. Strictly speaking, you can also only talk about medians, not means, with regard to your dependent variable.
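As a rough sketch of the first steps suggested above, assuming the data have already been reshaped to long form with variables id, year, gender, age, education, employed, and happiness (names taken from the reshape earlier in this thread), the tabulation and the ordered logit for the first pair of years could look like this. The clustering on id is an assumption added here because the same respondents are observed in both years.

```stata
* Sketch only: assumes the long data from the reshape earlier in the thread
* Compare the distribution of happiness across the two years of the first pair
tabulate happiness year if inlist(year, 2010, 2011), column

* Ordered logit with a year indicator; standard errors clustered on person
ologit happiness i.gender age i.education i.employed i.year ///
    if inlist(year, 2010, 2011), vce(cluster id)
```

The same two commands with inlist(year, 2011, 2012) would cover the second pair. If the coefficient on the year indicator is clearly different from zero, that is the signal to investigate further.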