  • Oaxaca decomposition, Blinder-Oaxaca decomposition, manual syntax

    Good morning, everyone,
    I need to perform the Blinder-Oaxaca decomposition manually in Stata (that is, without the command oaxaca depvar regressors).
    Could someone kindly guide me through the manual procedure? Do you have any suggestions for literature? I am sure there is already a validated procedure that was used before the introduction of the automatic command in Stata.

    Many thanks in advance for your time

  • #2
    I do not know why you would want to do this, but the formula and procedure are outlined here: https://en.wikipedia.org/wiki/Blinde..._decomposition
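
    For reference, a sketch of the twofold form of the decomposition. The notation is an assumption made here for illustration: A and B are the two groups, \bar{y} and \bar{X} are group means (with \bar{X} including a 1 for the intercept), \hat{\beta} are the group-wise OLS coefficients, and group A's coefficients serve as the reference.

    ```latex
    \bar{y}_A - \bar{y}_B
      = \underbrace{(\bar{X}_A - \bar{X}_B)'\hat{\beta}_A}_{\text{explained}}
      + \underbrace{\bar{X}_B'(\hat{\beta}_A - \hat{\beta}_B)}_{\text{unexplained}}
    ```

    The identity holds because an OLS fit with an intercept passes through the group means, so \bar{y}_g = \bar{X}_g'\hat{\beta}_g for each group g. Swapping the roles of A and B in the weighting gives the alternative decomposition.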

    • #3
      Chiara Tasselli, here is some code I wrote for teaching purposes a long while back. I present it as-is, without warranties:

      Code:
      use wage2, clear
      
      * Group means
      
      mean lwage educ if black == 0
      mat whitemeans = r(table)
      mean lwage educ if black == 1
      mat blackmeans = r(table)
          
      
      * Group-wise regressions
      regress lwage educ if black == 0
      mat whitereg = r(table)
      regress lwage educ if black == 1
      mat blackreg = r(table)
      
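      * Compute the Oaxaca-Blinder decomposition manually
      * Row 1 of r(table) holds the point estimates.
      * After mean:    column 1 = lwage, column 2 = educ
      * After regress: column 1 = slope on educ, column 2 = _cons
      
      local diff_y = whitemeans[1,1] - blackmeans[1,1]
      local diff_x = whitemeans[1,2] - blackmeans[1,2]
      local diff_cons = whitereg[1,2] - blackreg[1,2]
      local diff_beta = whitereg[1,1] - blackreg[1,1]
      
      * Explained part: gap in mean education, valued at the white return
      local diff_explained = whitereg[1,1]*`diff_x'
      * Unexplained part: gap in returns valued at black mean education, plus the intercept gap
      local diff_xreturn = blackmeans[1,2]*`diff_beta'
      local diff_unexplained = `diff_cons' + `diff_xreturn'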
      
      dis "Average log wage for whites is " whitemeans[1,1]
      dis "Average log wage for blacks is " blackmeans[1,1]
      dis "The overall difference in log wages `diff_y'"
      dis "The overall explained difference is `diff_explained'"
      dis "The overall unexplained difference is `diff_unexplained'"
      dis "The difference due to intercept terms is `diff_cons'"
      dis "The difference due to differential return to education is `diff_xreturn'"
      
      * use the canned command "oaxaca"
      * the first time, install this using "ssc install oaxaca"
      * replicate the results above: using the white wage as the "true" wage (in the absence of discrimination)
      oaxaca lwage educ, by(black) weight(1)
      The output is:
      Code:
      . dis "Average log wage for whites is " whitemeans[1,1]
      Average log wage for whites is 6.8164865
      
      . dis "Average log wage for blacks is " blackmeans[1,1]
      Average log wage for blacks is 6.5244342
      
      . dis "The overall difference in log wages `diff_y'"
      The overall difference in log wages .2920522740038383
      
      . dis "The overall explained difference is `diff_explained'"
      The overall explained difference is .0664727808480334
      
      . dis "The overall unexplained difference is `diff_unexplained'"
      The overall unexplained difference is .2255794931558051
      
      . dis "The difference due to intercept terms is `diff_cons'"
      The difference due to intercept terms is -.204512677425341
      
      . dis "The difference due to differential return to education is `diff_xreturn'"
      The difference due to differential return to education is .4300921705811461
      
      .
      . * use the canned command "oaxaca"
      . * the first time, install this using "ssc install oaxaca"
      . * replicate the results above: using the white wage as the "true" wage (in the absence of discriminati
      > on)
      . oaxaca lwage educ, by(black) weight(1)
      
      Blinder-Oaxaca decomposition                               Number of obs = 935
                                                        Model           =     linear
      Group 1: black = 0                                N of obs 1      =        815
      Group 2: black = 1                                N of obs 2      =        120
      
      ------------------------------------------------------------------------------
             lwage | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
      -------------+----------------------------------------------------------------
      overall      |
           group_1 |   6.816486   .0144449   471.90   0.000     6.788175    6.844798
           group_2 |   6.524434   .0361108   180.68   0.000     6.453658     6.59521
        difference |   .2920523   .0388927     7.51   0.000      .215824    .3682806
         explained |   .0664728   .0123665     5.38   0.000      .042235    .0907106
       unexplained |   .2255795   .0395604     5.70   0.000     .1480426    .3031164
      -------------+----------------------------------------------------------------
      explained    |
              educ |   .0664728   .0123665     5.38   0.000      .042235    .0907106
      -------------+----------------------------------------------------------------
      unexplained  |
              educ |   .4300922   .2697041     1.59   0.111    -.0985182    .9587025
             _cons |  -.2045127   .2745468    -0.74   0.456    -.7426146    .3335892
      ------------------------------------------------------------------------------
      The code uses the Wooldridge instructional dataset wage2.dta available from here: https://econpapers.repec.org/paper/bocbocins/wage2.htm
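
      The same logic can be extended to several regressors with matrix algebra. What follows is only a sketch: the regressors exper and tenure are added as an assumption for illustration, and the matrix names (b_w, xbar_w and so on) are hypothetical. As above, the white coefficients are used as the reference.

      ```stata
      use wage2, clear
      * Assumption: this regressor list is chosen only for illustration
      local xvars educ exper tenure

      * Whites: coefficients, then covariate means on the estimation sample
      regress lwage `xvars' if black == 0
      matrix b_w = e(b)
      mean `xvars' if e(sample)
      * append a 1 so the mean vector lines up with _cons (last column of e(b))
      matrix xbar_w = e(b), J(1,1,1)

      * Blacks: same steps
      regress lwage `xvars' if black == 1
      matrix b_b = e(b)
      mean `xvars' if e(sample)
      matrix xbar_b = e(b), J(1,1,1)

      * Explained: gap in mean characteristics, valued at white coefficients
      matrix explained = (xbar_w - xbar_b) * b_w'
      * Unexplained: gap in coefficients (incl. intercept), valued at black means
      matrix unexplained = xbar_b * (b_w - b_b)'

      display "Explained:   " explained[1,1]
      display "Unexplained: " unexplained[1,1]
      display "Total gap:   " explained[1,1] + unexplained[1,1]

      * Cross-check against the canned command
      oaxaca lwage `xvars', by(black) weight(1)
      ```

      The explained and unexplained totals should agree with the corresponding rows of the oaxaca output, up to rounding.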
      Last edited by Hemanshu Kumar; 08 Nov 2022, 07:38.

      • #4
        Andrew Musau I don't know about OP, but I like to do these exercises in a pedagogical context -- I like my students to be able to do this stuff "by hand", avoiding completely canned commands.

        • #5
          Originally posted by Hemanshu Kumar:
          Andrew Musau I don't know about OP, but I like to do these exercises in a pedagogical context -- I like my students to be able to do this stuff "by hand", avoiding completely canned commands.
          For instructional purposes, that's fine if it aims to enhance understanding.

          • #6
            Hi Chiara,
            Here is something else that may be "easy" to do if you also want to give them an intro to Mata.
            If you are doing this "live", it is also easier to show them where everything goes and how it changes.

            Code:
            ssc install frause
            frause oaxaca, clear
            drop if lnwage==.
            gen fem_s=female==1
            gen male_s=female==0
            
            mata
            // Females
            y0 = st_data(.,"lnwage","fem_s")
            x0 = st_data(.,"educ exper tenure age","fem_s"), J(rows(y0),1,1)
            // males
            y1 = st_data(.,"lnwage","male_s")
            x1 = st_data(.,"educ exper tenure age","male_s"), J(rows(y1),1,1)
            // Betas (OLS by group, constant last)
            b0 = invsym(x0'*x0)*x0'*y0
            b1 = invsym(x1'*x1)*x1'*y1
            // Mean characteristics (column vectors)
            mean_x0 = mean(x0)'
            mean_x1 = mean(x1)'
            // Raw aggregate difference (males minus females)
            mean(y1)-mean(y0)
            // The same difference, reproduced from the betas
            mean(x1*b1)-mean(x0*b0)
            sum(mean_x1:*b1)-sum(mean_x0:*b0)
            
            // Decomposition::Coefficients. Using mean_x1 as base
            // total
            sum( mean_x1:*(b1-b0) )
            // detailed
            mean_x1:*(b1-b0)
            
            // Decomposition::Characteristics. Using b0 as base
            // total
            sum( (mean_x1-mean_x0):*b0 )
            // detailed
            (mean_x1-mean_x0):*b0 
            // This can also be estimated using the alternative decomposition
            // (mean_x0 as base for coefficients, b1 for characteristics)
            end
            HTH
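
            A possible sketch of the alternative decomposition mentioned in the last comment. It assumes the Mata objects b0, b1, mean_x0, mean_x1, y0 and y1 created above are still in memory; the names coef_alt and char_alt are hypothetical. The oaxaca call at the end is meant as a cross-check and assumes the command has been installed with "ssc install oaxaca".

            ```stata
            mata
            // Coefficients part, using mean_x0 as base
            coef_alt = sum( mean_x0:*(b1-b0) )
            // Characteristics part, valued at b1
            char_alt = sum( (mean_x1-mean_x0):*b1 )
            // Show both parts, their sum, and the raw gap side by side
            coef_alt, char_alt, coef_alt + char_alt, mean(y1)-mean(y0)
            end

            * Cross-check: group 1 is female==0 (males), so weight(1)
            * uses the male coefficients as the reference
            oaxaca lnwage educ exper tenure age, by(female) weight(1)
            ```

            The last two entries of the displayed row should coincide, since the two parts add up to the raw gap by construction.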
